@Shazwazza

Shannon Deminick's blog all about web development

Examine 1.5.1 released

April 5, 2013 19:59

I’ve created a new release of Examine today, version 1.5.1. There’s nothing really new in this release, just a bunch of bug fixes. The other cool thing is that I’ve finally got Examine on NuGet. The v1.5.1 release page is here on CodePlex with upgrade instructions… which really just means replacing the DLLs.

It’s important to note that if you have installed Umbraco 6.0.1+ or 4.11.5+ then you already have Examine 1.5.0 installed (which isn’t an official release on the CodePlex page), and it already has 8 of these 10 bugs fixed.

Bugs fixed

Here’s the full list of bugs fixed in this release:

UmbracoExamine

You may already know this, but we’ve moved the UmbracoExamine libraries into the core of Umbraco so that the Umbraco core team can better support the implementation. That means that only the basic Examine libraries will continue to exist at examine.codeplex.com. The release of 1.5.1 only relates to the base Examine libraries, not the UmbracoExamine libraries, but that’s OK: you can still upgrade these base libraries without issue.

NuGet

There are 2 Examine projects up on NuGet: the basic Examine package, and the Azure package if you wish to use an Azure directory for your indexes.

Standard package:

PM> Install-Package Examine

Azure package:

PM> Install-Package Examine.Azure

 

Happy searching!

New Examine updates and features for Umbraco

March 6, 2013 00:42

It’s been a long while since Examine got some much needed attention and I’m pleased to say it is now happening. If you didn’t know already, we’ve moved the Umbraco Examine source into the core of Umbraco. The underlying Examine core (Examine.dll) will remain on CodePlex, but all the Umbraco bits and pieces which are found in UmbracoExamine.dll are in the Umbraco core from version 6.1+. This is great news because now we can all better support the implementation of Examine for Umbraco. More good news is that even versions prior to Umbraco 6.1 will have some bugs fixed (http://issues.umbraco.org/issue/U4-1768)! Niels Kuhnel has also jumped aboard the Examine train and is helping out a ton by adding his amazing ‘facet’ features, which will probably make it into an Umbraco release around version 6.2 (maybe 6.1, but we still need to do some review, etc… to make sure it’s 100% backwards compatible).

One other bit of cool news is that we’re adding an official Examine Management dashboard to Umbraco 6.1. In its present state it supports optimizing indexes, rebuilding indexes and searching them. I’ve created a quick video showing its features :)

Examine management dashboard for Umbraco

Ultra fast media performance in Umbraco

April 25, 2011 02:24

There are a few different ways to query Umbraco for media: using the new Media(int) API, using the umbraco.library.GetMedia(int, false) API, or querying for media with Examine. I suppose there are quite a few people out there who don’t use Examine yet and therefore don’t know that all of the media information is actually stored there too! The problem with the first 2 methods listed above is that they make database queries. The 2nd method is slightly better because it has built-in caching, but the Examine method is by far the fastest and most efficient.

The following table shows the caveats of each option:

Caveat          | new Media(int) | library.GetMedia(int,false) | Examine
Makes DB calls  | yes            | yes                         | no
Caches result   | no             | yes                         | no
Real time data  | yes            | yes                         | no

You might note that Examine doesn’t cache the result whereas the GetMedia call does, but don’t let this fool you: the Examine searcher that returns the result will be nearly as fast as ‘in cache’ data but won’t require the additional memory that the GetMedia cache does. The other thing to note is that Examine doesn’t have real-time data. This means that if an administrator creates or saves a media item it won’t show up in the Examine index instantaneously; instead it may take up to a minute to be ingested into the index. Lastly, it’s obvious that the new Media(int) API isn’t a very good way of accessing Umbraco media because it makes a few database calls per media item and also doesn’t cache the result.

Examine would be the ideal way to access your media if it were real time, so instead we’ll combine the Examine and library.GetMedia(int,false) APIs. First we’ll check if Examine has the data and, if not, fall back to the GetMedia API. The method further down does this for us and returns a new object called MediaValues, which simply contains a Name and a Values property.

First, here’s the usage of the new API:

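//GetUmbracoMedia returns null when the media item can't be found anywhere
var media = MediaHelper.GetUmbracoMedia(1234);
//MediaValues exposes its fields through the Values dictionary
var mediaFile = media.Values["umbracoFile"];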

That’s a pretty easy way to access media. Now, here’s the code to make it work:

//requires: using System.Linq; using Examine;
public static MediaValues GetUmbracoMedia(int id)
{
    //first check in Examine as this is WAY faster
    var searcher = ExamineManager.Instance.SearchProviderCollection["InternalSearcher"];
    var criteria = searcher.CreateSearchCriteria("media");
    var filter = criteria.Id(id);
    var results = searcher.Search(filter.Compile());
    if (results.Any())
    {
        return new MediaValues(results.First());
    }

    //not in the index (yet), so fall back to the cached GetMedia API
    var media = umbraco.library.GetMedia(id, false);
    if (media != null && media.MoveNext())
    {
        return new MediaValues(media.Current);
    }

    return null;
}

 

The MediaValues class definition:

//requires: using System; using System.Collections.Generic;
//          using System.Xml.XPath; using Examine;
public class MediaValues
{
    public MediaValues(XPathNavigator xpath)
    {
        if (xpath == null) throw new ArgumentNullException("xpath");
        Name = xpath.GetAttribute("nodeName", "");
        Values = new Dictionary<string, string>();
        var result = xpath.SelectChildren(XPathNodeType.Element);
        while (result.MoveNext())
        {
            //child elements without attributes are the property values
            if (result.Current != null && !result.Current.HasAttributes)
            {
                Values.Add(result.Current.Name, result.Current.Value);
            }
        }
    }

    public MediaValues(SearchResult result)
    {
        if (result == null) throw new ArgumentNullException("result");
        Name = result.Fields["nodeName"];
        Values = result.Fields;
    }

    public string Name { get; private set; }
    public IDictionary<string, string> Values { get; private set; }
}

That’s it! Now you have the benefits of Examine’s ultra fast data access and real-time data in case it hasn’t made it into Examine’s index yet.

Searching Umbraco using Razor and Examine

March 15, 2011 21:51
This post was imported from FARMCode.org which has been discontinued. These posts now exist here as an archive. They may contain broken links and images.
Since Razor is really just C#, it’s super simple to run a search in Umbraco using Razor and Examine. In MVC the actual searching should be left up to the controller, which gives the search results to your view, but in Umbraco 4.6+ Razor is used for macros which actually ‘do stuff’. Here’s how incredibly simple it is to do a search:
@using Examine;
@* Get the search term from query string *@
@{var searchTerm = Request.QueryString["search"];}
<ul class="search-results">
    @foreach (var result in ExamineManager.Instance.Search(searchTerm, true))
    {
        <li>
            <span>@result.Score</span>
            <a href="@umbraco.library.NiceUrl(result.Id)">
                @result.Fields["nodeName"]
            </a>
        </li>
    }
</ul>

That’s it! Pretty darn easy.

And for all you sceptics who think there’s too much configuration involved to set up Examine: configuring Examine requires 3 lines of config. Yes, it’s true, 3 lines, that’s it. Here’s the bare minimum setup:

1. Create an indexer under the ExamineIndexProviders section:

<add name="MyIndexer" type="UmbracoExamine.UmbracoContentIndexer, UmbracoExamine"/>

2. Create a searcher under the ExamineSearchProviders section:

<add name="MySearcher" type="UmbracoExamine.UmbracoExamineSearcher, UmbracoExamine"/>

3. Create an index set under the ExamineLuceneIndexSets config section:

<IndexSet SetName="MyIndexSet" IndexPath="~/App_Data/TEMP/MyIndex" />

This will index all of your data in Umbraco and allow you to search against all of it. If you want to search on specific subsets, you can use the Fluent API, and of course if you want to modify your index there’s much more you can do with the config.

With Examine the sky is the limit: you can have anything from an incredibly simple index and search mechanism up to an incredibly complex index with event handlers, etc… and a very complex search with fuzzy logic, proximity searches, etc… And no matter what flavour you choose, it will be VERY fast regardless of how much data you’re searching against.

I also STRONGLY advise you to use the latest release on CodePlex: http://examine.codeplex.com/releases/view/50781 . There will also be version 1.1 coming out very soon.

Enjoy!

Examine output indexing

November 2, 2010 07:39
This post was imported from FARMCode.org which has been discontinued. These posts now exist here as an archive. They may contain broken links and images.
Last week Pete Gregory (@pgregorynz) and I were discussing different implementations of Examine, particularly cases where you need to use Examine events to collate information from different nodes to put into the index for the page being rendered. An example of this is an FAQ engine where you might have an Umbraco content structure such as:
  • Site Container
    • Public
      • FAQs
        • FAQ Item 1
        • FAQ Item 2
        • FAQ Item 3

In this example, the page that is rendered to the end user is FAQs, but the data from all 4 nodes (FAQs and FAQ Item 1 to 3) needs to be added to the index for the FAQs page. To do this you can use Examine events, either the GatheringNodeData event of the BaseIndexProvider or the DocumentWriting event of the UmbracoContentIndexer (I’ll write another post covering the difference between these two events and why they both exist). Though writing Examine event handlers to put the data from FAQ Item 1 to 3 into the FAQs index isn’t very difficult, it would still be really cool if all of this could be done automatically.

Pete mentioned it would be cool if we could just index the output html of a page (sort of like Google) and suddenly the ideas started to flow. This concept is actually quite easy to do so within the next month or so we’ll probably release a beta of Examine Output Indexing. Here’s the way it’ll all get put together:

  • An HttpModule will be created to do 2 things:
    • Check if the current request is an Umbraco page request
      • If it is, we can easily get the current node being rendered since it’s already been added to the HttpContext items by Umbraco
      • Use the standard Examine handlers to enter the node’s data into the indexes based on the configuration you’ve specified in your Examine configuration files
    • Get the HTML output of the page before it is rendered to the end user, parse the html to get the relevant data and put it into the index for the current Umbraco page
  • We figured that it would also be cool to have an Examine node property that developers could define, called something like examineNoIndex, which we could check for once we determine that it’s an Umbraco page. If this property is set to true, we won’t index the page.
    • This would give developers more control over which specific pages shouldn’t be indexed, directly from the CMS properties instead of by writing custom events

With the above, a developer will simply need to put the HttpModule in their web.config, define an Examine index based on a new provider we create, and that’s it. There will be no need to manually collate node data as in the FAQ example above. However, please note that this will only suit straightforward searching, so if you have complex searching and indexing requirements I would still recommend using events, since you have far more control over what information is indexed.
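As a rough illustration of the "parse the html to get the relevant data" step, here is a minimal standalone sketch. Nothing in it comes from Examine or from the planned module: the class and method names (`HtmlTextDemo`, `ToIndexableText`) are hypothetical, and regex-based tag stripping is an assumption chosen for brevity rather than a robust HTML parser.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

public static class HtmlTextDemo
{
    // Hypothetical helper: reduce rendered HTML to plain text suitable for indexing.
    public static string ToIndexableText(string html)
    {
        // Drop script and style blocks entirely, their content is not page text.
        var text = Regex.Replace(html, @"<(script|style)\b[^>]*>.*?</\1>", " ",
            RegexOptions.IgnoreCase | RegexOptions.Singleline);
        // Replace the remaining tags with spaces so adjacent words stay separated.
        text = Regex.Replace(text, @"<[^>]+>", " ");
        // Decode entities such as &amp; and collapse runs of whitespace.
        text = WebUtility.HtmlDecode(text);
        return Regex.Replace(text, @"\s+", " ").Trim();
    }

    public static void Main()
    {
        var html = "<h1>FAQs</h1><script>var x=1;</script><p>Q&amp;A one</p>";
        Console.WriteLine(ToIndexableText(html)); // FAQs Q&A one
    }
}
```

The resulting string is the kind of value that could be written into a single index field for the page, which is what would make the manual collation of child nodes unnecessary.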

Any feedback is much appreciated since we haven’t started developing this quite yet.

Examine v1.0 RTM

October 22, 2010 21:46
This post was imported from FARMCode.org which has been discontinued. These posts now exist here as an archive. They may contain broken links and images.
We finally released Examine version 1.0 a week or so ago. You can find the latest download package from the CodePlex downloads page for Examine: http://examine.codeplex.com/releases/view/50781 

Here’s what you’ll need to know

  • There are some breaking changes from the version that is shipped with Umbraco 4.5 and also from the Examine RC3 release. The downloads tab on CodePlex contains the Release Notes for download which contains all of the information on upgrading & breaking changes
  • READ THE RELEASE NOTES BEFORE UPGRADING
  • There’s a ton of bugs fixed in this release from the version shipped with Umbraco 4.5
  • Lots of new features have been added:
    • Indexing ANY type of data easily using the LuceneEngine index/search providers
    • PDF Indexing for Umbraco
    • XSLT extensions for Umbraco
    • Data Type declarations for indexed fields
    • Date & Number range searching
  • New documentation has been added to CodePlex

Using v1.0 RTM on Umbraco 4.5

The upgrade process from the Examine version shipped with 4.5 to v1.0 RTM should be pretty seamless (unless you are using some specific API calls as noted in the release notes). However, once you drop in the new DLLs you’ll probably notice that the internal search no longer works. This is due to a bug in the Umbraco 4.5 codebase and a non-optimal implementation of Examine, which has to do with case sensitivity for application aliases (i.e. Content vs content). The work-around is simple though: all we need to do is change the Analyzer used for the internal searcher in the Examine configuration file to the StandardAnalyzer instead of the WhitespaceAnalyzer. This is because the WhitespaceAnalyzer is case sensitive whereas the StandardAnalyzer is not. This issue is fixed in Umbraco Juno (4.6), which will continue to use the WhitespaceAnalyzer so that Examine doesn’t tokenize strings that contain punctuation. For more info on Analyzers, have a look at Aaron’s post.

Next Versions

There probably won’t be too many more changes coming for Examine v1.0 apart from any bug fixing that needs to be done and maybe some tweaks to the Fluent API. We will start working on v2.0 at some point this year or early next year, which will take Examine to the next level. It will be less focused on configuration, have a smaller footprint and be much more configurable through code (similar to how ASP.NET MVC works).

Searching Multi-Node Tree Picker data (or any collection) with Examine

September 23, 2010 04:10
This post was imported from FARMCode.org which has been discontinued. These posts now exist here as an archive. They may contain broken links and images.
With the release of uComponents recently a lot of people are starting to work with a new data type called the MultiNodeTreePicker, and with this I’ve seen a few questions around searching the data it generates using Examine.

But there is a catch. If you’re using the CSV storage (which you must if you’re working with Examine), the Examine index will contain something like this:

1011,1231,1232,1225

But how do you search on that? Searching for ‘1231’ will not return anything, because it’s prefixed with ‘,’ and postfixed with ‘,’. So this brings a problem, how do you search?

Bring on Events

As Shannon spoke about at CodeGarden 10, Examine has a number of different events you can hook into to do different things (slides and code), and this is what we’re going to need to work with.

I’ve touched on events before but this time we’re going to look at a different event, we’re going to look at the GatheringNodeData event.

GatheringNodeData event

This event is fired while Examine is scraping the data out of an XML element which it has received. This XML could be from Umbraco (in the scenario we’re looking at here) or it could be from your own data source, and the event is raised once Examine has turned the XML into a key/value representation of it.

The event has custom event arguments, which have a property called Fields. This Fields property is a dictionary which contains the full key/value representation of the data which will end up in Examine!

This dictionary can be manipulated, so you can add or remove data as you see fit (but that’s a topic for another blog). It also means you can change the data!

Changing the data for our needs

As I mentioned at the start of this, we end up with a comma-separated string from the data type which isn’t useful for searching, so we can use an event handler to change what we’ve got. First we need to wire up an event handler:

using Examine;
using umbraco.BusinessLogic;

public class ExamineEvents : ApplicationBase
{
	public ExamineEvents()
	{
		var indexer = ExamineManager.Instance.IndexProviderCollection["MyIndexer"];
		//the event is typed to IndexingNodeDataEventArgs, so subscribe with the method group
		indexer.GatheringNodeData += GatheringNodeDataHandler;
	}

	void GatheringNodeDataHandler(object sender, IndexingNodeDataEventArgs e)
	{
		//do stuff here
	}
}

So this is just a simple wire-up, using the ApplicationBase class in Umbraco so that it’ll be created on application start-up. Next we need to implement the event handler:

void GatheringNodeDataHandler(object sender, IndexingNodeDataEventArgs e)
{
	//not every node has the picker, so check before reading the field
	if (!e.Fields.ContainsKey("TreePicker")) return;
	//grab the current data from the Fields collection
	var mntp = e.Fields["TreePicker"];
	//let's get rid of those commas!
	mntp = mntp.Replace(",", " ");
	//now put it back into the Fields so we can pretend nothing happened!
	e.Fields["TreePicker"] = mntp;
}

And you’re done! Now the data will be written into the index with spaces rather than commas meaning that you can search on each ID without the need for wildcards or any other “hacks” to get it to work.
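To see why swapping the commas for spaces matters, here is a small standalone sketch. It does not use Examine or Lucene.Net at all: it assumes an analyzer that simply splits text on whitespace, and the names (`CsvTokenDemo`, `WhitespaceTokens`, `ContainsTerm`) are made up for illustration.

```csharp
using System;
using System.Linq;

public static class CsvTokenDemo
{
    // Rough stand-in for a whitespace-style analyzer: split on whitespace only.
    // Assumption: real analyzers do more than this; it is just enough to show the effect.
    public static string[] WhitespaceTokens(string value)
    {
        return value.Split((char[])null, StringSplitOptions.RemoveEmptyEntries);
    }

    // A term "matches" only when it equals a whole token.
    public static bool ContainsTerm(string fieldValue, string term)
    {
        return WhitespaceTokens(fieldValue).Contains(term);
    }

    public static void Main()
    {
        var raw = "1011,1231,1232,1225";
        var spaced = raw.Replace(",", " ");

        Console.WriteLine(ContainsTerm(raw, "1231"));    // False: the whole CSV string is one token
        Console.WriteLine(ContainsTerm(spaced, "1231")); // True: each ID is its own token
    }
}
```

The same reasoning is why an exact-term search on an ID only starts working once each ID is written to the index as its own token.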

Note: This will work in the majority of cases, the only reason it’ll fail is if you’re using an analyzer that strips out numbers before indexing. For more information about Lucene analyzers take a look at this article: http://www.aaron-powell.com/lucene-analyzer

Text casing and Examine

August 24, 2010 08:41
This post was imported from FARMCode.org which has been discontinued. These posts now exist here as an archive. They may contain broken links and images.
A few times I’ve seen questions posted on the Umbraco forums asking how to deal with case-insensitive text with Examine, and it’s also something that we’ve had to handle a few times within our own company.

Here’s a scenario:

  • You have a site search
  • You use Examine
  • You want to show the results looking exactly the same as it was before it went into Examine

If you’re running a standard install you’ll notice that the content always ends up lowercased!

This is a bit of a problem, page titles will be lowercase, body content will be lowercase, etc. Part of this will be due to a mistake in Examine, part of it is due to the design of Lucene.

In this article I’ll have a look at what you need to do to make it work as you’d expect.

First, some background

Before we dive directly into what to do to fix it you really should understand what is happening. If you don’t care feel free to skip over this bit though :P.

Searching is a tricky thing: when comparing strings, "Examine" == "examine" is false. To get around this, searching is best done in a case-insensitive manner. To make this work, Examine did a forced lowercase of the content before it was pushed into Lucene.Net. This was to ensure that everything was exactly the same when it was searched against.
In hindsight this was not really a great idea; it really should be the responsibility of the Lucene Analyzer to handle this for you.

Many of the common Lucene.Net analyzers actually do automatic lowercasing of content. These analyzers are:

  • StandardAnalyzer
  • StopAnalyzer
  • SimpleAnalyzer

So if you’re using the standard Examine config you’ll find yourself using the StandardAnalyzer, and your content will still be lowercased.

This means that there’s no need for Lucene to concern itself with case sensitivity when searching: everything is parsed by the analyzer (field terms and queries) and you’ll get more matches.
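The idea that both the field terms and the query go through the same lowercasing step can be sketched without Lucene.Net. The snippet is only an illustration under simplifying assumptions: `Analyze` is a hypothetical stand-in that splits on whitespace and lowercases, which is far less than a real analyzer does.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LowercaseIndexDemo
{
    // Stand-in for a lowercasing analyzer: split on whitespace, then lowercase each token.
    public static IEnumerable<string> Analyze(string text)
    {
        return text
            .Split((char[])null, StringSplitOptions.RemoveEmptyEntries)
            .Select(t => t.ToLowerInvariant());
    }

    // The same analysis is applied to the indexed text and to the query.
    public static bool Matches(string indexedText, string query)
    {
        var terms = new HashSet<string>(Analyze(indexedText));
        return Analyze(query).All(t => terms.Contains(t));
    }

    public static void Main()
    {
        Console.WriteLine("Examine" == "examine");                               // False
        Console.WriteLine(Matches("Examine is built on Lucene.Net", "EXAMINE")); // True
    }
}
```

The trade-off is visible here too: once the terms are lowercased, the original casing is gone unless it is stored separately.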

So how do I get around this?

Now that we’ve seen why all your content is generally lower case, how can we work with it in the original format and display it back to the UI?

Well we need some way in which we can have the field data stored without the analyzer screwing around with it.

Note: This doesn’t need to be done if you’re using an analyzer which doesn’t have a LowerCaseTokenizer or LowerCaseFilter. If you’re using a different analyzer, like KeywordAnalyzer, then this post won’t cover what you’re after (the KeywordAnalyzer doesn’t lowercase, so if your content still ends up lowercased you’re actually using an out-dated version of Examine; I recommend you grab the latest release :)). More information on Analyzers can be found at http://www.aaron-powell.com/lucene-analyzer

Luckily we’ve got some hooks into Examine to allow us to do what we need here, it’s in the form of an event on the Examine.LuceneEngine.Providers.LuceneIndexer, called DocumentWriting. Note that this event is on the LuceneIndexer, not the BaseIndexProvider. This event is Lucene.Net specific and not logical on the base class which is agnostic of any other framework.

What we can do with this event is interact directly with Lucene.Net while Examine is working with it.
You’ll need to have a bit of an understanding of how to work with a Lucene.Net Document (and for that I’d recommend having a read of this article from me: http://www.aaron-powell.com/documents-in-lucene-net), cuz what you’re able to do is play with Lucene.Net… Feel the power!

So we can attach the event handler the same way as you would for any other event in Umbraco, using a class that inherits from ApplicationBase:

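using Examine;
using Examine.LuceneEngine.Providers;
using umbraco.BusinessLogic;

public class UmbracoEvents : ApplicationBase
{
	public UmbracoEvents()
	{
		//DocumentWriting only exists on the Lucene indexer, hence the cast
		var indexer = (LuceneIndexer)ExamineManager.Instance.IndexProviderCollection["DefaultIndexer"];

		//the event is typed to DocumentWritingEventArgs, so subscribe with the method group
		indexer.DocumentWriting += indexer_DocumentWriting;
	}
}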

To do this we’ve got to cast the indexer so we’ve got the Lucene version to work with, then we’re attaching to our event handler. Let’s have a look at the event handler

void indexer_DocumentWriting(object sender, DocumentWritingEventArgs e)
{
	//grab our Lucene document from the event arguments
	var doc = e.Document;

	//the e.Fields dictionary is all the fields which are about to be inserted into Lucene.Net
	//we'll grab out the "bodyContent" one, if there is one to be indexed
	if(e.Fields.ContainsKey("bodyContent")) 
	{
		string content = e.Fields["bodyContent"];
		//Give the field a name which you'll be able to easily remember
		//also, we're telling Lucene to just put this data in, nothing more
		doc.Add(new Field("__bodyContent", content, Field.Store.YES, Field.Index.NOT_ANALYZED));
	}
}

And that’s how you can push data in. I’d recommend that you do a conditional check to ensure that the property you’re looking for does exist in the Fields property of the event args, unless you’re 100% sure that it appears on all the objects which you’re indexing.

Lastly, we need to display that on the UI. It’s easy: rather than accessing the bodyContent field of the SearchResult, use __bodyContent and you’ll get your unanalyzed version.

Conclusion

Here we’ve looked at how we can use the Examine events to interact with the Lucene.Net Document. We’ve decided that we want to push in unanalyzed text, but you could use this idea to really tweak your Lucene.Net document. But really playing with the Document is not recommended unless you *really* know what you’re doing ;).

Examine RC3 Released

August 19, 2010 11:30
This post was imported from FARMCode.org which has been discontinued. These posts now exist here as an archive. They may contain broken links and images.
Hopefully this will be a quick RC! I’m really hoping to release v1.0 RTM by early next week (latest). If you are able to help out with some testing it would be amazing!!

Here's what's new:

  • PDF Indexing
  • Easily implement custom data indexing outside of Umbraco
  • More XSLT Extensions for Umbraco
  • Some framework refactoring so a new DLL: Examine.LuceneEngine.dll which contains all of the Lucene.Net implementation
    • Because of this refactoring, if you've built your own providers you may need to update your code to work; otherwise it is backwards compatible for most people.
  • More unit tests
  • More documentation

Get it while it’s hot! And don’t forget to read the release notes.

DOWNLOAD FROM CODEPLEX HERE

Paging with Examine

August 19, 2010 06:12
This post was imported from FARMCode.org which has been discontinued. These posts now exist here as an archive. They may contain broken links and images.
I’ve been asked this question a few times: how do you implement pagination in your Examine search results?

Well, a fun fact is that Lucene doesn’t have a really good way to do this; it just involves skipping to the point that you want to get the results from.

Thanks to .NET and LINQ we have extension methods that handle this nicely, Skip and Take, and since the search results are IEnumerable underneath, they can be used. But there’s one problem: doing this results in a performance hit, because the initial Skip would hydrate a bunch of entities from the underlying Lucene store, which is where you lose performance.

So we implemented our own version of Skip! You can just use Skip and Take as standard, and they’ll be evaluated without losing performance.

Here’s the code:

int pageNumber;
//fall back to the first page if pageNum is missing or not a valid number
if (!int.TryParse(Request.QueryString["pageNum"], out pageNumber) || pageNumber < 0)
{
    pageNumber = 0;
}
int pageSize = 10; //this could be a config value or something

/* search snipped */

var results = searcher.Search(...);

//And we'll just set a repeater with the results
Repeater.DataSource = results.Skip(pageNumber * pageSize).Take(pageSize);
Repeater.DataBind();

It's just that easy.
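If you want to check the page arithmetic on its own, here is a minimal sketch that runs against an in-memory sequence. It uses the standard LINQ Skip and Take rather than Examine's optimised Skip, and `PagingDemo`, `GetPage` and `PageCount` are hypothetical names used only for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PagingDemo
{
    // pageNumber is zero-based, matching the pageNum query string value used above.
    public static List<T> GetPage<T>(IEnumerable<T> results, int pageNumber, int pageSize)
    {
        return results.Skip(pageNumber * pageSize).Take(pageSize).ToList();
    }

    // Number of pages needed to show every result (integer ceiling division).
    public static int PageCount(int totalResults, int pageSize)
    {
        return (totalResults + pageSize - 1) / pageSize;
    }

    public static void Main()
    {
        var results = Enumerable.Range(1, 25); // stand-in for search results
        Console.WriteLine(string.Join(",", GetPage(results, 2, 10))); // 21,22,23,24,25
        Console.WriteLine(PageCount(25, 10));                          // 3
    }
}
```

A page count like this is handy for rendering the pager links; asking for a page past the end simply returns an empty list.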