@Shazwazza

Shannon Deminick's blog all about web development

How to build a search query in Examine

August 13, 2010 05:14
This post was imported from FARMCode.org which has been discontinued. These posts now exist here as an archive. They may contain broken links and images.
Now that Examine can be used by a wider audience than just Umbraco, understanding how search works is possibly a bit more important ;).

So today while answering a question on the Umbraco forum I thought that what I was going on about is something that more people might want to hear about. And really, I do like the sound of my own (virtual) voice…

Understanding Lucene.Net from Examine

For this I’m going to be looking at the Lucene.Net implementation of Examine, and this is agnostic of whether you’re using UmbracoExamine or Examine.LuceneEngine.

To get started you should familiarize yourself a bit with the Lucene Query Parser Syntax, as that’s what we’re using internally to get the data back from Lucene.Net.

Also, we’re going to be working with the Fluent API for Examine, so feel free to read up on that here and here. With Examine we’ve made it easy to see the Lucene query you’re building up as you work with the Fluent API: if you call .ToString() on the ISearchCriteria instance you’ll be able to see what your search query looks like (we’ve also got some other information in the result of that method call too).

So you can see what’s been generated, let’s build a query and dissect it:

var criteria = searcher.CreateSearchCriteria(IndexTypes.Content);
criteria = criteria.NodeName("Hello").And().NodeTypeAlias("world").Compile();

Console.WriteLine(criteria.ToString()); //+(+nodeName:Hello +nodeTypeAlias:world) +__IndexType:content

So this is what we've got, a total of three conditions… But wait, there are only two conditions that I entered, right?

Not quite. Examine has some smarts built into it around what you’re searching on, and it’ll add that restriction on your behalf. This is so you don’t get results back from a different index type in your query. Note: If you don’t specify the IndexType then this condition won’t be added for you.

Also, to ensure that all your entered queries don’t get killed by the IndexType (a problem in earlier Examine builds) we combine everything you enter into a GroupedAnd statement.

Let’s have a look at the parts we did request, and how they are comprised:

+nodeName:Hello +nodeTypeAlias:world

Ok, so what we've got here is our two conditions which we've added using Examine. Each is an AND (or in Lucene terminology MUST) and this is denoted by the +. Next we have a field name, in this case either nodeName or nodeTypeAlias. You’ll notice back in our Fluent API query we actually generated all of that using the built-in methods, rather than having to use more magic strings. Next there is a : to indicate where the field name ends. Lastly we have the term which we’re going to search against, either Hello or world.

So essentially it’s built up of BOOLEAN_OPERATION&FieldName:Term (the & is so it’s slightly readable).
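
To make that shape concrete, here is a minimal sketch that assembles the same string by hand. It doesn't touch Examine or Lucene.Net at all: LuceneClause and QuerySketch are hypothetical names invented for this illustration, and the code only mimics the text that ToString() printed earlier.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper types for illustration only; these are NOT part of Examine.
public class LuceneClause
{
    public string Operator { get; private set; }   // "+" (must), "-" (must not) or "" (should)
    public string FieldName { get; private set; }
    public string Term { get; private set; }

    public LuceneClause(string op, string fieldName, string term)
    {
        Operator = op;
        FieldName = fieldName;
        Term = term;
    }

    // BOOLEAN_OPERATION, then FieldName, then ":", then Term
    public override string ToString()
    {
        return Operator + FieldName + ":" + Term;
    }
}

public static class QuerySketch
{
    // Wraps the caller's clauses in one required group, then appends
    // the index type restriction as a second required clause.
    public static string Build(IEnumerable<LuceneClause> clauses, string indexType)
    {
        string grouped = "+(" + string.Join(" ", clauses.Select(c => c.ToString()).ToArray()) + ")";
        return grouped + " +__IndexType:" + indexType;
    }
}
```

Passing two "+" clauses (nodeName:Hello and nodeTypeAlias:world) with an index type of "content" reproduces the query string shown in the ToString() comment above.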

Conclusion

This was a brief look at how the Fluent API for Examine turns your typed query into a Lucene query that is then used for searching.

With this knowledge you should be better able to design complex queries, use mixed conditionals and just plain go crazy.

Using Examine to index & search with ANY data source

August 11, 2010 03:38

During CodeGarden 2010 a few people were asking how to use Examine to index and search on data from any data source such as custom database tables, etc… Previously, the only way to do this was to override the Umbraco Examine indexing provider, remove the Umbraco functionality embedded in there, and then do a lot of coding yourself. But now there’s some great news! As of now you can use all of the Examine goodness with its embedded Lucene.Net with any data source and you can do it VERY easily.

Some things you need to know about the new version:

  1. I haven’t made a release version of this yet as it still needs some more testing, though we are putting this into a production site next week.
  2. If you want to try this, currently you’ll need to get the latest source from Examine @ CodePlex
  3. If you are using a previous version of Examine, there are a few breaking changes as some of the class structures have been moved. Your config file should still work as is; HOWEVER, you should update it to reflect the new class names
  4. There are now 3 DLLs, not just 2:
    • Examine.DLL
      • Still pretty much the same… contains the abstraction layer
    • Examine.LuceneEngine.DLL
      • The new DLL to use to work with data that is not Umbraco specific
    • UmbracoExamine.DLL
      • The DLL that the Umbraco providers are in

Ok, now on to the good stuff. First, I’ve added a demo project to this post which you can download HERE. This project is a simple console app that contains a sample XML data file that has 5 records in it. Here’s what the app does:

  1. Re-indexes all data
  2. Searches the index for node id 1
  3. Ensures one record is found in the index
  4. Updates the dateUpdated time stamp for the data record
  5. Re-indexes the record with node id 1

So assuming that you have some custom data like a custom database table, xml file, or whatever, there’s really only 3 things that you need to do to get Examine indexing your custom data:

  1. Create your own ISimpleDataService
    • There is only 1 method to implement: IEnumerable<SimpleDataSet> GetAllData(string indexType)
    • This is the method that Examine will call to re-index your data
    • A SimpleDataSet is a simple object containing a Dictionary<string, string> and an IndexedNode object (which consists of a Node Id and a Node Type)
    • For example, if you had a database row, your SimpleDataSet object for the row would be the dictionary of the row’s values, its node id and type … easy.
  2. Use the ToExamineXml() extension method to re-index individual nodes/records
    • Examine relies on data being in the same XML structure as Umbraco (which we might change in version 2 sometime in the future… like next year) so we need to transform simple data into the XML structure. We’ve made this quite easy for you; all you have to do is get the data from your custom data source into a Dictionary<string, string> object and use this extension method to pass the xml structure in to Examine’s ReIndexNode method.
    • For example: ExamineManager.Instance.ReIndexNode(dataSet.ToExamineXml(dataSet["Id"], "CustomData"), "CustomData");  where dataSet is a Dictionary<string, string> .
  3. Update your Examine config to use the new SimpleDataIndexer index provider and the new LuceneSearcher search provider
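
As a rough sketch of step 1, here is what a data service serving rows from memory might look like. To keep it compilable on its own, the Examine types are declared as local stand-ins: the GetAllData signature comes from the list above, but the member names NodeDefinition, RowData, NodeId and Type are assumptions, so check the Examine.LuceneEngine source for the real ones. InMemoryDataService and the "Id" key are made up for the example.

```csharp
using System;
using System.Collections.Generic;

// Stand-ins so the sketch compiles without Examine. In a real project these come
// from Examine.LuceneEngine; the member names below are assumptions.
public class IndexedNode
{
    public int NodeId { get; set; }
    public string Type { get; set; }
}

public class SimpleDataSet
{
    public IndexedNode NodeDefinition { get; set; }
    public Dictionary<string, string> RowData { get; set; }
}

public interface ISimpleDataService
{
    IEnumerable<SimpleDataSet> GetAllData(string indexType);
}

// Hypothetical data service: serves rows from memory instead of a database table.
public class InMemoryDataService : ISimpleDataService
{
    private readonly List<Dictionary<string, string>> _rows;

    public InMemoryDataService(List<Dictionary<string, string>> rows)
    {
        _rows = rows;
    }

    // Called when everything of the given index type needs re-indexing.
    public IEnumerable<SimpleDataSet> GetAllData(string indexType)
    {
        foreach (var row in _rows)
        {
            yield return new SimpleDataSet
            {
                // Each row must carry its own id; "Id" is an assumed key name.
                NodeDefinition = new IndexedNode
                {
                    NodeId = int.Parse(row["Id"]),
                    Type = indexType
                },
                RowData = row
            };
        }
    }
}
```

For a database-backed service, the loop body would stay the same and only the source of the rows would change.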

If you’re not using Umbraco at all, then you’ll only need to have the 2 Examine DLLs which don’t reference the Umbraco DLLs whatsoever so everything is decoupled.

I’d recommend downloading the demo app and running it as it will show you everything you need to know on how to get Examine running with custom data. However, I know that people just like to see code in blog posts, so here’s the config for the demo app:

<?xml version="1.0" encoding="utf-8" ?>
<configuration>
  <configSections>
    <section name="Examine" type="Examine.Config.ExamineSettings, Examine"/>
    <section name="ExamineLuceneIndexSets" type="Examine.LuceneEngine.Config.IndexSets, Examine.LuceneEngine"/>
  </configSections>
  <Examine>
    <ExamineIndexProviders>
      <providers>
        <!--
          Define the indexer for our custom data. Since we're only indexing one type of data,
          there's only 1 indexType specified: 'CustomData', however if you have more than one
          type of index (i.e. Media, Content) then you just need to list them as a comma
          separated list without spaces.
          The dataService is how Examine queries whatever data source you have, in this case
          it's a custom data service defined in this project. A custom data service only has
          to implement one method... very easy.
        -->
        <add name="CustomIndexer"
             type="Examine.LuceneEngine.Providers.SimpleDataIndexer, Examine.LuceneEngine"
             dataService="ExamineDemo.CustomDataService, ExamineDemo"
             indexTypes="CustomData"
             runAsync="false"/>
      </providers>
    </ExamineIndexProviders>
    <ExamineSearchProviders defaultProvider="CustomSearcher">
      <providers>
        <!-- A search provider that can query a lucene index, no other work is required here -->
        <add name="CustomSearcher"
             type="Examine.LuceneEngine.Providers.LuceneSearcher, Examine.LuceneEngine" />
      </providers>
    </ExamineSearchProviders>
  </Examine>
  <ExamineLuceneIndexSets>
    <!-- Create an index set to hold the data for our index -->
    <IndexSet SetName="CustomIndexSet" IndexPath="App_Data\CustomIndexSet">
      <IndexUserFields>
        <add Name="name" />
        <add Name="description" />
        <add Name="dateUpdated" />
      </IndexUserFields>
    </IndexSet>
  </ExamineLuceneIndexSets>
</configuration>

Examine slide deck for CodeGarden 2010

June 30, 2010 09:55
A few people had asked during CodeGarden 2010 if I would post up the slide deck for my Examine presentation, so here it is. There’s not a heap of information in there since I think people would have soaked up most of the info during the examples and coding demos, but it’s posted here regardless and hopefully it helps a few people.

I’ve included a PDF version (link at the bottom) and also the image version below (if you’re too lazy to download it :)

[Slide images: Slide 2 to Slide 15]

Download slide deck here

Examine RC2 posted

April 17, 2010 22:08
I’ve just released Examine RC2 into the wild, you can download it from our CodePlex site.

RC2 fixes a bug in RC1 which wasn’t indexing user fields, only attribute fields.

There are a few breaking changes with RC2:

  • IQuery.MultipleFields has been removed. Use IQuery.GroupedAnd, IQuery.GroupedOr, IQuery.GroupedNot or IQuery.GroupedFlexible to define how multiple fields are added
  • ISearchCriteria.RawQuery added which allows you to pass a raw query string to the underlying provider
  • ISearcher.Search returns a new interface ISearchResults (which inherits IEnumerable<SearchResult>)
  • New interface ISearchResults which exposes a Skip to support paging and TotalItemCount
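
As a rough illustration of the paging support, the sketch below works out which slice of a result sequence belongs to a page. It uses LINQ's Skip and Take on a plain IEnumerable<T> as a stand-in for ISearchResults, so it runs without Examine; PagingSketch, GetPage and PageCount are hypothetical names. With a real ISearchResults you would presumably call its own Skip and feed TotalItemCount into the page count.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical paging helper for illustration; not part of Examine.
public static class PagingSketch
{
    // Returns the items on a 1-based page number.
    public static List<T> GetPage<T>(IEnumerable<T> results, int pageNumber, int pageSize)
    {
        if (pageNumber < 1) throw new ArgumentOutOfRangeException("pageNumber");
        if (pageSize < 1) throw new ArgumentOutOfRangeException("pageSize");

        int skip = (pageNumber - 1) * pageSize;   // items on earlier pages
        return results.Skip(skip).Take(pageSize).ToList();
    }

    // Number of pages needed for a total item count (rounds up).
    public static int PageCount(int totalItemCount, int pageSize)
    {
        if (pageSize < 1) throw new ArgumentOutOfRangeException("pageSize");
        return (totalItemCount + pageSize - 1) / pageSize;
    }
}
```

For 25 results at 10 per page, PageCount gives 3 and the third page holds the last 5 items.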

 

Will be working on more documentation to explain some of the newly added and obscure features shortly :P.

Examine hits RC1

April 6, 2010 05:04
I’m happy to announce that Examine and UmbracoExamine have today hit RC1!

The CodePlex site also has more extensive documentation about how to get UmbracoExamine up and running within your Umbraco website.

Go, download your copy today.

Examine’s Fluent Search API – Elevator Pitch

April 1, 2010 05:08
I realised that with my blog post about Examine it was fairly in-depth and a lot of people were probably bored before they got to the good bits about how easy searching can be.
So I decided that a smaller, more concise post was in order.

What?

The Fluent Search API is a chainable (like jQuery) API for building complex searches for a data source, in this case Umbraco. It doesn’t require you to know any “search language”, it just works via standard .NET style method calls, with intellisense to help guide you along the way.

How?

This is achieved by combining IQuery methods (search methods) with IBooleanOperation methods (And, Or, Not) to produce something cool. For example:

var query = sc
	.NodeName("umbraco")
	.And()
	.Field("bodyText", "is awesome".Escape())
	.Or()
	.Field("bodyText", "rock".Fuzzy()); 

Examineness can be implemented to do special things to search text, like making it a wild card query, or escaping several terms to have them used as a search sentence.

 

Hopefully this more direct post will engage your attention better and make you want more Examine sexiness.

Examine’s fluent search API

March 26, 2010 06:14
As I mentioned in my last blog post we’ve done a lot of work to refactor Examine (and Umbraco Examine) to use a fluent search API rather than a string based search API.

The primary reason for this was to do with how we were handling the string searching and opening up the Lucene.Net search API. In the initial preview version we would take the text which you entered as a search term and then produce a Lucene.Net search against all the fields in your index. This is ok, but it’s not great. The problem came when we wanted to implement a dynamic search query. There were several different search parameters, which were to check against different fields in the index.
It was sort of possible to achieve this, but you needed to understand the internals of Examine and the Lucene query language, and you couldn’t use the AND/OR/NOT operators; you had to use +, - or blank.

This is fine if you’re into search API’s, but really, how many people are actually like that? Ok, I must admit that I’m rather smitten with Lucene but I’m not exactly a good example of a normal person.

So I set about addressing this problem. We needed a much simpler way for your average Joe to write complex and useful search queries without knowing the underlying technology.
For this we’ve built a set of interfaces which you require:

  • ISearchCriteria
  • IQuery
  • IBooleanOperation

ISearchCriteria

The ISearchCriteria interface is the real workhorse of the API, it’s the first interface you start with, and it’s the last interface you deal with. In fact, ISearchCriteria implements IQuery, meaning that all the query operations start here.

In addition to query operations there are several additional properties, such as the maximum number of results and the type of data being searched.

Because ISearchCriteria is tightly coupled with the BaseSearchProvider implementation it is actually created via a factory pattern, like so:

ISearchCriteria searchCriteria = ExamineManager.Instance.SearchProviderCollection["MySearcher"].CreateSearchCriteria(100, IndexType.Content);

What we’re doing here is requesting that our BaseSearchProvider creates an instance of an ISearchCriteria. It takes two parameters:

  • int maxResults
  • Examine.IndexType indexType

This data can/ should be then used by the search method to return what’s required.

IQuery

The IQuery interface is really the heart of the fluent API, it’s what you use to construct the search for your site. Since Examine is designed to be technology agnostic the methods which are exposed via IQuery are fairly generic. A lot of the concepts are borrowed from Lucene.Net, but they are fairly generic and should be viable for any searcher.

The IQuery API exposes the following methods:

  • IBooleanOperation Id(int id);
  • IBooleanOperation NodeName(string nodeName);
  • IBooleanOperation NodeName(IExamineValue nodeName);
  • IBooleanOperation NodeTypeAlias(string nodeTypeAlias);
  • IBooleanOperation NodeTypeAlias(IExamineValue nodeTypeAlias);
  • IBooleanOperation ParentId(int id);
  • IBooleanOperation Field(string fieldName, string fieldValue);
  • IBooleanOperation Field(string fieldName, IExamineValue fieldValue);
  • IBooleanOperation MultipleFields(IEnumerable<string> fieldNames, string fieldValue);
  • IBooleanOperation MultipleFields(IEnumerable<string> fieldNames, IExamineValue fieldValue);
  • IBooleanOperation Range(string fieldName, DateTime start, DateTime end);
  • IBooleanOperation Range(string fieldName, DateTime start, DateTime end, bool includeLower, bool includeUpper);
  • IBooleanOperation Range(string fieldName, int start, int end);
  • IBooleanOperation Range(string fieldName, int start, int end, bool includeLower, bool includeUpper);
  • IBooleanOperation Range(string fieldName, string start, string end);
  • IBooleanOperation Range(string fieldName, string start, string end, bool includeLower, bool includeUpper);

As you can see, all the methods within the IQuery interface return an IBooleanOperation; this is how the fluent API works!

Hopefully it’s fairly obvious what each of the methods are, but the one you’re most likely to use is Field. Field allows you to specify any field in your index, and then provide a word to lookup within that field.

IExamineValue

You’ve probably noticed the IExamineValue parameter, which can be passed to a lot of the methods that otherwise take a string. But what is IExamineValue?
Well, obviously it’s somewhat provider dependent, so I’ll talk about it as part of Umbraco Examine, as that’s what I think most initial uptakers will want.

Because Lucene supports several different term modifiers for text we decided it would be great to have those exposed in the API for people to leverage. For this we’ve got a series of string extension methods which reside in the namespace

UmbracoExamine.SearchCriteria

So once you add a using statement for that you’ll have the following extension methods:

  • public static IExamineValue SingleCharacterWildcard(this string s)
  • public static IExamineValue MultipleCharacterWildcard(this string s)
  • public static IExamineValue Fuzzy(this string s)
  • public static IExamineValue Fuzzy(this string s, double fuzzieness)
  • public static IExamineValue Boost(this string s, double boost)
  • public static IExamineValue Proximity(this string s, double proximity)
  • public static IExamineValue Escape(this string s)
  • public static string Then(this IExamineValue vv, string s)

All of these (with the exception of Then) return an IExamineValue (which UmbracoExamine internally handles), and it tells Lucene.Net how to handle the term modifier you required.

I won’t repeat what is said within the Lucene documentation, I suggest you read that to get an idea of what to use and when.
The only exceptions are Escape and Then.

Escape

If you want to search on multiple words together then Lucene requires them to be ‘escaped’, otherwise it’ll (generally) treat the space character as a break in the query. So if you wanted to search for Umbraco Rocks and didn’t escape it you’d match on both Umbraco and Rocks, whereas when it’s escaped you’ll match on the two words in sequence.
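
In raw Lucene query syntax the difference is between separate terms and a double-quoted phrase. The sketch below builds both forms as plain strings so you can compare them. It is not how Escape is implemented in UmbracoExamine (that isn't shown here); PhraseSketch and its methods are hypothetical names for illustration.

```csharp
using System;
using System.Linq;

// Hypothetical helpers showing the two query shapes; not part of Examine.
public static class PhraseSketch
{
    // Unescaped: every word becomes its own term on the field.
    public static string FieldTerms(string field, string text)
    {
        string[] words = text.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
        return string.Join(" ", words.Select(w => field + ":" + w).ToArray());
    }

    // Escaped: the whole text becomes one quoted phrase, so the words must
    // appear in sequence. Backslashes and quotes inside the text are escaped.
    public static string FieldPhrase(string field, string text)
    {
        string escaped = text.Replace("\\", "\\\\").Replace("\"", "\\\"");
        return field + ":\"" + escaped + "\"";
    }
}
```

For the text Umbraco Rocks on a bodyText field, FieldTerms gives bodyText:Umbraco bodyText:Rocks, while FieldPhrase gives bodyText:"Umbraco Rocks".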

Then

The Then method just allows you to combine multiple strings or multiple IExamineValues, so you can boost your fuzzy query with a proximity of 0.1 :P.

IBooleanOperation

IBooleanOperation allows you to join multiple IQuery methods together using:

  • IQuery And()
  • IQuery Or()
  • IQuery Not()

These are then translated into the underlying searcher so it can determine how to deal with your chaining. At the time of writing we don’t support nested conditionals (grouped OR’s operating like an And).

There’s another method on IBooleanOperation which doesn’t fall into the above, but it’s very critical to the overall idea:

  • ISearchCriteria Compile()

The Compile method will then return an ISearchCriteria which you then pass into your searcher. It’s expected that this is the last method which is called and it’s meant to prepare all search queries for execution.
The reason we’re going with this rather than passing the IQuery into the Searcher is that we don’t have to carry the max results etc. on every IQuery instance. It’s not relevant in that scope, so it’d just introduce code smell, and no one wants that.

Bringing it all together

So now you know the basics, how do you go about producing a query?

Well the first thing you need to do is get an instance of your ISearchCriteria:

var sc = ExamineManager.Instance.CreateSearchCriteria();

Now let’s do a search for a few things across a few different fields:

var query = sc.NodeName("umbraco").And().Field("bodyText", "is awesome".Escape()).Or().Field("bodyText", "rock".Fuzzy());

Now we’ve got a query across a few different fields, lastly we need to pass it to our searcher:

var results = ExamineManager.Instance.Search(query.Compile());

It’s just that simple!

 

Hopefully the fluent API is clean enough that people can build nice and complex queries and are able to search their websites with no problem. If you’ve got any feedback please leave it here, as we’re working to get an RC out soon.

Examine, but not as you knew it

March 22, 2010 06:07
Almost 12 months ago Shannon blogged about Umbraco Examine, a Lucene.NET indexer which works nicely with Umbraco 4.x. Since then we’ve done quite a bit of work on Examine, and as people may be aware we’ve integrated Examine into the Umbraco core and it will be shipped out of the box with Umbraco 4.1.

Something Shannon and I had discussed a few times was that we wanted to decouple Examine from Umbraco so it could be used for indexing on sites other than Umbraco.
You’ll also notice that I keep referring to it as Examine, not Umbraco Examine which most people are more familiar with.
This is because over the last week we have achieved what we’d wanted to do, we’ve decoupled Examine from Umbraco!

So what’s Examine?

Examine is a provider based, config driven search and indexer framework. Examine provides all the methods required for indexing and searching any data source you want to use.

Examine is now agnostic of the indexer/ searcher API, as well as the data source. That’s right Examine has no references within itself to Umbraco, nor does it have any references to Lucene.NET.
We have still maintained a usage of XML internally for passing the data-to-index around, as it’s the easiest construct which we could think to work with and pass around.

You could implement the Examine framework in any solution, to index any data you want, it could be from a SQL server, or it could be from web-scraped content.

Where does that leave Umbraco Examine?

Umbraco Examine still exists, in fact it’s the primary (and currently only) implementer of Examine. Over the last week though we’ve done a lot of refactoring of Umbraco Examine to work with some changes we’ve done to the underlying Examine API.

Changes? What changes?

Last week anyone who follows me on Twitter will have seen a lot of tweets around Umbraco Examine, which were about a new search API and the breaking changes we were implementing.

While looking to refactor the underlying API of a large Umbraco site we have running I found that Examine was actually not properly designed if you wanted to search for data in specific fields, or build complex search queries.

This was a real bugger, I had many different parameters I needed to optionally search on, and only in certain fields, but since Umbraco Examine works with just a raw string this wasn’t possible.

So I set about creating a new fluent search API. This has actually turned out quite well, in fact so well that we now have this as the recommended search method, not raw text (which is still available).

The fluent API is part of the Examine API so it’s also available for any implementation, not just Umbraco! Since we’ve used Lucene.NET as the initial support model the API is designed similarly to what you’d expect from Lucene.NET, but we hope that it’s generic enough to look and feel right for any indexer/ searcher.

Here’s how the fluent API looks:

searchCriteria
.Id(1080)
.Or()
.Field("headerText", "umb".Fuzzy())
.And()
.NodeTypeAlias("cws".MultipleCharacterWildcard())
.Not()
.NodeName("home");

All you have to do is pass that into your searcher. That easy, and that beautiful. I’ll do a blog post where we’ll look more deeply into the fluent API separately.

Additionally we’ve done some other changes. Because of what the framework now is, we’ve renamed our assemblies and namespaces:

  • Examine.dll
    • This was formerly UmbracoExamine.Core.dll
    • Root namespace Examine
    • Contains all the classes and methods to create your own indexer and searcher
  • UmbracoExamine.dll
    • This was formerly UmbracoExamine.Providers.dll
    • Root namespace UmbracoExamine
    • Contains all the classes and methods for an Umbraco & Lucene.NET implementation

Apologies to any existing implementations of Umbraco Examine, this will result in breaking changes but since we’ve not hit RC yet too bad :P.

There are also some changes to the config, <IndexUserFields /> has become <IndexStandardFields />, and obviously the config registrations are different with the assembly and namespace changes.

The last change is that we’ve moved to the Ms-PL license for Examine, whose source is available on CodePlex.

 

Currently we’re working to tidy up the API and the documentation so that we can get the RC release out shortly, so watch this space.