Examine’s fluent search API

This post was imported from FARMCode.org which has been discontinued. These posts now exist here as an archive. They may contain broken links and images.
As I mentioned in my last blog post, we’ve done a lot of work to refactor Examine (and Umbraco Examine) to use a fluent search API rather than a string-based search API.

The primary reason for this was how we were handling string searching and exposing the Lucene.Net search API. In the initial preview version we would take the text you entered as a search term and produce a Lucene.Net search against all the fields in your index. This is ok, but it’s not great. The problem came when we wanted to implement a dynamic search query, where several different search parameters each had to be checked against different fields in the index.
It was sort of possible to achieve this, but you needed to understand the internals of Examine and the Lucene query language, and you couldn’t use the AND/OR/NOT operators; you had to use +, - or blank instead.

This is fine if you’re into search APIs, but really, how many people are actually like that? Ok, I must admit that I’m rather smitten with Lucene, but I’m not exactly a good example of a normal person.

So I set about addressing this problem. We needed a much simpler way for your average Joe to write complex and useful search queries without knowing the underlying technology.
For this we’ve built a set of interfaces which you’ll work with:

  • ISearchCriteria
  • IQuery
  • IBooleanOperation

ISearchCriteria

The ISearchCriteria interface is the real workhorse of the API: it’s the first interface you start with, and it’s the last interface you deal with. In fact, ISearchCriteria implements IQuery, meaning that all the query operations start here.

In addition to the query operations there are several additional properties, such as the maximum number of results and the type of data being searched.

Because ISearchCriteria is tightly coupled with the BaseSearchProvider implementation it is actually created via a factory pattern, like so:

ISearchCriteria searchCriteria = ExamineManager.Instance.SearchProviderCollection["MySearcher"].CreateSearchCriteria(100, IndexType.Content);

What we’re doing here is requesting that our BaseSearchProvider creates an instance of an ISearchCriteria. It takes two parameters:

  • int maxResults
  • Examine.IndexType indexType

This data can (and should) then be used by the search method to return what’s required.

IQuery

The IQuery interface is really the heart of the fluent API; it’s what you use to construct the search for your site. Since Examine is designed to be technology agnostic, the methods exposed via IQuery are fairly generic. A lot of the concepts are borrowed from Lucene.Net, but they should be viable for any searcher.

The IQuery API exposes the following methods:

  • IBooleanOperation Id(int id);
  • IBooleanOperation NodeName(string nodeName);
  • IBooleanOperation NodeName(IExamineValue nodeName);
  • IBooleanOperation NodeTypeAlias(string nodeTypeAlias);
  • IBooleanOperation NodeTypeAlias(IExamineValue nodeTypeAlias);
  • IBooleanOperation ParentId(int id);
  • IBooleanOperation Field(string fieldName, string fieldValue);
  • IBooleanOperation Field(string fieldName, IExamineValue fieldValue);
  • IBooleanOperation MultipleFields(IEnumerable<string> fieldNames, string fieldValue);
  • IBooleanOperation MultipleFields(IEnumerable<string> fieldNames, IExamineValue fieldValue);
  • IBooleanOperation Range(string fieldName, DateTime start, DateTime end);
  • IBooleanOperation Range(string fieldName, DateTime start, DateTime end, bool includeLower, bool includeUpper);
  • IBooleanOperation Range(string fieldName, int start, int end);
  • IBooleanOperation Range(string fieldName, int start, int end, bool includeLower, bool includeUpper);
  • IBooleanOperation Range(string fieldName, string start, string end);
  • IBooleanOperation Range(string fieldName, string start, string end, bool includeLower, bool includeUpper);

As you can see, all the methods within the IQuery interface return an IBooleanOperation; this is how the fluent API works!

Hopefully it’s fairly obvious what each of the methods does, but the one you’re most likely to use is Field. Field allows you to specify any field in your index, and then provide a word to look up within that field.

IExamineValue

You’ve probably noticed the IExamineValue parameter, which can be passed to a lot of the methods in place of a string. But what is IExamineValue?
Well, obviously it’s somewhat provider dependent, so I’ll talk about it as part of Umbraco Examine, as that’s what I think most early adopters will want.

Because Lucene supports several different term modifiers for text we decided it would be great to have those exposed in the API for people to leverage. For this we’ve got a series of string extension methods which reside in the namespace

UmbracoExamine.SearchCriteria

So once you add a using statement for that you’ll have the following extension methods:

  • public static IExamineValue SingleCharacterWildcard(this string s)
  • public static IExamineValue MultipleCharacterWildcard(this string s)
  • public static IExamineValue Fuzzy(this string s)
  • public static IExamineValue Fuzzy(this string s, double fuzzieness)
  • public static IExamineValue Boost(this string s, double boost)
  • public static IExamineValue Proximity(this string s, double proximity)
  • public static IExamineValue Escape(this string s)
  • public static string Then(this IExamineValue vv, string s)

All of these (with the exception of Then) return an IExamineValue (which UmbracoExamine handles internally), which tells Lucene.Net how to apply the term modifier you requested.

I won’t repeat what is said within the Lucene documentation; I suggest you read that to get an idea of what to use and when.
The only exceptions are Escape and Then.

Escape

If you want to search on multiple words together then Lucene requires them to be ‘escaped’, otherwise it’ll (generally) treat the space character as a break in the query. So if you wanted to search for Umbraco Rocks and didn’t escape it you’d match on Umbraco and Rocks as separate words, whereas when it’s escaped you’ll match on the two words in sequence.

Then

The Then method just allows you to combine multiple strings or multiple IExamineValues, so you can boost your fuzzy query with a proximity of 0.1 :P.
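
To make the modifiers a bit more concrete, here’s a rough sketch of the Lucene query syntax each one corresponds to. LuceneSyntaxSketch and its methods are made up for illustration and are not part of Examine or UmbracoExamine. It also assumes wildcards get appended to the end of the term and that proximity is a whole number of words, which may not match exactly what the provider does internally.

```csharp
using System;
using System.Globalization;

// Hypothetical helper, NOT part of Examine: it only illustrates the Lucene
// query syntax that each term modifier roughly corresponds to.
public static class LuceneSyntaxSketch
{
    // "?" matches exactly one character (assumption: appended to the term).
    public static string SingleCharacterWildcard(string term) { return term + "?"; }

    // "*" matches zero or more characters (assumption: appended to the term).
    public static string MultipleCharacterWildcard(string term) { return term + "*"; }

    // "~" asks for a fuzzy match using Lucene's default similarity.
    public static string Fuzzy(string term) { return term + "~"; }

    // "~0.8" asks for a fuzzy match with an explicit similarity.
    public static string Fuzzy(string term, double fuzziness)
    {
        return term + "~" + fuzziness.ToString(CultureInfo.InvariantCulture);
    }

    // "^4" makes matches on this term count for more in the score.
    public static string Boost(string term, double boost)
    {
        return term + "^" + boost.ToString(CultureInfo.InvariantCulture);
    }

    // A quoted phrase followed by "~5" matches the words within 5 positions.
    public static string Proximity(string phrase, int distance)
    {
        return "\"" + phrase + "\"~" + distance;
    }

    // Quoting keeps the words together as a single phrase.
    public static string Escape(string phrase) { return "\"" + phrase + "\""; }
}

public static class Program
{
    public static void Main()
    {
        Console.WriteLine(LuceneSyntaxSketch.SingleCharacterWildcard("te"));    // te?
        Console.WriteLine(LuceneSyntaxSketch.MultipleCharacterWildcard("umb")); // umb*
        Console.WriteLine(LuceneSyntaxSketch.Fuzzy("rock"));                    // rock~
        Console.WriteLine(LuceneSyntaxSketch.Fuzzy("rock", 0.8));               // rock~0.8
        Console.WriteLine(LuceneSyntaxSketch.Boost("umbraco", 4));              // umbraco^4
        Console.WriteLine(LuceneSyntaxSketch.Proximity("Umbraco Rocks", 5));    // "Umbraco Rocks"~5
        Console.WriteLine(LuceneSyntaxSketch.Escape("Umbraco Rocks"));          // "Umbraco Rocks"
    }
}
```

In the real API you never build these strings yourself; the extension methods hand back an IExamineValue and the provider takes care of the rest.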

IBooleanOperation

IBooleanOperation allows you to join multiple IQuery methods together using:

  • IQuery And()
  • IQuery Or()
  • IQuery Not()

These are then translated by the underlying searcher so it can determine how to deal with your chaining. At the time of writing we don’t support nested conditionals (grouped ORs operating like an AND).

There’s another method on IBooleanOperation which doesn’t fall into the above, but it’s critical to the overall idea:

  • ISearchCriteria Compile()

The Compile method returns an ISearchCriteria which you then pass into your searcher. It’s expected that this is the last method called, and it’s meant to prepare all search queries for execution.
The reason we’re going with this rather than passing the IQuery into the searcher is that it means we don’t have to carry the max results, etc. on every IQuery instance. It’s not something that is relevant in that scope, so it’d just introduce code smell, and no one wants that.
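
If the chaining still feels a bit abstract, here’s a stripped-down sketch of the pattern. The types (ISketchQuery, ISketchOperation, SketchCriteria) are hypothetical stand-ins and not Examine’s real interfaces. Compile returns a plain Lucene-style string here rather than an ISearchCriteria, and the sketch assumes that the first clause is required and that each operator applies to the clause that follows it.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for IQuery: every query method hands back an operation.
public interface ISketchQuery
{
    ISketchOperation Field(string fieldName, string fieldValue);
}

// Hypothetical stand-in for IBooleanOperation: every operator hands back a query.
public interface ISketchOperation
{
    ISketchQuery And();
    ISketchQuery Or();
    ISketchQuery Not();
    string Compile(); // simplification: the real Compile() returns ISearchCriteria
}

// One class plays both roles, which is what lets the calls alternate.
public class SketchCriteria : ISketchQuery, ISketchOperation
{
    private readonly List<string> clauses = new List<string>();
    private string prefix = "+"; // assumption: the first clause is required

    public ISketchOperation Field(string fieldName, string fieldValue)
    {
        // Apply whichever operator was chosen just before this clause.
        clauses.Add(prefix + fieldName + ":" + fieldValue);
        return this;
    }

    public ISketchQuery And() { prefix = "+"; return this; } // must match
    public ISketchQuery Or()  { prefix = "";  return this; } // may match
    public ISketchQuery Not() { prefix = "-"; return this; } // must not match

    public string Compile() { return string.Join(" ", clauses); }
}

public static class Program
{
    public static void Main()
    {
        string query = new SketchCriteria()
            .Field("nodeName", "umbraco")
            .And().Field("bodyText", "awesome")
            .Or().Field("bodyText", "rock")
            .Not().Field("bodyText", "boring")
            .Compile();

        // +nodeName:umbraco +bodyText:awesome bodyText:rock -bodyText:boring
        Console.WriteLine(query);
    }
}
```

Because the query methods only return the operation interface and the operators only return the query interface, the compiler won’t let you write two operators in a row, or call Compile straight after an operator.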

Bringing it all together

So now you know the basics, how do you go about producing a query?

Well the first thing you need to do is get an instance of your ISearchCriteria:

var sc = ExamineManager.Instance.CreateSearchCriteria();

Now let’s do a search for a few things across a few different fields:

var query = sc.NodeName("umbraco").And().Field("bodyText", "is awesome".Escape()).Or().Field("bodyText", "rock".Fuzzy());

Now we’ve got a query across a few different fields. Lastly, we need to pass it to our searcher:

var results = ExamineManager.Instance.Search(query.Compile());

It’s just that simple!

 

Hopefully the fluent API is clean enough that people can build nice and complex queries and search their websites with no problem. If you’ve got any feedback please leave it here, as we’re working to get an RC out soon.

