@Shazwazza

Shannon Deminick's blog all about web development

Paging with Examine

November 3, 2017 10:28

Paging with Lucene and Examine requires some specific API usage. It's very easy to get wrong by using Linq's Skip/Take methods and when doing this you'll inadvertently end up loading in all search results from Lucene and then filtering in memory when what you really want to do is have Lucene only create the minimal search result objects that you are interested in.

There are 2 important parts to this:

  • The Skip method on the ISearchResults object
  • The Search overload on the BaseSearchProvider where you can specify maxResults

ISearchResults.Skip

This is very different from the Linq Skip method so you need to be sure you are using the Skip method on the ISearchResults object. This tells Lucene to skip over a specific number of results without allocating the result objects. If you use Linq’s Skip method on the underlying IEnumerable<SearchResult> of ISearchResults, this will allocate all of the result objects and then filter them in memory which is what you don’t want to do.

Search with maxResults

Lucene isn’t perfect for paging because it doesn’t natively support the Linq equivalent to “Skip/Take”. It understands Skip (as above) but doesn’t understand Take, instead it only knows how to limit the max results so that it doesn’t allocate every result, most of which you would probably not need when paging.

With the combination of ISearchResult.Skip and maxResults, we can tell Lucene to:

  • Skip over a certain number of results without allocating them and tell Lucene
  • only allocate a certain number of results after skipping

Show me the code

//for example purposes, we want to show page #4 (which is pageIndex of 3)
var pageIndex = 3;   
//for this example, the page size is 10 items
var pageSize = 10;
var searchResult = searchProvider.Search(criteria, 
   //don't return more results than we need for the paging
   //this is the 'trick' - we need to load enough search results to fill
   //all pages from 1 to the current page of 4
   maxResults: pageSize*(pageIndex + 1));
//then we use the Skip method to tell Lucene to not allocate search results
//for the first 3 pages
var pagedResults = searchResult.Skip(pageIndex*pageSize);
var totalResults = searchResult.TotalItemCount;

So that is the correct way to do paging with Examine and Lucene which ensures max performance and minimal object allocations.

For wildcard queries in Lucene that you would like to have the results ordered by Score, there’s a trick that you need to do otherwise all of your scores will come back the same. The reason for this is because the default behavior of wildcard queries uses CONSTANT_SCORE_AUTO_REWRITE_DEFAULT which as the name describes is going to give a constant score. The code comments describe why this is the default:

a) Runs faster

b) Does not have the scarcity of terms unduly influence score

c) Avoids any "TooManyBooleanClauses" exceptions

Without fully understanding Lucene that doesn’t really mean a whole lot but the Lucene docs give a little more info

NOTE: if setRewriteMethod(org.apache.lucene.search.MultiTermQuery.RewriteMethod) is either CONSTANT_SCORE_BOOLEAN_QUERY_REWRITE or SCORING_BOOLEAN_QUERY_REWRITE, you may encounter a BooleanQuery.TooManyClauses exception during searching, which happens when the number of terms to be searched exceeds BooleanQuery.getMaxClauseCount(). Setting setRewriteMethod(org.apache.lucene.search.MultiTermQuery.RewriteMethod) to CONSTANT_SCORE_FILTER_REWRITE prevents this.

The recommended rewrite method is CONSTANT_SCORE_AUTO_REWRITE_DEFAULT: it doesn't spend CPU computing unhelpful scores, and it tries to pick the most performant rewrite method given the query. If you need scoring (like FuzzyQuery, use MultiTermQuery.TopTermsScoringBooleanQueryRewrite which uses a priority queue to only collect competitive terms and not hit this limitation. Note that org.apache.lucene.queryparser.classic.QueryParser produces MultiTermQueries using CONSTANT_SCORE_AUTO_REWRITE_DEFAULT by default.

So the gist is, unless you are ordering by Score this shouldn’t be changed because it will consume more CPU and depending on how many terms you are querying against you might get an exception (though I think that is rare).

So how do you change the default?

That’s super easy, it’s just this line of code:

QueryParser.SetMultiTermRewriteMethod(MultiTermQuery.SCORING_BOOLEAN_QUERY_REWRITE);

But there’s a catch! You must set this flag before you parse any queries with the query parser otherwise it won’t work. All this really does is instruct the query parser to apply this scoring method to any MultiTermQuery or FuzzyQuery implementations it creates. So what if you don’t know if this change should be made before you use the query parser? One scenario might be: At the time of using the query parser, you are unsure if the user constructing the query is going to be sorting by score. In this case you want to change the scoring mechanism just before executing the search but after creating your query.

Setting the value lazily

The good news is that you can set this value lazily just before you execute the search even after you’ve used the query parser to create parts of your query. There’s only 1 class type that we need to check for that has this API: MultiTermQuery however not all implementations of it support rewriting so we have to check for that. So given an instance of a Query we can recursively update every query contained within it and manually apply the rewrite method like:

protected void SetScoringBooleanQueryRewriteMethod(Query query)
{
	if (query is MultiTermQuery mtq)
	{
		try
		{
			mtq.SetRewriteMethod(MultiTermQuery.SCORING_BOOLEAN_QUERY_REWRITE);
		}
		catch (NotSupportedException)
		{
			//swallow this, some implementations of MultiTermQuery don't support this like FuzzyQuery
		}
	}
	if (query is BooleanQuery bq)
	{
		foreach (BooleanClause clause in bq.Clauses())
		{
			var q = clause.GetQuery();
			//recurse
			SetScoringBooleanQueryRewriteMethod(q);
		}
	}
}

So you can call this method just before you execute your search and it will still work without having to eagerly use QueryParser.SetMultiTermRewriteMethod(MultiTermQuery.SCORING_BOOLEAN_QUERY_REWRITE); before you use the query parser methods.

Happy searching!

Examine 1.5.1 released

April 5, 2013 19:59

I’ve created a new release of Examine today, version 1.5.1. There’s nothing really new in this release, just a bunch of bug fixes. The other cool thing is that I’ve finally got Examine on Nuget now. The v1.5.1 release page is here on CodePlex with upgrade instructions… which is really just replacing the DLLs.

Its important to note that if you have installed Umbraco 6.0.1+ or 4.11.5+ then you already have Examine 1.5.0  installed (which isn’t an official release on the CodePlex page) which has 8 of these 10 bugs fixed already.

Bugs fixed

Here’s the full list of bugs fixed in this release:

UmbracoExamine

You may already know this but we’ve moved the UmbracoExamine libraries in to the core of Umbraco so that the Umbraco core team can better support the implementation. That means that only the basic Examine libraries will continue to exist @ examine.codeplex.com. The release of 1.5.1 only relates to the base Examine libraries, not the UmbracoExamine libraries, but that’s ok you can still upgrade these base libraries without issue.

Nuget

There’s 2 Examine projects up on Nuget, the basic Examine package and the Azure package if you wish to use Azure directory for your indexes.

Standard package:

PM> Install-Package Examine

Azure package:

PM> Install-Package Examine.Azure

 

Happy searching!

It’s been a long while since Examine got some much needed attention and I’m pleased to say it is now happening. If you didn’t know already, we’ve moved the Umbraco Examine source in to the core of Umbraco. The underlying Examine (Examine.dll) core will remain on CodePlex but all the Umbraco bits and pieces which is found in UmbracoExamine.dll are in the Umbraco core from version 6.1+. This is great news because now we can all better support the implementation of Examine for Umbraco. More good news is that even versions prior to Umbraco 6.1 will have some bugs fixed (http://issues.umbraco.org/issue/U4-1768) ! Niels Kuhnel has also jumped aboard the Examine train and is helping out a ton by adding his amazing ‘facet’ features which will probably make it into an Umbraco release around version 6.2 (maybe 6.1, but still need to do some review, etc… to make sure its 100% backwards compatible).

One other bit of cool news is that we’re adding an official Examine Management dashboard to Umbraco 6.1. In its present state it supports optimizing indexes, rebuilding indexes and searching them. I’ve created a quick video showing its features :)

Examine management dashboard for Umbraco