@Shazwazza

Shannon Deminick's blog all about web development

If you want the results of a wildcard query in Lucene ordered by Score, there’s a trick you need to apply, otherwise all of your scores will come back the same. This happens because wildcard queries default to CONSTANT_SCORE_AUTO_REWRITE_DEFAULT which, as the name suggests, gives every match a constant score. The code comments describe why this is the default:

a) Runs faster

b) Does not have the scarcity of terms unduly influence score

c) Avoids any "TooManyBooleanClauses" exceptions

Without fully understanding Lucene that doesn’t mean a whole lot, but the Lucene docs give a little more info:

NOTE: if setRewriteMethod(org.apache.lucene.search.MultiTermQuery.RewriteMethod) is either CONSTANT_SCORE_BOOLEAN_QUERY_REWRITE or SCORING_BOOLEAN_QUERY_REWRITE, you may encounter a BooleanQuery.TooManyClauses exception during searching, which happens when the number of terms to be searched exceeds BooleanQuery.getMaxClauseCount(). Setting setRewriteMethod(org.apache.lucene.search.MultiTermQuery.RewriteMethod) to CONSTANT_SCORE_FILTER_REWRITE prevents this.

The recommended rewrite method is CONSTANT_SCORE_AUTO_REWRITE_DEFAULT: it doesn't spend CPU computing unhelpful scores, and it tries to pick the most performant rewrite method given the query. If you need scoring (like FuzzyQuery, use MultiTermQuery.TopTermsScoringBooleanQueryRewrite which uses a priority queue to only collect competitive terms and not hit this limitation. Note that org.apache.lucene.queryparser.classic.QueryParser produces MultiTermQueries using CONSTANT_SCORE_AUTO_REWRITE_DEFAULT by default.

So the gist is: unless you are ordering by Score, this shouldn’t be changed, because it will consume more CPU and, depending on how many terms your query expands to, you might get a TooManyClauses exception (though I think that is rare).

So how do you change the default?

That’s super easy. It’s just this line of code:

QueryParser.SetMultiTermRewriteMethod(MultiTermQuery.SCORING_BOOLEAN_QUERY_REWRITE);

But there’s a catch! You must set this flag before you parse any queries with the query parser, otherwise it won’t work. All this really does is instruct the query parser to apply this rewrite method to any MultiTermQuery or FuzzyQuery implementations it creates. So what if you don’t know whether this change should be made before you use the query parser? One scenario might be: at the time of using the query parser, you are unsure whether the user constructing the query is going to sort by score. In this case you want to change the scoring mechanism after creating your query but just before executing the search.
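To make the ordering concrete, here is a minimal sketch of the eager approach. It assumes a Lucene.Net 2.9.x-style API (matching the method names used in this post). The class name, helper name and the use of StandardAnalyzer are my own choices for illustration, not anything from Examine.

```csharp
using Lucene.Net.Analysis.Standard;
using Lucene.Net.QueryParsers;
using Lucene.Net.Search;
using Version = Lucene.Net.Util.Version;

public static class ScoredWildcardExample
{
    // Hypothetical helper: creates a parser, switches it to a scoring
    // rewrite method and only then parses the query text.
    public static Query ParseWithScoring(string field, string queryText)
    {
        // Assumption: Lucene 2.9 compatibility and a StandardAnalyzer
        var analyzer = new StandardAnalyzer(Version.LUCENE_29);
        var parser = new QueryParser(Version.LUCENE_29, field, analyzer);

        // Set the rewrite method BEFORE calling Parse. The parser applies it
        // to the multi-term queries it creates, so setting it afterwards
        // does not affect queries that were already parsed.
        parser.SetMultiTermRewriteMethod(MultiTermQuery.SCORING_BOOLEAN_QUERY_REWRITE);

        return parser.Parse(queryText);
    }
}
```

Calling something like ScoredWildcardExample.ParseWithScoring("nodeName", "exam*") (the field name is made up) should give you a wildcard-style query that produces real scores when searched, at the CPU cost described above.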

Setting the value lazily

The good news is that you can set this value lazily, just before you execute the search, even after you’ve used the query parser to create parts of your query. There’s only one class type that has this API and needs checking: MultiTermQuery. However, not all implementations of it support changing the rewrite method, so we have to handle that. Given an instance of a Query, we can recursively visit every query contained within it and manually apply the rewrite method like this:

protected void SetScoringBooleanQueryRewriteMethod(Query query)
{
	if (query is MultiTermQuery mtq)
	{
		try
		{
			mtq.SetRewriteMethod(MultiTermQuery.SCORING_BOOLEAN_QUERY_REWRITE);
		}
		catch (NotSupportedException)
		{
			//swallow this, some implementations of MultiTermQuery don't support this like FuzzyQuery
		}
	}
	if (query is BooleanQuery bq)
	{
		foreach (BooleanClause clause in bq.Clauses())
		{
			var q = clause.GetQuery();
			//recurse
			SetScoringBooleanQueryRewriteMethod(q);
		}
	}
}

So you can call this method just before you execute your search and it will still work, without having to eagerly call QueryParser.SetMultiTermRewriteMethod(MultiTermQuery.SCORING_BOOLEAN_QUERY_REWRITE) before you use the query parser methods.

Happy searching!

Examine 1.5.1 released

April 5, 2013 19:59

I’ve created a new release of Examine today, version 1.5.1. There’s nothing really new in this release, just a bunch of bug fixes. The other cool thing is that I’ve finally got Examine on NuGet. The v1.5.1 release page is here on CodePlex with upgrade instructions… which is really just replacing the DLLs.

It’s important to note that if you have installed Umbraco 6.0.1+ or 4.11.5+ then you already have Examine 1.5.0 installed (which isn’t an official release on the CodePlex page), and it already has 8 of these 10 bugs fixed.

Bugs fixed

Here’s the full list of bugs fixed in this release:

UmbracoExamine

You may already know this, but we’ve moved the UmbracoExamine libraries into the core of Umbraco so that the Umbraco core team can better support the implementation. That means that only the basic Examine libraries will continue to exist at examine.codeplex.com. The release of 1.5.1 only relates to the base Examine libraries, not the UmbracoExamine libraries, but that’s ok: you can still upgrade these base libraries without issue.

NuGet

There are 2 Examine projects up on NuGet: the basic Examine package, and the Azure package if you wish to use Azure directory for your indexes.

Standard package:

PM> Install-Package Examine

Azure package:

PM> Install-Package Examine.Azure

Happy searching!

It’s been a long while since Examine got some much needed attention and I’m pleased to say it is now happening. If you didn’t know already, we’ve moved the Umbraco Examine source into the core of Umbraco. The underlying Examine (Examine.dll) core will remain on CodePlex, but all the Umbraco bits and pieces which are found in UmbracoExamine.dll are in the Umbraco core from version 6.1+. This is great news because now we can all better support the implementation of Examine for Umbraco. More good news is that even versions prior to Umbraco 6.1 will have some bugs fixed (http://issues.umbraco.org/issue/U4-1768)! Niels Kuhnel has also jumped aboard the Examine train and is helping out a ton by adding his amazing ‘facet’ features, which will probably make it into an Umbraco release around version 6.2 (maybe 6.1, but we still need to do some review, etc… to make sure it’s 100% backwards compatible).

One other bit of cool news is that we’re adding an official Examine Management dashboard to Umbraco 6.1. In its present state it supports optimizing indexes, rebuilding indexes and searching them. I’ve created a quick video showing its features :)

Examine management dashboard for Umbraco