How to lazily set the multi-term rewrite method on queries in Lucene

How to lazily set the multi-term rewrite method on queries in Lucene

For wildcard queries in Lucene that you would like to have the results ordered by Score, there’s a trick that you need to do otherwise all of your scores will come back the same. The reason for this is because the default behavior of wildcard queries uses CONSTANT_SCORE_AUTO_REWRITE_DEFAULT which as the name describes is going to give a constant score. The code comments describe why this is the default:

a) Runs faster

b) Does not have the scarcity of terms unduly influence score

c) Avoids any "TooManyBooleanClauses" exceptions

Without fully understanding Lucene that doesn’t really mean a whole lot but the Lucene docs give a little more info

NOTE: if setRewriteMethod(org.apache.lucene.search.MultiTermQuery.RewriteMethod) is either CONSTANT_SCORE_BOOLEAN_QUERY_REWRITE or SCORING_BOOLEAN_QUERY_REWRITE, you may encounter a BooleanQuery.TooManyClauses exception during searching, which happens when the number of terms to be searched exceeds BooleanQuery.getMaxClauseCount(). Setting setRewriteMethod(org.apache.lucene.search.MultiTermQuery.RewriteMethod) to CONSTANT_SCORE_FILTER_REWRITE prevents this.

The recommended rewrite method is CONSTANT_SCORE_AUTO_REWRITE_DEFAULT: it doesn't spend CPU computing unhelpful scores, and it tries to pick the most performant rewrite method given the query. If you need scoring (like FuzzyQuery, use MultiTermQuery.TopTermsScoringBooleanQueryRewrite which uses a priority queue to only collect competitive terms and not hit this limitation. Note that org.apache.lucene.queryparser.classic.QueryParser produces MultiTermQueries using CONSTANT_SCORE_AUTO_REWRITE_DEFAULT by default.

So the gist is, unless you are ordering by Score this shouldn’t be changed because it will consume more CPU and depending on how many terms you are querying against you might get an exception (though I think that is rare).

So how do you change the default?

That’s super easy, it’s just this line of code:

QueryParser.SetMultiTermRewriteMethod(MultiTermQuery.SCORING_BOOLEAN_QUERY_REWRITE);

But there’s a catch! You must set this flag before you parse any queries with the query parser otherwise it won’t work. All this really does is instruct the query parser to apply this scoring method to any MultiTermQuery or FuzzyQuery implementations it creates. So what if you don’t know if this change should be made before you use the query parser? One scenario might be: At the time of using the query parser, you are unsure if the user constructing the query is going to be sorting by score. In this case you want to change the scoring mechanism just before executing the search but after creating your query.

Setting the value lazily

The good news is that you can set this value lazily just before you execute the search even after you’ve used the query parser to create parts of your query. There’s only 1 class type that we need to check for that has this API: MultiTermQuery however not all implementations of it support rewriting so we have to check for that. So given an instance of a Query we can recursively update every query contained within it and manually apply the rewrite method like:

protected void SetScoringBooleanQueryRewriteMethod(Query query)
{
	if (query is MultiTermQuery mtq)
	{
		try
		{
			mtq.SetRewriteMethod(MultiTermQuery.SCORING_BOOLEAN_QUERY_REWRITE);
		}
		catch (NotSupportedException)
		{
			//swallow this, some implementations of MultiTermQuery don't support this like FuzzyQuery
		}
	}
	if (query is BooleanQuery bq)
	{
		foreach (BooleanClause clause in bq.Clauses())
		{
			var q = clause.GetQuery();
			//recurse
			SetScoringBooleanQueryRewriteMethod(q);
		}
	}
}

So you can call this method just before you execute your search and it will still work without having to eagerly use QueryParser.SetMultiTermRewriteMethod(MultiTermQuery.SCORING_BOOLEAN_QUERY_REWRITE); before you use the query parser methods.

Happy searching!

Author

Shannon Thompson

I'm a Senior Software Engineer working full time at Microsoft. Previously, I was working at Umbraco HQ for about 10 years. I maintain several open source projects (many related to Umbraco) such as Articulate, Examine and Smidge, and I also have a commercial software offering called ExamineX. Welcome to my blog :)

comments powered by Disqus