Spatial Search with Examine and Lucene

I was recently asked how to do Spatial search with Examine, which sparked my interest in how it should be done, so here’s how it goes…

Examine’s default implementation is Lucene, so whatever you can do in Lucene you can achieve in Examine by exposing the underlying Lucene bits. If you want to jump straight to code, I’ve created a couple of unit tests in the Examine project.

Source code as documentation

Lucene.Net and Lucene (Java) are more or less the same. There are a few API and naming convention differences, but at the end of the day Lucene.Net is just a .NET port of Lucene. So pretty much any of the documentation you’ll find for Lucene will work with Lucene.Net with a bit of tweaking. The same goes for code snippets in the source code, and both Lucene and Lucene.Net have tons of examples of how to do things. In fact, for Spatial search there’s a specific test example.

So we ‘just’ need to take that example and go with it.

Firstly we’ll need the Lucene.Net.Contrib package:

Install-Package Lucene.Net.Contrib -Version 3.0.3

Indexing

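The indexing part doesn't really need to do anything out of the ordinary. You just need to get either latitude/longitude or x/y (numerical) values into your index. This can be done directly using a ValueSet when you index, with your field types set as numeric, or with the DocumentWriting event, which gives you direct access to the underlying Lucene document. One caveat: the search code further down queries the field managed by the spatial strategy, so that field needs to be populated too, for example by adding the fields returned from strategy.CreateIndexableFields(...) to the document in a DocumentWriting handler.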

Strategies

For this example I’m just going to stick with simple Geo Spatial searching with simple x/y coordinates. There are different “strategies”, and you can configure these to handle different types of spatial search when it’s not as simple as an x/y distance calculation. I was shown an example of a Spatial search that used the “PointVectorStrategy”, but after looking into it, it seems to be a semi-deprecated strategy, and even one of its methods says: “//TODO this is basically old code that hasn't been verified well and should probably be removed”. I then found an SO article stating that “RecursivePrefixTreeStrategy” is what should be used instead anyway, and as it turns out that’s exactly what the Java example uses too.

If you need some more advanced Spatial searching then I’d suggest researching some of the strategies available, reading the docs and looking at the source examples. There are unit tests for pretty much everything in Lucene and Lucene.Net.

Get the underlying Lucene Searcher instance

If you need to do some interesting Lucene things with Examine, you need to gain access to the underlying Lucene bits. Normally you’d only need access to the IndexWriter, which you can get from LuceneIndex.GetIndexWriter(), and the Lucene Searcher, which you can get from LuceneSearcher.GetLuceneSearcher().

// Get an index from the IExamineManager
if (!examineMgr.TryGetIndex("MyIndex", out var index))
    throw new InvalidOperationException("No index found with name MyIndex");
            
// We are expecting this to be a LuceneIndex
if (!(index is LuceneIndex luceneIndex))
    throw new InvalidOperationException("Index MyIndex is not a LuceneIndex");

// If you wanted the Lucene IndexWriter, here's how:
//var luceneWriter = luceneIndex.GetIndexWriter();

// Need to cast in order to expose the Lucene bits
var searcher = (LuceneSearcher)luceneIndex.GetSearcher();

// Get the underlying Lucene Searcher instance
var luceneSearcher = searcher.GetLuceneSearcher();

Do the search

Important! Latitude/Longitude != X/Y

The Lucene GEO Spatial APIs take X/Y coordinates, not latitude/longitude. A common mistake is to pass latitude/longitude straight in, in that order, but that’s incorrect: they are actually the opposite way round, so be sure you swap them. Latitude = Y, Longitude = X. Here’s a simple function to swap them:

private void GetXYFromCoords(double lat, double lng, out double x, out double y)
{
    // change to x/y coords, longitude = x, latitude = y
    x = lng;
    y = lat;
}
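
Because the swap is so easy to get wrong, a small guard can catch some mistakes early. This is only a minimal sketch and not part of Examine or Lucene.Net: GeoPoint and FromLatLng are hypothetical names, and it assumes the usual ranges of -90 to 90 for latitude and -180 to 180 for longitude.

```csharp
using System;

// Hypothetical helper (not part of Examine or Lucene.Net): holds a point
// in the x/y order that the spatial APIs expect.
public readonly struct GeoPoint
{
    public double X { get; }   // longitude
    public double Y { get; }   // latitude

    private GeoPoint(double x, double y)
    {
        X = x;
        Y = y;
    }

    // Takes coordinates in the familiar lat/lng order and returns them as x/y.
    public static GeoPoint FromLatLng(double lat, double lng)
    {
        if (lat < -90 || lat > 90)
            throw new ArgumentOutOfRangeException(nameof(lat), "Latitude must be between -90 and 90");
        if (lng < -180 || lng > 180)
            throw new ArgumentOutOfRangeException(nameof(lng), "Longitude must be between -180 and 180");

        return new GeoPoint(x: lng, y: lat);
    }
}
```

For Sydney, GeoPoint.FromLatLng(-33.8688, 151.2093) gives X = 151.2093 and Y = -33.8688, while passing the two values the wrong way round throws because 151.2093 isn't a valid latitude. It won't catch every mix-up, since a pair like (10, 20) is valid in either order.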

Now that we have the underlying Lucene Searcher instance we can search however we want:

// Create the Geo Spatial lucene objects
SpatialContext ctx = SpatialContext.GEO;
int maxLevels = 11; //results in sub-meter precision for geohash
SpatialPrefixTree grid = new GeohashPrefixTree(ctx, maxLevels);
RecursivePrefixTreeStrategy strategy = new RecursivePrefixTreeStrategy(grid, GeoLocationFieldName);

// lat/lng of Sydney Australia
var latitudeSydney = -33.8688;
var longitudeSydney = 151.2093;
            
// search within 100 KM
var searchRadiusInKm = 100;

// convert to X/Y
GetXYFromCoords(latitudeSydney, longitudeSydney, out var x, out var y);

// Make a circle around the search point
var args = new SpatialArgs(
    SpatialOperation.Intersects,
    ctx.MakeCircle(x, y, DistanceUtils.Dist2Degrees(searchRadiusInKm, DistanceUtils.EARTH_MEAN_RADIUS_KM)));

// Create the Lucene Filter
var filter = strategy.MakeFilter(args);

// Create the Lucene Query
var query = strategy.MakeQuery(args);

// sort on ID
Sort idSort = new Sort(new SortField(LuceneIndex.ItemIdFieldName, SortField.INT));
TopDocs docs = luceneSearcher.Search(query, filter, MaxResultDocs, idSort);

// iterate raw lucene results
foreach(var doc in docs.ScoreDocs)
{
    // TODO: Do something with result
}
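
The circle above takes its radius in degrees rather than kilometres, which is why the search radius goes through DistanceUtils.Dist2Degrees. As a rough sketch of what that conversion amounts to: divide the distance by the Earth's radius to get an angle in radians, then convert to degrees. RadiusConversion and KmToDegrees are hypothetical names for illustration, and the radius constant is my assumption of the value behind DistanceUtils.EARTH_MEAN_RADIUS_KM, so check it against the version you're using.

```csharp
using System;

// Hypothetical illustration of a km-to-degrees conversion on a sphere.
public static class RadiusConversion
{
    // Assumed mean Earth radius in km (believed to match
    // DistanceUtils.EARTH_MEAN_RADIUS_KM; verify against your library version).
    public const double EarthMeanRadiusKm = 6371.0087714;

    // Converts a distance along the Earth's surface into the angle (in degrees)
    // that the same arc spans when measured from the centre of the Earth.
    public static double KmToDegrees(double distanceKm)
    {
        double radians = distanceKm / EarthMeanRadiusKm;
        return radians * 180.0 / Math.PI;
    }
}
```

For the 100 KM radius used above, this works out to roughly 0.9 degrees.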

Filter vs Query?

The above code creates both a Filter and a Query to get the results, but the SpatialExample just uses a “MatchAllDocsQuery” instead of the Query created above. Both return the same results, so what is happening with “strategy.MakeQuery”? It’s creating a ConstantScoreQuery, which means that the resulting document “Score” will be empty/same for all results. That’s really all it does, so it’s optional, and when searching only on locations with no other data, Score doesn’t make a ton of sense anyway. It is possible, however, to mix Spatial search filters with real queries.

Next steps

You’ll see above that the ordering is by Id, but in a lot of cases you’ll probably want to sort by distance. There are examples of this in the Lucene SpatialExample linked above and there’s a reference to that in this SO article too. The only problem is that those examples are for later Lucene versions than the current Lucene.Net 3.x. But if there’s a will there’s a way, and I’m sure with some Googling, code researching and testing you’ll be able to figure it out :)
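
In the meantime, one workaround (not what the Lucene SpatialExample does) is to sort the matched results in memory once they're back from the search. The sketch below uses the haversine formula for great-circle distance. It assumes you've stored a latitude and longitude with each document and can read them back from the results; DistanceSort, HaversineKm and SortByDistance are hypothetical names, and the Earth radius is an approximate mean value.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: sorts already-retrieved results by distance in memory.
public static class DistanceSort
{
    // Approximate mean Earth radius in km (an assumption for this sketch).
    private const double EarthMeanRadiusKm = 6371.0087714;

    // Great-circle distance in km between two lat/lng points (haversine formula).
    public static double HaversineKm(double lat1, double lng1, double lat2, double lng2)
    {
        double dLat = ToRadians(lat2 - lat1);
        double dLng = ToRadians(lng2 - lng1);
        double a = Math.Sin(dLat / 2) * Math.Sin(dLat / 2)
                 + Math.Cos(ToRadians(lat1)) * Math.Cos(ToRadians(lat2))
                 * Math.Sin(dLng / 2) * Math.Sin(dLng / 2);
        // Clamp to 1.0 to guard against floating point drift before Asin
        return 2 * EarthMeanRadiusKm * Math.Asin(Math.Min(1.0, Math.Sqrt(a)));
    }

    // Orders (id, lat, lng) results by distance from the origin, nearest first.
    public static List<(int Id, double DistanceKm)> SortByDistance(
        double originLat, double originLng,
        IEnumerable<(int Id, double Lat, double Lng)> results)
    {
        return results
            .Select(r => (r.Id, DistanceKm: HaversineKm(originLat, originLng, r.Lat, r.Lng)))
            .OrderBy(r => r.DistanceKm)
            .ToList();
    }

    private static double ToRadians(double degrees) => degrees * Math.PI / 180.0;
}
```

This only orders whatever the search has already returned, so it's fine for a modest result set but it isn't a replacement for sorting inside Lucene when you're paging through lots of documents.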

The Examine docs pages need a little love and should probably include this info. The docs pages are just built in Jekyll and located in the /docs folder of the Examine repository. I would love any help with Examine’s docs if you’ve got a bit of time :)

As far as Examine goes though, there’s actually a custom search method called “LuceneQuery” on the “LuceneSearchQueryBase”, which is the object created when creating normal Examine queries with CreateQuery(). Using this method you can pass in a native Lucene Query instance like the one created above and it will manage all of the searching/paging/sorting/results/etc… for you, so you don’t have to do some of the above manual work. However, there is currently no method allowing a native Lucene Filter instance to be passed in like the one created above. Once that’s in place, some of the Lucene APIs above won’t be needed and this can be a bit nicer. Then it’s probably worthwhile adding another Nuget project like Examine.Extensions which can contain methods and functionality for this stuff, or maybe the community can do something like that, just like Callum has done for Examine Facets. What do you think?

Author

Shannon Thompson

I'm a Senior Software Engineer working full time at Microsoft. Previously, I was working at Umbraco HQ for about 10 years. I maintain several open source projects (many related to Umbraco) such as Articulate, Examine and Smidge, and I also have a commercial software offering called ExamineX. Welcome to my blog :)