Can I disable Examine indexes on Umbraco front-end servers?

Can I disable Examine indexes on Umbraco front-end servers?

In Umbraco v8, Examine and Lucene are only used for the back office searches, unless you specifically use those APIs for your front-end pages. I recently had a request to know if it’s possible to disable Examine/Lucene for front-end servers since they didn’t use Examine/Lucene APIs at all on their front-end pages… here’s the answer

Why would you want this?

If you are running a Load Balancing setup in Azure App Service then you have the option to scale out (and perhaps you do!). In this case, you need to have the Examine configuration option of:

<add key="Umbraco.Examine.LuceneDirectoryFactory" 
          value="Examine.LuceneEngine.Directories.TempEnvDirectoryFactory, Examine" />

This is because each scaled out worker is running from the same network share file system. Without this setting (or with the SyncTempEnvDirectoryFactory setting) it would mean that each worker will be trying to write Lucene file based indexes to the same location which will result in corrupt indexes and locked files. Using the TempEnvDirectoryFactory means that the indexes will only be stored on the worker's local 'fast drive' which is in it's %temp% folder on the local (non-network share) hard disk.

When a site is moved or provisioned on a new worker the local %temp% location will be empty so Lucene indexes will be rebuilt on startup for that worker. This will occur when Azure moves a site or when a new worker comes online from a scale out action. When indexes are rebuilt, the worker will query the database for the data and depending on how much data you have in your Umbraco installation, this could take a few minutes which can be problematic. Why? Because Umbraco v8 uses distributed SQL locks to ensure data integrity and during these queries a content lock will be created which means other back office operations on content will need to wait. This can end up with SQL Lock timeout issues. An important thing to realize is that these rebuild queries will occur for all new workers, so if you scaled out from 1 to 10, that is 9 new workers coming online at the same time.

How to avoid this problem?

If you use Examine APIs on your front-end, then you cannot just disable Examine/Lucene so the only reasonable solution is to use an Examine implementation that uses a hosted search service like ExamineX

If you don't use Examine APIs on your front-ends then it is a reasonable solution to disable Examine/Lucene on the front-ends to avoid this issue. To do that, you would change the default Umbraco indexes to use an in-memory only store and prohibit data from being put into the indexes. Then disable the queries that execute when Umbraco tries to re-populate the indexes.

Show me the code

First thing is to replace the default index factory. This new one will change the underlying Lucene directory for each index to be a RAMDirectory and will also disable the default Umbraco event handling that populates the indexes. This means Umbraco will not try to update the index based on content, media or member changes.

public class InMemoryExamineIndexFactory : UmbracoIndexesCreator
{
    public InMemoryExamineIndexFactory(
        IProfilingLogger profilingLogger,
        ILocalizationService languageService,
        IPublicAccessService publicAccessService,
        IMemberService memberService,
        IUmbracoIndexConfig umbracoIndexConfig)
        : base(profilingLogger, languageService, publicAccessService, memberService, umbracoIndexConfig)
    {
    }

    public override IEnumerable<IIndex> Create()
    {
        return new[]
        {
            CreateInternalIndex(),
            CreateExternalIndex(),
            CreateMemberIndex()
        };
    }

    // all of the below is the same as Umbraco defaults, except
    // we are using an in-memory Lucene directory.

    private IIndex CreateInternalIndex()
        => new UmbracoContentIndex(
            Constants.UmbracoIndexes.InternalIndexName,
            new RandomIdRamDirectory(), // in-memory dir
            new UmbracoFieldDefinitionCollection(),
            new CultureInvariantWhitespaceAnalyzer(),
            ProfilingLogger,
            LanguageService,
            UmbracoIndexConfig.GetContentValueSetValidator())
        {
            EnableDefaultEventHandler = false
        };

    private IIndex CreateExternalIndex()
        => new UmbracoContentIndex(
            Constants.UmbracoIndexes.ExternalIndexName,
            new RandomIdRamDirectory(), // in-memory dir
            new UmbracoFieldDefinitionCollection(),
            new StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_30),
            ProfilingLogger,
            LanguageService,
            UmbracoIndexConfig.GetPublishedContentValueSetValidator())
        {
            EnableDefaultEventHandler = false
        };

    private IIndex CreateMemberIndex()
        => new UmbracoMemberIndex(
            Constants.UmbracoIndexes.MembersIndexName,
            new UmbracoFieldDefinitionCollection(),
            new RandomIdRamDirectory(), // in-memory dir
            new CultureInvariantWhitespaceAnalyzer(),
            ProfilingLogger,
            UmbracoIndexConfig.GetMemberValueSetValidator())
        {
            EnableDefaultEventHandler = false
        };

    // required so that each ram dir has a different ID
    private class RandomIdRamDirectory : RAMDirectory
    {
        private readonly string _lockId = Guid.NewGuid().ToString();
        public override string GetLockId()
        {
            return _lockId;
        }
    }
}

The next thing to do is to create no-op index populators to replace the Umbraco default ones. All these do is ensure they are not associated with any index and then just to be sure, does not execute any logic for population.

public class DisabledMemberIndexPopulator : MemberIndexPopulator
{
    public DisabledMemberIndexPopulator(
        IMemberService memberService,
        IValueSetBuilder<IMember> valueSetBuilder)
        : base(memberService, valueSetBuilder)
    {
    }

    public override bool IsRegistered(IIndex index) => false;
    public override bool IsRegistered(IUmbracoMemberIndex index) => false;
    protected override void PopulateIndexes(IReadOnlyList<IIndex> indexes) { }
}

public class DisabledContentIndexPopulator : ContentIndexPopulator
{
    public DisabledContentIndexPopulator(
        IContentService contentService,
        ISqlContext sqlContext,
        IContentValueSetBuilder contentValueSetBuilder)
        : base(contentService, sqlContext, contentValueSetBuilder)
    {
    }

    public override bool IsRegistered(IIndex index) => false;
    public override bool IsRegistered(IUmbracoContentIndex2 index) => false;
    protected override void PopulateIndexes(IReadOnlyList<IIndex> indexes) { }
}

public class DisabledPublishedContentIndexPopulator : PublishedContentIndexPopulator
{
    public DisabledPublishedContentIndexPopulator(
        IContentService contentService,
        ISqlContext sqlContext,
        IPublishedContentValueSetBuilder contentValueSetBuilder)
        : base(contentService, sqlContext, contentValueSetBuilder)
    {
    }

    public override bool IsRegistered(IIndex index) => false;
    public override bool IsRegistered(IUmbracoContentIndex2 index) => false;
    protected override void PopulateIndexes(IReadOnlyList<IIndex> indexes) { }
}

public class DisabledMediaIndexPopulator : MediaIndexPopulator
{
    public DisabledMediaIndexPopulator(
        IMediaService mediaService,
        IValueSetBuilder<IMedia> mediaValueSetBuilder) : base(mediaService, mediaValueSetBuilder)
    {
    }

    public override bool IsRegistered(IIndex index) => false;
    public override bool IsRegistered(IUmbracoContentIndex index) => false;
    protected override void PopulateIndexes(IReadOnlyList<IIndex> indexes) { }
}

Lastly, we just need to enable these services:

public class DisabledExamineComposer : IUserComposer
{
    public void Compose(Composition composition)
    {
        // replace the default
        composition.RegisterUnique<IUmbracoIndexesCreator, InMemoryExamineIndexFactory>();

        // replace the default populators
        composition.Register<MemberIndexPopulator, DisabledMemberIndexPopulator>(Lifetime.Singleton);
        composition.Register<ContentIndexPopulator, DisabledContentIndexPopulator>(Lifetime.Singleton);
        composition.Register<PublishedContentIndexPopulator, DisabledPublishedContentIndexPopulator>(Lifetime.Singleton);
        composition.Register<MediaIndexPopulator, DisabledMediaIndexPopulator>(Lifetime.Singleton);
    }
}

With that all in place, it means that no data will ever be looked up to rebuild indexes and Umbraco will not send data to be indexed. There is nothing here preventing data from being indexed though. For example, if you use the Examine APIs to update the index directly, that data will be indexed in memory. If you wanted to absolutely make sure no data ever went into the index, you would have to override some methods on the RAMDirectory.

Can I run Examine with RAMDirectory with data?

You might have realized with the above that if you don't replace the populators, you will essentially have Examine indexes in Umbraco running from RAMDirectory. Is this ok? Yes absolutely, but that entirely depends on your data set. If you have a large index, that means it will consume large amounts of memory which is typically not a good idea. But if you have a small data set, or you filter the index so that it remains small enough, then yes! You can certainly run Examine with an in-memory directory but this would still only be advised on your front-end/replica servers.

Author

Shannon Thompson

I'm a Senior Software Engineer working full time at Microsoft. Previously, I was working at Umbraco HQ for about 10 years. I maintain several open source projects (many related to Umbraco) such as Articulate, Examine and Smidge, and I also have a commercial software offering called ExamineX. Welcome to my blog :)

comments powered by Disqus