Writing a DocFx markdown plugin

What is DocFx? It’s a static site generator mainly used for creating API documentation for your code, but it can be used for any static site. We use it for the Lucene.Net project’s website and documentation. The end result is API docs that look and feel a little bit familiar, kind of like Microsoft’s own API documentation website. I’m not entirely sure if their docs are built with DocFx, but I suspect they are, with some highly customized builds and plugins … but that’s just my own assumption.

Speaking of customizing DocFx, it is certainly possible. That said, the ironic part about DocFx is that its own documentation is not great. One of the markdown customizations we needed for the Lucene.Net project was a custom note marking some APIs as experimental. This tag comes from the converted Java Lucene docs and looks like: “@ lucene.experimental ”. So we wanted to detect that string and convert it to a nice looking note similar to the DocFx markdown note. Luckily there are some docs on how to do that. They’re not at all succinct, but the example covers pretty much exactly what we wanted to do.

Block markdown token

This example is a block level token since it exists on its own line and not within other text. This is also the example DocFx provides in its docs. It’s relatively easy to do:

  • Register an IDfmEngineCustomizer to insert/add a “Block Rule”
  • Create a new “Block Rule”, which in its simplest form is a regex that parses the current text block and, if it matches, returns an instance of a custom “Token” class
  • Create a custom “Token” class to store the information about what you’ve parsed
  • Create a custom “Renderer” to write out the actual HTML result you want
  • Register an IDfmCustomizedRendererPartProvider to expose your “Renderer”

This all uses MEF to wire everything up. You can see the Lucene.Net implementation of a custom markdown block token here: https://github.com/apache/lucenenet/tree/master/src/docs/LuceneDocsPlugins
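
To make the split between rule, token and renderer a bit more concrete, here is a minimal, self-contained sketch of that flow. It deliberately does not use the DocFx types: NoteToken, ExperimentalNoteRule and NoteRenderer are hypothetical stand-ins, the regex (which tolerates whitespace around the tag) is my assumption, and the HTML is made up for illustration. The real implementation is in the repository linked above.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical stand-in for a custom token: it only stores what the rule parsed.
public sealed class NoteToken
{
    public NoteToken(string label) { Label = label; }
    public string Label { get; }
}

// Hypothetical stand-in for a block rule: a regex over the current block
// that returns a token on a match and null otherwise.
public static class ExperimentalNoteRule
{
    private static readonly Regex Pattern =
        new Regex(@"^@\s*lucene\.(?<label>experimental)\s*$", RegexOptions.Compiled);

    public static NoteToken TryMatch(string block)
    {
        var match = Pattern.Match(block.Trim());
        return match.Success ? new NoteToken(match.Groups["label"].Value) : null;
    }
}

// Hypothetical stand-in for a renderer: turns the token into the HTML to write out.
public static class NoteRenderer
{
    public static string Render(NoteToken token) =>
        "<div class=\"NOTE\"><h5>" +
        WebUtility.HtmlEncode(token.Label.ToUpperInvariant()) +
        "</h5><p>This API is experimental and may change.</p></div>";
}
```

In the real plugin the rule and renderer implement the DocFx interfaces and are discovered through MEF instead of being called directly, but the responsibilities are divided the same way.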

Inline markdown token

The above was ‘easy’ because it’s more or less following the DocFx documentation example. The next challenge was that I wanted to be able to render an environment variable value within the markdown… sounds easy enough? Well, the resulting code is actually super simple, but my journey to get there was absolutely not!

There’s zero documentation about customizing the markdown engine for inline markdown, and there’s almost zero documentation in the codebase about what is going on either, which makes things a little interesting. I tried following the same steps as for the block markdown token and noticed in the code that it uses a MarkdownBlockContext instance. I then discovered there’s a MarkdownInlineContext too, so I thought we’d just swap that in … but that doesn’t work. I tried inserting my inline rule at the beginning, end, middle, etc… of DfmEngineBuilder.InlineRules within my IDfmEngineCustomizer, but nothing seemed to happen. Hrm. So I cloned the DocFx repo and started diving into the tests, setting breakpoints, etc…

So here’s what I discovered:

  • If a token can contain other tokens, it’s the token’s responsibility to recurse the parsing
  • There’s a sort of ‘catch all’ rule called MarkdownTextInlineRule that will ‘eat’ every character up to the next one of a few very specific markdown chars that it stops at.
    • This means that if your inline token starts with a char that this rule ‘eats’, your rule never gets a chance to match. So your rule can only begin with one of the chars it stops at: \ < ! [ * `
  • Your rule must run before this one
  • For inline rules you don’t need a “Renderer” (i.e. IDfmCustomizedRendererPartProvider)
  • The inline rule regex needs to match at the beginning of the string, using the hat ^ symbol. This is a pretty critical part of how DocFx parses its inline content.
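
To illustrate why the starting character and the rule order matter, here is a toy model of that scanning loop. This is only a sketch under my own assumptions: InlineScanDemo is a hypothetical helper, and the catch-all regex is a simplified stand-in built from the chars listed above, not the actual MarkdownTextInlineRule pattern.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class InlineScanDemo
{
    // Simplified catch-all: lazily eats text until the next "special" char or the end of input.
    private static readonly Regex CatchAll =
        new Regex(@"^[\s\S]+?(?=[\\<!\[*`]|$)", RegexOptions.Compiled);

    // Tries the custom rule first, then falls back to the catch-all, always
    // matching at the start of the remaining text. Each piece is tagged with
    // the rule that produced it ("custom" or "text").
    public static List<string> Scan(string markdown, Regex customRule)
    {
        var pieces = new List<string>();
        var rest = markdown;
        while (rest.Length > 0)
        {
            var match = customRule.Match(rest);
            var name = "custom";
            if (!match.Success || match.Length == 0)
            {
                match = CatchAll.Match(rest);
                name = "text";
            }
            pieces.Add(name + ":" + match.Value);
            // consume what was matched so it isn't processed again
            rest = rest.Substring(match.Length);
        }
        return pieces;
    }
}
```

With the rule `^\[EnvVar:(\w+?)\]`, scanning `See [EnvVar:HOME] now` yields `text:See `, `custom:[EnvVar:HOME]`, `text: now`, because the catch-all stops at the `[`. With a curly-brace variant such as `^\{EnvVar:(\w+?)\}`, scanning `See {EnvVar:HOME} now` yields a single `text:` piece: the catch-all swallows the `{` along with everything else, so the custom rule is never tried at that position.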

Now that I know that, making this extension is super simple:

  • I’ll make a markdown token: [EnvVar:MyEnvironmentVar], which will be replaced with the value of the environment variable with that name, in this example: MyEnvironmentVar.
  • I’ll insert my rule at the top of the list so it doesn’t come after the catch-all rule

using System;
using System.Collections.Generic;
using System.Composition;
using System.Text.RegularExpressions;
using Microsoft.DocAsCode.Dfm;
using Microsoft.DocAsCode.MarkdownLite;

// customize the engine (exported via MEF so DocFx can discover it)
[Export(typeof(IDfmEngineCustomizer))]
public class LuceneDfmEngineCustomizer : IDfmEngineCustomizer
{
    public void Customize(DfmEngineBuilder builder, IReadOnlyDictionary<string, object> parameters)
    {
        // insert inline rule at the top so it runs before the catch-all text rule
        builder.InlineRules = builder.InlineRules.Insert(0, new EnvironmentVariableInlineRule());
    }
}

// define the rule
public class EnvironmentVariableInlineRule : IMarkdownRule
{
    // give it a name
    public string Name => "EnvVarToken";

    // define my regex to match; the ^ anchors it to the start of the current markdown
    private static readonly Regex _envVarRegex = new Regex(@"^\[EnvVar:(\w+?)\]", RegexOptions.Compiled);

    // process the match
    public IMarkdownToken TryMatch(IMarkdownParser parser, IMarkdownParsingContext context)
    {
        var match = _envVarRegex.Match(context.CurrentMarkdown);
        if (match.Length == 0) return null;

        // look up the variable; if it isn't set, return null so other rules handle the text
        var envVar = match.Groups[1].Value;
        var text = Environment.GetEnvironmentVariable(envVar);
        if (text == null) return null;

        // 'eat' the characters of the current markdown token so they aren't re-processed
        var sourceInfo = context.Consume(match.Length);

        // return a docfx token that just returns the text passed to it
        return new MarkdownTextToken(this, parser.Context, text, sourceInfo);
    }
}
In the end, that’s actually pretty simple! But don’t go trying to create a fancy token that doesn’t start with those magic characters since it’s not going to work.


Shannon Thompson

I'm a Senior Software Engineer working full time at Microsoft. Previously, I was working at Umbraco HQ for about 10 years. I maintain several open source projects (many related to Umbraco) such as Articulate, Examine and Smidge, and I also have a commercial software offering called ExamineX. Welcome to my blog :)