Good enough syntax highlight for MDX in Neovim using Treesitter - Phelipe Teles

Good enough syntax highlight for MDX in Neovim using Treesitter


In this blog post, I’m gonna show how to get good enough syntax highlighting for MDX files in Neovim.

We can get pretty far by just adding some queries to our configuration, using the standard Treesitter parser for markdown (CommonMark) — no need to write a new parser, which is more challenging. Let’s see how.

Supporting .mdx files

First, let’s configure nvim to use the mdx filetype for every file with the .mdx extension:

~/.config/nvim/filetype.lua
vim.filetype.add({
  extension = {
    mdx = 'mdx'
  }
})
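One way to check that the mapping took effect is to ask Neovim which filetype it would pick for a given file name. This is a minimal sketch to run with :lua or :luafile after restarting Neovim; the name post.mdx is a made-up placeholder.

```lua
-- Ask Neovim which filetype it would assign to a file name.
-- "post.mdx" is a hypothetical name; the file does not need to exist.
local ft = vim.filetype.match({ filename = "post.mdx" })
print(ft)
```

Once the filetype.lua above has been loaded, this should print mdx.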

Then, let’s configure the nvim-treesitter plugin to use the markdown parser for mdx filetypes:

lua
local ft_to_parser = require("nvim-treesitter.parsers").filetype_to_parsername
ft_to_parser.mdx = "markdown"
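The filetype_to_parsername table may not be available in newer nvim-treesitter versions. Assuming Neovim 0.9 or later, the built-in vim.treesitter.language.register function can express the same mapping. A sketch, not necessarily what your setup needs:

```lua
-- Assumes Neovim 0.9+, where vim.treesitter.language.register is available.
-- Use the "markdown" parser for buffers whose filetype is "mdx".
vim.treesitter.language.register("markdown", "mdx")
```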

Now we have Markdown highlighting in our mdx files, but of course it still doesn’t highlight any of the JS/JSX we wrote:

Screenshots: before, there is no syntax highlighting of MDX files in Neovim out of the box; after, MDX files are highlighted as Markdown once Treesitter is configured.

Injecting JS/JSX into markdown

To highlight the JS/JSX part, we’ll use Treesitter’s language injection.

We do this by writing a Treesitter query that selects the nodes in the tree we want to highlight using another parser. Queries are similar to CSS selectors, but they use a completely different, Scheme-like syntax.

How can we write JavaScript/JSX in MDX?

We can write JavaScript inside curly braces, like this:

MDX
The current year is {new Date().getFullYear()}

Lines beginning with import/export are also interpreted as JavaScript:

MDX
import {getCurrentYear} from './date.js'
export const currentYear = getCurrentYear()

# The current year is {currentYear}

We can write JSX anywhere, similar to how you can write HTML in Markdown.

MDX
import {Box} from 'design-system'

<Box>
  Here's some text
</Box>

And even inline, inside of a heading:

MDX
export const Thing = () => <>World</>

# Hello <Thing />

Writing Treesitter queries

Let’s first write a query to select the nodes that start with import or export. The :InspectTree command gives a nice representation of the syntax tree built by Treesitter, making it easy to see where the text we want to select sits in the tree: section -> paragraph -> inline.

The simplest query that would match that node is:

scheme
(section (paragraph (inline)))

But that’s not what we want: we need to select only the nodes that start with import or export. We can use Treesitter query predicates (see :h treesitter-predicates) to conditionally select nodes. For instance, the #match? predicate selects nodes whose text matches a Vim regexp:

scheme
((inline) @_inline (#match? @_inline "^\s*\(import\|export\)"))

Using Vim regexp is expensive though, so it’s preferable to use Lua patterns in this case:

scheme
((inline) @_inline (#lua-match? @_inline "^%s*import"))
((inline) @_inline (#lua-match? @_inline "^%s*export"))
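Besides looking at the tree in the editor, the query can also be tried from Lua against a small string. The sketch below assumes Neovim 0.9 or later with the markdown parser installed; the sample text is made up for illustration.

```lua
-- Assumes Neovim 0.9+ with the "markdown" Treesitter parser installed.
-- The MDX text below is made-up sample input.
local source = table.concat({
  "import {Box} from 'design-system'",
  "",
  "Just a regular paragraph.",
  "",
}, "\n")

-- Parse the string with the markdown parser and grab the root node.
local parser = vim.treesitter.get_string_parser(source, "markdown")
local root = parser:parse()[1]:root()

-- Same two queries as above, kept in a Lua long string so no escaping is needed.
local query = vim.treesitter.query.parse("markdown", [[
  ((inline) @_inline (#lua-match? @_inline "^%s*import"))
  ((inline) @_inline (#lua-match? @_inline "^%s*export"))
]])

-- Print the capture name and text of every node the query captures.
for id, node in query:iter_captures(root, source, 0, -1) do
  print(query.captures[id], vim.treesitter.get_node_text(node, source))
end
```

If the query behaves as described, only the import line should be printed, and the regular paragraph should be skipped.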

Extending Treesitter injection queries

Now that we know this query correctly matches what we expect, we need to highlight the matched nodes as JavaScript.

For this, we’ll have to write these queries in a file called injections.scm in a queries directory in our runtime path (see Adding Queries).

Then we need to capture the node as @injection.content and set injection.language to the name of the parser we want to inject, in our case typescript:

~/.config/nvim/after/queries/markdown/injections.scm
; extends
((inline) @injection.content
  (#lua-match? @injection.content "^%s*import")
  (#set! injection.language "typescript"))
((inline) @injection.content
  (#lua-match? @injection.content "^%s*export")
  (#set! injection.language "typescript"))
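If the injection does not seem to apply, it may help to confirm that Neovim actually finds the file. A small sketch, assuming Neovim 0.9 or later, that lists the injection query files discovered for markdown on the runtime path:

```lua
-- Assumes Neovim 0.9+; lists every injections.scm found for "markdown"
-- on the runtime path, including files that start with "; extends".
for _, file in ipairs(vim.treesitter.query.get_files("markdown", "injections")) do
  print(file)
end
```

The file under after/queries/markdown should show up in that list next to the one shipped with the parser's default queries.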

Disabling spell checking

You may have noticed the red squiggly lines — this is because I have spell checking turned on, but it is often not desirable to have our code spell checked. We can disable it for a specific node by capturing it with @nospell with a query in highlights.scm:

~/.config/nvim/after/queries/markdown/highlights.scm
; extends
((inline) @_inline (#lua-match? @_inline "^%s*import")) @nospell
((inline) @_inline (#lua-match? @_inline "^%s*export")) @nospell

Conclusion

This is much better than what we started with, but there is still room for improvement:

  • JavaScript inside curly braces is not highlighted at all. I couldn’t find a way to achieve this with language injection.
  • Closing tags are not properly highlighted (because Treesitter sees a closing tag without a corresponding opening tag as an error).

I’m not sure if these issues could be resolved with queries alone — maybe it’d require a Treesitter parser similar to the parser for embedded template languages like EJS.