Good enough syntax highlight for MDX in Neovim using Treesitter
In this blog post, I’m gonna show how to get good enough syntax highlighting for MDX files in Neovim.
We can get pretty far by just adding some queries to our configuration, using the standard Treesitter parser for markdown (CommonMark) — no need to write a new parser, which is more challenging. Let’s see how.
Supporting .mdx files
First, let’s configure nvim to use the mdx filetype for every file with the .mdx extension:
vim.filetype.add({ extension = { mdx = 'mdx' }})
Then, let’s configure the nvim-treesitter plugin to use the markdown parser for mdx filetypes:
local ft_to_parser = require("nvim-treesitter.parsers").filetype_to_parsername
ft_to_parser.mdx = "markdown"
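As far as I know, newer versions of nvim-treesitter deprecate filetype_to_parsername in favor of Neovim's built-in vim.treesitter.language.register (available since Neovim 0.9). Here's a minimal sketch that prefers the built-in function and falls back to the plugin table; the version check is my own assumption about how you might want to support both setups:

```lua
-- Sketch: map the mdx filetype to the markdown parser on old and new setups.
vim.filetype.add({ extension = { mdx = "mdx" } })

if vim.treesitter.language and vim.treesitter.language.register then
  -- Neovim 0.9+: built-in association between a parser and a filetype.
  vim.treesitter.language.register("markdown", "mdx")
else
  -- Older setups: the nvim-treesitter table used in this post.
  local ft_to_parser = require("nvim-treesitter.parsers").filetype_to_parsername
  ft_to_parser.mdx = "markdown"
end
```

Either branch should have the same effect: buffers with the mdx filetype get parsed with the markdown parser.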
Now we have markdown highlighting in our mdx files, but of course it still doesn’t highlight any of the JS/JSX we wrote.
Injecting JS/JSX into markdown
To highlight the JS/JSX part, we’ll use Treesitter’s language injection.
We do this by writing a Treesitter query that selects the nodes in the tree we want to highlight with another parser. Queries play a role similar to CSS selectors, but the syntax is completely different: they are written in a Scheme-like language.
How can we write JavaScript/JSX in MDX?
We can write JavaScript inside curly braces, like this:
The current year is {new Date().getFullYear()}
Lines beginning with import/export are also interpreted as JavaScript:
import {getCurrentYear} from './date.js'
export const currentYear = getCurrentYear()

# The current year is {currentYear}
We can write JSX anywhere, similar to how you can write HTML in Markdown.
import {Box} from 'design-system'

<Box>
  Here's some text
</Box>
And even inline, inside of a heading:
export const Thing = () => <>World</>

# Hello <Thing />
Writing Treesitter queries
Let’s first write a query to select those nodes that start with import or export. We can get a nice representation of the abstract syntax tree built by Treesitter with the :InspectTree command, making it easy to see where the text we want to select is in the tree: section -> paragraph -> inline.
The simplest query that would match that node is:
(section (paragraph (inline)))
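If you'd rather check a query from Lua than eyeball the tree, something like the following sketch should work. It assumes Neovim 0.9+ (for vim.treesitter.query.parse) and an installed markdown parser; capture_texts is a hypothetical helper name, and I added an @inline capture to the query because only captured nodes are reported:

```lua
-- Sketch: collect the text of every node captured by a query in a buffer.
local function capture_texts(bufnr, query_text)
  local parser = vim.treesitter.get_parser(bufnr, "markdown")
  local root = parser:parse()[1]:root()
  local query = vim.treesitter.query.parse("markdown", query_text)

  local texts = {}
  -- iter_captures applies the query's predicates before yielding a node.
  for id, node in query:iter_captures(root, bufnr) do
    local name = query.captures[id]
    table.insert(texts, name .. ": " .. vim.treesitter.get_node_text(node, bufnr))
  end
  return texts
end

-- Run the simplest query against the current buffer and print the matches.
local bufnr = vim.api.nvim_get_current_buf()
print(vim.inspect(capture_texts(bufnr, "(section (paragraph (inline) @inline))")))
```

On an MDX file this should list every paragraph's inline text, which makes it easy to see how broad the query is.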
But this query is too broad: we need to select only the nodes that start with import or export. We can use Treesitter query predicates (see :h treesitter-predicates) to conditionally select nodes. For instance, the #match? predicate will allow us to select nodes with text that matches a Vim regexp.
((inline) @_inline (#match? @_inline "^\\s*\\(import\\|export\\)"))
Vim regexps are relatively expensive to evaluate, though, so Lua patterns are preferable here. Lua patterns have no alternation operator, so we need one query per keyword:
((inline) @_inline (#lua-match? @_inline "^%s*import"))
((inline) @_inline (#lua-match? @_inline "^%s*export"))
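To get a feel for what these patterns accept, here's a small plain-Lua sketch (no Neovim needed). The helper names are hypothetical. The stricter variant is my own suggestion, not part of the queries above: it uses the frontier pattern %f[%W] so that a paragraph starting with a word like "important" is not mistaken for an import statement.

```lua
-- Same patterns as the queries above: one test per keyword,
-- because Lua patterns have no "|" alternation.
local function starts_with_keyword(line)
  return line:find("^%s*import") ~= nil or line:find("^%s*export") ~= nil
end

-- Stricter variant: %f[%W] requires the keyword to be followed by a
-- non-word character (or the end of the string).
local function starts_with_keyword_strict(line)
  return line:find("^%s*import%f[%W]") ~= nil
    or line:find("^%s*export%f[%W]") ~= nil
end

print(starts_with_keyword("import {Box} from 'design-system'"))   -- true
print(starts_with_keyword("The current year is {currentYear}"))   -- false
print(starts_with_keyword("important: read this first"))          -- true (false positive)
print(starts_with_keyword_strict("important: read this first"))   -- false
print(starts_with_keyword_strict("export const currentYear = 1")) -- true
```

If the false positive matters to you, the stricter pattern could be dropped into #lua-match? as is, since the predicate takes an ordinary Lua pattern.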
Extending Treesitter injection queries
Now that we know this query correctly matches what we expect, we need to highlight the matched nodes as JavaScript.
For this, we’ll have to write these queries in a file called injections.scm in a queries directory in our runtime path (see Adding Queries).
Then we need to capture the node as @injection.content and set injection.language to the name of the parser we want to inject, in our case typescript:
; extends

((inline) @injection.content
  (#lua-match? @injection.content "^%s*import")
  (#set! injection.language "typescript"))

((inline) @injection.content
  (#lua-match? @injection.content "^%s*export")
  (#set! injection.language "typescript"))
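To confirm the injection is actually picked up, a sketch like this can be run in an open .mdx buffer. It assumes Neovim 0.9+ and installed markdown and typescript parsers. Since the buffer is parsed with the markdown parser, I'd expect the query file to live at queries/markdown/injections.scm somewhere in the runtime path, but that exact path is my assumption:

```lua
local bufnr = vim.api.nvim_get_current_buf()

-- 1. List the injection query files Neovim found for the markdown language.
--    Our injections.scm should show up here.
for _, path in ipairs(vim.treesitter.query.get_files("markdown", "injections")) do
  print("query file: " .. path)
end

-- 2. Parse the buffer, including injections, and list the injected languages.
--    The argument asks newer Neovim versions to parse injected regions too;
--    older versions ignore it.
local parser = vim.treesitter.get_parser(bufnr, "markdown")
parser:parse(true)
for lang, _ in pairs(parser:children()) do
  print("injected language: " .. lang)
end
```

If typescript appears among the injected languages, the import/export lines are being handed to the TypeScript parser.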
Disabling spell checking
You may have noticed the red squiggly lines. This is because I have spell checking turned on, but it is often not desirable to have our code spell checked. We can disable it for a specific node by capturing it with @nospell in a query in highlights.scm:
; extends

((inline) @_inline (#lua-match? @_inline "^%s*import")) @nospell
((inline) @_inline (#lua-match? @_inline "^%s*export")) @nospell
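One way to check that the capture applies is to put the cursor on an import/export line and ask Neovim which captures are active there. This is a sketch assuming Neovim 0.9+, where vim.treesitter.get_captures_at_cursor returns a list of capture names; the :Inspect command shows similar information interactively:

```lua
-- Run with the cursor on an import/export line of an .mdx buffer.
local captures = vim.treesitter.get_captures_at_cursor(0)
print(vim.inspect(captures))

if vim.tbl_contains(captures, "nospell") then
  print("spell checking is disabled for this node")
else
  print("no @nospell capture here; check that highlights.scm is being loaded")
end
```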
Conclusion
This is much better than what we started with, but there is still room for improvement:
- JavaScript inside curly braces is not highlighted at all. I couldn’t find a way to achieve this with language injection.
- Closing tags are not properly highlighted (because Treesitter sees a closing tag without a corresponding opening tag as an error).
I’m not sure if these issues could be resolved with queries alone — maybe it’d require a Treesitter parser similar to the parser for embedded template languages like EJS.