Amit's Thoughts: Emacs Tree-sitter custom highlighting, part 2

Tuesday, March 04, 2025

In the last post I described my initial attempt at customizing font-lock using tree-sitter in Emacs 29. Conventional font-lock uses regular expression matching. Tree-sitter allows combining regular expression matching with parse tree matching, like "variable declaration" or "function call". The ideas I came up with were mostly things that I could have implemented with regular expressions alone, but I wanted to experiment with the tree-sitter approach. Here's the effect of conventional highlighting, using color for keywords and other syntax:

Screenshot showing no use of color — Screenshots showing the use of color for syntax highlighting

Screenshot showing color for syntax — Screenshots showing the use of color for syntax highlighting

I wanted to instead mark variable names. In some projects I'm working with geometry, and I sometimes make mistakes putting the wrong variables together. I decided to mark horizontal and vertical variables differently. First, I defined regular expressions to guess whether a variable represents something "horizontal" or "vertical":

(defvar amitp/regexp-horizontal-identifiers
  (rx
   (seq
    string-start
    (or "x" "dx" "width" "left" "right" "q" "col" "cols" "columns")
    (zero-or-more digit)
    string-end))
  "Regexp guessing that an identifier names a horizontal quantity.")

(defvar amitp/regexp-vertical-identifiers
  (rx
   (seq
    string-start
    (or "y" "dy" "height" "top" "bottom" "up" "down" "r" "row" "rows")
    (zero-or-more digit)
    string-end))
  "Regexp guessing that an identifier names a vertical quantity.")

I had started out with a string regexp but switched to the rx syntax. Over time I added more patterns to this, including all caps and camelcase patterns.
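
To get a feel for what the pattern accepts, it can be tried on plain strings outside of font-lock. This is only a sketch: the demo/ names are made up for illustration, and the pattern is copied into the snippet so it stands on its own. It covers just the lowercase names listed above, not the all caps and camelcase patterns.

```elisp
(require 'rx)

;; Hypothetical copy of the horizontal pattern, so this snippet is self-contained.
(defvar demo/regexp-horizontal
  (rx (seq string-start
           (or "x" "dx" "width" "left" "right" "q" "col" "cols" "columns")
           (zero-or-more digit)
           string-end)))

(defun demo/horizontal-p (name)
  "Return t if NAME matches the horizontal pattern, nil otherwise."
  (and (string-match-p demo/regexp-horizontal name) t))

;; Evaluate in *scratch* or M-x ielm:
(mapcar #'demo/horizontal-p '("x" "dx2" "width" "index" "maxWidth"))
;; => (t t t nil nil)
```

The string-start and string-end anchors are what keep "index" from matching just because it contains an "x".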

Unlike with regular expression font locking, I can limit these regular expressions to match only within certain types of parse tree nodes:

(defvar amitp/treesit-font-lock-typescript-axis-variables
  (treesit-font-lock-rules
   :language 'typescript
   :feature 'semantic-identifier
   :override t
   `(
     ([(identifier) (property_identifier) (shorthand_property_identifier_pattern)]
      @variable-horizontal-face
      (:match ,amitp/regexp-horizontal-identifiers @variable-horizontal-face))
     ([(identifier) (property_identifier) (shorthand_property_identifier_pattern)]
      @variable-vertical-face
      (:match ,amitp/regexp-vertical-identifiers @variable-vertical-face))
     )))

(add-hook
 'typescript-ts-mode-hook
 (lambda ()
   (setq-local treesit-font-lock-settings
               (append treesit-font-lock-settings
                       amitp/treesit-font-lock-typescript-axis-variables))))
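
The capture names in these rules double as face names, so variable-horizontal-face and variable-vertical-face have to exist as faces for anything to show up. A minimal sketch of what such definitions could look like; the colors are placeholders I picked for the example, not necessarily the ones in the screenshots:

```elisp
;; Placeholder faces for the capture names used in the rules above.
;; The colors are assumptions; pick whatever suits your theme.
(defface variable-horizontal-face
  '((t :foreground "dark orange"))
  "Face for identifiers that look like horizontal quantities."
  :group 'font-lock-faces)

(defface variable-vertical-face
  '((t :foreground "deep sky blue"))
  "Face for identifiers that look like vertical quantities."
  :group 'font-lock-faces)
```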

I think the restriction doesn't help much in this case, but it was fun learning how to use treesit-font-lock-rules for this. Here's the result:

Do you notice the bug?

This was a fun experiment. But it's a bit too noisy, especially if I have both variable and syntax highlighting. I think it'd be better if I could highlight only the problematic subexpression (the y2-dx). I decided to try it with a simpler problem.

In some of my projects, I use geometry meshes that have integer IDs for regions (r), sides (s), and triangles (t). I want to avoid bugs where an array is indexed by a triangle ID, but I accidentally index it by a region ID. Some languages have a feature called "newtype" to make these separate types, but I don't have that in JavaScript. Instead, I use a naming convention to help me catch these:

x_foo is a value storing type x
x_foo_y is a function or array indexed by type y and stores values of type x

I was hoping to use tree-sitter to find subscript expressions, and then check if the array name ends with the same type _x as the index expression x_ starts with. I implemented it for simple cases:

(defvar amitp/treesit-font-lock-typescript-indexing
  (treesit-font-lock-rules
   :language 'typescript
   :feature 'semantic-identifier
   :override t
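   ;; Flag a subscript whose array name ends in _r, _s, or _t but whose
   ;; index starts with one of the other two type letters.
   `(((subscript_expression) @error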
      (:match "_r\\[_?[st]\\|_s\\[_?[rt]\\|_t\\[_?[sr]" @error)))
   ))
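
The defvar only builds the rules; they still have to be appended to the buffer's settings, the same way as the axis rules earlier. A sketch of one way to do that (the function name is made up for this example):

```elisp
;; Hypothetical helper; a named function is easier to remove-hook later
;; than an anonymous lambda.
(defun amitp/typescript-enable-indexing-check ()
  "Append the indexing rules to this buffer's tree-sitter font-lock settings."
  (setq-local treesit-font-lock-settings
              (append treesit-font-lock-settings
                      amitp/treesit-font-lock-typescript-indexing)))

(add-hook 'typescript-ts-mode-hook #'amitp/typescript-enable-indexing-check)
```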

And it does work:

Screenshot — Highlighting potential indexing error
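
The same regexp can be tried on plain strings to see what it flags. A small sketch; the helper name and the sample expressions are made up for illustration:

```elisp
(defun demo/index-mismatch-p (text)
  "Return t if TEXT looks like a subscript with mismatched ID types."
  (let ((case-fold-search nil))
    (and (string-match-p "_r\\[_?[st]\\|_s\\[_?[rt]\\|_t\\[_?[sr]" text) t)))

(demo/index-mismatch-p "elevation_t[t_center]") ; => nil (_t array, t_ index)
(demo/index-mismatch-p "elevation_t[r_center]") ; => t   (_t array, r_ index)
(demo/index-mismatch-p "elevation_t[obj.r]")    ; => nil (only the text right after "[" is checked)
```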

However, tree-sitter plays almost no role in this. The real work is done by the regular expression. It doesn't catch these cases:

elevation_t[obj.t]       // good
elevation_t[obj.r]       // bug
elevation_t[t_from_r(r)] // good
elevation_t[r_from_t(t)] // bug
elevation_t[x.t_fn(x)]   // good
elevation_t[x.r_fn(x)]   // bug
elevation_t[x.t_arr[i]]  // good
elevation_t[x.r_arr[i]]  // bug

How would I also handle these cases? I attempted it, but it took more time than I had originally planned for, so that'll be in the next blog post.

Labels: emacs

– Amit – Tuesday, March 04, 2025
