In the last post I described my initial attempt at customizing font-lock using Emacs 29 tree-sitter. Regular font locking uses regular expression matching. Tree-sitter allows combining regular expression matching with parse tree matching, like "variable declaration" or "function call". The ideas I came up with were mostly things that I could have implemented with regular expressions alone, but I wanted to experiment with the tree-sitter approach. Here's the effect of conventional highlighting, using color for keywords and other syntax:


I wanted to instead mark variable names. In some projects I'm working with geometry, and I sometimes make mistakes putting the wrong variables together. I decided to mark horizontal and vertical variables differently. First, I defined a regular expression to guess that a variable represents something "horizontal":
    (defvar amitp/regexp-horizontal-identifiers
      (rx (seq string-start
               (or "x" "dx" "width" "left" "right" "q" "col" "cols" "columns")
               (zero-or-more digit)
               string-end)))

    (defvar amitp/regexp-vertical-identifiers
      (rx (seq string-start
               (or "y" "dy" "height" "top" "bottom" "up" "down" "r" "row" "rows")
               (zero-or-more digit)
               string-end)))
I had started out with a string regexp but switched to the rx syntax. Over time I added more patterns to this, including all-caps and camelCase patterns.
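A quick way to sanity-check a pattern like this is to try it on a few names outside of font-lock. The following is only a sketch: `my/regexp-horizontal` and `my/horizontal-name-p` are hypothetical names, and the pattern is a copy of the horizontal one above. It binds `case-fold-search` to nil so the check is case-sensitive no matter what the current buffer uses.

```elisp
(require 'rx)

;; Hypothetical copy of the horizontal pattern, kept separate for testing.
(defvar my/regexp-horizontal
  (rx (seq string-start
           (or "x" "dx" "width" "left" "right" "q" "col" "cols" "columns")
           (zero-or-more digit)
           string-end)))

(defun my/horizontal-name-p (name)
  "Return t if the string NAME looks like a horizontal identifier."
  (let ((case-fold-search nil))          ; force a case-sensitive match
    (and (string-match-p my/regexp-horizontal name) t)))

(my/horizontal-name-p "x")      ; => t
(my/horizontal-name-p "dx2")    ; => t   (trailing digits are allowed)
(my/horizontal-name-p "index")  ; => nil (anchored, so no substring match)
(my/horizontal-name-p "x1y")    ; => nil (must end after the digits)
```

The `string-start` and `string-end` anchors are what keep a name like `index` from matching just because it contains an `x`.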
Unlike regular expression font locking, I can limit these regular expressions to match within certain types of parse tree nodes:
    (defvar amitp/treesit-font-lock-typescript-axis-variables
      (treesit-font-lock-rules
       :language 'typescript
       :feature 'semantic-identifier
       :override t
       `(([(identifier) (property_identifier) (shorthand_property_identifier_pattern)]
          @variable-horizontal-face
          (:match ,amitp/regexp-horizontal-identifiers @variable-horizontal-face))
         ([(identifier) (property_identifier) (shorthand_property_identifier_pattern)]
          @variable-vertical-face
          (:match ,amitp/regexp-vertical-identifiers @variable-vertical-face)))))

    (add-hook 'typescript-ts-mode-hook
              (lambda ()
                (setq-local treesit-font-lock-settings
                            (append treesit-font-lock-settings
                                    amitp/treesit-font-lock-typescript-axis-variables))))
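When a rule like this doesn't highlight what you expect, it can help to run the same kind of query by hand with `treesit-query-capture` and look at what it captures. This is a rough sketch rather than part of the setup above: `my/show-horizontal-captures` is a hypothetical helper, the regexp is a shortened stand-in for the real one, and it assumes Emacs 29 with the TypeScript grammar installed and a buffer in `typescript-ts-mode`.

```elisp
(require 'treesit)
(require 'rx)

(defun my/show-horizontal-captures ()
  "Show the text of every identifier the sample query captures.
Hypothetical debugging helper; assumes the current buffer has a
TypeScript tree-sitter parser."
  (interactive)
  (let* (;; Shortened stand-in for the horizontal regexp.
         (regexp (rx string-start
                     (or "x" "dx" "width")
                     (zero-or-more digit)
                     string-end))
         ;; Same shape as the font-lock rule: a node pattern, a capture
         ;; name, and a :match predicate on that capture.
         (query `(((identifier) @h (:match ,regexp @h))))
         (captures (treesit-query-capture
                    (treesit-buffer-root-node 'typescript) query)))
    ;; Each capture is a (NAME . NODE) pair; print just the node text.
    (message "%S" (mapcar (lambda (c) (treesit-node-text (cdr c) t))
                          captures))))
```

Run it with `M-x my/show-horizontal-captures` in a TypeScript buffer and the echo area lists the matching identifiers.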
I think the restriction doesn't help much in this case, but it was fun learning how to use treesit-font-lock-rules for this. Here's the result:

Do you notice the bug?
This was a fun experiment. But it's a bit too noisy, especially if I have both variable and syntax highlighting. I think it'd be better if I could highlight only the problematic subexpression (the y2-dx). I decided to try it with a simpler problem.
In some of my projects, I use geometry meshes that have integer IDs for regions (r), sides (s), and triangles (t). I want to avoid bugs where an array is indexed by a triangle ID, but I accidentally index it by a different kind of ID, such as a region ID. Some languages have a feature called "newtype" to make these separate types, but I don't have that in JavaScript. Instead, I use a naming convention to help me catch these:
x_foo is a value storing type x
x_foo_y is a function or array indexed by type y and stores values of type x
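To make the convention concrete, here it is written as a few string helpers. This is a sketch under assumptions: the function names are hypothetical, the only type letters are r, s, and t, and it works on plain names rather than parse tree nodes.

```elisp
(require 'rx)

(defun my/value-type (name)
  "Return the type letter that NAME stores, or nil if it has no type prefix.
For example, \"t_from_r\" stores a \"t\"."
  (let ((case-fold-search nil))
    (when (string-match
           (rx string-start (group (any "rst")) (or "_" string-end))
           name)
      (match-string 1 name))))

(defun my/index-type (name)
  "Return the type letter that NAME is indexed by, or nil if it has no suffix.
For example, \"elevation_t\" is indexed by a \"t\"."
  (let ((case-fold-search nil))
    (when (string-match (rx "_" (group (any "rst")) string-end) name)
      (match-string 1 name))))

(defun my/index-mismatch-p (array-name index-name)
  "Return t if INDEX-NAME has a known type that ARRAY-NAME does not expect.
Names without a recognizable type letter are never reported."
  (let ((expected (my/index-type array-name))
        (actual (my/value-type index-name)))
    (and expected actual (not (string= expected actual)))))

(my/index-mismatch-p "elevation_t" "t")         ; => nil
(my/index-mismatch-p "elevation_t" "r")         ; => t
(my/index-mismatch-p "elevation_t" "r_from_t")  ; => t
(my/index-mismatch-p "elevation_t" "i")         ; => nil (type unknown)
```

Returning nil for unknown names keeps it quiet on ordinary variables like `i` or `top`, at the cost of missing mismatches it can't classify.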
I was hoping to use tree-sitter to find subscript expressions, and then check if the array name ends with the same type _x as the index expression x_ starts with. I implemented it for simple cases:
    (defvar amitp/treesit-font-lock-typescript-indexing
      (treesit-font-lock-rules
       :language 'typescript
       :feature 'semantic-identifier
       :override t
       `(((subscript_expression) @error
          (:match "_r\\[_?[st]\\|_s\\[_?[rt]\\|_t\\[_?[sr]" @error)))))
And it does work:

However, tree-sitter plays almost no role in this. The real work is done by the regular expression. It doesn't catch these cases:
    elevation_t[obj.t]         // good
    elevation_t[obj.r]         // bug
    elevation_t[t_from_r(r)]   // good
    elevation_t[r_from_t(t)]   // bug
    elevation_t[x.t_fn(x)]     // good
    elevation_t[x.r_fn(x)]     // bug
    elevation_t[x.t_arr[i]]    // good
    elevation_t[x.r_arr[i]]    // bug
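One way to see exactly where the regexp falls short is to run it over these strings directly. This is a sketch with hypothetical names (`my/indexing-regexp`, `my/flagged-p`); the regexp is copied from the rule above, and the strings stand in for the text of each subscript expression.

```elisp
;; Copy of the regexp from the subscript rule, for experimenting.
(defvar my/indexing-regexp "_r\\[_?[st]\\|_s\\[_?[rt]\\|_t\\[_?[sr]")

(defun my/flagged-p (text)
  "Return t if the indexing regexp would flag the string TEXT."
  (let ((case-fold-search nil))
    (and (string-match-p my/indexing-regexp text) t)))

(my/flagged-p "elevation_t[r]")            ; => t   (simple case, caught)
(my/flagged-p "elevation_t[t]")            ; => nil (correct)
(my/flagged-p "elevation_t[obj.r]")        ; => nil (bug missed)
(my/flagged-p "elevation_t[r_from_t(t)]")  ; => t   (index text starts with r)
(my/flagged-p "elevation_t[x.r_fn(x)]")    ; => nil (bug missed)
(my/flagged-p "elevation_t[x.r_arr[i]]")   ; => nil (bug missed)
(my/flagged-p "elevation_t[size]")         ; => t   (false positive)
```

The regexp only looks at the first character or two after the bracket. That is why `r_from_t(t)` happens to be flagged while the property and nested cases slip through, and why an unrelated name such as `size` would be flagged too.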
How would I also handle these cases? I attempted it, but it took more time than I had originally planned for, so that'll be in the next blog post.