In the last post I described how, for a particular project with certain naming conventions, I wanted to use emacs tree-sitter font-locking to highlight variables that might be used in the wrong context:

I found a way to make it work for simple expressions but it was relying on regular expression matching rather than tree-sitter parse trees. I set a goal of handling these types of expressions:
elevation_t[t] // good elevation_t[r] // bug elevation_t[t_from_r[r]] // good elevation_t[r_from_t[t]] // bug elevation_t[obj.t] // good elevation_t[obj.r] // bug elevation_t[t_from_r(r)] // good elevation_t[r_from_t(t)] // bug elevation_t[x.t_fn(x)] // good elevation_t[x.r_fn(x)] // bug elevation_t[x.t_arr[i]] // good elevation_t[x.r_arr[i]] // bug elevation_t[(t)] // good elevation_t[(r)] // bug
What are my options for using tree-sitter to handle more cases? I decided to try
the :pred
feature of tree-sitter font lock. Using M-x treesit-explore-mode,
I learned that the subscript expression looks like:
(subscript_expression object: (_) [ index: (_) ] )
That means I can match it with a rule that captures both the array and index as separate nodes:
(defun amitp-check-subscript (array index) (message "debug: %S %S" array index)) (defvar amitp/treesit-font-lock-typescript-indexing (treesit-font-lock-rules :language 'typescript :feature 'semantic-identifier :override t `(((subscript_expression object: (identifier) @array index: (expression_statement) @index (:pred amitp-check-subscript @array @index))))))
Those two arguments are treesit parse nodes. What can I do with them? The
emacs manual has a page about how to navigate the parse tree and another
page about querying the parse tree. In the docs found a function
treesit-query-capture
that can pattern match on a node and extract the
parts.
I used that function to look for the identifier that most likely tells me
what geometry type is being returned (r
for region, s
for side, t
for
triangle). Examples:
- in
t_foo_r(r)
, the identifiert_foo_r
tells me this is likely at
(triangle) - in
t_obj[i].r_elevation
, the identifierr_elevation
tells me this is likely anr
(region) - in
map[i].s_inner_t(t_queue[0])
, the identifiers_inner_t
tells me this is likely ans
(side)
I think this is where tree-sitter can do better than regular expression based matching. I wrote a function to find the key identifier:
(defun amitp/treesit-top-typescript-node (node) "Try to find the identifier that represents a node's return value" (let ((results (treesit-query-capture node '(((identifier) @identifier) ((member_expression property: (property_identifier) @property)) ((subscript_expression object: (_) @expr)) ((call_expression function: (_) @expr)) ((parenthesized_expression (_) @expr)))))) (if (assoc 'expr results) (amitp/treesit-top-typescript-node (cdr (assoc 'expr results))) ;; prefer the property if tree-sitter matches both (or (cdr (assoc 'property results)) (cdr (assoc 'identifier results))))))
I wrote a function detect a mismatch:
(defun amitp-check-subscript-mismatch (array index) ;; note: my usual naming convention is amitp/foo but tree-sitter won't accept that (let* ((array-name (buffer-substring (treesit-node-start array) (treesit-node-end array))) (top (amitp/treesit-top-typescript-node index)) (index-name (buffer-substring (treesit-node-start top) (treesit-node-end top)))) (and (or (= (length array-name) 1) (equal "_" (substring array-name -2 -1))) (or (= (length index-name) 1) (equal "_" (substring index-name 1 2))) (not (equal (substring array-name -1) (substring index-name 0 1))))))
Then I used the tree-sitter :pred
feature to only highlight if there's a mismatch:
(defface array-error) (defface index-error) (defvar amitp/treesit-font-lock-typescript-indexing (treesit-font-lock-rules :language 'typescript :feature 'semantic-identifier :override t '(((subscript_expression object: (_) @array-error index: (_) @index-error (:pred amitp-check-subscript-mismatch @array-error @index-error)))))) ;; todo: add call_expression
Does it work? Yes!

But there's something a little weird here. Why do r+1
and 1+r
get flagged
even though I never actually handle binary operators in my
amitp/treesit-top-typescript-node
function? Something to investigate later.
The next feature I wanted was to highlight only the "type" character
instead of the entire identifier. I couldn't figure out how, so I asked on
Reddit, where /u/eleven_cupfuls
gave me a solution: the @face-name can be a
@function-name instead. Thank you!
Instead of using :pred
to tell font-lock whether to highlight something, I
need to highlight it myself. I changed the treesit font lock rules to call
my function instead of applying a face:
(defvar amitp/treesit-font-lock-typescript-indexing (treesit-font-lock-rules :language 'typescript :feature 'semantic-identifier :override t '(((subscript_expression) @amitp-highlight-subscript-mismatch) ((call_expression) @amitp-highlight-subscript-mismatch))))
My function now receives the entire expression instead of receiving
the array and index parts separately like the :pred
version. So I need to
find a way to extract the right information. I can use
treesit-query-capture
like before, right?
It was at this point that I learned treesit-query-capture
pattern matches
against the entire tree and not only the top node. That explains why it was
finding r+1
and 1+r
. It was looking for any identifier anywhere in the
subtree. Oops.
I rewrote amitp/treesit-top-typescript-node
to use treesit-node-type
instead of treesit-query-capture
:
(defun amitp/treesit-top-typescript-node (node) "Try to find the identifier that represents a node's return value" (pcase (treesit-node-type node) ("identifier" node) ("member_expression" (treesit-node-child-by-field-name node "property")) ("subscript_expression" (amitp/treesit-top-typescript-node (treesit-node-child-by-field-name node "object"))) ("call_expression" (amitp/treesit-top-typescript-node (treesit-node-child-by-field-name node "function"))) ("parenthesized_expression" (amitp/treesit-top-typescript-node (treesit-node-child node 1))) (_ nil)))
It would've been cleaner if pcase could pattern match against these nodes, but it's not too bad.
And then I wrote a new amitp-highlight-subscript-mismatch
function to
analyze subscript nodes (and function calls too) to make sure the names
match:
(defun amitp-highlight-subscript-mismatch (node override start end &rest _) "Highlight mismatching variable names in amitp's geometry mesh code" (let* ((parts (pcase (treesit-node-type node) ("subscript_expression" (list (treesit-node-child-by-field-name node "object") (treesit-node-child-by-field-name node "index"))) ("call_expression" (list (treesit-node-child-by-field-name node "function") (treesit-node-child (treesit-node-child-by-field-name node "arguments") 1)))))) (when parts (let* ((array (nth 0 parts)) (index (nth 1 parts)) (array-name (treesit-node-text array)) (top (amitp/treesit-top-typescript-node index)) (index-name (or (treesit-node-text array) "")) (face (if (and (or (= (length array-name) 1) (equal "_" (substring array-name -2 -1))) (or (= (length index-name) 1) (s-match "[_0-9]" (substring index-name 1 2)))) (if (equal (substring array-name -1) (substring index-name 0 1)) 'index-good 'index-bad) nil))) (when face (treesit-fontify-with-override (- (treesit-node-end array) 2) (treesit-node-end array) face override) (treesit-fontify-with-override (treesit-node-start top) (min (treesit-node-end top) (+ 2 (treesit-node-start top))) face override))))))
It's a little bit long but it seems to work:

However, there are things I haven't figured out, like why certain expressions don't trigger the (subscript_expression)
font lock rule:
r => mesh.foo_r[r]); // matches (r => mesh.foo_r[r]); // matches foo(mesh.foo_r[r]); // matches foo(r => mesh.foo_r[r]); // doesn't match
That's a problem for another day. I also want to:
- apply these rules only to files in which I use that naming convention
- tighten the matching to apply only to the letters
r
,s
,t
- remove the green "it's ok" color and only leave the red "warning" color
This was a fun adventure. I'm not sure this is the best approach but it was a nice excuse to play with emacs tree-sitter.
Labels: emacs
Post a Comment