WIP: Syntax Highlighting #16

th0rex · 2019-12-13T19:16:58Z

I've implemented a basic version of syntax highlighting and am looking for feedback (whether the overall approach is good).

Highlighting with tree sitter:

Highlighting with font lock:

Currently I'm not really taking advantage of the more context sensitive tree sitter information, and am just using the default font lock faces.

I'm not sure how to best distribute this highlights.scm file, currently users need to specify the path to this_repo/grammars, from which the highlights.scm file for the given language is then read.

~~The only major bug I've noticed, is that while typing, sometimes code isn't immediatetly highlighted. For example, when I type~~

union X {
    a: i32,
    b: i32;
}
// or
fn foo() {
    let z = 1337;
}

Manually calling tree-sitter-highlight--highlight fixes that however, so I'm thinking it might have something to do with the comment on the after-change function in tree-sitter.el. My tree-sitter-highlight--handle-change isn't being called for unions or lets inside functions, but if I type the same let on the top level it is, I'm not quite sure what's causing that as I'm not familiar with tree sitter.
Edit: I've fixed this, changed_ranges had a bug. Even if this PR isn't merged, that one commit (6bed1e1) should definetly be merged.

My current config for using this looks like this:

(use-package f)

(add-to-list 'load-path "/vm/dev/github.com/th0rex/emacs-tree-sitter/lisp")

(require 'tree-sitter)
(seq-do #'ts-require-language '(c cpp go python rust))

;; Path to <this_repo>/grammars
(setq tree-sitter-highlight-query-dir "/vm/dev/github.com/th0rex/emacs-tree-sitter/grammars")
(require 'tree-sitter-highlight)

(defun enable-ts-hl ()
  (font-lock-mode -1) ;; Disable font-lock mode
  (tree-sitter-highlight-mode))

(add-hook 'c-mode-common-hook #'enable-ts-hl)
(add-hook 'python-mode-hook #'enable-ts-hl)
(add-hook 'rust-mode-hook #'enable-ts-hl)

yyoncho · 2019-12-14T22:34:29Z

tree-sitter-highlight.el

 (defun tree-sitter-highlight--after-change (old-tree)
-  (tree-sitter-highlight--highlight old-tree (window-start) (window-end)))
+  (tree-sitter-highlight--highlight old-tree (window-start) (window-end))


Just curious, have you considered providing tree-sitter implementations for font-lock-fontify-region-function(and related) ? Thus you will be able to share the logic related to calculating what has to be highlighted.

I think if we would want to do that we would want to choose jit-lock instead, font-lock relies on that (afaik) for knowing when to fontify what.
I've considered it, but didn't implement it yet, since this way is easier. I can add it as a toggle maybe, to see if it would improve things (mainly that currently I can either delay highlighting in case of errors, or eagerly highlight but have to clear (remove all faces from) the whole window first, which causes the part after the error to not be highlighted (which you technically can't do correctly because there's an error after all) and which also causes "flickering" if you type fast enough when you rapidly produce/remove errors in the syntax tree). However I think it's unlikely that jit-lock will help with that specific problem, because we already know what changed, and if i only clear/highlight what changed, then sometimes things are not highlighted at all.

Update: I've tried both font-lock and jit-lock. font-lock seems to work better, so I've only kept the code for that. I was also wrong about it being easier to just implement myself, so thanks for the suggestion.

jakejx · 2019-12-16T15:15:48Z

This PR looks great! I've been playing around with it and it works quite nicely. One thing I have noticed is that for some situations multiple faces are being applied to the same region. For example in Python:

p = Package(...)

The Package should be highlighted as Constructor based on the Python highlights.scm but when I inspect, the word is highlighted with (tree-sitter-variable-face tree-sitter-function-face tree-sitter-constructor-face nil). I will be digging into the code over the coming days and see whether I can help :)

Edit: it seems that this issue is likely due to overlapping queries. This may have to be fixed at the highlights.scm level where queries are ordered based on their specificity.

ubolonton · 2019-12-16T16:22:53Z

Thanks, this is great progress!

I changed the parameter order of ts-changed-ranges's and updated its docstring so that it's consistent with the Rust/C APIs.

I also pushed the fixes for ts-make-query, ts-query-captures, ts-query-matches.

I'm not sure how to best distribute this highlights.scm file, currently users need to specify the path to this_repo/grammars, from which the highlights.scm file for the given language is then read.

Ideally this file should be distributed together with the compiled language description (e.g. rust.so). Initially it could be a separate package that bundles multiple languages together. If/when tree-sitter becomes established enough that there are major modes based on it, it will be the job of the major modes to distribute these artifacts.

ubolonton · 2019-12-16T16:32:02Z

tree-sitter-highlight.el

+  "Syntax highlighting with tree sitter."
+  :init-value nil
+  :lighter "tree-sitter-highlight"
+  ;; TODO: idk if this file should do this.


Dependencies between minor modes are tricky: https://emacs.stackexchange.com/questions/36225/global-minor-mode-that-requires-another-during-its-lifetime.

We can either use a single minor mode, or make tree-sitter-mode aware of its possible dependent tree-sitter- minor modes, so that turning them on/off work properly.

ubolonton · 2019-12-16T16:36:50Z

tree-sitter-highlight.el

+
+(require 'tree-sitter)
+
+(require 'f)


This dependency should be declared in Cask.

However, I think tree-sitter is fundamental/low-level enough that fewer external dependencies are desirable. If we just want 1-2 functions from f, we can reimplement them.

I've added it to Cask for now, I'm just using f-join currently, we can re-implement that before we merge this. It's probably not needed if we bundle all the highlights.scm somewhere.

src/types.rs

th0rex · 2019-12-16T18:01:19Z

This PR looks great! I've been playing around with it and it works quite nicely. One thing I have noticed is that for some situations multiple faces are being applied to the same region. For example in Python:
p = Package(...)
The Package should be highlighted as Constructor based on the Python highlights.scm but when I inspect, the word is highlighted with (tree-sitter-variable-face tree-sitter-function-face tree-sitter-constructor-face nil). I will be digging into the code over the coming days and see whether I can help :)

I know that multiple faces are added, but that's actually intended. For me when I try doing x = Package(...), then the constructor face is added first ((tree-sitter-constructor-face tree-sitter-constant-face tree-sitter-function-face tree-sitter-variable-face nil)).
The reason I'm wanting to add multiple properties is that highlighted regions generally may overlap:

#[derive(Debug)]

the Debug is both a constructor but also inside an attribute, I'm sorting the query results such that larger things are added first (i.e. the attribute in this case), and more specific things later.
If I were using put-text-property (i.e. only having one face per region of text) instead of add-face-text-property, then the [ and ] wouldn't be highlighted with the attribute face (since they currently don't really have an aliased face, they're punctuation, but using put-text-property would thus also clear the attribute face from that. If everything had a proper face and everyone would want punctuation to be highlighted, we could use put-text-property instead, but I don't think thats the case, and instead of having non-highlighted punctuation in this case it's better to have it highlighted with the attribute face instead). And if I wouldn't sort the results at all, the Debug wouldn't get highlighted differently, but I think we want that to be highlighted differently.

At least for the rust implementation of querying the regions to highlight the sort is stable, i.e. if two things apply for the exact same range, the order is kept as given to use by the tree sitter api, which should (I think) reflect the order they're declared in a highlights.scm file.

If you could paste more of the python code for which the faces are being applied in the wrong order (because it doesn't happen in my python files/your code that i just pasted) that would be great. Maybe it's also caused by some incremental highlighting thing, see what happens after you've just openend that buffer and it was highlighted completely, whether the order is the same etc.

Edit:
I've added (and enabled by default for now) some debug output to the rust code, it would be nice if you could start emacs in a terminal and find what it's outputting for you. This is the relevant output for me, and the faces are added in that order (the order defined in highlights.scm):

25344 - 25347 ((776, 8) - (776, 11)) -> constructor
25344 - 25347 ((776, 8) - (776, 11)) -> constant
25344 - 25347 ((776, 8) - (776, 11)) -> function
25344 - 25347 ((776, 8) - (776, 11)) -> variable

jakejx · 2019-12-17T15:35:47Z

I was obtaining a different result as I was using the elisp implementation. Using the Rust implementation, I obtain the expected results.

I believe the difference in behaviour is due to this portion in highlights.rs:

for (i, (node, capture_index)) in vec.into_iter().rev().enumerate() {
    v.set(i, env.vector((capture_index, root_node.map(|_| node)))?)?;
}

The captures are reversed before they are passed to add-face-text-property. In the elisp version, the order of the captures are not reversed. I have tested adding a reverse call to tree-sitter-highlight--lisp at L240 and it seems to have fixed the issue.

I'm quite new to all of this so please let me know if I have misunderstood the code somewhere.

th0rex · 2019-12-17T16:05:27Z

Oh yeah seems like I've just forgotten that in the elisp version then, I only use that to play around with changes anyway because I don't need to restart emacs to reload the rust code.

rirze · 2019-12-31T17:19:57Z

Just wondering, what is the state of the branch atm? I tried to load it in this morning, but I'm unable to get updated code to appear formatted by tree-sitter. For example, activating tree-sitter-highlight-mode colors the file but also outputs this in the messages buffer:

helm-M-x: Wrong number of arguments: (2 . 2), 4Invalid face reference: nil
Invalid face reference: nil [727 times]

And then anytime code is updated or even navigated, the Invalid face reference: nil error count increments wildly. Is this a error I can mitigate on my end?

th0rex · 2019-12-31T18:07:28Z

The code works decently well for me, I've been using it for the past two weeks in Rust and C without any huge problems. I've never seen that error, it could be my fault, I don't really know how faces work in emacs, but settings the face to nil should be allowed. In what language are you getting that error? What version of emacs are you using? I've also briefly been using helm/ivy with this and didn't have any problems there.

rirze · 2019-12-31T21:50:54Z

Using Emacs 26.3, tried Python highlighting and ran into this error.

I started with my existing setup, downloaded your fork, ran make build and make ensure/python. Both succeeded just fine. Along with the relevant setup code you gave in the first comment in this PR, I was able to load the fork just fine.

When I opened up a Python file however, it would not automatically turn on tree-sitter-highlight-mode even though I added it as a hook. That's fine, I can just turn it on manually. As soon as I do M-x tree-sitter-highlight-mode, I run into those errors above.

I'm not really sure how to debug this.

I'll try testing this on a clean Emacs setup and see if it is Helm related or something else on my end.

rirze · 2020-01-02T15:07:25Z

@th0rex I turned on debugging, and this is the trace I get:

Debugger entered--Lisp error: (wrong-number-of-arguments (2 . 2) 4)
  #f(compiled-function (var val) "Set variable VAR to value VAL in current buffer." #<bytecode 0x2238cf>)(font-lock-fontify-buffer-function (lambda nil (tree-sitter-highlight--jit (point-min) (point-max))) font-lock-fontify-region-function (lambda (start end _verbose) (tree-sitter-highlight--jit start end)))
  (setq-local font-lock-fontify-buffer-function (lambda nil (tree-sitter-highlight--jit (point-min) (point-max))) font-lock-fontify-region-function (lambda (start end _verbose) (tree-sitter-highlight--jit start end)))
  tree-sitter-highlight--enable()
  (prog1 (tree-sitter-highlight--enable) (setq err nil))
  (unwind-protect (prog1 (tree-sitter-highlight--enable) (setq err nil)) (if err (progn (setq tree-sitter-highlight-mode nil))))
  (let ((err t)) (unwind-protect (prog1 (tree-sitter-highlight--enable) (setq err nil)) (if err (progn (setq tree-sitter-highlight-mode nil)))))
  (if tree-sitter-highlight-mode (let ((err t)) (unwind-protect (prog1 (tree-sitter-highlight--enable) (setq err nil)) (if err (progn (setq tree-sitter-highlight-mode nil))))) (tree-sitter-highlight--disable))
  (let ((last-message (current-message))) (setq tree-sitter-highlight-mode (if (eq arg (quote toggle)) (not tree-sitter-highlight-mode) (> (prefix-numeric-value arg) 0))) (tree-sitter-mode) (if tree-sitter-highlight-mode (let ((err t)) (unwind-protect (prog1 (tree-sitter-highlight--enable) (setq err nil)) (if err (progn (setq tree-sitter-highlight-mode nil))))) (tree-sitter-highlight--disable)) (run-hooks (quote tree-sitter-highlight-mode-hook) (if tree-sitter-highlight-mode (quote tree-sitter-highlight-mode-on-hook) (quote tree-sitter-highlight-mode-off-hook))) (if (called-interactively-p (quote any)) (progn nil (if (and (current-message) (not (equal last-message (current-message)))) nil (let ((local " in current buffer")) (message "Tree-Sitter-Highlight mode %sabled%s" (if tree-sitter-highlight-mode "en" "dis") local))))))
  tree-sitter-highlight-mode()
  enable-ts-hl()
  run-hooks(change-major-mode-after-body-hook prog-mode-hook python-mode-hook)
  apply(run-hooks (change-major-mode-after-body-hook prog-mode-hook python-mode-hook))
  run-mode-hooks(python-mode-hook)
  python-mode()
  set-auto-mode-0(python-mode nil)
  set-auto-mode()
  normal-mode(t)
  after-find-file(nil t)
  find-file-noselect-1(#<buffer min_spaces_composed_str.py> "~/cart/codereview/min_spaces_composed_str.py" nil nil "~/cart/codereview/min_spaces_composed_str.py" (3937650 66306))
  find-file-noselect("~/cart/codereview/min_spaces_composed_str.py" nil nil t)
  find-file("~/cart/codereview/min_spaces_composed_str.py" t)
  funcall-interactively(find-file "~/cart/codereview/min_spaces_composed_str.py" t)
  call-interactively(find-file nil nil)
  command-execute(find-file)

I'm wondering if I missed some installation step with regards to the rust/elisp code. This is my setup in a clean Emacs launch:

(require 'f)
(add-to-list 'load-path "/somefolder/repos/emacs-tree-sitter")
(require 'tree-sitter)
(ts-require-language 'python)

(setq tree-sitter-highlight-query-dir "/somefolder/repos/emacs-tree-sitter/grammars")
(require 'tree-sitter-highlight)

(defun enable-ts-hl ()
  (tree-sitter-highlight-mode))

(add-hook 'python-mode-hook 'enable-ts-hl)

Before loading this, I do the usual make build, make ensure/python commands in the tree-sitter directory. Any insight would be appreciated!

Edit: I think I may be using the elisp code rather than the rust code-- how do I enable this switch?

th0rex · 2020-01-02T18:41:30Z

Seems like setq-local only supports one variable and value form instead of arbitrary many in all released emacs versions, and I've been using emacs compiled from git directly and thus had support for arbitrary many. That should be fixed now with the latest commit, thanks for finding this.

The elisp version referred to the actual highlighting (which now is all in rust), I've since removed that as I've only used it to quickly experiment with changes.

rirze · 2020-01-02T20:12:04Z

@th0rex Thanks for figuring that out! I guess I should eventually upgrade to 27, but that's good to know.

I now have the highlighting working, and am really enjoying it. I'll let you know if I find anything else.

ubolonton · 2020-01-19T10:35:18Z

font-lock/jit-lock use after-change-functions hook to compute the region to refontify .

The specific functions are font-lock-extend-after-change-region-function/font-lock-extend-jit-lock-region-after-change. They seem to do very little, relying on the font-lock-default-fontify-region function to safely extend the region, e.g. to cover multiline patterns previously fontified (with regex). In other words, they seem to expect font-lock-fontify-region-function to be stateful.

Since the tree-sitter-based highlighting function is stateless (which is better), I think we should replace these two functions with one that uses ts-changed-ranges, even if we still use the rest of font-lock/jit-lock.

rirze · 2020-02-08T02:02:50Z

A suggestion for default behavior-- I prefer operators to be highlighted the same as keywords. Correct me if I'm wrong, but I believe this is how highlighters other this PR would treat them as well. I've made the following change to get the behavior I like:

(defface tree-sitter-operator-face '((default :inherit font-lock-keyword-face))
  "Face used for operator"
  :group 'tree-sitter-highlight-faces)

ubolonton · 2020-02-25T12:04:29Z

Just a heads-up, if you are still working on this: I reorganized the directory structure a bit, moving all Lisp code to lisp/. You may want to rebase this branch.

I also reworked the types in #26.

th0rex · 2020-02-26T12:31:48Z

Currently I'm not working on this anymore, but I will again in the future (at most ~6 months from now) and I hope to get a better version done then. You were correct regarding the comment about font-lock, in fact sometimes highlighting got out of sync and some regions were left unfontified during the time I was using this but font-lock thought they were fontified, which is likely a result of font-lock and tree-sitter disagreeing on what region needs to be updated.

In general I'm still not quite sure how to clear highlights correctly after a change without causing "flickering", which was the reason for trying to use font/jit-lock. Using after-change-hook and immediately clearing ts-changed-ranges and then re-applying the faces according to the query results is correct, however as soon as your code has syntax errors in it (which happens often during editing), a really large region of text was cleared and the faces were only re-applied to the point at which the syntax error was.

maxbrunsfeld · 2020-02-26T17:19:41Z

Using after-change-hook and immediately clearing ts-changed-ranges and then re-applying the faces according to the query results is correct

Yes, this sounds like the right approach to me; you should always be able to highlight based on the current syntax tree, regardless of whether there's a syntax error or what. That's what Atom does. There should be no need for any state besides the tree itself.

however as soon as your code has syntax errors in it (which happens often during editing), a really large region of text was cleared

I'm curious about this. Syntax errors shouldn't cause noticeable problems for syntax highlighting. If they are, you might be hitting a bug in the query system, or a parsing bug. What language are you using? What highlights are you seeing? If you get a chance, could you try to reproduce the problem with the tree-sitter command line tool, either via tree-sitter parse, or tree-sitter web-ui?

th0rex · 2020-02-26T17:45:36Z

I'm curious about this. Syntax errors shouldn't cause noticeable problems for syntax highlighting. If they are, you might be hitting a bug in the query system, or a parsing bug. What language are you using? What highlights are you seeing? If you get a chance, could you try to reproduce the problem with the tree-sitter command line tool, either via tree-sitter parse, or tree-sitter web-ui?

@maxbrunsfeld

I'm fairly certain this isn't a bug but rather my fault, however I wasn't (and still am not) sure how to do this correctly. This was basically the code in question:

(defun tree-sitter-highlight--handle-change (change)
  "Called for each CHANGE after an edit."
  (let* ((start-byte (aref change 0))
         (end-byte   (aref change 1))
         (node       (ts-get-descendant-for-byte-range (ts-root-node tree-sitter-tree) start-byte end-byte))
         (matches    (ts-query-captures tree-sitter-highlight--queries node tree-sitter-highlight--query-cursor nil #'ts-node-text)))
    (set-text-properties (ts-node-start-position node) (ts-node-end-position node) nil)
    (seq-do #'tree-sitter-highlight--apply-match matches)))

(defun tree-sitter-highlight--after-change (old-tree)
  "Called with the OLD-TREE after the buffer has changed."
  (let* ((changes (ts-changed-ranges tree-sitter-tree old-tree)))
    (seq-do #'tree-sitter-highlight--handle-change changes)))

As far as I can remember, this worked well (it was always correct), however was rather slow since on (some) syntax errors in rust (which I tested this most on since I was mostly writing rust) the ts-get-descendant-for-byte-range would return a node that spans the whole buffer or most of the buffer, which resulted in a noticeable slowdown (at least on my fairly old processor) while typing. Thus I tried implementing that in Rust, however that also didn't speed things up too much (maybe emacs isn't too fast with removing and applying faces on these large ranges). I then tried various ways of somehow only highlighting the visible region, but most of them were broken in one way or the other (i.e. after some amount of editing all of them resulted in a region of code always being unhighlighted), this was also my fault.

That is the whole reason why I even tried integrating with jit-lock/font-lock, I hoped they would somehow help improve this, but they also had their own problems.

The syntax errors do not (as far as I noticed) cause any errors in the tree that propagate too far or result in broken queries etc, but they caused me (since I don't know of a better way) to re-highlight most of the buffer most of the time, which was slow. I do not know how to correctly determine which part I should clear and run queries on after changes, that do not result in the whole buffer being rehighlighted on some syntax errors. I've also tried looking at how atom does this but couldn't really find anything.

maxbrunsfeld · 2020-02-26T17:50:51Z

@th0rex Thanks for the reply.

I haven't checked how this is exposed to elisp, but I would think that you could use one of these APIs to confine your query to the visible screen range, so that they query would not take too long.

And then, on top of that, I think you could optionally cache the result of highlighting individual lines, and invalidate/splice that cache based on the ts-changed-ranges. That way, for a given frame, you would only run the query on the portion of the syntax tree that is both:

not yet cached
on screen

I'm not familiar with the emacs APIs, so there might be some details that make this difficult that I'm not taking into account.

th0rex · 2020-02-26T20:52:43Z

Ok so I've just retried this, now using the "proper" (I think) way to determine what's visible after scrolling took place. Even without caching the performance (in pure lisp now) seems to be a lot better, and it also seems to always correctly highlight now. I don't really remember why I didn't do this the first time but seems like I should have. Thanks @maxbrunsfeld

I'll still eventually implement caching however.

I've also "rebased".

yyoncho · 2020-02-26T21:05:57Z

lisp/tree-sitter-highlight.el

+                (puthash key value table)))
+        tree-sitter-highlight-default-faces)
+      table))
+  (let ((x (tree-sitter-highlight--make-queries (alist-get major-mode


@ubolonton not related to this PR but IMO tree-sitter should not be on top of the existing major modes but it should be a replacement for them. So instead of having major-mode -> language mapping IMO it should be "file-regex" -> language.

When there are tree-sitter-based major modes, the mapping mechanism would simply be auto-mode-alist.

However, that's likely too far away in the future. For adoption, I think it's better to piggyback on major modes.

I think we can just use tree-sitter-language here though, and leave the mapping for tree-sitter-mode.

Also, maybe we can define a tree-sitter-highlight-queries var, as the replacement of font-lock-keywords.

Language-specific minor modes (e.g. tree-sitter-rust) would then be responsible for loading/populating the queries however they see fit.

Just want to raise one more point: major-mode -> language mapping is not N -> 1 but it is N -> N. We had that issue in lsp-mode - for example, web-mode could be html, javascript, typescript, and so on.

Also, why do you think that it is too far in the future? I mean except highlighting and eventually indentation what are the other things that will be missing to have tree-sitter centric major modes?

Just want to raise one more point: major-mode -> language mapping is not N -> 1 but it is N -> N. We had that issue in lsp-mode - for example, web-mode could be html, javascript, typescript, and so on.

Yeah, that's definitely an issue. I haven't thought about multi-language files in-depth.

Also, why do you think that it is too far in the future? I mean except highlighting and eventually indentation what are the other things that will be missing to have tree-sitter centric major modes?

I think the main technical barrier is that many Emacs distributions have dynamic module support turned off. Other than that, I was mainly thinking about adoption/mindshare.

This should fix a kind of rare bug, where still some things would not get highlighted at all. I've also documented the code more and done some general cleanup. Additionally, Stefan Monnier notified me per email that I should be using add-hook and remove-hook for window-scroll-functions.

th0rex · 2020-03-29T20:46:35Z

Just a quick update: I've rebased on the latest master again, cleaned up the code, fixed two other bugs (although this makes scrolling slower now, but its fine if the file is byte compiled) and added changes suggested Stefan Monnier.

This only applies to scrolling and only in some cases, and highlighting is usually quickly correct again after the syntax error is fixed. Thus it's off by default, since it improves performance a lot this way.

rirze · 2020-03-30T19:33:10Z

Is it possible to use the compiled language binaries (eg. python.so) instead of the .scm file for highlighting? More generally, is it possible to combine both functions to use the same set of language specifications? It seems like a possible redundancy to distribute both scm and so files in the eventual future of this package.

maxbrunsfeld · 2020-03-30T20:04:13Z

The .so files are parsers, and the .scm files are "tree queries" that specify how syntax highlighting should be applied to the syntax tree. They serve two different purposes, and I think both are needed for syntax highlighting.

rirze · 2020-03-30T20:24:13Z

Thanks for clarifying, I figured that was the case.

th0rex · 2020-03-30T20:31:13Z

I've now switched to using font-lock again (as per the suggestion of Stefan Monnier), however there are still lags for large buffers where ts-changed-ranges or
ts-get-descendant-for-position-range return a range spanning the whole buffer, but it still is at least always correctly highlighted now. This has nothing to do with font-lock however, and I'm absolutely not sure how to fix this. I can reliably makets-changed-ranges produce the whole buffer as a range by just inserting a ( in one rust source file at line ~1000 of ~1500, and I don't really have the time to find out how to work around that. Everything I've tried is either incorrect or slows down in that file (it's not some kind of stress test file btw, just one of the only ones where I've noticed it until now).
Initially I thought tree-sitter would somehow always emit only "small" changed ranges
(I didn't expect to have to implement caching on top of tree-sitter and thought it would only emit "small" updates when possible, instead of wanting to re-query all 1500 lines). I know this is a hard problem to solve and I don't mean to shit on the tree-sitter devs, but like I said I currently do not have any more time for this.

If anyone wants to continue this work, they're free to use all my code (if it's useful anyway) in any way they'd like. Like I said, the highlighting with font-lock should at least always be correct now, but may be slow for some files while doing some edits. Maybe this is also a problem with how the grammar is specified and not with tree-sitter itself and it would be working perfectly with other files, or I'm still using tree-sitter wrong, I'm not sure.

maxbrunsfeld · 2020-03-30T20:34:39Z

I think the solution would be to only highlight ranges that are both: 1. changed and 2. on screen.

th0rex · 2020-03-30T20:36:37Z

I've tried that in many different variations, the problems were always with things like

r#"

// screen becomes visible here (i.e. this is the first line in the window), because we scrolled down

// we new close the raw string literal:
"#;

As far as I can see I need to run the query on the whole buffer (or on whatever ts-changed-ranges reports, which would be the whole buffer in that case, and also lead us back to the same problem with the lag) to highlight that correctly. Running the query on the window doesn't produce highlights for the raw string literal (I guess because the whole node isn't contained in the window?).

Also to be clear: only highlighting the visible lines isn't the problem, actually running the query on the whole buffer is. It is kind of slow, maybe it's also the interop between rust and emacs, I don't know.

ubolonton · 2020-03-31T12:11:58Z

@th0rex Thank you for all your hard work experimenting with this!

I'm sorry to bother you with one more request. I understand that you are frustrated and don't want to work on this anymore. I just want to get more info to continue on this. Can you send me these:

The example Rust file with the issue, and the location to insert (. I get the same behavior on one of the project's files, but I want to have as similar a starting point as possible.
The patches that Stefan sent you.
If possible, the code you tried to limit the ranges to rehighlight.

Running the query on the window doesn't produce highlights for the raw string literal (I guess because the whole node isn't contained in the window?).

I think extending the window range a bit with ts-get-descendant-for-position-range should work.

It is kind of slow, maybe it's also the interop between rust and emacs

This is an issue with Emacs dynamic's modules in general, more so with the Rust binding (because it includes a workaround for a safety bug in Emacs's GC, which will supposedly be fixed in Emacs 27). Calling ts--query-cursor-matches with index-only seems to be significantly faster in my quick test. I'll try making it return indexes and byte offsets only: no capture names, no nodes.

th0rex · 2020-03-31T16:12:58Z

The example Rust file with the issue, and the location to insert (. I get the same behavior on one of the project's files, but I want to have as similar a starting point as possible.

Sadly I can't since it's from a project for work.

The patches that Stefan sent you.

https://gist.github.com/th0rex/64a48204f5129c0cd293bd90932bd018

That patch didn't work immediately, but the current state of this PR should be similar enough to it.

If possible, the code you tried to limit the ranges to rehighlight.

https://github.com/th0rex/emacs-tree-sitter/blob/only_hl_visible_query_reported/lisp/tree-sitter-highlight.el
It does some maybe hacky things related to the issue with multiline string literals I mentioned above, doesn't use font-lock and in my testing also seemed to always be correct, but it still lags for those certain edits. (That branch only highlights what's completely visible plus some more things, i.e. precisely those nodes that also either end or start in our window, or the one node (if it exists) whose start and end lies outside of the window, but which spans the window (i.e. if the raw string literal starts and ends before the window and the window looks into the middle part of it), and runs the queries on what ts-changed-ranges or ts-get-descendant-for-position-range (when scrolling) reports)

Calling ts--query-cursor-matches with index-only seems to be significantly faster in my quick test. I'll try making it return indexes and byte offsets only: no capture names, no nodes.

If this would indeed fix it, it would be very nice.

Thank you for looking into this as well.

maxbrunsfeld · 2020-03-31T21:00:10Z

Running the query on the window doesn't produce highlights for the raw string literal (I guess because the whole node isn't contained in the window?).

That's surprising to me. That's not the intended behavior of ts_query_cursor_set_byte_range and ts_query_cursor_set_point_range, and it's not the behavior that I see in the web playground. The TSQueryCursor is supposed to return matches for any nodes that overlap the specified range. The nodes shouldn't need to be fully contained by the range.

It sounds like you're hitting a bug in Tree-sitter, or the Tree-sitter rust binding maybe? If anyone is interested and has time, it would be great to get a reproducible bug report for this.

ubolonton · 2020-04-11T16:54:21Z

I'm continuing the work here.

I'm using it locally, and haven't noticed any slowdown so far. It feels than font-lock-mode sometimes.

Here are the updates to some of the previous issues.

The main slowness was due to the highlighting code adding the same face multiple times. As a result, the face attribute becomes a very long list. Emacs's display is very slow when this happens.

I can reliably make ts-changed-ranges produce the whole buffer as a range by just inserting a (

I got this with Rust code as well. It seems specific to the Rust grammar. I could never reproduce it in C code. Scoping the queries to the invalidated regions (roughly the visible region) works for me though.

Running the query on the window doesn't produce highlights for the raw string literal (I guess because the whole node isn't contained in the window?).

I couldn't reproduce this. I did get large chunks of text not being fontified, but the cause turned out to be the intricate interactions between Emacs's font-lock and jit-lock. These interactions are gnarly and poorly documented, probably because they have been effectively "internal" for so many years. For example, returning (jit-lock-bounds beg . end) is the correct way, not setting 'fontified directly.

That's surprising to me. That's not the intended behavior of ts_query_cursor_set_byte_range and ts_query_cursor_set_point_range, and it's not the behavior that I see in the web playground. The TSQueryCursor is supposed to return matches for any nodes that overlap the specified range. The nodes shouldn't need to be fully contained by the range.

This is the behavior I've observed so far (no full containment needed). However, I did get some other interesting cases involving ts_tree_get_changed_ranges and ts_query_cursor_set_point_range. They are very edge cases though, as they happen for queries more complex than those currently used in the language repos. One of them happens only under the situation created by this bug in the Rust grammar. I'm still investigating them, and will open an issue on tree-sitter later.

th0rex force-pushed the my_highlight branch from 6bed1e1 to 1403844 Compare December 13, 2019 21:01

yyoncho reviewed Dec 14, 2019

View reviewed changes

ubolonton reviewed Dec 16, 2019

View reviewed changes

src/types.rs Outdated Show resolved Hide resolved

th0rex force-pushed the my_highlight branch from d1abf1d to 5a0606c Compare December 16, 2019 18:24

th0rex force-pushed the my_highlight branch from 00bf4e9 to 715035a Compare February 26, 2020 20:49

yyoncho reviewed Feb 26, 2020

View reviewed changes

shackra mentioned this pull request Mar 26, 2020

solution for multi-language files #33

Open

th0rex added 2 commits March 29, 2020 22:35

test highlighting without font-lock

c624004

Only apply the first matching property

a3c7f45

th0rex added 3 commits March 29, 2020 22:35

Fix wrong group for some faces

6a74d18

Make tree-sitter-operator-face inherit from font-lock-keyword-face

1489d3e

th0rex force-pushed the my_highlight branch from 43509f3 to d52289c Compare March 29, 2020 20:44

th0rex added 4 commits March 29, 2020 22:52

Make correct highlighting optional

4adfd7e

This only applies to scrolling and only in some cases, and highlighting is usually quickly correct again after the syntax error is fixed. Thus it's off by default, since it improves performance a lot this way.

Test highlighting with fontlock mode

0eb5bb1

Move common expressions to variables

9504e63

Remove debug messages

16642f2

th0rex closed this Mar 30, 2020

ubolonton mentioned this pull request Apr 11, 2020

Syntax highlighting that plays well with font-lock #36

Merged

12 tasks

ubolonton mentioned this pull request Apr 16, 2020

Documentation for ts_tree_get_changed_ranges and ts_query_cursor_set_byte_range tree-sitter/tree-sitter#598

Open

WIP: Syntax Highlighting #16

WIP: Syntax Highlighting #16

Conversation

th0rex commented Dec 13, 2019 • edited Loading

Choose a reason for hiding this comment

th0rex Dec 14, 2019 • edited Loading

Choose a reason for hiding this comment

th0rex Dec 15, 2019 • edited Loading

Choose a reason for hiding this comment

jakejx commented Dec 16, 2019 • edited Loading

ubolonton commented Dec 16, 2019

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

th0rex commented Dec 16, 2019 • edited Loading

jakejx commented Dec 17, 2019

th0rex commented Dec 17, 2019

rirze commented Dec 31, 2019

th0rex commented Dec 31, 2019 • edited Loading

rirze commented Dec 31, 2019 • edited Loading

rirze commented Jan 2, 2020 • edited Loading

th0rex commented Jan 2, 2020 • edited Loading

rirze commented Jan 2, 2020

ubolonton commented Jan 19, 2020

rirze commented Feb 8, 2020

ubolonton commented Feb 25, 2020 • edited Loading

th0rex commented Feb 26, 2020

maxbrunsfeld commented Feb 26, 2020

th0rex commented Feb 26, 2020 • edited Loading

maxbrunsfeld commented Feb 26, 2020 • edited Loading

th0rex commented Feb 26, 2020

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

th0rex commented Mar 29, 2020

rirze commented Mar 30, 2020

maxbrunsfeld commented Mar 30, 2020

rirze commented Mar 30, 2020

th0rex commented Mar 30, 2020

maxbrunsfeld commented Mar 30, 2020

th0rex commented Mar 30, 2020 • edited Loading

ubolonton commented Mar 31, 2020

th0rex commented Mar 31, 2020 • edited Loading

maxbrunsfeld commented Mar 31, 2020 • edited Loading

ubolonton commented Apr 11, 2020 • edited Loading

th0rex commented Dec 13, 2019 •

edited

Loading

th0rex Dec 14, 2019 •

edited

Loading

th0rex Dec 15, 2019 •

edited

Loading

jakejx commented Dec 16, 2019 •

edited

Loading

th0rex commented Dec 16, 2019 •

edited

Loading

th0rex commented Dec 31, 2019 •

edited

Loading

rirze commented Dec 31, 2019 •

edited

Loading

rirze commented Jan 2, 2020 •

edited

Loading

th0rex commented Jan 2, 2020 •

edited

Loading

ubolonton commented Feb 25, 2020 •

edited

Loading

th0rex commented Feb 26, 2020 •

edited

Loading

maxbrunsfeld commented Feb 26, 2020 •

edited

Loading

th0rex commented Mar 30, 2020 •

edited

Loading

th0rex commented Mar 31, 2020 •

edited

Loading

maxbrunsfeld commented Mar 31, 2020 •

edited

Loading

ubolonton commented Apr 11, 2020 •

edited

Loading