an incremental change can corrupt the parse tree #1444
I'm pretty sure from the debug logging that this is the block that's causing the wrap in ERROR (lines 1291 to 1300 in 67de943).
(I haven't been able to dedicate much time to it, but I'm still looking into this :)
tree-sitter-rust: EvgeniyPeshkov/syntax-highlighter#46 (comment)
so tree-sitter is missing one of its goals
It would be interesting to compare other incremental parsers: lezer, papa-carlo, parsley, sparse, scanner, ...
From lezer's docs/guide:
From papa-carlo's #error-recovering:
From parsley's README
This is one of several research papers cited on the tree-sitter home page. I don't think scanner is comparable with tree-sitter.

I don't know what algorithm tree-sitter uses for error recovery. I've been wanting to look carefully at the code, but haven't found the time yet. From a quick glance, the code does look similar to snippets from Efficient and Flexible Incremental Parsing.

By the way, error recovery does not seem to be fast. According to the benchmarks in Don’t Panic! Better, Fewer, Syntax Errors for LR Parsers, the mean recovery time is in the tens of milliseconds (on a Xeon with 32GB of RAM; they ran sequentially, though). Since it is not fast and does fail sometimes, I've wondered whether a Graph Structured Network (which seems to be quite good at matching trees) could be used to provide the suggestion faster and with fewer errors. But implementing that would likely be a premature optimization. I think tree-sitter should try to get stable before implementing any shiny new features.

Addendum: the time tree-sitter takes to highlight has never bothered me. Latency on the order of 10ms isn't perceptible (my head is at another timescale right now, thus the comment about it not being fast...). I do find it annoying when an empty string triggers error recovery and messes up the highlighting. But this is an issue with the grammar, not with tree-sitter itself. Maybe we could draft a standard on how to write grammars?
Thanks very much for the report @the-mikedavis; this is definitely a bug in Tree-sitter. I'd be very curious if it turns out to be reproducible with any other grammars, and if not, what it is about the Elixir grammar that triggers it. I know that it may be a lot of work to answer both of these questions.
Ok, I investigated this a bit. In the Elixir grammar, there is a token called
Tree-sitter should allow you to do this, but it is currently not handling it properly. I think the generated C code is good, but the runtime library is not correctly managing the reusability of that token. Specifically, when the
Thanks so much for creating such a small, isolated reproduction case @the-mikedavis.
Ok, I think this is fixed on master.
I'm going to cut new releases of the library and CLI, because this was a pretty severe bug.
I published 0.20.1.
see helix-editor/helix#1338 (comment)

Leading whitespace is important when injecting diff highlights into messages trailing the scissors. Without this change, some adversarial context lines can end up being mistakenly parsed as non-$.context rules. For example, in the screenshot of the linked issue comment, a context line is being parsed by a tricky link that looks like a malformed $.similarity rule. Because NEWLINE is not a child of $._line but of $.source in tree-sitter-git-diff, part of the line is re-parsed into another valid $._line rule, namely $.addition in this case.

For an example COMMIT_EDITMSG which has a second diff line 6 characters to the right of the newline, this commit changes the start column of the parsed $.message node to include the whitespace:

     (source [0, 0] - [7, 0]
       (subject [0, 0] - [0, 53])
       (comment [2, 0] - [4, 38]
         (scissors [2, 0] - [4, 38]))
       (message [5, 0] - [5, 36])
    -  (message [6, 5] - [6, 80]))
    +  (message [6, 0] - [6, 80]))

This change probably restricts this grammar to tree-sitter 0.20.1+ because the WHITE_SPACE 'extra' is now used both as an extra and within a rule (see tree-sitter/tree-sitter#1444 (comment)), but trailing diffs are not meant to be edited anyway, so it's probably not a big deal.
👋 hello!
I'm working with the elixir grammar and I noticed that the parse tree can get into a wonky state with an incremental change.
Starting with some valid Elixir like so:
we have a good tree:
then we edit our Elixir a bit to something invalid (replace A.B with A,B):
and the tree recognizes that this expression is invalid:
Now we edit our snippet back to the original valid code, but the tree becomes corrupted:
In this particular case we can kick the tree back into a valid state by deleting the . between B and c and then typing it back in.
I can reproduce the behavior using the playground, building the repo from master:
and in the helix editor (see helix-editor/helix#830 (comment)) which I believe uses the rust bindings.
Most minimally:
I haven't been able to reproduce the behavior with other grammars yet, but it was a pretty random chance that I found this case here. I'll go and try to mangle some other grammars' trees too 🕵️
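For anyone trying to script a reproduction, it may help to see how the A.B to A,B change would be described to an incremental parser. The sketch below computes byte offsets and (row, column) points for the smallest single replacement between two versions of a buffer; the field names mirror those of tree-sitter's TSInputEdit struct, but Edit, point_at and describe_edit are hypothetical helpers written for this illustration, and the sample string is a stand-in rather than the snippet from this report.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edit:
    """Hypothetical container; field names mirror tree-sitter's TSInputEdit."""
    start_byte: int
    old_end_byte: int
    new_end_byte: int
    start_point: tuple[int, int]
    old_end_point: tuple[int, int]
    new_end_point: tuple[int, int]


def point_at(data: bytes, offset: int) -> tuple[int, int]:
    """Return (row, column) for a byte offset; the column is counted in bytes."""
    row = data.count(b"\n", 0, offset)
    line_start = data.rfind(b"\n", 0, offset) + 1
    return (row, offset - line_start)


def describe_edit(old: str, new: str) -> Edit:
    """Describe the smallest single replacement that turns `old` into `new`.

    Assumption: the comparison is byte-wise, so for non-ASCII text the
    boundaries may fall inside a multi-byte character.
    """
    a, b = old.encode("utf-8"), new.encode("utf-8")
    limit = min(len(a), len(b))

    # Length of the common prefix.
    start = 0
    while start < limit and a[start] == b[start]:
        start += 1

    # Length of the common suffix, never overlapping the prefix.
    suffix = 0
    while suffix < limit - start and a[len(a) - 1 - suffix] == b[len(b) - 1 - suffix]:
        suffix += 1

    old_end, new_end = len(a) - suffix, len(b) - suffix
    return Edit(
        start_byte=start,
        old_end_byte=old_end,
        new_end_byte=new_end,
        start_point=point_at(a, start),
        old_end_point=point_at(a, old_end),
        new_end_point=point_at(b, new_end),
    )


if __name__ == "__main__":
    # Stand-in input, not the Elixir snippet from the report.
    print(describe_edit("A.B.c()\n", "A,B.c()\n"))
```

Applying one such edit for the valid-to-invalid change and a second one for the change back, reparsing with the old tree each time, and comparing the final tree against a from-scratch parse of the same text is one way to check for the corruption described above.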