Fix the behavior of Lexer.get_column #978

maxbrunsfeld · 2021-03-11T19:52:02Z

Fixes #516
Fixes #589
Closes #640
Fixes #144

This PR finally makes the Lexer.get_column API (needed for languages like Haskell and Elm) work properly.

- Adds unit test coverage for get_column, using a test fixture language with Haskell-like layout rules.
- Accounts for tokens' position-dependence when editing a tree, so that incremental parsing is fully correct with respect to these "layout" tokens. This requires tracking which subtrees depend on their start column:
  - any external token where the scanner called get_column()
  - recursively, any parent node that has a child node on its first line that depends on its own start column
- ⚠️ Changes the meaning of get_column so that it returns a byte count, not a character count ⚠️
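
For readers who want the column-dependence rule from the list above in concrete form, here is a minimal sketch. It is not the code in this PR: the `Node` struct, its fields, and `depends_on_column` are hypothetical names made up for illustration, and the sketch recomputes the property recursively rather than storing it on each subtree.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified tree node (not tree-sitter's real Subtree type). */
typedef struct Node {
  uint32_t start_row;             /* row on which the node starts */
  bool called_get_column;         /* leaf only: the scanner called get_column() */
  const struct Node *children;    /* children in source order, or NULL for a leaf */
  size_t child_count;
} Node;

/* A leaf depends on its start column if its scanner called get_column().
 * A parent depends on its start column if any child that starts on the
 * parent's first line depends on its own start column. */
bool depends_on_column(const Node *node) {
  assert(node != NULL);
  if (node->child_count == 0) return node->called_get_column;
  for (size_t i = 0; i < node->child_count; i++) {
    const Node *child = &node->children[i];
    /* Children are in source order, so once one starts on a later row,
     * none of the remaining children can be on the parent's first line. */
    if (child->start_row != node->start_row) break;
    if (depends_on_column(child)) return true;
  }
  return false;
}
```

Under this rule, a parent whose only column-dependent token starts on a later line is not itself marked: moving the parent horizontally does not change that token's column.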

Regarding the switch to a byte count: we originally returned a character count from get_column because it seemed more technically correct, since GHC counts Unicode characters (as opposed to bytes) in its implementation of layout. But this adds so much complexity (and performance cost) to our code that I really don't think it's worth it. There could be some slightly incorrect layout-parsing if somebody uses a mixture of different non-ASCII whitespace characters for their layout, but this seems like a very uncommon situation.
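
To make the byte-versus-character difference concrete, here is a small sketch, assuming UTF-8 input. The helper names (`line_start`, `byte_column`, `codepoint_column`) are hypothetical and are not part of the tree-sitter API. They only illustrate the two ways a column could be measured.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Index of the first byte of the line that contains byte offset `pos`. */
size_t line_start(const char *text, size_t pos) {
  assert(text != NULL);
  while (pos > 0 && text[pos - 1] != '\n') pos--;
  return pos;
}

/* Column as a byte count: distance in bytes from the start of the line. */
uint32_t byte_column(const char *text, size_t pos) {
  return (uint32_t)(pos - line_start(text, pos));
}

/* Column as a code point count: skip UTF-8 continuation bytes (10xxxxxx). */
uint32_t codepoint_column(const char *text, size_t pos) {
  uint32_t column = 0;
  for (size_t i = line_start(text, pos); i < pos; i++) {
    if (((unsigned char)text[i] & 0xC0) != 0x80) column++;
  }
  return column;
}
```

With these helpers, a token indented by four ASCII spaces and a token indented by two no-break spaces (U+00A0, two bytes each in UTF-8) get the same byte column but different code point columns. Indentation made of pure ASCII gives identical results either way.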

/cc @razzeee @tek @bglgwyng @banacorn

razzeee · 2021-03-12T00:22:17Z

Anything you want us to do at this point, or are you happy with us trying this when it reaches the released state?

maxbrunsfeld · 2021-03-12T00:42:34Z

I ran all of the tree-sitter-elm tests, so I think we're good. Just let me know if you see any problems in your language server after upgrading.

razzeee · 2021-03-12T00:44:56Z

Did you remove the eof check for that? https://github.com/elm-tooling/tree-sitter-elm/blob/main/src/scanner.cc#L262

Is my understanding right, that that's obsolete now?

maxbrunsfeld · 2021-03-12T00:46:43Z

Yeah, tests pass without that.

maxbrunsfeld · 2021-03-12T00:48:14Z

Actually, without that check, one of the example files in elm-ui has a parse error. So maybe you want the eof() check anyway?

razzeee · 2021-03-12T00:49:40Z

Will have to check, I just realized, that I never added the examples (elm-tooling/elm-language-server#527 (comment)) to our test suite. I probably thought, it's fixed now and we will never move that code again 😆

razzeee · 2021-03-12T00:53:53Z

Added in elm-tooling/tree-sitter-elm@569336c

Fix behavior of Lexer.get_column when at EOF

e29d371

maxbrunsfeld force-pushed the fix-get-column-at-eof branch from e77b303 to e29d371 on March 11, 2021 20:11

When editing, properly invalidate trees that depend on get_column

a40045a

maxbrunsfeld changed the title ~~Fix the behavior of Lexer.get_column when at EOF~~ Fix the behavior of Lexer.get_column Mar 11, 2021

maxbrunsfeld mentioned this pull request Mar 11, 2021

Tree-sitter 1.0 Checklist #930

Open

30 tasks

maxbrunsfeld merged commit d366356 into master Mar 12, 2021

maxbrunsfeld deleted the fix-get-column-at-eof branch March 12, 2021 00:42

maxbrunsfeld mentioned this pull request Mar 17, 2021

scanner counts newlines multiple times tree-sitter/tree-sitter-haskell#31

Closed

jmbockhorst mentioned this pull request May 12, 2021

web-tree-sitter failing while other bindings are working #1099

Closed

maxbrunsfeld mentioned this pull request Sep 23, 2021

get_column seems to return byte count, not codepoint count #1405

Closed

ahelwer mentioned this pull request Jan 11, 2022

get_column now counts codepoints instead of bytes #1581

Merged

dundargoc added this to the 1.0 milestone Feb 6, 2024
