Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix the behavior of Lexer.get_column #978

Merged
merged 2 commits into from
Mar 12, 2021
Merged

Conversation

maxbrunsfeld
Copy link
Contributor

@maxbrunsfeld maxbrunsfeld commented Mar 11, 2021

Fixes #516
Fixes #589
Closes #640
Fixes #144

This PR finally makes the Lexer.get_column (needed for languages like Haskell, Elm) API work properly.

  • Adds unit test coverage for get_column, using a test fixture language with Haskell-like layout rules.
  • Accounts for tokens' position-dependence when editing a tree, so that incremental parsing is fully correct with respect to these "layout" tokens. This requires tracking which subtrees depend on their start column:
    • any external token where the scanner called get_column()
    • recursively, any parent node that has a child node on its first line that depends on its own start column
  • ⚠️ Changes the meaning of get_column so that it returns a byte count, not a character count ⚠️

Regarding this last item - We originally returned a character count from get_column because it seemed more technically correct, since GHC counted the unicode characters (as opposed to the bytes) in its implementation of layout. But this adds so much complexity (and perf cost) to our code that I really don't think it's worth it. There could be some slightly incorrect layout-parsing if somebody uses a mixture of different non-ascii whitespace characters for their layout, but this seems like a very uncommon situation.

/cc @razzeee @tek @bglgwyng @banacorn

@maxbrunsfeld maxbrunsfeld changed the title Fix the behavior of Lexer.get_column when at EOF Fix the behavior of Lexer.get_column Mar 11, 2021
@maxbrunsfeld maxbrunsfeld mentioned this pull request Mar 11, 2021
30 tasks
@razzeee
Copy link
Contributor

razzeee commented Mar 12, 2021

Anything you want us to do at this point or are you happy with us trying this, when it reaches the the released state?

@maxbrunsfeld
Copy link
Contributor Author

I ran all of the tree-sitter-elm tests, so I think we're good. Just let me know if you see any problems in your language server after upgrading.

@maxbrunsfeld maxbrunsfeld merged commit d366356 into master Mar 12, 2021
@maxbrunsfeld maxbrunsfeld deleted the fix-get-column-at-eof branch March 12, 2021 00:42
@razzeee
Copy link
Contributor

razzeee commented Mar 12, 2021

Did you remove the eof check for that? https://github.com/elm-tooling/tree-sitter-elm/blob/main/src/scanner.cc#L262

Is my understanding right, that that's obsolete now?

@maxbrunsfeld
Copy link
Contributor Author

Yeah, tests pass without that.

@maxbrunsfeld
Copy link
Contributor Author

Actually, without that check, one of the example files in elm-ui has a parse error. So maybe you want the eof() check anyway?

@razzeee
Copy link
Contributor

razzeee commented Mar 12, 2021

Will have to check, I just realized, that I never added the examples (elm-tooling/elm-language-server#527 (comment)) to our test suite. I probably thought, it's fixed now and we will never move that code again 😆

@razzeee
Copy link
Contributor

razzeee commented Mar 12, 2021

Added in elm-tooling/tree-sitter-elm@569336c

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants