tree-sitter get wrong point when rules contains '\s' #2558

Closed
starwing opened this issue Aug 24, 2023 · 3 comments

@starwing

With this test grammar:

module.exports = grammar({
  name: 'test',

  rules: {
    // TODO: add the actual grammar rules
    source_file: $ => $.program,

    program: ($) =>
      prec(16,
        repeat1(choice(
          $._statement,
          $._document,
        ))
      ),

    _statement: ($) =>
      prec.right(15,
        seq($.identifier)
      ),
   
    identifier: ($) => /[a-zA-Z_][a-zA-Z0-9_]*/,
    _document: ($) => /foo/,  // <-- notice this
  }
});

and the test input:

a
b
c

tree-sitter produces this result (correct):

$ tree-sitter parse test
(source_file [0, 0] - [2, 1]
  (program [0, 0] - [2, 1]
    (identifier [0, 0] - [0, 1])
    (identifier [1, 0] - [1, 1])
    (identifier [2, 0] - [2, 1])))

But if I change the _document rule to this:

      _document: ($) => /\sfoo/, // <-- notice this

the result changes to this:

$ tree-sitter parse test
(source_file [0, 0] - [2, 1]
  (program [0, 0] - [2, 1]
    (identifier [0, 0] - [0, 1])
    (identifier [0, 1] - [1, 1])
    (identifier [1, 1] - [2, 1])))

even though there is no "foo" anywhere in the test input. The newline before each identifier is now included in that identifier's range.

@amaanq
Member

amaanq commented Aug 24, 2023

Newlines are an extra by default (extras are skipped wherever they appear). If you don't want that, set the extras property to an empty array.
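For reference, a minimal sketch of what an empty extras list would look like in the test grammar from this issue. The `grammar` function is the global that the tree-sitter CLI provides when it evaluates grammar.js, so this fragment is not runnable on its own. Note that with no extras, whitespace is no longer skipped implicitly, so the rules would have to match newlines themselves for the a/b/c input to parse.

```javascript
// grammar.js (fragment; evaluated by the tree-sitter CLI, which supplies `grammar`)
module.exports = grammar({
  name: 'test',

  // Default is [/\s/]; an empty array disables implicit whitespace skipping.
  extras: ($) => [],

  rules: {
    source_file: ($) => repeat1($.identifier),
    identifier: ($) => /[a-zA-Z_][a-zA-Z0-9_]*/,
  },
});
```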

I misunderstood your question, this is interesting

I believe it has to do with the regex starting to match \s, then failing and trying to recover. See the output for the good and bad trees:

Good: [image]

Bad: [image]

Regexes are always terminals, and I'd say the use of \s here makes the parser avoid treating whitespace as an extra wherever an identifier or a _document could come next.
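A toy model may help picture the effect described above. It is a simplified sketch, not tree-sitter's actual lexer: it assumes the lexer moves a token's start position past whitespace only when that whitespace can be nothing but an extra, and keeps the start in place when some token (such as /\sfoo/) could begin with it. The function name and the `tokenMayStartWithWhitespace` flag are hypothetical, introduced only for illustration.

```javascript
// Toy model only: NOT tree-sitter's real lexer, just the start-position bookkeeping.
// `tokenMayStartWithWhitespace` is a hypothetical flag standing in for
// "some token valid here, such as /\sfoo/, can begin with whitespace".
function lexIdentifiers(input, tokenMayStartWithWhitespace) {
  const tokens = [];
  let i = 0, row = 0, col = 0;
  while (i < input.length) {
    let start = [row, col];
    // Consume leading whitespace, tracking row/column as we go.
    while (i < input.length && /\s/.test(input[i])) {
      if (input[i] === '\n') { row += 1; col = 0; } else { col += 1; }
      i += 1;
      // Pure extra: "skip" it, so the token start moves past the whitespace.
      // Otherwise the whitespace stays inside the pending token's range.
      if (!tokenMayStartWithWhitespace) start = [row, col];
    }
    if (i >= input.length) break;
    const match = /^[a-zA-Z_][a-zA-Z0-9_]*/.exec(input.slice(i));
    if (match === null) throw new Error(`unexpected character at ${row}:${col}`);
    i += match[0].length;
    col += match[0].length;
    tokens.push({ start, end: [row, col] });
  }
  return tokens;
}

const input = 'a\nb\nc';
const skipped = lexIdentifiers(input, false);  // models _document: /foo/
const advanced = lexIdentifiers(input, true);  // models _document: /\sfoo/
console.log(JSON.stringify(skipped));
console.log(JSON.stringify(advanced));
```

Under these assumptions the first call yields the same ranges as the "correct" output in the report, and the second yields the shifted ranges, with each newline folded into the following identifier.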

@starwing
Author

@amaanq Thanks for the explanation. So is this a bug or a feature?

@amaanq
Member

amaanq commented Aug 28, 2023

Neither, feature isn't the right word imo, more like intended behavior

@amaanq closed this as not planned on Sep 17, 2023
@ahlinc removed the bug label on Sep 18, 2023