tree-sitter get wrong point when rules contains '\s' #2558

starwing · 2023-08-24T04:14:40Z

With this test grammar:

module.exports = grammar({
  name: 'test',

  rules: {
    // TODO: add the actual grammar rules
    source_file: $ => $.program,

    program: ($) =>
      prec(16,
        repeat1(choice(
          $._statement,
          $._document,
        ))
      ),

    _statement: ($) =>
      prec.right(15,
        seq($.identifier)
      ),
   
    identifier: ($) => /[a-zA-Z_][a-zA-Z0-9_]*/,
    _document: ($) => /foo/,  // <-- notice this
  }
});

and the test input:

a
b
c

tree-sitter produces this result (correct):

$ tree-sitter parse test
(source_file [0, 0] - [2, 1]
  (program [0, 0] - [2, 1]
    (identifier [0, 0] - [0, 1])
    (identifier [1, 0] - [1, 1])
    (identifier [2, 0] - [2, 1])))

But if I change the _document rule to this:

      _document: ($) => /\sfoo/, // <-- notice this

the result changes to this:

$ tree-sitter parse test
(source_file [0, 0] - [2, 1]
  (program [0, 0] - [2, 1]
    (identifier [0, 0] - [0, 1])
    (identifier [0, 1] - [1, 1])
    (identifier [1, 1] - [2, 1])))

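This happens even though there is no "foo" anywhere in the test input. The newline before each identifier is now included in that identifier's range, so the start point of the second and third identifiers is wrong.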
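To make the difference between the two outputs concrete, here is a minimal sketch in plain JavaScript (not tree-sitter itself) that recomputes both sets of points. It assumes the test file is exactly "a\nb\nc" with no trailing newline; `pointAt` and `identifierRanges` are hypothetical helper names introduced only for this illustration.

```javascript
// Convert an offset in `text` into a tree-sitter style [row, column] point.
// The input is ASCII, so character offsets and byte offsets coincide.
function pointAt(text, offset) {
  let row = 0;
  let column = 0;
  for (let i = 0; i < offset; i++) {
    if (text[i] === '\n') {
      row += 1;
      column = 0;
    } else {
      column += 1;
    }
  }
  return [row, column];
}

// Hypothetical helper: list [start, end] points for every identifier,
// either excluding or including the whitespace that precedes it.
function identifierRanges(text, includeLeadingWhitespace) {
  const ranges = [];
  const pattern = /(\s*)([a-zA-Z_][a-zA-Z0-9_]*)/g;
  let match;
  while ((match = pattern.exec(text)) !== null) {
    const whitespaceLength = match[1].length;
    const start = match.index + (includeLeadingWhitespace ? 0 : whitespaceLength);
    const end = match.index + match[0].length;
    ranges.push([pointAt(text, start), pointAt(text, end)]);
  }
  return ranges;
}

// Assumed contents of the test file from the report.
const input = 'a\nb\nc';

// Whitespace skipped before each token: matches the first (correct) output.
console.log(JSON.stringify(identifierRanges(input, false)));

// Whitespace counted as part of each token: matches the second output.
console.log(JSON.stringify(identifierRanges(input, true)));
```

Under that assumption, the second call yields [0, 1] and [1, 1] as the start points of the second and third identifiers. Those are the positions of the newline characters, which is consistent with the newline being counted as part of the token.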

amaanq · 2023-08-24T04:17:38Z

~~newlines are an extra by default (always skipped if possible wherever), if you don't want that then specify your extras property as an empty array~~

I misunderstood your question, this is interesting

I believe it has to do w/ the regex starting to process \s, then failing and trying to recover. See the output of the good and bad tree:

Good:

Bad:

Regexes are always terminals, and my guess is that using \s here makes the parser stop treating whitespace as an extra wherever either an identifier or a document could come next.

starwing · 2023-08-24T06:07:56Z

@amaanq Thanks for the explanation. So is it a bug, or a feature?

amaanq · 2023-08-28T19:15:57Z

Neither, feature isn't the right word imo, more like intended behavior

starwing mentioned this issue Aug 24, 2023

fix incorrect line numbers in start_point tjdevries/tree-sitter-lua#51

Open

ahlinc added the bug, c-library, parser-generation, and parser labels Aug 24, 2023

amaanq closed this as not planned Sep 17, 2023

ahlinc removed the bug label Sep 18, 2023
