Comment node includes trailing `\r` #36

AndreasArvidsson · 2023-08-10T10:53:38Z

When using CRLF line endings, comment nodes include a trailing `\r`:

# hello
foo: "bar"

node.text: # hello\r


wenkokke · 2023-08-10T15:01:39Z

This is another scanner bug, unfortunately.

AndreasArvidsson · 2023-08-11T14:19:56Z

Could you elaborate on that? I'm not quite sure what the scanner means in this context.

This is not the first bug where we've had leading or trailing whitespace on a node. Would it be worth adding a unit test that checks for leading and/or trailing whitespace?

wenkokke · 2023-08-11T18:00:46Z

Scanner means the code that does the lexing; see scanner.cc. It's a bunch of C++ code that implements a custom lexer for TalonScript, and it's where you need to handle any features of the language that are tricky to express as grammars—e.g., indentation sensitivity or lookahead.

wenkokke · 2023-08-11T18:01:15Z

> Could you elaborate on that? I'm not quite sure what the scanner means in this context.

I'd be happy to accept a PR with such tests?

pokey · 2023-08-12T15:35:57Z

Can you not just tweak the comment regex?

tree-sitter-talon/grammar.js

Line 44 in fd20268

comment: ($) => token(/#.*?/),

wenkokke · 2023-08-12T16:15:01Z

I'm not sure what purpose that serves, because afaik comment tokens are lexed by the scanner. I guess you could try replacing . by [^\r\n]?
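
For illustration, the effect of that substitution can be sketched in plain JavaScript. This is not how tree-sitter evaluates the rule. The sketch assumes that tree-sitter's `.` excludes only `\n` (modeled here as `[^\n]`) and that the token takes the longest match, so the lazy `?` is dropped. `commentText` and both pattern names are hypothetical.

```javascript
// Assumption: tree-sitter's `.` excludes only "\n", modeled here as [^\n],
// and the token takes the longest match (so the lazy `?` is dropped).
const currentPattern = /#[^\n]*/;
// Proposed change: also exclude "\r" from the comment body.
const proposedPattern = /#[^\r\n]*/;

// Hypothetical helper: returns the text of the first comment match, or null.
function commentText(pattern, source) {
  const match = pattern.exec(source);
  return match === null ? null : match[0];
}

const crlfSource = '# hello\r\nfoo: "bar"\r\n';
const before = commentText(currentPattern, crlfSource);
const after = commentText(proposedPattern, crlfSource);

console.log(JSON.stringify(before)); // "# hello\r"
console.log(JSON.stringify(after)); // "# hello"
```

Under those assumptions the `\r` is left outside the comment token, so something else in the grammar still has to accept it as part of the line ending.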

pokey · 2023-08-12T19:42:28Z

Yeah I was thinking something like that

pokey · 2023-11-07T19:13:14Z

so is this fixed by #42 ?

wolfmanstout · 2023-11-19T00:13:31Z

@pokey I'm not sure ... @AndreasArvidsson can you retest this?

I considered adding a unit test for this but it's not easy to capture using the built-in tree-sitter testing system, which doesn't include tests for node contents. I think we'd need to set up a separate unit test, e.g. using the Node.js API -- I'm sure this is easy but I'm just not very familiar with Node.js so it wasn't trivial for me.

AndreasArvidsson · 2023-11-19T00:30:31Z

@wolfmanstout The problem is still there, but slightly changed. node.text is now "# hello\r\n"

Should definitely be doable with node
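
A rough sketch of what such a check might look like. It only assumes that each node exposes `type`, `text`, and `children`, as syntax nodes in the tree-sitter Node.js bindings do. The tree below is a hand-built stand-in, and `findPaddedNodes` and the `"other"` node type are hypothetical. A real test would parse a CRLF fixture with the Talon grammar and pass in `tree.rootNode` instead.

```javascript
// Collects nodes of the given types whose text starts or ends with whitespace.
// `node` is assumed to expose `type`, `text`, and `children`.
function findPaddedNodes(node, typesToCheck, found = []) {
  if (typesToCheck.has(node.type) && node.text !== node.text.trim()) {
    found.push({ type: node.type, text: node.text });
  }
  for (const child of node.children) {
    findPaddedNodes(child, typesToCheck, found);
  }
  return found;
}

// Hand-built stand-in for a parsed CRLF file (not real parser output).
const stubTree = {
  type: "source_file",
  text: '# hello\r\nfoo: "bar"\r\n',
  children: [
    { type: "comment", text: "# hello\r", children: [] },
    { type: "other", text: 'foo: "bar"', children: [] },
  ],
};

// Only leaf-like node types are checked; the root legitimately spans newlines.
const padded = findPaddedNodes(stubTree, new Set(["comment", "other"]));
console.log(JSON.stringify(padded));
```

Restricting the check to a set of node types is a design choice: container nodes can legitimately span line breaks, so checking every node would produce false positives.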

wolfmanstout · 2023-11-19T02:06:58Z

FWIW, @wenkokke's suggestion above would probably work. Even though comments are declared as an external, they are still parsed by that regex. This follows the pattern of the Python implementation. I guess there is some subtle difference, assuming Python doesn't have the same behavior.

wolfmanstout · 2023-11-24T22:43:46Z

Okay, I have a draft of a fix out:
#45

Before this is merged, I want to point out that the Python tree-sitter grammar has the exact same behavior (I tested it). Should Cursorless just be robust to this instead?

wenkokke · 2023-11-25T01:27:05Z

> Before this is merged, I want to point out that the Python tree-sitter grammar has the exact same behavior (I tested it). Should Cursorless just be robust to this instead?

I based the scanner and grammar on the Python grammar, so they might actually welcome your changes there as well.

Might be less "this behavior is endemic" and more "that's where I copied it from".

That said, probably makes sense for Cursorless to be robust to this.
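
On the consumer side, being robust to this could be as small as trimming line-ending characters before using a comment's text. A minimal sketch follows. The helper name is hypothetical and nothing here is taken from Cursorless. It also reports how many characters were removed, on the assumption that a consumer would want to shrink the node's end position by the same amount.

```javascript
// Hypothetical helper: strips trailing "\r" and "\n" characters from a node's
// text and reports how many characters were removed.
function stripTrailingLineEnding(text) {
  const stripped = text.replace(/[\r\n]+$/, "");
  return { text: stripped, removed: text.length - stripped.length };
}

const crlfComment = stripTrailingLineEnding("# hello\r\n");
const cleanComment = stripTrailingLineEnding("# hello");

console.log(JSON.stringify(crlfComment)); // {"text":"# hello","removed":2}
console.log(JSON.stringify(cleanComment)); // {"text":"# hello","removed":0}
```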

pokey mentioned this issue Oct 16, 2023

Fix CRLF issues by moving parsing from scanner.cc to grammar.js #42

Merged

wolfmanstout mentioned this issue Nov 26, 2023

Exclude carriage return from comment nodes. #45

Merged

wenkokke closed this as completed in #45 Jun 14, 2024
