
%w/%W/%i/%I children should be strings/symbols #6

Closed
robrix opened this issue Feb 21, 2016 · 3 comments

@robrix
Member

robrix commented Feb 21, 2016

Ruby’s %w/%W and %i/%I syntaxes are array literals containing strings and symbols, respectively. However, the parse tree won’t contain nodes with the appropriate names, even with the changes described in tree-sitter/tree-sitter#29.

I would like this source:

%w(hello world)

to end up with this parse tree:

(program (expression.literal.array (expression.literal.string) (expression.literal.string)))

Note, however, that the array’s elements would not be parsed via the string production; they’d be parsed as part of the array rule and named “string” more or less by fiat.

@maxbrunsfeld: Is there any way to assign names like that? An API like this might do it:

rules: {
  array: $ => choice(
    seq('[', commaSep($.expression), ']'),
    // hypothetical API: name each whitespace-separated piece "string"
    seq('%w(', sep({ string: /[^\s]+/ }, /\s+/), ')'),
    // ...
  ),
}

Motivated by #4.

robrix mentioned this issue Feb 21, 2016
@maxbrunsfeld
Contributor

I think having some way to name pieces of a rule would be cool. In the meantime, though, you could just do something like:

array: $ => choice(
  // ...
  seq('%w(', sep($.bare_string, /\s+/), ')')
),

bare_string: $ => /[^\s]+/,

With that, you’d get this parse tree:

(program (expression.literal.array (bare_string) (bare_string)))

That way, at least you get a node for each string in the array. The elements are semantically strings but syntactically a different construct, so I think it’s reasonable for them to appear differently in the parse tree.
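To make the two candidate tree shapes concrete, here is a toy sketch in plain JavaScript. It is not tree-sitter code: the function name wordArrayTree and its nodeName parameter are hypothetical, and it only mimics how seq('%w(', sep($.bare_string, /\s+/), ')') would split a %w literal into one child per run of non-whitespace before printing the resulting S-expression.

```javascript
// Hypothetical toy, NOT tree-sitter: mimics the tree shape that
// seq('%w(', sep($.bare_string, /\s+/), ')') would produce for a %w literal.
function wordArrayTree(source, nodeName = 'bare_string') {
  // Only handles the plain %w( ... ) form; other delimiters are ignored.
  const match = /^%w\(([^)]*)\)$/.exec(source.trim());
  if (!match) {
    throw new Error(`not a %w(...) literal: ${source}`);
  }
  // Each maximal run of non-whitespace is one element, like /[^\s]+/.
  const elements = match[1].match(/[^\s]+/g) || [];
  // Emit one child node per element, named by nodeName.
  const children = elements.map(() => ` (${nodeName})`).join('');
  return `(program (expression.literal.array${children}))`;
}

console.log(wordArrayTree('%w(hello world)'));
// prints "(program (expression.literal.array (bare_string) (bare_string)))"
console.log(wordArrayTree('%w(hello world)', 'expression.literal.string'));
```

Passing 'expression.literal.string' as the node name gives the tree from the original request, while the default gives the bare_string tree. Real %w literals also allow other delimiters and escaped whitespace, which this toy does not handle.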

@maxbrunsfeld
Contributor

I was just testing with Ripper to see how Ruby’s own parser treats this:

> Ripper.sexp "['one']"
=> [:program, [[:array, [[:string_literal, [:string_content, [:@tstring_content, "one", [1, 2]]]]]]]]

> Ripper.sexp "%w(one)"
=> [:program, [[:array, [[:@tstring_content, "one", [1, 3]]]]]]

It represents the %w elements as bare “string content” (@tstring_content) tokens, without the string_literal wrapper that ['one'] gets 😕. Makes sense, I guess.

@robrix
Member Author

robrix commented Feb 22, 2016

Oh, interesting find—thanks for looking into that!

philipturnbull added a commit that referenced this issue Jun 30, 2017
isalnum is only defined for values representable as unsigned char (or EOF), so it must not be called on arbitrary Unicode code points. Nothing bad happens on macOS as far as I can tell, but on Linux it causes an out-of-bounds read:

==11146== ERROR: libFuzzer: deadly signal
    #0 0x649030 in fuzzer::Fuzzer::CrashCallback() /src/libfuzzer/FuzzerLoop.cpp:195:5
    #1 0x648fd9 in fuzzer::Fuzzer::StaticCrashSignalCallback() /src/libfuzzer/FuzzerLoop.cpp:179:6
    #2 0x7f118dd6638f  (/lib/x86_64-linux-gnu/libpthread.so.0+0x1138f)
    #3 0x7f118d396d0d in isalnum (/lib/x86_64-linux-gnu/libc.so.6+0x2dd0d)
    #4 0x42d44a in (anonymous namespace)::Scanner::scan_symbol_identifier(TSLexer*) /src/tree-sitter/tree-sitter-ruby/src/scanner.cc:250:12
    #5 0x42be29 in (anonymous namespace)::Scanner::scan(TSLexer*, bool const*) /src/tree-sitter/tree-sitter-ruby/src/scanner.cc:727:18
    #6 0x62eb83 in parser__lex /src/tree-sitter/src/runtime/parser.c:271:30
    #7 0x62a05c in parser__advance /src/tree-sitter/src/runtime/parser.c:1097:21
    #8 0x629789 in parser_parse /src/tree-sitter/src/runtime/parser.c:1298:9
    #9 0x62617d in ts_document_parse_with_options /src/tree-sitter/src/runtime/document.c:136:16
    #10 0x42b053 in LLVMFuzzerTestOneInput /src/tree-sitter/../fuzzer.cc:21:3
    #11 0x64a164 in fuzzer::Fuzzer::ExecuteCallback(unsigned char const*, unsigned long) /src/libfuzzer/FuzzerLoop.cpp:460:13
    #12 0x64a4be in fuzzer::Fuzzer::RunOne(unsigned char const*, unsigned long) /src/libfuzzer/FuzzerLoop.cpp:399:3
    #13 0x63c890 in fuzzer::RunOneTest(fuzzer::Fuzzer*, char const*, unsigned long) /src/libfuzzer/FuzzerDriver.cpp:268:6
    #14 0x64090b in fuzzer::FuzzerDriver(int*, char***, int (*)(unsigned char const*, unsigned long)) /src/libfuzzer/FuzzerDriver.cpp:683:9
    #15 0x63c58c in main /src/libfuzzer/FuzzerMain.cpp:20:10
    #16 0x7f118d38982f in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2082f)
    #17 0x405928 in _start (/out/ruby_fuzzer+0x405928)