Provide callbacks for node construction, skipping ts_node APIs #1
philipturnbull added a commit to philipturnbull/tree-sitter that referenced this issue on Jun 9, 2017
Some ParseActions have a state-id of -1 which can cause an out-of-bounds read when removing duplicate parse states. This was found by AddressSanitizer:

    ==90699==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x6320000187f8 at pc 0x0001071220a9 bp 0x7fff595fd440 sp 0x7fff595fd438
    READ of size 8 at 0x6320000187f8 thread T0
        #0 0x1071220a8 in tree_sitter::build_tables::ParseTableBuilder::remove_duplicate_parse_states()::'lambda0'(unsigned long*)::operator()(unsigned long*) const build_parse_table.cc:398
        #1 0x107121fa5 in void std::__1::__invoke_void_return_wrapper<void>::__call<tree_sitter::build_tables::ParseTableBuilder::remove_duplicate_parse_states()::'lambda0'(unsigned long*)&, unsigned long*>(tree_sitter::build_tables::ParseTableBuilder::remove_duplicate_parse_states()::'lambda0'(unsigned long*)&&&, unsigned long*&&) __functional_base:416
        ...
    0x6320000187f8 is located 8 bytes to the left of 88264-byte region [0x632000018800,0x63200002e0c8)
    allocated by thread T0 here:
        #0 0x107b1576b in wrap__Znwm (libclang_rt.asan_osx_dynamic.dylib:x86_64h+0x6076b)
        #1 0x10711da2c in std::__1::vector<unsigned long, std::__1::allocator<unsigned long> >::allocate(unsigned long) new:169
        #2 0x10711d8fb in std::__1::vector<unsigned long, std::__1::allocator<unsigned long> >::vector(unsigned long) vector:1074
        #3 0x107112f5c in std::__1::vector<unsigned long, std::__1::allocator<unsigned long> >::vector(unsigned long) vector:1068
        #4 0x1070af381 in tree_sitter::build_tables::ParseTableBuilder::remove_duplicate_parse_states() build_parse_table.cc:378
        #5 0x10709d827 in tree_sitter::build_tables::ParseTableBuilder::build() build_parse_table.cc:85
        ...
    SUMMARY: AddressSanitizer: heap-buffer-overflow build_parse_table.cc:398 in tree_sitter::build_tables::ParseTableBuilder::remove_duplicate_parse_states()::'lambda0'(unsigned long*)::operator()(unsigned long*) const
    Shadow bytes around the buggy address:
      0x1c64000030a0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
      0x1c64000030b0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
      0x1c64000030c0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
      0x1c64000030d0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
      0x1c64000030e0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
    =>0x1c64000030f0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa[fa]
      0x1c6400003100: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
      0x1c6400003110: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
      0x1c6400003120: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
      0x1c6400003130: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
      0x1c6400003140: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
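
The report is consistent with a state id of -1 being used as an unsigned index: the read lands 8 bytes before a buffer of 8-byte elements. The following is a minimal C sketch of one way to guard such a lookup, not the fix from the referenced commit. The names `NO_STATE` and `remap_state_id` are hypothetical, and the sketch assumes -1 acts as a "no state" sentinel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical sentinel: a state id of -1 stored in an unsigned type. */
#define NO_STATE ((size_t)-1)

/*
 * Rewrites *state_id through the remapping table new_ids (count entries).
 * Ids that are the sentinel or otherwise out of range are left untouched
 * and reported by returning false, instead of indexing outside the table.
 */
static bool remap_state_id(size_t *state_id, const size_t *new_ids, size_t count) {
  assert(state_id != NULL && new_ids != NULL);
  if (*state_id == NO_STATE || *state_id >= count) {
    return false;
  }
  *state_id = new_ids[*state_id];
  return true;
}
```

Whether a sentinel id should be skipped, rejected earlier, or never produced at all is a design question this sketch does not settle.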
philipturnbull added a commit that referenced this issue on Jul 18, 2017
SpyInput uses a fixed-size buffer and explicitly zeros memory which is good for catching logic errors but defeats valgrind's memory tracking. Use a separate buffer of exactly the correct size for each request. This correctly catches the problem under valgrind:

```
==8694== Invalid read of size 2
==8694==    at 0x54EFFB: utf16_iterate (utf16.c:10)
==8694==    by 0x551126: ts_lexer__get_lookahead (lexer.c:54)
==8694==    by 0x5515CD: ts_lexer_start (lexer.c:154)
==8694==    by 0x54699F: parser(long,...)(long long) (parser.c:297)
==8694==    by 0x54788A: parser__get_lookahead (parser.c:439)
==8694==    by 0x54B2D3: parser__advance (parser.c:1150)
==8694==    by 0x54C2AA: parser_parse (parser.c:1348)
==8694==    by 0x53F063: ts_document_parse_with_options (document.c:136)
==8694==    by 0x53EF43: ts_document_parse (document.c:107)
==8694==    by 0x4AED11: {lambda()#1}::operator()() const::{lambda()#1}::operator()() const::{lambda()#4}::operator()() const::{lambda()#4}::operator()() const (document_test.cc:82)
==8694==    by 0x4B56B6: std::_Function_handler<void (), {lambda()#1}::operator()() const::{lambda()#1}::operator()() const::{lambda()#4}::operator()() const::{lambda()#4}>::_M_invoke(std::_Any_data const&) (functional:1871)
==8694==    by 0x40F8C5: std::function<void ()>::operator()() const (functional:2267)
==8694==  Address 0x5d08be0 is 0 bytes inside a block of size 1 alloc'd
==8694==    at 0x4C2E80F: operator new[](unsigned long) (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==8694==    by 0x507C3E: SpyInput::read(void*, unsigned int*) (spy_input.cc:66)
==8694==    by 0x55103D: ts_lexer__get_chunk (lexer.c:29)
==8694==    by 0x5515B6: ts_lexer_start (lexer.c:152)
==8694==    by 0x54699F: parser(long,...)(long long) (parser.c:297)
==8694==    by 0x54788A: parser__get_lookahead (parser.c:439)
==8694==    by 0x54B2D3: parser__advance (parser.c:1150)
==8694==    by 0x54C2AA: parser_parse (parser.c:1348)
==8694==    by 0x53F063: ts_document_parse_with_options (document.c:136)
==8694==    by 0x53EF43: ts_document_parse (document.c:107)
==8694==    by 0x4AED11: {lambda()#1}::operator()() const::{lambda()#1}::operator()() const::{lambda()#4}::operator()() const::{lambda()#4}::operator()() const (document_test.cc:82)
==8694==    by 0x4B56B6: std::_Function_handler<void (), {lambda()#1}::operator()() const::{lambda()#1}::operator()() const::{lambda()#4}::operator()() const::{lambda()#4}>::_M_invoke(std::_Any_data const&) (functional:1871)
```
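
The idea of handing out a buffer of exactly the requested size can be sketched in C as follows. This is an illustration under stated assumptions, not the `SpyInput` code from the commit: `ExactReader` and `exact_reader_read` are hypothetical names, and the reader is assumed to own the chunk it last returned.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical reader state; not tree-sitter's SpyInput. */
typedef struct {
  const char *content; /* full input text */
  size_t length;       /* number of bytes in content */
  size_t position;     /* next byte to hand out */
  char *chunk;         /* most recently returned buffer, owned by the reader */
} ExactReader;

/*
 * Returns up to max_bytes of input in a freshly allocated buffer whose size
 * is exactly *bytes_read. Because the allocation has no slack and is not
 * zero-padded, a tool such as valgrind can flag any read past the end.
 * The previous chunk is freed on every call. Returns NULL at end of input
 * or on allocation failure (with *bytes_read set to 0).
 */
static const char *exact_reader_read(ExactReader *r, size_t max_bytes, size_t *bytes_read) {
  assert(r != NULL && bytes_read != NULL);
  free(r->chunk);
  r->chunk = NULL;

  size_t remaining = r->length - r->position;
  size_t n = remaining < max_bytes ? remaining : max_bytes;
  *bytes_read = n;
  if (n == 0) {
    return NULL;
  }

  r->chunk = malloc(n);
  if (r->chunk == NULL) {
    *bytes_read = 0;
    return NULL;
  }
  memcpy(r->chunk, r->content + r->position, n);
  r->position += n;
  return r->chunk;
}
```

The caller is expected to `free` the final `chunk` once reading is done. The trade-off is one allocation per request, which is usually acceptable in test helpers.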
The library is pretty tightly coupled to its syntax tree representation, so I don't think I see us going this route at this point. RIP issue 1️⃣ !
Gediminas19 pushed a commit to Gediminas19/tree-sitter that referenced this issue on Jun 8, 2022
Mayhem Integration
It would be convenient for some purposes to be able to provide one or more callbacks which construct a parse tree, rather than waiting for `TSNode`s to be constructed and then mapping them into some other parse tree.

This would allow more immediate results, plus lower resource consumption, at the cost of losing the editing features etc. of the `ts_node_*` APIs.
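
To make the request concrete, here is a rough C sketch of what such a callback interface might look like. Everything in it is hypothetical: `NodeCallbacks`, `on_leaf`, `on_node`, and `emit_sum` are invented for illustration and are not part of tree-sitter's API, and `emit_sum` merely stands in for a parser by emitting the events for an input like `1 + 2` bottom-up.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback table; nothing here is part of tree-sitter's API. */
typedef struct {
  /* Called once per token; returns a caller-defined node handle. */
  void *(*on_leaf)(void *ctx, int symbol, size_t start, size_t end);
  /* Called when children are combined into a parent; returns its handle. */
  void *(*on_node)(void *ctx, int symbol, void **children, size_t child_count);
  void *ctx; /* opaque user data passed back to every callback */
} NodeCallbacks;

/* Example consumer: counts nodes instead of materializing a tree. */
typedef struct {
  size_t leaves;
  size_t inner;
} Counter;

static void *count_leaf(void *ctx, int symbol, size_t start, size_t end) {
  (void)symbol; (void)start; (void)end;
  ((Counter *)ctx)->leaves++;
  return ctx;
}

static void *count_node(void *ctx, int symbol, void **children, size_t child_count) {
  (void)symbol; (void)children; (void)child_count;
  ((Counter *)ctx)->inner++;
  return ctx;
}

/* Stand-in for a parser: emits three leaves, then one parent node. */
static void *emit_sum(const NodeCallbacks *cb) {
  assert(cb != NULL && cb->on_leaf != NULL && cb->on_node != NULL);
  void *kids[3];
  kids[0] = cb->on_leaf(cb->ctx, 1, 0, 1); /* number */
  kids[1] = cb->on_leaf(cb->ctx, 2, 2, 3); /* '+' */
  kids[2] = cb->on_leaf(cb->ctx, 1, 4, 5); /* number */
  return cb->on_node(cb->ctx, 3, kids, 3);  /* sum */
}
```

A caller would fill in a `NodeCallbacks` with its own constructors (here `count_leaf` and `count_node` with a `Counter` as `ctx`) and receive its own node type directly, with no intermediate tree to walk afterwards.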