Segmentation fault on /* */ comments #64

Closed
XVilka opened this issue Apr 7, 2021 · 4 comments

XVilka commented Apr 7, 2021

[i] ℤ gdb build/ts-c-cpp-parser
Reading symbols from build/ts-c-cpp-parser...
(gdb) run test/ts-crash.h 
Starting program: /home/user/rizin/ts-ctypes/build/ts-c-cpp-parser test/ts-crash.h
Missing separate debuginfos, use: dnf debuginfo-install glibc-2.32-4.fc33.x86_64
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
root_node: translation_unit
root_node: comment

Program received signal SIGSEGV, Segmentation fault.
ts_node__subtree (self=...) at ../subprojects/tree-sitter-0.19.4/lib/src/./node.c:49
49        return *(const Subtree *)self.id;
(gdb) bt
#0  ts_node__subtree (self=...) at ../subprojects/tree-sitter-0.19.4/lib/src/./node.c:49
#1  0x0000000000406d32 in ts_node_type (self=...) at ../subprojects/tree-sitter-0.19.4/lib/src/./node.c:427
#2  0x0000000000401457 in main (argc=2, argv=0x7fffffffd5a8) at ../c_cpp_parser.c:62
(gdb) 

The parser itself is:

#include <assert.h>
#include <string.h>
#include <stdio.h>
#include <stdlib.h> // for free()
#include <rz_types.h>
#include <rz_list.h>
#include <rz_util/rz_file.h>
#include <tree_sitter/api.h>

// Declare the `tree_sitter_c` function, which is
// implemented by the `tree-sitter-c` library.
TSLanguage *tree_sitter_c();

// Declare the `tree_sitter_cpp` function, which is
// implemented by the `tree-sitter-cpp` library.
//TSLanguage *tree_sitter_cpp();


int main(int argc, char **argv) {
  // Build a syntax tree based on source code stored in a string.
  //const char *source_code = "typedef struct bla { int a; char **b[52]; } bla_t;";

  // argv[1] is only valid when at least one argument was passed.
  if (argc < 2) {
    printf("Usage: ts-c-cpp-parser <filename>\n");
    return -1;
  }
  char *file_path = argv[1];

  size_t read_bytes = 0;
  char *source_code = rz_file_slurp(file_path, &read_bytes);
  if (!source_code || !read_bytes) {
    return -1;
  }

  // Create a parser.
  TSParser *parser = ts_parser_new();
  // Set the parser's language (C in this case)
  ts_parser_set_language(parser, tree_sitter_c());


  TSTree *tree = ts_parser_parse_string(
    parser,
    NULL,
    source_code,
    strlen(source_code)
  );

  // Get the root node of the syntax tree.
  TSNode root_node = ts_tree_root_node(tree);

  // Print the syntax tree as an S-expression.
  char *string = ts_node_string(root_node);
  printf("Syntax tree: %s\n", string);

  // Free all of the heap-allocated memory.
  free(string);
  ts_tree_delete(tree);
  ts_parser_delete(parser);
  return 0;
}

The content of ts-crash.h is:

/* Some comment */

int a;

maxbrunsfeld commented Apr 7, 2021

Hi! In general, when you post a reproduction example like this, it is a lot more useful if you don't rely on unknown third party libraries like <rz_util/rz_file.h>.

This is a pretty severe issue, so I spent some time this morning adapting your code so that it is actually reproducible. Here is the change that I made:

4,6d3
< #include <rz_types.h>
< #include <rz_list.h>
< #include <rz_util/rz_file.h>
25a23
> 
32,36c30,36
<   size_t read_bytes = 0;
<   char *source_code = rz_file_slurp(file_path, &read_bytes);
<   if (!source_code || !read_bytes) {
< 		return -1;
<   }
---
>   FILE *f = fopen(file_path, "rb");
>   fseek(f, 0, SEEK_END);
>   size_t read_bytes = ftell(f);
>   fseek(f, 0, SEEK_SET);
>   char *source_code = malloc(read_bytes);
>   fread(source_code, 1, read_bytes, f);
>   fclose(f);

I'm just using standard C APIs instead of custom, third-party ones. With these changes, there is no segfault. Everything works as expected:

clang -I ../tree-sitter/lib/include test.c src/parser.c ../tree-sitter/libtree-sitter.a
./a.out ts-crash.h  

Output:

Syntax tree: (translation_unit (comment) (declaration type: (primitive_type) declarator: (identifier)))
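Note that the replacement read in the diff above does no error checking and does not NUL-terminate the buffer, while the original program still passes `strlen(source_code)` to `ts_parser_parse_string`. A slightly more defensive version might look like the sketch below. The helper name `read_file` is hypothetical (it is not part of the code in this thread), and the sketch assumes a regular, seekable file:

```c
#include <stdio.h>
#include <stdlib.h>

// Hypothetical helper, not from the thread: reads a whole file into a
// NUL-terminated heap buffer. Returns NULL on any failure. On success the
// caller owns the buffer and must free() it; *out_len (if non-NULL)
// receives the number of bytes read, excluding the terminator.
static char *read_file(const char *path, size_t *out_len) {
  FILE *f = fopen(path, "rb");
  if (!f) {
    return NULL;
  }
  if (fseek(f, 0, SEEK_END) != 0) {
    fclose(f);
    return NULL;
  }
  long size = ftell(f);
  if (size < 0 || fseek(f, 0, SEEK_SET) != 0) {
    fclose(f);
    return NULL;
  }
  // One extra byte so the buffer can also be used as a C string.
  char *buf = malloc((size_t)size + 1);
  if (!buf) {
    fclose(f);
    return NULL;
  }
  size_t n = fread(buf, 1, (size_t)size, f);
  fclose(f);
  if (n != (size_t)size) {
    free(buf);
    return NULL;
  }
  buf[n] = '\0';
  if (out_len) {
    *out_len = n;
  }
  return buf;
}
```

In the test program, `char *source_code = read_file(file_path, &read_bytes);` could stand in for the `rz_file_slurp` call, and `read_bytes` could be passed to `ts_parser_parse_string` in place of `strlen(source_code)`.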


XVilka commented Apr 7, 2021

Oh, sorry about that. Thank you for your investigation. I will check again tomorrow and close the issue if it indeed isn't reproducible with these changes 👍


XVilka commented Apr 8, 2021

Hmm, I removed the third-party library (rz-util), but the crash is still there.
I use tree-sitter-0.19.4 and tree-sitter-c from master. I'm attaching a self-contained archive with a Meson/Ninja project.

ts-c-crash.zip

To reproduce, just run:

meson build
ninja -C build
build/ts-c-cpp-parser test/ts-crash.h

This project uses the Tree-sitter runtime and the C grammar as statically linked Meson subprojects (see the subprojects/ directory).

Full log:

[i] ℤ meson build
The Meson build system
Version: 0.55.3
Source dir: /home/user/rizin/ts-ctypes
Build dir: /home/user/rizin/ts-ctypes/build
Build type: native build
Project name: ts-c-cpp-parser
Project version: undefined
C compiler for the host machine: cc (gcc 10.2.1 "cc (GCC) 10.2.1 20201125 (Red Hat 10.2.1-9)")
C linker for the host machine: cc ld.bfd 2.35-18
Host machine cpu family: x86_64
Host machine cpu: x86_64
Program python3 found: YES (/usr/bin/python3)
Dependency tree-sitter skipped: feature use_sys_tree_sitter disabled
Downloading tree-sitter source from https://github.com/tree-sitter/tree-sitter/archive/v0.19.4.tar.gz
Downloading file of unknown size.

|Executing subproject tree-sitter method meson 
|Project name: tree-sitter
|Project version: undefined
|C compiler for the host machine: cc (gcc 10.2.1 "cc (GCC) 10.2.1 20201125 (Red Hat 10.2.1-9)")
|C linker for the host machine: cc ld.bfd 2.35-18
|Compiler for C supports arguments -std=gnu99: YES 
|Build targets in project: 1
|Subproject tree-sitter finished.


|Executing subproject tree-sitter-c method meson 
|Project name: tree-sitter-c
|Project version: f05e279aedde06a25801c3f2b2cc8ac17fac52ae
|Build targets in project: 2
|Subproject tree-sitter-c finished.

Build targets in project: 3

ts-c-cpp-parser undefined

  Configuration
    System tree-sitter library: NO

  Subprojects
                   tree-sitter: YES
                 tree-sitter-c: YES

Found ninja-1.10.2 at /usr/bin/ninja
ts-ctypes on  master [✘!]
[i] ℤ ninja -C build
ninja: Entering directory `build'
[2/6] Compiling C object subprojects/tree-sitter-c/libtree-sitter-c.a.p/src_parser.c.o
In file included from ../subprojects/tree-sitter-c/src/parser.c:1:
../subprojects/tree-sitter-c/src/parser.c: In function ‘ts_lex_keywords’:
../subprojects/tree-sitter-c/src/tree_sitter/parser.h:135:8: warning: variable ‘eof’ set but not used [-Wunused-but-set-variable]
  135 |   bool eof = false;             \
      |        ^~~
../subprojects/tree-sitter-c/src/parser.c:4123:3: note: in expansion of macro ‘START_LEXER’
 4123 |   START_LEXER();
      |   ^~~~~~~~~~~
[6/6] Linking target ts-c-cpp-parser
ts-ctypes on  master [✘!] 
[i] ℤ gdb build/ts-c-cpp-parser
Reading symbols from build/ts-c-cpp-parser...
(gdb) run test/ts-crash.h 
Starting program: /home/akochkov/rizin/ts-ctypes/build/ts-c-cpp-parser test/ts-crash.h
Missing separate debuginfos, use: dnf debuginfo-install glibc-2.32-4.fc33.x86_64
Read 30 bytes from test/ts-crash.h file
root_node: translation_unit
root_node: comment

Program received signal SIGSEGV, Segmentation fault.
ts_node__subtree (self=...) at ../subprojects/tree-sitter-0.19.4/lib/src/./node.c:49
49        return *(const Subtree *)self.id;
(gdb) bt
#0  ts_node__subtree (self=...) at ../subprojects/tree-sitter-0.19.4/lib/src/./node.c:49
#1  0x0000000000406dcf in ts_node_type (self=...) at ../subprojects/tree-sitter-0.19.4/lib/src/./node.c:427
#2  0x00000000004014f4 in main (argc=2, argv=0x7fffffffd5a8) at ../c_cpp_parser.c:52


XVilka commented Apr 8, 2021

Never mind, it was my lack of attention. The crash came from this line:

printf("node: %s\n", ts_node_type(struct_node));

where struct_node obviously didn't exist in the AST (the test file contains no struct). Sorry for the trouble; this is indeed not a bug but my fault.

XVilka closed this as completed Apr 8, 2021