Application of the generated parser as a compiler's parser #831
-
Sorry, this isn't an issue but a question; I didn't know where else to ask it.

I've been prototyping my language's grammar with tree-sitter. Not only are tree-sitter grammars easy to write, but the conflicts the generator reports are a useful way to find ambiguities in the syntax. tree-sitter emphasizes "incremental parsing". As I'm not experienced with parsers/lexers in general, I'd like to know your thoughts on using the generated parser as the actual parser in the compiler front-end for the language I've been working on.

I've noticed that other language projects generally rely on hand-written parsers which, in my naivety, I assume are more precise in reporting errors at specific places in the code... Right? I'm sure there are other trade-offs, but I'm not knowledgeable enough to name them.

In particular, this language is meant to be transpiled to JavaScript. It is nothing like e.g. Rust, which has complicated lifetime semantics scattered throughout the code. In my case it would be fine to simply abort compilation on the first syntax error, because I won't be doing semantic analysis on the code itself.

In general, how do you feel about using the generated parser as a compiler component in an actual project? Do you foresee limitations with that approach? Or, taking a step back, is this approach suitable for this kind of work, or would I be missing out on valuable context/information that I would have if I built the AST with a hand-written parser?
-
I converted this to a discussion thread since we're trying to move toward using Discussions for these types of conversations. The advantage is that they never need to be closed, and they're not mixed in with the more traditional "issues" that each capture an actionable item.
-
I think the biggest downside to using a Tree-sitter parser in a compiler front-end is that, while we've done a lot of work on Tree-sitter's error recovery, we haven't yet built out functionality for error messages. So it isn't trivial to find the exact token/position where an error began, or to get a list of the tokens that were expected there, and things like that.
Also, the error recovery currently isn't customizable in domain-specific ways (e.g. as soon as the word "function" appears, assume that the user meant to write an entire function definition).
Down the road, I would love to invest in both of these things, but because there's so much other stuff we're working on, it may be a while before this happens.
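For the "abort on the first syntax error" case from the original question, a rough position can still be recovered from the tree. As far as I know, Tree-sitter marks regions it could not parse with `ERROR` nodes and flags tokens it inserted during recovery as missing. The sketch below assumes exactly that and nothing more: `Node` is a hypothetical stand-in written for this example (its field names are chosen to resemble what the language bindings expose, but it is not the real binding type), and `walk` and `first_syntax_error` are illustrative helpers, not part of any library.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional, Tuple


@dataclass
class Node:
    """Hypothetical stand-in for a syntax-tree node (not the real binding type)."""
    type: str
    start_point: Tuple[int, int]  # (row, column), both zero-based
    is_missing: bool = False      # True for a token inserted during recovery
    children: List["Node"] = field(default_factory=list)


def walk(node: Node) -> Iterator[Node]:
    """Yield nodes in document order (pre-order, depth-first)."""
    yield node
    for child in node.children:
        yield from walk(child)


def first_syntax_error(root: Node) -> Optional[str]:
    """Return a one-line message for the first ERROR or missing node, or None."""
    for node in walk(root):
        if node.type == "ERROR" or node.is_missing:
            row, col = node.start_point
            kind = f"missing {node.type}" if node.is_missing else "syntax error"
            # Editors and compilers usually print one-based positions.
            return f"{row + 1}:{col + 1}: {kind}"
    return None


# A hand-built tree for illustration only.
tree = Node("program", (0, 0), children=[
    Node("function_declaration", (0, 0), children=[
        Node("identifier", (0, 9)),
        Node("ERROR", (0, 12)),
    ]),
])
message = first_syntax_error(tree)
```

Note that this reports where recovery placed the error node, which is not necessarily where the mistake began, and it says nothing about which tokens were expected. That gap is the limitation described above.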
-
Max Brunsfeld <notifications@github.com> writes:
> I converted this to a discussion thread since we're trying to move
> toward using Discussions for these types of conversations. The
> advantage is that they never need to be closed, and they're not mixed
> in with the more traditional "issues" that each capture an actionable
> item.
I don't see how to create a new "Discussion" item. On the tree-sitter GitHub
home page https://github.com/ubolonton/emacs-tree-sitter, the word
"discussion" does not occur, but there is a tab for Issues.
Perhaps the header of the Issues page (and the README on the home page?)
could mention how to create a Discussion instead?
-- Stephe
-
Even though this question has been answered, I wanted to ask a more specific one. Is there an API within tree-sitter that actually returns the content of a node (not just its ranges)? It looks like the answer is 'no': I spent some time in the code, and also looked at the neovim Lua integration at https://github.com/nvim-treesitter/nvim-treesitter/blob/master/lua/nvim-treesitter/ts_utils.lua, which provides a utility to fetch the content from the originating text given the ranges.
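The range-based approach is short enough to write by hand. Here is a minimal sketch, assuming the node reports byte offsets into the original source and that the source is UTF-8; `NodeRange` and `node_text` are hypothetical names made up for this example, not tree-sitter APIs.

```python
from dataclasses import dataclass


@dataclass
class NodeRange:
    """Hypothetical stand-in holding only what a node reports: its byte range."""
    start_byte: int
    end_byte: int


def node_text(source: bytes, node: NodeRange) -> str:
    """Slice the original source with the node's byte offsets and decode it."""
    return source[node.start_byte:node.end_byte].decode("utf-8")


source = "let café = 1;".encode("utf-8")
identifier = NodeRange(start_byte=4, end_byte=9)  # "café" is 5 bytes in UTF-8
text = node_text(source, identifier)
```

Slicing the encoded bytes rather than the decoded string matters as soon as the source contains non-ASCII characters, because byte offsets and character offsets then diverge.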
-
I'm curious, @maxbrunsfeld, do you still believe that this is the case? Or is tree-sitter now a good candidate for a compiler front end?