rewrite the grammar #29
As I understand tree-sitter's behavior, there is no option for manually controlling the parser's state. The parser manages this automatically, and the only way to correctly save the external scanner's state is through serialization, with restoring done through deserialization. At deserialization time the external scanner receives a buffer holding the state that the parser has decided to restore. |
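To make the serialization round trip concrete, here is a minimal sketch in C. It assumes the scanner state is a stack of indentation columns; the `Scanner` struct, `MAX_DEPTH`, and the function names are hypothetical and not taken from this PR. The real tree-sitter hooks have the same shape but take a `void *payload` instead of a typed pointer.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical scanner state: a stack of layout indentation columns. */
#define MAX_DEPTH 64

typedef struct {
    uint16_t indents[MAX_DEPTH];
    uint32_t depth;
} Scanner;

/* Same shape as tree-sitter's `..._external_scanner_serialize` hook:
 * copy the state into `buffer` and return the number of bytes written. */
unsigned scanner_serialize(const Scanner *s, char *buffer) {
    assert(s->depth <= MAX_DEPTH);
    unsigned size = s->depth * sizeof(uint16_t);
    memcpy(buffer, s->indents, size);
    return size;
}

/* Same shape as `..._external_scanner_deserialize`: rebuild the state from
 * whatever buffer the parser hands back. A length of 0 means a fresh start. */
void scanner_deserialize(Scanner *s, const char *buffer, unsigned length) {
    s->depth = length / sizeof(uint16_t);
    assert(s->depth <= MAX_DEPTH);
    if (length > 0) memcpy(s->indents, buffer, length);
}
```

The scanner never chooses which snapshot it gets back: the parser stores the serialized bytes alongside its own state and passes in whichever buffer belongs to the position it resumes from.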
hm, bummer. I looked at how the C grammar handles |
I naively imagine functionality in |
@tek regarding this PR's contents:
$ git fetch origin refs/pull/29/head
...................................................................
* branch refs/pull/29/head -> FETCH_HEAD
$ git log --stat FETCH_HEAD
commit 5b7947c562b9b2b5d844c04351eacc331a6bbb86
structure grammar into multiple files
grammar.js | 1313 +-
grammar/basic.js | 98 +
grammar/class.js | 99 +
grammar/data.js | 147 +
grammar/decl.js | 81 +
grammar/exp.js | 241 +
grammar/id.js | 126 +
grammar/import.js | 30 +
grammar/misc.js | 12 +
grammar/module.js | 37 +
grammar/pat.js | 95 +
grammar/pattern.js | 32 +
grammar/type.js | 201 +
grammar/util.js | 66 +
src/grammar.json | 7391 +++--
src/node-types.json | 17 +-
src/parser.c | 298845 +++++++++++++++++++---------------------------------
17 files changed, 154611 insertions(+), 154220 deletions(-)
commit 62d7ebf592a9491d4d44273fceb1390b85d148b1 (refs/pull/29/head)
rewrite the grammar
...................................................................
63 files changed, 296573 insertions(+), 484667 deletions(-)
With this many changes it's too hard to tell what is an adaptation of the repo to the current tree-sitter version and what is an improvement to the grammar. Also, splitting grammar.js into several files is, in my opinion, a controversial change.
|
@ahlinc it's a complete rewrite, I wouldn't know how to split this according to your suggestion.
Why is it controversial to split the files? |
@tek I'm not a Haskell expert, so if you can post a minimal correct Haskell program with such a preprocessor construction, maybe I'll be able to help you find a way to handle your preprocessor issue. |
{-# language CPP #-}
f :: IO Int
f = do
  _ <- do
    pure ()
#ifdef foo
  pure 1
#else
    pure ()
  pure 1
#endif
@ahlinc this should compile! |
Didn't know that, it's hard to understand in the huge PR.
It's just my opinion; I haven't seen such splits in any tree-sitter grammar repos, and it's easier for me to navigate a single file by word search than several files. Also it seems that |
@tek thanks for the program snippet, I'll take a look today or this weekend. |
I just now split the files after working on the project for a while, because it was quite unergonomic to navigate with your suggested word search, since lots of rules have strings like Why do you suspect this will be problematic with IDEs? What about Typescript? Are you suggesting to use it? |
awesome, thanks! |
At least |
For now I use VSCode as my main editor, and it has good completion in
I think about it as a way to solve my issues with navigation in the |
Is this a general problem or does it relate to this project, specifically the file separation? In the former case, I can't imagine how an IDE would be able to navigate to rules since they are universally referenced as attributes of the
So you would give the |
What do I need to configure for
"tree-sitter": {
  "scope": "source.hs",
  "file-types": [
    "hs"
  ],
  "highlights": [
    "queries/highlights.scm"
  ],
  "injection-regex": "^(hs|haskell)$"
}
but it says |
It might have to do with finding the parser repo itself? https://tree-sitter.github.io/tree-sitter/syntax-highlighting#paths This UX could probably be made smoother. |
@maxbrunsfeld I have "parser-directories": [
"/home/tek/code/tek/js"
], in my config, and the repo path is |
About processing preprocessor directives
I was able to break the mentioned C language parser, https://github.com/tree-sitter/tree-sitter-c, with a simple C program in which a directive splits a function call rule. @tek this looks like the same type of problem that you are trying to solve in this Haskell parser.
#include <stdio.h>
#define FOO
int main(int argc, char const *argv[])
{
#ifdef FOO
    printf
#endif
    ("Hello world!\n");
    return 0;
}
With tree-sitter's current abilities, it looks impossible to me to implement a correct tree-sitter parser in a single tree-sitter language defined by a one What I can suggest with tree-sitter's current abilities is to:
All of the above is just my opinion and my understanding of how tree-sitter works and what limitations it has for now. |
I had the same impression about the C grammar. Thanks for your suggestion with the multi-language feature, I'll try to implement something with that. However, I'd suggest not making this part of this PR. @maxbrunsfeld do you think it would be feasible to add functionality to the scanner API (or grammar) with which a state shift could be transparently enforced in order to reset to a previous preprocessor directive? Any other additional information that might help? |
I think we’re always going to have to treat the preprocessor in an approximate way. It’s ok IMO if certain preprocessor patterns cause parse errors. Hopefully the errors will be recoverable and we’ll get useful results the vast majority of the time. |
@maxbrunsfeld so if the parser would execute a |
those named precedences are quite the game changer! |
question: what is the general policy when one rule can be substituted for another, but is more restrictive? |
@maxbrunsfeld I added that when I was dealing with the preprocessor directives. Since the indentations would have to be reset on an |
@maxbrunsfeld @patrickt invisible tokens are inlined, double vector is removed, I renamed lots of user-facing nodes and all tests green. if you're satisfied, please merge! |
🚀 |
Just FYI, I've been doing squash merges on these grammar repos lately, since they contain generated files, to avoid the repo size growing too fast. Thanks for the awesome work @tek! |
makes sense. it's been a pleasure! |
Huge thanks, @tek! This is a real step forward for the Haskell ecosystem at large, since this is (I think) the only working GHC Haskell parser outside of GHC itself! |
omg 😂 |
@maxbrunsfeld @patrickt Am I supposed to be committing further changes to master? |
thanks! |
Thank you so much @tek for the amazing rewrite! This is huge for the Haskell community, wonderful work 👏 ❤️ |
@rewinfrey very kind, thank you! ❤️ |
@tek Re. master: it’s your call. No one’s consuming this repository as of yet, so I don’t see any huge problem with pushing small fixes directly to master. Bigger features etc. are nice to have as PRs. |
@patrickt sure thing, I was mainly asking whether I'm permitted! |
Yup! I’ve given you |
will do, thanks! |
Hello everyone, I have just found this thread after unsuccessfully trying to use the https://www.npmjs.com/package/tree-sitter-haskell package, which had its latest publish 5 years ago. Would it be possible to publish a new version containing the changes in this PR? Or would this involve additional work that is specific to the npm package? |
@maxbrunsfeld you wanna add my account to that package's maintainers? |
@tek thanks! For anyone landing here: There is also a prebuilt wasm file in this repo. It can be re-built via FYI I built this visualizer: https://felixroos.github.io/haskell-tree-sitter-playground/ |
very nice! |
@felixroos So as per the linked issue above, can we somehow npm install this package these days? Thanks! |
hello 👋 I rewrote the grammar and it's working quite nicely.
There's still some stuff to do, but at this point I'm opening this PR to get some advice on one specific problem that seems impossible to me.
The issue is preprocessor macros, as in:
Here the block inside of the #ifdef ends the current rule, but in the #else it starts inside of that rule again.
I can keep track of the previous state in the scanner, but I don't know how to deal with that for the grammar.
Is there some way to reset the parser to a previous state? I looked at the C API but didn't find anything suitable.
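For illustration, the scanner-side bookkeeping described above could look roughly like the following C sketch. It is an assumption about one possible approach, not the code in this PR: `IndentStack`, `CppState`, and the `on_*` handlers are hypothetical names. It snapshots the layout stack at `#if`/`#ifdef` and restores it at `#else`, which covers only the scanner's half of the problem; the parser's own state is not rewound.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_INDENTS 32
#define MAX_NESTING 8

/* Hypothetical layout stack: indentation columns of the open blocks. */
typedef struct {
    uint16_t cols[MAX_INDENTS];
    uint32_t depth;
} IndentStack;

typedef struct {
    IndentStack current;            /* layout stack in effect right now */
    IndentStack saved[MAX_NESTING]; /* one snapshot per open #if */
    uint32_t nesting;               /* number of open conditionals */
} CppState;

/* `#if` / `#ifdef`: remember the layout stack at the start of the conditional. */
void on_if(CppState *s) {
    assert(s->nesting < MAX_NESTING);
    s->saved[s->nesting++] = s->current;
}

/* `#else` / `#elif`: the new branch starts from the same layout as the first. */
void on_else(CppState *s) {
    assert(s->nesting > 0);
    s->current = s->saved[s->nesting - 1];
}

/* `#endif`: drop the snapshot and keep the layout the last branch ended with. */
void on_endif(CppState *s) {
    assert(s->nesting > 0);
    s->nesting--;
}
```

Even with this in place, the grammar still sees the tokens of both branches in sequence, so the second branch can produce a parse error when it continues a rule that the first branch already closed.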