-
-
Notifications
You must be signed in to change notification settings - Fork 16.1k
tree-sitter-grammars: Enable grammar testing #391520
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: staging
Are you sure you want to change the base?
tree-sitter-grammars: Enable grammar testing #391520
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm surprised the emacs plug-in doesn't seem to do the same as nvim-treesitter
(separately locked grammars, specific to the plug-in).
Should we worry about reporting a grammar as broken "just" because some its tests aren't passing, if it builds and actually parses its inputs? I'm kind of surprised at the number of grammars which turned up broken = true
here.
@@ -76,6 +76,7 @@ let | |||
src = grammar.src or (fetchGrammar grammar); | |||
location = grammar.location or null; | |||
generate = grammar.generate or false; | |||
isBroken = grammar ? isBroken && grammar.isBroken; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
isBroken = grammar ? isBroken && grammar.isBroken; | |
isBroken = grammar ? isBroken or false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
isBroken
is always false
indeed.
isBroken = grammar ? isBroken;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Your suggestion raises an error. I am not sure why...
nix-repl> attr = {a=true; b=false;}
nix-repl> (attr ? c) or false
error: undefined variable 'or'
at «string»:1:1:
1| (attr ? c) or false
| ^
nix-repl> attr.a or false
true
Maybe instead: broken = grammar.broken or false
.
@@ -58,4 +65,9 @@ stdenv.mkDerivation ({ | |||
fi | |||
runHook postInstall | |||
''; | |||
|
|||
meta = { | |||
broken = isBroken; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might as well rename it to broken
and inherit
it here IMO.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
isBroken
is always false (since it is generated by the update script), the name indicates that is flagged as broken.
I don't mind changing it to broken
.
The reason behind this choice is to avoid updating all the JSON files without a new and unnecessary attribute.
...cations/editors/emacs/elisp-packages/manual-packages/tree-sitter-langs/default-grammars.json
Show resolved
Hide resolved
...plications/editors/emacs/elisp-packages/manual-packages/tree-sitter-langs/update-defaults.py
Outdated
Show resolved
Hide resolved
...plications/editors/emacs/elisp-packages/manual-packages/tree-sitter-langs/update-defaults.py
Show resolved
Hide resolved
|
Honestly, I can't say. That's why it is important to have feedback. What I can say (working on this PR for almost a year now), is that the recent bump from tree-sitter 0.24 to 0.25 has increased the number of broken grammar. The lack of testing is probably at cause here. |
Very much appreciate the effort to add some testing. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks fine aside from what had already been said by ambroisie.
if not check_grammar(data): | ||
# Test the grammar with Tree-sitter and set the isBroken flag accordingly | ||
data['broken'] = not check_grammar(data) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this could just use a variable instead of having to run the check twice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wow, good catch. Thank you.
a81917f
to
1439f9f
Compare
We should keep the commit history clean- can we squash the fixups? |
I split everything into atomics commit for the review. Indeed, the goal is to merge, but should I do it know? Please be mindful of my time, git manipulations like that take time and can be error-prone. |
Linked issues (three grammars on which neovim depends fail to build):
|
I manually removed the failing tests:
|
|
I would rather not break entire neovim ecosystem right before a release.
Is it possible to hide these tests somewhere internally? I mean, grammars are not to be packaged, and it is good to find regressions, but it is very bad to stop the updates because of some minor issue (grammars are not that important, but if we notice a huge spike of failures, we should not ship it to users). We could run it as a part of vimPlugins update, but without blocking the whole updates. Just diff the failures and resolve them separately |
@@ -507,6 +510,7 @@ let | |||
curl = "${curl}/bin/curl"; | |||
nix-prefetch-git = "${nix-prefetch-git}/bin/nix-prefetch-git"; | |||
printf = "${coreutils}/bin/printf"; | |||
tree-sitter = "${lib.getExe tree-sitter}"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no need for the quotes
The absence of versioning in the treesitter ecosystem is maddening. Running some tests help maintainers. I wont merge this since I can't maintain my tree-sitter ecosystem for more than a week so I dont feel qualified but merging this now, at the beginning of a nixos cycle is better than closer to 25.11. |
4b2ba95
to
727a4e2
Compare
I don't want to block this, just want some clarity - what will be the effects of this PR on consumers? I went through and made a list of the broken grammars after this:
Is the intention to fix some of the more high profile ones in staging, or just push this out quickly to unstable, and hope that upstream issues are resolved? I worry that some of the more high-profile grammars being marked as broken based on failing tests (especially if they seem to be working fine in practice) will leave their consumers to come into this PR to yell at us for merging this. Not against this PR, to be clear! I just want to mention this kind of thing before we get an angry mob mentioning it after-the-fact. |
I proposed to @adfaure to make the tests optional and provide a visible way how to run them regardless. Or maybe turn tests on by default but make it possible to ignore them, something like that. |
Hi, I agree with your concerns. As @fricklerhandwerk stated, I will try to find a way to add tests without everything's falling apart. I also want to mention, that when we opened this PR, tree-sitter 25 was just released. My first step tomorrow will be to update the grammars and check if there is any improvement. |
Lists in Nix allows for any element to have a different type, correct? My idea is to use an attribute value with key |
for g in grammars: | ||
exists = check_grammar_exists(nixpkgs, g) | ||
if exists: | ||
existing.append(fmt_grammar(g)) | ||
broken = check_grammar_broken(nixpkgs, g) | ||
if not broken: | ||
existing.append(fmt_grammar(g)) | ||
else: | ||
sys.stderr.write("Grammar is broken: " + fmt_grammar(g) + "\n") | ||
sys.stderr.flush() | ||
else: | ||
sys.stderr.write("Missing grammar: " + fmt_grammar(g) + "\n") | ||
sys.stderr.flush() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I propose a rewrite:
for g in grammars:
grammar = fmt_grammar(g)
if not check_grammar_exists(nixpkgs, g):
print(f"Missing grammar: {grammar}", file=sys.stderr)
continue
if check_grammar_broken(nixpkgs, g):
print(f"Broken grammar: {grammar}", file=sys.stderr)
continue
existing.append(grammar)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi, thanks.
This script is only for Emacs plugins, so I think that would add the grammars tagged as broken back into the emacs ecosystem.
Which is something that I might have forgotten to handle without your feedback.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the intention to fix some of the more high profile ones in staging, or just push this out quickly to unstable, and hope that upstream issues are resolved?
We can't break a build even for a single grammar, because many use nvim-treesitter.withAllGrammars
. Even if we mark some grammars as broken and exclude it from withAllGrammars
, we will start to see bug reports like "grammar X is not found", which basically breaks workflow for the end user as it is an extremely annoying error (you have to press enter on each cursor move)
@PerchunPak completely agree not breaking things, even temporarilly, is important (in general). It's worth noting that From what I've seen working on #408414 the current state of grammar sources is not... great. That PR is ready for review |
I just wanted to stress that breaking even a single grammar is unacceptable
Nope, |
Hi, I understand the problem. First, I want to mention, is that this PR has been released shortly after a new major version of tree sitter (24 to 25), leading to many broken grammars. Currently, I am trying to update grammars to see if there is any improvement (I believe there will). However, I am unable to build the updating scripts since it takes forever to build on my laptop The second changes proposed by @fricklerhandwerk (as an answer to #391520 (comment)) is instead of having a @kimburgess your help will be greatly appreciated indeed if your PR is merged before, thanks ! |
Description of changes
Context
While working on #320783 with @fricklerhandwerk. We used
tree-sitter-grammars
to build Python bindings, we explored ways to test these grammars. We found that the best approach is to use Tree-sitter's dedicated testing framework (documentation).This PR introduces a check phase using
tree-sitter test
.Changes
Activate Grammar Testing
Adds a test phase and an
isBroken
attribute.Runs the tests during the update phase and sets the
isBroken
attribute if the tests fail.Update the Emacs Script
Emacs has an automated mechanism for updating grammars. This script is updated to use the
isBroken
flag.Things done
nix.conf
? (See Nix manual)sandbox = relaxed
sandbox = true
nix-shell -p nixpkgs-review --run "nixpkgs-review rev HEAD"
. Note: all changes have to be committed, also see nixpkgs-review usage./result/bin/
)Add a 👍 reaction to pull requests you find important.