chore(master): release 1.0.0 #19
Testing this here since it's a small grammar (after testing locally on my own grammars for a while). Things to do for each grammar:
Here's to having proper release cycles with tree-sitter grammars! |
🤖 Release is at https://github.com/tree-sitter/tree-sitter-regex/releases/tag/v1.0.0 🌻 |
That being said, @maxbrunsfeld, I think there isn't an NPM_TOKEN secret set up in the org, only a CARGO_REGISTRY_TOKEN. If you could set up the npm token, that'd be great! |
@amaanq are you sure it should be Maybe it would be better to discuss the versioning policy first? For all these years up until now, grammars have followed the principle that their version never goes higher than the second part of the main repo's version, which signals that they should be renewed when something important happens in the main repo, like a change in the language ABI version. |
That would be really confusing for downstream projects, and doesn't make sense for the vast majority of languages that don't have versions. From the point of view of downstream, the most important information about a parser is whether an update requires query changes (since the vast majority of queries are maintained downstream). This is the point of using semver, which tells you at a glance whether an update is safe to pull in or would lead to breaking changes. Whether you start that at 1.0 or not is secondary, but a) these parsers are stable enough to merit 1.0.0, and b) that bump is a clear signal to everyone that the versioning scheme has changed. On the other hand, compatibility with the tree-sitter library and CLI is already fully specified by the ABI version, so encoding this in the versioning scheme is redundant. This has become even less relevant now that tree-sitter supports multiple ABI versions in parallel, both for generating and consuming parsers. It's also a scheme that has never been followed outside the org, where the vast majority of parsers live (nvim-treesitter is close to the 200 mark now), some of which already follow semver. |
It seems no one really cares about the versioning policy in all its details yet. This question wasn't raised in the TS repo, and there have been no official suggestions for it.
It would be good to clearly define what every semver part would mean for grammars and when every part would be increased.
It's not the rule; it happens from time to time, when it's possible to keep compatibility with several older versions without increasing the library's logic complexity.
So there is no real difference between starting from 1 or from 14 for the major part and increasing it in sync with the language version. For the current grammars distribution model where it's required that The situation with grammars is that once they are distributed in a portable binary form via release artifacts (I hope this happens), it will be possible to generate grammar libraries for a broad range of language versions. As far as I know, the official VSCode + Tree-sitter integration got stuck on the language versions problem, because compatibility lagging in both directions between the integrated library and the grammars is not acceptable for them. @jasonwilliams may know more about the current situation. The version policy for grammars may look like this. In semver X.Y.Z, X (the major version) is in sync with the latest language version. Y (the minor version) changes:
Z (the patch version) changes on every change that doesn't have an impact on a grammar's tree shape, like:
|
I already gave my suggestion in nvim-treesitter/nvim-treesitter#3944 (comment), which admittedly is driven by downstream (editor using tree-sitter parsers and queries as "runtime files") requirements. The important difference to your strategy is that anything that breaks queries is a breaking change and requires a major version bump. Only changes that add new nodes are minor. |
It's not very hard to implement automation by tracking changes to an external scanner and what actually changed in the grammar by processing |
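One way to picture this kind of automation, combined with the query-centric policy from the previous comment, is to diff the `src/node-types.json` file that `tree-sitter generate` writes, between two versions of a grammar. This is a hypothetical, deliberately coarse sketch (`suggest_bump` is not an existing tool): it treats a removed node kind or field as breaking (major), purely added ones as minor, and anything else as patch. It ignores changes to which children or types a node may contain and changes to an external scanner, so it can under-report breakage.

```python
import json


def _signatures(node_types_json):
    """Collect (type, named) node kinds and (type, field) pairs from node-types.json text."""
    nodes = set()
    fields = set()
    for entry in json.loads(node_types_json):
        nodes.add((entry["type"], entry["named"]))
        # Only some entries carry a "fields" object; default to none.
        for field_name in entry.get("fields", {}):
            fields.add((entry["type"], field_name))
    return nodes, fields


def suggest_bump(old_json, new_json):
    """Suggest 'major', 'minor' or 'patch' by comparing two node-types.json files.

    Heuristic only: a removed node kind or field can invalidate existing
    queries (major); only additions are treated as minor; otherwise patch.
    """
    old_nodes, old_fields = _signatures(old_json)
    new_nodes, new_fields = _signatures(new_json)
    if (old_nodes - new_nodes) or (old_fields - new_fields):
        return "major"
    if (new_nodes - old_nodes) or (new_fields - old_fields):
        return "minor"
    return "patch"
```

For example, adding a hypothetical named node `new_node` yields `"minor"`, while going back from that version to the previous one yields `"major"`.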
I think enforcing conventional commits ( But you know better, of course. If it's possible to robustly determine this automatically from parsing |
But the main point is that you cannot at the same time track ABI compatibility and downstream (query) breaking changes with the major version. (And I think the latter is more useful, and aligns better with the intended meaning of semver -- that would mean that tree-sitter itself should have ABI version as its major version, not parsers.) |
I think it's enough to track all breaking changes for queries in the |
No it's not -- not for us. We need to distinguish between actually breaking changes (hard errors for our users) and changes that just change behavior or allow new features. Semver is very clear about the use of major version bumps: signal to downstream that an upgrade is not safe. And the chain here is also clear: tree-sitter lib -> parser -> query -> user What you describe is not semver but something else that may make more sense for you, but isn't as useful for downstream. (In addition, ABI version is available to introspection, while grammar changes are not -- so it seems suboptimal to "waste" a version field on that.) |
Let me be clear here: the lack of semantic versioning of parsers is a major pain point in maintaining tree-sitter support in Neovim (and I expect in similar projects like Helix and Emacs as well). And at the risk of being ungrateful (I just have to vent), it often feels like we're fighting against tree-sitter to make our use case work, a use case that upstream doesn't really want to support or care about... |
I would say that the vast majority of changes to grammars affect the shape of the syntax tree, so they could require query changes. There are occasionally changes that fix some bug without affecting the shape of any syntax trees, but that is pretty rare. Even changes to hidden rules can break queries. It depends on the query. @clason Consumers of grammars (like neovim) that maintain separate queries need to pin to a specific version of a grammar. A SHA1 should work just fine, as should a specific version tag. I agree that we should start publishing tags (with corresponding releases to crates.io) more consistently, but it doesn't really matter that much for consumers whether they pin to a git SHA1, or a tag. Consumers should never just use HEAD, and expect a random query to keep working. That makes no sense. |
Yeah, Neovim does pin to commits, but following semver and semantic commit messages would make it clearer when there are actually breaking changes, and easier for downstream to know when to adapt/update queries. That being said, another big question from our chats, which I'm sure some people are wondering about, is whether the versions should be coupled to tree-sitter core or be independent. If they're independent, I'd say starting from 1.0.0 makes sense and would signal stability and healthy release cycles from here on. That said, I wouldn't mind continuing from 0.20.x, but it could be a bit confusing for consumers. |
I don't want these grammars to be versioned 1.0 right now. We'll make that transition at some point, and we'll also bump the Tree-sitter library and CLIs to 1.0. But we're not quite there yet. |
@maxbrunsfeld I'm sorry, but that is not the reality for us. (Recall that we deal with 200 grammars rather than just the ~20 maintained by the tree-sitter org!) In practice, there's about an equal chance that a commit is a bugfix without capture changes, an update that adds new captures, or a breaking change. We can and do of course catch breaking changes in our CI before we bump, but apart from that we're left guessing. Semver would be a _huge_ help, since we could both tune automation and immediately know where manual intervention is required. |
(whether that starts at 1.0 or 0.0.1-pre is of no importance to me.) As amaanq is saying, for us the big question is whether and why individual parsers need to follow tree-sitter core's versions. I understand why the latter is not 1.0 (you made that clear in the tracking issue), but not why the parsers shouldn't be. |
But a bugfix for your query might actually break some other theoretical query that some other consumer has crafted. |
No, a bugfix for the parser by definition will not break the query. I do not care about the CST changing; I trust grammar maintainers that any such change will be an improvement. I only care about changes that completely break syntax highlighting (since an invalid capture will turn a buffer into a red wall of errors). |
Digressing a bit, @maxbrunsfeld, but here's what automated releases look like w/ release-please: #21 (bumping this grammar to 0.20.0, adding a changelog, and auto-publishing to crates + npm once that PR is merged) |
CSTs are all that these repos provide.
You shouldn't need to deal with broken highlighting. The cool thing is (as you may know) checking whether your query is compatible with a given language can be done without ever running the query, because impossible queries are caught at instantiation time. |
Which is great! As long as they're valid, we're happy to use them. Let me reiterate: if the same text produces a different CST, that is not a "breaking" change in our view.
And yet we do. (If we can't highlight at all because the queries have become incompatible with the parser, that is "broken highlighting".) Since queries and parsers are managed by different people, mismatches are the rule rather than the exception and have to be managed by us. I acknowledge that we (Neovim) are using tree-sitter not in the way it was originally designed to be used; I am honestly asking whether that way is welcome, or a nuisance. |
No, Tree-sitter and the queries are designed for this use case. That's why queries are checked statically, so that you can avoid ever shipping an app where the queries and the parser don't match. It's even better than a version number, because you don't have to trust it, you can programmatically verify it. |
Don't worry, it's good that we argue a little, in a good sense; it helps us cover and dig into the details more precisely. A few more thoughts about semver:
Do you have a plan for convincing all grammar authors to always follow semver? Are you sure that all authors will always correctly interpret the severity of their changes and bump the right version segments without mistakes? I'm convinced that interpreting the severity of changes at a scale as broad as ~200 grammars is a job for a tool developed specifically for this need, because humans make mistakes from time to time. |
Yes, but we don't ship a (monolithic) app. That's the whole point: queries (and parsers) are dynamic runtime files that can be updated on the fly. |
Yeah, I really do understand what |
I agree; this is a very important discussion to have, so I'm kinda forcing it now.
That's why I've been pushing a concrete interpretation across the ecosystem (that works back from the point of view of query authors).
Yes. Require semver for that grammar to be considered "stable" and amenable to auto-installing, auto-updating, etc. Mistakes will happen, of course, but if they become the exception rather than the rule, this will significantly improve things for me. |
Yes, I do. But if a (query, parser) combination is invalid, it cannot be used for highlighting (or whatever purpose the query serves), so highlighting (or whatever) will not be possible if that combination is found. Now the question is how to deal with that. We could just stop updating parsers, that's true, and let the current state be frozen. But I think we can do better, and I hoped my suggestion was suitable for that. |
You don't need to not upgrade. You just need to upgrade to a specific version, and when you do that, you make sure that the queries, which are checked into the repo, are compatible with that version. |
Yes, and my suggestion was meant exactly to facilitate that. (Since the current state is not sustainable, and we're still a ways off from the goal of parity with the ~700 syntax files Vim ships with.) Of course we have CI that prevents updating a parser if it is no longer compatible with the current queries, but every time that happens (one out of 5-10 that are updated), this requires manual intervention from me. (And it can't catch queries that are not managed by me but in other plugins.) Having semantic versioning that indicates that before updating allows a dependabot-style strategy that only makes "safe" updates and leaves breaking updates for manual interaction at the maintainers' leisure, and allows me to mark the parser update (and hence the nvim-treesitter update) as potentially breaking for downstream plugins. |
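Assuming parsers publish plain `X.Y.Z` tags (optionally prefixed with `v`), the dependabot-style rule described here reduces to picking the newest tag that does not cross a breaking boundary. This is a minimal sketch, not an existing nvim-treesitter tool: `safe_update` and its helpers are hypothetical names, and pre-release or build suffixes are not handled. For `0.y.z` versions, where the semver spec itself promises nothing, it mirrors Cargo's default caret rule.

```python
from typing import Iterable, Optional, Tuple

Version = Tuple[int, int, int]


def parse_version(text: str) -> Version:
    """Parse 'X.Y.Z' or 'vX.Y.Z' into a comparable tuple."""
    major, minor, patch = text.lstrip("v").split(".")
    return int(major), int(minor), int(patch)


def is_compatible(current: Version, candidate: Version) -> bool:
    """True if candidate is a newer, non-breaking version of current."""
    if candidate <= current:
        return False
    if current[0] > 0:
        # 1.0.0 and above: same major means no breaking changes.
        return candidate[0] == current[0]
    if current[1] > 0:
        # 0.y.z with y > 0: treat a minor bump as breaking (Cargo convention).
        return candidate[0] == 0 and candidate[1] == current[1]
    # 0.0.z: treat every change as potentially breaking.
    return False


def safe_update(current: str, available: Iterable[str]) -> Optional[str]:
    """Pick the newest available tag that is a safe upgrade, or None."""
    cur = parse_version(current)
    compatible = [tag for tag in available if is_compatible(cur, parse_version(tag))]
    return max(compatible, key=parse_version, default=None)
```

For example, `safe_update("1.0.0", ["v1.0.1", "v1.2.0", "v2.0.0"])` returns `"v1.2.0"`; `v2.0.0` is left for manual review, which is also the point at which a downstream update could be flagged as potentially breaking.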
@ahlinc's comment covers a lot of my concerns. @amaanq I'm ok with the
I like that with ☝️ this approach, the publishing is just driven by pushing a tag. Curious about your thoughts on that. |
That's fine; I feel I've made my point clearly enough now. I accept that you have different (incompatible) goals for your parser versioning and will think about how to best work around that. (I'll definitely follow our semver strategy with the parsers in the Neovim sphere of influence, of course.) |
Okay, looking at the whole situation (~200 grammars, ~700 syntax files, a lot of grammar authors, grammar complexity, no automatic tools that would provide a short summary of changes between any two versions), I think I get your point, @clason: you want to convince all grammar authors to follow the standard, well-known semver model without any sub-interpretations of its meaning, because the community is pretty huge and only maximally simple rules would work well. I also suspect you don't actually care about the specific semver segment values at all; what's important is that the proper segments increase monotonically according to the severity of changes. I think I can agree with your arguments and points. But I'd like to add my 5 cents: semver allows a pretty free-form tail, and I'd like to suggest adding the language ABI version to the semver in a format like P.S: I mostly want to resolve the ABI version visibility issue, because it has been a pretty hidden thing for years. |
Off-topic note: this approach is widely used, but it was a catastrophe for the main Tree-sitter repo, and the whole CI was rebuilt around one idea: the tag should appear automatically, and only at the latest moment, when nothing critical can fail in the release process. |
That's a doable option. The only issue is that when pushing tags, we must ensure some sort of CI runs beforehand to check that package.json and Cargo.toml have been updated to match the tagged version and that tests pass; otherwise a broken version could be published. I can work on a solution that implements these checks. That said, another benefit of release-please and semantic commits is that the CHANGELOG.md file can easily be updated in a clean manner, though as you mentioned, the downside is a stricter rule on commit messages. I don't really mind; it's mostly preference (though it tacks on a slight bit of work to update the manifests). |
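A minimal sketch of the pre-publish consistency check mentioned here, assuming the pushed tag looks like `v1.0.0` (as in this release) and that `Cargo.toml` declares `version` directly in its `[package]` table. `check_release` and `cargo_package_version` are hypothetical helpers; a real workflow would also run the grammar's tests and fail the job when the returned list is non-empty. The Cargo.toml handling is a line-based approximation, not a TOML parser (it does not understand `version.workspace = true`, for instance).

```python
import json
import re
from typing import List, Optional


def cargo_package_version(cargo_toml: str) -> Optional[str]:
    """Return the `version` value from the [package] table (line-based approximation)."""
    in_package = False
    for line in cargo_toml.splitlines():
        stripped = line.strip()
        if stripped.startswith("["):
            # Entering a new table; only [package] is of interest.
            in_package = stripped == "[package]"
            continue
        match = re.match(r'version\s*=\s*"([^"]+)"', stripped)
        if in_package and match:
            return match.group(1)
    return None


def check_release(tag: str, package_json: str, cargo_toml: str) -> List[str]:
    """List mismatches between a pushed tag like 'v1.0.0' and both manifests."""
    expected = tag[1:] if tag.startswith("v") else tag
    problems = []
    npm_version = json.loads(package_json).get("version")
    if npm_version != expected:
        problems.append(f"package.json has {npm_version!r}, tag says {expected!r}")
    cargo_version = cargo_package_version(cargo_toml)
    if cargo_version != expected:
        problems.append(f"Cargo.toml has {cargo_version!r}, tag says {expected!r}")
    return problems
```

Running the check before publishing, rather than after, keeps a mismatched tag from ever reaching crates.io or npm; it does not by itself address the concern that the tag is created too early.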
I came across this via tree-sitter/tree-sitter-javascript#294. As an fyi, I've moved some of the commentary up to the tree-sitter discussion on the topic: tree-sitter/tree-sitter#1768 (comment) |
🤖 I have created a release beep boop
1.0.0 (2023-07-15)
Features
Bug Fixes
This PR was generated with Release Please. See documentation.