chore(master): release 1.0.0 #19

Merged


github-actions[bot] (Contributor) commented Jul 15, 2023

馃 I have created a release beep boop

1.0.0 (2023-07-15)

Features

Bug Fixes

  • ci: bump action and node versions (7119b8f)

This PR was generated with Release Please. See documentation.

github-actions[bot] force-pushed the release-please--branches--master--components--tree-sitter-regex branch from db918c7 to fcd00ba (July 15, 2023 06:28)
@amaanq force-pushed the release-please--branches--master--components--tree-sitter-regex branch from fcd00ba to 2f69d76 (July 15, 2023 06:30)
@amaanq (Member) commented Jul 15, 2023

Testing this here since it's a small grammar (after testing locally on my own grammars for a while).

Things to do for each grammar:

  • add the release-please action (it automates creating a release PR, and publishing once that PR is merged)
  • force-push the first PR to 1.0.0: release-please will otherwise try to bump the existing version incrementally, but ideally we now want to decouple from tree-sitter versions and have each grammar maintain its own version
  • add bindings/rust/README.md to grammars that don't have one

Here's to having proper release cycles with tree-sitter grammars!

github-actions[bot] force-pushed the release-please--branches--master--components--tree-sitter-regex branch from 2f69d76 to 04e4f2c (July 15, 2023 06:33)
@amaanq changed the title from "chore(master): release 0.20.0" to "chore(master): release 1.0.0" (Jul 15, 2023)
@amaanq merged commit 17a3293 into master (Jul 15, 2023)
@amaanq (Member) commented Jul 15, 2023

That being said, @maxbrunsfeld, I think there's no NPM_TOKEN secret set up in the org, only a CARGO_REGISTRY_TOKEN.

If you could set up the npm token, that'd be great!

@ahlinc commented Jul 15, 2023

@amaanq, are you sure the release version should be 1.0.0?

Maybe it would be better to discuss a versioning policy first?

Until now, grammars have followed the principle that their version never runs ahead of the second (minor) part of the main repo's version, which signals that they should be updated when something important happens in the main repo, like a change in the language ABI version.
Maybe it makes sense to switch grammar versioning to a scheme where the grammar's major version equals the latest language version, because that's what matters most for grammars.

@clason commented Jul 15, 2023

Maybe it makes sense to switch grammar versioning to a scheme where the grammar's major version equals the latest language version, because that's what matters most for grammars.

That would be really confusing for downstream projects, and doesn't make sense for the vast majority of languages that don't have versions.

From the point of view of downstream, the most important information about a parser is whether an update requires query changes (since the vast majority of queries are maintained downstream). This is the point of using semver, which tells you at a glance whether an update is safe to pull in or would lead to breaking changes. Whether you start that at 1.0 or not is secondary, but a) these parsers are stable enough to merit 1.0.0, and b) that bump is a clear signal to everyone that the versioning scheme has changed.

On the other hand, compatibility with the tree-sitter library and CLI is already fully specified with the ABI version, so encoding this in the versioning scheme is redundant. This has become even less relevant now that tree-sitter supports multiple ABI in parallel, both for generating and consuming parsers. It's also a scheme that has never been followed outside the org, where the vast majority of parsers live (nvim-treesitter is close to the 200 mark now), some of which are already following semver.

@ahlinc commented Jul 15, 2023

It's also a scheme that has never been followed outside the org

It seems no one has really worked out a versioning policy in detail yet. This question hasn't been raised in the TS repo, and there have been no official suggestions for it.

Whether you start that at 1.0 or not is secondary

It would be good to clearly define what each semver part means for grammars and when each part should be increased.

This has become even less relevant now that tree-sitter supports multiple ABI in parallel

That's not the rule; it happens from time to time, when it's possible to keep compatibility with several older versions without adding complexity to the library logic.
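To make the compatibility window concrete: a tree-sitter library build loads parsers whose language (ABI) version falls within a fixed range, bounded in the C API header by TREE_SITTER_MIN_COMPATIBLE_LANGUAGE_VERSION and TREE_SITTER_LANGUAGE_VERSION. The sketch below assumes a window of 13 to 14 purely for illustration; `can_load` is a hypothetical helper, not part of any tree-sitter API.

```python
# Hypothetical illustration of the ABI window a tree-sitter library build
# accepts. The values 13 and 14 are assumed for this example only.
LIB_MIN_ABI = 13  # stands in for TREE_SITTER_MIN_COMPATIBLE_LANGUAGE_VERSION
LIB_MAX_ABI = 14  # stands in for TREE_SITTER_LANGUAGE_VERSION


def can_load(parser_abi: int, lib_min: int = LIB_MIN_ABI, lib_max: int = LIB_MAX_ABI) -> bool:
    """Return True if a parser generated for `parser_abi` lies inside the
    library's accepted ABI window (both ends inclusive)."""
    return lib_min <= parser_abi <= lib_max


# A parser.c generated for ABI 12 is too old for this build, and one
# generated for ABI 15 is too new; both would be rejected.
for abi in (12, 13, 14, 15):
    print(abi, can_load(abi))
```

Supporting "multiple ABI in parallel" just means the window spans more than one version; how wide it can be is a trade-off against library complexity, as described above.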

nvim-treesitter is close to the 200 mark now
That would be really confusing for downstream projects, and doesn't make sense for the vast majority of languages that don't have versions.

So there is no real difference between starting the major part from 1 or from 14 and increasing it in sync with the language version.

Under the current grammar distribution model, parser.c must exist in the repo, and because it's generated for one specific language version, it remains unclear which language versions are compatible with the library version a user has (yes, maybe tree-sitter itself should start its stable releases not at v1 but at v14 too).

Once grammars are distributed in a portable binary form as release artifacts (I hope this happens), it will be possible to generate grammar libraries for a broad range of language versions.

As far as I know, the official VSCode + Tree-sitter integration is stuck on the language version problem, because compatibility lagging in both directions between the integrated library and the grammars is not acceptable for them. @jasonwilliams may know more about the current situation.


A version policy for grammars could look like this:

The semver X.Y.Z would be incremented as follows:

X - major version is in sync with the latest language version.

Y - minor version changes:

  • When a change in grammar.js changes the grammar's tree shape, like:
    • Some rule was renamed, deleted or added.
    • Some field name was renamed, deleted or added.
    • Some supertype was renamed, deleted or added.
    • Some anonymous node (specified as a string literal or an alias to one) was changed, deleted or added.
    • A change in the rules' structure changes the resulting parse tree.

Z - patch version changes on any change that has no impact on the grammar's tree shape, like:

  • Any changes and improvements in an external scanner.
  • Any changes in regex defined tokens.
  • Any changes to underscored (hidden) rules.

/cc: @amaanq @clason
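A minimal sketch of the policy above, assuming each change has already been classified into one of the listed kinds. `Change` and `next_version` are hypothetical names. The proposal doesn't say what happens to Y and Z when X moves to a new language version; resetting them to 0 is an assumption here.

```python
from enum import Enum, auto


class Change(Enum):
    """Kinds of grammar changes named in the proposed policy."""
    # Tree-shape changes -> minor (Y) bump
    RULE = auto()            # rule renamed, deleted or added
    FIELD = auto()           # field name renamed, deleted or added
    SUPERTYPE = auto()       # supertype renamed, deleted or added
    ANONYMOUS_NODE = auto()  # string literal / alias changed, deleted or added
    STRUCTURE = auto()       # rule restructuring that alters the parse tree
    # No tree-shape impact -> patch (Z) bump
    EXTERNAL_SCANNER = auto()
    REGEX_TOKEN = auto()
    HIDDEN_RULE = auto()     # changes to _underscored rules


TREE_SHAPE = {Change.RULE, Change.FIELD, Change.SUPERTYPE,
              Change.ANONYMOUS_NODE, Change.STRUCTURE}


def next_version(current: str, language_abi: int, changes: set) -> str:
    """Compute the next X.Y.Z under the proposed policy, where X tracks the
    latest language ABI version."""
    major, minor, patch = (int(part) for part in current.split("."))
    if language_abi != major:
        # Assumption: when X moves to a new ABI, Y and Z restart at 0.
        return f"{language_abi}.0.0"
    if changes & TREE_SHAPE:
        return f"{major}.{minor + 1}.0"
    if changes:
        return f"{major}.{minor}.{patch + 1}"
    return current  # nothing changed, nothing to release


print(next_version("14.2.3", 14, {Change.FIELD}))             # a field rename
print(next_version("14.2.3", 14, {Change.EXTERNAL_SCANNER}))  # a scanner fix
```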

@clason commented Jul 15, 2023

I already gave my suggestion in nvim-treesitter/nvim-treesitter#3944 (comment), which admittedly is driven by downstream (editor using tree-sitter parsers and queries as "runtime files") requirements.

The important difference to your strategy is that anything that breaks queries is a breaking change and requires a major version bump. Only changes that add new nodes are minor.
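One way to operationalize "anything that breaks queries is major, only additions are minor" is to compare the sets of names a query can reference (node types and field names) between two parser versions: a removed or renamed name can make an existing query invalid, while an added one cannot. This is a simplified sketch with hypothetical names (the "field:" prefix is just a convention of this example); it ignores structural changes that keep every name but change what a pattern matches.

```python
def classify_bump(old_names: set, new_names: set) -> str:
    """Suggest a semver bump from the names (node types, field names)
    that queries can reference in the old and new parser versions."""
    removed = old_names - new_names
    added = new_names - old_names
    if removed:
        # A query referencing a removed or renamed name will fail to compile.
        return "major"
    if added:
        # New nodes/fields enable new captures but break no existing query.
        return "minor"
    return "patch"


old = {"pair", "word", "field:key", "field:value"}
print(classify_bump(old, old | {"comment"}))                         # minor
print(classify_bump(old, (old - {"field:value"}) | {"field:val"}))  # major
```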

@ahlinc commented Jul 15, 2023

It wouldn't be very hard to build automation that tracks changes to the external scanner and, by processing grammar.json, what actually changed in the grammar, and then advises a maintainer (or an automated release tool) which parts of the semver need to be increased.
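A rough sketch of the grammar.json side of such a tool. It assumes the usual grammar.json layout: a top-level "rules" object whose bodies are nested nodes with a "type" key, where FIELD nodes carry a "name", ALIAS nodes a "value", and STRING nodes a literal "value", plus an optional "supertypes" list. It reports which query-visible names were removed or added between two versions; the external scanner would need a separate check (e.g. diffing its source). All function names are hypothetical.

```python
import json
from typing import Any, Dict, Set, Tuple


def _collect(node: Any, out: Dict[str, Set[str]]) -> None:
    """Recursively gather field names, alias names and string literals."""
    if isinstance(node, dict):
        kind = node.get("type")
        if kind == "FIELD":
            out["fields"].add(node["name"])
        elif kind == "ALIAS":
            out["aliases"].add(node["value"])
        elif kind == "STRING":
            out["strings"].add(node["value"])
        for value in node.values():
            _collect(value, out)
    elif isinstance(node, list):
        for item in node:
            _collect(item, out)


def query_surface(grammar_json: str) -> Dict[str, Set[str]]:
    """Names a query could reference, grouped by kind."""
    grammar = json.loads(grammar_json)
    rules = grammar["rules"]
    out: Dict[str, Set[str]] = {
        # Rules starting with "_" are hidden and never appear as nodes.
        "rules": {name for name in rules if not name.startswith("_")},
        "fields": set(),
        "aliases": set(),
        "strings": set(),
        # Supertypes may be serialized as plain names or as SYMBOL objects.
        "supertypes": {
            s if isinstance(s, str) else s.get("name")
            for s in grammar.get("supertypes", [])
        },
    }
    for body in rules.values():
        _collect(body, out)
    return out


def diff_grammars(old_json: str, new_json: str) -> Dict[str, Tuple[Set[str], Set[str]]]:
    """Map each kind to (removed, added) names; unchanged kinds are omitted."""
    old, new = query_surface(old_json), query_surface(new_json)
    report = {}
    for kind in old:
        removed, added = old[kind] - new[kind], new[kind] - old[kind]
        if removed or added:
            report[kind] = (removed, added)
    return report
```

Either versioning policy discussed in this thread could then be applied to the report, e.g. by treating any removal as query-breaking.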

@clason commented Jul 15, 2023

I think enforcing conventional commits (fix, feat, feat!) is more robust. And yes, documentation (e.g., in a PR template) is necessary.

But you know better, of course. If it's possible to robustly determine this automatically from parsing grammar.json, so much the better.
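The conventional-commits approach maps commit types to bumps: feat → minor, fix → patch, and a "!" after the type (or a "BREAKING CHANGE:" footer, per the Conventional Commits spec) → major. A minimal sketch of that mapping; it is not release-please's actual implementation, and other commit types (chore, ci, docs, ...) are assumed to trigger no release on their own.

```python
import re
from typing import Iterable, Optional

# "<type>[(scope)][!]: <description>" as in the Conventional Commits spec
HEADER = re.compile(r"^(?P<type>[a-z]+)(?:\([^)]*\))?(?P<bang>!)?: \S")
BREAKING_FOOTER = re.compile(r"^BREAKING[ -]CHANGE: ", re.MULTILINE)
RANK = {None: 0, "patch": 1, "minor": 2, "major": 3}


def bump_for_commit(message: str) -> Optional[str]:
    """Return "major", "minor", "patch", or None for one commit message."""
    header, _, body = message.partition("\n")
    match = HEADER.match(header)
    if not match:
        return None  # not a conventional commit; ignored here
    if match.group("bang") or BREAKING_FOOTER.search(body):
        return "major"
    return {"feat": "minor", "fix": "patch"}.get(match.group("type"))


def bump_for_release(messages: Iterable[str]) -> Optional[str]:
    """The release bump is the most severe bump among its commits."""
    return max((bump_for_commit(m) for m in messages), key=RANK.__getitem__, default=None)
```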

@clason commented Jul 15, 2023

But the main point is that you cannot at the same time track ABI compatibility and downstream (query) breaking changes with the major version. (And I think the latter is more useful, and aligns better with the intended meaning of semver -- that would mean that tree-sitter itself should have ABI version as its major version, not parsers.)

@ahlinc commented Jul 15, 2023

I think it's enough to track all query-breaking changes at the Y (minor) level, as I described above.

@clason commented Jul 15, 2023

No it's not -- not for us. We need to distinguish between actually breaking changes (hard errors for our users) and changes that just change behavior or allow new features.

Semver is very clear about the use of major version bumps: signal to downstream that an upgrade is not safe. And the chain here is also clear:

tree-sitter lib -> parser -> query -> user

What you describe is not semver but something else that may make more sense for you, but isn't as useful for downstream.

(In addition, ABI version is available to introspection, while grammar changes are not -- so it seems suboptimal to "waste" a version field on that.)

@clason commented Jul 15, 2023

Let me be clear here: the lack of semantic versioning of parsers is a major pain point in maintaining tree-sitter support in Neovim (and I expect in similar projects like Helix and Emacs as well).

And at the risk of being ungrateful (I just have to vent), it often feels like we're fighting against tree-sitter to make our use case work, one that upstream doesn't really want to support or care about...

@maxbrunsfeld (Contributor) commented Jul 17, 2023

I would say that the vast majority of changes to grammars affect the shape of the syntax tree, so they could require query changes. There are occasionally changes that fix some bug without affecting the shape of any syntax trees, but that is pretty rare. Even changes to hidden rules can break queries. It depends on the query.

@clason Consumers of grammars (like neovim) that maintain separate queries need to pin to a specific version of a grammar. A SHA1 should work just fine, as should a specific version tag. I agree that we should start publishing tags (with corresponding releases to crates.io) more consistently, but it doesn't really matter that much for consumers whether they pin to a git SHA1, or a tag.

Consumers should never just use HEAD, and expect a random query to keep working. That makes no sense.

@amaanq (Member) commented Jul 17, 2023

Yeah, neovim does pin to commits, but following semver and semantic commit messages would make it clearer when there are actually breaking changes, and easier for downstream to know when to adapt or update queries. With that being said, another big question from our chats, which I'm sure some people are wondering about, is whether the versions should be coupled to tree-sitter core or independent. If they're independent, I'd say starting from 1.0.0 makes sense and would signal stability and healthy release cycles from here on. I wouldn't mind continuing from 0.20.x, though that could be a bit confusing for consumers.

@maxbrunsfeld (Contributor) commented

I don't want these grammars to be versioned 1.0 right now. We'll make that transition at some point, and we'll also bump the Tree-sitter library and CLIs to 1.0. But we're not quite there yet.

@clason commented Jul 17, 2023

@maxbrunsfeld I'm sorry, but that is not the reality for us. (Recall that we deal with 200 grammars rather than just the ~20 maintained by the tree-sitter org!) In practice, it's about an equal chance whether a commit is a Bugfix without capture changes, an update that adds new captures, or a breaking change. We can and do of course catch breaking changes in our CI before we bump, but apart from that we're left guessing. Semver would be a huge help, since we could both tune automation and immediately know where manual intervention is required.

@clason commented Jul 17, 2023

(whether that starts at 1.0 or 0.0.1-pre is of no importance to me.)

As amaanq is saying, for us the big question is whether and why individual parsers need to follow tree-sitter core's versions. I understand why the latter is not 1.0 (you made that clear in the tracking issue), but not why the parsers shouldn't be.

@maxbrunsfeld (Contributor) commented

In practice, it's about an equal chance whether a commit is a Bugfix without capture changes, an update that adds new captures, or a breaking change.

But a bugfix for your query might actually break some other theoretical query that some other consumer has crafted.

@clason commented Jul 17, 2023

No, a bugfix for the parser by definition will not break the query. I do not care about the CST changing; I trust grammar maintainers that any such change will be an improvement. I only care about changes that completely break syntax highlighting (since an invalid capture will turn a buffer into a red wall of errors).

@amaanq (Member) commented Jul 17, 2023

Deviating a bit, @maxbrunsfeld, but here's what automated releases look like with release-please: #21 (bumping this grammar to 0.20.0, adding a changelog, and auto-publishing to crates + npm once that PR is merged)

@maxbrunsfeld (Contributor) commented Jul 17, 2023

I do not care about the CST changing

CSTs are all that these repos provide.

I only care about changes that completely break syntax highlighting (since an invalid capture will turn a buffer into a red wall of errors).

You shouldn't need to deal with broken highlighting. The cool thing is (as you may know) checking whether your query is compatible with a given language can be done without ever running the query, because impossible queries are caught at instantiation time.

@clason commented Jul 17, 2023

CSTs are all that these repos provide.

Which is great! As long as they're valid, we're happy to use them. Let me reiterate: if the same text produces a different CST, that is not a "breaking" change in our view.

You shouldn't need to deal with broken highlighting.

And yet we do. (If we can't highlight at all because the queries have become incompatible with the parser, that is "broken highlighting".)

Since queries and parsers are managed by different people, mismatches are the rule rather than the exception and have to be managed by us. I acknowledge that we (Neovim) are using tree-sitter not in the way it was originally designed to be used; I am honestly asking whether that way is welcome, or a nuisance.

@maxbrunsfeld (Contributor) commented Jul 17, 2023

No, Tree-sitter and the queries are designed for this use case. That's why queries are checked statically, so that you can avoid ever shipping an app where the queries and the parser don't match. It's even better than a version number, because you don't have to trust it, you can programmatically verify it.
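The static check described here happens when tree-sitter constructs a query: ts_query_new in the C API reports problems such as an unknown node type or field name together with an error offset. For a CI job without a compiled parser at hand, a rough stand-in is to compare the names a query mentions against the parser's node-types.json. The regex scan below is an approximation under stated assumptions, not tree-sitter's query parser: it checks named node patterns and field prefixes only, and skips string literals, comments and predicates. Function names are hypothetical.

```python
import json
import re
from typing import List, Set, Tuple

STRING_LITERAL = re.compile(r'"(?:[^"\\]|\\.)*"')
NAMED_NODE = re.compile(r"\(\s*([A-Za-z_][A-Za-z0-9_]*)")
FIELD_PREFIX = re.compile(r"([A-Za-z_][A-Za-z0-9_]*)\s*:")
BUILTIN = {"_", "ERROR", "MISSING"}  # assumed always valid in node position


def known_names(node_types_json: str) -> Tuple[Set[str], Set[str]]:
    """Named node types and field names declared in node-types.json."""
    entries = json.loads(node_types_json)
    types = {e["type"] for e in entries if e.get("named")}
    fields = {f for e in entries for f in e.get("fields", {})}
    return types, fields


def unknown_names(query: str, node_types_json: str) -> Tuple[List[str], List[str]]:
    """Return (unknown node types, unknown field names) used by `query`."""
    types, fields = known_names(node_types_json)
    text = STRING_LITERAL.sub('""', query)  # drop literals such as "(" or ";"
    # Drop ";" comments (after literals are gone, so ";" inside them is safe).
    text = "\n".join(line.split(";", 1)[0] for line in text.splitlines())
    nodes = set(NAMED_NODE.findall(text)) - types - BUILTIN
    bad_fields = set(FIELD_PREFIX.findall(text)) - fields
    return sorted(nodes), sorted(bad_fields)
```

Anything this scan reports would also be rejected by the real check; the converse does not hold, so it complements rather than replaces compiling the queries against the parser.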

@ahlinc commented Jul 17, 2023

And at the risk of being ungrateful (I just have to vent), it often feels like we're fighting against tree-sitter to make our use case work, one that upstream doesn't really want to support or care about...

Don't worry, it's good that we argue a little, in a good sense; it helps us cover and dig into the finer details.

A few more thoughts about semver:

  • I've seen in practice that many projects reinterpret it to fit their needs: some just bump the major version every week, others think three version segments aren't enough and pack much more information into the allowed tail. That's why I asked for a strict explanation of all 3 segments for grammars, because for grammar needs I think the last two are enough, and the first one can be used to stay in sync with the lib.
  • In the lib, the language ABI version is an important thing, and it may be a good marker to store in the first segment. So yes, versions would jump to 14.x.x and would effectively look like 14.major.minor.

Recall that we deal with 200 grammars

Do you have a plan for convincing all grammar authors to always follow semver suggestions? Are you sure that all authors will always be able to correctly judge the severity of their changes and bump the right version segments without mistakes?

I'm convinced that judging the severity of changes at a scale as broad as ~200 grammars is a job for a tool developed specifically for this purpose, because humans inevitably make mistakes from time to time.

@clason commented Jul 17, 2023

Yes, but we don't ship a (monolithic) app. That's the whole point: queries (and parsers) are dynamic runtime files that can be updated on the fly.

@maxbrunsfeld (Contributor) commented

Yes, but we don't ship a (monolithic) app.

Yeah, I really do understand what nvim-tree-sitter is. But I don't feel like you're understanding the point that I'm making. Do you get that Tree-sitter already provides a mechanism to check up-front whether a (query, parser) combination is valid, and that it's completely doable already to avoid ever having broken syntax highlighting?

@clason commented Jul 17, 2023

Don't worry, it's good that we argue a little, in a good sense; it helps us cover and dig into the finer details.

I agree; this is a very important discussion to have, so I'm kinda forcing it now.

I've seen in practice that many projects reinterpret it to fit their needs,

That's why I've been pushing a concrete interpretation across the ecosystem (that works back from the point of view of query authors).

Do you have a plan for convincing all grammar authors to always follow semver suggestions

Yes. Require semver for that grammar to be considered "stable" and amenable for auto-installing and auto-updating etc. Mistakes will happen, of course, but if that becomes the exception rather than the rule, this will significantly improve things for me.

@clason commented Jul 17, 2023

Do you get that Tree-sitter already provides a mechanism to check up-front whether a (query, parser) combination is valid, and that it's completely doable already to avoid ever having broken syntax highlighting?

Yes, I do. But if a (query, parser) combination is invalid, it cannot be used for highlighting (or whatever purpose the query serves), so highlighting (or whatever) will not be possible whenever that combination is encountered.

Now the question is how to deal with that. We could just stop updating parsers, that's true, and let whatever current state be frozen. But I think we can do better, and I hoped my suggestion was suitable for that.

@maxbrunsfeld (Contributor) commented Jul 17, 2023

Now the question is how to deal with that. We could just stop updating parsers, that's true, and let whatever current state be frozen. But I think we can do better, and I hoped my suggestion was suitable for that.

You don't need to not upgrade. You just need to upgrade to a specific version, and when you do that, you make sure that the queries, which are checked into the repo, are compatible with that version.

@clason commented Jul 17, 2023

Yes, and my suggestion was meant exactly to facilitate that. (Since the current state is not sustainable, and we're still a ways off from the goal of parity with the ~700 syntax files Vim ships with.)

Of course we have CI that prevents updating a parser if it is no longer compatible with the current queries, but every time that happens (one out of 5-10 that are updated), this requires manual intervention from me. (And it can't catch queries that are not managed by me but in other plugins.) Having semantic versioning that indicates that before updating allows a dependabot-style strategy that only makes "safe" updates and leaves breaking updates for manual interaction at the maintainers' leisure, and allows me to mark the parser update (and hence the nvim-treesitter update) as potentially breaking for downstream plugins.
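A sketch of the dependabot-style gate described above, assuming parsers publish plain X.Y.Z versions (optionally with +build metadata, which is ignored): updates within the same major line are applied automatically, and anything else is queued for manual review. For 0.y.z versions it treats a minor bump as breaking, similar to how Cargo's caret requirements treat 0.y releases. The function names are hypothetical.

```python
from typing import Tuple


def parse(version: str) -> Tuple[int, int, int]:
    """Parse "X.Y.Z" (build metadata after "+" is ignored, as in semver)."""
    core = version.split("+", 1)[0]
    major, minor, patch = (int(p) for p in core.split("."))
    return major, minor, patch


def update_action(pinned: str, candidate: str) -> str:
    """Return "skip", "auto" (safe update) or "manual" (possibly breaking)."""
    old, new = parse(pinned), parse(candidate)
    if new <= old:
        return "skip"
    if old[0] == 0:
        # Pre-1.0: treat the minor component as the compatibility boundary.
        same_line = new[0] == 0 and new[1] == old[1]
    else:
        same_line = new[0] == old[0]
    return "auto" if same_line else "manual"
```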

@maxbrunsfeld (Contributor) commented

@ahlinc's comment covers a lot of my concerns.

@amaanq I'm ok with the release-please thing, though I don't love how opinionated it is about the commits. There is another approach to automatically publishing releases which is a bit simpler, and is in use already for some of the grammar repos:

I like that with ☝️ this approach, the publishing is just driven by pushing a tag. Curious about your thoughts on that.

@clason commented Jul 17, 2023

That's fine; I feel I've made my point clearly enough now. I accept that you have different (incompatible) goals for your parser versioning and will think about how to best work around that. (I'll definitely follow our semver strategy with the parsers in the Neovim sphere of influence, of course.)

@ahlinc commented Jul 17, 2023

Okay, looking at the whole situation (~200 grammars, ~700 syntax files, lots of grammar authors, grammar complexity, no automatic tools that give a short summary of the changes between any two versions), I think I get your point, @clason: you want to convince all grammar authors to follow the standard, well-known semver model without any reinterpretation of its meaning, because the community is huge and only maximally simple rules will work well.

I also suspect you don't actually care about the specific segment values; all that matters is that the appropriate segments increase monotonically according to the severity of changes.

I think I can agree with your arguments and points.

But I'd like to add my 5 cents: semver allows a fairly free-form tail, so I'd suggest adding the language ABI version to the version in a format like 1.0.0+abi14, and having the Tree-sitter CLI enforce the presence of that +abi14 tail.

P.S.: I mostly want to resolve the ABI version visibility issue, because it has been a pretty hidden thing for years.
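A sketch of reading the proposed +abiNN tail. Per the SemVer 2.0.0 spec, build metadata after "+" is ignored when determining precedence, so 1.0.0+abi13 and 1.0.0+abi14 rank equally; the ABI tag is informational and has to be read separately. The tag format is only a proposal in this thread, and `parse_with_abi` is a hypothetical helper built on a slightly permissive SemVer pattern.

```python
import re
from typing import Optional, Tuple

# SemVer 2.0.0 core, optional pre-release, optional build metadata
SEMVER = re.compile(
    r"^(?P<major>0|[1-9]\d*)\.(?P<minor>0|[1-9]\d*)\.(?P<patch>0|[1-9]\d*)"
    r"(?:-(?P<pre>[0-9A-Za-z-]+(?:\.[0-9A-Za-z-]+)*))?"
    r"(?:\+(?P<build>[0-9A-Za-z-]+(?:\.[0-9A-Za-z-]+)*))?$"
)


def parse_with_abi(version: str) -> Tuple[Tuple[int, int, int], Optional[int]]:
    """Split "X.Y.Z[-pre][+build]" into its core triple and an ABI number
    taken from a build identifier of the (proposed) form "abiNN"."""
    match = SEMVER.match(version)
    if not match:
        raise ValueError(f"not a SemVer string: {version!r}")
    abi = None
    for ident in (match["build"] or "").split("."):
        if ident.startswith("abi") and ident[3:].isdigit():
            abi = int(ident[3:])
    core = (int(match["major"]), int(match["minor"]), int(match["patch"]))
    return core, abi
```

A CLI-side check of the kind suggested above would then amount to rejecting versions for which the ABI part comes back as None.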

@ahlinc commented Jul 17, 2023

the publishing is just driven by pushing a tag.

Off-topic note: this approach is widely used, but it was a catastrophe for the main Tree-sitter repo, and the whole CI was rebuilt around one idea: the tag should be created automatically, and only at the last moment, when nothing critical can fail in the release process anymore.

@amaanq (Member) commented Jul 17, 2023

@ahlinc's comment covers a lot of my concerns.

@amaanq I'm ok with the release-please thing, though I don't love how opinionated it is about the commits. There is another approach to automatically publishing releases which is a bit simpler, and is in use already for some of the grammar repos:

I like that with ☝️ this approach, the publishing is just driven by pushing a tag. Curious about your thoughts on that.

That's a doable option; the only issue is that when pushing tags, we must ensure some sort of CI runs beforehand to check that package.json and Cargo.toml have been updated and match the tagged version, and that tests pass; otherwise a broken version could be published. I can work on a solution that implements these checks.
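A sketch of the pre-publish version check described here, assuming a tag of the form vX.Y.Z and a single-crate Cargo.toml with a literal version in its [package] table (workspace-inherited versions aren't handled). Running the test suite, e.g. with `tree-sitter test`, would be a separate CI step. The function names are hypothetical, and the Cargo.toml reader is a deliberately minimal line scanner rather than a full TOML parser.

```python
import json
import re
from typing import List, Optional


def cargo_package_version(cargo_toml: str) -> Optional[str]:
    """Find `version = "..."` inside the [package] table."""
    in_package = False
    for raw in cargo_toml.splitlines():
        line = raw.strip()
        if line.startswith("["):
            in_package = line == "[package]"
            continue
        match = re.match(r'version\s*=\s*"([^"]+)"', line)
        if in_package and match:
            return match.group(1)
    return None


def check_release(tag: str, package_json: str, cargo_toml: str) -> List[str]:
    """Return a list of problems; an empty list means the tag may publish."""
    expected = tag[1:] if tag.startswith("v") else tag
    problems = []
    npm = json.loads(package_json).get("version")
    if npm != expected:
        problems.append(f"package.json version {npm!r} != tag {expected!r}")
    cargo = cargo_package_version(cargo_toml)
    if cargo != expected:
        problems.append(f"Cargo.toml version {cargo!r} != tag {expected!r}")
    return problems
```

A tag-triggered workflow could read both manifests, call check_release with the pushed tag, and fail before any publish step if the returned list is non-empty.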

Another benefit of release-please and semantic commits, though, is that the CHANGELOG.md file can easily be updated in a clean manner; as you mentioned, the downside is a stricter rule on commit messages. I don't really mind, it's mostly a matter of preference (though updating the manifests adds a bit of work).
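For comparison, a sketch of the changelog side: grouping conventional-commit headers into sections like the "Features" / "Bug Fixes" lists at the top of this PR. The header regex, section names and output layout are assumptions modeled loosely on that output, not release-please's actual template; commits whose type has no section (chore, ci, docs, ...) are left out, and the example commit header is hypothetical.

```python
import re
from collections import defaultdict
from typing import List, Tuple

HEADER = re.compile(r"^(?P<type>[a-z]+)(?:\((?P<scope>[^)]*)\))?!?: (?P<desc>.+)$")
SECTIONS = [("feat", "Features"), ("fix", "Bug Fixes")]


def render_changelog(version: str, date: str, commits: List[Tuple[str, str]]) -> str:
    """`commits` holds (short_sha, header_line) pairs, oldest first."""
    grouped = defaultdict(list)
    for sha, header in commits:
        match = HEADER.match(header)
        if not match:
            continue  # skip non-conventional commits
        scope = f"**{match['scope']}:** " if match["scope"] else ""
        grouped[match["type"]].append(f"* {scope}{match['desc']} ({sha})")
    lines = [f"## {version} ({date})"]
    for kind, title in SECTIONS:
        if grouped[kind]:
            lines += ["", f"### {title}", "", *grouped[kind]]
    return "\n".join(lines) + "\n"


print(render_changelog("1.0.0", "2023-07-15",
                       [("7119b8f", "fix(ci): bump action and node versions")]))
```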

@dcreager commented Feb 2, 2024

I came across this via tree-sitter/tree-sitter-javascript#294. As an fyi, I've moved some of the commentary up to the tree-sitter discussion on the topic: tree-sitter/tree-sitter#1768 (comment)
