Clean up parse table representation, use 16 bits for production_id #943
Fixes #511
Background
Currently, because of the way that Tree-sitter's parser ABI incrementally evolved, the production_id field on parse actions (which is used to identify the sequence of fields and aliases that applies to a given node's children) is represented as a uint8_t. So far, this has not been a problem for any of the grammars that we maintain, but other users have hit the limit, and when it happens, it's very confusing.
Change
This PR changes production_id to a nice uint16_t, which should always be more than large enough.
Unfortunately, because of some shortcomings of the old struct layout, this was not feasible to do without making a breaking change to the parser ABI. So, for the first time in a long time, I have bumped tree-sitter's TREE_SITTER_MIN_COMPATIBLE_LANGUAGE_VERSION constant. This means that new versions of the library will refuse to load parsers that were generated with previous versions of the CLI. Specifically, ts_parser_set_language will return false. Users will need to regenerate their parsers.
Normally, this is not a problem: I think that for most use cases, users can easily upgrade the library and their parsers together. There are a few exceptions that I know of:
Atom - At some point, when Atom adopts this version of Tree-sitter, this change will affect Atom users who have installed packages using third-party Tree-sitter parsers. The packages will need to be updated to use a regenerated version of the parsers. /cc @darangi - I don't think Atom plans on upgrading Tree-sitter anytime soon, so this is probably not urgent for you?
Neovim - I think that some end-users of Neovim have installed their own third-party parsers. When Neovim upgrades to the latest version of Tree-sitter, Neovim will refuse to load these older parsers. The end users will need to install new versions of the parsers. /cc @bfredl @vigoux @theHamsta @tjdevries
Emacs-Tree-Sitter - /cc @ubolonton I'm not sure if users can install their own parsers. If so, then this is something to be aware of.
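For anyone embedding the library, the compatibility behavior described above amounts to a version range test. Here is a minimal, self-contained sketch of that idea. It is not the library's actual code: the constant values are illustrative, and example_language and example_set_language are hypothetical stand-ins for TSLanguage and ts_parser_set_language.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative values only (assumption); the real constants are defined in
 * tree_sitter/api.h and may differ. */
#define EXAMPLE_LANGUAGE_VERSION 13
#define EXAMPLE_MIN_COMPATIBLE_LANGUAGE_VERSION 13

/* Hypothetical stand-in for a generated parser: only its ABI version matters here. */
typedef struct {
  uint32_t version;
} example_language;

/* Hypothetical stand-in for ts_parser_set_language: accept a language only if
 * its ABI version lies inside the range this build of the library supports. */
static bool example_set_language(const example_language *language) {
  if (language->version > EXAMPLE_LANGUAGE_VERSION) return false;
  if (language->version < EXAMPLE_MIN_COMPATIBLE_LANGUAGE_VERSION) return false;
  return true;
}
```

In this sketch, a parser generated against the older ABI (version 12 here) is rejected, while a regenerated one (version 13) is accepted. Callers should check the boolean result rather than assume the language was set.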
Details
Since an ABI change was already needed, I took this opportunity to fix some dumb aspects of the current ABI. There are no more bitfields (they were used unnecessarily before). Also, I've added one more field to the TSLanguage struct, production_id_count, which makes it so that all the arrays on TSLanguage can have their length known at runtime.