GDExtension: Store source of gdextension_interface.h in JSON #107845

Open: wants to merge 1 commit into master

dsnopek (Contributor) commented Jun 22, 2025

This PR converts the canonical source of truth for the GDExtension interface from the gdextension_interface.h header to a new JSON file (gdextension_interface.json).

The header can still be generated from the JSON, and godot --dump-gdextension-interface will still create a header. (This will be the "legacy header" - more on that below.)

But now there's a new godot --dump-gdextension-interface-json that will dump the JSON data.

(When we eventually do Godot 5 and can break compat, I'd like to make --dump-gdextension-interface output the JSON, and add a new --dump-gdextension-interface-header option to output the header. This will make it clear that the JSON is the canonical source, though probably lots of folks will still want to use the header.)

Advantages

This is something we've discussed at GDExtension team meetings for a long time (years!), and it has a number of advantages:

  • There are already bindings that parse the gdextension_interface.h to power their own code generation. They will no longer need to deal with the complexities of parsing C! They can just use the JSON data. (FYI, I initially made the JSON in this PR with a script that extracted the data from the header into JSON, and now no one needs to do that ever again :-))

  • We can iterate on the header used within Godot and godot-cpp. Another change we've discussed in the past is replacing some of the typedef void *s with struct {}s so that the compiler can help prevent us from mixing up types, but we've been unable to do that because we don't want to break developers using the "legacy header". Now, Godot and godot-cpp can generate a modified header from the JSON to use internally, without affecting external users of the legacy header.

  • Up until now we've had to enforce restrictions on what can go into gdextension_interface.h and its format because we know some bindings parse it. This has meant trying to catch PRs that attempt to add "real code" to the header in review. We won't have to worry about that anymore! Anything added to the JSON will automatically be added to the "legacy header" in the correct format.
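To illustrate the first point: with the JSON, a binding's generator can get at the function signatures using nothing more than an ordinary JSON parser. The sketch below is Python and assumes a hypothetical layout (a top-level "interface" list whose entries carry "name", "arguments", and "return_value"); the real field names are whatever this PR settles on.

```python
import json

# Hypothetical excerpt of gdextension_interface.json. The "interface" list and
# the "name", "arguments", and "return_value" keys are assumptions made for
# illustration; the real layout is defined by the PR.
SAMPLE = """
{
    "interface": [
        {
            "name": "mem_alloc",
            "arguments": [{"name": "p_bytes", "type": "size_t"}],
            "return_value": {"type": "void*"}
        },
        {
            "name": "mem_free",
            "arguments": [{"name": "p_ptr", "type": "void*"}],
            "return_value": {"type": "void"}
        }
    ]
}
"""


def function_signatures(data):
    """Build one C-like signature string per interface function."""
    signatures = []
    for func in data["interface"]:
        args = ", ".join(f"{arg['type']} {arg['name']}" for arg in func["arguments"])
        signatures.append(f"{func['return_value']['type']} {func['name']}({args})")
    return signatures


if __name__ == "__main__":
    # No C parser needed: the standard json module does all of the parsing.
    for signature in function_signatures(json.loads(SAMPLE)):
        print(signature)
```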

Result

You can compare the new header generated by Godot with the old header, or look at this convenient diff!

All the changes are cosmetic, mostly related to spacing or comment type. (I decided to standardize on C-style comments, and putting comments before what they comment, rather than inline.)

I've run the godot-cpp tests with the legacy header generated by Godot, and everything seems to work fine!

Notes

Note 1: Generating the header

There are two places in the PR that take the JSON and generate a C header: in Python code run by SCons to make the header used by Godot, and in Godot itself to generate the "legacy header".

This is intentional!

Right now, they do basically the same thing (except for some typos in type names, which are preserved in the legacy header but fixed in the Python version).

But, as explained above, the goal is to keep iterating on the Python version so that we can improve the header used by Godot and godot-cpp. The header generation code in Godot itself, by contrast, will stay roughly the same for backwards compatibility.
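A minimal sketch of why two generators can be useful: the same JSON entry for a handle type can be rendered either as the typedef void * the legacy header has always used, or as a pointer to a distinct incomplete struct, which lets the compiler diagnose mixed-up handle types. The entry layout (the "target" and "const" keys) and the struct names are hypothetical, not the PR's actual generator code.

```python
# Hypothetical, already-parsed JSON entries for opaque handle types. The
# "target" and "const" keys are assumptions made for this sketch.
HANDLE_TYPES = [
    {"name": "GDExtensionObjectPtr", "target": "GDExtensionObject", "const": False},
    {"name": "GDExtensionConstObjectPtr", "target": "GDExtensionObject", "const": True},
]


def render_handle(entry, legacy):
    """Render one handle typedef for either the legacy or the internal header."""
    const = "const " if entry["const"] else ""
    if legacy:
        # Legacy header: keep the untyped void pointer that existing
        # consumers already depend on.
        return f"typedef {const}void *{entry['name']};"
    # Internal header: point at a distinct incomplete struct, so the compiler
    # can diagnose passing one handle type where another is expected.
    return f"typedef {const}struct {entry['target']} *{entry['name']};"


def render_header(entries, legacy):
    """Render all handle typedefs as the body of a header file."""
    lines = ["/* Generated from gdextension_interface.json. Do not edit. */"]
    lines.extend(render_handle(entry, legacy) for entry in entries)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_header(HANDLE_TYPES, legacy=True))
    print(render_header(HANDLE_TYPES, legacy=False))
```

Because both outputs come from the same data, the legacy rendering can stay frozen while the internal one evolves.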

Note 2: The docs

This PR doesn't attempt to improve the docs at all! It just transfers what we already have into JSON.

Since this PR will conflict with any other changes to gdextension_interface.h, it runs the risk of stalling any other GDExtension work that needs to touch that file.

So, while there may be a temptation to iterate on the docs in this PR, instead, I think we should attempt to merge it with the existing docs as soon as possible in the Godot 4.6 dev cycle. The goal is for the docs to be no worse than the docs we already have.

Afterwards, we can make follow-up PRs to improve the docs.

I'd like for review of this PR to focus on the format of the JSON file, and correctly generating the header.

@dsnopek dsnopek added this to the 4.6 milestone Jun 22, 2025
@dsnopek dsnopek requested a review from a team as a code owner June 22, 2025 13:29
@dsnopek dsnopek requested review from a team as code owners June 22, 2025 13:29
@dsnopek dsnopek force-pushed the gdextension-interface-json branch 2 times, most recently from fd610ba to 7abb091 June 22, 2025 14:33
Bromeon (Contributor) commented Jun 22, 2025

Somehow, I completely missed this huge change! 😅
There are several upsides for binding maintainers, as you state. Thanks a lot for that!
Also for the very helpful before-after diff 👍

For completeness, I'd also like to mention potential downsides. Some are inherent to this sort of migration rather than flaws of the chosen approach, so they're not necessarily arguments against moving to JSON 🙂 Long-term, this will likely simplify a lot!

  1. Having a C header as the source of truth guarantees that the FFI signatures are indeed correct. For many languages, there is automated tooling to convert C declarations into APIs in the target language. Hand-written JSON-to-language parsers, by contrast, are more prone to bugs.

    • On the other hand, we already have this issue with extension_api.json. This JSON adds some new "concepts", though, so there may be some churn in the initial phase.
  2. A running Godot engine is now required to generate the header. As of now, CI workflows can simply download the gdextension_interface.h file directly from a tag on the repo.

  3. The JSON is now almost impossible to read by hand. 9 vs 3 kLoC and the very boilerplaty structure of JSON make it difficult to see how the entries map to actual functions.

    • It's possible to have tooling for this, or generate the original header, but we can no longer inspect an "information-dense" file on the web directly -- and this includes PR reviews.
  4. While the JSON is a good way moving forward, I don't see how we can migrate to it in the short term without maintaining two complete parsing/generating workflows in parallel. If this lands in Godot 4.6, that means Godot 4.1-4.5 do not support the "new" way, but bindings may support these versions for years to come.

    • Probably worth it if the long-term plans pan out, but this does increase complexity for all bindings in the mid term.

I wonder if it would be possible to convert header files of older Godot versions to the JSON format and make them available? This would only need to happen for each minor version (as the GDExtension header doesn't change in-between), and would completely solve point (4).


I'll look into the detailed changes at a later point.

Just one request: could we keep the field names and structure as close as possible to the extension_api.json, for consistency and to allow reusing parser infrastructure? Examples:

  • args -> arguments
  • ret -> return_value
  • Pointer spacing: const void * -> const void*
  • type should only be used for actual types, not categories (simple|enum|struct) -> maybe type_category
  • probably more...

Bromeon (Contributor) commented Jun 22, 2025

Another question: if the JSON now becomes the source of truth, how do contributors modify it?

If manually, is there tooling to ensure indentation, order of keys etc., to keep some consistency?

dsnopek (Contributor, Author) commented Jun 22, 2025

@Bromeon Thanks, those are some very good points :-)

  1. Having a C header as the source of truth guarantees that the FFI signatures are indeed correct. For many languages, there is automated tooling to convert C declarations into APIs in the target language.

I think it makes sense to always keep the option of generating the C header.

So, for language bindings that use premade tooling that can parse a C header, they can keep using that. But with the JSON, they can also do their own additional code generation, without having to make their own parser for the C header.

3. The JSON is now almost impossible to read by hand. 9 vs 3 kLoC and the very boilerplaty structure of JSON make it difficult to see how the entries map to actual functions.

I agree that dealing with the whole thing at once is kind of hard to read.

But I think with individual PRs that only add a couple things at a time, it should be fine. And, it'll also make it harder for certain errors to slip in, especially if we add some validation of the JSON, which you also bring up, and I agree would be a good idea.

4. While the JSON is a good way moving forward, I don't see how we can migrate to it in the short term without maintaining two complete parsing/generating workflows in parallel. If this lands in Godot 4.6, that means Godot 4.1-4.5 do not support the "new" way, but bindings may support these versions for years to come.

Actually, bindings can always use the latest version of the JSON! (And this is true of the current gdextension_interface.h as well.)

We never change or remove old signatures. You just need to ignore any interface functions with a "since" field that's newer than the Godot version you are targeting.

So, a binding could use the JSON from Godot 4.6 to build the Godot 4.1-4.5 compatible versions as well.
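A sketch of that filtering step, assuming each function entry carries a "since" string like "4.3" (the other entry fields, and every function name except mem_alloc, are made up for illustration). Versions are compared as integer tuples rather than strings so that a future "4.10" would sort after "4.9".

```python
def parse_version(text):
    """Turn a version string such as "4.3" into a comparable tuple like (4, 3)."""
    return tuple(int(part) for part in text.split("."))


def functions_for_target(entries, target):
    """Keep only the interface functions that exist in the targeted Godot version."""
    target_version = parse_version(target)
    return [
        entry["name"]
        for entry in entries
        if parse_version(entry["since"]) <= target_version
    ]


# Hypothetical entries: only the "since" field name comes from the discussion.
ENTRIES = [
    {"name": "mem_alloc", "since": "4.1"},
    {"name": "hypothetical_added_in_4_3", "since": "4.3"},
    {"name": "hypothetical_added_in_4_6", "since": "4.6"},
]

if __name__ == "__main__":
    # A binding built from the 4.6 JSON but targeting Godot 4.5 skips the
    # 4.6-only function: prints ['mem_alloc', 'hypothetical_added_in_4_3']
    print(functions_for_target(ENTRIES, "4.5"))
```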

Just one request: could we keep the field names and structure as close as possible to the extension_api.json, for consistency and to allow reusing parser infrastructure? Examples:

That makes sense! I'll update the PR when I have a chance.

Another question: if the JSON becomes now the source-of-truth, how do contributors modify it?

If manually, is there tooling to ensure indentation, order of keys etc., to keep some consistency?

Yes, manually. We could (and should!) add some tooling for formatting and validation. I think this would help to prevent the sort of errors and inconsistencies that have already cropped up in the gdextension_interface.h (ex #107788)

I'll dig into this.
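One possible shape for such formatting tooling, sketched under assumptions (4-space indentation, key order preserved as written, a trailing newline; none of this is an agreed convention): reload each file and compare it byte-for-byte with a canonical re-serialization.

```python
import json


def canonical_json(text):
    """Re-serialize JSON text in one canonical layout, preserving key order."""
    return json.dumps(json.loads(text), indent=4, ensure_ascii=False) + "\n"


def is_formatted(text):
    """True if the text already matches the canonical layout exactly."""
    return text == canonical_json(text)


def main(paths):
    """Report files that are not canonically formatted; return an exit code."""
    unformatted = []
    for path in paths:
        with open(path, encoding="utf-8") as f:
            if not is_formatted(f.read()):
                unformatted.append(path)
    for path in unformatted:
        print(f"{path}: not in canonical JSON format")
    return 1 if unformatted else 0
```

A pre-commit hook could call sys.exit(main(sys.argv[1:])) with the staged JSON paths; main itself does not exit, which keeps it easy to test. The same canonical_json function could also serve as an auto-formatter.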

Ivorforce (Member)

3. The JSON is now almost impossible to read by hand. 9 vs 3 kLoC and the very boilerplaty structure of JSON make it difficult to see how the entries map to actual functions.

I agree that dealing with the whole thing at once is kind of hard to read.

But I think with individual PRs that only add a couple things at a time, it should be fine. And, it'll also make it harder for certain errors to slip in, especially if we add some validation of the JSON, which you also bring up, and I agree would be a good idea.

I just want to mention that JSON is not the only option we have.
It does make a lot of sense, considering it's probably the most widespread structured data format, and we already use it for the Object data.
But since this file will be hand-edited, JSON might not be the best fit. Personally, I go for .toml for hand-written data. It's also widespread and not overcomplicated like .yaml, but unlike .json it's made for human editing.

dsnopek (Contributor, Author) commented Jun 22, 2025

I just want to mention that JSON is not the only option we have.
It does make a lot of sense, considering it's probably the most widespread structured data format, and we already use it for the Object data.
But since this file will be hand-edited, JSON might not be the best fit. Personally, I go for .toml for hand-written data. It's also widespread and not overcomplicated like .yaml, but unlike .json it's made for human editing.

I agree that JSON isn't the best format for human editing.

(If I recall correctly, I think @vnen was advocating for using XML for this data at a meeting in the past?)

I went for JSON largely because (a) parsing it in Godot is very easy and (b) developers already need to work with the extension_api.json and so giving them two JSON files seemed better than one JSON file and one of something else.

I suppose we could keep the data in some other format (yaml, toml, xml, whatever) and then convert it to JSON to distribute it? That's a lot of converting, though.

Bromeon (Contributor) commented Jun 22, 2025

Actually, bindings can always use the latest version of the JSON! (And this is true of the current gdextension_interface.h as well.)

Very good point. I was actually considering this at some point for the .h file as well, but wasn't sure this was a 100% guarantee. If the JSON file is only additive (or changing in non-binary-breaking ways), that will simplify a lot!


Yes, manually. We could (and should!) add some tooling for formatting and validation. I think this would help to prevent the sort of errors and inconsistencies that have already cropped up in the gdextension_interface.h (ex #107788)

Yes, or non-C code creeping in 🙂 #96408


I went for JSON largely because (a) parsing it in Godot is very easy and (b) developers already need to work with the extension_api.json and so giving them two JSON files seemed better than one JSON file and one of something else.

I suppose we could keep the data in some other format (yaml, toml, xml, whatever) and then convert it to JSON to distribute it? That's a lot of converting, though

It would make sense if both extension_api.* and gdextension_interface.* were in the same format.

Personal preference:

  • JSON is fine. It's not the best for human editing, but at least it's ubiquitous.
  • XML is very verbose. If we want to make it easier for humans to edit, I'm not sure repeating open/close tags + 2 representations (values and attributes) is the way. It's also a rather complex standard (meaning consumers will likely pull in complex parsers to deal with only a tiny subset).
  • YAML is a complexity and inconsistency nightmare.
  • TOML is quite nice and would probably fit the relatively flat declarative structure we have in Godot.

I'm not sure if it's even an option to change the format for now, but for a potential Godot 5, I could totally imagine something like TOML.

@dsnopek dsnopek force-pushed the gdextension-interface-json branch from 7abb091 to 2070c09 June 24, 2025 11:17
dsnopek (Contributor, Author) commented Jun 24, 2025

Just one request: could we keep the field names and structure as close as possible to the extension_api.json, for consistency and to allow reusing parser infrastructure?

In my latest push, I've attempted to align the structure with extension_api.json.

  • type should only be used for actual types, not categories (simple|enum|struct) -> maybe type_category

I've decided to go with "kind" (so, we have "a kind of type", which is literally what I would say out loud in normal conversation).

And I've also renamed "simple" to "alias", which I think is more descriptive of what it is.
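A sketch of how a header generator might dispatch on that field. Only the kind values ("alias", "enum", "struct") come from this thread; the other keys ("type", "values", "members") and the example names are hypothetical.

```python
def render_type(entry):
    """Render one type entry from the JSON as a C declaration, based on its kind."""
    kind = entry["kind"]
    name = entry["name"]
    if kind == "alias":
        # A plain typedef of an existing type.
        return f"typedef {entry['type']} {name};"
    if kind == "enum":
        body = "\n".join(f"\t{v['name']} = {v['value']}," for v in entry["values"])
        return f"typedef enum {{\n{body}\n}} {name};"
    if kind == "struct":
        body = "\n".join(f"\t{m['type']} {m['name']};" for m in entry["members"])
        return f"typedef struct {{\n{body}\n}} {name};"
    # Fail loudly on anything else (e.g. the old "simple" spelling).
    raise ValueError(f"unknown kind {kind!r} for type {name!r}")


if __name__ == "__main__":
    print(render_type({"kind": "alias", "name": "GDExtensionBool", "type": "uint8_t"}))
```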

Yes, manually. We could (and should!) add some tooling for formatting and validation. I think this would help to prevent the sort of errors and inconsistencies that have already cropped up in the gdextension_interface.h (ex #107788)

Yes, or non-C code creeping in 🙂 #96408

I added some simple validation for types, so it shouldn't be possible to accidentally sneak a bool in there :-)
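A minimal sketch of that kind of type check, assuming the generator collects every type string referenced in the JSON and accepts only types the JSON itself defines plus a small allow-list of C types. The allow-list and the pointer/const stripping are illustrative assumptions; plain bool is deliberately left out, so it gets flagged.

```python
# Hypothetical allow-list of C types the header may reference directly.
C_PRIMITIVES = {
    "void", "char", "int", "float", "double", "size_t",
    "int8_t", "int16_t", "int32_t", "int64_t",
    "uint8_t", "uint16_t", "uint32_t", "uint64_t",
    "char16_t", "char32_t", "wchar_t",
}


def base_type(type_name):
    """Strip 'const' qualifiers and pointer stars: 'const char *' -> 'char'."""
    words = type_name.replace("*", " ").split()
    return " ".join(word for word in words if word != "const")


def unknown_types(referenced, defined):
    """Return referenced types that are neither allowed C types nor defined in the JSON."""
    allowed = C_PRIMITIVES | set(defined)
    return sorted({t for t in referenced if base_type(t) not in allowed})


if __name__ == "__main__":
    print(unknown_types(["const char *", "bool", "GDExtensionBool"], ["GDExtensionBool"]))
```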

There's some more validation I'd like to add, including around formatting, which'll come in my next update!

raulsntos (Member)

For the C# bindings we have no plans to move to the new JSON, so as long as the C header is still available it doesn't really affect us what the source of truth is.

In the C# bindings we use ClangSharp which can generate C# bindings from the C header. It uses Clang directly, so as long as the C header is valid C/C++ code it should work fine.


For other languages, keep in mind they may not have great support for TOML, whereas JSON and XML tend to be ubiquitous.

It makes sense to use JSON since they'll need to support it for extension_api.json anyway, so they're guaranteed to have something that can handle it.

But as mentioned earlier, I don't plan to consume this new format, so I don't have a vested interest in the choice of format.

vnen (Member) commented Jun 24, 2025

I don't remember when I suggested XML as a format. It was probably because of the class ref docs, so we could use a similar format. But we usually rely on doctool to create that content for us; writing it by hand is not really great. Not that I think JSON is much better (at least compared to writing plain C code), but since all the tooling is already based on JSON, it makes sense to keep the same format.

Speaking of which, it might be interesting to keep a JSON Schema too, which also helps with validation (though I'm not sure how easy it would be to integrate a schema validator in our system).

dsnopek (Contributor, Author) commented Jun 25, 2025

Speaking of which, it might be interesting to keep a JSON Schema too, which also helps with validation (though I'm not sure how easy it would be to integrate a schema validator in our system).

I think making a JSON Schema is a great idea, even if it only ends up as documentation.

Since we already have Python code in SCons parsing the JSON and looping over it, it may be easier just to validate its structure there, rather than adding JSON Schema validation tooling. But I'll take a look at it! If the schema validator can run as part of the pre-commit hooks, then we can check the format before SCons even runs.
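For comparison with a JSON Schema, validating the structure by hand in the existing Python could look roughly like this: check the required keys for each kind. The required-key table is hypothetical; a schema would state the same constraints declaratively.

```python
# Hypothetical required keys for each kind of type entry.
REQUIRED_KEYS = {
    "alias": {"name", "kind", "type"},
    "enum": {"name", "kind", "values"},
    "struct": {"name", "kind", "members"},
}


def validate_types(entries):
    """Return a list of human-readable problems; an empty list means valid."""
    problems = []
    for index, entry in enumerate(entries):
        kind = entry.get("kind")
        if kind not in REQUIRED_KEYS:
            problems.append(f"types[{index}]: unknown kind {kind!r}")
            continue
        missing = REQUIRED_KEYS[kind] - set(entry)
        if missing:
            names = ", ".join(sorted(missing))
            problems.append(f"types[{index}] ({entry.get('name', '?')}): missing {names}")
    return problems


if __name__ == "__main__":
    for problem in validate_types([{"name": "X", "kind": "enum"}]):
        print(problem)
```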
