[3.x] Provide quick access to `Object` ancestry #107462
Conversation
Actually, on second thoughts, there's no reason why we can't agree on the implementation and then do the call points. We don't have a `Node` ancestry; it's `Object` ancestry.
Makes a lot of sense to me. From my side, this is ready to merge.
I think we should bring this into the core meeting before making a final decision. But I'm very confident this will be quite important at little cost in the areas where it's needed.
As discussed in the core meeting, I've created some code to measure how many casts occur. If anyone is interested in running it, the counting debugger is here:

I'll try and write some results here.

TPS demo (a few seconds of gameplay):
This confirms that the FTI (even optimized) is a heavy user of these casts.

Jetpaca (2D game)

As expected, there's no use of the 3D casts.

Editor (Jetpaca)

In fact I wonder whether we have some dodgy loops there which are causing this; perhaps it is recursive layout for the GUI. Regardless, this should make the editor more snappy if there is a 2x speedup in these casts.

Editor (TPS demo)

I'll try and add some more types and repost the data.
So to summarize the results, some casts that I can see are particularly hot:

I've updated the PR to include all of these; it takes 15 bits. As I say, going up from 4 bits is highly likely to be "free" for all intents and purposes because of padding, and not worth skimping on too much, as we already store heavy data on `Object`.

Now it will just be left for @Ivorforce to work his magic after this PR is merged, in order to make it work with
Just had a little try at this. I was guessing we were aiming for something like:

I'm currently bumping C++ to 17, so we can assume we'll be able to use C++17 features. The dependency management with headers is tricky though. I tried an experiment of detecting whether the class had been defined:

The idea was to only compile in the ancestry check if the class had been defined. Anyway, will leave @Ivorforce to try and figure this out.
That's more types than I expected, but I suppose it's free to add more until the bit field costs 32 bits. So we still have some slack even after the PR.
Looks great, ready to merge!
Thanks!
Although we have increased `Object::cast_to` performance significantly with #103708 (4.x) and #104825 (3.x), we discussed at the time that for some high-performance bottleneck scenarios we may want an even faster way of determining whether an `Object` is one of the key main types.

At the time I trialed using a bitfield to store this info and it worked well; it is likely to be one of the fastest methods, and I discussed this with @Ivorforce.

While it involves storing (and retrieving) data from the `Object` / `Node` (thus cache effects), it avoids the overheads of a virtual function approach; the virtual function requires reading the `vtable` from the object, so there is a read in all cases I think.

Benchmarks
In order to compare the different approaches, I benchmarked between:

- the ancestry bitfield
- a virtual function (`virtual bool is_spatial()`)
- `Object::cast_to<Spatial>`

I wrote gdscript to create a node with 200,000 children, alternating `Node` and `Spatial`. The reason for the alternating and the count is to ensure that the optimizer doesn't optimize out the benchmark (although the alternating may make things easier for the branch predictor compared to random).

Created a `benchmark` C++ function:

Results
Godot Engine v3.7.dev.custom_build [57304c0ce] - X11 - 2 monitors - GLES2 - Mesa Intel(R) Graphics (RPL-S) - 13th Gen Intel(R) Core(TM) i7-13700T (24 threads)

| Children | Build | Ancestry | Virtual | Cast |
| --- | --- | --- | --- | --- |
| 200,000 | release | 85793 | 86305 | 103659 |
| 200,000 | debug | 350618 | 249225 | 308729 |
| 2000 | release | 175 | 297 | 344 |
| 2000 | debug | 2719 | 2804 | 3386 |
Discussion
Well, as I suspected, this is a turn-up for the books. I did question whether the cache slowdown of actually reading the data would compare to the overhead of a virtual function, and it turns out to be closer than we thought.

In release, with large numbers of children, storing the type on the object (ancestry) and the virtual function seem to be neck and neck, with `cast_to` not that much slower.

With a smaller number of children, the results seem to flip: everything is more likely to be in cache, so the costs of reading the object are lower, and the ancestry approach clearly wins. So with large numbers of nodes, memory read speed may be the bottleneck, while with smaller numbers of nodes, the processing itself may be more of a bottleneck.

So the question at the moment is which approach is likely to be faster in the wild, and by how much. At the moment it looks like ancestry is fastest in all cases, but possibly not by a huge margin once we have a really large number of nodes, or the cache is cold. We also have to bear in mind this might be hardware dependent, as memory read speed / cache versus virtual overhead may vary, e.g. on mobile.

Also, I have realized that we need to read from memory in both cases. Even for the virtual function, we need to read the `vtable` pointer from the object in RAM, so there should in theory always be more work for the virtual approach (though it uses no memory on the object itself).

Power use on mobile
I also considered power use on mobile, and asked Grok:
Notes
`cast_to` has been modified to use `Ancestry` for the specific cases covered, so it will invisibly be used everywhere.