Improve V3Localize #3027
Actually, it turns out that even with tracing there are 19000 more locals in OpenTitan with this patch!
To clarify, on OpenTitan:
Let's hope it stays that way after I fix the sanitizer failures..
Sounds like great results, I wish I had spotted that earlier! Please also check SweRV performance.
The sanitizer error seems to be a pre-existing, genuine read past the end of a VlWide (trying to do a bit extract, I think). It shows up now because the VlWide has been localized, while previously the read just ran past the end of it into the rest of the module struct. Will need to figure out how to fix this.
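For illustration only, here is a minimal sketch of how a bit extract over an array of 32-bit words can end up reading one word too far, and one way to guard against it. `extractBits` and its parameters are hypothetical names, not Verilator runtime functions, and the actual failing code may look quite different.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: extract `width` bits (1..32) starting at bit `lsb`
// from a wide value stored as `nWords` 32-bit words, least significant first.
std::uint32_t extractBits(const std::uint32_t* words, std::size_t nWords,
                          unsigned lsb, unsigned width) {
    assert(width >= 1 && width <= 32);
    const std::size_t idx = lsb / 32;  // word holding the lowest selected bit
    const unsigned shift = lsb % 32;   // position of that bit within the word
    assert(idx < nWords);
    std::uint32_t value = words[idx] >> shift;
    // Only touch the next word if the selection crosses into it AND it exists.
    // Reading words[idx + 1] unconditionally is the kind of access that runs
    // past the end once the array is a tightly sized local.
    const bool straddles = shift + width > 32;
    if (straddles && idx + 1 < nWords) {
        value |= words[idx + 1] << (32 - shift);  // shift is 1..31 here
    }
    const std::uint32_t mask = width == 32 ? 0xFFFFFFFFu : ((1u << width) - 1u);
    return value & mask;
}
```

While the array sits inside a larger struct, an unguarded read of the next word just picks up a neighbouring member, which is why a sanitizer would only start complaining after the variable becomes a local.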
Agree it should be applied more globally where possible, but the last patch depends on this, so I will keep it here for now and revisit where the allocators can be used in a later patch/PR.
More modest gain on EH1:
On the other hand, OpenTitan builds about 20% faster on GCC with this patch (with no ccache, 2191 CPU seconds vs 2651), something that @rswarbrick might be happy to hear.
BTW I intend to rebase this and keep the 3 commits separate instead of squashing them. Sadly GitHub doesn't support dependent PRs..
V3Localize can now localize variable references that reference variables located in scopes different from the referencing function. This also means V3Descope has now moved after V3Localize.
These utility classes can be used to hang advanced data structures off AstNode user*u() pointers, and they take care of memory management for the client. Use via the call operator().
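A rough sketch of the idea, assuming a stand-in `Node` type with a single opaque `userp` slot. `Node`, `userp` and `NodeUserAllocator` are hypothetical names for illustration; the classes in the patch and the real AstNode user pointer accessors will differ.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for an AST node with one opaque user pointer.
struct Node {
    std::string name;
    void* userp = nullptr;  // assumed scratch slot, null when unused
};

// Owns one T per node it was called on and releases them all on destruction.
// Assumes the nodes outlive the allocator.
template <typename T>
class NodeUserAllocator {
    std::vector<std::unique_ptr<T>> m_owned;  // storage for every allocated T
    std::vector<Node*> m_nodes;               // nodes whose user pointer we set

public:
    NodeUserAllocator() = default;
    NodeUserAllocator(const NodeUserAllocator&) = delete;
    NodeUserAllocator& operator=(const NodeUserAllocator&) = delete;
    ~NodeUserAllocator() {
        // Reset the user pointers so none dangle once the storage is freed.
        for (Node* nodep : m_nodes) nodep->userp = nullptr;
    }

    // Return the T attached to the node, default-constructing it on first use.
    T& operator()(Node& node) {
        if (!node.userp) {
            m_owned.push_back(std::make_unique<T>());
            node.userp = m_owned.back().get();
            m_nodes.push_back(&node);
        }
        return *static_cast<T*>(node.userp);
    }

    std::size_t size() const { return m_owned.size(); }
};
```

With something like this a pass can write `refs(node).push_back(x)` and never has to allocate, look up or free the per-node data itself.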
Teach V3Localize how to localize variables that are used in multiple functions, if in all functions where they are used, they are always written in whole before being consumed. This allows a lot more variables to be localized (+20k variables on OpenTitan when building without --trace), and can cause significant performance improvement (OpenTitan simulates 8.5% faster when built single threaded and without --trace).
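The "written in whole before being consumed" condition can be sketched as follows. This is a simplified model, not the V3Localize code: it assumes each function is a straight-line list of variable references in execution order, ignores branches, and all type and function names (`Access`, `Ref`, `localizable`) are made up for the example.

```cpp
#include <set>
#include <string>
#include <vector>

enum class Access { WriteWhole, WritePartial, Read };

struct Ref {
    std::string var;  // variable being referenced
    Access kind;      // how it is accessed
};

// References of one function in execution order (straight-line code assumed).
using Function = std::vector<Ref>;

// A variable qualifies if, in every function that references it, the first
// reference overwrites the whole variable.
std::set<std::string> localizable(const std::vector<Function>& funcs) {
    std::set<std::string> candidates;
    std::set<std::string> rejected;
    for (const Function& func : funcs) {
        std::set<std::string> seen;  // variables already referenced in this function
        for (const Ref& ref : func) {
            candidates.insert(ref.var);
            const bool first = seen.insert(ref.var).second;
            // If the first access is a read or a partial write, a value from
            // outside the function is observable, so the variable must stay put.
            if (first && ref.kind != Access::WriteWhole) rejected.insert(ref.var);
        }
    }
    std::set<std::string> result;
    for (const std::string& var : candidates) {
        if (!rejected.count(var)) result.insert(var);
    }
    return result;
}
```

A variable that is fully written and then read in two functions passes the check, while one that is read first in any function, or only partially written first, does not.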
Got a bit sidetracked from the ccache efficiency work when I found this optimization opportunity. It makes the OpenTitan model 8.5% faster when compiled without --trace (measured on a single threaded model), as V3Localize now finds an additional ~21000 variables that it can convert into locals which no longer need to live in (and be stored to) the model state.
(Sadly when compiled with tracing these opportunities go away as variables are read in the tracing routines.)
There are 3 patches and they should be kept separate as they are largely orthogonal. The first moves V3Descope after V3Localize, and teaches V3Localize how to localize variables of another module. Patch 2 is just infrastructure to be used in patch 3 initially but is generic. Patch 3 teaches V3Localize how to localize any variable that is always assigned before it's read within a function (previously it only did this if the variable was used only in a single function, but a lot of stuff was duplicated into the settle loop, so there was a lot of lost opportunity).
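To make the effect concrete, here is a hand-written illustration of the shape of the change. It is not actual Verilator output, and the struct and member names are invented: a temporary that is assigned and then read in both a settle-style and an eval-style function moves from the model struct into a local in each function.

```cpp
#include <cstdint>

// Before: `tmp` is used in two functions, so it stays in the model state and
// every assignment is a store to memory.
struct ModelBefore {
    std::uint32_t a = 0, b = 0;  // inputs
    std::uint32_t out = 0;       // output
    std::uint32_t tmp = 0;       // fully written, then read, in both functions
    void settle() { tmp = a + b; out = tmp * 2; }
    void eval() { tmp = a + b; out = tmp * 2 + 1; }
};

// After: each function has its own local, which the C++ compiler can keep in
// a register, and the model struct shrinks.
struct ModelAfter {
    std::uint32_t a = 0, b = 0;
    std::uint32_t out = 0;
    void settle() { const std::uint32_t tmp = a + b; out = tmp * 2; }
    void eval() { const std::uint32_t tmp = a + b; out = tmp * 2 + 1; }
};
```

Both versions compute the same `out`; the difference is only where `tmp` lives.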
Some numbers for OpenTitan (single threaded, without --trace):