Peg parser optimizations #2878
I've discovered a couple of places where realloc is called unnecessarily. Fixing that shaves off about 13% of the runtime. I will send a PR to packcc soon. @masatake: By the way, do you have any plans to migrate to the upstream packcc?
I also have a very strong suspicion that the allocations can be optimized much more. By logging all de/allocations, I have discovered that there are significantly more calls to
The log file also shows that there are many small (64 bytes or less) objects (LR answers, captures, thunks, chunks, ...) that are allocated only to be immediately freed again as the algorithm backtracks. I have a strong suspicion that a simple memory pool (or object pool) would help a lot. I'll try to create a proof-of-concept implementation, at least for some of the structures, just to be able to measure the performance impact.
Yes. We should use the upstream version.
I found the object pool must be implemented in
I implemented a very simple fixed-size memory pool and modified packcc to use it for all allocation of However, more testing is needed, since the pool is not yet suitable for general use. Some of the Kotlin files I use for testing require quite a lot of memory, while others require only a few thousand simultaneously allocated objects. There is no reasonable limit to set as a default size, so it must be dynamic, which complicates the implementation quite a bit. Only proper testing will reveal whether the pool overhead is lower than using malloc/free in the first place...
Ok, I will suggest the optimizations directly to upstream. You can either merge them to ctags later or use them after ctags switches to upstream packcc.
I wasn't aware of that, but since I followed the general code style used in packcc, the pool actually turned out to be both thread-safe and reentrant :-)
All the changes made by @dolik-rce are merged to the upstream project. Now we must use the upstream version.
I listed the changes developed in the u-ctags project.
I think now we can close this.
Following up from the discussion in #2866 about peg parser speed, I'd like to use this issue to share some of my findings and to discuss possible optimizations.
As a first experiment, I wrote a simple script that optimizes the grammar. It reduces the number of rules in the grammar, which in turn means fewer allocations and less overhead. The real-world results are quite good: for the Kotlin parser, the grammar is reduced to less than half (from 517 to 222 rules) and the runtime is ~40% faster.
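To illustrate the kind of reduction such a script might perform (a made-up fragment in packcc's PEG syntax, not taken from the actual Kotlin grammar or the actual script): helper rules referenced from only one place can be inlined into their caller, eliminating whole rules and the per-rule evaluation overhead.

```
# before: three rules, two of them used only by "identifier"
identifier <- letter (letter / digit)*
letter     <- [a-zA-Z_]
digit      <- [0-9]

# after inlining the single-use helper rules: one rule, same language
identifier <- [a-zA-Z_] ([a-zA-Z_] / [0-9])*
```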
This approach is very basic and could be much improved, especially if it were implemented directly in packcc. However, that is probably not going to happen, since one of the goals of packcc is to generate a readable parser, which would not be the case if the grammar were highly optimized.
Another option is to add a script like this to the build process of ctags, preprocessing the peg files before they are compiled. The full original grammar could be used in debug mode, to make development easier.
The 40% speed-up is nice; however, it is still orders of magnitude slower than custom C parsers. I'll definitely continue to look into the packcc internals. I believe there must be some way to make the generated parsers faster (and to reduce the memory overhead).