Optimize program size by refactoring error reporting routines #4446
I'm impressed by your analysis, good work.
(Moving to main thread vs code comment) This demonstrates what I mean:
Output:
The old code had a lock around building the arguments.
So, I'm suggesting that the macro call V3Error::getLock(). The unlocking is fine in errorEnd.
Oh, I got it. Indeed, there are some functions like
An article says that, before C++17, function calls within a chain of << operators may be evaluated in any order. To avoid an extra function call, I came up with two solutions.
The first uses separate statements to make the order explicit, but it does not compile with MSVC (it relies on a GNU statement expression, which MSVC does not support):
    v3errorEnd((({std::ostringstream& os = V3Error::v3errorPrep(code); os << msg;}), v3errorStr()))
The second uses an immediately invoked anonymous lambda:
    v3errorEnd(
        [&]() -> std::ostringstream& {
            std::ostringstream& os = V3Error::v3errorPrep(code);
            os << msg;
            return os;
        }())
But the lambda cannot be inlined and makes the program slightly bigger, so it is now changed to:
    v3errorEnd((V3Error::v3errorPrep(code), static_cast<std::ostringstream&>(V3Error::v3errorStr() << msg)))
It's just OK.
Thanks for the research. Please add a comment near that code summarizing why the comma is needed until C++17. Then it looks good to go.
Done
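The ordering argument in this thread can be illustrated with a minimal sketch. The names below (order_demo, prep, str, end, report) are hypothetical stand-ins, not Verilator's actual functions. It relies only on the fact that the built-in comma operator evaluates its left operand before its right operand, including before C++17.

```cpp
#include <sstream>
#include <string>
#include <vector>

namespace order_demo {
// Records the order in which the helper functions are called.
std::vector<std::string> g_trace;
// Shared message buffer (hypothetical stand-in for the error stream).
std::ostringstream g_stream;

// Must run first: resets the buffer.
void prep() {
    g_trace.push_back("prep");
    g_stream.str("");
}

// Returns the buffer that the message is streamed into.
std::ostringstream& str() {
    g_trace.push_back("str");
    return g_stream;
}

// Consumes the finished message.
std::string end(std::ostringstream& os) {
    g_trace.push_back("end");
    return os.str();
}

// The comma operator sequences prep() before the right-hand operand,
// so str() is never called before prep(). The static_cast is needed
// because operator<< returns std::ostream&, not std::ostringstream&.
std::string report(int value) {
    return end((prep(), static_cast<std::ostringstream&>(str() << "value=" << value)));
}
}  // namespace order_demo
```

Calling order_demo::report(7) returns "value=7" and leaves the trace as prep, str, end. Writing prep() and str() as operands of the same << chain would not give that guarantee before C++17.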
If you remove all the code for error reporting and assertions (like v3error, UASSERT_OBJ, ...), you may find that the size of verilator_bin drops from ~18MB to ~12.3MB, a reduction of about 30% of the original size! The disassembler shows that a single v3fatalSrc produces about 264 bytes of code. For example, a single nodep->v3fatalSrc("..."); in V3LinkInc.cpp is compiled into the assembly shown in the following image. It contains code that acquires V3Error::s().m_mutex, formats the output text with streaming operators, and so on. A lambda passed to m_mutex.lockCheckStopRequest is also generated for each call. Together these greatly increase the program size, which is very inefficient.
After this patch, the lock is acquired in V3Error::v3errorAcquireLock() (called by V3Error::v3errorPrep) and released in V3Error::v3errorEnd, so this code is no longer inlined at every call site. It also guarantees that V3ErrorGuarded::v3errorStr() and V3ErrorGuarded::v3errorEnd() are called with the lock acquired. The streaming operation that formats the output text is also carefully optimized. Now the generated code looks much better.
The binary size is considerably smaller than before.
Before
Now
Besides, it almost doubles the speed of the build process.
Before
Now
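The idea of moving lock handling out of every call site can be sketched as follows. This is an illustration under stated assumptions, not the patch itself: the namespace sketch, its functions, and the SKETCH_ERROR macro are hypothetical names. In a real project the three functions would be defined in a single .cpp file, so that each call site contains only calls to them.

```cpp
#include <cassert>
#include <mutex>
#include <sstream>
#include <string>

namespace sketch {
std::mutex g_mutex;           // guards the shared error state
std::ostringstream g_stream;  // shared message buffer
bool g_locked = false;        // true while a report is in progress
std::string g_last;           // last finished message, kept for inspection

// Acquires the lock once, in one place, and starts the message.
void prep(int code) {
    g_mutex.lock();
    g_locked = true;
    g_stream.str("");
    g_stream << "%Error-" << code << ": ";
}

// Only valid between prep() and end(), i.e. with the lock held.
std::ostringstream& str() {
    assert(g_locked);
    return g_stream;
}

// Finishes the message and releases the lock taken in prep().
void end(std::ostringstream& os) {
    assert(g_locked);
    g_last = os.str();
    g_locked = false;
    g_mutex.unlock();
}
}  // namespace sketch

// Each use expands to three plain function calls; no locking code or
// lambda is generated at the call site. The comma operator makes sure
// prep() runs before the message is streamed.
#define SKETCH_ERROR(code, msg) \
    sketch::end((sketch::prep(code), \
                 static_cast<std::ostringstream&>(sketch::str() << msg)))
```

For example, SKETCH_ERROR(3, "width " << 8); leaves "%Error-3: width 8" in sketch::g_last, and the mutex is released again when the statement finishes.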