Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize program size by refactoring error reporting routines #4446

Merged
merged 6 commits into from
Aug 29, 2023

Conversation

donlon
Copy link
Contributor

@donlon donlon commented Aug 28, 2023

If you remove all the code about error reporting/assertions (like v3error/UASSERT_OBJ/...), you may find that the size of the verilator_bin drops from ~18MB to ~12.3MB, which is ~30% of the original size! Disassembler shows that a single v3fatalSrc produces about 264 bytes of the code. For example, a single nodep->v3fatalSrc("..."); in V3LinkInc.cpp is compiled into the assembly shows in the following image. It contains code that acquires the V3Error::s().m_mutex, formats the output text with streaming operators and so on...

before

A lambda passed to m_mutex.lockCheckStopRequest is also generated for each call. These greatly increase the program size. It's very inefficient.

After this patch the acquiring and releasing of the lock take place in V3Error::v3errorAcquireLock() (called by V3Error::v3errorPrep) and V3Error::v3errorEnd respectively, so it won't be inlined into every function call. It also guarantees that V3ErrorGuarded::v3errorStr() and V3ErrorGuarded::v3errorEnd() are called with the lock acquired. The streaming operation to format the output text is also optimized carefully. Now the generated code looks much better.

after

The binary size is quite smaller than before.

Before

 18342000 verilator_bin
100197376 verilator_bin_dbg

Now

 13619032 verilator_bin
 82263520 verilator_bin_dbg

Besides, it almost doubles the speed of the building process.

Before

$ time make -j20
make -j20  1799.37s user 57.01s system 1607% cpu 1:55.49 total

Now

$ time make -j20
make -j20  1067.50s user 40.01s system 1508% cpu 1:13.40 total

Copy link
Member

@wsnyder wsnyder left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm impressed by your analysis, good work.

src/V3Width.cpp Outdated Show resolved Hide resolved
@wsnyder
Copy link
Member

wsnyder commented Aug 29, 2023

(Moving to main thread vs code comment)

This demonstrates what I mean:

#include <iostream>

void lock() { std::cout << "--Lock\n"; }
void unlock() { std::cout << "--Unlock\n"; }

std::string argument() { std::cout << "--Making argument\n"; return "arg"; }

std::string v3errorPrepOld(std::string arg) {  //
    std::cout << "--Old error prep " << arg << std::endl;
    return arg;
}

std::string v3errorPrepNew(std::string arg) {  //
    lock();
    std::cout << "--New error prep " << arg << std::endl;
    return arg;
}

void v3errorEndOld(std::string arg) {  //
    std::cout << "--Old error end " << arg << std::endl;
}
void v3errorEndNew(std::string arg) {  //
    std::cout << "--New error end " << arg << std::endl;
    unlock();
}

int main() {
    {
        std::cout << "-----------\n";
        lock();
        v3errorEndOld(v3errorPrepOld(argument()));
    }
    {
        std::cout << "-----------\n";
        v3errorEndNew(v3errorPrepNew(argument()));
    }
    return 0;
}

Output:

--Lock
--Making argument
--Old error prep arg
--Old error end arg
-----------
--Making argument
--Lock
--New error prep arg
--New error end arg
--Unlock

The old code had a lock around building the arguments.

@wsnyder
Copy link
Member

wsnyder commented Aug 29, 2023

So, I'm suggesting the macro calls a V3Error::getLock(). The unlocking is fine in errorEnd.

@donlon
Copy link
Contributor Author

donlon commented Aug 29, 2023

Oh, I got it. Indeed there are some function likes nodep->warnMore() requires the lock acquired. But it seems that V3Error::v3errorPrep() (LHS of the operator <<) is always called before any function in the msg list (RHS of the operator <<), however I'm not sure whether it's a UB.

@donlon
Copy link
Contributor Author

donlon commented Aug 29, 2023

There's an article says that functions in chained << operators can be called in any order before C++17. To save another function call, I came up with two solution. The first is to use separate statements to clarify the order, but this cannot be compiled in MSVC (probably unsupported).

v3errorEnd((({std::ostringstream& os = V3Error::v3errorPrep(code); os << msg}), v3errorStr()))

The second is to use a self-called anonymous lambda.

v3errorEnd(
    [&]() -> std::ostringstream& {
        std::ostringstream& os = V3Error::v3errorPrep(code);
        os << msg;
        return os;
    }())

But the lambda cannot be inlined and make the program slightly bigger, so now it's modified to

v3errorEnd((V3Error::v3errorPrep(code), static_cast<std::ostringstream&>(V3Error::v3errorStr() << msg)))

It's just ok.

@wsnyder
Copy link
Member

wsnyder commented Aug 29, 2023

Thanks for the research. Please add a comment near that code summarizing why the comma is needed until C++17. Then looks good to go.

@donlon
Copy link
Contributor Author

donlon commented Aug 29, 2023

Done

@wsnyder wsnyder merged commit cf6566b into verilator:master Aug 29, 2023
40 checks passed
@donlon donlon deleted the pr/v3error-opt branch August 30, 2023 03:34
@donlon donlon changed the title Optimize program size by refactor error reporting routines Optimize program size by refactoring error reporting routines Sep 27, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants