Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace try/catch/rethrow in unwind with unwind_guard #309

Closed
wants to merge 10 commits into from

Conversation

wravery
Copy link
Contributor

@wravery wravery commented Mar 18, 2022

I noticed while investigating microsoft/cppgraphqlgen#222 that parse_tree implements unwind on the make_control<>::state_handler<> struct, and this results in creating a try/catch/rethrow block around every rule match in match_control_unwind. Every try/catch/rethrow seems to be consuming significant stack space, so if you raise/throw an exception from a relatively shallow (< 20) nested rule, it often exhausts the stack.

This change makes it so parse_tree will only implement unwind on the Control type wrapper if either the Control or the Node type implements it. When parsing into a tree that doesn't implement either of them, the exception will be thrown once all the way from the initial raise to the caller of parse_tree::parse.

Since the unit tests use the default node which inherits basic_node, they still implement unwind. The unit tests also wrap the rule in a try_catch_type handler, though, so the exception doesn't bubble all the way up. Before I nailed down the constexpr bool definitions, I accidentally suppressed unwind for basic_node types in the unit tests as well, and the lack of stack.pop_back() calls resulted in the assert(stack.size() == 1) firing for the unit test. So that tells me this is incompatible with try_catch_type. That's the only thing I think might block it.

@d-frey d-frey self-assigned this Mar 18, 2022
@d-frey
Copy link
Member

d-frey commented Mar 18, 2022

As you noticed, skipping the unwind() makes the assert(stack.size() == 1) fire. Conceptually, the state.pop_back() is necessary for the parse tree to work correctly, it can't just be skipped. In its current form, we therefore can not accept this PR.

However, you do have a point here, all those try/catch-blocks are something to look into and they do provide an optimization opportunity. So, what could be done? I can see some potential by checking if a grammar rule (including all reachable sub-rules) can throw an exception. This could be used to skip unwind properly. I'll see if I can come up with something.

@wravery
Copy link
Contributor Author

wravery commented Mar 18, 2022

Yeah, partly this change is predicated on letting the exception out to the caller, in which case you don't need to unwind because the stack will just be deleted and the assert is never reached. But to rely on that, you need to know that there won't be any other catch blocks that turn an exception into a regular return, like try_catch_type.

What I'm probably going to do for my own project is copy enough of the parse_tree implementation to implement a non-unwinding version. My grammar and custom ast_node implementation don't require any try_catch_type handling or per-frame unwinding, and it made a pretty dramatic difference in how far I could get without overflowing. Originally it overflowed at a depth of 18 or so, only when throwing an exception from an if_must rule. Now it can successfully throw an exception without overflowing the stack from a depth of at least 100 (I stopped at that point, I'm also going to add a depth limit since it can't remain completely unbounded).

@wravery
Copy link
Contributor Author

wravery commented Mar 18, 2022

Maybe it would be better to define an RAII unwind_guard which will pop the stack in its destructor. You'd then need to disarm it in case an exception is not thrown and you don't want to pop the stack, but then you wouldn't need an explicit try/catch block, it could just perform regular stack unwinding.

Edit: This would actually apply to the core match_control_unwind method and not to anything in parse_tree specifically.

@wravery
Copy link
Contributor Author

wravery commented Mar 19, 2022

Here you go, how about something like this?

@codecov-commenter
Copy link

codecov-commenter commented Mar 19, 2022

Codecov Report

Merging #309 (a2e3cc3) into main (e3c8cb4) will increase coverage by 0.00%.
The diff coverage is 100.00%.

@@           Coverage Diff           @@
##             main     #309   +/-   ##
=======================================
  Coverage   99.68%   99.68%           
=======================================
  Files         253      254    +1     
  Lines        5083     5088    +5     
=======================================
+ Hits         5067     5072    +5     
  Misses         16       16           
Impacted Files Coverage Δ
include/tao/pegtl/internal/unwind_guard.hpp 100.00% <100.00%> (ø)
include/tao/pegtl/match.hpp 100.00% <100.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update e3c8cb4...a2e3cc3. Read the comment docs.

@d-frey
Copy link
Member

d-frey commented Mar 20, 2022

I like the idea about the unwind_guard, but it seems independent of the rest, so I'd like to see this as a separate PR.

Also, I would like to get some idea of how it improves code on all major compilers (GCC, Clang, and MSVC), as we don't want to improve code on one compiler on the expense of others. I ran some small checks with GCC so far and sometimes the binaries became larger, sometimes they became smaller. With Clang, there was a lot more overhead and most binaries grew in size.

Minor details: The capture list can be shortened to just [&] instead of [&in, &st...] and the <functional> include should be <utility>.

@d-frey
Copy link
Member

d-frey commented Mar 20, 2022

One more thing that I should probably mention: We don't want to check the code with Debug mode, as the PEGTL is heavily templated and relies on the optimizer to generate reasonable code. Anything compiled without optimizations is best effort, at best. We should probably also put this in the documentation in the "Requirements" section. @ColinH Would you like to formulate something?

@wravery
Copy link
Contributor Author

wravery commented Apr 21, 2022

One more thing that I should probably mention: We don't want to check the code with Debug mode, as the PEGTL is heavily templated and relies on the optimizer to generate reasonable code. Anything compiled without optimizations is best effort, at best. We should probably also put this in the documentation in the "Requirements" section. @ColinH Would you like to formulate something?

FWIW, the same stack overflow occurs in release builds using MSVC (VS2022) on my machine. I don't think the release optimizations can save it from this particular issue.

@wravery
Copy link
Contributor Author

wravery commented Apr 21, 2022

I like the idea about the unwind_guard, but it seems independent of the rest, so I'd like to see this as a separate PR.

OK, will do.

Also, I would like to get some idea of how it improves code on all major compilers (GCC, Clang, and MSVC), as we don't want to improve code on one compiler on the expense of others. I ran some small checks with GCC so far and sometimes the binaries became larger, sometimes they became smaller. With Clang, there was a lot more overhead and most binaries grew in size.

Would you prefer an #ifdef for MSVC? I think the stack overflow I'm trying to mitigate is likely specific to the code that compiler generates for catching and rethrowing exceptions.

Minor details: The capture list can be shortened to just [&] instead of [&in, &st...] and the <functional> include should be <utility>.

OK. 👍🏼

@wravery
Copy link
Contributor Author

wravery commented Apr 21, 2022

What I'm probably going to do for my own project is copy enough of the parse_tree implementation to implement a non-unwinding version. My grammar and custom ast_node implementation don't require any try_catch_type handling or per-frame unwinding, and it made a pretty dramatic difference in how far I could get without overflowing. Originally it overflowed at a depth of 18 or so, only when throwing an exception from an if_must rule. Now it can successfully throw an exception without overflowing the stack from a depth of at least 100 (I stopped at that point, I'm also going to add a depth limit since it can't remain completely unbounded).

I have a configurable depth limit and a prototype of a custom parse_tree fork now, but I'm going to abandon the custom implementation of parse_tree in favor of this PR. I'd rather not diverge like that from PEGTL.

@wravery wravery changed the title Avoid try/catch/rethrow if unwind is unimplemented Replace try/catch/rethrow in unwind with unwind_guard Apr 21, 2022
@ColinH
Copy link
Member

ColinH commented May 14, 2022

@d-frey Can we merge this as is?

@d-frey
Copy link
Member

d-frey commented May 14, 2022

I think we already talked about this, especially benchmarks. Are we sure this PR would not introduce drawbacks for GCC/Clang?

@wravery
Copy link
Contributor Author

wravery commented May 14, 2022

I think we already talked about this, especially benchmarks. Are we sure this PR would not introduce drawbacks for GCC/Clang?

I've been playing around with it on Compiler Explorer, and with -O2 it seems to slightly shrink the code for GCC (12 lines of assembly vs. 10 lines), and it grows the code by about 58% for Clang (31 lines of assembly vs. 49 lines). It's definitely an improvement for MSVC at runtime just by not causing a stack overflow, no matter what the size difference might be. Overall, I'd argue it's a wash between GCC and Clang (which seems to generate a lot more code than GCC to begin with).

N.B. A side benefit of #313 is that it'll remove this code in parse_tree entirely in cases where a Node::unwind declaration is omitted.

@ColinH
Copy link
Member

ColinH commented Jun 28, 2022

Here's one finding after finally playing around with this for a bit, on a Mac with M1 Max running Monterey.

With a slightly modified json_parse example that adds a couple of unwind() functions to the control, both compilers produce identical binaries, no difference between try-catch and unwind_guard.

So, at least for Clang and GCC, it doesn't seem to matter, since, at least in this simple case, the optimisers sees right through the supposed "difference".

PS: On this platform Clang consistently produced much smaller binaries than GCC 11.

@d-frey
Copy link
Member

d-frey commented Jun 28, 2022

Thanks Colin for testing. @wravery Can you update this PR so I can merge it? Thanks!

@d-frey d-frey closed this in ad71c24 Jun 29, 2022
d-frey added a commit that referenced this pull request Jun 29, 2022
@d-frey
Copy link
Member

d-frey commented Jun 29, 2022

I merged it manually, I'll do a release of the 3.x branch shortly.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

None yet

4 participants