New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add the yk BlockDisambiguate pass. #11
Conversation
To make sure that I understand: we split the LLVM IR so that the code gen backend doesn't feel the need to introduce internal control flow? |
Not quite. We introduce an extra IR block (and edge to it) for non-internal control flow, so that it can't be confused with internal control flow during the mapping stage. The problem before is that (e.g.) With this pass, [The method we tried to use before to disambiguate these cases used the addresses of the MBBs to try and decide if it's a true rerun of the block. It turns out that this doesn't always work -- one of the tests in the suite was passing by chance] Hope that makes sense? |
I think I've got it. Maybe silly question: should we fix the code gen so that it never inserts additional control flow or is that impractical / silly / ... ? |
I doubt it will be possible. A few months back I looked at what kinds of operations do this, but I've since forgot. I think some LLVM IR instructions are high-level enough that they have to lower to extra control flow. Also finding all of the places where this happens would be hard and maybe platform specific. |
Right, I think I see where you're going with this. |
@@ -0,0 +1,204 @@ | |||
//===- BlockDisambiguate.cpp - Unambiguous block mapping for yk ----===// | |||
// | |||
// This pass ensures that yk is able to unambiguously map machine blocks back |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment is very long (which is good!) but reads like a murder mystery (which is bad). Let's give the reader an intuition up-front about what they'll see. Maybe something like "LLVM IR blocks that branch to themselves lead to multiple machine-level basic blocks that we cannot unambiguously map back to LLVM IR. This pass splits such blocks into two parts such that the resulting machine-level basic blocks do have an unambiguous mapping back to the IR."
Something I also haven't quite got from the comment is the precise preconditions for when this pass will decide to rewrite things. It's probably worth clarifying that somewhere (not necesarilly up-front though).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can I squash those last three commits to make the review easier for you?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Go ahead.
cc8a196
to
106b29c
Compare
106b29c reworks the comment. I've added a up-front spoiler to the whodunnit, simplified/shortened some bits, made the text sound more like an informal proof, and added an "alternative approaches" section. Hopefully this is more coherent? |
OK, that helps me a lot -- thanks! I think I have one last question. Is the problem here that we're trying to deal with someone jumping to a machine basic block that is not the first machine basic block corresponding to an LLVM IR basic block? In other words, imagine I have an LLVM IR basic block |
No. That's not the problem I'm tying to address here. I don't think it can happen.
There are two related problematic cases related to the fact that calls in LLVM IR don't terminate a block:
To expand on the latter, suppose we have ir like:
Currently the mapper requires a sequence like:
The second [Lukas' block-splitting branch (that we are still unsure if we will merge) aims to simplify this stuff by ensuring that each IR block has at most one call. Then there's no need to keep track of "progress through blocks"] |
Would that obviate the need for this current PR then? |
Sadly I don't think so. The example in the big doc-comment has no calls, so Lukas' pass would leave it untouched, and the issue would remain. |
Devil's advocate: Lukas's patch means that the second case above would no longer be a problem (I think? Meaning we need only handle the first case. But could we say "we throw away the first block in a trace if there's not a precise match"? |
That's correct.
Not sure I understand. Can you give an example? |
This relates to:
We could (for example) insert a dummy loop after turning PT on, so that we guarantee to record all the bits of the program we care about? |
Thinking about this some more (and my head is on the verge of exploding), you can only turn on tracing via a call, namely to the control point. So Lukas' pass would in fact help in the case that tracing is turned on mid-IR-block. Since the code after the call to the control point would be in a separate block. I'm still unsure about what you mean with "throw away the first block in a trace if there's not a precise match" though. I'm starting to get confused. |
OK, so imagine you have this:
and PT kicks in half-way through BB2. This PR, I think, guarantees that we pick up BB2. What I'm saying is that if PT kicks in half-way through BB2, we should drop that from the trace i.e. we would see it starting at BB3. If we care about BB2, we should insert a "fake" basic block before BB2. |
Are those blocks high-level IR blocks, or machine blocks. And is this after Lukas' pass has run? |
Those are machine basic blocks and after Lukas's PR has merged. |
After quite a bit of offline discussion, we have agreed to move forward with this, and to maybe optimise it later (see this issue). Also:
Give this a shot @ltratt |
I'm more-or-less fine with this -- except I can't see there being either a) a test which this new pass changes b) a new test which this pass does something specific on. Should I be expecting one of those? |
On the |
I think we're ready to squash! |
Great news. Let's review that one small change on the yk side first, then i will push a squash. |
This is required for th yk JIT runtime to unambiguously map machine blocks back to high-level IR blocks. See the doc comment in BlockDisambiguate.cpp for a detailed description of the problem and how we solve it.
a7b7b05
to
4bda425
Compare
squashed. |
bors r+ |
Build succeeded: |
We experienced some deadlocks when we used multiple threads for logging using `scan-builds` intercept-build tool when we used multiple threads by e.g. logging `make -j16` ``` (gdb) bt #0 0x00007f2bb3aff110 in __lll_lock_wait () from /lib/x86_64-linux-gnu/libpthread.so.0 #1 0x00007f2bb3af70a3 in pthread_mutex_lock () from /lib/x86_64-linux-gnu/libpthread.so.0 #2 0x00007f2bb3d152e4 in ?? () #3 0x00007ffcc5f0cc80 in ?? () #4 0x00007f2bb3d2bf5b in ?? () from /lib64/ld-linux-x86-64.so.2 #5 0x00007f2bb3b5da27 in ?? () from /lib/x86_64-linux-gnu/libc.so.6 #6 0x00007f2bb3b5dbe0 in exit () from /lib/x86_64-linux-gnu/libc.so.6 #7 0x00007f2bb3d144ee in ?? () #8 0x746e692f706d742f in ?? () #9 0x692d747065637265 in ?? () #10 0x2f653631326b3034 in ?? () #11 0x646d632e35353532 in ?? () #12 0x0000000000000000 in ?? () ``` I think the gcc's exit call caused the injected `libear.so` to be unloaded by the `ld`, which in turn called the `void on_unload() __attribute__((destructor))`. That tried to acquire an already locked mutex which was left locked in the `bear_report_call()` call, that probably encountered some error and returned early when it forgot to unlock the mutex. All of these are speculation since from the backtrace I could not verify if frames 2 and 3 are in fact corresponding to the `libear.so` module. But I think it's a fairly safe bet. So, hereby I'm releasing the held mutex on *all paths*, even if some failure happens. PS: I would use lock_guards, but it's C. Reviewed-by: NoQ Differential Revision: https://reviews.llvm.org/D118439 (cherry picked from commit d919d02)
This is required for yk to unambiguously map machine blocks back to IRblocks.
See the doc comment in
BlockDisambiguate.cpp
for a detailed description of the problem this solves and how we solve it.[that comment took many hours to get right -- it's really "detaily" and hard to explain]
A companion PR will come soon. This PR should be merged first.