Add the yk BlockDisambiguate pass. #11

vext01 · 2021-11-09T11:05:06Z

This is required for yk to unambiguously map machine blocks back to IRblocks.

See the doc comment in BlockDisambiguate.cpp for a detailed description of the problem this solves and how we solve it.

[that comment took many hours to get right -- it's really "detaily" and hard to explain]

A companion PR will come soon. This PR should be merged first.

ltratt · 2021-11-09T14:44:38Z

To make sure that I understand: we split the LLVM IR so that the code gen backend doesn't feel the need to introduce internal control flow?

vext01 · 2021-11-09T14:54:07Z

To make sure that I understand: we split the LLVM IR so that the code gen backend doesn't feel the need to introduce internal control flow?

Not quite.

We introduce an extra IR block (and edge to it) for non-internal control flow, so that it can't be confused with internal control flow during the mapping stage.

The problem before is that (e.g.) [bbA, bbA] is either two full runs of the basic block bbA, or just one run where the IR block has code-genned to >1 MBB.

With this pass, [bbA, bbA] is only every interpreted as one run of bbA. If bbA were really run more than once, then we'd see the extra block in-between, i.e. [bbA, bbA-proxy, bbA].

[The method we tried to use before to disambiguate these cases used the addresses of the MBBs to try and decide if it's a true rerun of the block. It turns out that this doesn't always work -- one of the tests in the suite was passing by chance]

Hope that makes sense?

ltratt · 2021-11-09T16:24:02Z

I think I've got it. Maybe silly question: should we fix the code gen so that it never inserts additional control flow or is that impractical / silly / ... ?

vext01 · 2021-11-09T16:31:15Z

should we fix the code gen so that it never inserts additional control flow or is that impractical / silly / ... ?

I doubt it will be possible. A few months back I looked at what kinds of operations do this, but I've since forgot.

I think some LLVM IR instructions are high-level enough that they have to lower to extra control flow.

Also finding all of the places where this happens would be hard and maybe platform specific.

ltratt · 2021-11-09T20:41:26Z

I think some LLVM IR instructions are high-level enough that they have to lower to extra control flow.

Right, I think I see where you're going with this.

ltratt · 2021-11-09T20:44:54Z

llvm/lib/Transforms/Yk/BlockDisambiguate.cpp

@@ -0,0 +1,204 @@
+//===- BlockDisambiguate.cpp - Unambiguous block mapping for yk ----===//
+//
+// This pass ensures that yk is able to unambiguously map machine blocks back


This comment is very long (which is good!) but reads like a murder mystery (which is bad). Let's give the reader an intuition up-front about what they'll see. Maybe something like "LLVM IR blocks that branch to themselves lead to multiple machine-level basic blocks that we cannot unambiguously map back to LLVM IR. This pass splits such blocks into two parts such that the resulting machine-level basic blocks do have an unambiguous mapping back to the IR."

Something I also haven't quite got from the comment is the precise preconditions for when this pass will decide to rewrite things. It's probably worth clarifying that somewhere (not necesarilly up-front though).

Can I squash those last three commits to make the review easier for you?

vext01 · 2021-11-10T13:15:53Z

106b29c reworks the comment.

I've added a up-front spoiler to the whodunnit, simplified/shortened some bits, made the text sound more like an informal proof, and added an "alternative approaches" section.

Hopefully this is more coherent?

ltratt · 2021-11-10T14:03:59Z

OK, that helps me a lot -- thanks!

I think I have one last question. Is the problem here that we're trying to deal with someone jumping to a machine basic block that is not the first machine basic block corresponding to an LLVM IR basic block?

In other words, imagine I have an LLVM IR basic block L1 and from it are generated two machine basic blocks M1 and M2. Can anyone ever jump to M2 without having executed M1? [If "yes", I agree with the PR as-is. If "no" then I wonder if an alternative solution is to never try and map something that's not the first machine basic block (i.e. in this case M1).]

vext01 · 2021-11-10T14:21:44Z

Is the problem here that we're trying to deal with someone jumping to a machine basic block that is not the first machine basic block corresponding to an LLVM IR basic block?

No. That's not the problem I'm tying to address here. I don't think it can happen.

If "no" then I wonder if an alternative solution is to never try and map something that's not the first machine basic block (i.e. in this case M1).]

There are two related problematic cases related to the fact that calls in LLVM IR don't terminate a block:

Tracing can be turned on mid-IR-block (in a machine block other than the first). If we didn't map this machine block, then we could miss a chunk of program semantics.
We require an entry for when a callee returns to a caller.

To expand on the latter, suppose we have ir like:

main() {
  ...stuff
  call f()
  ...more stuff
}

Currently the mapper requires a sequence like:

main:bb0   <- execute "stuff" up until the call to f
f:bb0
...
f:bbN
main:bb0    <- return to main to execute "more stuff"

The second main:bb0 is required to tell the trace compiler that it should resume copying instructions from after the call to f. The trace compiler has to keep track of how far it has gotten through each active IR block on the stack.

[Lukas' block-splitting branch (that we are still unsure if we will merge) aims to simplify this stuff by ensuring that each IR block has at most one call. Then there's no need to keep track of "progress through blocks"]

ltratt · 2021-11-10T14:29:16Z

Lukas' block-splitting branch (that we are still unsure if we will merge) aims to simplify this stuff by ensuring that each IR block has at most one call. Then there's no need to keep track of "progress through blocks"

Would that obviate the need for this current PR then?

vext01 · 2021-11-10T14:36:36Z

Would that obviate the need for this current PR then?

Sadly I don't think so.

The example in the big doc-comment has no calls, so Lukas' pass would leave it untouched, and the issue would remain.

ltratt · 2021-11-10T14:38:26Z

There are two related problematic cases related to the fact that calls in LLVM IR don't terminate a block:

Tracing can be turned on mid-IR-block (in a machine block other than the first). If we didn't map this machine block, then we could miss a chunk of program semantics.

We require an entry for when a callee returns to a caller.

Devil's advocate: Lukas's patch means that the second case above would no longer be a problem (I think? Meaning we need only handle the first case. But could we say "we throw away the first block in a trace if there's not a precise match"?

vext01 · 2021-11-10T14:51:33Z

Lukas's patch means that the second case above would no longer be a problem (I think? Meaning we need only handle the first case.

That's correct.

But could we say "we throw away the first block in a trace if there's not a precise match"?

Not sure I understand. Can you give an example?

ltratt · 2021-11-10T14:53:39Z

But could we say "we throw away the first block in a trace if there's not a precise match"?
Not sure I understand. Can you give an example?

This relates to:

Tracing can be turned on mid-IR-block (in a machine block other than the first). If we didn't map this machine block, then we could miss a chunk of program semantics.

We could (for example) insert a dummy loop after turning PT on, so that we guarantee to record all the bits of the program we care about?

vext01 · 2021-11-10T15:10:05Z

Thinking about this some more (and my head is on the verge of exploding), you can only turn on tracing via a call, namely to the control point. So Lukas' pass would in fact help in the case that tracing is turned on mid-IR-block. Since the code after the call to the control point would be in a separate block.

I'm still unsure about what you mean with "throw away the first block in a trace if there's not a precise match" though.

I'm starting to get confused.

ltratt · 2021-11-10T15:13:52Z

OK, so imagine you have this:

BB1
start_tracing()
BB2
BB3

and PT kicks in half-way through BB2. This PR, I think, guarantees that we pick up BB2. What I'm saying is that if PT kicks in half-way through BB2, we should drop that from the trace i.e. we would see it starting at BB3. If we care about BB2, we should insert a "fake" basic block before BB2.

vext01 · 2021-11-10T15:16:50Z

Are those blocks high-level IR blocks, or machine blocks. And is this after Lukas' pass has run?

ltratt · 2021-11-10T15:17:08Z

Those are machine basic blocks and after Lukas's PR has merged.

vext01 · 2021-11-17T12:02:12Z

After quite a bit of offline discussion, we have agreed to move forward with this, and to maybe optimise it later (see this issue).

Also:

s/Proxy/Disambiguation/g
Revise long comment using a simpler example, and with ASCII art.

Give this a shot @ltratt

ltratt · 2021-11-17T13:47:56Z

I'm more-or-less fine with this -- except I can't see there being either a) a test which this new pass changes b) a new test which this pass does something specific on. Should I be expecting one of those?

vext01 · 2021-11-17T13:52:44Z

a) a test which this new pass changes

On the yk side, several tests were strengthened. These would fail prior to the new pass.

ltratt · 2021-11-17T14:13:13Z

I think we're ready to squash!

vext01 · 2021-11-17T14:16:34Z

Great news.

Let's review that one small change on the yk side first, then i will push a squash.

This is required for th yk JIT runtime to unambiguously map machine blocks back to high-level IR blocks. See the doc comment in BlockDisambiguate.cpp for a detailed description of the problem and how we solve it.

vext01 · 2021-11-17T15:00:59Z

squashed.

ltratt · 2021-11-17T16:58:09Z

bors r+

bors · 2021-11-17T17:22:31Z

Build succeeded:

buildbot/yk-buildbot-build-script-hw

11: Add the yk BlockDisambiguate pass. r=ltratt a=vext01 Co-authored-by: Edd Barrett <vext01@gmail.com>

We experienced some deadlocks when we used multiple threads for logging using `scan-builds` intercept-build tool when we used multiple threads by e.g. logging `make -j16` ``` (gdb) bt #0 0x00007f2bb3aff110 in __lll_lock_wait () from /lib/x86_64-linux-gnu/libpthread.so.0 #1 0x00007f2bb3af70a3 in pthread_mutex_lock () from /lib/x86_64-linux-gnu/libpthread.so.0 #2 0x00007f2bb3d152e4 in ?? () #3 0x00007ffcc5f0cc80 in ?? () #4 0x00007f2bb3d2bf5b in ?? () from /lib64/ld-linux-x86-64.so.2 #5 0x00007f2bb3b5da27 in ?? () from /lib/x86_64-linux-gnu/libc.so.6 #6 0x00007f2bb3b5dbe0 in exit () from /lib/x86_64-linux-gnu/libc.so.6 #7 0x00007f2bb3d144ee in ?? () #8 0x746e692f706d742f in ?? () #9 0x692d747065637265 in ?? () #10 0x2f653631326b3034 in ?? () #11 0x646d632e35353532 in ?? () #12 0x0000000000000000 in ?? () ``` I think the gcc's exit call caused the injected `libear.so` to be unloaded by the `ld`, which in turn called the `void on_unload() __attribute__((destructor))`. That tried to acquire an already locked mutex which was left locked in the `bear_report_call()` call, that probably encountered some error and returned early when it forgot to unlock the mutex. All of these are speculation since from the backtrace I could not verify if frames 2 and 3 are in fact corresponding to the `libear.so` module. But I think it's a fairly safe bet. So, hereby I'm releasing the held mutex on *all paths*, even if some failure happens. PS: I would use lock_guards, but it's C. Reviewed-by: NoQ Differential Revision: https://reviews.llvm.org/D118439 (cherry picked from commit d919d02)

vext01 assigned ltratt Nov 9, 2021

vext01 mentioned this pull request Nov 9, 2021

Mapper changes due to the BlockDisambiguate pass. ykjit/yk#434

Merged

ltratt reviewed Nov 9, 2021

View reviewed changes

vext01 force-pushed the block-disambiguate branch from cc8a196 to 106b29c Compare November 10, 2021 13:14

Add the yk BlockDisambiguate pass.

4bda425

This is required for th yk JIT runtime to unambiguously map machine blocks back to high-level IR blocks. See the doc comment in BlockDisambiguate.cpp for a detailed description of the problem and how we solve it.

vext01 force-pushed the block-disambiguate branch from a7b7b05 to 4bda425 Compare November 17, 2021 15:00

bors bot merged commit 0ee4d69 into ykjit:ykjit/13.x Nov 17, 2021

bors bot added a commit that referenced this pull request Jul 25, 2022

Merge #11

fd7b5b9

11: Add the yk BlockDisambiguate pass. r=ltratt a=vext01 Co-authored-by: Edd Barrett <vext01@gmail.com>

bors bot added a commit that referenced this pull request Aug 4, 2022

Merge #11

b2bbd83

11: Add the yk BlockDisambiguate pass. r=ltratt a=vext01 Co-authored-by: Edd Barrett <vext01@gmail.com>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add the yk BlockDisambiguate pass. #11

Add the yk BlockDisambiguate pass. #11

vext01 commented Nov 9, 2021 •

edited

ltratt commented Nov 9, 2021

vext01 commented Nov 9, 2021

ltratt commented Nov 9, 2021

vext01 commented Nov 9, 2021 •

edited

ltratt commented Nov 9, 2021

ltratt Nov 9, 2021

vext01 Nov 10, 2021

ltratt Nov 10, 2021

vext01 commented Nov 10, 2021

ltratt commented Nov 10, 2021

vext01 commented Nov 10, 2021

ltratt commented Nov 10, 2021

vext01 commented Nov 10, 2021

ltratt commented Nov 10, 2021

vext01 commented Nov 10, 2021

ltratt commented Nov 10, 2021

vext01 commented Nov 10, 2021

ltratt commented Nov 10, 2021

vext01 commented Nov 10, 2021

ltratt commented Nov 10, 2021

vext01 commented Nov 17, 2021

ltratt commented Nov 17, 2021

vext01 commented Nov 17, 2021

ltratt commented Nov 17, 2021

vext01 commented Nov 17, 2021 •

edited

vext01 commented Nov 17, 2021

ltratt commented Nov 17, 2021

bors bot commented Nov 17, 2021

Add the yk BlockDisambiguate pass. #11

Add the yk BlockDisambiguate pass. #11

Conversation

vext01 commented Nov 9, 2021 • edited

ltratt commented Nov 9, 2021

vext01 commented Nov 9, 2021

ltratt commented Nov 9, 2021

vext01 commented Nov 9, 2021 • edited

ltratt commented Nov 9, 2021

ltratt Nov 9, 2021

Choose a reason for hiding this comment

vext01 Nov 10, 2021

Choose a reason for hiding this comment

ltratt Nov 10, 2021

Choose a reason for hiding this comment

vext01 commented Nov 10, 2021

ltratt commented Nov 10, 2021

vext01 commented Nov 10, 2021

ltratt commented Nov 10, 2021

vext01 commented Nov 10, 2021

ltratt commented Nov 10, 2021

vext01 commented Nov 10, 2021

ltratt commented Nov 10, 2021

vext01 commented Nov 10, 2021

ltratt commented Nov 10, 2021

vext01 commented Nov 10, 2021

ltratt commented Nov 10, 2021

vext01 commented Nov 17, 2021

ltratt commented Nov 17, 2021

vext01 commented Nov 17, 2021

ltratt commented Nov 17, 2021

vext01 commented Nov 17, 2021 • edited

vext01 commented Nov 17, 2021

ltratt commented Nov 17, 2021

bors bot commented Nov 17, 2021

vext01 commented Nov 9, 2021 •

edited

vext01 commented Nov 9, 2021 •

edited

vext01 commented Nov 17, 2021 •

edited