Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add the yk BlockDisambiguate pass. #11

Merged
merged 1 commit into from Nov 17, 2021

Conversation

vext01
Copy link

@vext01 vext01 commented Nov 9, 2021

This is required for yk to unambiguously map machine blocks back to IRblocks.

See the doc comment in BlockDisambiguate.cpp for a detailed description of the problem this solves and how we solve it.

[that comment took many hours to get right -- it's really "detaily" and hard to explain]

A companion PR will come soon. This PR should be merged first.

@ltratt
Copy link

ltratt commented Nov 9, 2021

To make sure that I understand: we split the LLVM IR so that the code gen backend doesn't feel the need to introduce internal control flow?

@vext01
Copy link
Author

vext01 commented Nov 9, 2021

To make sure that I understand: we split the LLVM IR so that the code gen backend doesn't feel the need to introduce internal control flow?

Not quite.

We introduce an extra IR block (and edge to it) for non-internal control flow, so that it can't be confused with internal control flow during the mapping stage.

The problem before is that (e.g.) [bbA, bbA] is either two full runs of the basic block bbA, or just one run where the IR block has code-genned to >1 MBB.

With this pass, [bbA, bbA] is only every interpreted as one run of bbA. If bbA were really run more than once, then we'd see the extra block in-between, i.e. [bbA, bbA-proxy, bbA].

[The method we tried to use before to disambiguate these cases used the addresses of the MBBs to try and decide if it's a true rerun of the block. It turns out that this doesn't always work -- one of the tests in the suite was passing by chance]

Hope that makes sense?

@ltratt
Copy link

ltratt commented Nov 9, 2021

I think I've got it. Maybe silly question: should we fix the code gen so that it never inserts additional control flow or is that impractical / silly / ... ?

@vext01
Copy link
Author

vext01 commented Nov 9, 2021

should we fix the code gen so that it never inserts additional control flow or is that impractical / silly / ... ?

I doubt it will be possible. A few months back I looked at what kinds of operations do this, but I've since forgot.

I think some LLVM IR instructions are high-level enough that they have to lower to extra control flow.

Also finding all of the places where this happens would be hard and maybe platform specific.

@ltratt
Copy link

ltratt commented Nov 9, 2021

I think some LLVM IR instructions are high-level enough that they have to lower to extra control flow.

Right, I think I see where you're going with this.

@@ -0,0 +1,204 @@
//===- BlockDisambiguate.cpp - Unambiguous block mapping for yk ----===//
//
// This pass ensures that yk is able to unambiguously map machine blocks back
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment is very long (which is good!) but reads like a murder mystery (which is bad). Let's give the reader an intuition up-front about what they'll see. Maybe something like "LLVM IR blocks that branch to themselves lead to multiple machine-level basic blocks that we cannot unambiguously map back to LLVM IR. This pass splits such blocks into two parts such that the resulting machine-level basic blocks do have an unambiguous mapping back to the IR."

Something I also haven't quite got from the comment is the precise preconditions for when this pass will decide to rewrite things. It's probably worth clarifying that somewhere (not necesarilly up-front though).

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can I squash those last three commits to make the review easier for you?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Go ahead.

@vext01
Copy link
Author

vext01 commented Nov 10, 2021

106b29c reworks the comment.

I've added a up-front spoiler to the whodunnit, simplified/shortened some bits, made the text sound more like an informal proof, and added an "alternative approaches" section.

Hopefully this is more coherent?

@ltratt
Copy link

ltratt commented Nov 10, 2021

OK, that helps me a lot -- thanks!

I think I have one last question. Is the problem here that we're trying to deal with someone jumping to a machine basic block that is not the first machine basic block corresponding to an LLVM IR basic block?

In other words, imagine I have an LLVM IR basic block L1 and from it are generated two machine basic blocks M1 and M2. Can anyone ever jump to M2 without having executed M1? [If "yes", I agree with the PR as-is. If "no" then I wonder if an alternative solution is to never try and map something that's not the first machine basic block (i.e. in this case M1).]

@vext01
Copy link
Author

vext01 commented Nov 10, 2021

Is the problem here that we're trying to deal with someone jumping to a machine basic block that is not the first machine basic block corresponding to an LLVM IR basic block?

No. That's not the problem I'm tying to address here. I don't think it can happen.

If "no" then I wonder if an alternative solution is to never try and map something that's not the first machine basic block (i.e. in this case M1).]

There are two related problematic cases related to the fact that calls in LLVM IR don't terminate a block:

  • Tracing can be turned on mid-IR-block (in a machine block other than the first). If we didn't map this machine block, then we could miss a chunk of program semantics.
  • We require an entry for when a callee returns to a caller.

To expand on the latter, suppose we have ir like:

main() {
  ...stuff
  call f()
  ...more stuff
}

Currently the mapper requires a sequence like:

main:bb0   <- execute "stuff" up until the call to f
f:bb0
...
f:bbN
main:bb0    <- return to main to execute "more stuff"

The second main:bb0 is required to tell the trace compiler that it should resume copying instructions from after the call to f. The trace compiler has to keep track of how far it has gotten through each active IR block on the stack.

[Lukas' block-splitting branch (that we are still unsure if we will merge) aims to simplify this stuff by ensuring that each IR block has at most one call. Then there's no need to keep track of "progress through blocks"]

@ltratt
Copy link

ltratt commented Nov 10, 2021

Lukas' block-splitting branch (that we are still unsure if we will merge) aims to simplify this stuff by ensuring that each IR block has at most one call. Then there's no need to keep track of "progress through blocks"

Would that obviate the need for this current PR then?

@vext01
Copy link
Author

vext01 commented Nov 10, 2021

Would that obviate the need for this current PR then?

Sadly I don't think so.

The example in the big doc-comment has no calls, so Lukas' pass would leave it untouched, and the issue would remain.

@ltratt
Copy link

ltratt commented Nov 10, 2021

There are two related problematic cases related to the fact that calls in LLVM IR don't terminate a block:

  • Tracing can be turned on mid-IR-block (in a machine block other than the first). If we didn't map this machine block, then we could miss a chunk of program semantics.
  • We require an entry for when a callee returns to a caller.

Devil's advocate: Lukas's patch means that the second case above would no longer be a problem (I think? Meaning we need only handle the first case. But could we say "we throw away the first block in a trace if there's not a precise match"?

@vext01
Copy link
Author

vext01 commented Nov 10, 2021

Lukas's patch means that the second case above would no longer be a problem (I think? Meaning we need only handle the first case.

That's correct.

But could we say "we throw away the first block in a trace if there's not a precise match"?

Not sure I understand. Can you give an example?

@ltratt
Copy link

ltratt commented Nov 10, 2021

But could we say "we throw away the first block in a trace if there's not a precise match"?
Not sure I understand. Can you give an example?

This relates to:

Tracing can be turned on mid-IR-block (in a machine block other than the first). If we didn't map this machine block, then we could miss a chunk of program semantics.

We could (for example) insert a dummy loop after turning PT on, so that we guarantee to record all the bits of the program we care about?

@vext01
Copy link
Author

vext01 commented Nov 10, 2021

Thinking about this some more (and my head is on the verge of exploding), you can only turn on tracing via a call, namely to the control point. So Lukas' pass would in fact help in the case that tracing is turned on mid-IR-block. Since the code after the call to the control point would be in a separate block.

I'm still unsure about what you mean with "throw away the first block in a trace if there's not a precise match" though.

I'm starting to get confused.

@ltratt
Copy link

ltratt commented Nov 10, 2021

OK, so imagine you have this:

BB1
start_tracing()
BB2
BB3

and PT kicks in half-way through BB2. This PR, I think, guarantees that we pick up BB2. What I'm saying is that if PT kicks in half-way through BB2, we should drop that from the trace i.e. we would see it starting at BB3. If we care about BB2, we should insert a "fake" basic block before BB2.

@vext01
Copy link
Author

vext01 commented Nov 10, 2021

Are those blocks high-level IR blocks, or machine blocks. And is this after Lukas' pass has run?

@ltratt
Copy link

ltratt commented Nov 10, 2021

Those are machine basic blocks and after Lukas's PR has merged.

@vext01
Copy link
Author

vext01 commented Nov 17, 2021

After quite a bit of offline discussion, we have agreed to move forward with this, and to maybe optimise it later (see this issue).

Also:

  • s/Proxy/Disambiguation/g
  • Revise long comment using a simpler example, and with ASCII art.

Give this a shot @ltratt

@ltratt
Copy link

ltratt commented Nov 17, 2021

I'm more-or-less fine with this -- except I can't see there being either a) a test which this new pass changes b) a new test which this pass does something specific on. Should I be expecting one of those?

@vext01
Copy link
Author

vext01 commented Nov 17, 2021

a) a test which this new pass changes

On the yk side, several tests were strengthened. These would fail prior to the new pass.

@ltratt
Copy link

ltratt commented Nov 17, 2021

I think we're ready to squash!

@vext01
Copy link
Author

vext01 commented Nov 17, 2021

Great news.

Let's review that one small change on the yk side first, then i will push a squash.

This is required for th yk JIT runtime to unambiguously map machine
blocks back to high-level IR blocks.

See the doc comment in BlockDisambiguate.cpp for a detailed description
of the problem and how we solve it.
@vext01
Copy link
Author

vext01 commented Nov 17, 2021

squashed.

@ltratt
Copy link

ltratt commented Nov 17, 2021

bors r+

@bors
Copy link

bors bot commented Nov 17, 2021

Build succeeded:

@bors bors bot merged commit 0ee4d69 into ykjit:ykjit/13.x Nov 17, 2021
bors bot added a commit that referenced this pull request Jul 25, 2022
11: Add the yk BlockDisambiguate pass. r=ltratt a=vext01

Co-authored-by: Edd Barrett <vext01@gmail.com>
bors bot added a commit that referenced this pull request Aug 4, 2022
11: Add the yk BlockDisambiguate pass. r=ltratt a=vext01

Co-authored-by: Edd Barrett <vext01@gmail.com>
ptersilie pushed a commit that referenced this pull request Aug 8, 2022
We experienced some deadlocks when we used multiple threads for logging
using `scan-builds` intercept-build tool when we used multiple threads by
e.g. logging `make -j16`

```
(gdb) bt
#0  0x00007f2bb3aff110 in __lll_lock_wait () from /lib/x86_64-linux-gnu/libpthread.so.0
#1  0x00007f2bb3af70a3 in pthread_mutex_lock () from /lib/x86_64-linux-gnu/libpthread.so.0
#2  0x00007f2bb3d152e4 in ?? ()
#3  0x00007ffcc5f0cc80 in ?? ()
#4  0x00007f2bb3d2bf5b in ?? () from /lib64/ld-linux-x86-64.so.2
#5  0x00007f2bb3b5da27 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#6  0x00007f2bb3b5dbe0 in exit () from /lib/x86_64-linux-gnu/libc.so.6
#7  0x00007f2bb3d144ee in ?? ()
#8  0x746e692f706d742f in ?? ()
#9  0x692d747065637265 in ?? ()
#10 0x2f653631326b3034 in ?? ()
#11 0x646d632e35353532 in ?? ()
#12 0x0000000000000000 in ?? ()
```

I think the gcc's exit call caused the injected `libear.so` to be unloaded
by the `ld`, which in turn called the `void on_unload() __attribute__((destructor))`.
That tried to acquire an already locked mutex which was left locked in the
`bear_report_call()` call, that probably encountered some error and
returned early when it forgot to unlock the mutex.

All of these are speculation since from the backtrace I could not verify
if frames 2 and 3 are in fact corresponding to the `libear.so` module.
But I think it's a fairly safe bet.

So, hereby I'm releasing the held mutex on *all paths*, even if some failure
happens.

PS: I would use lock_guards, but it's C.

Reviewed-by: NoQ

Differential Revision: https://reviews.llvm.org/D118439

(cherry picked from commit d919d02)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants