Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

QEMU bug with branch delay slots on ARC #54720

Closed
andyross opened this issue Feb 10, 2023 · 24 comments · Fixed by #54721, #54910 or zephyrproject-rtos/sdk-ng#692
Closed

QEMU bug with branch delay slots on ARC #54720

andyross opened this issue Feb 10, 2023 · 24 comments · Fixed by #54721, #54910 or zephyrproject-rtos/sdk-ng#692
Assignees
Labels
bug The issue is a bug, or the PR is fixing a bug priority: medium Medium impact/importance bug

Comments

@andyross
Copy link
Contributor

(Submitting this to Zephyr just to have a fix to link with the full description from the workaround patch. We probably want to open a toolchain bug for the broader problem?)

As of recent commits, qemu_arc_em is failing in the tests/kernel/mem_protect/syscalls test with:

START - test_syscall_torture
Running syscall torture test with 4 threads on 1 cpu(s)
E: ***** Exception vector: 0x2, cause code: 0x1, parameter 0x0
E: Address 0x800014a4
E: >>> ZEPHYR FATAL ERROR 0: CPU exception on CPU 0
E: Current thread: 0x804009f0 (unknown)
E: Halting system

The proximate cause was (hilariously) that the patch count since the last release candidate had reached 100. This caused the version string printed by the boot banner to be one byte longer and exposed the bug. (I actually got a test rig created where two Zephyr binaries that differed ONLY in whether the last byte of a fake banner string was a "x" or a newline would differ in crash behavior).

But as it turns out that's all just timing interaction. The real problem happens in the heap code, and is a compiler bug. @ruuddw pointed out that the fault (at 0x800014a4) is actually flagging an illegal instruction in a branch delay slot. And indeed, the disassembly below shows that the faulting instruction is a ENTER_S (one of the forbidden instructions), and that the instruction immediately preceding is indeed an unconditional branch!

This is a clear optimizer bug. The generated code can't be allowed to emit a linker section that ends on an instruction with a branch delay slot, because it can't control the instruction the linker will place next. And indeed, stuffing a NOP instruction at the end of the function fixes the symptom

8000146c <merge_chunks>:
{
8000146c:       c2e8                    enter_s [r13-r16,blink]
8000146e:       4748                    mov_s   r15,r2
80001470:       4508                    mov_s   r13,r0
80001472:       4030                    mov_s   r16,r1
        chunksz_t newsz = chunk_size(h, lc) + chunk_size(h, rc);
80001474:       0e1e ffcf               bl      -484    ;80001290 <chunk_size>
80001478:       41e1                    mov_s   r1,r15
8000147a:       4608                    mov_s   r14,r0
8000147c:       40a1                    mov_s   r0,r13
8000147e:       0e16 ffcf               bl      -492    ;80001290 <chunk_size>
        set_chunk_size(h, lc, newsz);
80001482:       4102                    mov_s   r1,r16
        chunksz_t newsz = chunk_size(h, lc) + chunk_size(h, rc);
80001484:       661e                    add_s   r14,r14,r0
        set_chunk_size(h, lc, newsz);
80001486:       40a1                    mov_s   r0,r13
80001488:       42c1                    mov_s   r2,r14
8000148a:       0e4a ffcf               bl      -440    ;800012d0 <set_chunk_siz
e>
        return c + chunk_size(h, c);
8000148e:       41e1                    mov_s   r1,r15
80001490:       40a1                    mov_s   r0,r13
80001492:       0e02 ffcf               bl      -512    ;80001290 <chunk_size>
        chunk_set(h, c, LEFT_SIZE, size);
80001496:       43c1                    mov_s   r3,r14
80001498:       6719                    add_s   r1,r15,r0
8000149a:       704c                    mov_s   r2,0
8000149c:       40a1                    mov_s   r0,r13
}
8000149e:       c2c8                    leave_s [r13-r16,blink]
800014a0:       05d1 ffcf               b       -560    ;80001270 <chunk_set>

800014a4 <free_list_add>:
{
800014a4:       c2e8                    enter_s [r13-r16,blink]
        return big_heap_chunks(h->end_chunk);
800014a6:       80e2                    ld_s    r15,[r0,0x8]
800014a8:       4628                    mov_s   r14,r1
800014aa:       4508                    mov_s   r13,r0
@andyross andyross added the bug The issue is a bug, or the PR is fixing a bug label Feb 10, 2023
@andyross andyross self-assigned this Feb 10, 2023
andyross added a commit to andyross/zephyr that referenced this issue Feb 10, 2023
See comments in code.  The workaround is straightforward once the
issue is understood, but the path to get here was hilariously weird.

Fixes zephyrproject-rtos#54720

Signed-off-by: Andy Ross <andyross@google.com>
@andyross andyross added the priority: high High impact/importance bug label Feb 10, 2023
@andyross
Copy link
Contributor Author

Probably should mark this P1 as it's currently breaking CI

@galak
Copy link
Collaborator

galak commented Feb 10, 2023

Can you open an issue on the SDK to track the toolchain bug.

andyross added a commit to andyross/zephyr that referenced this issue Feb 10, 2023
This is a workaround for a compiler bug on (at least) GCC 12.1.0 in
Zephyr SDK 0.15.1.  The optimizer generates this function with a last
instruction that is an unconditional branch (a tail call into the
chunk_set() handling).  But that means that the NEXT instruction gets
decoded as part of the branch delay slot, but that instruction isn't
part of this function!  Some instructions aren't legal in branch delay
slots.  One of those is ENTER_S, which is a very common entry
instruction for whatever function the linker places after us.  It
seems like the compiler doesn't understand this problem.  Stuff a NOP
in to guarantee the code is legal.

Comment above is duplicated in the code.  The workaround is
straightforward once the issue is understood, but the path to get here
was hilariously weird.

Fixes zephyrproject-rtos#54720

Signed-off-by: Andy Ross <andyross@google.com>
@abrodkin
Copy link
Collaborator

@andyross I'm not sure which instruction in the disassembly above supposedly has a delay-slot?
IIRC instructions with delay-slot have .d suffix, i.e. bl.d is a good example. While in the code above I see only b (branch) & bl (branch and link) instructions.
So to me it looks more of a QEMU bug. BTW, is that possible to attach the failing binary here so that I run it on our proprietary simulator to see what we get there?

@ruuddw
Copy link
Member

ruuddw commented Feb 10, 2023

This needs some more investigations indeed to know the root cause to see if the .d delay slot flag is set or made up by qemu.
The nop workaround still makes sense to unblock CI to me, though.

@andyross
Copy link
Contributor Author

That makes sense too, I suppose. Here's a zephyr.exe
zephyr.elf.gz
file built off of my tree (commit 5222691) with the 0.15.1 SDK. There's nothing special, all the CI builds fail the same way.

nashif pushed a commit that referenced this issue Feb 10, 2023
This is a workaround for a compiler bug on (at least) GCC 12.1.0 in
Zephyr SDK 0.15.1.  The optimizer generates this function with a last
instruction that is an unconditional branch (a tail call into the
chunk_set() handling).  But that means that the NEXT instruction gets
decoded as part of the branch delay slot, but that instruction isn't
part of this function!  Some instructions aren't legal in branch delay
slots.  One of those is ENTER_S, which is a very common entry
instruction for whatever function the linker places after us.  It
seems like the compiler doesn't understand this problem.  Stuff a NOP
in to guarantee the code is legal.

Comment above is duplicated in the code.  The workaround is
straightforward once the issue is understood, but the path to get here
was hilariously weird.

Fixes #54720

Signed-off-by: Andy Ross <andyross@google.com>
@abrodkin
Copy link
Collaborator

That's funny, with enable -singlestep option in QEMU it works:

------ TESTSUITE SUMMARY START ------

SUITE PASS - 100.00% [syscalls]: pass = 8, fail = 0, skip = 0, total = 8 duration = 15.017 seconds
 - PASS - [syscalls.test_arg64] duration = 0.001 seconds
 - PASS - [syscalls.test_more_args] duration = 0.001 seconds
 - PASS - [syscalls.test_string_nlen] duration = 0.002 seconds
 - PASS - [syscalls.test_syscall_context] duration = 0.001 seconds
 - PASS - [syscalls.test_syscall_torture] duration = 15.004 seconds
 - PASS - [syscalls.test_to_copy] duration = 0.002 seconds
 - PASS - [syscalls.test_user_string_alloc_copy] duration = 0.003 seconds
 - PASS - [syscalls.test_user_string_copy] duration = 0.003 seconds

------ TESTSUITE SUMMARY END ------

===================================================================
PROJECT EXECUTION SUCCESSFUL

And as I expected works well in nSIM, so most likely QEMU issue. I'll file an issue in our QEMU repo once I collect all the needed data.

@andyross
Copy link
Contributor Author

It's worth pointing out that this is hugely timing dependent, so the fact that singlestep avoids the bug doesn't say much, IMHO. The function in question is (logically) branchless, and the heap code absolutely hits it many times in this binary. If I had to guess, it's something like "qemu is returning from an interrupt at the right moment and calling some function to re-populate instruction state", and that particular case mixed up the .d and not-.d instruction variants or something.

@andyross
Copy link
Contributor Author

Oh, and we should probably continue discussion in the still-open toolchain issue and not this one, which just got closed. zephyrproject-rtos/sdk-ng#627

@npitre
Copy link
Collaborator

npitre commented Feb 13, 2023

I'm reopening this. As I alluded to in PR #54721 the merged workaround
is a bad one and issue simply moved around it.

See https://github.com/zephyrproject-rtos/zephyr/actions/runs/4161101126/jobs/7198752243
for the new crash location. Still in the same test but completely
unrelated to the heap code this time.

@ruuddw
Copy link
Member

ruuddw commented Feb 13, 2023

Indeed, in our internal daily regression testing the issue popped up again.
Note that this is likely not a code generation bug, but an issue in qemu.

@npitre
Copy link
Collaborator

npitre commented Feb 13, 2023 via email

asemjonovs pushed a commit to asemjonovs/zephyr that referenced this issue Feb 13, 2023
This is a workaround for a compiler bug on (at least) GCC 12.1.0 in
Zephyr SDK 0.15.1.  The optimizer generates this function with a last
instruction that is an unconditional branch (a tail call into the
chunk_set() handling).  But that means that the NEXT instruction gets
decoded as part of the branch delay slot, but that instruction isn't
part of this function!  Some instructions aren't legal in branch delay
slots.  One of those is ENTER_S, which is a very common entry
instruction for whatever function the linker places after us.  It
seems like the compiler doesn't understand this problem.  Stuff a NOP
in to guarantee the code is legal.

Comment above is duplicated in the code.  The workaround is
straightforward once the issue is understood, but the path to get here
was hilariously weird.

Fixes zephyrproject-rtos#54720

Signed-off-by: Andy Ross <andyross@google.com>
@stephanosio
Copy link
Member

@abrodkin
Copy link
Collaborator

@stephanosio we're looking into it. In the meantime we'll try to submit a temporary work-around with -singlestep passed to QEMU.

@andyross
Copy link
Contributor Author

Just to duplicate a comment from discord: I can also suppress results with CONFIG_QEMU_ICOUNT=n. Both techniques are probably just changing timing, but kconfig can be done on a per-test level and might be more narrowly targetted.

@stephanosio stephanosio removed the Release Blocker Use this label for justified release blockers label Feb 15, 2023
@stephanosio
Copy link
Member

Taking off the release blocker label since, according to @ruuddw, this issue is not observed in real hardware and seems to be a QEMU-specific issue.

Unless one of the maintainers can come up with a workaround by tomorrow, I will send a PR to temporarily disable the affected test-platform.

@npitre
Copy link
Collaborator

npitre commented Feb 15, 2023 via email

npitre pushed a commit to npitre/zephyr that referenced this issue Feb 15, 2023
Use -fno-optimize-sibling-calls on QEMU builds.

This is a workaround to make tests/kernel/mem_protect/syscalls (and
possibly others) work.

Without this, result is:

|START - test_syscall_torture
|Running syscall torture test with 4 threads on 1 cpu(s)
|E: ***** Exception vector: 0x2, cause code: 0x1, parameter 0x0
|E: Address 0x80006e48
|E: >>> ZEPHYR FATAL ERROR 0: CPU exception on CPU 0
|E: Current thread: 0x8040095c (unknown)
|E: Halting system

With this commit applied:

|START - test_syscall_torture
|Running syscall torture test with 4 threads on 1 cpu(s)
|
| PASS - test_syscall_torture in 15.004 seconds
|[...]
|PROJECT EXECUTION SUCCESSFUL

Fixes zephyrproject-rtos#54720

Signed-off-by: Nicolas Pitre <npitre@baylibre.com>
@npitre
Copy link
Collaborator

npitre commented Feb 15, 2023

Please consider merging PR #54876

@npitre npitre changed the title Code generation bug with branch delay slots on ARC QEMU bug with branch delay slots on ARC Feb 15, 2023
@npitre
Copy link
Collaborator

npitre commented Feb 15, 2023

Renamed this issue per overall consensus.

evgeniy-paltsev added a commit to evgeniy-paltsev/zephyr that referenced this issue Feb 16, 2023
…tos#54720

Disable tests/kernel/mem_protect/syscalls for qemu_arc_em where
we trigger ARC QEMU bug which cause illegal instruction exception
on perfectly valid ARC code.

Signed-off-by: Eugeniy Paltsev <Eugeniy.Paltsev@synopsys.com>
Signed-off-by: Evgeniy Paltsev <PaltsevEvgeniy@gmail.com>
stephanosio pushed a commit that referenced this issue Feb 16, 2023
Disable tests/kernel/mem_protect/syscalls for qemu_arc_em where
we trigger ARC QEMU bug which cause illegal instruction exception
on perfectly valid ARC code.

Signed-off-by: Eugeniy Paltsev <Eugeniy.Paltsev@synopsys.com>
Signed-off-by: Evgeniy Paltsev <PaltsevEvgeniy@gmail.com>
@stephanosio stephanosio added priority: medium Medium impact/importance bug and removed priority: high High impact/importance bug labels Feb 16, 2023
@stephanosio
Copy link
Member

Re-opening since #54910 is only a workaround for this issue until a solution (QEMU bug fix in a new Zephyr SDK release) is provided.

@stephanosio stephanosio reopened this Feb 16, 2023
@github-actions
Copy link

This issue has been marked as stale because it has been open (more than) 60 days with no activity. Remove the stale label or add a comment saying that you would like to have the label removed otherwise this issue will automatically be closed in 14 days. Note, that you can always re-open a closed issue at any time.

@github-actions github-actions bot added the Stale label Apr 18, 2023
@ruuddw
Copy link
Member

ruuddw commented Apr 18, 2023

Still open, see ARC QEMU issue.

@ruuddw ruuddw removed the Stale label Apr 18, 2023
@github-actions
Copy link

This issue has been marked as stale because it has been open (more than) 60 days with no activity. Remove the stale label or add a comment saying that you would like to have the label removed otherwise this issue will automatically be closed in 14 days. Note, that you can always re-open a closed issue at any time.

@github-actions github-actions bot added the Stale label Jun 18, 2023
@abrodkin abrodkin removed the Stale label Jun 19, 2023
@abrodkin
Copy link
Collaborator

For the record, we're actively working on significant re-work on delay-slot related functionality in QEMU for ARC. Apparently this is much more complex work as one may think and take a bit more time than expected.

@github-actions
Copy link

This issue has been marked as stale because it has been open (more than) 60 days with no activity. Remove the stale label or add a comment saying that you would like to have the label removed otherwise this issue will automatically be closed in 14 days. Note, that you can always re-open a closed issue at any time.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment