Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ESP-32 preemption regressions with asm2 #6346

Closed
andyross opened this issue Feb 23, 2018 · 3 comments
Closed

ESP-32 preemption regressions with asm2 #6346

andyross opened this issue Feb 23, 2018 · 3 comments
Assignees
Labels
area: Xtensa Xtensa Architecture bug The issue is a bug, or the PR is fixing a bug priority: high High impact/importance bug
Milestone

Comments

@andyross
Copy link
Contributor

In some cases (though apparently not all) the new asm2 code on esp32 is failing in situations where it is asked to switch back to a previously-interrupted thread from an otherwise working one. See #6339 for one report.

The most common case happens with the idle thread, which will be correctly interrupted and switched away from (e.g. a timer interrupt will wake up a blocked thread), but then the system will hang (or sometimes report a spurious exception) when switching back into idle.

There's a heisenbug quality to this too, where calling printk() inside the interrupt handler will magically "fix" the problem. My current theory is that the window spill code is broken somehow: calling into a complicated handler like printk() will flush the register windows for us and make the spill code a noop. The spill code is different between ESP-32 (which has 64 registers) and qemu (32), which might explain the difference.

This also is consistent with the "failure on switch back to idle" behavior -- a bad spill would only affect the old thread and not the new one.

@andyross andyross self-assigned this Feb 23, 2018
@andyross andyross added bug The issue is a bug, or the PR is fixing a bug priority: high High impact/importance bug area: Xtensa Xtensa Architecture labels Feb 23, 2018
@locomuco
Copy link
Contributor

@andyross what ESP32 toolchain version is currently supported by zephyr? the documentation always points to the latest esp32 toolchain

@linkmeyer linkmeyer added this to the v1.11.0 milestone Mar 5, 2018
andyross pushed a commit to andyross/zephyr that referenced this issue Mar 5, 2018
When returning into a different thread than we interrupted, we
obviously need to spill all the existing register windows to make sure
all their values are in the old thread's stack.  But the code to do
this forgot to reset the current stack pointer to the value it had at
interrupt time (it was still pointing to the saved context below
that), so the caller of the interrupted function was spilling to the
wrong spot.

This wouldn't show up as an instant failure, it would only happen when
switching BACK to the improperly-spilled thread.  And even then it
would be a noop if the original interrupt handler was deep enough to
have spilled that function naturally.

In practice, this happened only in some instances on ESP-32 (which has
more windowed registers than qemu) when interrupting the idle thread
(which is very shallow) with a (very simple) timer interrupt.  Trivial
to see, hard to find.

See zephyrproject-rtos#6346 for more
detail.

Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
@andyross
Copy link
Contributor Author

andyross commented Mar 5, 2018

Got it. It wasn't the spill code, which had a decent unit test and was pretty simple. It wasn't the cross-stack call code, which was likewise testable in isolation but a little more subtle. It was the really obvious-looking glue between the two. Cross-stack calls return with the stack pointer still pointing to the restore area, which lives below the interrupted stack. But the register spills need the stack pointer to point to the actual function that got interrupted, otherwise its CALLER (xtensa register windows are kinda crazy) will spill to the wrong spot.

Turns out qemu never managed to hit this case because it has half as many windowed registers as ESP-32.

@andyross
Copy link
Contributor Author

andyross commented Mar 5, 2018

@locomuco Sorry, missed your question. We support only the official SDK toolchain from Esspressif on ESP-32. @lpereira can give more context, but broadly Xtensa toolchains are not portable between targets, they are hard-configured to assume specific hardware features. Part of the output of their CPU generator is an "overlay" source package you have to apply to a specific version of binutils/gcc, making a generic toolchain really hard to manage.

nashif pushed a commit that referenced this issue Mar 6, 2018
When returning into a different thread than we interrupted, we
obviously need to spill all the existing register windows to make sure
all their values are in the old thread's stack.  But the code to do
this forgot to reset the current stack pointer to the value it had at
interrupt time (it was still pointing to the saved context below
that), so the caller of the interrupted function was spilling to the
wrong spot.

This wouldn't show up as an instant failure, it would only happen when
switching BACK to the improperly-spilled thread.  And even then it
would be a noop if the original interrupt handler was deep enough to
have spilled that function naturally.

In practice, this happened only in some instances on ESP-32 (which has
more windowed registers than qemu) when interrupting the idle thread
(which is very shallow) with a (very simple) timer interrupt.  Trivial
to see, hard to find.

See #6346 for more
detail.

Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area: Xtensa Xtensa Architecture bug The issue is a bug, or the PR is fixing a bug priority: high High impact/importance bug
Projects
None yet
Development

No branches or pull requests

3 participants