Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

tests: SMP: Two threads synchronize failed using mutex or semaphore while both doing irq_lock() #33551

Closed
enjiamai opened this issue Mar 22, 2021 · 5 comments
Assignees
Labels
area: Kernel area: SMP Symmetric multiprocessing area: Tests Issues related to a particular existing or missing test bug The issue is a bug, or the PR is fixing a bug priority: medium Medium impact/importance bug

Comments

@enjiamai
Copy link
Collaborator

enjiamai commented Mar 22, 2021

Describe the bug

Given two threads synchronize by using mutex or semaphore, under they both lock interrupt by irq_lock(), it would hang up when the second thread call k_mutex_lock() or k_sem_take().
Please see the new added test case in PR #31010
,which are test_smp_irq_lock_sem() and test_smp_irq_lock_sem().

We know zehpyr has such a mechanism:
while call irq_lock(), interrupts remain disabled only while the thread is running. If the thread performs an operation that allows another thread to run (for example, giving a semaphore or sleeping for N milliseconds), the interrupt lock no longer applies, and
interrupts may be re-enabled while other processing occurs.

We were not sure if this is normal behavior or a correct user case, so please help to clarify this. ​

To Reproduce
Steps to reproduce the behavior:

  1. Apply the patch or cherry-pick the un-merged test case in PR tests: smp: add some module and integration test cases #31010
  2. west build tests/kernel/smp -p auto -b qemu_x86_64 -t run
    or twister -T tests/kernel/smp (failed on qemu_x86_64, qemu_x86_64_kpti, qemu_cortex_a53_smp)

Expected behavior
test cases test_smp_irq_lock_sem() and test_smp_irq_lock_sem() pass as expected.

Impact
If there are user scenarios like this, two threads both lock interrupt and they sync synchronize by using mutex or semaphore, would hang up.

Logs and console output

===================================================================
START - test_smp_irq_lock_sleep
(cpu0)t0 hold the lock
(cpu0)t0 after sleep
(cpu0)t0 after unlock
(cpu1)t1 hold the lock
(cpu0)t1 after unlock
 PASS - test_smp_irq_lock_sleep in 1.29 seconds
===================================================================
**START - test_smp_irq_lock_sem**

(cpu0)t0 hold the lock

    Assertion failed at WEST_TOPDIR/zephyr/tests/kernel/smp/src/main.c:842: test_smp_irq_lock_sem: (k_sem_take(&smp_sem, K_MSEC(5000)) not equal to 0)
take the semaphore failed
 FAIL - test_smp_irq_lock_sem in 5.16 seconds
===================================================================
**START - test_smp_irq_lock_mutex**
(cpu0)t0 hold the mutex
(cpu0)t0 hold the lock
(cpu0)t0 after mutex unlock
(cpu0)t0 after mutex lock again
(cpu0)t0 after unlock
(cpu1)t1 hold the lock
(cpu0)t1 hold the lock

    Assertion failed at WEST_TOPDIR/zephyr/tests/kernel/smp/src/main.c:860: t_try_lock_mutex: (k_mutex_lock(&smutex, K_MSEC(5000)) not equal to 0)
ASSERTION FAIL [!z_smp_cpu_mobile()] @ WEST_TOPDIR/zephyr/kernel/sched.c:1365
E: RAX: 0x0000000000000004 RBX: 0x000000000010aa65 RCX: 0x0000000000000001 RDX: 0x0000000000000001
E: RSI: 0x0000000000000555 RDI: 0x000000000010bdb0 RBP: 0x0000000000000046 RSP: 0x0000000000114048
E:  R8: 0x0000000000000001  R9: 0x0000000000113e00 R10: 0x00000000ffffffff R11: 0x0000000000000031
E: R12: 0x000000000010aa4f R13: 0x000000000000035c R14: 0x000000000010a250 R15: 0x0000000000000000
E: RSP: 0x0000000000114048 RFLAGS: 0x0000000000000206 CS: 0x0018 CR3: 0x0000000000136000
E: RIP: 0x0000000000102627
E: >>> ZEPHYR FATAL ERROR 4: Kernel panic on CPU 1
E: Current thread: 0x10d020 (unknown)
Caught system error -- reason 4 0
Fatal error was unexpected, aborting...

Environment (please complete the following information):

  • OS: Linux
  • Toolchain Zephyr SDK 0.12.3
  • Commit SHA : 1ca99cd9ab3106a8a77d43ef43581e046f22861b

Additional context
This is an unmerged test case.

@enjiamai enjiamai added bug The issue is a bug, or the PR is fixing a bug area: SMP Symmetric multiprocessing area: Tests Issues related to a particular existing or missing test labels Mar 22, 2021
@enjiamai
Copy link
Collaborator Author

@LeiW000 FYI, please.

@andyross
Copy link
Contributor

This was in fact a design bug in the older kernel (and the nanokernel before it), where it was "legal" (but never really "correct") to sleep with an irq lock held, for the simple reason that it wasn't possible for the kernel to detect and enforce it. We don't allow this anymore, because spinlocks are non-recursive (though there's a separate bug tracking the fact that we should detect and assert on this, which we don't). But the legacy irq_lock() API is sort of a gray area as to what is legal.

You can't sleep with a lock. Don't do that.

@enjiamai
Copy link
Collaborator Author

enjiamai commented Mar 22, 2021

Thank you so much for clarify this, @andyross . I saw issue #33319 you filed earlier, and I think I will remove the test case which tries to do sleep while irq locked. Maybe it could become a negative test case after the assertion being added. So, do mutex and semaphore also can not be used in this way, too?

@andyross
Copy link
Contributor

Mutexes and semaphores are blocking constructs already. Sleeping with one held (strictly: a semaphore can't be "owned", but you can use it like a lock if it has only one token) is a well-defined operation and does exactly what you would expect. The problem with sleeping with an irq lock held is that it "release" the lock, but that's not true for IPC primitives. So that is legal. It's also a very, very, very bad idea. But application bugs and misfeatures aren't the kernel's job to enforce.

Closing this one, as I think the question has been answered.

@enjiamai
Copy link
Collaborator Author

Ok, I think this can be closed. Thanks for your explanation!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area: Kernel area: SMP Symmetric multiprocessing area: Tests Issues related to a particular existing or missing test bug The issue is a bug, or the PR is fixing a bug priority: medium Medium impact/importance bug
Projects
None yet
Development

No branches or pull requests

4 participants