Interrupts on Cortex-M do not work with CONFIG_MULTITHREADING=n #26796

Closed
nordic-krch opened this issue Jul 10, 2020 · 21 comments · Fixed by #27343
Labels
area: Kernel bug The issue is a bug, or the PR is fixing a bug Enhancement Changes/Updates/Additions to existing features platform: nRF Nordic nRFx priority: medium Medium impact/importance bug

Comments

@nordic-krch
Contributor

Describe the bug
When CONFIG_MULTITHREADING=n, interrupts are initially disabled (see bug #8393). When they are then enabled, a usage fault happens immediately (it seems to happen on return from an interrupt).

To Reproduce
Steps to reproduce the behavior:

  1. modify hello_world:
void main(void)
{
	/* enable interrupts */
	irq_unlock(0);
	printk("Hello World! %s\n", CONFIG_BOARD);
	/* wait for interrupt coming from LF clock being started. */
	k_busy_wait(1000000);
}

prj.conf:

CONFIG_MULTITHREADING=n
CONFIG_LOG=y
  2. Build and run. I used the nrf52840dk_nrf52840 board.
  3. See the error:
*** Booting Zephyr OS build zephyr-v2.3.0-979-ga043d48c5472  ***
[00:00:02.426,116] <err> os: ***** USAGE FAULT *****
[00:00:02.426,116] <err> os:   Illegal load of EXC_RETURN into PC
[00:00:02.426,116] <err> os: r0/a1:  0x00000004  r1/a2:  0x00000001  r2/a3:  0x00000001
[00:00:02.426,116] <err> os: r3/a4:  0x00000000 r12/ip:  0x00000020 r14/lr:  0x000029db
[00:00:02.426,116] <err> os:  xpsr:  0x00000000
[00:00:02.426,147] <err> os: Faulting instruction address (r15/pc): 0xe000ed00
[00:00:02.426,147] <err> os: >>> ZEPHYR FATAL ERROR 0: CPU exception on CPU 0
[00:00:02.426,147] <err> os: Current thread: 0x00000000 (unknown)
[00:00:03.378,143] <err> os: Halting system
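
For context on the "Illegal load of EXC_RETURN into PC" message: on exception return the core loads a special EXC_RETURN value into PC, and that value selects both the mode to return to and the stack pointer used to unstack the saved registers. The following is a minimal sketch of that encoding, assuming ARMv7-M without the Security Extension; the enum and function names are illustrative and are not taken from Zephyr or CMSIS.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative names only; not part of Zephyr or CMSIS. */
enum exc_return_target {
	EXC_RETURN_INVALID,
	EXC_RETURN_HANDLER_MSP, /* return to Handler mode, unstack from MSP */
	EXC_RETURN_THREAD_MSP,  /* return to Thread mode, unstack from MSP */
	EXC_RETURN_THREAD_PSP,  /* return to Thread mode, unstack from PSP */
};

/* Classify a value as one of the valid EXC_RETURN encodings. */
enum exc_return_target decode_exc_return(uint32_t lr)
{
	/* Bit 4 only distinguishes the basic frame from the extended (FP)
	 * frame, so force it to 1 before comparing.
	 */
	switch (lr | 0x10u) {
	case 0xFFFFFFF1u:
		return EXC_RETURN_HANDLER_MSP;
	case 0xFFFFFFF9u:
		return EXC_RETURN_THREAD_MSP;
	case 0xFFFFFFFDu:
		return EXC_RETURN_THREAD_PSP;
	default:
		return EXC_RETURN_INVALID;
	}
}

/* True if the exception frame will be unstacked from PSP.
 * Only meaningful for a valid EXC_RETURN value.
 */
bool exc_return_uses_psp(uint32_t lr)
{
	assert(decode_exc_return(lr) != EXC_RETURN_INVALID);
	return (lr & 0x4u) != 0u;
}
```

When Thread mode runs on PSP, an interrupt taken from main() would be expected to return with the Thread/PSP encoding, so the saved registers are popped from wherever PSP points at that moment.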

Expected behavior
No error should appear.

Impact
Interrupts cannot be used when multithreading is off. Users cannot use any driver (even an out-of-tree one that does not use kernel synchronization APIs).

Environment (please complete the following information):

@nordic-krch nordic-krch added the bug The issue is a bug, or the PR is fixing a bug label Jul 10, 2020
@nordic-krch
Contributor Author

Note that #26372 was probably failing because of that, too (apart from kernel API usage).

@carlescufi
Member

carlescufi commented Jul 21, 2020

@nordic-krch what is the root cause of the usage fault? Won't CONFIG_LOG=y require multithreading by default? Could you test with something that doesn't require threads at all? I remember that I was able to enable interrupts with irq_unlock(0) and then use interrupts without a problem with multithreading disabled.

EDIT: See for example this use of an interrupt-driven UART: https://github.com/JuulLabs-OSS/mcuboot/blob/master/boot/zephyr/serial_adapter.c#L229 which is perfectly functional. So while it's true that interrupts are disabled by default with multithreading disabled, I am not quite sure that they are broken when enabled.

@thedjnK
Collaborator

thedjnK commented Jul 23, 2020

I was seeing the same issue yesterday when trying to use I2C functions from main() without a separate thread whilst creating/integrating a driver, trace is as follows:

*** Booting Zephyr OS build zephyr-v2.0.0-8735-g2f1d9dded535  ***
[00:00:00.009,857] <err> os: ***** USAGE FAULT *****
[00:00:00.015,533] <err> os:   Illegal load of EXC_RETURN into PC
[00:00:00.022,308] <err> os: r0/a1:  0xb672b501  r1/a2:  0x6a104a0b  r2/a3:  0xbf1e2800
[00:00:00.031,005] <err> os: r3/a4:  0x62112100 r12/ip:  0xf9c2f007 r14/lr:  0xf3efb662
[00:00:00.039,703] <err> os:  xpsr:  0xea4f0000
[00:00:00.044,952] <err> os: Faulting instruction address (r15/pc): 0xf1a08005
[00:00:00.052,856] <err> os: >>> ZEPHYR FATAL ERROR 0: CPU exception on CPU 0
[00:00:00.060,699] <err> os: Current thread: 0x00000000 (unknown)
[00:00:00.067,474] <err> os: Halting system

@nordic-krch
Contributor Author

@carlescufi with CONFIG_LOG_MINIMAL=y I see the same issue, and I think the error comes from the clock interrupt (when the LF clock is ready). Could it be that serial recovery turns on multithreading?

@carlescufi
Member

@carlescufi with CONFIG_LOG_MINIMAL=y I see the same issue, and I think the error comes from the clock interrupt (when the LF clock is ready). Could it be that serial recovery turns on multithreading?

No, it does not. I just checked: disabled logging and built with CONFIG_MCUBOOT_SERIAL and CONFIG_MULTITHREADING remains disabled.

@carlescufi
Member

@de-nordic and @nvlsianpu can you confirm that serial recovery is fully functional in mcuboot and that it keeps CONFIG_MULTITHREADING disabled?

@de-nordic
Collaborator

@carlescufi I will look at this today and let you know.

@de-nordic
Collaborator

@carlescufi CONFIG_MULTITHREADING is still disabled, but serial recovery does not work with the latest master commit (75949f4 at the time I am writing this).
I have enabled it for test purposes and then it worked.

Tested on nrf52840dk_nrf52840.

@carlescufi
Member

@de-nordic thanks.

I have enabled it for test purposes and then it worked.
You mean you enabled CONFIG_MULTITHREADING, right?

In this case, I think we need to find out whether it's GPIO or UART that are making use of multithreading, or if it's something else entirely.

@de-nordic
Collaborator

In this case, I think we need to find out whether it's GPIO or UART that are making use of multithreading, or if it's something else entirely.

@carlescufi do you want me to check it?

@carlescufi
Member

In this case, I think we need to find out whether it's GPIO or UART that are making use of multithreading, or if it's something else entirely.

@carlescufi do you want me to check it?

Sure, yes please @de-nordic since we are at it.

@MaureenHelm MaureenHelm assigned nordic-krch and unassigned andyross Jul 28, 2020
@MaureenHelm MaureenHelm added the priority: medium Medium impact/importance bug label Jul 28, 2020
@carlescufi carlescufi changed the title Interrupts do not work with CONFIG_MULTITHREADING=n Interrupts on nRF devices do not work with CONFIG_MULTITHREADING=n Jul 28, 2020
@carlescufi carlescufi added the platform: nRF Nordic nRFx label Jul 28, 2020
@carlescufi
Member

@nordic-krch this commit introduced the regression.

@de-nordic
Collaborator

Waiting for @nordic-krch's input on 2881df3, but I am afraid that the change might have uncovered the issue rather than introduced it.
I have tried to pinpoint the exact location in mcuboot where the fault is triggered, and found that it happens (for me) on the second invocation of boot_serial_start:595, at the call f->read(...). But when I tried to single-step it (si) in the gdb disassembly, I could basically put a rock on the Enter key and the issue would never happen.

@anangl
Member

anangl commented Jul 30, 2020

@carlescufi @de-nordic @nordic-krch This issue caught my attention and I took a quick deeper look. I think the problem lies in incorrect configuration of stack pointer registers when CONFIG_MULTITHREADING is disabled.
Thread mode is configured to use PSP here:

mrs r0, CONTROL
movs r1, #2
orrs r0, r1 /* CONTROL_SPSEL_Msk */
msr CONTROL, r0

Because further initialization is done this way:

zephyr/kernel/init.c

Lines 475 to 479 in 7d90812

#ifdef CONFIG_MULTITHREADING
prepare_multithreading();
switch_to_main_thread();
#else
bg_thread_main(NULL, NULL, NULL);

PSP is not reconfigured to the top of the main stack by this code that is called from switch_to_main_thread():
/*
* Set PSP to the highest address of the main stack
* before enabling interrupts and jumping to main.
*/
__asm__ volatile (
"mov r0, %0\n\t" /* Store _main in R0 */
#if defined(CONFIG_CPU_CORTEX_M)
"msr PSP, %1\n\t" /* __set_PSP(start_of_main_stack) */
#endif

and after initialization is finished, PSP points into the same stack as MSP, just a little below MSP. Then, if an interrupt routine uses the stack (pointed to by MSP) more intensively, it can overwrite the values stacked there on exception entry (using PSP), and the return from the exception may fail in various ways (most likely with a UsageFault). But as long as no interrupt routine uses too much stack, everything can work correctly for quite a long time. As @de-nordic already signaled:

Waiting for @nordic-krch's input on 2881df3, but I am afraid that the change might have uncovered the issue rather than introduced it.

And it seems this issue may occur on all Cortex-M SoCs. I'm not sure who would be the best person to look at this problem.
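
The overlap described above can be illustrated with a small host-side model. This is only a sketch under simplifying assumptions: the stack is an array of words, the hardware-stacked frame is the basic 8-word frame, and the handler's stack usage is modelled as a number of words pushed via MSP. The function name, the sizes, and the register values are made up for illustration; this is not Zephyr code and it does not run on the target.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define STACK_WORDS 64u
#define FRAME_WORDS 8u /* r0-r3, r12, lr, pc, xpsr */

/* Model one interrupt taken while PSP sits gap_words below MSP on the
 * same stack. Returns true if the stacked frame is still intact when
 * the handler returns.
 */
bool frame_survives(uint32_t gap_words, uint32_t isr_words)
{
	/* Arbitrary register values standing in for the interrupted context. */
	const uint32_t frame[FRAME_WORDS] = {
		0u, 1u, 2u, 3u, 12u, 0x000029dbu, 0x00001000u, 0x01000000u
	};
	uint32_t stack[STACK_WORDS];
	uint32_t msp;
	uint32_t psp;

	assert(gap_words + FRAME_WORDS <= STACK_WORDS);
	assert(isr_words <= STACK_WORDS);

	memset(stack, 0, sizeof(stack));
	msp = STACK_WORDS;     /* full-descending stack: top of the array */
	psp = msp - gap_words; /* PSP a little below MSP, same memory */

	/* Exception entry: hardware pushes the frame using PSP. */
	psp -= FRAME_WORDS;
	memcpy(&stack[psp], frame, sizeof(frame));

	/* Handler body: runs on MSP and pushes isr_words words. */
	for (uint32_t i = 0u; i < isr_words; i++) {
		msp--;
		stack[msp] = 0xDEADBEEFu;
	}

	/* Exception return: hardware pops the frame from PSP again. */
	return memcmp(&stack[psp], frame, sizeof(frame)) == 0;
}
```

In this model a handler that stays within the gap (for example frame_survives(4, 4)) returns cleanly, while one extra word (frame_survives(4, 5)) already clobbers the stacked xPSR. That matches the observation that light interrupt routines can appear to work for a long time.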

@carlescufi
Member

@anangl thanks for the extensive analysis!

And it seems this issue may occur on all Cortex-M SoCs. I'm not sure who would be the best person to look at this problem.

@ioannisg should be able to look at this.

@de-nordic
Collaborator

@anangl thanks!

@ioannisg
Member

@anangl thanks for the study you did - it's half of the work already :)

@pabigot
Collaborator

pabigot commented Aug 3, 2020

This was meant for #27343 but is still relevant here. I've updated the issue title.

Using #27136 on current master I've confirmed broken CONFIG_MULTITHREADING=n support on:

  • nucleo_l476rg (program dies after BOOT_BANNER)
  • nrf52840dk_nrf52840 (program runs, but once interrupt fires everything stops)
  • frdm_k64f (program dies after BOOT_BANNER)

Clearly CONFIG_MULTITHREADING=n is a poorly tested configuration, and the failure is not Nordic-specific.

@pabigot pabigot changed the title Interrupts on nRF devices do not work with CONFIG_MULTITHREADING=n Interrupts on Cortex-M do not work with CONFIG_MULTITHREADING=n Aug 3, 2020
@ioannisg ioannisg added the Enhancement Changes/Updates/Additions to existing features label Aug 3, 2020
@tcpipchip

tcpipchip commented Nov 6, 2022

Hi
I am having the same problem with STM32L072 + SX1276 (Zephyr /samples/subsys/lorawan/class_a):
[00:00:00.200,000] sx127x: SX127x version 0x12 found
[00:00:00.302,000] lorawan_class_a: Joining network over OTAA
[00:00:00.315,000] os: ***** HARD FAULT *****
[00:00:00.315,000] os: r0/a1: 0x000000fd r1/a2: 0x000000f4 r2/a3: 0x00000041
[00:00:00.315,000] os: r3/a4: 0x000000f1 r12/ip: 0x000000ae r14/lr: 0x00000098
[00:00:00.315,000] os: xpsr: 0x00000000
[00:00:00.315,000] os: Faulting instruction address (r15/pc): 0x00000046
[00:00:00.315,000] os: >>> ZEPHYR FATAL ERROR 0: CPU exception on CPU 0
[00:00:00.315,000] os: Current thread: 0x20000740 (unknown)
[00:00:00.379,000] os: Halting system

Any suggestions?

@thedjnK
Collaborator

thedjnK commented Nov 7, 2022

@tcpipchip You are probably trashing memory

@tcpipchip

Yes, it looks like stack memory!
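
One generic way to check a suspected stack overrun is to paint the stack with a known pattern and later count how much of the pattern is left. The following is a minimal host-side sketch of that technique, assuming a downward-growing stack; the function names and the fill value are illustrative and this is not Zephyr's own implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define STACK_FILL 0xAAu /* arbitrary pattern chosen for this sketch */

/* Fill the whole stack buffer with the pattern before it is used. */
void stack_paint(uint8_t *stack, size_t size)
{
	assert(stack != NULL);
	memset(stack, STACK_FILL, size);
}

/* Count bytes that still hold the pattern, starting at the lowest
 * address. With a downward-growing stack these are the bytes that
 * were never touched; a result of 0 suggests the stack was exhausted.
 */
size_t stack_unused(const uint8_t *stack, size_t size)
{
	size_t n = 0;

	assert(stack != NULL);
	while (n < size && stack[n] == STACK_FILL) {
		n++;
	}
	return n;
}
```

A result near zero points at a stack that is too small. Note that a stray write that happens to store the fill value will not be detected by this check.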
