Interrupts on Cortex-M do not work with CONFIG_MULTITHREADING=n #26796

nordic-krch · 2020-07-10T09:11:27Z

Describe the bug
When CONFIG_MULTITHREADING=n, interrupts are initially disabled (see bug #8393). Once they are enabled, a usage fault occurs immediately (it seems to happen on return from an interrupt).

To Reproduce
Steps to reproduce the behavior:

modify hello_world:

void main(void)
{
	/* Enable interrupts. */
	irq_unlock(0);
	printk("Hello World! %s\n", CONFIG_BOARD);
	/* Wait for the interrupt raised when the LF clock has started. */
	k_busy_wait(1000000);
}

prj.conf:

CONFIG_MULTITHREADING=n
CONFIG_LOG=y

Build and run. I used the nrf52840dk_nrf52840 board.
See the error:

*** Booting Zephyr OS build zephyr-v2.3.0-979-ga043d48c5472  ***
[00:00:02.426,116] <err> os: ***** USAGE FAULT *****
[00:00:02.426,116] <err> os:   Illegal load of EXC_RETURN into PC
[00:00:02.426,116] <err> os: r0/a1:  0x00000004  r1/a2:  0x00000001  r2/a3:  0x00000001
[00:00:02.426,116] <err> os: r3/a4:  0x00000000 r12/ip:  0x00000020 r14/lr:  0x000029db
[00:00:02.426,116] <err> os:  xpsr:  0x00000000
[00:00:02.426,147] <err> os: Faulting instruction address (r15/pc): 0xe000ed00
[00:00:02.426,147] <err> os: >>> ZEPHYR FATAL ERROR 0: CPU exception on CPU 0
[00:00:02.426,147] <err> os: Current thread: 0x00000000 (unknown)
[00:00:03.378,143] <err> os: Halting system

Expected behavior
No error should appear.

Impact
Interrupts cannot be used when multithreading is off. Users cannot use any driver (not even an out-of-tree driver that avoids kernel synchronization APIs).

Environment (please complete the following information):

Commit a043d48


nordic-krch · 2020-07-10T09:13:15Z

Note that #26372 was probably failing because of that, too (apart from kernel API usage).

carlescufi · 2020-07-21T17:33:43Z

@nordic-krch What is the root cause of the usage fault? Won't CONFIG_LOG=y require multithreading by default? Could you test with something that doesn't require threads at all? I remember that I was able to enable interrupts with irq_unlock(0) and then use interrupts without a problem with multithreading disabled.

EDIT: See for example this use of an interrupt-driven UART: https://github.com/JuulLabs-OSS/mcuboot/blob/master/boot/zephyr/serial_adapter.c#L229 which is perfectly functional. So while it's true that interrupts are disabled by default with multithreading disabled, I am not quite sure that they are broken when enabled.

thedjnK · 2020-07-23T06:44:31Z

I was seeing the same issue yesterday when trying to use I2C functions from main() without a separate thread whilst creating/integrating a driver, trace is as follows:

*** Booting Zephyr OS build zephyr-v2.0.0-8735-g2f1d9dded535  ***
[00:00:00.009,857] <err> os: ***** USAGE FAULT *****
[00:00:00.015,533] <err> os:   Illegal load of EXC_RETURN into PC
[00:00:00.022,308] <err> os: r0/a1:  0xb672b501  r1/a2:  0x6a104a0b  r2/a3:  0xbf1e2800
[00:00:00.031,005] <err> os: r3/a4:  0x62112100 r12/ip:  0xf9c2f007 r14/lr:  0xf3efb662
[00:00:00.039,703] <err> os:  xpsr:  0xea4f0000
[00:00:00.044,952] <err> os: Faulting instruction address (r15/pc): 0xf1a08005
[00:00:00.052,856] <err> os: >>> ZEPHYR FATAL ERROR 0: CPU exception on CPU 0
[00:00:00.060,699] <err> os: Current thread: 0x00000000 (unknown)
[00:00:00.067,474] <err> os: Halting system

nordic-krch · 2020-07-27T05:28:13Z

@carlescufi With CONFIG_LOG_MINIMAL=y I see the same issue, and I think the error comes from the clock interrupt (when the LF clock is ready). Could it be that serial recovery turns on multithreading?

carlescufi · 2020-07-27T13:58:48Z

> @carlescufi With CONFIG_LOG_MINIMAL=y I see the same issue, and I think the error comes from the clock interrupt (when the LF clock is ready). Could it be that serial recovery turns on multithreading?

No, it does not. I just checked: disabled logging and built with CONFIG_MCUBOOT_SERIAL and CONFIG_MULTITHREADING remains disabled.

carlescufi · 2020-07-27T14:01:05Z

@de-nordic and @nvlsianpu can you confirm that serial recovery is fully functional in mcuboot and that it keeps CONFIG_MULTITHREADING disabled?

de-nordic · 2020-07-27T14:03:35Z

@carlescufi I will look at this today and let you know.

de-nordic · 2020-07-27T14:58:12Z

@carlescufi CONFIG_MULTITHREADING is still disabled, but serial recovery does not work with the latest master commit (75949f4 at the time I am writing this).
I have enabled it for test purposes and then it worked.

Tested on nrf52840dk_nrf52840.

carlescufi · 2020-07-27T15:19:03Z

@de-nordic thanks.

> I have enabled it for test purposes and then it worked.

You mean you enabled CONFIG_MULTITHREADING, right?

In this case, I think we need to find out whether it's GPIO or UART that are making use of multithreading, or if it's something else entirely.

de-nordic · 2020-07-27T15:23:12Z

> In this case, I think we need to find out whether it's GPIO or UART that are making use of multithreading, or if it's something else entirely.

@carlescufi do you want me to check it?

carlescufi · 2020-07-28T14:25:27Z

> > In this case, I think we need to find out whether it's GPIO or UART that are making use of multithreading, or if it's something else entirely.
>
> @carlescufi do you want me to check it?

Sure, yes please @de-nordic since we are at it.

carlescufi · 2020-07-28T17:33:06Z

@nordic-krch this commit introduced the regression.

de-nordic · 2020-07-28T20:48:17Z

Waiting for @nordic-krch's input on 2881df3, but I am afraid that the change might have uncovered the issue rather than introduced it.
I have tried to pinpoint the exact location where the fault is triggered in mcuboot, and I have found that it happens (for me) on the second invocation of boot_serial_start:595, the call to f->read(...). But when I tried to step through it (si) in the gdb disassembly, I could basically put a rock on the Enter key and the issue would never happen.

anangl · 2020-07-30T13:15:18Z

@carlescufi @de-nordic @nordic-krch This issue caught my attention and I took a quick deeper look. I think the problem lies in incorrect configuration of stack pointer registers when CONFIG_MULTITHREADING is disabled.
Thread mode is configured to use PSP here:

zephyr/arch/arm/core/aarch32/cortex_m/reset.S

Lines 95 to 98 in 3bc6c55

    
    	mrs r0, CONTROL
    	movs r1, #2
    	orrs r0, r1 /* CONTROL_SPSEL_Msk */
    	msr CONTROL, r0

Because further initialization is done this way:

zephyr/kernel/init.c

Lines 475 to 479 in 7d90812

    
    #ifdef CONFIG_MULTITHREADING
    	prepare_multithreading();
    	switch_to_main_thread();
    #else
    	bg_thread_main(NULL, NULL, NULL);

PSP is not reconfigured to the top of the main stack by this code that is called from switch_to_main_thread():

zephyr/arch/arm/core/aarch32/thread.c

Lines 430 to 438 in 4c67339

    
    	/*
    	 * Set PSP to the highest address of the main stack
    	 * before enabling interrupts and jumping to main.
    	 */
    	__asm__ volatile (
    	"mov   r0,  %0\n\t"	/* Store _main in R0 */
    #if defined(CONFIG_CPU_CORTEX_M)
    	"msr   PSP, %1\n\t"	/* __set_PSP(start_of_main_stack) */
    #endif

and after initialization is finished, PSP points into the same stack as MSP, just a little below MSP. Then, if an interrupt routine uses the stack (pointed to by MSP) more intensively, it can overwrite the values that were stacked there on exception entry (using PSP), and the return from the exception may fail in various ways (most likely with a UsageFault). But as long as no interrupt routine uses too much stack, everything can work correctly for quite a long time. As @de-nordic already signaled:

> Waiting for @nordic-krch's input on 2881df3, but I am afraid that the change might have uncovered the issue rather than introduced it.

And it seems this issue may occur on all Cortex-M SoCs. I'm not sure who would be the best person to look at this problem.

carlescufi · 2020-07-30T13:33:48Z

@anangl thanks for the extensive analysis!

> And it seems this issue may occur on all Cortex-M SoCs. I'm not sure who would be the best person to look at this problem.

@ioannisg should be able to look at this.

de-nordic · 2020-07-30T13:39:48Z

@anangl thanks!

ioannisg · 2020-07-31T09:54:57Z

@anangl thanks for the study you did - it's half of the work already :)

pabigot · 2020-08-03T13:01:34Z

This was meant for #27343 but is still relevant here. I've updated the issue title.

Using #27136 on current master I've confirmed broken CONFIG_MULTITHREADING=n support on:

nucleo_l476rg (program dies after BOOT_BANNER)
nrf52840dk_nrf52840 (program runs, but once interrupt fires everything stops)
frdm_k64f (program dies after BOOT_BANNER)

Clearly CONFIG_MULTITHREADING=n is a poorly tested configuration, and the failure is not Nordic-specific.

tcpipchip · 2022-11-06T11:58:33Z

Hi,
I am having the same problem with STM32L072 + SX1276 (Zephyr /samples/subsys/lorawan/class_a):
[00:00:00.200,000] sx127x: SX127x version 0x12 found
[00:00:00.302,000] lorawan_class_a: Joining network over OTAA
[00:00:00.315,000] os: ***** HARD FAULT *****
[00:00:00.315,000] os: r0/a1: 0x000000fd r1/a2: 0x000000f4 r2/a3: 0x00000041
[00:00:00.315,000] os: r3/a4: 0x000000f1 r12/ip: 0x000000ae r14/lr: 0x00000098
[00:00:00.315,000] os: xpsr: 0x00000000
[00:00:00.315,000] os: Faulting instruction address (r15/pc): 0x00000046
[00:00:00.315,000] os: >>> ZEPHYR FATAL ERROR 0: CPU exception on CPU 0
[00:00:00.315,000] os: Current thread: 0x20000740 (unknown)
[00:00:00.379,000] os: Halting system

Any suggestions?

thedjnK · 2022-11-07T17:10:30Z

@tcpipchip You are probably trashing memory

tcpipchip · 2022-11-07T18:03:22Z

Yes, it looks like stack memory!

nordic-krch added the bug The issue is a bug, or the PR is fixing a bug label Jul 10, 2020

nordic-krch mentioned this issue Jul 10, 2020

drivers: clock_control: nrf: Use onoff service #24334

Merged

carlescufi assigned andyross Jul 10, 2020

carlescufi added the area: Kernel label Jul 10, 2020

MaureenHelm assigned nordic-krch and unassigned andyross Jul 28, 2020

MaureenHelm added the priority: medium Medium impact/importance bug label Jul 28, 2020

carlescufi changed the title ~~Interrupts do not work with CONFIG_MULTITHREADING=n~~ Interrupts on nRF devices do not work with CONFIG_MULTITHREADING=n Jul 28, 2020

carlescufi added the platform: nRF Nordic nRFx label Jul 28, 2020

carlescufi assigned ioannisg and stephanosio Jul 30, 2020

ioannisg mentioned this issue Aug 3, 2020

ARM Cortex-M: Fix booting in no-multithreading mode #27343

Merged

pabigot changed the title ~~Interrupts on nRF devices do not work with CONFIG_MULTITHREADING=n~~ Interrupts on Cortex-M do not work with CONFIG_MULTITHREADING=n Aug 3, 2020

pabigot mentioned this issue Aug 3, 2020

Verify that single-threaded can be made to work on all architectures #27352

Closed

ioannisg added the Enhancement Changes/Updates/Additions to existing features label Aug 3, 2020

carlescufi mentioned this issue Aug 6, 2020

Decide if we keep a single thread support (CONFIG_MULTITHREADING=n) in Zephyr #27415

Closed

4 tasks

carlescufi closed this as completed in #27343 Aug 7, 2020
