Hard fault if CONFIG_LOG2_MODE_DEFERRED is enabled #41517

Closed
ycsin opened this issue Dec 31, 2021 · 13 comments
Labels: area: Logging, bug, priority: medium, Stale

Comments

@ycsin
Member

ycsin commented Dec 31, 2021

Describe the bug
I'm trying to enable LOG2 to log floating-point values by setting CONFIG_LOG2_MODE_DEFERRED in prj.conf, but I run into a hard fault.

Impact
Unable to use LOG2

Logs and console output

I: Starting bootloader
I: Primary image: magic=good, swap_type=0x1, copy_done=0x3, image_ok=0x1
I: Scratch: magic=unset, swap_type=0x1, copy_done=0x3, image_ok=0x3
I: Boot source: primary slot
I: Swap type: none
I: Bootloader chainload address offset: 0x10000
I: Jumping to the first image slot


gmoc:~ $
[00:00:02.009,643] <err> os: r0/a1:  0x00000002  r1/a2:  0x00000000  r2/a3:  0xf0f0f0f0

[00:00:02.009,704] <err> os: r3/a4:  0x2000c3a0 r12/ip:  0xaaaaaaaa r14/lr:  0x0804c111

This is the callstack when I debug the device with a probe:
image

The debugger can't seem to display the contents of the stack; it says something like "optimized out" for some reason:
image

image

What I know is that the stack sentinel detected a stack overflow and triggered a hard fault, which goes to my fatal_error_handler, which invokes LOG_PANIC():

image

LOG_PANIC() then attempts to print the logs but triggers another assert, because all of this is invoked from the hard fault raised by the stack sentinel:

image

which triggers k_panic

image

which triggers a hard fault and calls my hard fault error handler again, which calls LOG_PANIC() again. I think it will just go on and on?
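
The loop described above (fault handler, then log flush, then another fault, then the handler again) can be broken with a re-entrancy guard. The following is a minimal standalone sketch of that idea, not Zephyr code: `fatal_error_handler`, `flush_logs_stub` and the counters are hypothetical stand-ins, and the stub simply simulates a log flush that faults again when run from fault context.

```c
#include <stdbool.h>
#include <stdio.h>

/* All names here are hypothetical stand-ins, not Zephyr APIs. */

bool in_fatal_handler;   /* set once the handler has been entered */
int handler_entries;     /* how many times the handler was called */
int flush_attempts;      /* how many times the log flush actually ran */

void fatal_error_handler(unsigned int reason);

/* Stand-in for a log flush that itself faults when run from fault context. */
void flush_logs_stub(void)
{
	flush_attempts++;
	fatal_error_handler(1U); /* nested fault raised by the flush */
}

void fatal_error_handler(unsigned int reason)
{
	handler_entries++;

	if (in_fatal_handler) {
		/* Nested fault: skip logging so the handler cannot recurse.
		 * A real handler would halt or reboot at this point.
		 */
		return;
	}

	in_fatal_handler = true;
	printf("fatal error, reason %u\n", reason);
	flush_logs_stub();
}
```

Calling `fatal_error_handler(2U)` once enters the handler twice (the original fault plus the nested one) but runs the flush only once, so the sequence terminates instead of repeating.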

My system workqueue stack size is 4096 bytes. My main stack size was 1024 bytes; I increased that to 2048 bytes, but the error remained the same. To reduce the scope, I modified my main() like so:

void main(void)
{
	return;
}

Environment (please complete the following information):

@ycsin ycsin added the bug The issue is a bug, or the PR is fixing a bug label Dec 31, 2021
@ycsin ycsin changed the title from "Hard fault if en" to "Hard fault if CONFIG_LOG2_MODE_DEFERRED is enabled" Dec 31, 2021
@ycsin
Member Author

ycsin commented Dec 31, 2021

I spotted that the stack overflow detected by the stack sentinel was related to uart_mux's workq, so I increased its stack from 512 to 1024 bytes. Now I have another type of hard fault, which seems to be related to LOG_BACKEND_RTT:

image

@belolap
Contributor

belolap commented Jan 1, 2022

Maybe I got a different error, but in a similar situation with floating point I increased CONFIG_LOG_PROCESS_THREAD_STACK_SIZE and everything was OK.
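
For reference, the stack sizes discussed in this thread could be set in prj.conf roughly as follows. This is a sketch under assumptions, not a verified fix: the main, system workqueue and uart_mux values are the ones reported earlier in the thread, the mapping of the uart_mux work queue stack to CONFIG_UART_MUX_RX_STACK_SIZE is an assumption, CONFIG_CBPRINTF_FP_SUPPORT is assumed to be what enables float formatting, and the log thread value is an arbitrary placeholder to tune per application.

```
# prj.conf sketch; tune the values per application
CONFIG_LOG=y
CONFIG_LOG2_MODE_DEFERRED=y
# Assumed to be required for formatting floating-point arguments
CONFIG_CBPRINTF_FP_SUPPORT=y
# Values reported in this thread
CONFIG_MAIN_STACK_SIZE=2048
CONFIG_SYSTEM_WORKQUEUE_STACK_SIZE=4096
# Assumed to size the uart_mux work queue stack
CONFIG_UART_MUX_RX_STACK_SIZE=1024
# Placeholder value, not taken from the thread
CONFIG_LOG_PROCESS_THREAD_STACK_SIZE=2048
```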

@dkalowsk
Contributor

dkalowsk commented Jan 4, 2022

@ycsin Any chance you can provide the west command to build for debug and reproduction purposes?

@dkalowsk dkalowsk added the priority: medium Medium impact/importance bug label Jan 4, 2022
@ycsin
Member Author

ycsin commented Jan 5, 2022

> @ycsin Any chance you can provide the west command to build for debug and reproduction purposes?

I use

west build -b gtsb_ivm21 --pristine -- -DBOARD_ROOT="$env:BOARD_ROOT" -DCMAKE_EXPORT_COMPILE_COMMANDS=1;

gtsb_ivm21 is my custom board, basically a nucleo_g0b1re with a Quectel EC21 modem and AT45 flash. It has quite a few things enabled apart from the ones enabled by default on nucleo_g0b1re. Some of the more notable ones:

  • gsm_ppp
  • uart_mux
  • at45 driver
  • dma async
  • networking
  • MQTT
  • PM
  • log, shell
  • etc

@ycsin
Member Author

ycsin commented Jan 5, 2022

> I have another type of hard fault, seems to be related to LOG_BACKEND_RTT

A small update following this:

I had SEGGER_SYSTEMVIEW enabled in my system, and thus LOG_BACKEND_RTT. I disabled the SEGGER-related options (and enabled LOG_BACKEND_UART, I believe; I'm not sure why I did that, maybe just trial and error). After that it was able to print logs to the terminal, but it was pretty slow with quite a few dropped messages, and I wasn't able to type anything into the console (SHELL didn't seem to be working).

My memory of it is a bit blurry now, and I decided to continue with the legacy LOG for the time being.

@nordic-krch
Contributor

The issue reported in the shell was fixed by #38960. I will take a look at the RTT case.

@nordic-krch
Contributor

nordic-krch commented Jan 5, 2022

@ycsin can you paste the configuration you were using? Was the shell used on RTT as well? In that case you shouldn't need to enable the RTT backend (logs will go through the RTT shell), but it should still work.

The first report (stack overflow) was happening at startup. When does the second fault occur?

Legacy LOG will be deprecated at some point so it would be good to resolve issues like that.

@ycsin
Member Author

ycsin commented Jan 6, 2022

Thanks for all the support!

> Was shell used on RTT as well? In that case you shouldn't need to enable rtt backend(it will go through rttshell) but still it should work.

I'm not sure about the backends. I generally don't touch the default configurations unless something doesn't work as expected. I'm not really sure what RTT is; I'm just trying out SEGGER_SYSTEMVIEW for debugging.

My usual prj.conf contains these shell/log Kconfigs:

CONFIG_SHELL=y
CONFIG_SHELL_MINIMAL=n
CONFIG_SHELL_VT100_COLORS=y
CONFIG_SHELL_PROMPT_UART="gmoc:~ $ "
CONFIG_DATE_SHELL=n
CONFIG_KERNEL_SHELL=y
CONFIG_MODEM_SHELL=y
CONFIG_ADC_SHELL=n
CONFIG_I2C_SHELL=n
CONFIG_SENSOR_SHELL=n
CONFIG_FLASH_SHELL=n

CONFIG_LOG=y

Anything else is implied by other Kconfig options whose dependencies on LOG/SHELL I might not be aware of.

I'll post more details when I come back to this issue later, maybe after I've rebased and brought my custom application to a stable state. For the configuration, would the autoconf.h suffice?

@nordic-krch
Contributor

@ycsin could you check whether the issue still exists on the latest revision? There was #42164, where the RTT backend was used and an unexpected hard fault occurred. It turns out that a40ca6f fixed it.

@nordic-krch
Contributor

@ycsin did you manage to check it?

@ycsin
Member Author

ycsin commented Feb 11, 2022

@nordic-krch sorry, not yet. I just got the 2.7 LTS branch ready for my application today, this issue was filed for 2.7.0 RC3. I'll probably wait for @manoj153's test result first before I test it again.

@manoj153
Contributor

> @nordic-krch sorry, not yet. I just got the 2.7 LTS branch ready for my application today, this issue was filed for 2.7.0 RC3. I'll probably wait for @manoj153's test result first before I test it again.

I've done my test: #42207

@github-actions

This issue has been marked as stale because it has been open (more than) 60 days with no activity. Remove the stale label or add a comment saying that you would like to have the label removed otherwise this issue will automatically be closed in 14 days. Note, that you can always re-open a closed issue at any time.


5 participants