nRF52: MPU Fault issue #10055

Closed
vikrant8052 opened this issue Sep 18, 2018 · 21 comments
Labels: bug, platform: nRF, priority: high

Comments

@vikrant8052
Contributor

vikrant8052 commented Sep 18, 2018

Hi,
The zephyr/samples/boards/nrf52/mesh/onoff_level_lighting_vnd_app app from the latest master branch
works normally with Zephyr v1.12.99 (last commit ba6763a).

But with the latest master branch (v1.13 onward), I intermittently get an MPU FAULT.
If we set "LIGHT_CTL_TT" in publisher.c and configure the buttons to publish Light CTL Set (acknowledged) messages, the fault is easily triggered by pressing the on-board buttons on nRF52840_PDK boards.

@jhedberg
Member

Have you verified that this is not caused by a thread stack that is too small? You should check all threads, but the two most likely culprits are the Bluetooth RX thread and the system workqueue thread. It's not really worth investigating this further until stack overflow has been excluded as a potential cause.

@vikrant8052
Contributor Author

CONFIG_MAIN_STACK_SIZE=1024 (was 512)
CONFIG_SYSTEM_WORKQUEUE_STACK_SIZE=4096 (was 2048)

Even after that, I still get the MPU_FAULT.

If stack size were the issue, it should also reproduce on v1.12.99, but it does not.

There are some MPU-related commits after ba6763a that may be the cause.

Maybe there is a bug in the app itself, but then why does it work with v1.12.99?

@jhedberg
Member

There are many aspects of the system that can cause increased stack consumption. You didn't mention the Bluetooth RX stack size. Why not? I'd recommend setting both it and the system workqueue stack to 4k. The Kconfig option for the RX stack is CONFIG_BT_RX_STACK_SIZE. It'd also be good to get the exact consumption numbers for all threads, for which you'll need to enable CONFIG_INIT_STACKS and CONFIG_THREAD_STACK_INFO.
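As I understand it, CONFIG_INIT_STACKS is what makes per-thread "unused" figures possible: each stack is pre-filled ("painted") with a known byte, and the bytes still carrying that value later are counted as never touched. A minimal host-side sketch of that high-water-mark technique, assuming a 0xAA fill byte and a downward-growing stack as on Cortex-M; `stack_paint()` and `stack_unused()` are hypothetical helpers, not Zephyr APIs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed fill byte (illustrative; Zephyr's CONFIG_INIT_STACKS is believed to use 0xaa). */
#define STACK_FILL 0xAAu

/* Hypothetical helper: paint the whole stack buffer before the thread starts. */
void stack_paint(uint8_t *stack, size_t size)
{
    assert(stack != NULL);
    memset(stack, STACK_FILL, size);
}

/* Hypothetical helper: the stack grows down from stack + size, so bytes the
 * thread never reached remain painted at the low end. Count them up to the
 * first overwritten byte; that is the headroom left at the deepest point. */
size_t stack_unused(const uint8_t *stack, size_t size)
{
    size_t unused = 0;

    assert(stack != NULL);
    while (unused < size && stack[unused] == STACK_FILL) {
        unused++;
    }
    return unused;
}
```

If the deepest byte a thread writes happens to equal the fill value, usage is under-reported by a few bytes, so the figures are a close estimate rather than an exact bound.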

@vikrant8052
Contributor Author

vikrant8052 commented Sep 18, 2018

Now I've set

CONFIG_INIT_STACKS=y
CONFIG_THREAD_STACK_INFO=y
CONFIG_MAIN_STACK_SIZE=2048
CONFIG_SYSTEM_WORKQUEUE_STACK_SIZE=4096
CONFIG_BT_RX_STACK_SIZE=4096

For the rest of the configuration, please refer to
https://github.com/vikrant8051/zephyr/blob/fix_bugs10/samples/boards/nrf52/mesh/onoff_level_lighting_vnd_app/prj.conf

After that I got the following log:


prio recv thread stack (real size 448): unused 336      usage 112 / 448 (25 %)
recv thread stack (real size 4096):     unused 3920     usage 176 / 4096 (4 %)
prio recv thread stack (real size 448): unused 152      usage 296 / 448 (66 %)
recv thread stack (real size 4096):     unused 3844     usage 252 / 4096 (6 %)
adv stack (real size 512):      unused 48       usage 464 / 512 (90 %)
Acknownledgement from LIGHT_CTL_SRV
Present CTL Lightness = ffff
Present CTL Temperature = 0320
Target CTL Lightness = ffff
Target CTL Temperature = 4e20
Remaining Time = 45
adv stack (real size 512):      unused 48       usage 464 / 512 (90 %)
adv stack (real size 512):      unused 48       usage 464 / 512 (90 %)
power-> 100, color-> 10
power-> 100, color-> 20
power-> 100, color-> 30
Acknownledgement from LIGHT_CTL_SRV
Present CTL Lightness = ffff
Present CTL Temperature = 19a0
Target CTL Lightness = 4000
Target CTL Temperature = 0320
Remaining Time = 45
adv stack (real size 512):      unused 48       usage 464 / 512 (90 %)
adv stack (real size 512):      unused 48       usage 464 / 512 (90 %)
power-> 92, color-> 27
power-> 0, color-> 27
adv stack (real size 512):      unused 48       usage 464 / 512 (90 %)
adv stack (real size 512):      unused 48       usage 464 / 512 (90 %)
power-> 100, color-> 27
adv stack (real size 512):      unused 48       usage 464 / 512 (90 %)
adv stack (real size 512):      unused 48       usage 464 / 512 (90 %)
Acknownledgement from LIGHT_CTL_SRV
Present CTL Lightness = ffff
Present CTL Temperature = 1760
Target CTL Lightness = ffff
Target CTL Temperature = 4e20
Remaining Time = 45
adv stack (real size 512):      unused 48       usage 464 / 512 (90 %)
adv stack (real size 512):      unused 48       usage 464 / 512 (90 %)
Acknownledgement from LIGHT_CTL_SRV
Present CTL Lightness = ffff
Present CTL Temperature = 1760
Target CTL Lightness = 4000
Target CTL Temperature = 0320
Remaining Time = 45
adv stack (real size 512):      unused 48       usage 464 / 512 (90 %)
prio recv thread stack (real size 448): unused 152      usage 296 / 448 (66 %)
recv thread stack (real size 4096):     unused 3844     usage 252 / 4096 (6 %)
adv stack (real size 512):      unused 48       usage 464 / 512 (90 %)
power-> 92, color-> 24
power-> 0, color-> 24
adv stack (real size 512):      unused 48       usage 464 / 512 (90 %)
power-> 100, color-> 24
adv stack (real size 512):      unused 48       usage 464 / 512 (90 %)
adv stack (real size 512):      unused 48       usage 464 / 512 (90 %)
Acknownledgement from LIGHT_CTL_SRV
Present CTL Lightness = ffff
Present CTL Temperature = 155a
Target CTL Lightness = ffff
Target CTL Temperature = 4e20
Remaining Time = 45
adv stack (real size 512):      unused 48       usage 464 / 512 (90 %)
adv stack (real size 512):      unused 48       usage 464 / 512 (90 %)
Acknownledgement from LIGHT_CTL_SRV
Present CTL Lightness = ffff
Present CTL Temperature = 155a
Target CTL Lightness = 4000
Target CTL Temperature = 0320
Remaining Time = 45
adv stack (real size 512):      unused 48       usage 464 / 512 (90 %)
adv stack (real size 512):      unused 48       usage 464 / 512 (90 %)
adv stack (real size 512):      unused 48       usage 464 / 512 (90 %)
power-> 92, color-> 21
power-> 0, color-> 21
adv stack (real size 512):      unused 48       usage 464 / 512 (90 %)
adv stack (real size 512):      unused 48       usage 464 / 512 (90 %)
Acknownledgement from LIGHT_CTL_SRV
Present CTL Lightness = 0000
Present CTL Temperature = 1388
Target CTL Lightness = ffff
Target CTL Temperature = 4e20
Remaining Time = 45
adv stack (real size 512):      unused 48       usage 464 / 512 (90 %)
power-> 100, color-> 21
adv stack (real size 512):      unused 48       usage 464 / 512 (90 %)
***** MPU FAULT *****
  Instruction Access Violation
***** Hardware exception *****
Current thread ID = 0x2000188c
Faulting instruction address = 0x20001ca0
Fatal fault in ISR! Spinning...

@vikrant8052
Contributor Author

Is it because of adv_stack?
How do I increase its size?

@carlescufi
Member

@Vikrant8051 this does look suspicious.
https://github.com/zephyrproject-rtos/zephyr/blob/master/subsys/bluetooth/host/mesh/adv.c#L53

Can you try increasing that one?

@vikrant8052
Contributor Author

I simply copied this app from 1.13 to v1.12.99.
There I also get

adv stack (real size 512): unused 48 usage 464 / 512 (90 %)

But that doesn't cause any MPU_FAULT.

@carlescufi I will increase it & re-check.

@vikrant8052
Contributor Author

@carlescufi I set it to 1024, but it had no effect.

Remaining Time = 45
adv stack (real size 1024): unused 560 usage 464 / 1024 (45 %)
adv stack (real size 1024): unused 560 usage 464 / 1024 (45 %)
Acknownledgement from LIGHT_CTL_SRV
Present CTL Lightness = 0000
Present CTL Temperature = 32fe
Target CTL Lightness = ffff
Target CTL Temperature = 4e20
Remaining Time = 45
adv stack (real size 1024): unused 560 usage 464 / 1024 (45 %)
power-> 100, color-> 63
adv stack (real size 1024): unused 560 usage 464 / 1024 (45 %)
***** MPU FAULT *****
Instruction Access Violation
***** Hardware exception *****
Current thread ID = 0x2000188c
Faulting instruction address = 0x20001ca0
Fatal fault in ISR! Spinning...
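The numbers in these stack lines are consistent with usage = size - unused and percent = usage * 100 / size, truncated to an integer (inferred from the printed values, not from Zephyr's source). A small sketch of that arithmetic, with hypothetical helper names, makes the comparison between the two runs explicit: peak adv usage is 464 bytes at both 512 and 1024, so doubling the stack only halved the percentage, and the 48 bytes still free at 512 suggest the adv stack was not the one overflowing.

```c
#include <assert.h>
#include <stddef.h>

/* Peak usage in bytes: everything that is no longer painted. */
size_t stack_usage(size_t size, size_t unused)
{
    assert(unused <= size);
    return size - unused;
}

/* Usage as a whole percentage; integer division truncates, which
 * reproduces the percentages printed in the logs in this issue. */
unsigned stack_usage_pct(size_t size, size_t unused)
{
    assert(size > 0 && unused <= size);
    return (unsigned)((stack_usage(size, unused) * 100) / size);
}
```

For example, 512 bytes with 48 unused gives 464 bytes and 90 %, while 1024 bytes with 560 unused gives the same 464 bytes at 45 %.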

@ioannisg
Member

@Vikrant8051 @carlescufi

Faulting instruction address = 0x20001ca0

It looks like we are trying to execute from SRAM.
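Checking a faulting address against the memory map is a quick way to see this. A minimal sketch, assuming the nRF52840 layout (1 MB flash at 0x00000000, 256 KB RAM at 0x20000000); other nRF52 parts have less flash and RAM, so the constants would need adjusting. `classify_addr()` and `pc_is_suspicious()` are hypothetical helpers, not Zephyr APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed nRF52840 memory map. */
#define FLASH_BASE 0x00000000u
#define FLASH_SIZE 0x00100000u /* 1 MB */
#define SRAM_BASE  0x20000000u
#define SRAM_SIZE  0x00040000u /* 256 KB */

enum mem_region { REGION_FLASH, REGION_SRAM, REGION_OTHER };

enum mem_region classify_addr(uint32_t addr)
{
    if (addr >= FLASH_BASE && addr - FLASH_BASE < FLASH_SIZE) {
        return REGION_FLASH;
    }
    if (addr >= SRAM_BASE && addr - SRAM_BASE < SRAM_SIZE) {
        return REGION_SRAM;
    }
    return REGION_OTHER;
}

/* On an XIP build whose MPU marks RAM as non-executable, a faulting
 * instruction address outside flash points at a corrupted PC or return. */
bool pc_is_suspicious(uint32_t pc)
{
    return classify_addr(pc) != REGION_FLASH;
}
```

With these constants, 0x20001ca0 classifies as SRAM, whereas an address such as 0x1d9ec falls in flash.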

@ioannisg ioannisg added the platform: nRF Nordic nRFx label Sep 19, 2018
@vikrant8052
Contributor Author

Hello @ioannisg,
Is it due to a bug in the app itself?

@ioannisg
Member

Is it due to a bug in the app itself?

I have no idea.

All I see by inspecting the fault dump is that:

  • the MPU fault is due to the program trying to execute from SRAM, which is not allowed for nRF52 builds that are supposed to be XIP (executing in place from flash)
  • the fault is actually occurring in an ISR. This could be a hardware ISR or perhaps even PendSV or SVC.

@jhedberg @carlescufi some debugging might be needed here, IMHO.

@Vikrant8051 could you enable MPU_STACK_GUARD and see if you get a stack overflow?

@vikrant8052
Contributor Author

After enabling CONFIG_MPU_STACK_GUARD=y, I get the following fault on reset:

***** Booting Zephyr OS zephyr-v1.13.0-152-g6770919e7 *****
power-> 100, color-> 0
Initializing...
Bluetooth initialized
***** MPU FAULT *****
Stacking error
Data Access Violation
MMFAR Address: 0x20003538
***** Hardware exception *****
Current thread ID = 0x200006ac
Faulting instruction address = 0x1d9ec
Fatal fault in thread 0x200006ac! Aborting.
Mesh initialized
ecc stack (real size 1024): unused 120 usage 904 / 1024 (88 %)
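The "Stacking error" / "Data Access Violation" / "MMFAR Address" lines are a decoding of the Cortex-M MemManage Fault Status Register. A sketch of the same decoding based on the ARMv7-M bit layout; the reason strings are chosen to mirror the log, `mmfsr_decode()` and `mmfar_valid()` are hypothetical helpers, and 0x92 in the usage below is an illustrative register value (MSTKERR | DACCVIOL | MMARVALID), not one captured in this issue.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* ARMv7-M MMFSR = CFSR[7:0]; on target, CFSR is at 0xE000ED28, MMFAR at 0xE000ED34. */
#define MMFSR_IACCVIOL  (1u << 0) /* instruction fetch from a non-executable region */
#define MMFSR_DACCVIOL  (1u << 1) /* load/store to a forbidden region */
#define MMFSR_MUNSTKERR (1u << 3) /* fault while unstacking on exception return */
#define MMFSR_MSTKERR   (1u << 4) /* fault while stacking on exception entry */
#define MMFSR_MLSPERR   (1u << 5) /* fault during lazy FP state preservation */
#define MMFSR_MMARVALID (1u << 7) /* MMFAR holds the faulting data address */

/* Write up to max human-readable reasons into out[]; returns the count. */
size_t mmfsr_decode(uint8_t mmfsr, const char *out[], size_t max)
{
    static const struct { uint8_t bit; const char *text; } table[] = {
        { MMFSR_MSTKERR,   "Stacking error" },
        { MMFSR_MUNSTKERR, "Unstacking error" },
        { MMFSR_DACCVIOL,  "Data Access Violation" },
        { MMFSR_IACCVIOL,  "Instruction Access Violation" },
        { MMFSR_MLSPERR,   "Floating-point lazy state preservation error" },
    };
    size_t n = 0;

    assert(out != NULL || max == 0);
    for (size_t i = 0; i < sizeof table / sizeof table[0] && n < max; i++) {
        if (mmfsr & table[i].bit) {
            out[n++] = table[i].text;
        }
    }
    return n;
}

/* The MMFAR value is only meaningful when MMARVALID is set. */
bool mmfar_valid(uint8_t mmfsr)
{
    return (mmfsr & MMFSR_MMARVALID) != 0;
}
```

For 0x92 this yields "Stacking error" and "Data Access Violation" with a valid MMFAR, the same combination printed above; MMFAR then holds the data address whose access faulted (0x20003538 in the log).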

@vikrant8052
Contributor Author

vikrant8052 commented Sep 19, 2018

After some testing with the on-board buttons, I get the following log on the terminal (the buttons suddenly stop publishing):

power-> 100, color-> 18
power-> 0, color-> 18
power-> 100, color-> 18
bt_mesh_model_publish: err: -55
prio recv thread stack (real size 448): unused 120 usage 328 / 448 (73 %)
recv thread stack (real size 4096): unused 3768 usage 328 / 4096 (8 %)
bt_mesh_model_publish: err: -55
bt_mesh_model_publish: err: -55
bt_mesh_model_publish: err: -55
bt_mesh_model_publish: err: -55
bt_mesh_model_publish: err: -55
bt_mesh_model_publish: err: -55
bt_mesh_model_publish: err: -55
prio recv thread stack (real size 448): unused 120 usage 328 / 448 (73 %)
recv thread stack (real size 4096): unused 3768 usage 328 / 4096 (8 %)
bt_mesh_model_publish: err: -55
bt_mesh_model_publish: err: -55
bt_mesh_model_publish: err: -55
bt_mesh_model_publish: err: -55
bt_mesh_model_publish: err: -55
bt_mesh_model_publish: err: -55
bt_mesh_model_publish: err: -55
prio recv thread stack (real size 448): unused 120 usage 328 / 448 (73 %)
recv thread stack (real size 4096)

@carlescufi
Member

@Vikrant8051 could you please try this for us:

git revert d8d5ec3f913e69b8b2d3bf46692da818567402d9

And run the test again.

@carlescufi carlescufi added the bug The issue is a bug, or the PR is fixing a bug label Sep 19, 2018
@carlescufi
Member

carlescufi commented Sep 19, 2018

@Vikrant8051

Could you try to bisect?

$ git bisect start
$ git bisect good <commit sha that you know works>
$ git bisect bad HEAD

Then test each revision presented to you by git and type:

git bisect good if it works
git bisect bad if it doesn't work (you get an MPU fault)

until Git tells you which commit is responsible for the error.
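Bisecting is a binary search: each good/bad answer halves the remaining range, so roughly log2(N) build-and-test cycles isolate the culprit among N commits. A minimal sketch over a linear history, where a hypothetical `results[]` array stands in for "build, flash and press the buttons to see whether the MPU fault appears"; real `git bisect` picks midpoints over the commit graph, so treat this only as an illustration of the idea.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* results[i] is true if commit i shows the MPU fault ("bad").
 * Preconditions (what "git bisect good/bad" establish): good < bad,
 * results[good] is false, results[bad] is true, and once a commit is
 * bad every later one is bad too. Returns the first bad commit's index
 * and, if steps is non-NULL, how many commits had to be tested. */
size_t bisect_first_bad(const bool results[], size_t good, size_t bad,
                        unsigned *steps)
{
    unsigned n = 0;

    assert(good < bad && !results[good] && results[bad]);
    while (bad - good > 1) {
        size_t mid = good + (bad - good) / 2;

        n++;             /* one build + flash + button test */
        if (results[mid]) {
            bad = mid;   /* like typing "git bisect bad" */
        } else {
            good = mid;  /* like typing "git bisect good" */
        }
    }
    if (steps != NULL) {
        *steps = n;
    }
    return bad;
}
```

With 8 commits where the fault first appears at index 3, three test cycles are enough to find it.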

@vikrant8052
Contributor Author

@carlescufi
After running the following command to revert d8d5ec3, I no longer get the MPU_FAULT.

git revert d8d5ec3

@vikrant8052
Contributor Author

vikrant8052 commented Sep 19, 2018

Hooray, the bug has finally been found.

@carlescufi
Member

@Vikrant8051 thank you for testing this.
@andyross it seems like #9620 introduces MPU faults for some users.

CC @nashif

@andyross
Contributor

Will investigate. Pretty sure the handling is correct now, but the problem was subtle and I might have messed something up...

@nashif
Copy link
Member

nashif commented Sep 21, 2018

There is already a possible fix in a PR:

#9724

@vikrant8052
Contributor Author

@nashif @carlescufi @andyross
After merging #9724 into my local copy of the latest master branch, I no longer get the MPU_FAULT while testing.


6 participants