Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

tests/arch/arm/arm_thread_swap failed on frdm_k64f board. #20595

Closed
chen-png opened this issue Nov 12, 2019 · 13 comments · Fixed by #26917
Closed

tests/arch/arm/arm_thread_swap failed on frdm_k64f board. #20595

chen-png opened this issue Nov 12, 2019 · 13 comments · Fixed by #26917
Assignees
Labels
area: ARM ARM (32-bit) Architecture area: Tests Issues related to a particular existing or missing test bug The issue is a bug, or the PR is fixing a bug platform: NXP NXP priority: medium Medium impact/importance bug

Comments

@chen-png
Copy link
Collaborator

Describe the bug
arch.arm.swap.common.no_optimizations and arch.arm.swap.common.fp_sharing.no_optimizations test cases failed on frdm_k64f board, there is nothing output int the handler.log.

To Reproduce
Steps to reproduce the behavior:

  1. sanitycheck -p frdm_k64f --device-testing --device-serial /dev/ttyACM0
    -T tests/arch/arm/arm_thread_swap
  2. See error and the handler.log

Screenshots or console output
sanitycheck -p frdm_k64f --device-testing --device-serial /dev/ttyACM0
-T tests/arch/arm/arm_thread_swap
Renaming output directory to /home/ztest/work/zephyrproject/zephyr/sanity-out.2
JOBS: 40
Building initial testcase list...
4 test configurations selected, 0 configurations discarded due to filters.
Adding tasks to the queue...

frdm_k64f tests/arch/arm/arm_thread_swap/arch.arm.swap.common.no_optimizations FAILED: N/A

    see: sanity-out/frdm_k64f/tests/arch/arm/arm_thread_swap/arch.arm.swap.common.no_optimizations/handler.log

total complete: 2/ 4 50% skipped: 0, failed: 1
frdm_k64f tests/arch/arm/arm_thread_swap/arch.arm.swap.common.fp_sharing.no_optimizations FAILED: N/A

    see: sanity-out/frdm_k64f/tests/arch/arm/arm_thread_swap/arch.arm.swap.common.fp_sharing.no_optimizations/handler.log

total complete: 4/ 4 100% skipped: 0, failed: 2
2 of 4 tests passed (50.00%), 2 failed, 0 skipped with 0 warnings in 121.41 seconds
In total 4 test cases were executed on 1 out of total 212 platforms (0.47%)

Hardware distribution summary:

frdm_k64f (None): 4

Environment (please complete the following information):

  • OS: Fedora28
  • Toolchain: Zephyr-sdk-0.10.3
  • Commit ID: 0297756
@chen-png chen-png added the bug The issue is a bug, or the PR is fixing a bug label Nov 12, 2019
@ioannisg
Copy link
Member

Need to investigate why the no-optimizations' test is failing, but I assume the regular test is still passing, @chen-png right?

@ioannisg ioannisg self-assigned this Nov 12, 2019
@ioannisg ioannisg added the priority: low Low impact/importance bug label Nov 12, 2019
@ioannisg
Copy link
Member

Adding low as this is seems to be specific to FRDM-board. Test passes on nRF platforms, and QEMU

@chen-png
Copy link
Collaborator Author

chen-png commented Nov 13, 2019

Need to investigate why the no-optimizations' test is failing, but I assume the regular test is still passing, @chen-png right?

yes, the regular tests can pass, but if with the CONFIG_NO_OPTIMIZATIONS, it failed.

@chen-png chen-png reopened this Nov 13, 2019
@pabigot
Copy link
Collaborator

pabigot commented Nov 13, 2019

FYI: hello_world fails on frdm_k64f when CONFIG_NO_OPTIMIZATIONS is enabled. There's a segfault in nxp_mpu.c at line 103 within region_init. AFAICT this also failed at v2.0.

@enjiamai enjiamai assigned ioannisg, enjiamai and KangJianX and unassigned ioannisg and enjiamai Jan 13, 2020
@carlescufi carlescufi added area: ARM ARM (32-bit) Architecture area: Tests Issues related to a particular existing or missing test platform: NXP NXP labels Apr 30, 2020
@dleach02
Copy link
Member

dleach02 commented Jul 10, 2020

I'm not able to reproduce this. Both the hello world commented on by @pabigot and the actual test case. @chen-png can you confirm if this problem is still occurring for you?

(tagging @MaureenHelm )

@chen-png
Copy link
Collaborator Author

yes, it still failed with latest commit 5c603d9.

@ioannisg
Copy link
Member

Yeap, still failing for me as well.

@dleach02
Copy link
Member

Dang, I hate not being able to reproduce... 👎

@ioannisg
Copy link
Member

Dang, I hate not being able to reproduce... 👎

@dleach02 are you using CONFIG_NO_OPTIMIZATIONS=y?

@dleach02
Copy link
Member

yes I did... I'll go through it again to verify I didn't fat finger something.

@dleach02
Copy link
Member

Okay. I see the failure on arm_thread_swap. Debugging now

@dleach02 dleach02 added priority: medium Medium impact/importance bug and removed priority: low Low impact/importance bug labels Jul 15, 2020
@dleach02
Copy link
Member

The bug is that when reprogramming an NXP MPU descriptor, the hardware clears the valid bit for that memory region when the code writes the start address, end address, and attribute descriptor registers. If this memory area is accessed during this time it is invalid a memory fault occurs.

In this particular bug, the code in arch/arm/core/aarch32/cortex_m/mpu/nxp_mpu.c:mpu_configure_dynamic_mpu_regions() attempts to do a temporary mapping of all of RAM before applying the new set of memory maps. The function region_init() does the actual programming of the MPU descriptor registers and when the CONFIG_NO_OPTIMIZATIONS is set, the compiler generated code for region_init() uses locations on the program stack to store the local variables used to write the MPU descriptor. If the stack memory is in the descriptor being updated, the first write to the descriptor will clear the valid bit and the next read of the stack stored variable to write the next descriptor register will result in a memory fault.

At initial startup of the system, the first dynamic MPU descriptor is the only descriptor set to all of RAM. The code attempts to "reset" this descriptor to all of memory again causing the fault. A PR is being prepared which changes the temporary remapping of RAM to use the last of the dynamic descriptors to avoid this sequence.

@ioannisg
Copy link
Member

Great explanation @dleach02 and agree also with setting to Medium severity

@dleach02 dleach02 self-assigned this Jul 21, 2020
ioannisg pushed a commit that referenced this issue Jul 22, 2020
Race conditions exist when remapping the NXP MPU. When writing the
start, end, or attribute registers of a MPU descriptor, the hardware
will automatically clear the region's valid bit. If that region gets
accessed before the code is able to set the valid bit, the core will
fault.

Issue #20595 revealled this problem with the code in region_init()
when the compiler options are set to no optimizations. The code
generated by the compiler put local variables on the stack and then
read those stack based variables when writing the MPU descriptor
registers. If that region mapped the stack a memory fault would occur.
Higher compiler optimizations would store these local variables in
CPU registers which avoided the memory access when programming the
MPU descriptor.

Because the NXP MPU uses a logic OR operation of the MPU descriptors,
the fix uses the last descriptor in the MPU hardware to remap all of
dynamic memory for access instead of the first of the dynamic memory
descriptors as was occuring before. This allows reprogramming of the
primary discriptor blocks without having a memory fault. After all
the dynamic memory blocks are mapped, the unused blocks will have
their valid bits cleared including this temporary one, if it wasn't
alread changed during the mapping of the current set.

Fixes #20595

Signed-off-by: David Leach <david.leach@nxp.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area: ARM ARM (32-bit) Architecture area: Tests Issues related to a particular existing or missing test bug The issue is a bug, or the PR is fixing a bug platform: NXP NXP priority: medium Medium impact/importance bug
Projects
None yet
Development

Successfully merging a pull request may close this issue.

8 participants