
S32K344 halts after a period of time #89852

Closed
cmoser-crl opened this issue May 12, 2025 · 5 comments · Fixed by #90305
Labels
bug, platform: NXP Robotics, priority: low

Comments

@cmoser-crl

Describe the bug

I'm working on an application for the NXP S32K344 and am currently using the MR-CANHUBK3 board for development. After a period of time, the MCU appears to halt/hang. The time to failure is consistent across reboots for a given program. Initially I suspected logging/UART output, because the failure seemed to occur sooner with it enabled, but further testing showed that it also happens with logging disabled.

I've done some debugging with Trace32. After the issue occurs, I'm not able to pause the processor with the break command. Using ITM trace with the PC sampler, it looks like the CPU hangs on an instruction in the main thread. The instruction it hangs on varies across runs. The CPU continues handling SysTick interrupts and then hangs each time it returns to the main thread. Other interrupts, such as GPIO interrupts, are also handled. After the issue occurs, I'm still able to view the peripheral registers in Trace32, and I've confirmed that the safety peripherals haven't caused a functional reset.

If I disable threading with CONFIG_MULTITHREADING=n or disable the data cache with CONFIG_DCACHE=n, the issue doesn't appear to occur. I've run builds with each of these settings for over 48 hours without any issue.

To Reproduce

I've created a repo here that consistently reproduces the issue after about a minute. I've also tested with the samples/basic/blinky sample program and the issue occurs after ~8.5 hours.

Expected behavior

MCU shouldn't halt/hang.

Impact

We're able to work around it by modifying existing drivers to not use threads, but it's a pretty big annoyance.

Logs and console output

[00:01:17.368] hello world 2822
[00:01:17.384] hello world 2823
[00:01:17.416] hello world 2824
[00:01:17.432] hello world 2825
[00:01:17.448] hello world 2826
*stops outputting*

Environment

Additional context

All of the testing was done with the FS26 SBC in Debug mode.

I've tested the same code on a Teensy (MIMXRT1062, also Cortex M7) without seeing this issue.

[Image: SysTick is still handled after the main thread hangs]

[Image: the SysTick handler hangs when trying to return to the main thread]

cmoser-crl added the bug label May 12, 2025

Hi @cmoser-crl! We appreciate you submitting your first issue for our open-source project. 🌟

Even though I'm a bot, I can assure you that the whole community is genuinely grateful for your time and effort. 🤖💙

Dat-NguyenDuy added the platform: NXP Robotics label May 13, 2025
danieldegrasse added the priority: low label May 13, 2025
@manuargue
Member

manuargue commented May 14, 2025

Hi @cmoser-crl, this may be related to the Cortex-M core making speculative accesses to memory addresses that do not allow reads. Would you mind trying this patch with your sample and letting me know whether the issue persists?

diff --git a/soc/nxp/s32/s32k3/mpu_regions.c b/soc/nxp/s32/s32k3/mpu_regions.c
index acd3ee9d916..66b027e5cc7 100644
--- a/soc/nxp/s32/s32k3/mpu_regions.c
+++ b/soc/nxp/s32/s32k3/mpu_regions.c
@@ -14,6 +14,13 @@ extern char _rom_attr[];

static struct arm_mpu_region mpu_regions[] = {

+       /* Prevent speculative access in entire memory region */
+       {
+               .name = "BACKGROUND",
+               .base = 0,
+               .attr = { REGION_4G | MPU_RASR_XN_Msk | P_RW_U_RW_Msk },
+       },
+
        /* Keep before CODE region so it can be overlapped by SRAM CODE in non-XIP systems */
        {
                .name = "SRAM",

@cmoser-crl
Author

Hi @manuargue, thanks for the response. I rebuilt my sample application with this patch, and it has now run past the point where it was previously hanging. I'll leave it running for a few days and update this issue with the results.

manuargue added a commit to nxp-upstream/zephyr that referenced this issue May 15, 2025
Due to erratum ERR011573, speculative accesses might be performed
to normal memory unmapped in the MPU. This can be avoided by using
MPU region 0 to cover all unmapped memory and make this region
execute-never and inaccessible.

Fixes zephyrproject-rtos#89852

Co-authored-by: Peter van der Perk <peter.vanderperk@nxp.com>
Signed-off-by: Manuel Argüelles <manuel.arguelles@nxp.com>
@cmoser-crl
Author

This patch fixed the issue I was seeing. I checked this morning and the MCU has been running fine for the last 4 days.

@manuargue
Member

Thanks for the feedback, @cmoser-crl. I'll open a PR with this patch.

manuargue added a commit to nxp-upstream/zephyr that referenced this issue May 21, 2025
kartben pushed a commit that referenced this issue May 29, 2025
Shreyas-Shankar155 pushed a commit to MihiraMadhava/zephyr that referenced this issue Jun 3, 2025