S32K344 halts after a period of time #89852

cmoser-crl · 2025-05-12T17:52:57Z

Describe the bug

I'm working on an application for the NXP S32K344 and am currently using the MR-CANHUBK3 board for development. After a period of time, the MCU appears to halt/hang. The time it takes to fail is consistent across reboots for a given program. Initially, I thought the issue was with Logging/UART output as it seems to occur faster when that's enabled, but after more testing, it also happens with it disabled.

I've tried to do some debugging with Trace32. After the issue occurs, I'm not able to pause the processor with the break command. If I use the ITM trace with the PC sampler, it looks like the CPU hangs on an instruction in the main thread. The instruction it hangs on varies across runs. The CPU continues handling the SysTick interrupts and then hangs each time it returns to the main thread. Other interrupts, such as GPIO interrupts, are also handled. After the issue occurs, I'm still able to view the peripheral registers in Trace32, and I've confirmed that the safety peripherals haven't caused a functional reset.

If I disable threading with CONFIG_MULTITHREADING=n or disable data cache with CONFIG_DCACHE=n, the issue doesn't appear to occur. I've run builds with both for over 48 hours without any issue.

To Reproduce

I've created a repo here that consistently reproduces the issue after about a minute. I've also tested with the samples/basic/blinky sample program and the issue occurs after ~8.5 hours.

Expected behavior

MCU shouldn't halt/hang.

Impact

We're able to work around it by modifying existing drivers to not use threads, but it's a pretty big annoyance.

Logs and console output

[00:01:17.368] hello world 2822
[00:01:17.384] hello world 2823
[00:01:17.416] hello world 2824
[00:01:17.432] hello world 2825
[00:01:17.448] hello world 2826
*stops outputting*

Environment

OS: Linux
Toolchain: Zephyr SDK 0.16.6 (also tested 0.16.9)
Zephyr version: v3.7.0 (also tested v4.0.0, v4.1.0)
Code: https://github.com/cmoser-crl/simple-zephyr-example

Additional context

All of the testing was done with the FS26 SBC in Debug mode.

I've tested the same code on a Teensy (MIMXRT1062, also Cortex M7) without seeing this issue.

systick is still handled after the main thread hangs

the systick handler hangs when trying to return to the main thread

github-actions · 2025-05-12T17:53:40Z

Hi @cmoser-crl! We appreciate you submitting your first issue for our open-source project. 🌟

Even though I'm a bot, I can assure you that the whole community is genuinely grateful for your time and effort. 🤖💙

manuargue · 2025-05-14T22:56:33Z

Hi @cmoser-crl, this may be related to the Cortex-M core making speculative accesses to memory addresses that cannot be read. Would you mind trying this patch with your sample and reporting back whether the issue still persists?

diff --git a/soc/nxp/s32/s32k3/mpu_regions.c b/soc/nxp/s32/s32k3/mpu_regions.c
index acd3ee9d916..66b027e5cc7 100644
--- a/soc/nxp/s32/s32k3/mpu_regions.c
+++ b/soc/nxp/s32/s32k3/mpu_regions.c
@@ -14,6 +14,13 @@ extern char _rom_attr[];

static struct arm_mpu_region mpu_regions[] = {

+       /* Prevent speculative access in entire memory region */
+       {
+               .name = "BACKGROUND",
+               .base = 0,
+               .attr = { REGION_4G | MPU_RASR_XN_Msk | P_RW_U_RW_Msk },
+       },
+
        /* Keep before CODE region so it can be overlapped by SRAM CODE in non-XIP systems */
        {
                .name = "SRAM",

cmoser-crl · 2025-05-15T17:28:08Z

Hi @manuargue, thanks for the response. I rebuilt my sample application with this patch, and it has made it past the point where it was previously hanging. I'll leave it running for a few days and update this issue with the results.

Due to erratum ERR011573, speculative accesses might be performed to normal memory unmapped in the MPU. This can be avoided by using MPU region 0 to cover all unmapped memory and make this region execute-never and inaccessible.

Fixes zephyrproject-rtos#89852

Co-authored-by: Peter van der Perk <peter.vanderperk@nxp.com>
Signed-off-by: Manuel Argüelles <manuel.arguelles@nxp.com>
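As a rough illustration of what a 4 GB, execute-never background region looks like at register level, here is a minimal sketch of the ARMv7-M MPU_RASR encoding. The macro and function names (`RASR_XN`, `AP_NO_ACCESS`, `rasr_size_field`, `background_rasr`) are hypothetical and are not Zephyr's definitions. The patch posted in this thread uses a read/write permission mask, while the commit message describes the region as inaccessible, so the sketch accepts either permission encoding.

```c
#include <stdint.h>

/* ARMv7-M MPU_RASR bit positions. Names here are illustrative only. */
#define RASR_ENABLE      (1u << 0)   /* region enable */
#define RASR_SIZE_SHIFT  1u          /* SIZE field, bits [5:1] */
#define RASR_AP_SHIFT    24u         /* AP field, bits [26:24] */
#define RASR_XN          (1u << 28)  /* execute never */

/* Access-permission (AP) encodings. */
#define AP_NO_ACCESS     0x0u        /* no access at any privilege level */
#define AP_FULL_ACCESS   0x3u        /* read/write at any privilege level */

/*
 * SIZE field for a region of 2^log2_bytes bytes.
 * The architecture defines region size = 2^(SIZE + 1), so valid
 * inputs are 5 (32 bytes) through 32 (4 GB).
 */
static inline uint32_t rasr_size_field(unsigned int log2_bytes)
{
    return (uint32_t)(log2_bytes - 1u) << RASR_SIZE_SHIFT;
}

/*
 * RASR value for a 4 GB, execute-never background region.
 * TEX, C and B are left at 0, which selects the Strongly-ordered
 * memory type. Assumption: this value would be paired with a base
 * address of 0 in region 0, so higher-numbered regions override it.
 */
static inline uint32_t background_rasr(uint32_t ap)
{
    return rasr_size_field(32u)
         | RASR_XN
         | (ap << RASR_AP_SHIFT)
         | RASR_ENABLE;
}
```

With these definitions, `background_rasr(AP_NO_ACCESS)` evaluates to `0x1000003F` and `background_rasr(AP_FULL_ACCESS)` to `0x1300003F`. Because higher-numbered MPU regions take priority on ARMv7-M, the real SRAM, flash and peripheral regions still apply wherever they overlap the background region.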

cmoser-crl · 2025-05-19T15:29:20Z

This patch fixed the issue I was seeing. I checked this morning and the MCU has been running fine for the last 4 days.

manuargue · 2025-05-19T20:50:10Z

Thanks for the feedback, @cmoser-crl. I'll open a PR with this patch.

cmoser-crl added the bug label May 12, 2025

Dat-NguyenDuy added the platform: NXP Robotics label May 13, 2025

github-actions bot assigned bperseghetti and PetervdPerk-NXP May 13, 2025

danieldegrasse added the priority: low label May 13, 2025

manuargue mentioned this issue May 21, 2025

fix potential speculative access that might trigger bus faults on Cortex-M7 #90305

Merged

kartben closed this as completed in #90305 May 29, 2025
