Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Compiling in ubxlib triggers SecureFault in Trusted Firmware-M on nrf9160 #206

Closed
Irockasingranite opened this issue Feb 26, 2024 · 12 comments

Comments

@Irockasingranite
Copy link

Hi, we're trying to use a u-blox gnss module on an nrf9160 based board, using the nrf sdk and pulling in ubxlib as a zephyr module.
We recently updated the nrf sdk to v2.5.x, and since then our application cannot start, as Trusted Firmware-M crashes on a SecureFault:

[Sec Thread] Secure image initializing!
TF-M isolation level is: 0x00000001
Booting TF-M v1.8.0
FATAL ERROR: SecureFault
Here is some context for the exception:
    EXC_RETURN (LR): 0xFFFFFFBD
    Exception came from non-secure FW in thread mode.
    xPSR:    0x60000007
    MSP:     0x20000BF8
    PSP:     0x20002468
    MSP_NS:  0x2000DD90
    PSP_NS:  0x2000DCB0
    Exception frame at: 0x2000DCB0
        R0:   0x00000000
        R1:   0x00000000
        R2:   0x00000001
        R3:   0x00000008
        R12:  0x0000A2F8
        LR:   0x0000842B
        PC:   0x0000842C
        xPSR: 0x21000000
    CFSR:  0x00000000
    BFSR:  0x00000000
    BFAR:  Not Valid
    MMFSR: 0x00000000
    MMFAR: Not Valid
    UFSR:  0x00000000
    HFSR:  0x00000000
    SFSR:  0x00000048
    SFAR: 0x00000008

The issue does not appear when using nrf sdk v2.4.3, but does on v2.5.0.

This is likely not an issue specific to ubxlib, but it is the only correlation we've been able to identify. Since according to the documentation you support and test with nrf sdk v2.5.0, we're wondering if you've seen this or similar issues yourself, and know of a fix or workaround.

For a minimal example of the issue, see this repository.

@RobMeades
Copy link
Contributor

Thanks for posting, sorry you're having trouble. Not something we've seen I'm afraid: do you have a .map file that might tell you what has been loaded at 0x0000842C?

@Irockasingranite
Copy link
Author

Checking zephyr.map, the relevant part appears to be this:

 .text.ubxlib_preinit
                0x0000000000008424       0x18 app/libapp.a(u_port.c.obj)
 .text.main     0x000000000000843c       0x28 app/libapp.a(main.c.obj)
                0x000000000000843c                main

, which further points at ubxlib as the 'culprit'.

Looking at ubxlib_preinit in the zephyr specific u_port.c, all it has is a call to k_thread_system_pool_assign. Commenting out that line does in fact get rid of the securefault at boot (but probably badly breaks ubxlib :) ). The fact that it's an init function also explains why simply compiling the library is enough to trigger the issue, without explicitly linking to it from the application.

Checking the same zephyr.map again, I see that the system heap that's being assigned is sitting at 0x2000c68c, though, which should be well inside the 'nonsecure' ram area (the 'secure' region spans 0x20000000 to 0x20008000):

k_heap_area     0x000000002000c68c       0x14 load address 0x0000000000011874
                0x000000002000c68c                _k_heap_list_start = .
 *(SORT_BY_NAME(SORT_BY_ALIGNMENT(._k_heap.static.*)))
 ._k_heap.static._system_heap_
                0x000000002000c68c       0x14 zephyr/kernel/libkernel.a(mempool.c.obj)
                0x000000002000c68c                _system_heap
                0x000000002000c6a0                _k_heap_list_end = .

@RobMeades
Copy link
Contributor

Maybe k_thread_system_pool_assign() somehow refers to something within the secured area? Is it possible to attach a debugger and break point at the function then single-step, see in detail what it is trying to do?

@Irockasingranite
Copy link
Author

k_thread_system_pool_assign() itself isn't very complicated. From zephyr/kernel/mempool.c:

K_HEAP_DEFINE(_system_heap, CONFIG_HEAP_MEM_POOL_SIZE);
#define _SYSTEM_HEAP (&_system_heap)

/* [...] */

void k_thread_system_pool_assign(struct k_thread *thread)
{
	thread->resource_pool = _SYSTEM_HEAP;
}

@RobMeades
Copy link
Contributor

Hmph. Stating the obvious, if both thread->resource_pool and _SYSTEM_HEAP are outside the range 0x20000000 to 0x20008000, that suggests that, somehow, more than is intended is being secured. Could you maybe try reading from _SYSTEM_HEAP and thread->resource_pool, just as a standalone operation, slightly before this, see if those also hit the security issue?

@Irockasingranite
Copy link
Author

Yes, testing those individually got me further: It's not the heap at all, it's writing to thread. Stepping through shows that what k_current_get() returns is 0xf75, which... doesn't seem right.

So k_current_get() isn't working correctly, which shouldn't really be surprising. ubxlib_preinit is running at init level PRE_KERNEL_1, for which the Zephyr documentation states: "These devices cannot use any kernel services during configuration, since the kernel services are not yet available".

Is there a reason ubxlib_preinit needs to run this early? If I change it to POST_KERNEL, k_current_get() gives me 0x2000cc98, which seems much more reasonable, and there is no securefault.

@RobMeades
Copy link
Contributor

RobMeades commented Feb 26, 2024

Progress, well done!

ubxlib_preinit() has always been set to run at PRE_KERNEL_1, I guess so that the heap is set up before there's a chance that any tasks get run? The change that introduced it, two years ago, is fb63a1f, where the commit comment says:

Systempool is now allocated during startup in z_sys_init_run_level, threads created afterwards will inherit this pool. This resolves a known issue in Zephyr when calling UBXLIB API from threads that isn't the Zephyr main thread.

@RobMeades
Copy link
Contributor

I'd guess that if POST_KERNEL works for you then it is probably OK. Let me run that change through the test system and see what happens.

@RobMeades
Copy link
Contributor

RobMeades commented Feb 26, 2024

What I don't understand is why calling k_thread_system_pool_assign() with the parameter k_current_get() has worked for the last 2 years (and still works on nRF52/nRF53), if the return value of k_current_get() is garbage at that time...?

@Irockasingranite
Copy link
Author

Checking the blame on zephyr/include/zephyr/kernel.h I see some changes relating to thread local storage in k_current_get() a few months ago, perhaps that's the change pulled in by the newer sdk versions that broke this?

@RobMeades
Copy link
Contributor

RobMeades commented Feb 26, 2024

Maybe, there do seem to have been changes in this area, though we don't use CONFIG_THREAD_LOCAL_STORAGE or CONFIG_CURRENT_THREAD_USE_TLS (at least not directly) and we are definitely using nRFConnect 2.5.0 which I hope isn't a moveable fast as far as the underlying Zephyr is concerned...?

EDIT: Ah, I see there has been a 2.5.1, a 2.5.2 and there seems to be a kind of "head-rev" 2.5.99-dev1 lying around as well.

Anyway, the statement you have quoted from the Zephyr documentation is pretty clear and replacing PRE_KERNEL1 with POST_KERNEL in the call to SYS_INIT() passes our tests. Let me get that change reviewed and, assuming everyone is OK with it, I will push it to master here and update this issue when I've done that.

@RobMeades
Copy link
Contributor

Fix is now in master here, see 3c8d2d9. I will close this issue now: please feel free to re-open, or open another, if there is more to discuss.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants