Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

arch: arm: Add code for swapping threads between secure and non-secure #26414

Merged
merged 7 commits into from May 5, 2021

Conversation

oyvindronningstad
Copy link
Collaborator

@oyvindronningstad oyvindronningstad commented Jun 24, 2020

This adds code to swap_helper.S which does special handling of LR when
the interrupt came from secure. The LR value is stored to memory, and
put back into LR when swapping back to the relevant thread.

Signed-off-by: Øyvind Rønningstad oyvind.ronningstad@nordicsemi.no

Longer explanation:

When a pendsv interrupt happens (to switch threads), LR contains a very special value (EXC_RETURN) describing the thread it came from (https://developer.arm.com/docs/100235/latest/the-cortexm33-processor/exception-model/exception-entry-and-return). The current implementation of swap_helper.S (which contains the pendsv interrupt handler) assumes a number of things about EXC_RETURN that do not hold if the CPU is executing a call into secure mode. This leads to undefined behavior if this happens, e.g. if a delayed work hits its timeout, while the CPU is in secure mode.

The current implementation leaves EXC_RETURN unchanged when switching between threads. The exception to this is when FPU is enabled. In this case, the FType bit can have different values based on whether FPU is used by the thread. This individual bit is handled correctly by swap_helper.S.

The pendsv handler normally doesn't touch the EXC_RETURN, except to set or clear a single bit to do with floating point registers. With TrustZone, there are one or two more bits that might be different between threads, the S bit (secure/nonsecure stack) and possibly the DCRS bit.

This PR introduces storing of the entire EXC_RETURN on entering the pendsv handler, and restoring the EXC_RETURN of the new thread before exiting the pendsv handler. We also set a default EXC_RETURN value when creating the thread.

This should also work for secure threads that are switched out while calling a secure function.

Why does this work?
As long as LR contains the same EXC_RETURN when a thread is switched in as when it was switched out, all the registers should be correctly popped from the relevant stack. The EXC_RETURN bits collectively specify which/how many registers are pushed, and to which stack.

How to choose the default EXC_RETURN value?
Because of the standardized way threads are initialized, there are only two valid values: 0xFFFFFFFD for secure/regular code, and 0xFFFFFFBC for non-secure code.

What are the limitations of this approach?
If one thread is executing in secure mode, no other threads can enter secure mode safely. It might work if the second thread is allowed to exit before the first thread is rescheduled, but since the NS kernel can do nothing about the S stacks, the two secure calls will be sharing stacks so the stack might overflow. As a consequence, all calls to secure mode should be prefaced with grabbing a global mutex. It might be possible to do something clever in swap_helper to not swap in a secure thread when another is running, but that is out of scope here.

@ioannisg
Copy link
Member

@oyvindronningstad I would appreciate if you could elaborate here, on which exact problem you are trying to address. Then we can look at the PR together.

@oyvindronningstad
Copy link
Collaborator Author

oyvindronningstad commented Jun 25, 2020

@oyvindronningstad I would appreciate if you could elaborate here, on which exact problem you are trying to address. Then we can look at the PR together.

@ioannisg I updated the PR description now. Please tell me your thoughts.

@ioannisg
Copy link
Member

@oyvindronningstad thanks for the longer explanation. That's interesting and inline with our earlier discussion in the downstream ticket.

I am basically going to comment on the idea, for now - not the specific implementation details.

So, as far as I understand you are trying to solve the problem of a NSPE thread properly continuing a Secure call while being context-switched out and in again, is that correct? Then, the solution you propose could theoretically work; the ARCHitecture allows you to manipulate the EXC_RETURN and return from the exception while transiting to the alternative security state. However, we must be very careful, because during the exception return, the ARCHitecture performs a series of integrity checks that need to pass, and in this case, I am not sure all these checks will always pass. For example: if the PendSV interrupt preempts a Secure interrupt then the unstacked XPSR won't match the execution mode the PE is supposed to return to. (This is of course a corner case, only valid when the Secure domain has a low priority interrupt, lower than the Non-Secure PendSV, but it is a scenario you cannot handle with this idea).

Anyway, if the integrity checks pass (we must check this), then your idea will work, because a NS thread's PSP (PSP_NS) remains un-touched when doing a secure call, and Zephyr Kernel ensures it is properly saved/restored. So when a thread switches back -in, to finish the secure call, when it returns from the call, the correct PSP_NS will be loaded.

How to choose the default EXC_RETURN value?

This is indeed a "slightly" weak side of the idea, because we are constructing an EXC_RETURN value based on assumptions on how Zephyr ARCH code works for ARM. For Non-Secure PendSV we need to assume that:

  • return mode is always THREAD (never HANDLER)
  • Zephyr threads use PSP (never MSP)
    [the FType flag is anyways manipulated]

These are all reasonable assumptions to make.

If one thread is executing in secure mode, no other threads can enter secure mode safely AFAIK

Yes, this is correct. We cannot read/modify the Secure stack pointer :)

As a consequence, all calls to secure mode should be prefaced with grabbing a global mutex.

Yes, so, then, I need to question the real value of this idea. Generally, we expect NSPE to be a multi-threaded RTOS (Zephyr) so we should be locking the scheduler, anyways, before doing a secure call. Then there is no need for all this infrastructure. In summary, the idea has value only if there is a single NSPE execution context that interacts with the SPE. How probable is this scenario? Do we have such cases in our downstream project?

Side note:
The equivalent case with respect to floating-point is with FPU and FPU_SHARING:
FPU: un-shared register mode, a single thread can use the FP
FP_SHARING: many threads can use the FP

In the downstream projects I have not seen examples that enable FPU but not FP_SHARING. Because developers usually have multiple contexts that want to use the FP.

@oyvindronningstad
Copy link
Collaborator Author

Thanks a lot for the answer.

For Non-Secure PendSV we need to assume that:

  • return mode is always THREAD (never HANDLER)
  • Zephyr threads use PSP (never MSP)

If it's a problem, can we infer these from CONFIG_s and set the default EXC_RETURN accordingly?

Generally, we expect NSPE to be a multi-threaded RTOS (Zephyr) so we should be locking the scheduler, anyways, before doing a secure call.

With this patch, we shouldn't need to lock the scheduler. If we still need to lock the scheduler for some other reason, I agree with you that it has limited utility. If we don't need to lock the scheduler (even if we do need a global mutex), time-sensitive NS threads can be executed while a long running secure service is executed (RNG for example takes ~50-100 ms), making the system more responsive.

@joh06937
Copy link

joh06937 commented Jul 9, 2020

Sorry to randomly jump in on this, but for the last week or two I've been going down the route of trying to have Zephyr and our primary application running entirely out of Secure flash/RAM on the nRF9160 with a dedicated thread that calls into a user-written application in Non Secure flash/RAM. This PR was just pointed out to me.

The intent is that we can have our code running on our part and use TrustZone as a nice hardware protection mechanism (that requires very little intricate configuration) to allow our code to be shipped to a customer and lets them flash their own restricted application onto our device, which we would run and manage on their behalf, all without risking our IP/keys/sensitive info from being accessible in any way by them (and while providing Secure APIs for various functionality).

Unfortunately, I have run into a few roadblocks (not the least of which is the manual boundary management found in this PR and the concern over Non Secure-side interrupt management, as mentioned above). I've also had some trouble with the stacking of the floating point registers causing a fault on my part, even with a bit of extra configuration with CP11/10 in SCB->NSACR.

Anyway, my point here is that there is at least one person out there who's interested in a robust way to solve this issue :-)

@github-actions
Copy link

github-actions bot commented Sep 8, 2020

This pull request has been marked as stale because it has been open (more than) 60 days with no activity. Remove the stale label or add a comment saying that you would like to have the label removed otherwise this pull request will automatically be closed in 14 days. Note, that you can always re-open a closed pull request at any time.

@github-actions github-actions bot added the Stale label Sep 8, 2020
@github-actions github-actions bot closed this Sep 22, 2020
@oyvindronningstad oyvindronningstad force-pushed the swap-secure branch 2 times, most recently from 8e7e9f0 to 4828524 Compare March 24, 2021 12:04
@github-actions github-actions bot added area: Kernel area: Modules area: Samples Samples area: Tests Issues related to a particular existing or missing test labels Mar 24, 2021
@oyvindronningstad
Copy link
Collaborator Author

Rebased on top of #33510

@ioannisg
Copy link
Member

ioannisg commented May 3, 2021

@microbuilder @galak @stephanosio @carlocaione if any of you could review this patch before the 2.6 freeze :)

@github-actions github-actions bot added area: Documentation Release Notes To be mentioned in the release notes labels May 4, 2021
ioannisg and others added 5 commits May 4, 2021 15:38
Introduce a Kconfig option to allow Secure function calls to be
pre-empted.

Signed-off-by: Ioannis Glaropoulos <Ioannis.Glaropoulos@nordicsemi.no>
Signed-off-by: Øyvind Rønningstad <oyvind.ronningstad@nordicsemi.no>
This adds code to swap_helper.S which does special handling of LR when
the interrupt came from secure. The LR value is stored to memory, and
put back into LR when swapping back to the relevant thread.

Also, add special handling of FP state when switching from secure to
non-secure, since we don't know whether the original non-secure thread
(which called a secure service) was using FP registers, so we always
store them, just in case.

Signed-off-by: Øyvind Rønningstad <oyvind.ronningstad@nordicsemi.no>
When ARM_NONSECURE_PREEMPTIBLE_SECURE_CALLS is enabled, if FPU is
being used (CONTROL.FPCA == 1), store all FP registers before
entering the secure function, and restore them afterwards.

This is needed if any NS thread or ISR that interrupts the secure
function uses FP registers. If they do, a secure UsageFault occurs
unless this change is applied.

This allows k_sched_lock() and k_sched_unlock() to be dropped when
ARM_NONSECURE_PREEMPTIBLE_SECURE_CALLS is enabled.

Enable ARM_NONSECURE_PREEMPTIBLE_SECURE_CALLS by default when
building TF-M.

Signed-off-by: Øyvind Rønningstad <oyvind.ronningstad@nordicsemi.no>
For testing secure->non-secure thread swapping.
This also tests that the FP context is correctly preserved
when calling a secure function.

Signed-off-by: Øyvind Rønningstad <oyvind.ronningstad@nordicsemi.no>
Add an entry in the release notes regarding the
support for non-blocking secure calls in Zephyr.

Signed-off-by: Ioannis Glaropoulos <Ioannis.Glaropoulos@nordicsemi.no>
Copy link
Member

@microbuilder microbuilder left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work on this @oyvindronningstad

@ioannisg
Copy link
Member

ioannisg commented May 4, 2021

I'll add a patch to fix the test failure for the STM boards - apparently the overlay is not correct. To make this easy to track, I will not force-push, so just apply the patch on top of the PR. Please don't merge.

@oyvindronningstad
Copy link
Collaborator Author

I'll add a patch to fix the test failure for the STM boards - apparently the overlay is not correct. To make this easy to track, I will not force-push, so just apply the patch on top of the PR. Please don't merge.

Which test failure? Looks like the CI is green.

@ioannisg
Copy link
Member

ioannisg commented May 4, 2021

I'll add a patch to fix the test failure for the STM boards - apparently the overlay is not correct. To make this easy to track, I will not force-push, so just apply the patch on top of the PR. Please don't merge.

Which test failure? Looks like the CI is green.

The overlay is not correct, and the NS test does not boot. After changing the overlay the test is able to run, however it fails. Attaching log

*** Booting Zephyr OS build zephyr-v2.5.0-3129-gb84e7d4cdcb2  ***
Running test suite test_thread_swap
===================================================================
START - test_thread_swap
    Assertion failed at ../s
    Assertion failed at ../src/main.c:41: work_func: (main_thread->arch.mode_exc_return & EXC_RETURN_S is false)
EXC_RETURN not secure: 0xac
 FAIL - test_thread_swap in 0.19 seconds
===================================================================
Test suite test_thread_swap failed.
===================================================================
PROJECT EXECUTION FAILED

@oyvindronningstad
Copy link
Collaborator Author

After changing the overlay the test is able to run, however it fails.

That error means there's probably some timing issue in the test. The work function didn't manage to interrupt the secure service. Likely the crypto is faster and finishes before the work fires.

@ioannisg
Copy link
Member

ioannisg commented May 4, 2021

After changing the overlay the test is able to run, however it fails.

That error means there's probably some timing issue in the test. The work function didn't manage to interrupt the secure service. Likely the crypto is faster and finishes before the work fires.

I understand that. What is your proposal to fix this? Or should we remove the ST platforms from the allow-list?

@oyvindronningstad
Copy link
Collaborator Author

Try decreasing the delay on the work

Decrease the submitted work delay, to ensure that the PSA
crypto operation will be preempted when the work fires. The
modification is required for devices with fast crypto
operations. Also minor corrections to the test ase name,
so it is not the same with the other arm_thread_swap test.

Signed-off-by: Ioannis Glaropoulos <Ioannis.Glaropoulos@nordicsemi.no>
Correct thee dt overlays for the STM32 boards, so the Zephyr
image starting address is in accordance with what TF-m expects
it to be.

Signed-off-by: Ioannis Glaropoulos <Ioannis.Glaropoulos@nordicsemi.no>
@ioannisg
Copy link
Member

ioannisg commented May 5, 2021

Test is passing now on

  • nrf5340
  • nrf9160
  • nucleo_l552ze_q
  • stm32l562e_q
  • mps2_an521

@ioannisg ioannisg merged commit 0eb3b3c into zephyrproject-rtos:master May 5, 2021
@microbuilder
Copy link
Member

A bit late to the party, but tests also passing on lpcxpresso55s69_ns

@MZachmann
Copy link

on Windows I'm getting a warning with gnu arm (q4 2020) that __packed in the struct near line 114 is being ignored and it needs to be after the struct keyword. It's fine with struct __packed.

@ioannisg
Copy link
Member

ioannisg commented May 9, 2021

etting a warning with gnu arm (q4 2020) that __packed in the stru

@oyvindronningstad you might want to take a look?

@oyvindronningstad oyvindronningstad deleted the swap-secure branch May 10, 2021 07:12
@oyvindronningstad
Copy link
Collaborator Author

on Windows I'm getting a warning with gnu arm (q4 2020) that __packed in the struct near line 114 is being ignored and it needs to be after the struct keyword. It's fine with struct __packed.

@MZachmann Thanks, do you mind checking that this removes your warning? #35031

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area: API Changes to public APIs area: ARM ARM (32-bit) Architecture area: Devicetree area: Documentation area: Kernel area: Modules area: Tests Issues related to a particular existing or missing test Maintainer At least 2 days before merge, maintainer review required Release Notes To be mentioned in the release notes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants