PMP and MPU Improvements #1873

Merged
merged 3 commits into from Sep 10, 2020

alistair23
Contributor

Pull Request Overview

This PR replaces:

And combines a few different MPU/PMP improvements

  1. Rename the enable/disable_mpu() functions to enable/disable_app_mpu(), because these functions only enable and disable the MPU for apps.
  2. Convert the current disable_mpu() function to clear_mpu() and run it at boot. Instead of clearing the MPU/PMP after returning to the kernel from an app, do nothing, as we don't need to. We still leave the (now no-op) disable function around in case a future architecture wants to use it. This should improve syscall performance, especially for RISC-V. We also clear the MPU/PMP at boot in case a previous stage has set anything up.
  3. Add new functions in preparation for RISC-V ePMP (see below).
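
As a rough illustration of items 1 and 2, here is a minimal sketch of the renamed interface. It is not the actual trait in kernel/src/platform/mpu.rs: the trait shape, `MockMpu`, and `run_round_trips` are hypothetical and only model the idea that clearing happens once at boot while the return-to-kernel path writes no registers.

```rust
use std::cell::Cell;

/// Simplified, hypothetical stand-in for the kernel's MPU trait.
pub trait Mpu {
    /// Wipe any configuration left behind by an earlier boot stage.
    fn clear_mpu(&self);
    /// Enforce the current app's regions before switching to it.
    fn enable_app_mpu(&self);
    /// Called on return to the kernel; may be a no-op when the
    /// app regions do not restrict the kernel.
    fn disable_app_mpu(&self);
}

/// Mock hardware that counts register writes (assumption: one
/// write per state change, purely for illustration).
#[derive(Default)]
pub struct MockMpu {
    pub enabled: Cell<bool>,
    pub register_writes: Cell<u32>,
}

impl Mpu for MockMpu {
    fn clear_mpu(&self) {
        self.enabled.set(false);
        self.register_writes.set(self.register_writes.get() + 1);
    }

    fn enable_app_mpu(&self) {
        self.enabled.set(true);
        self.register_writes.set(self.register_writes.get() + 1);
    }

    fn disable_app_mpu(&self) {
        // Intentionally empty: no register write on syscall return.
    }
}

/// Simulate boot followed by `n` app/kernel round trips.
pub fn run_round_trips(mpu: &dyn Mpu, n: u32) {
    mpu.clear_mpu();
    for _ in 0..n {
        mpu.enable_app_mpu();
        mpu.disable_app_mpu();
    }
}

fn main() {
    let mpu = MockMpu::default();
    run_round_trips(&mpu, 3);
    println!("register writes: {}", mpu.register_writes.get());
}
```

In this model each round trip costs one write instead of two, which is the saving item 2 is after.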

RISC-V ePMP

The RISC-V enhanced PMP spec is making progress and it seems like it is unlikely to have any major changes before ratification.

In anticipation of this, let's start preparing to add support to Tock.

This PR updates the MPU trait to allow marking regions of kernel memory with permissions.

The idea here is that before the MPU is enabled Tock will set read/write/execute permissions for itself. This will limit what Tock itself can access, for example by removing write permissions from code.

We can also add a check to see if it is already enabled. If the previous stage has enabled lock down mode we won't set it ourselves. We will have to trust the previous stage (that would have loaded us) to have done it correctly. If no one else has set it, we can set it for enhanced security.

Once enabled we can't change the execute permission (read/write can change) so let's enforce that all regions are added before enabling.
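
A minimal sketch of that ordering rule, assuming a hypothetical `KernelMpu` type (this is not the KernelMPU trait from the PR, and the addresses are made up): regions can only be added before the kernel configuration is enabled, and a configuration already locked by a previous stage is treated as enabled.

```rust
/// Permissions the kernel may grant itself (illustrative subset).
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Permissions {
    ReadWriteOnly,
    ReadExecuteOnly,
    ReadOnly,
}

#[derive(Debug, PartialEq)]
pub enum KernelMpuError {
    AlreadyEnabled,
    OutOfRegions,
}

/// Hypothetical software model of a lockable kernel MPU/PMP.
pub struct KernelMpu {
    regions: Vec<(usize, usize, Permissions)>,
    max_regions: usize,
    enabled: bool,
}

impl KernelMpu {
    /// `locked_by_previous_stage` models reading a hardware lock bit.
    pub fn new(max_regions: usize, locked_by_previous_stage: bool) -> Self {
        KernelMpu {
            regions: Vec::new(),
            max_regions,
            enabled: locked_by_previous_stage,
        }
    }

    /// Record a kernel region; rejected once the MPU is enabled.
    pub fn allocate_kernel_region(
        &mut self,
        start: usize,
        size: usize,
        perms: Permissions,
    ) -> Result<(), KernelMpuError> {
        if self.enabled {
            return Err(KernelMpuError::AlreadyEnabled);
        }
        if self.regions.len() >= self.max_regions {
            return Err(KernelMpuError::OutOfRegions);
        }
        self.regions.push((start, size, perms));
        Ok(())
    }

    /// After this call no further regions can be added.
    pub fn enable_kernel_mpu(&mut self) {
        self.enabled = true;
    }

    pub fn region_count(&self) -> usize {
        self.regions.len()
    }
}

fn main() {
    let mut mpu = KernelMpu::new(4, false);
    // Made-up addresses: kernel code is read/execute, kernel data is read/write.
    mpu.allocate_kernel_region(0x0001_0000, 0x2_0000, Permissions::ReadExecuteOnly)
        .unwrap();
    mpu.allocate_kernel_region(0x2000_0000, 0x1000, Permissions::ReadWriteOnly)
        .unwrap();
    mpu.enable_kernel_mpu();
    let late = mpu.allocate_kernel_region(0x2000_1000, 0x1000, Permissions::ReadOnly);
    println!("regions: {}, late add: {:?}", mpu.region_count(), late);
}
```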

Testing Strategy

Running on QEMU, I'll test the Unleashed and Apollo3 as well.

TODO or Help Wanted

Documentation Updated

  • Updated the relevant files in /docs, or no updates are required.

Formatting

  • Ran make format.
  • Fixed errors surfaced by make clippy.

@alistair23 alistair23 changed the title Alistair/mpu improvements PMP and MPU Improvements May 21, 2020
@alistair23
Contributor Author

I pushed an update that splits out the KernelMPU and fixes some comments.

@alistair23
Contributor Author

Ping!

Contributor

@bradjc bradjc left a comment


The trait changes look good, but I'm wary of changing every board when we don't need to add the clear_mpu call.

@phil-levis
Contributor

I will take a look by end of week.

Contributor

@phil-levis phil-levis left a comment


Overall, I think this looks good. I think clear should be taken out of the trait, and the comments describing what these functions do need to be rewritten -- I think they are in a few cases saying the opposite of what they want to say. It seems like something we could easily hammer out in 30 minutes or so.

@alistair23
Contributor Author

I just pushed an update. I have tried to improve the documentation.

I removed the boards clearing the MPU. I have left the clear_mpu() function in the trait as I think it is very useful to have. If we remove it from the trait we will just end up implementing the function publicly in the MPU implementations anyway.

Let me know what you think.

@alistair23
Contributor Author

Ping!

@alistair23
Contributor Author

Ping^2

@phil-levis or @bradjc any comments?

@alistair23
Contributor Author

Ping^3

This patch does a few things:
 - Convert the disable_mpu() function to clear_mpu()
 - Make the disable_mpu() function a NOP on ARMv7 and RV32
 - Rename the enable/disable functions to be app specific

Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
hudson-ayers
hudson-ayers previously approved these changes Aug 17, 2020
@alistair23
Contributor Author

Can this be merged?

@alistair23
Contributor Author

Ping again

@hudson-ayers hudson-ayers added the last-call Final review period for a pull request. label Sep 8, 2020
@hudson-ayers
Contributor

cc @phil-levis bors can't merge this until you remove your request for changes

Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
phil-levis
phil-levis previously approved these changes Sep 10, 2020
Contributor

@phil-levis phil-levis left a comment


Looks good to me.

@hudson-ayers
Contributor

bors r+

@bors
Contributor

bors bot commented Sep 10, 2020

Canceled.

@alistair23
Contributor Author

alistair23 commented Sep 10, 2020

I just pushed a patch adding the comments discussed in the OT call.

Can we re-run bors?

Sorry for the confusion @hudson-ayers, you were too quick :)

@bradjc
Contributor

bradjc commented Sep 10, 2020

bors r+

@bors bors bot merged commit add1e63 into tock:master Sep 10, 2020
@alistair23 alistair23 deleted the alistair/mpu-improvements branch September 10, 2020 16:54
@ia0
Contributor

ia0 commented Sep 21, 2020

Hi @alistair23,

Is there an issue I could follow that tracks the implementation of KernelMpuConfig for Cortex-M? In particular, I believe this PR breaks support for writable flash regions on boards that write directly to flash, such as the nRF52 family: the kernel can no longer write to flash now that the MPU is left enabled when transitioning from an app back to the kernel (i.e. using disable_app_mpu instead of clear_mpu).

Thanks!

@alistair23
Contributor Author

Hey @ia0. I am not doing anything related to KernelMpuConfig for Cortex-M. My focus is extending Tock to support RISC-V ePMP using KernelMpuConfig.

Can you point to documentation that describes the issue? Can you also point to a way to reproduce it?

You are saying that because we no longer do regs.ctrl.write(Control::ENABLE::CLEAR); after returning from an app that the kernel can't write to direct write flash? A partial revert of 128782d should fix this right?

@ia0
Contributor

ia0 commented Sep 21, 2020

> A partial revert of 128782d should fix this right?

Yes, this is essentially what we currently do as a workaround until we can set up a kernel MPU config.

> Can you also point to a way to reproduce it?

I'll try to create a "minimal" reproduction case when I get time.

> I am not doing anything related to KernelMpuConfig for Cortex-M.

I see. However, we might need some work in that direction to avoid the regression. Implementing KernelMpuConfig should probably not be complicated for someone familiar with the architecture. And updating Tock to configure the Kernel MPU according to the writable flash regions should also not be complicated (I might be able to do it if I look into it).

@ppannuto
Member

@tock/core-wg this sounds like it could be a regression that we may want to fix before a 1.6 release? @alistair23 or @ia0, would it be possible to get the partial revert fix into a PR relatively quickly?

@alistair23
Contributor Author

The disable_app_mpu() function is documented with "must be disabled for the kernel to effectively manage processes", so if it is not achieving that we should fix the function for Cortex-M. I don't think we need any changes to, or an implementation of, KernelMpuConfig for this.

We could either clear the enable bit when returning to the kernel (as we did previously) or we could change the MPU permissions to only affect userspace. The second option matches what we do for RISC-V, so I think that's better.

Something like this diff:

diff --git a/arch/cortex-m3/src/mpu.rs b/arch/cortex-m3/src/mpu.rs
index 004d52194..729a65977 100644
--- a/arch/cortex-m3/src/mpu.rs
+++ b/arch/cortex-m3/src/mpu.rs
@@ -275,14 +275,14 @@ impl CortexMRegion {
                 RegionAttributes::XN::Disable,
             ),
             mpu::Permissions::ReadExecuteOnly => {
-                (RegionAttributes::AP::ReadOnly, RegionAttributes::XN::Enable)
+                (RegionAttributes::AP::UnprivilegedReadOnly, RegionAttributes::XN::Enable)
             }
             mpu::Permissions::ReadOnly => (
-                RegionAttributes::AP::ReadOnly,
+                RegionAttributes::AP::UnprivilegedReadOnly,
                 RegionAttributes::XN::Disable,
             ),
             mpu::Permissions::ExecuteOnly => {
-                (RegionAttributes::AP::NoAccess, RegionAttributes::XN::Enable)
+                (RegionAttributes::AP::PrivilegedOnly, RegionAttributes::XN::Enable)
             }
         };
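
To make the effect of the AP change concrete, here is a small sketch that decodes the 3-bit AP field as defined for the ARMv7-M MPU. The `Access` enum and the function names are hypothetical helpers, not Tock code, and the mapping of `UnprivilegedReadOnly` to 0b010 and `PrivilegedOnly` to 0b001 is an assumption based on the architecture's encoding (the dump further down confirms `ReadOnly` is 0x6).

```rust
/// Access level granted to one privilege level.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Access {
    None,
    ReadOnly,
    ReadWrite,
}

/// Decode an ARMv7-M MPU AP field into (privileged, unprivileged)
/// access. Returns `None` for the reserved encoding 0b100.
pub fn decode_ap(ap: u8) -> Option<(Access, Access)> {
    match ap & 0b111 {
        0b000 => Some((Access::None, Access::None)),
        0b001 => Some((Access::ReadWrite, Access::None)),
        0b010 => Some((Access::ReadWrite, Access::ReadOnly)),
        0b011 => Some((Access::ReadWrite, Access::ReadWrite)),
        0b101 => Some((Access::ReadOnly, Access::None)),
        0b110 | 0b111 => Some((Access::ReadOnly, Access::ReadOnly)),
        _ => None,
    }
}

/// True when privileged (kernel) code may write to the region.
pub fn kernel_can_write(ap: u8) -> bool {
    matches!(decode_ap(ap), Some((Access::ReadWrite, _)))
}

fn main() {
    // 0b110: read-only for everyone, so a kernel write faults
    // while the MPU stays enabled.
    println!("AP=0b110 kernel write allowed: {}", kernel_can_write(0b110));
    // 0b010: apps stay read-only, the kernel regains write access.
    println!("AP=0b010 kernel write allowed: {}", kernel_can_write(0b010));
}
```

With the old encoding the region restricts the kernel as well as the app, which only mattered once the MPU was left enabled on return to the kernel.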

@ia0
Contributor

ia0 commented Sep 22, 2020

I confirm that the diff in #1873 (comment) fixes the issue.

It would take some work to make a minimal or even small reproduction case. But here are the reproduction steps for the actual issue:

  • Install OpenSK on an nRF52840:
    git clone git@github.com:jmichelp/OpenSK.git
    cd OpenSK
    rm patches/tock/05-mpu-fix.patch
    ./setup.sh
    # The following line assumes a dev-kit board is plugged and configured
    ./deploy.py --board=nrf52840dk --opensk --panic-console --clear-storage --programmer=jlink
  • Look at RTT debugging:
    JLinkExe -device nrf52840_xxAA -if swd -speed 1000 -AutoConnect 1
    # In another shell
    JLinkRTTClientExe
  • Press the reset button of the dev-kit and look at the kernel hardfault due to data access violation:
    panicked at 'Kernel HardFault.
    	Kernel version release-1.5-1043-gc5b7a4f2c
    	r0  0xffff8000
    	r1  0xffffffff
    	r2  0xffff8000
    	r3  0x4
    	r12 0x2003fb90
    	lr  0xc99d
    	pc  0xc9b4
    	prs 0x81001800 [ N 1 Z 0 C 0 V 0 Q 0 GE 0000 ; ICI.IT 6 T true ; Exc 0-Thread Mode ]
    	sp  0x20000a88
    	top of stack     0x20001000
    	bottom of stack  0x20000000
    	SHCSR 0x0
    	CFSR  0x82
    	HSFR  0x40000000
    	Instruction Access Violation:       false
    	Data Access Violation:              true
    	Memory Management Unstacking Fault: false
    	Memory Management Stacking Fault:   false
    	Memory Management Lazy FP Fault:    false
    	Instruction Bus Error:              false
    	Precise Data Bus Error:             false
    	Imprecise Data Bus Error:           false
    	Bus Unstacking Fault:               false
    	Bus Stacking Fault:                 false
    	Bus Lazy FP Fault:                  false
    	Undefined Instruction Usage Fault:  false
    	Invalid State Usage Fault:          false
    	Invalid PC Load Usage Fault:        false
    	No Coprocessor Usage Fault:         false
    	Unaligned Access Usage Fault:       false
    	Divide By Zero:                     false
    	Bus Fault on Vector Table Read:     false
    	Forced Hard Fault:                  true
    	Faulting Memory Address: (valid: true) 0x000C0000
    	Bus Fault Address:       (valid: false) 0x000C0000
    ', arch/cortex-m/src/lib.rs:300:5
    	Kernel version release-1.5-1043-gc5b7a4f2c
    
    ---| No debug queue found. You can set it with the DebugQueue component.
    
    ---| Fault Status |---
    No faults detected.
    
    ---| App Status |---
    App: ctap2   -   [Running]
     Events Queued: 0   Syscall Count: 18   Dropped Callback Count: 0
     Restart Count: 0
     Last Syscall: Some(COMMAND { driver_number: 327683, subdriver_number: 2, arg0: 786432, arg1: 4 })
    
     ╔═══════════╤══════════════════════════════════════════╗
     ║  Address  │ Region Name    Used | Allocated (bytes)  ║
     ╚0x20040000═╪══════════════════════════════════════════╝
                 │ ▼ Grant        1156 |   1156
      0x2003FB7C ┼───────────────────────────────────────────
                 │ Unused
      0x20039FA4 ┼───────────────────────────────────────────
                 │ ▲ Heap        90000 | 113512               S
      0x20024014 ┼─────────────────────────────────────────── R
                 │ Data             20 |     20               A
      0x20024000 ┼─────────────────────────────────────────── M
                 │ ▼ Stack        1008 |  16384
      0x20023C10 ┼───────────────────────────────────────────
                 │ Unused
      0x20020000 ┴───────────────────────────────────────────
                 .....
      0x00080000 ┬─────────────────────────────────────────── F
                 │ App Flash    262080                        L
      0x00040040 ┼─────────────────────────────────────────── A
                 │ Protected        64                        S
      0x00040000 ┴─────────────────────────────────────────── H
    
      R0 : 0x00050003    R6 : 0x00000001
      R1 : 0x00000002    R7 : 0x20023C48
      R2 : 0x000C0000    R8 : 0x20024014
      R3 : 0x00000004    R10: 0x000C0000
      R4 : 0x00000000    R11: 0x00000000
      R5 : 0x000C0000    R12: 0x00000004
      R9 : 0xFFFFFFF8 (Static Base Register)
      SP : 0x20023C10 (Process Stack Pointer)
      LR : 0x00000003
      PC : 0x000513FA
     YPC : 0x000513FA
    
     APSR: N 0 Z 0 C 0 V 0 Q 0
           GE 0 0 0 0
     EPSR: ICI.IT 0x00
           ThumbBit true
    
     Cortex-M MPU
      Region 0: [0x20020000:0x20040000], length: 131072 bytes; ReadWrite (0x3)
        Sub-region 0: [0x20020000:0x20024000], Enabled
        Sub-region 1: [0x20024000:0x20028000], Enabled
        Sub-region 2: [0x20028000:0x2002C000], Enabled
        Sub-region 3: [0x2002C000:0x20030000], Enabled
        Sub-region 4: [0x20030000:0x20034000], Enabled
        Sub-region 5: [0x20034000:0x20038000], Enabled
        Sub-region 6: [0x20038000:0x2003C000], Enabled
        Sub-region 7: [0x2003C000:0x20040000], Disabled
      Region 1: [0x00040000:0x00080000], length: 262144 bytes; ReadOnly (0x6)
        Sub-region 0: [0x00040000:0x00048000], Enabled
        Sub-region 1: [0x00048000:0x00050000], Enabled
        Sub-region 2: [0x00050000:0x00058000], Enabled
        Sub-region 3: [0x00058000:0x00060000], Enabled
        Sub-region 4: [0x00060000:0x00068000], Enabled
        Sub-region 5: [0x00068000:0x00070000], Enabled
        Sub-region 6: [0x00070000:0x00078000], Enabled
        Sub-region 7: [0x00078000:0x00080000], Enabled
      Region 2: [0x000C0000:0x00100000], length: 262144 bytes; ReadOnly (0x6)
        Sub-region 0: [0x000C0000:0x000C8000], Enabled
        Sub-region 1: [0x000C8000:0x000D0000], Enabled
        Sub-region 2: [0x000D0000:0x000D8000], Enabled
        Sub-region 3: [0x000D8000:0x000E0000], Enabled
        Sub-region 4: [0x000E0000:0x000E8000], Enabled
        Sub-region 5: [0x000E8000:0x000F0000], Enabled
        Sub-region 6: [0x000F0000:0x000F8000], Enabled
        Sub-region 7: [0x000F8000:0x00100000], Enabled
      Region 3: Unused
      Region 4: Unused
      Region 5: Unused
      Region 6: Unused
      Region 7: Unused
    

Address 0xC0000 is where OpenSK has its persistent storage on nRF52840 boards. After reset, it tries to initialize the storage by writing some metadata.
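
The dump can be cross-checked by hand. The sketch below decodes the two MemManage bits of CFSR that are set in 0x82 (bit positions are from the ARMv7-M architecture) and looks up which MPU region from the dump contains the faulting address. The constant and function names are hypothetical, and the disabled sub-region 7 of region 0 is not modeled.

```rust
/// MemManage status bits in the low byte of CFSR (ARMv7-M).
pub const DACCVIOL: u32 = 1 << 1; // data access violation
pub const MMARVALID: u32 = 1 << 7; // MMFAR holds a valid address

pub fn is_data_access_violation(cfsr: u32) -> bool {
    (cfsr & DACCVIOL) != 0
}

pub fn fault_address_valid(cfsr: u32) -> bool {
    (cfsr & MMARVALID) != 0
}

/// (start, end) of regions 0 to 2 as printed in the dump; end is exclusive.
pub const REGIONS: [(u32, u32); 3] = [
    (0x2002_0000, 0x2004_0000), // region 0: app RAM, ReadWrite
    (0x0004_0000, 0x0008_0000), // region 1: app flash, ReadOnly
    (0x000C_0000, 0x0010_0000), // region 2: storage, ReadOnly
];

/// Index of the first region containing `addr`, if any.
pub fn region_containing(addr: u32) -> Option<usize> {
    REGIONS.iter().position(|&(start, end)| addr >= start && addr < end)
}

fn main() {
    let cfsr = 0x82;
    let fault_addr = 0x000C_0000;
    println!("data access violation: {}", is_data_access_violation(cfsr));
    println!("fault address valid:   {}", fault_address_valid(cfsr));
    println!("containing region:     {:?}", region_containing(fault_addr));
}
```

The faulting address falls in region 2, which the dump lists as ReadOnly (0x6), consistent with a kernel write being rejected while the app MPU configuration is still active.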

@alistair23
Contributor Author

PR created: #2120

alistair23 added a commit to alistair23/tock that referenced this pull request Sep 22, 2020
Commit 128782d "mpu: Change the disable_mpu API" converted the
disable_app_mpu() function to not make any changes. This means when we
return from an app to the kernel we don't disable the MPU.

This resulted in some access failures when the kernel tried to access
direct write flash regions (tock#1873 (comment)).

We can either disable the MPU when returning to the kernel or change the
app MPU configuration permissions to not impact the kernel.

This patch converts the Cortex-M3 MPU implementation to always allow the
kernel access when configuring the app. This way we can avoid the
overhead of disabling the MPU on context switches. There is no security
gap here as the kernel could just disable the MPU anyway if it was
malicious.

Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
bors bot added a commit that referenced this pull request Sep 23, 2020
2120: arch/cortex-m3: Allow the kernel to access protected memory r=bradjc a=alistair23

### Pull Request Overview

Commit 128782d "mpu: Change the disable_mpu API" converted the
disable_app_mpu() function to not make any changes. This means when we
return from an app to the kernel we don't disable the MPU.

This resulted in some access failures when the kernel tried to access
direct write flash regions (#1873 (comment)).

We can either disable the MPU when returning to the kernel or change the
app MPU configuration permissions to not impact the kernel.

This patch converts the Cortex-M3 MPU implementation to always allow the
kernel access when configuring the app. This way we can avoid the
overhead of disabling the MPU on context switches. There is no security
gap here as the kernel could just disable the MPU anyway if it was
malicious.

Signed-off-by: Alistair Francis <alistair.francis@wdc.com>

### Testing Strategy

None.

### TODO or Help Wanted

### Documentation Updated

- [X] Updated the relevant files in `/docs`, or no updates are required.

### Formatting

- [X] Ran `make prepush`.


Co-authored-by: Alistair Francis <alistair.francis@wdc.com>
Labels
last-call Final review period for a pull request.
6 participants