
kernel: configure MPU only if switching to a different process #1822

Merged
merged 8 commits into tock:master on Jun 4, 2020

Conversation


@alphan alphan commented May 4, 2020

Pull Request Overview

Currently, the scheduler configures the MPU every time it switches
to a process. This change aims to improve syscall latency by caching the
last running process and configuring the MPU only when switching to a
different process.

See also: #1730.
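
In pseudocode, the hot path looks roughly like this (a minimal sketch with hypothetical, simplified names; the real change lives in kernel/src/sched.rs and uses Tock's process types):

```rust
// Hypothetical, simplified sketch of the caching idea; not the actual
// Tock scheduler code.
struct Scheduler {
    // The process the MPU was last configured for, if any.
    last_process: Option<usize>,
}

impl Scheduler {
    fn switch_to(&mut self, process_id: usize) {
        // Reprogram the MPU only when the incoming process differs from
        // the one the hardware is already configured for.
        if self.last_process != Some(process_id) {
            configure_mpu(process_id);
            self.last_process = Some(process_id);
        }
        // ... context switch into the process ...
    }
}

fn configure_mpu(_process_id: usize) {
    // Stand-in: writes the per-process region configuration into the MPU.
}
```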

Testing Strategy

I ran a simple libtock-rs application for measuring syscall latency with
and without this change on opentitan on verilator. This change reduces the
latency of a command syscall by ~~2751~~ 1711 cycles (the exact number of
cycles saved obviously depends on the actual implementation, and would
probably be less after #1821).

Also ran make allcheck.

TODO or Help Wanted

I couldn't find any doc files or tests that should be updated. Since
this is my first real Tock PR, it would be great if you could point me
in the right direction if I missed anything.

Documentation Updated

  • Updated the relevant files in /docs, or no updates are required.

Formatting

  • Ran make formatall.

@alphan alphan marked this pull request as draft May 4, 2020 19:37
@alphan alphan changed the title from “[DRAFT] kernel: configure MPU only if switching to a different process” to “kernel: configure MPU only if switching to a different process” May 4, 2020
@alphan alphan marked this pull request as ready for review May 4, 2020 19:38

@ppannuto ppannuto left a comment


Welcome! And thanks for the PR!

I don't think there are any docs or tests that need updating for this. Did your testing include running multiple apps as well?


Generally, I think this is a good change. However, I'm concerned there may be some extra considerations where we may need to identify the MPU configuration as "tainted". Specifically:

  • if there are additional grant allocations, then the deny region may need to grow (though this is likely not an immediate issue, as the allowed region is generally minimal)
  • if there was a change in the permitted region, i.e. the application called brk or an IPC allow*, then these need to be recalculated
  • Maybe others I'm not thinking of?

Basically, I think this is a good common-case optimization, but I think we also need a way for something to "poison" the MPU configuration to force it to be recalculated.

*though that second case may actually be fine since, in the current architecture, it's always a different process that is allowing a region, so we'll go through a regular process switch


bradjc commented May 5, 2020

We would need to update the kernel/src/platform/mpu.rs documentation, particularly to clarify that the MPU implementation must be set up to only enforce in the userspace privilege level.
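
For instance, the trait documentation might gain a note along these lines (suggested wording only, not the merged text):

```rust
// Suggested addition to the `MPU` trait docs in kernel/src/platform/mpu.rs
// (illustrative wording, not the merged text):

/// Implementations must configure the hardware so that protections are
/// enforced only at the userspace privilege level: with this optimization,
/// the MPU may still hold the previous process's configuration while the
/// kernel itself is executing.
pub trait MPU {
    // ... existing trait methods ...
}
```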


alphan commented May 5, 2020

> Did your testing include running multiple apps as well?

Yes, I also tried this with multiple apps and didn't notice any issues.

> Generally, I think this is a good change. However, I'm concerned there may be some extra considerations where we may need to identify the MPU configuration as "tainted". Specifically:
>
>   • if there are additional grant allocations, then the deny region may need to grow (though this is likely not an immediate issue, as the allowed region is generally minimal)
>   • if there was a change in the permitted region, i.e. the application called brk or an IPC allow*, then these need to be recalculated
>   • Maybe others I'm not thinking of?
>
> Basically, I think this is a good common-case optimization, but I think we also need a way for something to "poison" the MPU configuration to force it to be recalculated.
>
> *though that second case may actually be fine since, in the current architecture, it's always a different process that is allowing a region, so we'll go through a regular process switch

Thank you for catching this! I actually looked at various places that can change the MPU config, but missed at least one case where the MPU config can be modified and not applied immediately, e.g. sbrk and brk eventually call configure_mpu() but alloc doesn't. Since cache invalidation is tricky, does it make sense to modify MPU and MPUConfig to keep track of the taintedness of MPUConfig at a higher level?


ppannuto commented May 5, 2020

> Since cache invalidation is tricky, does it make sense to modify MPU and MPUConfig to keep track of the taintedness of MPUConfig at a higher level?

Yeah, I think that's a great idea. I believe the MPU configuration structure is already well-equipped for this, as all of its state is private and can only be modified by getters and setters; hooking the latter to mark the MPU configuration as dirty, and then perhaps updating configure_mpu to know what process it's configured for (pushing the hot-path logic into the MPU itself), seems like a good design to me.

In that case, I actually imagine we could remove the configure_mpu calls that are sprinkled around, instead architecting things such that whenever a process is returned to, the MPU layer is consulted first, updating itself if necessary.
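
A rough sketch of that design, with illustrative stand-in names rather than Tock's actual types:

```rust
use core::cell::Cell;

// Hypothetical sketch of the dirty-tracking design; all names here are
// illustrative stand-ins, not Tock's actual types.
pub struct MpuConfig {
    is_dirty: Cell<bool>,
    // ... per-process region state ...
}

impl MpuConfig {
    // Every mutator (region allocation for brk/sbrk/allow, grant growth,
    // etc.) marks the configuration dirty.
    pub fn add_region(&self /* , region: Region */) {
        // ... update region state ...
        self.is_dirty.set(true);
    }
}

pub struct Mpu {
    // Which process the hardware is currently configured for, if any.
    configured_for: Cell<Option<usize>>,
}

impl Mpu {
    pub fn configure_mpu(&self, config: &MpuConfig, process_id: usize) {
        // Hot path: skip the register writes when the hardware already
        // holds a clean configuration for this process.
        if !config.is_dirty.get() && self.configured_for.get() == Some(process_id) {
            return;
        }
        // ... write MPU registers from `config` ...
        config.is_dirty.set(false);
        self.configured_for.set(Some(process_id));
    }
}
```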

bors bot added a commit that referenced this pull request May 8, 2020
1826: kernel: use MapCell for Process::stored_state r=bradjc a=alphan

### Pull Request Overview

Currently, `Process::stored_state` is defined as a `Cell`. This change
aims to improve syscall latency by using `MapCell` instead.

See also: #1730, #178.

### Testing Strategy

I ran the same `libtock-rs` app used to test #1822 with and without this
change on opentitan on verilator. This change saves ~1200 cycles for
a `command` syscall.

### TODO or Help Wanted

The main goal of `MapCell` is to avoid panics at runtime, but I'm not sure
if it is OK to silently continue in the cases where `Process::stored_state` can
_possibly_ be empty (which should never happen). The current
implementation exclusively uses `Cell::get`, but a `take` could always
leave a `default` `StoredState` behind, so this change doesn't seem to be
worse in that regard. I tried to preserve the current behavior, but please let
me know if I missed anything.

### Documentation Updated

- [X] Updated the relevant files in `/docs`, or no updates are required.

### Formatting

- [x] Ran `make formatall`.


Co-authored-by: Alphan Ulusoy <alphan@google.com>
@alphan alphan mentioned this pull request May 19, 2020
Currently, the scheduler configures the MPU every time it switches
to a process. This change aims to improve syscall latency by caching the
last running process and configuring the MPU only when switching to a
different process.

See also: tock#1730.

alphan commented May 19, 2020

  • Reverted changes in kernel/src/sched.rs.
  • Added an app_id parameter to configure_mpu.
  • Implemented the idea captured in this PR as a RISC-V-only optimization in arch/rv32i/src/pmp.rs (see the sketch below).
  • Rebased on master.
  • Also made disable_mpu a no-op for a more accurate latency measurement (moved its contents to PMPConfig::clear_mpu_registers, called from configure_mpu to stay close to the current behavior). It looks like we don't need this function at all, but I am deferring that change to #1860 (Possible RISC-V PMP improvements), which I created to keep track of the things I noticed while working on this.
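
Roughly, the caching now lives in the PMP driver itself. A minimal sketch with simplified stand-in types (the real code works with AppId and the PMP CSRs):

```rust
use core::cell::Cell;

// Simplified sketch of the PMP-side caching; the real code in
// arch/rv32i/src/pmp.rs works with AppId and the pmpcfg/pmpaddr CSRs
// rather than these stand-ins.
pub struct PMP {
    // The process the PMP registers were last programmed for.
    last_configured_for: Cell<Option<usize>>,
}

impl PMP {
    pub fn configure_mpu(&self, app_id: usize /* , config: &PMPConfig */) {
        if self.last_configured_for.get() == Some(app_id) {
            // The PMP already holds this process's regions; skip the
            // expensive CSR writes on the syscall hot path.
            return;
        }
        // ... clear and rewrite the PMP CSRs from `config` ...
        self.last_configured_for.set(Some(app_id));
    }
}
```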

ppannuto previously approved these changes May 19, 2020

@ppannuto ppannuto left a comment


This looks good to me!

I added the implementation for ARM as well. It's a bit lighter weight there, as ARM already largely did this by caching the region calculation result. However, this is still a case where we can trade 8 memory writes for 1 memory compare on the hot path, so it is likely worth it.


Re #1654: This adds a feature from the upcoming Option API. It's trivial to re-implement ourselves if this manages to become the last blocking feature, so I think there's little harm in using the official API for now.

alistair23 previously approved these changes May 19, 2020

alphan commented May 20, 2020

@ppannuto, thank you for reviewing this change and for your comments. I have a question regarding the changes in arch/cortex-m3/src/mpu.rs. Shouldn't hardware_is_configured_for be defined in MPU (to track the state of the hardware)? Also, don't we need an is_dirty field in CortexMConfig?

@ppannuto

Hmm, you're right; we need some form of structure that is global to the MPU hardware. The MPU configuration is stored per process (here), so doing dirty tracking in the MpuConfig type won't be sufficient for resetting the MPU when we change processes.

I don't think the CortexMConfig needs an explicit dirty flag. Instead, the cell indicates what application the hardware "is configured for"; whenever the MPU configuration is changed, that cell is cleared to indicate that the MPU is not configured correctly for any application, which acts similarly to a separate dirty signal.


alphan commented May 20, 2020

The MPU struct that I was referring to is the one defined here. IIUC, this should track the state of the hardware, i.e. hardware_is_configured_for, while CortexMConfig tracks whether its configuration is dirty.
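
In other words, the state is split across the two structures; a short sketch with placeholder types (Tock's version would use OptionalCell and AppId):

```rust
use core::cell::Cell;

// Sketch of the split described above (placeholder types only).
pub struct MPU {
    // Global hardware state: which process the MPU currently enforces.
    hardware_is_configured_for: Cell<Option<usize>>,
}

pub struct CortexMConfig {
    // Per-process staleness: set whenever this process's regions change,
    // so any cached hardware configuration for it must be rewritten.
    is_dirty: Cell<bool>,
    // ... region registers ...
}
```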

@alphan alphan requested a review from bradjc May 20, 2020 22:19
Not really sure what I was thinking the first time, but this now correctly
tracks the hardware configuration in the hardware structure and the per-process
MPU configuration in the process config structure, and will update the hardware
if either doesn't match.

This also adds some documentation to these structures to save me from myself
in the future.

alphan commented May 26, 2020

@ppannuto, 121b33f LGTM, thanks! I also pushed a commit to fix the way last_configured_for is used in configure_mpu.

ppannuto previously approved these changes May 26, 2020

@ppannuto ppannuto left a comment


This all looks good to me; looking forward to the rest of the #1860 series.


alphan commented May 26, 2020

Reverted the changes in disable_mpu to address @alistair23's comments. We should see the expected latency improvements once do_process stops clearing PMP registers every time it returns to the kernel (#1873).

alistair23 previously approved these changes May 26, 2020
bradjc previously approved these changes May 28, 2020
@alphan alphan dismissed stale reviews from bradjc and alistair23 via 6ba5561 June 1, 2020 18:51

alphan commented Jun 1, 2020

Thank you for reviewing this @bradjc and @silvestrst!

Added comments to the code to address @silvestrst's review comments.

@bradjc bradjc added the last-call Final review period for a pull request. label Jun 2, 2020

ppannuto commented Jun 4, 2020

bors r+


bradjc commented Jun 4, 2020

big queue i guess?

@bradjc bradjc merged commit a172448 into tock:master Jun 4, 2020