
kernel: configure MPU only if switching to a different process #1822

Merged
merged 8 commits into tock:master on Jun 4, 2020

Conversation


@alphan alphan commented May 4, 2020

Pull Request Overview

Currently, the scheduler configures the MPU every time it switches
to a process. This change aims to improve syscall latency by caching the
last running process and configuring the MPU only when switching to a
different process.

See also: #1730.
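
In pseudocode, the hot path looks roughly like this (a minimal sketch with hypothetical, simplified names; the real change lives in kernel/src/sched.rs and uses Tock's process types):

```rust
// Hypothetical, simplified sketch of the caching idea; not the actual
// Tock scheduler code.
struct Scheduler {
    // The process the MPU was last configured for, if any.
    last_process: Option<usize>,
}

impl Scheduler {
    fn switch_to(&mut self, process_id: usize) {
        // Reprogram the MPU only when the incoming process differs from
        // the one the hardware is already configured for.
        if self.last_process != Some(process_id) {
            configure_mpu(process_id);
            self.last_process = Some(process_id);
        }
        // ... context switch into the process ...
    }
}

fn configure_mpu(_process_id: usize) {
    // Stand-in: writes the per-process region configuration into the MPU.
}
```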

Testing Strategy

I ran a simple libtock-rs application for measuring syscall latency with
and without this change on opentitan on verilator. This change reduces the
latency of a command syscall by ~~2751~~ 1711 cycles (the exact number of
cycles saved obviously depends on the actual implementation, and would
probably be less after #1821).

Also ran make allcheck.

TODO or Help Wanted

I couldn't find any doc files or tests that should be updated. Since
this is my first real Tock PR, it would be great if you could point me
in the right direction if I missed anything.

Documentation Updated

  • Updated the relevant files in /docs, or no updates are required.

Formatting

  • Ran make formatall.

@alphan alphan marked this pull request as draft May 4, 2020 19:37
@alphan alphan changed the title from “[DRAFT] kernel: configure MPU only if switching to a different process” to “kernel: configure MPU only if switching to a different process” May 4, 2020
@alphan alphan marked this pull request as ready for review May 4, 2020 19:38

@ppannuto ppannuto left a comment


Welcome! And thanks for the PR!

I don't think there are any docs or tests that need updating for this. Did your testing include running multiple apps as well?


Generally, I think this is a good change. However, I'm concerned there may be some extra considerations where we may need to identify the MPU configuration as "tainted". Specifically:

  • if there are additional grant allocations, then the deny region may need to grow (though this is likely not an immediate issue, as the allowed region is generally minimal)
  • if there was a change in the permitted region, i.e. the application called brk or an IPC allow*, then these need to be recalculated
  • Maybe others I'm not thinking of?

Basically, I think this is a good common-case optimization, but I think we also need a way for something to "poison" the MPU configuration to force it to be recalculated.

*though that second case may actually be fine since, in the current architecture, it's always a different process that is allowing a region, so we'll go through a regular process switch


bradjc commented May 5, 2020

We would need to update the kernel/src/platform/mpu.rs documentation, particularly to clarify that the MPU implementation must be set up to only enforce in the userspace privilege level.
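
For instance, the trait documentation might gain a note along these lines (suggested wording only, not the merged text):

```rust
// Suggested addition to the `MPU` trait docs in kernel/src/platform/mpu.rs
// (illustrative wording, not the merged text):

/// Implementations must configure the hardware so that protections are
/// enforced only at the userspace privilege level: with this optimization,
/// the MPU may still hold the previous process's configuration while the
/// kernel itself is executing.
pub trait MPU {
    // ... existing trait methods ...
}
```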


alphan commented May 5, 2020

> Did your testing include running multiple apps as well?

Yes, I also tried this with multiple apps and didn't notice any issues.

> Generally, I think this is a good change. However, I'm concerned there may be some extra considerations where we may need to identify the MPU configuration as "tainted". Specifically:
>
>   • if there are additional grant allocations, then the deny region may need to grow (though this is likely not an immediate issue, as the allowed region is generally minimal)
>   • if there was a change in the permitted region, i.e. the application called brk or an IPC allow*, then these need to be recalculated
>   • Maybe others I'm not thinking of?
>
> Basically, I think this is a good common-case optimization, but I think we also need a way for something to "poison" the MPU configuration to force it to be recalculated.
>
> *though that second case may actually be fine since, in the current architecture, it's always a different process that is allowing a region, so we'll go through a regular process switch

Thank you for catching this! I actually looked at various places that can change the MPU config, but missed at least one case where the MPU config can be modified and not applied immediately, e.g. sbrk and brk eventually call configure_mpu() but alloc doesn't. Since cache invalidation is tricky, does it make sense to modify MPU and MPUConfig to keep track of the taintedness of MPUConfig at a higher level?


ppannuto commented May 5, 2020

> Since cache invalidation is tricky, does it make sense to modify MPU and MPUConfig to keep track of the taintedness of MPUConfig at a higher level?

Yeah, I think that's a great idea. I believe the MPU configuration structure is already well-equipped for this, as all of its state is private and can only be modified by getters and setters; hooking the latter to mark the MPU configuration as dirty, and then perhaps updating configure_mpu to know what process it's configured for (pushing the hot-path logic into the MPU itself), seems like a good design to me.

In that case, I actually imagine we could remove the configure_mpu calls that are sprinkled around, instead architecting things such that whenever a process is returned to, the MPU layer is consulted first, updating itself if necessary.
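
A rough sketch of that design, with illustrative stand-in names rather than Tock's actual types:

```rust
use core::cell::Cell;

// Hypothetical sketch of the dirty-tracking design; all names here are
// illustrative stand-ins, not Tock's actual types.
pub struct MpuConfig {
    is_dirty: Cell<bool>,
    // ... per-process region state ...
}

impl MpuConfig {
    // Every mutator (region allocation for brk/sbrk/allow, grant growth,
    // etc.) marks the configuration dirty.
    pub fn add_region(&self /* , region: Region */) {
        // ... update region state ...
        self.is_dirty.set(true);
    }
}

pub struct Mpu {
    // Which process the hardware is currently configured for, if any.
    configured_for: Cell<Option<usize>>,
}

impl Mpu {
    pub fn configure_mpu(&self, config: &MpuConfig, process_id: usize) {
        // Hot path: skip the register writes when the hardware already
        // holds a clean configuration for this process.
        if !config.is_dirty.get() && self.configured_for.get() == Some(process_id) {
            return;
        }
        // ... write MPU registers from `config` ...
        config.is_dirty.set(false);
        self.configured_for.set(Some(process_id));
    }
}
```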

bors bot added a commit that referenced this pull request May 8, 2020
1826: kernel: use MapCell for Process::stored_state r=bradjc a=alphan

### Pull Request Overview

Currently, `Process::stored_state` is defined as a `Cell`. This change
aims to improve syscall latency by using `MapCell` instead.

See also: #1730, #178.

### Testing Strategy

I ran the same `libtock-rs` app used to test #1822 with and without this
change on opentitan on verilator. This change saves ~1200 cycles for
a `command` syscall.

### TODO or Help Wanted

The main goal of `MapCell` is to avoid panics at runtime, but I'm not sure
if it is OK to silently continue in the cases where `Process::stored_state` can
_possibly_ be empty (which should never happen). The current
implementation exclusively uses `Cell::get`, but a `take` could always
leave a `default` `StoredState` behind, so this change doesn't seem to be
worse in that regard. I tried to preserve the current behavior, but please let
me know if I missed anything.

### Documentation Updated

- [X] Updated the relevant files in `/docs`, or no updates are required.

### Formatting

- [x] Ran `make formatall`.


Co-authored-by: Alphan Ulusoy <alphan@google.com>
@alphan alphan mentioned this pull request May 19, 2020
Currently, the scheduler configures the MPU every time it switches
to a process. This change aims to improve syscall latency by caching the
last running process and configuring the MPU only when switching to a
different process.

See also: tock#1730.

alphan commented May 19, 2020

  • Reverted changes in kernel/src/sched.rs.
  • Added an app_id parameter to configure_mpu.
  • Implemented the idea captured in this PR as a RISC-V-only optimization in arch/rv32i/src/pmp.rs (see the sketch below).
  • Rebased on master.
  • Also made disable_mpu a no-op for a more accurate latency measurement (moved its contents to PMPConfig::clear_mpu_registers, called from configure_mpu to stay close to the current behavior). It looks like we don't need this function at all, but I am deferring that change to #1860 (Possible RISC-V PMP improvements), which I created to keep track of the things I noticed while working on this.
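
Roughly, the caching now lives in the PMP driver itself. A minimal sketch with simplified stand-in types (the real code works with AppId and the PMP CSRs):

```rust
use core::cell::Cell;

// Simplified sketch of the PMP-side caching; the real code in
// arch/rv32i/src/pmp.rs works with AppId and the pmpcfg/pmpaddr CSRs
// rather than these stand-ins.
pub struct PMP {
    // The process the PMP registers were last programmed for.
    last_configured_for: Cell<Option<usize>>,
}

impl PMP {
    pub fn configure_mpu(&self, app_id: usize /* , config: &PMPConfig */) {
        if self.last_configured_for.get() == Some(app_id) {
            // The PMP already holds this process's regions; skip the
            // expensive CSR writes on the syscall hot path.
            return;
        }
        // ... clear and rewrite the PMP CSRs from `config` ...
        self.last_configured_for.set(Some(app_id));
    }
}
```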

ppannuto previously approved these changes May 19, 2020

@ppannuto ppannuto left a comment


This looks good to me!

I added the implementation for ARM as well. It's a bit lighter weight there, as ARM already largely did this by caching the region calculation result. However, this is still a case where we can trade 8 memory writes for 1 memory compare on the hot path, so it is likely worth it.


Re #1654: This adds a feature from the upcoming Option API. It's trivial to re-implement ourselves if this manages to become the last blocking feature, so I think there's little harm in using the official API for now.

alistair23 previously approved these changes May 19, 2020

alphan commented May 20, 2020

@ppannuto, thank you for reviewing this change and for your comments. I have a question regarding the changes in arch/cortex-m3/src/mpu.rs. Shouldn't hardware_is_configured_for be defined in MPU (to track the state of the hardware)? Also, don't we need an is_dirty field in CortexMConfig?

@ppannuto

Hmm, you're right; we need some form of structure that is global to the MPU hardware. The MPU configuration is stored per process (here), so doing dirty tracking in the MpuConfig type won't be sufficient for resetting the MPU when we change processes.

I don't think the CortexMConfig needs an explicit dirty flag. Instead, the cell indicates what application the hardware "is configured for"; whenever the MPU configuration is changed, that cell is cleared to indicate that the MPU is not configured correctly for any application, which acts similarly to a separate dirty signal.


alphan commented May 20, 2020

The MPU struct that I was referring to is the one defined here. IIUC, this should track the state of the hardware, i.e. hardware_is_configured_for, while CortexMConfig tracks whether its configuration is dirty.
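
In other words, the state is split across the two structures; a short sketch with placeholder types (Tock's version would use OptionalCell and AppId):

```rust
use core::cell::Cell;

// Sketch of the split described above (placeholder types only).
pub struct MPU {
    // Global hardware state: which process the MPU currently enforces.
    hardware_is_configured_for: Cell<Option<usize>>,
}

pub struct CortexMConfig {
    // Per-process staleness: set whenever this process's regions change,
    // so any cached hardware configuration for it must be rewritten.
    is_dirty: Cell<bool>,
    // ... region registers ...
}
```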

@alphan alphan requested a review from bradjc May 20, 2020 22:19
Not really sure what I was thinking the first time, but this now correctly
tracks the hardware configuration in the hardware structure and the per-process
MPU configuration in the process config structure, and will update the hardware
if either doesn't match.

This also adds some documentation to these structures to save me from myself
in the future.

alphan commented May 26, 2020

@ppannuto, 121b33f LGTM, thanks! I also pushed a commit to fix the way last_configured_for is used in configure_mpu.

ppannuto previously approved these changes May 26, 2020

@ppannuto ppannuto left a comment


This all looks good to me; looking forward to the rest of the #1860 series.


alphan commented May 26, 2020

Reverted the changes in disable_mpu to address @alistair23's comments. We should see the expected latency improvements once do_process stops clearing PMP registers every time it returns to the kernel (#1873).

alistair23 previously approved these changes May 26, 2020
bradjc previously approved these changes May 28, 2020
@alphan alphan dismissed stale reviews from bradjc and alistair23 via 6ba5561 June 1, 2020 18:51

alphan commented Jun 1, 2020

Thank you for reviewing this @bradjc and @silvestrst!

Added comments to the code to address @silvestrst's review comments.

@bradjc bradjc added the last-call Final review period for a pull request. label Jun 2, 2020

ppannuto commented Jun 4, 2020

bors r+


bradjc commented Jun 4, 2020

big queue i guess?

@bradjc bradjc merged commit a172448 into tock:master Jun 4, 2020