Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement `UserspaceKernelBoundary` version 2 #1318

Merged
merged 5 commits into from Jul 22, 2019

Conversation

@bradjc
Copy link
Contributor

commented Jun 4, 2019

Pull Request Overview

This pull request is the answer to the question we had a while ago of "Are we confident in the UserlandKernelBoundary trait if we only support one architecture?" with the answer of "no".

After working on RISC-V context switching, it is clear some of the interface does not map to RISC-V very well. To make UserlandKernelBoundary more generic, this PR makes two general changes to the trait.

  1. Rather than having a separate "getter" for retreiving which syscall was actually called after learning that the process stopped running because it called a syscall, we now return the syscall information directly with the context switch reason. Arguably, this is how it should have been done before, and this change actually has nothing to do with making UKB more generic.

  2. The Cortex-M context switch uses a special stack frame created by the SVC call. This approach is not used in RISC-V. Therefore, the terminology around "popping" and "pushing" stack frames doesn't make sense in all cases. Also, having pop_stack_frame() at all was a bit of blur between the kernel loop and the context switch interface details as pop was only called with a yield().

    What is generic is the idea that we want to be able to call a function in the process that the process should execute when it starts (or resumes) running. That function takes over the old "push" function. "pop" has been removed. I believe that this new interface will be implementable in RISC-V, as well as Cortex-M, and actually improves the boundary between the kernel loop and the context switching code.

Other changes:

  • This PR also then updates process.rs and sched.rs to use this new interface. This required some minor changes to process.rs as these functions are essentially passed through process.rs from the kernel sched loop to the UKB code.
  • If UKB returns that the app called a syscall, then it must provide a valid syscall. If it cannot create a valid syscall (perhaps because the app is buggy) then it must return that the process faulted. Before, I believe, we just ignored invalid syscalls and allowed the app to proceed after effectively not doing anything. I think this is an improvement, but perhaps the old behavior was intentional.
  • This adds a new process state type called Unstarted. This is needed (well it is one way to address the issue I ran into) to allow process.rs to know if calling a function call for a process will be the first time that process has been executed. The cortex-m syscall code needs to know this because in that case it needs to create a new SVC stack frame rather than re-use an old one.

This PR also includes a patch to the cortex-m arch to support the new version of UKB.

The major change is that we no longer remove the SVC stack frame after a yield call. In fact, the API for popping stack frames has been removed. Instead, when we want the process to run a new function, we re-use the old SVC stack frame. In the case that the app has never run before we do have to create the stack frame just like we used to.

In theory, this is actually more efficient since we save the effort of removing the stack frame, but that wasn't very substantial so who knows.

The other change is we now set state.psr and state.yield_pc after every syscall, and not just effectively after yield(). I don't think this has any effect.

Testing Strategy

This pull request was tested by running a few apps on hail. Given what happened last time I started messing around with this code, more is needed....

TODO or Help Wanted

We can probably wait to merge this until after context switching is fully working for RISC-V. I think these changes will be sufficient to support RISC-V, but there may be more that is needed. I did want to get this code in front of people, however, since it touches some low-level kernel features.

Documentation Updated

  • Updated the relevant files in /docs, or no updates are required.

Formatting

  • Ran make formatall.

@bradjc bradjc changed the title Implement `UserlandKernelBoundary` version 2 Implement `UserspaceKernelBoundary` version 2 Jun 4, 2019

@bradjc

This comment has been minimized.

Copy link
Contributor Author

commented Jun 11, 2019

This is the relevant new interface:

tock/kernel/src/syscall.rs

Lines 52 to 146 in 3e1f738

/// Why the process stopped executing and execution returned to the kernel.
#[derive(PartialEq)]
pub enum ContextSwitchReason {
/// Process called a syscall. Also returns the syscall and relevant values.
SyscallFired { syscall: Syscall },
/// Process triggered the hardfault handler.
Fault,
/// Process exceeded its timeslice.
TimesliceExpired,
/// Process interrupted (e.g. by a hardware event)
Interrupted,
}
/// This trait must be implemented by the architecture of the chip Tock is
/// running on. It allows the kernel to manage switching to and from processes
/// in an architecture-agnostic manner.
pub trait UserspaceKernelBoundary {
/// Some architecture-specific struct containing per-process state that must
/// be kept while the process is not running. For example, for keeping CPU
/// registers that aren't stored on the stack.
type StoredState: Default + Copy;
/// Set the return value the process should see when it begins executing
/// again after the syscall. This will only be called after a process has
/// called a syscall.
///
/// To help implementations, both the current stack pointer of the process
/// and the saved state for the process are provided. The `return_value` is
/// the value that should be passed to the process so that when it resumes
/// executing it knows the return value of the syscall it called.
unsafe fn set_syscall_return_value(
&self,
stack_pointer: *const usize,
state: &mut Self::StoredState,
return_value: isize,
);
/// Set the function that the process should execute when it is resumed.
/// This has two major uses: 1) sets up the initial function call to
/// `_start` when the process is started for the very first time; 2) tells
/// the process to execute a callback function after calling `yield()`.
///
/// ### Arguments
///
/// - `stack_pointer` is the address of the stack pointer for the current
/// app.
/// - `remaining_stack_memory` is the number of bytes below the
/// `stack_pointer` that is allocated for the process. This value is
/// checked by the implementer to ensure that there is room for this stack
/// frame without overflowing the stack.
/// - `state` is the stored state for this process.
/// - `callback` is the function that should be executed when the process
/// resumes.
/// - `first_function` is true if this is the first time this process is
/// being run. This allows a `UserspaceKernelBoundary` implementation to
/// assume there are no stack frames on the process's stack.
///
/// ### Return
///
/// Returns `Ok` or `Err` with the current address of the stack pointer for
/// the process. One reason for returning `Err` is that adding the function
/// call requires adding to the stack, and there is insufficient room on the
/// stack to add the function call.
unsafe fn set_process_function(
&self,
stack_pointer: *const usize,
remaining_stack_memory: usize,
state: &mut Self::StoredState,
callback: process::FunctionCall,
first_function: bool,
) -> Result<*mut usize, *mut usize>;
/// Context switch to a specific process.
///
/// This returns a tuple:
/// - The new stack pointer address of the process.
/// - Why the process stopped executing and switched back to the kernel.
unsafe fn switch_to_process(
&self,
stack_pointer: *const usize,
state: &mut Self::StoredState,
) -> (*mut usize, ContextSwitchReason);
/// Display any general information about the fault.
unsafe fn fault_fmt(&self, writer: &mut Write);
/// Display architecture specific (e.g. CPU registers or status flags) data
/// for a process identified by its stack pointer.
unsafe fn process_detail_fmt(
&self,
stack_pointer: *const usize,
state: &Self::StoredState,
writer: &mut Write,
);
}

bradjc added some commits Jun 3, 2019

kernel: UserlandKernelBoundary v2
The initial version of the `UserlandKernelBoundary` trait abstracted
details about context switching on cortex-m platforms out of process.rs,
however, the trait was very much designed around the structure of how
cortex-m handles switching between privilege modes. After working on
RISC-V context switching, it is clear some of the interface does not map
to RISC-V very well.

To make `UserlandKernelBoundary` more generic, this commit makes two
general changes to the trait.

1. Rather than having a separate "getter" for retreiving which syscall
was actually called after learning that the process stopped running
because it called a syscall, we now return the syscall information
directly with the context switch reason. Arguably, this is how it should
have been done before, and this change actually has nothing to do with
making UKB more generic.

2. The Cortex-M context switch uses a special stack frame created by the
SVC call. This approach is not used in RISC-V. Therefore, the
terminology around "popping" and "pushing" stack frames doesn't make
sense in all cases. Also, having `pop_stack_frame()` at all was a bit of
blur between the kernel loop and the context switch interface details as
pop was only called with a yield().

What is generic is the idea that we want to be able to call a function in
the process that the process should execute when it starts (or resumes)
running. That function takes over the old "push" function. "pop" has
been removed. I believe that this new interface will be implementable in
RISC-V, as well as Cortex-M, and actually improves the boundary between
the kernel loop and the context switching code.

This commit also then updates process.rs and sched.rs to use this new
interface. This required some minor changes to process.rs as these
functions are essentially passed through process.rs from the kernel
sched loop to the UKB code.
cortex-m: update syscall to UKB v2
For portability reasons the syscall interface has been updated, and this
commit implements it for cortex-m.

The major change is that we no longer remove the SVC stack frame after a
yield call. In fact, the API for popping stack frames has been removed.
Instead, when we want the process to run a new function, we re-use the
old SVC stack frame. In the case that the app has never run before we do
have to create the stack frame just like we used to.

In theory, this is actually more efficient since we save the effort of
removing the stack frame, but that wasn't very substancial so who knows.

The other change is we now set state.psr and state.yield_pc after every
syscall, and not just effectively after yield(). I don't think this has
any effect.

@bradjc bradjc force-pushed the syscall-v2 branch from 3e1f738 to 1f5902d Jun 11, 2019

@bradjc bradjc force-pushed the syscall-v2 branch from 1f5902d to 42c48a5 Jun 11, 2019

@bradjc bradjc referenced this pull request Jun 11, 2019

Merged

RISC-V: Add context switching #1323

2 of 2 tasks complete
@bradjc

This comment has been minimized.

Copy link
Contributor Author

commented Jun 28, 2019

Tested on hail, multiple apps (hail, console, hello_loop, crash_dummy) seem to work fine.

@ppannuto

This comment has been minimized.

Copy link
Member

commented Jul 3, 2019

I think it's a bit weird to include first_function in the UKB interface, when really it's a Cortex-M specific thing to need to know that. What do you think of moving that information in to the Cortex-M state instead: https://github.com/tock/tock/compare/syscall-v2...remove_first_function?expand=1 ?

I did leave the Unstarted state, as I could see that being useful for debugging, and is conceptually different that Yielded, but wouldn't be opposed to removing it either

(tested with the basics: blink/c_hello/hello_loop; worked fine)


Edit: 👍 to everything else

@bradjc

This comment has been minimized.

Copy link
Contributor Author

commented Jul 9, 2019

I think it's a bit weird to include first_function in the UKB interface, when really it's a Cortex-M specific thing to need to know that.

Well, we use it in the RISC-V implementation as well as there is a architecture-independent difference between the starting an app for the first time and resuming an app because only in the latter case does it have any state on the stack or in registers. However, I think only the kernel knows this, and as such needs to pass it to the UKB implementation. What happens in your implementation if an app is restarted?

@ppannuto

This comment has been minimized.

Copy link
Member

commented Jul 11, 2019

Hmm, so, my thinking here is definitely influenced by having just read the HotOS fork paper, but I'm wondering if the idea of (literally) restarting a process isn't a good one. It means that we would have to implement code that is able to correctly undo everything a process has ever done to its kernel state, which feels complicated and hard to get right.

I think it architecturally makes more sense to implement "Restart" behavior by completely tearing down the old process and then creating a new one, rather than trying to re-use anything.

Now, all of this is lives on the enormous caveat that currently we have don't have a way of destroying processes, but they are dynamically allocated at least, so at least half of the hard work is already done. Towards the mission of restart-able processes, I think we should focus effort on implementing tear-down, rather than thinking about code to "restart" an existing process.


To the question then of first_function, I think what we're really saying here is that conceptually there are chip/architecture-specific things that need to be part of process creation. What do you think of this interface, which says that more explicitly: https://github.com/tock/tock/compare/syscall-v2...syscall-new-process-interface?expand=1

Conceptually I think this is a bit nicer as it moves chip-specific process creation to part of the process creation logic rather than having it co-mingled with regular execution.


A side concern with (both) interfaces as-presented is the assumption that the only initialization stuff that will be necessary is stack related. Maybe that's sufficient. I played with a variation that passed a reference to the Process object itself, but in that case initialize_new_process can't do anything interesting because all of the Process fields are private. Probably an issue for v3 whenever something proves it needs access to more.

@bradjc

This comment has been minimized.

Copy link
Contributor Author

commented Jul 11, 2019

To the question then of first_function, I think what we're really saying here is that conceptually there are chip/architecture-specific things that need to be part of process creation. What do you think of this interface, which says that more explicitly: https://github.com/tock/tock/compare/syscall-v2...syscall-new-process-interface?expand=1

I like this interface, and I agree it's better to be explicit about the functionality.

I think it architecturally makes more sense to implement "Restart" behavior by completely tearing down the old process and then creating a new one, rather than trying to re-use anything.

That would make that feature easier to maintain as Process changes. It does, however, incur a cost on each restart because everything has to be re-scanned and re-calculated. Additionally, I think it will make keeping per-process kernel state (e.g. how many times the process has been restarted) clumsy. I wonder if there isn't some hybrid approach, where all of the per-execution state can be encapsulated and then cleanly erased when the process crashes/ends, but all of the per-process state is preserved.

A side concern with (both) interfaces as-presented is the assumption that the only initialization stuff that will be necessary is stack related. ... Probably an issue for v3 whenever something proves it needs access to more.

Well that function gets StoredState as well, so it has access to the registers too. But in general I agree we can revise that interface if some other architecture shows it to be necessary in the future.

ppannuto added some commits Jul 11, 2019

syscall: documentation around process fn injection
Add an explicit note about the hazard of funciton injection, namely that
it will override any kernel return value (as the injected function will
run first, and its return value will overwrite any kernel return value
that has been set).

In practice, this means that process function injection is only safe to
call in response to syscalls without return values (i.e. only `yield`
in the current syscall interface).

@ppannuto ppannuto force-pushed the syscall-v2 branch from 483faea to c0b27f2 Jul 11, 2019

@ppannuto
Copy link
Member

left a comment

This is good to go for me at this point, but should have a few more eyes on it before hitting the button

@alevy alevy self-assigned this Jul 12, 2019

@bradjc

This comment has been minimized.

Copy link
Contributor Author

commented Jul 20, 2019

We didn't get to this in the call I don't think, but maybe it's time to move ahead with this? We have the release testing to help find if there are any issues, and the API changes are pretty internal and unlikely to affect code outside of this repo.

@alevy

This comment has been minimized.

Copy link
Member

commented Jul 20, 2019

I agree @bradjc. If you want to wait for me to look through the code tomorrow. I'm also happy to put a timer on it, where if I don't comment in the next day or so, it counts as approving.

@bradjc

This comment has been minimized.

Copy link
Contributor Author

commented Jul 22, 2019

bors r+

bors bot added a commit that referenced this pull request Jul 22, 2019

Merge #1318
1318: Implement `UserspaceKernelBoundary` version 2 r=bradjc a=bradjc

### Pull Request Overview

This pull request is the answer to the question we had a while ago of "Are we confident in the `UserlandKernelBoundary` trait if we only support one architecture?" with the answer of "no".

After working on RISC-V context switching, it is clear some of the interface does not map to RISC-V very well. To make `UserlandKernelBoundary` more generic, this PR makes two general changes to the trait.

1. Rather than having a separate "getter" for retreiving which syscall was actually called after learning that the process stopped running because it called a syscall, we now return the syscall information directly with the context switch reason. Arguably, this is how it should have been done before, and this change actually has nothing to do with making UKB more generic.

2. The Cortex-M context switch uses a special stack frame created by the SVC call. This approach is not used in RISC-V. Therefore, the terminology around "popping" and "pushing" stack frames doesn't make sense in all cases. Also, having `pop_stack_frame()` at all was a bit of blur between the kernel loop and the context switch interface details as pop was only called with a yield().

    What is generic is the idea that we want to be able to call a function in the process that the process should execute when it starts (or resumes) running. That function takes over the old "push" function. "pop" has been removed. I believe that this new interface will be implementable in RISC-V, as well as Cortex-M, and actually improves the boundary between the kernel loop and the context switching code.

Other changes:

- This PR also then updates process.rs and sched.rs to use this new interface. This required some minor changes to process.rs as these functions are essentially passed through process.rs from the kernel sched loop to the UKB code.
- If UKB returns that the app called a syscall, then it must provide a valid syscall. If it cannot create a valid syscall (perhaps because the app is buggy) then it must return that the process faulted. Before, I believe, we just ignored invalid syscalls and allowed the app to proceed after effectively not doing anything. I think this is an improvement, but perhaps the old behavior was intentional.
- This adds a new process state type called `Unstarted`. This is needed (well it is one way to address the issue I ran into) to allow process.rs to know if calling a function call for a process will be the _first_ time that process has been executed. The cortex-m syscall code needs to know this because in that case it needs to create a new SVC stack frame rather than re-use an old one.


This PR also includes a patch to the cortex-m arch to support the new version of UKB.

The major change is that we no longer remove the SVC stack frame after a yield call. In fact, the API for popping stack frames has been removed. Instead, when we want the process to run a new function, we re-use the old SVC stack frame. In the case that the app has never run before we do have to create the stack frame just like we used to.

In theory, this is actually more efficient since we save the effort of removing the stack frame, but that wasn't very substantial so who knows.

The other change is we now set `state.psr` and `state.yield_pc` after every syscall, and not just effectively after yield(). I don't think this has any effect.

### Testing Strategy

This pull request was tested by running a few apps on hail. Given what happened last time I started messing around with this code, more is needed....


### TODO or Help Wanted

We can probably wait to merge this until after context switching is fully working for RISC-V. I _think_ these changes will be sufficient to support RISC-V, but there may be more that is needed. I did want to get this code in front of people, however, since it touches some low-level kernel features.


### Documentation Updated

- [ ] Updated the relevant files in `/docs`, or no updates are required.

### Formatting

- [x] Ran `make formatall`.


Co-authored-by: Brad Campbell <bradjc5@gmail.com>
Co-authored-by: Pat Pannuto <pat.pannuto@gmail.com>
@bors

This comment has been minimized.

Copy link
Contributor

commented Jul 22, 2019

@bors bors bot merged commit c0b27f2 into master Jul 22, 2019

4 checks passed

bors Build succeeded
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
continuous-integration/travis-ci/push The Travis CI build passed
Details
deploy/netlify Deploy preview ready!
Details

@bors bors bot deleted the syscall-v2 branch Jul 22, 2019

bors bot added a commit that referenced this pull request Aug 2, 2019

Merge #1323
1323: RISC-V: Add context switching r=bradjc a=bradjc

### Pull Request Overview

This pull request implements the `UserlandKernelBoundary` trait for RISC-V platforms.

This PR is towards tracking issue #1135.

This depends on #1318 so those commits have been included here. That PR should be merged first, and then I can update this one.

While I pulled things together for the PR, @sv2bb did much of this.

### Testing Strategy

This pull request was tested by running blink and c_hello on the the arty-e21 FPGA based board.


### TODO or Help Wanted

See inline comments.


### Documentation Updated

- [x] Updated the relevant files in `/docs`, or no updates are required.

### Formatting

- [x] Ran `make formatall`.


Co-authored-by: Brad Campbell <bradjc5@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
3 participants
You can’t perform that action at this time.