Skip to content

Commit

Permalink
Merge #2117
Browse files Browse the repository at this point in the history
2117: Fix kernel work counting when processes are stopped or faulted r=bradjc a=hudson-ayers

### Pull Request Overview

This pull request fixes a bug (identified by @bradjc ) where when a process in the Running state faults, the scheduler would hang, as the kernel work counter is not decremented appropriately. This bug existed prior to the new scheduling interface, but before manifested as the board never going to sleep, rather than hanging outright.

It also fixes a similar set of bugs where a process being Stopped or terminated when in the `Running` state would cause the scheduler to hang, for basically the same reason.

This PR introduces a new wrapper around `Cell<State>` called `ProcessStateCell` that takes care of incrementing and decrementing the kernel work counter based on the transitions between process states that occur. Hopefully, this should prevent similar bugs from cropping up in the future.


### Testing Strategy

This pull request was tested by:

- running the `crash_dummy` app and verifying that other apps + the process console work after the app crashes

- faulting the `whileone` app while it is executing and verifying that doing so does not freeze a simultaneously running `blink` app

- stopping the `whileone` app while it is executing and verifying that doing so does not freeze a simultaneously running `blink` app, and then starting the app again and verifying that works correctly

all three of these cases fail on current master. I also ran 4 apps simultaneously just as a tiny stress test and that worked fine.

### TODO or Help Wanted

Thoughts on naming? Should update be kept as set, so that the wrapper appears identical to a Cell?

### Documentation Updated

- [x] No updates are required.

### Formatting

- [x] Ran `make prepush`.


Co-authored-by: Hudson Ayers <hayers@stanford.edu>
  • Loading branch information
bors[bot] and hudson-ayers committed Sep 25, 2020
2 parents 3d08c2d + 8df7273 commit 4746bb6
Showing 1 changed file with 44 additions and 19 deletions.
63 changes: 44 additions & 19 deletions kernel/src/process.rs
Original file line number Diff line number Diff line change
Expand Up @@ -632,6 +632,37 @@ pub enum State {
Unstarted,
}

/// A wrapper around `Cell<State>` is used by `Process` to prevent bugs arising from
/// the state duplication in the kernel work tracking and process state tracking.
struct ProcessStateCell<'a> {
state: Cell<State>,
kernel: &'a Kernel,
}

impl<'a> ProcessStateCell<'a> {
fn new(kernel: &'a Kernel) -> Self {
Self {
state: Cell::new(State::Unstarted),
kernel,
}
}

fn get(&self) -> State {
self.state.get()
}

fn update(&self, new_state: State) {
let old_state = self.state.get();

if old_state == State::Running && new_state != State::Running {
self.kernel.decrement_work();
} else if new_state == State::Running && old_state != State::Running {
self.kernel.increment_work()
}
self.state.set(new_state);
}
}

/// The reaction the kernel should take when an app encounters a fault.
///
/// When an exception occurs during an app's execution (a common example is an
Expand Down Expand Up @@ -830,7 +861,7 @@ pub struct Process<'a, C: 'static + Chip> {
/// `Running` and `Yielded` states. The system can control the process by
/// switching it to a "stopped" state to prevent the scheduler from
/// scheduling it.
state: Cell<State>,
state: ProcessStateCell<'static>,

/// How to deal with Faults occurring in the process
fault_response: FaultResponse,
Expand Down Expand Up @@ -927,29 +958,28 @@ impl<C: Chip> ProcessType for Process<'_, C> {

fn set_yielded_state(&self) {
if self.state.get() == State::Running {
self.state.set(State::Yielded);
self.kernel.decrement_work();
self.state.update(State::Yielded);
}
}

fn stop(&self) {
match self.state.get() {
State::Running => self.state.set(State::StoppedRunning),
State::Yielded => self.state.set(State::StoppedYielded),
State::Running => self.state.update(State::StoppedRunning),
State::Yielded => self.state.update(State::StoppedYielded),
_ => {} // Do nothing
}
}

fn resume(&self) {
match self.state.get() {
State::StoppedRunning => self.state.set(State::Running),
State::StoppedYielded => self.state.set(State::Yielded),
State::StoppedRunning => self.state.update(State::Running),
State::StoppedYielded => self.state.update(State::Yielded),
_ => {} // Do nothing
}
}

fn set_fault_state(&self) {
self.state.set(State::Fault);
self.state.update(State::Fault);

match self.fault_response {
FaultResponse::Panic => {
Expand Down Expand Up @@ -1266,14 +1296,9 @@ impl<C: Chip> ProcessType for Process<'_, C> {
// set and should mark that this process is ready to be
// scheduled.

// We just setup up a new callback to do, which means this
// process wants to execute, so we set that there is work to
// be done.
self.kernel.increment_work();

// Move this process to the "running" state so the scheduler
// will schedule it.
self.state.set(State::Running);
self.state.update(State::Running);

// Update helpful debugging metadata.
self.current_stack_pointer.set(stack_bottom as *mut u8);
Expand Down Expand Up @@ -1887,7 +1912,8 @@ impl<C: 'static + Chip> Process<'_, C> {
process.flash = app_flash;

process.stored_state = MapCell::new(Default::default());
process.state = Cell::new(State::Unstarted);
// Mark this process as unstarted
process.state = ProcessStateCell::new(process.kernel);
process.fault_response = fault_response;
process.restart_count = Cell::new(0);

Expand Down Expand Up @@ -1956,7 +1982,6 @@ impl<C: 'static + Chip> Process<'_, C> {
}
};

// Mark this process as having something to do (it has to start!).
kernel.increment_work();

// Return the process object and a remaining memory for processes slice.
Expand Down Expand Up @@ -1988,7 +2013,7 @@ impl<C: 'static + Chip> Process<'_, C> {
self.terminate();

// Set the state the process will be in if it cannot be restarted.
self.state.set(failure_state);
self.state.update(failure_state);

// Check if the restart policy for this app allows us to continue with
// the restart.
Expand Down Expand Up @@ -2080,7 +2105,7 @@ impl<C: 'static + Chip> Process<'_, C> {
let flash_app_start = app_flash_address as usize + flash_protected_size;

// Mark the state as `Unstarted` for the scheduler.
self.state.set(State::Unstarted);
self.state.update(State::Unstarted);

// Mark that we restarted this process.
self.restart_count.increment();
Expand Down Expand Up @@ -2126,7 +2151,7 @@ impl<C: 'static + Chip> Process<'_, C> {
}

// Mark the app as stopped so the scheduler won't try to run it.
self.state.set(State::StoppedFaulted);
self.state.update(State::StoppedFaulted);
}

/// Get the current stack pointer as a pointer.
Expand Down

0 comments on commit 4746bb6

Please sign in to comment.