Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

kernel: Add trait for process restart policies #1565

Merged
merged 3 commits into from
Feb 25, 2020
Merged

Conversation

bradjc
Copy link
Contributor

@bradjc bradjc commented Jan 27, 2020

If a process's fault response is set to "Restart", then currently that process is unconditionally restarted if it crashes. This PR adds a trait so that boards can specify the policy the kernel should use when deciding whether to restart an app or not. There is an Always option to match what we have now, and I also implemented a very simple threshold approach where an app will only be restarted a certain number of times.

This is part of #1082.

Testing Strategy

I tested this earlier, but need to test with the recent PR merges.

TODO or Help Wanted

Thoughts?

Documentation Updated

  • Updated the relevant files in /docs, or no updates are required.

Formatting

  • Ran make formatall.

@bradjc bradjc added the kernel label Jan 27, 2020
Copy link
Member

@alevy alevy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have some minor inline naming nits.

Overall, I wonder if this is better to do on a per-process/app basis, rather than an entire kernel. Wouldn't it make more sense to have different policies for different apps than to have a fixed policy for an entire board?


/// Implementation of `ProcessRestartPolicy` that unconditionally restarts the
/// app.
pub struct PrpAlways {}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about just naming it "AlwaysRestart"?

/// Implementation of `ProcessRestartPolicy` that uses a threshold to decide
/// whether to restart an app. If the app has been restarted more times than the
/// threshold then the app will no longer be restarted.
pub struct PrpThreshold {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about "ThresholdRestart"?

/// Decide whether to restart the `process` or not.
///
/// Returns `true` if the process should be restarted, `false` otherwise.
fn evaluate(&self, process: &dyn ProcessType) -> bool;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This type signature/name is kind of clunky. If it should return a bool (indicating whether the process should be restarted, maybe it the method should be called should_restart or something.

Alternatively, maybe call it fault_behavior have it return a specialized enum:

enum FaultBehavior {
  Restart,
  Terminate
}

///
/// The provided restart policy is used to determine whether to reset the
/// app, and can be specified on a per-app basis.
Restart(&'static dyn ProcessRestartPolicy),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just noting that this is going to add two words of memory to the kernel, regardless of whether this variant is used.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It will add two words to the process, which we charge to the process's memory region.

kernel/src/process.rs Show resolved Hide resolved
@bradjc
Copy link
Contributor Author

bradjc commented Jan 27, 2020

This is in fact already per-process because yes, that likely makes more sense. The limiting factor is the shared load_processes() function uses the same FaultResponse policy for every app it finds, but the FaultResponse is stored per-process.

@alevy
Copy link
Member

alevy commented Jan 27, 2020

@bradjc, OK missed that, my bad.

So now it's an additional word in memory for each process (not a huge deal), and my only comments are about the name/type-signature

@bradjc bradjc force-pushed the process-restart-policy branch 2 times, most recently from 25b1279 to 0a887c5 Compare January 29, 2020 21:51
@bradjc
Copy link
Contributor Author

bradjc commented Jan 29, 2020

Added a new commit which updates the naming.

alevy
alevy previously approved these changes Jan 29, 2020
ppannuto
ppannuto previously approved these changes Feb 4, 2020
@bradjc
Copy link
Contributor Author

bradjc commented Feb 15, 2020

Other thoughts?

hudson-ayers
hudson-ayers previously approved these changes Feb 21, 2020
Copy link
Contributor

@hudson-ayers hudson-ayers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks mostly good to me. Included a few nits that you can address if you like or ignore, I don't think any of them are really worth blocking on if you want to get this merged.

kernel/src/process.rs Outdated Show resolved Hide resolved
kernel/src/process.rs Outdated Show resolved Hide resolved
kernel/src/process.rs Show resolved Hide resolved
This adds a trait so that boards can specify the policy the kernel
should use when deciding whether to restart an app or not.
Since restart_count is used in decision making, and not just debugging,
it doesn't make sense to leaave it as a debug variable. This promotes it
to an actual field in the process struct.

Also, remove re-setting the process state if syscall init fails.
@bradjc
Copy link
Contributor Author

bradjc commented Feb 21, 2020

I moved the restart count out of debugging since it is now used for more than just debugging. Also rebased.

Copy link
Member

@ppannuto ppannuto left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bors r+

@bors
Copy link
Contributor

bors bot commented Feb 25, 2020

Build succeeded

@bors bors bot merged commit c6ad079 into master Feb 25, 2020
@bors bors bot deleted the process-restart-policy branch February 25, 2020 02:19
bors bot added a commit that referenced this pull request Mar 3, 2020
1609: ADC Capsule: More work on supporting restarts. r=alevy a=bradjc

### Pull Request Overview

This pull request continues to update the ADC capsule to allow applications to restart while using the ADC driver and for the system to continue working afterwards.

Highlights of changes:

- Handle `InactiveApp` error when checking if a grant region is valid.
- Check on `sample_ready()` and `samples_ready()` if the app is still valid, and properly handle it if not.
- Check on every command syscall if the app that has exclusive control of the adc driver still exists.
- Call stop sampling if a callback from the adc peripheral driver occurs and the app is no longer valid.

Note, if/when the ADC capsule is virtualized then this will no longer be an issue.


### Testing Strategy

I ran the adc and adc_continuous apps numerous times and used the process console to cause them to fault in various stages of their operation. I can now no longer get the ADC capsule into a state where apps only error.


### TODO or Help Wanted

I tested this with a combination of my other PRs #1565 #1566 #1588. Until those are merged this PR/commit alone will not work.


### Documentation Updated

- [x] Updated the relevant files in `/docs`, or no updates are required.

### Formatting

- [x] Ran `make formatall`.


Co-authored-by: Brad Campbell <bradjc5@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants