Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix kernel work counting when processes are stopped or faulted #2117

Merged
merged 1 commit into from Sep 25, 2020

Conversation

hudson-ayers
Copy link
Contributor

Pull Request Overview

This pull request fixes a bug (identified by @bradjc ) where when a process in the Running state faults, the scheduler would hang, as the kernel work counter is not decremented appropriately. This bug existed prior to the new scheduling interface, but before manifested as the board never going to sleep, rather than hanging outright.

It also fixes a similar set of bugs where a process being Stopped or terminated when in the Running state would cause the scheduler to hang, for basically the same reason.

This PR introduces a new wrapper around Cell<State> called ProcessStateCell that takes care of incrementing and decrementing the kernel work counter based on the transitions between process states that occur. Hopefully, this should prevent similar bugs from cropping up in the future.

Testing Strategy

This pull request was tested by:

  • running the crash_dummy app and verifying that other apps + the process console work after the app crashes

  • faulting the whileone app while it is executing and verifying that doing so does not freeze a simultaneously running blink app

  • stopping the whileone app while it is executing and verifying that doing so does not freeze a simultaneously running blink app, and then starting the app again and verifying that works correctly

all three of these cases fail on current master. I also ran 4 apps simultaneously just as a tiny stress test and that worked fine.

TODO or Help Wanted

Thoughts on naming? Should update be kept as set, so that the wrapper appears identical to a Cell?

Documentation Updated

  • No updates are required.

Formatting

  • Ran make prepush.

@hudson-ayers hudson-ayers added bug release-blocker Issue or PR that must be resolved before the next release labels Sep 18, 2020
Copy link
Contributor

@bradjc bradjc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. Writing an OS is hard. I should keep track of this PR for when I warn people about the dangers of duplicating state.

Copy link
Member

@ppannuto ppannuto left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is strictly better that what was there before. At this point, we're at a point where the only place that {incr/decr}ement_work() is valid is inside this process state cell* -- it would be nice to find a way, perhaps with visibility somehow, to more strictly enforce that.

*The other place is increment_work_external, but that's an exceptional path anyway, so building a new/different exception if needed makes sense to me

@hudson-ayers
Copy link
Contributor Author

At this point, we're at a point where the only place that {incr/decr}ement_work() is valid is inside this process state cell* -- it would be nice to find a way, perhaps with visibility somehow, to more strictly enforce that.

*The other place is increment_work_external, but that's an exceptional path anyway, so building a new/different exception if needed makes sense to me

Unfortunately {incr/decr}ement_work() still must be called manually within process.rs whenever tasks are added/removed from the tasks queue. This could probably be resolved by forcing all modifications to the tasks queue to use the enqueue_task() / dequeue_task() functions, which is not currently the case.

@bradjc bradjc added the last-call Final review period for a pull request. label Sep 24, 2020
@bradjc
Copy link
Contributor

bradjc commented Sep 25, 2020

bors r+

@bors bors bot merged commit 4746bb6 into master Sep 25, 2020
@bors bors bot deleted the fix-process-state-tracking branch September 25, 2020 17:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug last-call Final review period for a pull request. release-blocker Issue or PR that must be resolved before the next release
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants