Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change scheduler to only store runnable tasks on run queue #1042

Conversation

tsoutsman
Copy link
Member

@tsoutsman tsoutsman commented Sep 16, 2023

Note that this inherently adds task migration as when a task is unblocked it can be added to any core.

Depends on #1035, #1044, #1045.

Signed-off-by: Klimenty Tsoutsman <klim@tsoutsman.com>
Signed-off-by: Klimenty Tsoutsman <klim@tsoutsman.com>
Signed-off-by: Klimenty Tsoutsman <klim@tsoutsman.com>
Signed-off-by: Klimenty Tsoutsman <klim@tsoutsman.com>
Signed-off-by: Klimenty Tsoutsman <klim@tsoutsman.com>
Signed-off-by: Klimenty Tsoutsman <klim@tsoutsman.com>
Signed-off-by: Klimenty Tsoutsman <klim@tsoutsman.com>
There are no guarantees about the value of the interrupt flag when
context switching. If the context switch is voluntary, i.e. a thread
called `schedule`, interrupts will most likely be enabled, whereas
if a thread is preempted, interrupts will be disabled. But this means
that if a preempted thread A switches to a thread B that voluntarily
yielded, thread B will return from the call to `schedule` with
interrupts disabled.

The AArch64 code also needs to be modified but I'll leave that to
@NathanRoyer.

Signed-off-by: Klimenty Tsoutsman <klim@tsoutsman.com>
Signed-off-by: Klimenty Tsoutsman <klim@tsoutsman.com>
Signed-off-by: Klimenty Tsoutsman <klim@tsoutsman.com>
Signed-off-by: Klimenty Tsoutsman <klim@tsoutsman.com>
Signed-off-by: Klimenty Tsoutsman <klim@tsoutsman.com>
Signed-off-by: Klimenty Tsoutsman <klim@tsoutsman.com>
Signed-off-by: Klimenty Tsoutsman <klim@tsoutsman.com>
Signed-off-by: Klimenty Tsoutsman <klim@tsoutsman.com>
Signed-off-by: Klimenty Tsoutsman <klim@tsoutsman.com>
Signed-off-by: Klimenty Tsoutsman <klim@tsoutsman.com>
Signed-off-by: Klimenty Tsoutsman <klim@tsoutsman.com>
Signed-off-by: Klimenty Tsoutsman <klim@tsoutsman.com>
Signed-off-by: Klimenty Tsoutsman <klim@tsoutsman.com>
Signed-off-by: Klimenty Tsoutsman <klim@tsoutsman.com>
Signed-off-by: Klimenty Tsoutsman <klim@tsoutsman.com>
Signed-off-by: Klimenty Tsoutsman <klim@tsoutsman.com>
The new test is significantly more robust than the old one. As of right
now, the test isn't particularly useful because we don't have task
migration, but theseus-os#1042 adds implicit task migration when unblocking a
task. Hence, the test has a focus on blocking and unblocking tasks.

Signed-off-by: Klimenty Tsoutsman <klim@tsoutsman.com>
@tsoutsman tsoutsman mentioned this pull request Sep 16, 2023
Signed-off-by: Klimenty Tsoutsman <klim@tsoutsman.com>
…n-queue

Signed-off-by: Klimenty Tsoutsman <klim@tsoutsman.com>
Signed-off-by: Klimenty Tsoutsman <klim@tsoutsman.com>
Signed-off-by: Klimenty Tsoutsman <klim@tsoutsman.com>
@tsoutsman tsoutsman marked this pull request as draft September 16, 2023 13:34
Signed-off-by: Klimenty Tsoutsman <klim@tsoutsman.com>
@tsoutsman tsoutsman marked this pull request as ready for review September 16, 2023 22:20
Signed-off-by: Klimenty Tsoutsman <klim@tsoutsman.com>
When spawning a pinned task, `spawn` didn't previously set
`inner.pinned_cpu`. This created problems in theseus-os#1042 because the scheduler
didn't know that tasks were pinned and freely migrated them across
cores.

Signed-off-by: Klimenty Tsoutsman <klim@tsoutsman.com>
…ueue

Signed-off-by: Klimenty Tsoutsman <klim@tsoutsman.com>
@tsoutsman tsoutsman mentioned this pull request Sep 17, 2023
kevinaboos pushed a commit that referenced this pull request Sep 19, 2023
* When spawning a pinned task, `spawn` didn't previously set
  `inner.pinned_cpu` for the newly-created `Task`.

* This is not currently a problem because the scheduler doesn't perform
  task migration across CPUs, but when that gets enabled (in #1042),
  it would cause the pinning choice to be ignore by the scheduler.

Signed-off-by: Klimenty Tsoutsman <klim@tsoutsman.com>
github-actions bot pushed a commit that referenced this pull request Sep 19, 2023
* When spawning a pinned task, `spawn` didn't previously set
  `inner.pinned_cpu` for the newly-created `Task`.

* This is not currently a problem because the scheduler doesn't perform
  task migration across CPUs, but when that gets enabled (in #1042),
  it would cause the pinning choice to be ignore by the scheduler.

Signed-off-by: Klimenty Tsoutsman <klim@tsoutsman.com> cae8ca8
kevinaboos pushed a commit that referenced this pull request Oct 5, 2023
* The new `test_scheduler` is significantly more robust than
  the old one. Currently, the test isn't particularly useful because
  we don't have task migration enabled, but #1042 will add
  implicit task migration when unblocking a task.
  * Hence, the test currently focuses on blocking/unblocking tasks.

* Add a function to iterate over all initialized CPUs.

Signed-off-by: Klimenty Tsoutsman <klim@tsoutsman.com>
@kevinaboos
Copy link
Member

all PR dependencies have been merged in. 👍

github-actions bot pushed a commit that referenced this pull request Oct 5, 2023
* The new `test_scheduler` is significantly more robust than
  the old one. Currently, the test isn't particularly useful because
  we don't have task migration enabled, but #1042 will add
  implicit task migration when unblocking a task.
  * Hence, the test currently focuses on blocking/unblocking tasks.

* Add a function to iterate over all initialized CPUs.

Signed-off-by: Klimenty Tsoutsman <klim@tsoutsman.com> e9416d6
github-actions bot pushed a commit to tsoutsman/Theseus that referenced this pull request Oct 7, 2023
* The new `test_scheduler` is significantly more robust than
  the old one. Currently, the test isn't particularly useful because
  we don't have task migration enabled, but theseus-os#1042 will add
  implicit task migration when unblocking a task.
  * Hence, the test currently focuses on blocking/unblocking tasks.

* Add a function to iterate over all initialized CPUs.

Signed-off-by: Klimenty Tsoutsman <klim@tsoutsman.com> e9416d6
Signed-off-by: Klimenty Tsoutsman <klim@tsoutsman.com>
@kevinaboos kevinaboos self-requested a review October 8, 2023 05:41
Copy link
Member

@kevinaboos kevinaboos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

left a few questions. I'm a bit concerned the overhead associated with some changes in this PR is actually counterproductive for overall performance, but most of it is solvable.

Comment on lines +394 to +395
let ExposedTask { task: mut new_task } = exposed;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

first line seems good, second line seems wrong (extra whitespace for no reason?)

} else {
task::scheduler::add_task(task_ref.clone());
}
if !self.idle & !self.blocked {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think you meant &&

pub fn block(&self) -> Result<RunState, RunState> {
use RunState::{Blocked, Runnable};

let run_state = self.0.task.runstate();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unless I'm missing something, I believe this (and the same logic in unblock()) simply obtains a copy of the task's current RunState and then modifies that copy; it doesn't actually modify the RunState of the task itself.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think a good way to avoid this is to split up the functions:

  • in task_struct::Task::block(), only the runstate is modified. Basically, this function as it was.
    • the functions in task_struct::Task may need to be renamed.
  • in task::TaskRef::block(), it calls task_struct::Task::block() and then modifies the runqueue accordingly.
  • unblock() is similar.

Ok(Runnable)
} else if run_state.compare_exchange(Blocked, Blocked).is_ok() {
// warn!("Blocked an already blocked task: {:?}", self);
Ok(Blocked)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like blocked tasks are not proactively removed from the runqueue they're on, but rather done lazily by each scheduler policy.

This is fine, as it's much faster, but it should be clearly documented somewhere (i.e., in the docs for the Scheduler trait's next() function) because it's a non-symmetric pair of operations. Otherwise, a scheduler policy implementor wouldn't know that it's okay to simply remove a task from the runqueue when it comes across one that is blocked.

Comment on lines +114 to 125
let locked = SCHEDULERS.lock();

let mut min_busyness = usize::MAX;
let mut least_busy_index = None;

for (i, (_, scheduler)) in locked.iter().enumerate() {
let busyness = scheduler.lock().busyness();
if busyness < min_busyness {
least_busy_index = Some(i);
min_busyness = busyness;
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this function is incredibly slow. In general, I don't think we should ever really access SCHEDULERS except when initializing or changing scheduler policies, though we can think about that more later (and how to improve this, e.g., via a separate estimated utilization metric or something that avoids the need to iterate over all schedulers).

It's likely that using this function every time a task is unblocked is going to absolutely destroy performance... right? Also I imagine it will cause lots and lots of task migrations, which is generally something you want to keep to a minimum unless a given CPU core is at max utilization for a "long" time (which we currently don't track a metric for).

Comment on lines +95 to +96
while let Some(task) = self.out_of_tokens.pop() {
self.have_tokens.push_back(task);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Is there a way to do this in one batch rather than item-by-item? I guess not when we're using VecDeque for one list and Vec for the other...?

Comment on lines +63 to +70
// This check prevents an interleaving where `TaskRef::unblock` wouldn't add
// the task back onto the run queue. `TaskRef::unblock` sets the run state and
// then checks `is_on_run_queue` so we have to do the inverse.
if unlikely(task.task.is_runnable()) {
if let Some(task) = self.add_epoch_task(task) {
return Some(task);
}
}
Copy link
Member

@kevinaboos kevinaboos Oct 8, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, IIUC, the crux of the issue surrounding this is_on_run_queue() atomic boolean is that we need to do two things in one "atomic" step:

  1. change the task's runstate to Runnable
  2. add the task back to a runqueue

Since we cannot do that, you're using a separate atomic bool to indicate whether a task is on a runqueue. I understand this, but I don't love this design. It's complex without bringing any other benefits, and kind of a messy design choice, as it makes both current and future scheduler policies harder to write.

Ideally this shouldn't be a necessary step. Can we avoid this complexity via an alternative design that's a bit cleaner and doesn't burden the Scheduler policy implementor with additional concerns?

Side note: if we actually wanted to store some piece of state inside a task about whether that task is on a runqueue, we might as well store a reference to (or the ID of) the actual runqueue that it's currently on. This would at least make it faster to add the task back once it gets unblocked, which addresses my other comment as a plus.

Signed-off-by: Klimenty Tsoutsman <klim@tsoutsman.com>
@tsoutsman tsoutsman marked this pull request as draft October 10, 2023 10:19
@tsoutsman tsoutsman closed this Oct 30, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants