Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

task: fix infinite loop when dropping a LocalSet #1892

Merged
merged 10 commits into from
Dec 4, 2019
Merged

Conversation

hawkw
Copy link
Member

@hawkw hawkw commented Dec 3, 2019

Motivation

There's currently an issue in task::LocalSet where dropping the local
set can result in an infinite loop if a task running in the local set is
notified from outside the local set (e.g. by a timer). This was reported
in issue #1885.

This issue exists because the Drop impl for task::local::Scheduler
does not drain the queue of tasks notified externally, the way the basic
scheduler does. Instead, only the local queue is drained, leaving some
tasks in place. Since these tasks are never removed, the loop that
continues trying to cancel tasks until the owned task list is totally
empty continues infinitely.

I think this issue was due to the Drop impl being written before a
remote queue was added to the local scheduler, and the need to close the
remote queue as well was overlooked.

Solution

This branch solves the problem by clearing the local scheduler's remote
queue as well as the local one.

I've added a test that reproduces the behavior. The test fails on master
and passes after this change.

In addition, this branch factors out the common task queue logic in the
basic scheduler runtime and the LocalSet struct in tokio::task. This
is because as more work was done on the LocalSet, it has gotten closer
and closer to the basic scheduler in behavior, and factoring out the
shared code reduces the risk of errors caused by LocalSet not doing
something that the basic scheduler does. The queues are now encapsulated
by a MpscQueues struct in tokio::task::queue (crate-public). As a
follow-up, I'd also like to look into changing this type to use the same
remote queue type as the threadpool (a linked list).

In particular, I noticed the basic scheduler has a flag that indicates
the remote queue has been closed, which is set when dropping the
scheduler. This prevents tasks from being added after the scheduler has
started shutting down, stopping a potential task leak. Rather than
duplicating this code in LocalSet, I thought it was probably better to
factor it out into a shared type.

There are a few cases where there are small differences in behavior,
though, so there is still a need for separate types implemented using
the new MpscQueues struct. However, it should cover most of the
identical code.

Note that this diff is rather large, due to the refactoring. However, the
actual fix for the infinite loop is very simple. It can be reviewed on its own
by looking at commit 4f46ac6. The refactor is in a separate commit, with
the SHA 90b5b1f.

Fixes #1885

Signed-off-by: Eliza Weisman eliza@buoyant.io

Signed-off-by: Eliza Weisman <eliza@buoyant.io>
Fixes #1885

Signed-off-by: Eliza Weisman <eliza@buoyant.io>
This commit factors out the common task queue logic in the basic
scheduler runtime and the `LocalSet` struct in `tokio::task`. This is
because as more work was done on the `LocalSet`, it has gotten closer
and closer to the basic scheduler in behavior, and factoring out the
shared code reduces the risk of errors caused by `LocalSet` not doing
something that the basic scheduler does.

In particular, I noticed the basic scheduler has a flag that indicates
the remote queue has been closed, which is set when dropping the
scheduler. This prevents tasks from being added after the scheduler has
started shutting down, stopping a potential task leak. Rather than
duplicating this code in `LocalSet`, I thought it was probably better to
factor it out into a shared type.

There are a few cases where there are small differences in behavior,
though, so there is still a need for separate types implemented _using_
the new `Queues` struct. However, it should cover most of the identical
code.

Signed-off-by: Eliza Weisman <eliza@buoyant.io>
@hawkw hawkw added the C-bug Category: This is a bug. label Dec 3, 2019
@hawkw hawkw requested review from carllerche and a team December 3, 2019 21:48
@hawkw hawkw self-assigned this Dec 3, 2019
tokio/src/runtime/mod.rs Outdated Show resolved Hide resolved
tokio/src/task/local.rs Outdated Show resolved Hide resolved
Signed-off-by: Eliza Weisman <eliza@buoyant.io>
Signed-off-by: Eliza Weisman <eliza@buoyant.io>
tokio/src/runtime/basic_scheduler.rs Outdated Show resolved Hide resolved
tokio/src/task/local.rs Outdated Show resolved Hide resolved
// Wait until all tasks have been released.
// XXX: this is a busy loop, but we don't really have any way to park
// the thread here?
while self.queues.has_tasks_remaining() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tbh, i don't really know if this can happen. I believe the loop was copied from the threaded scheduler where a task could be "stolen" and running on a different thread and that thread has a "lock" on the task... anyway, not a big deal for now, but we should investigate it and better document the guarantees of task (it was initially not really for reusing).

let is_current = ACTIVE.with(|cell| {
cell.get() == self as *const SchedulerPriv
});
let is_current = ACTIVE.with(|cell| cell.get() == self as *const SchedulerPriv);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The ACTIVE thing was hacked in to fix a bug. Nothing to do now, but we probably should try cleaning it up sometime...

Copy link
Member

@carllerche carllerche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Feel free to merge. You may want to address @kleimkuhler suggestion first, so I will not merge for you.

Signed-off-by: Eliza Weisman <eliza@buoyant.io>
this should hopefully make it clearer that these are not used by the
threadpool.

Signed-off-by: Eliza Weisman <eliza@buoyant.io>
Signed-off-by: Eliza Weisman <eliza@buoyant.io>
Signed-off-by: Eliza Weisman <eliza@buoyant.io>
Signed-off-by: Eliza Weisman <eliza@buoyant.io>
@hawkw hawkw merged commit 0e729aa into master Dec 4, 2019
@carllerche carllerche deleted the eliza/fix-localset-drop branch January 10, 2020 19:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-bug Category: This is a bug.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Infinite loop in LocalSet drop impl
4 participants