
rt: fix deadlock in shutdown #3228

Merged: 2 commits into tokio-rs:master on Dec 8, 2020

Conversation

@bdonlan (Contributor) commented on Dec 7, 2020

Previously, the runtime shutdown logic would first hand control over all cores
to a single thread, which would sequentially shut down all tasks on the core
and then wait for them to complete.

This could deadlock when one task is waiting for a later core's task to
complete. For example, in the newly added test, we have a `block_in_place` task
that is waiting for another task to be dropped. If the latter task adds its
core to the shutdown list later than the former, we end up waiting forever for
the `block_in_place` task to complete.
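
As a rough illustration of that shape (a hypothetical sketch, not the test added in this PR; assumes tokio with the `rt-multi-thread` feature), a `block_in_place` task can be made to wait on another task's drop like this:

```rust
use std::sync::mpsc;

fn main() {
    let rt = tokio::runtime::Builder::new_multi_thread()
        .worker_threads(2)
        .enable_all()
        .build()
        .unwrap();

    // Channel used only to observe the drop of task B's sender half.
    let (drop_tx, drop_rx) = mpsc::channel::<()>();

    // Task A: enters block_in_place and blocks until every sender is gone,
    // i.e. until task B below has been dropped.
    rt.spawn(async move {
        tokio::task::block_in_place(move || {
            let _ = drop_rx.recv();
        });
    });

    // Task B: owns the sender and never completes on its own; it is only
    // dropped when the runtime cancels it during shutdown.
    rt.spawn(async move {
        let _keep_alive = drop_tx;
        std::future::pending::<()>().await;
    });

    // Dropping the runtime triggers shutdown. Before this fix, if task B's
    // core was collected after task A's, shutdown could wait on A forever.
    drop(rt);
}
```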

Additionally, there was a bug wherein we'd attempt to park on the parker
after shutting it down; this was fixed as part of the refactors above.

This change restructures the code to bring all tasks to a halt (and do any
parking needed) before we collapse to a single thread, avoiding this deadlock.
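
The ordering principle behind the fix can be shown with a toy model (hypothetical, plain threads rather than tokio internals): if one worker can only finish after another has been told to stop, every worker must be signalled before waiting on any of them.

```rust
use std::sync::mpsc;
use std::thread;

fn main() {
    // Per-worker shutdown signals.
    let (stop_a_tx, stop_a_rx) = mpsc::channel::<()>();
    let (stop_b_tx, stop_b_rx) = mpsc::channel::<()>();
    // Worker B notifies A once B has observed its own shutdown signal.
    let (b_down_tx, b_down_rx) = mpsc::channel::<()>();

    let a = thread::spawn(move || {
        stop_a_rx.recv().unwrap(); // told to stop...
        b_down_rx.recv().unwrap(); // ...but must also wait for B to go down
    });
    let b = thread::spawn(move || {
        stop_b_rx.recv().unwrap(); // told to stop
        b_down_tx.send(()).unwrap(); // lets A finish
    });

    // Fixed ordering: signal every worker first, then wait for them.
    // Signalling and joining A before touching B would hang forever.
    stop_a_tx.send(()).unwrap();
    stop_b_tx.send(()).unwrap();
    a.join().unwrap();
    b.join().unwrap();
}
```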

There was also an issue in which cancelled tasks would not unpark the
originating thread, due to what appears to be an optimization gone wrong.
This has been fixed to be much more conservative in selecting when not
to unpark the source thread (this may be too conservative; please take a look
at the changes to `release()`).

Fixes: #2789

@carllerche (Member) left a comment


Thanks 👍 I'm fine removing the optimization at this point.

@carllerche merged commit 57dffb9 into tokio-rs:master on Dec 8, 2020
@Darksonn added the A-tokio (Area: The main tokio crate) and M-runtime (Module: tokio/runtime) labels on Dec 8, 2020
carllerche added a commit that referenced this pull request Jan 28, 2021
In some cases, a cycle is created between I/O driver wakers and the I/O
driver resource slab. This patch clears stored wakers when an I/O
resource is dropped, breaking the cycle.

Fixes #3228
carllerche added a commit that referenced this pull request Jan 29, 2021
In some cases, a cycle is created between I/O driver wakers and the I/O
driver resource slab. This patch clears stored wakers when an I/O
resource is dropped, breaking the cycle.

Fixes #3228
Labels: A-tokio (Area: The main tokio crate), M-runtime (Module: tokio/runtime)
Successfully merging this pull request may close these issues: Panic with tokio::time on shutdown
3 participants