
rt: fix deadlock in shutdown #3228

Merged: 2 commits into tokio-rs:master on Dec 8, 2020

Conversation

@bdonlan (Contributor) commented on Dec 7, 2020

Previously, the runtime shutdown logic would first hand control over all cores
to a single thread, which would sequentially shut down all tasks on the core
and then wait for them to complete.

This could deadlock when one task is waiting for a later core's task to
complete. For example, in the newly added test, we have a `block_in_place` task
that is waiting for another task to be dropped. If the latter task adds its
core to the shutdown list later than the former, we end up waiting forever for
the `block_in_place` task to complete.
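
As a rough illustration of that shape (a hypothetical sketch, not the test added in this PR; assumes tokio with the `rt-multi-thread` feature), a `block_in_place` task can be made to wait on another task's drop like this:

```rust
use std::sync::mpsc;

fn main() {
    let rt = tokio::runtime::Builder::new_multi_thread()
        .worker_threads(2)
        .enable_all()
        .build()
        .unwrap();

    // Channel used only to observe the drop of task B's sender half.
    let (drop_tx, drop_rx) = mpsc::channel::<()>();

    // Task A: enters block_in_place and blocks until every sender is gone,
    // i.e. until task B below has been dropped.
    rt.spawn(async move {
        tokio::task::block_in_place(move || {
            let _ = drop_rx.recv();
        });
    });

    // Task B: owns the sender and never completes on its own; it is only
    // dropped when the runtime cancels it during shutdown.
    rt.spawn(async move {
        let _keep_alive = drop_tx;
        std::future::pending::<()>().await;
    });

    // Dropping the runtime triggers shutdown. Before this fix, if task B's
    // core was collected after task A's, shutdown could wait on A forever.
    drop(rt);
}
```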

Additionally, there was a bug wherein we'd attempt to park on the parker
after shutting it down; this was fixed as part of the refactors above.

This change restructures the code to bring all tasks to a halt (and do any
parking needed) before we collapse to a single thread, avoiding this deadlock.
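
The ordering principle behind the fix can be shown with a toy model (hypothetical, plain threads rather than tokio internals): if one worker can only finish after another has been told to stop, every worker must be signalled before waiting on any of them.

```rust
use std::sync::mpsc;
use std::thread;

fn main() {
    // Per-worker shutdown signals.
    let (stop_a_tx, stop_a_rx) = mpsc::channel::<()>();
    let (stop_b_tx, stop_b_rx) = mpsc::channel::<()>();
    // Worker B notifies A once B has observed its own shutdown signal.
    let (b_down_tx, b_down_rx) = mpsc::channel::<()>();

    let a = thread::spawn(move || {
        stop_a_rx.recv().unwrap(); // told to stop...
        b_down_rx.recv().unwrap(); // ...but must also wait for B to go down
    });
    let b = thread::spawn(move || {
        stop_b_rx.recv().unwrap(); // told to stop
        b_down_tx.send(()).unwrap(); // lets A finish
    });

    // Fixed ordering: signal every worker first, then wait for them.
    // Signalling and joining A before touching B would hang forever.
    stop_a_tx.send(()).unwrap();
    stop_b_tx.send(()).unwrap();
    a.join().unwrap();
    b.join().unwrap();
}
```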

There was also an issue in which cancelled tasks would not unpark the
originating thread, due to what appears to be an optimization gone wrong.
This has been fixed to be much more conservative in selecting when not
to unpark the source thread (this may be too conservative; please take a look
at the changes to `release()`).

Fixes: #2789

@carllerche (Member) left a comment


Thanks 👍 I'm fine removing the optimization at this point.

@carllerche merged commit 57dffb9 into tokio-rs:master on Dec 8, 2020
@Darksonn added the A-tokio (Area: The main tokio crate) and M-runtime (Module: tokio/runtime) labels on Dec 8, 2020
carllerche added a commit that referenced this pull request Jan 28, 2021
In some cases, a cycle is created between I/O driver wakers and the I/O
driver resource slab. This patch clears stored wakers when an I/O
resource is dropped, breaking the cycle.

Fixes #3228
carllerche added a commit that referenced this pull request Jan 29, 2021
In some cases, a cycle is created between I/O driver wakers and the I/O
driver resource slab. This patch clears stored wakers when an I/O
resource is dropped, breaking the cycle.

Fixes #3228
Labels: A-tokio (Area: The main tokio crate), M-runtime (Module: tokio/runtime)
Successfully merging this pull request may close these issues: Panic with tokio::time on shutdown
3 participants