
Racy deadlock at shutdown with block_in_place #2119

Closed
jonhoo opened this issue Jan 15, 2020 · 5 comments

jonhoo commented Jan 15, 2020

Version

tokio 0.2.9

Platform

Linux jerry 5.4.11-arch1-1 #1 SMP PREEMPT Sun, 12 Jan 2020 12:15:27 +0000 x86_64 GNU/Linux

Description

I think I've encountered a bug somewhere in block_in_place. Not entirely sure what the issue is yet, but it's related to shutdown. It does not happen on every execution. One thread is stuck dropping the BlockingPool:

#0  0x00007f09acc55c45 in pthread_cond_wait@@GLIBC_2.3.2 () from /usr/lib/libpthread.so.0
#1  0x000055d4f4ff0c9c in std::sys::unix::condvar::Condvar::wait (mutex=0xfffffffffffffe00, self=<optimized out>) at /rustc/31dd4f4acbcbdb02b0745d2136399ed664a28050/src/libstd/sys/unix/condvar.rs:73
#2  std::sys_common::condvar::Condvar::wait (mutex=0x7f09ac00a5f0, self=<optimized out>) at /rustc/31dd4f4acbcbdb02b0745d2136399ed664a28050/src/libstd/sys_common/condvar.rs:50
#3  std::sync::condvar::Condvar::wait (self=0x7f09ac04b728, guard=...) at /rustc/31dd4f4acbcbdb02b0745d2136399ed664a28050/src/libstd/sync/condvar.rs:200
#4  tokio::park::thread::Inner::park (self=0x7f09ac04b710) at /home/jfrg/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-0.2.9/src/park/thread.rs:112
#5  0x000055d4f4fe7b25 in <tokio::park::thread::CachedParkThread as tokio::park::Park>::park::{{closure}} (park_thread=<optimized out>) at /home/jfrg/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-0.2.9/src/park/thread.rs:256
#6  tokio::park::thread::CachedParkThread::with_current::{{closure}} (inner=<optimized out>) at /home/jfrg/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-0.2.9/src/park/thread.rs:242
#7  std::thread::local::LocalKey<T>::try_with (self=<optimized out>, f=...) at /rustc/31dd4f4acbcbdb02b0745d2136399ed664a28050/src/libstd/thread/local.rs:262
#8  tokio::park::thread::CachedParkThread::with_current (self=<optimized out>, f=...) at /home/jfrg/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-0.2.9/src/park/thread.rs:242
#9  <tokio::park::thread::CachedParkThread as tokio::park::Park>::park (self=<optimized out>) at /home/jfrg/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-0.2.9/src/park/thread.rs:256
#10 tokio::runtime::enter::Enter::block_on (self=<optimized out>, f=<optimized out>) at /home/jfrg/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-0.2.9/src/runtime/enter.rs:104
#11 0x000055d4f4fd7749 in tokio::runtime::blocking::shutdown::Receiver::wait (self=0x7ffc222295e8) at /home/jfrg/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-0.2.9/src/runtime/blocking/shutdown.rs:46
#12 0x000055d4f4fe6e73 in <tokio::runtime::blocking::pool::BlockingPool as core::ops::drop::Drop>::drop (self=<optimized out>) at /home/jfrg/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-0.2.9/src/runtime/blocking/pool.rs:116

while another is stuck sitting in epoll_wait following a park in shutdown (full backtrace):

#12 <tokio::runtime::park::Parker as tokio::park::Park>::park (self=<optimized out>) at /home/jfrg/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-0.2.9/src/runtime/park.rs:93
#13 0x000055d4f4fe0e12 in tokio::runtime::thread_pool::worker::GenerationGuard::shutdown (self=0x7f09ab97cae8) at /home/jfrg/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-0.2.9/src/runtime/thread_pool/worker.rs:570
#14 tokio::runtime::thread_pool::worker::GenerationGuard::run (self=...) at /home/jfrg/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-0.2.9/src/runtime/thread_pool/worker.rs:276
#15 tokio::runtime::thread_pool::worker::Worker::run::{{closure}}::{{closure}} (ob=<optimized out>) at /home/jfrg/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-0.2.9/src/runtime/thread_pool/worker.rs:158
#16 std::thread::local::LocalKey<T>::try_with (f=..., self=<optimized out>) at /rustc/31dd4f4acbcbdb02b0745d2136399ed664a28050/src/libstd/thread/local.rs:262
#17 std::thread::local::LocalKey<T>::with (f=..., self=<optimized out>) at /rustc/31dd4f4acbcbdb02b0745d2136399ed664a28050/src/libstd/thread/local.rs:239
#18 tokio::runtime::thread_pool::worker::Worker::run::{{closure}} () at /home/jfrg/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-0.2.9/src/runtime/thread_pool/worker.rs:134
#19 tokio::runtime::thread_pool::current::set::{{closure}} (cell=<optimized out>) at /home/jfrg/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-0.2.9/src/runtime/thread_pool/current.rs:47
#20 std::thread::local::LocalKey<T>::try_with (self=<optimized out>, f=...) at /rustc/31dd4f4acbcbdb02b0745d2136399ed664a28050/src/libstd/thread/local.rs:262
#21 std::thread::local::LocalKey<T>::with (self=<optimized out>, f=...) at /rustc/31dd4f4acbcbdb02b0745d2136399ed664a28050/src/libstd/thread/local.rs:239
#22 0x000055d4f4fe3357 in tokio::runtime::thread_pool::current::set (pool=0x7f09ab97cbb8, index=<optimized out>, f=...) at /home/jfrg/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-0.2.9/src/runtime/thread_pool/current.rs:29
#23 tokio::runtime::thread_pool::worker::Worker::run (self=...) at /home/jfrg/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-0.2.9/src/runtime/thread_pool/worker.rs:130
#24 0x000055d4f4ff4fda in tokio::runtime::thread_pool::worker::Worker::block_in_place::{{closure}} () at /home/jfrg/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-0.2.9/src/runtime/thread_pool/worker.rs:260

Specifically, the worker is stuck here:

self.park_mut().park().expect("park failed");

That is, it realizes it needs to shut down, but then blocks forever in that park. Note that it is not executing any of my code at this point (that is, my block_in_place closure has returned).

My guess for what goes wrong is as follows:

  1. Task t calls block_in_place, and causes worker w to be given away.
  2. w decides to shut down, notices that some of the tasks it owns are still pending, and parks at:
       self.park_mut().park().expect("park failed");
  3. t finishes its blocking closure, and ends up at:
       match res {
     It ultimately does a remote release on the original worker (I believe) here:
       executor.cast::<S>().as_ref().release(task);
     which (I believe) ends up being the impl at:
       self.pending_drop.push(task);
     That just sticks the drop into the pending queue for w. But w is parked, and so never learns that t has completed, because it won't check its pending_drop queue.

I don't know if the same deadlock occurs even if t does not complete after the blocking closure completes (i.e., if it hits this case).

I think both of those may need a notify of some kind to unpark w. Or I am missing something obvious, which is totally possible. I don't see anything that would wake the worker that's parked in shutdown when something is added to its pending_drop queue, and I don't think the existing worker notifications that happen during shutdown are sufficient to handle this case. If there is just one worker remaining, and it's stuck in that park in the loop in shutdown, and then the last task is remotely released from a block_in_place call (so just appended to the pending_drop queue), then nothing wakes that worker.

Now, there are loom tests for block_in_place:

#[test]
fn only_blocking() {
    loom::model(|| {
        let pool = mk_pool(1);
        let (block_tx, block_rx) = oneshot::channel();

        pool.spawn(async move {
            crate::task::block_in_place(move || {
                block_tx.send(());
            })
        });

        block_rx.recv();
        drop(pool);
    });
}

#[test]
fn blocking_and_regular() {
    const NUM: usize = 3;
    loom::model(|| {
        let pool = mk_pool(1);
        let cnt = Arc::new(AtomicUsize::new(0));
        let (block_tx, block_rx) = oneshot::channel();
        let (done_tx, done_rx) = oneshot::channel();
        let done_tx = Arc::new(Mutex::new(Some(done_tx)));

        pool.spawn(async move {
            crate::task::block_in_place(move || {
                block_tx.send(());
            })
        });

        for _ in 0..NUM {
            let cnt = cnt.clone();
            let done_tx = done_tx.clone();

            pool.spawn(async move {
                if NUM == cnt.fetch_add(1, Relaxed) + 1 {
                    done_tx.lock().unwrap().take().unwrap().send(());
                }
            });
        }

        done_rx.recv();
        block_rx.recv();
        drop(pool);
    });
}

But apparently they don't catch whatever is going on here. I suspect that is because they never yield Poll::Pending, but I don't have any direct evidence for that.


jonhoo commented Jan 15, 2020

Tried to make loom catch this in #2120, but it does not appear to be sufficient.


jonhoo commented Jan 31, 2020

Perhaps more readable backtraces:

The dropping thread:

#0  0x00007fef2e6fcc45 in pthread_cond_wait@@GLIBC_2.3.2 () from /usr/lib/libpthread.so.0
#1  0x000055a80f77d9ce in std::sys::unix::condvar::Condvar::wait (self=0x7fef2dc096c0, mutex=0x7fef2dc09690) at /rustc/212b2c7da87f3086af535b33a9ca6b5242f2d5a7/src/libstd/sys/unix/condvar.rs:73
#2  0x000055a80f7734b7 in std::sys_common::condvar::Condvar::wait (self=0x7fef2dc096c0, mutex=0x7fef2dc09690) at /rustc/212b2c7da87f3086af535b33a9ca6b5242f2d5a7/src/libstd/sys_common/condvar.rs:50
#3  0x000055a80f781ca1 in std::sync::condvar::Condvar::wait (self=0x7fef2dc11468, guard=...) at /rustc/212b2c7da87f3086af535b33a9ca6b5242f2d5a7/src/libstd/sync/condvar.rs:200
#4  0x000055a80f6b582f in tokio::park::thread::Inner::park (self=0x7fef2dc11450) at /home/jon/.cargo/git/checkouts/tokio-377c595163f99a10/6ea6821/tokio/src/park/thread.rs:112
#5  0x000055a80f6b6511 in <tokio::park::thread::CachedParkThread as tokio::park::Park>::park::{{closure}} (park_thread=0x7fef2e1c6280) at /home/jon/.cargo/git/checkouts/tokio-377c595163f99a10/6ea6821/tokio/src/park/thread.rs:256
#6  0x000055a80f6b6420 in tokio::park::thread::CachedParkThread::with_current::{{closure}} (inner=0x7fef2e1c6280) at /home/jon/.cargo/git/checkouts/tokio-377c595163f99a10/6ea6821/tokio/src/park/thread.rs:242
#7  0x000055a80f6f2072 in std::thread::local::LocalKey<T>::try_with (self=0x55a8101235d0, f=...) at /rustc/212b2c7da87f3086af535b33a9ca6b5242f2d5a7/src/libstd/thread/local.rs:262
#8  0x000055a80f6b63c6 in tokio::park::thread::CachedParkThread::with_current (self=0x7fff86e86358, f=...) at /home/jon/.cargo/git/checkouts/tokio-377c595163f99a10/6ea6821/tokio/src/park/thread.rs:242
#9  0x000055a80f6b6493 in <tokio::park::thread::CachedParkThread as tokio::park::Park>::park (self=0x7fff86e86358) at /home/jon/.cargo/git/checkouts/tokio-377c595163f99a10/6ea6821/tokio/src/park/thread.rs:256
#10 0x000055a80f705e50 in tokio::runtime::enter::Enter::block_on (self=0x7fff86e86408, f=0x7fff86e865c8) at /home/jon/.cargo/git/checkouts/tokio-377c595163f99a10/6ea6821/tokio/src/runtime/enter.rs:107
#11 0x000055a80f741438 in tokio::runtime::blocking::shutdown::Receiver::wait (self=0x7fff86e865c8) at /home/jon/.cargo/git/checkouts/tokio-377c595163f99a10/6ea6821/tokio/src/runtime/blocking/shutdown.rs:46
#12 0x000055a80f737a15 in <tokio::runtime::blocking::pool::BlockingPool as core::ops::drop::Drop>::drop (self=0x7fff86e865c0) at /home/jon/.cargo/git/checkouts/tokio-377c595163f99a10/6ea6821/tokio/src/runtime/blocking/pool.rs:116
#13 0x000055a80f71f575 in core::ptr::drop_in_place () at /rustc/212b2c7da87f3086af535b33a9ca6b5242f2d5a7/src/libcore/ptr/mod.rs:174

The thread holding it up:

#0  0x00007fef2e40f60e in epoll_wait () from /usr/lib/libc.so.6
#1  0x000055a80f77c0c7 in mio::sys::unix::epoll::Selector::select (self=0x7fef2dc8a3d0, evts=0x7fef2dc8a4a8, awakener=..., timeout=...) at /home/jon/.cargo/registry/src/github.com-1ecc6299db9ec823/mio-0.6.21/src/sys/unix/epoll.rs:72
#2  0x000055a80f76f120 in mio::poll::Poll::poll2 (self=0x7fef2dc8a3d0, events=0x7fef2dc8a4a8, timeout=..., interruptible=false) at /home/jon/.cargo/registry/src/github.com-1ecc6299db9ec823/mio-0.6.21/src/poll.rs:1178
#3  0x000055a80f76ed87 in mio::poll::Poll::poll1 (self=0x7fef2dc8a3d0, events=0x7fef2dc8a4a8, timeout=..., interruptible=false) at /home/jon/.cargo/registry/src/github.com-1ecc6299db9ec823/mio-0.6.21/src/poll.rs:1139
#4  0x000055a80f76e5c7 in mio::poll::Poll::poll (self=0x7fef2dc8a3d0, events=0x7fef2dc8a4a8, timeout=...) at /home/jon/.cargo/registry/src/github.com-1ecc6299db9ec823/mio-0.6.21/src/poll.rs:1010
#5  0x000055a80f6ac16f in tokio::io::driver::Driver::turn (self=0x7fef2dc8a4a8, max_wait=...) at /home/jon/.cargo/git/checkouts/tokio-377c595163f99a10/6ea6821/tokio/src/io/driver/mod.rs:107
#6  0x000055a80f6acb28 in <tokio::io::driver::Driver as tokio::park::Park>::park (self=0x7fef2dc8a4a8) at /home/jon/.cargo/git/checkouts/tokio-377c595163f99a10/6ea6821/tokio/src/io/driver/mod.rs:176
#7  0x000055a80f6e9779 in <tokio::park::either::Either<A,B> as tokio::park::Park>::park (self=0x7fef2dc8a4a0) at /home/jon/.cargo/git/checkouts/tokio-377c595163f99a10/6ea6821/tokio/src/park/either.rs:28
#8  0x000055a80f6ba2e1 in <tokio::time::driver::Driver<T> as tokio::park::Park>::park (self=0x7fef2dc8a478) at /home/jon/.cargo/git/checkouts/tokio-377c595163f99a10/6ea6821/tokio/src/time/driver/mod.rs:261
#9  0x000055a80f6e96d8 in <tokio::park::either::Either<A,B> as tokio::park::Park>::park (self=0x7fef2dc8a470) at /home/jon/.cargo/git/checkouts/tokio-377c595163f99a10/6ea6821/tokio/src/park/either.rs:28
#10 0x000055a80f6d1cbb in tokio::runtime::park::Inner::park_driver (self=0x7fef2dc11350, driver=0x7fef2dc8a470) at /home/jon/.cargo/git/checkouts/tokio-377c595163f99a10/6ea6821/tokio/src/runtime/park.rs:201
#11 0x000055a80f6d14a6 in tokio::runtime::park::Inner::park (self=0x7fef2dc11350) at /home/jon/.cargo/git/checkouts/tokio-377c595163f99a10/6ea6821/tokio/src/runtime/park.rs:133
#12 0x000055a80f6d0f03 in <tokio::runtime::park::Parker as tokio::park::Park>::park (self=0x7fef2dc38330) at /home/jon/.cargo/git/checkouts/tokio-377c595163f99a10/6ea6821/tokio/src/runtime/park.rs:93
#13 0x000055a80f6faf9d in tokio::runtime::thread_pool::worker::GenerationGuard::shutdown (self=0x7feec51f9d28) at /home/jon/.cargo/git/checkouts/tokio-377c595163f99a10/6ea6821/tokio/src/runtime/thread_pool/worker.rs:576
#14 0x000055a80f6f9995 in tokio::runtime::thread_pool::worker::GenerationGuard::run (self=...) at /home/jon/.cargo/git/checkouts/tokio-377c595163f99a10/6ea6821/tokio/src/runtime/thread_pool/worker.rs:282
#15 0x000055a80f6f94f9 in tokio::runtime::thread_pool::worker::Worker::run::{{closure}}::{{closure}} (ob=0x7feec51fd608) at /home/jon/.cargo/git/checkouts/tokio-377c595163f99a10/6ea6821/tokio/src/runtime/thread_pool/worker.rs:160
#16 0x000055a80f6f139c in std::thread::local::LocalKey<T>::try_with (self=0x55a8101243a8, f=...) at /rustc/212b2c7da87f3086af535b33a9ca6b5242f2d5a7/src/libstd/thread/local.rs:262
#17 0x000055a80f6f09a8 in std::thread::local::LocalKey<T>::with (self=0x55a8101243a8, f=...) at /rustc/212b2c7da87f3086af535b33a9ca6b5242f2d5a7/src/libstd/thread/local.rs:239
#18 0x000055a80f6f959e in tokio::runtime::thread_pool::worker::Worker::run::{{closure}} () at /home/jon/.cargo/git/checkouts/tokio-377c595163f99a10/6ea6821/tokio/src/runtime/thread_pool/worker.rs:136
#19 0x000055a80f7391ac in tokio::runtime::thread_pool::current::set::{{closure}} (cell=0x7feec51fd628) at /home/jon/.cargo/git/checkouts/tokio-377c595163f99a10/6ea6821/tokio/src/runtime/thread_pool/current.rs:47
#20 0x000055a80f6f1e39 in std::thread::local::LocalKey<T>::try_with (self=0x55a8101254a8, f=...) at /rustc/212b2c7da87f3086af535b33a9ca6b5242f2d5a7/src/libstd/thread/local.rs:262
#21 0x000055a80f6f0c8c in std::thread::local::LocalKey<T>::with (self=0x55a8101254a8, f=...) at /rustc/212b2c7da87f3086af535b33a9ca6b5242f2d5a7/src/libstd/thread/local.rs:239
#22 0x000055a80f7390a2 in tokio::runtime::thread_pool::current::set (pool=0x7feec51fa158, index=3, f=...) at /home/jon/.cargo/git/checkouts/tokio-377c595163f99a10/6ea6821/tokio/src/runtime/thread_pool/current.rs:29
#23 0x000055a80f6f9324 in tokio::runtime::thread_pool::worker::Worker::run (self=...) at /home/jon/.cargo/git/checkouts/tokio-377c595163f99a10/6ea6821/tokio/src/runtime/thread_pool/worker.rs:132
#24 0x000055a80f6f9950 in tokio::runtime::thread_pool::worker::Worker::block_in_place::{{closure}} () at /home/jon/.cargo/git/checkouts/tokio-377c595163f99a10/6ea6821/tokio/src/runtime/thread_pool/worker.rs:266
#25 0x000055a80f6e9ca8 in <tokio::runtime::blocking::task::BlockingTask<T> as core::future::future::Future>::poll (self=..., _cx=0x7feec51fa310) at /home/jon/.cargo/git/checkouts/tokio-377c595163f99a10/6ea6821/tokio/src/runtime/blocking/task.rs:30
#26 0x000055a80f6ae414 in tokio::task::core::Core<T>::poll (self=0x7feef4483220, header=0x7feef44831f0) at /home/jon/.cargo/git/checkouts/tokio-377c595163f99a10/6ea6821/tokio/src/task/core.rs:128
#27 0x000055a80f745708 in tokio::task::harness::Harness<T,S>::poll::{{closure}}::{{closure}} () at /home/jon/.cargo/git/checkouts/tokio-377c595163f99a10/6ea6821/tokio/src/task/harness.rs:120
#28 0x000055a80f71bbc8 in core::ops::function::FnOnce::call_once () at /rustc/212b2c7da87f3086af535b33a9ca6b5242f2d5a7/src/libcore/ops/function.rs:232
#29 0x000055a80f6df09e in <std::panic::AssertUnwindSafe<F> as core::ops::function::FnOnce<()>>::call_once (self=..., _args=()) at /rustc/212b2c7da87f3086af535b33a9ca6b5242f2d5a7/src/libstd/panic.rs:318
#30 0x000055a80f6ff2b5 in std::panicking::try::do_call (data=0x7feec51fa4e8 "\360\246\037\305\356\177") at /rustc/212b2c7da87f3086af535b33a9ca6b5242f2d5a7/src/libstd/panicking.rs:305
#31 0x000055a80f81d017 in __rust_maybe_catch_panic () at src/libpanic_unwind/lib.rs:86
#32 0x000055a80f6fee37 in std::panicking::try (f=...) at /rustc/212b2c7da87f3086af535b33a9ca6b5242f2d5a7/src/libstd/panicking.rs:281
#33 0x000055a80f6df435 in std::panic::catch_unwind (f=...) at /rustc/212b2c7da87f3086af535b33a9ca6b5242f2d5a7/src/libstd/panic.rs:394
#34 0x000055a80f745823 in tokio::task::harness::Harness<T,S>::poll::{{closure}} () at /home/jon/.cargo/git/checkouts/tokio-377c595163f99a10/6ea6821/tokio/src/task/harness.rs:101
#35 0x000055a80f6cf67f in tokio::loom::std::causal_cell::CausalCell<T>::with_mut (self=0x7feef4483220, f=...) at /home/jon/.cargo/git/checkouts/tokio-377c595163f99a10/6ea6821/tokio/src/loom/std/causal_cell.rs:41
#36 0x000055a80f74527f in tokio::task::harness::Harness<T,S>::poll (self=..., executor=...) at /home/jon/.cargo/git/checkouts/tokio-377c595163f99a10/6ea6821/tokio/src/task/harness.rs:100
#37 0x000055a80f6af358 in tokio::task::raw::poll (ptr=0x7feef44831f0, executor=...) at /home/jon/.cargo/git/checkouts/tokio-377c595163f99a10/6ea6821/tokio/src/task/raw.rs:162
#38 0x000055a80f6af087 in tokio::task::raw::RawTask::poll (self=..., executor=...) at /home/jon/.cargo/git/checkouts/tokio-377c595163f99a10/6ea6821/tokio/src/task/raw.rs:113
#39 0x000055a80f72a70c in tokio::task::Task<S>::run (self=..., executor=...) at /home/jon/.cargo/git/checkouts/tokio-377c595163f99a10/6ea6821/tokio/src/task/mod.rs:381
#40 0x000055a80f738f4f in tokio::runtime::blocking::pool::run_task (f=...) at /home/jon/.cargo/git/checkouts/tokio-377c595163f99a10/6ea6821/tokio/src/runtime/blocking/pool.rs:290
#41 0x000055a80f738628 in tokio::runtime::blocking::pool::Inner::run (self=0x7fef2dca5a90) at /home/jon/.cargo/git/checkouts/tokio-377c595163f99a10/6ea6821/tokio/src/runtime/blocking/pool.rs:206
#42 0x000055a80f73835e in tokio::runtime::blocking::pool::Spawner::spawn_thread::{{closure}}::{{closure}} () at /home/jon/.cargo/git/checkouts/tokio-377c595163f99a10/6ea6821/tokio/src/runtime/blocking/pool.rs:186
#43 0x000055a80f6d00c1 in tokio::runtime::context::enter (new=..., f=...) at /home/jon/.cargo/git/checkouts/tokio-377c595163f99a10/6ea6821/tokio/src/runtime/context.rs:72
#44 0x000055a80f6c9247 in tokio::runtime::handle::Handle::enter (self=0x7feec51facd0, f=...) at /home/jon/.cargo/git/checkouts/tokio-377c595163f99a10/6ea6821/tokio/src/runtime/handle.rs:34
#45 0x000055a80f7383f7 in tokio::runtime::blocking::pool::Spawner::spawn_thread::{{closure}} () at /home/jon/.cargo/git/checkouts/tokio-377c595163f99a10/6ea6821/tokio/src/runtime/blocking/pool.rs:185


jonhoo commented Jan 31, 2020

It turns out the pending_drop queue of the stuck thread is actually empty. So, here's some new data: the stuck thread is actually stuck doing an epoll_wait on an empty file descriptor set with a timeout of -1, which, as far as I can tell, it can basically never return from. No amount of notifying will wake it from that. From gdb:

(gdb) up
#1  0x000055bfba165017 in mio::sys::unix::epoll::Selector::select (self=0x7f482168ad30, evts=0x7f482168ae08, awakener=..., timeout=...) at /home/jon/.cargo/registry/src/github.com-1ecc6299db9ec823/mio-0.6.21/src/sys/unix/epoll.rs:72
72                  let cnt = cvt(libc::epoll_wait(self.epfd,
(gdb) list
67                  .unwrap_or(-1);
68
69              // Wait for epoll events for at most timeout_ms milliseconds
70              evts.clear();
71              unsafe {
72                  let cnt = cvt(libc::epoll_wait(self.epfd,
73                                                 evts.events.as_mut_ptr(),
74                                                 evts.events.capacity() as i32,
75                                                 timeout_ms))?;
76                  let cnt = cnt as usize;
(gdb) print evts
$1 = (*mut mio::sys::unix::epoll::Events) 0x7f482168ae08
(gdb) print evts.events
$2 = alloc::vec::Vec<libc::unix::linux_like::epoll_event> {buf: alloc::raw_vec::RawVec<libc::unix::linux_like::epoll_event, alloc::alloc::Global> {ptr: core::ptr::unique::Unique<libc::unix::linux_like::epoll_event> {pointer: 0x7f4821613000, _marker: core::marker::PhantomData<libc::unix::linux_like::epoll_event>}, cap: 1024, a: alloc::alloc::Global}, len: 0}
(gdb) print timeout_ms
$3 = -1

The way the thread got there was (as above) through:

#0  0x00007f4821d3a60e in epoll_wait () from /usr/lib/libc.so.6
#1  0x000055bfba165017 in mio::sys::unix::epoll::Selector::select (self=0x7f482168ad30, evts=0x7f482168ae08, awakener=..., timeout=...) at /home/jon/.cargo/registry/src/github.com-1ecc6299db9ec823/mio-0.6.21/src/sys/unix/epoll.rs:72
#2  0x000055bfba158070 in mio::poll::Poll::poll2 (self=0x7f482168ad30, events=0x7f482168ae08, timeout=..., interruptible=false) at /home/jon/.cargo/registry/src/github.com-1ecc6299db9ec823/mio-0.6.21/src/poll.rs:1178
#3  0x000055bfba157cd7 in mio::poll::Poll::poll1 (self=0x7f482168ad30, events=0x7f482168ae08, timeout=..., interruptible=false) at /home/jon/.cargo/registry/src/github.com-1ecc6299db9ec823/mio-0.6.21/src/poll.rs:1139
#4  0x000055bfba157517 in mio::poll::Poll::poll (self=0x7f482168ad30, events=0x7f482168ae08, timeout=...) at /home/jon/.cargo/registry/src/github.com-1ecc6299db9ec823/mio-0.6.21/src/poll.rs:1010
#5  0x000055bfba09725f in tokio::io::driver::Driver::turn (self=0x7f482168ae08, max_wait=...) at /home/jon/.cargo/git/checkouts/tokio-377c595163f99a10/6ea6821/tokio/src/io/driver/mod.rs:107
#6  0x000055bfba097c18 in <tokio::io::driver::Driver as tokio::park::Park>::park (self=0x7f482168ae08) at /home/jon/.cargo/git/checkouts/tokio-377c595163f99a10/6ea6821/tokio/src/io/driver/mod.rs:176
#7  0x000055bfba0d3369 in <tokio::park::either::Either<A,B> as tokio::park::Park>::park (self=0x7f482168ae00) at /home/jon/.cargo/git/checkouts/tokio-377c595163f99a10/6ea6821/tokio/src/park/either.rs:28
#8  0x000055bfba0a4321 in <tokio::time::driver::Driver<T> as tokio::park::Park>::park (self=0x7f482168add8) at /home/jon/.cargo/git/checkouts/tokio-377c595163f99a10/6ea6821/tokio/src/time/driver/mod.rs:261
#9  0x000055bfba0d3408 in <tokio::park::either::Either<A,B> as tokio::park::Park>::park (self=0x7f482168add0) at /home/jon/.cargo/git/checkouts/tokio-377c595163f99a10/6ea6821/tokio/src/park/either.rs:28
#10 0x000055bfba0bc07b in tokio::runtime::park::Inner::park_driver (self=0x7f4821699050, driver=0x7f482168add0) at /home/jon/.cargo/git/checkouts/tokio-377c595163f99a10/6ea6821/tokio/src/runtime/park.rs:201
#11 0x000055bfba0bb866 in tokio::runtime::park::Inner::park (self=0x7f4821699050) at /home/jon/.cargo/git/checkouts/tokio-377c595163f99a10/6ea6821/tokio/src/runtime/park.rs:133
#12 0x000055bfba0bb2c3 in <tokio::runtime::park::Parker as tokio::park::Park>::park (self=0x7f4821638990) at /home/jon/.cargo/git/checkouts/tokio-377c595163f99a10/6ea6821/tokio/src/runtime/park.rs:93
#13 0x000055bfba0e39bd in tokio::runtime::thread_pool::worker::GenerationGuard::shutdown (self=0x7f47eaafad98) at /home/jon/.cargo/git/checkouts/tokio-377c595163f99a10/6ea6821/tokio/src/runtime/thread_pool/worker.rs:576

Seemingly what happens is that the last worker shuts down, decides to try to drive the I/O driver, and ends up parking by calling poll here:

match self.inner.io.poll(&mut self.events, max_wait) {

But self.events is actually empty, leading to the call blocking forever.


jonhoo commented Jan 31, 2020

Confirmed this through gdb (see the value of len):

#5  0x000055bfba09725f in tokio::io::driver::Driver::turn (self=0x7f482168ae08, max_wait=...) at /home/jon/.cargo/git/checkouts/tokio-377c595163f99a10/6ea6821/tokio/src/io/driver/mod.rs:107
107             match self.inner.io.poll(&mut self.events, max_wait) {
(gdb) print self.events
$4 = mio::poll::Events {inner: mio::sys::unix::epoll::Events {events: alloc::vec::Vec<libc::unix::linux_like::epoll_event> {buf: alloc::raw_vec::RawVec<libc::unix::linux_like::epoll_event, alloc::alloc::Global> {ptr: core::ptr::unique::Unique<libc::unix::linux_like::epoll_event> {pointer: 0x7f4821613000, _marker: core::marker::PhantomData<libc::unix::linux_like::epoll_event>}, cap: 1024, a: alloc::alloc::Global}, len: 0}}}


jonhoo commented Jan 31, 2020

Got too excited -- self.events should be empty for a call to epoll...
