
A possible race condition bug #1946

Closed
pimeys opened this issue Dec 11, 2019 · 11 comments
Labels
C-bug Category: This is a bug.

Comments

@pimeys

pimeys commented Dec 11, 2019

Version

v0.2.4

Platform

Linux 5.3.0-24-generic #26-Ubuntu SMP Thu Nov 14 01:33:18 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

Subcrates

Possibly the threaded runtime.

Description

We have a test utility for benchmarks using Hyper 0.13 and Tokio 0.2. Every now and then (rarely, but often enough) the system crashes with the following error:

thread 'main' panicked at 'state = Snapshot { is_running: false, is_notified: true, is_released: true, is_complete: false, is_canceled: false, is_join_interested: false, has_join_waker: false, is_final_ref: true }', /root/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-0.2.4/src/task/harness.rs:160:9
stack backtrace:
   0:     0x55eacb265054 - backtrace::backtrace::libunwind::trace::hda41dbcdfba36aa0
                               at /cargo/registry/src/github.com-1ecc6299db9ec823/backtrace-0.3.37/src/backtrace/libunwind.rs:88
   1:     0x55eacb265054 - backtrace::backtrace::trace_unsynchronized::h1a8d6e1f8cb3f5d4
                               at /cargo/registry/src/github.com-1ecc6299db9ec823/backtrace-0.3.37/src/backtrace/mod.rs:66
   2:     0x55eacb265054 - std::sys_common::backtrace::_print_fmt::h610c4127487e10da
                               at src/libstd/sys_common/backtrace.rs:76
   3:     0x55eacb265054 - <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt::h0722dc552e01bd1d
                               at src/libstd/sys_common/backtrace.rs:60
   4:     0x55eacb28778c - core::fmt::write::h01edf6dd68a42c9c
                               at src/libcore/fmt/mod.rs:1030
   5:     0x55eacb260067 - std::io::Write::write_fmt::hf15985f193f03c04
                               at src/libstd/io/mod.rs:1412
   6:     0x55eacb267585 - std::sys_common::backtrace::_print::hd8d5d08a1795e743
                               at src/libstd/sys_common/backtrace.rs:64
   7:     0x55eacb267585 - std::sys_common::backtrace::print::hf89a79e3921a2366
                               at src/libstd/sys_common/backtrace.rs:49
   8:     0x55eacb267585 - std::panicking::default_hook::{{closure}}::h3a8f42beb3bb8ae3
                               at src/libstd/panicking.rs:196
   9:     0x55eacb267276 - std::panicking::default_hook::h8f803b0bc31a5c37
                               at src/libstd/panicking.rs:210
  10:     0x55eacb267c85 - std::panicking::rust_panic_with_hook::h825f041245da8739
                               at src/libstd/panicking.rs:473
  11:     0x55eacb267822 - std::panicking::continue_panic_fmt::hbe0378e33481e81b
                               at src/libstd/panicking.rs:380
  12:     0x55eacb26777c - std::panicking::begin_panic_fmt::hecb5f06c798651bd
                               at src/libstd/panicking.rs:335
  13:     0x55eacb18a2cf - tokio::task::harness::Harness<T,S>::drop_task::h710db8934ba23568
  14:     0x55eacb006043 - <alloc::collections::vec_deque::VecDeque<T> as core::ops::drop::Drop>::drop::h613860871752b96d
  15:     0x55eacb07dc12 - alloc::sync::Arc<T>::drop_slow::h05e88d5f4f5f5711
  16:     0x55eacb01ee30 - core::ptr::real_drop_in_place::h3ab2430a672bc2a4
  17:     0x55eacb03c457 - chihiro::main::h7ee1a253419b9f99
  18:     0x55eacb068c53 - std::rt::lang_start::{{closure}}::h7241da27d8c8dc63
  19:     0x55eacb2676b3 - std::rt::lang_start_internal::{{closure}}::ha04574f12d97cbe2
                               at src/libstd/rt.rs:49
  20:     0x55eacb2676b3 - std::panicking::try::do_call::h7c2a8488f72db90c
                               at src/libstd/panicking.rs:292
  21:     0x55eacb26bcba - __rust_maybe_catch_panic
                               at src/libpanic_unwind/lib.rs:80
  22:     0x55eacb26823d - std::panicking::try::hc3a9b5da4250385d
                               at src/libstd/panicking.rs:271
  23:     0x55eacb26823d - std::panic::catch_unwind::hf27600bf8c37809a
                               at src/libstd/panic.rs:394
  24:     0x55eacb26823d - std::rt::lang_start_internal::h409d4f2fe51133b0
                               at src/libstd/rt.rs:48
  25:     0x55eacb03c502 - main
  26:     0x7fd6bce122e1 - __libc_start_main
  27:     0x55eacafaa35a - _start
  28:                0x0 - <unknown>

The code where this happens is this method.

While browsing old issues, issue #1853 seems quite similar to what we're witnessing here.

We've been using tokio 0.2 since the early alphas, but this behavior started after the new scheduler was merged.

This is hard to reproduce, since it happens randomly and only after some hours of running the benchmarker.

@carllerche
Member

Thanks for the report. Which runtime does this happen with?

@carllerche
Member

I guess which scheduler? Basic or threaded?

@pimeys
Author

pimeys commented Dec 11, 2019

Threaded.

@pimeys
Author

pimeys commented Dec 11, 2019

tokio = { version = "0.2", features = ["rt-threaded", "time", "macros"] }

@carllerche
Member

Could you provide steps on how to reproduce?

@pimeys
Author

pimeys commented Dec 11, 2019

Don't know for sure, but the original code is linked in the first post. Basically we're spawning hyper requests onto the runtime and awaiting the `JoinHandle`s afterwards. This error happens maybe once per week, after some hours of fast spawning.

I can see tomorrow if I can isolate and reproduce the error...

@carllerche
Member

Do you know if it happens while the runtime is shutting down (which would be #1853) or some other case?

@pimeys
Author

pimeys commented Dec 11, 2019

I checked the runs and you're right. This happens after the last run, so probably while the runtime is shutting down.

@carllerche
Member

Thanks, that narrows it down a lot. I suspect that you are correct and it is a similar bug as #1853. Thanks for the insight.

@hawkw hawkw added the C-bug Category: This is a bug. label Dec 16, 2019
carllerche added a commit that referenced this issue Dec 17, 2019
The blocking task queue was not explicitly drained as part of the
blocking pool shutdown logic. It was originally assumed that the
contents of the queue would be dropped when the blocking pool structure
is dropped. However, tasks must be explicitly shut down, so we must drain
the queue and call `shutdown` on each task.

Fixes #1970, #1946
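The commit message above amounts to a drain loop. The following is a generic sketch of that idea, not tokio's actual internals: `Task` here is a hypothetical stand-in for tokio's internal task type, modeling only the "must be explicitly shut down" requirement.

```rust
use std::collections::VecDeque;

// Hypothetical stand-in for tokio's internal task type; only the explicit
// shutdown requirement from the commit message is modeled.
struct Task {
    name: &'static str,
    shut_down: bool,
}

impl Task {
    fn shutdown(&mut self) {
        self.shut_down = true;
    }
}

// The gist of the fix: instead of letting the VecDeque drop its contents
// (which panicked for tasks left in certain states), drain the queue and
// call `shutdown` on each task explicitly.
fn drain_and_shutdown(queue: &mut VecDeque<Task>) {
    while let Some(mut task) = queue.pop_front() {
        task.shutdown();
        println!("shut down task {}", task.name);
    }
}

fn main() {
    let mut queue: VecDeque<Task> = ["a", "b", "c"]
        .iter()
        .map(|&name| Task { name, shut_down: false })
        .collect();
    drain_and_shutdown(&mut queue);
    assert!(queue.is_empty());
}
```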
@carllerche
Member

I believe this is the same bug as #1970 and is due to an error in shutdown logic fixed in #1978.

I cannot say for sure, as your backtrace is from a "release" build and does not display the component that owns the VecDeque.

carllerche added a commit that referenced this issue Dec 18, 2019
The blocking task queue was not explicitly drained as part of the
blocking pool shutdown logic. It was originally assumed that the
contents of the queue would be dropped when the blocking pool structure
is dropped. However, tasks must be explicitly shut down, so we must drain
the queue and call `shutdown` on each task.

Fixes #1970, #1946
@carllerche
Member

Closing as fixed.
