
A possible race condition bug #1946

Closed
pimeys opened this issue Dec 11, 2019 · 11 comments
Labels
C-bug Category: This is a bug.

Comments

@pimeys

pimeys commented Dec 11, 2019

Version

v0.2.4

Platform

Linux 5.3.0-24-generic #26-Ubuntu SMP Thu Nov 14 01:33:18 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

Subcrates

Possibly the threaded runtime.

Description

We have a test utility for benchmarks using Hyper 0.13 and Tokio 0.2. Every now and then (rarely, but often enough) the system crashes with the following error:

thread 'main' panicked at 'state = Snapshot { is_running: false, is_notified: true, is_released: true, is_complete: false, is_canceled: false, is_join_interested: false, has_join_waker: false, is_final_ref: true }', /root/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-0.2.4/src/task/harness.rs:160:9
stack backtrace:
   0:     0x55eacb265054 - backtrace::backtrace::libunwind::trace::hda41dbcdfba36aa0
                               at /cargo/registry/src/github.com-1ecc6299db9ec823/backtrace-0.3.37/src/backtrace/libunwind.rs:88
   1:     0x55eacb265054 - backtrace::backtrace::trace_unsynchronized::h1a8d6e1f8cb3f5d4
                               at /cargo/registry/src/github.com-1ecc6299db9ec823/backtrace-0.3.37/src/backtrace/mod.rs:66
   2:     0x55eacb265054 - std::sys_common::backtrace::_print_fmt::h610c4127487e10da
                               at src/libstd/sys_common/backtrace.rs:76
   3:     0x55eacb265054 - <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt::h0722dc552e01bd1d
                               at src/libstd/sys_common/backtrace.rs:60
   4:     0x55eacb28778c - core::fmt::write::h01edf6dd68a42c9c
                               at src/libcore/fmt/mod.rs:1030
   5:     0x55eacb260067 - std::io::Write::write_fmt::hf15985f193f03c04
                               at src/libstd/io/mod.rs:1412
   6:     0x55eacb267585 - std::sys_common::backtrace::_print::hd8d5d08a1795e743
                               at src/libstd/sys_common/backtrace.rs:64
   7:     0x55eacb267585 - std::sys_common::backtrace::print::hf89a79e3921a2366
                               at src/libstd/sys_common/backtrace.rs:49
   8:     0x55eacb267585 - std::panicking::default_hook::{{closure}}::h3a8f42beb3bb8ae3
                               at src/libstd/panicking.rs:196
   9:     0x55eacb267276 - std::panicking::default_hook::h8f803b0bc31a5c37
                               at src/libstd/panicking.rs:210
  10:     0x55eacb267c85 - std::panicking::rust_panic_with_hook::h825f041245da8739
                               at src/libstd/panicking.rs:473
  11:     0x55eacb267822 - std::panicking::continue_panic_fmt::hbe0378e33481e81b
                               at src/libstd/panicking.rs:380
  12:     0x55eacb26777c - std::panicking::begin_panic_fmt::hecb5f06c798651bd
                               at src/libstd/panicking.rs:335
  13:     0x55eacb18a2cf - tokio::task::harness::Harness<T,S>::drop_task::h710db8934ba23568
  14:     0x55eacb006043 - <alloc::collections::vec_deque::VecDeque<T> as core::ops::drop::Drop>::drop::h613860871752b96d
  15:     0x55eacb07dc12 - alloc::sync::Arc<T>::drop_slow::h05e88d5f4f5f5711
  16:     0x55eacb01ee30 - core::ptr::real_drop_in_place::h3ab2430a672bc2a4
  17:     0x55eacb03c457 - chihiro::main::h7ee1a253419b9f99
  18:     0x55eacb068c53 - std::rt::lang_start::{{closure}}::h7241da27d8c8dc63
  19:     0x55eacb2676b3 - std::rt::lang_start_internal::{{closure}}::ha04574f12d97cbe2
                               at src/libstd/rt.rs:49
  20:     0x55eacb2676b3 - std::panicking::try::do_call::h7c2a8488f72db90c
                               at src/libstd/panicking.rs:292
  21:     0x55eacb26bcba - __rust_maybe_catch_panic
                               at src/libpanic_unwind/lib.rs:80
  22:     0x55eacb26823d - std::panicking::try::hc3a9b5da4250385d
                               at src/libstd/panicking.rs:271
  23:     0x55eacb26823d - std::panic::catch_unwind::hf27600bf8c37809a
                               at src/libstd/panic.rs:394
  24:     0x55eacb26823d - std::rt::lang_start_internal::h409d4f2fe51133b0
                               at src/libstd/rt.rs:48
  25:     0x55eacb03c502 - main
  26:     0x7fd6bce122e1 - __libc_start_main
  27:     0x55eacafaa35a - _start
  28:                0x0 - <unknown>

The code where this happens is this method.

While browsing old issues, issue #1853 seems quite similar to what we're witnessing here.

We've been using tokio 0.2 since the early alphas, but this behavior started after the new scheduler was merged.

This is hard to reproduce, since it happens randomly and only after some hours of running the benchmarker.

@carllerche
Member

Thanks for the report. Which runtime does this happen with?

@carllerche
Member

I guess which scheduler? Basic or threaded?

@pimeys
Author

pimeys commented Dec 11, 2019

Threaded.

@pimeys
Author

pimeys commented Dec 11, 2019

tokio = { version = "0.2", features = ["rt-threaded", "time", "macros"] }

@carllerche
Member

Could you provide steps on how to reproduce?

@pimeys
Author

pimeys commented Dec 11, 2019

Don't know for sure, but the original code is linked in the first post. Basically we're spawning hyper requests onto the runtime and awaiting the `JoinHandle`s afterwards. This error happens maybe once per week, after some hours of fast spawning.

I can see tomorrow if I can isolate and reproduce the error...

@carllerche
Member

Do you know if it happens while the runtime is shutting down (which would be #1853) or some other case?

@pimeys
Author

pimeys commented Dec 11, 2019

I checked the runs and you're right. This happens after the last run, so probably while the runtime is shutting down.

@carllerche
Member

Thanks, that narrows it down a lot. I suspect that you are correct and it is a similar bug as #1853. Thanks for the insight.

@hawkw hawkw added the C-bug Category: This is a bug. label Dec 16, 2019
carllerche added a commit that referenced this issue Dec 17, 2019
The blocking task queue was not explicitly drained as part of the
blocking pool shutdown logic. It was originally assumed that the
contents of the queue would be dropped when the blocking pool structure
is dropped. However, tasks must be explicitly shut down, so we must drain
the queue and call `shutdown` on each task.

Fixes #1970, #1946
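The commit message above amounts to a drain loop. The following is a generic sketch of that idea, not tokio's actual internals: `Task` here is a hypothetical stand-in for tokio's internal task type, modeling only the "must be explicitly shut down" requirement.

```rust
use std::collections::VecDeque;

// Hypothetical stand-in for tokio's internal task type; only the explicit
// shutdown requirement from the commit message is modeled.
struct Task {
    name: &'static str,
    shut_down: bool,
}

impl Task {
    fn shutdown(&mut self) {
        self.shut_down = true;
    }
}

// The gist of the fix: instead of letting the VecDeque drop its contents
// (which panicked for tasks left in certain states), drain the queue and
// call `shutdown` on each task explicitly.
fn drain_and_shutdown(queue: &mut VecDeque<Task>) {
    while let Some(mut task) = queue.pop_front() {
        task.shutdown();
        println!("shut down task {}", task.name);
    }
}

fn main() {
    let mut queue: VecDeque<Task> = ["a", "b", "c"]
        .iter()
        .map(|&name| Task { name, shut_down: false })
        .collect();
    drain_and_shutdown(&mut queue);
    assert!(queue.is_empty());
}
```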
@carllerche
Member

I believe this is the same bug as #1970 and is due to an error in shutdown logic fixed in #1978.

I cannot say for sure, as your backtrace is from a "release" build and does not display the component that owns the VecDeque.

carllerche added a commit that referenced this issue Dec 18, 2019
The blocking task queue was not explicitly drained as part of the
blocking pool shutdown logic. It was originally assumed that the
contents of the queue would be dropped when the blocking pool structure
is dropped. However, tasks must be explicitly shut down, so we must drain
the queue and call `shutdown` on each task.

Fixes #1970, #1946
@carllerche
Member

Closing as fixed.
