Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sharded-slab panic after upgrade to tokio 1.9.0 (and other crates) #1485

Closed
jeromegn opened this issue Jul 30, 2021 · 6 comments
Closed

sharded-slab panic after upgrade to tokio 1.9.0 (and other crates) #1485

jeromegn opened this issue Jul 30, 2021 · 6 comments
Assignees
Labels
crate/subscriber Related to the `tracing-subscriber` crate kind/bug Something isn't working

Comments

@jeromegn
Copy link
Contributor

Bug Report

Version

$ cargo tree | grep tracing
│   │   └── tracing v0.1.26
│   │       ├── tracing-attributes v0.1.15 (proc-macro)
│   │       └── tracing-core v0.1.18
│   ├── tracing v0.1.26 (*)
│   └── tracing v0.1.26 (*)
│   └── tracing v0.1.26 (*)
├── tracing v0.1.26 (*)
├── tracing-core v0.1.18 (*)
├── tracing-futures v0.2.5
│   └── tracing v0.1.26 (*)
├── tracing-log v0.1.2
│   └── tracing-core v0.1.18 (*)
├── tracing-serde v0.1.2
│   └── tracing-core v0.1.18 (*)
├── tracing-subscriber v0.2.19
│   ├── tracing v0.1.26 (*)
│   ├── tracing-core v0.1.18 (*)
│   ├── tracing-log v0.1.2 (*)
│   └── tracing-serde v0.1.2 (*)

Platform

Linux kernel 5.11.7, Ubuntu Bionic

Crates

I believe this is happening in tracing-subscriber

Description

We routinely upgrade dependencies on our rust-based proxy. This time it seemed to cause a lot of panics when we rolled it out.

It appeared to happen when create spans. I'm not entirely sure.

Example line where a panic occurred:

impl ResponseBody {
    pub fn new(body: Body, guard: Option<LoadGuard>, task_guard: Option<IsolateTaskGuard>) -> Self {
        Self {
            body,
            guard,
            started_at: None,
            recorder: None,
            task_guard,
            span: info_span!("response_body"), // <-- right here
        }
    }

(it seems to happen all over the place)

Here's the panic:

index out of bounds: the len is 4096 but the index is 4134

Thread 0 Crashed:
0   std                             0x55c926f790dc      std::sys_common::backtrace::__rust_end_short_backtrace (backtrace.rs:141)
1   <unknown>                       0x55c926f7b189      rust_begin_unwind (panicking.rs:493)
2   core                            0x55c926f9bd01      core::panicking::panic_fmt (panicking.rs:92)
3   core                            0x55c926f9bcc2      core::panicking::panic_bounds_check (panicking.rs:69)
4   sharded_slab                    0x55c926318546      [inlined] sharded_slab::shard::Array<T>::current (shard.rs:297)
5   sharded_slab                    0x55c926318546      [inlined] sharded_slab::pool::Pool<T>::create (pool.rs:452)
6   sharded_slab                    0x55c926318546      sharded_slab::pool::Pool<T>::create_with (pool.rs:656)
7   tracing_subscriber              0x55c92642682b      tracing_subscriber::registry::sharded::Registry::new_span (sharded.rs:188)
8   tracing_subscriber              0x55c92642621f      tracing_subscriber::layer::Layered<T>::new_span (layer.rs:642)
9   tracing_subscriber              0x55c9264263ee      [inlined] tracing_subscriber::layer::Layered<T>::new_span (layer.rs:642)
10  tracing_subscriber              0x55c9264263ee      tracing_subscriber::layer::Layered<T>::new_span (layer.rs:642)
11  alloc                           0x55c926f5a203      [inlined] alloc::sync::Arc<T>::new_span (subscriber.rs:662)
12  tracing_core                    0x55c926f5a203      [inlined] tracing_core::dispatcher::Dispatch::new_span (dispatcher.rs:480)
13  tracing                         0x55c926f5a203      tracing::span::Span::make_with (span.rs:581)
14  tracing                         0x55c926f5afb3      [inlined] tracing::span::Span::new_with (span.rs:461)
15  tracing                         0x55c926f5afb3      [inlined] tracing::span::Span::new::{{closure}} (span.rs:450)
16  tracing_core                    0x55c926f5afb3      [inlined] tracing_core::dispatcher::get_default::{{closure}} (dispatcher.rs:346)
17  std                             0x55c926f5afb3      [inlined] std::thread::local::LocalKey<T>::try_with (local.rs:376)
18  tracing_core                    0x55c926f5afb3      tracing_core::dispatcher::get_default (dispatcher.rs:343)
19  tracing                         0x55c926f5a1ae      tracing::span::Span::new (span.rs:450)
20  fly_proxy                       0x55c9267cd639      fly_proxy::connect::body::ResponseBody::new (body.rs:395)
21  fly_proxy                       0x55c926923c58      [inlined] fly_proxy::http::service::do_request::{{closure}}::{{closure}}::{{closure}} (service.rs:696)
22  http                            0x55c926923c58      [inlined] http::response::Response<T>::map (response.rs:478)
23  fly_proxy                       0x55c926923c58      fly_proxy::http::service::do_request::{{closure}}::{{closure}} (service.rs:695)
24  core                            0x55c926952388      [inlined] core::future::from_generator::GenFuture<T>::poll (mod.rs:80)
25  fly_proxy                       0x55c926952388      [inlined] fly_proxy::http::service::do_request::{{closure}} (service.rs:656)
26  core                            0x55c926952388      [inlined] core::future::from_generator::GenFuture<T>::poll (mod.rs:80)
27  fly_proxy                       0x55c926952388      fly_proxy::http::service::HttpHandler::call::{{closure}} (service.rs:366)
28  core                            0x55c92667ec02      [inlined] core::future::from_generator::GenFuture<T>::poll (mod.rs:80)
29  futures_util                    0x55c92667ec02      [inlined] futures_util::future::future::map::Map<T>::poll (map.rs:55)
30  futures_util                    0x55c92667ec02      futures_util::future::future::Inspect<T>::poll (lib.rs:95)
31  tracing_futures                 0x55c9265fd3e6      tracing_futures::Instrumented<T>::poll (lib.rs:283)
32  core                            0x55c9264aed0c      [inlined] core::pin::Pin<T>::poll (future.rs:120)
33  hyper                           0x55c9264aed0c      hyper::proto::h1::dispatch::Server<T>::poll_msg (dispatch.rs:490)
34  hyper                           0x55c9264bbcad      [inlined] hyper::proto::h1::dispatch::Dispatcher<T>::poll_write (dispatch.rs:296)
35  hyper                           0x55c9264bbcad      [inlined] hyper::proto::h1::dispatch::Dispatcher<T>::poll_loop (dispatch.rs:160)
36  hyper                           0x55c9264bbcad      hyper::proto::h1::dispatch::Dispatcher<T>::poll_inner (dispatch.rs:136)
37  hyper                           0x55c926481bb8      [inlined] hyper::proto::h1::dispatch::Dispatcher<T>::poll_catch (dispatch.rs:119)
38  hyper                           0x55c926481bb8      [inlined] hyper::proto::h1::dispatch::Dispatcher<T>::poll (dispatch.rs:423)
39  hyper                           0x55c926481bb8      [inlined] hyper::server::conn::ProtoServer<T>::poll (conn.rs:1025)
40  hyper                           0x55c926481bb8      hyper::server::conn::upgrades::UpgradeableConnection<T>::poll (conn.rs:1226)
41  tracing_futures                 0x55c9264411c0      tracing_futures::Instrumented<T>::poll (lib.rs:283)
42  <unknown>                       0x55c92633dcd6      [inlined] F::poll (future.rs:108)
43  fly_proxy                       0x55c92633dcd6      [inlined] fly_proxy::layers::http::HttpService::call::{{closure}}::{{closure}}::{{closure}} (select.rs:505)
44  tokio                           0x55c92633dcd6      tokio::future::poll_fn::PollFn<T>::poll (poll_fn.rs:38)
45  fly_proxy                       0x55c9263d5412      fly_proxy::layers::http::HttpService::call::{{closure}}::{{closure}} (http.rs:96)
46  core                            0x55c92659b038      [inlined] core::future::from_generator::GenFuture<T>::poll (mod.rs:80)
47  tracing                         0x55c92659b038      tracing::instrument::Instrumented<T>::poll (instrument.rs:151)
48  fly_proxy                       0x55c9263f085b      [inlined] fly_proxy::layers::http::HttpService::call::{{closure}} (http.rs:87)
49  core                            0x55c9263f085b      core::future::from_generator::GenFuture<T>::poll (mod.rs:80)
50  core                            0x55c9263e5690      [inlined] core::pin::Pin<T>::poll (future.rs:120)
51  fly_proxy                       0x55c9263e5690      [inlined] fly_proxy::layers::tls::TlsLayeredService<T>::call::{{closure}} (tls.rs:136)
52  core                            0x55c9263e5690      core::future::from_generator::GenFuture<T>::poll (mod.rs:80)
53  core                            0x55c9263de9a4      [inlined] core::pin::Pin<T>::poll (future.rs:120)
54  fly_proxy                       0x55c9263de9a4      [inlined] fly_proxy::layers::rate::RateLimit<T>::call::{{closure}} (rate.rs:70)
55  core                            0x55c9263de9a4      core::future::from_generator::GenFuture<T>::poll (mod.rs:80)
56  core                            0x55c926441901      [inlined] core::pin::Pin<T>::poll (future.rs:120)
57  tower                           0x55c926441901      [inlined] tower::limit::concurrency::future::ResponseFuture<T>::poll (future.rs:38)
58  tower                           0x55c926441901      [inlined] tower::load_shed::future::ResponseFuture<T>::poll (future.rs:51)
59  tracing_futures                 0x55c926441901      tracing_futures::Instrumented<T>::poll (lib.rs:283)
60  fly_proxy                       0x55c92649990a      [inlined] fly_proxy::util::spawn::RefCountedFut<T>::poll (spawn.rs:87)
61  fly_proxy                       0x55c92649990a      fly_proxy::util::spawn::CountedFut<T>::poll (spawn.rs:52)
62  tokio                           0x55c926466027      [inlined] tokio::runtime::task::core::CoreStage<T>::poll::{{closure}} (core.rs:147)
63  tokio                           0x55c926466027      tokio::loom::std::unsafe_cell::UnsafeCell<T>::with_mut (unsafe_cell.rs:14)
64  tokio                           0x55c92653dca0      [inlined] tokio::runtime::task::core::CoreStage<T>::poll (core.rs:137)
65  tokio                           0x55c92653dca0      [inlined] tokio::runtime::task::harness::poll_future::{{closure}} (harness.rs:437)
66  std                             0x55c92653dca0      [inlined] std::panic::AssertUnwindSafe<T>::call_once (panic.rs:344)
67  std                             0x55c92653dca0      [inlined] std::panicking::try::do_call (panicking.rs:379)
68  std                             0x55c92653dca0      [inlined] std::panicking::try (panicking.rs:343)
69  std                             0x55c92653dca0      [inlined] std::panic::catch_unwind (panic.rs:431)
70  tokio                           0x55c92653dca0      tokio::runtime::task::harness::poll_future (harness.rs:424)
71  tokio                           0x55c9265531b4      [inlined] tokio::runtime::task::harness::Harness<T>::poll_inner (harness.rs:89)
72  tokio                           0x55c9265531b4      tokio::runtime::task::harness::Harness<T>::poll (harness.rs:59)
73  tokio                           0x55c926eb6730      [inlined] tokio::runtime::task::raw::RawTask::poll (raw.rs:70)
74  tokio                           0x55c926eb6730      [inlined] tokio::runtime::task::Notified<T>::run (mod.rs:166)
75  tokio                           0x55c926eb6730      [inlined] tokio::runtime::thread_pool::worker::Context::run_task::{{closure}} (worker.rs:417)
76  tokio                           0x55c926eb6730      [inlined] tokio::coop::with_budget::{{closure}} (coop.rs:106)
77  std                             0x55c926eb6730      [inlined] std::thread::local::LocalKey<T>::try_with (local.rs:376)
78  std                             0x55c926eb6730      std::thread::local::LocalKey<T>::with (local.rs:352)
79  tokio                           0x55c926ec216a      [inlined] tokio::coop::with_budget (coop.rs:99)
80  tokio                           0x55c926ec216a      [inlined] tokio::coop::budget (coop.rs:76)
81  tokio                           0x55c926ec216a      tokio::runtime::thread_pool::worker::Context::run_task (worker.rs:395)
82  tokio                           0x55c926ec1273      tokio::runtime::thread_pool::worker::Context::run (worker.rs:365)
83  tokio                           0x55c926eacb83      [inlined] tokio::runtime::thread_pool::worker::run::{{closure}} (worker.rs:350)
84  tokio                           0x55c926eacb83      tokio::macros::scoped_tls::ScopedKey<T>::set (scoped_tls.rs:61)
85  tokio                           0x55c926ec09f2      tokio::runtime::thread_pool::worker::run (worker.rs:347)
86  tokio                           0x55c926eaf352      [inlined] tokio::runtime::thread_pool::worker::Launch::launch::{{closure}} (worker.rs:326)
87  tokio                           0x55c926eaf352      [inlined] tokio::runtime::blocking::task::BlockingTask<T>::poll (task.rs:42)
88  tokio                           0x55c926eaf352      [inlined] tokio::runtime::task::core::CoreStage<T>::poll::{{closure}} (core.rs:147)
89  tokio                           0x55c926eaf352      tokio::loom::std::unsafe_cell::UnsafeCell<T>::with_mut (unsafe_cell.rs:14)
90  tokio                           0x55c926eb5390      [inlined] tokio::runtime::task::core::CoreStage<T>::poll (core.rs:137)
91  tokio                           0x55c926eb5390      [inlined] tokio::runtime::task::harness::poll_future::{{closure}} (harness.rs:437)
92  std                             0x55c926eb5390      [inlined] std::panic::AssertUnwindSafe<T>::call_once (panic.rs:344)
93  std                             0x55c926eb5390      [inlined] std::panicking::try::do_call (panicking.rs:379)
94  std                             0x55c926eb5390      [inlined] std::panicking::try (panicking.rs:343)
95  std                             0x55c926eb5390      [inlined] std::panic::catch_unwind (panic.rs:431)
96  tokio                           0x55c926eb5390      [inlined] tokio::runtime::task::harness::poll_future (harness.rs:424)
97  tokio                           0x55c926eb5390      [inlined] tokio::runtime::task::harness::Harness<T>::poll_inner (harness.rs:89)
98  tokio                           0x55c926eb5390      tokio::runtime::task::harness::Harness<T>::poll (harness.rs:59)
99  tokio                           0x55c926e9df8a      [inlined] tokio::runtime::task::raw::RawTask::poll (raw.rs:70)
100 tokio                           0x55c926e9df8a      [inlined] tokio::runtime::task::Notified<T>::run (mod.rs:166)
101 tokio                           0x55c926e9df8a      [inlined] tokio::runtime::blocking::pool::Inner::run (pool.rs:265)
102 tokio                           0x55c926e9df8a      [inlined] tokio::runtime::blocking::pool::Spawner::spawn_thread::{{closure}} (pool.rs:245)
103 std                             0x55c926e9df8a      std::sys_common::backtrace::__rust_begin_short_backtrace (backtrace.rs:125)
104 std                             0x55c926ea8e08      [inlined] std::thread::Builder::spawn_unchecked::{{closure}}::{{closure}} (mod.rs:481)
105 std                             0x55c926ea8e08      [inlined] std::panic::AssertUnwindSafe<T>::call_once (panic.rs:344)
106 std                             0x55c926ea8e08      [inlined] std::panicking::try::do_call (panicking.rs:379)
107 std                             0x55c926ea8e08      [inlined] std::panicking::try (panicking.rs:343)
108 std                             0x55c926ea8e08      [inlined] std::panic::catch_unwind (panic.rs:431)
109 std                             0x55c926ea8e08      [inlined] std::thread::Builder::spawn_unchecked::{{closure}} (mod.rs:480)
110 core                            0x55c926ea8e08      core::ops::function::FnOnce::call_once{{vtable.shim}} (function.rs:227)
111 alloc                           0x55c926f7fa6a      [inlined] alloc::boxed::Box<T>::call_once (boxed.rs:1546)
112 alloc                           0x55c926f7fa6a      [inlined] alloc::boxed::Box<T>::call_once (boxed.rs:1546)
113 std                             0x55c926f7fa6a      std::sys::unix::thread::Thread::new::thread_start (thread.rs:71)
114 <unknown>                       0x7ff22e5a66db      start_thread
115 <unknown>                       0x7ff22d90da3f      __clone
116 <unknown>                       0x0                 <unknown>

Here's the list of upgraded depencies:

    Updating addr2line v0.15.2 -> v0.16.0
    Updating backtrace v0.3.60 -> v0.3.61
    Updating const_format v0.2.14 -> v0.2.15
    Updating enum-iterator v0.6.0 -> v0.7.0
    Updating enum-iterator-derive v0.6.0 -> v0.7.0
    Updating futures v0.3.15 -> v0.3.16
    Updating futures-channel v0.3.15 -> v0.3.16
    Updating futures-core v0.3.15 -> v0.3.16
    Updating futures-executor v0.3.15 -> v0.3.16
    Updating futures-io v0.3.15 -> v0.3.16
    Updating futures-macro v0.3.15 -> v0.3.16
    Updating futures-sink v0.3.15 -> v0.3.16
    Updating futures-task v0.3.15 -> v0.3.16
    Updating futures-util v0.3.15 -> v0.3.16
    Updating gimli v0.24.0 -> v0.25.0
    Updating object v0.25.3 -> v0.26.0
    Updating ordered-float v2.6.0 -> v2.7.0
    Updating pin-project v1.0.7 -> v1.0.8
    Updating pin-project-internal v1.0.7 -> v1.0.8
    Updating proc-macro2 v1.0.27 -> v1.0.28
    Updating sha-1 v0.9.6 -> v0.9.7
    Updating siphasher v0.3.5 -> v0.3.6
    Updating syn v1.0.73 -> v1.0.74
    Updating tinyvec v1.2.0 -> v1.3.1
    Updating tokio v1.8.2 -> v1.9.0
@hawkw
Copy link
Member

hawkw commented Aug 1, 2021

Thanks for the report!

Interestingly, I notice that none of the tracing-related dependencies changed in the update. My suspicion is that something that changed in Tokio 1.9 is breaking sharded-slab's thread ID code, but i'm not sure how OTTOMH. Will have to do some digging...

@hawkw hawkw self-assigned this Aug 1, 2021
@hawkw hawkw added kind/bug Something isn't working crate/subscriber Related to the `tracing-subscriber` crate labels Aug 1, 2021
@hawkw
Copy link
Member

hawkw commented Aug 1, 2021

@jeromegn a couple questions: is your code spawning a very large number of threads? are you using Tokio's blocking pool?

@hawkw
Copy link
Member

hawkw commented Aug 1, 2021

the panic at that location suggests that you have 4134 threads currently active (or that thread IDs are not being correctly reused?)

@hawkw
Copy link
Member

hawkw commented Aug 1, 2021

If you can, try cargo update -p sharded-slab (you should get version 0.1.2) and let me know what panic you get? It will still panic but it may have more useful information in the panic message.

@hawkw
Copy link
Member

hawkw commented Aug 2, 2021

Per a Discord conversation, it seems like @jeromegn's app spawns a lot of threads (separate Tokio runtimes).

Normally, the app doesn't have more than 200 threads concurrently alive, which shouldn't hit the 4096 thread limit in the sharded-slab config, since the IDs assigned to terminated threads are recycled. I thought this might be a bug in the thread-ID recycling code in sharded-slab. However, after talking to Jerome, it seems like something else (possibly the Tokio upgrade) may be causing the app to start leaking unused threads.

Right now, it seems like the best solution here is:

  • figure out what's causing the thread leak in the application
  • consider giving each Tokio runtime its own tracing subscriber, rather than sharing a single global subscriber.

I do think sharded-slab could handle this better. We could potentially reserve the last thread ID (in this case, 4096) for a special shard that allows concurrent mutation by multiple threads, with a mutex, and use that as a fallback when the thread ID cap is reached. That way, rather than panicking, we'd degrade to contending a mutex for threads over the max, which might be a better solution. However, that would be a bit of work to implement...but I'll open a ticket for it upstream.

I'm closing this issue since there's not really anything to fix in tracing.

@arssher
Copy link

arssher commented Dec 5, 2022

consider giving each Tokio runtime its own tracing subscriber, rather than sharing a single global subscriber.

FWIW that won't work as all threads get thread ids from single REGISTRY in sharded-slab's tid.rs, even if they use different tracing subscribers. So currently tracing-subscriber is basically unusable for apps with a lot of threads.

arssher added a commit to neondatabase/tracing that referenced this issue Dec 19, 2022
Adds CollectionBuilder with_slab_config::<S>() func to this end. This e.g.
allows to use tracing in environment with >= 4k threads.

c.f. tokio-rs#1485
arssher added a commit to neondatabase/tracing that referenced this issue Dec 29, 2022
Adds CollectionBuilder with_slab_config::<S>() func to this end. This e.g.
allows to use tracing in environment with >= 4k threads.

c.f. tokio-rs#1485
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
crate/subscriber Related to the `tracing-subscriber` crate kind/bug Something isn't working
Projects
None yet
Development

No branches or pull requests

3 participants