Tokio Util LongRunningTaskDetector. (https://github.com/tokio-rs/console/issues/150) #6256

Closed
wants to merge 26 commits

Conversation


@zolyfarkas zolyfarkas commented Dec 30, 2023

Motivation

The motivation for this pull request is to provide a solution for the problem described at: tokio-rs/console#150

Blocking calls in the async workers lead to scalability issues. Being able to detect them, so that the offending code can be fixed, is very important for an async-io service.

Solution

LongRunningTaskDetector uses a sampling, black-box approach to detect when processing in the tokio runtime is blocked for more than a configurable amount of time.

When blocking is detected, the stack traces of the runtime worker threads can be dumped (the action is pluggable), allowing us to identify what is happening in the worker threads.

On detection, the details can contain precise information about the blocking culprit and will look like:

Detected worker blocking, signaling SIGUSR1 worker threads: [123145546047488]
Stack trace for thread "test_blocking_detection_current":123145546047488
   0: std::backtrace_rs::backtrace::libunwind::trace
             at /rustc/a28077b28a02b92985b3a3faecf92813155f1ea1/library/std/src/../../backtrace/src/backtrace/libunwind.rs:93:5
   1: std::backtrace_rs::backtrace::trace_unsynchronized
             at /rustc/a28077b28a02b92985b3a3faecf92813155f1ea1/library/std/src/../../backtrace/src/backtrace/mod.rs:66:5
   2: std::backtrace::Backtrace::create
             at /rustc/a28077b28a02b92985b3a3faecf92813155f1ea1/library/std/src/backtrace.rs:331:13
   3: std::backtrace::Backtrace::force_capture
             at /rustc/a28077b28a02b92985b3a3faecf92813155f1ea1/library/std/src/backtrace.rs:313:9
   4: tokio_util::lrtd::lrtd::signal_handler
             at ./src/lrtd/lrtd.rs:221:21
   5: __sigtramp
   6: ___semwait_signal
   7: <unknown>
   8: std::sys::unix::thread::Thread::sleep
             at /rustc/a28077b28a02b92985b3a3faecf92813155f1ea1/library/std/src/sys/unix/thread.rs:241:20
   9: std::thread::sleep
             at /rustc/a28077b28a02b92985b3a3faecf92813155f1ea1/library/std/src/thread/mod.rs:872:5
  10: lrtd::run_blocking_stuff::{{closure}}
             at ./tests/lrtd.rs:12:5
  11: lrtd::test_blocking_detection_current::{{closure}}
             at ./tests/lrtd.rs:57:30
  12: <core::pin::Pin<P> as core::future::future::Future>::poll
             at /rustc/a28077b28a02b92985b3a3faecf92813155f1ea1/library/core/src/future/future.rs:125:9
  13: tokio::runtime::scheduler::current_thread::CoreGuard::block_on::{{closure}}::{{closure}}::{{closure}}
....

This is the output from the unit tests (run_blocking_stuff).

Not sure I put this in the right place, and there is likely some polishing left to do, but this should be good enough for some initial feedback.
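
To make the black-box sampling idea concrete, here is a minimal, hypothetical sketch (not the lrtd API from this PR; `spawn_detector`, the timings, and the callback are made up for illustration). It periodically submits a trivial probe task to the runtime and treats a missed deadline as evidence that every worker is blocked:

```rust
use std::sync::mpsc;
use std::time::Duration;

/// Spawn a watchdog thread that probes `handle` every `interval`.
/// If the probe does not run within `threshold`, `on_blocked` is invoked.
fn spawn_detector(
    handle: tokio::runtime::Handle,
    interval: Duration,
    threshold: Duration,
    on_blocked: impl Fn() + Send + 'static,
) -> std::thread::JoinHandle<()> {
    std::thread::spawn(move || loop {
        let (tx, rx) = mpsc::channel();
        // A trivial "probe" task: if every worker is stuck in a blocking
        // call, this task is never polled and the send never happens.
        handle.spawn(async move {
            let _ = tx.send(());
        });
        if rx.recv_timeout(threshold).is_err() {
            // The runtime did not get to the probe in time, so the workers
            // are (very likely) blocked; run the pluggable action, e.g.
            // signal the worker threads to dump their stacks.
            on_blocked();
        }
        std::thread::sleep(interval);
    })
}
```

A real detector would also randomize the probe timing and support clean shutdown; the sketch only shows the shape of the detection loop.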

@Darksonn Darksonn added the A-tokio-util (Area: The tokio-util crate) and M-taskdump (--cfg tokio_taskdump) labels on Dec 30, 2023
@Darksonn
Contributor

Before you implement a significant new feature, it is really important that you talk to us first. This is crucial to ensure that you don't waste a lot of time on something that can't be merged.

The proposal here appears to have significant overlap with the taskdump proposal. However, it is special-cased to also include a detection mechanism for blocking tasks, and to trigger the task dump only when that detection triggers. I think that for something like this, it makes more sense to separate the task dump functionality from the detection of blocking tasks, so that the task dumps can be used in other cases as well.

The specific way that this is implemented will not work, for the reasons that @sfackler points out. When @jswrenn investigated the different ways that task dumps can be implemented, he outlined one approach that is similar to this; however, for the reasons outlined in that document, we used a different implementation in the end. That said, I think it makes sense to add something similar to this in the future to handle "very stuck tasks", as is necessary for getting stack traces into tasks that are blocking the thread like you are doing here. Though, I'm not sure if we are ready to do that right now. At least, there are other improvements in the taskdump area that would be higher priority to me.

You may want to open a thread on our discord channel, or a new issue, for discussion of the best way forward here.

@zolyfarkas
Author

Thanks for your feedback, and for the extra info.

This is a "RFC PR" of some fun I had on a flight today. Even if this will not be merged, I would not consider it a waste of time :-).

I agree with separating the thread stack dump from the detection mechanism. I was not able to quickly find a good way to dump stack traces in Rust for a bunch of threads, but if anyone can point me to a solution I would be happy to update the implementation (or merge a PR against this PR).

@Darksonn
Contributor

This is the proposal that @jswrenn suggested for that:

Worker Pausing

Finally, I will implement the ability to trace currently-polling tasks. To do this, I will introduce a mechanism for pausing Tokio's worker threads, Handle::in_pause, that consumes a closure and runs it while Tokio's worker threads are paused. I have thought of two possible mechanisms for achieving this.

With ptrace

Linux's ptrace API provides a simple and reliable means for pausing a process's threads.

Upon invoking Handle::in_pause, Tokio will spawn-and-join a thread that:

  1. locks the Shared structure at handle.blocking_spawner.inner.shared
  2. forks the process
    • child:
      • uses ptrace to pause every PID in shared.worker_threads
      • notifies parent that all threads are paused
      • waits to receive notification from parent that tracing is complete
      • terminates
    • parent:
      • waits for child's notification that all workers are paused
      • traces the internal state of all workers
      • notifies the child that tracing is complete
      • waits for the child to terminate
      • returns the trace

Special configuration during deployment is required for the ptrace API to be available. You must do either of the following:

  • globally allow applications to ptrace their parents by writing 1 to /proc/sys/kernel/yama/ptrace_scope
  • set the CAP_SYS_PTRACE capability on your application's binary, to allow it to ptrace arbitrary processes

A prototype implementation of this approach is viewable here: commit b999c71
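
As a rough illustration of the fork-and-ptrace flow above (a sketch only, assuming the `nix` crate, whose API differs slightly between versions; `pause_workers_and_trace` and the worker-TID bookkeeping are hypothetical, and the parent/child handshake is reduced to comments):

```rust
use nix::sys::ptrace;
use nix::sys::signal::Signal;
use nix::sys::wait::{waitpid, WaitPidFlag};
use nix::unistd::{fork, ForkResult, Pid};

/// Fork a helper process that stops every worker thread with ptrace while
/// the parent captures a trace of the runtime.
fn pause_workers_and_trace(worker_tids: &[i32], trace_runtime: impl FnOnce()) {
    match unsafe { fork() }.expect("fork failed") {
        ForkResult::Child => {
            // Child: stop every worker thread of the parent process.
            for &tid in worker_tids {
                let tid = Pid::from_raw(tid);
                ptrace::attach(tid).expect("ptrace attach failed");
                // Worker threads are clone tasks, so wait with __WALL for the stop.
                let _ = waitpid(tid, Some(WaitPidFlag::__WALL));
            }
            // ... notify the parent that all workers are paused, then wait
            // for the parent to report that tracing is complete (elided) ...
            for &tid in worker_tids {
                let _ = ptrace::detach(Pid::from_raw(tid), None::<Signal>);
            }
            std::process::exit(0);
        }
        ForkResult::Parent { child } => {
            // Parent: wait for the child's "all paused" notification (elided),
            // capture the trace while the workers are stopped, tell the child
            // tracing is done (elided), then reap it.
            trace_runtime();
            let _ = waitpid(child, None);
        }
    }
}
```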

With Signal Interrupts

Alternatively, we can pause Tokio's workers by registering a signal handler that blocks until tracing is complete. Upon invoking Handle::in_pause, Tokio will spawn-and-join a thread that sends some particular signal to each of Tokio's worker threads, traces the runtime, then notifies the signal handlers to unblock and return.

The primary downside of this approach is the risk of choosing a signal kind that's already handled by the user's application. Special care will need to be taken to avoid deadlocks, too.

This was taken from the latest edit on Feb 15 of the top-level post in #5457. (It has since been edited.)
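
As an equally rough sketch of the signal-interrupt variant (using the `libc` crate; `pause_workers_with_signal`, the choice of SIGUSR1, and the spin-wait are illustrative only, and a real handler must limit itself to async-signal-safe operations):

```rust
use std::sync::atomic::{AtomicBool, Ordering};

static TRACING_DONE: AtomicBool = AtomicBool::new(true);

extern "C" fn pause_handler(_sig: libc::c_int) {
    // Park the interrupted worker until tracing is finished. A real
    // implementation would sleep rather than spin, and must stay
    // async-signal-safe.
    while !TRACING_DONE.load(Ordering::Acquire) {
        std::hint::spin_loop();
    }
}

/// `workers` holds the pthread_t of each runtime worker, recorded at spawn.
fn pause_workers_with_signal(workers: &[libc::pthread_t], trace_runtime: impl FnOnce()) {
    unsafe {
        libc::signal(
            libc::SIGUSR1,
            pause_handler as extern "C" fn(libc::c_int) as libc::sighandler_t,
        );
    }
    TRACING_DONE.store(false, Ordering::Release);
    for &worker in workers {
        // Interrupt each worker; it will block in `pause_handler`.
        unsafe { libc::pthread_kill(worker, libc::SIGUSR1) };
    }
    trace_runtime();
    // Release the workers.
    TRACING_DONE.store(true, Ordering::Release);
}
```

This sketch also shows the caveats from the quoted proposal: SIGUSR1 may already be in use by the application, and the handshake needs care to avoid deadlocks (here the pausing thread does not even wait for the workers to actually reach the handler before tracing).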

Contributor

@Darksonn Darksonn left a comment


If you want to move forward with the taskdump component of this, then I think that the best way to do that would be to modify the taskdumps in the main Tokio crate to have special handling for the case where there is a very stuck task that isn't returning from poll. In that case, it would use signals or ptrace to get a stacktrace into that task, instead of the usual mechanism for tasks that are well-behaved.

@Darksonn
Contributor

Darksonn commented Jan 1, 2024

It looks like you moved the long-running task detector into the Tokio crate, and removed the code for capturing taskdumps of blocking tasks.

My previous comment was mostly referring to taskdumps of blocking tasks. I don't think the long-running task detector is a good fit for the main Tokio crate. We generally try to keep debugging features outside of Tokio when that is possible, and in this case, that is possible. Perhaps it would be a better fit for tokio-metrics, or an entirely new crate instead?

@zolyfarkas
Author

My previous comment was mostly referring to taskdumps of blocking tasks.

I am still experimenting with the task dump code, but based on what I am seeing I am not sure it is practical for LRTD: LRTD does detect "very stuck tasks" (and it is important that it does), and from what I can tell that is not something the task dumps support.

I think a "traditional stack trace" obtained via a productionized version of what I use in one of the unit tests is the way to go. (This can happen at a later stage)

We generally try to keep debugging features outside of Tokio when that is possible, and in this case, that is possible. Perhaps it would be a better fit for tokio-metrics, or an entirely new crate instead?

I agree: lrtd is an independent component, and I can easily move it to a separate crate; it does not really fit into a "metrics" crate.
One minor issue is that I will not be able to use FastRand then, and will likely have to copy something like FastRand instead of just referencing it.

@Darksonn
Contributor

Darksonn commented Jan 1, 2024

Right, we do not support detection of very stuck tasks right now, and as the article points out, it isn't as critical since you can debug them with gdb. But I do think that it would be an interesting feature to have (with one of the implementations outlined in #6256 (comment)). If nothing else, we probably need to do something more intelligent than just hanging the task dump forever like we do today.

One minor issue is that I will not be able to use FastRand then, and I will likely have to copy something like FastRand instead of just referencing it.

If it's not in the main Tokio crate, then I don't object to the use of the rand crate. Or, of course, you can just copy its implementation.

@zolyfarkas
Author

@Darksonn I am thinking the best location for now is tokio-util, as an lrtd feature. Not sure it is worth creating a separate crate for this... What do you think?

The implementation allows a custom handler to be invoked on detection, so any future utility/approach can be easily used.

@Darksonn
Contributor

Darksonn commented Jan 1, 2024

I think that the reality is that none of our crates have anything similar to your lrtd utility, and there's no great place to put it. I don't think tokio-util is a good fit. It's a crate with utilities for writing applications, not debugging tools.

Your original motivation was to fix tokio-rs/console#150. Perhaps it makes more sense to pursue a lint in tokio-console? A lint in that tool would be much less "ad hoc" and more easily discoverable, since anyone using tokio-console would get it automatically. Furthermore, tokio-console is already my standard recommendation for debugging blocking tasks.

Alternatively, the tokio-metrics crate provides something called task metrics, and it would be useful to have a task metric for whether a specific task is blocking. Of course, I acknowledge that this would require an entirely different implementation, and that you have already spent time on the current one ... but something using a different implementation based on task metrics would be a good fit for tokio-metrics.

As it stands right now, I don't think I can accept the current implementation strategy for detecting long running tasks anywhere in Tokio. If you want to publish it, you will need to make your own crate for it. Therefore, I will close this PR.

If you want to pursue an implementation of capturing taskdumps for blocking tasks, then please create a new PR for that. An implementation of that is welcome and I would be happy to review it.

@Darksonn Darksonn closed this Jan 1, 2024
@zolyfarkas
Author

zolyfarkas commented Jan 2, 2024

It's a crate with utilities for writing applications, not debugging tools.

I will provide some extra context here on how we have been using LRTD (for Python and C++) for the past few years. We use it continuously in production, and one of the actions (on top of emitting debug detail) one might take on blocking detection is to report the node as unhealthy so that the container can be recycled sooner. The emitted stack trace details enable owners to quickly identify the issue and push out a fix, further improving the scalability and responsiveness of their services. The blocking event counts are also used to test the health of the canary tier, etc. (I see your point about tokio-metrics here.)

LRTD is designed to have very small (configurable) overhead so that it can be safely run in production; it is a little bit more than a debug tool.

@Darksonn Right now I am thinking tokio-metrics, let me know what you think...

zolyfarkas added a commit to zolyfarkas/tokio-metrics that referenced this pull request Jan 2, 2024
@zolyfarkas
Author

Will continue this in: tokio-rs/tokio-metrics#57

@carllerche
Member

Thanks for the proof of concept. I do think solving the "detect blocking tasks" problem would be a nice feature to get.

I looked through this, and if I understand correctly, it will detect if all runtime threads are 100% blocked, but it won't if only a single thread is blocked. It also doesn't detect tasks that "block" for short periods (say 100ms). Ideally, we will develop a strategy that handles both of these cases, and I believe it will require work in the Tokio runtime itself.

At a high level, an epoch-based strategy will work best. Each worker thread has a monotonically increasing number that represents its epoch. The worker thread will increment its epoch every so often [1]. There will then be a "monitoring" thread that periodically checks that each worker's epoch increases regularly. This strategy would at least detect, with minimal overhead, that there are tasks with slow poll times; identifying the exact ones that are slow would require some additional work. (A rough sketch of this idea follows after the footnote.)

However, your utility to detect a blocked runtime is generally useful. It would be better to release it as your own crate instead of pulling it into tokio-util or tokio-metrics, as we have limited maintenance bandwidth and want a more comprehensive solution in the long term. If you'd like to work on the runtime to detect slow polls, please try pinging me on Discord, and we can talk more about that.

Footnotes

  1. probably some combination of every N polled tasks & checking elapsed duration.
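
For illustration, here is a minimal, hypothetical sketch of the epoch scheme described above (not a Tokio API; `WorkerEpochs`, `tick`, and the monitor loop are made up, and idle or parked workers would need to be exempted so they are not reported as blocked):

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;
use std::time::Duration;

/// One counter per worker, bumped by that worker as it makes progress.
struct WorkerEpochs {
    epochs: Vec<AtomicU64>,
}

impl WorkerEpochs {
    /// Called by worker `worker` from its run loop, e.g. every N polled
    /// tasks and/or after enough elapsed time (footnote 1 above).
    fn tick(&self, worker: usize) {
        self.epochs[worker].fetch_add(1, Ordering::Relaxed);
    }
}

/// Monitoring thread: if a worker's epoch has not advanced between two
/// checks, that worker is likely stuck in a long poll or a blocking call.
fn spawn_monitor(epochs: Arc<WorkerEpochs>, check_every: Duration) {
    std::thread::spawn(move || {
        let mut last_seen: Vec<u64> = epochs
            .epochs
            .iter()
            .map(|e| e.load(Ordering::Relaxed))
            .collect();
        loop {
            std::thread::sleep(check_every);
            for (worker, epoch) in epochs.epochs.iter().enumerate() {
                let now = epoch.load(Ordering::Relaxed);
                if now == last_seen[worker] {
                    eprintln!("worker {} appears blocked", worker);
                }
                last_seen[worker] = now;
            }
        }
    });
}
```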
