Enrich the structure of traces emitted by Tokio #5792

hds · 2023-06-13T12:34:55Z

Is your feature request related to a problem? Please describe.
The traces produced by Tokio are great, however there are some areas in which the hierarchy in which the traces are emitted could be improved or doesn't make sense.

Specific issues are:

No division by runtime (it isn't possible to distinguish which tasks belong to which runtime when there are multiple runtimes)
- This would enable new Console features like First class notion of runtimes console#130, warnings: indication that there are tasks on the runtime but the runtime itself is not running console#229
- Would be nice to have externally available runtime ID and name too (Runtime identifier and name #5545).
There is no information about runtimes at all
- how many runtimes are there?
- what configuration do they have?
Tasks spawned within the context of another task take that task as a parent
- This causes some issues because the parent task's span won't close until the child hasn't, which isn't what is actually happening in Tokio (issues in Console: completed task shows warning for having lost it's waker console#345, Spans unexpectedly remain open within spawned tasks console#412)
Work stealing isn't visible (there is no concept of workers in the traces)

Describe the solution you'd like

I would like to add some additional traces to Tokio and modify others.

The proposed hierarchy is the following. Sub-items have their span's parent set. Values in parentheses are related fields which would be set. Note that all spans have a span ID implicitly, these are used for internal referencing.

Runtime [0] (kind=current_thread, runtime.id=1, runtime.name="my runtime")
- Worker [0] (name=my-worker-0)
- Worker [1] …
- Blocking thread [0]
- Blocking thread [1] …
- Task [0] (task.id=1)
  - Async Op [1] (resource.span.id=0, worker.span.id=1)
- Task [1] (follows_from: Task [0], task.id=2)
- Task [2] (task.id=3)
  - Async Op [5] (resource.id=0, worker.id=0)
Runtime [1] …
Resource [0] (creator.task.span.id=4, creator.runtime.span.id=0)
Resource [1] ...

Further explanation of the structure:

Runtimes are top level (root) spans.
Workers belong to runtimes
Blocking threads belong to runtimes
1. If a blocking thread is destroyed and another is created, it will have a different span (even if the id is the same)
Tasks belong to runtimes
1. Tasks don’t have parent tasks, but they follow_from the task that spawned them (if any)
2. This is a breaking change
Resources are top level (root) spans
1. They may be shared between runtimes
2. They reference the runtime (if any) and task (if any) that created the resource
Async ops belong to tasks (the task that executes the async op)
1. They reference the resource that the async op belongs to
2. They reference the worker id where they are executed

Ideally all these changes, or at least all the breaking ones would make it into the same Tokio release. Preferably with the addition of spans for Runtimes. This would allow Tokio Console to determine the version tokio tracing variant being used and read the traces accordingly.

Describe alternatives you've considered

Parts of this hierarchy can be implemented without the rest. But it makes sense to discuss the entirety.

Additional context

This is mostly considering being consumed by Tokio Console. However it would be interesting to hear from anyone who is using tokio traces in other ways.

The text was updated successfully, but these errors were encountered:

Darksonn · 2023-06-13T14:38:06Z

This sounds reasonable to me. You should probably coordinate with @carllerche's scheduler refactor, which may change the same parts of the codebase.

There are a number of cases in which being able to identify a runtime is useful. When instrumenting an application, this is particularly true. For example, we would like to be able to add traces for runtimes so that tasks can be differentiated (#5792). It would also allow a way to differentiate runtimes which are have their tasks dumped. Outside of instrumentation, it may be useful to check whether 2 runtime handles are pointing to the same runtime. This change adds an opaque `runtime::Id` struct which serves this purpose, initially behind the `tokio_unstable` cfg flag. It follows the same pattern as the `task::Id` struct. The Id can be compared for equality with another `runtime::Id` and implements `Debug` and `Display` so that it can be output as well. Internally the Id is a `u64`, but that is an implementation detail. There is a degree of code duplication, but that is necessary to ensure that the Ids are not used to compare against one another. The Id is added within the scope of working towards closing #5545.

There are a number of cases in which being able to identify a runtime is useful. When instrumenting an application, this is particularly true. For example, we would like to be able to add traces for runtimes so that tasks can be differentiated (#5792). It would also allow a way to differentiate runtimes which are have their tasks dumped. Outside of instrumentation, it may be useful to check whether 2 runtime handles are pointing to the same runtime. This change adds an opaque `runtime::Id` struct which serves this purpose, initially behind the `tokio_unstable` cfg flag. The inner value of the ID is taken from the `OwnedTasks` or `LocalOwnedTasks` struct which every runtime and local set already has. This will mean that any use of the ID will align with the task dump feature. The ID is added within the scope of working towards closing #5545.

In the Tokio instrumentation, a tracing span is created for each task which is spawned. Since the new span is created within the context of where `tokio::spawn()` (or similar) is called from, it gets a contextual parent attached. In tracing, when a span has a child span (either because the child was created in the context of the parent, or because the parent was set explicitly) then that span will not be closed until the child has closed. The result in the console subscriber is that a task which spawns another task won't have a `dropped_at` time set until the spawned task exits, even if the parent task exits much earlier. This causes Tokio Console to show an incorrect lost waker warning (#345). It also affects other spans that are entered when a task is spawned (#412). The solution is to modify the instrumentation in Tokio so that task spans are explicit roots (`parent: None`). This will be done as part of enriching the Tokio instrumentation (tokio-rs/tokio#5792). This change adds functionality to the test framework within `console-subscriber` so that the state of a task can be set as an expectation. The state is calculated based on 4 values: * `console_api::tasks::Stats::dropped_at` * `console_api::tasks::Stats::last_wake` * `console_api::PollStats::last_poll_started` * `console_api::PollStats::last_poll_ended` It can then be tested that a task that spawns another task and then ends actually goes to the `Completed` state, even if the spawned task is still running. As of Tokio 1.33.0, this test fails, but the PR FIXME:TBD fixes this and the test should pass from Tokio 1.34 onwards.

In Tokio, tasks are optionally instrumented with tracing spans to allow analysis of the runtime behavior to be performed with tools like tokio-console. The span that is created for each task gets currently follows the default tracing behavior and has a contextual parent attached to it based on the span that is actual when `tokio::spawn` or similar is called. However, in tracing, a span will remain "alive" until all its children spans are closed. This doesn't match how spawned tasks work. A task may outlive the context in which is was spawned (and frequently does). This causes tasks which spawn other - longer living - tasks to appear in `tokio-console` as having lost their waker when instead they should be shown as completed (tokio-rs/console#345). It can also cause undesired behavior for unrelated tracing spans if a subscriber is receiving both the other spans as well as Tokio's instrumentation. To fix this mismatch in behavior, the task span has `parent: None` set on it, making it an explicit root - it has no parent. The same was already done for all spans representing resources in #6107. This change is made within the scope of #5792. Due to a defect in the currently available `tracing-mock` crate, it is not possible to test this change at a tracing level (tokio-rs/tracing#2440). Instead, a test for the `console-subscriber` has been written which shows that this change fixes the defect as observed in `tokio-console` (tokio-rs/console#490).

In the Tokio instrumentation, a tracing span is created for each task which is spawned. Since the new span is created within the context of where `tokio::spawn()` (or similar) is called from, it gets a contextual parent attached. In tracing, when a span has a child span (either because the child was created in the context of the parent, or because the parent was set explicitly) then that span will not be closed until the child has closed. The result in the console subscriber is that a task which spawns another task won't have a `dropped_at` time set until the spawned task exits, even if the parent task exits much earlier. This causes Tokio Console to show an incorrect lost waker warning (#345). It also affects other spans that are entered when a task is spawned (#412). The solution is to modify the instrumentation in Tokio so that task spans are explicit roots (`parent: None`). This will be done as part of enriching the Tokio instrumentation (tokio-rs/tokio#5792). This change adds functionality to the test framework within `console-subscriber` so that the state of a task can be set as an expectation. The state is calculated based on 4 values: * `console_api::tasks::Stats::dropped_at` * `console_api::tasks::Stats::last_wake` * `console_api::PollStats::last_poll_started` * `console_api::PollStats::last_poll_ended` It can then be tested that a task that spawns another task and then ends actually goes to the `Completed` state, even if the spawned task is still running. As of Tokio 1.34.0, this test fails, but the PR tokio-rs/tokio#XXXX fixes this and the test should pass from Tokio 1.35 onwards.

In the Tokio instrumentation, a tracing span is created for each task which is spawned. Since the new span is created within the context of where `tokio::spawn()` (or similar) is called from, it gets a contextual parent attached. In tracing, when a span has a child span (either because the child was created in the context of the parent, or because the parent was set explicitly) then that span will not be closed until the child has closed. The result in the console subscriber is that a task which spawns another task won't have a `dropped_at` time set until the spawned task exits, even if the parent task exits much earlier. This causes Tokio Console to show an incorrect lost waker warning (#345). It also affects other spans that are entered when a task is spawned (#412). The solution is to modify the instrumentation in Tokio so that task spans are explicit roots (`parent: None`). This will be done as part of enriching the Tokio instrumentation (tokio-rs/tokio#5792). This change adds functionality to the test framework within `console-subscriber` so that the state of a task can be set as an expectation. The state is calculated based on 4 values: * `console_api::tasks::Stats::dropped_at` * `console_api::tasks::Stats::last_wake` * `console_api::PollStats::last_poll_started` * `console_api::PollStats::last_poll_ended` It can then be tested that a task that spawns another task and then ends actually goes to the `Completed` state, even if the spawned task is still running. As of Tokio 1.34.0, this test fails, but the PR tokio-rs/tokio#6158 fixes this and the test should pass from Tokio 1.35 onwards.

In Tokio, tasks are optionally instrumented with tracing spans to allow analysis of the runtime behavior to be performed with tools like tokio-console. The span that is created for each task gets currently follows the default tracing behavior and has a contextual parent attached to it based on the span that is actual when `tokio::spawn` or similar is called. However, in tracing, a span will remain "alive" until all its children spans are closed. This doesn't match how spawned tasks work. A task may outlive the context in which is was spawned (and frequently does). This causes tasks which spawn other - longer living - tasks to appear in `tokio-console` as having lost their waker when instead they should be shown as completed (tokio-rs/console#345). It can also cause undesired behavior for unrelated tracing spans if a subscriber is receiving both the other spans as well as Tokio's instrumentation. To fix this mismatch in behavior, the task span has `parent: None` set on it, making it an explicit root - it has no parent. The same was already done for all spans representing resources in #6107. This change is made within the scope of #5792. Due to a defect in the currently available `tracing-mock` crate, it is not possible to test this change at a tracing level (tokio-rs/tracing#2440). Instead, a test for the `console-subscriber` has been written which shows that this change fixes the defect as observed in `tokio-console` (tokio-rs/console#490).

qRoC · 2024-02-20T14:32:58Z

A task may outlive the context in which is was spawned (and frequently does).

@hds What about block_on?

This runs the given future on the current thread, blocking until it is complete

Looks like him span should have parent.

hds · 2024-02-20T14:36:31Z

A task may outlive the context in which is was spawned (and frequently does).

@hds What about block_on?

This runs the given future on the current thread, blocking until it is complete

Looks like him span should have parent.

This is something we should consider. We've taken it out for now, but a subscriber could still get the context for that span and store it if it needed to.

hds added A-tokio Area: The main tokio crate C-feature-request Category: A feature request. labels Jun 13, 2023

hawkw self-assigned this Jun 13, 2023

Darksonn added the M-tracing Tracing support in Tokio label Jun 13, 2023

hds mentioned this issue Jul 3, 2023

fix(subscriber): do not report excessive polling (#378) tokio-rs/console#440

Merged

hds mentioned this issue Jul 13, 2023

rt: add runtime Id #5864

Merged

davidbarsky mentioned this issue Oct 25, 2023

sync: avoid creating mutex and rwlock resource_span with parent, use None parent instead #6107

Merged

hds mentioned this issue Nov 17, 2023

test(subscriber): add test for tasks being kept open tokio-rs/console#490

Draft

hds mentioned this issue Nov 18, 2023

task: make task span explicit root #6158

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Enrich the structure of traces emitted by Tokio #5792

Enrich the structure of traces emitted by Tokio #5792

hds commented Jun 13, 2023

Darksonn commented Jun 13, 2023

qRoC commented Feb 20, 2024 •

edited

Loading

hds commented Feb 20, 2024

Enrich the structure of traces emitted by Tokio #5792

Enrich the structure of traces emitted by Tokio #5792

Comments

hds commented Jun 13, 2023

Darksonn commented Jun 13, 2023

qRoC commented Feb 20, 2024 • edited Loading

hds commented Feb 20, 2024

qRoC commented Feb 20, 2024 •

edited

Loading