Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enrich the structure of traces emitted by Tokio #5792

Open
hds opened this issue Jun 13, 2023 · 3 comments
Open

Enrich the structure of traces emitted by Tokio #5792

hds opened this issue Jun 13, 2023 · 3 comments
Assignees
Labels
A-tokio Area: The main tokio crate C-feature-request Category: A feature request. M-tracing Tracing support in Tokio

Comments

@hds
Copy link
Contributor

hds commented Jun 13, 2023

Is your feature request related to a problem? Please describe.
The traces produced by Tokio are great, however there are some areas in which the hierarchy in which the traces are emitted could be improved or doesn't make sense.

Specific issues are:

Describe the solution you'd like

I would like to add some additional traces to Tokio and modify others.

The proposed hierarchy is the following. Sub-items have their span's parent set. Values in parentheses are related fields which would be set. Note that all spans have a span ID implicitly, these are used for internal referencing.

  • Runtime [0] (kind=current_thread, runtime.id=1, runtime.name="my runtime")
    • Worker [0] (name=my-worker-0)
    • Worker [1] …
    • Blocking thread [0]
    • Blocking thread [1] …
    • Task [0] (task.id=1)
      • Async Op [1] (resource.span.id=0, worker.span.id=1)
    • Task [1] (follows_from: Task [0], task.id=2)
    • Task [2] (task.id=3)
      • Async Op [5] (resource.id=0, worker.id=0)
  • Runtime [1] …
  • Resource [0] (creator.task.span.id=4, creator.runtime.span.id=0)
  • Resource [1] ...

Further explanation of the structure:

  1. Runtimes are top level (root) spans.
  2. Workers belong to runtimes
  3. Blocking threads belong to runtimes
    1. If a blocking thread is destroyed and another is created, it will have a different span (even if the id is the same)
  4. Tasks belong to runtimes
    1. Tasks don’t have parent tasks, but they follow_from the task that spawned them (if any)
    2. This is a breaking change
  5. Resources are top level (root) spans
    1. They may be shared between runtimes
    2. They reference the runtime (if any) and task (if any) that created the resource
  6. Async ops belong to tasks (the task that executes the async op)
    1. They reference the resource that the async op belongs to
    2. They reference the worker id where they are executed

Ideally all these changes, or at least all the breaking ones would make it into the same Tokio release. Preferably with the addition of spans for Runtimes. This would allow Tokio Console to determine the version tokio tracing variant being used and read the traces accordingly.

Describe alternatives you've considered

Parts of this hierarchy can be implemented without the rest. But it makes sense to discuss the entirety.

Additional context

This is mostly considering being consumed by Tokio Console. However it would be interesting to hear from anyone who is using tokio traces in other ways.

@hds hds added A-tokio Area: The main tokio crate C-feature-request Category: A feature request. labels Jun 13, 2023
@hawkw hawkw self-assigned this Jun 13, 2023
@Darksonn Darksonn added the M-tracing Tracing support in Tokio label Jun 13, 2023
@Darksonn
Copy link
Contributor

This sounds reasonable to me. You should probably coordinate with @carllerche's scheduler refactor, which may change the same parts of the codebase.

hds added a commit that referenced this issue Jul 13, 2023
There are a number of cases in which being able to identify a runtime is
useful.

When instrumenting an application, this is particularly true. For
example, we would like to be able to add traces for runtimes so that
tasks can be differentiated (#5792). It would also allow a way to
differentiate runtimes which are have their tasks dumped.

Outside of instrumentation, it may be useful to check whether 2 runtime
handles are pointing to the same runtime.

This change adds an opaque `runtime::Id` struct which serves this
purpose, initially behind the `tokio_unstable` cfg flag. It follows the
same pattern as the `task::Id` struct. The Id can be compared for
equality with another `runtime::Id` and implements `Debug` and `Display`
so that it can be output as well.

Internally the Id is a `u64`, but that is an implementation detail.

There is a degree of code duplication, but that is necessary to ensure
that the Ids are not used to compare against one another.

The Id is added within the scope of working towards closing #5545.
hds added a commit that referenced this issue Jul 13, 2023
There are a number of cases in which being able to identify a runtime is
useful.

When instrumenting an application, this is particularly true. For
example, we would like to be able to add traces for runtimes so that
tasks can be differentiated (#5792). It would also allow a way to
differentiate runtimes which are have their tasks dumped.

Outside of instrumentation, it may be useful to check whether 2 runtime
handles are pointing to the same runtime.

This change adds an opaque `runtime::Id` struct which serves this
purpose, initially behind the `tokio_unstable` cfg flag. It follows the
same pattern as the `task::Id` struct. The Id can be compared for
equality with another `runtime::Id` and implements `Debug` and `Display`
so that it can be output as well.

Internally the Id is a `u64`, but that is an implementation detail.

There is a degree of code duplication, but that is necessary to ensure
that the Ids are not used to compare against one another.

The Id is added within the scope of working towards closing #5545.
hds added a commit that referenced this issue Jul 13, 2023
There are a number of cases in which being able to identify a runtime is
useful.

When instrumenting an application, this is particularly true. For
example, we would like to be able to add traces for runtimes so that
tasks can be differentiated (#5792). It would also allow a way to
differentiate runtimes which are have their tasks dumped.

Outside of instrumentation, it may be useful to check whether 2 runtime
handles are pointing to the same runtime.

This change adds an opaque `runtime::Id` struct which serves this
purpose, initially behind the `tokio_unstable` cfg flag. It follows the
same pattern as the `task::Id` struct. The Id can be compared for
equality with another `runtime::Id` and implements `Debug` and `Display`
so that it can be output as well.

Internally the Id is a `u64`, but that is an implementation detail.

There is a degree of code duplication, but that is necessary to ensure
that the Ids are not used to compare against one another.

The Id is added within the scope of working towards closing #5545.
hds added a commit that referenced this issue Jul 19, 2023
There are a number of cases in which being able to identify a runtime is
useful.

When instrumenting an application, this is particularly true. For
example, we would like to be able to add traces for runtimes so that
tasks can be differentiated (#5792). It would also allow a way to
differentiate runtimes which are have their tasks dumped.

Outside of instrumentation, it may be useful to check whether 2 runtime
handles are pointing to the same runtime.

This change adds an opaque `runtime::Id` struct which serves this
purpose, initially behind the `tokio_unstable` cfg flag. 

The inner value of the ID is taken from the `OwnedTasks` or
`LocalOwnedTasks` struct which every runtime and local set already
has. This will mean that any use of the ID will align with the task
dump feature.

The ID is added within the scope of working towards closing #5545.
hds added a commit to tokio-rs/console that referenced this issue Nov 17, 2023
In the Tokio instrumentation, a tracing span is created for each task
which is spawned. Since the new span is created within the context of
where `tokio::spawn()` (or similar) is called from, it gets a contextual
parent attached.

In tracing, when a span has a child span (either because the child was
created in the context of the parent, or because the parent was set
explicitly) then that span will not be closed until the child has
closed.

The result in the console subscriber is that a task which spawns another
task won't have a `dropped_at` time set until the spawned task exits,
even if the parent task exits much earlier. This causes Tokio Console to
show an incorrect lost waker warning (#345). It also affects other spans
that are entered when a task is spawned (#412).

The solution is to modify the instrumentation in Tokio so that task
spans are explicit roots (`parent: None`). This will be done as part of
enriching the Tokio instrumentation (tokio-rs/tokio#5792).

This change adds functionality to the test framework within
`console-subscriber` so that the state of a task can be set as an
expectation. The state is calculated based on 4 values:
* `console_api::tasks::Stats::dropped_at`
* `console_api::tasks::Stats::last_wake`
* `console_api::PollStats::last_poll_started`
* `console_api::PollStats::last_poll_ended`

It can then be tested that a task that spawns another task and then ends
actually goes to the `Completed` state, even if the spawned task is
still running. As of Tokio 1.33.0, this test fails, but the PR FIXME:TBD
fixes this and the test should pass from Tokio 1.34 onwards.
hds added a commit that referenced this issue Nov 18, 2023
In Tokio, tasks are optionally instrumented with tracing spans to allow
analysis of the runtime behavior to be performed with tools like
tokio-console.

The span that is created for each task gets currently follows the
default tracing behavior and has a contextual parent attached to it
based on the span that is actual when `tokio::spawn` or similar is
called.

However, in tracing, a span will remain "alive" until all its children
spans are closed. This doesn't match how spawned tasks work. A task may
outlive the context in which is was spawned (and frequently does). This
causes tasks which spawn other - longer living - tasks to appear in
`tokio-console` as having lost their waker when instead they should be
shown as completed (tokio-rs/console#345). It can also cause undesired
behavior for unrelated tracing spans if a subscriber is receiving both
the other spans as well as Tokio's instrumentation.

To fix this mismatch in behavior, the task span has `parent: None` set
on it, making it an explicit root - it has no parent. The same was
already done for all spans representing resources in #6107. This change
is made within the scope of #5792.

Due to a defect in the currently available `tracing-mock` crate, it is
not possible to test this change at a tracing level
(tokio-rs/tracing#2440). Instead, a test for the `console-subscriber`
has been written which shows that this change fixes the defect as
observed in `tokio-console` (tokio-rs/console#490).
hds added a commit to tokio-rs/console that referenced this issue Nov 18, 2023
In the Tokio instrumentation, a tracing span is created for each task
which is spawned. Since the new span is created within the context of
where `tokio::spawn()` (or similar) is called from, it gets a contextual
parent attached.

In tracing, when a span has a child span (either because the child was
created in the context of the parent, or because the parent was set
explicitly) then that span will not be closed until the child has
closed.

The result in the console subscriber is that a task which spawns another
task won't have a `dropped_at` time set until the spawned task exits,
even if the parent task exits much earlier. This causes Tokio Console to
show an incorrect lost waker warning (#345). It also affects other spans
that are entered when a task is spawned (#412).

The solution is to modify the instrumentation in Tokio so that task
spans are explicit roots (`parent: None`). This will be done as part of
enriching the Tokio instrumentation (tokio-rs/tokio#5792).

This change adds functionality to the test framework within
`console-subscriber` so that the state of a task can be set as an
expectation. The state is calculated based on 4 values:
* `console_api::tasks::Stats::dropped_at`
* `console_api::tasks::Stats::last_wake`
* `console_api::PollStats::last_poll_started`
* `console_api::PollStats::last_poll_ended`

It can then be tested that a task that spawns another task and then ends
actually goes to the `Completed` state, even if the spawned task is
still running. As of Tokio 1.34.0, this test fails, but the PR
tokio-rs/tokio#XXXX fixes this and the test should pass from Tokio 1.35
onwards.
hds added a commit to tokio-rs/console that referenced this issue Nov 18, 2023
In the Tokio instrumentation, a tracing span is created for each task
which is spawned. Since the new span is created within the context of
where `tokio::spawn()` (or similar) is called from, it gets a contextual
parent attached.

In tracing, when a span has a child span (either because the child was
created in the context of the parent, or because the parent was set
explicitly) then that span will not be closed until the child has
closed.

The result in the console subscriber is that a task which spawns another
task won't have a `dropped_at` time set until the spawned task exits,
even if the parent task exits much earlier. This causes Tokio Console to
show an incorrect lost waker warning (#345). It also affects other spans
that are entered when a task is spawned (#412).

The solution is to modify the instrumentation in Tokio so that task
spans are explicit roots (`parent: None`). This will be done as part of
enriching the Tokio instrumentation (tokio-rs/tokio#5792).

This change adds functionality to the test framework within
`console-subscriber` so that the state of a task can be set as an
expectation. The state is calculated based on 4 values:
* `console_api::tasks::Stats::dropped_at`
* `console_api::tasks::Stats::last_wake`
* `console_api::PollStats::last_poll_started`
* `console_api::PollStats::last_poll_ended`

It can then be tested that a task that spawns another task and then ends
actually goes to the `Completed` state, even if the spawned task is
still running. As of Tokio 1.34.0, this test fails, but the PR
tokio-rs/tokio#6158 fixes this and the test should pass from Tokio 1.35
onwards.
hds added a commit that referenced this issue Nov 19, 2023
In Tokio, tasks are optionally instrumented with tracing spans to allow
analysis of the runtime behavior to be performed with tools like
tokio-console.

The span that is created for each task gets currently follows the
default tracing behavior and has a contextual parent attached to it
based on the span that is actual when `tokio::spawn` or similar is
called.

However, in tracing, a span will remain "alive" until all its children
spans are closed. This doesn't match how spawned tasks work. A task may
outlive the context in which is was spawned (and frequently does). This
causes tasks which spawn other - longer living - tasks to appear in
`tokio-console` as having lost their waker when instead they should be
shown as completed (tokio-rs/console#345). It can also cause undesired
behavior for unrelated tracing spans if a subscriber is receiving both
the other spans as well as Tokio's instrumentation.

To fix this mismatch in behavior, the task span has `parent: None` set
on it, making it an explicit root - it has no parent. The same was
already done for all spans representing resources in #6107. This change
is made within the scope of #5792.

Due to a defect in the currently available `tracing-mock` crate, it is
not possible to test this change at a tracing level
(tokio-rs/tracing#2440). Instead, a test for the `console-subscriber`
has been written which shows that this change fixes the defect as
observed in `tokio-console` (tokio-rs/console#490).
@qRoC
Copy link

qRoC commented Feb 20, 2024

A task may outlive the context in which is was spawned (and frequently does).

@hds What about block_on?

This runs the given future on the current thread, blocking until it is complete

Looks like him span should have parent.

@hds
Copy link
Contributor Author

hds commented Feb 20, 2024

A task may outlive the context in which is was spawned (and frequently does).

@hds What about block_on?

This runs the given future on the current thread, blocking until it is complete

Looks like him span should have parent.

This is something we should consider. We've taken it out for now, but a subscriber could still get the context for that span and store it if it needed to.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-tokio Area: The main tokio crate C-feature-request Category: A feature request. M-tracing Tracing support in Tokio
Projects
None yet
Development

No branches or pull requests

4 participants