
rfc: Runtime stats #3845

Closed · wants to merge 1 commit

Conversation

LucioFranco (Member)

Rendered

This RFC proposes the low-level stats implementation within tokio, to be used by metrics aggregators/collectors to expose in dashboards such as Grafana. These low-level stats will be the foundation for tokio's future runtime observability goals, but they do not present a complete story on their own, since they will mostly be raw, unaggregated values.

@Darksonn added the A-tokio (Area: The main tokio crate) and M-runtime (Module: tokio/runtime) labels on Jun 7, 2021

### I/O and Timer implementations

Since the two drivers (I/O and timer) that `tokio` provides are singletons within the runtime, there is no need to iterate through their stats like the executor stats. In addition, it is possible to stream the metrics directly from the driver events rather than needing to batch them like the executor.
Contributor

When we are using a lot of atomics in this manner, we should be careful regarding false sharing of the atomics.

Member Author

Can you expand on what you mean by false sharing?

Contributor

If you have a bunch of atomic variables stored together (in the same cache line), with many threads writing to them concurrently, then this can impact performance quite a lot, even if the writes are affecting two different counters.
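For readers unfamiliar with the term: false sharing happens when logically independent counters live on the same cache line, so concurrent writes keep invalidating each other's line even though they touch different values. The usual mitigation is padding each atomic out to its own cache line (crossbeam's `CachePadded` does exactly this). A minimal sketch, with illustrative struct and field names that are not tokio's actual internals:

```rust
use std::sync::atomic::{AtomicU64, Ordering};

// Pad each counter to a full cache line so neighbouring counters do not
// contend on the same line. 64 bytes is a common line size; some targets
// (e.g. Apple aarch64) use 128.
#[repr(align(64))]
struct CacheLineAligned(AtomicU64);

struct WorkerCounters {
    tasks_polled: CacheLineAligned,
    steal_attempts: CacheLineAligned,
}

fn main() {
    let counters = WorkerCounters {
        tasks_polled: CacheLineAligned(AtomicU64::new(0)),
        steal_attempts: CacheLineAligned(AtomicU64::new(0)),
    };

    // Relaxed ordering is enough for monotonically increasing stat counters.
    counters.tasks_polled.0.fetch_add(1, Ordering::Relaxed);
    counters.steal_attempts.0.fetch_add(1, Ordering::Relaxed);

    println!(
        "polled = {}, steals = {}",
        counters.tasks_polled.0.load(Ordering::Relaxed),
        counters.steal_attempts.0.load(Ordering::Relaxed)
    );
}
```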

Contributor

In this case the issue is true sharing (contention on atomics), so padding stuff out won't solve it either.

@carllerche (Member) · Jun 7, 2021

How is it a case of true sharing? My read is that there is a single driver and one thread polls it. It can store stats in an atomic and an arbitrary number of stats aggregators can load it.

Contention happens when there are concurrent mutations, which is not (afaik) the case here.


To avoid any extra overhead in the executor loop, each worker will batch metrics into a `Core`-local struct. These values will be incremented or sampled during regular executor cycles when certain operations happen, such as a work-steal attempt or a pop from one of the queues.

The batches will be streamed via atomics to the stats struct directly. This will reduce any cross-CPU work while the executor is running and amortize its cost. Batches will be sent before the executor attempts to park the thread, which happens either when there is no work to be done or when the executor has hit the maintenance tick. At that point, before the thread parks, the executor will submit the batch. Generally, since parking is more expensive than submitting batches, this process should not add any latency to the executor cycle.
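To make the batching idea concrete, here is a minimal sketch of a worker-local batch that is flushed into shared atomics right before the worker parks. The names (`WorkerStats`, `Batch`, the fields) are illustrative assumptions, not tokio's actual internals:

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;

// Shared stats that collectors read; field names are illustrative only.
#[derive(Default)]
struct WorkerStats {
    tasks_polled: AtomicU64,
    steal_attempts: AtomicU64,
}

// Plain (non-atomic) counters owned by the worker's `Core`-local state.
#[derive(Default)]
struct Batch {
    tasks_polled: u64,
    steal_attempts: u64,
}

impl Batch {
    // Flush the batch into the shared atomics, e.g. right before parking.
    fn submit(&mut self, shared: &WorkerStats) {
        shared.tasks_polled.fetch_add(self.tasks_polled, Ordering::Relaxed);
        shared.steal_attempts.fetch_add(self.steal_attempts, Ordering::Relaxed);
        *self = Batch::default();
    }
}

fn main() {
    let shared = Arc::new(WorkerStats::default());
    let mut batch = Batch::default();

    // During normal executor cycles only the cheap local counters are touched.
    batch.tasks_polled += 3;
    batch.steal_attempts += 1;

    // Before the worker parks, the whole batch is published in one pass.
    batch.submit(&shared);
    println!("polled so far: {}", shared.tasks_polled.load(Ordering::Relaxed));
}
```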
Contributor

How is the user supposed to hook up to the stats struct? Read it at regular intervals? Is there a mechanism for being notified when a batch update happens?

Contributor

I think it is polling, and described in the next paragraph.

@tobz (Member) left a comment

I think most of the technical aspects of the RFC make sense, but tightening up the grammar/structure/flow will strengthen the overall proposal, especially since this will end up as documentation read by users.


### Executor

The handle will provide stats for each worker thread, or, in the case of the single-threaded executor, a single worker. This provides a detailed view into what each worker is doing, and gives the `tokio-metrics` crate the ability to expose the stats aggregated as a single metric or as per-worker metrics.
Member

nit: structure/flow

Suggested change
The handle will provide stats for each worker thread or in the case of the single threaded executor will provide a single worker. This provides a detailed view into what each worker is doing. Providing the ability for the `tokio-metrics` crate to expose the stats aggregated as a single metric or as a per-worker metric.
Statistics will be provided on a per-worker basis, whether using the single-threaded or multi-threaded executor. Aggregating and merging these per-worker statistics in a way that makes more sense when used from existing telemetry collection systems will be provided by crates like `tokio-metrics`.

Sponsor Contributor

To iterate on that:

Suggested change
The handle will provide stats for each worker thread or in the case of the single threaded executor will provide a single worker. This provides a detailed view into what each worker is doing. Providing the ability for the `tokio-metrics` crate to expose the stats aggregated as a single metric or as a per-worker metric.
Statistics will be provided on a per-worker basis, whether using the single-threaded or multi-threaded executor. Aggregated and merged per-worker statistics, which may be more amenable to existing telemetry collection systems, will be provided by crates like `tokio-metrics`.


The values will be updated in batch from the executor to avoid needing to stream the data on every action. This should amortize the cost by only needing to emit stats at a specific executor wall clock time, where the executor wall clock time is determined by executor ticks rather than actual system time. This allows the collectors to observe the time and the stats to determine how long certain executor cycles took, removing the need to acquire the time during executor cycles.

Each worker will expose these stats, updated in batches:
Member

nit: redundancy

You already mentioned in the above paragraph that these are batched.


The handle will provide stats for each worker thread, or, in the case of the single-threaded executor, a single worker. This provides a detailed view into what each worker is doing, and gives the `tokio-metrics` crate the ability to expose the stats aggregated as a single metric or as per-worker metrics.

The values will be updated in batch from the executor to avoid needing to stream the data on every action. This should amortize the cost by only needing to emit stats at a specific executor wall clock time, where the executor wall clock time is determined by executor ticks rather than actual system time. This allows the collectors to observe the time and the stats to determine how long certain executor cycles took, removing the need to acquire the time during executor cycles.
Member

I think you need to speak more to the time aspect. This will be important to deriving rates from monotonic counters.

In other words, I know what you're driving at by talking about the executor ticking at a predictable interval, but that needs to be made explicit here in order to drive home the point that it's being used, or could be used, as an invariant, specifically because it ties into the staleness guarantees around specific statistics.
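As an illustration of deriving a rate from a monotonic counter, a collector can sample the counter together with its own clock. This is only a sketch; `read_tick_counter` is a stand-in for whatever accessor the runtime ends up exposing:

```rust
use std::time::{Duration, Instant};

// Stand-in for something like reading a total-ticks counter from the
// runtime's stats handle; the real accessor is not defined by this RFC yet.
fn read_tick_counter() -> u64 {
    42
}

fn main() {
    let start = Instant::now();
    let before = read_tick_counter();

    // The collector chooses its own sampling interval.
    std::thread::sleep(Duration::from_secs(1));

    let after = read_tick_counter();
    let elapsed = start.elapsed().as_secs_f64();

    // The runtime only exposes the monotonic counter; the collector
    // supplies the wall-clock side and derives the rate itself.
    let ticks_per_sec = (after - before) as f64 / elapsed;
    println!("executor ticks/sec: {ticks_per_sec:.1}");
}
```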

Sponsor Contributor

I agree. The references to "wall clock time" are also confusing, since wall clock time is, by definition, "real" time and not something that happens in, say, ticks.

- Amount of executor ticks (loop iterations)
- Number of `block_in_place` tasks entered

The main goal of this implementation is to allow a user to run this metrics collection at all times in production with minimal overhead to their application. This would allow users to alarm on any regressions and track how the runtime is performing.
Member

nit: structure/flow

I think wording like this could go into the summary/motivation sections.


The main goal of this implementation is to allow a user to run this metrics collection at all times in production with minimal overhead to their application. This would allow users to alarm on any regressions and track how the runtime is performing.

Some of the stats include min/max values (specifically the queue depth stats) because the depth of the queues changes throughout the stats batch window. The value could start low, spike during the middle of the window, and then come back down. To capture this behavior, the executor stats module will aggregate the depth values to reduce the need to stream every value.
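As a sketch of what aggregating within a batch window could look like (the names are illustrative, not the proposed API), a worker can fold every observed queue depth into a running min/max that survives until the collector's next poll:

```rust
// Tracks min/max queue depth inside one batch window so that a short spike
// between collector polls is still observable.
struct QueueDepthWindow {
    min: usize,
    max: usize,
    last: usize,
}

impl QueueDepthWindow {
    fn new(initial: usize) -> Self {
        Self { min: initial, max: initial, last: initial }
    }

    // Called whenever the worker observes its local queue depth.
    fn observe(&mut self, depth: usize) {
        self.min = self.min.min(depth);
        self.max = self.max.max(depth);
        self.last = depth;
    }
}

fn main() {
    let mut window = QueueDepthWindow::new(0);
    for depth in [1, 4, 17, 3, 0] {
        window.observe(depth);
    }
    // The min/max survive even if the collector only polls once per window.
    println!("min={} max={} last={}", window.min, window.max, window.last);
}
```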
Member

nit: structure/flow

Suggested change
Some of the stats include min/max (specifically the queue depth stats) this is because the depth of the queues changes throughout the stats batch window. The value could start low, spike up during the middle of the window then come back down. To understand this behavior the executor stats module will aggregate the depth values to reduce the need to stream the values.
Some of these statistics, such as queue depth, include the minimum and maximum value measured during the given observation window. These statistics can rapidly change under heavy load, and this approach provides a middle ground between streaming all measurements/changes (expensive) and potentially not observing the spikes at all.

Sponsor Contributor

Is there a particular reason why we choose min/max/avg over, say, percentiles?

Contributor

I think it's a question of the runtime overhead caused by this. Percentiles will require more work while collecting data, and you might not be able to use atomic counters anymore.

An external aggregator that consumes the stats (e.g. once per second) could still perform aggregation and compute percentiles based on the number of occurrences inside the sampling period.

However, an external aggregator won't be able to capture min/max values in case there are peaks inside that sampling period. E.g. if you want metrics like "maximum tasks polled inside an executor iteration" and "minimum tasks polled", you couldn't get those if you just have counters of

  • eventloop iterations
  • tasks polled

I guess for tasks where we find those values useful, it makes sense to add them.

Otherwise it's probably easiest to just add always-incrementing counters and let the external application do the diffing and aggregation. You can provide some helpers, for example:

let mut last_stats = stats.executor();
loop {
    std::thread::sleep(sampling_time);
    let stats = stats.executor();
    let delta_stats = stats.diff(last_stats);
    my_favorite_metric_system.aggregate_and_emit(delta_stats); // or potentially also the raw stats
    last_stats = stats;
}

E.g. we had issues in the past where some metrics that were only emitted once per minute didn't show BPS spikes that happened within a few seconds and caused excessive packet drops.

@LucioFranco Might be worthwhile to document that kind of periodic sampling in the "guide" section, since there have been a few questions on how to use the thing.


### I/O and Timer implementations

Since the two drivers (I/O and timer) that `tokio` provides are singletons within the runtime, there is no need to iterate through their stats like the executor stats. In addition, it is possible to stream the metrics directly from the driver events rather than needing to batch them like the executor.
Member

singtons -> singletons


The batches will be streamed via atomics to the stats struct directly. This will reduce any cross-CPU work while the executor is running and amortize its cost. Batches will be sent before the executor attempts to park the thread, which happens either when there is no work to be done or when the executor has hit the maintenance tick. At that point, before the thread parks, the executor will submit the batch. Generally, since parking is more expensive than submitting batches, this process should not add any latency to the executor cycle.

This then allows the collector to poll the stats on any interval, driving its own timer to estimate how long a certain number of ticks took, or how many times the executor entered the park state.
Member

nit: structure/flow

This sentence has grammatical issues, but I think the bigger problem is that it potentially conflicts with the idea that the executor ticks on a predictable interval. Why would we need to track the duration vs ticks ratio ourselves?


### `tokio-metrics`

The `tokio-metrics` crate will provide aggregated metrics based on the `Stats` struct. This will include histograms and other useful aggregated forms of the stats that could be emitted by various metrics implementations. This crate is designed to expose the aggregated stats via an unstable `0.1` crate outside of the runtime, allowing iteration on how they are aggregated without needing to follow `tokio`'s strict versioning scheme.
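As one possible shape for such an aggregation (purely illustrative; the RFC does not define these types), `tokio-metrics` could either sum per-worker snapshots into runtime-wide totals or emit them per worker:

```rust
// Hypothetical per-worker snapshot as the runtime might expose it; a crate
// like `tokio-metrics` would be free to sum these or report them per worker.
struct WorkerSnapshot {
    tasks_polled: u64,
    steal_attempts: u64,
}

fn aggregate(workers: &[WorkerSnapshot]) -> WorkerSnapshot {
    WorkerSnapshot {
        tasks_polled: workers.iter().map(|w| w.tasks_polled).sum(),
        steal_attempts: workers.iter().map(|w| w.steal_attempts).sum(),
    }
}

fn main() {
    let workers = vec![
        WorkerSnapshot { tasks_polled: 120, steal_attempts: 4 },
        WorkerSnapshot { tasks_polled: 95, steal_attempts: 9 },
    ];
    let total = aggregate(&workers);
    println!(
        "total polled={} total steals={}",
        total.tasks_polled, total.steal_attempts
    );
}
```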
Member

It would help to expand on this. What specific aggregations will tokio-metrics expose. How do you expect this to be used in practice? What will trigger alerts, how are engineers expected to use the aggregations in their workflow, etc...

- Min local queue depth
- Avg local queue depth
- Queue depth at time of batch emission
- Amount of executor ticks (loop iterations)
Contributor

If my understanding is correct, you intend to provide information about how long executors spend busy in tasks by allowing collectors to observe the number of ticks which occur in a known time interval. However, I think you also need to measure the amount of time spent parked in order to avoid counting parked time as "busy" time.


The main goal of this implementation is to allow a user to run this metrics collection at all times in production with minimal overhead to their application. This would allow users to alarm on any regressions and track how the runtime is performing.

Some of the stats include min/max values (specifically the queue depth stats) because the depth of the queues changes throughout the stats batch window. The value could start low, spike during the middle of the window, and then come back down. To capture this behavior, the executor stats module will aggregate the depth values to reduce the need to stream every value.
Contributor

This will require a way to atomically capture the values in a single batch (and observe when a new batch is ready), I think.

Sponsor Contributor

I agree — unless we want consumers to poll these statistics, we'll need some kind of subscription/notify mechanism.

Contributor

There is a question of whether atomicity is important to interpret metrics correctly, or whether you are ok with individual values not matching each other (e.g. the sum of tasks run per worker doesn't match the total tasks run metric).

Since it's "just metrics", I think one can be ok with the latter. It will simplify the implementation.

And polling metrics is reasonable. You can always increase the polling frequency to get more details. Polling mostly isn't feasible if you are interested in every single event. But that won't work in this system anyway, if it makes use of internal batching.


### I/O and Timer implementations

Since the two drivers (I/O and timer) that `tokio` provides are singletons within the runtime, there is no need to iterate through their stats like the executor stats. In addition, it is possible to stream the metrics directly from the driver events rather than needing to batch them like the executor.
Contributor

In this case the issue is true sharing (contention on atomics), so padding stuff out won't solve it either.


There are two main types of stats that a runtime can expose: per-runtime stats (e.g. `executor_load`, `fds_registered`) that are collected independently of the tasks running on the runtime, and per-task stats (e.g. `poll_duration`, `amount_polls`) that are collected and aggregated at the task level. This RFC proposes an implementation for per-runtime stats but will also mention methods to capture per-task stats.

A small note: the term `stats` is used instead of `metrics` because we are only concerned with exposing raw data rather than methods of aggregating and emitting that data.
Sponsor Contributor

This was odd to me. "stats" is short for "statistics", which are just as much aggregated as "metrics" are. Would "performance counters" be better? Or "performance events"? Or maybe "observations" or simply "data"?

Member

+1 for "metrics"; I'm not sure the distinction here matters that much. People will want the metrics, try to figure out how it works, see they need an extra crate, and be on their way. Calling it "stats" doesn't imply that, we'll have to spell it out in the documentation. So just go with the more common term of "metrics," IMO.

Contributor

I personally think those are all fine (and I like "performance counters" too). It should just be consistent.


## Motivation

When developing and writing Tokio applications, there are many forms of support, be it tutorials or Discord. But when running these applications in production there is not much support. Users want to understand what is happening behind the scenes. What is my runtime up to? How can I optimize my application? This RFC intends to provide a foundation to answer these questions.
Sponsor Contributor

Suggested change
When developing and writing Tokio applications, there are many forms of support, be it tutorials or discord. But when running these applications in production there is not much support. Users want to understand what is happening behind the scenes. What is my runtime up too? How can I optimize my application? This RFC intends to provide a foundation to answer these questions.
When developing and writing Tokio applications, there are many forms of support, be it tutorials or discord. But when running these applications in production there is not much support. Users want to understand what is happening behind the scenes. What is my runtime up to? How can I optimize my application? This RFC intends to provide a foundation to answer these questions.


Runtime stats will be exposed via a struct that is obtainable via the `tokio::runtime::Handle`. Calling `Handle::stats()` will return a reference-counted struct that contains raw stat values. Building on this, a `tokio-metrics` crate will convert these raw stats into aggregated metrics that can be consumed by end-user metrics collection systems like the `metrics` crate.

```rust=
Sponsor Contributor

Is the = intentional here? It prevents highlighting.


### Executor

The handle will provide stats for each worker thread, or, in the case of the single-threaded executor, a single worker. This provides a detailed view into what each worker is doing, and gives the `tokio-metrics` crate the ability to expose the stats aggregated as a single metric or as per-worker metrics.
Sponsor Contributor

To iterate on that:

Suggested change
The handle will provide stats for each worker thread or in the case of the single threaded executor will provide a single worker. This provides a detailed view into what each worker is doing. Providing the ability for the `tokio-metrics` crate to expose the stats aggregated as a single metric or as a per-worker metric.
Statistics will be provided on a per-worker basis, whether using the single-threaded or multi-threaded executor. Aggregated and merged per-worker statistics, which may be more amenable to existing telemetry collection systems, will be provided by crates like `tokio-metrics`.


The handle will provide stats for each worker thread, or, in the case of the single-threaded executor, a single worker. This provides a detailed view into what each worker is doing, and gives the `tokio-metrics` crate the ability to expose the stats aggregated as a single metric or as per-worker metrics.

The values will be updated in batch from the executor to avoid needing to stream the data on every action. This should amortize the cost by only needing to emit stats at a specific executor wall clock time, where the executor wall clock time is determined by executor ticks rather than actual system time. This allows the collectors to observe the time and the stats to determine how long certain executor cycles took, removing the need to acquire the time during executor cycles.
Sponsor Contributor

I agree. The references to "wall clock time" are also confusing, since wall clock time is, by definition, "real" time and not something that happens in, say, ticks.

The blocking pool already tracks the number of idle threads and the total number of threads. These values are currently within a shared mutex but can be moved to be `AtomicUsize` values and then shared with the `Stats` struct to be sampled by the collector. In addition, a counter that is incremented on each task execution will be included. All values will be streamed to the stats struct via atomics.
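A minimal sketch of what those atomics could look like once moved out of the mutex (the field and method names are assumptions for illustration, not tokio's actual layout); the list after the sketch then names the stats the blocking pool would expose:

```rust
use std::sync::atomic::{AtomicU64, AtomicUsize, Ordering};

// Illustrative shape of the blocking-pool stats described above.
struct BlockingPoolStats {
    idle_threads: AtomicUsize,
    total_threads: AtomicUsize,
    tasks_executed: AtomicU64,
}

impl BlockingPoolStats {
    // Updated by the pool as threads go idle / wake up.
    fn thread_parked(&self) {
        self.idle_threads.fetch_add(1, Ordering::Relaxed);
    }
    fn thread_unparked(&self) {
        self.idle_threads.fetch_sub(1, Ordering::Relaxed);
    }
    // Incremented on every blocking task execution.
    fn task_executed(&self) {
        self.tasks_executed.fetch_add(1, Ordering::Relaxed);
    }
}

fn main() {
    let stats = BlockingPoolStats {
        idle_threads: AtomicUsize::new(0),
        total_threads: AtomicUsize::new(4),
        tasks_executed: AtomicU64::new(0),
    };
    stats.thread_parked();
    stats.task_executed();
    stats.thread_unparked();
    println!(
        "idle={} total={} executed={}",
        stats.idle_threads.load(Ordering::Relaxed),
        stats.total_threads.load(Ordering::Relaxed),
        stats.tasks_executed.load(Ordering::Relaxed),
    );
}
```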

Stats from the blocking pool:
- Number of idle threads
Sponsor Contributor

This value also feels like it might vary wildly. Should it also present min/max/avg?


### Task

This implementation will avoid tracking stats/metrics at the task level due to the overhead required. This will instead be accomplished by the [tokio console](https://github.com/tokio-rs/console). This will allow the user to attach the console and take the performance hit when they want to explore issues in more detail.
Sponsor Contributor

Suggested change
This implementation will avoid tracking stats/metrics at the task level due to the overhead required. This will instead be accomplished by the [tokio console](https://github.com/tokio-rs/console). This will allow the user to attach the console and take the performance hit when they want to explore issues in more detail.
This RFC does not propose tracking stats/metrics at the task level due to the overhead required. Instead, the this is left to projects like the [tokio console](https://github.com/tokio-rs/console), which allows the user to attach the console and take the performance hit when they want to explore issues in more detail.


### I/O driver

Unlike the executor stats, stats coming from the I/O driver will be streamed directly to the `Stats` struct via atomics. Each value will be incremented (via `AtomicU64::fetch_add`) for each event.
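"Streaming via atomics" here just means the driver bumps shared counters as events happen and collectors load them whenever they like. A minimal sketch with illustrative names (not the actual `Stats` layout):

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;

// Illustrative I/O driver counters; collectors only ever load them.
#[derive(Default)]
struct IoDriverStats {
    ready_events: AtomicU64,
    fds_registered: AtomicU64,
}

fn main() {
    let stats = Arc::new(IoDriverStats::default());

    // Driver side: one fetch_add per event. No batching is required because
    // the driver is a singleton and only one thread polls it.
    stats.ready_events.fetch_add(1, Ordering::Relaxed);
    stats.fds_registered.fetch_add(1, Ordering::Relaxed);

    // Collector side: plain loads on whatever interval it chooses.
    println!(
        "ready_events={} fds_registered={}",
        stats.ready_events.load(Ordering::Relaxed),
        stats.fds_registered.load(Ordering::Relaxed),
    );
}
```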
Sponsor Contributor

Again, I don't know what "streaming ... via atomics" means.

Also, Stats struct hasn't been defined.

Unlike the executor stats, stats coming from the I/O driver will be streamed directly to the `Stats` struct via atomics. Each value will be incremented (via `AtomicU64::fetch_add`) for each event.

List of stats provided from the I/O driver:
- Amount of compact
Sponsor Contributor

Here, too, "number" seems preferable to "amount".

Sponsor Contributor

What does "compact" mean here?

List of stats provided from the I/O driver:
- Amount of compact
- Amount of "token dispatches" (aka ready events)
- Amount of fd currently registered with `io::Driver`
Sponsor Contributor

Suggested change
- Amount of fd currently registered with `io::Driver`
- Number of file descriptors currently registered with `io::Driver`

Sponsor Contributor

How does this work on Windows?

Contributor

call it IO handles or IO resources?


This RFC proposes a new way to gather understanding from the Tokio runtime. Currently, the runtime does not expose any methods to understand what is happening under the hood. This makes for a rough experience when deploying Tokio-based applications into production, where you would like to understand what is happening to your code. In this RFC, we propose a few methods to collect this data at different levels. Beyond what is proposed as the implementation in this RFC, we also discuss other methods to gather the information a user might need to be successful with Tokio in a production environment.

There are two main types of stats that a runtime can expose: per-runtime stats (e.g. `executor_load`, `fds_registered`) that are collected independently of the tasks running on the runtime, and per-task stats (e.g. `poll_duration`, `amount_polls`) that are collected and aggregated at the task level. This RFC proposes an implementation for per-runtime stats but will also mention methods to capture per-task stats.
Member

nit: "A ... stats" seems grammatically weird --- i would just say

Suggested change
There are two main types of stats that a runtime can expose. A per-runtime stats (eg `executor_load`, `fds_registered`) that are collected indepdently of the tasks running on the runtime. A per-task stats (eg `poll_duration`, `amount_polls`) that are collected and aggregated at the task level. This RFC will propose an implemenation for implementing per-runtime stats but will also mention methods to capture per-task stats.
There are two main types of stats that a runtime can expose. Per-runtime stats (eg `executor_load`, `fds_registered`) that are collected indepdently of the tasks running on the runtime, and per-task stats (eg `poll_duration`, `amount_polls`) that are collected and aggregated at the task level. This RFC will propose an implemenation for implementing per-runtime stats but will also mention methods to capture per-task stats.

Contributor

Plus implemenation -> implementation


There are two main types of stats that a runtime can expose: per-runtime stats (e.g. `executor_load`, `fds_registered`) that are collected independently of the tasks running on the runtime, and per-task stats (e.g. `poll_duration`, `amount_polls`) that are collected and aggregated at the task level. This RFC proposes an implementation for per-runtime stats but will also mention methods to capture per-task stats.

A small note: the term `stats` is used instead of `metrics` because we are only concerned with exposing raw data rather than methods of aggregating and emitting that data.
Member

nit:

Suggested change
A small note, the term `stats` is used instead of `metrics` because we are only concerned with exposing raw data rather than methods of aggregating and emiting that data.
A small note: the term "stats" is used instead of "metrics", because we are only concerned with exposing raw data rather than methods of aggregating and emitting that data.


Runtime stats will be exposed via a struct that is obtainable via the `tokio::runtime::Handle`. Calling `Handle::stats()` will return a reference-counted struct that contains raw stat values. Building on this, a `tokio-metrics` crate will convert these raw stats into aggregated metrics that can be consumed by end-user metrics collection systems like the `metrics` crate.

```rust=
Member

Suggested change
```rust=
```rust

@Matthias247 (Contributor) left a comment

Thanks for getting started on this! Looking forward to it.


There are two main types of stats that a runtime can expose: per-runtime stats (e.g. `executor_load`, `fds_registered`) that are collected independently of the tasks running on the runtime, and per-task stats (e.g. `poll_duration`, `amount_polls`) that are collected and aggregated at the task level. This RFC proposes an implementation for per-runtime stats but will also mention methods to capture per-task stats.

A small note: the term `stats` is used instead of `metrics` because we are only concerned with exposing raw data rather than methods of aggregating and emitting that data.
Contributor

I personally think those are all fine (and I like "performance counters" too). It should just be consistent.


## Motivation

When developing and writing Tokio applications, there are many forms of support, be it tutorials or Discord. But when running these applications in production there is not much support. Users want to understand what is happening behind the scenes. What is my runtime up to? How can I optimize my application? This RFC intends to provide a foundation to answer these questions.
Contributor

What is my runtime up too? How can I optimize my application?

I would recommend making this a bit more concrete, because "what is going on" is repeated a couple of times in the doc without going much deeper.

Among:

  • Why is the latency of the system higher than expected?
  • Why does memory utilization grow over time?
  • Why does the service run out of the file descriptor limit?


## Guide-level explanation

Runtime stats will be exposed via a struct that is obtainable via the `tokio::runtime::Handle`. Calling `Handle::stats()` will return a reference-counted struct that contains raw stat values. Building on this, a `tokio-metrics` crate will convert these raw stats into aggregated metrics that can be consumed by end-user metrics collection systems like the `metrics` crate.
Contributor

I think what is returned could be a reference-counted accessor for the raw values. It doesn't have to store the stats itself; it can simply contain a `fn stats(&self) -> RealStats` function which returns a POD struct with just values in it. How the accessor handle gets those doesn't matter.
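A sketch of that accessor shape, under the assumption that a `RealStats`-style POD snapshot (as the comment above suggests) is what gets returned; every name here is illustrative, not the proposed API:

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;

// The shared, atomically updated state the runtime would own.
#[derive(Default)]
struct RawStats {
    total_ticks: AtomicU64,
    tasks_polled: AtomicU64,
}

// The reference-counted accessor handed out by `Handle::stats()` in this sketch.
#[derive(Clone)]
struct StatsHandle {
    raw: Arc<RawStats>,
}

// Plain-old-data snapshot returned to the caller; it holds no runtime internals.
#[derive(Debug, Clone, Copy)]
struct ExecutorStats {
    total_ticks: u64,
    tasks_polled: u64,
}

impl StatsHandle {
    fn executor(&self) -> ExecutorStats {
        ExecutorStats {
            total_ticks: self.raw.total_ticks.load(Ordering::Relaxed),
            tasks_polled: self.raw.tasks_polled.load(Ordering::Relaxed),
        }
    }
}

fn main() {
    let handle = StatsHandle { raw: Arc::new(RawStats::default()) };
    handle.raw.total_ticks.fetch_add(5, Ordering::Relaxed);
    println!("{:?}", handle.executor());
}
```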

Contributor

Through this, there will be a tokio-metrics crate that converts these raw stats into proper aggregated metrics that can be consumed by end user metrics collection systems like the metrics crate.

This is more confusing than helpful to me at the moment. What are "proper aggregated metrics"? What is not proper about the other ones? Maybe it's easier to leave that detail out of this proposal, and just mention that metric submission is out of scope because it is application dependent?

let executor = stats.executor();

// per-worker stats via the executor.
for worker in executor.workers() {
Contributor

Is that number even static? Is there a unique worker ID?


Each worker will expose these stats, updated in batches:

- Amount of futures executed
Contributor

"Amount of futures polled" seems right to me. I don't think it should be "distinct". If the same future gets scheduled multiple times, it is also work.


### Task

This implementation will avoid tracking stats/metrics at the task level due to the overhead required. This will instead be accomplished by the [tokio console](https://github.com/tokio-rs/console). This will allow the user to attach the console and take the performance hit when they want to explore issues in more detail.
Contributor

This is kind of confusing, since per-task stats seem mentioned in the intro?

A per-task stats (eg poll_duration, amount_polls) that are collected and aggregated at the task level. This RFC will propose an implemenation for implementing per-runtime stats but will also mention methods to capture per-task stats.

Apart from that, I'm ok with them not being available in the beginning. Once users have global visibility and see some abnormalities, they can always add custom instrumentation to their tasks/futures to figure out the details. The executor-level stats are more tricky because those details are not exposed to users.

List of stats provided from the I/O driver:
- Amount of compact
- Amount of "token dispatches" (aka ready events)
- Amount of fd currently registered with `io::Driver`
Contributor

call it IO handles or IO resources?


List of stats provided from the I/O driver:
- Amount of compact
- Amount of "token dispatches" (aka ready events)
Contributor

Sounds like a tokio/mio concept. "ready events" might be a better term to expose externally


To avoid any extra overhead in the executor loop, each worker will batch metrics into a `Core`-local struct. These values will be incremented or sampled during regular executor cycles when certain operations happen, such as a work-steal attempt or a pop from one of the queues.

The batches will be streamed via atomics to the stats struct directly. This will reduce any cross-CPU work while the executor is running and amortize its cost. Batches will be sent before the executor attempts to park the thread, which happens either when there is no work to be done or when the executor has hit the maintenance tick. At that point, before the thread parks, the executor will submit the batch. Generally, since parking is more expensive than submitting batches, this process should not add any latency to the executor cycle.
Contributor

I think it is polling, and described in the next paragraph.


The main goal of this implementation is to allow a user to run this metrics collection at all times in production with minimal overhead to their application. This would allow users to alarm on any regressions and track how the runtime is performing.

Some of the stats include min/max values (specifically the queue depth stats) because the depth of the queues changes throughout the stats batch window. The value could start low, spike during the middle of the window, and then come back down. To capture this behavior, the executor stats module will aggregate the depth values to reduce the need to stream every value.
Contributor

I think it's a question of the runtime overhead caused by this. Percentiles will require more work while collecting data, and you might not be able to use atomic counters anymore.

An external aggregator that consumes the stats (e.g. once per second) could still perform aggregation and compute percentiles based on the number of occurrences inside the sampling period.

However, an external aggregator won't be able to capture min/max values in case there are peaks inside that sampling period. E.g. if you want metrics like "maximum tasks polled inside an executor iteration" and "minimum tasks polled", you couldn't get those if you just have counters of

  • eventloop iterations
  • tasks polled

I guess for tasks where we find those values useful, it makes sense to add them.

Otherwise it's probably easiest to just add always-incrementing counters and let the external application do the diffing and aggregation. You can provide some helpers, for example:

let mut last_stats = stats.executor();
loop {
    std::thread::sleep(sampling_time);
    let stats = stats.executor();
    let delta_stats = stats.diff(last_stats);
    my_favorite_metric_system.aggregate_and_emit(delta_stats); // or potentially also the raw stats
    last_stats = stats;
}

E.g. we had issues in the past where some metrics that were only emitted once per minute didn't show BPS spikes that happened within a few seconds and caused excessive packet drops.

@LucioFranco Might be worthwhile to document that kind of periodic sampling in the "guide" section, since there have been a few questions on how to use the thing.


Each worker will expose these stats, updated in batches:

- Amount of futures executed
Contributor

What about futures passed to block_on? Which thread are they on? What about futures polled in a LocalSet?

Member Author

So `LocalSet` is a good question...

For `block_on` I would say it doesn't run on the main executor, so it doesn't count?

@Darksonn (Contributor)

Please see the initial work in #4043 and provide feedback on the direction.

@Darksonn added the M-metrics (Module: tokio/runtime/metrics) label and removed the M-runtime (Module: tokio/runtime) label on Aug 27, 2021
@carllerche (Member)

Thanks for the work. I'm going to close this due to inactivity. If you want to continue this patch, please open a new PR and reference this one.

@carllerche closed this on Nov 22, 2022
@Darksonn deleted the lucio/runtime-stats-rfc branch on November 22, 2022 at 20:14