-
-
Notifications
You must be signed in to change notification settings - Fork 2.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
stats: add busy_duration stats #4179
Conversation
} | ||
|
||
pub(crate) fn about_to_park(&mut self) { | ||
self.park_count += 1; | ||
|
||
let busy_duration = self.last_resume_time.elapsed(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure if this should just add up those busy_duration
times per worker in a continuously incrementing time value - then the application could still calculate diffs as below?
I guess the benefit of tokio doing it is that one can get smaller durations than one could get if only querying the runtime states every 1s - but on the other hand all information about everything except the last 16 parks is lost?
Maybe both metrics would be helpful, the smaller one for getting some samples around busy times and the full aggregate for having overall load averages (total_busy_time / total_time_elapsed
is load on the worker), but for the fine grained one I'm not sure if 16parks is a good unit or not. It's a pretty short duration in the typical case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we want to split the considerations, I'm happy to start trying out one version with just a total_busy_duration
metric.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adding a total_busy_time
sounds reasonable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Which direction do you prefer than? Adding that and keeping the other parts? Or just adding total_busy_time
for now? I'm good with any.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just adding total_busy_time
to this PR sounds reasonable to me.
@Darksonn it seemed like the conclusion was to only add |
I would recommend sticking w/ total. busy_total + a loop counter (park_count). Those two combined should be enough to get a good sense for busy time per iteration. The listener can compute an EWMA of busy time per park and that should be good enough |
If the goal is to be able to see outliers hidden by averages, one option besides using histograms would be to let the user specify duration bands and track the total busy time within those bands. For example, the user could specify that they expect most park loops to spend at most 1ms "busy", that 1ms -> 5ms is "high", and over 5ms is unexpected. Using user-specified bands, we can track the total busy time within those bands along with the total count of times a loop landed in the band. The observer could use those total times to compute EMWAs. This should satisfy the objective without having to manage synchronization related to passing histograms out of the runtime. |
# 1.14.0 (November 15, 2021) ### Fixed - macros: fix compiler errors when using `mut` patterns in `select!` ([#4211]) - sync: fix a data race between `oneshot::Sender::send` and awaiting a `oneshot::Receiver` when the oneshot has been closed ([#4226]) - sync: make `AtomicWaker` panic safe ([#3689]) - runtime: fix basic scheduler dropping tasks outside a runtime context ([#4213]) ### Added - stats: add `RuntimeStats::busy_duration_total` ([#4179], [#4223]) ### Changed - io: updated `copy` buffer size to match `std::io::copy` ([#4209]) ### Documented - io: rename buffer to file in doc-test ([#4230]) - sync: fix Notify example ([#4212]) [#4211]: #4211 [#4226]: #4226 [#3689]: #3689 [#4213]: #4213 [#4179]: #4179 [#4223]: #4223 [#4209]: #4209 [#4230]: #4230 [#4212]: #4212
# 1.14.0 (November 15, 2021) ### Fixed - macros: fix compiler errors when using `mut` patterns in `select!` ([#4211]) - sync: fix a data race between `oneshot::Sender::send` and awaiting a `oneshot::Receiver` when the oneshot has been closed ([#4226]) - sync: make `AtomicWaker` panic safe ([#3689]) - runtime: fix basic scheduler dropping tasks outside a runtime context ([#4213]) ### Added - stats: add `RuntimeStats::busy_duration_total` ([#4179], [#4223]) ### Changed - io: updated `copy` buffer size to match `std::io::copy` ([#4209]) ### Documented - io: rename buffer to file in doc-test ([#4230]) - sync: fix Notify example ([#4212]) [#4211]: #4211 [#4226]: #4226 [#3689]: #3689 [#4213]: #4213 [#4179]: #4179 [#4223]: #4223 [#4209]: #4209 [#4230]: #4230 [#4212]: #4212
# 1.14.0 (November 15, 2021) ### Fixed - macros: fix compiler errors when using `mut` patterns in `select!` ([#4211]) - sync: fix a data race between `oneshot::Sender::send` and awaiting a `oneshot::Receiver` when the oneshot has been closed ([#4226]) - sync: make `AtomicWaker` panic safe ([#3689]) - runtime: fix basic scheduler dropping tasks outside a runtime context ([#4213]) ### Added - stats: add `RuntimeStats::busy_duration_total` ([#4179], [#4223]) ### Changed - io: updated `copy` buffer size to match `std::io::copy` ([#4209]) ### Documented - io: rename buffer to file in doc-test ([#4230]) - sync: fix Notify example ([#4212]) [#4211]: #4211 [#4226]: #4226 [#3689]: #3689 [#4213]: #4213 [#4179]: #4179 [#4223]: #4223 [#4209]: #4209 [#4230]: #4230 [#4212]: #4212
Add busy duration statistic. Refs: #3845
cc @Matthias247