metrics: Accurate duration tracing of storage/scheduler message handling #8403
src/storage/txn/scheduler.rs
Outdated
let mut statistics = Statistics::default();

if task.cmd.readonly() {
    self.process_read(snapshot, task, &mut statistics);
} else {
    SCHED_PRE_HANDLE_2_DURATIONS_VEC
SCHED_PRE_HANDLE_2_DURATIONS_VEC doesn't seem much different from SCHED_PRE_HANDLE_3_DURATIONS_VEC.
SCHED_PRE_HANDLE_3_DURATIONS_VEC is intended to record the accurate in-queue time, while SCHED_PRE_HANDLE_2_DURATIONS_VEC is not recorded when task.cmd.readonly() is true.
But I agree with what you said, so they will be merged into one.
src/storage/txn/scheduler.rs
Outdated
@@ -374,6 +392,9 @@ impl<E: Engine, L: LockManager, P: PdClient + 'static> Scheduler<E, L, P> {
    "process cmd with snapshot";
    "cid" => task.cid, "cb_ctx" => ?cb_ctx
);
SCHED_PRE_HANDLE_1_DURATIONS_VEC
Could it have a more descriptive name instead of 1, 2, 3...? For example, SCHED_AFTER_SNAPSHOT_DURATIONS_VEC, because this is right after getting the snapshot.
Addressed
The changes after your last review:
@sticnarf PTAL
@@ -3127,6 +3127,10 @@ where
if channel_timer.is_none() {
    channel_timer = Some(start);
}
if let Some(timer) = channel_timer {
    let elapsed = duration_to_sec(timer.elapsed());
    APPLY_TASK_WAIT_TIME_HISTOGRAM.observe(elapsed);
It will be observed several times. I think it should either be moved to L3128, or channel_timer should just be removed.
Addressed
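For readers following this thread, here is a minimal sketch of one reading of the suggestion: record the wait time inside the branch that first sets the timer, so each batch contributes exactly one sample. The `Histogram`, `Msg`, and `handle_batch` names are hypothetical stand-ins for illustration, not the actual TiKV types or the code that was merged.

```rust
use std::time::Instant;

/// Hypothetical stand-in for a Prometheus histogram: it only stores samples.
#[derive(Default)]
struct Histogram {
    samples: Vec<f64>,
}

impl Histogram {
    fn observe(&mut self, secs: f64) {
        self.samples.push(secs);
    }
}

/// Hypothetical message type: each message carries the instant it was sent.
struct Msg {
    sent_at: Instant,
}

/// Handles one batch and records the channel wait time exactly once,
/// measured from the first message in the batch.
fn handle_batch(msgs: &[Msg], wait_time: &mut Histogram) {
    let mut channel_timer: Option<Instant> = None;
    for msg in msgs {
        if channel_timer.is_none() {
            channel_timer = Some(msg.sent_at);
            // Observing inside the branch that sets the timer means the
            // histogram gets a single sample per batch, not one per message.
            wait_time.observe(msg.sent_at.elapsed().as_secs_f64());
        }
        // ... apply the message here ...
    }
}
```

If the observe call sat after the `if` block instead, a batch of three messages would add three samples and skew the histogram toward later messages.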
if let Some(tag) = tag {
    ASYNC_WRITE_DURATIONS_VEC
        .get(tag)
        .observe(begin_instant.elapsed_secs());
Can the value calculated at L387 be reused? Or how about just keeping one?
L387 doesn't record the duration by operation type (prewrite, commit, etc.); that's why we need a new metric here.
The idea of removing the original metric crossed my mind, but I don't think it's nice to simply remove it, because some panels depend on it.
We could keep both until the new panel totally replaces the original panel and the original metric is no longer needed.
How about reusing the metric and adding extra dimensions?
That will also make the original panel malfunction.
Why? I think Prometheus is OK with being queried with fewer dimensions.
"Why? I think Prometheus is OK with being queried with fewer dimensions."
That requires us to rewrite the PromQL; until then, the panel will show errors.
What I mean to do is two steps (the first step may last for a while and include more than one PR, which is why I separated it into two steps):
1. Improve the metrics, as in this PR, without touching any old metrics, so the old panels keep working.
2. Improve the panels and, in the meantime, remove the old metrics as well.
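The trade-off discussed in this thread can be illustrated with a toy model. The sketch below assumes a hypothetical `HistogramVec` keyed by a single `type` label; it is not the rust-prometheus API. Once a label is added, a query that does not aggregate sees one series per label value, while aggregating the label away recovers the old total.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a labelled histogram: samples keyed by a `type` label.
#[derive(Default)]
struct HistogramVec {
    by_type: HashMap<&'static str, Vec<f64>>,
}

impl HistogramVec {
    /// Records one sample under the given label value.
    fn observe(&mut self, ty: &'static str, secs: f64) {
        self.by_type.entry(ty).or_default().push(secs);
    }

    /// Number of separate series a query sees if it does not aggregate.
    fn series_count(&self) -> usize {
        self.by_type.len()
    }

    /// Sample count after aggregating the `type` label away,
    /// which matches what an unlabelled metric would have recorded.
    fn total_count(&self) -> usize {
        self.by_type.values().map(Vec::len).sum()
    }
}
```

Under this model both positions in the thread hold: the information needed by the old panel is still there, but a panel whose query does not already aggregate would need its query rewritten before it shows a single line again.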
@BusyJay Addressed, PTAL
a244a16 to 04b6c13
src/storage/txn/scheduler.rs
Outdated
@@ -52,6 +52,17 @@ use crate::storage::{
    ErrorInner as StorageErrorInner,
};

use crate::storage::metrics::SCHED_POST_HANDLE_DURATIONS_VEC;
I think you can write use crate::storage::metrics::*; here. We allow glob imports for metrics.
Addressed
tctx.on_schedule();
SCHED_LATCH_HISTOGRAM_VEC
    .get(tctx.tag)
    .observe(tctx.wait_timer.elapsed_secs());
The timer is reset at L247, so is the latch wait time always smaller than SCHED_WAIT?
This change is for fixing:
3. SCHED_LATCH_HISTOGRAM_VEC includes both the time in queue and the latch time, which is not correct
After this change, SCHED_WAIT is responsible for the time spent waiting for an available thread, and SCHED_LATCH is responsible for the time spent waiting for the latch.
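A rough sketch of the idea behind resetting the timer: each phase is measured from the end of the previous one, so two histograms fed from the same timer never count the same time twice. `PhaseTimer` and `finish_phase` are hypothetical names for illustration; the PR's own fields (such as `wait_timer`) and the exact order of the phases may differ.

```rust
use std::time::{Duration, Instant};

/// Hypothetical timer that is restarted at every phase boundary.
struct PhaseTimer {
    started: Instant,
}

impl PhaseTimer {
    fn new() -> Self {
        PhaseTimer { started: Instant::now() }
    }

    /// Returns the time spent in the phase that just ended and restarts
    /// the timer, so the next phase is measured on its own.
    fn finish_phase(&mut self) -> Duration {
        let now = Instant::now();
        let spent = now.saturating_duration_since(self.started);
        self.started = now;
        spent
    }
}
```

With this shape, the value observed for one phase and the value observed for the next add up to at most the end-to-end time, which is the property the fix is after.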
Signed-off-by: Liu Cong <innerr@gmail.com>
04b6c13 to f26963f
@BusyJay PTAL
.get(self.tag)
.observe(self.latch_timer.elapsed_secs());
.observe(self.wait_timer.elapsed_secs());
Reusing Instant::now_coarse() can save one function call.
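The suggestion can be sketched as reading the clock once and deriving both elapsed values from that single reading. This example uses `std::time::Instant` as a stand-in, since `now_coarse()` belongs to TiKV's own time utilities; `elapsed_pair` is a hypothetical helper name.

```rust
use std::time::{Duration, Instant};

/// Computes two elapsed durations from a single clock read instead of
/// calling `elapsed()` (and therefore reading the clock) on each timer.
fn elapsed_pair(wait_start: Instant, latch_start: Instant) -> (Duration, Duration) {
    let now = Instant::now(); // the only clock read
    (
        now.saturating_duration_since(wait_start),
        now.saturating_duration_since(latch_start),
    )
}
```

Besides saving a call, both values are taken at the same instant, so they stay consistent with each other.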
@@ -379,6 +379,9 @@ impl<E: Engine, L: LockManager> Scheduler<E, L> {
    "process cmd with snapshot";
    "cid" => task.cid, "cb_ctx" => ?cb_ctx
);
SCHED_ASYNC_SNAPSHOT_DURATIONS_VEC
Doesn't this duplicate the storage async snapshot metrics?
@innerr: PR needs rebase.
Signed-off-by: Liu Cong <innerr@gmail.com>
What problem does this PR solve?
The inaccurate duration tracing of TiKV
Problem Summary:
1. tikv_raftstore_apply_wait_time_duration_secs is not only the duration in queue
2. async_write
3. SCHED_LATCH_HISTOGRAM_VEC includes both the time in queue and the latch time, which is not correct
What is changed and how it works?
Tests
Side effects
Release note
Later works
About the 4: there will be some new Grafana panels to trace the duration of each message.
There will also be a panel to show the accuracy of this trace (the red delta line should be close to 0, which means the trace is accurate).
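As a rough illustration of what such an accuracy check could compute, the sketch below assumes the delta is the end-to-end duration minus the sum of the traced stage durations. That definition, and the `trace_delta` helper, are assumptions for illustration; the PR does not spell out how the panel derives its delta line.

```rust
/// Difference between the end-to-end duration and the sum of the traced
/// stages, in seconds. A value near zero means the stages account for
/// (almost) all of the measured time.
fn trace_delta(total_secs: f64, stage_secs: &[f64]) -> f64 {
    total_secs - stage_secs.iter().sum::<f64>()
}
```

For example, a 1.0 s request traced as stages of 0.25 s, 0.25 s, and 0.5 s gives a delta of zero, while dropping the last stage leaves 0.5 s unaccounted for.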