
metrics: Accurate duration tracing of storage/scheduler message handling #8403

Closed · wants to merge 1 commit

Conversation

@innerr (Contributor) commented Aug 4, 2020

Signed-off-by: Liu Cong innerr@gmail.com

What problem does this PR solve?

Inaccurate duration tracing in TiKV.

Problem Summary:

  1. tikv_raftstore_apply_wait_time_duration_secs records more than just the in-queue duration
  2. async_write lacks duration recording separated by message type
  3. SCHED_LATCH_HISTOGRAM_VEC includes both the in-queue duration and the latch time, which is incorrect
  4. For each storage message, the recorded durations do not fully cover the whole execution process

What is changed and how it works?

  1. Record only the in-queue duration (see the sketch below)
  2. Add a metric to record the async_write duration by message type
  3. Add a metric for the in-queue duration, separate from the latch time
  4. Add a group of metrics to record the whole execution process in parts
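As a rough illustration of item 1 (not the actual TiKV code; the Task struct and histogram here are hypothetical), the idea is to stamp a task when it is enqueued and observe the elapsed time the moment a worker dequeues it, before any handling begins:

```rust
use std::time::Instant;

struct Task {
    // Stamped when the task is pushed onto the queue.
    enqueued_at: Instant,
}

// Hypothetical sketch: the histogram sees only queue time because the
// observation happens right at dequeue; latch wait and processing are
// covered by separate metrics.
fn on_dequeue(task: &Task, in_queue_histogram: &prometheus::Histogram) {
    in_queue_histogram.observe(task.enqueued_at.elapsed().as_secs_f64());
}
```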

Tests

  • Manual test
  • CI test

Side effects

  • The meaning of the related metrics will change from what it was before

Release note

  • Add accurate duration tracing to TiKV

Later work

Regarding item 4, some new Grafana panels will be added to trace the duration of each message:
[screenshot: WeChat Work Screenshot_20200804235913]
[screenshot: WeChat Work Screenshot_20200804235936]

There will also be a panel showing the accuracy of this tracing
(the red delta line should be close to 0, meaning the trace is accurate):
[screenshot: WeChat Work Screenshot_20200804235848]

@innerr innerr changed the title from "metrics: Accurate duration tracing of scheduler message handling" to "metrics: Accurate duration tracing of storage/scheduler message handling" on Aug 4, 2020
let mut statistics = Statistics::default();

if task.cmd.readonly() {
    self.process_read(snapshot, task, &mut statistics);
} else {
    SCHED_PRE_HANDLE_2_DURATIONS_VEC
Contributor:

SCHED_PRE_HANDLE_2_DURATIONS_VEC seems not much different from SCHED_PRE_HANDLE_3_DURATIONS_VEC.

Contributor Author (@innerr) Aug 10, 2020:

SCHED_PRE_HANDLE_3_DURATIONS_VEC is intended to record the accurate in-queue duration, while SCHED_PRE_HANDLE_2_DURATIONS_VEC is not recorded when task.cmd.readonly() is true.

But I agree with what you said, so they will be merged into one.
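A hedged sketch of that merge (the metric constant and the process_write call are assumed names, not the exact TiKV identifiers): observing once, before the readonly branch, lets a single metric cover both the read and write paths:

```rust
// Sketch only: observe the wait-for-thread duration before branching,
// instead of only on the write path, so read and write commands both
// record it under one metric.
SCHED_WAIT_FOR_THREAD_DURATIONS_VEC
    .get(task.tag)
    .observe(wait_timer.elapsed_secs());

if task.cmd.readonly() {
    self.process_read(snapshot, task, &mut statistics);
} else {
    self.process_write(snapshot, task, &mut statistics);
}
```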

@@ -374,6 +392,9 @@ impl<E: Engine, L: LockManager, P: PdClient + 'static> Scheduler<E, L, P> {
            "process cmd with snapshot";
            "cid" => task.cid, "cb_ctx" => ?cb_ctx
        );
        SCHED_PRE_HANDLE_1_DURATIONS_VEC
Contributor:

Could it have a more descriptive name instead of 1, 2, 3...? For example, SCHED_AFTER_SNAPSHOT_DURATIONS_VEC, because this one is right after getting the snapshot.

Contributor Author:

Addressed

@innerr (Contributor Author) commented Aug 10, 2020

The changes after your last review:

tikv_scheduler_pre_handle_1_duration_seconds =>
tikv_scheduler_async_snapshot_duration_seconds

tikv_scheduler_pre_handle_2_duration_seconds =>
(merged with pre_handle_3)

tikv_scheduler_pre_handle_3_duration_seconds =>
tikv_scheduler_wait_for_thread_duration_seconds

tikv_scheduler_before_write_1_duration_seconds =>
tikv_scheduler_process_before_write_duration_seconds

tikv_scheduler_before_write_2_duration_seconds =>
(merged with before_write_1)
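For reference, a sketch of how one of the renamed metrics might be declared, following the common lazy_static + prometheus pattern in TiKV; the help string and label set here are assumptions:

```rust
use lazy_static::lazy_static;
use prometheus::{register_histogram_vec, HistogramVec};

lazy_static! {
    // The constant and metric name come from the rename list above;
    // help text and labels are illustrative assumptions.
    pub static ref SCHED_ASYNC_SNAPSHOT_DURATIONS_VEC: HistogramVec =
        register_histogram_vec!(
            "tikv_scheduler_async_snapshot_duration_seconds",
            "Bucketed histogram of scheduler async snapshot duration",
            &["type"]
        )
        .unwrap();
}
```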

@sticnarf PTAL

@@ -3127,6 +3127,10 @@ where
        if channel_timer.is_none() {
            channel_timer = Some(start);
        }
        if let Some(timer) = channel_timer {
            let elapsed = duration_to_sec(timer.elapsed());
            APPLY_TASK_WAIT_TIME_HISTOGRAM.observe(elapsed);
Member:

It will be observed several times. I think it should either be moved to L3128, or channel_timer should just be removed.

Contributor Author:

Addressed
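One way the multiple-observation issue could be avoided, as a sketch (not necessarily the committed fix): take the timer out of the Option so the histogram is observed at most once:

```rust
// After take(), channel_timer is None, so later iterations skip the
// observation instead of recording the wait time repeatedly.
if let Some(timer) = channel_timer.take() {
    let elapsed = duration_to_sec(timer.elapsed());
    APPLY_TASK_WAIT_TIME_HISTOGRAM.observe(elapsed);
}
```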

if let Some(tag) = tag {
    ASYNC_WRITE_DURATIONS_VEC
        .get(tag)
        .observe(begin_instant.elapsed_secs());
Member:

Can the value calculated at L387 be reused? Or how about just keeping one?

Contributor Author:

L387 doesn't record the duration by operation type (prewrite, commit, etc.); that's why we need a new metric here.

The idea of removing the original metric crossed my mind, but I think it is not nice to simply remove it, because some panels depend on it. We could keep both until the new panels fully replace the original ones and the original metric is no longer needed.

Member:

How about reusing the metrics and adding extra dimensions?

Contributor Author (@innerr) Aug 27, 2020:

That would also make the original panel malfunction.

Member:

Why? I think Prometheus is OK with being queried with fewer dimensions.

Contributor Author:

> Why? I think Prometheus is OK with being queried with fewer dimensions.

That requires rewriting the PromQL; until that happens, the panel will show errors.

What I mean to do is two steps (the first step may last for a while and include more than one PR, which is why I separated it into two steps):
1. Improve the metrics, as in this PR, without touching any old metrics, so the old panels keep working.
2. Improve the panels, and remove the old metrics at the same time.
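For illustration, the "extra dimensions" idea suggested above would look roughly like this sketch (the metric name, help text, and label are assumptions); as noted, old panels would then need their PromQL rewritten, e.g. aggregating the new label away with sum without (type) (...), which is why the two-step plan keeps the old metric for now:

```rust
use lazy_static::lazy_static;
use prometheus::{register_histogram_vec, HistogramVec};

lazy_static! {
    // Hypothetical: extend the existing histogram with a command-type
    // label (prewrite, commit, ...) instead of adding a new metric.
    pub static ref ASYNC_WRITE_DURATIONS_VEC: HistogramVec =
        register_histogram_vec!(
            "tikv_storage_async_write_duration_seconds", // assumed name
            "Bucketed histogram of async write duration, by command type",
            &["type"]
        )
        .unwrap();
}
```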

@innerr (Contributor Author) commented Aug 26, 2020

@BusyJay Addressed, PTAL

@innerr innerr force-pushed the accurate_trace branch 3 times, most recently from a244a16 to 04b6c13, on August 28, 2020 01:43
@@ -52,6 +52,17 @@ use crate::storage::{
    ErrorInner as StorageErrorInner,
};

use crate::storage::metrics::SCHED_POST_HANDLE_DURATIONS_VEC;
Member:

I think you can use `use crate::storage::metrics::*;`. We allow glob imports for metrics.

Contributor Author:

Addressed

tctx.on_schedule();
SCHED_LATCH_HISTOGRAM_VEC
    .get(tctx.tag)
    .observe(tctx.wait_timer.elapsed_secs());
Member:

The timer is reset at L247, so the latch wait time is always smaller than SCHED_WAIT?

Contributor Author:

This change fixes:

> 3. SCHED_LATCH_HISTOGRAM_VEC includes both the in-queue duration and the latch time, which is incorrect

After this change, SCHED_WAIT will account for the wait for an available thread, and SCHED_LATCH will account for the wait for the latch.
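A rough sketch of the intended split, with field and method names assumed from the snippets in this thread (not the exact TiKV code): one timer measures only the latch wait, and a second timer restarts at the phase boundary, so each histogram covers exactly one phase:

```rust
use tikv_util::time::Instant; // TiKV's Instant with a coarse clock

// Sketch only: fields and semantics are assumptions.
struct TaskContext {
    latch_timer: Instant, // started when the command begins waiting for latches
    wait_timer: Instant,  // restarted when the task moves to the next phase
}

impl TaskContext {
    fn on_schedule(&mut self) {
        // From this point on, elapsed time belongs to the next phase
        // (waiting for an available worker thread), not to the latch wait.
        self.wait_timer = Instant::now_coarse();
    }
}
```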


Signed-off-by: Liu Cong <innerr@gmail.com>
@innerr (Contributor Author) commented Sep 4, 2020

@BusyJay PTAL

     .get(self.tag)
-    .observe(self.latch_timer.elapsed_secs());
+    .observe(self.wait_timer.elapsed_secs());
Member:

Reusing Instant::now_coarse() can save one function call.
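The suggestion, sketched with an assumed name for the wait metric: take one coarse timestamp and derive both observations from it, rather than letting each elapsed_secs() call presumably fetch the clock again:

```rust
// Sketch of the reviewer's suggestion (SCHED_WAIT_HISTOGRAM_VEC is an
// assumed name): a single Instant::now_coarse() call serves both
// histograms.
let now = Instant::now_coarse();
SCHED_LATCH_HISTOGRAM_VEC
    .get(self.tag)
    .observe(duration_to_sec(now.duration_since(self.latch_timer)));
SCHED_WAIT_HISTOGRAM_VEC
    .get(self.tag)
    .observe(duration_to_sec(now.duration_since(self.wait_timer)));
```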

@@ -379,6 +379,9 @@ impl<E: Engine, L: LockManager> Scheduler<E, L> {
            "process cmd with snapshot";
            "cid" => task.cid, "cb_ctx" => ?cb_ctx
        );
        SCHED_ASYNC_SNAPSHOT_DURATIONS_VEC
Member:

Isn't it duplicated with the storage async snapshot metrics?

@CLAassistant

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.


@ti-chi-bot (Member):

@innerr: PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
