*: add metrics about raft message wait flush duration #16214

Open · wants to merge 4 commits into master
Conversation

crazycs520
Contributor

@crazycs520 crazycs520 commented Dec 22, 2023

What is changed and how it works?

Issue Number: ref #12362

What's Changed:

add metrics about raft message wait flush duration 

In the following metrics, I restarted 2 TiKV servers every 2 minutes. During the restarts, the raft message wait flush duration was high, probably caused by waiting for the connection to be re-established. However, the reason for the high wait flush latency at other times is not known.

[screenshot: raft message wait flush duration metrics]
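The pattern discussed in this PR can be sketched as follows: record one timestamp when the first message is pushed after a flush, and observe the elapsed time when the flusher drains the queue. This is a minimal illustration only; the type and method names are made up and do not match TiKV's actual code.

```rust
use std::time::{Duration, Instant};

/// Minimal sketch (illustrative names, not TiKV's actual types) of the
/// "record_begin_wait / take_begin_wait" pattern discussed in this PR:
/// one timestamp per flush cycle instead of one per message.
struct RaftMsgQueue {
    msgs: Vec<u64>,              // message payloads, simplified to u64
    begin_wait: Option<Instant>, // set on the first push after a flush
}

impl RaftMsgQueue {
    fn new() -> Self {
        RaftMsgQueue { msgs: Vec::new(), begin_wait: None }
    }

    fn push(&mut self, msg: u64) {
        if self.begin_wait.is_none() {
            self.begin_wait = Some(Instant::now()); // record_begin_wait
        }
        self.msgs.push(msg);
    }

    /// Drains the queue; returns the batch plus the wait-flush duration
    /// to observe into the histogram, if a wait was in progress.
    fn flush(&mut self) -> (Vec<u64>, Option<Duration>) {
        let batch = std::mem::take(&mut self.msgs);
        let waited = self.begin_wait.take().map(|t| t.elapsed()); // take_begin_wait
        (batch, waited)
    }
}
```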

Related changes

  • Need to cherry-pick to the release branch

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No code

Side effects

  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Breaking backward compatibility

Release note

None

Signed-off-by: crazycs520 <crazycs520@gmail.com>
Signed-off-by: crazycs520 <crazycs520@gmail.com>
Contributor

ti-chi-bot bot commented Dec 22, 2023

[REVIEW NOTIFICATION]

This pull request has been approved by:

  • Connor1996
  • bufferflies

To complete the pull request process, please ask the reviewers in the list to review by filling /cc @reviewer in the comment.
After your PR has acquired the required number of LGTMs, you can assign this pull request to the committer in the list by filling /assign @committer in the comment to help you merge this pull request.

The full list of commands accepted by this bot can be found here.

Reviewer can indicate their review by submitting an approval review.
Reviewer can cancel approval by submitting a request changes review.

@ti-chi-bot ti-chi-bot bot added release-note-none contribution Type: PR - From contributors size/XXL labels Dec 22, 2023
@crazycs520
Contributor Author

@cfzjywxk @overvenus PTAL

Signed-off-by: crazycs520 <crazycs520@gmail.com>
Member

@Connor1996 Connor1996 left a comment


So why is the wait flush duration so high?

@ti-chi-bot ti-chi-bot bot added the status/LGT1 Status: PR - There is already 1 approval label Dec 26, 2023
@crazycs520
Contributor Author

So why is the wait flush duration so high?

As shown in the monitoring in the PR description, I restarted 2 TiKV servers every 2 minutes. During the restarts, the wait flush duration was high, probably because of waiting for the connection to be re-established. However, the reason for the high wait flush latency at other times is not known.

@crazycs520
Contributor Author

@overvenus @cfzjywxk PTAL

@cfzjywxk
Collaborator

So why is the wait flush duration so high?

As shown in the monitoring in the PR description, I restarted 2 TiKV servers every 2 minutes. During the restarts, the wait flush duration was high, probably because of waiting for the connection to be re-established. However, the reason for the high wait flush latency at other times is not known.

@crazycs520
It would be better to describe the tests and the observed phenomenon; that would help explain why this metric is useful and how it is used for diagnosis.

@ti-chi-bot ti-chi-bot bot added status/LGT2 Status: PR - There are already 2 approvals and removed status/LGT1 Status: PR - There is already 1 approval labels Dec 26, 2023
@cfzjywxk cfzjywxk requested a review from zyguan December 26, 2023 08:20
Contributor

@zyguan zyguan left a comment


I don't get the meaning of the metric. Think about the following case, observe(t5-t1) can be thought as the flush wait duration of m1, but how to understand observe(t8-t6)?

t1: push(m1)       record_begin_wait
t2: push(m2)
t3: push(m3)
t4: push(m4)
t5: flush([m1,m2]) take_begin_wait     observe(t5-t1)
t6: push(m5)       record_begin_wait
t7: push(m6)
t8: flush([m3,m4]) take_begin_wait     observe(t8-t6)
t9: flush([m5,m6])

@crazycs520
Contributor Author

I don't get the meaning of the metric. Think about the following case, observe(t5-t1) can be thought as the flush wait duration of m1, but how to understand observe(t8-t6)?

t1: push(m1)       record_begin_wait
t2: push(m2)
t3: push(m3)
t4: push(m4)
t5: flush([m1,m2]) take_begin_wait     observe(t5-t1)
t6: push(m5)       record_begin_wait
t7: push(m6)
t8: flush([m3,m4]) take_begin_wait     observe(t8-t6)
t9: flush([m5,m6])

Nice catch, but this shouldn't happen, since the default config is:

raft_client_queue_size =  8192;
raft_client_grpc_send_msg_buffer = 512 * 1024;

So each flush will send all messages in the queue. The example should therefore be:

t1: push(m1)       record_begin_wait
t2: push(m2)
t3: push(m3)
t4: push(m4)
t5: flush([m1,m2,m3,m4]) take_begin_wait     observe(t5-t1)
t6: push(m5)       record_begin_wait
t7: push(m6)
t8: flush([m5,m6]) take_begin_wait     observe(t8-t6)

I don't record the timestamp for each message to avoid performance issues.
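For contrast, the per-message alternative mentioned here (not what this PR implements; the names below are purely illustrative) would tag every message with its push time, giving exact per-message wait durations at the cost of one timestamp per message plus one histogram observation per flushed message:

```rust
use std::time::{Duration, Instant};

/// Hypothetical per-message variant, shown only to illustrate the
/// trade-off: exact wait durations, but an extra Instant per message.
struct TimedQueue<T> {
    msgs: Vec<(T, Instant)>,
}

impl<T> TimedQueue<T> {
    fn new() -> Self {
        TimedQueue { msgs: Vec::new() }
    }

    fn push(&mut self, msg: T) {
        // Record the push time alongside each message.
        self.msgs.push((msg, Instant::now()));
    }

    /// Drains the queue, returning each message with its exact wait time
    /// (one histogram observation per message would be made here).
    fn flush(&mut self) -> Vec<(T, Duration)> {
        self.msgs.drain(..).map(|(m, t)| (m, t.elapsed())).collect()
    }
}
```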

@zyguan
Contributor

zyguan commented Dec 26, 2023

I don't get the meaning of the metric. Think about the following case, observe(t5-t1) can be thought as the flush wait duration of m1, but how to understand observe(t8-t6)?

t1: push(m1)       record_begin_wait
t2: push(m2)
t3: push(m3)
t4: push(m4)
t5: flush([m1,m2]) take_begin_wait     observe(t5-t1)
t6: push(m5)       record_begin_wait
t7: push(m6)
t8: flush([m3,m4]) take_begin_wait     observe(t8-t6)
t9: flush([m5,m6])

Nice catch, but this shouldn't happen, since the default config is:

raft_client_queue_size =  8192;
raft_client_grpc_send_msg_buffer = 512 * 1024;

So each flush will send all messages in the queue. The example should therefore be:

t1: push(m1)       record_begin_wait
t2: push(m2)
t3: push(m3)
t4: push(m4)
t5: flush([m1,m2,m3,m4]) take_begin_wait     observe(t5-t1)
t6: push(m5)       record_begin_wait
t7: push(m6)
t8: flush([m5,m6]) take_begin_wait     observe(t8-t6)

I don't record the timestamp for each message to avoid performance issues.

According to the code, the capacity of the buffer is raft_msg_max_batch_size, which is 128 by default. Besides, the observed duration could be hard to understand even when cap(buffer) > cap(queue). Think about the following case, where cap(queue) = 2 and cap(buffer) = 3, how to explain observe(t13-t11)?

t0:  init           queue:[]           buffer:[]
t1:  push(m1)       queue:[m1]         buffer:[]               record_begin_wait
t2:  push(m2)       queue:[m1,m2]      buffer:[]
t3:  fill_msg       queue:[]           buffer:[m1,m2]
t4:  push(m3)       queue:[m3]         buffer:[m1,m2]
t5:  push(m4)       queue:[m3,m4]      buffer:[m1,m2]
t6:  fill_msg       queue:[]           buffer:[m1,m2,m3]+m4
t7:  push(m5)       queue:[m5]         buffer:[m1,m2,m3]+m4
t8:  push(m6)       queue:[m5,m6]      buffer:[m1,m2,m3]+m4
t9:  flush          queue:[m5,m6]      buffer:[m4]             take_begin_wait    observe(t9-t1)
t10: fill_msg       queue:[]           buffer:[m4,m5,m6]
t11: push(m7)       queue:[m7]         buffer:[m4,m5,m6]       record_begin_wait
t12: fill_msg       queue:[]           buffer:[m4,m5,m6]+m7
t13: flush          queue:[]           buffer:[m7]             take_begin_wait    observe(t13-t11)

@crazycs520
Contributor Author

According to the code, the capacity of the buffer is raft_msg_max_batch_size, which is 128 by default. Besides, the observed duration could be hard to understand even when cap(buffer) > cap(queue). Think about the following case, where cap(queue) = 2 and cap(buffer) = 3, how to explain observe(t13-t11)?

t0:  init           queue:[]           buffer:[]
t1:  push(m1)       queue:[m1]         buffer:[]               record_begin_wait
t2:  push(m2)       queue:[m1,m2]      buffer:[]
t3:  fill_msg       queue:[]           buffer:[m1,m2]
t4:  push(m3)       queue:[m3]         buffer:[m1,m2]
t5:  push(m4)       queue:[m3,m4]      buffer:[m1,m2]
t6:  fill_msg       queue:[]           buffer:[m1,m2,m3]+m4
t7:  push(m5)       queue:[m5]         buffer:[m1,m2,m3]+m4
t8:  push(m6)       queue:[m5,m6]      buffer:[m1,m2,m3]+m4
t9:  flush          queue:[m5,m6]      buffer:[m4]             take_begin_wait    observe(t9-t1)
t10: fill_msg       queue:[]           buffer:[m4,m5,m6]
t11: push(m7)       queue:[m7]         buffer:[m4,m5,m6]       record_begin_wait
t12: fill_msg       queue:[]           buffer:[m4,m5,m6]+m7
t13: flush          queue:[]           buffer:[m7]             take_begin_wait    observe(t13-t11)

It seems that when the send buffer overflows, this metric becomes inaccurate: the observed duration is smaller than the actual flush wait time.

Contributor

ti-chi-bot bot commented Jan 5, 2024

PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@cfzjywxk
Collaborator

@crazycs520
Please resolve the conflicts.

@zyguan PTAL again

@zyguan
Contributor

zyguan commented Jan 10, 2024

@crazycs520 Please resolve the conflicts.

@zyguan PTAL again

It seems there has been no change since my last review. The issue with the current implementation is that the observed duration is not well defined, which may lead to:

  • the 1st case since raft_client_queue_size >> raft_msg_max_batch_size
  • the 2nd case even when the batch size > the queue size

The raft client can be described as follows (note that grpc_raft_conn_num can be greater than 1). IMO, push events and flush events are relatively independent.

[diagram: raft client push/flush architecture]

Labels
contribution Type: PR - From contributors needs-rebase release-note-none size/XXL status/LGT2 Status: PR - There are already 2 approvals
6 participants