trace peers' availability info on leader side #13209

ethercflow · 2022-08-02T15:50:10Z

Issue Number: ref #12876

Signed-off-by: Wenbo Zhang ethercflow@gmail.com

What is changed and how it works?

Issue Number: Close #xxx

What's Changed:

trace peers' availability info on leader side

Related changes

No

Check List

Tests

No code

Side effects

Performance regression
- Consumes more CPU
- Consumes more MEM

Release note

trace peers' availability info on leader side

ti-chi-bot · 2022-08-02T15:50:11Z

[REVIEW NOTIFICATION]

This pull request has been approved by:

BusyJay
Connor1996

To complete the pull request process, please ask the reviewers in the list to review by filling /cc @reviewer in the comment.
After your PR has acquired the required number of LGTMs, you can assign this pull request to the committer in the list by filling /assign @committer in the comment to help you merge this pull request.

The full list of commands accepted by this bot can be found here.

Reviewer can indicate their review by submitting an approval review.
Reviewer can cancel approval by submitting a request changes review.

components/raftstore/src/store/fsm/peer.rs

components/raftstore/src/store/peer.rs

components/raftstore/src/store/fsm/peer.rs

BusyJay · 2022-08-10T03:26:42Z

What's the purpose of this PR?

ethercflow · 2022-08-10T06:40:43Z

> What's the purpose of this PR?

It'll be used to identify pending peers: those (excluding witnesses) whose apply index is smaller than the leader's truncated index.
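As a rough illustration of the rule described above, here is a minimal sketch of a leader-side check. The names `PeerReport` and `pending_peers` are hypothetical and do not come from this PR; the sketch only assumes the leader keeps the last apply index each peer reported.

```rust
use std::collections::HashMap;

/// Hypothetical leader-side record of what a peer last reported.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct PeerReport {
    pub is_witness: bool,
    pub apply_index: u64,
}

/// Returns the ids of peers treated as pending: non-witness peers whose
/// reported apply index is below the leader's truncated index.
pub fn pending_peers(reports: &HashMap<u64, PeerReport>, truncated_index: u64) -> Vec<u64> {
    let mut ids: Vec<u64> = reports
        .iter()
        .filter(|(_, r)| !r.is_witness && r.apply_index < truncated_index)
        .map(|(id, _)| *id)
        .collect();
    // HashMap iteration order is unspecified, so sort for a stable result.
    ids.sort_unstable();
    ids
}
```

Witnesses are skipped because, as noted later in the thread, they always report an apply index of 0.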

BusyJay · 2022-08-10T06:58:55Z

> It'll be used to identify pending peers: those (excluding witnesses) whose apply index is smaller than the leader's truncated index.

Why does it have to be the apply index? Why can't the matched index be used? What is the exact definition and purpose of a pending peer?

Connor1996 · 2022-08-10T07:01:23Z

> What's the purpose of this PR?

For a witness -> non-witness conf change, we need to know when the snapshot has been applied. A witness or uninitialized peer always reports an apply index of 0. Once the snapshot is applied, the peer reports its latest apply index, so PD knows the operator has finished.
It's a general feature that can benefit other features:
- TiFlash needs to detect TiFlash peers that are slow to apply
- Pre-transfer-leader doesn't need to send a message to get the target peer's apply index
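The witness -> non-witness case above can be sketched as a small tracker that watches the reported apply index move from 0 to a non-zero value. `SwitchTracker` and `observe` are hypothetical names used only for illustration; how PD actually consumes this information is not shown in this thread.

```rust
/// Hypothetical tracker for one peer being switched from witness to
/// non-witness. It only remembers the last reported apply index.
#[derive(Debug, Default)]
pub struct SwitchTracker {
    last_apply_index: u64,
}

impl SwitchTracker {
    /// Records a newly reported apply index. Returns true when the report
    /// moves from 0 (witness or uninitialized) to a non-zero value, which
    /// is taken here to mean the snapshot has been applied.
    pub fn observe(&mut self, reported_apply_index: u64) -> bool {
        let finished = self.last_apply_index == 0 && reported_apply_index > 0;
        self.last_apply_index = reported_apply_index;
        finished
    }
}
```

A caller would feed every report into `observe` and mark the operator as finished the first time it returns true.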

BusyJay · 2022-08-10T07:18:21Z

> For a witness -> non-witness conf change, we need to know when the snapshot has been applied. A witness or uninitialized peer always reports an apply index of 0. Once the snapshot is applied, the peer reports its latest apply index, so PD knows the operator has finished.

Instead of querying the index, I suggest replying with the exact information: whether the peer is applying a snapshot, or whether it is in a healthy state that can serve at least stale reads.
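For comparison, the alternative suggested here could look roughly like the sketch below, where the follower reports an explicit state instead of an index. The enum `PeerState` and its variants are assumptions made for illustration, not types from TiKV or kvproto.

```rust
/// Hypothetical state a follower could report instead of a raw index.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PeerState {
    /// Witness or uninitialized peer with no applied data.
    Uninitialized,
    /// A snapshot has been received and is still being applied.
    ApplyingSnapshot,
    /// The peer can serve at least stale reads.
    Healthy,
}

/// Under this scheme, the witness -> non-witness switch is treated as
/// finished once the peer reports that it is healthy.
pub fn switch_finished(state: PeerState) -> bool {
    state == PeerState::Healthy
}
```

The trade-off debated in the rest of the thread is that such a state answers one question precisely, while a raw apply index can be reused for other checks.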

> - TiFlash needs to detect TiFlash peers that are slow to apply

An index lower than the truncated index doesn't mean the peer is slow. The leader is free to truncate logs whenever it wants, regardless of other peers' apply progress. Slow apply should be monitored by time. The requirement is false.

> - Pre-transfer-leader doesn't need to send a message to get the target peer's apply index

Pre-transfer-leader doesn't just check the apply index. A message is always required.

Connor1996 · 2022-08-10T07:46:19Z

> Instead of querying the index, I suggest replying with the exact information: whether the peer is applying a snapshot, or whether it is in a healthy state that can serve at least stale reads.

I don't think exact information is general enough. If we want to do flow control based on slow apply, we would still need to track apply index progress.

> An index lower than the truncated index doesn't mean the peer is slow. The leader is free to truncate logs whenever it wants, regardless of other peers' apply progress. Slow apply should be monitored by time. The requirement is false.

Yes, comparing against the truncated index may give false positives, but the peer's apply index can be compared against the leader's apply index instead.

> Pre-transfer-leader doesn't just check the apply index. A message is always required.

I know, but it allows a quick return when the apply index is lagging.

BusyJay · 2022-08-10T08:04:03Z

> If we want to do flow control based on slow apply, we would still need to track apply index progress.

Slow apply should not be measured by index. Instead, it should be measured by memory, CPU, and disk capacity/IO. Only the follower knows whether it's overloaded and can suggest that the leader apply flow control. We already have such control: the follower actively rejects the leader's MsgAppend.

> Yes, comparing against the truncated index may give false positives, but the peer's apply index can be compared against the leader's apply index instead.

What's the point? They are doing different work and have different access patterns.

Connor1996 · 2022-08-10T08:24:27Z

> Slow apply should not be measured by index. Instead, it should be measured by memory, CPU, and disk capacity/IO. Only the follower knows whether it's overloaded and can suggest that the leader apply flow control. We already have such control: the follower actively rejects the leader's MsgAppend.

I don't want to go deep into the details of slow apply. My point is that the apply index fits the witness requirement and can also be used by other features. So why bother passing exact information dedicated to witness instead of a general one, the apply index?

> What's the point? They are doing different work and have different access patterns.

The point is that a slow TiFlash peer can be detected by comparing its apply index with the leader's apply index, so it's not a false requirement. This TiFlash issue isn't related to time; it needs the index to check whether TiFlash is ready. See https://pingcap.feishu.cn/docx/doxcnxQfXMgxnLV0pQqNxP9mQad
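The comparison against the leader's apply index argued for above might be sketched as follows. The function names and the `max_lag` threshold are assumptions for illustration; the thread does not specify how large a gap should count as lagging.

```rust
/// Gap between the leader's and a peer's apply index. Returns 0 when the
/// peer is not behind the leader.
pub fn apply_lag(leader_apply_index: u64, peer_apply_index: u64) -> u64 {
    leader_apply_index.saturating_sub(peer_apply_index)
}

/// Hypothetical readiness check: the peer counts as lagging when its gap
/// to the leader exceeds `max_lag` entries (an assumed tuning knob).
pub fn is_lagging(leader_apply_index: u64, peer_apply_index: u64, max_lag: u64) -> bool {
    apply_lag(leader_apply_index, peer_apply_index) > max_lag
}
```

The same check could give pre-transfer-leader its quick return: if the target is lagging, the transfer can be rejected before any message is sent.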

components/raftstore/src/store/fsm/peer.rs

components/raftstore/src/store/peer.rs

ref tikv#12876 Signed-off-by: Wenbo Zhang <ethercflow@gmail.com>

ethercflow · 2022-10-07T01:23:18Z

/run-tests

ethercflow · 2022-10-07T09:21:44Z

/run-tests

components/raftstore/src/store/fsm/peer.rs

Connor1996

LGTM

Signed-off-by: Wenbo Zhang <ethercflow@gmail.com>

ref tikv#12876 Signed-off-by: Wenbo Zhang <ethercflow@gmail.com>

BusyJay

Any test case?

components/raftstore/src/store/fsm/peer.rs

Signed-off-by: Wenbo Zhang <ethercflow@gmail.com>

ethercflow · 2022-10-11T12:59:10Z

> Any test case?

I've added integration tests here because currently there is no place for these functions to be called in this PR.

Connor1996 · 2022-10-12T04:47:34Z

/merge

ti-chi-bot · 2022-10-12T04:47:35Z

@Connor1996: It seems you want to merge this PR, I will help you trigger all the tests:

/run-all-tests

You only need to trigger /merge once, and if the CI test fails, you just re-trigger the test that failed and the bot will merge the PR for you after the CI passes.

If you have any questions about the PR merge process, please refer to pr process.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the ti-community-infra/tichi repository.

ti-chi-bot · 2022-10-12T04:47:37Z

This pull request has been accepted and is ready to merge.

Commit hash: 4ee908a

ti-chi-bot · 2022-10-12T04:47:50Z

@ethercflow: Your PR was out of date, I have automatically updated it for you.

At the same time I will also trigger all tests for you:

/run-all-tests

If the CI test fails, you just re-trigger the test that failed and the bot will merge the PR for you after the CI passes.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the ti-community-infra/tichi repository.

ref tikv#12876 Signed-off-by: Wenbo Zhang <ethercflow@gmail.com> Co-authored-by: Ti Chi Robot <ti-community-prow-bot@tidb.io>

ti-chi-bot added do-not-merge/needs-linked-issue release-note do-not-merge/work-in-progress size/M labels Aug 2, 2022

ethercflow requested a review from Connor1996 August 3, 2022 02:57

Connor1996 reviewed Aug 3, 2022

components/raftstore/src/store/fsm/peer.rs (outdated, resolved)

components/raftstore/src/store/fsm/peer.rs (outdated, resolved)

Connor1996 mentioned this pull request Aug 4, 2022

Support 2 Replicas With 1 Log Only Mechanism #12876

Open

38 tasks

ethercflow force-pushed the support-witness branch from 279afb3 to de1d226 Compare August 4, 2022 09:51

Connor1996 reviewed Aug 5, 2022

components/raftstore/src/store/peer.rs (outdated, resolved)

ethercflow force-pushed the support-witness branch from de1d226 to 68246b2 Compare August 8, 2022 12:12

ethercflow requested a review from Connor1996 August 8, 2022 12:57

hehechen reviewed Aug 9, 2022

components/raftstore/src/store/peer.rs (outdated, resolved)

Connor1996 reviewed Aug 9, 2022

components/raftstore/src/store/fsm/peer.rs (resolved)

ti-chi-bot added size/L and removed size/M labels Aug 9, 2022

ethercflow marked this pull request as ready for review August 10, 2022 03:06

ti-chi-bot removed the do-not-merge/work-in-progress label Aug 10, 2022

ethercflow requested review from BusyJay, tabokie and Connor1996 August 10, 2022 03:06

ti-chi-bot removed the do-not-merge/needs-linked-issue label Aug 10, 2022

tonyxuqqi reviewed Oct 6, 2022

components/raftstore/src/store/fsm/peer.rs (outdated, resolved)

tonyxuqqi reviewed Oct 6, 2022

components/raftstore/src/store/fsm/peer.rs (outdated, resolved)

tonyxuqqi reviewed Oct 6, 2022

components/raftstore/src/store/peer.rs (resolved)

change log level to debug

27c0ad6

ref tikv#12876 Signed-off-by: Wenbo Zhang <ethercflow@gmail.com>

ethercflow requested a review from tonyxuqqi October 7, 2022 00:48

BusyJay reviewed Oct 8, 2022

components/raftstore/src/store/fsm/peer.rs (outdated, resolved)

components/raftstore/src/store/fsm/peer.rs (outdated, resolved)

components/raftstore/src/store/fsm/peer.rs (resolved)

Connor1996 approved these changes Oct 9, 2022

ti-chi-bot added the status/LGT1 Status: PR - There is already 1 approval label Oct 9, 2022

address comments

38f1b62

Signed-off-by: Wenbo Zhang <ethercflow@gmail.com>

ethercflow mentioned this pull request Oct 10, 2022

raftstore: Introduce witness peer #12972

Merged

ethercflow requested a review from BusyJay October 10, 2022 14:21

update kvproto

22ddaf5

ref tikv#12876 Signed-off-by: Wenbo Zhang <ethercflow@gmail.com>

ethercflow force-pushed the support-witness branch from 990c3e3 to 22ddaf5 Compare October 10, 2022 14:23

BusyJay reviewed Oct 11, 2022

components/raftstore/src/store/fsm/peer.rs (outdated, resolved)

components/raftstore/src/store/fsm/peer.rs (resolved)

address comment

4ee908a

Signed-off-by: Wenbo Zhang <ethercflow@gmail.com>

BusyJay approved these changes Oct 11, 2022

ti-chi-bot added status/LGT2 Status: PR - There are already 2 approvals and removed status/LGT1 Status: PR - There is already 1 approval labels Oct 11, 2022

ti-chi-bot added the status/can-merge Status: Can merge to base branch label Oct 12, 2022

Merge branch 'master' into support-witness

31c2ecb

ti-chi-bot merged commit f702db2 into tikv:master Oct 12, 2022

ti-chi-bot added this to the Pool milestone Oct 12, 2022

ethercflow added a commit to ethercflow/tikv that referenced this pull request Oct 19, 2022

trace peers' availability info on leader side (tikv#13209)

45afd61

ref tikv#12876 Signed-off-by: Wenbo Zhang <ethercflow@gmail.com> Co-authored-by: Ti Chi Robot <ti-community-prow-bot@tidb.io>
