raftstore: fix the committed group not assign after apply snapshot #10802

Merged
merged 6 commits into tikv:master on Aug 25, 2021

Conversation

nolouch
Contributor

@nolouch nolouch commented Aug 24, 2021

Signed-off-by: nolouch <nolouch@gmail.com>

Issue Number: close #10772 tikv/pd#4009

Problem Summary:

What problem does this PR solve?

Steps to reproduce: inject network isolation chaos -> wait 10m -> delete the network isolation chaos

From the logs:

  1. The commit groups are assigned.
  2. The peer restores a snapshot, which clears the commit groups: https://github.com/tikv/raft-rs/blob/74d3bb58f19741cecc16698008103a446559837d/src/raft.rs#L2567-L2575
  3. The snapshot finishes applying.
  4. A transfer-leader operator makes the tc2 peer (this peer) the leader.
  5. The region blocks on "still not reach integrity over label", and the logs show that all commit groups are 0.
[2021/08/23 12:24:14.766 +00:00] [Info] [raft.rs:1092] ["became follower at term 6"] [term=6] [raft_id=1332] [region_id=1108]
[2021/08/23 12:26:10.659 +00:00] [Info] [peer.rs:702] ["switch replication mode"] [peer_id=1332] [region_id=1108] [version=1376]
[2021/08/23 12:26:14.065 +00:00] [Info] [peer.rs:689] ["Debug DR on pdates replication mode."] [gb="[(1111, 2), (1332, 1), (1343, 2)]"] [peer_id=1332] [region_id=1108]
[2021/08/23 12:26:14.065 +00:00] [Info] [peer.rs:702] ["switch replication mode"] [peer_id=1332] [region_id=1108] [version=1472]
[2021/08/23 12:26:21.572 +00:00] [Info] [raft_log.rs:614] ["log [committed=20652, persisted=20653, applied=20652, unstable.offset=20654, unstable.entries.len()=0] starts to restore snapshot [index: 350273, term: 6]"] [snapshot_term=6] [snapshot_index=350273] [log="committed=20652, persisted=20653, applied=20652, unstable.offset=20654, unstable.entries.len()=0"] [raft_id=1332] [region_id=1108]
[2021/08/23 12:26:21.572 +00:00] [Info] [raft.rs:2609] ["switched to configuration"] [config="Configuration { voters: Configuration { incoming: Configuration { voters: {1111, 1332, 1343} }, outgoing: Configuration { voters: {} } }, learners: {}, learners_next: {}, auto_leave: false }"] [raft_id=1332] [region_id=1108]
[2021/08/23 12:26:21.572 +00:00] [Info] [raft.rs:2593] ["restored snapshot"] [snapshot_term=6] [snapshot_index=350273] [last_term=6] [last_index=350273] [commit=350273] [raft_id=1332] [region_id=1108]
[2021/08/23 12:26:21.572 +00:00] [Info] [raft.rs:2474] ["[commit: 350273, term: 6] restored snapshot [index: 350273, term: 6]"] [snapshot_term=6] [snapshot_index=350273] [commit=350273] [term=6] [raft_id=1332] [region_id=1108]
[2021/08/23 12:26:21.572 +00:00] [Info] [peer_storage.rs:1244] ["begin to apply snapshot"] [peer_id=1332] [region_id=1108]
[2021/08/23 12:26:21.629 +00:00] [Info] [peer_storage.rs:1638] ["finish clear peer meta"] [takes=56.321771ms] [raft_key=1] [apply_key=1] [meta_key=1] [region_id=1108]
[2021/08/23 12:26:21.629 +00:00] [Info] [peer_storage.rs:1287] ["apply snapshot with state ok"] [state="applied_index: 350273 commit_index: 20652 commit_term: 6 truncated_state { index: 350273 term: 6 }"] [region="id: 1108 start_key: 7480000000000000FF885F728000000000FF0ABA320000000000FA end_key: 7480000000000000FF8900000000000000F8 region_epoch { conf_ver: 53 version: 58 } peers { id: 1111 store_id: 4 } peers { id: 1332 store_id: 5 } peers { id: 1343 store_id: 108 }"] [peer_id=1332] [region_id=1108]
[2021/08/23 12:26:21.631 +00:00] [Info] [apply.rs:3196] ["re-register to apply delegates"] [term=6] [peer_id=1332] [region_id=1108]
[2021/08/23 12:26:21.631 +00:00] [Info] [peer.rs:3117] ["snapshot is applied"] [region="id: 1108 start_key: 7480000000000000FF885F728000000000FF0ABA320000000000FA end_key: 7480000000000000FF8900000000000000F8 region_epoch { conf_ver: 53 version: 58 } peers { id: 1111 store_id: 4 } peers { id: 1332 store_id: 5 } peers { id: 1343 store_id: 108 }"] [peer_id=1332] [region_id=1108]
[2021/08/23 12:26:21.631 +00:00] [Info] [peer.rs:3174] ["region changed after applying snapshot"] [region="id: 1108 start_key: 7480000000000000FF885F728000000000FF0ABA320000000000FA end_key: 7480000000000000FF8900000000000000F8 region_epoch { conf_ver: 53 version: 58 } peers { id: 1111 store_id: 4 } peers { id: 1332 store_id: 5 } peers { id: 1343 store_id: 108 }"] [prev_region="id: 1108 start_key: 7480000000000000FF885F728000000000FF0ABA320000000000FA end_key: 7480000000000000FF8900000000000000F8 region_epoch { conf_ver: 53 version: 58 } peers { id: 1111 store_id: 4 } peers { id: 1332 store_id: 5 } peers { id: 1343 store_id: 108 }"] [peer_id=1332] [region_id=1108]
[2021/08/23 12:26:21.631 +00:00] [Info] [region_info_accessor.rs:238] ["trying to create region but it already exists, try to update it"] [region_id=1108]
[2021/08/23 12:28:53.865 +00:00] [Info] [region.rs:325] ["begin apply snap data"] [region_id=1108]
[2021/08/23 12:28:54.277 +00:00] [Info] [region.rs:390] ["apply new data"] [time_takes=25.590424ms] [region_id=1108]
[2021/08/23 12:28:56.289 +00:00] [Info] [raft.rs:1336] ["received a message with higher term from 1343"] ["msg type"=MsgRequestVote] [message_term=7] [term=6] [from=1343] [raft_id=1332] [region_id=1108]
[2021/08/23 12:28:56.289 +00:00] [Info] [raft.rs:1092] ["became follower at term 7"] [term=7] [raft_id=1332] [region_id=1108]
[2021/08/23 12:28:56.289 +00:00] [Info] [raft.rs:1532] ["[logterm: 6, index: 351667, vote: 0] cast vote for 1343 [logterm: 6, index: 351667] at term 7"] ["msg type"=MsgRequestVote] [term=7] [msg_index=351667] [msg_term=6] [from=1343] [vote=0] [log_index=351667] [log_term=6] [raft_id=1332] [region_id=1108]
[2021/08/23 12:29:32.107 +00:00] [Info] [peer.rs:857] ["deleting applied snap file"] [snap_file=1108_6_350273] [peer_id=1332] [region_id=1108]
[2021/08/23 12:31:56.965 +00:00] [Info] [raft.rs:2304] ["[term 7] received MsgTimeoutNow from 1343 and starts an election to get leadership."] [from=1343] [term=7] [raft_id=1332] [region_id=1108]
[2021/08/23 12:31:56.965 +00:00] [Info] [raft.rs:1517] ["starting a new election"] [term=7] [raft_id=1332] [region_id=1108]
[2021/08/23 12:31:56.965 +00:00] [Info] [raft.rs:1116] ["became candidate at term 8"] [term=8] [raft_id=1332] [region_id=1108]
[2021/08/23 12:31:56.965 +00:00] [Info] [raft.rs:1271] ["broadcasting vote request"] [to="[1111, 1343]"] [log_index=351669] [log_term=7] [term=8] [type=MsgRequestVote] [raft_id=1332] [region_id=1108]
[2021/08/23 12:31:56.966 +00:00] [Info] [raft.rs:2175] ["received votes response"] [term=8] [type=MsgRequestVoteResponse] [approvals=2] [rejections=0] [from=1111] [vote=true] [raft_id=1332] [region_id=1108]
[2021/08/23 12:31:56.966 +00:00] [Info] [raft.rs:1200] ["became leader at term 8"] [term=8] [raft_id=1332] [region_id=1108]
[2021/08/23 12:31:56.967 +00:00] [Info] [peer.rs:3664] ["require updating max ts"] [initial_status=34359738484] [region_id=1108]
[2021/08/23 12:31:56.967 +00:00] [Info] [peer.rs:3419] ["still not reach integrity over label"] [progress="[]"] [peer_id=1332] [region_id=1108] [status=None]
[2021/08/23 12:31:56.967 +00:00] [Info] [peer.rs:3433] ["DEBUG DR: Unknow status"] [peer_id=1332] [region_id=1108] [term=8] [apply_to_current_term=false] [status=None]
[2021/08/23 12:31:56.967 +00:00] [Info] [pd.rs:1235] ["succeed to update max timestamp"] [region_id=1108]
[2021/08/23 12:31:56.968 +00:00] [Info] [peer.rs:3419] ["still not reach integrity over label"] [progress="[]"] [peer_id=1332] [region_id=1108] [status=None]
[2021/08/23 12:31:56.968 +00:00] [Info] [peer.rs:3433] ["DEBUG DR: Unknow status"] [peer_id=1332] [region_id=1108] [term=8] [apply_to_current_term=false] [status=None]
[2021/08/23 12:31:57.123 +00:00] [Info] [peer.rs:3419] ["still not reach integrity over label"] [progress="[]"] [peer_id=1332] [region_id=1108] [status=None]
[2021/08/23 12:31:57.123 +00:00] [Info] [peer.rs:3433] ["DEBUG DR: Unknow status"] [peer_id=1332] [region_id=1108] [term=8] [apply_to_current_term=false] [status=None]
[2021/08/23 12:32:56.967 +00:00] [Info] [peer.rs:3419] ["still not reach integrity over label"] [progress="[(1111, 0, 351670), (1332, 0, 351670), (1343, 0, 351670)]"] [peer_id=1332] [region_id=1108] [status=Some(false)]
[2021/08/23 12:33:56.968 +00:00] [Info] [peer.rs:3419] ["still not reach integrity over label"] [progress="[(1111, 0, 351670), (1332, 0, 351670), (1343, 0, 351670)]"] [peer_id=1332] [region_id=1108] [status=Some(false)]
[2021/08/23 12:34:56.969 +00:00] [Info] [peer.rs:3419] ["still not reach integrity over label"] [progress="[(1111, 0, 351670), (1332, 0, 351670), (1343, 0, 351670)]"] [peer_id=1332] [region_id=1108] [status=Some(false)]
[2021/08/23 12:35:56.970 +00:00] [Info] [peer.rs:3419] ["still not reach integrity over label"] [progress="[(1111, 0, 351670), (1332, 0, 351670), (1343, 0, 351670)]"] [peer_id=1332] [region_id=1108] [status=Some(false)]
[2021/08/23 12:36:56.971 +00:00] [Info] [peer.rs:3419] ["still not reach integrity over label"] [progress="[(1111, 0, 351670), (1332, 0, 351670), (1343, 0, 351670)]"] [peer_id=1332] [region_id=1108] [status=Some(false)]
[2021/08/23 12:37:56.972 +00:00] [Info] [peer.rs:3419] ["still not reach integrity over label"] [progress="[(1111, 0, 351670), (1332, 0, 351670), (1343, 0, 351670)]"] [peer_id=1332] [region_id=1108] [status=Some(false)]
[2021/08/23 12:38:56.973 +00:00] [Info] [peer.rs:3419] ["still not reach integrity over label"] [progress="[(1111, 0, 351670), (1332, 0, 351670), (1343, 0, 351670)]"] [peer_id=1332] [region_id=1108] [status=Some(false)]
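
The last log lines report `status=Some(false)` while every entry in `progress` carries group 0. As a rough illustration of why that state cannot make progress, here is a minimal sketch. It assumes each `progress` tuple is (peer ID, commit group ID, index), which is inferred from the log output only, and it treats group 0 as "unassigned". The function name `spans_multiple_groups` is hypothetical, and the check is a simplified stand-in, not the raft-rs algorithm.

```rust
use std::collections::HashSet;

/// One entry of the `progress` field in the log:
/// (peer_id, commit_group_id, index).
/// ASSUMPTION: this layout is inferred from the log output above.
type Progress = (u64, u64, u64);

/// Simplified stand-in for the "integrity over label" condition:
/// replication can only span labels if at least two distinct
/// non-zero commit groups are present. Group 0 means "unassigned".
fn spans_multiple_groups(progress: &[Progress]) -> bool {
    let groups: HashSet<u64> = progress
        .iter()
        .map(|&(_, group, _)| group)
        .filter(|&group| group != 0)
        .collect();
    groups.len() >= 2
}
```

With the stuck state from the log, `[(1111, 0, 351670), (1332, 0, 351670), (1343, 0, 351670)]`, this returns `false` no matter how far the indexes advance. With the groups from the earlier `gb` log line (1111 in group 2, 1332 in group 1, 1343 in group 2) it returns `true`.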

What is changed and how it works?

Re-assign the commit groups every time a snapshot is applied.
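
The sequence from the problem summary and the fix can be modelled with a small toy, shown here only as a sketch. `ToyPeer` and all of its methods are hypothetical names that do not mirror the TiKV or raft-rs types. The toy also assumes the group mapping is remembered in a buffer, whereas the real code may recompute it from the replication state.

```rust
use std::collections::HashMap;

/// Toy model of a peer's commit-group state.
/// ASSUMPTION: all names are illustrative, not the TiKV/raft-rs API.
#[derive(Default)]
struct ToyPeer {
    /// Mapping remembered outside the raft layer: (peer_id, group_id).
    group_buffer: Vec<(u64, u64)>,
    /// Groups currently known to the raft layer; a missing entry means group 0.
    raft_groups: HashMap<u64, u64>,
}

impl ToyPeer {
    /// Step 1 of the problem summary: the commit groups are assigned.
    fn switch_replication_mode(&mut self, groups: &[(u64, u64)]) {
        self.group_buffer = groups.to_vec();
        self.assign_commit_groups();
    }

    /// Pushes the remembered mapping down into the raft layer.
    fn assign_commit_groups(&mut self) {
        self.raft_groups = self.group_buffer.iter().copied().collect();
    }

    /// Step 2: restoring a snapshot clears the commit groups.
    fn restore_snapshot(&mut self) {
        self.raft_groups.clear();
    }

    /// The fix: re-assign the groups after every applied snapshot.
    fn on_snapshot_applied(&mut self) {
        self.assign_commit_groups();
    }

    /// Returns the group the raft layer currently sees for a peer (0 if none).
    fn group_of(&self, peer_id: u64) -> u64 {
        self.raft_groups.get(&peer_id).copied().unwrap_or(0)
    }
}
```

In this model, calling `switch_replication_mode(&[(1111, 2), (1332, 1), (1343, 2)])` followed by `restore_snapshot()` leaves `group_of(1332)` at 0, which matches the stuck state in the logs. Calling `on_snapshot_applied()` afterwards restores it to 1.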

Check List

Tests

  • Unit test
  • Integration test

Release note

raftstore: fix the commit group not being assigned after applying a snapshot

@ti-chi-bot
Member

ti-chi-bot commented Aug 24, 2021

[REVIEW NOTIFICATION]

This pull request has been approved by:

  • BusyJay
  • NingLin-P

To complete the pull request process, please ask the reviewers in the list to review by filling /cc @reviewer in the comment.
After your PR has acquired the required number of LGTMs, you can assign this pull request to the committer in the list by filling /assign @committer in the comment to help you merge this pull request.

The full list of commands accepted by this bot can be found here.

Reviewer can indicate their review by submitting an approval review.
Reviewer can cancel approval by submitting a request changes review.

@nolouch
Contributor Author

nolouch commented Aug 24, 2021

/rebuild

Signed-off-by: nolouch <nolouch@gmail.com>
@nolouch nolouch marked this pull request as ready for review August 25, 2021 04:49
@nolouch nolouch changed the title raftstore: fix the committed group not assign after apply sanpshot raftstore: fix the committed group not assign after apply snapshot Aug 25, 2021
@nolouch nolouch added the needs-cherry-pick-release-5.2 Type: Need cherry pick to release-5.2 label Aug 25, 2021
@nolouch
Contributor Author

nolouch commented Aug 25, 2021

/test

Member

@NingLin-P NingLin-P left a comment

LGTM

@ti-chi-bot ti-chi-bot added the status/LGT1 Status: PR - There is already 1 approval label Aug 25, 2021
@nolouch
Contributor Author

nolouch commented Aug 25, 2021

/test

@ti-chi-bot ti-chi-bot added status/LGT2 Status: PR - There are already 2 approvals and removed status/LGT1 Status: PR - There is already 1 approval labels Aug 25, 2021
@nolouch
Contributor Author

nolouch commented Aug 25, 2021

/merge

@ti-chi-bot
Member

@nolouch: It seems you want to merge this PR, I will help you trigger all the tests:

/run-all-tests

You only need to trigger /merge once, and if the CI test fails, you just re-trigger the test that failed and the bot will merge the PR for you after the CI passes.

If you have any questions about the PR merge process, please refer to pr process.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the ti-community-infra/tichi repository.

@ti-chi-bot
Member

@nolouch: /merge is only allowed for the committers, you can assign this pull request to the committer in list by filling /assign @committer in the comment to help merge this pull request.

In response to this:

/merge

@NingLin-P
Member

/merge

@ti-chi-bot
Member

@NingLin-P: It seems you want to merge this PR, I will help you trigger all the tests:

/run-all-tests

You only need to trigger /merge once, and if the CI test fails, you just re-trigger the test that failed and the bot will merge the PR for you after the CI passes.

If you have any questions about the PR merge process, please refer to pr process.

@ti-chi-bot
Member

This pull request has been accepted and is ready to merge.

Commit hash: e0c5f53

@ti-chi-bot ti-chi-bot added the status/can-merge Status: Can merge to base branch label Aug 25, 2021
@nolouch
Contributor Author

nolouch commented Aug 25, 2021

/test

@nolouch
Contributor Author

nolouch commented Aug 25, 2021

/test

@ti-chi-bot ti-chi-bot merged commit cfb1ad1 into tikv:master Aug 25, 2021
ti-srebot pushed a commit to ti-srebot/tikv that referenced this pull request Aug 25, 2021
Signed-off-by: ti-srebot <ti-srebot@pingcap.com>
@ti-srebot
Contributor

cherry pick to release-5.2 in PR #10826

Labels
needs-cherry-pick-release-5.2 Type: Need cherry pick to release-5.2 release-note size/M status/can-merge Status: Can merge to base branch status/LGT2 Status: PR - There are already 2 approvals
Development

Successfully merging this pull request may close these issues.

region cannot be INTEGRITY_OVER_LABEL in dr-auto-sync
5 participants