
raftstore-v2: support tracing peer lifetime #14056

Merged: 11 commits into tikv:master from introduce-gc-peers on Feb 3, 2023

Conversation

@BusyJay (Member) commented Jan 16, 2023

What is changed and how it works?

Issue Number: Ref #12842

What's Changed:

In V1, a peer is responsible for destroying itself. The design was meant to make the leader do less work and reduce writes. But experience from the past years shows that not making destruction a strong guarantee makes the implementation complicated, hard to get correct, and difficult to understand.

In V2, we change this so that the leader is the role that makes sure all removed or merged peers are destroyed in the end. This push mode is far easier to understand and implement correctly. The downside is that extra writes are introduced, but it is worth it.

The PR contains the implementation of merge cleanup, but the initialization part should be done in #13818 and its related PRs.
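
For readers new to the push model, here is a minimal, self-contained sketch of the lifecycle described above. All type and function names in it are illustrative placeholders rather than TiKV's actual API; the real code uses the MsgGcPeerRequest/MsgGcPeerResponse extra messages and the UpdateGcPeer admin command that appear in the diffs below.

```rust
// Illustrative sketch only: placeholder types, not TiKV's raftstore-v2 API.
use std::collections::HashSet;

/// Peers the leader still has to confirm as destroyed.
#[derive(Default)]
struct GcState {
    removed_records: HashSet<u64>, // peer ids removed by conf change or merge
}

enum GcMessage {
    /// Leader -> removed peer: "please destroy yourself".
    GcPeerRequest { peer_id: u64 },
    /// Removed peer's store -> leader: "this peer is gone".
    GcPeerResponse { peer_id: u64 },
}

impl GcState {
    /// On a leader tick, push a destroy request for every peer whose
    /// destruction has not been confirmed yet; re-sending is harmless.
    fn tick(&self) -> Vec<GcMessage> {
        self.removed_records
            .iter()
            .map(|&peer_id| GcMessage::GcPeerRequest { peer_id })
            .collect()
    }

    /// On confirmation, drop the record. In the real code the leader then
    /// proposes an UpdateGcPeer-style admin command so the trimmed list is
    /// replicated and persisted on every replica (these are the extra writes).
    fn on_response(&mut self, msg: &GcMessage) {
        if let GcMessage::GcPeerResponse { peer_id } = msg {
            self.removed_records.remove(peer_id);
        }
    }
}

fn main() {
    let mut gc = GcState::default();
    gc.removed_records.extend([4u64, 5]);
    assert_eq!(gc.tick().len(), 2); // keep pushing until confirmed
    gc.on_response(&GcMessage::GcPeerResponse { peer_id: 4 });
    assert_eq!(gc.tick().len(), 1); // peer 5 is still pending
}
```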

Check List

Tests

  • Unit test
  • Integration test

Release note

None

@ti-chi-bot (Member) commented Jan 16, 2023

[REVIEW NOTIFICATION]

This pull request has been approved by:

  • tabokie
  • tonyxuqqi

To complete the pull request process, please ask the reviewers in the list to review by filling /cc @reviewer in the comment.
After your PR has acquired the required number of LGTMs, you can assign this pull request to the committer in the list by filling /assign @committer in the comment to help you merge this pull request.

The full list of commands accepted by this bot can be found here.

Reviewer can indicate their review by submitting an approval review.
Reviewer can cancel approval by submitting a request changes review.

@BusyJay (Member, Author) commented Jan 29, 2023

PTAL

@overvenus (Member):

> In V2, we change this so that the leader is the role that makes sure all removed or merged peers are destroyed in the end.

What if the TiKV cluster restarts after a majority of peers were destroyed? How does TiKV GC the remaining peers?

@BusyJay (Member, Author) commented Jan 29, 2023

> What if the TiKV cluster restarts after a majority of peers were destroyed?

How can that happen? Peers are destroyed by conf change. As long as conf change can proceed, there must be a majority.

@@ -75,6 +75,7 @@ pub struct StoreContext<EK: KvEngine, ER: RaftEngine, T> {
    pub schedulers: Schedulers<EK, ER>,
    /// store meta
    pub store_meta: Arc<Mutex<StoreMeta<EK>>>,
    pub shutdown: Arc<AtomicBool>,
Member:

Where is this set? And FYI there's already a shutdown in StoreSystem.

BusyJay (Member, Author):

It's the exact same bool flag. It's for query only.
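
For context, sharing one atomic flag for read-only queries is straightforward. A minimal sketch with heavily simplified stand-ins for StoreSystem/StoreContext, assuming only that both structs hold a clone of the same Arc<AtomicBool>:

```rust
use std::sync::{
    atomic::{AtomicBool, Ordering},
    Arc,
};

// Simplified stand-ins for the real StoreSystem/StoreContext.
struct StoreSystem {
    shutdown: Arc<AtomicBool>, // owner: flipped to true on shutdown
}

struct StoreContext {
    shutdown: Arc<AtomicBool>, // clone of the same flag, read-only use
}

impl StoreContext {
    fn is_shutting_down(&self) -> bool {
        self.shutdown.load(Ordering::SeqCst)
    }
}

fn main() {
    let flag = Arc::new(AtomicBool::new(false));
    let system = StoreSystem { shutdown: flag.clone() };
    let ctx = StoreContext { shutdown: flag };
    system.shutdown.store(true, Ordering::SeqCst);
    assert!(ctx.is_shutting_down()); // both handles observe the same flag
}
```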

@@ -527,10 +557,27 @@ impl<EK: KvEngine, ER: RaftEngine> Peer<EK, ER> {
                .flat_map(|m| self.build_raft_message(m))
                .collect();
        }
        if self.has_pending_messages() {
            if write_task.messages.is_empty() {
Member:

Why check?

BusyJay (Member, Author):

Avoid unnecessary allocation and memcpy.
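
The pattern under discussion, in isolation: move the whole pending Vec when the task has no messages yet, and only fall back to appending when it does. A minimal sketch with a placeholder element type (the real code works on raft messages):

```rust
// Placeholder element type; the real code stores raft messages.
fn flush_pending(task_messages: &mut Vec<String>, pending: &mut Vec<String>) {
    if task_messages.is_empty() {
        // Common case: move the Vec wholesale -- no allocation, no element copy.
        *task_messages = std::mem::take(pending);
    } else {
        // Rare case: the task already has messages, so append the pending ones.
        task_messages.append(pending);
    }
}

fn main() {
    let mut task = Vec::new();
    let mut pending = vec!["msg-a".to_string(), "msg-b".to_string()];
    flush_pending(&mut task, &mut pending);
    assert_eq!(task.len(), 2);
    assert!(pending.is_empty());
}
```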

        // return;
        match msg.get_extra_msg().get_type() {
            ExtraMessageType::MsgGcPeerResponse => self.on_gc_peer_response(&msg),
            _ => unimplemented!(),
Member:

What about MsgGcPeerRequest?

    Some(msg)
}

fn tell_source_peer_to_destroy<EK, ER, T>(ctx: &mut StoreContext<EK, ER, T>, msg: &RaftMessage)
Member:

merge_source_peer, source/target can be confused with msg sender/recipient.

BusyJay (Member, Author):

source/target refers to merge within this project unless stated otherwise.

Member:

You used target to represent the message receiver at L309.

Contributor:

+1. It doesn't hurt to make it more explicit, as this file is not about merge. Or at least add a comment clarifying that source_peer is the merge source peer.

Contributor:

Also, regarding the naming, tell is a bit too colloquial; maybe request is better.

BusyJay (Member, Author):

I have already updated the target usage at L309. source peer and target peer are used very frequently within the project. It's more ambiguous to make them represent different meanings.

BusyJay (Member, Author):

tell is actually a technical term in the actor model, but I will change it to forward instead.

Member:

Even in existing code, it is rare to find "source" used directly without hinting at the merge context. This function alone is confusing to grasp because neither the implementation nor the function name implies that context; you have to trace back to the caller to get it, which is not ideal. I think the least you could do here is add a comment.

BusyJay (Member, Author):

> it is rare to find "source" used directly without hinting at the merge context.

I think this explains "source/target refers to merge within this project unless stated otherwise" perfectly. A comment is also OK.

            .iter()
            .all(|p1| p1.get_store_id() != p0.get_store_id())
        {
            removed_records.push(p0.clone());
Contributor:

Does it need dedup?

BusyJay (Member, Author):

No, it's impossible for it to be duplicated.
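
A minimal sketch of the bookkeeping in question, with illustrative types rather than the actual raftstore-v2 code: an old peer is recorded only when its store no longer hosts any peer of the new configuration, and since a store appears at most once in a region's peer list, the same record cannot be pushed twice.

```rust
#[derive(Clone, PartialEq, Debug)]
struct Peer {
    id: u64,
    store_id: u64,
}

/// Collect peers from the old configuration whose stores disappeared from
/// the new configuration; these are the ones the leader has to GC later.
fn collect_removed_records(old_peers: &[Peer], new_peers: &[Peer]) -> Vec<Peer> {
    let mut removed_records = Vec::new();
    for p0 in old_peers {
        if new_peers.iter().all(|p1| p1.store_id != p0.store_id) {
            removed_records.push(p0.clone());
        }
    }
    removed_records
}

fn main() {
    let old = vec![Peer { id: 1, store_id: 1 }, Peer { id: 2, store_id: 2 }];
    let new = vec![Peer { id: 1, store_id: 1 }, Peer { id: 3, store_id: 3 }];
    // Only the peer on store 2 was dropped from the region.
    assert_eq!(
        collect_removed_records(&old, &new),
        vec![Peer { id: 2, store_id: 2 }]
    );
}
```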

@@ -260,7 +306,223 @@ impl Store {
    }
}

/// Tell leader that `to_peer` is destroyed.
fn report_peer_destroyed(tombstone_msg: &mut RaftMessage) -> Option<RaftMessage> {
Contributor:

minor: I think the fn could be renamed to create_raft_tombstone_message(), as there's no report logic in this function at all.

@@ -110,6 +111,10 @@ impl<EK: KvEngine, ER: RaftEngine> Peer<EK, ER> {
                }
            }
            AdminCmdType::CompactLog => self.propose_compact_log(ctx, req),
            AdminCmdType::UpdateGcPeer => {
Contributor:

If I understand it correctly, we rely on this cmd to propagate the removed peer list. But what if a conf change happens and a newly added node becomes the leader? How does the new node get the removed node list at that point? (Maybe this cmd has not run yet before that node gets elected.)
My original thought was that this information needs to be transferred via the snapshot.

BusyJay (Member, Author):

The list is part of the region local state and the snapshot state. Every node should have the exact same peer list once they apply to the same index.

BusyJay (Member, Author):

The protocol updates can be found in pingcap/kvproto#1040.

Contributor:

> The protocol updates can be found in pingcap/kvproto#1040.

I see. So it's part of the snapshot. Thanks.
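
To make the thread's conclusion concrete: the records are applied as part of the region's replicated state and carried by snapshots, so any replica that reaches a given applied index, including a freshly elected leader, sees the same list. The struct below is a rough illustration with hypothetical field names; the actual proto changes are in pingcap/kvproto#1040.

```rust
/// Rough illustration (hypothetical field names): whatever is applied through
/// the raft log, or restored from a snapshot, is identical on every replica at
/// the same applied index, so a new leader already knows which peers to GC.
#[derive(Clone, Default, PartialEq, Debug)]
struct RegionLocalStateSketch {
    applied_index: u64,
    peers: Vec<u64>,           // current members (peer ids)
    removed_records: Vec<u64>, // removed by conf change, destruction unconfirmed
    merged_records: Vec<u64>,  // peers of merged source regions, unconfirmed
}

fn main() {
    // A replica restored from a snapshot at index 42 ends up with the same
    // state as one that applied the log up to index 42.
    let applied = RegionLocalStateSketch {
        applied_index: 42,
        peers: vec![1, 3],
        removed_records: vec![2],
        merged_records: vec![],
    };
    let restored_from_snapshot = applied.clone();
    assert_eq!(applied, restored_from_snapshot);
}
```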

            if write_task.messages.is_empty() {
                write_task.messages = self.take_pending_messages();
            } else {
                write_task.messages.extend(self.take_pending_messages());
Member:

append is better.

@ti-chi-bot added the status/LGT1 (Status: PR - There is already 1 approval) and needs-rebase labels on Jan 31, 2023
@ti-chi-bot added the status/LGT2 (Status: PR - There are already 2 approvals) label and removed the status/LGT1 (Status: PR - There is already 1 approval) label on Feb 2, 2023
@BusyJay (Member, Author) commented Feb 3, 2023

/merge

@ti-chi-bot (Member):

@BusyJay: It seems you want to merge this PR, I will help you trigger all the tests:

/run-all-tests

You only need to trigger /merge once, and if the CI test fails, you just re-trigger the test that failed and the bot will merge the PR for you after the CI passes.

If you have any questions about the PR merge process, please refer to pr process.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the ti-community-infra/tichi repository.

@ti-chi-bot (Member):

This pull request has been accepted and is ready to merge.

Commit hash: b227f9f

@ti-chi-bot added the status/can-merge (Status: Can merge to base branch) label on Feb 3, 2023
@BusyJay (Member, Author) commented Feb 3, 2023

/test

@ti-chi-bot removed the status/can-merge (Status: Can merge to base branch) and needs-rebase labels on Feb 3, 2023
@BusyJay (Member, Author) commented Feb 3, 2023

/merge

@ti-chi-bot (Member):

@BusyJay: It seems you want to merge this PR, I will help you trigger all the tests:

/run-all-tests

You only need to trigger /merge once, and if the CI test fails, you just re-trigger the test that failed and the bot will merge the PR for you after the CI passes.

If you have any questions about the PR merge process, please refer to pr process.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the ti-community-infra/tichi repository.

@ti-chi-bot (Member):

This pull request has been accepted and is ready to merge.

Commit hash: 45bd91c

@ti-chi-bot added the status/can-merge (Status: Can merge to base branch) label on Feb 3, 2023
@ti-chi-bot removed the status/can-merge (Status: Can merge to base branch) label on Feb 3, 2023
@BusyJay (Member, Author) commented Feb 3, 2023

/test

@BusyJay (Member, Author) commented Feb 3, 2023

/merge

@ti-chi-bot (Member):

@BusyJay: It seems you want to merge this PR, I will help you trigger all the tests:

/run-all-tests

You only need to trigger /merge once, and if the CI test fails, you just re-trigger the test that failed and the bot will merge the PR for you after the CI passes.

If you have any questions about the PR merge process, please refer to pr process.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the ti-community-infra/tichi repository.

@ti-chi-bot (Member):

This pull request has been accepted and is ready to merge.

Commit hash: df107c5

@ti-chi-bot added the status/can-merge (Status: Can merge to base branch) label on Feb 3, 2023
@ti-chi-bot (Member):

@BusyJay: Your PR was out of date, I have automatically updated it for you.

If the CI test fails, you just re-trigger the test that failed and the bot will merge the PR for you after the CI passes.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the ti-community-infra/tichi repository.

@BusyJay (Member, Author) commented Feb 3, 2023

/test

@BusyJay (Member, Author) commented Feb 3, 2023

/test

@ti-chi-bot merged commit 6daed4f into tikv:master on Feb 3, 2023
@ti-chi-bot added this to the Pool milestone on Feb 3, 2023
@BusyJay deleted the introduce-gc-peers branch on February 3, 2023 at 10:59
    // No need to set epoch as we don't know what it is.
    tombstone_msg
        .mut_extra_msg()
        .set_type(ExtraMessageType::MsgGcPeerRequest);
Member:

I think this should be MsgGcPeerResponse.

BusyJay (Member, Author):

No, it should be MsgGcPeerRequest. As the function name and comment say, the request is forwarded to the source peer.

Member:

Forget it, I misread.

Member:

What is confusing, though, is the use of MsgGcPeerRequest when in fact only set_is_tombstone is needed. A reader seeing MsgGcPeerRequest naturally wants to know how it is handled by jumping to on_gc_peer_request, but that function only makes sense for the target peer, not the source peer.

BusyJay (Member, Author):

MsgGcPeerRequest is necessary. is_tombstone is used by a regular destroy, which implies that from is in the same region as to. MsgGcPeerRequest is for a merge destroy, where from may not be in the same region as to.
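
A rough sketch of the distinction being drawn here, using a placeholder message type rather than the kvproto definitions: a plain tombstone only makes sense when from and to belong to the same region, while a GC-peer request is how the target region's leader asks a peer of a different (merged source) region to destroy itself.

```rust
#[derive(Debug, PartialEq)]
enum DestroyKind {
    /// Regular destroy: `from` and `to` are peers of the same region.
    Tombstone,
    /// Merge destroy: the target region asks a merged source peer to go away.
    GcPeerRequest,
    /// Neither applies; handled elsewhere.
    Other,
}

// Placeholder message shape, not the kvproto RaftMessage.
struct Msg {
    from_region: u64,
    to_region: u64,
    is_tombstone: bool,
    is_gc_peer_request: bool,
}

fn classify(msg: &Msg) -> DestroyKind {
    if msg.is_gc_peer_request {
        DestroyKind::GcPeerRequest
    } else if msg.is_tombstone && msg.from_region == msg.to_region {
        DestroyKind::Tombstone
    } else {
        DestroyKind::Other
    }
}

fn main() {
    // Merge cleanup: the request crosses region boundaries.
    let merge_cleanup = Msg {
        from_region: 7,
        to_region: 8,
        is_tombstone: true,
        is_gc_peer_request: true,
    };
    assert_eq!(classify(&merge_cleanup), DestroyKind::GcPeerRequest);
}
```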

Labels: release-note-none, size/XXL, status/can-merge (Status: Can merge to base branch), status/LGT2 (Status: PR - There are already 2 approvals)