raftstore-v2: support tracing peer lifetime #14056
Conversation
In V1, a peer is responsible for destroying itself. The design is to make the leader do less work and reduce writes. But practice over the past years has shown that not making destruction a strong guarantee actually makes the implementation complicated, hard to get correct, and difficult to understand. In V2, we change this so that the leader is the role that makes sure all removed peers and merged peers are destroyed in the end. Push mode is far easier to understand and implement correctly. The downside is that extra writes are introduced, but it's worth it. Signed-off-by: Jay Lee <BusyJayLee@gmail.com>
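To make the push model concrete, here is a conceptual sketch; the names and types are illustrative, not the actual raftstore-v2 code. The leader keeps a persistent record of removed or merged peers and keeps re-sending GC requests until each one confirms it has destroyed itself:

```rust
// Illustrative sketch of the push model, not TiKV's real types.
#[derive(Clone, PartialEq)]
struct Peer {
    id: u64,
    store_id: u64,
}

struct LeaderGcState {
    // Peers removed by conf change or merge that have not yet confirmed
    // destruction; persisted so the list survives leader restarts.
    removed_records: Vec<Peer>,
}

impl LeaderGcState {
    // Called on every tick: nudge each unconfirmed peer again.
    fn tick(&self, mut send_gc_request: impl FnMut(&Peer)) {
        for peer in &self.removed_records {
            send_gc_request(peer);
        }
    }

    // Called when a peer reports that it has destroyed itself.
    fn on_gc_peer_response(&mut self, peer_id: u64) {
        self.removed_records.retain(|p| p.id != peer_id);
    }
}
```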
[REVIEW NOTIFICATION] This pull request has been approved by:
To complete the pull request process, please ask the reviewers in the list to review. The full list of commands accepted by this bot can be found here. Reviewers can indicate their review by submitting an approval review.
PTAL
Signed-off-by: Jay Lee <BusyJayLee@gmail.com>
What if the TiKV cluster restarts after a majority of peers were destroyed? How does TiKV GC the remaining peers?
How can that happen? Peers are destroyed by conf change. As long as conf change can proceed, there must be a majority.
@@ -75,6 +75,7 @@ pub struct StoreContext<EK: KvEngine, ER: RaftEngine, T> {
    pub schedulers: Schedulers<EK, ER>,
    /// store meta
    pub store_meta: Arc<Mutex<StoreMeta<EK>>>,
    pub shutdown: Arc<AtomicBool>,
Where is this set? And FYI there's already a `shutdown` in `StoreSystem`.
It's the exact same bool flag. It's for query only.
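For readers following along, a minimal sketch (hypothetical names, not TiKV's actual types) of what "the exact same bool flag" looks like: the system owns the flag and is the only writer, while each context holds a clone of the `Arc` and only ever reads it.

```rust
use std::sync::{
    atomic::{AtomicBool, Ordering},
    Arc,
};

struct StoreSystem {
    shutdown: Arc<AtomicBool>,
}

struct StoreContext {
    shutdown: Arc<AtomicBool>,
}

impl StoreSystem {
    fn new_context(&self) -> StoreContext {
        StoreContext {
            // Clone of the Arc: same underlying flag, no copy of state.
            shutdown: self.shutdown.clone(),
        }
    }

    // The system is the only writer of the flag.
    fn trigger_shutdown(&self) {
        self.shutdown.store(true, Ordering::SeqCst);
    }
}

impl StoreContext {
    // Query only; the context never flips the flag.
    fn is_shutting_down(&self) -> bool {
        self.shutdown.load(Ordering::SeqCst)
    }
}
```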
@@ -527,10 +557,27 @@ impl<EK: KvEngine, ER: RaftEngine> Peer<EK, ER> {
            .flat_map(|m| self.build_raft_message(m))
            .collect();
    }
    if self.has_pending_messages() {
        if write_task.messages.is_empty() {
Why check?
Avoid unnecessary allocation and memcopy.
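A minimal sketch of the point being made here, assuming a simplified `Vec`-based write task: when the destination is empty, the pending vector can be moved in wholesale instead of allocating and copying element by element.

```rust
// Not the actual TiKV code; illustrates why the emptiness check pays off.
fn drain_pending<T>(pending: &mut Vec<T>, messages: &mut Vec<T>) {
    if messages.is_empty() {
        // Moves the buffer wholesale: no new allocation, no per-element copy.
        *messages = std::mem::take(pending);
    } else {
        // Destination already has messages; move the pending elements
        // into the existing vector instead.
        messages.append(pending);
    }
}
```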
        // return;
        match msg.get_extra_msg().get_type() {
            ExtraMessageType::MsgGcPeerResponse => self.on_gc_peer_response(&msg),
            _ => unimplemented!(),
What about `MsgGcPeerRequest`?
        Some(msg)
    }

fn tell_source_peer_to_destroy<EK, ER, T>(ctx: &mut StoreContext<EK, ER, T>, msg: &RaftMessage)
Suggest `merge_source_peer`; source/target can be confused with the message sender/recipient.
source/target refers to merge within this project unless stated explicitly.
You used `target` to represent the message receiver at L309.
+1. It doesn't hurt to make it more explicit, as this file is not about merge. Or at least add a comment to clarify that `source_peer` is the merge source peer.
Also regarding the naming, `tell` is a bit too colloquial; maybe `request` is better.
I have already updated the `target` usage at L309. `source peer` and `target peer` are used very frequently within the project; it would be more ambiguous to make them represent different meanings.
`tell` is actually a technical term in the actor model. But I will change it to `forward` instead.
Even in existing code, it is rare to find usage of "source" directly without hinting at the context of merge. This function alone is confusing to grasp because neither the implementation nor the function name implies such context; you have to trace back to the caller to get it, which is not ideal. I think the least you could do here is add some comments.
> it is rare to find usage of "source" directly without hinting at the context of merge.

I think this explains "source/target refers to merge within this project unless stated explicitly" perfectly. A comment is also OK.
Signed-off-by: Jay Lee <BusyJayLee@gmail.com>
        .iter()
        .all(|p1| p1.get_store_id() != p0.get_store_id())
    {
        removed_records.push(p0.clone());
Does it need dedup?
No, it's impossible for it to contain duplicates.
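For clarity, an illustrative reconstruction of the filter in the diff above (simplified types, not the actual code): a peer from the old configuration is recorded exactly when no peer in the new configuration lives on the same store. Since peer ids are unique and a peer is removed by at most one conf change, the resulting list cannot contain duplicates.

```rust
#[derive(Clone, Debug, PartialEq)]
struct Peer {
    id: u64,
    store_id: u64,
}

// Collect peers from the old configuration whose store no longer hosts
// any peer in the new configuration.
fn removed_records(old_peers: &[Peer], new_peers: &[Peer]) -> Vec<Peer> {
    old_peers
        .iter()
        .filter(|p0| new_peers.iter().all(|p1| p1.store_id != p0.store_id))
        .cloned()
        .collect()
}
```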
@@ -260,7 +306,223 @@ impl Store {
    }
}

/// Tell leader that `to_peer` is destroyed.
fn report_peer_destroyed(tombstone_msg: &mut RaftMessage) -> Option<RaftMessage> {
minor: I think the fn could be renamed to `create_raft_tombstone_message()`, as there's no report logic in this function at all.
@@ -110,6 +111,10 @@ impl<EK: KvEngine, ER: RaftEngine> Peer<EK, ER> {
        }
    }
    AdminCmdType::CompactLog => self.propose_compact_log(ctx, req),
    AdminCmdType::UpdateGcPeer => {
If I understand it correctly, we rely on this cmd to propagate the removed peer list.
But what if a conf change happens and a newly added node becomes leader? How does the new node get the removed node list at that time? (Maybe this cmd has not run yet before that node gets elected.)
My original thought was that this information needs to be transferred in the snapshot.
The list is part of the region local state and the snapshot state. Every node will have the exact same peer list once they apply to the same index.
The protocol updates can be found in pingcap/kvproto#1040.
> The protocol updates can be found in pingcap/kvproto#1040.

I see. So it's part of the snapshot. Thanks.
    if write_task.messages.is_empty() {
        write_task.messages = self.take_pending_messages();
    } else {
        write_task.messages.extend(self.take_pending_messages());
`append` is better.
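A small illustration of the suggestion: `Vec::append` drains the source vector into the destination in one operation and leaves it empty, which is more direct than extending from an iterator here.

```rust
fn main() {
    let mut dst = vec![1, 2];
    let mut src = vec![3, 4];
    dst.append(&mut src); // drains src into dst in one shot
    assert_eq!(dst, [1, 2, 3, 4]);
    assert!(src.is_empty());
}
```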
Signed-off-by: Jay Lee <BusyJayLee@gmail.com>
Signed-off-by: Jay Lee <BusyJayLee@gmail.com>
/merge
@BusyJay: It seems you want to merge this PR, I will help you trigger all the tests: /run-all-tests. If you have any questions about the PR merge process, please refer to pr process. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the ti-community-infra/tichi repository.
This pull request has been accepted and is ready to merge. Commit hash: b227f9f
/test
Signed-off-by: Jay Lee <BusyJayLee@gmail.com>
/merge
This pull request has been accepted and is ready to merge. Commit hash: 45bd91c
Signed-off-by: Jay Lee <BusyJayLee@gmail.com>
/test
/merge
This pull request has been accepted and is ready to merge. Commit hash: df107c5
@BusyJay: Your PR was out of date; I have automatically updated it for you. If the CI test fails, just re-trigger the failed test and the bot will merge the PR for you after CI passes. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the ti-community-infra/tichi repository.
/test
/test
    // No need to set epoch as we don't know what it is.
    tombstone_msg
        .mut_extra_msg()
        .set_type(ExtraMessageType::MsgGcPeerRequest);
I think this should be `MsgGcPeerResponse`.
No, it should be `MsgGcPeerRequest`. As the function name and comment say, the request is forwarded to the source peer.
Forget it, I misread.
What is confusing, though, is the use of `MsgGcPeerRequest` when in fact only `set_is_tombstone` is needed. A reader who sees `MsgGcPeerRequest` naturally wants to know how it is handled and skips to `on_gc_peer_request`, but that function only makes sense for the target peer, not the source peer.
`MsgGcPeerRequest` is necessary. `is_tombstone` is used by regular destroy, which indicates `from` is in the same region as `to`. `MsgGcPeerRequest` is for merge destroy, where `from` may not be in the same region as `to`.
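A hedged sketch of the distinction being explained, with simplified, invented types: the regular tombstone path assumes sender and receiver are peers of the same region, while the merge path cannot, because the sender is the merge target's leader and the receiver is a peer of the source region.

```rust
// Invented types for illustration only; not the kvproto message layout.
enum DestroyKind {
    // Regular removal: sender and receiver share a region id.
    Tombstone { region_id: u64, to_peer: u64 },
    // Merge cleanup: the sender's region may differ from the receiver's.
    GcPeerRequest {
        from_region: u64,
        to_region: u64,
        to_peer: u64,
    },
}
```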
What is changed and how it works?
Issue Number: Ref #12842
What's Changed:
The PR contains the implementation of merge cleanup; the initialization part is to be done in #13818 and its related PRs.
Check List
Tests
Release note