vsr: prevent state sync from breaking hash chain #1598
Conversation
src/vsr/replica_test.zig
    }
}.drop_message);
try c.request(checkpoint_1_trigger + 1, checkpoint_1 - 1);
try expectEqual(a0.op_head(), checkpoint_1_trigger - 1);
🤔 I was expecting a0's head to be checkpoint_1_trigger + 1. But I suppose with a commit-max of checkpoint_1 - 1 and both a pipeline and batch-multiple of 4, a0 won't prepare that far ahead. But then why request it?
🤔 I was expecting a0's head to be checkpoint_1_trigger + 1.
Huh, I think I expected the same, and ended up with checkpoint_1_trigger - 1 by accident after fighting unrelated test failures.
My thinking here is that preparing all the way past the trigger would be a more interesting test, but, indeed, it seems we can't do that given that the pipeline is no longer than the batch multiple.
Simplified this to just checkpoint_1 + 1; this still triggers the assert.
if (candidate.view > self.view) {
    log.debug("{}: on_{s}: jump_sync_target: ignoring, newer view" ++
        " (view={} candidate.view={})", .{
        self.replica,
        @tagName(header.command),
        self.view,
        candidate.view,
    });
    return;
}
Checking if I understand how this works:
In on_start_view(), we will reject the SV and transition to state sync if all of the SV's headers are beyond our prepare_max:
for (view_headers.slice) |*header| {
assert(header.commit <= message.header.commit);
if (header.op <= self.op_prepare_max()) {
if (self.log_view < self.view or
(self.log_view == self.view and header.op >= self.op))
{
self.set_op_and_commit_max(header.op, message.header.commit, @src());
assert(self.op == header.op);
assert(self.commit_max >= message.header.commit);
break;
}
}
} else {
// This replica is too far behind, i.e. the new `self.op` is too far ahead of
// the last checkpoint. If we wrap now, we overwrite un-checkpointed transfers
// in the WAL, precluding recovery.
if (self.syncing == .idle) {
log.warn("{}: on_start_view: start sync; lagging behind cluster " ++
"(op_prepare_max={} quorum_head={})", .{
self.replica,
self.op_prepare_max(),
view_headers.slice[0].op,
});
self.sync_start_from_committing();
}
return;
}
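The acceptance check in the loop above can be condensed into a toy model (Python; hypothetical names, not TigerBeetle's actual API): the SV is usable only if at least one of its headers is at or below our op_prepare_max; otherwise the replica is too far behind and must start state sync.

```python
# Hypothetical model of the on_start_view acceptance check above:
# accept the SV if any header op is within op_prepare_max; otherwise
# the replica is too far behind and must start state sync instead.
def sv_action(header_ops, op_prepare_max):
    if any(op <= op_prepare_max for op in header_ops):
        return "install"
    return "start_sync"

assert sv_action([12, 11, 10], op_prepare_max=11) == "install"
assert sv_action([40, 39, 38], op_prepare_max=11) == "start_sync"
```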
So to advance our view (such that this new jump_sync_target() will not drop the message) we rely on pings or other messages that jump_view() uses to transition to view-change status (not normal status, since we will ignore the SV that we would request). And jump_view() does indeed precede jump_sync_target(), so that can happen with a single ping:
self.jump_view(message.header);
self.jump_sync_target(message.header);
Not exactly! This is related to me saying "the code already has the fix". My original thinking was that, indeed, accepting an SV requires doing the state sync first. But to do state sync we need to change view. And when we receive a commit we don't jump view immediately, but wait for the SV, which seems to create a deadlock.
So the fix I was planning to implement here was to jump to view_change when we receive a checkpoint target from the next log wrap and the next view.
But, when I started coding, I realised that there's already code to do that! Right here:
tigerbeetle/src/vsr/replica.zig, lines 1933 to 1935 at 7daa0b6:
if (self.view < message.header.view) {
    self.transition_to_view_change_status(message.header.view);
}
That is, even though we don't append any headers from an SV, we still go into view_change! And that resolves the deadlock.
So the full sequence of events is this:
- we receive a .commit with the next view & checkpoint
- when jump_view on this commit, we send out request_start_view
- when jump_sync_target on this commit, we ignore the target, as it's in the future view
- sometime later, we receive the requested SV
- if we install anything from that SV, we transition to .normal in the new view and can proceed with state sync
- if we don't install anything, we transition to .view_change in the new view, and can still proceed with state sync (and will ask for another SV after sync is done)
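The sequence above can be sketched as a toy state model (Python; hypothetical names and deliberately simplified semantics, not TigerBeetle's actual API): the commit from a future view only requests the SV, and it is the SV's arrival that advances the view, after which a sync target is no longer ignored.

```python
# Toy model of the deadlock resolution described above. On a .commit from
# a future view we request an SV but ignore the sync target; receiving
# the SV advances the view (to .normal or .view_change) either way,
# after which the sync target can be accepted.
class Replica:
    def __init__(self, view):
        self.view = view
        self.status = "normal"
        self.sync_target = None
        self.sent = []

    def on_commit(self, view, checkpoint):
        # jump_view: seeing a newer view, request the SV (the view itself
        # does not advance until the SV arrives).
        if view > self.view:
            self.sent.append("request_start_view")
        # jump_sync_target: ignore targets from views newer than ours.
        if view > self.view:
            return
        self.sync_target = checkpoint

    def on_start_view(self, view, installed_any_headers):
        # Whether or not any headers are installed, the view advances;
        # that is what breaks the deadlock.
        if view > self.view:
            self.view = view
            self.status = "normal" if installed_any_headers else "view_change"

r = Replica(view=1)
r.on_commit(view=2, checkpoint="c2")
assert r.sync_target is None and "request_start_view" in r.sent
r.on_start_view(view=2, installed_any_headers=False)
assert r.status == "view_change"   # no headers installed, view still advanced
r.on_commit(view=2, checkpoint="c2")
assert r.sync_target == "c2"       # target now accepted
```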
Ahh that makes sense, I forgot that on_start_view did that!
This made me realize that there's a subtlety here! I forgot about view_durable and checked only for view, but this is still correct:
when view_durable is not updated, we can crash and restart into an earlier view, but that also means that we'll restart without newer sync target, so it ends up ok in the end.
Added a maybe to that effect.
The added test is a minimization of #1532.
If replica state-syncs into a newer view, its log may be disconnected from a checkpoint. To fix this, require that replica jumps view before jumping sync target.
The effect here is that, after state sync, there's _still_ a break between the checkpoint and the log, but, as replica is in a view_change state, it doesn't assume valid hash chain and waits until an SV to correct the log.
SEED: 17270152460592452540
Closes: #1532
It is correct to check only view, not view_durable here. This can't lead to a situation where we crash and restart with an older view and a newer sync target, because superblock updates are serialized.
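That serialization argument can be sketched as a toy model (Python; hypothetical, not the superblock's actual API): superblock updates form a serialized write sequence, a crash preserves only a prefix of it, so a durable newer sync target implies the view update that preceded it is durable too.

```python
# Toy model: serialized superblock writes mean a crash loses a suffix,
# never the middle, so no reachable state pairs an old view with a new
# sync target.
class Superblock:
    def __init__(self):
        self.writes = []  # serialized write log of (field, value) pairs

    def update(self, field, value):
        self.writes.append((field, value))

    def state_after_crash(self, durable_count):
        # A crash keeps some prefix of the serialized writes durable.
        state = {"view": 1, "sync_target": None}
        for field, value in self.writes[:durable_count]:
            state[field] = value
        return state

sb = Superblock()
sb.update("view", 2)            # the view jump is written first...
sb.update("sync_target", "c2")  # ...then the sync target that required it

# For every possible crash point: if the newer sync target is durable,
# the newer view is durable as well.
for k in range(len(sb.writes) + 1):
    s = sb.state_after_crash(k)
    assert not (s["sync_target"] == "c2" and s["view"] < 2)
```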