vsr: prevent state sync from breaking hash chain #1598

Merged
2 commits merged into main from matklad/sync-recovering-head on Feb 27, 2024

Conversation

@matklad (Member) commented Feb 26, 2024

The added test is a minimization of #1532.

If a replica state-syncs into a newer view, its log may be disconnected
from a checkpoint. To fix this, require that the replica jumps view before
jumping sync target.

The effect here is that, after state sync, there is still a break
between the checkpoint and the log, but, as the replica is in a view_change
state, it doesn't assume a valid hash chain and waits for an SV to
correct the log.

SEED: 17270152460592452540
Closes: #1532
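
To make the hash-chain point concrete, here is a minimal sketch (not the replica's actual code; HeaderSketch is a hypothetical stand-in for vsr.Header) of the chain invariant that a replica in normal status relies on and that a replica left in view_change after state sync deliberately does not assume:

const std = @import("std");

// HeaderSketch is a hypothetical stand-in for vsr.Header, reduced to the two
// fields that form the hash chain.
const HeaderSketch = struct {
    checksum: u128,
    parent: u128,
};

// Each prepare links to its predecessor by checksum. A replica in normal
// status assumes this chain is intact; a replica that state-synced while in
// view_change does not, and waits for the next SV to repair the log instead.
fn hash_chain_valid(headers: []const HeaderSketch) bool {
    var i: usize = 1;
    while (i < headers.len) : (i += 1) {
        if (headers[i].parent != headers[i - 1].checksum) return false;
    }
    return true;
}

test "a log disconnected from the checkpoint fails the chain check" {
    const log = [_]HeaderSketch{
        .{ .checksum = 1, .parent = 0 },
        .{ .checksum = 2, .parent = 1 },
        .{ .checksum = 9, .parent = 7 }, // break: parent 7 was never prepared locally
    };
    try std.testing.expect(!hash_chain_valid(&log));
}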

@matklad changed the title from "wip: reproduce test failure" to "vsr: prevent state sync from breaking hash chain" on Feb 27, 2024
@matklad marked this pull request as ready for review on February 27, 2024, 13:09
Two review threads on src/vsr/replica_test.zig (outdated, resolved)
}
}.drop_message);
try c.request(checkpoint_1_trigger + 1, checkpoint_1 - 1);
try expectEqual(a0.op_head(), checkpoint_1_trigger - 1);
Member

🤔 I was expecting a0's head to be checkpoint_1_trigger + 1. But I suppose with a commit-max of checkpoint_1 - 1, and both a pipeline and batch-multiple of 4, a0 won't prepare that far ahead. But then why request it?

Member Author

🤔 I was expecting a0's head to be checkpoint_1_trigger + 1.

Huh, I think I expected the same, and ended up with checkpoint_1_trigger - 1 by accident after fighting unrelated test failures.

My thinking here is that preparing all the way past the trigger would be a more interesting test, but, indeed, it seems we can't do that given that the pipeline is not longer than the batch multiple.

Simplified this to just checkpoint_1 + 1; this still triggers the assert.
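
For reference, a back-of-the-envelope sketch of the arithmetic above, using illustrative numbers and two assumptions taken from this thread rather than from the code: checkpoint_1_trigger sits one batch multiple past checkpoint_1, and the primary's head never runs more than a pipeline's worth of ops past commit_max:

const std = @import("std");

test "head stalls one op short of checkpoint_1_trigger" {
    // Hypothetical values matching the discussion, not the real constants.
    const batch_multiple = 4;
    const pipeline_max = 4;
    const checkpoint_1 = 19; // illustrative op number only

    // Assumption: the trigger sits one batch past the checkpoint.
    const checkpoint_1_trigger = checkpoint_1 + batch_multiple;

    // Assumption: the primary never prepares more than a pipeline's worth of
    // ops beyond commit_max.
    const commit_max = checkpoint_1 - 1;
    const op_head_max = commit_max + pipeline_max;

    try std.testing.expectEqual(checkpoint_1_trigger - 1, op_head_max);
}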

Comment on lines +8392 to +8401
if (candidate.view > self.view) {
    log.debug("{}: on_{s}: jump_sync_target: ignoring, newer view" ++
        " (view={} candidate.view={})", .{
        self.replica,
        @tagName(header.command),
        self.view,
        candidate.view,
    });
    return;
}
Member

Checking if I understand how this works:

In on_start_view(), we will reject the SV and transition to state sync if all of the SV's headers are beyond our prepare_max:

            for (view_headers.slice) |*header| {
                assert(header.commit <= message.header.commit);

                if (header.op <= self.op_prepare_max()) {
                    if (self.log_view < self.view or
                        (self.log_view == self.view and header.op >= self.op))
                    {
                        self.set_op_and_commit_max(header.op, message.header.commit, @src());
                        assert(self.op == header.op);
                        assert(self.commit_max >= message.header.commit);
                        break;
                    }
                }
            } else {
                // This replica is too far behind, i.e. the new `self.op` is too far ahead of
                // the last checkpoint. If we wrap now, we overwrite un-checkpointed transfers
                // in the WAL, precluding recovery.
                if (self.syncing == .idle) {
                    log.warn("{}: on_start_view: start sync; lagging behind cluster " ++
                        "(op_prepare_max={} quorum_head={})", .{
                        self.replica,
                        self.op_prepare_max(),
                        view_headers.slice[0].op,
                    });
                    self.sync_start_from_committing();
                }
                return;
            }

So to advance our view (such that this new jump_sync_target() will not drop the message) we rely on pings or other messages that jump_view() uses to transition to view-change status (not normal status, since we will ignore the SV that we would request).

And jump_view() does indeed precede jump_sync_target(), so that can happen with a single ping:

            self.jump_view(message.header);
            self.jump_sync_target(message.header);

@matklad (Member Author) commented Feb 27, 2024

Not exactly! This is related to me saying "the code already has the fix". My original thinking was that, indeed, accepting the SV requires doing state sync first. But to do state sync, we need to change view. And when we receive a commit, we don't jump view immediately but wait for an SV, which seems to create a deadlock.

So the fix I was planning to implement here was to jump to view_change when we receive a checkpoint target from the next log wrap and the next view.

But, when I started coding, I realised that there's already code to do that! Right here:

if (self.view < message.header.view) {
    self.transition_to_view_change_status(message.header.view);
}

That is, even though we don't append any headers from an SV, we still go into view_change! And that resolves the deadlock.

So the full sequence of events is this:

  • we receive a .commit with the next view & checkpoint
  • when jump_view runs on this commit, we send out .request_start_view
  • when jump_sync_target runs on this commit, we ignore the target, as it's in a future view
  • sometime later, we receive the requested SV
  • if we install anything from that SV, we transition to .normal in the new view and can proceed with state sync
  • if we don't install anything, we transition to .view_change in the new view, and can still proceed with state sync (and will ask for another SV after sync is done)
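
A toy model of the sequence above (a sketch only; the fields and functions are loose stand-ins for the real replica's, not its actual API) shows how deferring the sync target until the view has advanced resolves the deadlock:

const std = @import("std");

const Status = enum { normal, view_change };

const ReplicaSketch = struct {
    view: u32 = 0,
    status: Status = .normal,
    sync_target: ?u64 = null,
    requested_start_view: bool = false,

    // A .commit/ping from a newer view: request the SV, but do not advance
    // the view yet.
    fn jump_view(self: *ReplicaSketch, message_view: u32) void {
        if (message_view > self.view) self.requested_start_view = true;
    }

    // The new guard: a sync target from a view ahead of ours is ignored.
    fn jump_sync_target(self: *ReplicaSketch, candidate_view: u32, candidate_op: u64) void {
        if (candidate_view > self.view) return;
        self.sync_target = candidate_op;
    }

    // The SV arrives but none of its headers can be installed: we still move
    // into view_change in the new view, which is what breaks the deadlock.
    fn on_start_view_no_headers(self: *ReplicaSketch, sv_view: u32) void {
        if (self.view < sv_view) {
            self.view = sv_view;
            self.status = .view_change;
        }
    }
};

test "sync target is only adopted after the view has advanced" {
    var replica = ReplicaSketch{};

    // A .commit from view 1 advertises a checkpoint target at op 100.
    replica.jump_view(1);
    replica.jump_sync_target(1, 100);
    try std.testing.expect(replica.requested_start_view);
    try std.testing.expectEqual(@as(?u64, null), replica.sync_target);

    // The requested SV arrives with nothing installable; the replica still
    // enters view_change in the new view.
    replica.on_start_view_no_headers(1);
    try std.testing.expectEqual(Status.view_change, replica.status);

    // The next ping/commit re-advertises the target; now it is accepted.
    replica.jump_sync_target(1, 100);
    try std.testing.expectEqual(@as(?u64, 100), replica.sync_target);
}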

Member

Ahh that makes sense, I forgot that on_start_view did that!

Member Author

This made me realize that there's a subtlety here! I forgot about view_durable and checked only view, but this is still correct:

When view_durable is not updated, we can crash and restart into an earlier view, but that also means we'll restart without the newer sync target, so it works out in the end.

Added a maybe to that effect.

sentientwaffle previously approved these changes Feb 27, 2024
@matklad added this pull request to the merge queue Feb 27, 2024
@matklad removed this pull request from the merge queue due to a manual request Feb 27, 2024
It is correct to check only view, not view_durable here. This can't lead
to a situation where we crash and restart with an older view and a newer
sync target, because superblock updates are serialized.
@matklad added this pull request to the merge queue Feb 27, 2024
Merged via the queue into main with commit 740a4f1 Feb 27, 2024
27 checks passed
@matklad deleted the matklad/sync-recovering-head branch February 27, 2024 18:23
Development

Successfully merging this pull request may close these issues.

Crash: 17270152460592452540
2 participants