vopr: state checker knows about all commits in liveness mode #1707

Merged 1 commit on Mar 15, 2024

@matklad matklad commented Mar 15, 2024

We currently switch to liveness mode only when everything is committed by the cluster, which is not as live as we ideally want to get, but that's a start.

In particular, when liveness mode checks for too many corruptions, it makes use of the fact that for every op the state checker knows the correct header.

Before upgrades, this invariant was enforced by checking that every request received a reply. But upgrade requests don't generate replies!

So, at the time we transition to liveness mode, there might still be an uncommitted op, and the state checker wouldn't yet know the correct header for it!

Fix this by checking that at least one replica upgraded all the way up to the latest release.
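
To illustrate the condition described above, here is a minimal Python sketch, not the actual VOPR code. All names (`Replica`, `release`, `can_enter_liveness_mode`, `release_latest`) are hypothetical, and it assumes releases can be modeled as plain integers. The idea is that the reply count alone can't cover upgrade requests, so the transition additionally requires that at least one replica runs the latest release.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    # Hypothetical field: the release this replica is currently running.
    release: int


def can_enter_liveness_mode(
    requests_sent: int,
    replies_received: int,
    replicas: list[Replica],
    release_latest: int,
) -> bool:
    """Sketch of the transition check (an assumption, not the real implementation)."""
    # Old check: every client request got a reply, so every such op is committed.
    all_replied = replies_received == requests_sent
    # New check: upgrade requests produce no replies, so also require that
    # at least one replica upgraded all the way up to the latest release.
    some_replica_upgraded = any(r.release == release_latest for r in replicas)
    return all_replied and some_replica_upgraded
```

With this shape, a cluster where every request has a reply but no replica has reached `release_latest` would stay out of liveness mode.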

Seed: 2749715711070932366
Closes: #1697

@matklad matklad added this pull request to the merge queue Mar 15, 2024
Merged via the queue into main with commit a5f4b57 Mar 15, 2024
27 checks passed
@matklad matklad deleted the matklad/liveness-release branch March 15, 2024 14:09

matklad commented Mar 15, 2024

Huh, it seems like a similar failure still triggers: #1712

Development

Successfully merging this pull request may close these issues.

Crash: 2749715711070932366
2 participants