New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
vsr: synchronize op and commit_max updates on_prepare #1576
Conversation
This currently fails: when appending a prepare, we set op and commit_max separately. Will fix in the next commit. Seed: 1648503712295043012
654ad1e
to
4d7d2a6
Compare
}); | ||
self.commit_max = self.commit_min; | ||
} | ||
self.advance_commit_max(self.commit_min, @src().fn_name); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This feels something suspicious. I think it would be better to update commit_max
before the primary starts committing, not after it is done. Though I guess for solo mode updating later is actually desirable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🤔 I like the "on primary, commit_min == commit_max invariant" that we have.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ahh, this feels extra tricky --- it seems like we could be advancing commit_max
here even if we are no longer primary, if view change happens while we are committing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🤔 if we allow primary's commit_max
to run ahead of commit_min
, that might increase latency tolerance. If the primary is slow to commit, clients could get replies from backups which commit ahead of the primary.
@@ -1393,14 +1393,11 @@ pub fn ReplicaType( | |||
assert(message.header.op > self.op); | |||
assert(message.header.op > self.commit_min); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe add a assert(message.header.op > message.header.commit);
to emphasize that the self.advance_commit_max(message.header.commit, @src().fn_name);
below is ok even though we won't have loaded message.header
into our journal yet.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
advancing commit_max should be ok even if we don't have a header? We allow maybe(self.op < self.commit_max)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, good point!
src/vsr/replica.zig
Outdated
@@ -309,7 +309,6 @@ pub fn ReplicaType( | |||
/// * never decreases. | |||
/// Invariants (status=normal primary): | |||
/// * `replica.commit_max = replica.commit_min`. | |||
/// * `replica.commit_max = replica.op - pipeline.queue.prepare_queue.count`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where does this not hold?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, forgot to mention that I was confused by this. This almost never holds? For example, in an idle cluster replica.commit_max = replica.op
. Or am I misreading this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pipeline.queue.prepare_queue.count
is the number of items currently in the pipeline, not the capacity. So in an idle cluster pipeline.queue.prepare_queue.count == 0
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, indeed, I misread that as pipeline_prepare_queue_max
🤦
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, now advance_commit_max actually checks that invariant (had to move pipeline prepare popping to commit_op for that)
@@ -784,7 +783,8 @@ pub fn ReplicaType( | |||
|
|||
if (self.solo()) { | |||
if (self.commit_min < self.op) { | |||
self.commit_journal(self.op); | |||
self.advance_commit_max(self.op, @src().fn_name); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not sure whether I like @src().fn_name
. On the one hand, it is cryptic and too clever. On the other hand, we did have bugs where we moved code around but didn't update call-sites.
If we double-down here, we might pass the whole @src()
in even
763793e
to
4196714
Compare
Before, `commit_journal` played dual role: * it advanced commit_max * it tried executing everything up to commit_max In parciular, we had funny calls `commit_journal(self.commit_max)`, which neutered the first role in a somewhat roundabout way. Now, we have two functions: * "sensitive" `advance_commit_max` to be called when you have learned something that justifies _logically_ committing more * "safe" `commit_journal`, which can be called whenever, and just executes locally available prepares up to commit_max There shouldn't be any behavior changes here, but in the next commit I'll tweak the logic in `on_prepare` to advance commit_max _before_ appending the prepare, so as not to break op/commit_max invariant even temporarily.
Before, op was updated first, and commit_max at the end, breaking ``` assert(self.commit_max >= self.op -| constants.pipeline_prepare_queue_max); ``` invariant Seed: 1648503712295043012
4196714
to
3d021fd
Compare
Apparently, it's better not to forec-push VOPR-labeled pull requests! |
assert(prepare.message.header.op == self.commit_max); | ||
assert(prepare.ok_quorum_received); | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did like how "pop prepare" was immediately followed by "pop request -> push prepare to pipeline". But commit_op
is already pretty gigantic 😓
Before, op was updated first, and commit_max at the end, breaking
invariant. Broken invariant was observed by, eg, op_repair_min (see the first commit).
To fix this: