-
Notifications
You must be signed in to change notification settings - Fork 372
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
VSR: Use previous checkpoint_id
in prepares' headers
#1501
VSR: Use previous checkpoint_id
in prepares' headers
#1501
Conversation
# Background Right now, a replica will not prepare (or even queue) requests whose op would extend beyond their next checkpoint's trigger op. This is problematic for performance (the "checkpoint latency spike"), since the replica enters the new checkpoint with nothing to commit. And the "latency spike" is farther exacerbated by the fact that clients will back off retrying their requests. Originally we didn't start preparing past the checkpoint to guard against overwriting WAL entries before they will definitely not be needed again. But thanks to `vsr_checkpoint_interval`, that reason does not apply. However, prepare headers include a `checkpoint_id` ([motivation here](https://github.com/tigerbeetle/tigerbeetle/blob/30cfbfa2eca94b8cd0b1d2a8ce41c7e7720128f0/src/vsr/message_header.zig#L531-L539)). That checkpoint id isn't available until the checkpoint trigger prepare commits. # Fix The first `constants.pipeline_prepare_queue_max` prepares that follow a checkpoint trigger op will now contain the _previous_ checkpoint's id, since that is available both before and after that checkpoint trigger commits. These are called the "border" prepares – they may be prepared during either checkpoint. (This part if not implemented in this commit, though!)
2c5e3f4
to
19f069c
Compare
Joran floated an interesting idea during our 1-1 today: what if we don't do border prepares, and instead include checkpoint id of the previous checkpoint for all prepares, so that we don't have a special case? I think this not only removes special case, but simplifies the mental model overall somewhat: prepares before the first trigger are not special. All prepares in a single checkpoint have the same checkpoint id (that of the previous checkpoint). |
// NOTE: Within the vsr_state.checkpoint assignment below, do not read from vsr_state | ||
// directly. A miscompilation bug (as of Zig 0.11.0) causes fields to receive the | ||
// incorrect values. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(Not sure if this is technically a 'miscompilation' or just underspecification, but it is surprising and worth a note either way.)
31f7ba8
to
d73b9ee
Compare
checkpoint_id
in border prepares' headerscheckpoint_id
in prepares' headers
src/vsr/replica.zig
Outdated
/// Returns checkpoint id associated with the op. | ||
/// | ||
/// Normally, this is just the id of the checkpoint the op builds on top. However, ops | ||
/// Normally, this is just the id of the op's previous checkpoint. However, ops | ||
/// between a checkpoint and its trigger can't know checkpoint's id yet, and instead use |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't have trigger special case anymore.
@@ -5013,35 +5014,41 @@ pub fn ReplicaType( | |||
return vsr.Checkpoint.trigger_for_checkpoint(self.op_checkpoint_next()).?; | |||
} | |||
|
|||
/// Returns the highest op that this replica can safely prepare to its WAL. | |||
fn op_checkpoint_next_border(self: *const Self) u64 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now that we dont' have "border prepares", the "border" name doesn't seem really great, though I can't suggest a better alternative...
Maybe:
- trigger --- the
op
when the checkpoints starts - due --- the
op
by which the checkpoint should have completed
Or maybe even not introduce any name here?
self.op_checkpoint_next_trigger() + constants.pipeline_prepare_queue_max
actually reads great to me: trigger, plus whatever we can pipeline over the trigger.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the next PR in the sequence (which actually enables "prepare beyond the checkpoint") op_checkpoint_next_border
is used very frequently, so imo having it in a helper function rather than inlining self.op_checkpoint_next_trigger() + constants.pipeline_prepare_queue_max
is quite useful.
It is used in ways that "trigger" is often used right now -- "the highest op that we will accept in our WAL". So I don't think "trigger" or "due" make sense. Maybe "limit" or "ceiling"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🤔 yeah, due
has inverted semantics here -- checkpoint is indeed due to be done by the time we accept that op
into our WAL, but, in code, we don't block until checkpoint is ready, we rather drop prepares.
Can't say that limit
or ceiling
is much better than the border. Hm, maybe
op_checkpoint_pipeline_limit
? I think "pipeline" is key here. Maybe even just
op_pipeline_limit
? The fact that it depends on the checkpoint is sort of an implementation detail I think?
Though, no strong opinion either way, we can live with border_prepare
as well .
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 I'm going to leave it as "border" at least for now. The fact that it is a term that is not used elsewhere in the code (so can't be confused with anything else) makes up for the fact that it is not self-explaining.
Background
Right now, a replica will not prepare (or even queue) requests whose op would extend beyond their next checkpoint's trigger op.
This is problematic for performance (the "checkpoint latency spike"), since the replica enters the new checkpoint with nothing to commit.
And the "latency spike" is farther exacerbated by the fact that clients will back off retrying their requests.
Originally we didn't start preparing past the checkpoint to guard against overwriting WAL entries before they will definitely not be needed again. But thanks to
vsr_checkpoint_interval
, that reason does not apply.However, prepare headers include a
checkpoint_id
(motivation here). That checkpoint id isn't available until the checkpoint trigger prepare commits.Fix
The firstconstants.pipeline_prepare_queue_max
prepares that follow a checkpoint trigger op will now contain the previous checkpoint's id, since that is available both before and after that checkpoint trigger commits.These are called the "border" prepares – they may be prepared during either checkpoint. (This part is not implemented in this commit, though!)All prepares within a checkpoint include the
checkpoint_id
of the previous checkpoint. The transition occurs at the checkpoint op.