Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

VSR: Use previous checkpoint_id in prepares' headers #1501

Merged

Conversation

sentientwaffle
Copy link
Member

@sentientwaffle sentientwaffle commented Jan 30, 2024

Background

Right now, a replica will not prepare (or even queue) requests whose op would extend beyond their next checkpoint's trigger op.

This is problematic for performance (the "checkpoint latency spike"), since the replica enters the new checkpoint with nothing to commit.
And the "latency spike" is farther exacerbated by the fact that clients will back off retrying their requests.

Originally we didn't start preparing past the checkpoint to guard against overwriting WAL entries before they will definitely not be needed again. But thanks to vsr_checkpoint_interval, that reason does not apply.

However, prepare headers include a checkpoint_id (motivation here). That checkpoint id isn't available until the checkpoint trigger prepare commits.

Fix

The first constants.pipeline_prepare_queue_max prepares that follow a checkpoint trigger op will now contain the previous checkpoint's id, since that is available both before and after that checkpoint trigger commits.

These are called the "border" prepares – they may be prepared during either checkpoint. (This part is not implemented in this commit, though!)

All prepares within a checkpoint include the checkpoint_id of the previous checkpoint. The transition occurs at the checkpoint op.

# Background

Right now, a replica will not prepare (or even queue) requests whose op would extend beyond their next checkpoint's trigger op.

This is problematic for performance (the "checkpoint latency spike"), since the replica enters the new checkpoint with nothing to commit.
And the "latency spike" is farther exacerbated by the fact that clients will back off retrying their requests.

Originally we didn't start preparing past the checkpoint to guard against overwriting WAL entries before they will definitely not be needed again.
But thanks to `vsr_checkpoint_interval`, that reason does not apply.

However, prepare headers include a `checkpoint_id` ([motivation here](https://github.com/tigerbeetle/tigerbeetle/blob/30cfbfa2eca94b8cd0b1d2a8ce41c7e7720128f0/src/vsr/message_header.zig#L531-L539)). That checkpoint id isn't available until the checkpoint trigger prepare commits.

# Fix

The first `constants.pipeline_prepare_queue_max` prepares that follow a checkpoint trigger op will now contain the _previous_ checkpoint's id, since that is available both before and after that checkpoint trigger commits.

These are called the "border" prepares – they may be prepared during either checkpoint. (This part if not implemented in this commit, though!)
@sentientwaffle sentientwaffle force-pushed the dj-vsr-prepare-beyond-checkpoint-border-prepares branch from 2c5e3f4 to 19f069c Compare January 30, 2024 22:11
@sentientwaffle sentientwaffle marked this pull request as ready for review January 30, 2024 22:22
@matklad
Copy link
Member

matklad commented Jan 31, 2024

Joran floated an interesting idea during our 1-1 today: what if we don't do border prepares, and instead include checkpoint id of the previous checkpoint for all prepares, so that we don't have a special case?

I think this not only removes special case, but simplifies the mental model overall somewhat: prepares before the first trigger are not special. All prepares in a single checkpoint have the same checkpoint id (that of the previous checkpoint).

Comment on lines +821 to +823
// NOTE: Within the vsr_state.checkpoint assignment below, do not read from vsr_state
// directly. A miscompilation bug (as of Zig 0.11.0) causes fields to receive the
// incorrect values.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Not sure if this is technically a 'miscompilation' or just underspecification, but it is surprising and worth a note either way.)

@sentientwaffle sentientwaffle force-pushed the dj-vsr-prepare-beyond-checkpoint-border-prepares branch from 31f7ba8 to d73b9ee Compare January 31, 2024 20:59
@sentientwaffle sentientwaffle changed the title VSR: Use previous checkpoint_id in border prepares' headers VSR: Use previous checkpoint_id in prepares' headers Feb 1, 2024
/// Returns checkpoint id associated with the op.
///
/// Normally, this is just the id of the checkpoint the op builds on top. However, ops
/// Normally, this is just the id of the op's previous checkpoint. However, ops
/// between a checkpoint and its trigger can't know checkpoint's id yet, and instead use
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't have trigger special case anymore.

@@ -5013,35 +5014,41 @@ pub fn ReplicaType(
return vsr.Checkpoint.trigger_for_checkpoint(self.op_checkpoint_next()).?;
}

/// Returns the highest op that this replica can safely prepare to its WAL.
fn op_checkpoint_next_border(self: *const Self) u64 {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that we dont' have "border prepares", the "border" name doesn't seem really great, though I can't suggest a better alternative...

Maybe:

  • trigger --- the op when the checkpoints starts
  • due --- the op by which the checkpoint should have completed

Or maybe even not introduce any name here?

self.op_checkpoint_next_trigger() + constants.pipeline_prepare_queue_max

actually reads great to me: trigger, plus whatever we can pipeline over the trigger.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the next PR in the sequence (which actually enables "prepare beyond the checkpoint") op_checkpoint_next_border is used very frequently, so imo having it in a helper function rather than inlining self.op_checkpoint_next_trigger() + constants.pipeline_prepare_queue_max is quite useful.

It is used in ways that "trigger" is often used right now -- "the highest op that we will accept in our WAL". So I don't think "trigger" or "due" make sense. Maybe "limit" or "ceiling"?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤔 yeah, due has inverted semantics here -- checkpoint is indeed due to be done by the time we accept that op into our WAL, but, in code, we don't block until checkpoint is ready, we rather drop prepares.

Can't say that limit or ceiling is much better than the border. Hm, maybe

op_checkpoint_pipeline_limit

? I think "pipeline" is key here. Maybe even just

op_pipeline_limit

? The fact that it depends on the checkpoint is sort of an implementation detail I think?

Though, no strong opinion either way, we can live with border_prepare as well .

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 I'm going to leave it as "border" at least for now. The fact that it is a term that is not used elsewhere in the code (so can't be confused with anything else) makes up for the fact that it is not self-explaining.

@sentientwaffle sentientwaffle added this pull request to the merge queue Feb 1, 2024
Merged via the queue into main with commit 09634c7 Feb 1, 2024
25 checks passed
@sentientwaffle sentientwaffle deleted the dj-vsr-prepare-beyond-checkpoint-border-prepares branch February 1, 2024 16:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants