-
Notifications
You must be signed in to change notification settings - Fork 373
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
VSR: Move client sessions from superblock to grid #1356
Conversation
506f574
to
2ffb729
Compare
(Follow-up for #1351). No more trailers in the superblock! Client sessions is encoded with `CheckpointTrailer`, like the free set. Unlike the free set, client sessions encoding/decoding is handled directly by `replica.zig`, not `grid.zig`. This simplifies the superblock checkpoint/sync control flow, which now just needs to care about the superblock headers. This simplifies state sync, since the superblock sync is now just a single `sync_checkpoint` message.
2ffb729
to
5e60d4b
Compare
/// The size of an individual superblock including padding. | ||
/// TODO Add some padding between copies and include that here. | ||
pub const superblock_copy_size = @sizeOf(SuperBlockHeader); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few reasons to include padding in this number:
- we probably want to reserve a bit of extra space as a just-in-case, since reformatting a whole data file isn't really an option
- can add padding here without requiring an additional zone (
grid_padding
) - separate physical block addresses (in SSD terms, not grid blocks) of the superblock headers (probably doesn't matter with SSDs 🤷)
//! Areas verified at compaction (half-measure): | ||
//! - Acquired Grid blocks (ignores skipped recovery compactions) | ||
//! TODO Because ManifestLog acquires blocks potentially several beats prior to actually writing | ||
//! the block, this check will need to be removed or use a different strategy. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Outdated comment. Originally the storage checker was going to verify storage determinism at half-bar boundaries, but that turns out not to work (due to the manifest log). Checkpoint is more comprehensive anyway.
@@ -872,17 +872,17 @@ test "Cluster: sync: sync, bump target, sync" { | |||
t.replica(.R2).drop_all(.R_, .bidirectional); | |||
try c.request(checkpoint_2_trigger, checkpoint_2_trigger); | |||
|
|||
// Allow R2 to complete SyncStage.requesting_trailers, but get stuck | |||
// during SyncStage.requesting_trailers. | |||
// Allow R2 to complete SyncStage.requesting_target, but get stuck |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was a typo, I think.
// Faulty bits will be set in state_machine_open_callback(). | ||
// Faulty bits will be set in sync_content(). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment was out of date. I think state_machine_open_callback
did originally handle this logic, but it was moved to sync_content
as soon as client-replies + table sync was added.
var free_set_buffer = try allocator.alignedAlloc( | ||
u8, | ||
@alignOf(u64), | ||
vsr.FreeSet.encode_size_max(Storage.grid_blocks_max), | ||
); | ||
errdefer allocator.free(free_set); | ||
errdefer allocator.free(free_set_buffer); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Typo. Not found by the compiler (prior to adding client_sessions_buffer
) since the errdefer
was not previously reachable (nothing could possibly error before here and the end of the function).
previous_free_set_block_checksum_padding: u128 = 0, | ||
previous_free_set_block_address: u64, | ||
previous_trailer_block_checksum: u128, | ||
previous_trailer_block_checksum_padding: u128 = 0, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wait, do we need this padding? I think we switched all pure checksumms to be unpadded?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this actually is a pure checksum -- it is referring to a block, which would be encrypted, so the "checksum" would be the AEAD tag.
@@ -3177,6 +3158,11 @@ | |||
.checkpoint_client_replies => { | |||
self.client_replies.checkpoint(commit_op_checkpoint_client_replies_callback); | |||
}, | |||
.checkpoint_client_sessions => { | |||
self.client_sessions_checkpoint.size = | |||
self.client_sessions.encode(self.client_sessions_checkpoint.buffer); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am wondering if perhaps there's some way to assert that both size and buffer are correctly set
at the same time? But I don't see any.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, the API is a little awkward. 🤔
@@ -7977,6 +7963,7 @@ pub fn ReplicaType( | |||
|
|||
self.grid.free_set.reset(); | |||
self.grid.free_set_checkpoint.reset(); | |||
self.client_sessions_checkpoint.reset(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Feels odd that we are not reseting client_sessions here, but rather do it in client_sessions_open_callback
.
I think resetting right before decoding makes sense to me from the perspective of defensive programming the decode code doesn't have to worry whether the object was properly reset. But maybe it makes sense to go on an offensive here? Reset the sessions here, so that, between this moment, and the moment we actually decode them, the sessions are in a well-defined reset state? In decoding, we can then assert that the sessions are reset.
(Follow-up for #1351).
No more trailers in the superblock!
Client sessions is encoded with
CheckpointTrailer
, like the free set. Unlike the free set, client sessions encoding/decoding is handled directly byreplica.zig
, notgrid.zig
.This simplifies the superblock checkpoint/sync control flow, which now just needs to care about the superblock headers. This simplifies state sync, since the superblock sync is now just a single
sync_checkpoint
message.