VSR: Ping releases #1670

Merged
merged 6 commits into from Mar 11, 2024
1 change: 1 addition & 0 deletions src/config.zig
@@ -142,6 +142,7 @@ const ConfigCluster = struct {
lsm_batch_multiple: comptime_int = 32,
lsm_snapshots_max: usize = 32,
lsm_manifest_compact_extra_blocks: comptime_int = 1,
vsr_releases_max: usize = 64,
Member Author

vsr_releases_max is a new config/constant which defines the maximum number of releases that can be advertised by a replica in a ping message. (So this is effectively the maximum number of releases that can be compiled into a multiversion binary.)

I set it to 64 for now, chosen somewhat arbitrarily: we release on a weekly cadence, there are 52 weeks/year, and 52 rounded up to the next power of two is 64.
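The arithmetic behind that choice can be sketched as follows. This is only an illustration: `releases_max_from_cadence` and `ping_body_bytes` are hypothetical helpers that do not exist in the PR, and the only facts taken from the thread are the weekly cadence, the power-of-two rounding, and the `u16` encoding of each release.

```zig
const std = @import("std");
const assert = std.debug.assert;

/// Hypothetical helper (not part of the PR): rounds a yearly release count up
/// to the next power of two.
fn releases_max_from_cadence(releases_per_year: usize) usize {
    return std.math.ceilPowerOfTwoAssert(usize, releases_per_year);
}

/// Hypothetical helper: bytes of ping body needed to advertise `releases_max`
/// releases when each release is encoded as a u16.
fn ping_body_bytes(releases_max: usize) usize {
    return releases_max * @sizeOf(u16);
}

pub fn main() void {
    // 52 weekly releases per year, rounded up to a power of two.
    const releases_max = releases_max_from_cadence(52);
    assert(releases_max == 64);

    // A full list of 64 u16 releases occupies 128 bytes of message body.
    assert(ping_body_bytes(releases_max) == 128);

    // The release count must also fit in the u16 header field.
    assert(releases_max <= std.math.maxInt(u16));
}
```

With these numbers a full release list is small, so the body-size assertion in `constants.zig` should only bite if the cap is raised by orders of magnitude.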


// Arbitrary value.
// TODO(batiati): Maybe this constant should be derived from `grid_iops_read_max`,
11 changes: 11 additions & 0 deletions src/constants.zig
@@ -83,6 +83,17 @@ comptime {
assert(clients_max >= Config.Cluster.clients_max_min);
}

/// The maximum number of release versions (upgrade candidates) that can be advertised by a replica
/// in each ping message body.
pub const vsr_releases_max = config.cluster.vsr_releases_max;

comptime {
assert(vsr_releases_max >= 2);
assert(vsr_releases_max * @sizeOf(u16) <= message_body_size_max);
// The number of releases is encoded into ping headers as a u16.
assert(vsr_releases_max <= std.math.maxInt(u16));
}

/// The maximum number of nodes required to form a quorum for replication.
/// Majority quorums are only required across view change and replication phases (not within).
/// As per Flexible Paxos, provided `quorum_replication + quorum_view_change > replicas`:
1 change: 1 addition & 0 deletions src/testing/cluster.zig
@@ -429,6 +429,7 @@ pub fn ClusterType(comptime StateMachineType: anytype) type {
// TODO Use "real" release numbers.
.release = 1,
.release_client_min = 1,
.releases_bundled = vsr.ReleaseList.from_slice(&[_]u16{1}) catch unreachable,
},
);
assert(replica.cluster == cluster.options.cluster_id);
1 change: 1 addition & 0 deletions src/tigerbeetle/main.zig
@@ -183,6 +183,7 @@ const Command = struct {
// TODO Use real release numbers.
.release = 1,
.release_client_min = 1,
.releases_bundled = vsr.ReleaseList.from_slice(&[_]u16{1}) catch unreachable,
.storage_size_limit = args.storage_size_limit,
.storage = &command.storage,
.aof = &aof,
15 changes: 15 additions & 0 deletions src/vsr.zig
@@ -62,6 +62,9 @@ pub const CheckpointTrailerType = @import("vsr/checkpoint_trailer.zig").Checkpoi
/// For backwards compatibility through breaking changes (e.g. upgrading checksums/ciphers).
pub const Version: u16 = 0;

/// A ReleaseList is ordered from highest-to-lowest(i.e. newest-to-oldest) version.
pub const ReleaseList = stdx.BoundedArray(u16, constants.vsr_releases_max);

pub const ProcessType = enum { replica, client };

pub const Zone = enum {
@@ -1044,6 +1047,18 @@ pub fn member_index(members: *const Members, replica_id: u128) ?u8 {
} else return null;
}

pub fn verify_release_list(releases: []const u16) void {
assert(releases.len >= 1);
assert(releases.len <= constants.vsr_releases_max);

for (
releases[0 .. releases.len - 1],
releases[1..],
) |release_a, release_b| {
assert(release_a > release_b);
Member

This checks that the list is descending, right? That is, the opposite of natural sort order? What's the reason for the deviation?

Member Author

It was to emphasize that the highest versions are higher priorities... in retrospect, probably an unnecessary complication.
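To make the two orderings under discussion concrete, here is a minimal standalone sketch. `is_strictly_descending` mirrors the pairwise comparison in `verify_release_list`, but returns a `bool` instead of asserting, and `is_strictly_ascending` is the natural-sort-order alternative raised above. Both function names are assumptions made for illustration and are not part of the PR.

```zig
const std = @import("std");
const assert = std.debug.assert;

/// Returns true if `releases` is non-empty and strictly descending
/// (newest first), which is the ordering `verify_release_list` asserts.
fn is_strictly_descending(releases: []const u16) bool {
    if (releases.len == 0) return false;
    for (releases[0 .. releases.len - 1], releases[1..]) |release_a, release_b| {
        if (release_a <= release_b) return false;
    }
    return true;
}

/// The alternative raised in review: natural (ascending) sort order.
fn is_strictly_ascending(releases: []const u16) bool {
    if (releases.len == 0) return false;
    for (releases[0 .. releases.len - 1], releases[1..]) |release_a, release_b| {
        if (release_a >= release_b) return false;
    }
    return true;
}

pub fn main() void {
    const descending = [_]u16{ 3, 2, 1 };
    const ascending = [_]u16{ 1, 2, 3 };
    const duplicate = [_]u16{ 2, 2, 1 };

    assert(is_strictly_descending(&descending));
    assert(!is_strictly_descending(&ascending));
    // Strict ordering also rejects duplicate releases.
    assert(!is_strictly_descending(&duplicate));
    assert(is_strictly_ascending(&ascending));

    // Either way the newest release is at a fixed end of the list:
    // index 0 when descending, the last index when ascending.
    assert(descending[0] == ascending[ascending.len - 1]);
}
```

Both checks reject duplicates and both make the newest release available in constant time, so the choice between them is mostly one of convention.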

}
}

pub const Headers = struct {
pub const Array = stdx.BoundedArray(Header.Prepare, constants.view_change_headers_max);
/// The SuperBlock's persisted VSR headers.
19 changes: 19 additions & 0 deletions src/vsr/replica.zig
@@ -210,6 +210,14 @@ pub fn ReplicaType(
/// It should never be modified by a running replica.
release_client_min: u16,

/// A list of all versions of code that are available in the current binary.
/// Includes the current version, newer versions, and older versions.
/// Ordered from highest/newest to lowest/oldest.
///
/// Note that this is a property (rather than a constant) for the purpose of testing.
/// It should never be modified for a running replica.
releases_bundled: vsr.ReleaseList,

/// A globally unique integer generated by a crypto rng during replica process startup.
/// Presently, it is used to detect outdated start view messages in recovering head status.
nonce: Nonce,
@@ -487,6 +495,7 @@ pub fn ReplicaType(
grid_cache_blocks_count: u32 = Grid.Cache.value_count_max_multiple,
release: u16,
release_client_min: u16,
releases_bundled: vsr.ReleaseList,
Member

nit, but I've found bounded array to be clunky to work with (esp. for vsr headers). Given that vsr.ReleaseList doesn't actually enforce validity in types (we need to call verify_release_list), maybe just pass a slice of u16s here?

That is, I think the two stationary design points are:

  • fully enforce semantic validity in types, such that, if the caller has vsr.ReleaseList, that's already a proof that the list is valid
  • use types to specify structural representation, and rely on assertions for invariant checking. That is, use simple types like u16, []u128 etc., instead of vsr.Release, vsr.Checksum etc.

Here we are sort of in the middle: the interface requires a vsr.ReleaseList, but that type doesn't actually check most of the invariants itself.
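The two design points can be sketched side by side. Everything below is hypothetical and is not taken from the PR: `ValidatedReleaseList`, `is_valid_release_list`, `open_with_slice`, and the local `releases_max` constant (set to the PR's default of 64) are names introduced only for illustration. Note also that Zig has no private fields, so the "proof by type" in the first design point holds only by convention.

```zig
const std = @import("std");
const assert = std.debug.assert;

// Hypothetical cap, mirroring the `vsr_releases_max = 64` default in this PR.
const releases_max = 64;

/// Non-empty, within the cap, and strictly descending (newest first).
fn is_valid_release_list(releases: []const u16) bool {
    if (releases.len == 0 or releases.len > releases_max) return false;
    for (releases[0 .. releases.len - 1], releases[1..]) |a, b| {
        if (a <= b) return false;
    }
    return true;
}

/// Design point 1: values are meant to be created only through `init`, so
/// holding one is (by convention) evidence that the invariants were checked.
const ValidatedReleaseList = struct {
    buffer: [releases_max]u16,
    count: usize,

    fn init(releases: []const u16) error{InvalidReleaseList}!ValidatedReleaseList {
        if (!is_valid_release_list(releases)) return error.InvalidReleaseList;
        var list = ValidatedReleaseList{ .buffer = undefined, .count = releases.len };
        @memcpy(list.buffer[0..releases.len], releases);
        return list;
    }

    fn slice(list: *const ValidatedReleaseList) []const u16 {
        return list.buffer[0..list.count];
    }
};

/// Design point 2: accept a plain slice and assert the invariants at the
/// point of use, including that the running release is in the bundle.
fn open_with_slice(release: u16, releases_bundled: []const u16) void {
    assert(is_valid_release_list(releases_bundled));
    assert(std.mem.indexOfScalar(u16, releases_bundled, release) != null);
}

pub fn main() !void {
    const bundled = [_]u16{ 3, 2, 1 };
    const ascending = [_]u16{ 1, 2, 3 };

    // Design point 1: construction either validates or fails.
    const list = try ValidatedReleaseList.init(&bundled);
    assert(list.slice().len == 3);
    if (ValidatedReleaseList.init(&ascending)) |_| {
        unreachable;
    } else |err| {
        assert(err == error.InvalidReleaseList);
    }

    // Design point 2: the caller passes a slice; the callee asserts.
    open_with_slice(2, &bundled);
}
```

The first variant pushes validation to construction time and returns an error the caller must handle; the second keeps the interface structurally simple and relies on assertions, which is the direction suggested in the comment above.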

};

/// Initializes and opens the provided replica using the options.
@@ -548,6 +557,7 @@ pub fn ReplicaType(
.grid_cache_blocks_count = options.grid_cache_blocks_count,
.release = options.release,
.release_client_min = options.release_client_min,
.releases_bundled = options.releases_bundled,
});

// Disable all dynamic allocation from this point onwards.
@@ -847,6 +857,7 @@ pub fn ReplicaType(
grid_cache_blocks_count: u32,
release: u16,
release_client_min: u16,
releases_bundled: vsr.ReleaseList,
};

/// NOTE: self.superblock must be initialized and opened prior to this call.
@@ -889,6 +900,13 @@ pub fn ReplicaType(
// Flexible quorums are safe if these two quorums intersect so that this relation holds:
assert(quorum_replication + quorum_view_change > replica_count);

vsr.verify_release_list(options.releases_bundled.const_slice());
assert(std.mem.indexOfScalar(
u16,
options.releases_bundled.const_slice(),
options.release,
) != null);

self.time = options.time;

// The clock is special-cased for standbys. We want to balance two concerns:
@@ -976,6 +994,7 @@ pub fn ReplicaType(
.quorum_majority = quorum_majority,
.release = options.release,
.release_client_min = options.release_client_min,
.releases_bundled = options.releases_bundled,
.nonce = options.nonce,
// Copy the (already-initialized) time back, to avoid regressing the monotonic
// clock guard.