Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Multiversion: backport execing and hardcode 0.15.3 as having 0.15.4 available #1935

Draft
wants to merge 1 commit into
base: release
Choose a base branch
from

Conversation

cb22
Copy link
Contributor

@cb22 cb22 commented May 10, 2024

0.15.4 will be the first version with the ability to read the multiversion metadata embedded in TigerBeetle. This presents a bit of a bootstrapping problem:

  • Operator replaces 0.15.3 with 0.15.4,
  • 0.15.4 starts up, re-execs into 0.15.3,
  • 0.15.3 knows nothing about 0.15.4 or how to check it's available, so we just hang.

Work around this by hardcoding that if 0.15.3 is present in a pack, 0.15.4 must be too.

Additionally, fix up setting CLOEXEC on the listening sockets. We were only setting it correctly on the accepted sockets.

@cb22 cb22 changed the base branch from main to release May 10, 2024 14:25
@cb22 cb22 force-pushed the re-release branch 6 times, most recently from 70598cc to 6204906 Compare June 5, 2024 12:42
@cb22
Copy link
Contributor Author

cb22 commented Jun 5, 2024

@sentientwaffle this is ready for review; but leaving as a draft PR since the merging / build process will be... Interesting.

Comment on lines 408 to 400
var binary_path = replica_release_execute_args.?.next().?;
assert(std.fs.path.isAbsolute(binary_path));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the global replica_release_execute_args needed by "real" multiversioning too?

If so, maybe we should just use that instead of std.os.argv for macos/linux too, to avoid some of the branching on .windows at the top-level and in main().

...Then again, which how much type casting is going on, maybe it wouldn't actually simplify anything.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the global replica_release_execute_args needed by "real" multiversioning too?

Nope, there it's handled inside multiversioning.zig.

I've removed the branch for now, at the cost of a little extra memory on Linux / macOS, but since this is a transitional release only I think that's OK!

const cast_envp: [*:null]const ?[*:0]const u8 = @ptrCast(std.os.environ.ptr);

std.log.info("replica_release_execute: executing '{s}'...\n", .{binary_path});
std.os.execveZ(binary_path, cast_args, cast_envp) catch @panic("execveZ failed");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If execveZ fails, the error would be nice to have. Maybe replace the panic with a |err| log.fatal(..., .{err})? (Likewise for windows.)

@@ -956,7 +956,7 @@ pub const IO = struct {

pub const INVALID_FILE = os.windows.INVALID_HANDLE_VALUE;

fn open_file_handle(relative_path: []const u8, method: enum { create, open }) !os.fd_t {
fn open_file_handle(relative_path: []const u8, method: enum { create, open, open_wait }) !os.fd_t {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just curious, but why not .deinit() the IO to unlock the data file before starting the new process?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No reason - that's a great idea! I think I should be able to deinit the replica and storage; afaik it's the port and the data file that we need to ensure are free.

(I was using the file as a proxy for both, here)

@cb22 cb22 force-pushed the re-release branch 3 times, most recently from cd94e07 to 05d2eb0 Compare June 6, 2024 13:48
…vailable

0.15.4 will be the first version with the ability to read the multiversion
metadata embedded in TigerBeetle. This presents a bit of a bootstrapping
problem:

* Operator replaces 0.15.3 with 0.15.4,
* 0.15.4 starts up, re-execs into 0.15.3,
* 0.15.3 knows nothing about 0.15.4 or how to check it's available, so we
  just hang.

Work around this by hardcoding that if 0.15.3 is present in a pack, 0.15.4
must be too.
const cast_args: [*:null]const ?[*:0]const u8 = @ptrCast(std.os.argv.ptr);
const cast_envp: [*:null]const ?[*:0]const u8 = @ptrCast(std.os.environ.ptr);

std.log.info("replica_release_execute: executing '{s}'...\n", .{binary_path});
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit, but you could use log_main for the scoped logger.

Also, log.info() doesn't need an explicit \n.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit, but you could use log_main for the scoped logger.

Ah yes!

Also, log.info() doesn't need an explicit \n.

Indeed - this is to make the logging output when its exec'ing through multiple versions easier to read - that'll be the last line printed before the new process takes over.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants