disk: replay, epoch compatibility, and lmdb modules #241

matthew-levan · 2024-06-13T15:20:54Z

This PR adds bare-bones replay functionality for vere-v3.0-style piers, meaning that one may:

- Boot ships with the vere-v3.0 king and Ares serf
- Restart ships which have been booted as above
- Perform a full replay of such a ship by deleting its .urb/chk/data.pma file
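
As an illustration of the last point, here is a minimal sketch of forcing a full replay from Rust. It is not part of this PR: the helper names `snapshot_path` and `force_full_replay` are hypothetical, and the sketch assumes the ship is stopped before the snapshot is removed.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Location of the snapshot file named above, relative to the pier root.
/// (Hypothetical helper, not a function from this PR.)
fn snapshot_path(pier: &Path) -> PathBuf {
    pier.join(".urb").join("chk").join("data.pma")
}

/// Delete the snapshot so that the next boot replays from the event log.
/// Returns Ok(true) if a snapshot was removed and Ok(false) if none existed.
/// Assumption: the ship is not running while this is called.
fn force_full_replay(pier: &Path) -> io::Result<bool> {
    match fs::remove_file(snapshot_path(pier)) {
        Ok(()) => Ok(true),
        Err(e) if e.kind() == io::ErrorKind::NotFound => Ok(false),
        Err(e) => Err(e),
    }
}
```

Treating a missing snapshot as `Ok(false)` rather than an error keeps the helper idempotent, so calling it twice on the same pier is harmless.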

To achieve these results, very basic modules have been added for lmdb, disk, and mars. These are not well thought out or carefully designed; instead, they are "minimum viable" and, as such, include numerous todo comments throughout.

I have tested this PR on:

- macos-aarch64
- linux-x86_64
  - ~~For some reason, booting a fresh fake ship on this platform just hangs after binary copy succeeded~~

To fix booting on linux-x86_64, we open the LMDB environment with MDB_NOLOCK (see disk.rs:44), which is documented as follows:

> Do not do any locking. If concurrent access is anticipated, the caller must manage all concurrency themself. For proper operation the caller must enforce single-writer semantics, and must ensure that no readers are using old transactions while a writer is active. The simplest approach is to use an exclusive lock so that no readers may be active at all when a writer begins.

Thus, we have two invariants to maintain:

- Only one writer is active at any given time.
  - We satisfy this because we never write from Rust (in fact, we don't even have any write functions implemented).
- No reader uses old transactions while a writer is active.
  - We satisfy this because we only read from the database during replay or when initializing the serf (our patterns mirror Vere's), and our read transactions are never active while a write transaction is also active.
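
If write functions are ever added on the Rust side, the "exclusive lock" approach from the quoted documentation could be sketched as below. This is an illustration under assumptions, not code from this PR: `LockedEnv` is a hypothetical wrapper, `E` stands in for whatever environment handle the lmdb module exposes, and a `std::sync::RwLock` only coordinates threads inside one process, so it does nothing to coordinate with a separate king process.

```rust
use std::sync::RwLock;

/// Hypothetical wrapper around an environment handle opened with MDB_NOLOCK.
/// The RwLock supplies the discipline that LMDB no longer enforces:
/// many concurrent readers, or exactly one writer with no readers.
struct LockedEnv<E> {
    env: RwLock<E>,
}

impl<E> LockedEnv<E> {
    fn new(env: E) -> Self {
        LockedEnv { env: RwLock::new(env) }
    }

    /// Run a read-only operation. The shared lock is held for the whole
    /// closure, so no writer can start while this read is in progress.
    fn read<R>(&self, f: impl FnOnce(&E) -> R) -> R {
        let guard = self.env.read().expect("lock poisoned");
        f(&*guard)
    }

    /// Run a write operation. The exclusive lock guarantees a single writer
    /// and that no reader is active while the write is in progress.
    fn write<R>(&self, f: impl FnOnce(&mut E) -> R) -> R {
        let mut guard = self.env.write().expect("lock poisoned");
        f(&mut *guard)
    }
}
```

Because each closure runs entirely under the lock, a read transaction opened inside `read` cannot outlive the lock and become an "old transaction" that overlaps a later writer, provided the closure does not leak the transaction.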

Future work:

- Use mugs to verify replay correctness
- Achieve parity with Vere replay, including play arguments
- Use a real CLI parsing crate (or something similar that's better than our manual argv parsing)
- Handle IPC responses correctly, instead of printing garbled output
- Ensure safety against snapshot and/or event log corruption during replay
- Clean up code (Rust idioms, etc.)
- Look into why serf: bail is printed during boot and replay
- Think through and document proper designs for these systems when working on the Ares "full Mars" implementation
- Implement replay tests for CI

matthew-levan · 2024-06-24T14:27:27Z

@eamsden Not sure why linting fails on the serf::next function, since it exists in the same form on the status branch as well.

disk: basic replay, epoch compat, and lmdb modules

a7364bb

matthew-levan added the enhancement New feature or request label Jun 13, 2024

matthew-levan requested review from eamsden and ashelkovnykov June 13, 2024 15:20

matthew-levan self-assigned this Jun 13, 2024

matthew-levan mentioned this pull request Jun 13, 2024

event log, replay, epoch system compatibility #237

Closed

Merge branch 'status' into msl/basic-replay

dc4afeb

matthew-levan added this to the Vere compatibility milestone Jun 14, 2024

disk: open env with MDB_NOLOCK XX

14d8c13

matthew-levan marked this pull request as ready for review June 24, 2024 13:29

cargo: clippy

da1cf12

matthew-levan changed the title ~~disk: basic replay, epoch compatibility, and lmdb modules~~ disk: replay, epoch compatibility, and lmdb modules Jun 24, 2024
