pier: epoch system #313

Closed
matthew-levan opened this issue Mar 27, 2023 · 0 comments · Fixed by #459

matthew-levan commented Mar 27, 2023

To mitigate jet mismatches (especially during event log replay), we should create an "epoch system": a new `<pier>/.urb/log` format that lets event log replay match binary versions to particular subsets of events in the pier's event log.

Related:

@matthew-levan matthew-levan added the feature New feature or feature request label Mar 27, 2023
@matthew-levan matthew-levan self-assigned this Mar 27, 2023
@matthew-levan matthew-levan mentioned this issue Apr 12, 2023
20 tasks
@matthew-levan matthew-levan mentioned this issue Jun 21, 2023
36 tasks
pkova added a commit that referenced this issue Sep 18, 2023
This PR implements a new format for how piers store their event logs on
disk.

Resolves #313.

### Design

Existing format:
```
./zod/.urb/log
├── data.mdb
└── lock.mdb
```

New format:
```
./zod/.urb/log
├── 0i0             # epoch dirnames specify the last event of the previous epoch
│   ├── data.mdb    # lmdb file containing events 1-132
│   ├── epoc.txt    # disk format version (this PR starts versioning at 1)
│   ├── lock.mdb    # lmdb lock file
│   └── vere.txt    # binary version this set of events was originally run with
└── 0i132
    ├── data.mdb
    ├── epoc.txt
    ├── lock.mdb
    ├── north.bin   #
    ├── south.bin   # snapshot files (state as of event 132), strictly read-only
    └── vere.txt
```

The new format introduces *epochs*, which are simply "slices" or
"chunks" of a ship's complete event log. Above, you can see the ship's
event log chunked into two epochs: `0i0` and `0i132`.
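
As a rough illustration of the naming scheme, the sketch below parses epoch directory names and looks up which epoch would hold a given event. It assumes a name is `0i` followed by a plain decimal event number, as in the tree above; `epoch_base` and `epoch_for_event` are hypothetical helper names, not functions from this PR.

```python
import re

# Assumption: epoch dirnames look like "0i<decimal>", as in the tree above.
EPOCH_RE = re.compile(r"^0i(\d+)$")


def epoch_base(dirname: str) -> int:
    """Return the number encoded in an epoch dirname.

    Per the tree above, this is the last event of the previous epoch.
    """
    match = EPOCH_RE.match(dirname)
    if match is None:
        raise ValueError(f"not an epoch directory name: {dirname!r}")
    return int(match.group(1))


def epoch_for_event(dirnames, event: int) -> str:
    """Pick the epoch whose slice of the log would contain `event`."""
    if event < 1:
        raise ValueError("event numbers start at 1")
    # An epoch named 0iN starts at event N + 1, so the owning epoch is the
    # one with the largest base that is strictly below the event number.
    candidates = [d for d in dirnames if epoch_base(d) < event]
    if not candidates:
        raise LookupError(f"no epoch on disk covers event {event}")
    return max(candidates, key=epoch_base)
```

With the two epochs shown above, event 132 falls in `0i0` and event 133 falls in `0i132`.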

New ships booted with the code in this PR instantiate their `log`
directories with the new format. Existing piers are automatically
migrated on boot.
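
One way to tell the two layouts apart is sketched below. How the PR actually detects a pier that needs migration is not described here, so this is only an assumption based on the two directory trees above; `log_layout` is a hypothetical name.

```python
import re
from pathlib import Path


def log_layout(log_dir) -> str:
    """Classify a <pier>/.urb/log directory as 'epoch', 'legacy', or 'empty'.

    Assumption: any subdirectory named 0i<decimal> means the new format, and
    a top-level data.mdb without such directories means the old format.
    """
    log_dir = Path(log_dir)
    has_epochs = any(
        entry.is_dir() and re.fullmatch(r"0i\d+", entry.name)
        for entry in log_dir.iterdir()
    )
    if has_epochs:
        return "epoch"
    if (log_dir / "data.mdb").is_file():
        return "legacy"
    return "empty"
```

This sketch does not distinguish a half-finished migration (a top-level `data.mdb` alongside epoch directories) from a fully migrated pier; a real check would need to.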

Epoch "rollovers" (when the current epoch is ended and a new, empty
epoch is created) occur under three conditions:
1. The pilot uses the new `roll` subcommand to roll over manually.
2. The pilot runs the `chop` subcommand.
3. The running binary's version differs from the one pinned in the
current epoch.

Both migrations and epoch rollovers ensure there's a current snapshot
before running.
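
The rollover conditions can be sketched as a small decision function. This is a simplified model, not the PR's C code: `rollover_action` and `Action` are hypothetical names, and the "empty epoch" branch generalizes the `roll` case from the checklist below (an empty latest epoch with a new binary version only gets its `vere.txt` updated), which is an assumption.

```python
from enum import Enum


class Action(Enum):
    NONE = "none"    # keep writing to the current epoch
    ROLL = "roll"    # end the current epoch and create a new, empty one
    REPIN = "repin"  # keep the epoch, only rewrite its vere.txt


def rollover_action(*, manual_roll: bool, chop: bool, pinned_version: str,
                    running_version: str, epoch_is_empty: bool) -> Action:
    """Decide what to do with the current epoch (illustrative only)."""
    version_changed = pinned_version != running_version
    # Assumption: an empty epoch holds no events run under the old binary,
    # so re-pinning the version is enough and no new epoch is needed.
    if version_changed and epoch_is_empty:
        return Action.REPIN
    # The three conditions listed above.
    if manual_roll or chop or version_changed:
        return Action.ROLL
    return Action.NONE
```

What a manual `roll` does on an empty epoch when the version has not changed is not stated in this PR; the sketch simply returns `ROLL` in that case.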

A few TODOs left:
- [x] Iron out small kink in migration behavior for previously chopped
piers
- [x] Make sure correct binary version gets pinned to first epoch of
migrated piers
- [x] Rollover to new epoch when a new binary version is detected
- [x] Make sure manual migration logic is idempotent
- [x] ~~Update `prep` command~~
- [x] Fix `chop` so it works when there are 3 epochs starting with `0i0`
- [x] ~~Reproduce and fix partially-deleted epoch `0i0` after `chop`~~
- [x] Pair with someone to run manual GDB testing for migration
idempotency and rollover logic
- [x] Take a look at @joemfb's replay code and compare/find overlaps
- [x] Document final system design in this PR
- [x] Correct epoch naming scheme
- [x] Make `chop` leave the latest two epochs
- [x] Better error handling
- [x] Better cleanup
- [x] Test migration with real ships running on local-networking mode
- [x] Test epoch rollover idempotency
- [x] Test fresh boot
- [x] Handle case where snapshot has been deleted from `chk/`
- [x] Ensure `u3_disk_epoc_good()` is implemented and used how we want
- [x] Ensure `u3_disk_epoc_init()` is implemented and used how we want
- [x] Replay works with `urbit play` and `urbit`
- [x] Replay works in edge case where only epoch 0 and no valid snapshot
exist
- [x] Move new-epoch-on-vere-version-mismatch logic to
`_pier_wyrd_init()`
- [x] Make subcommands which call `u3_disk_init()` auto-migrate
  - [x] `info`
  - [x] `cram`
  - [x] `queu`
  - [x] `meld`
  - [x] `pack`
  - [x] `play`
  - [x] `chop`
  - [x] `roll`
- [x] Make replay on boot use `u3_mars_play()`
- [x] Test migration from an old pier (again)
- [x] Test that migration works for an old pier that first needs a full
replay (i.e., from the beginning of its event log)
- [x] Test that `./urbit roll zod`, with an updated binary version *and*
an empty latest epoch, does not roll but instead just updates the
`vere.txt` file
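
To illustrate the "`chop` leaves the latest two epochs" item, the sketch below selects which epochs fall outside the two that are kept. It only models the selection; what `chop` then does with those epochs on disk is not described here. `epochs_to_chop` is a hypothetical name, and the `0i<decimal>` naming is assumed from the tree above.

```python
def epochs_to_chop(dirnames, keep: int = 2):
    """Return the epoch dirnames older than the latest `keep` epochs."""
    if keep < 0:
        raise ValueError("keep must be non-negative")
    # Sort numerically by the event number after the "0i" prefix; a plain
    # string sort would put "0i1000" before "0i132".
    ordered = sorted(dirnames, key=lambda name: int(name[2:]))
    return ordered[:max(len(ordered) - keep, 0)]
```

For example, with epochs `0i0`, `0i132`, and a hypothetical third epoch `0i500`, only `0i0` is selected; with two or fewer epochs nothing is.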