Conversation
| fclose(epv_f); | ||
| snprintf(epv_c, sizeof(epv_c), "%s/epoc.txt", epo_c); | ||
| rename(epi_c, epv_c); | ||
| c3_unlink(epi_c); |
There was a problem hiding this comment.
I don't think there's any reason to write this metadata in one place and then rename() it. That operation is atomic, but that doesn't guard against anything that we care about -- we're just creating a new epoc here, not updating meaningful data "in place".
I think the issue is simply that these version files are not being synced to disc. After fwrite(), we need to call fflush() and then c3_sync(fileno(...)).
There was a problem hiding this comment.
After further discussion, there is one possible advantage to rename() -- any parsing errors in reading epoc.txt would then indicate tampering with the file, not a crash while writing it (or the current problem, failure to sync). The practical result would be that, in the case that vere crashes while creating a new epoc but before rename()-ing epoc.txt into place, that epoc would be automatically removed on startup as opposed to producing a fatal error that leaves it in place.
There was a problem hiding this comment.
Yes, I just tested this with a fresh fakezod. I simulated a crash before rename() of the epoc.tmp to epoc.txt file during a roll, thereby leaving only the snapshot files and an epoc.tmp file in the new epoch's folder. Afterwards, I restarted the ship with ./urbit zod and it successfully detected a bogus epoch, removed it, and booted from the latest valid epoch.
Fixes #640