Improve index crash recovery #2394

Merged
dominiklohmann merged 6 commits into master from story/sc-35105/improve-index-crash-recovery on Jul 4, 2022

Conversation

tobim
@tobim tobim commented Jul 1, 2022

This change makes VAST restore its index from the partition synopses that are actually present on the filesystem instead of the partition synopses listed in the index.bin file, which can easily go out of sync with the on-disk state during crashes.

This is a first change towards getting rid of the index.bin file entirely.

📝 Checklist

  • All user-facing changes have changelog entries.
  • The changes are reflected on docs.tenzir.com/vast, if necessary.
  • The PR description contains instructions for the reviewer, if necessary.

🎯 Review Instructions

Review commit-by-commit. Test by terminating VAST ungracefully during a rebuild.

When the database is only updated through partition transforms, the index
state file can go heavily out of sync. This becomes a problem on unexpected
termination, when the state can't be persisted, effectively resulting in
data loss.

This change adds calls to the persist function for that state in appropriate
places so that this situation can no longer occur.
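The idea of persisting right after each transform can be sketched as follows. This is a toy model under stated assumptions, not the code from this PR: `toy_index`, `persist`, and `apply_transform` are hypothetical names, the state is a plain list of partition IDs, and the write-to-temporary-then-rename step is a common technique chosen for this example rather than something the PR description confirms.

```cpp
#include <filesystem>
#include <fstream>
#include <set>
#include <string>
#include <system_error>
#include <vector>

namespace fs = std::filesystem;

// Toy stand-in for the index state (hypothetical, not VAST's data structure).
struct toy_index {
  fs::path state_file;
  std::set<std::string> partitions;

  // Writes the state to a temporary file and renames it over the old one,
  // so a crash mid-write leaves the previous state file intact.
  bool persist() const {
    auto tmp = state_file;
    tmp += ".tmp";
    {
      std::ofstream out{tmp, std::ios::trunc};
      if (!out)
        return false;
      for (const auto& id : partitions)
        out << id << '\n';
      out.flush();
      if (!out)
        return false;
    }
    std::error_code ec;
    fs::rename(tmp, state_file, ec);
    return !ec;
  }

  // Applies a transform (drop `removed`, add `added`) and persists right
  // away instead of waiting for a graceful shutdown.
  bool apply_transform(const std::vector<std::string>& removed,
                       const std::vector<std::string>& added) {
    for (const auto& id : removed)
      partitions.erase(id);
    for (const auto& id : added)
      partitions.insert(id);
    return persist();
  }
};
```

With this shape, the window in which the file and the in-memory state disagree shrinks to a single transform rather than the whole lifetime of the process.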
@tobim tobim added the bug Incorrect behavior label Jul 1, 2022
@tobim tobim force-pushed the story/sc-35105/improve-index-crash-recovery branch from b582750 to cb6c011 Compare July 1, 2022 20:21
@tobim tobim marked this pull request as ready for review July 1, 2022 20:21
@tobim tobim requested review from lava and dominiklohmann July 1, 2022 20:22
We need to nudge them a bit so we can drop support for older partition versions
more freely.
The condition for this was exactly inverted, so the command considered only
oversized partitions.
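An inverted selection condition of this kind is easy to picture with a small sketch. The names below (`partition_info`, `select_undersized`, `max_partition_size`) are hypothetical, and the sketch assumes that the intended candidates are partitions below a size threshold; it is not the actual code from this commit.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical summary of a partition; only the event count matters here.
struct partition_info {
  std::size_t events;
};

// Returns the positions of all partitions with fewer events than
// `max_partition_size`. Flipping the comparison to `>=` would select only
// the oversized partitions, which is the kind of inversion described above.
std::vector<std::size_t>
select_undersized(const std::vector<partition_info>& partitions,
                  std::size_t max_partition_size) {
  std::vector<std::size_t> result;
  for (std::size_t i = 0; i < partitions.size(); ++i)
    if (partitions[i].events < max_partition_size)
      result.push_back(i);
  return result;
}
```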
@dominiklohmann dominiklohmann left a comment

I tested this extensively with manually induced crashes during rebuilds. This works really well.

@dominiklohmann dominiklohmann merged commit cc009c0 into master Jul 4, 2022
@dominiklohmann dominiklohmann deleted the story/sc-35105/improve-index-crash-recovery branch July 4, 2022 08:20