CASSANDRA-19617: Refresh stale paxos commit #3289

Open
wants to merge 4 commits into
base: trunk


belliottsmith
Contributor

Thanks for sending a pull request! Here are some tips if you're new here:

  • Ensure you have added or run the appropriate tests for your PR.
  • Be sure to keep the PR description updated to reflect all changes.
  • Write your PR title to summarize what this PR proposes.
  • If possible, provide a concise example to reproduce the issue for a faster review.
  • Read our contributor guidelines
  • If you're making a documentation change, see our guide to documentation contribution

Commit messages should follow the following format:

<One sentence description, usually Jira title or CHANGES.txt summary>

<Optional lengthier description (context on patch)>

patch by <Authors>; reviewed by <Reviewers> for CASSANDRA-#####

Co-authored-by: Name1 <email1>
Co-authored-by: Name2 <email2>

The Cassandra Jira

    default: throw new AssertionError();
    case legacy:
    case gc_grace:
        overrideTtlSeconds = metadata.params.gcGraceSeconds;
Member


I think this creates another window for state resurrection if you switch back to repaired without having run a paxos repair in the meantime, though, since you'd no longer be applying a synthetic TTL to data that had previously been synthetically TTL'd. The timeline would be:

1. Incomplete operation A happens with purging mode repaired.
2. You switch to gc_grace and the TTL window elapses.
3. You run operation B, which deletes the data that would have been written by operation A.
4. The TTL window for operation B passes and it's removed from disk.
5. You switch back to repaired purging without having run a repair in the meantime.
6. Operation A becomes visible again and is applied.
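To make the timeline above concrete, here is a toy replay of it. This is only a sketch of my reading of the scenario: `ResurrectionTimeline`, `Op`, `visible` and `compact` are hypothetical names, not Cassandra classes, and the two rules (gc_grace hides by age at read time; only operations written with a TTL are physically dropped) are simplifying assumptions rather than the actual purging logic.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of the timeline; none of these types or methods exist in Cassandra. */
final class ResurrectionTimeline {
    enum Mode { REPAIRED, GC_GRACE }

    /** A persisted paxos operation. storedTtl is true only if it was written with a TTL (assumption). */
    static final class Op {
        final String name;
        final long writtenAt;
        final boolean storedTtl;

        Op(String name, long writtenAt, boolean storedTtl) {
            this.name = name;
            this.writtenAt = writtenAt;
            this.storedTtl = storedTtl;
        }
    }

    /** Assumed read-time rule: GC_GRACE hides anything at least gcGrace old; REPAIRED hides nothing without a repair. */
    static boolean visible(Op op, Mode mode, long now, long gcGrace) {
        return mode == Mode.REPAIRED || now - op.writtenAt < gcGrace;
    }

    /** Assumed compaction rule: only operations written with a TTL are dropped once it expires. */
    static void compact(List<Op> disk, long now, long gcGrace) {
        disk.removeIf(op -> op.storedTtl && now - op.writtenAt >= gcGrace);
    }

    /** Replays the timeline and reports whether operation A is visible again at the end. */
    static boolean operationAResurrected() {
        long gcGrace = 10;
        List<Op> disk = new ArrayList<>();

        Op a = new Op("A", 0, false);   // t=0: incomplete A, written in REPAIRED mode (no TTL)
        disk.add(a);

        // t=11: mode is now GC_GRACE and A's synthetic TTL window has elapsed
        boolean hiddenInGcGrace = !visible(a, Mode.GC_GRACE, 11, gcGrace);

        disk.add(new Op("B", 12, true)); // t=12: B supersedes A, written with a TTL
        compact(disk, 22, gcGrace);      // t=22: B's TTL has passed, so B leaves disk
        boolean onlyARemains = disk.size() == 1 && disk.get(0) == a;

        // t=23: back to REPAIRED with no repair in between, so nothing hides A any more
        return hiddenInGcGrace && onlyARemains && visible(a, Mode.REPAIRED, 23, gcGrace);
    }
}
```

In this model `ResurrectionTimeline.operationAResurrected()` returns true: A is hidden while gc_grace is active, but once B has been purged and the mode flips back, nothing on disk supersedes A.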

Re-reading the PaxosStatePurging docs, we only state that it’s unsafe to go back to legacy once migrating from it, which implies that it’s ok to return to gc_grace if you’re having repair problems and a growing paxos state table, which I think is a useful escape hatch to keep around.

Switching back to gc_grace would also prevent the operations written while in repaired mode from being purged properly. While we’re on the topic, increasing gc grace when using the gc grace purge mode could also resurrect older operations by the same general mechanism described in the JIRA.

I think we need to track a local purgeable low bound across config and schema changes.
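Roughly what I have in mind, as a sketch only: `PurgeableLowBound` and its methods are hypothetical names, not existing Cassandra APIs, timestamps are plain seconds for simplicity, and persistence of the bound across restarts is left out. The point is just that the bound only ever moves forward, so neither a mode switch nor a larger gc grace can make an already-purgeable operation visible again.

```java
/** Hypothetical sketch; not an existing Cassandra class. */
final class PurgeableLowBound {
    // Everything written at or before this point is treated as purged, whatever the current mode is.
    private long lowBoundSeconds = Long.MIN_VALUE;

    /** Under gc_grace purging, anything at least gcGraceSeconds old is purgeable right now. */
    void onGcGraceEvaluation(long nowSeconds, long gcGraceSeconds) {
        advance(nowSeconds - gcGraceSeconds);
    }

    /** Under repaired purging, a completed paxos repair covers everything up to repairedUpToSeconds. */
    void onPaxosRepair(long repairedUpToSeconds) {
        advance(repairedUpToSeconds);
    }

    /** The bound is monotonic: a config or schema change can never move it backwards. */
    private void advance(long candidateSeconds) {
        lowBoundSeconds = Math.max(lowBoundSeconds, candidateSeconds);
    }

    boolean isPurged(long writtenAtSeconds) {
        return writtenAtSeconds <= lowBoundSeconds;
    }
}
```

For example, after `onGcGraceEvaluation(100, 10)` an operation written at 50 is purged, and it stays purged if gc grace is later raised to 1000 (`onGcGraceEvaluation(101, 1000)` is a no-op) or if the mode switches to repaired and no repair has run yet.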
