CASSANDRA-19617: Refresh stale paxos commit #3289

belliottsmith · 2024-05-03T18:03:46Z


bdeggleston · 2024-05-03T22:41:48Z

src/java/org/apache/cassandra/db/SystemKeyspace.java

+            default: throw new AssertionError();
+            case legacy:
+            case gc_grace:
+                overrideTtlSeconds = metadata.params.gcGraceSeconds;


I think this creates another window for state resurrection if you switch back to repaired without having run a paxos repair in the meantime, since you'd no longer be applying a synthetic ttl to data that had previously been synthetically ttl’d. The timeline would be:

1. Incomplete operation A happens with purging mode repaired.
2. You switch to gc_grace and the ttl window elapses.
3. You run operation B, which deletes the data that would have been written by operation A.
4. The ttl window for operation B passes and it’s removed from disk.
5. You switch back to repaired purging without having run a repair in the meantime.
6. Operation A becomes visible again and is applied.
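To make the window concrete, here is a toy model of how operation A's visibility could flip across the mode changes. It is only a sketch under stated assumptions: `PurgingTimeline`, `Mode`, and `isVisible` are hypothetical names, not Cassandra's API, and the model assumes gc_grace hides a record purely by age while repaired applies no synthetic ttl at all.

```java
// Toy model only: hypothetical names, not Cassandra's actual classes.
final class PurgingTimeline
{
    enum Mode { gc_grace, repaired }

    // Assumption: in gc_grace mode a record is hidden once it is at least
    // gcGraceSeconds old (the synthetic ttl); in repaired mode no synthetic
    // ttl is applied, so the record stays visible until a paxos repair
    // purges it (repair itself is not modelled here).
    static boolean isVisible(Mode mode, long writtenAtSeconds, long nowSeconds, long gcGraceSeconds)
    {
        if (mode == Mode.repaired)
            return true;
        return nowSeconds - writtenAtSeconds < gcGraceSeconds;
    }
}
```

With operation A written at time 0 and a gc grace of 100 seconds, `isVisible(Mode.gc_grace, 0, 150, 100)` is false, but `isVisible(Mode.repaired, 0, 300, 100)` is true again. That is the resurrection: nothing in the model remembers that A had already been treated as expired.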

Re-reading the PaxosStatePurging docs, we only state that it’s unsafe to go back to legacy once you’ve migrated away from it. That implies it’s ok to return to gc_grace if you’re having repair problems and a growing paxos state table, which I think is a useful escape hatch to keep around.

Switching back to gc_grace would also prevent the operations written while in repaired mode from being purged properly. While we’re on the topic, increasing gc grace while using the gc_grace purge mode could also resurrect older operations by the same general mechanism described in the JIRA.

I think we need to track a local purgeable low bound across config and schema changes.
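As a rough illustration of the idea (not a proposal for the actual patch), a low bound that only ever moves forward could look like the sketch below. `PurgeableLowBound`, `advanceTo`, and `isPurgeable` are hypothetical names; a real version would also need to persist the bound locally so that it survives restarts, which is omitted here.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: a monotonic low bound below which paxos state is
// always treated as purgeable, whatever the current purging mode or gc grace.
final class PurgeableLowBound
{
    private final AtomicLong lowBoundSeconds = new AtomicLong(Long.MIN_VALUE);

    // Raise the bound to the candidate if it is higher; the bound never
    // moves backwards, so a later config or schema change cannot lower it.
    long advanceTo(long candidateSeconds)
    {
        return lowBoundSeconds.accumulateAndGet(candidateSeconds, Math::max);
    }

    // Anything written strictly before the bound stays purgeable.
    boolean isPurgeable(long writtenAtSeconds)
    {
        return writtenAtSeconds < lowBoundSeconds.get();
    }
}
```

The point of the design is that once a record has been treated as expired under any mode, switching modes or increasing gc grace can no longer make it visible again, because the check consults the recorded bound instead of recomputing from current settings.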

belliottsmith added 3 commits April 26, 2024 12:50

dtest

4d147ef

Fix

707691e

rename

04f1eb6

bdeggleston requested changes May 3, 2024


Fix NPE

5591a27
