-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Revocation phase using durability policies #9659
Conversation
…emiSyncAcks for a given primary Signed-off-by: Manan Gupta <manan@planetscale.com>
Signed-off-by: Manan Gupta <manan@planetscale.com>
…event a tablet from making forward progress Signed-off-by: Manan Gupta <manan@planetscale.com>
Signed-off-by: Manan Gupta <manan@planetscale.com>
…o know that no primary capable tablet can accept any new write Signed-off-by: Manan Gupta <manan@planetscale.com>
Signed-off-by: Manan Gupta <manan@planetscale.com>
Signed-off-by: Manan Gupta <manan@planetscale.com>
…ds additional tests Signed-off-by: Manan Gupta <manan@planetscale.com>
…ds when 2 tablets are failing Signed-off-by: Manan Gupta <manan@planetscale.com>
…lures Signed-off-by: Manan Gupta <manan@planetscale.com>
c18aef5
to
5236dc4
Compare
Signed-off-by: Manan Gupta <manan@planetscale.com>
Signed-off-by: Manan Gupta <manan@planetscale.com>
…rwise they start overriding each others durability policies configuration Signed-off-by: Manan Gupta <manan@planetscale.com>
Signed-off-by: Manan Gupta <manan@planetscale.com>
Signed-off-by: Manan Gupta <manan@planetscale.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comments from a first pass. I still need to understand the new Revoke funcs.
Signed-off-by: Manan Gupta <manan@planetscale.com>
Signed-off-by: Manan Gupta <manan@planetscale.com>
Signed-off-by: Manan Gupta <manan@planetscale.com>
Signed-off-by: Manan Gupta <manan@planetscale.com>
Signed-off-by: Manan Gupta <manan@planetscale.com>
Signed-off-by: Manan Gupta <manan@planetscale.com>
Signed-off-by: Manan Gupta <manan@planetscale.com>
bd7f315
to
5b8faad
Compare
Signed-off-by: Manan Gupta <manan@planetscale.com>
Signed-off-by: Manan Gupta <manan@planetscale.com>
@deepthi I have renamed the files back so it is easier to review and added the requested comments in the |
Signed-off-by: Manan Gupta <manan@planetscale.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just some naming/testing suggestions! i think the rest is looking good :)
} | ||
|
||
// RevokeForTablet checks whether we have reached enough tablets such that the given primary eligible tablet cannot accept any new writes | ||
func RevokeForTablet(primaryEligible *topodatapb.Tablet, tabletsReached []*topodatapb.Tablet, allTablets []*topodatapb.Tablet) bool { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for these functions that check and return bools, i'd suggest renaming to from <Thing>
to Is<Thing>
to make them read a little nicer where they are used
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How do you feel about, haveRevokedForTablet
? instead of isRevokedForTablet
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have made the change for haveRevokedForTablet
. If you feel the other name is better, I'll rename it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fine by me! as long as it implies the boolean/question nature, and we're consistent with what we use for both, i'm happy :)
} | ||
|
||
// Revoked checks whether we have reached enough tablets to guarantee that no tablet eligible to become a primary can accept any write | ||
func Revoked(tabletsReached []*topodatapb.Tablet, allTablets []*topodatapb.Tablet) bool { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same naming suggestion here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed it to haveRevoked
Signed-off-by: Manan Gupta <manan@planetscale.com>
Signed-off-by: Manan Gupta <manan@planetscale.com>
lgtm, but let's get deepthi's approval (and clean up the commit history length) before merging |
Do you mean squash some of the commits together? |
Signed-off-by: Manan Gupta <manan@planetscale.com>
Description
This PR adds the ability to check for revocation using the durability policies in
EmergencyReparentShard
. What that means is that instead of exiting incase of more than 1 failure,ERS
will first try and find whether it is safe to proceed even when we have multiple failures!So, if the durability_policies allow for more than 1 failure, ERS will succeed!
Consider an example for the same. Let us say we are using
semi_sync
as our durability policy and we have four servers, A, B, C and D. A is our current primary, B and C are replicas and D is a rdonly tablet.Lets say that A and D, both go down.
In the older version, ERS would have failed since there are more than 1 failures
But now, ERS will see that even with A and D down, we can proceed safely. Safety, here means that we can guarantee that the unreachable (equivalent to down) tablets cannot make forward progress. In this case, only A is eligible to be a primary and D cannot provide Semi_Sync ACKs, which means that A and D cannot make progress, so it is safe to continue with the ERS.
Related Issue(s)
Checklist
Deployment Notes