Fix resilient shard #3584
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -1869,7 +1869,7 @@ func (s *ContextImpl) acquireShard() { | |
// | ||
// We stop retrying on any of: | ||
// 1. We succeed in acquiring the rangeid lock. | ||
- // 2. We get any error other than transient errors. | ||
+ // 2. We get ShardOwnershipLostError or lifecycleCtx ended. | ||
// 3. The state changes to Stopping or Stopped. | ||
// | ||
// If the shard controller sees that service resolver has assigned ownership to someone | ||
|
@@ -1933,7 +1933,18 @@ func (s *ContextImpl) acquireShard() { | |
return nil | ||
} | ||
|
||
- err := backoff.ThrottleRetry(op, policy, common.IsPersistenceTransientError) | ||
+ // keep retrying except ShardOwnershipLostError or lifecycle context ended | ||
+ acquireShardRetryable := func(err error) bool { | ||
+ if s.lifecycleCtx.Err() != nil { | ||
Review comment: You don't really have to check. I'd prefer not doing this additional check since it makes the code more confusing (to me), but it doesn't hurt. |
||
+ return false | ||
+ } | ||
+ switch err.(type) { | ||
+ case *persistence.ShardOwnershipLostError: | ||
+ return false | ||
+ } | ||
+ return true | ||
+ } | ||
+ err := backoff.ThrottleRetry(op, policy, acquireShardRetryable) | ||
if err != nil { | ||
// We got an unretryable error (perhaps context cancelled or ShardOwnershipLostError). | ||
s.contextTaggedLogger.Error("Couldn't acquire shard", tag.Error(err)) | ||
|
Should we update common.IsPersistenceTransientError to exclude shard ownership lost error?
IsPersistenceTransientError only returns true for Unavailable and ResourceExhausted errors. ShardOwnershipLostError is already excluded there.