This repository has been archived by the owner on May 22, 2019. It is now read-only.
Commit
allow copy cancellation and add extra flags to shard busy status
Showing 3 changed files with 75 additions and 22 deletions.
2 comments on commit 010ba8e
I really don't like that the only granularity we have on the busy flag is the shard level. I think we should cancel jobs, not shards. Also, I'd like to see the error message logged in the nameserver for easy retrieval.
This code looks good though, and is a step in the right direction, so I still say ship it.
I agree, the busy flag as it stands is extremely limited. I was going for incremental change with this patch. The real solution should also:
- Track individual jobs.
- Allow space for jobs to record arbitrary status.
- Allow the operator to tune a job in flight (page size, for example).
- Prevent multiple copies to the same shard at once.
- Prevent multiple pages of the same copy job from re-spawning due to races in error condition handling. (Same mechanism as above, different application)
- Completely handle its own error retrying: The error retrying here is in addition to that of the containing queue. The logic here and that above can have bad interactions.
This could also be made to check randomly some percentage of the time.
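To make the wish list above concrete, here is a minimal sketch of per-job tracking. It is not code from this patch: `CopyJob`, `CopyJobRegistry`, and every method name are hypothetical, and it is written in Python purely for illustration. It covers tracking individual jobs, free-form status, tuning page size in flight, job-level cancellation, and allowing only one copy per destination shard. It does not attempt the retry handling or the page re-spawn races.

```python
import threading
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class CopyJob:
    """Hypothetical record for one copy job (not a type from the patch)."""
    job_id: int
    source_shard: str
    dest_shard: str
    page_size: int = 500                  # tunable while the job is in flight
    cancelled: bool = False
    status: Dict[str, str] = field(default_factory=dict)  # arbitrary status


class CopyJobRegistry:
    """Tracks jobs individually; at most one copy per destination shard."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._jobs: Dict[int, CopyJob] = {}
        self._by_dest: Dict[str, int] = {}
        self._next_id = 1

    def start(self, source_shard: str, dest_shard: str,
              page_size: int = 500) -> Optional[CopyJob]:
        """Register a job, or return None if dest_shard already has one."""
        with self._lock:
            if dest_shard in self._by_dest:
                return None
            job = CopyJob(self._next_id, source_shard, dest_shard, page_size)
            self._next_id += 1
            self._jobs[job.job_id] = job
            self._by_dest[dest_shard] = job.job_id
            return job

    def cancel(self, job_id: int) -> bool:
        """Cancel a single job rather than marking the whole shard."""
        with self._lock:
            job = self._jobs.get(job_id)
            if job is None:
                return False
            job.cancelled = True
            return True

    def set_page_size(self, job_id: int, page_size: int) -> bool:
        """Let an operator change the page size of a running job."""
        with self._lock:
            job = self._jobs.get(job_id)
            if job is None or page_size <= 0:
                return False
            job.page_size = page_size
            return True

    def record_status(self, job_id: int, key: str, value: str) -> None:
        """Store free-form status, e.g. the last error message."""
        with self._lock:
            job = self._jobs.get(job_id)
            if job is not None:
                job.status[key] = value

    def should_continue(self, job_id: int) -> bool:
        """A worker would call this before copying each page."""
        with self._lock:
            job = self._jobs.get(job_id)
            return job is not None and not job.cancelled

    def finish(self, job_id: int) -> None:
        """Drop the job and free its destination shard for new copies."""
        with self._lock:
            job = self._jobs.pop(job_id, None)
            if job is not None:
                self._by_dest.pop(job.dest_shard, None)
```

In this sketch a worker would check `should_continue` before each page and read `page_size` fresh each time, so cancellation and tuning take effect at the next page boundary. Keying the single-copy rule on the destination shard is one possible reading of "multiple copies to the same shard"; a real design might also need to guard the source.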