New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We鈥檒l occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add per version set timestamp for when it was last made default in queue #4524
Conversation
// Deletions are performed by a background process which verifies build IDs are no longer in use and safe to delete (not yet implemented). | ||
// | ||
// Update may fail with FailedPrecondition if it would cause exceeding the supplied limits. | ||
func UpdateVersionSets(clock hlc.Clock, data *persistencespb.VersioningData, req *workflowservice.UpdateWorkerBuildIdCompatibilityRequest, maxSets, maxBuildIds int) (*persistencespb.VersioningData, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why did this guy move?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Closer to the implementation
service/matching/version_sets.go
Outdated
return nil, serviceerror.NewInvalidArgument(fmt.Sprintf("version %s already exists and is not default in set", targetedVersion)) | ||
} | ||
// Make the operation idempotent | ||
return existingData, nil | ||
return data, nil | ||
} | ||
return nil, serviceerror.NewInvalidArgument(fmt.Sprintf("%s requested to be made compatible with %s but both versions exist and are incompatible", targetedVersion, compatVer)) | ||
} | ||
|
||
// First duplicate the build IDs to avoid mutation |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment is wrong now that it's not being cloned
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks
//nolint:revive // cyclomatic complexity | ||
func updateImpl(timestamp hlc.Clock, existingData *persistencespb.VersioningData, req *workflowservice.UpdateWorkerBuildIdCompatibilityRequest) (*persistencespb.VersioningData, error) { | ||
func updateImpl(timestamp hlc.Clock, data *persistencespb.VersioningData, req *workflowservice.UpdateWorkerBuildIdCompatibilityRequest) (*persistencespb.VersioningData, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should mention that it now mutates data, and possibly shouldn't return a pointer any more since that can imply it's creating a new thing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It still needs to return a pointer because it might recreate the entire structure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or data would have to be a double pointer.
809960e
to
724af38
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rubber stamp here. :)
What changed?
馃挜 Breaking Change (for a not yet released piece of code though)
Removed the default_update_timestamp on versioning data in favor of a per version set field
queue_default_update_timestamp
that denotes when it was last made default in the queue.Why?
During testing, I discovered that we may mark a previously default build ID (
vP
) completely unreachable workflows immediately after a new queue default is added. This happens while there are no open workflows forvP
's set and workflow tasks are currently being processed byvP
workers. When those tasks complete,vP
will be reachable again.Having a per set timestamp for when it was last made queue default will allow us to be more careful when reporting build IDs as unreachable in the reachability API and the build ID scavenger.
How did you test it?
Refactored exiting tests.
Potential risks
Need to delete all user data from existing cluster deployments for any 1.21 rc release.