-
Notifications
You must be signed in to change notification settings - Fork 66
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Versioning 2 API #393
Versioning 2 API #393
Conversation
* Add create_time to rules list output * address comments
* Add versioning info to pending activities and started tasks * address feedback * improved comment * apply suggestions
* clean up versioning protos with the narrowed scope * address comments
Enhanced DescribeTaskQueue
* add redirect_counter filed and more * use int64 * return rules in update response * remove empty line * address comments
# Conflicts: # temporal/api/history/v1/message.proto # temporal/api/workflow/v1/message.proto # temporal/api/workflowservice/v1/request_response.proto
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Notes/questions:
- Can you confirm that, besides the SDK replaying now-changed history JSON, there are no other issues you can think of in any SDK with the runtime component of workflows? Meaning no workers or workflows will break because of this? (but the administrative side it may make sense to break)
- Let's add a "WARNING: Versioning is not yet stable and the API and behavior may change incompatibly" above the unstable pieces. We don't technically have to and we didn't last time and we for sure will mark them this way in the SDK interfacing with these APIs, but I think it's also good here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It should be noted, this actually will break workflow replays for users using versioning today. Today when people download/save history, they use JSON format, and they keep this JSON format to run on replayers in SDKs. Changing field names is a backwards incompatible history format change from the perspective of replayers.
This may be ok, but it needs to be made very clear. We also may want to offer a way in CLI or something to "fix" invalid history, up to y'all.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for bringing up the replay history from JSON issue. I did not think about that. Will make sure we mention that in the docs. Maybe some instructions for the user to replace "use_compatible_version" with "inherit_build_id" in the JSON string would suffice.
Regarding the first question, I cannot think of anything else that can break. We have functional tests for old and new versioning (and of course unversioned wfs). The "features" test pass in the versioning-2 branch which use old/current SDKs. As you noticed, the changes are mostly for the admin side, not the data plane. the fields added to data-plain are all optional and mostly for the use of server itself, and only for versioned workflows.
Added the warning to the RPC calls.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Regarding the first question, there is a problem with older UI/CLI (like only recently changed IIRC) where even unset fields are present in the JSON, but most of our replayers ignore unknown JSON fields. Also, sorry I forget, but can you tell whether any of the fields that were changed in the history events are set implicitly in the SDK instead of by explicitly by the workflow author (e.g. as a result of worker setting only)?
I will circulate the history JSON change internally to gather thoughts there.
# Conflicts: # temporal/api/workflow/v1/message.proto
message TaskQueueVersionInfo { | ||
// Empty means unversioned. | ||
string build_id = 1; | ||
repeated TaskQueueTypeInfo types_info = 3; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not a huge difference, but consider making this a map<temporal.api.enums.v1.TaskQueueType, TaskQueueTypeInfo>
and removing type
from that? (sorry, I didn't review this on the feature branch)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it seems map key cannot be enum
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Huh. It could be int32 with a comment, though? Now it's weird to have one level be a map and the next level not.
// For workflow task queues, we only report the normal queue metrics, not sticky queues. This means the stats | ||
// reported here do not count all workflow tasks. However, because the tasks queued in sticky queues only remain | ||
// valid for a few seconds, the inaccuracy becomes less significant as the backlog age grows. | ||
message BacklogInfo { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we comment this whole thing out for now, if we're not using it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed this for now
|
||
enum DescribeTaskQueueMode { | ||
// Unspecified means legacy behavior. | ||
DESCRIBE_TASK_QUEUE_MODE_UNSPECIFIED = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this means if you don't want to use ENHANCED mode, you have to use UNSPECIFIED, which is kind of weird. should there be an explicit BASIC and then have UNSPECIFIED default to BASIC?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the idea is no one should need to ask for basic mode
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not even if they want a cheaper call?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If by cheaper you mean something that only looks at local info for root partition and does not fan out to all partition, I'd still want to put that as an option for the enhanced mode so we return that result in the new structure. (of course if we really thing the cheap option would be useful enough).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, as long as we can turn off the reachability queries, which I think is the real expensive part, that's okay. Fan-out is fine to have by default (it should always have been doing that).
But what about the "internal" stuff that's deprecated? ack level, rate limit, etc. We don't have InternalTaskQueueStatus yet so now there's no "supported" way to get that info?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The plan is to add a tdbg command for the internal info before CLIs stop supporting legacy mode.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, I'm fine then (not a big deal anyway)
temporal.api.taskqueue.v1.TaskQueueStatus task_queue_status = 2; | ||
|
||
// Only set in `ENHANCED` mode. | ||
repeated temporal.api.taskqueue.v1.TaskQueueVersionInfo versions_info = 3; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe this should be a map<string, TaskQueueVersionInfo>
and remove build_id
from that?
The CI fails for expected reasons due to field renames. Need to an admin to bypass the check. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Confirmed with SDK team that the field name changes in JSON should be ok
|
||
enum DescribeTaskQueueMode { | ||
// Unspecified means legacy behavior. | ||
DESCRIBE_TASK_QUEUE_MODE_UNSPECIFIED = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, I'm fine then (not a big deal anyway)
What changed?
UpdateWorkerVersioningRules
APIGetWorkerVersioningRules
APIDescribeTaskQueue
APIuse_compatible_version
toinherit_build_id
Why?
The new API provides better versioning experience for users, and provides a better foundation for us to build more features on top of.
Breaking changes
Removed bundle_id field in WorkerVersionStamp (but it's confirmed not to be used anywhere, so no breaking change really)
Server PR
temporalio/temporal#5691