Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Versioning 2 API #393

Merged
merged 23 commits into from
Apr 10, 2024
Merged

Versioning 2 API #393

merged 23 commits into from
Apr 10, 2024

Conversation

ShahabT
Copy link
Contributor

@ShahabT ShahabT commented Apr 9, 2024

What changed?

  • UpdateWorkerVersioningRules API
  • GetWorkerVersioningRules API
  • Enhanced DescribeTaskQueue API
  • Added fields to the following proto messages to track versioning info
    • WorkflowExecutionInfo
    • PendingActivityInfo
    • WorkflowExecutionStartedEventAttributes
    • WorkflowTaskStartedEventAttributes
    • ActivityTaskStartedEventAttributes
  • Wire-safe renaming of use_compatible_version to inherit_build_id
  • Getting rid of bundle_id field in WorkerVersionStamp. (the field is not used in any SDK, nor in server)

Why?
The new API provides better versioning experience for users, and provides a better foundation for us to build more features on top of.

Breaking changes
Removed bundle_id field in WorkerVersionStamp (but it's confirmed not to be used anywhere, so no breaking change really)

Server PR
temporalio/temporal#5691

ShahabT and others added 15 commits January 11, 2024 19:04
* Add create_time to rules list output

* address comments
* Add versioning info to pending activities and started tasks

* address feedback

* improved comment

* apply suggestions
* clean up versioning protos with the narrowed scope

* address comments
Enhanced DescribeTaskQueue
* add redirect_counter filed and more

* use int64

* return rules in update response

* remove empty line

* address comments
# Conflicts:
#	temporal/api/history/v1/message.proto
#	temporal/api/workflow/v1/message.proto
#	temporal/api/workflowservice/v1/request_response.proto
@ShahabT ShahabT requested review from a team as code owners April 9, 2024 18:18
Copy link
Member

@cretz cretz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Notes/questions:

  1. Can you confirm that, besides the SDK replaying now-changed history JSON, there are no other issues you can think of in any SDK with the runtime component of workflows? Meaning no workers or workflows will break because of this? (but the administrative side it may make sense to break)
  2. Let's add a "WARNING: Versioning is not yet stable and the API and behavior may change incompatibly" above the unstable pieces. We don't technically have to and we didn't last time and we for sure will mark them this way in the SDK interfacing with these APIs, but I think it's also good here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should be noted, this actually will break workflow replays for users using versioning today. Today when people download/save history, they use JSON format, and they keep this JSON format to run on replayers in SDKs. Changing field names is a backwards incompatible history format change from the perspective of replayers.

This may be ok, but it needs to be made very clear. We also may want to offer a way in CLI or something to "fix" invalid history, up to y'all.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for bringing up the replay history from JSON issue. I did not think about that. Will make sure we mention that in the docs. Maybe some instructions for the user to replace "use_compatible_version" with "inherit_build_id" in the JSON string would suffice.

Regarding the first question, I cannot think of anything else that can break. We have functional tests for old and new versioning (and of course unversioned wfs). The "features" test pass in the versioning-2 branch which use old/current SDKs. As you noticed, the changes are mostly for the admin side, not the data plane. the fields added to data-plain are all optional and mostly for the use of server itself, and only for versioned workflows.

Added the warning to the RPC calls.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding the first question, there is a problem with older UI/CLI (like only recently changed IIRC) where even unset fields are present in the JSON, but most of our replayers ignore unknown JSON fields. Also, sorry I forget, but can you tell whether any of the fields that were changed in the history events are set implicitly in the SDK instead of by explicitly by the workflow author (e.g. as a result of worker setting only)?

I will circulate the history JSON change internally to gather thoughts there.

message TaskQueueVersionInfo {
// Empty means unversioned.
string build_id = 1;
repeated TaskQueueTypeInfo types_info = 3;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not a huge difference, but consider making this a map<temporal.api.enums.v1.TaskQueueType, TaskQueueTypeInfo> and removing type from that? (sorry, I didn't review this on the feature branch)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it seems map key cannot be enum

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Huh. It could be int32 with a comment, though? Now it's weird to have one level be a map and the next level not.

// For workflow task queues, we only report the normal queue metrics, not sticky queues. This means the stats
// reported here do not count all workflow tasks. However, because the tasks queued in sticky queues only remain
// valid for a few seconds, the inaccuracy becomes less significant as the backlog age grows.
message BacklogInfo {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we comment this whole thing out for now, if we're not using it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed this for now


enum DescribeTaskQueueMode {
// Unspecified means legacy behavior.
DESCRIBE_TASK_QUEUE_MODE_UNSPECIFIED = 0;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this means if you don't want to use ENHANCED mode, you have to use UNSPECIFIED, which is kind of weird. should there be an explicit BASIC and then have UNSPECIFIED default to BASIC?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the idea is no one should need to ask for basic mode

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not even if they want a cheaper call?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If by cheaper you mean something that only looks at local info for root partition and does not fan out to all partition, I'd still want to put that as an option for the enhanced mode so we return that result in the new structure. (of course if we really thing the cheap option would be useful enough).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, as long as we can turn off the reachability queries, which I think is the real expensive part, that's okay. Fan-out is fine to have by default (it should always have been doing that).

But what about the "internal" stuff that's deprecated? ack level, rate limit, etc. We don't have InternalTaskQueueStatus yet so now there's no "supported" way to get that info?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The plan is to add a tdbg command for the internal info before CLIs stop supporting legacy mode.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, I'm fine then (not a big deal anyway)

temporal.api.taskqueue.v1.TaskQueueStatus task_queue_status = 2;

// Only set in `ENHANCED` mode.
repeated temporal.api.taskqueue.v1.TaskQueueVersionInfo versions_info = 3;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe this should be a map<string, TaskQueueVersionInfo> and remove build_id from that?

@ShahabT
Copy link
Contributor Author

ShahabT commented Apr 10, 2024

The CI fails for expected reasons due to field renames. Need to an admin to bypass the check.

Copy link
Member

@cretz cretz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Confirmed with SDK team that the field name changes in JSON should be ok


enum DescribeTaskQueueMode {
// Unspecified means legacy behavior.
DESCRIBE_TASK_QUEUE_MODE_UNSPECIFIED = 0;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, I'm fine then (not a big deal anyway)

@ShahabT ShahabT merged commit 2227a14 into master Apr 10, 2024
3 checks passed
@ShahabT ShahabT deleted the versioning-2 branch April 10, 2024 20:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants