Conversation

@yuandrew
Contributor

@yuandrew yuandrew commented Oct 23, 2025

What was changed

Added deployment_build_id as key for eager workflow slot

Why?

Valid use case that today causes an error

Checklist

  1. Closes

  2. How was this tested:

  3. Any docs updates needed?

Note

Eager slot management now keys by namespace/task_queue/build_id, allows multiple local workers per queue with randomized selection/retry, updates duplicate-registration rules, and adds rand dependency.

  • Client worker registry (eager workflow start):
    • Key slot providers by namespace + task_queue + deployment_build_id (store Vec<(Uuid, Option<String>)>), allowing multiple providers per queue when build IDs differ.
    • Selection retries across providers if a worker has no slot; randomized order in prod, deterministic in tests.
    • Registration now rejects duplicates only when same namespace/task_queue/build_id; error message updated. Unregister removes only the specific worker and cleans up when empty.
    • num_providers now counts total providers across keys.
    • Extensive tests added/updated for retry behavior, task-queue boundaries, duplicate handling, and build ID cases.
  • API/docs:
    • WorkflowOptions.enable_eager_workflow_start doc note updated to remove BuildID incompatibility warning (still experimental).
  • Dependencies:
    • Add rand for provider selection shuffling.

Written by Cursor Bugbot for commit 427dd96. This will update automatically on new commits. Configure here.
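
For readers following along, the registration rules summarized above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code from this PR: `Registry`, `SlotKey`, and the `u64` worker ID (standing in for `Uuid`) are hypothetical names, and the error string is invented for the example.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the `Uuid` worker identifier in the summary.
type WorkerId = u64;

/// Hypothetical map key: providers are grouped per namespace + task queue.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
struct SlotKey {
    namespace: String,
    task_queue: String,
}

/// Hypothetical registry; each key holds a list of (worker ID, build ID) pairs.
#[derive(Default)]
struct Registry {
    providers: HashMap<SlotKey, Vec<(WorkerId, Option<String>)>>,
}

impl Registry {
    /// Rejects a registration only when the same namespace/task queue
    /// already has a provider with the same build ID.
    fn register(
        &mut self,
        key: SlotKey,
        id: WorkerId,
        build_id: Option<String>,
    ) -> Result<(), String> {
        let list = self.providers.entry(key).or_default();
        if list.iter().any(|(_, existing)| existing == &build_id) {
            // Illustrative message only; the real error text differs.
            return Err("duplicate namespace/task_queue/build_id".to_string());
        }
        list.push((id, build_id));
        Ok(())
    }

    /// Removes only the given worker and drops the key once its list is empty.
    fn unregister(&mut self, key: &SlotKey, id: WorkerId) {
        let now_empty = match self.providers.get_mut(key) {
            Some(list) => {
                list.retain(|(existing, _)| *existing != id);
                list.is_empty()
            }
            None => false,
        };
        if now_empty {
            self.providers.remove(key);
        }
    }

    /// Counts providers across all keys, not the number of keys.
    fn num_providers(&self) -> usize {
        self.providers.values().map(Vec::len).sum()
    }
}
```

With this shape, two workers on the same task queue can coexist as long as their build IDs differ, and a second worker with an identical build ID (including two workers with no build ID) is rejected.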

@yuandrew yuandrew requested a review from a team as a code owner October 23, 2025 21:38
cursor[bot]

This comment was marked as outdated.

.any(|(_, opt)| {
opt.as_ref() == build_id.as_ref()
})
&& !skip_client_worker_set_check
Member

Looks like this is right

let worker_id = worker_list.choose(&mut rand::rng());
if let Some(worker_entry) = worker_id
&& let Some(worker) = self.all_workers.get(&worker_entry.0)
&& let Some(slot) = worker.try_reserve_wft_slot()
Member

In principle we should then try the next worker in the list if this reservation fails. In practice I doubt this is going to come up much. Probably not hard to add, but maybe annoying to test.

Contributor Author

Right, added the retry mechanism, and codex helped me write a solid test, with some minor tweaking/fixing of course
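
As a rough illustration of the retry idea discussed in this thread (not the merged implementation), the sketch below walks a candidate order and returns the first slot that any worker can hand out. The `Worker` and `Slot` types and the `reserve_first_available` helper are hypothetical; only the `try_reserve_wft_slot` name comes from the diff. The order is passed in explicitly, which mirrors the deterministic-in-tests behavior; in production the order would be shuffled first.

```rust
use std::cell::Cell;
use std::collections::HashMap;

/// Hypothetical stand-in for the `Uuid` worker identifier.
type WorkerId = u64;

/// Hypothetical slot handle recording which worker granted it.
#[derive(Debug, PartialEq, Eq)]
struct Slot {
    worker: WorkerId,
}

/// Hypothetical worker with a simple counter of free workflow-task slots.
struct Worker {
    id: WorkerId,
    free_slots: Cell<usize>,
}

impl Worker {
    /// Hands out a slot if one is free, otherwise returns `None`.
    fn try_reserve_wft_slot(&self) -> Option<Slot> {
        let free = self.free_slots.get();
        if free == 0 {
            return None;
        }
        self.free_slots.set(free - 1);
        Some(Slot { worker: self.id })
    }
}

/// Tries each candidate in `order` and stops at the first successful
/// reservation. Unknown IDs and workers without free slots are skipped.
fn reserve_first_available(
    order: &[WorkerId],
    all_workers: &HashMap<WorkerId, Worker>,
) -> Option<Slot> {
    order
        .iter()
        .find_map(|id| all_workers.get(id)?.try_reserve_wft_slot())
}
```

Taking the order as a parameter keeps the selection logic testable without a random number generator: a test can put a worker with no free slots first and assert that the second worker's slot is returned.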

cursor[bot]

This comment was marked as outdated.

Comment on lines 108 to 109
worker_list
.choose_multiple(&mut rng, worker_list.len())
Member

I think this can be written more readably as worker_list.as_slice().shuffle()

https://docs.rs/rand/latest/rand/seq/trait.SliceRandom.html#tymethod.shuffle

@yuandrew yuandrew merged commit 407043b into temporalio:master Oct 24, 2025
20 checks passed
@yuandrew yuandrew deleted the build-id-into-slot-supplier branch October 24, 2025 20:01

2 participants