fix: skip CountWorkflow in batch operations when --yes is set #1012

Open
bitalizer wants to merge 1 commit into temporalio:main from bitalizer:skip-count-on-yes
Conversation

bitalizer commented Apr 29, 2026

Closes #838.

Implements the design @cretz proposed in the issue thread: when --yes bypasses the confirmation prompt, skip the CountWorkflowExecutions request entirely.

What was changed

When --yes skips the count, the prompt text changes from "Start batch against approximately N workflow(s)? y/N" to "Start batch against workflows matching query "<query>"? y/N", so the output (which promptYes still prints in the auto-confirm path) stays informative.

The interactive flow (without --yes) is untouched: it still counts, still prompts, and still prints the same "approximately N workflow(s)" message.

Why

workflowExecOrBatch (terminate / signal / cancel) and the reset command both call CountWorkflowExecutions unconditionally before starting a batch. The result is used only to fill in the "approximately N workflow(s)" confirmation. When --yes is set, the prompt is skipped, but the count call still runs, and on clusters where the visibility API times out, the failed count aborts the entire batch start. The original report is from a Postgres-backed cluster where the batch query itself works but the count times out, so users can't start batch jobs they otherwise have permission to run.

How was this tested

Added TestWorkflow_Terminate_BatchWorkflow_SkipsCountWhenYes that:

  1. Installs a unary gRPC interceptor that counts CountWorkflowExecutionsRequest and StartBatchOperationRequest calls.
  2. Starts one workflow, then runs workflow terminate --query ... --yes.
  3. Asserts: 0 CountWorkflow calls, 1 StartBatchOperation call, and prompt text that contains "matching query" and does not contain "approximately".

The interceptor pattern matches the existing testTerminateBatchWorkflow helper.

Ran the new test plus a sample of the existing batch tests locally; they pass. (TestWorkflow_Terminate_BatchWorkflowSuccess flakes locally on the unrelated Completed assertion both with and without my changes; this is a pre-existing timing issue, not caused by this PR.)

The visibility CountWorkflowExecutions request was issued unconditionally
before every batch terminate / signal / cancel / reset. The count is only
used to populate the "Start batch against approximately N workflow(s)?"
confirmation prompt. When --yes bypasses the prompt entirely, the count
result is never read.

In clusters whose visibility API is overloaded (e.g. Postgres-backed
clusters with many workflows), this CountWorkflow call can time out and
prevent batch jobs from being started at all, even though the batch
operation itself uses the same query and would succeed. Skipping the
count when --yes is set lets these batch jobs proceed unconditionally.

Both batch entry points are updated:
- commands.workflow.go (terminate / signal / cancel)
- commands.workflow_reset.go (reset)

When the count is skipped, the prompt text shown by --yes changes from
"Start batch against approximately N workflow(s)? y/N" to
"Start batch against workflows matching query "<query>"? y/N" so the
output remains informative.

Adds TestWorkflow_Terminate_BatchWorkflow_SkipsCountWhenYes which uses
a gRPC unary interceptor to assert that
CountWorkflowExecutionsRequest is *not* sent when --yes is passed,
while StartBatchOperationRequest still is. The existing without-yes
tests are unaffected.

Closes temporalio#838
bitalizer requested a review from a team as a code owner April 29, 2026 08:32
CLAassistant commented Apr 29, 2026

CLA assistant check
All committers have signed the CLA.

Development

Successfully merging this pull request may close these issues.

[Bug] Unable to start a batch job if counting the workflows times out
