fix: skip CountWorkflow in batch operations when --yes is set#1012
Open
bitalizer wants to merge 1 commit intotemporalio:mainfrom
Open
fix: skip CountWorkflow in batch operations when --yes is set#1012bitalizer wants to merge 1 commit intotemporalio:mainfrom
bitalizer wants to merge 1 commit intotemporalio:mainfrom
Conversation
The visibility CountWorkflowExecutions request was issued unconditionally before every batch terminate / signal / cancel / reset. The count is only used to populate the "Start batch against approximately N workflow(s)?" confirmation prompt. When --yes bypasses the prompt entirely, the count result is never read. In clusters whose visibility API is overloaded (e.g. Postgres-backed clusters with many workflows), this CountWorkflow call can time out and prevent batch jobs from being started at all, even though the batch operation itself uses the same query and would succeed. Skipping the count when --yes is set lets these batch jobs proceed unconditionally. Both batch entry points are updated: - commands.workflow.go (terminate / signal / cancel) - commands.workflow_reset.go (reset) When the count is skipped, the prompt text shown by --yes changes from "Start batch against approximately N workflow(s)? y/N" to "Start batch against workflows matching query "<query>"? y/N" so the output remains informative. Adds TestWorkflow_Terminate_BatchWorkflow_SkipsCountWhenYes which uses a gRPC unary interceptor to assert that CountWorkflowExecutionsRequest is *not* sent when --yes is passed, while StartBatchOperationRequest still is. The existing without-yes tests are unaffected. Closes temporalio#838
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Closes #838.
Implements the design @cretz proposed in the issue thread: when
--yesbypasses the confirmation prompt, skip theCountWorkflowExecutionsrequest entirely.What was changed
SingleWorkflowOrBatchOptions.workflowExecOrBatch:CountWorkflowonly runs in the!s.Yesbranch.TemporalWorkflowResetCommand's batch path.When
--yesskips the count, the prompt text changes fromStart batch against approximately N workflow(s)? y/NtoStart batch against workflows matching query "<query>"? y/N, so the output (whichpromptYesstill prints in the auto-confirm path) stays informative.The non-
--yesinteractive flow is untouched: it still counts, still prompts, still prints the sameapproximately N workflow(s)message.Why
workflowExecOrBatch(terminate / signal / cancel) and the reset command both unconditionally callCountWorkflowExecutionsbefore starting a batch. The result is only used to fill in theapproximately N workflow(s)confirmation. When--yesis set, the prompt is skipped — but the count call still runs, and on clusters where the visibility API is timing out it fails the entire batch start. The original report is from a Postgres-backed cluster where the batch query itself works but the count times out; users can't start batch jobs they otherwise have permission to run.How was this tested
Added
TestWorkflow_Terminate_BatchWorkflow_SkipsCountWhenYesthat:CountWorkflowExecutionsRequestandStartBatchOperationRequestcalls.workflow terminate --query ... --yes.CountWorkflowcalls, 1StartBatchOperationcall, prompt text containsmatching queryand notapproximately.The interceptor pattern matches the existing
testTerminateBatchWorkflowhelper.Ran the new test plus a sample of the existing batch tests locally; they pass. (
TestWorkflow_Terminate_BatchWorkflowSuccessflakes locally on the unrelatedCompletedassertion both with and without my changes — pre-existing timing issue, not caused by this PR.)