Trampolining fix: Suppress targetWorkerDeploymentVersionChanged on inline WFT path by Shivs11 · Pull Request #10217 · temporalio/temporal

Shivs11 · 2026-05-11T17:23:22Z

Summary

Added revision number mechanics to strengthen checks while deciding between clearing/setting the LastNotifiedTargetVersion!

Problem

The inline WFT path in RespondWorkflowTaskCompleted (both bypassTaskGeneration at line 596 and speculative at line 722) calls AddWorkflowTaskStartedEvent with targetDeploymentVersion=nil and targetRevisionNumber=0 because matching is never consulted on this path.
In AddWorkflowTaskStartedEvent, case 3 compared only buildId + deploymentName, so a nil target fails that case and the default fires — also corrupting LastNotifiedTargetVersion with {nil, 0}. For update- or signal-driven PINNED workflows (where the updates and the signals were buffered on workflow task completion) this manifested as a continuous CaN loop on a stable deployment.
A separate but related hazard: even on the normal poll path, a stale matching partition that happens to report target == effective by buildId would spuriously hit case 3 and clear legitimate notifications, silently defeating the trampolining-suppression mechanism from Fix pinned workflow trampolining via revision-based signal suppression #9895.

Test plan

TestInlinePath_StableRouting_NoSpuriousFlag — new integration test that exercises the user-reported scenario:
- Sends a buffered signal during a regular WFT, completes with ReturnNewWorkflowTask=true.
- Asserts the inline follow-up WFT's WFT-Started event has requestId == "request-from-RespondWorkflowTaskCompleted" (self-verification that the test actually exercises the inline code path).
- Asserts TargetWorkerDeploymentVersionChanged == false on the inline WFT-Started event.
- Confirmed to fail before the state machine change (test commit is on top of the fix commit) and pass after.
All 8 related existing tests still pass:
- TestStalePartition_RevisionSuppressesTrampolining (from Fix pinned workflow trampolining via revision-based signal suppression #9895)
- TestPinnedCaN_NoAUOnCaN_NoInfiniteLoop
- TestOverride_SuppressesTargetVersionChangedSignal
- TestAutoUpgrade_SuppressesTargetVersionChangedSignal
- TestPinnedCaN_TargetChangesAgain_SignalsTrue
- TestRemoveOverride_ClearsDeclinedState
- TestRetryOfDeclinedCaN_SignalsOnNewTarget
- TestPinnedCaN_RollbackResetsDeclined

🤖 Generated with Claude Code

Note

Medium Risk
Touches workflow-task start/versioning decision logic in history service; mistakes could suppress legitimate upgrade notifications or alter trampolining behavior, but changes are localized and covered by a new integration test.

Overview
Fixes a trampolining loop where inline workflow tasks created in RespondWorkflowTaskCompleted could incorrectly set targetWorkerDeploymentVersionChanged and corrupt LastNotifiedTargetVersion despite not consulting matching.

Inline/eager/synthetic AddWorkflowTaskStartedEvent call sites now pass targetRevisionNumber=-1 as a sentinel, and the workflow-task state machine strengthens its decision logic by tracking the highest seen matching revision and refusing to clear/set notification state based on stale (older-revision) reports.

Adds TestInlinePath_StableRouting_NoSpuriousFlag to reproduce the buffered-signal inline-WFT scenario and assert the flag remains false on stable routing.

^{Reviewed by Cursor Bugbot for commit baf0f63. Bugbot is set up for automated code reviews on this repo. Configure here.}

…path The inline WFT path in RespondWorkflowTaskCompleted (both bypassTaskGeneration and speculative branches) calls AddWorkflowTaskStartedEvent with targetDeploymentVersion=nil and targetRevisionNumber=0 because matching is never consulted. The state machine's case 3 compares buildId+deploymentName only, so nil target fails that case and the default fires the flag, also corrupting LastNotifiedTargetVersion with {nil, 0}. For update-driven PINNED workflows this manifests as a continuous CaN loop. Strengthen the state machine in two places: - Case 3 now also requires targetRevisionNumber >= highestSeenRevNumber before clearing declined/notified state. A stale partition that happens to match by buildId no longer wipes legitimate notifications. - Default now breaks out early when targetRevisionNumber <= highestSeenRevNumber. Stale matching reports (or non-routing-derived revision-0 calls like the inline path) cannot fire the flag or corrupt LastNotifiedTargetVersion. highestSeenRevNumber is computed as max(LastNotifiedTargetVersion.Revision, DeclinedTargetVersionUpgrade.Revision). No new persistence fields, no proto changes, no extra RPCs to matching, and no changes to inline call sites which keep passing nil/0. Also closes the TODO at line 516 (case 3 revision strengthening). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Reproduces the lab scenario where a PINNED workflow on stable routing hits the inline path in RespondWorkflowTaskCompleted (via a buffered signal during WFT processing + ReturnNewWorkflowTask=true) and asserts the resulting WFT-Started event does NOT carry targetWorkerDeploymentVersionChanged=true. The test self-verifies that it exercises the inline path by asserting requestId == "request-from-RespondWorkflowTaskCompleted" on the WFT-Started event before checking the core flag assertion. Without that, a future change to the test setup could silently stop hitting the buggy code path without anyone noticing. Confirmed: - Pre-fix (HEAD~1): test fails on the flag assertion (with the inline marker requestId already present, proving the test reaches the bug). - Post-fix: test passes. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Cosmetic — gci rewrites the test's doc comment numbered list to use two-space indentation per godoc convention. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The previous default-branch strengthening used `<=` against highestSeenRevNumber, which incorrectly suppressed legitimate re-firings on subsequent WFTs that reported the same target/revision. This broke TestPinnedCaN_UpgradeOnCaN_TransientWFT_WithSuggest (all 8 sub-variants) because the transient retry has the same target/revision as the first attempt — `M <= M` is true, so the retry's WFT-Started event no longer carried TargetWorkerDeploymentVersionChanged=true. Switch to strict `<` and pass a -1 revision sentinel from callers that don't consult matching: - workflow_task_state_machine.go default branch: `<=` → `<`. Same-revision re-firings now fire as the original design intended. - respondworkflowtaskcompleted/api.go inline call sites (lines 606, 732): pass `-1` for targetRevisionNumber instead of `0`. This is an explicit "no routing info" sentinel that lives below the range of real matching revisions, so `<` cleanly suppresses inline-path calls without affecting legitimate same-revision re-fires. - workflow_resetter.go: same -1 sentinel for the synthetic reset WFT, for symmetry. This is a synthetic event that never had real routing info either. Verified via the 12-test suite (PR #9895's 8 tests + new TestInlinePath_StableRouting_NoSpuriousFlag + 3 _UpgradeOnCaN_*_WithSuggest variants). All pass after this change, including the previously-failing transient variant. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Symmetric with the other synthetic-WFT call sites (inline path in RespondWorkflowTaskCompleted, workflow resetter). The eager-exec path also creates a WFT-Started event without consulting matching for routing info. Today the outer gate at workflow_task_state_machine.go:496 skips the flag-firing block when behavior is UNSPECIFIED — which is the typical case for fresh top-level workflow starts. But for eager exec on a workflow with inherited versioning info (cron next-run, retries, parent-inherited child runs), the block does run, and passing `0` could falsely fire the flag on the synthetic WFT-Started event. The TODO at line 105 ("get build ID from Starter so eager workflows can be versioned") indicates the team plans to extend eager exec to versioned workflows. Using `-1` here today is forward-compatible. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Shivs11 · 2026-05-11T19:02:28Z

+			// Strict `<` (not `<=`) so legitimate same-revision re-firings (e.g., transient retries, repeated updates) still fire; inline path uses revision=-1 sentinel to be caught here.
+			if targetRevisionNumber < highestSeenRevNumber {


@ShahabT / @carlydf - this proved to be quite important since TestPinnedCaN_UpgradeOnCaN_TransientWFT_WithSuggest test started failing.

Basically, that test was forcefully failing a couple of updates and it tests out if transient workflow tasks still have the SDK flag correctly set to true. With the <= check, it was failing since this constraint is basically saying that if we have ever seen this, we drop the toggling of this flag (which is not great)

s.WaitForChannel (inherited from FunctionalTestBase) is marked deprecated in favor of (*TestEnv).WaitForChannel, but Versioning3Suite tests universally use the suite-level method. Add //nolint:staticcheck to match the existing pattern in the file (SA1019 nolints are used in 4+ other places in versioning_3_test.go). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Shivs11 · 2026-05-11T19:11:50Z

 			true,
 			nil,
-			0,
+			-1, // sentinel: synthetic event, no routing info


the idea of sending a -1 here is so that this path does not affect the path that workflow tasks take normally (coming from matching)

a -1 represents a sentinel value since these in-line workflow tasks don't really have up to date routing information

…line WFT path (#10217) ## Summary - Added revision number mechanics to strengthen checks while deciding between clearing/setting the LastNotifiedTargetVersion! ## Problem - The inline WFT path in `RespondWorkflowTaskCompleted` (both `bypassTaskGeneration` at line 596 and speculative at line 722) calls `AddWorkflowTaskStartedEvent` with `targetDeploymentVersion=nil` and `targetRevisionNumber=0` because matching is never consulted on this path. - In AddWorkflowTaskStartedEvent, case 3 compared only `buildId + deploymentName`, so a `nil` target fails that case and the default fires — also corrupting `LastNotifiedTargetVersion` with `{nil, 0}`. For update- or signal-driven PINNED workflows (where the updates and the signals were buffered on workflow task completion) this manifested as a continuous CaN loop on a stable deployment. - A separate but related hazard: even on the normal poll path, a stale matching partition that happens to report `target == effective` by buildId would spuriously hit case 3 and clear legitimate notifications, silently defeating the trampolining-suppression mechanism from #9895. ## Test plan - [x] `TestInlinePath_StableRouting_NoSpuriousFlag` — new integration test that exercises the user-reported scenario: - Sends a buffered signal during a regular WFT, completes with `ReturnNewWorkflowTask=true`. - Asserts the inline follow-up WFT's WFT-Started event has `requestId == "request-from-RespondWorkflowTaskCompleted"` (self-verification that the test actually exercises the inline code path). - Asserts `TargetWorkerDeploymentVersionChanged == false` on the inline WFT-Started event. - Confirmed to **fail** before the state machine change (test commit is on top of the fix commit) and **pass** after. - [x] All 8 related existing tests still pass: - `TestStalePartition_RevisionSuppressesTrampolining` (from #9895) - `TestPinnedCaN_NoAUOnCaN_NoInfiniteLoop` - `TestOverride_SuppressesTargetVersionChangedSignal` - `TestAutoUpgrade_SuppressesTargetVersionChangedSignal` - `TestPinnedCaN_TargetChangesAgain_SignalsTrue` - `TestRemoveOverride_ClearsDeclinedState` - `TestRetryOfDeclinedCaN_SignalsOnNewTarget` - `TestPinnedCaN_RollbackResetsDeclined` 🤖 Generated with [Claude Code](https://claude.com/claude-code)  --- > [!NOTE] > **Medium Risk** > Touches workflow-task start/versioning decision logic in history service; mistakes could suppress legitimate upgrade notifications or alter trampolining behavior, but changes are localized and covered by a new integration test. > > **Overview** > Fixes a trampolining loop where inline workflow tasks created in `RespondWorkflowTaskCompleted` could incorrectly set `targetWorkerDeploymentVersionChanged` and corrupt `LastNotifiedTargetVersion` despite not consulting matching. > > Inline/eager/synthetic `AddWorkflowTaskStartedEvent` call sites now pass `targetRevisionNumber=-1` as a sentinel, and the workflow-task state machine strengthens its decision logic by tracking the *highest seen* matching revision and refusing to clear/set notification state based on stale (older-revision) reports. > > Adds `TestInlinePath_StableRouting_NoSpuriousFlag` to reproduce the buffered-signal inline-WFT scenario and assert the flag remains false on stable routing. > > <sup>Reviewed by [Cursor Bugbot](https://cursor.com/bugbot) for commit baf0f63. Bugbot is set up for automated code reviews on this repo. Configure [here](https://www.cursor.com/dashboard/bugbot).</sup>  --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Shivs11 and others added 2 commits May 11, 2026 12:52

Shivs11 requested review from a team as code owners May 11, 2026 17:23

Shivs11 changed the title ~~Suppress spurious targetWorkerDeploymentVersionChanged on inline WFT path~~ Trampolining fix: Suppress targetWorkerDeploymentVersionChanged on inline WFT path May 11, 2026

Fix godoc numbered-list formatting from make fmt-imports

f8d8b38

Cosmetic — gci rewrites the test's doc comment numbered list to use two-space indentation per godoc convention. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

ShahabT approved these changes May 11, 2026

View reviewed changes

Shivs11 and others added 2 commits May 11, 2026 14:50

Shivs11 commented May 11, 2026

View reviewed changes

stephanos approved these changes May 11, 2026

View reviewed changes

Shivs11 enabled auto-merge (squash) May 11, 2026 21:50

davewhat approved these changes May 11, 2026

View reviewed changes

Shivs11 merged commit 246d752 into main May 11, 2026
49 checks passed

Shivs11 deleted the inline-path-flag-fix branch May 11, 2026 21:55

temporal-cicd Bot mentioned this pull request May 12, 2026

Patch v1.32.0-155.2 #10231

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Trampolining fix: Suppress targetWorkerDeploymentVersionChanged on inline WFT path#10217

Trampolining fix: Suppress targetWorkerDeploymentVersionChanged on inline WFT path#10217
Shivs11 merged 6 commits into
mainfrom
inline-path-flag-fix

Shivs11 commented May 11, 2026 •

edited by cursor Bot

Loading

Uh oh!

Shivs11 May 11, 2026

Uh oh!

Shivs11 May 11, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

		// Strict `<` (not `<=`) so legitimate same-revision re-firings (e.g., transient retries, repeated updates) still fire; inline path uses revision=-1 sentinel to be caught here.
		if targetRevisionNumber < highestSeenRevNumber {

Conversation

Shivs11 commented May 11, 2026 • edited by cursor Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Problem

Test plan

Uh oh!

Shivs11 May 11, 2026

Choose a reason for hiding this comment

Uh oh!

Shivs11 May 11, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Shivs11 commented May 11, 2026 •

edited by cursor Bot

Loading