Fix TestWorkflowStartConflict flaky test #9776

Merged
spkane31 merged 2 commits into main from spk/update-with-start-failure
Apr 2, 2026

@spkane31 (Contributor) commented Apr 1, 2026

What changed?

Fix the flaky test TestUpdateWithStartSuite/TestWorkflowStartConflict/workflow_id_conflict_policy_fail:_use-existing by handling both possible orderings of the race condition it tests.

Why?

The test injects a hook (UpdateWithStartInBetweenLockAndStart) that fires a concurrent StartWorkflowExecution between the update-with-start lock acquisition and the start attempt, simulating a race condition. The test assumes the update always lands in a second speculative WFT:

  1. WFT #1 polled with no messages → completed (empty)
  2. UWS retry creates speculative WFT #2 → polled with update → accepted/completed

This assumption broke after the parallelsuite migration, which gives each test its own isolated testcore.NewEnv running in parallel. With a dedicated env, the retryable history client's immediate retry of the Unavailable error races tightly with RecordWorkflowTaskStarted:

The original two-poll design panicked in ordering A (index out of range [0] with length 0 on task.Messages[0] in the empty-response first poll). The initial single-poll fix panicked in ordering B for the same reason. Neither ordering is guaranteed, so the test must handle both.
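
A minimal sketch of the "handle both orderings" idea, not the code from this PR: the poll handler checks for an empty message list before indexing, and the test keeps polling until some task carries the update. Task, handleTask, pollUntilUpdate, and scripted are hypothetical names invented for illustration; the real test uses the server's own task poller types.

```go
package main

import (
	"errors"
	"fmt"
)

// Task is a hypothetical stand-in for a polled workflow task.
// Only the Messages field matters for this sketch.
type Task struct {
	Messages []string
}

// handleTask reports whether the task carried an update.
// Checking the length first avoids the "index out of range [0]" panic
// when the task is an empty response.
func handleTask(t Task) bool {
	if len(t.Messages) == 0 {
		return false // empty task: complete it without touching Messages[0]
	}
	_ = t.Messages[0] // safe: the update would be accepted/completed here
	return true
}

// pollUntilUpdate polls up to maxPolls times and returns the 1-based
// number of the poll that delivered the update.
func pollUntilUpdate(poll func() Task, maxPolls int) (int, error) {
	for i := 1; i <= maxPolls; i++ {
		if handleTask(poll()) {
			return i, nil
		}
	}
	return 0, errors.New("update was not delivered within the poll budget")
}

// scripted returns a poll function that replays the given tasks in order
// and yields empty tasks once they are exhausted.
func scripted(tasks ...Task) func() Task {
	i := 0
	return func() Task {
		if i >= len(tasks) {
			return Task{}
		}
		t := tasks[i]
		i++
		return t
	}
}

func main() {
	update := Task{Messages: []string{"update"}}

	// One ordering: the update arrives with the first task.
	n, err := pollUntilUpdate(scripted(update), 2)
	fmt.Println(n, err)

	// The other ordering: the first task is empty, the second carries the update.
	n, err = pollUntilUpdate(scripted(Task{}, update), 2)
	fmt.Println(n, err)
}
```

Because the handler never assumes which poll carries the update, the same assertions hold whichever side wins the race.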

How did you test it?

  • built
  • run locally and tested manually - tested with -count=50 locally
  • covered by existing tests
  • added new unit test(s)
  • added new functional test(s)

Potential risks

None, test only

@spkane31 requested review from a team as code owners April 1, 2026 22:08
@spkane31 requested a review from stephanos April 1, 2026 22:09
@spkane31 enabled auto-merge (squash) April 1, 2026 22:31
@spkane31 force-pushed the spk/update-with-start-failure branch from a8bf294 to ecb44ba April 2, 2026 16:26
@spkane31 merged commit f09abd1 into main Apr 2, 2026
46 checks passed
@spkane31 deleted the spk/update-with-start-failure branch April 2, 2026 16:48

2 participants