
[Bug] 1.11.2 workflow histories failing to replay on 1.11.5 #1582

@selbyk

Description

What are you really trying to do?

We want to update to the latest release. We have been running @temporalio/*@1.11.2 since 9/25/24. When I tried to update to @temporalio/*@1.11.5 a few days ago, I ran into the following errors:

2024-12-10T21:41:34.484512Z  WARN temporal_sdk_core::worker::workflow: Failing workflow task run_id={{runId}} failure=Failure { failure: Some(Failure { message: "[TMPRL1100] Nondeterminism error: Child workflow id of scheduled event 'task/{{uuid}}' does not match child workflow id of command 'task/{{different_uuid}}'", source: "", stack_trace: "", encoded_attributes: None, cause: None, failure_info: Some(ApplicationFailureInfo(ApplicationFailureInfo { r#type: "", non_retryable: false, details: None, next_retry_delay: None })) }), force_cause: NonDeterministicError }
Replay failed https://cloud.temporal.io/namespaces/{{namespace}}/workflows/{{wfid}}/{{runid}}
DeterminismViolationError: Replay failed with a nondeterminism error. This means that the workflow code as written is not compatible with the history that was fed in. Details: Workflow activation completion failed: Failure { failure: Some(Failure { message: "[TMPRL1100] Nondeterminism error: Child workflow id of scheduled event 'task/{{uuid}}' does not match child workflow id of command 'task/{{different_uuid}}'", source: "", stack_trace: "", encoded_attributes: None, cause: None, failure_info: Some(ApplicationFailureInfo(ApplicationFailureInfo { r#type: "", non_retryable: false, details: None, next_retry_delay: None })) }), force_cause: NonDeterministicError }
    at evictionReasonToReplayError (/Users/selby/projects/monorepo/node_modules/.pnpm/@temporalio+worker@1.11.5_@swc+helpers@0.5.6_esbuild@0.21.5/node_modules/@temporalio/worker/lib/replay.js:34:20)
    at Worker.runReplayHistories (/Users/selby/projects/monorepo/node_modules/.pnpm/@temporalio+worker@1.11.5_@swc+helpers@0.5.6_esbuild@0.21.5/node_modules/@temporalio/worker/lib/worker.js:228:76)
    at async replayWorkflows (/Users/selby/projects/monorepo/apps/temporal-workers/workflow-tests/replay.ts:2:3826)
    at async replayWorkflowsInEnv (/Users/selby/projects/monorepo/apps/temporal-workers/workflow-tests/replay.ts:2:4907)
    at async replay (/Users/selby/projects/monorepo/apps/temporal-workers/workflow-tests/replay.ts:2:6297) 

This happens for almost all histories of some of our workflow types, including our most recent histories. The same histories seem to replay fine on @temporalio/*@1.11.3.
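For context on what the error compares: as we read the message, during replay the child workflow ID recorded in history ('task/{{uuid}}') differs from the ID the current code produces ('task/{{different_uuid}}'). Below is a hypothetical, heavily simplified model of that comparison. The names findChildIdMismatch and ChildStart are ours, not the SDK's, and the real check in sdk-core covers much more than this.

```typescript
// Hypothetical, simplified model of the comparison behind TMPRL1100.
// This is NOT sdk-core's implementation; the real check covers much more.
interface ChildStart {
  workflowId: string;
}

// Walk the child workflow starts recorded in history alongside the starts the
// current code issues during replay, and report the first ID mismatch.
function findChildIdMismatch(recorded: ChildStart[], replayed: ChildStart[]): string | null {
  const n = Math.min(recorded.length, replayed.length);
  for (let i = 0; i < n; i++) {
    const eventId = recorded[i].workflowId; // ID stored in the history event
    const commandId = replayed[i].workflowId; // ID produced by the current code
    if (eventId !== commandId) {
      return (
        `Child workflow id of scheduled event '${eventId}' ` +
        `does not match child workflow id of command '${commandId}'`
      );
    }
  }
  // Differences in the number of child starts are a separate kind of
  // nondeterminism and are ignored in this sketch.
  return null;
}

const history: ChildStart[] = [{ workflowId: 'task/aaaa' }];
console.log(findChildIdMismatch(history, [{ workflowId: 'task/aaaa' }])); // null
console.log(findChildIdMismatch(history, [{ workflowId: 'task/bbbb' }])); // reports the mismatch
```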

Describe the bug

We use uuid4() (from @temporalio/workflow) to generate a child workflow ID of the form task/<uuid4()>, and pass that ID to startChild({ workflowId }). On every PR and deploy, we replay all running histories plus up to the last 500 completed histories for each of our workflow types. This part of our code and these replays have been stable for months.
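As we understand it, uuid4() from @temporalio/workflow is built on the workflow sandbox's seeded Math.random, which is what makes task/<uuid4()> replay-safe: the same seed and the same sequence of draws give the same IDs. Here is a minimal sketch of that property. It assumes a stand-in PRNG (a 32-bit LCG, chosen only for brevity; the SDK uses its own generator) and hypothetical helper names (lcg, uuid4From, childIdsForRun). It also shows that a single extra draw before the IDs are generated changes every later ID, which is one hypothetical way a mismatch like the one above could appear; we have not confirmed that this is what changed between 1.11.3 and 1.11.5.

```typescript
// Hypothetical stand-in for the sandbox's seeded Math.random: a full-period
// 32-bit LCG. All arithmetic stays below 2^53, so it is exact in doubles.
function lcg(seed: number): () => number {
  let state = seed >>> 0;
  return () => {
    state = (state * 1664525 + 1013904223) % 0x100000000;
    return state / 0x100000000; // in [0, 1)
  };
}

// Build a version 4 UUID from four 32-bit draws of the given generator.
function uuid4From(rand: () => number): string {
  const bytes: number[] = [];
  for (let i = 0; i < 4; i++) {
    const word = Math.floor(rand() * 0x100000000);
    bytes.push((word >>> 24) & 0xff, (word >>> 16) & 0xff, (word >>> 8) & 0xff, word & 0xff);
  }
  bytes[6] = (bytes[6] & 0x0f) | 0x40; // version 4
  bytes[8] = (bytes[8] & 0x3f) | 0x80; // RFC 4122 variant
  let hex = '';
  for (const b of bytes) hex += (b < 16 ? '0' : '') + b.toString(16);
  return `${hex.slice(0, 8)}-${hex.slice(8, 12)}-${hex.slice(12, 16)}-${hex.slice(16, 20)}-${hex.slice(20)}`;
}

// Hypothetical parent-workflow body: derives child IDs as task/<uuid4()>.
// extraDrawsBefore models any additional draw from the shared generator
// earlier in the run, which shifts every ID generated after it.
function childIdsForRun(seed: number, extraDrawsBefore = 0, children = 2): string[] {
  const rand = lcg(seed);
  for (let i = 0; i < extraDrawsBefore; i++) rand();
  const ids: string[] = [];
  for (let i = 0; i < children; i++) ids.push(`task/${uuid4From(rand)}`);
  return ids;
}

const original = childIdsForRun(42);
const replayed = childIdsForRun(42);
const shifted = childIdsForRun(42, 1);
console.log(original[0] === replayed[0]); // true: same seed, same draws
console.log(original[0] === shifted[0]); // false: one extra draw shifts the IDs
```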

Minimal Reproduction

Maybe we can find a minimal reproduction together; for now, I just want to get this on your radar.

Environment/Versions

  • OS and processor: M2 Mac and Linux (GH Actions)
  • Temporal Version: @temporalio/* 1.11.5 (replays succeed on 1.11.3)

Additional context
