Bug Report
Description
When multiple tool calls execute in parallel via Promise.all (as DurableAgent does internally), concurrent sleep() calls cause event log corruption during replay, resulting in:
Unconsumed event in event log: eventType=wait_completed, correlationId=wait_xxx, eventId=evnt_xxx.
This indicates a corrupted or invalid event log.
Reproduction
- Create a DurableAgent with tools that use sleep() for polling (e.g., polling an external API queue)
- Ask the AI to call 5+ tools simultaneously
- DurableAgent executes tools via Promise.all (line 189 of durable-agent.js)
- Each tool calls sleep() concurrently
- On replay, the event log has unconsumed wait_completed events
Minimal Example
export async function myWorkflow() {
  "use workflow";
  const agent = new DurableAgent({
    model: "anthropic/claude-sonnet-4.6",
    tools: {
      generateImage: tool({
        description: "Generate an image",
        inputSchema: z.object({ prompt: z.string() }),
        execute: async (params) => {
          // Submit to external queue
          const requestId = await submitStep(params.prompt);
          // Poll with sleep; this breaks when multiple tools run concurrently
          for (let i = 0; i < 100; i++) {
            await sleep("5s");
            const status = await checkStatusStep(requestId, i);
            if (status === "COMPLETED") break;
          }
          return await getResultStep(requestId);
        },
      }),
    },
  });
  await agent.stream({ messages, writable: getWritable() });
}
When the LLM returns 5 generateImage tool calls, DurableAgent runs them all via Promise.all, creating 5 concurrent sleep() calls. On replay, the execution order of these concurrent sleeps may differ, causing event correlation mismatches.
Root Cause Analysis
sleep() uses order-based correlation IDs internally. With Promise.all, the order in which concurrent sleep() calls execute is non-deterministic. During replay, the runtime may match a wait_completed event to the wrong sleep() call because the execution order differs from the original run.
This contrasts with "use step" functions, which are identified by file + function + args (content-based), making them safe for concurrent use.
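To make the suspected failure mode concrete, here is a toy model of the two correlation schemes. It is only a sketch under my assumptions about how the runtime behaves, not the actual @workflow/core implementation: WaitEvent, recordOrderBased, recordTokenBased and mismatches are all hypothetical names, and the "token" is simply the caller's label.

```typescript
// Toy model only. This is NOT the workflow runtime's real code; every name
// here is hypothetical and exists purely to illustrate the report.
type WaitEvent = { correlationId: string; caller: string };

// Order-based: the Nth sleep() to start receives the Nth correlation ID.
function recordOrderBased(callOrder: string[]): WaitEvent[] {
  return callOrder.map((caller, i) => ({ correlationId: `wait_${i}`, caller }));
}

// Content-based: the correlation ID is derived from a caller-supplied token.
function recordTokenBased(callOrder: string[]): WaitEvent[] {
  return callOrder.map((caller) => ({ correlationId: `wait_${caller}`, caller }));
}

// Compare the logged events with the IDs computed during replay and return
// the callers that would be matched to an event belonging to someone else.
function mismatches(log: WaitEvent[], replay: WaitEvent[]): string[] {
  const ownerById = new Map(log.map((e) => [e.correlationId, e.caller]));
  return replay
    .filter((e) => ownerById.get(e.correlationId) !== e.caller)
    .map((e) => e.caller);
}

// Three concurrent tool calls whose sleep() calls start in a different
// order on replay than in the original run.
const originalOrder = ["poll-A-0", "poll-B-0", "poll-C-0"];
const replayOrder = ["poll-B-0", "poll-A-0", "poll-C-0"];

const orderMismatches = mismatches(
  recordOrderBased(originalOrder),
  recordOrderBased(replayOrder),
);
const tokenMismatches = mismatches(
  recordTokenBased(originalOrder),
  recordTokenBased(replayOrder),
);

console.log("order-based mismatches:", orderMismatches);
console.log("token-based mismatches:", tokenMismatches);
```

In this model the two reordered callers are mismatched under order-based IDs, while token-derived IDs are unaffected by the reordering.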
Suggested Fix
Add an optional token or name parameter to sleep() for content-based correlation:
// Current — order-based, breaks with Promise.all
await sleep("5s");
// Proposed — content-based, safe for concurrent use
await sleep("5s", { token: `poll-${requestId}-${iteration}` });
This would align with how createWebhook({ token }) and createHook({ token }) already support deterministic correlation via explicit tokens.
Environment
- workflow@4.1.0-beta.63
- @workflow/ai@4.0.1-beta.54
- @workflow/core@4.1.0-beta.63
- Platform: Vercel (Next.js)
Workaround
Our current workaround is to avoid sleep() in tools that may execute concurrently: we use fal.subscribe() (blocking poll) for fast operations and plan to use createWebhook() for long-running ones.
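For anyone who needs the same workaround without fal.subscribe(), the blocking-poll idea can be sketched roughly as follows. pollUntil is a hypothetical helper, not part of workflow, @workflow/ai, or the fal client, and the sketch assumes it is called from inside a step function rather than from the workflow body, so that no sleep() events are written to the event log.

```typescript
// Hypothetical helper illustrating a blocking poll. It relies on a plain
// timer instead of the workflow sleep(), so it is assumed to run inside a
// step function, not directly in the workflow body.
async function pollUntil<T>(
  check: (attempt: number) => Promise<T | undefined>,
  intervalMs: number,
  maxAttempts: number,
): Promise<T> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const result = await check(attempt);
    // Any defined value counts as "done"; undefined means "keep polling".
    if (result !== undefined) return result;
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error(`Gave up after ${maxAttempts} attempts`);
}
```

The trade-off is that the caller stays busy for the whole polling period, which is presumably acceptable only for fast operations and is why webhooks look like the better fit for long-running ones.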