world-local: atomically dedupe duplicate step_created/wait_created events #1877
TooTallNate wants to merge 1 commit into main
🦋 Changeset detected
Latest commit: 632010f. The changes in this PR will be included in the next version bump. This PR includes changesets to release 19 packages.
📊 Benchmark Results (💻 Local Development; result tables omitted)
- workflow with no steps, 1 step, and 10, 25, and 50 sequential steps
- Promise.all and Promise.race with 10, 25, and 50 concurrent steps
- workflow with 10, 25, and 50 sequential and concurrent data payload steps (10KB)
- Stream benchmarks (include TTFB metrics): workflow with stream; stream pipeline with 5 transform steps (1MB); 10 parallel streams (1MB each); fan-out fan-in 10 streams (1MB each)
🧪 E2E Test Results
❌ Some tests failed
Failed tests:
- ▲ Vercel Production (2 failed): nextjs-turbopack (1 failed), vite (1 failed)
- 💻 Local Development (1 failed): hono-stable (1 failed)
- 🐘 Local Postgres (2 failed): nitro-stable (2 failed)
Details by category:
- ❌ ▲ Vercel Production
- ❌ 💻 Local Development
- ✅ 📦 Local Production
- ❌ 🐘 Local Postgres
- ✅ 📋 Other
❌ Some E2E test jobs failed. Check the workflow run for details.
Pull request overview
Fixes a race in @workflow/world-local where concurrent writers could create duplicate step_created / wait_created events (and overwrite entities) when the same correlationId is produced concurrently (notably by the snapshot runtime’s deterministic correlation IDs).
Changes:
- Add an atomic per-(runId, correlationId) constraint-file claim (via writeExclusive / O_CREAT|O_EXCL) for step_created.
- Replace wait_created's TOCTOU read-then-check with the same atomic constraint-file claim.
- Add regression tests covering concurrent duplicates for steps/waits and sequential duplicate steps, plus a changeset for a patch release.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| packages/world-local/src/storage/events-storage.ts | Adds atomic .locks/{steps,waits} constraint-file claims to dedupe concurrent step_created/wait_created. |
| packages/world-local/src/storage.test.ts | Adds regression coverage for concurrent duplicate creation races and sequential duplicate step_created. |
| .changeset/fix-world-local-step-created-race.md | Publishes a patch changeset describing the concurrency fix and behavior change. |
…-local Concurrent invocations producing identical correlationIds (as the snapshot runtime does by design across replays) previously both succeeded and persisted duplicate events. step_created had no guard at all; wait_created used a TOCTOU read-then-check that allowed both writers through under concurrency. Both now claim a per-(runId, correlationId) constraint file with O_CREAT|O_EXCL before writing, so the loser surfaces as EntityConflictError — which the runtime's dedup catch path already handles.
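The claim step described in this commit message can be sketched roughly as follows. This is an illustration under assumptions, not the code in events-storage.ts: `claimConstraint`, `ConflictError`, and the hex-encoded file naming are hypothetical stand-ins, since the PR's `writeExclusive` helper and `EntityConflictError` are not shown here. The sketch relies only on Node's `"wx"` open flag, which fails with `EEXIST` when the file already exists.

```typescript
import { strictEqual, throws } from "node:assert";
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Hypothetical stand-in for the runtime's EntityConflictError.
class ConflictError extends Error {
  constructor(message: string) {
    super(message);
    this.name = "ConflictError";
  }
}

// Hex-encode an ID so it is always safe to use inside a file name.
const hex = (s: string): string => Buffer.from(s, "utf8").toString("hex");

// Hypothetical helper: claim the constraint file for (runId, correlationId).
// Returns normally for the first caller and throws ConflictError for any later one.
function claimConstraint(
  root: string,
  kind: "steps" | "waits",
  runId: string,
  correlationId: string,
): void {
  const dir = path.join(root, ".locks", kind);
  fs.mkdirSync(dir, { recursive: true });
  const file = path.join(dir, `${hex(runId)}.${hex(correlationId)}`);
  try {
    // "wx" opens with O_CREAT | O_EXCL, so creation is a single atomic
    // filesystem operation: there is no separate "does it exist?" check.
    fs.closeSync(fs.openSync(file, "wx"));
  } catch (err) {
    if ((err as NodeJS.ErrnoException).code === "EEXIST") {
      throw new ConflictError(`duplicate ${kind} claim for ${runId}/${correlationId}`);
    }
    throw err; // unrelated I/O errors are not conflicts
  }
}

// Usage: the first claim succeeds, the duplicate is rejected.
const root = fs.mkdtempSync(path.join(os.tmpdir(), "world-local-sketch-"));
claimConstraint(root, "steps", "run_1", "corr_1");
throws(() => claimConstraint(root, "steps", "run_1", "corr_1"), { name: "ConflictError" });
strictEqual(fs.readdirSync(path.join(root, ".locks", "steps")).length, 1);
```

In this sketch the entity and event would be written only after the claim succeeds, so a losing writer never reaches the write path.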
9dd8c99 to 632010f
Summary
Fixes a race condition in @workflow/world-local where concurrent invocations producing identical correlationIds for step_created or wait_created events would both succeed and persist duplicate events in the log.
Background
- step_created previously had no atomicity guard: two concurrent calls with the same correlationId both wrote the entity and the event, leaving the second write to silently overwrite the first.
- wait_created used a TOCTOU read-then-check pattern: read the existing wait, throw if found, otherwise write. Under concurrency both readers can pass the existence check before either writes.
The rest of the runtime already expects EntityConflictError to be thrown on duplicate writes (see the EntityConflictError.is(err) catch path in runtime/snapshot-entrypoint.ts), so the missing guard was a real correctness gap.
Fix
Both branches now claim a per-(runId, correlationId) constraint file under .locks/{steps,waits}/ with O_CREAT|O_EXCL semantics (via the existing writeExclusive helper used for hook tokens). The loser surfaces as EntityConflictError.
Includes 3 regression tests covering:
- Concurrent step_created with the same correlationId.
- Concurrent wait_created with the same correlationId (replaces the prior TOCTOU pattern).
- Sequential duplicate step_created (existing pass-through behavior preserved).
Verification
Extracted from PR #1300 (snapshot-runtime), where this fix originated. The snapshot runtime produces deterministic correlationIds across concurrent VM invocations of the same resumption by design — that path made the dedup gap reliably reproducible — but the fix is also valuable on its own for the replay runtime under any concurrent-create scenario.
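To make the concurrent-create scenario concrete, here is a small sketch contrasting a read-then-check create with an exclusive create when several writers target the same file at once. All names (`createWithCheck`, `createExclusive`, `countWinners`) are hypothetical and do not come from the PR or from world-local; the sketch only assumes Node's `fs/promises` API and its `"wx"` flag.

```typescript
import { strictEqual } from "node:assert";
import { mkdtemp, readFile, writeFile } from "node:fs/promises";
import * as os from "node:os";
import * as path from "node:path";

// Read-then-check (TOCTOU): every caller that observes "missing" goes on to
// write, so several concurrent callers can all report success.
async function createWithCheck(file: string, body: string): Promise<boolean> {
  try {
    await readFile(file);
    return false; // someone else already created it
  } catch (err) {
    if ((err as NodeJS.ErrnoException).code !== "ENOENT") throw err;
    await writeFile(file, body); // may overwrite a concurrent writer
    return true;
  }
}

// Exclusive create: "wx" makes the filesystem pick exactly one winner.
async function createExclusive(file: string, body: string): Promise<boolean> {
  try {
    await writeFile(file, body, { flag: "wx" });
    return true;
  } catch (err) {
    if ((err as NodeJS.ErrnoException).code === "EEXIST") return false;
    throw err;
  }
}

// Launch `writers` concurrent creators against one fresh path and count how
// many of them believe they created the file.
async function countWinners(
  create: (file: string, body: string) => Promise<boolean>,
  writers: number,
): Promise<number> {
  const dir = await mkdtemp(path.join(os.tmpdir(), "race-sketch-"));
  const file = path.join(dir, "claim");
  const results = await Promise.all(
    Array.from({ length: writers }, (_, i) => create(file, `writer-${i}`)),
  );
  return results.filter(Boolean).length;
}

// Usage: the exclusive variant always yields a single winner.
countWinners(createExclusive, 8).then((winners) => strictEqual(winners, 1));
```

The read-then-check variant usually reports more than one winner in this setup, but the exact count depends on scheduling, so the sketch asserts only on the exclusive variant.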