[core] Don't fail to queue on 409 responses #1418
Signed-off-by: Peter Wielander <mittgfu@gmail.com>
🦋 Changeset detected. Latest commit: 9668d09. The changes in this PR will be included in the next version bump. This PR includes changesets to release 16 packages.
🧪 E2E Test Results
❌ Some tests failed
Failed Tests
🌍 Community Worlds (55 failed)
mongodb (3 failed)
redis (2 failed)
turso (50 failed)
Details by Category
✅ ▲ Vercel Production
✅ 💻 Local Development
✅ 📦 Local Production
✅ 🐘 Local Postgres
✅ 🪟 Windows
❌ 🌍 Community Worlds
✅ 📋 Other
📊 Benchmark Results (result tables not captured)
Benchmarks: workflow with no steps; workflow with 1, 10, 25, and 50 sequential steps; Promise.all with 10, 25, and 50 concurrent steps; Promise.race with 10, 25, and 50 concurrent steps; workflow with stream (stream benchmarks include TTFB metrics).
Each benchmark ran on 💻 Local Development and ▲ Production (Vercel), with observability links for Nitro, Express, and Next.js (Turbopack).
Summary: fastest framework by world and fastest world by framework, with the winner determined by most benchmark wins.
  (err.status === 409 || err.status === 410)
) {
  runtimeLogger.info(
    'Run already finished during setup, skipping',
Suggested change:
- 'Run already finished during setup, skipping',
+ 'Run already finished during setup, skipping replay',
maybe this would be clearer to the user? (assuming the codepath is only for run replays)
or "skipping redundant workflow execution"
pranaygp left a comment
human: LGTM! great detailed state check and ty for posting that on github PR review comment
Noticed error logs from 409s falling through the queue, which are confusing for users. Log example:
so I made sure we catch these cases and return gracefully. I had Claude run an eval on whether this blocks any real use cases. Eval below:
Concurrent scenarios and why skipping is safe
Scenario 1: Two handlers race on `run_started`

Two queue messages for the same run arrive while it's still `pending`. Both handlers call `world.runs.get()`, see `pending`, and try to create a `run_started` event.

Why it's safe: Handler A transitions the run to `running` and enters the replay loop. The replay loop either completes the workflow or suspends it with queued continuations. Handler B's early exit is a no-op; Handler A guarantees progress.

Location: `runtime.ts` catch block after `run_started` event creation.

Scenario 2: Two handlers race on `run_completed`

Two concurrent replay loops both reach the end of the workflow and try to create `run_completed`.

Why it's safe: The run is completed. Both handlers return. No continuation is needed.

Location: `runtime.ts` catch block after `run_completed` event creation.

Scenario 3: Two handlers race on `run_failed`

Same as Scenario 2 but for failure. Both handlers detect an error in user code and try to fail the run.

Why it's safe: The run is failed. Both handlers return. No continuation is needed.

Location: `runtime.ts` catch block after `run_failed` event creation.

Scenario 4: Run completes while another handler creates hooks
Handler A completes the workflow. Handler B (from an earlier queue message) is still in the suspension handler creating hook events.

Why it's safe: Handler A already completed the workflow. Handler B's hook creation fails, and if any steps were queued, the step executor receives 410 on `step_started` and exits gracefully. The 410 on `step_started` is the safety net: even if the suspension handler returns pending steps after a 409, the step executor won't make progress on a finished run.

Location: `suspension-handler.ts` catch blocks for `hook_created` and `hook_disposed`; `step-executor.ts` 410 handling on `step_started`.

Scenario 5: Two handlers race on `wait_completed`

During the replay loop, both handlers see an elapsed wait and try to complete it.

Why it's safe: Both handlers continue their replay loops. The event log is the same for both (the winning handler's `wait_completed` is visible to future reads). Subsequent replay from either handler converges to the same state because replay is deterministic and event-sourced.

Location: `runtime.ts` wait completion loop with `continue` on 409.

Scenario 6: Two handlers race on `step_completed`

Two handlers execute the same step concurrently (e.g., both received the step's queue message). Both finish execution and try to create `step_completed`.

Why it's safe: Handler A queues the workflow continuation. Handler B does not, which is correct because only one continuation should be queued per step completion. If both queued, there would be redundant replay (which is safe but wasteful).

Location: `step-executor.ts` catch on `step_completed` event creation.

Scenario 7: Two handlers race on `step_started`

Two handlers try to start the same step. Handler A wins; Handler B gets 409 because the step transitioned to a terminal state.

Why it's safe: Both handlers ensure workflow continuation: Handler A via step completion, Handler B via the replay loop re-queuing the workflow. The redundant replay is safe because event-sourced replay is convergent.

Location: `step-executor.ts` 409 handling on `step_started`.

Scenario 8: `step_created`/`wait_created` duplicate

During suspension handling, two concurrent replays both try to create the same step or wait event.

Why it's safe: The step/wait entity already exists. The suspension handler continues processing other items. The step will be executed by whichever handler picks it up from the queue.

Location: `suspension-handler.ts` catch blocks for `step_created` and `wait_created`.
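
The pattern shared by the scenarios above can be sketched roughly as follows. This is a minimal illustration, not the PR's code: `HttpStatusError`, `InMemoryEventLog`, and `createEventOrSkip` are hypothetical names, and the toy store's rules for when it answers 409 versus 410 are assumptions made for the example. The two handlers run one after the other here, which models the outcome of a race (one writer wins, the other gets a conflict) without real concurrency.

```typescript
// Hypothetical names throughout; none of these come from the PR itself.
type EventType = 'run_started' | 'run_completed' | 'run_failed';
type Outcome = 'created' | 'skipped';

class HttpStatusError extends Error {
  readonly status: number;
  constructor(status: number, message: string) {
    super(message);
    this.status = status;
  }
}

// Toy stand-in for an event store. Assumed rules: a duplicate event yields
// 409, and any other event on an already-finished run yields 410.
class InMemoryEventLog {
  private events = new Map<string, Set<EventType>>();

  create(runId: string, type: EventType): void {
    const log = this.events.get(runId) ?? new Set<EventType>();
    if (log.has(type)) {
      throw new HttpStatusError(409, `${type} already exists`);
    }
    if (log.has('run_completed') || log.has('run_failed')) {
      throw new HttpStatusError(410, 'run already finished');
    }
    log.add(type);
    this.events.set(runId, log);
  }

  list(runId: string): EventType[] {
    return Array.from(this.events.get(runId) ?? []);
  }
}

function hasStatus(err: unknown): err is { status: number } {
  return (
    typeof err === 'object' &&
    err !== null &&
    typeof (err as { status?: unknown }).status === 'number'
  );
}

// Create the event; treat 409/410 as "another handler got there first"
// and return quietly. Any other error is rethrown.
function createEventOrSkip(
  log: InMemoryEventLog,
  runId: string,
  type: EventType,
  messages: string[],
): Outcome {
  try {
    log.create(runId, type);
    return 'created';
  } catch (err) {
    if (hasStatus(err) && (err.status === 409 || err.status === 410)) {
      messages.push(`${type}: status ${err.status}, skipping`);
      return 'skipped';
    }
    throw err;
  }
}

const log = new InMemoryEventLog();
const messages: string[] = [];
const handlerA = createEventOrSkip(log, 'run_1', 'run_completed', messages);
const handlerB = createEventOrSkip(log, 'run_1', 'run_completed', messages);
const lateStart = createEventOrSkip(log, 'run_1', 'run_started', messages);
```

In this sketch `handlerA` creates the event, `handlerB` hits the assumed 409, and `lateStart` hits the assumed 410; both losers return without throwing, and the log holds a single `run_completed` event.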