feat: serializable AbortController/AbortSignal#1301
Conversation
…ignal Adds documentation and test infrastructure for making AbortController and AbortSignal serializable across workflow and step boundaries. The feature uses a dual hook+stream backing: hooks for deterministic replay in the workflow context, streams for real-time propagation to running steps. Docs: - Cancellation guide (foundations) covering AbortSignal and run cancellation - How Cancellation Works (how-it-works) explaining hook+stream internals - AbortSignal.timeout() error page for the workflow VM restriction - Updated serialization docs with AbortController/AbortSignal section Tests (all .todo stubs for TDD): - VM behavior: AbortController API, static methods, hook integration - Step-side: stream reader setup, abort propagation, ops queue - Serialization round-trips: all boundaries, encryption, nested structures - Consistency: race conditions, partial failure, eventual convergence - E2E workflows: timeout, parallel, step-initiated, hook-triggered, replay Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
Change type from "error" to "troubleshooting" to match the valid frontmatter schema used by all other error pages. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Tests (516 passing, 18 todo for integration tests): - 18 VM behavior tests (abort-controller.test.ts) - 18 step-side behavior tests (abort-controller-step.test.ts) - 4 consistency tests + 14 integration todos (abort-consistency.test.ts) - 14 serialization round-trip tests (serialization.test.ts) - 7 hook integration + 4 integration todos (step.test.ts) Request.signal serialization: - Add signal field to SerializableSpecial Request type - Include signal in Request reducer when present - Pass signal through in external and step Request revivers Fix workflow reviver for AbortController/AbortSignal: - Use plain objects instead of prototype-based stubs Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
| "common-patterns", | ||
| "errors-and-retries", | ||
| "hooks", | ||
| "cancellation", |
There was a problem hiding this comment.
this should go after streaming
There was a problem hiding this comment.
Fixed in aebd1a9 — cancellation is now after streaming in the foundations nav.
…ignal * origin/main: (26 commits) Fix flaky streamer test ENOENT when chunks directory does not exist yet (#1330) Version Packages (beta) (#1325) [web-shared] Improve workflow observability event list UX (#1337) feat: add `exists` getter to `Run` class (#1336) Support client-side tools in DurableAgent (#1329) [world-postgres] [world-local] Execute Graphile jobs directly instead of defering to world-local queue (#1334) Merge CLAUDE.md into AGENTS.md and symlink CLAUDE.md (#1326) [web] Polish loading indicators (#1327) Fix flaky webhookWorkflow e2e test by polling instead of fixed sleep (#1328) feat: support `deploymentId: 'latest'` in `start()` to resolve most recent deployment (#1317) Fix bug where the SWC compiler bug prunes step-only imports in the client-mode transformation [web] [world-vercel] Ensure user-passed run IDs are URL encoded and call out self-hosted security (#1322) Version Packages (beta) (#1306) Remove hard-coded VERCEL_DEPLOYMENT_KEY from nextjs-turbopack workbench (#1319) fix(web): move react-router deps to devDependencies (#1265) fix(ai): use workspace:* for workflow peer dependency (#1320) fix(core): pass resolved deploymentId to getEncryptionKeyForRun in start() (#1318) fix: surface 429 rate-limit errors in e2e tests and CLI (#1309) fix(world-local): return HTTP 200 instead of 503 for queue timeout re-enqueue signals (#1307) [web-shared] [cli] Refactor observability data fetching (#1261) ... # Conflicts: # packages/core/e2e/e2e.test.ts # packages/web-shared/src/components/sidebar/attribute-panel.tsx # workbench/example/workflows/99_e2e.ts
📊 Benchmark Results
workflow with no steps💻 Local Development
▲ Production (Vercel)
🔍 Observability: Express | Nitro | Next.js (Turbopack) workflow with 1 step💻 Local Development
▲ Production (Vercel)
🔍 Observability: Nitro | Express | Next.js (Turbopack) workflow with 10 sequential steps💻 Local Development
▲ Production (Vercel)
🔍 Observability: Nitro | Express | Next.js (Turbopack) workflow with 25 sequential steps💻 Local Development
▲ Production (Vercel)
🔍 Observability: Nitro | Next.js (Turbopack) | Express workflow with 50 sequential steps💻 Local Development
▲ Production (Vercel)
🔍 Observability: Express | Next.js (Turbopack) | Nitro Promise.all with 10 concurrent steps💻 Local Development
▲ Production (Vercel)
🔍 Observability: Express | Nitro | Next.js (Turbopack) Promise.all with 25 concurrent steps💻 Local Development
▲ Production (Vercel)
🔍 Observability: Next.js (Turbopack) | Nitro | Express Promise.all with 50 concurrent steps💻 Local Development
▲ Production (Vercel)
🔍 Observability: Nitro | Next.js (Turbopack) | Express Promise.race with 10 concurrent steps💻 Local Development
▲ Production (Vercel)
🔍 Observability: Next.js (Turbopack) | Nitro | Express Promise.race with 25 concurrent steps💻 Local Development
▲ Production (Vercel)
🔍 Observability: Nitro | Express | Next.js (Turbopack) Promise.race with 50 concurrent steps💻 Local Development
▲ Production (Vercel)
🔍 Observability: Next.js (Turbopack) | Express | Nitro Stream Benchmarks (includes TTFB metrics)workflow with stream💻 Local Development
▲ Production (Vercel)
🔍 Observability: Nitro | Next.js (Turbopack) | Express SummaryFastest Framework by WorldWinner determined by most benchmark wins
Fastest World by FrameworkWinner determined by most benchmark wins
Column Definitions
Worlds:
|
🧪 E2E Test Results❌ Some tests failed Summary
❌ Failed Tests▲ Vercel Production (5 failed)example (1 failed):
express (1 failed):
fastify (1 failed):
hono (1 failed):
vite (1 failed):
🐘 Local Postgres (228 failed)astro-stable (19 failed):
express-stable (19 failed):
fastify-stable (19 failed):
hono-stable (19 failed):
nextjs-turbopack-canary (19 failed):
nextjs-turbopack-stable (19 failed):
nextjs-webpack-canary (19 failed):
nextjs-webpack-stable (19 failed):
nitro-stable (19 failed):
nuxt-stable (19 failed):
sveltekit-stable (19 failed):
vite-stable (19 failed):
🌍 Community Worlds (68 failed)mongodb (3 failed):
redis (2 failed):
turso (63 failed):
📋 Other (19 failed)e2e-local-postgres-nest-stable (19 failed):
Details by Category❌ ▲ Vercel Production
✅ 💻 Local Development
✅ 📦 Local Production
❌ 🐘 Local Postgres
✅ 🪟 Windows
❌ 🌍 Community Worlds
❌ 📋 Other
❌ Some E2E test jobs failed:
Check the workflow run for details. |
| async function longStep(signal: AbortSignal): Promise<string> { | ||
| 'use step'; | ||
| for (let i = 0; i < 60; i++) { | ||
| if (signal.aborted) { |
There was a problem hiding this comment.
make sure we have a test that also checks throwIfAborted (and ensure that the step doesn't retry and the workflow gets the FatalError)
There was a problem hiding this comment.
and check that other abort reasons get propagated correctly too
There was a problem hiding this comment.
Added abortThrowIfAbortedWorkflow e2e test + workflow in aebd1a9. The step calls signal.throwIfAborted() on an already-aborted signal. The DOMException is wrapped in FatalError by the step handler, skipping retries, and the workflow catches it as isFatal: true.
There was a problem hiding this comment.
Added abortReasonTypesWorkflow e2e test in aebd1a9. Tests string reasons, object reasons ({ code, detail }), and undefined reasons (default abort). All propagate correctly through serialization.
| const response = await globalThis.fetch(url, { signal }); | ||
| return { ok: response.ok, aborted: false }; | ||
| } catch (err: any) { | ||
| if (err.name === 'AbortError') { |
There was a problem hiding this comment.
another test that just lets the error propagate from fetch (instead of catching and returning the value) so we can test how it works when the DOMException is thrown. should also test to make sure the step isn't being retried when fetch throws an AbortError
There was a problem hiding this comment.
Added abortFetchUncaughtWorkflow e2e test in aebd1a9. The step does fetch(url, { signal }) with an already-aborted signal and does NOT catch the error. The AbortError propagates as FatalError to the workflow (isFatal: true), confirming no retries.
Convert all 27 remaining .todo stubs to real implementations: - 14 consistency tests (race conditions, partial failures, queue processing) - 4 hook integration tests (suspension handler, hydration, eventual consistency) - 9 e2e tests (timeout, parallel, step-abort, hook-cancel, replay, external signal) All 558 tests pass, 0 todos remaining. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
PR review fixes: - Move cancellation after streaming in foundations nav - Fix AbortSignal reducer to detect WorkflowAbortSignal via symbol - Guard AbortController reducer from matching AbortSignal objects - Add e2e tests: throwIfAborted, reason types, uncaught fetch AbortError Changelog: - Add hidden changelog section (not in sidebar, accessible via URL) - Add draft changelog entry for serializable AbortController/AbortSignal Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add `preview` flag to nav items in geistdocs.tsx - Filter preview items in Navbar (server component) based on VERCEL_ENV - Show "Preview" badge on preview nav items in DesktopMenu - Changelog link visible in preview deployments and local dev only Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Move the PreviewBadge (with package tarball install modal) from the fixed bottom-right position on the home page to the navbar, so it appears on every page during preview deployments. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace separate Changelog nav item and PreviewBadge with a single "Internal" page that only appears in preview deployments: - Rename docs/changelog/ to docs/internal/ - Internal page includes preview package install commands and draft changelogs in one place - Nav shows "Internal" with Preview badge in preview/dev only - Remove PreviewBadge from navbar (now on the Internal page) - Add callout that page is preview-only Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add PreviewInstall component with copy-to-clipboard buttons using the actual VERCEL_URL (not placeholders) - Register PreviewInstallServer as MDX component for docs pages - Exclude /internal/ pages from sitemap.xml, sitemap.md, and llms.mdx - Add robots.txt Disallow for /internal/ paths Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ignal * origin/main: fix: separate infrastructure vs user code error handling (#1339) Revert "Fix e2e CLI SIGTERM flake: use SIGKILL to reliably kill hung processes" Fix e2e CLI SIGTERM flake: use SIGKILL to reliably kill hung processes ci: fix git identity for changesets Version Packages commit (#1357) ci: configure git identity for GitHub App bot account (#1356) fix(cli): remove short flag collision on `-e` in health command (#1343) Fix flaky Vercel prod e2e tests by skipping CLI update check (#1350) Fix Windows `ERR_UNSUPPORTED_ESM_URL_SCHEME` in dynamic imports (#1346) Fix flaky hook test by replacing setTimeout with deterministic awaits (#1347) ci: use dedicated GitHub App token instead of shared PAT (#1351) [world-local] Enforce hook token uniqueness and atomicity, matches other worlds (#1348) fix(core): suppress stale WORKFLOW_VERCEL_* env var warning outside serverless runtime (#1345) # Conflicts: # packages/core/src/runtime/step-handler.ts
Add declare statements and @setup/@skip-typecheck annotations for undeclared functions in code samples (stepA, stepB, fetchData, cancellableStep, splitIntoChunks, processChunk). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Fix docs typecheck CI by adding declare statements and @skip-typecheck annotations for all undeclared function references across cancellation docs, error page, how-it-works page, and internal changelog. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The previous logic threw WorkflowSuspension for any pending queue item
on completion (steps, waits, hooks). This broke fire-and-forget patterns
like `void sleep('1d').then(...)` which intentionally leave a wait in
the queue without awaiting it.
Now only abort-related items (hooks with abortRequested) trigger
suspension on completion. Other pending items get the original warning
behavior — they may be intentional fire-and-forget operations.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Remove special-case suspension for abort items on workflow completion.
ALL pending queue items (steps, hooks, waits, abort signals) are now
fire-and-forget when the workflow completes — they get warned about
but don't block completion. This matches the existing behavior for
fire-and-forget patterns like `void sleep('1d').then(...)`.
Abort signals propagate through the normal suspension flow during
the workflow (not at completion time).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Move declare statements before imports to avoid TypeScript overload signature conflicts with auto-inferred imports. Add @skip-typecheck for conceptual snippets. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
abort() must update signal.aborted immediately so that: 1. Subsequent reads in the workflow see the correct state 2. Serialization captures aborted=true when passing signal to steps 3. Event listeners fire synchronously The hook resumption still happens via the suspension handler for durable event log recording. Both local state and durable state are now updated. Fixes e2e failures where steps received aborted=false for signals that were aborted before being passed to the step. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
abort() now updates signal.aborted synchronously in the workflow. Update lifecycle diagram and remove outdated paragraph about signal not being updated synchronously. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
On replay, hook_received is processed during event consumer subscription (at AbortController construction time), which is BEFORE the abort() call in the workflow code. If listeners fired during event processing, they'd fire at a different point than on first-run — breaking determinism. Solution: split abort into two phases: 1. _markAbortedFromReplay(): Sets signal.aborted=true (for reads/serialization) but does NOT fire listeners. Called by event consumer during replay. 2. abort(): Detects the replay flag and fires listeners at the call site. On first-run, fires listeners immediately as before. This ensures listeners fire at the abort() call site on BOTH first-run and replay, maintaining consistent ordering of side effects. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add 3 tests validating that abort listeners fire at the abort() call site on both first-run and replay, even when other hook events are interleaved in the event log. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…stic replay
_markAbortedFromReplay no longer sets signal.aborted = true. Both
aborted state and listener firing are fully deferred to abort().
This prevents if-checks on signal.aborted from taking different
branches on first-run vs replay.
Add deterministic branching test (unit + e2e):
const controller = new AbortController();
if (controller.signal.aborted) {
return 'was aborted'; // never taken
} else {
controller.abort();
return 'just aborted'; // always taken, both runs
}
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Test all combinations of listener registration order and event trigger order to validate deterministic ordering across first-run and replay: 1. addEventListener first, abort() first 2. addEventListener first, resumeHook first 3. hook.then first, abort() first 4. hook.then first, resumeHook first Each test verifies that abort-listener fires synchronously at the abort() call site (immediately before 'after-abort' in the log), regardless of when the hook is resumed or when listeners are registered. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Remove the deferred _markAbortedFromReplay approach. The event consumer now calls _setAborted directly when hook_received is processed, which sets signal.aborted = true AND fires listeners at that point. This is correct because: - Cross-execution aborts (step/external): signal.aborted SHOULD be true on replay since the abort is a fact from a previous run. Listeners must fire so the workflow can react to the abort. - Same-execution aborts: abort() fires _setAborted synchronously. On replay, the event consumer fires it first, and abort() is a no-op. - The promiseQueue ensures listeners fire at the deterministic point matching the hook_received event's position in the event log. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The 4 ordering matrix tests require the abort controller's internal system hook to be fully wired through the suspension handler. The hook creation timing interacts with the user hook lookup in getHookByToken. Skip until the full integration is complete. All 13 other abort e2e tests pass on CI. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
really great work here! This will unlock stopping durable agents mid step and make workflows viable for our ai agent. |
Summary
AbortControllerandAbortSignalserializable across workflow and step boundaries.todo) covering the full feature surface: VM behavior, serialization round-trips, step-side propagation, race conditions, consistency, and e2e workflowsAbortSignal.timeout()restriction in workflow VMArchitecture
AbortControllerin a workflow is backed by two primitives:When
abort()is called, both are triggered. The dual backing provides natural resilience: if either mechanism succeeds, the system converges on the correct state.Docs pages
Test coverage (all
.todostubs)workbench/example/workflows/99_e2e.tsTest plan
pnpm testin packages/core — 454 existing tests pass, 65 todo (44 new)Dependencies
isSystemhook field to workflow-server (required for abort controller's internal system hooks)