
Snapshot Runtime: QuickJS WASM VM with snapshot/restore for workflow execution#1300

Open
TooTallNate wants to merge 134 commits into main from snapshot-runtime


TooTallNate (Member) commented Mar 9, 2026

Summary

Implements the snapshot-based workflow runtime described in RFC #1298. Instead of replaying the full event log on every workflow handler invocation, workflows run inside a QuickJS WASM VM that is snapshotted at suspension points and restored on resumption — so each invocation only fetches and processes events that arrived since the last save.

The snapshot runtime is the default in this PR. The previous event-replay runtime remains available as an opt-out via WORKFLOW_RUNTIME=replay or executionContext.workflowRuntime: 'replay'.

How it works

  1. Workflow code runs inside a QuickJS WASM VM.
  2. When the workflow awaits a step / hook / sleep, the VM suspends and its heap is serialized.
  3. Bytes go through a compress → encrypt pipeline (zstd on Node 22.15+, gzip fallback; AES-256-GCM when an encryption key is configured) and are persisted via world.snapshots.save.
  4. On the next workflow handler invocation, world.snapshots.load returns the bytes, the inverse decrypt → decompress pipeline restores them, and vm.restore() resumes the VM at the exact suspension point.
  5. The runtime fetches only events newer than the snapshot's eventsCursor, processes them, and either resolves to a result, suspends on a new pending op, or fails.
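
The compress → encrypt pipeline in step 3 (and its inverse in step 4) can be sketched with Node's built-in `node:zlib` and `node:crypto`. This is a minimal illustration, not the PR's `serialization/compression.ts`: it uses gzip only (the zstd path on Node 22.15+ is omitted), and the byte layout `[12-byte IV][16-byte auth tag][ciphertext]` as well as the names `sealSnapshot` / `openSnapshot` are assumptions.

```typescript
import { gzipSync, gunzipSync } from 'node:zlib';
import { createCipheriv, createDecipheriv, randomBytes } from 'node:crypto';

const IV_BYTES = 12; // recommended AES-GCM nonce length
const TAG_BYTES = 16; // default AES-GCM auth tag length

// Hypothetical helper: compress first, then encrypt if a 32-byte key is set.
// Order matters: ciphertext is effectively random and would not compress.
function sealSnapshot(plain: Uint8Array, key?: Buffer): Buffer {
  const compressed = gzipSync(plain);
  if (!key) return compressed;
  const iv = randomBytes(IV_BYTES);
  const cipher = createCipheriv('aes-256-gcm', key, iv);
  const ciphertext = Buffer.concat([cipher.update(compressed), cipher.final()]);
  // Assumed layout: [IV][auth tag][ciphertext]
  return Buffer.concat([iv, cipher.getAuthTag(), ciphertext]);
}

// Inverse pipeline: decrypt (verifying the auth tag), then decompress.
function openSnapshot(sealed: Buffer, key?: Buffer): Buffer {
  let compressed: Buffer = sealed;
  if (key) {
    const iv = sealed.subarray(0, IV_BYTES);
    const tag = sealed.subarray(IV_BYTES, IV_BYTES + TAG_BYTES);
    const decipher = createDecipheriv('aes-256-gcm', key, iv);
    decipher.setAuthTag(tag);
    compressed = Buffer.concat([
      decipher.update(sealed.subarray(IV_BYTES + TAG_BYTES)),
      decipher.final(), // throws if the ciphertext or tag was tampered with
    ]);
  }
  return gunzipSync(compressed);
}
```

Without a key the sealed bytes are just the gzip stream, which matches the "AES-256-GCM only when an encryption key is configured" behavior described in step 3.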

Most of the snapshot-runtime work lives in @workflow/core (runtime/snapshot-runtime.ts, runtime/snapshot-entrypoint.ts, serialization/compression.ts, serialization/vm-bundle-entry.ts); each world implements snapshots.save / load / delete for its storage backend.

Scope of this PR

  • @workflow/core: snapshot runtime, VM bootstrap, event-cursor-driven resume, deterministic correlationIds (seeded ULIDs across concurrent VM invocations of the same resumption), encryption and compression pipeline, WORKFLOW_RUNTIME env-var dispatch with replay-runtime fallback, OTel spans/attributes for the snapshot lifecycle, CI-visible diagnostic checkpoints (SNAPSHOT_DIAG).
  • @workflow/world: new Snapshots interface (save / load / delete) and metadata schema.
  • @workflow/world-vercel: workflow-server snapshot endpoints (PUT/GET/DELETE /v2/runs/:runId/snapshot), opaque-bytes transport, switch to undici.request() for retry-with-Buffer-body correctness, atomic per-(run, correlation) uniqueness for entity-creating events.
  • @workflow/world-postgres: new workflow_snapshots table, unique partial index on workflow_events(run_id, correlation_id, type) for entity-creating events.
  • @workflow/world-local: filesystem-backed snapshot storage ({runId}.bin + {runId}.json), atomic correlationId uniqueness for step_created / wait_created.
  • CI: vitest plugin matrix split across [snapshot, replay], full Vercel-prod E2E coverage of the snapshot runtime across 11 frameworks.

Custom serializers (Symbol.for('workflow-serialize') / Symbol.for('workflow-deserialize')) and workflow-side DOMException / WorkflowFunction round-trip through the VM serde bundle alongside the standard reducers.

Out of scope / future work

  • A dedicated CLI command to fetch Vercel function logs by runId (getVercelFunctionLogs was removed from the e2e diagnostic harness — belongs in its own PR).
  • Workflow-bundle bloat (the QuickJS heap snapshot is dominated by the user's compiled bundle, which today inlines @opentelemetry/api, zod, ai-sdk, etc. — tree-shaking those out is a builder-side change worth pursuing later).
  • Performance tuning for very-many-step workflows on cloud worlds (per-step round-trip is currently dominated by snapshot.save + storage RTT; further work could batch saves or skip them entirely for ops the runtime can recompute).

Based on serialization-refactor (PR #1299).

…refix

Start of the serialization refactor (separate from snapshot-runtime).

New files:
- serialization/types.ts — SerializationFormat enum, SerializableSpecial
  interface, Reducers/Revivers types
- serialization/codec.ts — Codec interface with formatPrefix, serialize,
  deserialize, and optional deserializeLegacy
- serialization/format.ts — Format prefix encode/decode/peek, moved from
  the monolithic serialization.ts

The Codec interface enables future alternative formats (CBOR, JSON) while
keeping the devalue implementation as the current default.

Serialization refactor Phase 1: create the new module structure alongside
the existing monolithic serialization.ts (which continues to work).

New files:
- serialization/reducers/common.ts — Date, Error, Map, Set, URL, BigInt,
  typed arrays, Headers, Request, Response, RegExp, URLSearchParams
- serialization/reducers/class.ts — Class/Instance with WORKFLOW_SERIALIZE/
  DESERIALIZE support
- serialization/reducers/step-function.ts — StepFunction with closure vars
- serialization/codec-devalue.ts — devalue Codec implementation
- serialization/encryption.ts — composable encrypt/decrypt layer
- serialization/workflow.ts — synchronous, no encryption, for VM use
- serialization/step.ts — async with encryption, for step handler
- serialization/client.ts — async with encryption, for start() API
- serialization/index.ts — re-exports all public API
- serialization/serialization.test.ts — 25 focused tests

All modes compose their reducer/reviver sets from the shared building blocks.
Cross-mode compatibility verified: data serialized in any mode can be
deserialized in any other mode (for common types).

Existing 108 serialization tests continue to pass unchanged.

- Add ./serialization/workflow export to @workflow/core package.json
- Add ./internal/serialization re-export to workflow meta-package
- The workflow bundle can now import serialize/deserialize via:
  import { serialize, deserialize } from 'workflow/internal/serialization'

Full test suite passes: 493 tests across 22 files (including 25 new
serialization module tests).

1. Fix reducer composition order: Class/Instance reducers now come BEFORE
   common reducers in all three modes (workflow, step, client). This ensures
   custom Error subclasses with WORKFLOW_SERIALIZE are handled by the
   Instance reducer before the generic Error reducer (devalue uses
   first-match-wins semantics).

2. Fix encryption decrypt() to fail fast when encrypted data is encountered
   without a decryption key, instead of silently returning encrypted bytes
   that would fail later with an unhelpful format error.

3. Remove Request/Response from common reducers — they don't have matching
   common revivers, so including them caused asymmetric behavior (serialize
   as Request, deserialize as plain object). Request/Response handling
   belongs in mode-specific modules that can provide proper revivers.

4. Document Node.js dependency in the workflow serialization re-export.
   The current implementation uses node:util and Buffer. For the QuickJS
   VM (snapshot runtime), these will need polyfills — tracked separately.
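
The ordering fix in item 1 can be illustrated with a tiny first-match-wins dispatcher. This sketch does not use devalue itself: `Reducer`, `Reducers`, and `reduceFirstMatch` are hypothetical stand-ins that only mimic the semantics described above (reducers tried in insertion order, first non-`undefined` result wins). `Symbol.for('workflow-serialize')` is the opt-in symbol named in the PR description.

```typescript
// A reducer returns a payload when it recognizes the value,
// or undefined to let the next reducer try.
type Reducer = (value: unknown) => unknown;
type Reducers = Record<string, Reducer>;

const WORKFLOW_SERIALIZE = Symbol.for('workflow-serialize');

// Instance reducer: handles any object that opts in via the custom symbol.
const classReducers: Reducers = {
  Instance: (v) => {
    if (typeof v === 'object' && v !== null && WORKFLOW_SERIALIZE in v) {
      const fn = (v as Record<symbol, unknown>)[WORKFLOW_SERIALIZE];
      return typeof fn === 'function' ? fn.call(v) : undefined;
    }
    return undefined;
  },
};

// Generic Error reducer: keeps only name and message.
const commonReducers: Reducers = {
  Error: (v) => (v instanceof Error ? { name: v.name, message: v.message } : undefined),
};

// Object spread preserves insertion order, so class reducers are tried first.
const reducers: Reducers = { ...classReducers, ...commonReducers };

function reduceFirstMatch(value: unknown, rs: Reducers): [string, unknown] | undefined {
  for (const [name, reduce] of Object.entries(rs)) {
    const out = reduce(value);
    if (out !== undefined) return [name, out];
  }
  return undefined;
}

// Usage: an Error that opts into custom serialization.
const err = Object.assign(new Error('rate limited'), {
  [WORKFLOW_SERIALIZE]: () => ({ retryAfterMs: 500 }),
});
reduceFirstMatch(err, reducers); // returns ['Instance', { retryAfterMs: 500 }]
```

Swapping the spread order (`{ ...commonReducers, ...classReducers }`) makes the generic Error reducer win and silently drops `retryAfterMs`, which is the bug the reordering fixes.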
The Codec interface now takes a SerializationMode ('workflow', 'step',
'client') instead of raw reducers/revivers. The reducer/reviver
composition is internal to the devalue codec implementation.

This is the right abstraction because reducers/revivers are devalue-
specific concepts. A future CBOR codec would handle Date, typed arrays,
Map, Set natively via the CBOR type system — it wouldn't use reducers
at all. A JSON codec would only support standard JSON types.

The mode-specific modules (workflow.ts, step.ts, client.ts) are now
simpler — they just pass the mode string to the codec.
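
A minimal sketch of the mode-based Codec shape described above. The member names follow the commit messages (`formatPrefix`, `serialize`, `deserialize`, optional `deserializeLegacy`, `SerializationMode`), but the exact signatures are assumptions, and the JSON codec is a hypothetical alternative format, not something this PR ships.

```typescript
type SerializationMode = 'workflow' | 'step' | 'client';

interface Codec {
  /** 4-character tag written in front of every payload, e.g. 'devl'. */
  readonly formatPrefix: string;
  serialize(value: unknown, mode: SerializationMode): Uint8Array;
  deserialize(data: Uint8Array, mode: SerializationMode): unknown;
  /** Optional hook for payloads written before format prefixes existed. */
  deserializeLegacy?(data: Uint8Array, mode: SerializationMode): unknown;
}

// Hypothetical JSON codec: supports only plain JSON types, so it ignores the
// mode. A devalue codec would pick a mode-specific reducer set here instead.
const jsonCodec: Codec = {
  formatPrefix: 'json',
  serialize(value) {
    const json = JSON.stringify(value);
    if (json === undefined) throw new Error('value is not JSON-serializable');
    return new TextEncoder().encode(this.formatPrefix + json);
  },
  deserialize(data) {
    const text = new TextDecoder().decode(data);
    if (!text.startsWith(this.formatPrefix)) {
      throw new Error(`expected '${this.formatPrefix}' prefix, got '${text.slice(0, 4)}'`);
    }
    return JSON.parse(text.slice(this.formatPrefix.length));
  },
};
```

Keeping reducers out of the interface is what lets a codec like this one exist at all: it has no notion of reducers, only a prefix and a byte encoding.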
The format prefix is now a branded string type validated by
isFormatPrefix() — any 4-character [a-z0-9] string is valid.
This removes the hard-coded enum of known formats, making the system
truly open for extension:

  type FormatPrefix = string & { __brand: 'FormatPrefix' };
  function isFormatPrefix(value: string): value is FormatPrefix;

The SerializationFormat object still provides well-known constants
('devl', 'encr') but they're now just typed constants, not an
exhaustive enum.

peekFormatPrefix() and decodeFormatPrefix() use isFormatPrefix() for
validation instead of checking against a known list. Unknown but valid
prefixes (e.g. 'cbor', 'json', 'v2b1') are accepted — the caller
decides whether they can handle the format.

6 new isFormatPrefix tests covering: valid strings, too short, too long,
uppercase, special characters. 1 new test for unknown-but-valid prefixes.
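
The branded-prefix check and `peekFormatPrefix()` described above might look roughly like this. The regex mirrors the stated rule (exactly four `[a-z0-9]` characters); the assumption that the prefix is stored as four ASCII bytes at the start of the payload, the `DEVALUE` / `ENCRYPTED` constant names, and the `encodeWithPrefix` helper are illustrative, not the PR's code.

```typescript
type FormatPrefix = string & { __brand: 'FormatPrefix' };

// Any 4-character lowercase-alphanumeric string is a valid prefix; callers
// decide whether they can actually handle a given (possibly unknown) format.
function isFormatPrefix(value: string): value is FormatPrefix {
  return /^[a-z0-9]{4}$/.test(value);
}

// Well-known constants, not an exhaustive enum (names are hypothetical).
const SerializationFormat = {
  DEVALUE: 'devl' as FormatPrefix,
  ENCRYPTED: 'encr' as FormatPrefix,
};

// Read the first 4 bytes as ASCII without consuming the payload.
function peekFormatPrefix(data: Uint8Array): FormatPrefix | null {
  if (data.length < 4) return null;
  const candidate = String.fromCharCode(data[0], data[1], data[2], data[3]);
  return isFormatPrefix(candidate) ? candidate : null;
}

// Prepend a prefix to an already-encoded payload.
function encodeWithPrefix(prefix: FormatPrefix, payload: Uint8Array): Uint8Array {
  const out = new Uint8Array(4 + payload.length);
  for (let i = 0; i < 4; i++) out[i] = prefix.charCodeAt(i);
  out.set(payload, 4);
  return out;
}
```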
Proves that data serialized by the new modules can be deserialized
by the old serialization.ts functions, and vice versa. This validates
that the new modules are wire-format compatible and safe for incremental
migration:

- new workflow.serialize → old hydrateStepReturnValue (primitives, Date, Map, nested)
- old dehydrateStepReturnValue → new workflow.deserialize (primitives, Date, nested)
- old dehydrateWorkflowArguments → new workflow.deserialize
- new client.serialize → old hydrateWorkflowArguments
- new step.serialize + encryption → old hydrateStepArguments + decryption
- old dehydrateStepArguments + encryption → new step.deserialize + decryption

All 11 tests pass, confirming the new and old modules produce identical
wire formats and can coexist during the migration.

Phase 1 of the VM snapshot runtime (RFC #1298).

World interface changes (packages/world):
- Add SnapshotMetadata type (lastEventId, createdAt) with zod schema
- Add snapshots sub-interface to Storage: save(), load(), delete()
- Export new types and schema from @workflow/world

world-local implementation (packages/world-local):
- Filesystem-based snapshot storage in {dataDir}/snapshots/
- {runId}.bin for serialized VM snapshot data
- {runId}.json for metadata (lastEventId, createdAt)
- save() overwrites existing snapshots (atomic via ensureDir + write)
- load() returns null if no snapshot exists
- delete() removes both files
- Wired into createStorage() with tracing instrumentation
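
A simplified sketch of the filesystem layout above (`{dataDir}/snapshots/{runId}.bin` plus `{runId}.json`). It uses synchronous `node:fs` calls for brevity, and the store shape and method signatures are assumptions. It also swaps in a write-to-temp-then-rename step in place of the plain write, a common way to get an atomic replace of each file on POSIX filesystems; the two files are still written one after the other.

```typescript
import { existsSync, mkdirSync, mkdtempSync, readFileSync, renameSync, rmSync, writeFileSync } from 'node:fs';
import { tmpdir } from 'node:os';
import { join } from 'node:path';

interface SnapshotMetadata {
  lastEventId: string;
  createdAt: string; // ISO-8601 timestamp
}

// Write to a temp file, then rename over the target.
function writeAtomic(path: string, data: Uint8Array | string): void {
  const tmp = `${path}.${process.pid}.tmp`;
  writeFileSync(tmp, data);
  renameSync(tmp, path);
}

function createSnapshotStore(dataDir: string) {
  const dir = join(dataDir, 'snapshots');
  const bin = (runId: string) => join(dir, `${runId}.bin`);
  const meta = (runId: string) => join(dir, `${runId}.json`);
  return {
    save(runId: string, data: Uint8Array, metadata: SnapshotMetadata): void {
      mkdirSync(dir, { recursive: true });
      writeAtomic(bin(runId), data); // overwrites any previous snapshot
      writeAtomic(meta(runId), JSON.stringify(metadata));
    },
    load(runId: string): { data: Uint8Array; metadata: SnapshotMetadata } | null {
      if (!existsSync(bin(runId)) || !existsSync(meta(runId))) return null;
      return {
        data: new Uint8Array(readFileSync(bin(runId))),
        metadata: JSON.parse(readFileSync(meta(runId), 'utf8')) as SnapshotMetadata,
      };
    },
    delete(runId: string): void {
      rmSync(bin(runId), { force: true }); // force: no error if already gone
      rmSync(meta(runId), { force: true });
    },
  };
}

// Usage against a throwaway directory (run and event IDs are made up).
const store = createSnapshotStore(mkdtempSync(join(tmpdir(), 'wf-')));
store.save('wrun_123', new Uint8Array([1, 2, 3]), {
  lastEventId: 'evnt_1',
  createdAt: new Date(0).toISOString(),
});
```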
Phase 2 of the VM snapshot runtime (RFC #1298).

- Add quickjs-wasi dependency to @workflow/core
- Create snapshot-runtime.ts with the basic structure:
  - runSnapshotWorkflow() entry point
  - Fresh VM creation with deterministic WASI clock and seeded Math.random
  - Snapshot restore path (TODO: event processing)
  - Host function stubs for useStep, sleep, createHook via Symbol.for()
  - Interrupt handler (30s timeout)
  - Memory limit (64MB)
  - Snapshot serialization on suspension

The useStep, sleep, and createHook host functions are stubs with TODO
markers; the basic VM lifecycle and snapshot/restore flow is in place.

Demonstrates the core snapshot/restore mechanism with a compiled
workflow pattern:
- useStep implemented inside QuickJS as JS code (not host functions)
- Pending step resolve/reject functions stored on globalThis.__resolvers
- Step metadata (stepId, args) preserved across snapshot/restore
- Multi-step workflow: snapshot at each suspension, restore and resolve,
  workflow continues from exact suspension point
- Both tests pass: simple workflow + metadata preservation

The snapshot runtime (runSnapshotWorkflow) now handles the complete
workflow lifecycle:

- First run: bootstrap VM with workflow primitives, evaluate compiled
  workflow bundle, start workflow function, process any existing events
- Snapshot: capture VM state when workflow suspends on step/sleep
- Restore: deserialize snapshot, process delta events to resolve/reject
  pending promises, execute pending jobs
- Completion: detect workflow result or error

Workflow primitives (useStep, sleep) are implemented as JavaScript code
inside the QuickJS VM, not as host function callbacks. This keeps the
implementation simple — the host communicates by evaluating small JS
snippets to resolve/reject promises.

7 tests covering: simple completion, step suspension, snapshot/restore
with step completion, multi-step across 3 snapshots, sleep suspension
and wake, step failure with try/catch.
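
The "host evaluates small JS snippets" approach above can be sketched as a snippet builder. `__resolvers` matches the global named in the earlier commit, but the snippet shape, the `{ resolve, reject }` entries, and `buildResolveSnippet` are assumptions; `new Function` stands in for evaluating code inside QuickJS so the sketch runs in plain Node. Embedding values with `JSON.stringify` keeps them as literal data rather than code, which also means this approach only carries JSON-safe values (a limitation the later devalue-in-VM commits remove).

```typescript
type Settler = { resolve(value: unknown): void; reject(error: unknown): void };

type Outcome = { ok: true; value: unknown } | { ok: false; message: string };

// Build a snippet that settles a pending step promise inside the VM.
// JSON.stringify quotes both the id and the payload, so neither can inject code.
function buildResolveSnippet(correlationId: string, outcome: Outcome): string {
  const target = `__resolvers[${JSON.stringify(correlationId)}]`;
  return outcome.ok
    ? `${target}.resolve(${JSON.stringify(outcome.value)});`
    : `${target}.reject(new Error(${JSON.stringify(outcome.message)}));`;
}

// Stand-in for evaluating the snippet in QuickJS: bind __resolvers explicitly.
function evalInFakeVm(snippet: string, resolvers: Record<string, Settler>): void {
  new Function('__resolvers', snippet)(resolvers);
}
```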
…napshot flag

- Add snapshot-entrypoint.ts that handles the full lifecycle:
  snapshot load → event fetching → runSnapshotWorkflow → result handling
  (create events, queue steps, save/delete snapshots)
- Add feature flag: set WORKFLOW_RUNTIME=snapshot to use the new runtime
- When enabled, the snapshot path runs before the event-replay path
- Step queuing matches the existing step handler's expected payload format
- Wait handling includes timeout calculation for delayed re-queuing
- Extract workflow ID from SWC-compiled bundle's manifest comment

The snapshot runtime now successfully:
1. Evaluates the compiled workflow bundle in QuickJS
2. Suspends on the first step call
3. Snapshots the VM state
4. Creates step_created events and queues step execution

Web API stubs added for TransformStream, ReadableStream, WritableStream,
TextEncoder, TextDecoder, Headers, URL, console — these are referenced
by the compiled bundle but not needed for basic step/sleep workflows.

Remaining issue: step_created events use raw JSON for step input args,
but the step handler expects devalue-serialized data. This is the data
serialization boundary that needs to be resolved (RFC #1298 discusses
moving devalue inside the QuickJS VM).
…untime

The step_created events now contain properly devalue-serialized input
data (Uint8Array with 'devl' format prefix) instead of raw JSON.
This makes the step handler's hydrateStepArguments() work correctly.

When processing step_completed events, the output is deserialized
via workflow.deserialize() on the host side before passing to the
QuickJS VM as JSON. This handles the devalue format prefix correctly.

Also properly serializes the run_completed output.

Step arguments are now wrapped in { args: [...], closureVars?: {...} }
before being serialized with workflow.serialize(), matching the format
expected by the step handler's hydrateStepArguments().

The step handler successfully:
- Receives the step message
- Deserializes the step arguments
- Executes the step function (add(10, 7))
- Handles retry on retryable errors
- Completes the step and re-queues the workflow

New files:
- serialization/base64.ts — pure-JS base64 encode/decode (no Buffer)
- serialization/reducers/common-vm.ts — VM-compatible reducers using
  instanceof Error instead of types.isNativeError(), pure-JS base64
  instead of Buffer
- serialization/codec-devalue-vm.ts — devalue codec using VM reducers
- serialization/workflow-vm.ts — VM workflow serialize/deserialize

The VM serializer produces the EXACT same wire format as the Node.js
serializer (devl-prefixed devalue data). Verified by 14 tests including
critical cross-compatibility:
- VM serialize → Node.js hydrateStepArguments (step handler path)
- Node.js dehydrateStepReturnValue → VM deserialize (step result path)
- Pure-JS base64 matches Node.js Buffer base64

Sub-path export: @workflow/core/serialization/workflow-vm
Re-export: workflow/internal/serialization now points to workflow-vm
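
A pure-JS base64 codec in the spirit of serialization/base64.ts (no `Buffer`, so it can run inside QuickJS). This is an illustrative re-implementation of standard RFC 4648 base64 with `=` padding, not the file from the PR; the test compares it against Node's `Buffer` output, mirroring the cross-compatibility check the commit describes.

```typescript
const ALPHABET = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/';

function base64Encode(bytes: Uint8Array): string {
  let out = '';
  for (let i = 0; i < bytes.length; i += 3) {
    // Pack up to 3 bytes into one 24-bit group, zero-filling missing bytes.
    const b0 = bytes[i];
    const b1 = i + 1 < bytes.length ? bytes[i + 1] : 0;
    const b2 = i + 2 < bytes.length ? bytes[i + 2] : 0;
    const group = (b0 << 16) | (b1 << 8) | b2;
    out += ALPHABET[(group >> 18) & 63] + ALPHABET[(group >> 12) & 63];
    // Emit '=' padding for the characters that only cover missing bytes.
    out += i + 1 < bytes.length ? ALPHABET[(group >> 6) & 63] : '=';
    out += i + 2 < bytes.length ? ALPHABET[group & 63] : '=';
  }
  return out;
}

function base64Decode(text: string): Uint8Array {
  const clean = text.replace(/=+$/, '');
  const out = new Uint8Array(Math.floor((clean.length * 3) / 4));
  let buffer = 0; // pending bits (only the low 16 are kept)
  let bits = 0; // number of valid pending bits
  let o = 0;
  for (const ch of clean) {
    const v = ALPHABET.indexOf(ch);
    if (v < 0) throw new Error(`invalid base64 character: ${ch}`);
    buffer = ((buffer << 6) | v) & 0xffff;
    bits += 6;
    if (bits >= 8) {
      bits -= 8;
      out[o++] = (buffer >> bits) & 0xff;
    }
  }
  return out;
}
```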
Data now flows as format-prefixed devalue bytes (devl + devalue.stringify)
across the VM boundary, with no JSON conversion in the middle:

Step args: VM __wdk_serialize({args}) → Uint8Array → event input
Step results: event output Uint8Array → VM __wdk_deserialize → value
Workflow result: VM __wdk_serialize(result) → Uint8Array → event output

Host functions __wdk_serialize/__wdk_deserialize are installed on
globalThis and use the VM-compatible workflow serializer (pure JS,
no Node.js deps). They are re-installed after snapshot restore since
host callbacks don't survive the snapshot.

VM-compatible serializer (workflow-vm.ts) produces the EXACT same
wire format as the Node.js serializer — verified by cross-compatibility
tests.

The serializer (devalue + reducers + TextEncoder/TextDecoder polyfills)
is now bundled as a 16.6KB IIFE that's evaluated inside the QuickJS VM
during bootstrap. The serialize/deserialize functions are real JS
functions running inside the VM, operating on QuickJS-native values
(Date, Map, Set, etc.) that can't cross the VM boundary via dump().

Architecture:
- vm-bundle-entry.ts is bundled by esbuild into a self-contained IIFE
- esbuild inject option ensures TextEncoder/TextDecoder polyfills run
  before any module-level code
- The host only passes opaque Uint8Array blobs (devl-prefixed devalue)
  across the VM boundary
- On snapshot restore, the serde functions survive in the QuickJS heap
  (no re-registration needed)

New files:
- polyfills/text-encoder.ts — pure JS TextEncoder (from nx.js)
- polyfills/text-decoder.ts — pure JS TextDecoder (from nx.js)
- polyfills/install-text-coding.ts — installs polyfills on globalThis
- serialization/vm-bundle-entry.ts — esbuild entry for VM serde bundle
- runtime/vm-serde-bundle.generated.ts — auto-generated bundle string
- scripts/build-vm-serde-bundle.js — build script (runs during pnpm build)

Removed: installSerdeHostFunctions (no longer needed — serde is in-VM)
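
The core of a pure-JS TextEncoder polyfill is a UTF-8 encoder over UTF-16 code units. The function below is an illustrative sketch, not the nx.js-derived polyfill the commit vendors. It replaces lone surrogates with U+FFFD, as the WHATWG `TextEncoder` does, so the test can compare it byte-for-byte against Node's built-in encoder.

```typescript
function utf8Encode(input: string): Uint8Array {
  const out: number[] = [];
  for (let i = 0; i < input.length; i++) {
    let cp = input.charCodeAt(i);
    // Combine a high surrogate with a following low surrogate into one code point.
    if (cp >= 0xd800 && cp <= 0xdbff && i + 1 < input.length) {
      const next = input.charCodeAt(i + 1);
      if (next >= 0xdc00 && next <= 0xdfff) {
        cp = 0x10000 + ((cp - 0xd800) << 10) + (next - 0xdc00);
        i++;
      }
    }
    // Any surrogate still unpaired at this point becomes U+FFFD.
    if (cp >= 0xd800 && cp <= 0xdfff) cp = 0xfffd;

    if (cp < 0x80) {
      out.push(cp); // 1 byte: ASCII
    } else if (cp < 0x800) {
      out.push(0xc0 | (cp >> 6), 0x80 | (cp & 0x3f)); // 2 bytes
    } else if (cp < 0x10000) {
      out.push(0xe0 | (cp >> 12), 0x80 | ((cp >> 6) & 0x3f), 0x80 | (cp & 0x3f)); // 3 bytes
    } else {
      out.push(
        0xf0 | (cp >> 18), // 4 bytes: supplementary planes
        0x80 | ((cp >> 12) & 0x3f),
        0x80 | ((cp >> 6) & 0x3f),
        0x80 | (cp & 0x3f),
      );
    }
  }
  return Uint8Array.from(out);
}
```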
…ecution

The snapshot metadata now stores eventsCursor (the pagination cursor from
events.list()) instead of lastEventId (the raw event ID). The world-local
pagination expects cursors in 'timestamp|id' format, not raw event IDs.

This fix enables the full workflow lifecycle:
1. First invocation: QuickJS VM evaluates workflow, suspends on step_0
2. Step handler executes add(10, 7) = 17
3. Second invocation: snapshot restored, step_0 resolved, suspends on step_1
4. Step handler executes add(17, 8) = 25
5. Third invocation: snapshot restored, both steps resolved, workflow completes
6. run_completed event created, snapshot cleaned up

Verified end-to-end with the nextjs-turbopack workbench:
- All events created correctly (run_created → run_completed)
- Step retries work (the add function throws on first attempt)
- Snapshots are saved/restored/deleted at correct lifecycle points
- Run status transitions to 'completed'
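
The `timestamp|id` cursor format can be illustrated with a small helper that resumes strictly after the cursor position. The event shape, `makeCursor`, and `eventsAfter` are hypothetical; the only detail taken from the commit is that world-local cursors are `timestamp|id` strings rather than raw event IDs. Ordering by (timestamp, id) gives a total order, so events that share a timestamp are neither skipped nor processed twice.

```typescript
interface WorkflowEvent {
  eventId: string;
  createdAt: number; // epoch milliseconds
  type: string;
}

type Position = Pick<WorkflowEvent, 'createdAt' | 'eventId'>;

function makeCursor(e: Position): string {
  return `${e.createdAt}|${e.eventId}`;
}

// Assumed sort order: by createdAt, then by eventId as a tie-breaker.
function compareEvents(a: Position, b: Position): number {
  if (a.createdAt !== b.createdAt) return a.createdAt - b.createdAt;
  return a.eventId < b.eventId ? -1 : a.eventId > b.eventId ? 1 : 0;
}

// Return only events strictly after the cursor, in order.
function eventsAfter(events: WorkflowEvent[], cursor: string | null): WorkflowEvent[] {
  const sorted = [...events].sort(compareEvents);
  if (cursor === null) return sorted;
  const sep = cursor.indexOf('|');
  const pos: Position = { createdAt: Number(cursor.slice(0, sep)), eventId: cursor.slice(sep + 1) };
  return sorted.filter((e) => compareEvents(e, pos) > 0);
}
```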
@changeset-bot

changeset-bot Bot commented Mar 9, 2026

🦋 Changeset detected

Latest commit: 8349c88

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 20 packages
| Name | Type |
| --- | --- |
| @workflow/core | Minor |
| @workflow/world-local | Minor |
| @workflow/world-postgres | Minor |
| @workflow/world-vercel | Minor |
| @workflow/builders | Patch |
| @workflow/cli | Patch |
| @workflow/next | Patch |
| @workflow/nitro | Patch |
| @workflow/vitest | Patch |
| @workflow/web-shared | Patch |
| @workflow/web | Patch |
| workflow | Minor |
| @workflow/world-testing | Patch |
| @workflow/astro | Patch |
| @workflow/nest | Patch |
| @workflow/rollup | Patch |
| @workflow/sveltekit | Patch |
| @workflow/vite | Patch |
| @workflow/nuxt | Patch |
| @workflow/ai | Major |


@vercel
Contributor

vercel Bot commented Mar 9, 2026

The latest updates on your projects.

| Project | Deployment | Updated (UTC) |
| --- | --- | --- |
| example-nextjs-workflow-turbopack | Ready | May 1, 2026 7:21am |
| example-nextjs-workflow-webpack | Ready | May 1, 2026 7:21am |
| example-workflow | Ready | May 1, 2026 7:21am |
| workbench-astro-workflow | Ready | May 1, 2026 7:21am |
| workbench-express-workflow | Ready | May 1, 2026 7:21am |
| workbench-fastify-workflow | Ready | May 1, 2026 7:21am |
| workbench-hono-workflow | Ready | May 1, 2026 7:21am |
| workbench-nitro-workflow | Ready | May 1, 2026 7:21am |
| workbench-nuxt-workflow | Ready | May 1, 2026 7:21am |
| workbench-sveltekit-workflow | Ready | May 1, 2026 7:21am |
| workbench-vite-workflow | Ready | May 1, 2026 7:21am |
| workflow | Error | May 1, 2026 7:21am |
| workflow-docs | Ready | May 1, 2026 7:21am |
| workflow-nest | Ready | May 1, 2026 7:21am |
| workflow-swc-playground | Ready | May 1, 2026 7:21am |
| workflow-web | Ready | May 1, 2026 7:21am |

@github-actions
Contributor

github-actions Bot commented Mar 9, 2026

🧪 E2E Test Results

All tests passed

Summary

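| Category | Passed | Failed | Skipped | Total |
| --- | --- | --- | --- | --- |
| ✅ ▲ Vercel Production | 1956 | 0 | 134 | 2090 |
| ✅ 💻 Local Development | 2108 | 0 | 172 | 2280 |
| ✅ 📦 Local Production | 2108 | 0 | 172 | 2280 |
| ✅ 🐘 Local Postgres | 2108 | 0 | 172 | 2280 |
| ✅ 🪟 Windows | 190 | 0 | 0 | 190 |
| ✅ 📋 Other | 534 | 0 | 36 | 570 |
| Total | 9004 | 0 | 686 | 9690 |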

Details by Category

✅ ▲ Vercel Production
App Passed Failed Skipped
✅ astro-replay 88 0 7
✅ astro-snapshot 88 0 7
✅ example-replay 88 0 7
✅ example-snapshot 88 0 7
✅ express-replay 88 0 7
✅ express-snapshot 88 0 7
✅ fastify-replay 88 0 7
✅ fastify-snapshot 88 0 7
✅ hono-replay 88 0 7
✅ hono-snapshot 88 0 7
✅ nextjs-turbopack-replay 93 0 2
✅ nextjs-turbopack-snapshot 93 0 2
✅ nextjs-webpack-replay 93 0 2
✅ nextjs-webpack-snapshot 93 0 2
✅ nitro-replay 88 0 7
✅ nitro-snapshot 88 0 7
✅ nuxt-replay 88 0 7
✅ nuxt-snapshot 88 0 7
✅ sveltekit-replay 88 0 7
✅ sveltekit-snapshot 88 0 7
✅ vite-replay 88 0 7
✅ vite-snapshot 88 0 7
✅ 💻 Local Development
App Passed Failed Skipped
✅ astro-stable-replay 89 0 6
✅ astro-stable-snapshot 89 0 6
✅ express-stable-replay 89 0 6
✅ express-stable-snapshot 89 0 6
✅ fastify-stable-replay 89 0 6
✅ fastify-stable-snapshot 89 0 6
✅ hono-stable-replay 89 0 6
✅ hono-stable-snapshot 89 0 6
✅ nextjs-turbopack-canary-replay 76 0 19
✅ nextjs-turbopack-canary-snapshot 76 0 19
✅ nextjs-turbopack-stable-replay 95 0 0
✅ nextjs-turbopack-stable-snapshot 95 0 0
✅ nextjs-webpack-canary-replay 76 0 19
✅ nextjs-webpack-canary-snapshot 76 0 19
✅ nextjs-webpack-stable-replay 95 0 0
✅ nextjs-webpack-stable-snapshot 95 0 0
✅ nitro-stable-replay 89 0 6
✅ nitro-stable-snapshot 89 0 6
✅ nuxt-stable-replay 89 0 6
✅ nuxt-stable-snapshot 89 0 6
✅ sveltekit-stable-replay 89 0 6
✅ sveltekit-stable-snapshot 89 0 6
✅ vite-stable-replay 89 0 6
✅ vite-stable-snapshot 89 0 6
✅ 📦 Local Production
App Passed Failed Skipped
✅ astro-stable-replay 89 0 6
✅ astro-stable-snapshot 89 0 6
✅ express-stable-replay 89 0 6
✅ express-stable-snapshot 89 0 6
✅ fastify-stable-replay 89 0 6
✅ fastify-stable-snapshot 89 0 6
✅ hono-stable-replay 89 0 6
✅ hono-stable-snapshot 89 0 6
✅ nextjs-turbopack-canary-replay 76 0 19
✅ nextjs-turbopack-canary-snapshot 76 0 19
✅ nextjs-turbopack-stable-replay 95 0 0
✅ nextjs-turbopack-stable-snapshot 95 0 0
✅ nextjs-webpack-canary-replay 76 0 19
✅ nextjs-webpack-canary-snapshot 76 0 19
✅ nextjs-webpack-stable-replay 95 0 0
✅ nextjs-webpack-stable-snapshot 95 0 0
✅ nitro-stable-replay 89 0 6
✅ nitro-stable-snapshot 89 0 6
✅ nuxt-stable-replay 89 0 6
✅ nuxt-stable-snapshot 89 0 6
✅ sveltekit-stable-replay 89 0 6
✅ sveltekit-stable-snapshot 89 0 6
✅ vite-stable-replay 89 0 6
✅ vite-stable-snapshot 89 0 6
✅ 🐘 Local Postgres
App Passed Failed Skipped
✅ astro-stable-replay 89 0 6
✅ astro-stable-snapshot 89 0 6
✅ express-stable-replay 89 0 6
✅ express-stable-snapshot 89 0 6
✅ fastify-stable-replay 89 0 6
✅ fastify-stable-snapshot 89 0 6
✅ hono-stable-replay 89 0 6
✅ hono-stable-snapshot 89 0 6
✅ nextjs-turbopack-canary-replay 76 0 19
✅ nextjs-turbopack-canary-snapshot 76 0 19
✅ nextjs-turbopack-stable-replay 95 0 0
✅ nextjs-turbopack-stable-snapshot 95 0 0
✅ nextjs-webpack-canary-replay 76 0 19
✅ nextjs-webpack-canary-snapshot 76 0 19
✅ nextjs-webpack-stable-replay 95 0 0
✅ nextjs-webpack-stable-snapshot 95 0 0
✅ nitro-stable-replay 89 0 6
✅ nitro-stable-snapshot 89 0 6
✅ nuxt-stable-replay 89 0 6
✅ nuxt-stable-snapshot 89 0 6
✅ sveltekit-stable-replay 89 0 6
✅ sveltekit-stable-snapshot 89 0 6
✅ vite-stable-replay 89 0 6
✅ vite-stable-snapshot 89 0 6
✅ 🪟 Windows
App Passed Failed Skipped
✅ nextjs-turbopack-replay 95 0 0
✅ nextjs-turbopack-snapshot 95 0 0
✅ 📋 Other
App Passed Failed Skipped
✅ e2e-local-dev-nest-stable-replay 89 0 6
✅ e2e-local-dev-nest-stable-snapshot 89 0 6
✅ e2e-local-postgres-nest-stable-replay 89 0 6
✅ e2e-local-postgres-nest-stable-snapshot 89 0 6
✅ e2e-local-prod-nest-stable-replay 89 0 6
✅ e2e-local-prod-nest-stable-snapshot 89 0 6


Contributor

@vercel vercel Bot left a comment


Additional Suggestion:

The Storage interface requires a snapshots property but packages/world-vercel/src/storage.ts does not implement it, causing TypeScript build failures (TS2741).


- Extract workflow arguments from run_created event and pass to the
  workflow function via __wdk_deserialize()
- Call executePendingJobs() after each step_completed/step_failed/
  wait_completed event to allow async function await resumptions
  to unwind one step at a time
- Add debug logging for workflow result bytes

The addTenWorkflow e2e test is still failing: the workflow result bytes
are 'devl-1' (the 'devl' prefix followed by '-1', devalue's encoding of undefined) even though all steps complete
successfully. The issue appears to be that the async function return
value is not propagating through the SWC-compiled workflow bundle's
promise chain. This needs investigation — the unit tests with simple
inline workflow code work correctly.

Adds snapshot.* semantic conventions and threads the parent
`WORKFLOW {workflowName}` span into the snapshot entrypoint and VM
runner so operators can see snapshot-restore latency, snapshot size,
encrypt/decrypt overhead, and event-fetch behavior in their traces.

Attributes attached to the parent span:
- snapshot.runtime ('snapshot' | 'replay')
- snapshot.invocation_kind ('first' | 'restore')
- snapshot.outcome ('completed' | 'suspended' | 'failed')
- snapshot.events.preloaded, .fetched_count, .fetched_pages
- snapshot.pending_ops_count, .events_cursor
- snapshot.{load,save,delete,decrypt,encrypt,deserialize,serialize}.duration_ms
- snapshot.{load,save}.bytes, snapshot.save.plaintext_bytes

Two child spans:
- snapshot.load — wraps world.snapshots.load + decrypt (deserialize
  duration is recorded as an attribute since it occurs inside the
  VM runner where the load span is no longer in scope).
- snapshot.save — wraps QuickJS.serializeSnapshot + encrypt +
  world.snapshots.save.

No metrics histograms — the codebase has no metric pipeline yet, so
this matches the existing attributes-on-spans convention used by the
replay runtime.

Previously the seedrandom seed for each VM invocation was
`runId:workflowName:startedAt` — constant across all resumptions of
a run. Each restore re-initialized the RNG from that same seed and
replayed the first-N draws, so the VM's `__generateUlid` and
`__generateNanoid` produced identical IDs on every resumption.
That collapsed the hasCreatedEvent dedup guard and caused step /
hook correlation IDs to drift between invocations.

Mix `existingSnapshot.metadata.eventsCursor` into the seed when
restoring. The cursor is stable for retries of the same resumption
(idempotent within a single resume) but advances across resumes,
which is exactly the determinism boundary we want.
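
The seed derivation can be sketched as follows. The PR uses seedrandom; to keep this self-contained the sketch swaps in an FNV-1a string hash feeding a mulberry32 generator, and `deriveSeed` / `seededRandom` are hypothetical names. It demonstrates the property described above: retries of the same resumption (same cursor) draw identical sequences, while the next resumption (advanced cursor) draws a different one.

```typescript
// Base seed from the commit (runId:workflowName:startedAt), with the
// snapshot's eventsCursor appended when restoring.
function deriveSeed(runId: string, workflowName: string, startedAt: string, eventsCursor?: string): string {
  const base = `${runId}:${workflowName}:${startedAt}`;
  return eventsCursor === undefined ? base : `${base}:${eventsCursor}`;
}

// FNV-1a 32-bit hash of the seed string (stand-in for seedrandom's seeding).
function fnv1a(text: string): number {
  let h = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    h ^= text.charCodeAt(i);
    h = Math.imul(h, 0x01000193);
  }
  return h >>> 0;
}

// mulberry32: a small deterministic PRNG returning floats in [0, 1).
function seededRandom(seed: string): () => number {
  let state = fnv1a(seed);
  return () => {
    state = (state + 0x6d2b79f5) | 0;
    let t = state;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}
```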
…invocations

Two queue messages for the same workflow run can be processed
concurrently by separate workflow handler instances. The replay
runtime is naturally idempotent (full event-log replay produces
deterministic correlationIds via the seeded PRNG), but the snapshot
runtime previously used `ulid(Date.now())` for correlationIds —
concurrent VMs called it at slightly different milliseconds and produced different
ULIDs even though the seeded PRNG portion was identical. The world
had no way to dedup these as duplicates, so a single logical step
became two step_created events with two independent step handlers.
For workflows like fibonacciWorkflow that do
`Promise.all([runA.returnValue, runB.returnValue])`, this manifested
as 4 step_created events for 2 logical operations, with 2 of the 4
`Run#returnValue` proxies hanging because nothing wrote their
step_completed.

Inject a deterministic timestamp (`workflowRun.startedAt`, constant
per-run) into the VM as `__ulidTimestamp`. The bundle's
`__generateUlid` reads it instead of `Date.now()` when present,
so concurrent VMs produce identical ULIDs. Distinctness across
resumptions still comes from the cursor mixed into the seedrandom
seed, which advances the PRNG sequence between resumes.
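
A ULID is a 48-bit millisecond timestamp (10 Crockford base32 characters) followed by 80 random bits (16 characters). That is why two VMs with identical seeded PRNG state still produced different IDs while each read its own `Date.now()`. The sketch below shows the fix in miniature by injecting both the timestamp and the random source. `ulidFrom` is a hypothetical name; per the commit, the real bundle reads `__ulidTimestamp` and uses a monotonic ULID factory, which this sketch omits.

```typescript
const CROCKFORD = '0123456789ABCDEFGHJKMNPQRSTVWXYZ';

// Encode a millisecond timestamp as 10 base32 characters, most significant first.
function encodeTime(ms: number): string {
  let out = '';
  let t = ms;
  for (let i = 0; i < 10; i++) {
    out = CROCKFORD[t % 32] + out;
    t = Math.floor(t / 32);
  }
  return out;
}

// 80 random bits as 16 base32 characters, drawn from an injected PRNG.
function encodeRandom(random: () => number): string {
  let out = '';
  for (let i = 0; i < 16; i++) out += CROCKFORD[Math.floor(random() * 32)];
  return out;
}

// Hypothetical: the caller passes a fixed per-run timestamp (e.g. the run's
// startedAt) instead of Date.now(), plus a seeded random source.
function ulidFrom(timestampMs: number, random: () => number): string {
  return encodeTime(timestampMs) + encodeRandom(random);
}
```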
Three unit tests covering:
- Same fresh start (no snapshot) → identical correlationIds across two
  concurrent invocations.
- Same restore (snapshot + same events) → identical correlationIds
  across two concurrent invocations.
- Different resume (cursor advanced) → distinct correlationIds across
  resumes (so EntityConflictError doesn't falsely dedup unrelated
  steps).

The first two tests fail against the pre-fix runtime (different
ULID timestamp portions across concurrent invocations); the third
test was already passing pre-fix because the cursor-mixed seedrandom
seed already produced distinct random portions across resumes.
…-local

Concurrent invocations producing identical correlationIds (as the snapshot
runtime does by design across replays) previously both succeeded and
persisted duplicate events. step_created had no guard at all; wait_created
used a TOCTOU read-then-check that allowed both writers through under
concurrency. Both now claim a per-(runId, correlationId) constraint file
with O_CREAT|O_EXCL before writing, so the loser surfaces as
EntityConflictError — which the runtime's dedup catch path already
handles.
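
In Node, opening a file with the `'wx'` flag includes O_CREAT|O_EXCL and fails with `EEXIST` if the path already exists, so exactly one concurrent writer can claim a given path. The sketch below shows that claim step; the `.lock` file naming, directory layout, and this local `EntityConflictError` class are assumptions standing in for world-local's actual code.

```typescript
import { mkdirSync, mkdtempSync, writeFileSync } from 'node:fs';
import { tmpdir } from 'node:os';
import { join } from 'node:path';

// Local stand-in for the runtime's EntityConflictError.
class EntityConflictError extends Error {
  constructor(message: string) {
    super(message);
    this.name = 'EntityConflictError';
  }
}

// Claim (runId, correlationId) before writing the event. File creation with
// 'wx' is atomic, so a second claim for the same pair fails with EEXIST.
function claimCorrelation(constraintsDir: string, runId: string, correlationId: string): void {
  const dir = join(constraintsDir, runId);
  mkdirSync(dir, { recursive: true });
  try {
    writeFileSync(join(dir, `${correlationId}.lock`), '', { flag: 'wx' });
  } catch (err) {
    if ((err as NodeJS.ErrnoException).code === 'EEXIST') {
      throw new EntityConflictError(`${correlationId} already exists for run ${runId}`);
    }
    throw err;
  }
}

// Usage: the first claim wins; a duplicate surfaces as EntityConflictError.
const root = mkdtempSync(join(tmpdir(), 'wf-constraints-'));
claimCorrelation(root, 'wrun_1', 'step_01ARYZ6S41');
```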
…in world-postgres

Adds a unique partial index on workflow_events(run_id, correlation_id, type)
filtered to step_created/hook_created/wait_created, and translates the
resulting unique-violation (pg code 23505, surfaced via DrizzleQueryError.cause)
into EntityConflictError. The steps table already deduped via
onConflictDoNothing, but the event row still inserted, leaving duplicate
events in the log. Now both rows are kept consistent and the runtime's
existing dedup catch path handles concurrent writers cleanly.
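
A sketch of the two pieces described above: the partial unique index (as a SQL string with a hypothetical index name) and the translation of Postgres `unique_violation` (SQLSTATE 23505) into `EntityConflictError`. Because the commit says the pg error surfaces as `DrizzleQueryError.cause`, the helper checks both the error and its `cause`; the exact error shapes are assumptions, so it only inspects a `code` property.

```typescript
// Hypothetical migration; the real index name may differ.
const CREATE_EVENT_DEDUP_INDEX = `
  CREATE UNIQUE INDEX IF NOT EXISTS workflow_events_run_correlation_type_uniq
  ON workflow_events (run_id, correlation_id, type)
  WHERE type IN ('step_created', 'hook_created', 'wait_created');
`;

// Local stand-in for the runtime's EntityConflictError.
class EntityConflictError extends Error {
  constructor(message: string) {
    super(message);
    this.name = 'EntityConflictError';
  }
}

const UNIQUE_VIOLATION = '23505';

function hasCode(value: unknown, code: string): boolean {
  return typeof value === 'object' && value !== null && (value as { code?: unknown }).code === code;
}

// Translate a unique violation (direct or wrapped in `cause`) into
// EntityConflictError; any other error is returned unchanged.
function translateInsertError(err: unknown): unknown {
  const cause = typeof err === 'object' && err !== null ? (err as { cause?: unknown }).cause : undefined;
  if (hasCode(err, UNIQUE_VIOLATION) || hasCode(cause, UNIQUE_VIOLATION)) {
    return new EntityConflictError('event already exists for this (run_id, correlation_id, type)');
  }
  return err;
}
```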
Three coupled changes in the snapshot entrypoint's suspension handler:

1. Build per-pending-op promises and await them with Promise.all instead
   of running them in a sequential for-loop. Mirrors the replay runtime's
   suspension-handler.ts pattern.
2. Run snapshot.save concurrently with the op dispatch via the same
   Promise.all. The snapshot is an optimization — if save lags or fails,
   the next workflow invocation simply replays from events. Previously
   blocked step queueing on a full storage round-trip.
3. Drop the redundant hooks.list pre-check from the hook_created branch.
   With deterministic correlationIds (snapshot runtime PRNG fix) and
   per-(runId, correlationId) uniqueness in worlds (world-local +
   world-postgres dedup fixes), EntityConflictError on events.create
   is the correct dedup signal and the pre-check is an unnecessary
   round-trip per pending hook.

CI run 25095263499 measured snapshot ~2.37x slower than replay per-test
on Vercel (sum: 2418s vs 1021s); these changes should narrow that gap
considerably on cloud worlds where each storage call is a network
round-trip.

Hook-related e2e tests (hookWorkflow, hookCleanupTestWorkflow,
hookDisposeTestWorkflow, hookWithSleepWorkflow, distributedAbortController)
previously slept a fixed 5 seconds before calling getHookByToken to wait
for the hook to be registered. On slower runtimes — notably the snapshot
runtime on Vercel where each workflow round-trip is several seconds longer
than replay — that fixed budget is too tight and the test fails with
HookNotFoundError. On faster runtimes it's unnecessarily slow.

Adds a waitForHook(token, { timeoutMs, intervalMs, runId }) helper that
polls until the hook resolves or the timeout (default 30s) expires, with
an optional runId filter for token-reuse tests where eventually-consistent
backends may briefly still report a stale hook. Each hook-wait site now
uses this helper. Non-hook fixed sleeps (workflow-progress polling for
sleepingWorkflow cancel tests, payload-processing waits in
hookWithSleepWorkflow) are unchanged.
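A polling helper of the kind described above might look like this sketch, which is not the e2e suite's code. To stay self-contained, it takes a hypothetical `getHook` lookup as a parameter, whereas the real helper calls `getHookByToken`. The 500 ms default interval is an assumption; only the 30 s default timeout comes from the text.

```typescript
// Hypothetical sketch of a hook-polling helper; `getHook` is injected so the
// example is self-contained (the e2e suite calls getHookByToken instead).
interface Hook {
  token: string;
  runId: string;
}

interface WaitOptions {
  timeoutMs?: number; // overall budget (30s default, as in the text)
  intervalMs?: number; // delay between attempts (assumed default)
  runId?: string; // if set, ignore a stale hook owned by another run
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

async function waitForHook(
  token: string,
  getHook: (token: string) => Promise<Hook | null>,
  { timeoutMs = 30_000, intervalMs = 500, runId }: WaitOptions = {}
): Promise<Hook> {
  const deadline = Date.now() + timeoutMs;
  let lastError: unknown = null;
  while (Date.now() < deadline) {
    try {
      const hook = await getHook(token);
      if (hook && (runId === undefined || hook.runId === runId)) return hook;
    } catch (err) {
      lastError = err; // e.g. HookNotFoundError: not registered yet, keep polling
    }
    await sleep(intervalMs);
  }
  throw new Error(`hook ${token} not found within ${timeoutMs}ms (last error: ${String(lastError)})`);
}
```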
The recursion-hazard fixes that motivated the blast-radius cap have all
landed:

  1. Snapshot runtime correlationIds are now deterministic across
     concurrent VM invocations (commit 83bcec — `__ulidTimestamp`
     injection so same-resumption invocations produce identical ULIDs).
  2. The seeded PRNG state is preserved by the VM heap snapshot itself
     (commit a71503 — events cursor mixed into seed; ULID
     monotonicFactory closure persists in the QuickJS heap).
  3. Per-(runId, correlationId) uniqueness is enforced atomically in
     world-local (commit ca0078) and via unique partial index in
     world-postgres (commit 009a00) for step_created / hook_created /
     wait_created.

With those guarantees the duplicate `start()` invocation that previously
fanned out hundreds of thousands of child runs on the fastify deployment
is no longer possible. Restore the full Vercel project matrix
(11 frameworks) and unskip fibonacciWorkflow on Vercel.
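For guarantee 3, a filesystem backend can get per-(runId, correlationId) uniqueness from an exclusive create. On a local filesystem, opening with the `wx` flag (`O_CREAT | O_EXCL`) fails with `EEXIST` if the file already exists, so exactly one concurrent writer wins. The sketch below is hypothetical: the directory layout, `claimCorrelationId` and the local `EntityConflictError` are illustrative, not world-local's actual on-disk format. world-postgres gets the same guarantee from its unique partial index.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Hypothetical error type mirroring the world's dedup signal.
class EntityConflictError extends Error {
  constructor(message: string) {
    super(message);
    this.name = "EntityConflictError";
  }
}

// Claims (runId, correlationId) exactly once; a second claim throws.
function claimCorrelationId(dir: string, runId: string, correlationId: string): void {
  const runDir = path.join(dir, runId);
  fs.mkdirSync(runDir, { recursive: true });
  const lockPath = path.join(runDir, `${correlationId}.lock`);
  try {
    // 'wx' = create exclusively: fails if the file already exists, so the
    // existence check and the create happen as one atomic operation.
    fs.writeFileSync(lockPath, "", { flag: "wx" });
  } catch (err) {
    if ((err as { code?: string }).code === "EEXIST") {
      throw new EntityConflictError(`${correlationId} already exists for ${runId}`);
    }
    throw err;
  }
}
```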
…aces

Pipelining world.snapshots.save with the per-pending-op events.create +
queueMessage dispatch (introduced in 22ab779) opened a window where a
fast-completing step could re-invoke the workflow handler before the new
snapshot was persisted. The handler then loads a stale (or missing)
snapshot whose coroutine state doesn't match the latest events, leaving
the workflow stuck.

CI run 25098135190 caught this: fetchWorkflow on Vercel snapshot mode
regressed from ~16s passing to a 60s timeout. Diagnostic showed both
step_completed events landed at +5.5s but no run_completed ever fired.

Restore the original ordering: await snapshot.save fully before any
step is queued. Per-pending-op dispatch within a single suspension still
runs in parallel via Promise.all, which retains the bulk of the
wall-clock reduction (run 25098135190 measured ~568s saved on Vercel
snapshot vs. the pre-parallelize baseline). Only the cross-invocation
pipelining of save with queue is rolled back.
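The restored ordering can be sketched like this. `handleSuspension`, `saveSnapshot` and `dispatch` are hypothetical names. The save is awaited to completion before any op is dispatched, while ops within one suspension still fan out in parallel.

```typescript
// Hypothetical sketch of the restored ordering in the suspension handler.
interface SuspensionDeps {
  saveSnapshot(): Promise<void>;
  dispatch(correlationId: string): Promise<void>; // events.create + queueMessage
}

async function handleSuspension(pending: string[], deps: SuspensionDeps): Promise<void> {
  // 1. Persist the snapshot first, so a fast-completing step that re-invokes
  //    the handler can never load a stale (or missing) snapshot.
  await deps.saveSnapshot();
  // 2. Only then dispatch the pending ops, concurrently with each other.
  await Promise.all(pending.map((cid) => deps.dispatch(cid)));
}
```

Pipelining the two calls (the reverted variant) would let a `dispatch` start before the save finished, which is exactly the window described above.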
Wedges on Vercel snapshot runtime under concurrent matrix load are
opaque from CI logs alone — the workflow handler runs inside a function
on Vercel and its console output isn't surfaced in the CI job. This
commit adds two pieces of diagnostic plumbing:

1. Always-on checkpoint logs at every major step of the snapshot
   suspension/restore lifecycle (`SNAPSHOT_DIAG`), plus matching
   entry/exit logs in the workflow and step queue handlers
   (`WORKFLOW_HANDLER_DIAG`, `STEP_HANDLER_DIAG`). Each record carries
   a per-invocation id, runId, elapsed time, and structured fields
   (snapshot bytes, events fetched + counts by type, pending op
   summary, outcome, exit action). Emitted at `warn` level so they
   show up in Vercel function logs without DEBUG=1.

2. e2e diagnostic harness extension that fetches matching function
   logs from `/v3/deployments/:id/events` for the wedged runId after
   a test failure and appends them to the existing run-diagnostic
   block. Only runs when `WORKFLOW_VERCEL_AUTH_TOKEN` /
   `WORKFLOW_VERCEL_TEAM` / `VERCEL_DEPLOYMENT_ID` are set
   (i.e. the Vercel-prod CI matrix); silently no-ops elsewhere.

Together these let a failed test surface the function-side activity
for its wedged run, e.g. whether the snapshot runtime even reached
its post-VM checkpoint, what its last successful save / queue
operation was, whether the next handler invocation ever started, etc.
That visibility is what we need to actually find the wedge cause.
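A checkpoint logger along the lines of item 1 can be sketched as a closure. It stamps every record with a per-invocation id, the runId and the elapsed time, and writes one JSON line at warn level. `createDiag`, the record field names and the injectable `warn` sink are assumptions for illustration; the actual `SNAPSHOT_DIAG` format may differ.

```typescript
import { randomUUID } from "node:crypto";

type Fields = Record<string, unknown>;

// Hypothetical helper: returns a checkpoint function bound to one invocation.
function createDiag(prefix: string, runId: string, warn: (line: string) => void = console.warn) {
  const invocationId = randomUUID(); // distinguishes concurrent invocations of one run
  const startedAt = Date.now();
  return (checkpoint: string, fields: Fields = {}): void => {
    const record = { invocationId, runId, checkpoint, elapsedMs: Date.now() - startedAt, ...fields };
    warn(`${prefix} ${JSON.stringify(record)}`); // one grep-able JSON line per checkpoint
  };
}
```

For example, `createDiag("SNAPSHOT_DIAG", runId)("snapshot_loaded", { skippedLoad: true })` emits a single line that can be found by searching the function logs for the runId.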
…reserve Buffer body across retries

Wedge root cause for snapshot runtime on Vercel under concurrent matrix
load. The old save() in world-vercel/src/snapshots.ts used:

    fetch(url, { method: 'PUT', body: compressed, dispatcher: getDispatcher() })

where getDispatcher() returns a RetryAgent. fetch() wraps Buffer/Uint8Array
bodies in a one-shot ReadableStream (web fetch spec), so when the
RetryAgent retries on a transient 5xx or network error, the second
attempt has nothing left to read — the iterable yields 0 bytes, undici
detects the mismatch with Content-Length, and throws
UND_ERR_REQ_CONTENT_LENGTH_MISMATCH. With 5–15 MB snapshot bodies the
bug fires under any meaningful network turbulence.

The downstream impact is a permanent wedge:

  1. Save throws -> workflow handler returns 500.
  2. Queue retries the handler with backoff.
  3. Each retry repeats the same save -> same throw -> same 500.
  4. Production logs showed attempt: 19 (≈1.5 hours of retries),
     long after the test framework had given up at the 60s test timeout.

Switch to undici.request() (the lower-level API), which hands the Buffer
to the connection layer directly without stream wrapping, so retries
can replay the same body. Verified locally with a vitest regression
test that reproduces the exact production stack trace
(AsyncWriter.end -> writeIterable -> UND_ERR_REQ_CONTENT_LENGTH_MISMATCH)
without the fix and passes with it.

Other world-vercel endpoints (events, hooks, runs, …) hit the same
underlying undici limitation but in practice rarely fail this way: their
bodies are tiny (KB CBOR-encoded payloads), so the chance of network
turbulence mid-stream is much lower. They remain on fetch() for now.
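The failure mode can be reproduced without a network in a self-contained simulation. This models the behavior described above; it is not undici's code. A generator stands in for the one-shot stream that `fetch()` builds around the Buffer. The first attempt drains it, so the retry writes 0 bytes against the original Content-Length, whereas a plain `Uint8Array` can be written again in full. `send`, `putWithOneRetry` and the error text are hypothetical.

```typescript
// Self-contained simulation (not undici's implementation) of why a one-shot
// body breaks retries while a Buffer body survives them.
type Body = Uint8Array | Iterable<Uint8Array>;

// Stand-in for the one-shot stream fetch() wraps around a Buffer body.
function* oneShot(buf: Uint8Array): Generator<Uint8Array> {
  yield buf;
}

// Stand-in transport: writes the body, then checks it against Content-Length.
function send(body: Body, contentLength: number): number {
  let written = 0;
  if (body instanceof Uint8Array) {
    written = body.byteLength; // a Buffer can be written again on every attempt
  } else {
    for (const chunk of body) written += chunk.byteLength; // an exhausted iterator yields nothing
  }
  if (written !== contentLength) {
    throw new Error(`content-length mismatch (simulated): ${written} != ${contentLength}`);
  }
  return written;
}

// Attempt 1 writes the body and (we pretend) gets a transient 5xx back;
// attempt 2 is the retry, re-sending the *same* body object.
function putWithOneRetry(body: Body, contentLength: number): number {
  send(body, contentLength);
  return send(body, contentLength);
}
```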
Avoid a guaranteed-404 round-trip to the snapshot storage backend on
the very first workflow handler invocation. The suspension handler in
this file always saves the snapshot BEFORE creating any
step_created / hook_created / wait_created events, so if the events
preloaded by events.create('run_started') contain only run_created /
run_started, no save cycle has run yet and no snapshot can exist.

Detected by the new exported `canSkipSnapshotLoad(preloadedEvents)`
helper, with 8 unit tests covering each event-type combination
(undefined / empty / run_created+run_started / run_started only /
step_* / hook_received / wait_completed). When the helper returns true,
`existingSnapshot` is set to null without calling
`world.snapshots.load()` and the entrypoint falls through to the
first-run path with the preloaded events.

The wfdiag('snapshot_loaded') checkpoint now also reports
`skippedLoad: true` when the fast path was taken so we can confirm
the optimization is firing in production logs.

Reduces 404 noise on workflow-server's `/v2/runs/:runId/snapshot`
endpoint and saves a network round-trip on every initial workflow
invocation. Falls back to the normal load path whenever
`preloadedEvents` is missing or contains any non-initial event.
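The heuristic reduces to a small predicate. In this sketch the event shape (`{ eventType: string }`) is assumed. Treating an empty array as "don't skip" is a conservative assumption, because the text lists the empty case among the tests without stating its expected result.

```typescript
// Hypothetical sketch of the skip heuristic; the event shape is assumed.
interface WorkflowEvent {
  eventType: string;
}

// Events that exist before the first suspension (and thus before any save).
const INITIAL_EVENT_TYPES = new Set(["run_created", "run_started"]);

function canSkipSnapshotLoad(preloadedEvents: WorkflowEvent[] | undefined): boolean {
  // Nothing preloaded: we cannot prove there is no snapshot, so load normally.
  if (!preloadedEvents || preloadedEvents.length === 0) return false;
  // The snapshot is saved before any step/hook/wait event is created, so a
  // log made only of initial events means no save has happened yet.
  return preloadedEvents.every((event) => INITIAL_EVENT_TYPES.has(event.eventType));
}
```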
…ming breakdown

Two changes that go together:

1. New `stripInlineSourceMap()` helper in `source-map.ts` (with 4 unit
   tests). The runtime entrypoint now strips the trailing
   `//# sourceMappingURL=data:…` comment from the workflow bundle
   before passing it to `vm.evalCode()`. The original (unstripped)
   string is kept in the host-side scope so `remapErrorStack` can
   still resolve original source positions on workflow failures.

   The map is purely host-side metadata for stack-trace remapping —
   the VM never reads it. But QuickJS retains source text for
   stack-trace line lookups, so the multi-MB base64 comment was being
   carried into the VM heap and showing up in every snapshot save+load
   round-trip. Empirically, on the example workbench's bundle:
     - Bundle string drops 5.16 MB → 1.20 MB (-77%)
     - QuickJS heap snapshot drops 11.75 MB → 8.00 MB (-32%)
   That works out to ~1s saved per step round-trip on Vercel.

2. Extend the `SNAPSHOT_DIAG snapshot_loaded` and
   `SNAPSHOT_DIAG snapshot_saved` checkpoint logs with per-stage byte
   counts and timings:
     - load: returnedBytes (post-decompress, pre-decrypt),
       loadDurationMs (HTTP round-trip), decryptDurationMs
     - save: plaintextBytes (raw QuickJS output),
       handedToWorldBytes (after host-side encrypt),
       encryptDurationMs, storeDurationMs
   So the savings show up in CI-fetched function logs alongside the
   existing OTel attributes. Naming clarified: 'returnedBytes' /
   'handedToWorldBytes' instead of misleading 'wireBytes', because
   the world (e.g. world-vercel) applies its own gzip layer below
   this — true on-the-wire bytes are emitted by world-vercel's own
   diagnostic (separate commit).
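Stripping a trailing inline map can be sketched with a single anchored regex. This is a hypothetical implementation, not the actual `source-map.ts` code. It removes only a final `//# sourceMappingURL=data:` comment (and the legacy `//@` form) and leaves external `.map` references untouched.

```typescript
// Hypothetical sketch: matches an optional preceding newline, an inline
// (data: URL) source map comment, and any trailing whitespace at end of input.
const INLINE_SOURCE_MAP_RE = /\n?\/\/[#@] sourceMappingURL=data:[^\n]*\s*$/;

function stripInlineSourceMap(code: string): string {
  return code.replace(INLINE_SOURCE_MAP_RE, "");
}
```

As described above, the caller keeps the unstripped string for `remapErrorStack` and passes only the stripped one to `vm.evalCode()`.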
Adds `WORLD_SNAPSHOT_DIAG` checkpoint logs to the snapshot save and
load paths. Save reports inputBytes (what the core handed in) →
wireBytes (after gzipSync) → compressionRatio, plus separate
gzipDurationMs and putDurationMs. Load reports the equivalents:
wireBytes (raw HTTP body) → decompressedBytes (after gunzipSync),
plus getDurationMs and gunzipDurationMs. Pairs with the core
`SNAPSHOT_DIAG` checkpoints from the previous commit so the entire
snapshot lifecycle for any wedged run is grep-able by runId in
Vercel function logs.

Also covers the 404 (no-snapshot) case so a core
`skippedLoad: true` checkpoint can be cross-referenced against
the world's view: when both line up, the optimization is firing as
intended; when only one side fires, something's off.

All emitted at `console.warn` level — no DEBUG required, matching
the format/style of the core wfdiag helper.
…able

The snapshot save path applied its two transforms in the wrong order:
each world (vercel, postgres, local) gzipped the bytes just before
handing them to its transport, but core had already encrypted them
upstream. Net result was `gzip(encrypt(plain))` on the wire. Encryption
produces ciphertext that doesn't compress, so the gzip step was largely
wasted CPU.

Flip the order so compression goes BEFORE encryption (the standard
compress-then-encrypt pattern used for at-rest blob encryption — no
CRIME/BREACH applicability here since the snapshot is opaque, no
attacker injection, no per-request size leakage). Move compression
into core so it happens once, in the right place, and so the world
layers can be simplified to opaque-bytes transport.

Codec choice: zstd when available (Node 22.15+), gzip otherwise.
Benchmarked against an 8 MB QuickJS heap snapshot (representative
production payload):

  | codec  | ratio | compress | decompress |
  |--------|-------|----------|------------|
  | zstd-3 | 4.29x |    18 ms |       6 ms |
  | gzip-6 | 4.02x |   127 ms |      11 ms |

zstd is faster AND smaller. The format prefix on each blob (`zstd`
or `gzip`) marks the codec, so deployments running different Node
versions remain interoperable.
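A format-prefixed codec layer might look like the following sketch. The 4-byte ASCII prefixes (`zstd` / `gzip`) and the helper signatures are assumptions about the blob layout, not the actual `compression.ts` API. `zlib.zstdCompressSync` / `zstdDecompressSync` are only present on newer Node releases, so they are feature-detected. The commit describes the real module as passing legacy blobs through; this sketch instead rejects unknown prefixes.

```typescript
import * as zlib from "node:zlib";

// zstd helpers exist only on newer Node (22.15+), so detect them at runtime.
const z = zlib as unknown as {
  zstdCompressSync?: (buf: Uint8Array) => Buffer;
  zstdDecompressSync?: (buf: Uint8Array) => Buffer;
};

type Codec = "zstd" | "gzip";
const PREFERRED_CODEC: Codec = typeof z.zstdCompressSync === "function" ? "zstd" : "gzip";

// Hypothetical blob layout: 4-byte ASCII codec prefix + compressed body.
function compress(plain: Uint8Array, codec: Codec = PREFERRED_CODEC): Buffer {
  let body: Buffer;
  if (codec === "zstd") {
    if (!z.zstdCompressSync) throw new Error("zstd is unavailable on this Node version");
    body = z.zstdCompressSync(plain);
  } else {
    body = zlib.gzipSync(plain);
  }
  return Buffer.concat([Buffer.from(codec, "ascii"), body]);
}

// Dispatches on the prefix, so the reader does not need to know the writer's codec.
function decompress(blob: Uint8Array): Buffer {
  const prefix = Buffer.from(blob.subarray(0, 4)).toString("ascii");
  const body = blob.subarray(4);
  if (prefix === "gzip") return zlib.gunzipSync(body);
  if (prefix === "zstd") {
    // A Node build without zstd can still read gzip blobs, but not zstd ones.
    if (!z.zstdDecompressSync) throw new Error("zstd blob, but zstd is unavailable on this Node version");
    return z.zstdDecompressSync(body);
  }
  throw new Error(`unknown compression prefix: ${prefix}`);
}
```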

Pipeline now:
  - SAVE: serialize → compress → encrypt → world.snapshots.save
  - LOAD: world.snapshots.load → decrypt → decompress → deserialize
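The ordering in the pipeline above can be sketched with Node's built-in `zlib` and `crypto` modules. Assumptions: gzip stands in for the preferred codec, AES-256-GCM uses a random 96-bit IV per blob, and the `[iv | authTag | ciphertext]` layout and function names are illustrative rather than the actual serialized format.

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from "node:crypto";
import { gunzipSync, gzipSync } from "node:zlib";

const IV_BYTES = 12; // 96-bit IV, the conventional size for GCM
const TAG_BYTES = 16; // GCM authentication tag

// SAVE side: compress first (plaintext is redundant), then encrypt.
function sealSnapshot(plain: Uint8Array, key: Buffer): Buffer {
  const compressed = gzipSync(plain);
  const iv = randomBytes(IV_BYTES);
  const cipher = createCipheriv("aes-256-gcm", key, iv);
  const ciphertext = Buffer.concat([cipher.update(compressed), cipher.final()]);
  return Buffer.concat([iv, cipher.getAuthTag(), ciphertext]);
}

// LOAD side: exact inverse order, decrypt first, then decompress.
function openSnapshot(blob: Buffer, key: Buffer): Buffer {
  const iv = blob.subarray(0, IV_BYTES);
  const tag = blob.subarray(IV_BYTES, IV_BYTES + TAG_BYTES);
  const decipher = createDecipheriv("aes-256-gcm", key, iv);
  decipher.setAuthTag(tag); // final() throws if the blob was altered
  const compressed = Buffer.concat([
    decipher.update(blob.subarray(IV_BYTES + TAG_BYTES)),
    decipher.final(),
  ]);
  return gunzipSync(compressed);
}
```

Reversing the order (encrypt, then compress) would hand the compressor near-random bytes, which is the wasted-CPU problem described above.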

`@workflow/core`:
  * New `serialization/compression.ts` with `compress` /
    `decompress` / `isCompressed` / `PREFERRED_CODEC`. 11 unit
    tests covering codec selection, idempotency, format-prefix
    dispatch, legacy-blob passthrough.
  * New SerializationFormat constants `GZIP` / `ZSTD`.
  * `runtime/snapshot-entrypoint.ts` save path: compress → encrypt
    → store. Load path: decrypt → decompress. New byte-count and
    timing fields on `SNAPSHOT_DIAG snapshot_saved` /
    `snapshot_loaded` (compressedBytes, compressionRatio,
    compressionCodec, compressDurationMs, decompressDurationMs).
  * 7 new tests in `runtime/snapshot-encryption.test.ts` covering
    the full pipeline round-trip with and without encryption, plus
    legacy-blob backward compatibility.

`@workflow/world-vercel`:
  * Drop `gzipSync` from save. Body is sent verbatim (already
    compressed+encrypted by core upstream).
  * Drop the `X-Snapshot-Content-Encoding: gzip` header on save.
  * Load still gunzips when the response carries that header — for
    backward compatibility with blobs written by older deployments.

`@workflow/world-postgres`:
  * Drop `gzipSync` / `gunzipSync`. Stores opaque bytes.
    Snapshots table is created per CI run; no migration concern.

`@workflow/world-local`:
  * Save as `{runId}.bin` (was `.bin.gz`). Load still gunzips
    legacy `.bin.gz` files via the `dataFile` metadata so a
    developer's stale `.workflow-data/` directory keeps working.
The compress-then-encrypt pipeline that landed in 519bb1d added
backward-compatibility code to read older snapshot blobs that were
written under the previous SDK-side gzip scheme. The snapshot runtime
is still on the snapshot-runtime feature branch and has no production
deploy, so no such blob has ever been written under the old scheme
that needs to outlive a feature-branch deploy.

world-vercel:
  - Remove the X-Snapshot-Content-Encoding: gzip header round-trip
    on save and load.
  - Drop the gunzipSync import.
  - File header comment no longer mentions back-compat.

world-local:
  - Drop the .bin.gz / dataFile metadata mechanism. Snapshots are
    now always stored as {runId}.bin alongside {runId}.json.
  - Drop the gunzipSync import and the
    LocalSnapshotMetadataSchema extension; metadata is just
    SnapshotMetadataSchema (eventsCursor + createdAt).
  - File-naming helpers extracted as dataPath() / metadataPath().
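The resulting world-local layout can be sketched as plain file I/O. `dataPath()` / `metadataPath()` reuse the helper names from the text, but their signatures are assumptions. So are the `saveSnapshot` / `loadSnapshot` / `deleteSnapshot` names and the write order: data first, metadata last, so an existing metadata file implies its data file was already written.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Metadata fields as described for SnapshotMetadataSchema.
interface SnapshotMetadata {
  eventsCursor: string;
  createdAt: string; // ISO-8601
}

// Hypothetical signatures for the helpers named in the text.
const dataPath = (dir: string, runId: string) => path.join(dir, `${runId}.bin`);
const metadataPath = (dir: string, runId: string) => path.join(dir, `${runId}.json`);

function saveSnapshot(dir: string, runId: string, bytes: Uint8Array, eventsCursor: string): void {
  fs.mkdirSync(dir, { recursive: true });
  // Opaque bytes: compression and encryption already happened in core.
  fs.writeFileSync(dataPath(dir, runId), bytes);
  const meta: SnapshotMetadata = { eventsCursor, createdAt: new Date().toISOString() };
  fs.writeFileSync(metadataPath(dir, runId), JSON.stringify(meta)); // written last
}

function loadSnapshot(dir: string, runId: string): { bytes: Buffer; metadata: SnapshotMetadata } | null {
  if (!fs.existsSync(metadataPath(dir, runId))) return null; // no snapshot yet
  const metadata = JSON.parse(fs.readFileSync(metadataPath(dir, runId), "utf8")) as SnapshotMetadata;
  return { bytes: fs.readFileSync(dataPath(dir, runId)), metadata };
}

function deleteSnapshot(dir: string, runId: string): void {
  fs.rmSync(metadataPath(dir, runId), { force: true });
  fs.rmSync(dataPath(dir, runId), { force: true });
}
```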

core: remove the now-irrelevant 'legacy snapshots saved before
compression was added' test from snapshot-encryption.test.ts. The
remaining 'plaintext bytes pass through unchanged' test still
exercises the contract that decryptSerializedData() does not require
prefixed input — that's a real pre-existing API contract used by
non-snapshot callers, not snapshot back-compat.
Replaces 14 incremental per-commit changesets with 4 terse,
package-scoped ones (one each for @workflow/core, world-vercel,
world-postgres, world-local). The detailed per-change context is
preserved in git history; CHANGELOG entries from changesets should
describe what consumers need to know, not the implementation history.
This changeset is part of the serialization-refactor base branch
(introduced in 6add40c) and was incorrectly deleted in the previous
consolidation pass. Only changesets local to the snapshot-runtime
branch should have been consolidated.
The file is regenerated on every build (`scripts/build-vm-serde-bundle.js`)
and is already listed under turbo.json's outputs for caching. Tracking
it just produced noisy diffs whenever someone built the package with a
slightly different esbuild version.
…isites

Standardize on `Symbol.for('workflow-serialize')` /
`Symbol.for('workflow-deserialize')` everywhere — the parallel
`globalThis.__wdk_serialize` / `__wdk_deserialize` aliases have been
removed from `vm-bundle-entry.ts` and the snapshot runtime's inline
JS strings now use the symbol form directly. Single canonical name,
no duplication.

Drop the `?? Math.random` and `?? Date.now()` fallbacks from the
ULID generator setup. Both prerequisites
(`globalThis.__ulidTimestamp` and the host-replaced seeded
`Math.random`) are always set by `snapshot-runtime.ts` before the
serde bundle is evaluated; silently falling back to unseeded
`Math.random` or live `Date.now()` would re-introduce the
non-determinism we deliberately fixed (concurrent VM invocations of
the same resumption must produce identical correlationIds for the
world's EntityConflictError dedup to work). Now throws if
`__ulidTimestamp` isn't a number, and passes the seeded
`Math.random` reference explicitly to `monotonicFactory` so
upstream's `detectPRNG` never runs (it'd throw in QuickJS anyway,
since `crypto` is unavailable).
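A small, hypothetical ULID-style generator shows why a fixed timestamp plus a seeded PRNG yields identical correlationIds across concurrent invocations. This is not the `ulid` package's `monotonicFactory`: it omits the monotonic increment within a millisecond. mulberry32 and FNV-1a are illustrative choices of PRNG and seed hash, and the `runId:eventsCursor` seed string is an assumption. As in the text, a missing timestamp throws instead of falling back to the clock.

```typescript
// Hypothetical ULID-style generator (not the `ulid` package).
const CROCKFORD = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"; // ULID base32 alphabet

// FNV-1a (32-bit), used here only to turn a seed string into a number.
function fnv1a(input: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash;
}

// mulberry32: a tiny deterministic PRNG returning floats in [0, 1).
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = Math.imul(a ^ (a >>> 15), a | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

function makeUlidFactory(timestamp: unknown, random: () => number): () => string {
  // Fail loudly instead of silently falling back to Date.now().
  if (typeof timestamp !== "number" || !Number.isInteger(timestamp) || timestamp < 0) {
    throw new TypeError("__ulidTimestamp must be a non-negative integer");
  }
  let time = "";
  for (let t = timestamp, i = 0; i < 10; i++, t = Math.floor(t / 32)) {
    time = CROCKFORD[t % 32] + time; // 48-bit ms timestamp -> 10 chars
  }
  return () => {
    let rand = "";
    for (let i = 0; i < 16; i++) rand += CROCKFORD[Math.floor(random() * 32)]; // 80 bits
    return time + rand;
  };
}
```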

Drop the `URL` / `URLSearchParams` / `DOMException` availability
guards in `common-vm.ts`. quickjs-wasi's URL extension is always
loaded (`url.so`) and DOMException is always constructible — the
guards were dead code carried over from when those weren't reliably
available. The reducer/reviver code is now straightforward
`instanceof URL` / `new URL(...)` / `new DOMException(...)`.

Remove `packages/core/src/serialization/base64.ts` and its
sub-path exports (`./serialization/workflow`,
`./serialization/workflow-vm`). The pure-JS base64 helpers were
leftover from before `base64.so` shipped `btoa`/`atob` natively;
the VM-side reducers in `common-vm.ts` now build base64 strings via
the native ones. The sub-path exports had zero consumers in this
repo (the same cleanup landed on the `serialization-refactor`
branch in 05e0fee but never made it onto `snapshot-runtime`
because the branches diverged earlier).

Remove `packages/workflow/src/internal/serialization.ts` and its
`./internal/serialization` package.json export. Same story — zero
consumers, previously removed in #1082, then accidentally
reintroduced via `f04fd8e91`.
The `/v3/deployments/:id/events` endpoint mostly returned empty
results in our wedge-debugging usage and the runId-substring filter
made it slow when it did return data. The function-log fetch belongs
in a dedicated diagnostic CLI command rather than baked into the
test diagnostic block. Dropping for now; can be revived in a
follow-up PR if needed.
Updates the per-package changesets to match AGENTS.md guidance and the
current state of the PR:

- Bump from `patch` to `minor` (snapshot runtime is a new feature, not
  a bug fix; correctness matters when the changesets land on `stable`)
- Correct snapshot-runtime-core.md: snapshot is now the default, with
  replay available via `WORKFLOW_RUNTIME=replay` (was incorrectly
  describing snapshot as opt-in)
- Drop the misleading 'enforces uniqueness' line from
  snapshot-runtime-world-vercel.md (no uniqueness work happens in this
  package; that lives in workflow-server)
- Tighten language across all four changesets per AGENTS.md
  ('Keep the changesets terse')
…stack regression

Per CI history (runs 25100278265 vs 25130930859), the regression boundary
for the 'basic step error preserves message and stack trace' /
'cross-file step error preserves message and function names in stack'
e2e tests on astro local-dev is commit 770c433 ('Add CI-visible
runtime diagnostics for snapshot wedges'), NOT the later
9168353 source-map-strip commit. The astro-dev failure reproduces on
both replay and snapshot runtimes with identical symptoms (function name
shows up as `__getOwnPropDesc` instead of the actual step function
name in the source-mapped stack), which rules out any snapshot-runtime
specific cause.

The STEP_HANDLER_DIAG entries were always-on `runtimeLogger.warn`
calls inside the step queue handler. They didn't add real diagnostic
value beyond what the existing OTel spans already cover; their main
purpose was to grep-correlate step activity with SNAPSHOT_DIAG
checkpoints in Vercel function logs during the wedge-debugging session
that's now resolved. SNAPSHOT_DIAG and WORKFLOW_HANDLER_DIAG are kept;
only the STEP_HANDLER_DIAG pair is removed.

The exact mechanism by which the diagnostic warns affect the
`stepFn.apply()` stack frame's source-mapped function name is still
unclear (the most plausible explanation is that the line-shift in
step-handler.ts perturbed Vite's dev-mode module graph in a way that
changes which export getter wraps the step function reference at the
`__copyProps` site shared with the namespace import in
`_workflows.ts`). Reverting the diagnostic is sufficient to
restore the test, and the diagnostic itself is not load-bearing.
@socket-security

Review the following changes in direct dependencies.

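| Diff  | Package                | Supply Chain Security | Vulnerability | Quality | Maintenance | License |
|-------|------------------------|-----------------------|---------------|---------|-------------|---------|
| Added | npm/quickjs-wasi@2.0.0 | 70                    | 100           | 100     | 97          | 100     |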


Copilot AI (Contributor) left a comment

Pull request overview

Implements the new default snapshot-based workflow runtime (QuickJS WASM VM with snapshot/restore) and wires snapshot persistence into world backends, while keeping the existing event-replay runtime as an opt-out via WORKFLOW_RUNTIME=replay.

Changes:

  • Add snapshot runtime execution path in @workflow/core (VM bootstrap, snapshot save/load pipeline with compression + optional encryption, runtime-mode dispatch, and new telemetry attributes).
  • Introduce snapshots.save/load/delete to the @workflow/world storage interface and implement it for world-vercel, world-postgres, and world-local.
  • Expand CI/E2E coverage to run tests against both runtimes and reduce E2E flakiness by polling for hook registration instead of fixed sleeps.
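Runtime selection with snapshot as the default and replay as the opt-out might be sketched as follows. The precedence of `executionContext.workflowRuntime` over `WORKFLOW_RUNTIME`, the case normalization and the error on unknown values are all assumptions, not documented behavior of `runtime-mode.ts`.

```typescript
// Hypothetical sketch of runtime-mode resolution.
type WorkflowRuntime = "snapshot" | "replay";
const RUNTIMES: readonly WorkflowRuntime[] = ["snapshot", "replay"];

function parseRuntime(value: string | undefined, source: string): WorkflowRuntime | undefined {
  if (value === undefined || value === "") return undefined; // not set here
  const normalized = value.trim().toLowerCase();
  if ((RUNTIMES as readonly string[]).includes(normalized)) return normalized as WorkflowRuntime;
  // Reject typos instead of silently falling back to the default.
  throw new Error(`Invalid ${source} "${value}"; expected one of: ${RUNTIMES.join(", ")}`);
}

function resolveRuntime(
  env: Record<string, string | undefined>,
  executionContext?: { workflowRuntime?: string }
): WorkflowRuntime {
  return (
    parseRuntime(executionContext?.workflowRuntime, "executionContext.workflowRuntime") ??
    parseRuntime(env.WORKFLOW_RUNTIME, "WORKFLOW_RUNTIME") ??
    "snapshot" // default
  );
}
```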

Reviewed changes

Copilot reviewed 53 out of 54 changed files in this pull request and generated 4 comments.

| File | Description |
|------|-------------|
| scripts/create-test-matrix.mjs | Duplicates app matrix across snapshot and replay runtime axes. |
| pnpm-lock.yaml | Adds quickjs-wasi@2.0.0 lock entries. |
| packages/world/src/snapshots.ts | Adds SnapshotMetadataSchema (eventsCursor, createdAt). |
| packages/world/src/interfaces.ts | Extends Storage with snapshots.save/load/delete. |
| packages/world/src/index.ts | Exposes snapshot types/schema from @workflow/world. |
| packages/world-vercel/src/storage.ts | Wires snapshots into Vercel storage and instrumentation. |
| packages/world-vercel/src/snapshots.ts | Implements snapshot storage via workflow-server snapshot endpoints. |
| packages/world-vercel/src/snapshots.test.ts | Adds tests for PUT body correctness and retry behavior. |
| packages/world-postgres/test/storage.test.ts | Adds tests asserting dedup behavior for entity-creation races. |
| packages/world-postgres/src/storage.ts | Maps pg unique-violation for entity-creating events to EntityConflictError. |
| packages/world-postgres/src/snapshots.ts | Implements Postgres snapshot upsert/load/delete storage. |
| packages/world-postgres/src/index.ts | Wires snapshots storage into Postgres createStorage. |
| packages/world-postgres/src/drizzle/schema.ts | Adds snapshots table + entity-creation partial unique index. |
| packages/world-postgres/src/drizzle/migrations/meta/_journal.json | Registers new migrations in drizzle journal. |
| packages/world-postgres/src/drizzle/migrations/0010_add_snapshots_table.sql | Creates workflow.workflow_snapshots table. |
| packages/world-postgres/src/drizzle/migrations/0011_add_events_entity_creation_unique_index.sql | Adds partial unique index for step/hook/wait creation events. |
| packages/world-local/src/storage/snapshots-storage.ts | Adds filesystem-backed snapshot storage (bytes + metadata files). |
| packages/world-local/src/storage/index.ts | Wires snapshots storage into local storage and instrumentation. |
| packages/world-local/src/storage/events-storage.ts | Adds atomic lock-file dedup for step_created and wait_created. |
| packages/world-local/src/storage.test.ts | Adds race tests for local step/wait creation dedup behavior. |
| packages/world-local/src/queue.ts | Logs queue handler errors with stack for debugging. |
| packages/core/turbo.json | Adds generated VM bundle/assets files to build outputs. |
| packages/core/src/telemetry/semantic-conventions.ts | Adds snapshot runtime semantic convention attributes. |
| packages/core/src/source-map.ts | Adds stripInlineSourceMap() to reduce VM heap/snapshot size. |
| packages/core/src/source-map.test.ts | Tests stripInlineSourceMap() behavior. |
| packages/core/src/serialization/workflow-vm.ts | Adds VM-safe workflow-mode serializer/deserializer. |
| packages/core/src/serialization/workflow-vm.test.ts | Tests VM serializer and VM↔Node compatibility. |
| packages/core/src/serialization/vm-bundle-entry.ts | VM bundle entry: installs serde + deterministic ULID generator. |
| packages/core/src/serialization/types.ts | Adds compression format prefixes (gzip, zstd). |
| packages/core/src/serialization/reducers/common-vm.ts | Adds VM-safe reducers/revivers (base64 via btoa/atob). |
| packages/core/src/serialization/compression.ts | Adds compress/decompress layer with gzip/zstd feature detection. |
| packages/core/src/serialization/compression.test.ts | Tests compression layer behavior and codec selection. |
| packages/core/src/serialization/compat.test.ts | Adds compatibility tests between modular and legacy serialization APIs. |
| packages/core/src/serialization/codec-devalue.ts | Adds clarifying notes about modular modules vs legacy runtime path. |
| packages/core/src/serialization/codec-devalue-vm.ts | Adds VM-compatible devalue codec using VM reducers/revivers. |
| packages/core/src/runtime/start.ts | Propagates WORKFLOW_RUNTIME choice into executionContext. |
| packages/core/src/runtime/snapshot-runtime.ts | Implements QuickJS snapshot/restore runtime engine. |
| packages/core/src/runtime/snapshot-runtime.test.ts | Unit tests for snapshot runtime behavior and determinism. |
| packages/core/src/runtime/snapshot-entrypoint.ts | Integrates snapshot runtime into devkit entrypoint + storage pipeline. |
| packages/core/src/runtime/snapshot-entrypoint.test.ts | Tests snapshot-load skip heuristic. |
| packages/core/src/runtime/snapshot-encryption.test.ts | Tests compress→encrypt→decrypt→decompress contract. |
| packages/core/src/runtime/runtime-mode.ts | Adds WORKFLOW_RUNTIME parsing/validation. |
| packages/core/src/runtime/runtime-mode.test.ts | Tests runtime-mode env parsing. |
| packages/core/src/runtime.ts | Switches default runtime to snapshot with replay fallback. |
| packages/core/scripts/build-vm-serde-bundle.js | Generates VM serde bundle source used by snapshot runtime. |
| packages/core/scripts/build-quickjs-assets.js | Generates embedded quickjs-wasi wasm/extension assets. |
| packages/core/package.json | Adds quickjs-wasi dependency and generators to build script. |
| packages/core/e2e/e2e.test.ts | Replaces fixed hook sleeps with polling helper to reduce flakiness. |
| packages/core/.gitignore | Ignores generated VM bundle/assets files. |
| .github/workflows/tests.yml | Expands CI matrix across runtimes and avoids ARG_MAX in sticky comment. |
| .changeset/snapshot-runtime-world-vercel.md | Changeset for world-vercel snapshot storage + undici.request rationale. |
| .changeset/snapshot-runtime-world-postgres.md | Changeset for world-postgres snapshots + event uniqueness fix. |
| .changeset/snapshot-runtime-world-local.md | Changeset for world-local snapshots + event dedup fix. |
| .changeset/snapshot-runtime-core.md | Changeset for core snapshot runtime default + replay opt-out. |
Files not reviewed (1)
  • pnpm-lock.yaml: Language not supported


Comment on lines 69 to 72
  "scripts": {
-   "build": "genversion --es6 src/version.ts && tsc",
+   "build": "genversion --es6 src/version.ts && node scripts/build-vm-serde-bundle.js && node scripts/build-quickjs-assets.js && tsc",
    "dev": "genversion --es6 src/version.ts && tsc --watch",
    "clean": "tsc --build --clean && rm -rf dist src/version.ts docs ||:",
Comment on lines +224 to +226
* The binary data is stored gzip-compressed in the `data` column.
* Metadata (`eventsCursor`, `createdAt`) lives alongside for cheap loads.
*/
Comment on lines +723 to +724
const escapedCid = cid.replace(/"/g, '\\"');
const eventData =
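The quoted line escapes only double quotes. Assume `cid` is interpolated into a double-quoted string literal in JS source that the VM later evaluates; the excerpt is cut off before that point. In that case a backslash or line break in `cid` would still break the literal, whereas `JSON.stringify` always yields a valid JS string literal. The helper names below are hypothetical. ULID-based correlationIds contain neither quotes nor backslashes, so this only matters if `cid` can come from elsewhere.

```typescript
// Hypothetical helpers contrasting two ways of embedding `cid` in JS source.
function naiveLiteral(cid: string): string {
  return `"${cid.replace(/"/g, '\\"')}"`; // same escaping as the quoted line: quotes only
}

function safeLiteral(cid: string): string {
  return JSON.stringify(cid); // escapes quotes, backslashes and control characters
}

// Evaluates generated source to see which string value it actually denotes.
function evalLiteral(literal: string): unknown {
  return new Function(`return ${literal};`)();
}
```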
Comment on lines +17 to +29
function arrayBufferToBase64(
  value: ArrayBufferLike,
  offset: number,
  length: number
): string {
  if (length === 0) return '.';
  // btoa requires a binary string. Build it from the byte view.
  const uint8 = new Uint8Array(value, offset, length);
  let binary = '';
  for (let i = 0; i < uint8.length; i++) {
    binary += String.fromCharCode(uint8[i]!);
  }
  return btoa(binary);
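The excerpt above builds the binary string with one concatenation per byte. A common alternative converts fixed-size chunks with `String.fromCharCode(...chunk)`, which divides the number of concatenations by the chunk size. The sketch below is a hypothetical variant, not the PR's code. The 0x8000 chunk size is an assumption chosen to stay well under engines' argument-count limits, and the `'.'` empty-buffer sentinel is kept from the original.

```typescript
// Hypothetical chunked variant; CHUNK is an assumed size.
const CHUNK = 0x8000;

function arrayBufferToBase64Chunked(value: ArrayBufferLike, offset: number, length: number): string {
  if (length === 0) return "."; // same empty-buffer sentinel as the original
  const uint8 = new Uint8Array(value, offset, length);
  let binary = "";
  for (let i = 0; i < uint8.length; i += CHUNK) {
    // One fromCharCode call per chunk instead of one concatenation per byte.
    binary += String.fromCharCode(...uint8.subarray(i, i + CHUNK));
  }
  return btoa(binary);
}
```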