
feat: enforce max queue deliveries in handlers with graceful failure #1344

Draft
pranaygp wants to merge 1 commit into pgp/semantic-world-errors from pgp/handler-max-deliveries


@pranaygp
Collaborator

Summary

Removes the VQS maxDeliveries: 64 cap and instead has workflow/step handlers enforce their own max delivery limit with graceful failure. This prevents "phantom stuck" runs where VQS drops the message after 64 retries and the run silently stalls in running status.

Stacked on #1342, #1340, #1339

Problem

When infrastructure is down (OOMs, network outages, etc.), VQS retries messages at 5s intervals up to maxDeliveries: 64 times. After exhausting retries, VQS drops the message. The run stays in running status forever with no error, no failure event, no recourse — a "phantom stuck" run.
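As a rough illustration of how short that window is, the sketch below computes the time between the first and the last delivery under the old settings. It assumes a fixed 5-second interval with no backoff and ignores handler run time; the function name is hypothetical and not part of VQS.

```typescript
// Hypothetical helper: seconds between the first and the last delivery,
// assuming a fixed retry interval and ignoring handler execution time.
function retryWindowSeconds(maxDeliveries: number, retryAfterSeconds: number): number {
  return (maxDeliveries - 1) * retryAfterSeconds;
}

const windowSeconds = retryWindowSeconds(64, 5);
console.log(`Old retry window: ${windowSeconds} seconds`);
```

Under these assumptions, an outage lasting longer than roughly five minutes was enough to exhaust all 64 deliveries.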

Solution

  1. Remove maxDeliveries from the VQS config: allow infinite retries at the queue level
  2. Keep retryAfterSeconds: 5 for VQS-level retry timing (works even after SIGKILL/OOM)
  3. Handlers check metadata.attempt: when the delivery count exceeds MAX_QUEUE_DELIVERIES (64), the handler fails the run/step gracefully with the MAX_DELIVERIES_EXCEEDED error code
  4. If even failure event creation fails: log a detailed error message explaining the situation and consume the message (no point retrying further)
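The check in step 3 could look roughly like the sketch below. Only metadata.attempt, MAX_QUEUE_DELIVERIES and the value 64 come from this description; the QueueMetadata shape, the assumption that attempt is a 1-based delivery count, and the decideDelivery name are illustrative.

```typescript
// Limit named in the PR description.
const MAX_QUEUE_DELIVERIES = 64;

// Hypothetical shape: the description only says handlers read `metadata.attempt`.
interface QueueMetadata {
  attempt: number; // assumed to be a 1-based delivery count
}

type DeliveryDecision = "process" | "fail-gracefully";

// Decide whether to run the handler normally or to fail the run/step.
function decideDelivery(metadata: QueueMetadata): DeliveryDecision {
  return metadata.attempt > MAX_QUEUE_DELIVERIES ? "fail-gracefully" : "process";
}
```

With a strict "exceeds" comparison, delivery 64 is still processed normally and delivery 65 is the first one to take the graceful-failure path.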

Workflow handler behavior at max deliveries:

  • Posts run_failed event with errorCode: 'MAX_DELIVERIES_EXCEEDED'
  • If EntityConflictError or RunExpiredError → run already terminal, skip
  • If failure event creation fails → log detailed error, consume message anyway
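A minimal sketch of that three-way outcome, assuming an injected event writer. The error classes are local stand-ins that only share names with the ones mentioned above, and failRunAtMaxDeliveries, CreateRunEvent and RunOutcome are hypothetical names.

```typescript
// Local stand-ins for the error types named above; the real classes live in the codebase.
class EntityConflictError extends Error {
  constructor(message: string) {
    super(message);
    this.name = "EntityConflictError";
  }
}
class RunExpiredError extends Error {
  constructor(message: string) {
    super(message);
    this.name = "RunExpiredError";
  }
}

type RunOutcome = "failed" | "already-terminal" | "logged-and-consumed";

// Hypothetical event writer, injected so the sketch stays self-contained.
type CreateRunEvent = (event: { type: "run_failed"; runId: string; errorCode: string }) => Promise<void>;

async function failRunAtMaxDeliveries(
  runId: string,
  attempt: number,
  createEvent: CreateRunEvent,
  log: (message: string) => void = console.error,
): Promise<RunOutcome> {
  try {
    await createEvent({ type: "run_failed", runId, errorCode: "MAX_DELIVERIES_EXCEEDED" });
    return "failed";
  } catch (err) {
    // Matched by name so the check does not depend on prototype chains.
    const name = err instanceof Error ? err.name : "";
    if (name === "EntityConflictError" || name === "RunExpiredError") {
      return "already-terminal"; // the run already reached a terminal state, nothing to do
    }
    log(`Run ${runId} hit delivery ${attempt}, and the run_failed event could not be written. Consuming the message. Cause: ${String(err)}`);
    return "logged-and-consumed";
  }
}
```

In every branch the function returns normally rather than rethrowing, which is what lets the caller acknowledge the message instead of triggering another retry.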

Step handler behavior at max deliveries:

  • Posts step_failed event
  • Re-queues workflow to handle the failed step
  • Same fallback logging if event creation fails
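The step path might be sketched as follows. StepDeps and failStepAtMaxDeliveries are hypothetical names, and two details are assumptions rather than statements from this description: that the step_failed event carries the MAX_DELIVERIES_EXCEEDED code, and that the workflow is not re-queued when the event cannot be written.

```typescript
// Hypothetical dependencies, injected so the sketch is self-contained.
interface StepDeps {
  createEvent: (event: { type: "step_failed"; stepId: string; errorCode: string }) => Promise<void>;
  queueWorkflow: (runId: string) => Promise<void>;
  log: (message: string) => void;
}

// Returns true when the step was marked failed and the workflow was re-queued.
async function failStepAtMaxDeliveries(runId: string, stepId: string, deps: StepDeps): Promise<boolean> {
  try {
    await deps.createEvent({ type: "step_failed", stepId, errorCode: "MAX_DELIVERIES_EXCEEDED" });
  } catch (err) {
    // Fallback: explain the situation in the logs and let the message be consumed.
    deps.log(`Step ${stepId} of run ${runId} exceeded max deliveries, and step_failed could not be written: ${String(err)}`);
    return false;
  }
  // Wake the workflow so it can observe and handle the failed step.
  await deps.queueWorkflow(runId);
  return true;
}
```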

Local world queue:

  • Removed hardcoded 3-retry cap → now uses a safety limit of 1000
  • Matches production VQS behavior (handler enforces the real limit)
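To illustrate how the two limits interact, here is a simplified redelivery loop. It is not the world-local implementation: deliverLocally, the Handler type and the resolve-means-consumed convention are assumptions, and a real queue would wait between deliveries.

```typescript
// Safety limit named in the PR description.
const LOCAL_SAFETY_LIMIT = 1000;

// Hypothetical handler contract: resolving consumes the message, rejecting asks for a retry.
type Handler = (metadata: { attempt: number }) => Promise<void>;

// Redelivers until the handler consumes the message; returns the attempt that consumed it.
async function deliverLocally(handler: Handler): Promise<number> {
  for (let attempt = 1; attempt <= LOCAL_SAFETY_LIMIT; attempt++) {
    try {
      await handler({ attempt });
      return attempt;
    } catch {
      // Retry delay omitted to keep the sketch fast.
    }
  }
  throw new Error(`Message dropped after ${LOCAL_SAFETY_LIMIT} deliveries`);
}
```

A handler that keeps failing until its own limit of 64 is exceeded, and then fails the run gracefully, consumes the message on delivery 65, so the 1000 cap is only reached if the handler never stops rejecting.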

Test plan

  • 5 new unit tests for step handler max delivery enforcement
  • All 483 core tests pass
  • All 220 world-local tests pass
  • E2E: persistent failure → failed with MAX_DELIVERIES_EXCEEDED
  • E2E: transient failure → normal completion

🤖 Generated with Claude Code

@changeset-bot

changeset-bot bot commented Mar 12, 2026

🦋 Changeset detected

Latest commit: d34edf9

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 20 packages
Name Type
@workflow/errors Patch
@workflow/core Patch
@workflow/world-local Patch
@workflow/builders Patch
@workflow/cli Patch
workflow Patch
@workflow/world-postgres Patch
@workflow/world-vercel Patch
@workflow/next Patch
@workflow/nitro Patch
@workflow/vitest Patch
@workflow/web-shared Patch
@workflow/world-testing Patch
@workflow/astro Patch
@workflow/nest Patch
@workflow/rollup Patch
@workflow/sveltekit Patch
@workflow/vite Patch
@workflow/ai Patch
@workflow/nuxt Patch


@vercel
Contributor

vercel bot commented Mar 12, 2026

@github-actions
Contributor

github-actions bot commented Mar 12, 2026

🧪 E2E Test Results

Some tests failed

Summary

Category Passed Failed Skipped Total
❌ ▲ Vercel Production 547 24 67 638
✅ 💻 Local Development 612 0 84 696
✅ 📦 Local Production 612 0 84 696
✅ 🐘 Local Postgres 612 0 84 696
✅ 🪟 Windows 55 0 3 58
❌ 🌍 Community Worlds 118 56 15 189
❌ 📋 Other 146 1 27 174
Total 2702 81 364 3147

❌ Failed Tests

▲ Vercel Production (24 failed)

astro (2 failed):

  • error handling error propagation workflow errors nested function calls preserve message and stack trace
  • error handling retry behavior FatalError fails immediately without retries

example (2 failed):

  • error handling error propagation workflow errors nested function calls preserve message and stack trace
  • error handling retry behavior FatalError fails immediately without retries

express (2 failed):

  • error handling error propagation workflow errors nested function calls preserve message and stack trace
  • error handling retry behavior FatalError fails immediately without retries

fastify (3 failed):

  • error handling error propagation workflow errors nested function calls preserve message and stack trace
  • error handling retry behavior FatalError fails immediately without retries
  • AllInOneService.processNumber - static workflow method using sibling static step methods

hono (2 failed):

  • error handling error propagation workflow errors nested function calls preserve message and stack trace
  • error handling retry behavior FatalError fails immediately without retries

nextjs-turbopack (2 failed):

  • error handling error propagation workflow errors nested function calls preserve message and stack trace
  • error handling retry behavior FatalError fails immediately without retries

nextjs-webpack (2 failed):

  • error handling error propagation workflow errors nested function calls preserve message and stack trace
  • error handling retry behavior FatalError fails immediately without retries

nitro (2 failed):

  • error handling error propagation workflow errors nested function calls preserve message and stack trace
  • error handling retry behavior FatalError fails immediately without retries

nuxt (3 failed):

  • error handling error propagation workflow errors nested function calls preserve message and stack trace
  • error handling retry behavior regular Error retries until success
  • error handling retry behavior FatalError fails immediately without retries

sveltekit (2 failed):

  • error handling error propagation workflow errors nested function calls preserve message and stack trace
  • error handling retry behavior FatalError fails immediately without retries

vite (2 failed):

  • error handling error propagation workflow errors nested function calls preserve message and stack trace
  • error handling retry behavior FatalError fails immediately without retries

🌍 Community Worlds (56 failed)

mongodb (3 failed):

  • hookWorkflow is not resumable via public webhook endpoint
  • webhookWorkflow
  • concurrent hook token conflict - two workflows cannot use the same hook token simultaneously

redis (2 failed):

  • hookWorkflow is not resumable via public webhook endpoint
  • concurrent hook token conflict - two workflows cannot use the same hook token simultaneously

turso (51 failed):

  • addTenWorkflow
  • addTenWorkflow
  • wellKnownAgentWorkflow (.well-known/agent)
  • should work with react rendering in step
  • promiseAllWorkflow
  • promiseRaceWorkflow
  • promiseAnyWorkflow
  • importedStepOnlyWorkflow
  • hookWorkflow
  • hookWorkflow is not resumable via public webhook endpoint
  • webhookWorkflow
  • sleepingWorkflow
  • parallelSleepWorkflow
  • nullByteWorkflow
  • workflowAndStepMetadataWorkflow
  • fetchWorkflow
  • promiseRaceStressTestWorkflow
  • error handling error propagation workflow errors nested function calls preserve message and stack trace
  • error handling error propagation workflow errors cross-file imports preserve message and stack trace
  • error handling error propagation step errors basic step error preserves message and stack trace
  • error handling error propagation step errors cross-file step error preserves message and function names in stack
  • error handling retry behavior regular Error retries until success
  • error handling retry behavior FatalError fails immediately without retries
  • error handling retry behavior RetryableError respects custom retryAfter delay
  • error handling retry behavior maxRetries=0 disables retries
  • error handling retry behavior infrastructure error on run_completed retries via queue (not run_failed)
  • error handling catchability FatalError can be caught and detected with FatalError.is()
  • hookCleanupTestWorkflow - hook token reuse after workflow completion
  • concurrent hook token conflict - two workflows cannot use the same hook token simultaneously
  • hookDisposeTestWorkflow - hook token reuse after explicit disposal while workflow still running
  • stepFunctionPassingWorkflow - step function references can be passed as arguments (without closure vars)
  • stepFunctionWithClosureWorkflow - step function with closure variables passed as argument
  • closureVariableWorkflow - nested step functions with closure variables
  • spawnWorkflowFromStepWorkflow - spawning a child workflow using start() inside a step
  • health check (queue-based) - workflow and step endpoints respond to health check messages
  • pathsAliasWorkflow - TypeScript path aliases resolve correctly
  • Calculator.calculate - static workflow method using static step methods from another class
  • AllInOneService.processNumber - static workflow method using sibling static step methods
  • ChainableService.processWithThis - static step methods using this to reference the class
  • thisSerializationWorkflow - step function invoked with .call() and .apply()
  • customSerializationWorkflow - custom class serialization with WORKFLOW_SERIALIZE/WORKFLOW_DESERIALIZE
  • instanceMethodStepWorkflow - instance methods with "use step" directive
  • crossContextSerdeWorkflow - classes defined in step code are deserializable in workflow context
  • stepFunctionAsStartArgWorkflow - step function reference passed as start() argument
  • cancelRun - cancelling a running workflow
  • cancelRun via CLI - cancelling a running workflow
  • pages router addTenWorkflow via pages router
  • pages router promiseAllWorkflow via pages router
  • pages router sleepingWorkflow via pages router
  • hookWithSleepWorkflow - hook payloads delivered correctly with concurrent sleep
  • sleepWithSequentialStepsWorkflow - sequential steps work with concurrent sleep (control)

📋 Other (1 failed)

e2e-local-postgres-nest-stable (1 failed):

  • webhookWorkflow

Details by Category

❌ ▲ Vercel Production
App Passed Failed Skipped
❌ astro 49 2 7
❌ example 49 2 7
❌ express 49 2 7
❌ fastify 48 3 7
❌ hono 49 2 7
❌ nextjs-turbopack 54 2 2
❌ nextjs-webpack 54 2 2
❌ nitro 49 2 7
❌ nuxt 48 3 7
❌ sveltekit 49 2 7
❌ vite 49 2 7
✅ 💻 Local Development
App Passed Failed Skipped
✅ astro-stable 49 0 9
✅ express-stable 49 0 9
✅ fastify-stable 49 0 9
✅ hono-stable 49 0 9
✅ nextjs-turbopack-canary 55 0 3
✅ nextjs-turbopack-stable 55 0 3
✅ nextjs-webpack-canary 55 0 3
✅ nextjs-webpack-stable 55 0 3
✅ nitro-stable 49 0 9
✅ nuxt-stable 49 0 9
✅ sveltekit-stable 49 0 9
✅ vite-stable 49 0 9
✅ 📦 Local Production
App Passed Failed Skipped
✅ astro-stable 49 0 9
✅ express-stable 49 0 9
✅ fastify-stable 49 0 9
✅ hono-stable 49 0 9
✅ nextjs-turbopack-canary 55 0 3
✅ nextjs-turbopack-stable 55 0 3
✅ nextjs-webpack-canary 55 0 3
✅ nextjs-webpack-stable 55 0 3
✅ nitro-stable 49 0 9
✅ nuxt-stable 49 0 9
✅ sveltekit-stable 49 0 9
✅ vite-stable 49 0 9
✅ 🐘 Local Postgres
App Passed Failed Skipped
✅ astro-stable 49 0 9
✅ express-stable 49 0 9
✅ fastify-stable 49 0 9
✅ hono-stable 49 0 9
✅ nextjs-turbopack-canary 55 0 3
✅ nextjs-turbopack-stable 55 0 3
✅ nextjs-webpack-canary 55 0 3
✅ nextjs-webpack-stable 55 0 3
✅ nitro-stable 49 0 9
✅ nuxt-stable 49 0 9
✅ sveltekit-stable 49 0 9
✅ vite-stable 49 0 9
✅ 🪟 Windows
App Passed Failed Skipped
✅ nextjs-turbopack 55 0 3
❌ 🌍 Community Worlds
App Passed Failed Skipped
✅ mongodb-dev 3 0 2
❌ mongodb 52 3 3
✅ redis-dev 3 0 2
❌ redis 53 2 3
✅ turso-dev 3 0 2
❌ turso 4 51 3
❌ 📋 Other
App Passed Failed Skipped
✅ e2e-local-dev-nest-stable 49 0 9
❌ e2e-local-postgres-nest-stable 48 1 9
✅ e2e-local-prod-nest-stable 49 0 9

📋 View full workflow run


Some E2E test jobs failed:

  • Vercel Prod: failure
  • Local Dev: success
  • Local Prod: success
  • Local Postgres: failure
  • Windows: success

Check the workflow run for details.

Collaborator Author

pranaygp commented Mar 12, 2026

Warning

This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite.

This stack of pull requests is managed by Graphite.

Contributor

@vercel vercel bot left a comment

Additional Suggestion:

SvelteKit package has hardcoded maxDeliveries: 64 on queue triggers, causing VQS to silently drop messages before the handler can gracefully fail runs/steps.


Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@pranaygp pranaygp force-pushed the pgp/handler-max-deliveries branch from 9929b57 to d34edf9 Compare March 13, 2026 01:07
@pranaygp pranaygp force-pushed the pgp/semantic-world-errors branch from 2841381 to 90bd273 Compare March 13, 2026 01:08