fix(world-vercel): cancel v4 event frame stream on early exit to release undici connections #2547
decodeFrames never cancelled response.body when a consumer stopped reading before EOF — getEventV4 returns after the first frame and consumeListFrameStream breaks at the sentinel — so the undici connection stayed checked out of the pool (8 per origin) instead of being released, causing stalls/timeouts on the event-read path. Cancel the source in a try/finally (and cancel the reader in readerToIterator) via a shared closeQuietly helper. Add regression tests for both decode branches and a getEventV4 HTTP round-trip through undici MockAgent. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
🦋 Changeset detected
Latest commit: 8fc2f47
The changes in this PR will be included in the next version bump. This PR includes changesets to release 17 packages.
🧪 E2E Test Results
❌ Some tests failed
Failed Tests
▲ Vercel Production (1 failed): nuxt (1 failed)
Details by Category
❌ ▲ Vercel Production
✅ 💻 Local Development
✅ 📦 Local Production
✅ 🐘 Local Postgres
✅ 🪟 Windows
✅ 📋 Other
❌ Some E2E test jobs failed. Check the workflow run for details.
📊 Benchmark Results
(Result tables not preserved. Each benchmark was listed for 💻 Local Development and ▲ Production (Vercel): workflow with no steps; workflow with 1 step; workflow with 10, 25, and 50 sequential steps; Promise.all and Promise.race with 10, 25, and 50 concurrent steps; 10, 25, and 50 sequential data payload steps (10KB); 10, 25, and 50 concurrent data payload steps (10KB); and stream benchmarks with TTFB metrics: workflow with stream, stream pipeline with 5 transform steps (1MB), 10 parallel streams (1MB each), and fan-out fan-in 10 streams (1MB each).)
❌ Some benchmark jobs failed. Check the workflow run for details.
No backport to This fix targets the To override, re-run the Backport to stable workflow manually via |
Summary
decodeFrames (the v4 event frame-stream reader) never cancelled its underlying response.body when a consumer stopped reading before EOF. With Node's fetch/undici, a response body that is neither fully drained nor cancelled keeps its socket checked out of the connection pool, so every such read leaks a connection until keepalive timeout/GC.
Two live read paths stop early by design and hit this:
- getEventV4 returns after the first frame (single-frame response)
- consumeListFrameStream (getWorkflowRunEventsV4 / getEventsByCorrelationIdV4) breaks at the {_end:1} sentinel
This is the v4 wire format that events.ts uses "throughout" (it replaced the v2/v3 readers), so the leak is on the current, hot replay path, not legacy code.
Why it matters
In production every event read goes to a single origin, vercel-workflow.com (direct) or api.vercel.com (proxy), and the dispatcher caps that origin at connections: 8 (HTTP/1.1). A leaked body holds its connection "in use," so it isn't returned to the pool. Pin ~8 of them and the pool is saturated: subsequent reads queue until a connection frees (the dropped response is GC-reclaimed, or the 60s request timeout fires). The user-visible result is intermittent stalls and timeouts on the event-read path; because each pinned connection ends up single-use instead of kept-alive, connection churn rises too.
Fix
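As a rough illustration of the cleanup pattern this fix applies, the sketch below wraps a read loop in try/finally and releases the source on every exit path. It is not the actual frames.ts code: decodeSketch, endlessSource, and firstFrame are hypothetical names, the body of closeQuietly is an assumption about what such a helper might look like, and a plain async generator stands in for response.body.

```typescript
// Hypothetical helper: run a cleanup step and swallow only its own errors,
// so a failing cleanup cannot mask the original result or exception.
async function closeQuietly(close: () => unknown): Promise<void> {
  try {
    await close();
  } catch {
    // Cleanup errors are intentionally ignored.
  }
}

// Hypothetical stand-in for a frame decoder: yields each chunk as a "frame".
async function* decodeSketch(
  source: AsyncIterable<Uint8Array>,
): AsyncGenerator<Uint8Array> {
  const chunks = source[Symbol.asyncIterator]();
  try {
    while (true) {
      const next = await chunks.next();
      if (next.done) return;
      yield next.value.slice(); // owned copy, still valid after the source is released
    }
  } finally {
    // Runs on early break/return, normal completion, and errors alike.
    await closeQuietly(() => chunks.return?.());
  }
}

// Assumed stand-in for a kept-alive body: never ends on its own, and records
// when its iterator is told to stop.
let released = false;
async function* endlessSource(): AsyncGenerator<Uint8Array> {
  try {
    while (true) yield new Uint8Array([1, 2, 3]);
  } finally {
    released = true;
  }
}

// Early-exit consumer: returns after the first frame, like a single-frame read.
async function firstFrame(): Promise<Uint8Array | undefined> {
  for await (const frame of decodeSketch(endlessSource())) {
    return frame;
  }
  return undefined;
}
```

Returning from inside the for await loop triggers the decoder's finally, which in turn calls return() on the source iterator; without that finally the source would stay suspended mid-read.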
packages/world-vercel/src/frames.ts:
- Wrap the decodeFrames read loop in try/finally and call chunks.return?.() on exit, which cancels the source stream (releasing the socket) on early break/return, normal completion, and error paths alike. No-op once drained.
- readerToIterator now cancels its reader in a finally (the non-async-iterable fallback branch).
- Add a shared closeQuietly helper that swallows only cleanup errors, so it can't mask the original outcome.
- Collapse the bodyLen > 0 / else branch into one path (slice(0, 0) already yields an empty body); behavior is identical.
Returned values are unaffected: frame.body is an owned slice copy, and the cancel runs after the result is assembled, so cancelling the remaining body can't corrupt what callers receive.
Test plan
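A minimal sketch of the kind of early-exit check this section describes, assuming a runtime with a global ReadableStream (Node 18+ or a browser). The spyStream shown here is only a guess at the helper's shape, and readerToIteratorSketch and readFirst are hypothetical names; the real tests may differ.

```typescript
// Assumed spy: a stream that produces a chunk only when a read is pending
// (highWaterMark: 0) and never closes, modelling a kept-alive socket.
function spyStream(chunk: Uint8Array) {
  let cancelled = false;
  const stream = new ReadableStream<Uint8Array>(
    {
      pull(controller) {
        controller.enqueue(chunk); // no close(): EOF is never signalled
      },
      cancel() {
        cancelled = true;
      },
    },
    { highWaterMark: 0 },
  );
  return { stream, wasCancelled: () => cancelled };
}

// Hypothetical reader-based iterator that cancels its reader on any exit.
async function* readerToIteratorSketch(
  stream: ReadableStream<Uint8Array>,
): AsyncGenerator<Uint8Array> {
  const reader = stream.getReader();
  try {
    while (true) {
      const result = await reader.read();
      if (result.done) return;
      yield result.value;
    }
  } finally {
    try {
      await reader.cancel();
    } catch {
      // Ignore cleanup errors so they cannot mask the original outcome.
    }
  }
}

// Consumer that stops after the first chunk.
async function readFirst(
  stream: ReadableStream<Uint8Array>,
): Promise<Uint8Array | undefined> {
  for await (const chunk of readerToIteratorSketch(stream)) {
    return chunk;
  }
  return undefined;
}
```

A check built on this would read one chunk with readFirst and then assert wasCancelled() is true. Because the spy never closes by itself, the assertion can only pass if the iterator actually cancels.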
- packages/world-vercel/src/frames.test.ts: new tests asserting the underlying stream is cancelled when the consumer breaks early, for both the async-iterable branch and the getReader/readerToIterator branch, plus a guard that full consumption still decodes every frame. The spyStream helper models a kept-alive socket (highWaterMark: 0, never signals EOF); a toy stream that auto-closes would make cancel() a no-op and give a false pass.
- packages/world-vercel/src/events-v4.test.ts: new getEventV4 HTTP round-trip via undici MockAgent (this path previously had no test). Includes a trailing frame the reader must never read, proving the early return + cancel returns correct data and doesn't hang.
- pnpm test (world-vercel): 167 passed.
- pnpm typecheck and biome check: clean.
Scope
Targets the decodeFrames early-exit leak only. A few lower-frequency error/edge paths (streamer.ts get()/list() throwing without draining; the content-type / missing-header throws in events-v4.ts) are the same class of issue and can be a follow-up.
🤖 Generated with Claude Code