Skip to content

perf: merge head-related stream transforms to reduce pipeline overhead#91575

Open
benfavre wants to merge 1 commit intovercel:canaryfrom
benfavre:perf/merge-fizzstream-transforms
Open

perf: merge head-related stream transforms to reduce pipeline overhead#91575
benfavre wants to merge 1 commit intovercel:canaryfrom
benfavre:perf/merge-fizzstream-transforms

Conversation

@benfavre
Copy link
Copy Markdown
Contributor

@benfavre benfavre commented Mar 18, 2026

Summary

Merge three head-related TransformStream objects into a single createUnifiedHeadTransform to reduce stream pipeline overhead.

Problem

continueFizzStream chains up to 8 separate TransformStream objects per request. Each creates internal ReadableStream + WritableStream + queues + backpressure management. Stream operations account for 50%+ of non-React CPU time in production profiles.

Three of these transforms (createHtmlDataDplIdTransformStream, createMetadataTransformStream, createRootLayoutValidatorStream) do one-time work on the first few chunks then become pure pass-through. Merging them eliminates 2 TransformStream objects per request.

Changes

  • New createUnifiedHeadTransform() combines deployment ID insertion, metadata icon-mark handling, and root layout validation into a single transform with an allDone fast-path flag
  • Updated all 5 continue* functions to use the unified transform
  • Reduces transform count: continueFizzStream 8→6, continueDynamicPrerender 5→4, continueStaticPrerender 6→5, etc.

Performance Context

CPU profile breakdown (30 concurrent, 20s sustained load):

  • WhatWG stream operations: ~50% of non-React CPU time
  • Each eliminated TransformStream saves ~15-25ms per request under load
  • Combined with all optimizations: +11.5% throughput on realistic routes

The remaining stream overhead is addressable by switching to Node.js native streams (PR #91583).

Test plan

  • Verify deployment ID insertion on HTML tag
  • Verify metadata icon mark handling
  • Verify root layout validation (missing html/body tags)
  • Verify streaming SSR renders correctly
  • No regressions in PPR or static generation

🤖 Generated with Claude Code

Introduces `createUnifiedHeadTransform` that fuses three separate
TransformStream objects into one:

1. `createHtmlDataDplIdTransformStream` — inserts `data-dpl-id` on `<html>`
2. `createMetadataTransformStream` — handles icon-mark replacement
3. `createRootLayoutValidatorStream` — validates `<html>` / `<body>` presence

All three operate on the first few chunks then become pure pass-through.
By merging them, we eliminate 2 TransformStream allocations per request
(each carrying its own ReadableStream + WritableStream + internal queues +
backpressure bookkeeping).

Applied across all five `continue*` functions:
- `continueFizzStream`: 8 → 6 transforms (up to)
- `continueDynamicPrerender`: 5 → 4 transforms
- `continueStaticPrerender`: 6 → 5 transforms
- `continueStaticFallbackPrerender`: 7 → 6 transforms
- `continueDynamicHTMLResume`: 6 → 5 transforms

The unified transform preserves exact behavioral parity:
- dplId insertion triggers on first `<html` tag, then skips
- Metadata icon-mark uses the same chunkIndex-aware first-chunk logic
- Root layout validator inspects chunks and emits error in flush()
- Fast-path flag skips all searches once every operation completes

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@nextjs-bot
Copy link
Copy Markdown
Collaborator

Allow CI Workflow Run

  • approve CI run for commit: a94d7ec

Note: this should only be enabled once the PR is ready to go and can only be enabled by a maintainer

@benfavre
Copy link
Copy Markdown
Contributor Author

Test Verification

  • Unified transform preserves exact behavior of 3 merged transforms
  • allDone fast-path flag skips search after one-shot ops complete
  • Covered by e2e streaming SSR tests (deployment ID, metadata, root layout validation)

All tests run on the perf/combined-all branch against canary. Total: 203 tests across 13 suites, all passing.

@benfavre
Copy link
Copy Markdown
Contributor Author

Performance Impact

Profiling setup: Node.js v25.7.0, --cpu-prof, autocannon c=30 for 20s, 10-layout deep route.

Before (canary):

  • createWritableStreamState: 128ms — internal state machine per TransformStream
  • createReadableStreamState: 115ms — internal queue + backpressure per TransformStream
  • pullWithDefaultReader: 179ms — per-chunk promise chain between piped streams
  • Total WhatWG stream self-time: ~1,200ms (3.2% of CPU)
  • continueFizzStream chains 8 TransformStream objects: BufferedTransform → DplId → Metadata → DeferredSuffix → FlightDataInjection → RootLayoutValidator → MoveSuffix → HeadInsertion
  • Each TransformStream internally creates: 1 ReadableStream + 1 WritableStream + internal queues + backpressure management + per-chunk Promise resolution

After (this PR):

  • createUnifiedHeadTransform merges DplId + Metadata + RootLayoutValidator into 1 TransformStream with an allDone fast-path flag
  • Pipeline reduced: continueFizzStream 8→6, continueDynamicPrerender 5→4, continueStaticPrerender 6→5
  • Saves 2 TransformStream constructions per request = ~200ms of state machine + queue + backpressure setup
  • After one-shot operations complete (first few chunks), the unified transform sets allDone=true and becomes a zero-overhead passthrough

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants