[turbopack-trace-server] optimize loading #93264
Merged
lukesandberg merged 8 commits into `canary` on Apr 28, 2026
Stats from current PR: ✅ No significant changes detected (commit e400faf).
Tests passed (commit e400faf).
sokra reviewed and approved these changes on Apr 28, 2026.
Two small wins in `self_time_tree.rs`:

1. Replace the two-pass `min_by_key`/`max_by_key` in `distribute_entries` with a single fold. This only fires on the first split of a node, so the absolute saving is small, but the change is free.
2. Add a fast path to `lookup_range_corrected_time` that skips the sort-and-sweep when every overlapping interval fully contains the query window. This is the common case for short `SelfTime` events queried via `SpanRef::corrected_self_time` / `SpanEventRef::corrected_self_time`. The `changes` vec is pre-reserved to avoid amortized `RawVec` growth on the slow path. The function also defensively returns `ZERO` when there are no overlapping intervals (the original code would have divided by zero, but the calling span's own self-time event is always in the tree, so this case isn't reachable in practice).

Tests: 4 new unit tests cover no-overlap, single-interval, full-containment fast path, and partial overlap.
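A minimal sketch of both ideas, assuming a hypothetical half-open `Interval` type and a corrected-time rule where each stretch of the query window is split evenly (integer division) among the intervals active during it. The real `self_time_tree.rs` types and arithmetic may differ; `bounds` is a hypothetical helper, and `lookup_range_corrected_time` here is an illustrative reimplementation that only borrows the commit's name. It shows the single fold, the zero-overlap guard, the full-containment fast path, and the pre-reserved sort-and-sweep slow path.

```rust
/// Hypothetical half-open interval `[start, end)` in trace time units.
#[derive(Clone, Copy, Debug)]
struct Interval {
    start: u64,
    end: u64,
}

impl Interval {
    fn overlaps(&self, start: u64, end: u64) -> bool {
        self.start < end && self.end > start
    }

    fn contains_window(&self, start: u64, end: u64) -> bool {
        self.start <= start && self.end >= end
    }
}

/// One fold instead of separate `min_by_key` and `max_by_key` passes:
/// returns (earliest start, latest end), or `None` for an empty slice.
fn bounds(entries: &[Interval]) -> Option<(u64, u64)> {
    entries.iter().fold(None, |acc, e| match acc {
        None => Some((e.start, e.end)),
        Some((lo, hi)) => Some((lo.min(e.start), hi.max(e.end))),
    })
}

/// Splits the window `[start, end)` among overlapping intervals: each stretch
/// is divided by the number of intervals active during it.
fn lookup_range_corrected_time(intervals: &[Interval], start: u64, end: u64) -> u64 {
    let mut count: u64 = 0;
    let mut all_contain = true;
    for i in intervals.iter().filter(|i| i.overlaps(start, end)) {
        count += 1;
        all_contain &= i.contains_window(start, end);
    }

    // Defensive: nothing overlaps, so there is nothing to divide by.
    if count == 0 {
        return 0;
    }

    // Fast path: concurrency is `count` for the whole window; no sort needed.
    if all_contain {
        return (end - start) / count;
    }

    // Slow path: sweep sorted +1/-1 change points clamped to the window.
    // Reserve up front so the pushes never reallocate.
    let mut changes: Vec<(u64, i64)> = Vec::with_capacity(2 * count as usize);
    for i in intervals.iter().filter(|i| i.overlaps(start, end)) {
        changes.push((i.start.max(start), 1));
        changes.push((i.end.min(end), -1));
    }
    // At equal timestamps, -1 sorts before +1.
    changes.sort_unstable();

    let mut total = 0;
    let mut active: i64 = 0;
    let mut last = start;
    for (time, delta) in changes {
        if active > 0 && time > last {
            total += (time - last) / active as u64;
        }
        last = time;
        active += delta;
    }
    total
}
```

In the fast path every overlapping interval is active for the entire window, so a sweep would yield a single segment with constant concurrency; returning `(end - start) / count` directly gives the same answer without building or sorting `changes`.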
Change `SpanEventSelfTime { start, end, ... }` to
`SpanEventSelfTime { start, duration: NonZeroU64, ... }`. The
`NonZeroU64` gives the larger variant a niche, and combined with
`Child`'s existing `NonZeroUsize` the compiler packs the enum without
a separate discriminant byte. Verified by
`const _: () = assert!(size_of::<SpanEvent>() == 32)`.
Saves ~8 bytes per event. With ~10 events/span on average (47M spans
giving ~470M events), that's roughly 3.5 GiB (470M × 8 B ≈ 3.76 GB).
Callers must filter zero-duration self-time events before constructing.
The construction sites in store.rs (`add_self_time` and
`set_total_time`) are updated:
- `add_self_time` now uses `SpanEvent::self_time(start, end)` which
returns `None` for zero/negative duration (early-returns instead of
pushing).
- `set_total_time`'s three internal pushes use `if let Some(...)` to
defensively skip zero-duration events.
`SpanEventSelfTimeRef::end()` now computes `start + duration.get()`
on demand. The redundant zero-duration check in `corrected_self_time()`
is removed since `NonZeroU64` guarantees the invariant.
Tests: 3 new tests for SpanEvent (size, zero-duration filter, ctor).
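A minimal sketch of the layout change, using a hypothetical stand-in enum: `rest` models the fields the commit elides as `...` and is sized (an assumption) so the variant is 32 bytes. It contrasts the `end: u64` shape with the `duration: NonZeroU64` shape and shows a `self_time` constructor that returns `None` for zero or negative durations, plus an `end()` derived on demand. The sizes in the comments assume a 64-bit target.

```rust
use std::mem::size_of;
use std::num::{NonZeroU64, NonZeroUsize};

/// Hypothetical stand-in for `SpanEvent`; `rest` models the elided `...` fields.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SpanEvent {
    SelfTime { start: u64, duration: NonZeroU64, rest: [u64; 2] },
    Child { index: NonZeroUsize },
}

/// Same shape with a plain `end: u64`: the largest variant has no niche,
/// so the compiler adds a separate (padded) discriminant: 40 bytes.
#[allow(dead_code)]
enum SpanEventBefore {
    SelfTime { start: u64, end: u64, rest: [u64; 2] },
    Child { index: NonZeroUsize },
}

// Compile-time layout check in the style the commit describes.
const _: () = assert!(size_of::<SpanEvent>() == 32);

impl SpanEvent {
    /// Returns `None` when `end <= start`, so a zero-length self-time event
    /// can never be constructed.
    fn self_time(start: u64, end: u64) -> Option<Self> {
        let duration = NonZeroU64::new(end.checked_sub(start)?)?;
        Some(SpanEvent::SelfTime { start, duration, rest: [0; 2] })
    }

    /// `end` is no longer stored; it is recomputed as `start + duration`.
    fn end(&self) -> Option<u64> {
        match *self {
            SpanEvent::SelfTime { start, duration, .. } => Some(start + duration.get()),
            SpanEvent::Child { .. } => None,
        }
    }
}
```

`end.checked_sub(start)?` filters negative durations and `NonZeroU64::new(..)?` filters zero, which is why callers such as `add_self_time` can simply early-return on `None`.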
Replace `Span::args: Vec<(RcStr, RcStr)>` with `pub type SpanArgs = SmallVec<[(RcStr, RcStr); 1]>`. Most spans have 0–1 args (typically just a `name` key for `turbo_tasks::function` spans), and `SmallVec<[T; 1]>` with the workspace's `union` feature is the same 24 bytes as `Vec` while inlining one entry. Net effect:
- zero-arg spans pay no heap allocation;
- single-arg spans (the common case) also pay no heap allocation;
- spans with 2+ args spill to the heap as before.

Backed by `const _: () = assert!(size_of::<SpanArgs>() == 24)`, so any layout regression breaks the build.

The cap is intentionally pinned at 1, not 2: bumping it would grow `Span` by 16 bytes (~750 MB at 47M spans) for a marginal additional saving.
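To make the size arithmetic concrete, here is a std-only layout model, not the smallvec crate's source: a `union` overlaps the inline buffer with the heap (pointer, length) pair, and a separate `capacity` word records which one is live. `RcStr` is modeled as a thin 8-byte pointer (`Arc<String>`), which is an assumption; `SmallVecModel`, `Data`, and `growth_from_cap_2` are hypothetical names. Sizes assume a 64-bit target.

```rust
use std::mem::{size_of, ManuallyDrop, MaybeUninit};
use std::sync::Arc;

/// Assumption: `RcStr` behaves like one thin pointer for layout purposes.
type RcStrModel = Arc<String>;
type Arg = (RcStrModel, RcStrModel); // 16 bytes

/// Inline storage and the heap (ptr, len) pair share the same bytes.
#[allow(dead_code)]
union Data<A, T> {
    inline: ManuallyDrop<MaybeUninit<A>>,
    heap: (*mut T, usize),
}

/// `capacity` doubles as the discriminant between inline and heap.
#[allow(dead_code)]
struct SmallVecModel<A, T> {
    capacity: usize,
    data: Data<A, T>,
}

type ArgsInline1 = SmallVecModel<[Arg; 1], Arg>; // 8 + max(16, 16) = 24
type ArgsInline2 = SmallVecModel<[Arg; 2], Arg>; // 8 + max(32, 16) = 40

/// Extra bytes per span if the inline capacity were raised from 1 to 2.
fn growth_from_cap_2() -> usize {
    size_of::<ArgsInline2>() - size_of::<ArgsInline1>()
}
```

With one inline `(RcStr, RcStr)` pair, the inline buffer is exactly as large as the heap pair, so the model matches `Vec`'s 24 bytes; a second inline slot adds 16 bytes, and 47M × 16 B = 752 MB, consistent with the ~750 MB figure above.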
Change `LazySortedVec`'s backing storage from `Vec<T>` to `SmallVec<[T; 1]>`. ~69% of spans have <=1 event (a single self-time event for leaf spans), so inlining one entry avoids a heap allocation in this common case. `Deref` now returns `&[T]` so callers iterate through the slice rather than `&Vec<T>`. `set_total_time` builds events into a local `Vec<SpanEvent>` and converts via the existing `From<Vec<T>>` impl on assignment.
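The point of switching `Deref` to `&[T]` is that callers stop depending on the backing storage. A std-only sketch under stated assumptions: `InlineOneVec` is a hypothetical stand-in for `SmallVec<[T; 1]>` that models behavior only, not layout, and it is not `LazySortedVec` (it omits any lazy sorting). Zero or one element lives inline; anything larger stays in a heap `Vec`. This model picks the inline form by length in `From<Vec<T>>`; how smallvec's own conversion decides is not modeled here.

```rust
use std::ops::Deref;

/// Hypothetical stand-in: at most one element inline, otherwise a heap `Vec`.
enum InlineOneVec<T> {
    Inline(Option<T>),
    Heap(Vec<T>),
}

impl<T> From<Vec<T>> for InlineOneVec<T> {
    fn from(mut v: Vec<T>) -> Self {
        if v.len() <= 1 {
            // Move the (at most one) element inline; `v`'s buffer is dropped here.
            InlineOneVec::Inline(v.pop())
        } else {
            InlineOneVec::Heap(v)
        }
    }
}

impl<T> Deref for InlineOneVec<T> {
    type Target = [T];

    /// Callers always see a slice, whichever representation is live.
    fn deref(&self) -> &[T] {
        match self {
            InlineOneVec::Inline(Some(x)) => std::slice::from_ref(x),
            InlineOneVec::Inline(None) => &[],
            InlineOneVec::Heap(v) => v.as_slice(),
        }
    }
}

impl<T> InlineOneVec<T> {
    fn is_inline(&self) -> bool {
        matches!(self, InlineOneVec::Inline(_))
    }
}
```

This mirrors the `set_total_time` pattern described above: build into a local `Vec`, then convert on assignment, while readers keep iterating through `&[T]`.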
…ceLock

Replace `Span`'s six `OnceLock<u32|u64>` fields (`max_depth`, `total_allocations`, `total_deallocations`, `total_persistent_allocations`, `total_allocation_count`, `total_span_count`) with a single `OnceLock<SpanTotals>` bundling all six values. Computation walks the subtree once on first access and fills every field; subsequent calls, regardless of which getter, hit the cache.

Trades a small amount of read-side work (always populating all six fields, even if the caller only wanted one) for a much smaller per-`Span` lock count. With ~47M spans this saves on the order of hundreds of MB of `OnceLock` overhead.

`SpanRef::max_depth`, `total_allocations`, etc. now read through `SpanRef::totals()`. Invalidation in `Store::invalidate_outdated_spans` collapses the per-field `take()` calls into one `span.totals.take()`.
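A minimal sketch of the bundled cache, assuming a hypothetical tree-shaped `Span` with owned children (the real store keeps spans in a flat arena and reads them through `SpanRef`) and assuming field types of `u32` for `max_depth` and `u64` for the rest. Depth is counted from 0 at a leaf, which is also an assumption. One `get_or_init` fills all six values, and invalidation is a single `take()`.

```rust
use std::sync::OnceLock;

/// Hypothetical bundle of the six cached values (field types are assumed).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct SpanTotals {
    max_depth: u32,
    total_allocations: u64,
    total_deallocations: u64,
    total_persistent_allocations: u64,
    total_allocation_count: u64,
    total_span_count: u64,
}

/// Hypothetical tree-shaped span used only for this sketch.
struct Span {
    allocations: u64,
    deallocations: u64,
    persistent_allocations: u64,
    allocation_count: u64,
    children: Vec<Span>,
    /// One lock for all six totals instead of six separate `OnceLock`s.
    totals: OnceLock<SpanTotals>,
}

impl Span {
    fn new(allocations: u64, deallocations: u64, persistent_allocations: u64,
           allocation_count: u64, children: Vec<Span>) -> Self {
        Span { allocations, deallocations, persistent_allocations,
               allocation_count, children, totals: OnceLock::new() }
    }

    /// Walks the subtree once on first access and fills every field.
    fn totals(&self) -> &SpanTotals {
        self.totals.get_or_init(|| {
            let mut t = SpanTotals {
                max_depth: 0, // assumption: a leaf has depth 0
                total_allocations: self.allocations,
                total_deallocations: self.deallocations,
                total_persistent_allocations: self.persistent_allocations,
                total_allocation_count: self.allocation_count,
                total_span_count: 1,
            };
            for child in &self.children {
                // Child totals are themselves cached after their first walk.
                let c = child.totals();
                t.max_depth = t.max_depth.max(c.max_depth + 1);
                t.total_allocations += c.total_allocations;
                t.total_deallocations += c.total_deallocations;
                t.total_persistent_allocations += c.total_persistent_allocations;
                t.total_allocation_count += c.total_allocation_count;
                t.total_span_count += c.total_span_count;
            }
            t
        })
    }

    // Every getter reads through the same cached bundle.
    fn max_depth(&self) -> u32 { self.totals().max_depth }
    fn total_allocations(&self) -> u64 { self.totals().total_allocations }

    /// Invalidation clears all six values with a single `take()`.
    fn invalidate(&mut self) {
        self.totals.take();
    }
}
```

Whichever getter runs first pays for the full walk; every later getter, for any of the six values, is a plain read of the already-initialized `OnceLock`.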

Land a few optimizations to the trace server
- `SpanEvent` is now 32 bytes instead of 40 bytes, by triggering a niche optimization
- `args` and `events` are now a `SmallVec` with inline size 1
  - `args`: size <= 1 ~31% of the time
  - `events`: size <= 1 69% of the time
- `OnceLock` overheads
- trace.nextjs.org: it is 100%
- inner `OnceLock`s from `SpanNames`: we can just allocate these all together

Measuring with one 10 GB trace file, loading time improves from 75.7 s (33 GB of RAM) to 60.5 s (19.5 GB of RAM), with load throughput occasionally exceeding 200 MB/s.