
perf(pm): manifest cache & resolver alloc cleanup#2826

Draft
elrrrrrrr wants to merge 4 commits into next from perf/manifest-cache

Conversation

@elrrrrrrr
Contributor

Summary

Third of 4 split PRs from #2818. Independently-motivated allocator + cache hot-path optimisations for the resolver. Each landed during the worker-pool exploration but stands alone — they do not depend on the worker-pool architecture.

Changes (each ~50ms preload savings, cumulative ~200ms)

  • TLS provider: aws-lc-rs instead of ring (~420ms saved on cold preload TLS handshakes — measured CCS→AppData 78ms→17ms)
  • DNS per-family rotation: cycle v4 and v6 independently so connection pool spreads evenly across all addresses (matches bun's pcap-observed 4×64 distribution)
  • Disk-cache bulk-readdir ETag index: lazy HashSet<String> of cached names from one read_dir, restores warm 304 path without per-package try_exists storm
  • Lazy per-version CoreVersionManifest parse via simd_json::OwnedValue + DashMap memoisation — resolver typically reads 1-3 of ~500 versions per manifest
  • Arc<FullManifest> in MemoryCache — atomic-bump clone instead of deep HashMap clone (~500k allocs eliminated)
  • normalize_spec returns Cow<'_, str> — common path now zero-alloc (~5460 allocs eliminated)
  • Drop versions.keys.clone() on cache-hit path (~360k String allocs eliminated)
  • OnceMap dedup for concurrent resolve_full_manifest callers
  • tracing file_filter info+ default — drops format/serialize CPU for ~15-30k hot-path debug events per cold preload (override via UTOO_FILE_LOG=debug)
  • indicatif progress bar: drop per-package message updates (was ~9000 lock acquisitions per ant-design preload)
  • HTTP + parse diagnostic infrastructure for #PR4 to wire in

Trait surface change

RegistryClient's default-method futures gain + Send and Self: Sync bounds. Required by the spawn use in #PR4, but equally correct for single-threaded resolvers. Adds a + Sync bound on resolve_package / resolve_registry_dep / process_dependency / preload helpers.

Test plan

Stacking

  • Base: next
  • Stacked-on-top: PR4 (perf/preload-worker-pool) targets perf/manifest-cache and adds the worker-pool spawn refactor + Send/Clone/Sync/'static bound propagation.

Context

Full exploration journey + failed-experiments catalog: #2818
Bench infrastructure: #2824

🤖 Generated with Claude Code


@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request implements a series of performance optimizations for the package resolver, focusing on reducing memory allocations, improving concurrency, and optimizing network and disk I/O. Key enhancements include lazy manifest parsing using simd_json, a OnceMap utility to deduplicate concurrent fetches, and a round-robin DNS resolver for better connection distribution. Memory efficiency is improved by utilizing Arc in caches and Cow for string normalization. Review feedback points out a compilation risk from using unstable let_chains, potential data loss due to fire-and-forget background writes, and an opportunity to further optimize DNS rotation allocations.

Comment on lines +113 to +140
fn rotate_addrs(addrs: &[SocketAddr], offset: usize) -> Vec<SocketAddr> {
if addrs.is_empty() {
return Vec::new();
}
let rotate = |slice: &[SocketAddr]| -> Vec<SocketAddr> {
if slice.is_empty() {
return Vec::new();
}
let start = offset % slice.len();
slice[start..]
.iter()
.chain(&slice[..start])
.copied()
.collect()
};
let v6: Vec<SocketAddr> = addrs.iter().filter(|a| a.is_ipv6()).copied().collect();
let v4: Vec<SocketAddr> = addrs.iter().filter(|a| a.is_ipv4()).copied().collect();
let v6_rot = rotate(&v6);
let v4_rot = rotate(&v4);
// Preserve v6-first ordering if that's what the resolver gave us;
// Happy Eyeballs will still prefer v6 when it's reachable.
let v6_first = addrs.first().map(|a| a.is_ipv6()).unwrap_or(true);
if v6_first {
v6_rot.into_iter().chain(v4_rot).collect()
} else {
v4_rot.into_iter().chain(v6_rot).collect()
}
}

medium

The rotate_addrs function performs multiple Vec allocations (filtering, collecting, and rotating) on every DNS resolution, including cache hits. Since this is a hot path and the PR aims for allocation cleanup, this can be optimized to use fewer allocations by leveraging partition and rotate_left.

    fn rotate_addrs(addrs: &[SocketAddr], offset: usize) -> Vec<SocketAddr> {
        if addrs.is_empty() {
            return Vec::new();
        }

        let (mut v6, mut v4): (Vec<_>, Vec<_>) = addrs.iter().copied().partition(|a| a.is_ipv6());

        if !v6.is_empty() {
            let mid = offset % v6.len();
            v6.rotate_left(mid);
        }
        if !v4.is_empty() {
            let mid = offset % v4.len();
            v4.rotate_left(mid);
        }

        let v6_first = addrs.first().map(|a| a.is_ipv6()).unwrap_or(true);
        if v6_first {
            v6.into_iter().chain(v4).collect()
        } else {
            v4.into_iter().chain(v6).collect()
        }
    }

Comment on lines +203 to +207
if let Some(entry) = self.map.get(&key)
&& let Value::Done(result) = entry.value()
{
return Some(Arc::clone(result));
}

medium

The use of let chains (e.g., if let ... && let ...) is an unstable Rust feature (let_chains). Unless the project is explicitly using a nightly compiler, this will cause compilation errors on stable Rust. It is recommended to use nested if let blocks instead. This also applies to lines 235-239 and 245-249.

Suggested change
if let Some(entry) = self.map.get(&key)
&& let Value::Done(result) = entry.value()
{
return Some(Arc::clone(result));
}
if let Some(entry) = self.map.get(&key) {
if let Value::Done(result) = entry.value() {
return Some(Arc::clone(result));
}
}

/// tuning showed the previous inline `.await` + `serde_json::to_string_pretty`
/// burned ~1–3 ms per call on the hot path, stalling the main preload
/// task and causing the 24..62 active-stream dip observed on CI.
pub fn set_versions_to_disk(&self, name: &str, info: &VersionsInfo) {

medium

Using tokio::spawn for fire-and-forget disk writes in a CLI tool can lead to data loss or cache corruption if the process exits before the background tasks complete. Since there is no mechanism to await these tasks during shutdown, the disk cache might not be reliably updated. Consider tracking these tasks or providing a flush mechanism to ensure writes complete before the program terminates.

@elrrrrrrr elrrrrrrr marked this pull request as draft April 25, 2026 15:22
elrrrrrrr added a commit that referenced this pull request Apr 25, 2026
The headline architectural change of #2818. ruborist's preload
phase shifts from a single-task `FuturesUnordered` cooperative
poller to N long-lived `tokio::spawn` workers (or
`wasm_bindgen_futures::spawn_local` on wasm32 where Send isn't
satisfied). Stacks on top of #2826.

## Why

Old design: main task owned `FuturesUnordered`, polled all
preload futures cooperatively, and ran every per-future
continuation (post-await body, completion handler, dispatch
refill) on the same single task. The deeper await chain inside
`resolve_package` (cache check + `OnceMap::get_or_init` +
`RetryIf` + `request.send` + `bytes` + parse spawn_blocking)
made each future yield 5+ times, and every yield round-tripped
through main — saturating it. CI ant-design preload sustained
avg_conc=55-61 even after Mutex / allocator hot-path
eliminations, while the standalone manifest-bench (same reqwest
stack, no resolver — see #2824) hit 92 at the same cap.

## How

N long-lived `tokio::spawn` workers pulling from a shared
lock-free `SegQueue<Dep>` with `DashSet` dedup. Each worker
owns an `Arc<R>` clone and runs `resolve_package` on tokio's
global executor — futures progress fully independently, no
cooperative poll bottleneck. Main task only drains an
`mpsc::unbounded_channel` of completions to fire receiver events
+ on_manifest callback.

Termination: workers track `dispatched` / `completed:
AtomicUsize` and park on a shared `Notify` when the queue is
empty. When the last completion makes `completed == dispatched`
and the queue is empty, the finishing worker raises a `shutdown`
flag and wakes others; all workers drop their result_tx clones,
the channel closes, and the main `recv().await` loop exits.
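The dispatched/completed + Notify handshake above can be sketched with std primitives alone — `Mutex<VecDeque>` standing in for `SegQueue`, a `Condvar` for `Notify`, OS threads for the tokio workers. All names here are hypothetical stand-ins, not the real ruborist types:

```rust
use std::collections::VecDeque;
use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};
use std::sync::{mpsc, Arc, Condvar, Mutex};
use std::thread;

struct Pool {
    queue: Mutex<VecDeque<u32>>, // stand-in for SegQueue<Dep>
    parked: Condvar,             // stand-in for Notify
    dispatched: AtomicUsize,
    completed: AtomicUsize,
    shutdown: AtomicBool,
}

fn worker(pool: Arc<Pool>, tx: mpsc::Sender<u32>) {
    loop {
        let job = {
            let mut q = pool.queue.lock().unwrap();
            loop {
                if pool.shutdown.load(Ordering::Acquire) {
                    return; // dropping tx lets the main recv loop end
                }
                if let Some(job) = q.pop_front() {
                    break job;
                }
                q = pool.parked.wait(q).unwrap(); // park until work or shutdown
            }
        };
        tx.send(job * 2).unwrap(); // stand-in for resolve_package + completion
        let done = pool.completed.fetch_add(1, Ordering::AcqRel) + 1;
        if done == pool.dispatched.load(Ordering::Acquire) {
            // Last completion: do the emptiness check, shutdown store and
            // wakeup under the queue lock, so a worker that is about to
            // park cannot miss the notification.
            let q = pool.queue.lock().unwrap();
            if q.is_empty() {
                pool.shutdown.store(true, Ordering::Release);
                pool.parked.notify_all();
            }
        }
    }
}

fn run_pool(jobs: u32, workers: usize) -> u32 {
    let pool = Arc::new(Pool {
        queue: Mutex::new((0..jobs).collect()),
        parked: Condvar::new(),
        dispatched: AtomicUsize::new(jobs as usize),
        completed: AtomicUsize::new(0),
        shutdown: AtomicBool::new(jobs == 0), // nothing to do: shut down at once
    });
    let (tx, rx) = mpsc::channel();
    let handles: Vec<_> = (0..workers)
        .map(|_| {
            let (p, t) = (Arc::clone(&pool), tx.clone());
            thread::spawn(move || worker(p, t))
        })
        .collect();
    drop(tx); // channel closes once every worker has dropped its sender
    let sum: u32 = rx.iter().sum(); // main task: drain completions
    for h in handles {
        h.join().unwrap();
    }
    sum
}

fn main() {
    println!("sum = {}", run_pool(8, 3));
}
```

Note the emptiness check and the shutdown store happen under the queue lock: done lock-free, they could race with a worker that has just seen an empty queue and is about to park, losing the wakeup.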

## Trait surface change

- `MockRegistryClient` + `MockPackage` now `derive(Clone)` so
  tests can wrap the mock in `Arc` for the new signature
- `preload_manifests` takes `registry: Arc<R>` (was `&R`); call
  site in `run_preload_phase` clones the borrowed registry into
  a fresh `Arc`. Bound at every public surface up the chain
  bumped to `R: RegistryClient + Clone + MaybeSend + MaybeSync +
  'static`, `R::Error: MaybeSend`. The `MaybeSend` /
  `MaybeSync` shims (added in #2826) keep the trait surface
  wasm-compatible.

## Companion changes folded in

- **Inline simd_json parse** — drop `tokio::task::spawn_blocking`
  in `service/manifest.rs`. Worker-pool surfaced parse blocking-
  pool queue saturation: `queue p95=200ms sum=70-89s` over 2730
  manifests on cap=4 CI runners. Inline parse on the worker
  thread eliminates dispatch + queue overhead; 1-5ms CPU per
  manifest is acceptable on async worker.
- **Workspace package.json parallel reads** — `find_workspaces_from_pkg`
  switched from sequential `for path in matched_paths { read }`
  loop to `FuturesUnordered` fan-out. ant-design has ~200
  workspace packages; saved ~150ms.
- **Setup phase + lockfile-write timing logs** — round out the
  per-phase wall account for the bench-comment infrastructure.
- **Manifests concurrency cap 64 → 128** — worker-pool
  delivered the parallelism that justifies the cap raise. CI
  ant-design avg_conc 84 at cap=128 (up from 55 under the old
  architecture); preload wall 3.10s → 2.15s.

## Tests

`#[tokio::test(flavor = "multi_thread", worker_threads = 2)]`
since worker-pool needs a spawn-able runtime; ruborist's
dev-dependencies on `tokio` add the `rt-multi-thread` feature.

164 ruborist + 10 doctests + 248/249 utoo-pm pass (1 pre-existing
flake on `test_update_package_binary_fsevents`, runs green alone).

## Wasm CI

cfg-gates `tokio::spawn` to `wasm_bindgen_futures::spawn_local`
on wasm32 since wasm-bindgen's `JsFuture` is `!Send`. Workers
still run independently — single-threaded under wasm but the
queue + Notify + mpsc termination story is unchanged.
`cargo check -p utoo-wasm --target wasm32-unknown-unknown` clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@elrrrrrrr elrrrrrrr force-pushed the perf/manifest-cache branch from 3be6b63 to 2831262 Compare April 25, 2026 15:30
@elrrrrrrr elrrrrrrr changed the base branch from next to perf/bench-infra April 25, 2026 15:30
elrrrrrrr added a commit that referenced this pull request Apr 25, 2026
@elrrrrrrr elrrrrrrr changed the title perf(ruborist): manifest cache & resolver alloc cleanup perf(pm): manifest cache & resolver alloc cleanup Apr 25, 2026
Base automatically changed from perf/bench-infra to next April 27, 2026 02:14
elrrrrrrr and others added 4 commits April 27, 2026 11:30
Compares this PR's utoo against next-branch HEAD (the merged baseline)
instead of just utoo-npm (latest published, can be days/weeks behind).
The utoo-next column isolates THIS PR's perf delta from any other
unmerged-since-publish work.

Two new build jobs (build-next-{linux,mac-arm64}) checkout origin/next
and build utoo from there in parallel with the main builds. Bench
phases pick up both artifacts via the new setup-utoo-next-baseline
composite action and pass utoo-next through PM_LIST.

Build jobs gate on the same `benchmark` label / dispatch trigger as
bench-phases — they only fire when bench-phases will actually run.

Bench script (bench/pm-bench-phases.sh) gets parallel utoo-next
support: UTOO_NEXT_BIN env, UTOO_NEXT_CACHE, and case statements
mirroring the existing utoo-npm pattern across install_cmd,
resolve_cmd, write_prepare, capture_footprint, seed_for_phase.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Bundle of independently-motivated allocator + cache hot-path
optimisations from the parent perf branch (#2818). Each landed
during the worker-pool exploration but doesn't depend on the
worker-pool architecture itself — they stand alone as
straightforward perf wins for the resolver.

## TLS provider — `aws-lc-rs` instead of `ring`

`reqwest` 0.12's default `rustls-tls-native-roots` feature pins
`ring` via Cargo's feature unification. Switch to
`rustls-tls-native-roots-no-provider`, build our own
`rustls::ClientConfig` with the `aws_lc_rs` provider, pass via
`Client::use_preconfigured_tls`. CI measurement (4-core ubuntu vs
npmjs.org): ring's per-handshake CCS→AppData was 78 ms p50 / 154
ms max, all 128 parallel handshakes serialising across 4 cores.
aws-lc-rs (BoringSSL primitives) is ~3× faster on x86_64. Saved
~420 ms preload on cold ant-design.

## DNS — per-family rotation

`getaddrinfo` typically returns 10 v6 + 12 v4 for npmjs.org. A
flat rotation across the joined list meant offsets 0..10 all
started inside the v6 range; on hosts where v6 routing fails
(GitHub Actions runners), every connection fell through to the
*same* first-reachable v4. Rotate per-family so v4 conns cycle
across all v4 addresses (and v6 over v6) — observed pcap on bun
shows the same 4×64 distribution we now produce.

## Disk-cache bulk-readdir ETag index

`PackageCache` lazy-builds a `HashSet<String>` of names with
existing disk cache entries from a single `read_dir(cache_dir)` +
per-`@scope` recurse. `get_versions_from_disk` and
`get_version_manifest_from_disk` short-circuit via the index.
Restores the warm-run 304 path that was temporarily removed in
46cb803 (per-package `try_exists` was 16 ms avg on the cold-run
critical path; now zero).
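A minimal sketch of the bulk-readdir index (file names and layout are illustrative; the real `PackageCache` builds the set lazily and keeps it for the process lifetime):

```rust
use std::collections::HashSet;
use std::fs;
use std::io;
use std::path::Path;

/// One read_dir pass (plus a one-level recurse into `@scope` dirs)
/// replaces a per-package `try_exists` call on every lookup.
fn build_cache_index(cache_dir: &Path) -> io::Result<HashSet<String>> {
    let mut index = HashSet::new();
    for entry in fs::read_dir(cache_dir)? {
        let entry = entry?;
        let name = entry.file_name().to_string_lossy().into_owned();
        if name.starts_with('@') && entry.file_type()?.is_dir() {
            for scoped in fs::read_dir(entry.path())? {
                let scoped = scoped?;
                index.insert(format!("{name}/{}", scoped.file_name().to_string_lossy()));
            }
        } else {
            index.insert(name);
        }
    }
    Ok(index)
}

fn main() {
    let dir = std::env::temp_dir().join("cache-index-demo");
    let _ = fs::remove_dir_all(&dir);
    fs::create_dir_all(dir.join("@scope")).unwrap();
    fs::write(dir.join("lodash"), b"{}").unwrap();
    fs::write(dir.join("@scope").join("pkg"), b"{}").unwrap();
    let index = build_cache_index(&dir).unwrap();
    // Membership check is now a hash lookup, no disk I/O per package.
    println!("indexed @scope/pkg: {}", index.contains("@scope/pkg"));
}
```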

## Lazy per-version `CoreVersionManifest` via `simd_json::OwnedValue`

`Versions` now stores `keys: Vec<String>` (ordered version list)
+ `trees: HashMap<String, Arc<simd_json::OwnedValue>>`
(pre-parsed JSON subtrees). Strongly-typed `CoreVersionManifest`
is materialised on demand via
`CoreVersionManifest::deserialize(tree.as_ref())` — zero-copy
through `simd_json::OwnedValue`'s `Deserializer` impl, memoised in
a `DashMap`. Resolver typically reads 1-3 of the ~500 versions
per manifest; previous design built every one eagerly.
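The lazy-materialise + memoise shape, with plain strings standing in for `simd_json::OwnedValue` subtrees, a toy parse standing in for `CoreVersionManifest::deserialize`, and a `Mutex<HashMap>` standing in for the `DashMap` (all stand-ins, not the real types):

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

/// Toy stand-in for the strongly-typed CoreVersionManifest.
#[derive(Debug)]
struct VersionManifest {
    version: String,
    dep_count: usize,
}

/// `trees` holds raw, unparsed subtrees; `memo` memoises the typed form,
/// so each version is parsed at most once, and only if actually requested.
struct Versions {
    keys: Vec<String>,
    trees: HashMap<String, String>,
    memo: Mutex<HashMap<String, Arc<VersionManifest>>>,
}

impl Versions {
    fn manifest(&self, version: &str) -> Option<Arc<VersionManifest>> {
        if let Some(m) = self.memo.lock().unwrap().get(version) {
            return Some(Arc::clone(m)); // memoised: no reparse
        }
        let raw = self.trees.get(version)?;
        // Toy parse standing in for deserializing the JSON subtree.
        let parsed = Arc::new(VersionManifest {
            version: version.to_string(),
            dep_count: raw.split(',').count(),
        });
        self.memo
            .lock()
            .unwrap()
            .insert(version.to_string(), Arc::clone(&parsed));
        Some(parsed)
    }
}

fn main() {
    let mut trees = HashMap::new();
    trees.insert("1.0.0".to_string(), "dep-a,dep-b".to_string());
    let versions = Versions {
        keys: vec!["1.0.0".to_string()],
        trees,
        memo: Mutex::new(HashMap::new()),
    };
    let m = versions.manifest("1.0.0").unwrap();
    println!("{} of {} keys parsed: {} deps", m.version, versions.keys.len(), m.dep_count);
}
```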

## `Arc<FullManifest>` in `MemoryCache`

Cache previously returned `FullManifest` by value, deep-cloning
the per-version HashMap (100-500 entries × String key clone + Arc
bump per cache hit) on the resolver hot path. ~2730 cache hits
during cold preload × ~200-entry HashMap clone =
~500k allocations on shared resolver threads, contending the
allocator. Wrap in `Arc<FullManifest>`; cache hit becomes one
atomic bump.
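The hot-path difference fits in a few lines (field names are illustrative, not the real types):

```rust
use std::collections::HashMap;
use std::sync::Arc;

struct FullManifest {
    versions: HashMap<String, String>, // e.g. version -> tarball URL
}

struct MemoryCache {
    entries: HashMap<String, Arc<FullManifest>>,
}

impl MemoryCache {
    /// A hit is one atomic refcount bump; the per-version HashMap behind
    /// the Arc is shared, never deep-cloned.
    fn get(&self, name: &str) -> Option<Arc<FullManifest>> {
        self.entries.get(name).map(Arc::clone)
    }
}

fn main() {
    let manifest = Arc::new(FullManifest {
        versions: HashMap::from([("18.2.0".to_string(), "react-18.2.0.tgz".to_string())]),
    });
    let cache = MemoryCache {
        entries: HashMap::from([("react".to_string(), manifest)]),
    };
    let a = cache.get("react").unwrap();
    let b = cache.get("react").unwrap();
    // Both hits point at the same allocation.
    println!("shared: {}", Arc::ptr_eq(&a, &b));
}
```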

## `normalize_spec` returns `Cow<'a, str>`

Was unconditionally allocating `(String, String)` even for the
~99 % of deps with no `npm:` / `workspace:` prefix. ~5460 String
allocations per ant-design preload, all on resolver hot path.
Common path now returns `Cow::Borrowed`.
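A sketch of the `Cow` signature (simplified: the real `normalize_spec` also handles `workspace:` and splits the alias into name and range):

```rust
use std::borrow::Cow;

/// Zero-alloc common path: plain specs are returned borrowed; only the
/// rare `npm:` alias form pays for an owned String.
fn normalize_spec(spec: &str) -> Cow<'_, str> {
    match spec.strip_prefix("npm:") {
        Some(aliased) => Cow::Owned(aliased.to_string()),
        None => Cow::Borrowed(spec),
    }
}

fn main() {
    // Common path: the input str is borrowed straight through.
    assert!(matches!(normalize_spec("^1.2.3"), Cow::Borrowed(_)));
    // Alias path: the only branch that allocates.
    assert_eq!(normalize_spec("npm:string-width@^4"), "string-width@^4");
    println!("ok");
}
```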

## Drop `versions.keys.clone()` from cache-hit path

`resolve_package`'s full-manifest cache-hit branch was cloning
the entire `versions.keys: Vec<String>` (~200 entries) just to
pass `&[String]` to `resolve_target_version`. Borrow directly via
Arc auto-deref. ~360k String allocs eliminated (~1800 cache hits
× ~200 entries).

## OnceMap dedup

New `crate::util::oncemap` module: `DashMap` + `tokio::sync::Notify`
coalescer for concurrent `resolve_full_manifest` callers of the
same name. First caller fetches the network; others wait on the
shared `Notify` and read the cached `Arc<V>`. Replaces the prior
per-name `tokio::sync::Mutex<()>` gate that serialised the hot
dispatch path.
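A blocking std-only analogue of the coalescer — `Mutex<HashMap>` in place of `DashMap`, `Condvar` in place of `tokio::sync::Notify`; the real `OnceMap` is async, so this is shape, not implementation:

```rust
use std::collections::HashMap;
use std::sync::{Arc, Condvar, Mutex};

enum Slot<V> {
    Pending,      // first caller is fetching
    Done(Arc<V>), // result available to everyone
}

struct OnceMap<V> {
    map: Mutex<HashMap<String, Slot<V>>>,
    cv: Condvar,
}

impl<V> OnceMap<V> {
    fn new() -> Self {
        OnceMap { map: Mutex::new(HashMap::new()), cv: Condvar::new() }
    }

    /// First caller for a key runs `init` (with the lock released);
    /// concurrent callers block on the Condvar and get the shared Arc.
    fn get_or_init(&self, key: &str, init: impl FnOnce() -> V) -> Arc<V> {
        let mut map = self.map.lock().unwrap();
        loop {
            match map.get(key) {
                Some(Slot::Done(v)) => return Arc::clone(v),
                Some(Slot::Pending) => map = self.cv.wait(map).unwrap(),
                None => {
                    map.insert(key.to_string(), Slot::Pending);
                    drop(map); // don't hold the lock across the "fetch"
                    let value = Arc::new(init());
                    let mut map = self.map.lock().unwrap();
                    map.insert(key.to_string(), Slot::Done(Arc::clone(&value)));
                    self.cv.notify_all();
                    return value;
                }
            }
        }
    }
}

fn main() {
    let map = OnceMap::new();
    let first = map.get_or_init("ant-design", || String::from("manifest"));
    let second = map.get_or_init("ant-design", || String::from("refetched"));
    println!("coalesced: {}", Arc::ptr_eq(&first, &second));
}
```

One deliberate simplification: if `init` panics, the key is stuck at Pending forever; the real async version needs the same care around a cancelled or failed first caller.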

## tracing file_filter info+ default

File-layer log filter dropped from `utoo=debug` to `utoo=info`.
Hot-path `tracing::debug!()` calls (cache hits, BFS dispatch,
preload events) emit ~5-10 events per resolved manifest. With
2730+ manifests during cold preload that's 15-30k events that —
even routed through the non_blocking appender's channel — pay
format/serialise CPU on the resolving thread before the channel
send. Override via `UTOO_FILE_LOG=debug` for diagnostics.

## indicatif progress bar — drop per-package message updates

`PreloadFetching` and `PreloadProgress` used to call
`format!("fetching/resolved {}", name)` + `PROGRESS_BAR.set_message()`
per event. With ~9000 such calls per ant-design preload and an
indicatif-internal `Mutex` per call, this serialised the main
loop's fill-and-drain rate. The user can't visually parse 5460
message swaps in 3 seconds anyway. Counter still ticks via
`PROGRESS_BAR.inc(1)`.

## HTTP + parse diagnostic infrastructure (used by PR4)

`service/http.rs` ships `start_http_trace` / `finish_http_trace`
+ `start_parse_trace` / `finish_parse_trace` plus
`record_http_interval` + `record_parse_interval` callbacks.
`#[allow(dead_code)]` on the start/finish for now — the preload
worker-pool refactor in the next PR (#TBD) wires them in.

Also bumps the `+ Sync` bound on `RegistryClient` callers in
`builder.rs` / `preload.rs` / `resolver/registry.rs` — required
because the trait's default-method futures gained `+ Send`
(needed downstream by tokio::spawn, but already correct for
single-threaded resolvers too).

Tests: 164 ruborist + 248/249 utoo-pm pass (1 pre-existing flake
on `test_update_package_binary_fsevents` when run in parallel,
passes alone).

Stacks: PR4 (preload worker-pool architecture) targets this
branch and adds the bound propagation + spawn refactor on top.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Four adjustments needed for the wasm target after introducing
aws-lc-rs and Send/Sync trait bounds:

1. Move `rustls` (with `aws-lc-rs` feature) and `rustls-native-certs`
   under `[target.'cfg(not(target_arch = "wasm32"))'.dependencies]`.
   `aws-lc-sys` builds BoringSSL via `cc` and doesn't support the
   `wasm32-unknown-unknown` target (no `stdlib.h` etc). The wasm
   reqwest path uses the browser fetch API and ignores rustls.

2. Add `MaybeSend` / `MaybeSync` shim traits in `util::maybe_send`:
   on native they expand to `Send` / `Sync`; on wasm32 they are
   vacuous (impl for every type). wasm-bindgen's `JsFuture` is
   `!Send` so the trait surface had to either drop the bound on
   wasm or use a conditional shim. Replace `+ Send` and
   `Self: Sync` in the `RegistryClient` trait + caller bounds in
   `builder.rs` / `preload.rs` / `resolver/registry.rs` with
   `+ MaybeSend` / `Self: MaybeSync`.

3. cfg-gate `service/cache.rs` `tokio::spawn` for fire-and-forget
   disk writes — wasm uses `wasm_bindgen_futures::spawn_local`
   instead since the futures are `!Send`.

4. cfg-gate the `OnceMap` coalescer in `service/registry.rs` —
   wasm runs single-threaded so coalescing concurrent fetches is
   a no-op anyway; call the network path directly.
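The shim itself is small enough to sketch in full (names per the description above; the exact bounds in the real module may differ):

```rust
// Native targets: the shims are literal aliases for Send / Sync.
#[cfg(not(target_arch = "wasm32"))]
mod maybe_send {
    pub trait MaybeSend: Send {}
    impl<T: Send> MaybeSend for T {}
    pub trait MaybeSync: Sync {}
    impl<T: Sync> MaybeSync for T {}
}

// wasm32: single-threaded, so every type satisfies the shims vacuously.
#[cfg(target_arch = "wasm32")]
mod maybe_send {
    pub trait MaybeSend {}
    impl<T> MaybeSend for T {}
    pub trait MaybeSync {}
    impl<T> MaybeSync for T {}
}

use maybe_send::{MaybeSend, MaybeSync};

// A generic bound written once: Send + Sync on native, unconstrained on
// wasm32 (where wasm-bindgen's JsFuture is !Send).
fn check_bounds<T: MaybeSend + MaybeSync>(_value: &T) -> bool {
    true
}

fn main() {
    println!("bounds hold: {}", check_bounds(&vec![1, 2, 3]));
}
```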

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@elrrrrrrr elrrrrrrr force-pushed the perf/manifest-cache branch from 2831262 to 2d5befb Compare April 27, 2026 03:31
elrrrrrrr added a commit that referenced this pull request Apr 27, 2026
@elrrrrrrr elrrrrrrr added the benchmark Run pm-bench on PR label Apr 27, 2026
@github-actions

📊 pm-bench-phases · cf7889e · linux (ubuntu-latest)

Workflow run — ant-design

PMs: utoo (this branch) · utoo-next (next-branch baseline) · utoo-npm (latest published) · bun (latest)

npmjs.org

p0_full_cold

PM wall ±σ user sys RSS pgMinor
bun 8.88s 0.20s 10.08s 10.11s 735M 325.9K
utoo-next 9.86s 0.11s 11.50s 13.05s 1.27G 154.5K
utoo-npm 9.90s 0.24s 11.46s 13.08s 1.30G 157.2K
utoo 9.24s 0.70s 10.74s 13.06s 2.28G 268.3K
PM vCtx iCtx netRX netTX cache node_mod lock
bun 15.1K 17.4K 1.17G 6M 1.85G 1.73G 1M
utoo-next 165.6K 151.6K 1.15G 4M 1.70G 1.69G 2M
utoo-npm 165.0K 143.6K 1.15G 4M 1.70G 1.69G 2M
utoo 152.5K 97.1K 1.13G 5M 1.70G 1.69G 2M

p1_resolve

PM wall ±σ user sys RSS pgMinor
bun 1.98s 0.05s 3.90s 1.05s 496M 168.3K
utoo-next 5.45s 0.41s 6.04s 1.13s 430M 75.0K
utoo-npm 5.16s 0.08s 6.00s 1.15s 431M 74.2K
utoo 5.88s 2.39s 4.53s 1.95s 1.37G 169.3K
PM vCtx iCtx netRX netTX cache node_mod lock
bun 8.1K 4.6K 200M 3M 104M - 1M
utoo-next 66.3K 2.9K 204M 2M 9M 5M 2M
utoo-npm 65.4K 2.9K 201M 2M 9M 5M 2M
utoo 76.9K 7.3K 196M 3M 7M 5M 2M

p3_cold_install

PM wall ±σ user sys RSS pgMinor
bun 6.86s 0.41s 6.20s 9.88s 607M 197.9K
utoo-next 6.89s 1.23s 5.51s 11.11s 754M 117.6K
utoo-npm 7.03s 1.16s 5.59s 11.16s 856M 120.5K
utoo 6.89s 0.72s 5.46s 11.31s 952M 119.8K
PM vCtx iCtx netRX netTX cache node_mod lock
bun 4.5K 7.0K 1004M 4M 1.75G 1.75G 1M
utoo-next 104.1K 64.7K 975M 2M 1.69G 1.69G 2M
utoo-npm 106.6K 69.9K 975M 3M 1.69G 1.69G 2M
utoo 116.2K 86.7K 975M 3M 1.69G 1.69G 2M

p4_warm_link

PM wall ±σ user sys RSS pgMinor
bun 3.16s 0.07s 0.22s 2.30s 137M 32.1K
utoo-next 2.48s 0.06s 0.62s 3.89s 81M 19.0K
utoo-npm 2.46s 0.02s 0.64s 3.89s 82M 19.0K
utoo 2.32s 0.05s 0.55s 3.84s 82M 19.5K
PM vCtx iCtx netRX netTX cache node_mod lock
bun 245 30 7M 24K 1.90G 1.71G 1M
utoo-next 46.9K 19.7K 16K 27K 1.69G 1.69G 2M
utoo-npm 48.8K 21.5K 16K 15K 1.69G 1.69G 2M
utoo 49.6K 21.1K 15K 10K 1.70G 1.69G 2M

npmmirror.com

p0_full_cold

PM wall ±σ user sys RSS pgMinor
bun 42.63s 8.15s 10.05s 10.61s 551M 364.2K
utoo-next 81.74s 51.23s 8.50s 14.40s 879M 118.7K
utoo-npm 24.07s 7.92s 8.33s 13.81s 882M 118.2K
utoo 142.78s 117.12s 8.00s 13.86s 792M 105.9K
PM vCtx iCtx netRX netTX cache node_mod lock
bun 131.2K 6.1K 1.13G 16M 1.85G 1.74G 2M
utoo-next 244.9K 126.8K 991M 10M 1.69G 1.69G 2M
utoo-npm 221.1K 123.8K 990M 8M 1.69G 1.69G 2M
utoo 238.7K 87.6K 1019M 12M 1.69G 1.69G 2M

p1_resolve

PM wall ±σ user sys RSS pgMinor
bun 2.03s 0.32s 3.85s 1.17s 592M 195.9K
utoo-next 5.47s 0.28s 2.09s 0.56s 75M 15.8K
utoo-npm 3.34s 0.12s 1.96s 0.57s 75M 16.1K
utoo 9.29s 12.21s 1.28s 0.23s 78M 17.1K
PM vCtx iCtx netRX netTX cache node_mod lock
bun 8.4K 5.2K 151M 3M 106M - 2M
utoo-next 47.5K 575 13M 2M - 4M 2M
utoo-npm 43.9K 1.1K 13M 2M - 4M 2M
utoo 24.6K 67 16M 2M - 4M 2M

p3_cold_install

PM wall ±σ user sys RSS pgMinor
bun 22.69s 5.05s 6.12s 9.25s 244M 101.8K
utoo-next 44.32s 38.85s 6.42s 12.93s 644M 100.3K
utoo-npm 72.66s 20.68s 6.60s 13.60s 578M 86.9K
utoo 70.07s 19.12s 6.42s 13.33s 673M 91.4K
PM vCtx iCtx netRX netTX cache node_mod lock
bun 75.7K 3.5K 996M 10M 1.71G 1.71G 2M
utoo-next 199.2K 100.9K 1005M 8M 1.69G 1.69G 2M
utoo-npm 227.3K 92.6K 979M 10M 1.69G 1.69G 2M
utoo 225.8K 79.6K 990M 11M 1.69G 1.69G 2M

p4_warm_link

PM wall ±σ user sys RSS pgMinor
bun 3.18s 0.08s 0.20s 2.26s 135M 30.9K
utoo-next 2.41s 0.22s 0.63s 3.94s 82M 19.1K
utoo-npm 2.46s 0.04s 0.63s 3.93s 82M 19.2K
utoo 2.35s 0.06s 0.58s 3.84s 83M 19.5K
PM vCtx iCtx netRX netTX cache node_mod lock
bun 643 26 3M 50K 1.84G 1.74G 2M
utoo-next 46.9K 20.4K 47K 38K 1.69G 1.69G 2M
utoo-npm 48.5K 22.3K 43K 13K 1.69G 1.69G 2M
utoo 50.0K 21.6K 52K 17K 1.69G 1.69G 2M

@github-actions

📊 pm-bench-phases · cf7889e · mac (macos-latest)

Workflow run — ant-design

PMs: utoo (this branch) · utoo-next (next-branch baseline) · utoo-npm (latest published) · bun (latest)

npmjs.org

p0_full_cold

PM wall ±σ user sys RSS pgMinor
bun 18.58s 2.53s 6.48s 18.91s 742M 47.9K
utoo-next 23.68s 6.53s 10.98s 25.76s 1.17G 111.5K
utoo-npm 16.68s 1.54s 8.33s 17.22s 1.02G 101.0K
utoo 29.78s 1.53s 13.70s 34.22s 1.98G 172.6K
PM vCtx iCtx netRX netTX cache node_mod lock
bun 15.8K 143.4K - - 1.79G 1.91G 1M
utoo-next 13.1K 359.9K - - 1.64G 1.84G 2M
utoo-npm 12.7K 361.7K - - 1.64G 1.88G 2M
utoo 10.9K 341.1K - - 1.64G 1.84G 2M

p1_resolve

PM wall ±σ user sys RSS pgMinor
bun 2.62s 0.11s 2.99s 1.39s 476M 30.9K
utoo-next 9.47s 0.47s 7.13s 4.93s 542M 36.5K
utoo-npm 6.94s 0.62s 5.38s 3.61s 545M 37.1K
utoo 7.65s 0.90s 5.76s 4.72s 1.37G 95.3K
PM vCtx iCtx netRX netTX cache node_mod lock
bun 27 23.7K - - 110M - 1M
utoo-next 20 70.8K - - 28M 5M 2M
utoo-npm 11 72.0K - - 28M 5M 2M
utoo 38 82.2K - - 27M 5M 2M

p3_cold_install

PM wall ±σ user sys RSS pgMinor
bun 19.22s 2.62s 3.73s 19.20s 541M 35.2K
utoo-next 13.59s 2.83s 3.62s 14.83s 747M 75.4K
utoo-npm 14.29s 3.45s 3.51s 14.16s 810M 77.0K
utoo 15.56s 5.52s 4.36s 19.87s 754M 74.8K
PM vCtx iCtx netRX netTX cache node_mod lock
bun 5.7K 137.1K - - 1.70G 1.94G 1M
utoo-next 1.5K 231.2K - - 1.61G 1.87G 2M
utoo-npm 1.4K 230.5K - - 1.61G 1.87G 2M
utoo 1.4K 226.2K - - 1.61G 1.87G 2M

p4_warm_link

PM wall ±σ user sys RSS pgMinor
bun 4.94s 0.49s 0.10s 2.12s 53M 4.0K
utoo-next 5.78s 1.03s 0.73s 4.03s 94M 6.8K
utoo-npm 4.20s 0.58s 0.51s 2.79s 89M 6.7K
utoo 5.97s 0.52s 0.64s 4.23s 89M 6.7K
PM vCtx iCtx netRX netTX cache node_mod lock
bun 15.8K 843 - - 1.87G 1.92G 1M
utoo-next 12.4K 74.7K - - 1.61G 1.86G 2M
utoo-npm 12.5K 75.8K - - 1.61G 1.86G 2M
utoo 12.9K 71.5K - - 1.63G 1.86G 2M

npmmirror.com

p0_full_cold

PM wall ±σ user sys RSS pgMinor
bun 29.56s 6.65s 6.63s 19.34s 615M 39.8K
utoo-next 28.27s 3.54s 7.87s 24.40s 693M 76.6K
utoo-npm 25.46s 1.06s 8.09s 25.60s 813M 77.0K
utoo 30.32s 11.86s 6.67s 20.11s 697M 75.7K
PM vCtx iCtx netRX netTX cache node_mod lock
bun 14.9K 146.5K - - 1.77G 1.91G 2M
utoo-next 4.3K 384.7K - - 1.61G 1.84G 2M
utoo-npm 994 370.2K - - 1.61G 1.87G 2M
utoo 4.8K 376.2K - - 1.61G 1.84G 2M

p1_resolve

PM wall ±σ user sys RSS pgMinor
bun 2.50s 0.07s 3.05s 1.73s 601M 39.1K
utoo-next 11.70s 13.43s 1.56s 0.86s 79M 5.8K
utoo-npm 4.91s 0.51s 1.41s 0.81s 79M 5.7K
utoo 10.15s 14.81s 1.26s 0.43s 84M 6.1K
PM vCtx iCtx netRX netTX cache node_mod lock
bun 20 22.2K - - 111M - 2M
utoo-next 5 46.8K - - - 4M 2M
utoo-npm 6 45.3K - - - 4M 2M
utoo 25 25.4K - - - 4M 2M

p3_cold_install

PM wall ±σ user sys RSS pgMinor
bun 20.26s 2.51s 4.03s 19.09s 296M 19.5K
utoo-next 26.42s 2.12s 4.72s 16.88s 598M 73.0K
utoo-npm 28.27s 2.14s 4.59s 16.37s 674M 74.9K
utoo 27.41s 1.57s 4.99s 18.54s 647M 74.3K
PM vCtx iCtx netRX netTX cache node_mod lock
bun 2.1K 146.6K - - 1.65G 1.92G 2M
utoo-next 1.6K 345.5K - - 1.61G 1.83G 2M
utoo-npm 1.6K 366.2K - - 1.61G 1.83G 2M
utoo 1.6K 339.1K - - 1.61G 1.83G 2M

p4_warm_link

PM wall ±σ user sys RSS pgMinor
bun 4.56s 0.06s 0.08s 1.94s 44M 3.4K
utoo-next 3.80s 0.51s 0.52s 2.68s 95M 7.1K
utoo-npm 4.56s 0.65s 0.57s 3.01s 90M 6.8K
utoo 4.12s 0.78s 0.46s 3.01s 95M 7.1K
PM vCtx iCtx netRX netTX cache node_mod lock
bun 13.5K 634 - - 1.78G 1.91G 2M
utoo-next 12.2K 72.3K - - 1.61G 1.83G 2M
utoo-npm 12.2K 73.1K - - 1.61G 1.83G 2M
utoo 12.3K 81.2K - - 1.61G 1.83G 2M
