perf(pm): manifest cache & resolver alloc cleanup #2826
Conversation
Code Review
This pull request implements a series of performance optimizations for the package resolver, focused on reducing memory allocations, improving concurrency, and optimizing network and disk I/O. Key enhancements include lazy manifest parsing using simd_json, an `OnceMap` utility that deduplicates concurrent fetches, and a round-robin DNS resolver for better connection distribution. Memory efficiency improves by using `Arc` in caches and `Cow` for string normalization. Review feedback points out a compilation risk from using unstable let_chains, potential data loss from fire-and-forget background writes, and an opportunity to further reduce allocations in DNS rotation.
fn rotate_addrs(addrs: &[SocketAddr], offset: usize) -> Vec<SocketAddr> {
    if addrs.is_empty() {
        return Vec::new();
    }
    let rotate = |slice: &[SocketAddr]| -> Vec<SocketAddr> {
        if slice.is_empty() {
            return Vec::new();
        }
        let start = offset % slice.len();
        slice[start..]
            .iter()
            .chain(&slice[..start])
            .copied()
            .collect()
    };
    let v6: Vec<SocketAddr> = addrs.iter().filter(|a| a.is_ipv6()).copied().collect();
    let v4: Vec<SocketAddr> = addrs.iter().filter(|a| a.is_ipv4()).copied().collect();
    let v6_rot = rotate(&v6);
    let v4_rot = rotate(&v4);
    // Preserve v6-first ordering if that's what the resolver gave us;
    // Happy Eyeballs will still prefer v6 when it's reachable.
    let v6_first = addrs.first().map(|a| a.is_ipv6()).unwrap_or(true);
    if v6_first {
        v6_rot.into_iter().chain(v4_rot).collect()
    } else {
        v4_rot.into_iter().chain(v6_rot).collect()
    }
}
The `rotate_addrs` function performs multiple `Vec` allocations (filtering, collecting, and rotating) on every DNS resolution, including cache hits. Since this is a hot path and the PR aims for allocation cleanup, it can be rewritten with fewer allocations by leveraging `partition` and in-place `rotate_left`.
fn rotate_addrs(addrs: &[SocketAddr], offset: usize) -> Vec<SocketAddr> {
if addrs.is_empty() {
return Vec::new();
}
let (mut v6, mut v4): (Vec<_>, Vec<_>) = addrs.iter().copied().partition(|a| a.is_ipv6());
if !v6.is_empty() {
let mid = offset % v6.len();
v6.rotate_left(mid);
}
if !v4.is_empty() {
let mid = offset % v4.len();
v4.rotate_left(mid);
}
let v6_first = addrs.first().map(|a| a.is_ipv6()).unwrap_or(true);
if v6_first {
v6.into_iter().chain(v4).collect()
} else {
v4.into_iter().chain(v6).collect()
}
}

    if let Some(entry) = self.map.get(&key)
        && let Value::Done(result) = entry.value()
    {
        return Some(Arc::clone(result));
    }
The use of let chains (e.g., `if let ... && let ...`) was gated behind the unstable `let_chains` feature until Rust 1.88 / the 2024 edition. Unless the project pins a new enough stable toolchain (or explicitly uses nightly), this will cause compilation errors on stable Rust. It is recommended to use nested `if let` blocks instead. This also applies to lines 235-239 and 245-249.
Suggested change:

    // before (let chains)
    if let Some(entry) = self.map.get(&key)
        && let Value::Done(result) = entry.value()
    {
        return Some(Arc::clone(result));
    }

    // after (nested if let, compiles on stable)
    if let Some(entry) = self.map.get(&key) {
        if let Value::Done(result) = entry.value() {
            return Some(Arc::clone(result));
        }
    }
    /// tuning showed the previous inline `.await` + `serde_json::to_string_pretty`
    /// burned ~1–3 ms per call on the hot path, stalling the main preload
    /// task and causing the 24..62 active-stream dip observed on CI.
    pub fn set_versions_to_disk(&self, name: &str, info: &VersionsInfo) {
Using tokio::spawn for fire-and-forget disk writes in a CLI tool can lead to data loss or cache corruption if the process exits before the background tasks complete. Since there is no mechanism to await these tasks during shutdown, the disk cache might not be reliably updated. Consider tracking these tasks or providing a flush mechanism to ensure writes complete before the program terminates.
The headline architectural change of #2818. ruborist's preload phase shifts from a single-task `FuturesUnordered` cooperative poller to N long-lived `tokio::spawn` workers (or `wasm_bindgen_futures::spawn_local` on wasm32, where Send isn't satisfied). Stacks on top of #2826.

## Why

Old design: the main task owned `FuturesUnordered`, polled all preload futures cooperatively, and ran every per-future continuation (post-await body, completion handler, dispatch refill) on the same single task. The deep await chain inside `resolve_package` (cache check + `OnceMap::get_or_init` + `RetryIf` + `request.send` + `bytes` + parse spawn_blocking) made each future yield 5+ times, and every yield round-tripped through main — saturating it. CI ant-design preload sustained avg_conc=55-61 even after the Mutex / allocator hot-path eliminations, while the standalone manifest-bench (same reqwest stack, no resolver — see #2824) hit 92 at the same cap.

## How

N long-lived `tokio::spawn` workers pull from a shared lock-free `SegQueue<Dep>` with `DashSet` dedup. Each worker owns an `Arc<R>` clone and runs `resolve_package` on tokio's global executor — futures progress fully independently, with no cooperative poll bottleneck. The main task only drains an `mpsc::unbounded_channel` of completions to fire receiver events + the on_manifest callback.

Termination: workers track `dispatched` / `completed: AtomicUsize` and park on a shared `Notify` when the queue is empty. When the last completion makes `completed == dispatched` and the queue is empty, the finishing worker raises a `shutdown` flag and wakes the others; all workers drop their result_tx clones, the channel closes, and the main `recv().await` loop exits.

## Trait surface change

- `MockRegistryClient` + `MockPackage` now `derive(Clone)` so tests can wrap the mock in `Arc` for the new signature
- `preload_manifests` takes `registry: Arc<R>` (was `&R`); the call site in `run_preload_phase` clones the borrowed registry into a fresh `Arc`.

The bound at every public surface up the chain is bumped to `R: RegistryClient + Clone + MaybeSend + MaybeSync + 'static`, `R::Error: MaybeSend`. The `MaybeSend` / `MaybeSync` shims (added in #2826) keep the trait surface wasm-compatible.

## Companion changes folded in

- **Inline simd_json parse** — drop `tokio::task::spawn_blocking` in `service/manifest.rs`. The worker pool surfaced blocking-pool queue saturation: `queue p95=200ms sum=70-89s` over 2730 manifests on cap=4 CI runners. Inline parse on the worker thread eliminates dispatch + queue overhead; 1-5ms of CPU per manifest is acceptable on an async worker.
- **Workspace package.json parallel reads** — `find_workspaces_from_pkg` switched from a sequential `for path in matched_paths { read }` loop to a `FuturesUnordered` fan-out. ant-design has ~200 workspace packages; saved ~150ms.
- **Setup phase + lockfile-write timing logs** — round out the per-phase wall account for the bench-comment infrastructure.
- **Manifests concurrency cap 64 → 128** — the worker pool delivers the parallelism that justifies the cap raise. CI ant-design avg_conc 84 at cap=128 (up from 55 under the old architecture); preload wall 3.10s → 2.15s.

## Tests

`#[tokio::test(flavor = "multi_thread", worker_threads = 2)]` since the worker pool needs a spawn-able runtime; ruborist's dev-dependency on `tokio` adds the `rt-multi-thread` feature. 164 ruborist + 10 doctests + 248/249 utoo-pm pass (1 pre-existing flake on `test_update_package_binary_fsevents`, runs green alone).

## Wasm CI

cfg-gates `tokio::spawn` to `wasm_bindgen_futures::spawn_local` on wasm32 since wasm-bindgen's `JsFuture` is `!Send`. Workers still run independently — single-threaded under wasm, but the queue + Notify + mpsc termination story is unchanged. `cargo check -p utoo-wasm --target wasm32-unknown-unknown` clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Compares this PR's utoo against next-branch HEAD (the merged baseline)
instead of just utoo-npm (the latest published release, which can be
days or weeks behind). The utoo-next column isolates THIS PR's perf
delta from any other work unmerged since the last publish.
Two new build jobs (build-next-{linux,mac-arm64}) check out origin/next
and build utoo from there in parallel with the main builds. Bench
phases pick up both artifacts via the new setup-utoo-next-baseline
composite action and pass utoo-next through PM_LIST.
Build jobs gate on the same `benchmark` label / dispatch trigger as
bench-phases — they only fire when bench-phases will actually run.
Bench script (bench/pm-bench-phases.sh) gets parallel utoo-next
support: UTOO_NEXT_BIN env, UTOO_NEXT_CACHE, and case statements
mirroring the existing utoo-npm pattern across install_cmd,
resolve_cmd, write_prepare, capture_footprint, seed_for_phase.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Bundle of independently-motivated allocator + cache hot-path optimisations from the parent perf branch (#2818). Each landed during the worker-pool exploration but doesn't depend on the worker-pool architecture itself — they stand alone as straightforward perf wins for the resolver.

## TLS provider — `aws-lc-rs` instead of `ring`

`reqwest` 0.12's default `rustls-tls-native-roots` feature pins `ring` via Cargo's feature unification. Switch to `rustls-tls-native-roots-no-provider`, build our own `rustls::ClientConfig` with the `aws_lc_rs` provider, and pass it via `Client::use_preconfigured_tls`. CI measurement (4-core ubuntu vs npmjs.org): ring's per-handshake CCS→AppData was 78 ms p50 / 154 ms max, with all 128 parallel handshakes serialising across 4 cores. aws-lc-rs (BoringSSL primitives) is ~3× faster on x86_64. Saved ~420 ms preload on cold ant-design.

## DNS — per-family rotation

`getaddrinfo` typically returns 10 v6 + 12 v4 addresses for npmjs.org. A flat rotation across the joined list meant offsets 0..10 all started inside the v6 range; on hosts where v6 routing fails (GitHub Actions runners), every connection fell through to the *same* first-reachable v4. Rotate per-family so v4 connections cycle across all v4 addresses (and v6 over v6) — an observed pcap on bun shows the same 4×64 distribution we now produce.

## Disk-cache bulk-readdir ETag index

`PackageCache` lazily builds a `HashSet<String>` of names with existing disk cache entries from a single `read_dir(cache_dir)` + per-`@scope` recurse. `get_versions_from_disk` and `get_version_manifest_from_disk` short-circuit via the index. Restores the warm-run 304 path that was temporarily removed in 46cb803 (per-package `try_exists` was 16 ms avg on the cold-run critical path; now zero).

## Lazy per-version `CoreVersionManifest` via `simd_json::OwnedValue`

`Versions` now stores `keys: Vec<String>` (ordered version list) + `trees: HashMap<String, Arc<simd_json::OwnedValue>>` (pre-parsed JSON subtrees). The strongly-typed `CoreVersionManifest` is materialised on demand via `CoreVersionManifest::deserialize(tree.as_ref())` — zero-copy through `simd_json::OwnedValue`'s `Deserializer` impl, memoised in a `DashMap`. The resolver typically reads 1-3 of the ~500 versions per manifest; the previous design built every one eagerly.

## `Arc<FullManifest>` in `MemoryCache`

The cache previously returned `FullManifest` by value, deep-cloning the per-version HashMap (100-500 entries × String key clone + Arc bump per cache hit) on the resolver hot path. ~2730 cache hits during cold preload × ~200-entry HashMap clone = ~500k allocations on shared resolver threads, contending the allocator. Wrap in `Arc<FullManifest>`; a cache hit becomes one atomic bump.

## `normalize_spec` returns `Cow<'a, str>`

Was unconditionally allocating `(String, String)` even for the ~99 % of deps with no `npm:` / `workspace:` prefix — ~5460 String allocations per ant-design preload, all on the resolver hot path. The common path now returns `Cow::Borrowed`.

## Drop `versions.keys.clone()` from the cache-hit path

`resolve_package`'s full-manifest cache-hit branch was cloning the entire `versions.keys: Vec<String>` (~200 entries) just to pass `&[String]` to `resolve_target_version`. Borrow directly via Arc auto-deref. ~360k String allocs eliminated (~1800 cache hits × ~200 entries).

## OnceMap dedup

New `crate::util::oncemap` module: a `DashMap` + `tokio::sync::Notify` coalescer for concurrent `resolve_full_manifest` callers of the same name. The first caller fetches from the network; the others wait on the shared `Notify` and read the cached `Arc<V>`. Replaces the prior per-name `tokio::sync::Mutex<()>` gate that serialised the hot dispatch path.

## tracing file_filter info+ default

The file-layer log filter drops from `utoo=debug` to `utoo=info`. Hot-path `tracing::debug!()` calls (cache hits, BFS dispatch, preload events) emit ~5-10 events per resolved manifest. With 2730+ manifests during cold preload, that's 15-30k events that — even routed through the non_blocking appender's channel — pay format/serialise CPU on the resolving thread before the channel send. Override via `UTOO_FILE_LOG=debug` for diagnostics.

## indicatif progress bar — drop per-package message updates

`PreloadFetching` and `PreloadProgress` used to call `format!("fetching/resolved {}", name)` + `PROGRESS_BAR.set_message()` per event. With ~9000 such calls per ant-design preload and an indicatif-internal `Mutex` per call, this serialised the main loop's fill-and-drain rate. The user can't visually parse 5460 message swaps in 3 seconds anyway. The counter still ticks via `PROGRESS_BAR.inc(1)`.

## HTTP + parse diagnostic infrastructure (used by PR4)

`service/http.rs` ships `start_http_trace` / `finish_http_trace` + `start_parse_trace` / `finish_parse_trace` plus `record_http_interval` + `record_parse_interval` callbacks. `#[allow(dead_code)]` on the start/finish pairs for now — the preload worker-pool refactor in the next PR (#TBD) wires them in.

Also bumps the `+ Sync` bound on `RegistryClient` callers in `builder.rs` / `preload.rs` / `resolver/registry.rs` — required because the trait's default-method futures gained `+ Send` (needed downstream by tokio::spawn, but already correct for single-threaded resolvers too).

Tests: 164 ruborist + 248/249 utoo-pm pass (1 pre-existing flake on `test_update_package_binary_fsevents` when run in parallel, passes alone).

Stacks: PR4 (preload worker-pool architecture) targets this branch and adds the bound propagation + spawn refactor on top.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Four adjustments needed for the wasm target after introducing aws-lc-rs and the Send/Sync trait bounds:

1. Move `rustls` (with the `aws-lc-rs` feature) and `rustls-native-certs` under `[target.'cfg(not(target_arch = "wasm32"))'.dependencies]`. `aws-lc-sys` builds BoringSSL via `cc` and doesn't support the `wasm32-unknown-unknown` target (no `stdlib.h`, etc.). The wasm reqwest path uses the browser fetch API and ignores rustls.
2. Add `MaybeSend` / `MaybeSync` shim traits in `util::maybe_send`: on native they expand to `Send` / `Sync`; on wasm32 they are vacuous (implemented for every type). wasm-bindgen's `JsFuture` is `!Send`, so the trait surface had to either drop the bound on wasm or use a conditional shim. Replace `+ Send` and `Self: Sync` in the `RegistryClient` trait + caller bounds in `builder.rs` / `preload.rs` / `resolver/registry.rs` with `+ MaybeSend` / `Self: MaybeSync`.
3. cfg-gate the `service/cache.rs` `tokio::spawn` for fire-and-forget disk writes — wasm uses `wasm_bindgen_futures::spawn_local` instead since the futures are `!Send`.
4. cfg-gate the `OnceMap` coalescer in `service/registry.rs` — wasm runs single-threaded, so coalescing concurrent fetches is a no-op anyway; call the network path directly.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
📊 pm-bench-phases
| PM | wall | ±σ | user | sys | RSS | pgMinor |
|---|---|---|---|---|---|---|
| bun | 8.88s | 0.20s | 10.08s | 10.11s | 735M | 325.9K |
| utoo-next | 9.86s | 0.11s | 11.50s | 13.05s | 1.27G | 154.5K |
| utoo-npm | 9.90s | 0.24s | 11.46s | 13.08s | 1.30G | 157.2K |
| utoo | 9.24s | 0.70s | 10.74s | 13.06s | 2.28G | 268.3K |
| PM | vCtx | iCtx | netRX | netTX | cache | node_mod | lock |
|---|---|---|---|---|---|---|---|
| bun | 15.1K | 17.4K | 1.17G | 6M | 1.85G | 1.73G | 1M |
| utoo-next | 165.6K | 151.6K | 1.15G | 4M | 1.70G | 1.69G | 2M |
| utoo-npm | 165.0K | 143.6K | 1.15G | 4M | 1.70G | 1.69G | 2M |
| utoo | 152.5K | 97.1K | 1.13G | 5M | 1.70G | 1.69G | 2M |
p1_resolve
| PM | wall | ±σ | user | sys | RSS | pgMinor |
|---|---|---|---|---|---|---|
| bun | 1.98s | 0.05s | 3.90s | 1.05s | 496M | 168.3K |
| utoo-next | 5.45s | 0.41s | 6.04s | 1.13s | 430M | 75.0K |
| utoo-npm | 5.16s | 0.08s | 6.00s | 1.15s | 431M | 74.2K |
| utoo | 5.88s | 2.39s | 4.53s | 1.95s | 1.37G | 169.3K |
| PM | vCtx | iCtx | netRX | netTX | cache | node_mod | lock |
|---|---|---|---|---|---|---|---|
| bun | 8.1K | 4.6K | 200M | 3M | 104M | - | 1M |
| utoo-next | 66.3K | 2.9K | 204M | 2M | 9M | 5M | 2M |
| utoo-npm | 65.4K | 2.9K | 201M | 2M | 9M | 5M | 2M |
| utoo | 76.9K | 7.3K | 196M | 3M | 7M | 5M | 2M |
p3_cold_install
| PM | wall | ±σ | user | sys | RSS | pgMinor |
|---|---|---|---|---|---|---|
| bun | 6.86s | 0.41s | 6.20s | 9.88s | 607M | 197.9K |
| utoo-next | 6.89s | 1.23s | 5.51s | 11.11s | 754M | 117.6K |
| utoo-npm | 7.03s | 1.16s | 5.59s | 11.16s | 856M | 120.5K |
| utoo | 6.89s | 0.72s | 5.46s | 11.31s | 952M | 119.8K |
| PM | vCtx | iCtx | netRX | netTX | cache | node_mod | lock |
|---|---|---|---|---|---|---|---|
| bun | 4.5K | 7.0K | 1004M | 4M | 1.75G | 1.75G | 1M |
| utoo-next | 104.1K | 64.7K | 975M | 2M | 1.69G | 1.69G | 2M |
| utoo-npm | 106.6K | 69.9K | 975M | 3M | 1.69G | 1.69G | 2M |
| utoo | 116.2K | 86.7K | 975M | 3M | 1.69G | 1.69G | 2M |
p4_warm_link
| PM | wall | ±σ | user | sys | RSS | pgMinor |
|---|---|---|---|---|---|---|
| bun | 3.16s | 0.07s | 0.22s | 2.30s | 137M | 32.1K |
| utoo-next | 2.48s | 0.06s | 0.62s | 3.89s | 81M | 19.0K |
| utoo-npm | 2.46s | 0.02s | 0.64s | 3.89s | 82M | 19.0K |
| utoo | 2.32s | 0.05s | 0.55s | 3.84s | 82M | 19.5K |
| PM | vCtx | iCtx | netRX | netTX | cache | node_mod | lock |
|---|---|---|---|---|---|---|---|
| bun | 245 | 30 | 7M | 24K | 1.90G | 1.71G | 1M |
| utoo-next | 46.9K | 19.7K | 16K | 27K | 1.69G | 1.69G | 2M |
| utoo-npm | 48.8K | 21.5K | 16K | 15K | 1.69G | 1.69G | 2M |
| utoo | 49.6K | 21.1K | 15K | 10K | 1.70G | 1.69G | 2M |
npmmirror.com
p0_full_cold
| PM | wall | ±σ | user | sys | RSS | pgMinor |
|---|---|---|---|---|---|---|
| bun | 42.63s | 8.15s | 10.05s | 10.61s | 551M | 364.2K |
| utoo-next | 81.74s | 51.23s | 8.50s | 14.40s | 879M | 118.7K |
| utoo-npm | 24.07s | 7.92s | 8.33s | 13.81s | 882M | 118.2K |
| utoo | 142.78s | 117.12s | 8.00s | 13.86s | 792M | 105.9K |
| PM | vCtx | iCtx | netRX | netTX | cache | node_mod | lock |
|---|---|---|---|---|---|---|---|
| bun | 131.2K | 6.1K | 1.13G | 16M | 1.85G | 1.74G | 2M |
| utoo-next | 244.9K | 126.8K | 991M | 10M | 1.69G | 1.69G | 2M |
| utoo-npm | 221.1K | 123.8K | 990M | 8M | 1.69G | 1.69G | 2M |
| utoo | 238.7K | 87.6K | 1019M | 12M | 1.69G | 1.69G | 2M |
p1_resolve
| PM | wall | ±σ | user | sys | RSS | pgMinor |
|---|---|---|---|---|---|---|
| bun | 2.03s | 0.32s | 3.85s | 1.17s | 592M | 195.9K |
| utoo-next | 5.47s | 0.28s | 2.09s | 0.56s | 75M | 15.8K |
| utoo-npm | 3.34s | 0.12s | 1.96s | 0.57s | 75M | 16.1K |
| utoo | 9.29s | 12.21s | 1.28s | 0.23s | 78M | 17.1K |
| PM | vCtx | iCtx | netRX | netTX | cache | node_mod | lock |
|---|---|---|---|---|---|---|---|
| bun | 8.4K | 5.2K | 151M | 3M | 106M | - | 2M |
| utoo-next | 47.5K | 575 | 13M | 2M | - | 4M | 2M |
| utoo-npm | 43.9K | 1.1K | 13M | 2M | - | 4M | 2M |
| utoo | 24.6K | 67 | 16M | 2M | - | 4M | 2M |
p3_cold_install
| PM | wall | ±σ | user | sys | RSS | pgMinor |
|---|---|---|---|---|---|---|
| bun | 22.69s | 5.05s | 6.12s | 9.25s | 244M | 101.8K |
| utoo-next | 44.32s | 38.85s | 6.42s | 12.93s | 644M | 100.3K |
| utoo-npm | 72.66s | 20.68s | 6.60s | 13.60s | 578M | 86.9K |
| utoo | 70.07s | 19.12s | 6.42s | 13.33s | 673M | 91.4K |
| PM | vCtx | iCtx | netRX | netTX | cache | node_mod | lock |
|---|---|---|---|---|---|---|---|
| bun | 75.7K | 3.5K | 996M | 10M | 1.71G | 1.71G | 2M |
| utoo-next | 199.2K | 100.9K | 1005M | 8M | 1.69G | 1.69G | 2M |
| utoo-npm | 227.3K | 92.6K | 979M | 10M | 1.69G | 1.69G | 2M |
| utoo | 225.8K | 79.6K | 990M | 11M | 1.69G | 1.69G | 2M |
p4_warm_link
| PM | wall | ±σ | user | sys | RSS | pgMinor |
|---|---|---|---|---|---|---|
| bun | 3.18s | 0.08s | 0.20s | 2.26s | 135M | 30.9K |
| utoo-next | 2.41s | 0.22s | 0.63s | 3.94s | 82M | 19.1K |
| utoo-npm | 2.46s | 0.04s | 0.63s | 3.93s | 82M | 19.2K |
| utoo | 2.35s | 0.06s | 0.58s | 3.84s | 83M | 19.5K |
| PM | vCtx | iCtx | netRX | netTX | cache | node_mod | lock |
|---|---|---|---|---|---|---|---|
| bun | 643 | 26 | 3M | 50K | 1.84G | 1.74G | 2M |
| utoo-next | 46.9K | 20.4K | 47K | 38K | 1.69G | 1.69G | 2M |
| utoo-npm | 48.5K | 22.3K | 43K | 13K | 1.69G | 1.69G | 2M |
| utoo | 50.0K | 21.6K | 52K | 17K | 1.69G | 1.69G | 2M |
📊 pm-bench-phases
| PM | wall | ±σ | user | sys | RSS | pgMinor |
|---|---|---|---|---|---|---|
| bun | 18.58s | 2.53s | 6.48s | 18.91s | 742M | 47.9K |
| utoo-next | 23.68s | 6.53s | 10.98s | 25.76s | 1.17G | 111.5K |
| utoo-npm | 16.68s | 1.54s | 8.33s | 17.22s | 1.02G | 101.0K |
| utoo | 29.78s | 1.53s | 13.70s | 34.22s | 1.98G | 172.6K |
| PM | vCtx | iCtx | netRX | netTX | cache | node_mod | lock |
|---|---|---|---|---|---|---|---|
| bun | 15.8K | 143.4K | - | - | 1.79G | 1.91G | 1M |
| utoo-next | 13.1K | 359.9K | - | - | 1.64G | 1.84G | 2M |
| utoo-npm | 12.7K | 361.7K | - | - | 1.64G | 1.88G | 2M |
| utoo | 10.9K | 341.1K | - | - | 1.64G | 1.84G | 2M |
p1_resolve
| PM | wall | ±σ | user | sys | RSS | pgMinor |
|---|---|---|---|---|---|---|
| bun | 2.62s | 0.11s | 2.99s | 1.39s | 476M | 30.9K |
| utoo-next | 9.47s | 0.47s | 7.13s | 4.93s | 542M | 36.5K |
| utoo-npm | 6.94s | 0.62s | 5.38s | 3.61s | 545M | 37.1K |
| utoo | 7.65s | 0.90s | 5.76s | 4.72s | 1.37G | 95.3K |
| PM | vCtx | iCtx | netRX | netTX | cache | node_mod | lock |
|---|---|---|---|---|---|---|---|
| bun | 27 | 23.7K | - | - | 110M | - | 1M |
| utoo-next | 20 | 70.8K | - | - | 28M | 5M | 2M |
| utoo-npm | 11 | 72.0K | - | - | 28M | 5M | 2M |
| utoo | 38 | 82.2K | - | - | 27M | 5M | 2M |
p3_cold_install
| PM | wall | ±σ | user | sys | RSS | pgMinor |
|---|---|---|---|---|---|---|
| bun | 19.22s | 2.62s | 3.73s | 19.20s | 541M | 35.2K |
| utoo-next | 13.59s | 2.83s | 3.62s | 14.83s | 747M | 75.4K |
| utoo-npm | 14.29s | 3.45s | 3.51s | 14.16s | 810M | 77.0K |
| utoo | 15.56s | 5.52s | 4.36s | 19.87s | 754M | 74.8K |
| PM | vCtx | iCtx | netRX | netTX | cache | node_mod | lock |
|---|---|---|---|---|---|---|---|
| bun | 5.7K | 137.1K | - | - | 1.70G | 1.94G | 1M |
| utoo-next | 1.5K | 231.2K | - | - | 1.61G | 1.87G | 2M |
| utoo-npm | 1.4K | 230.5K | - | - | 1.61G | 1.87G | 2M |
| utoo | 1.4K | 226.2K | - | - | 1.61G | 1.87G | 2M |
p4_warm_link
| PM | wall | ±σ | user | sys | RSS | pgMinor |
|---|---|---|---|---|---|---|
| bun | 4.94s | 0.49s | 0.10s | 2.12s | 53M | 4.0K |
| utoo-next | 5.78s | 1.03s | 0.73s | 4.03s | 94M | 6.8K |
| utoo-npm | 4.20s | 0.58s | 0.51s | 2.79s | 89M | 6.7K |
| utoo | 5.97s | 0.52s | 0.64s | 4.23s | 89M | 6.7K |
| PM | vCtx | iCtx | netRX | netTX | cache | node_mod | lock |
|---|---|---|---|---|---|---|---|
| bun | 15.8K | 843 | - | - | 1.87G | 1.92G | 1M |
| utoo-next | 12.4K | 74.7K | - | - | 1.61G | 1.86G | 2M |
| utoo-npm | 12.5K | 75.8K | - | - | 1.61G | 1.86G | 2M |
| utoo | 12.9K | 71.5K | - | - | 1.63G | 1.86G | 2M |
npmmirror.com
p0_full_cold
| PM | wall | ±σ | user | sys | RSS | pgMinor |
|---|---|---|---|---|---|---|
| bun | 29.56s | 6.65s | 6.63s | 19.34s | 615M | 39.8K |
| utoo-next | 28.27s | 3.54s | 7.87s | 24.40s | 693M | 76.6K |
| utoo-npm | 25.46s | 1.06s | 8.09s | 25.60s | 813M | 77.0K |
| utoo | 30.32s | 11.86s | 6.67s | 20.11s | 697M | 75.7K |
| PM | vCtx | iCtx | netRX | netTX | cache | node_mod | lock |
|---|---|---|---|---|---|---|---|
| bun | 14.9K | 146.5K | - | - | 1.77G | 1.91G | 2M |
| utoo-next | 4.3K | 384.7K | - | - | 1.61G | 1.84G | 2M |
| utoo-npm | 994 | 370.2K | - | - | 1.61G | 1.87G | 2M |
| utoo | 4.8K | 376.2K | - | - | 1.61G | 1.84G | 2M |
p1_resolve
| PM | wall | ±σ | user | sys | RSS | pgMinor |
|---|---|---|---|---|---|---|
| bun | 2.50s | 0.07s | 3.05s | 1.73s | 601M | 39.1K |
| utoo-next | 11.70s | 13.43s | 1.56s | 0.86s | 79M | 5.8K |
| utoo-npm | 4.91s | 0.51s | 1.41s | 0.81s | 79M | 5.7K |
| utoo | 10.15s | 14.81s | 1.26s | 0.43s | 84M | 6.1K |
| PM | vCtx | iCtx | netRX | netTX | cache | node_mod | lock |
|---|---|---|---|---|---|---|---|
| bun | 20 | 22.2K | - | - | 111M | - | 2M |
| utoo-next | 5 | 46.8K | - | - | - | 4M | 2M |
| utoo-npm | 6 | 45.3K | - | - | - | 4M | 2M |
| utoo | 25 | 25.4K | - | - | - | 4M | 2M |
p3_cold_install
| PM | wall | ±σ | user | sys | RSS | pgMinor |
|---|---|---|---|---|---|---|
| bun | 20.26s | 2.51s | 4.03s | 19.09s | 296M | 19.5K |
| utoo-next | 26.42s | 2.12s | 4.72s | 16.88s | 598M | 73.0K |
| utoo-npm | 28.27s | 2.14s | 4.59s | 16.37s | 674M | 74.9K |
| utoo | 27.41s | 1.57s | 4.99s | 18.54s | 647M | 74.3K |
| PM | vCtx | iCtx | netRX | netTX | cache | node_mod | lock |
|---|---|---|---|---|---|---|---|
| bun | 2.1K | 146.6K | - | - | 1.65G | 1.92G | 2M |
| utoo-next | 1.6K | 345.5K | - | - | 1.61G | 1.83G | 2M |
| utoo-npm | 1.6K | 366.2K | - | - | 1.61G | 1.83G | 2M |
| utoo | 1.6K | 339.1K | - | - | 1.61G | 1.83G | 2M |
p4_warm_link
| PM | wall | ±σ | user | sys | RSS | pgMinor |
|---|---|---|---|---|---|---|
| bun | 4.56s | 0.06s | 0.08s | 1.94s | 44M | 3.4K |
| utoo-next | 3.80s | 0.51s | 0.52s | 2.68s | 95M | 7.1K |
| utoo-npm | 4.56s | 0.65s | 0.57s | 3.01s | 90M | 6.8K |
| utoo | 4.12s | 0.78s | 0.46s | 3.01s | 95M | 7.1K |
| PM | vCtx | iCtx | netRX | netTX | cache | node_mod | lock |
|---|---|---|---|---|---|---|---|
| bun | 13.5K | 634 | - | - | 1.78G | 1.91G | 2M |
| utoo-next | 12.2K | 72.3K | - | - | 1.61G | 1.83G | 2M |
| utoo-npm | 12.2K | 73.1K | - | - | 1.61G | 1.83G | 2M |
| utoo | 12.3K | 81.2K | - | - | 1.61G | 1.83G | 2M |
Summary
Third of 4 split PRs from #2818. Independently-motivated allocator + cache hot-path optimisations for the resolver. Each landed during the worker-pool exploration but stands alone — they do not depend on the worker-pool architecture.
Changes (each ~50ms preload savings, cumulative ~200ms)
- `aws-lc-rs` instead of `ring` (~420ms saved on cold preload TLS handshakes — measured CCS→AppData 78ms→17ms)
- `HashSet<String>` of cached names from one `read_dir`; restores the warm 304 path without a per-package `try_exists` storm
- Lazy `CoreVersionManifest` parse via `simd_json::OwnedValue` + `DashMap` memoisation — the resolver typically reads 1-3 of ~500 versions per manifest
- `Arc<FullManifest>` in `MemoryCache` — atomic-bump clone instead of deep HashMap clone (~500k allocs eliminated)
- `normalize_spec` returns `Cow<'_, str>` — common path now zero-alloc (~5460 allocs eliminated)
- Drop `versions.keys.clone()` on the cache-hit path (~360k String allocs eliminated)
- `OnceMap` dedup for concurrent `resolve_full_manifest` callers
- File log filter `info`+ by default — drops format/serialize CPU for ~15-30k hot-path debug events per cold preload (override via `UTOO_FILE_LOG=debug`)

Trait surface change

`RegistryClient`'s default-method futures gain `+ Send` and `Self: Sync` bounds. Required by the spawn use in #PR4 but works equally for single-threaded resolvers. Adds a `+ Sync` bound on `resolve_package` / `resolve_registry_dep` / `process_dependency` / preload helpers.

Test plan

- `cargo fmt` + `cargo clippy --all-targets -- -D warnings --no-deps` clean
- `cargo test -p utoo-ruborist` 164 + 10 doctests pass
- `cargo test -p utoo-pm` 248/249 pass (1 pre-existing flake on `test_update_package_binary_fsevents`; runs green alone)

Stacking

The next PR (`perf/preload-worker-pool`) targets `perf/manifest-cache` and adds the worker-pool spawn refactor + Send/Clone/Sync/'static bound propagation.

Context
Full exploration journey + failed-experiments catalog: #2818
Bench infrastructure: #2824
🤖 Generated with Claude Code