perf(pm): preload multi-thread worker pool (tokio::spawn)#2832
Conversation
N long-lived `tokio::spawn` workers pull work from a shared lock-free `SegQueue`. Each worker is an independent task on tokio's multi-thread runtime, so while one worker is parsing manifest JSON (CPU-bound, simd_json), other workers keep driving their network IO and other parses run on different cores. This replaces the prior `FuturesUnordered` design, where the main task owned all preload futures and polled them cooperatively: every per-future await continuation (including parse) ran serialised in that single task.

The previous spawn_local-only PR (#2830, closed) showed no measurable gain on p1_resolve, confirming that the architectural fix alone — N independent task identities — isn't the perf driver. The actual driver is multi-thread parse parallelism across cores; that requires `tokio::spawn`, which requires `Send` bounds.

## Trait surface

A new `crate::maybe_send::{MaybeSend, MaybeSync}` shim expands to `Send`/`Sync` on native and to no-op marker traits on `wasm32`. The `RegistryClient` trait's `impl Future<...>` returns gain `+ MaybeSend`, and the default impls add `where Self: MaybeSync, Self::Error: MaybeSend`. Callers use a new `PreloadRegistry` trait alias that bundles `RegistryClient + Clone + MaybeSend + MaybeSync + 'static`, applied at the 5 resolver entry points (`build_deps*`, `resolve*`, `run_preload_phase`).

## Wasm

`tokio::spawn` is cfg-gated to `wasm_bindgen_futures::spawn_local` on wasm32. Workers run independently on the JS event loop — single-thread there, since `JsFuture` is `!Send` — but the queue + Notify + mpsc termination story is identical.

## Replaces #2827, closed #2830

- #2827 (silent linux e2e crash, root cause undiagnosed) — this PR re-derives the architectural change with the same shim approach but on a clean `origin/next` baseline, skipping unrelated companion changes (cap raise / inline parse / workspace parallel reads / manifest cache stack).
- #2830 (spawn_local minimal) — the bench showed no measurable gain, confirming multi-thread parse is the real driver.
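A minimal sketch of how such a shim is commonly written (the module and trait names follow the PR text; the blanket impls are an assumption, not the PR's actual code). Native targets alias the real auto traits; wasm32 gets vacuous markers, so `!Send` futures like `JsFuture` still satisfy the preload bounds:

```rust
// Hypothetical reconstruction of the MaybeSend/MaybeSync shim: on native
// the traits require Send/Sync, on wasm32 they are no-op markers.
#[cfg(not(target_arch = "wasm32"))]
mod maybe_send {
    pub trait MaybeSend: Send {}
    impl<T: Send + ?Sized> MaybeSend for T {}
    pub trait MaybeSync: Sync {}
    impl<T: Sync + ?Sized> MaybeSync for T {}
}

#[cfg(target_arch = "wasm32")]
mod maybe_send {
    pub trait MaybeSend {}
    impl<T: ?Sized> MaybeSend for T {}
    pub trait MaybeSync {}
    impl<T: ?Sized> MaybeSync for T {}
}

// A bound in the style of the PreloadRegistry alias: on native this
// demands Send + Sync + 'static, on wasm32 only 'static.
fn assert_preloadable<T: maybe_send::MaybeSend + maybe_send::MaybeSync + 'static>(_t: &T) {}

fn main() {
    assert_preloadable(&42u32);
    assert_preloadable(&String::from("react@18.2.0"));
    println!("ok");
}
```

Because the relaxation happens in the trait definitions, downstream code writes one set of bounds and compiles unchanged on both targets.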
## Local verification

- `cargo fmt -p utoo-ruborist -p utoo-pm`
- `cargo clippy -p utoo-ruborist -p utoo-pm --all-targets -- -D warnings --no-deps`
- `cargo test -p utoo-ruborist` — 160 unit + 10 doctests pass
- `cargo build --release -p utoo-pm`
- `utoo deps` on ant-design (Case 2 silent-crash test from #2827) passes — preload 108.9s wall on a local flaky network, 4571 success / 0 failed, 11 worker threads scheduled

CI will run `pm-e2e-*` (4 platforms, especially linux, which crashed on #2827) plus `pm-bench-phases-*` for the real head-to-head against `utoo-next`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Code Review
This pull request refactors the manifest preloading logic to use a multi-threaded worker pool based on a lock-free queue (`SegQueue`) and a concurrent set (`DashSet`), replacing the previous `FuturesUnordered` implementation. It also introduces a `maybe_send` shim to abstract `Send` and `Sync` requirements for compatibility between native multi-threaded runtimes and WASM. Feedback identifies a critical race condition in the worker termination logic, where the `dispatched` counter is updated after items are queued, as well as an inefficient snapshotting of the processed set that should be replaced with a simple length check.
```rust
for tdep in extract_transitive_deps(&resolved.manifest, &config_for_worker)
{
    let key = format!("{}@{}", tdep.0, tdep.1);
    if processed.insert(key) {
        pending.push(tdep);
        count += 1;
    }
}
if count > 0 {
    dispatched.fetch_add(count, Ordering::Release);
```
There is a potential race condition in the termination logic. The `dispatched` counter is incremented after items are pushed to the `pending` queue. If a worker thread finishes its current task and checks the termination condition (`done == dispatched.load(Ordering::Acquire)`) before the producer thread has incremented `dispatched` for the new transitive dependencies, the worker pool might shut down prematurely while there is still work in the queue. To fix this, always increment the `dispatched` counter before pushing the item to the queue.
Suggested change:

```diff
 for tdep in extract_transitive_deps(&resolved.manifest, &config_for_worker)
 {
     let key = format!("{}@{}", tdep.0, tdep.1);
     if processed.insert(key) {
+        dispatched.fetch_add(1, Ordering::Release);
         pending.push(tdep);
         count += 1;
     }
 }
 if count > 0 {
-    dispatched.fetch_add(count, Ordering::Release);
+    // Wake parked workers so they pick up the new
+    // work before checking the termination condition.
+    notify.notify_waiters();
 }
```
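For readers following the termination argument, here is a std-only model of the protocol (the real pool uses tokio tasks, `SegQueue`, `DashSet`, and `Notify`; the numeric work items and the `+ 100` fan-out rule are invented for illustration). The invariant the suggestion restores: `dispatched` is bumped before a child becomes visible, so `done == dispatched` can only be observed once all work is truly finished.

```rust
use std::collections::VecDeque;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Condvar, Mutex};
use std::thread;

/// Drain a work queue with `n_workers` threads. Items below 100 fan out
/// one "transitive dep" (item + 100). Returns the sum of processed items.
fn run_pool(n_workers: usize) -> u64 {
    // Three seed items; `dispatched` counts them BEFORE they are visible.
    let queue = Arc::new((Mutex::new(VecDeque::from([10u64, 20, 30])), Condvar::new()));
    let dispatched = Arc::new(AtomicUsize::new(3));
    let done = Arc::new(AtomicUsize::new(0));

    let handles: Vec<_> = (0..n_workers)
        .map(|_| {
            let queue = Arc::clone(&queue);
            let dispatched = Arc::clone(&dispatched);
            let done = Arc::clone(&done);
            thread::spawn(move || {
                let mut local_sum = 0u64;
                loop {
                    // Pop an item, or park until new work or termination.
                    let item = {
                        let (lock, cvar) = &*queue;
                        let mut q = lock.lock().unwrap();
                        loop {
                            if let Some(item) = q.pop_front() {
                                break Some(item);
                            }
                            // Sound only because `dispatched` is bumped
                            // before any push: equality implies no work is
                            // queued or in flight anywhere.
                            if done.load(Ordering::Acquire)
                                == dispatched.load(Ordering::Acquire)
                            {
                                break None;
                            }
                            q = cvar.wait(q).unwrap();
                        }
                    };
                    let Some(item) = item else { break };
                    local_sum += item; // stand-in for fetch + parse
                    if item < 100 {
                        // The review's fix: count the child first...
                        dispatched.fetch_add(1, Ordering::Release);
                        let (lock, cvar) = &*queue;
                        let mut q = lock.lock().unwrap();
                        // ...then make it visible and wake parked workers.
                        q.push_back(item + 100);
                        cvar.notify_all();
                    }
                    // Mark this item done while holding the same lock the
                    // workers check under, so no final wakeup is lost.
                    let (lock, cvar) = &*queue;
                    let _q = lock.lock().unwrap();
                    done.fetch_add(1, Ordering::Release);
                    cvar.notify_all();
                }
                local_sum
            })
        })
        .collect();

    handles.into_iter().map(|h| h.join().unwrap()).sum()
}

fn main() {
    // Seeds 10 + 20 + 30 plus fan-outs 110 + 120 + 130.
    println!("{}", run_pool(4));
}
```

With the buggy ordering (push before counting), a worker checking between the push-consumer's pop and the producer's counter bump could see equality and exit early; counting first closes that window.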
```rust
stats.total_processed = {
    let mut set: HashSet<String> = HashSet::with_capacity(processed.len());
    for entry in processed.iter() {
        set.insert(entry.key().clone());
    }
    set.len()
};
```
Calculating `total_processed` by cloning the entire `DashSet` into a new `HashSet` is highly inefficient, especially for large dependency trees. Since the workers have finished at this point, `processed.len()` provides an accurate count without the overhead of allocations and string cloning.
```rust
stats.total_processed = processed.len();
```
📊 pm-bench-phases

p0_full_cold
| PM | wall | ±σ | user | sys | RSS | pgMinor |
|---|---|---|---|---|---|---|
| bun | 9.43s | 0.28s | 10.04s | 10.17s | 669M | 322.8K |
| utoo-npm | 10.34s | 0.21s | 11.54s | 13.34s | 1.17G | 154.9K |
| utoo | 10.75s | 1.49s | 12.58s | 13.03s | 1.42G | 145.8K |
| PM | vCtx | iCtx | netRX | netTX | cache | node_mod | lock |
|---|---|---|---|---|---|---|---|
| bun | 16.1K | 18.6K | 1.16G | 7M | 1.83G | 1.72G | 1M |
| utoo-npm | 176.0K | 165.0K | 1.14G | 5M | 1.68G | 1.68G | 2M |
| utoo | 156.7K | 122.4K | 1.15G | 4M | 1.68G | 1.68G | 2M |
p1_resolve
| PM | wall | ±σ | user | sys | RSS | pgMinor |
|---|---|---|---|---|---|---|
| bun | 2.21s | 0.04s | 3.88s | 1.06s | 480M | 174.7K |
| utoo-npm | 5.31s | 0.05s | 5.91s | 1.04s | 433M | 76.9K |
| utoo | 4.59s | 1.66s | 6.86s | 1.61s | 551M | 67.5K |
| PM | vCtx | iCtx | netRX | netTX | cache | node_mod | lock |
|---|---|---|---|---|---|---|---|
| bun | 10.2K | 3.9K | 200M | 3M | 104M | - | 1M |
| utoo-npm | 65.8K | 2.9K | 204M | 2M | 9M | 5M | 2M |
| utoo | 66.7K | 14.5K | 211M | 2M | 9M | 5M | 2M |
p3_cold_install
| PM | wall | ±σ | user | sys | RSS | pgMinor |
|---|---|---|---|---|---|---|
| bun | 7.51s | 1.52s | 6.12s | 9.98s | 598M | 203.7K |
| utoo-npm | 8.19s | 1.54s | 5.47s | 11.51s | 920M | 111.1K |
| utoo | 6.17s | 0.24s | 5.37s | 10.89s | 817M | 111.3K |
| PM | vCtx | iCtx | netRX | netTX | cache | node_mod | lock |
|---|---|---|---|---|---|---|---|
| bun | 7.2K | 7.8K | 993M | 4M | 1.73G | 1.73G | 1M |
| utoo-npm | 124.6K | 81.2K | 965M | 3M | 1.67G | 1.67G | 2M |
| utoo | 101.0K | 68.1K | 964M | 2M | 1.67G | 1.67G | 2M |
p4_warm_link
| PM | wall | ±σ | user | sys | RSS | pgMinor |
|---|---|---|---|---|---|---|
| bun | 3.34s | 0.12s | 0.20s | 2.33s | 136M | 32.4K |
| utoo-npm | 2.65s | 0.41s | 0.59s | 3.85s | 82M | 19.0K |
| utoo | 2.39s | 0.04s | 0.58s | 3.84s | 81M | 19.0K |
| PM | vCtx | iCtx | netRX | netTX | cache | node_mod | lock |
|---|---|---|---|---|---|---|---|
| bun | 521 | 24 | 7M | 31K | 1.88G | 1.72G | 1M |
| utoo-npm | 49.2K | 21.8K | 19K | 46K | 1.67G | 1.67G | 2M |
| utoo | 46.6K | 19.4K | 14K | 3K | 1.68G | 1.67G | 2M |
npmmirror.com
p0_full_cold
| PM | wall | ±σ | user | sys | RSS | pgMinor |
|---|---|---|---|---|---|---|
| bun | 24.21s | 4.20s | 9.25s | 9.81s | 545M | 379.5K |
| utoo-npm | 25.56s | 4.24s | 8.36s | 13.78s | 771M | 113.8K |
| utoo | 26.26s | 4.52s | 8.13s | 13.86s | 711M | 103.5K |
| PM | vCtx | iCtx | netRX | netTX | cache | node_mod | lock |
|---|---|---|---|---|---|---|---|
| bun | 69.3K | 6.0K | 1.12G | 11M | 1.84G | 1.72G | 2M |
| utoo-npm | 238.9K | 112.8K | 992M | 8M | 1.67G | 1.67G | 2M |
| utoo | 233.8K | 96.3K | 988M | 8M | 1.67G | 1.68G | 2M |
p1_resolve
| PM | wall | ±σ | user | sys | RSS | pgMinor |
|---|---|---|---|---|---|---|
| bun | 1.57s | 0.20s | 3.94s | 1.07s | 643M | 189.5K |
| utoo-npm | 3.32s | 0.12s | 1.97s | 0.61s | 76M | 15.7K |
| utoo | 3.37s | 0.10s | 1.94s | 0.70s | 76M | 16.4K |
| PM | vCtx | iCtx | netRX | netTX | cache | node_mod | lock |
|---|---|---|---|---|---|---|---|
| bun | 4.9K | 5.9K | 151M | 2M | 106M | - | 2M |
| utoo-npm | 43.8K | 1.2K | 14M | 2M | - | 4M | 2M |
| utoo | 40.0K | 2.6K | 14M | 2M | - | 4M | 2M |
p3_cold_install
| PM | wall | ±σ | user | sys | RSS | pgMinor |
|---|---|---|---|---|---|---|
| bun | 22.26s | 13.17s | 5.94s | 9.43s | 240M | 92.9K |
| utoo-npm | 22.22s | 2.01s | 6.12s | 12.60s | 566M | 103.4K |
| utoo | 21.98s | 1.53s | 6.13s | 12.59s | 577M | 98.0K |
| PM | vCtx | iCtx | netRX | netTX | cache | node_mod | lock |
|---|---|---|---|---|---|---|---|
| bun | 46.8K | 4.9K | 999M | 8M | 1.73G | 1.73G | 2M |
| utoo-npm | 182.7K | 98.4K | 984M | 6M | 1.67G | 1.67G | 2M |
| utoo | 186.7K | 98.1K | 978M | 6M | 1.67G | 1.67G | 2M |
p4_warm_link
| PM | wall | ±σ | user | sys | RSS | pgMinor |
|---|---|---|---|---|---|---|
| bun | 3.08s | 0.20s | 0.21s | 2.30s | 135M | 32.1K |
| utoo-npm | 2.16s | 0.09s | 0.59s | 3.79s | 82M | 19.1K |
| utoo | 2.27s | 0.11s | 0.58s | 3.84s | 82M | 18.5K |
| PM | vCtx | iCtx | netRX | netTX | cache | node_mod | lock |
|---|---|---|---|---|---|---|---|
| bun | 700 | 31 | 7M | 51K | 1.88G | 1.72G | 2M |
| utoo-npm | 48.2K | 21.3K | 56K | 36K | 1.67G | 1.67G | 2M |
| utoo | 49.2K | 20.3K | 55K | 13K | 1.67G | 1.67G | 2M |
📊 pm-bench-phases

p0_full_cold
| PM | wall | ±σ | user | sys | RSS | pgMinor |
|---|---|---|---|---|---|---|
| bun | 21.95s | 0.35s | 6.90s | 22.76s | 777M | 50.2K |
| utoo-npm | 23.47s | 0.59s | 11.17s | 27.31s | 1.01G | 102.2K |
| utoo | 23.53s | 2.57s | 11.91s | 28.32s | 1.18G | 106.6K |
| PM | vCtx | iCtx | netRX | netTX | cache | node_mod | lock |
|---|---|---|---|---|---|---|---|
| bun | 16.3K | 137.0K | - | - | 1.82G | 1.94G | 1M |
| utoo-npm | 13.4K | 367.8K | - | - | 1.63G | 1.83G | 2M |
| utoo | 8.9K | 345.5K | - | - | 1.63G | 1.87G | 2M |
p1_resolve
| PM | wall | ±σ | user | sys | RSS | pgMinor |
|---|---|---|---|---|---|---|
| bun | 2.55s | 0.09s | 3.18s | 1.56s | 495M | 32.1K |
| utoo-npm | 6.77s | 0.06s | 5.40s | 4.02s | 540M | 36.7K |
| utoo | 4.40s | 0.12s | 5.57s | 3.15s | 601M | 40.4K |
| PM | vCtx | iCtx | netRX | netTX | cache | node_mod | lock |
|---|---|---|---|---|---|---|---|
| bun | 37 | 18.9K | - | - | 111M | - | 1M |
| utoo-npm | 58 | 72.8K | - | - | 28M | 5M | 2M |
| utoo | 14 | 71.9K | - | - | 28M | 5M | 2M |
p3_cold_install
| PM | wall | ±σ | user | sys | RSS | pgMinor |
|---|---|---|---|---|---|---|
| bun | 22.59s | 3.85s | 4.09s | 23.47s | 538M | 35.0K |
| utoo-npm | 20.51s | 2.26s | 5.09s | 24.19s | 857M | 79.5K |
| utoo | 17.87s | 0.74s | 4.92s | 23.92s | 741M | 72.8K |
| PM | vCtx | iCtx | netRX | netTX | cache | node_mod | lock |
|---|---|---|---|---|---|---|---|
| bun | 6.0K | 135.0K | - | - | 1.70G | 1.94G | 1M |
| utoo-npm | 1.9K | 242.9K | - | - | 1.61G | 1.83G | 2M |
| utoo | 1.3K | 231.5K | - | - | 1.61G | 1.83G | 2M |
p4_warm_link
| PM | wall | ±σ | user | sys | RSS | pgMinor |
|---|---|---|---|---|---|---|
| bun | 8.95s | 0.39s | 0.17s | 3.51s | 51M | 3.9K |
| utoo-npm | 6.81s | 0.40s | 0.88s | 4.95s | 87M | 6.5K |
| utoo | 6.96s | 0.50s | 0.79s | 5.06s | 92M | 6.9K |
| PM | vCtx | iCtx | netRX | netTX | cache | node_mod | lock |
|---|---|---|---|---|---|---|---|
| bun | 17.6K | 1.2K | - | - | 1.86G | 1.91G | 1M |
| utoo-npm | 12.6K | 81.1K | - | - | 1.61G | 1.83G | 2M |
| utoo | 13.4K | 81.5K | - | - | 1.63G | 1.83G | 2M |
npmmirror.com
p0_full_cold
| PM | wall | ±σ | user | sys | RSS | pgMinor |
|---|---|---|---|---|---|---|
| bun | 127.93s | 115.64s | 7.35s | 19.55s | 558M | 36.1K |
| utoo-npm | 39.29s | 8.02s | 6.62s | 18.01s | 780M | 77.1K |
| utoo | 44.33s | 7.65s | 8.31s | 23.56s | 761M | 77.1K |
| PM | vCtx | iCtx | netRX | netTX | cache | node_mod | lock |
|---|---|---|---|---|---|---|---|
| bun | 14.8K | 166.1K | - | - | 1.79G | 1.90G | 2M |
| utoo-npm | 4.3K | 460.3K | - | - | 1.61G | 1.84G | 2M |
| utoo | 4.1K | 438.6K | - | - | 1.61G | 1.87G | 2M |
p1_resolve
| PM | wall | ±σ | user | sys | RSS | pgMinor |
|---|---|---|---|---|---|---|
| bun | 5.90s | 5.16s | 2.52s | 1.39s | 545M | 35.5K |
| utoo-npm | 5.44s | 0.26s | 1.35s | 0.72s | 78M | 5.7K |
| utoo | 6.12s | 1.47s | 1.68s | 0.92s | 78M | 5.7K |
| PM | vCtx | iCtx | netRX | netTX | cache | node_mod | lock |
|---|---|---|---|---|---|---|---|
| bun | 28 | 27.2K | - | - | 111M | - | 2M |
| utoo-npm | 5 | 44.0K | - | - | - | 4M | 2M |
| utoo | 17 | 37.1K | - | - | - | 4M | 2M |
p3_cold_install
| PM | wall | ±σ | user | sys | RSS | pgMinor |
|---|---|---|---|---|---|---|
| bun | 23.94s | 1.81s | 4.16s | 19.44s | 309M | 20.3K |
| utoo-npm | 58.43s | 30.68s | 6.90s | 24.48s | 751M | 76.9K |
| utoo | 90.61s | 39.32s | 7.01s | 23.66s | 631M | 74.4K |
| PM | vCtx | iCtx | netRX | netTX | cache | node_mod | lock |
|---|---|---|---|---|---|---|---|
| bun | 1.9K | 162.5K | - | - | 1.65G | 1.92G | 2M |
| utoo-npm | 1.7K | 368.6K | - | - | 1.61G | 1.87G | 2M |
| utoo | 1.7K | 404.1K | - | - | 1.61G | 1.87G | 2M |
p4_warm_link
| PM | wall | ±σ | user | sys | RSS | pgMinor |
|---|---|---|---|---|---|---|
| bun | 4.05s | 0.89s | 0.08s | 1.82s | 44M | 3.4K |
| utoo-npm | 3.40s | 0.57s | 0.49s | 2.70s | 97M | 7.1K |
| utoo | 3.71s | 0.15s | 0.45s | 2.77s | 97M | 7.3K |
| PM | vCtx | iCtx | netRX | netTX | cache | node_mod | lock |
|---|---|---|---|---|---|---|---|
| bun | 13.2K | 955 | - | - | 1.78G | 1.90G | 2M |
| utoo-npm | 12.1K | 72.2K | - | - | 1.61G | 1.82G | 2M |
| utoo | 12.2K | 71.4K | - | - | 1.61G | 1.82G | 2M |
## Summary

## Iteration history

### Why this should work where #2830 didn't
Single-thread `spawn_local` keeps all parsing serial in one OS thread, same as `FuturesUnordered`. Multi-thread `tokio::spawn` lets parse work run on whatever tokio worker thread is free while other workers drive their network IO. Local verification: `utoo deps` on ant-design ran across 11 worker threads (`Caching` log events distributed: TID 02 through 12+, ~1.3K events each).
### Risk: PR #2827 silent crash
PR #2827 silently exited with code 1 on the linux Case 2 ant-design e2e run (root cause never diagnosed). This PR is a clean rebuild from `origin/next` with only the worker-pool change + `MaybeSend` shim. macOS local verification passes (preload 108.9s, 4571 manifests, exit 0). CI `utoopm-e2e-ubuntu-latest-x86_64-unknown-linux-gnu` is the smoke test that matters.
## Test plan
🤖 Generated with Claude Code