
perf(pm): preload multi-thread worker pool (tokio::spawn) #2832

Closed · elrrrrrrr wants to merge 2 commits into `next` from `perf/pm-mt-worker-pool`

Conversation

@elrrrrrrr (Contributor)

Summary

  • Preload phase: `FuturesUnordered` → N `tokio::spawn` workers on lock-free `SegQueue` + `DashSet` dedup
  • Multi-thread parse parallelism (simd_json across cores) is the actual perf driver — confirmed by the closed #2830 ("perf(pm): preload worker pool replaces FuturesUnordered", spawn_local-only), which showed no measurable gain
  • New `MaybeSend`/`MaybeSync` shim keeps wasm32 building (`spawn_local` fallback)

Iteration history

| PR | Status | Mechanism | Bench result |
| --- | --- | --- | --- |
| #2827 | closed | tokio::spawn + bundled changes (cap, inline parse, workspace, cache) | -30% claimed; silent linux crash |
| #2830 | closed | `spawn_local` only (single thread) | no-op on p1_resolve |
| this PR | open | `tokio::spawn` + minimal `MaybeSend` shim, no other changes | TBD |

Why this should work where #2830 didn't

Single-thread `spawn_local` keeps all parsing serial on one OS thread, the same as `FuturesUnordered`. Multi-thread `tokio::spawn` lets parse work run on whichever tokio worker thread is free while other workers drive their network IO. Local verification: `utoo deps` on ant-design ran across 11 worker threads (`Caching` log events spread across TIDs 02 through 12+, ~1.3K events each).
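As a toy illustration (not the PR's code): CPU-bound work inside `tokio::spawn` tasks spreads across the runtime's worker threads, whereas the same futures polled from a single task run back-to-back.

```rust
use std::time::Instant;

#[tokio::main(flavor = "multi_thread", worker_threads = 4)]
async fn main() {
    let t0 = Instant::now();
    // Four CPU-bound tasks, standing in for simd_json manifest parses.
    let handles: Vec<_> = (0..4u64)
        .map(|seed| {
            tokio::spawn(async move {
                let mut acc = seed;
                for x in 0..200_000_000u64 {
                    acc = acc.wrapping_mul(6364136223846793005).wrapping_add(x);
                }
                acc
            })
        })
        .collect();
    for h in handles {
        h.await.unwrap();
    }
    // On 4 worker threads this finishes in roughly the time of one loop;
    // driven serially from a single task (the FuturesUnordered /
    // spawn_local shape), it takes roughly 4x as long.
    println!("elapsed: {:?}", t0.elapsed());
}
```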

Risk: PR #2827 silent crash

PR #2827 silently exited with code 1 on the linux Case 2 ant-design e2e (root cause never diagnosed). This PR is a clean rebuild from `origin/next` with only the worker-pool change plus the `MaybeSend` shim. macOS local verification passes (preload 108.9s, 4571 manifests, exit 0). CI `utoopm-e2e-ubuntu-latest-x86_64-unknown-linux-gnu` is the smoke test that matters.

Test plan

  • `cargo fmt -p utoo-ruborist -p utoo-pm`
  • `cargo clippy -p utoo-ruborist -p utoo-pm --all-targets -- -D warnings --no-deps`
  • `cargo test -p utoo-ruborist` — 160 unit + 10 doctests pass
  • `cargo build --release -p utoo-pm`
  • ant-design `utoo deps` macOS: exit 0, 4571 success / 0 failed, multi-thread distribution confirmed
  • CI `pm-e2e-*` (4 platforms; linux is the silent-crash smoke)
  • CI `utooweb-ci-build-wasm`
  • CI `pm-bench-phases-*` head-to-head vs `utoo-next`

🤖 Generated with Claude Code

N long-lived `tokio::spawn` workers pull work from a shared
lock-free `SegQueue`. Each worker is an independent task on
tokio's multi-thread runtime, so while one worker parses
manifest JSON (CPU-bound, simd_json), other workers keep
driving their network IO and other parses run on different
cores. This replaces the prior `FuturesUnordered` design,
where the main task owned all preload futures and polled
them cooperatively — every per-future await continuation
(including parse) ran serialised in that single task.
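A minimal sketch of that shape; names like `PreloadTask` and
`run_pool`, and the simplified dispatched/done termination
counter, are assumptions rather than the PR's exact code:

```rust
use std::sync::Arc;
use std::sync::atomic::{AtomicUsize, Ordering};

use crossbeam_queue::SegQueue;
use dashmap::DashSet;
use tokio::sync::Notify;

/// Hypothetical unit of preload work: a (name, version-range) pair.
type PreloadTask = (String, String);

async fn run_pool(n_workers: usize, seeds: Vec<PreloadTask>) {
    let pending: Arc<SegQueue<PreloadTask>> = Arc::new(SegQueue::new());
    let processed: Arc<DashSet<String>> = Arc::new(DashSet::new());
    let notify = Arc::new(Notify::new());
    // Termination: the pool drains once done == dispatched and the
    // queue is empty; every enqueue bumps `dispatched` first.
    let dispatched = Arc::new(AtomicUsize::new(0));
    let done = Arc::new(AtomicUsize::new(0));

    for task in seeds {
        if processed.insert(format!("{}@{}", task.0, task.1)) {
            dispatched.fetch_add(1, Ordering::Release);
            pending.push(task);
        }
    }

    let workers: Vec<_> = (0..n_workers)
        .map(|_| {
            let (pending, processed) = (pending.clone(), processed.clone());
            let (notify, dispatched, done) =
                (notify.clone(), dispatched.clone(), done.clone());
            tokio::spawn(async move {
                loop {
                    if let Some(task) = pending.pop() {
                        // A real worker fetches the manifest, parses it
                        // with simd_json, dedups transitive deps via
                        // `processed`, bumps `dispatched`, pushes them,
                        // then wakes parked peers.
                        let _ = (&task, &processed);
                        done.fetch_add(1, Ordering::Release);
                        notify.notify_waiters();
                    } else if done.load(Ordering::Acquire)
                        == dispatched.load(Ordering::Acquire)
                    {
                        notify.notify_waiters(); // release parked peers
                        break;
                    } else {
                        // Park until a peer pushes work or finishes.
                        // (A production pool also needs a lost-wakeup
                        // guard, e.g. enabling the Notified future
                        // before the empty-queue re-check.)
                        notify.notified().await;
                    }
                }
            })
        })
        .collect();

    for w in workers {
        let _ = w.await;
    }
}
```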

The previous spawn_local-only PR (#2830, closed) showed no
measurable gain on p1_resolve, confirming that the
architectural fix alone — N independent task identities —
isn't the perf driver. The actual driver is multi-thread
parse parallelism across cores; that requires `tokio::spawn`,
which requires `Send` bounds.

## Trait surface

A new `crate::maybe_send::{MaybeSend, MaybeSync}` shim
expands to `Send`/`Sync` on native and to no-op marker
traits on `wasm32`. The `RegistryClient` trait's
`impl Future<...>` return types gain `+ MaybeSend`, and the
default impls add `where Self: MaybeSync, Self::Error:
MaybeSend`. Callers use a new `PreloadRegistry` trait alias
that bundles `RegistryClient + Clone + MaybeSend + MaybeSync
+ 'static`, applied at the 5 resolver entry points
(`build_deps*`, `resolve*`, `run_preload_phase`).
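One plausible shape for the shim and the alias (the real
module may differ; `RegistryClient` is stubbed here to keep
the sketch self-contained):

```rust
// crate::maybe_send — real Send/Sync bounds on native targets,
// no-op marker traits on wasm32 (where JsFuture-holding futures
// are !Send).

#[cfg(not(target_arch = "wasm32"))]
mod imp {
    pub trait MaybeSend: Send {}
    impl<T: Send + ?Sized> MaybeSend for T {}

    pub trait MaybeSync: Sync {}
    impl<T: Sync + ?Sized> MaybeSync for T {}
}

#[cfg(target_arch = "wasm32")]
mod imp {
    pub trait MaybeSend {}
    impl<T: ?Sized> MaybeSend for T {}

    pub trait MaybeSync {}
    impl<T: ?Sized> MaybeSync for T {}
}

pub use imp::{MaybeSend, MaybeSync};

// Stand-in for the crate's real RegistryClient trait.
pub trait RegistryClient {}

// "Trait alias" via a blanket impl: any qualifying client is a
// PreloadRegistry, so each resolver entry point needs one bound.
pub trait PreloadRegistry:
    RegistryClient + Clone + MaybeSend + MaybeSync + 'static
{
}
impl<T> PreloadRegistry for T where
    T: RegistryClient + Clone + MaybeSend + MaybeSync + 'static
{
}
```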

## Wasm

`tokio::spawn` is cfg-gated to
`wasm_bindgen_futures::spawn_local` on wasm32. Workers run
independently on the JS event loop — single-thread there,
since `JsFuture` is `!Send` — but the queue + Notify + mpsc
termination story is identical.
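The cfg gate might reduce to a helper along these lines
(`spawn_worker` is a hypothetical name):

```rust
use std::future::Future;

/// Native: spawn onto tokio's multi-thread runtime (requires Send).
#[cfg(not(target_arch = "wasm32"))]
fn spawn_worker<F>(fut: F)
where
    F: Future<Output = ()> + Send + 'static,
{
    tokio::spawn(fut);
}

/// wasm32: spawn onto the JS event loop. No Send bound, since
/// JsFuture is !Send and everything runs on one thread there anyway.
#[cfg(target_arch = "wasm32")]
fn spawn_worker<F>(fut: F)
where
    F: Future<Output = ()> + 'static,
{
    wasm_bindgen_futures::spawn_local(fut);
}
```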

## Replaces #2827, closed #2830

- #2827 (silent linux e2e crash, root cause undiagnosed) —
  this PR re-derives the architectural change with the same
  shim approach but on a clean origin/next baseline,
  skipping unrelated companion changes (cap raise / inline
  parse / workspace parallel reads / manifest cache stack).
- #2830 (spawn_local minimal) — bench showed no measurable
  change, confirming multi-thread parse is the real driver.

## Local verification

- `cargo fmt -p utoo-ruborist -p utoo-pm`
- `cargo clippy -p utoo-ruborist -p utoo-pm --all-targets -- -D warnings --no-deps`
- `cargo test -p utoo-ruborist` — 160 unit + 10 doctests pass
- `cargo build --release -p utoo-pm`
- `utoo deps` on ant-design (Case 2 silent-crash test from
  #2827) passes — preload 108.9s wall on local flaky network,
  4571 success / 0 failed, 11 worker threads scheduled.

CI will run `pm-e2e-*` (4 platforms, especially linux which
crashed on #2827) plus `pm-bench-phases-*` for the real
head-to-head against `utoo-next`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@elrrrrrrr added the `benchmark` (Run pm-bench on PR) label on Apr 27, 2026

@gemini-code-assist (Bot) left a comment

Code Review

This pull request refactors the manifest preloading logic to use a multi-threaded worker pool based on a lock-free queue (SegQueue) and a concurrent set (DashSet), replacing the previous FuturesUnordered implementation. It also introduces a maybe_send shim to abstract Send and Sync requirements for compatibility between native multi-threaded runtimes and WASM. Feedback identifies a critical race condition in the worker termination logic where the dispatch counter is updated after items are queued, as well as an inefficient snapshotting of the processed set that should be replaced with a simple length check.

Comment on lines +187 to +196

```rust
for tdep in extract_transitive_deps(&resolved.manifest, &config_for_worker)
{
    let key = format!("{}@{}", tdep.0, tdep.1);
    if processed.insert(key) {
        pending.push(tdep);
        count += 1;
    }
}
if count > 0 {
    dispatched.fetch_add(count, Ordering::Release);
```
Severity: high

There is a potential race condition in the termination logic. The dispatched counter is incremented after items are pushed to the pending queue. If a worker thread finishes its current task and checks the termination condition (done == dispatched.load(Ordering::Acquire)) before the producer thread has incremented dispatched for the new transitive dependencies, the worker pool might shut down prematurely while there is still work in the queue. To fix this, always increment the dispatched counter before pushing the item to the queue.

Suggested change:

```rust
for tdep in extract_transitive_deps(&resolved.manifest, &config_for_worker)
{
    let key = format!("{}@{}", tdep.0, tdep.1);
    if processed.insert(key) {
        dispatched.fetch_add(1, Ordering::Release);
        pending.push(tdep);
        count += 1;
    }
}
if count > 0 {
    // Wake parked workers so they pick up the new
    // work before checking the termination condition.
    notify.notify_waiters();
}
```

Comment on lines +301 to +307
```rust
stats.total_processed = {
    let mut set: HashSet<String> = HashSet::with_capacity(processed.len());
    for entry in processed.iter() {
        set.insert(entry.key().clone());
    }
    set.len()
};
```
Severity: medium

Calculating total_processed by cloning the entire DashSet into a new HashSet is highly inefficient, especially for large dependency trees. Since the workers have finished at this point, processed.len() provides an accurate count without the overhead of allocations and string cloning.

Suggested change:

```rust
stats.total_processed = processed.len();
```

@github-actions

📊 pm-bench-phases · 906381d · linux (ubuntu-latest)

Workflow run — ant-design

PMs: utoo (this branch) · utoo-npm (latest published) · bun (latest)

npmjs.org

p0_full_cold

| PM | wall | ±σ | user | sys | RSS | pgMinor |
| --- | --- | --- | --- | --- | --- | --- |
| bun | 9.43s | 0.28s | 10.04s | 10.17s | 669M | 322.8K |
| utoo-npm | 10.34s | 0.21s | 11.54s | 13.34s | 1.17G | 154.9K |
| utoo | 10.75s | 1.49s | 12.58s | 13.03s | 1.42G | 145.8K |

| PM | vCtx | iCtx | netRX | netTX | cache | node_mod | lock |
| --- | --- | --- | --- | --- | --- | --- | --- |
| bun | 16.1K | 18.6K | 1.16G | 7M | 1.83G | 1.72G | 1M |
| utoo-npm | 176.0K | 165.0K | 1.14G | 5M | 1.68G | 1.68G | 2M |
| utoo | 156.7K | 122.4K | 1.15G | 4M | 1.68G | 1.68G | 2M |

p1_resolve

| PM | wall | ±σ | user | sys | RSS | pgMinor |
| --- | --- | --- | --- | --- | --- | --- |
| bun | 2.21s | 0.04s | 3.88s | 1.06s | 480M | 174.7K |
| utoo-npm | 5.31s | 0.05s | 5.91s | 1.04s | 433M | 76.9K |
| utoo | 4.59s | 1.66s | 6.86s | 1.61s | 551M | 67.5K |

| PM | vCtx | iCtx | netRX | netTX | cache | node_mod | lock |
| --- | --- | --- | --- | --- | --- | --- | --- |
| bun | 10.2K | 3.9K | 200M | 3M | 104M | - | 1M |
| utoo-npm | 65.8K | 2.9K | 204M | 2M | 9M | 5M | 2M |
| utoo | 66.7K | 14.5K | 211M | 2M | 9M | 5M | 2M |

p3_cold_install

| PM | wall | ±σ | user | sys | RSS | pgMinor |
| --- | --- | --- | --- | --- | --- | --- |
| bun | 7.51s | 1.52s | 6.12s | 9.98s | 598M | 203.7K |
| utoo-npm | 8.19s | 1.54s | 5.47s | 11.51s | 920M | 111.1K |
| utoo | 6.17s | 0.24s | 5.37s | 10.89s | 817M | 111.3K |

| PM | vCtx | iCtx | netRX | netTX | cache | node_mod | lock |
| --- | --- | --- | --- | --- | --- | --- | --- |
| bun | 7.2K | 7.8K | 993M | 4M | 1.73G | 1.73G | 1M |
| utoo-npm | 124.6K | 81.2K | 965M | 3M | 1.67G | 1.67G | 2M |
| utoo | 101.0K | 68.1K | 964M | 2M | 1.67G | 1.67G | 2M |

p4_warm_link

| PM | wall | ±σ | user | sys | RSS | pgMinor |
| --- | --- | --- | --- | --- | --- | --- |
| bun | 3.34s | 0.12s | 0.20s | 2.33s | 136M | 32.4K |
| utoo-npm | 2.65s | 0.41s | 0.59s | 3.85s | 82M | 19.0K |
| utoo | 2.39s | 0.04s | 0.58s | 3.84s | 81M | 19.0K |

| PM | vCtx | iCtx | netRX | netTX | cache | node_mod | lock |
| --- | --- | --- | --- | --- | --- | --- | --- |
| bun | 521 | 24 | 7M | 31K | 1.88G | 1.72G | 1M |
| utoo-npm | 49.2K | 21.8K | 19K | 46K | 1.67G | 1.67G | 2M |
| utoo | 46.6K | 19.4K | 14K | 3K | 1.68G | 1.67G | 2M |

npmmirror.com

p0_full_cold

| PM | wall | ±σ | user | sys | RSS | pgMinor |
| --- | --- | --- | --- | --- | --- | --- |
| bun | 24.21s | 4.20s | 9.25s | 9.81s | 545M | 379.5K |
| utoo-npm | 25.56s | 4.24s | 8.36s | 13.78s | 771M | 113.8K |
| utoo | 26.26s | 4.52s | 8.13s | 13.86s | 711M | 103.5K |

| PM | vCtx | iCtx | netRX | netTX | cache | node_mod | lock |
| --- | --- | --- | --- | --- | --- | --- | --- |
| bun | 69.3K | 6.0K | 1.12G | 11M | 1.84G | 1.72G | 2M |
| utoo-npm | 238.9K | 112.8K | 992M | 8M | 1.67G | 1.67G | 2M |
| utoo | 233.8K | 96.3K | 988M | 8M | 1.67G | 1.68G | 2M |

p1_resolve

| PM | wall | ±σ | user | sys | RSS | pgMinor |
| --- | --- | --- | --- | --- | --- | --- |
| bun | 1.57s | 0.20s | 3.94s | 1.07s | 643M | 189.5K |
| utoo-npm | 3.32s | 0.12s | 1.97s | 0.61s | 76M | 15.7K |
| utoo | 3.37s | 0.10s | 1.94s | 0.70s | 76M | 16.4K |

| PM | vCtx | iCtx | netRX | netTX | cache | node_mod | lock |
| --- | --- | --- | --- | --- | --- | --- | --- |
| bun | 4.9K | 5.9K | 151M | 2M | 106M | - | 2M |
| utoo-npm | 43.8K | 1.2K | 14M | 2M | - | 4M | 2M |
| utoo | 40.0K | 2.6K | 14M | 2M | - | 4M | 2M |

p3_cold_install

| PM | wall | ±σ | user | sys | RSS | pgMinor |
| --- | --- | --- | --- | --- | --- | --- |
| bun | 22.26s | 13.17s | 5.94s | 9.43s | 240M | 92.9K |
| utoo-npm | 22.22s | 2.01s | 6.12s | 12.60s | 566M | 103.4K |
| utoo | 21.98s | 1.53s | 6.13s | 12.59s | 577M | 98.0K |

| PM | vCtx | iCtx | netRX | netTX | cache | node_mod | lock |
| --- | --- | --- | --- | --- | --- | --- | --- |
| bun | 46.8K | 4.9K | 999M | 8M | 1.73G | 1.73G | 2M |
| utoo-npm | 182.7K | 98.4K | 984M | 6M | 1.67G | 1.67G | 2M |
| utoo | 186.7K | 98.1K | 978M | 6M | 1.67G | 1.67G | 2M |

p4_warm_link

| PM | wall | ±σ | user | sys | RSS | pgMinor |
| --- | --- | --- | --- | --- | --- | --- |
| bun | 3.08s | 0.20s | 0.21s | 2.30s | 135M | 32.1K |
| utoo-npm | 2.16s | 0.09s | 0.59s | 3.79s | 82M | 19.1K |
| utoo | 2.27s | 0.11s | 0.58s | 3.84s | 82M | 18.5K |

| PM | vCtx | iCtx | netRX | netTX | cache | node_mod | lock |
| --- | --- | --- | --- | --- | --- | --- | --- |
| bun | 700 | 31 | 7M | 51K | 1.88G | 1.72G | 2M |
| utoo-npm | 48.2K | 21.3K | 56K | 36K | 1.67G | 1.67G | 2M |
| utoo | 49.2K | 20.3K | 55K | 13K | 1.67G | 1.67G | 2M |

@github-actions

📊 pm-bench-phases · 906381d · mac (macos-latest)

Workflow run — ant-design

PMs: utoo (this branch) · utoo-npm (latest published) · bun (latest)

npmjs.org

p0_full_cold

| PM | wall | ±σ | user | sys | RSS | pgMinor |
| --- | --- | --- | --- | --- | --- | --- |
| bun | 21.95s | 0.35s | 6.90s | 22.76s | 777M | 50.2K |
| utoo-npm | 23.47s | 0.59s | 11.17s | 27.31s | 1.01G | 102.2K |
| utoo | 23.53s | 2.57s | 11.91s | 28.32s | 1.18G | 106.6K |

| PM | vCtx | iCtx | netRX | netTX | cache | node_mod | lock |
| --- | --- | --- | --- | --- | --- | --- | --- |
| bun | 16.3K | 137.0K | - | - | 1.82G | 1.94G | 1M |
| utoo-npm | 13.4K | 367.8K | - | - | 1.63G | 1.83G | 2M |
| utoo | 8.9K | 345.5K | - | - | 1.63G | 1.87G | 2M |

p1_resolve

| PM | wall | ±σ | user | sys | RSS | pgMinor |
| --- | --- | --- | --- | --- | --- | --- |
| bun | 2.55s | 0.09s | 3.18s | 1.56s | 495M | 32.1K |
| utoo-npm | 6.77s | 0.06s | 5.40s | 4.02s | 540M | 36.7K |
| utoo | 4.40s | 0.12s | 5.57s | 3.15s | 601M | 40.4K |

| PM | vCtx | iCtx | netRX | netTX | cache | node_mod | lock |
| --- | --- | --- | --- | --- | --- | --- | --- |
| bun | 37 | 18.9K | - | - | 111M | - | 1M |
| utoo-npm | 58 | 72.8K | - | - | 28M | 5M | 2M |
| utoo | 14 | 71.9K | - | - | 28M | 5M | 2M |

p3_cold_install

| PM | wall | ±σ | user | sys | RSS | pgMinor |
| --- | --- | --- | --- | --- | --- | --- |
| bun | 22.59s | 3.85s | 4.09s | 23.47s | 538M | 35.0K |
| utoo-npm | 20.51s | 2.26s | 5.09s | 24.19s | 857M | 79.5K |
| utoo | 17.87s | 0.74s | 4.92s | 23.92s | 741M | 72.8K |

| PM | vCtx | iCtx | netRX | netTX | cache | node_mod | lock |
| --- | --- | --- | --- | --- | --- | --- | --- |
| bun | 6.0K | 135.0K | - | - | 1.70G | 1.94G | 1M |
| utoo-npm | 1.9K | 242.9K | - | - | 1.61G | 1.83G | 2M |
| utoo | 1.3K | 231.5K | - | - | 1.61G | 1.83G | 2M |

p4_warm_link

| PM | wall | ±σ | user | sys | RSS | pgMinor |
| --- | --- | --- | --- | --- | --- | --- |
| bun | 8.95s | 0.39s | 0.17s | 3.51s | 51M | 3.9K |
| utoo-npm | 6.81s | 0.40s | 0.88s | 4.95s | 87M | 6.5K |
| utoo | 6.96s | 0.50s | 0.79s | 5.06s | 92M | 6.9K |

| PM | vCtx | iCtx | netRX | netTX | cache | node_mod | lock |
| --- | --- | --- | --- | --- | --- | --- | --- |
| bun | 17.6K | 1.2K | - | - | 1.86G | 1.91G | 1M |
| utoo-npm | 12.6K | 81.1K | - | - | 1.61G | 1.83G | 2M |
| utoo | 13.4K | 81.5K | - | - | 1.63G | 1.83G | 2M |

npmmirror.com

p0_full_cold

| PM | wall | ±σ | user | sys | RSS | pgMinor |
| --- | --- | --- | --- | --- | --- | --- |
| bun | 127.93s | 115.64s | 7.35s | 19.55s | 558M | 36.1K |
| utoo-npm | 39.29s | 8.02s | 6.62s | 18.01s | 780M | 77.1K |
| utoo | 44.33s | 7.65s | 8.31s | 23.56s | 761M | 77.1K |

| PM | vCtx | iCtx | netRX | netTX | cache | node_mod | lock |
| --- | --- | --- | --- | --- | --- | --- | --- |
| bun | 14.8K | 166.1K | - | - | 1.79G | 1.90G | 2M |
| utoo-npm | 4.3K | 460.3K | - | - | 1.61G | 1.84G | 2M |
| utoo | 4.1K | 438.6K | - | - | 1.61G | 1.87G | 2M |

p1_resolve

| PM | wall | ±σ | user | sys | RSS | pgMinor |
| --- | --- | --- | --- | --- | --- | --- |
| bun | 5.90s | 5.16s | 2.52s | 1.39s | 545M | 35.5K |
| utoo-npm | 5.44s | 0.26s | 1.35s | 0.72s | 78M | 5.7K |
| utoo | 6.12s | 1.47s | 1.68s | 0.92s | 78M | 5.7K |

| PM | vCtx | iCtx | netRX | netTX | cache | node_mod | lock |
| --- | --- | --- | --- | --- | --- | --- | --- |
| bun | 28 | 27.2K | - | - | 111M | - | 2M |
| utoo-npm | 5 | 44.0K | - | - | - | 4M | 2M |
| utoo | 17 | 37.1K | - | - | - | 4M | 2M |

p3_cold_install

| PM | wall | ±σ | user | sys | RSS | pgMinor |
| --- | --- | --- | --- | --- | --- | --- |
| bun | 23.94s | 1.81s | 4.16s | 19.44s | 309M | 20.3K |
| utoo-npm | 58.43s | 30.68s | 6.90s | 24.48s | 751M | 76.9K |
| utoo | 90.61s | 39.32s | 7.01s | 23.66s | 631M | 74.4K |

| PM | vCtx | iCtx | netRX | netTX | cache | node_mod | lock |
| --- | --- | --- | --- | --- | --- | --- | --- |
| bun | 1.9K | 162.5K | - | - | 1.65G | 1.92G | 2M |
| utoo-npm | 1.7K | 368.6K | - | - | 1.61G | 1.87G | 2M |
| utoo | 1.7K | 404.1K | - | - | 1.61G | 1.87G | 2M |

p4_warm_link

| PM | wall | ±σ | user | sys | RSS | pgMinor |
| --- | --- | --- | --- | --- | --- | --- |
| bun | 4.05s | 0.89s | 0.08s | 1.82s | 44M | 3.4K |
| utoo-npm | 3.40s | 0.57s | 0.49s | 2.70s | 97M | 7.1K |
| utoo | 3.71s | 0.15s | 0.45s | 2.77s | 97M | 7.3K |

| PM | vCtx | iCtx | netRX | netTX | cache | node_mod | lock |
| --- | --- | --- | --- | --- | --- | --- | --- |
| bun | 13.2K | 955 | - | - | 1.78G | 1.90G | 2M |
| utoo-npm | 12.1K | 72.2K | - | - | 1.61G | 1.82G | 2M |
| utoo | 12.2K | 71.4K | - | - | 1.61G | 1.82G | 2M |
