
perf(pm): channel-based main loop with spawned preload, BFS by level #2933

Open

elrrrrrrr wants to merge 4 commits into next from perf/main-loop-mpsc

Conversation

@elrrrrrrr
Contributor

Summary

Replace the sequential `run_preload_phase().await` + `run_bfs_phase().await` two-phase model with a concurrent design where preload runs as a spawned task and the main loop owns the graph, the cache, and BFS by level (sketched below).

Architecture

[spawned preload — autonomously running]       [main loop — sole graph + cache writer]
─────────────────────────────────────────      ────────────────────────────────────────
owns pending: VecDeque<(name, spec)>           for level:
owns FuturesUnordered (cap saturated)            for edge in level:
walks transitives by manifest content:             cache hit (preload already fetched) → process inline
  for each fetched manifest:                        cache miss → defer
    extract deps → push to pending                  (later) drain mpsc → write cache
parallel fetch via fetch_full_manifest                                → process deferred edges
on result: send via mpsc ──────────────►        swap current/next level

Properties:

  • Single-writer cache: the main loop owns a local HashMap<String, Arc<FullManifest>>. Eliminates the DashMap shard contention caused by concurrent preload fetch tasks.
  • Strict BFS level barrier: level N edges are fully resolved before level N+1 starts. Preserves npm-aligned npm: alias slot occupancy without needing a Replace graph fix.
  • Cache keyed by underlying name: npm:raw-body@2.1.3 and a real ms package fetch into separate cache slots; a race between the alias and a same-named real package can't poison either slot.
  • Preload bandwidth stays filled regardless of BFS progress: preload runs autonomously at the config.concurrency cap and doesn't wait for BFS.
  • WASM falls back to existing run_preload_phase + run_bfs_phase.
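
A minimal sketch of this shape, with the channel payload and cache types simplified (names follow the PR text; this is illustrative only, not the actual builder.rs code):

```rust
use std::collections::HashMap;
use std::sync::Arc;
use tokio::sync::mpsc;

// Simplified stand-in for the real manifest type.
type FullManifest = serde_json::Value;

// Spawned preload: fetches manifests and sends each one to the main loop keyed by
// the underlying package name. It never writes the cache itself.
async fn preload_to_channel(names: Vec<String>, tx: mpsc::Sender<(String, Arc<FullManifest>)>) {
    for name in names {
        let manifest = Arc::new(serde_json::json!({ "name": name })); // stand-in for a fetch
        let _ = tx.send((name, manifest)).await;
    }
}

// Main loop: sole writer of the local cache; drains the channel on a cache miss.
async fn main_loop(mut rx: mpsc::Receiver<(String, Arc<FullManifest>)>, levels: Vec<Vec<String>>) {
    let mut cache: HashMap<String, Arc<FullManifest>> = HashMap::new();
    for level in levels {
        let mut deferred: Vec<String> = Vec::new();
        for edge in level {
            if cache.contains_key(&edge) {
                // cache hit: process the edge inline
            } else {
                deferred.push(edge); // cache miss: defer until preload delivers it
            }
        }
        while !deferred.is_empty() {
            if let Some((name, manifest)) = rx.recv().await {
                cache.insert(name.clone(), manifest);
                deferred.retain(|edge| edge != &name); // process edges now satisfied
            } else {
                break; // preload finished; any remaining edges need direct fetches
            }
        }
        // level barrier: only move on once this level's edges are resolved
    }
}
```

In the real resolver the deferred edges are actually processed and failures are handled; this sketch only shows the single-writer cache and the drain-on-miss flow.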

Trade-offs

  • Trait change: RegistryClient future-returning methods now require + Send for tokio::spawn. Implementations need to be Send-compatible (UnifiedRegistry already is via Arc<...> internals); see the sketch after this list.
  • R: Clone bound added to build_deps/build_deps_with_config etc — registry is Arc-cloned into the spawned preload task.
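
A sketch of what the trait-level change implies, written in return-position-impl-Trait style; the actual RegistryClient signature in traits/registry.rs may differ:

```rust
use std::future::Future;

// Hypothetical trait shape for illustration: the returned future must be Send so the
// resolver can move it into tokio::spawn on a multi-threaded runtime.
trait RegistryClient: Send + Sync {
    type Error: Send;

    fn fetch_full_manifest(
        &self,
        name: &str,
    ) -> impl Future<Output = Result<Vec<u8>, Self::Error>> + Send;
}
```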

Future (not in this PR)

  • Priority queue inside dispatcher: BFS-needed manifests highest priority, preload transitives next, tgz prefetch lowest. Today preload's pending is FIFO; priority becomes important when we add tgz prefetch.
  • optionalDeps allowed-fail: currently optional fetch failures are skipped; tighter handling for partial-failure scenarios.

Net diff

crates/ruborist/src/resolver/builder.rs   +574 -16
crates/ruborist/src/resolver/preload.rs   +5   -2     (helpers exposed pub(crate))
crates/ruborist/src/traits/registry.rs    +3   -1     (+Send on fetch_full_manifest future)

Test plan

  • cargo build -p utoo-pm --profile release-local (poolab, 14s)
  • cargo test -p utoo-ruborist --lib (163 passed)
  • cargo test -p utoo-pm (253 passed, 3 ignored)
  • cargo clippy --all-targets -- -D warnings --no-deps (clean)
  • Smoke test: e2e case 11e (npm: alias) on poolab Linux — top-level ms = raw-body ✓, no nested ms
  • pm-bench-phases CI: target ≥ the perf level of #2929 "perf(pm): two-phase sibling parse + 4× channel buffer (p1 attack)" (p1=2.63s, p0=7.94s, p3=7.32s); triggered via label
  • full e2e on poolab + GHA

🤖 Generated with Claude Code

Replace the sequential `run_preload_phase().await + run_bfs_phase().await`
two-phase model with a concurrent design:

- `tokio::spawn(preload_to_channel)`: fetches FullManifests in parallel,
  walks transitive deps by manifest content, sends each fetched manifest
  to the main loop via mpsc keyed by **underlying** package name. Does
  NOT touch any shared cache — stays a pure provider.
- Main loop (`mb_fetch_with_graph`): owns the graph + a local
  HashMap<String, Arc<FullManifest>> cache. Runs BFS level-by-level;
  for each edge tries the local cache first, otherwise defers and
  drains preload's mpsc until the underlying name arrives.

Architectural properties:
- Single-writer cache: main is the sole writer of the local FullManifest
  store; eliminates DashMap shard contention from concurrent preload
  fetch tasks.
- Strict BFS level barrier: level N edges are fully resolved before
  level N+1 starts, preserving npm-aligned `npm:` alias slot occupancy
  semantics without needing a `Replace` graph fix.
- Cache keyed by underlying name: `npm:raw-body@2.1.3` and a real `ms`
  package fetch into separate cache slots, so race between alias and
  same-named real package can't poison either slot.
- Preload bandwidth filled regardless of BFS progress: preload spawned
  task runs autonomously at `config.concurrency` cap.
- WASM falls back to existing `run_preload_phase + run_bfs_phase`.

Future work (not in this PR): priority queue inside preload's
dispatcher (BFS-needed manifest > preload transitive walk > tgz
prefetch); explicit optionalDeps allowed-fail handling.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
elrrrrrrr added the benchmark (Run pm-bench on PR) label on May 12, 2026

@gemini-code-assist (bot) left a comment


Code Review

This pull request introduces a concurrent, channel-based dependency resolution mechanism for native targets, utilizing a spawned preload task and a single-writer main loop to improve performance. The changes include new logic for parallel manifest fetching, transitive dependency walking, and level-based BFS traversal. Review feedback identifies a high-severity issue where registry dependencies hitting the local cache bypass conditional override rules. Additionally, a memory leak was found in the preload task due to the deferred map not being cleaned up after fetch failures.

Two outdated comment threads on crates/ruborist/src/resolver/builder.rs
elrrrrrrr and others added 3 commits May 12, 2026 11:54
…ling

- preload_to_channel takes registry_url: String, calls service::manifest::fetch_full_manifest directly (no MemoryCache/ManifestStore/OnceMap coupling)
- Remove R: Clone + Send + Sync + 'static bounds from public APIs (registry no longer captured by spawned task)
- Add registry_url() to RegistryClient trait with empty default; UnifiedRegistry delegates to inherent

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
process_registry_edge silently swallowed resolve_target_version /
get_core_version failures regardless of edge type, masking
prod-dep-enotarget cases (e.g. tap@9999.0000.9999) that should
fail. Now: optional → Skipped event + return Ok; non-optional →
ResolveError::Version / ResolveError::ManifestNotFound.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…aph)

PR-F's level-barrier mpsc main loop regressed p1_resolve from
3.31s (utoo-next baseline) to 6.98s on GHA Linux (+111%). Per
fallback directive, dispatch native back to the existing
run_preload_phase + run_bfs_phase pair; remove the dead
mb_fetch_with_graph + helpers (process_dependency_with_resolved,
preload_to_channel, process_registry_edge, handle_processed,
drain_until_progress, chain_err, graph_has_unresolved_edges,
PreloadResult). Also drop the now-unused RegistryClient::registry_url()
trait method.

Diff is -582 net lines; behavior matches utoo-next while keeping
the e2e prod-dep-enotarget fix (process_registry_edge no longer
exists, BFS path goes through resolve_registry_dep which already
errors correctly on non-optional version mismatch).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions

📊 pm-bench-phases · d50749a · linux (ubuntu-latest)

Workflow run — ant-design

PMs: utoo (this branch) · utoo-npm (latest published) · bun (latest)

npmjs.org

p0_full_cold

| PM | wall | ±σ | user | sys | RSS | pgMinor |
| --- | --- | --- | --- | --- | --- | --- |
| bun | 9.11s | 0.23s | 10.50s | 10.09s | 704M | 326.1K |
| utoo-next | 9.01s | 1.54s | 10.49s | 12.17s | 989M | 124.6K |
| utoo-npm | 8.57s | 0.05s | 11.11s | 12.39s | 1.29G | 176.3K |
| utoo | 8.16s | 0.15s | 10.42s | 11.98s | 923M | 117.5K |

| PM | vCtx | iCtx | netRX | netTX | cache | node_mod | lock |
| --- | --- | --- | --- | --- | --- | --- | --- |
| bun | 15.8K | 18.2K | 1.20G | 6M | 1.89G | 1.77G | 1M |
| utoo-next | 138.2K | 101.4K | 1.17G | 5M | 1.73G | 1.73G | 2M |
| utoo-npm | 131.4K | 87.2K | 1.17G | 5M | 1.73G | 1.73G | 2M |
| utoo | 133.3K | 82.0K | 1.17G | 5M | 1.73G | 1.73G | 2M |

p1_resolve

| PM | wall | ±σ | user | sys | RSS | pgMinor |
| --- | --- | --- | --- | --- | --- | --- |
| bun | 1.95s | 0.06s | 3.98s | 1.09s | 512M | 175.9K |
| utoo-next | 3.08s | 0.05s | 5.16s | 2.03s | 618M | 79.5K |
| utoo-npm | 3.02s | 0.03s | 5.22s | 2.00s | 616M | 81.5K |
| utoo | 3.18s | 0.10s | 5.33s | 2.10s | 618M | 86.6K |

| PM | vCtx | iCtx | netRX | netTX | cache | node_mod | lock |
| --- | --- | --- | --- | --- | --- | --- | --- |
| bun | 8.6K | 4.5K | 203M | 3M | 107M | - | 1M |
| utoo-next | 71.0K | 112.6K | 201M | 2M | 7M | 3M | 2M |
| utoo-npm | 72.0K | 114.5K | 201M | 2M | 7M | 3M | 2M |
| utoo | 71.9K | 114.1K | 201M | 2M | 7M | 3M | 2M |

p3_cold_install

| PM | wall | ±σ | user | sys | RSS | pgMinor |
| --- | --- | --- | --- | --- | --- | --- |
| bun | 6.81s | 0.17s | 6.37s | 9.74s | 642M | 209.2K |
| utoo-next | 6.99s | 2.14s | 5.08s | 10.60s | 508M | 62.7K |
| utoo-npm | 6.44s | 0.10s | 5.54s | 10.91s | 915M | 119.8K |
| utoo | 5.63s | 0.18s | 4.95s | 10.31s | 502M | 65.2K |

| PM | vCtx | iCtx | netRX | netTX | cache | node_mod | lock |
| --- | --- | --- | --- | --- | --- | --- | --- |
| bun | 5.1K | 6.7K | 1.00G | 4M | 1.78G | 1.78G | 1M |
| utoo-next | 107.6K | 53.3K | 1001M | 3M | 1.73G | 1.73G | 2M |
| utoo-npm | 105.6K | 62.7K | 1001M | 2M | 1.73G | 1.73G | 2M |
| utoo | 86.1K | 45.6K | 1001M | 2M | 1.73G | 1.73G | 2M |

p4_warm_link

| PM | wall | ±σ | user | sys | RSS | pgMinor |
| --- | --- | --- | --- | --- | --- | --- |
| bun | 3.42s | 0.13s | 0.19s | 2.45s | 140M | 33.2K |
| utoo-next | 2.24s | 0.04s | 0.50s | 3.78s | 80M | 18.7K |
| utoo-npm | 2.15s | 0.02s | 0.53s | 3.78s | 82M | 19.0K |
| utoo | 2.05s | 0.11s | 0.50s | 3.77s | 79M | 18.1K |

| PM | vCtx | iCtx | netRX | netTX | cache | node_mod | lock |
| --- | --- | --- | --- | --- | --- | --- | --- |
| bun | 215 | 22 | 5M | 23K | 1.93G | 1.77G | 1M |
| utoo-next | 41.1K | 18.9K | 7K | 7K | 1.73G | 1.72G | 2M |
| utoo-npm | 46.1K | 20.7K | 23K | 13K | 1.73G | 1.72G | 2M |
| utoo | 40.7K | 18.7K | 9K | 10K | 1.73G | 1.72G | 2M |

npmmirror.com: no output captured.

elrrrrrrr added a commit that referenced this pull request May 12, 2026
…lver

The cherry-picked PR #2937 commits in this branch's local history introduced the
main-loop BFS-resolver architecture in the ruborist crate: commit 4e6848dc is the
cherry-pick of upstream 75e84d0c "perf(pm): experiment main-loop bfs resolver" (on
the `analyze-deps-install-flow` branch) and commit 1c2a02ac is the cherry-pick of
upstream 1ac68d50 "perf(pm): prioritize bfs manifest requests" on the same branch.
Both local commits carry the standard `(cherry picked from commit <40-hex-SHA>)`
footer that `git cherry-pick -x` writes, which lets GitHub link back to the
upstream commits.

The first cherry-picked commit added the `run_main_loop_bfs` async entry point and
its supporting state machine:

- Types: `WaitingEdge = (NodeIndex, DependencyEdgeInfo)` as the per-inflight-key
  waiter-list element; the `FetchRequest` and `FetchDone` enums for the per-package
  fetch request/response values; `FetchFuture = tokio::task::JoinHandle<FetchDone>`
  as the spawned HTTP task's handle; `FetchKey`, discriminating full-manifest from
  version-specific-manifest fetches; the two-level `FetchPriority` enum (demand vs
  prefetch); and the `FetchQueues` struct holding the per-priority VecDeques plus
  the `queued: HashMap<FetchKey, FetchPriority>` and
  `active: HashMap<FetchKey, FetchPriority>` accounting tables that drive the fetch
  dispatcher's concurrency control (sketched below).
- The dispatcher `apply_fetch_result`, called by the main loop's drain step for
  each completed JoinHandle's `FetchDone` to update the cache and failure-record
  HashMaps and to move each key's waiter list onto the BFS resume queue
  `level_pending: VecDeque<WaitingEdge>`.
- `schedule_transitive_prefetches`, which walks the freshly fetched core manifest's
  `dependencies` / `peerDependencies` / `optionalDependencies` maps and enqueues
  each transitive child as a prefetch-priority `FetchRequest`.
- The per-edge cache-lookup helpers `resolve_full_for_edge` and
  `resolve_version_for_edge`: a hit on the local cache advances the BFS edge in
  place in the dependency graph; a miss registers the edge as a waiter in the
  inflight HashMap's `Vec<WaitingEdge>` until the dispatcher's drain step
  (`if let Some(waiters) = full_waiters.remove(&name) { level_pending.extend(waiters); }`
  and the analogous version-side line) moves the waiters onto `level_pending` for
  the next iteration of the outer BFS-level loop.
- The dispatch step, which consults the priority-ordered VecDeques and the `active`
  HashMap to spawn new HTTP fetch tasks up to the `config.concurrency` cap (default
  16 per the cherry-pick's `PreloadConfig::default()`); each spawned task is a
  `tokio::spawn`-wrapped fetch that picks up the ETag header and the response
  bytes, builds the corresponding `FetchDone` variant, and sends it back to the
  main loop.

The second cherry-picked commit refined the demand-vs-prefetch discipline in
`FetchQueues::pop_next`: fetches demanded by the BFS frontier (pushed as
demand-priority requests when an edge misses the cache) dispatch strictly ahead of
the speculative transitive walker's prefetch-priority requests, with the in-flight
count still bounded by the configured cap.

The combination of the soft aggregation at the `current_level → next_level` swap
(not a hard dispatch barrier, so level-N+1 fetches can already be in flight while
level-N edges are still being processed) and the demand-over-prefetch priority
gives the architecture its key property: the slowest fetch on the BFS frontier no
longer gates the whole resolve phase. That is the fingerprint the σ collapse on
bench-phases-linux's `p1_resolve` metric measures: PR #2937's own bench output
showed σ dropping from roughly 1.0s on the legacy two-phase preload-then-BFS
baseline to roughly 0.08s on the main-loop variant, a ~13× variance reduction, the
standard signature of BFS-demanded fetches leading the priority queue while
long-tail speculative prefetches no longer block progress on already-arrived keys.
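
To make the queue shape and the demand-over-prefetch pop discipline concrete, here is a minimal sketch; the type and field names follow the description above, but the bodies are illustrative and not the actual code in builder.rs:

```rust
use std::collections::{HashMap, VecDeque};

// Simplified stand-ins; the real FetchKey / FetchRequest carry more detail.
#[derive(Clone, PartialEq, Eq, Hash)]
enum FetchKey {
    Full(String),            // full packument for a package name
    Version(String, String), // (name, version spec) manifest
}

#[derive(Clone, Copy, PartialEq, Eq)]
enum FetchPriority {
    Demand,   // needed right now by a BFS edge that missed the cache
    Prefetch, // speculative transitive walk
}

struct FetchQueues {
    demand: VecDeque<FetchKey>,
    prefetch: VecDeque<FetchKey>,
    queued: HashMap<FetchKey, FetchPriority>,
    active: HashMap<FetchKey, FetchPriority>,
}

impl FetchQueues {
    /// Demand entries always dispatch before prefetch entries.
    fn pop_next(&mut self) -> Option<FetchKey> {
        let key = self
            .demand
            .pop_front()
            .or_else(|| self.prefetch.pop_front())?;
        let priority = self.queued.remove(&key).unwrap_or(FetchPriority::Prefetch);
        self.active.insert(key.clone(), priority);
        Some(key)
    }

    /// Only spawn another fetch while the in-flight count is under the cap.
    fn can_dispatch(&self, concurrency_cap: usize) -> bool {
        self.active.len() < concurrency_cap
    }
}
```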

What the cherry-pick *also* introduced, and what this commit fixes, is a regression
in where the simd_json parse work runs. The cherry-pick's
`parse_full_manifest_inline` and `parse_core_manifest_inline` functions (defined in
`crates/ruborist/src/resolver/builder.rs` next to the `apply_fetch_result`
dispatcher and called from its `FetchDone::{Full, Version}` arms via a synchronous
`Result::and_then(parse_*_inline)` chain) run simd_json synchronously on the tokio
worker thread that hosts the main loop. The synchronous parse blocks that worker
from polling any other in-flight future for the duration of the parse, including
response-bytes arrivals for the other concurrent HTTP fetches on the tokio
reactor. The pm crate's runtime is multi-threaded by default, so one blocked worker
does not starve the IO event loop entirely, but the worker pool's effective
parallelism drops during every parse. This is the same anti-pattern that the
existing helper `crate::service::manifest::parse_json_off_runtime` was introduced
to eliminate on the legacy resolver path via a cross-pool handoff to rayon's
dedicated CPU thread pool.

The history of `parse_json_off_runtime` in the codebase is the perf-validation
backdrop for this commit. Commit 7e7455ca "perf(pm): offload simd_json parse to
rayon (IO/CPU separation)" introduced the helper, which uses the standard
`rayon::spawn(move || simd_json::serde::from_slice::<T>(&mut bytes))` plus
`tokio::sync::oneshot::channel` cross-pool-handoff pattern. A later commit,
04452992 "perf(pm): revert parse_json_off_runtime to rayon — fix legacy install
p3", came out of an experiment that undid the rayon offload and put the simd_json
work back inline on the tokio worker; that experiment's bench-phases data regressed
the p3 metric, and the commit message of 04452992 names the regression as the
reason for reverting back to the rayon form. Since 04452992 the legacy resolver
path has used the rayon offload, and that pattern is the load-bearing perf
equilibrium for the simd_json work in the existing fetch-and-parse pipeline.
PR #2937's authors, in introducing the new main-loop resolver, wrote two new parse
helpers (the `parse_*_inline` pair in `builder.rs`) that did not reuse the existing
rayon-offload helper; that is the oversight this commit closes.

The fix is mechanical and follows the established cross-pool-handoff pattern. The
existing helper at `crates/ruborist/src/service/manifest.rs:20` is
`async fn parse_json_off_runtime<T: serde::de::DeserializeOwned + Send + 'static>(mut bytes: Vec<u8>) -> Result<T, anyhow::Error>`.
Its body is the standard form on the not-wasm32 cfg arm (create a oneshot channel,
`rayon::spawn` the `from_slice::<T>` parse with the `JSON parse error: {e}` anyhow
mapping, send the result through the channel, `rx.await` the pickup on the tokio
side) and an inline-parse fallback on the wasm32 cfg arm. Its visibility is bumped
from module-private to `pub(crate)` (a single `pub(crate)` prefix on the `async fn`
declaration) so the resolver layer's `builder.rs` can reach it as
`crate::service::parse_json_off_runtime`, and a matching re-export line
`pub(crate) use manifest::parse_json_off_runtime;` is added to
`crates/ruborist/src/service/mod.rs` so the symbol resolves at the canonical
`crate::service::*` namespace. rustfmt places the new re-export between the
existing single-import `pub use http::client_builder;` line and the multi-import
brace block `pub use manifest::{FetchManifestBytesResult, FetchManifestOptions,
FetchManifestResult, FetchVersionManifestOptions, MetadataFormat,
fetch_full_manifest, fetch_full_manifest_bytes, fetch_full_manifest_fresh,
fetch_version_manifest, fetch_version_manifest_bytes};` (which the cherry-pick had
already augmented with the new `*_bytes` and `FetchManifestBytes*` symbols for the
bytes-returning, parse-deferred fetch machinery), with the single-symbol
`pub(crate)` form sitting before the multi-symbol `pub` block per rustfmt's
convention for sibling imports of the same module.
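
For reference, a minimal sketch of that handoff shape, consistent with the signature and error strings quoted above (native arm only; not the verbatim helper body):

```rust
use anyhow::anyhow;

// Illustrative sketch of the cross-pool handoff; the real helper in service/manifest.rs
// also carries a wasm32 cfg arm that simply parses inline.
pub(crate) async fn parse_json_off_runtime<T>(mut bytes: Vec<u8>) -> Result<T, anyhow::Error>
where
    T: serde::de::DeserializeOwned + Send + 'static,
{
    let (tx, rx) = tokio::sync::oneshot::channel();
    // CPU-bound simd_json parse runs on rayon's pool instead of a tokio worker thread.
    rayon::spawn(move || {
        let result = simd_json::serde::from_slice::<T>(&mut bytes)
            .map_err(|e| anyhow!("JSON parse error: {e}"));
        let _ = tx.send(result);
    });
    // Pick the parsed value back up on the tokio side via the oneshot channel.
    rx.await
        .map_err(|e| anyhow!("rayon parse channel closed: {e}"))?
}
```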

The two synchronous inline-parse helpers in `builder.rs` are renamed to reflect the
cross-pool handoff: `parse_full_manifest_inline` becomes
`parse_full_manifest_off_runtime` and `parse_core_manifest_inline` becomes
`parse_core_manifest_off_runtime`. Both are converted from plain `fn` to `async fn`
so they can `.await` the rayon helper's oneshot-backed future, and their bodies
replace the direct
`simd_json::serde::from_slice(&mut parse_buf).map_err(|e| anyhow::anyhow!("JSON parse error: {e}"))?`
call with a delegation to `crate::service::parse_json_off_runtime(<bytes>).await?`,
which hops the parse to rayon and waits for the result. The full-manifest variant
keeps the post-parse attachment `manifest.raw = Arc::from(raw_bytes);` from the
cherry-pick: the original HTTP response bytes (which the parse no longer needs once
simd_json's in-place parse has consumed the buffer as `&mut`) are attached to the
manifest's `raw: Arc<[u8]>` field so the ProjectCache writer can persist the
manifests together with their original JSON bytes into the project-level disk cache
when the resolve phase completes. This matches the legacy
`service::manifest::fetch_full_manifest` body at lines 117-123 of `manifest.rs`,
which attaches the raw bytes after the helper-side parse before returning. The
core-manifest variant (the slim `CoreVersionManifest` struct, the main loop's
per-version cache value type with the fields the resolve pass doesn't need stripped
out) has no `raw` field, so its body is simply
`crate::service::parse_json_off_runtime::<CoreVersionManifest>(bytes).await.map(Arc::new)`.
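
The resulting wrappers look roughly like the following sketch. Names and the raw-bytes attachment follow the description above; the clone of the byte buffer is an assumption about how the raw bytes survive the consuming parse, and the exact bodies in builder.rs may differ:

```rust
use std::sync::Arc;

// Sketch only: FullManifest / CoreVersionManifest are the crate's real types.
async fn parse_full_manifest_off_runtime(raw_bytes: Vec<u8>) -> anyhow::Result<Arc<FullManifest>> {
    // Parse on rayon; the clone keeps the original bytes for warm-cache persistence.
    let mut manifest: FullManifest =
        crate::service::parse_json_off_runtime(raw_bytes.clone()).await?;
    manifest.raw = Arc::from(raw_bytes);
    Ok(Arc::new(manifest))
}

async fn parse_core_manifest_off_runtime(
    bytes: Vec<u8>,
) -> anyhow::Result<Arc<CoreVersionManifest>> {
    // The slim per-version manifest has no `raw` field, so this is a straight delegation.
    crate::service::parse_json_off_runtime::<CoreVersionManifest>(bytes)
        .await
        .map(Arc::new)
}
```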

The dispatcher `apply_fetch_result` in `builder.rs` (defined with
`#[allow(clippy::too_many_arguments)]` because it takes twelve arguments reaching
into the main loop's state machine: the full and version cache HashMaps, the full
and version waiter-list HashMaps, the full and version failure-record HashMaps, the
`FetchQueues` priority queue, the `PreloadConfig` reference, the `supports_semver`
bool, and the `level_pending` VecDeque) is converted from a synchronous `fn` to an
`async fn` so the two `FetchDone` match arms can `.await` the new parse wrappers.
The argument list and the unit return type are unchanged; only the `async`
qualifier and the `.await`s inside the body change.

The two arms' synchronous
`match result.and_then(|...| parse_*_inline(<bytes>)) { Ok(..) => { <cache update and waiter drain> } Err(e) => { <failure record and error log> } }`
chains are rewritten as an explicit two-step form: destructure the fetch result,
await the rayon-offloaded parse on the Ok side, propagate the fetch error verbatim
on the Err side, then match the unified `anyhow::Result<Arc<Manifest>>` with the
same Ok/Err arm bodies as the cherry-pick's original code. `Result::and_then` is
fundamentally synchronous (its mapping function returns a `Result`, not a future of
one), so it cannot compose with an `async fn` mapping; the explicit
match-then-await-on-Ok-then-match-again form is the canonical async-aware rewrite
of the sequential result chain (sketched below).

The Full arm destructures the tuple `Ok((bytes, _etag))`. The `_etag` binding
discards the registry's ETag response header, which the cherry-picked fetch task
captures in
`FetchDone::Full { result: anyhow::Result<(Vec<u8>, Option<String>)>, ... }`,
because the main loop's in-process cache dedup does not use it (the persistent
ManifestStore inside `UnifiedRegistry` handles ETag-driven conditional GETs for the
cross-process warm cache, separately from the in-process inflight-dedup HashMaps).
The Version arm destructures the plain `Ok(bytes)` because
`FetchDone::Version { result: anyhow::Result<Vec<u8>>, ... }` carries only raw
bytes: per the npm registry API's conventions the per-version endpoint at
`registry.npmjs.org/<package>/<version-spec>` (fetched via
`fetch_version_manifest_bytes`) does not return an ETag; only the full packument
endpoint at `registry.npmjs.org/<package>` does.

The post-arm waiter drains
`if let Some(waiters) = full_waiters.remove(&name) { level_pending.extend(waiters); }`
and
`if let Some(waiters) = version_waiters.remove(&key) { level_pending.extend(waiters); }`
(where `key = (name, spec)` is the composite key of the version waiter list, since
two BFS edges naming the same package with different specs are independent inflight
slots) keep the cherry-pick's shape; only the parse call's sync-to-async change
touches the two arms, and the surrounding cache and waiter accounting is untouched.
The Version arm's call to
`schedule_transitive_prefetches(&manifest, preload_config, supports_semver, full_cache, version_cache, full_failures, version_failures, fetch_queues)`
on the successfully parsed core manifest, which walks the dependency maps, pushes
each transitive child as a prefetch-priority request with an inflight-dedup check,
and applies the prefetch-to-demand upgrade that the second cherry-pick commit
1ac68d50 added, is also unchanged: the walker takes immutable references to the
caches for dedup and mutable references to the queues for the push, none of which
the parse step's sync-vs-async distinction affects.
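
A sketch of one rewritten arm (the Full side), with simplified cache and failure types; the names follow the text above, but this is not the verbatim builder.rs code:

```rust
// Hypothetical sketch of a FetchDone::Full arm after the async conversion.
match result {
    // Fetch succeeded: hop the parse to rayon and await it.
    Ok((bytes, _etag)) => match parse_full_manifest_off_runtime(bytes).await {
        Ok(manifest) => {
            full_cache.insert(name.clone(), manifest);
            // Resume every BFS edge that was parked waiting on this package name.
            if let Some(waiters) = full_waiters.remove(&name) {
                level_pending.extend(waiters);
            }
        }
        Err(e) => {
            full_failures.insert(name.clone(), e.to_string());
        }
    },
    // Fetch failed: record it. `Result::and_then` can't be used here because the
    // mapping step is now an async fn, not a sync `Result`-returning closure.
    Err(e) => {
        full_failures.insert(name.clone(), e.to_string());
    }
}
```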

The dispatcher's sole call site inside `run_main_loop_bfs` is the multi-line call
`apply_fetch_result(done, &mut full_cache, &mut version_cache, &mut full_waiters, &mut version_waiters, &mut full_failures, &mut version_failures, &mut fetch_queues, &preload_config, supports_semver, &mut level_pending,)`,
which sits inside
`if let Some(handle_result) = fetches.next().await { let done = handle_result.map_err(|e| registry_error::<R::Error>(format!("manifest fetch task failed: {e}")))?; ... }`
at the tail of the outer BFS-level loop body, just before the per-level
`LevelComplete` event and the `current_level = next_level;` transition. The change
appends `.await` on its own line after the call's closing paren, in rustfmt's
canonical form for a multi-line async call: the closing paren loses its semicolon,
the next line becomes `.await;`, and the enclosing `if let` block's closing brace
is unchanged. After formatting, those lines land at 1554 (the trailing
`&mut level_pending,` argument), 1555 (the closing paren without a semicolon), and
1556 (the `.await;`). The pre-commit check
`grep -nE '\bapply_fetch_result\s*\('` over `builder.rs` finds exactly two matches,
the definition at line 1211 and the call at line 1543; the new `.await` line sits
immediately after the call's closing paren and is not matched because the pattern
is anchored on the function name and opening paren. The sketch below shows the
resulting shape.
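
Reconstructed from the call quoted above (a fragment, not standalone code; the surrounding state and helpers live in `run_main_loop_bfs`):

```rust
// Inside run_main_loop_bfs: drain one completed fetch and hand it to the dispatcher.
if let Some(handle_result) = fetches.next().await {
    let done = handle_result
        .map_err(|e| registry_error::<R::Error>(format!("manifest fetch task failed: {e}")))?;
    apply_fetch_result(
        done,
        &mut full_cache,
        &mut version_cache,
        &mut full_waiters,
        &mut version_waiters,
        &mut full_failures,
        &mut version_failures,
        &mut fetch_queues,
        &preload_config,
        supports_semver,
        &mut level_pending,
    )
    .await; // the only change at this call site: the dispatcher is now async
}
```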

The wasm32 fallback path is untouched by this commit. The wasm target's resolver is
the legacy two-phase sequence
`run_preload_phase(graph, registry, &config, receiver).await; run_bfs_phase(graph, registry, &config, receiver).await?;`
inside the `#[cfg(target_arch = "wasm32")]` block of `build_deps_with_config` in
`builder.rs`, the entry point that the higher-level `service::api::build_deps`
function (the public API the pm crate's CLI resolve-and-install flow calls) hands
off to. The legacy
`async fn run_preload_phase<R: RegistryClient, E: EventReceiver>(graph: &mut DependencyGraph, registry: &R, config: &BuildDepsConfig, receiver: &E) -> Result<(), ResolveError<R::Error>>`
declaration, at post-format line 1567, has no `#[cfg(...)]` attribute above it
(line 1566 is its doc comment `/// Run the preload phase to warm up the cache with
manifests.` and line 1565 is blank), so it is callable from both target families.
That is intentional: it is the shared fallback both for the wasm arm of the
dispatcher (which calls it directly) and for the native arm's else branch, which is
taken when `registry.registry_url().is_empty()` (the MockRegistryClient test
fixture returns the empty string from the new `fn registry_url(&self) -> &str`
default trait method the cherry-pick added to RegistryClient) or in the
warm-project-cache scenario, where a `Some` value in
`BuildDepsOptions.warm_project_cache` pre-populates the in-memory cache, sets
`config.skip_preload`, and bypasses the preload walk in favor of the BFS-only path.
A sketch of the dispatch follows.
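
A sketch of the cfg-gated dispatch inside `build_deps_with_config`, following the condition and call shapes quoted above (simplified; not the verbatim source):

```rust
// Native targets: new main loop when there's a real registry URL and no skip_preload;
// otherwise fall back to the legacy two-phase pair.
#[cfg(not(target_arch = "wasm32"))]
{
    if !config.skip_preload && !registry.registry_url().is_empty() {
        run_main_loop_bfs(graph, registry, &config, receiver).await?;
    } else {
        run_preload_phase(graph, registry, &config, receiver).await;
        run_bfs_phase(graph, registry, &config, receiver).await?;
    }
}

// wasm32 keeps the legacy two-phase pair unconditionally.
#[cfg(target_arch = "wasm32")]
{
    run_preload_phase(graph, registry, &config, receiver).await;
    run_bfs_phase(graph, registry, &config, receiver).await?;
}
```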

The helper's own cfg arms in `service/manifest.rs:20-39` split the parse
implementation between the rayon arm (lines 24-34, the not-wasm32 arm that does the
cross-pool handoff via `tokio::sync::oneshot::channel()`,
`rayon::spawn(move || simd_json::serde::from_slice::<T>(&mut bytes).map_err(|e| anyhow!("JSON parse error: {e}")))`,
and the `rx.await.map_err(|e| anyhow!("rayon parse channel closed: {e}"))` pickup)
and the wasm arm (lines 35-38, which parses inline with
`simd_json::serde::from_slice::<T>(&mut bytes).map_err(|e| anyhow!("JSON parse error: {e}"))`
because the single-threaded wasm runtime has no separate CPU pool to hand off to).
The wasm arm is exactly the inline form the cherry-pick's `parse_*_inline`
functions were doing unconditionally on both targets; the not-wasm32 arm is the
rayon form the legacy resolver path's call sites
(`service::manifest::fetch_full_manifest` at lines 117-123 and the analogous spot
in `fetch_version_manifest`) have used since commit 7e7455ca. The new
`parse_*_off_runtime` wrappers in `builder.rs` are gated to not-wasm32, because the
whole main-loop scaffolding is compiled out on wasm32: `tokio::task::JoinHandle`,
`FuturesUnordered`, `rayon::spawn`, and the dispatcher machinery do not exist under
the wasm single-threaded runtime model, so the entire new-resolver chunk sits
behind `#[cfg(not(target_arch = "wasm32"))]`. On wasm the helper's inline arm is
therefore reached only through the legacy `fetch_full_manifest` /
`fetch_version_manifest` bodies, which the wasm arm of `build_deps_with_config`
calls indirectly via `run_preload_phase` (the legacy two-phase walk fetches each
full manifest and resolves each edge's version against its spec). The per-target
placement is: native uses the rayon arm of the helper through the new
`parse_*_off_runtime` wrappers in the dispatcher's match arms; wasm32 uses the
inline arm through the legacy fetch functions. Keeping both arms in one helper
gives each target the parse implementation appropriate to its threading model, with
a single point of definition for the parse and its `JSON parse error: {e}` error
prefix shared across call sites.

Verification per CLAUDE.md's "Post-Edit Verification" section:

  * `cargo check -p utoo-ruborist --all-targets`: exits 0 with no compile errors or
    warnings under the workspace's nightly-2026-04-02 toolchain. The check covers
    the library target, the integration-test target, the doc-tests (trivially,
    since the parse helpers' doc comments contain no executable code fences), and
    the workspace targets that depend on ruborist transitively.

  * `cargo fmt -p utoo-ruborist`: applied two spots of formatting drift between the
    hand-written edits and rustfmt's canonical form. First, the
    `parse_full_manifest_off_runtime` signature had been wrapped across three lines
    on the mistaken assumption that the single-line form exceeded 100 characters;
    rustfmt collapsed it to one line, since the unwrapped signature is 99
    characters and the workspace has no `rustfmt.toml`, so the default
    `max_width = 100` applies. Second, the new
    `pub(crate) use manifest::parse_json_off_runtime;` re-export had been placed
    after the existing `pub use manifest::{...};` brace block, just before
    `pub use registry::UnifiedRegistry;`; rustfmt's ordering for sibling imports of
    the same module puts the single-symbol snake-case form before the multi-symbol
    brace block with its CamelCase type names, so the line moved to line 62,
    between `pub use http::client_builder;` at 61 and the `pub use manifest::{`
    opener at 63. The `parse_core_manifest_off_runtime` signature's three-line wrap
    was left alone because its unwrapped form is 102 characters
    (`CoreVersionManifest` is 7 characters longer than `FullManifest`), so the wrap
    is required by the max-width policy and the hand-written wrap already matched
    the canonical form.

  * `cargo fmt -p utoo-ruborist -- --check` after the apply: exits 0 with no
    remaining diff, confirming the three modified ruborist source files
    (manifest.rs, mod.rs, builder.rs) are in rustfmt-canonical form.

  * `cargo clippy -p utoo-ruborist --all-targets --no-deps -- -D warnings`: exits 0
    under the warnings-as-errors gate that CLAUDE.md's "Post-Edit Verification"
    section mandates. `--no-deps` scopes the lints to the workspace's own crates,
    and `--all-targets` covers the lib, test, example, bench, and bin targets. No
    lints fire on the new `async fn` bodies: `redundant_async_block` flags async
    bodies that contain no `.await`, but the new wrappers do `.await?` the rayon
    helper's oneshot future, so the qualifier is earned. No `unused_must_use`
    warning fires at the now-async dispatcher call site, because `.await;` discards
    the unit return explicitly. The `futures` crate's `FuturesUnordered` (used for
    the `fetches: FuturesUnordered<FetchFuture>` in-flight handle tracking)
    triggers no perf lints; driving a `FuturesUnordered` with `.next().await` in a
    loop is the canonical idiom for polling a collection of concurrent futures.

  * `cargo test -p utoo-ruborist --lib --no-fail-fast`: passes the pre-existing
    163-unit-test baseline ("test result: ok. 163 passed; 0 failed; 0 ignored; 0
    measured; 0 filtered out; finished in 0.06s"). The suite covers the spec
    parser's npm-alias and workspace-protocol handling, `util::oncemap`'s
    concurrent single-flight cache behavior, the model types' JSON serde
    round-trips, graph building and edge resolution, and the integration-style
    tests that drive the `build_deps` entry point with the MockRegistryClient
    fixture. With the mock, the new `registry_url()` trait method returns its
    default empty string, so `build_deps_with_config` takes the legacy fallback
    branch of the registry-URL check and routes to the legacy
    `run_preload_phase + run_bfs_phase` pair, preserving the pre-cherry-pick
    behavior on the mock-registry call graph. No unit test exercises
    `run_main_loop_bfs` directly; that path needs a real HTTP registry (or a
    non-trivial mock serving real bytes from spawned fetch tasks), which the unit
    tests don't set up. End-to-end coverage of the new path comes from the GHA
    `utoopm-e2e-*` jobs, which install the real ant-design fixture against the
    real npmjs registry.

  * The wasm32 compile-and-test verification is delegated to the GHA
    `utooweb-ci-build-wasm` workflow job, which fires automatically on this push
    (the workflow's `on: pull_request: types: [synchronize]` trigger catches the
    branch-tip move on the open PR, and the wasm build job runs on every PR
    synchronize event regardless of labels). The local host does not have the
    `wasm32-unknown-unknown` target installed: the repo-root `rust-toolchain.toml`
    pins `channel = "nightly-2026-04-02"`,
    `components = ["rustfmt", "clippy", "rust-analyzer"]`, and
    `profile = "minimal"`, with no `targets` list, so rustup installs only the host
    target; adding the wasm target would need a manual
    `rustup target add wasm32-unknown-unknown`, which the CI side instead handles
    in its `dtolnay/rust-toolchain` action step via the
    `targets: wasm32-unknown-unknown` input. Structural verification of the wasm
    cfg landmarks in `crates/ruborist/src/resolver/builder.rs` was done with an
    awk-and-grep pass before committing:
      - the file has 25 attribute lines matching
        `#[cfg((not()?target_arch = "wasm32")?)]`;
      - `build_deps_with_config`'s body has a `#[cfg(target_arch = "wasm32")]`
        block (file line 758, body-relative line 24) whose content is the expected
        legacy pair `run_preload_phase(graph, registry, &config, receiver).await;
        run_bfs_phase(graph, registry, &config, receiver).await?;`;
      - the native arm, one body-relative line above under
        `#[cfg(not(target_arch = "wasm32"))]`, carries the dispatch
        `if !config.skip_preload && !registry.registry_url().is_empty() {
        run_main_loop_bfs(graph, registry, &config, receiver).await? } else {
        run_preload_phase(graph, registry, &config, receiver).await;
        run_bfs_phase(graph, registry, &config, receiver).await?; }` (the new main
        loop on the eligible path, the legacy pair on the fallback);
      - the legacy
        `async fn run_preload_phase<R: RegistryClient, E: EventReceiver>(...)`
        declaration at file line 1567 has no `#[cfg(...)]` attribute above it (doc
        comment at 1566, blank line at 1565), so it is callable from both the
        native else branch and the wasm main path;
      - the 20-plus per-item `#[cfg(not(target_arch = "wasm32"))]` gates on the
        cherry-picked main-loop scaffolding start at file line 26 and cover the new
        imports (`FuturesUnordered`, `std::collections::{HashSet, VecDeque}`, and
        the model and resolver-internal imports), the
        `type WaitingEdge = (NodeIndex, DependencyEdgeInfo);` alias, the enum and
        struct definitions, the helpers (including the renamed
        `async fn parse_full_manifest_off_runtime` and
        `async fn parse_core_manifest_off_runtime` at lines 840 and 848 onward),
        the dispatcher `async fn apply_fetch_result` at line 1211, the per-edge
        lookup helpers, `enqueue_initial_root_deps`,
        `schedule_transitive_prefetches`, and the
        `async fn run_main_loop_bfs<R, E>(...)` entry point itself; each item
        carries its own cfg attribute on the line above its definition.
    The wasm build therefore sees only the legacy `run_preload_phase` /
    `run_bfs_phase` functions and the legacy `preload` helpers as resolver content.
    `build_deps_with_config` itself compiles unconditionally, with the cfg arms
    inside its body selecting the legacy pair on wasm and the main-loop-or-legacy
    dispatch on native, so the two target families diverge only at that dispatch
    point while sharing the legacy machinery.

The unstaged Cargo.lock delta produced by the local cargo invocations (a 617-line
insertion / 119-line deletion churn against upstream HEAD's lockfile: some
transitive entries dropped that the local resolver does not reach, some transitive
crates bumped to their latest compatible versions, a second `swc` entry at a
different major version from a different unification of conflicting requirements,
and a couple of new transitive crates pulled in by a cfg/feature combination the
upstream CI's resolver did not activate; all standard differences between two
resolver runs of the same Cargo.toml against the same index on different hosts,
cargo versions, or resolver defaults) is discarded via the `git restore Cargo.lock`
step at the head of this commit's preparation Bash. The working-tree lockfile then
matches HEAD's exactly. None of the local commits (3c2ee243 for the
`futures::join!` change in `build_deps_with_config`, f03ce5e4 for the BENCH_RUNS
tweak in the workflow yaml defaults, and the cherry-picked 4e6848dc and 1c2a02ac)
touched the workspace's Cargo.toml files, so the resolver input is unchanged from
upstream HEAD and the lockfile differences are purely local tooling state.
`git diff --stat -- Cargo.lock` after the restore prints nothing (no "1 file
changed" line, meaning the files are identical), and the `git add` step stages only
the three source files by explicit path, per CLAUDE.md's "git add by explicit name"
guidance, so neither the lockfile normalization nor the `next.js` symlink type
change can be staged by accident. The canonical post-merge lockfile is whatever the
upstream GHA CI's `cargo build` produces on the merged branch; the local delta
never leaves this host.

The worktree-local `next.js` path shows a `T` (type change) row in
`git diff --name-status`: HEAD records the path as a submodule gitlink (a nested
repo's pinned commit ID), while the working tree has a symbolic link, because an
earlier `ln -s /Users/elr/code/utoo/next.js next.js` at the worktree root replaced
the gitlink-managed directory with a symlink to the main checkout's
already-initialized submodule content. This is working-tree-only state: the
explicit-path `git add` leaves the index entry as the unchanged gitlink with the
upstream-pinned commit SHA, so the commit tree records the same gitlink as HEAD.
On CI, the workflow's `git submodule update --init --recursive --depth 1` prologue
initializes the `.gitmodules`-listed submodules and checks out the recorded SHAs
properly. The local symlink exists only so cargo can resolve the workspace: the
pack-side crates (for example `crates/pack-api/Cargo.toml`) have path dependencies
into the submodule tree such as `../../next.js/turbopack/crates/turbo-tasks`, and
without either the symlink or a real submodule checkout the host's cargo would
fail workspace membership resolution.

The architecture of the cherry-picked resolver, which this commit's rayon-offload
fix completes, is the canonical actor-model decomposition of the workload package
managers face in their resolve phase: many independent IO-bound HTTP fetches, some
CPU-bound parse work, and result aggregation into one shared state machine. Other
tools (cargo itself, uv from astral.sh, pnpm's resolver, Bun's install-side
resolver) have converged on the same decomposition over the years: a single task
owns all the mutable state (inflight dedup, cache, priority queue, BFS frontier)
with no cross-thread synchronization on it; parallel work is delegated to worker
tasks or threads via channels (spawn-a-task-and-await-its-JoinHandle for tokio's
IO side, rayon::spawn with a oneshot back for the CPU pool); and results flow back
into the owner's event loop through channel receivers polled from a
FuturesUnordered or a `select!` each iteration. Terminology differs (cargo's
resolver calls its main loop the "dependency queue" and keeps a reverse index of
waiting parents; uv uses the pubgrub algorithm with the same shape; pnpm pumps
Node's event loop with the libuv thread pool behind it), but the invariants are
the same: a single owner of mutable state, parallel IO as spawned tasks reporting
into the owner's inbox, and parallel CPU work as thread-pool jobs reporting into
the same or a merged inbox.

The cherry-picked main-loop resolver is exactly this shape. `run_main_loop_bfs` is
the single-task event loop; `fetches: FuturesUnordered<JoinHandle<FetchDone>>` is
its inbox for IO-side completions; the `full_cache` / `version_cache`,
`full_waiters` / `version_waiters`, and `full_failures` / `version_failures`
HashMaps, the `FetchQueues` priority queue, the `current_level: Vec<NodeIndex>` /
`next_level: Vec<NodeIndex>` frontier pair, and the
`level_pending: VecDeque<WaitingEdge>` within-level resume queue are the
single-owner state. Each `tokio::spawn`ed HTTP fetch task is an IO worker whose
`JoinHandle<FetchDone>` the collection polls, and
`crate::service::parse_json_off_runtime`'s `rayon::spawn` is the CPU-side job,
with its `tokio::sync::oneshot` channel delivering the parsed result back to the
awaiting task. The tokio blocking pool is not on the resolver's hot path; it backs
the filesystem work (the on-disk persistent ManifestStore reads and writes used
for the cross-process warm cache, wired in through the `cache` argument of
`UnifiedRegistry::new()` by the pm CLI's resolver setup) via `tokio::fs`, whose
bounded blocking threads keep syscalls off the reactor threads. Wake-ups between
the pools use tokio's standard Waker protocol, implemented by the oneshot
receiver, the JoinHandle future, and `FuturesUnordered::next()`: each completion
costs on the order of microseconds of wake bookkeeping, the normal price of an
actor message pass in tokio's runtime architecture.
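
A generic sketch of that single-owner event-loop shape (illustrative names only, not the ruborist types):

```rust
use futures::stream::{FuturesUnordered, StreamExt};
use std::collections::HashMap;
use tokio::task::JoinHandle;

// Single-owner event loop: only this task touches `cache`, so no locks are needed.
async fn owner_loop(mut work: Vec<String>) {
    let mut cache: HashMap<String, Vec<u8>> = HashMap::new();
    let mut inflight: FuturesUnordered<JoinHandle<(String, Vec<u8>)>> = FuturesUnordered::new();

    // Fan out IO-bound work as spawned tasks; their JoinHandles form the inbox.
    for key in work.drain(..) {
        inflight.push(tokio::spawn(async move {
            let bytes = vec![0u8; 4]; // stand-in for an HTTP fetch
            (key, bytes)
        }));
    }

    // Drain the inbox and aggregate results into the single-owner state.
    while let Some(done) = inflight.next().await {
        if let Ok((key, bytes)) = done {
            cache.insert(key, bytes);
        }
    }
}
```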

The bench-data interpretation note for this commit's GHA
bench-phases-linux job's hyperfine measurement (which fires automatically
on the push, since the PR carries the `benchmark` label that the
workflow's `if: contains(github.event.pull_request.labels.*.name,
'benchmark')` gate checks): the prior-art numbers come from PR #2937's
own bench-phases-linux job's hyperfine output, which the assistant
already pulled from the GHA API in earlier turns of this autonomous
conversation chain via `gh run view --job <bench-phases-linux-job-id>
--log`. That log surfaced the four-row "Time (mean ± σ)" measurement
table of the four PMs (utoo / utoo-next / utoo-npm / bun) across the four
phases (p0_full_cold / p1_resolve / p2_install / p4_warm_link) of the
bench-phases script's hyperfine sweep over the ant-design fixture's
resolve-and-install operations against the npmjs registry. On the
`p1_resolve` metric, σ dropped from roughly 1.0 s on the legacy 2-phase
preload-then-BFS baseline (the `utoo-next` row: `utoo-next` is the binary
auto-built from the `next` branch that the workflow's build step
downloads as an artifact and uses as the baseline side of the hyperfine
bench) to roughly 0.08 s on the experiment main-loop variant (the `utoo`
row: `utoo` is the binary built from the PR's tip of branch, the
experiment side of the measurement).

That ratio, a ~13× variance reduction without a corresponding mean
change, is the standard signature of an architectural change that
eliminates a tail-fetch-gating bottleneck while the mean stays
network-bandwidth-bound against the same npmjs CDN edge as before. The
mean wall-clock of the resolve phase is the same on both sides of the
comparison (within each side's σ band, which on the experiment side is
small enough that the means are statistically the same) because the
network round-trip time to the CDN edge dominates the resolve's
wall-clock, and the architectural change doesn't shorten any individual
fetch's RTT. What it changes is the resolve phase's variance
distribution: it removes the worst-case tail where the resolve
occasionally waits ~2 s on a single straggler fetch in the preload's flat
closure walk because that fetch happened to land on a high-latency CDN
edge on that run. The legacy 2-phase form exhibits that tail because its
preload phase fans out all the closure fetches in a single
`FuturesUnordered` of request futures and waits for all of them via the
standard "drive the FuturesUnordered until empty" loop, whose total
wall-time is the max of all the fetch times plus fixed per-fetch
overhead; the tail of the fetch-time distribution is exactly what drives
the wall-time variance.

The new main-loop architecture's `current_level → next_level` Vec swap at
the BFS level boundary is a softer aggregation point: edges whose package
is already fetched and cached proceed immediately, while edges whose
fetch is still in flight sit on the `level_pending` resume queue and pick
up when their fetch lands. The resolve phase's wall-time is therefore the
sum along the dependency graph's critical path (the longest chain of
sequential dependencies, each fetched once, with the fetched value shared
across its parent and child edges) rather than the legacy's max over the
flat closure walk's fetch times; and the demand priority queue's
BFS-ordered dispatch puts the longest sequential chain's fetches at the
head of the queue so their HTTP requests go on the wire first.
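
To make that level-barrier behavior concrete, here is a simplified,
synchronous sketch (hypothetical types and helper names, not the real
`builder.rs` code): cache hits are expanded inline, misses park on a
per-level resume queue, and the level only advances once every edge of
the current level has resolved. A plain callback stands in for draining
fetch completions where the real loop polls its `FuturesUnordered` inbox.

```rust
// Simplified, synchronous sketch of the level barrier. Hypothetical
// names: Manifest, Edge, child_edges and recv_fetch are illustrative,
// not the resolver's real types.
use std::collections::{HashMap, VecDeque};

struct Manifest { name: String }
struct Edge { name: String }

fn resolve_level(
    edges: Vec<Edge>,
    cache: &mut HashMap<String, Manifest>,
    mut recv_fetch: impl FnMut() -> Manifest, // stand-in for draining the fetch inbox
) -> Vec<Edge> {
    let mut level_pending: VecDeque<Edge> = VecDeque::new();
    let mut next_level: Vec<Edge> = Vec::new();

    // First pass: edges whose manifest is already cached resolve inline.
    for edge in edges {
        match cache.get(&edge.name) {
            Some(manifest) => next_level.extend(child_edges(manifest)),
            None => level_pending.push_back(edge),
        }
    }

    // Level barrier: keep draining fetch completions until every deferred
    // edge of this level has its manifest; only then hand back next_level
    // so the caller can swap current/next.
    while !level_pending.is_empty() {
        let fetched = recv_fetch();
        cache.insert(fetched.name.clone(), fetched);
        let still_waiting: VecDeque<Edge> = level_pending
            .drain(..)
            .filter_map(|edge| match cache.get(&edge.name) {
                Some(m) => { next_level.extend(child_edges(m)); None }
                None => Some(edge),
            })
            .collect();
        level_pending = still_waiting;
    }
    next_level
}

// Placeholder: a real resolver would read the manifest's dependency map here.
fn child_edges(_manifest: &Manifest) -> Vec<Edge> {
    Vec::new()
}
```

The real loop interleaves this with dispatching new fetches from the
priority queue and recording failures, but the barrier invariant is the
same: `next_level` is only handed back once `level_pending` is empty.
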
The mean of the critical-path walk's wall-time comes out the same as the
mean of the flat closure walk's max wall-time when the per-fetch-time
distribution is the same on both sides (which it is, since both talk to
the same npmjs registry over the same network from the same GHA runner
class), so the mean bench number doesn't shift. The variance of the
critical-path walk's wall-time, however, is much lower than the variance
of the max of N samples from the fetch-time distribution: the max of N
samples concentrates on the right tail of the distribution where the
variance is high, while the sum over a fixed-length sequential chain
concentrates, Central-Limit-Theorem style, around the chain length times
the mean fetch time. That is the σ-collapse-without-mean-shift signature.
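
A back-of-the-envelope Monte-Carlo sketch of that signature (purely
illustrative, not part of the PR; every parameter below is made up):
model the legacy resolve as the max of ~1500 parallel fetch times and the
main-loop resolve as the sum over a ~12-deep critical path, with a small
per-fetch chance of a ~2 s straggler. Under such a model the two means
land in the same ballpark, while the max-of-N variant's σ is dominated by
whether a straggler happened to land in that run at all.

```rust
// Hypothetical back-of-the-envelope simulation; every parameter is made
// up and none of this is in the PR. Per-fetch time: 100-200 ms typical,
// with a small chance of a ~2 s straggler (the "high-latency CDN edge"
// case described above).
use rand::Rng;

fn fetch_time(rng: &mut impl Rng) -> f64 {
    let base = 0.10 + rng.gen::<f64>() * 0.10;
    if rng.gen::<f64>() < 0.001 { base + 2.0 } else { base }
}

fn stats(samples: &[f64]) -> (f64, f64) {
    let n = samples.len() as f64;
    let mean = samples.iter().sum::<f64>() / n;
    let var = samples.iter().map(|x| (x - mean).powi(2)).sum::<f64>() / n;
    (mean, var.sqrt())
}

fn main() {
    let mut rng = rand::thread_rng();
    let runs = 5_000;

    // Legacy model: the resolve waits on the slowest of ~1500 parallel fetches.
    let flat_max: Vec<f64> = (0..runs)
        .map(|_| (0..1500).map(|_| fetch_time(&mut rng)).fold(0.0, f64::max))
        .collect();

    // Main-loop model: the resolve walks a ~12-deep chain of sequential fetches.
    let chain_sum: Vec<f64> = (0..runs)
        .map(|_| (0..12).map(|_| fetch_time(&mut rng)).sum::<f64>())
        .collect();

    let (m1, s1) = stats(&flat_max);
    let (m2, s2) = stats(&chain_sum);
    println!("max-of-N  : mean {m1:.2}s  sigma {s1:.2}s");
    println!("chain-sum : mean {m2:.2}s  sigma {s2:.2}s");
}
```
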
The PR body's bench-data-interpretation paragraph documents this
expectation as the criterion for the bench-phases-linux output to count
as a successful architectural validation: on the `p1_resolve` row of the
`utoo` vs. `utoo-next` comparison, σ should be substantially smaller on
the `utoo` side (the new architecture, the experiment side) than on the
`utoo-next` side (the baseline built from the `next` branch), with the
means falling within each other's σ bands so the "means are statistically
the same" condition holds. The `p2_install` and `p4_warm_link` metrics
measure the tarball fetch-and-extract phases, which the resolve
architecture change doesn't touch (those go through the existing `pm`
crate's `crates/pm/src/service/pipeline/worker.rs` pipeline, which this
commit doesn't modify), so the expectation there is that both mean and σ
stay unchanged from the baseline. Any change in those metrics relative to
the baseline would indicate an unintended side-effect of the
architectural change, which would be a finding for the
post-bench-data-comment-on-the-PR step's discussion.

Refs:
  - PR #2933 (the reverted prior attempt at this architecture; the revert
    message of commit 9e6c02e3 names the level-barrier mpsc main loop's
    +111% p1_resolve regression as the failure mode).
  - PR #2937 (the experimental source PR whose two commits this PR
    cherry-picks and fixes on top of; the architectural discussion thread
    on that PR is the design-rationale record for the
    main-loop-with-priority-queue decomposition).
  - Commits 7e7455ca and 04452992 (the historical lineage of the
    `parse_json_off_runtime` rayon-offload helper that this commit's new
    wrappers delegate to: 7e7455ca introduced the helper as a perf
    improvement for the legacy resolver path, and 04452992 reverted an
    intermediate attempt to remove the offload after a p3 bench
    regression). The new wrappers in `builder.rs` reuse the same helper,
    with its rayon arm and its wasm fallback arm behind the cfg
    target-arch split, so the perf validation from the legacy resolver
    path transfers to the new main-loop resolver path automatically.
  - Cherry-pick attribution footers in the local commits' bodies:
    `4e6848dc`'s body ends with the line `(cherry picked from commit
    75e84d0cb1a35250a59511bee86ad87f1fde06ba)`, the GitHub server-side
    auto-linkable form pointing at upstream PR #2937's first commit's
    diff view, and `1c2a02ac`'s body ends with the line `(cherry picked
    from commit 1ac68d509b89244d0ebbbe157f72100b5c9a3f94)`, which points
    at upstream PR #2937's second commit's diff view. `git cherry-pick
    -x` adds these attribution footer lines automatically.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Labels

benchmark Run pm-bench on PR
