
perf(pm): channel-based main loop with spawned preload, BFS by level #2933

Open

elrrrrrrr wants to merge 4 commits into next from perf/main-loop-mpsc

Conversation

@elrrrrrrr
Contributor

Summary

Replace the sequential `run_preload_phase().await` + `run_bfs_phase().await` two-phase model with a concurrent design where preload runs as a spawned task and the main loop owns the graph, the cache, and BFS by level (sketched below).

Architecture

[spawned preload — autonomously running]       [main loop — sole graph + cache writer]
─────────────────────────────────────────      ────────────────────────────────────────
owns pending: VecDeque<(name, spec)>           for level:
owns FuturesUnordered (cap saturated)            for edge in level:
walks transitives by manifest content:             cache hit (preload already fetched) → process inline
  for each fetched manifest:                        cache miss → defer
    extract deps → push to pending                  (later) drain mpsc → write cache
parallel fetch via fetch_full_manifest                                → process deferred edges
on result: send via mpsc ──────────────►        swap current/next level

Properties:

  • Single-writer cache: the main loop owns a local HashMap<String, Arc<FullManifest>>. Eliminates the DashMap shard contention caused by concurrent preload fetch tasks.
  • Strict BFS level barrier: level N edges are fully resolved before level N+1 starts. Preserves npm-aligned npm: alias slot occupancy without needing a Replace graph fix.
  • Cache keyed by underlying name: npm:raw-body@2.1.3 and a real ms package fetch into separate cache slots; a race between the alias and a same-named real package can't poison either slot.
  • Preload bandwidth stays filled regardless of BFS progress: preload runs autonomously at the config.concurrency cap and doesn't wait for BFS.
  • WASM falls back to existing run_preload_phase + run_bfs_phase.
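
A minimal sketch of this shape, with the channel payload and cache types simplified (names follow the PR text; this is illustrative only, not the actual builder.rs code):

```rust
use std::collections::HashMap;
use std::sync::Arc;
use tokio::sync::mpsc;

// Simplified stand-in for the real manifest type.
type FullManifest = serde_json::Value;

// Spawned preload: fetches manifests and sends each one to the main loop keyed by
// the underlying package name. It never writes the cache itself.
async fn preload_to_channel(names: Vec<String>, tx: mpsc::Sender<(String, Arc<FullManifest>)>) {
    for name in names {
        let manifest = Arc::new(serde_json::json!({ "name": name })); // stand-in for a fetch
        let _ = tx.send((name, manifest)).await;
    }
}

// Main loop: sole writer of the local cache; drains the channel on a cache miss.
async fn main_loop(mut rx: mpsc::Receiver<(String, Arc<FullManifest>)>, levels: Vec<Vec<String>>) {
    let mut cache: HashMap<String, Arc<FullManifest>> = HashMap::new();
    for level in levels {
        let mut deferred: Vec<String> = Vec::new();
        for edge in level {
            if cache.contains_key(&edge) {
                // cache hit: process the edge inline
            } else {
                deferred.push(edge); // cache miss: defer until preload delivers it
            }
        }
        while !deferred.is_empty() {
            if let Some((name, manifest)) = rx.recv().await {
                cache.insert(name.clone(), manifest);
                deferred.retain(|edge| edge != &name); // process edges now satisfied
            } else {
                break; // preload finished; any remaining edges need direct fetches
            }
        }
        // level barrier: only move on once this level's edges are resolved
    }
}
```

In the real resolver the deferred edges are actually processed and failures are handled; this sketch only shows the single-writer cache and the drain-on-miss flow.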

Trade-offs

  • Trait change: RegistryClient future-returning methods now require + Send for tokio::spawn. Implementations need to be Send-compatible (UnifiedRegistry already is via Arc<...> internals); see the sketch after this list.
  • R: Clone bound added to build_deps/build_deps_with_config etc — registry is Arc-cloned into the spawned preload task.
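
A sketch of what the trait-level change implies, written in return-position-impl-Trait style; the actual RegistryClient signature in traits/registry.rs may differ:

```rust
use std::future::Future;

// Hypothetical trait shape for illustration: the returned future must be Send so the
// resolver can move it into tokio::spawn on a multi-threaded runtime.
trait RegistryClient: Send + Sync {
    type Error: Send;

    fn fetch_full_manifest(
        &self,
        name: &str,
    ) -> impl Future<Output = Result<Vec<u8>, Self::Error>> + Send;
}
```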

Future (not in this PR)

  • Priority queue inside dispatcher: BFS-needed manifests highest priority, preload transitives next, tgz prefetch lowest. Today preload's pending is FIFO; priority becomes important when we add tgz prefetch.
  • optionalDeps allowed-fail: currently optional fetch failures are skipped; tighter handling for partial-failure scenarios.

Net diff

crates/ruborist/src/resolver/builder.rs   +574 -16
crates/ruborist/src/resolver/preload.rs   +5   -2     (helpers exposed pub(crate))
crates/ruborist/src/traits/registry.rs    +3   -1     (+Send on fetch_full_manifest future)

Test plan

  • cargo build -p utoo-pm --profile release-local (poolab, 14s)
  • cargo test -p utoo-ruborist --lib (163 passed)
  • cargo test -p utoo-pm (253 passed, 3 ignored)
  • cargo clippy --all-targets -- -D warnings --no-deps (clean)
  • Smoke test: e2e case 11e (npm: alias) on poolab Linux — top-level ms = raw-body ✓, no nested ms
  • pm-bench-phases CI: target ≥ the perf level of #2929 "perf(pm): two-phase sibling parse + 4× channel buffer (p1 attack)" (p1=2.63s, p0=7.94s, p3=7.32s); triggered via label
  • full e2e on poolab + GHA

🤖 Generated with Claude Code

Replace the sequential `run_preload_phase().await + run_bfs_phase().await`
two-phase model with a concurrent design:

- `tokio::spawn(preload_to_channel)`: fetches FullManifests in parallel,
  walks transitive deps by manifest content, sends each fetched manifest
  to the main loop via mpsc keyed by **underlying** package name. Does
  NOT touch any shared cache — stays a pure provider.
- Main loop (`mb_fetch_with_graph`): owns the graph + a local
  HashMap<String, Arc<FullManifest>> cache. Runs BFS level-by-level;
  for each edge tries the local cache first, otherwise defers and
  drains preload's mpsc until the underlying name arrives.

Architectural properties:
- Single-writer cache: main is the sole writer of the local FullManifest
  store; eliminates DashMap shard contention from concurrent preload
  fetch tasks.
- Strict BFS level barrier: level N edges are fully resolved before
  level N+1 starts, preserving npm-aligned `npm:` alias slot occupancy
  semantics without needing a `Replace` graph fix.
- Cache keyed by underlying name: `npm:raw-body@2.1.3` and a real `ms`
  package fetch into separate cache slots, so race between alias and
  same-named real package can't poison either slot.
- Preload bandwidth filled regardless of BFS progress: preload spawned
  task runs autonomously at `config.concurrency` cap.
- WASM falls back to existing `run_preload_phase + run_bfs_phase`.

Future work (not in this PR): priority queue inside preload's
dispatcher (BFS-needed manifest > preload transitive walk > tgz
prefetch); explicit optionalDeps allowed-fail handling.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
elrrrrrrr added the benchmark (Run pm-bench on PR) label on May 12, 2026

@gemini-code-assist (bot) left a comment


Code Review

This pull request introduces a concurrent, channel-based dependency resolution mechanism for native targets, utilizing a spawned preload task and a single-writer main loop to improve performance. The changes include new logic for parallel manifest fetching, transitive dependency walking, and level-based BFS traversal. Review feedback identifies a high-severity issue where registry dependencies hitting the local cache bypass conditional override rules. Additionally, a memory leak was found in the preload task due to the deferred map not being cleaned up after fetch failures.

Two outdated comment threads on crates/ruborist/src/resolver/builder.rs
elrrrrrrr and others added 3 commits May 12, 2026 11:54
…ling

- preload_to_channel takes registry_url: String, calls service::manifest::fetch_full_manifest directly (no MemoryCache/ManifestStore/OnceMap coupling)
- Remove R: Clone + Send + Sync + 'static bounds from public APIs (registry no longer captured by spawned task)
- Add registry_url() to RegistryClient trait with empty default; UnifiedRegistry delegates to inherent

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
process_registry_edge silently swallowed resolve_target_version /
get_core_version failures regardless of edge type, masking
prod-dep-enotarget cases (e.g. tap@9999.0000.9999) that should
fail. Now: optional → Skipped event + return Ok; non-optional →
ResolveError::Version / ResolveError::ManifestNotFound.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…aph)

PR-F's level-barrier mpsc main loop regressed p1_resolve from
3.31s (utoo-next baseline) to 6.98s on GHA Linux (+111%). Per
fallback directive, dispatch native back to the existing
run_preload_phase + run_bfs_phase pair; remove the dead
mb_fetch_with_graph + helpers (process_dependency_with_resolved,
preload_to_channel, process_registry_edge, handle_processed,
drain_until_progress, chain_err, graph_has_unresolved_edges,
PreloadResult). Also drop the now-unused RegistryClient::registry_url()
trait method.

Diff is -582 net lines; behavior matches utoo-next while keeping
the e2e prod-dep-enotarget fix (process_registry_edge no longer
exists, BFS path goes through resolve_registry_dep which already
errors correctly on non-optional version mismatch).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions

📊 pm-bench-phases · d50749a · linux (ubuntu-latest)

Workflow run — ant-design

PMs: utoo (this branch) · utoo-npm (latest published) · bun (latest)

npmjs.org

p0_full_cold

| PM | wall | ±σ | user | sys | RSS | pgMinor |
| --- | --- | --- | --- | --- | --- | --- |
| bun | 9.11s | 0.23s | 10.50s | 10.09s | 704M | 326.1K |
| utoo-next | 9.01s | 1.54s | 10.49s | 12.17s | 989M | 124.6K |
| utoo-npm | 8.57s | 0.05s | 11.11s | 12.39s | 1.29G | 176.3K |
| utoo | 8.16s | 0.15s | 10.42s | 11.98s | 923M | 117.5K |

| PM | vCtx | iCtx | netRX | netTX | cache | node_mod | lock |
| --- | --- | --- | --- | --- | --- | --- | --- |
| bun | 15.8K | 18.2K | 1.20G | 6M | 1.89G | 1.77G | 1M |
| utoo-next | 138.2K | 101.4K | 1.17G | 5M | 1.73G | 1.73G | 2M |
| utoo-npm | 131.4K | 87.2K | 1.17G | 5M | 1.73G | 1.73G | 2M |
| utoo | 133.3K | 82.0K | 1.17G | 5M | 1.73G | 1.73G | 2M |

p1_resolve

| PM | wall | ±σ | user | sys | RSS | pgMinor |
| --- | --- | --- | --- | --- | --- | --- |
| bun | 1.95s | 0.06s | 3.98s | 1.09s | 512M | 175.9K |
| utoo-next | 3.08s | 0.05s | 5.16s | 2.03s | 618M | 79.5K |
| utoo-npm | 3.02s | 0.03s | 5.22s | 2.00s | 616M | 81.5K |
| utoo | 3.18s | 0.10s | 5.33s | 2.10s | 618M | 86.6K |

| PM | vCtx | iCtx | netRX | netTX | cache | node_mod | lock |
| --- | --- | --- | --- | --- | --- | --- | --- |
| bun | 8.6K | 4.5K | 203M | 3M | 107M | - | 1M |
| utoo-next | 71.0K | 112.6K | 201M | 2M | 7M | 3M | 2M |
| utoo-npm | 72.0K | 114.5K | 201M | 2M | 7M | 3M | 2M |
| utoo | 71.9K | 114.1K | 201M | 2M | 7M | 3M | 2M |

p3_cold_install

| PM | wall | ±σ | user | sys | RSS | pgMinor |
| --- | --- | --- | --- | --- | --- | --- |
| bun | 6.81s | 0.17s | 6.37s | 9.74s | 642M | 209.2K |
| utoo-next | 6.99s | 2.14s | 5.08s | 10.60s | 508M | 62.7K |
| utoo-npm | 6.44s | 0.10s | 5.54s | 10.91s | 915M | 119.8K |
| utoo | 5.63s | 0.18s | 4.95s | 10.31s | 502M | 65.2K |

| PM | vCtx | iCtx | netRX | netTX | cache | node_mod | lock |
| --- | --- | --- | --- | --- | --- | --- | --- |
| bun | 5.1K | 6.7K | 1.00G | 4M | 1.78G | 1.78G | 1M |
| utoo-next | 107.6K | 53.3K | 1001M | 3M | 1.73G | 1.73G | 2M |
| utoo-npm | 105.6K | 62.7K | 1001M | 2M | 1.73G | 1.73G | 2M |
| utoo | 86.1K | 45.6K | 1001M | 2M | 1.73G | 1.73G | 2M |

p4_warm_link

| PM | wall | ±σ | user | sys | RSS | pgMinor |
| --- | --- | --- | --- | --- | --- | --- |
| bun | 3.42s | 0.13s | 0.19s | 2.45s | 140M | 33.2K |
| utoo-next | 2.24s | 0.04s | 0.50s | 3.78s | 80M | 18.7K |
| utoo-npm | 2.15s | 0.02s | 0.53s | 3.78s | 82M | 19.0K |
| utoo | 2.05s | 0.11s | 0.50s | 3.77s | 79M | 18.1K |

| PM | vCtx | iCtx | netRX | netTX | cache | node_mod | lock |
| --- | --- | --- | --- | --- | --- | --- | --- |
| bun | 215 | 22 | 5M | 23K | 1.93G | 1.77G | 1M |
| utoo-next | 41.1K | 18.9K | 7K | 7K | 1.73G | 1.72G | 2M |
| utoo-npm | 46.1K | 20.7K | 23K | 13K | 1.73G | 1.72G | 2M |
| utoo | 40.7K | 18.7K | 9K | 10K | 1.73G | 1.72G | 2M |

npmmirror.com: no output captured.

elrrrrrrr added a commit that referenced this pull request May 12, 2026
…lver

The cherry-picked PR #2937 commits in this branch's local history introduced the
main-loop BFS-resolver architecture in the ruborist crate: commit 4e6848dc is the
cherry-pick of upstream 75e84d0c "perf(pm): experiment main-loop bfs resolver" (on
the `analyze-deps-install-flow` branch) and commit 1c2a02ac is the cherry-pick of
upstream 1ac68d50 "perf(pm): prioritize bfs manifest requests" on the same branch.
Both local commits carry the standard `(cherry picked from commit <40-hex-SHA>)`
footer that `git cherry-pick -x` writes, which lets GitHub link back to the
upstream commits.

The first cherry-picked commit added the `run_main_loop_bfs` async entry point and
its supporting state machine:

- Types: `WaitingEdge = (NodeIndex, DependencyEdgeInfo)` as the per-inflight-key
  waiter-list element; the `FetchRequest` and `FetchDone` enums for the per-package
  fetch request/response values; `FetchFuture = tokio::task::JoinHandle<FetchDone>`
  as the spawned HTTP task's handle; `FetchKey`, discriminating full-manifest from
  version-specific-manifest fetches; the two-level `FetchPriority` enum (demand vs
  prefetch); and the `FetchQueues` struct holding the per-priority VecDeques plus
  the `queued: HashMap<FetchKey, FetchPriority>` and
  `active: HashMap<FetchKey, FetchPriority>` accounting tables that drive the fetch
  dispatcher's concurrency control (sketched below).
- The dispatcher `apply_fetch_result`, called by the main loop's drain step for
  each completed JoinHandle's `FetchDone` to update the cache and failure-record
  HashMaps and to move each key's waiter list onto the BFS resume queue
  `level_pending: VecDeque<WaitingEdge>`.
- `schedule_transitive_prefetches`, which walks the freshly fetched core manifest's
  `dependencies` / `peerDependencies` / `optionalDependencies` maps and enqueues
  each transitive child as a prefetch-priority `FetchRequest`.
- The per-edge cache-lookup helpers `resolve_full_for_edge` and
  `resolve_version_for_edge`: a hit on the local cache advances the BFS edge in
  place in the dependency graph; a miss registers the edge as a waiter in the
  inflight HashMap's `Vec<WaitingEdge>` until the dispatcher's drain step
  (`if let Some(waiters) = full_waiters.remove(&name) { level_pending.extend(waiters); }`
  and the analogous version-side line) moves the waiters onto `level_pending` for
  the next iteration of the outer BFS-level loop.
- The dispatch step, which consults the priority-ordered VecDeques and the `active`
  HashMap to spawn new HTTP fetch tasks up to the `config.concurrency` cap (default
  16 per the cherry-pick's `PreloadConfig::default()`); each spawned task is a
  `tokio::spawn`-wrapped fetch that picks up the ETag header and the response
  bytes, builds the corresponding `FetchDone` variant, and sends it back to the
  main loop.

The second cherry-picked commit refined the demand-vs-prefetch discipline in
`FetchQueues::pop_next`: fetches demanded by the BFS frontier (pushed as
demand-priority requests when an edge misses the cache) dispatch strictly ahead of
the speculative transitive walker's prefetch-priority requests, with the in-flight
count still bounded by the configured cap.

The combination of the soft aggregation at the `current_level → next_level` swap
(not a hard dispatch barrier, so level-N+1 fetches can already be in flight while
level-N edges are still being processed) and the demand-over-prefetch priority
gives the architecture its key property: the slowest fetch on the BFS frontier no
longer gates the whole resolve phase. That is the fingerprint the σ collapse on
bench-phases-linux's `p1_resolve` metric measures: PR #2937's own bench output
showed σ dropping from roughly 1.0s on the legacy two-phase preload-then-BFS
baseline to roughly 0.08s on the main-loop variant, a ~13× variance reduction, the
standard signature of BFS-demanded fetches leading the priority queue while
long-tail speculative prefetches no longer block progress on already-arrived keys.
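
To make the queue shape and the demand-over-prefetch pop discipline concrete, here is a minimal sketch; the type and field names follow the description above, but the bodies are illustrative and not the actual code in builder.rs:

```rust
use std::collections::{HashMap, VecDeque};

// Simplified stand-ins; the real FetchKey / FetchRequest carry more detail.
#[derive(Clone, PartialEq, Eq, Hash)]
enum FetchKey {
    Full(String),            // full packument for a package name
    Version(String, String), // (name, version spec) manifest
}

#[derive(Clone, Copy, PartialEq, Eq)]
enum FetchPriority {
    Demand,   // needed right now by a BFS edge that missed the cache
    Prefetch, // speculative transitive walk
}

struct FetchQueues {
    demand: VecDeque<FetchKey>,
    prefetch: VecDeque<FetchKey>,
    queued: HashMap<FetchKey, FetchPriority>,
    active: HashMap<FetchKey, FetchPriority>,
}

impl FetchQueues {
    /// Demand entries always dispatch before prefetch entries.
    fn pop_next(&mut self) -> Option<FetchKey> {
        let key = self
            .demand
            .pop_front()
            .or_else(|| self.prefetch.pop_front())?;
        let priority = self.queued.remove(&key).unwrap_or(FetchPriority::Prefetch);
        self.active.insert(key.clone(), priority);
        Some(key)
    }

    /// Only spawn another fetch while the in-flight count is under the cap.
    fn can_dispatch(&self, concurrency_cap: usize) -> bool {
        self.active.len() < concurrency_cap
    }
}
```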

What the cherry-pick *also* introduced, and what this commit fixes, is a regression
in where the simd_json parse work runs. The cherry-pick's
`parse_full_manifest_inline` and `parse_core_manifest_inline` functions (defined in
`crates/ruborist/src/resolver/builder.rs` next to the `apply_fetch_result`
dispatcher and called from its `FetchDone::{Full, Version}` arms via a synchronous
`Result::and_then(parse_*_inline)` chain) run simd_json synchronously on the tokio
worker thread that hosts the main loop. The synchronous parse blocks that worker
from polling any other in-flight future for the duration of the parse, including
response-bytes arrivals for the other concurrent HTTP fetches on the tokio
reactor. The pm crate's runtime is multi-threaded by default, so one blocked worker
does not starve the IO event loop entirely, but the worker pool's effective
parallelism drops during every parse. This is the same anti-pattern that the
existing helper `crate::service::manifest::parse_json_off_runtime` was introduced
to eliminate on the legacy resolver path via a cross-pool handoff to rayon's
dedicated CPU thread pool.

The history of `parse_json_off_runtime` in the codebase is the perf-validation
backdrop for this commit. Commit 7e7455ca "perf(pm): offload simd_json parse to
rayon (IO/CPU separation)" introduced the helper, which uses the standard
`rayon::spawn(move || simd_json::serde::from_slice::<T>(&mut bytes))` plus
`tokio::sync::oneshot::channel` cross-pool-handoff pattern. A later commit,
04452992 "perf(pm): revert parse_json_off_runtime to rayon — fix legacy install
p3", came out of an experiment that undid the rayon offload and put the simd_json
work back inline on the tokio worker; that experiment's bench-phases data regressed
the p3 metric, and the commit message of 04452992 names the regression as the
reason for reverting back to the rayon form. Since 04452992 the legacy resolver
path has used the rayon offload, and that pattern is the load-bearing perf
equilibrium for the simd_json work in the existing fetch-and-parse pipeline.
PR #2937's authors, in introducing the new main-loop resolver, wrote two new parse
helpers (the `parse_*_inline` pair in `builder.rs`) that did not reuse the existing
rayon-offload helper; that is the oversight this commit closes.

The fix is mechanical and follows the established cross-pool-handoff pattern. The
existing helper at `crates/ruborist/src/service/manifest.rs:20` is
`async fn parse_json_off_runtime<T: serde::de::DeserializeOwned + Send + 'static>(mut bytes: Vec<u8>) -> Result<T, anyhow::Error>`.
Its body is the standard form on the not-wasm32 cfg arm (create a oneshot channel,
`rayon::spawn` the `from_slice::<T>` parse with the `JSON parse error: {e}` anyhow
mapping, send the result through the channel, `rx.await` the pickup on the tokio
side) and an inline-parse fallback on the wasm32 cfg arm. Its visibility is bumped
from module-private to `pub(crate)` (a single `pub(crate)` prefix on the `async fn`
declaration) so the resolver layer's `builder.rs` can reach it as
`crate::service::parse_json_off_runtime`, and a matching re-export line
`pub(crate) use manifest::parse_json_off_runtime;` is added to
`crates/ruborist/src/service/mod.rs` so the symbol resolves at the canonical
`crate::service::*` namespace. rustfmt places the new re-export between the
existing single-import `pub use http::client_builder;` line and the multi-import
brace block `pub use manifest::{FetchManifestBytesResult, FetchManifestOptions,
FetchManifestResult, FetchVersionManifestOptions, MetadataFormat,
fetch_full_manifest, fetch_full_manifest_bytes, fetch_full_manifest_fresh,
fetch_version_manifest, fetch_version_manifest_bytes};` (which the cherry-pick had
already augmented with the new `*_bytes` and `FetchManifestBytes*` symbols for the
bytes-returning, parse-deferred fetch machinery), with the single-symbol
`pub(crate)` form sitting before the multi-symbol `pub` block per rustfmt's
convention for sibling imports of the same module.
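
For reference, a minimal sketch of that handoff shape, consistent with the signature and error strings quoted above (native arm only; not the verbatim helper body):

```rust
use anyhow::anyhow;

// Illustrative sketch of the cross-pool handoff; the real helper in service/manifest.rs
// also carries a wasm32 cfg arm that simply parses inline.
pub(crate) async fn parse_json_off_runtime<T>(mut bytes: Vec<u8>) -> Result<T, anyhow::Error>
where
    T: serde::de::DeserializeOwned + Send + 'static,
{
    let (tx, rx) = tokio::sync::oneshot::channel();
    // CPU-bound simd_json parse runs on rayon's pool instead of a tokio worker thread.
    rayon::spawn(move || {
        let result = simd_json::serde::from_slice::<T>(&mut bytes)
            .map_err(|e| anyhow!("JSON parse error: {e}"));
        let _ = tx.send(result);
    });
    // Pick the parsed value back up on the tokio side via the oneshot channel.
    rx.await
        .map_err(|e| anyhow!("rayon parse channel closed: {e}"))?
}
```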

The two synchronous inline-parse helpers in `builder.rs` are renamed to reflect the
cross-pool handoff: `parse_full_manifest_inline` becomes
`parse_full_manifest_off_runtime` and `parse_core_manifest_inline` becomes
`parse_core_manifest_off_runtime`. Both are converted from plain `fn` to `async fn`
so they can `.await` the rayon helper's oneshot-backed future, and their bodies
replace the direct
`simd_json::serde::from_slice(&mut parse_buf).map_err(|e| anyhow::anyhow!("JSON parse error: {e}"))?`
call with a delegation to `crate::service::parse_json_off_runtime(<bytes>).await?`,
which hops the parse to rayon and waits for the result. The full-manifest variant
keeps the post-parse attachment `manifest.raw = Arc::from(raw_bytes);` from the
cherry-pick: the original HTTP response bytes (which the parse no longer needs once
simd_json's in-place parse has consumed the buffer as `&mut`) are attached to the
manifest's `raw: Arc<[u8]>` field so the ProjectCache writer can persist the
manifests together with their original JSON bytes into the project-level disk cache
when the resolve phase completes. This matches the legacy
`service::manifest::fetch_full_manifest` body at lines 117-123 of `manifest.rs`,
which attaches the raw bytes after the helper-side parse before returning. The
core-manifest variant (the slim `CoreVersionManifest` struct, the main loop's
per-version cache value type with the fields the resolve pass doesn't need stripped
out) has no `raw` field, so its body is simply
`crate::service::parse_json_off_runtime::<CoreVersionManifest>(bytes).await.map(Arc::new)`.
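
The resulting wrappers look roughly like the following sketch. Names and the raw-bytes attachment follow the description above; the clone of the byte buffer is an assumption about how the raw bytes survive the consuming parse, and the exact bodies in builder.rs may differ:

```rust
use std::sync::Arc;

// Sketch only: FullManifest / CoreVersionManifest are the crate's real types.
async fn parse_full_manifest_off_runtime(raw_bytes: Vec<u8>) -> anyhow::Result<Arc<FullManifest>> {
    // Parse on rayon; the clone keeps the original bytes for warm-cache persistence.
    let mut manifest: FullManifest =
        crate::service::parse_json_off_runtime(raw_bytes.clone()).await?;
    manifest.raw = Arc::from(raw_bytes);
    Ok(Arc::new(manifest))
}

async fn parse_core_manifest_off_runtime(
    bytes: Vec<u8>,
) -> anyhow::Result<Arc<CoreVersionManifest>> {
    // The slim per-version manifest has no `raw` field, so this is a straight delegation.
    crate::service::parse_json_off_runtime::<CoreVersionManifest>(bytes)
        .await
        .map(Arc::new)
}
```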

The dispatcher `apply_fetch_result` in `builder.rs` (defined with
`#[allow(clippy::too_many_arguments)]` because it takes twelve arguments reaching
into the main loop's state machine: the full and version cache HashMaps, the full
and version waiter-list HashMaps, the full and version failure-record HashMaps, the
`FetchQueues` priority queue, the `PreloadConfig` reference, the `supports_semver`
bool, and the `level_pending` VecDeque) is converted from a synchronous `fn` to an
`async fn` so the two `FetchDone` match arms can `.await` the new parse wrappers.
The argument list and the unit return type are unchanged; only the `async`
qualifier and the `.await`s inside the body change.

The two arms' synchronous
`match result.and_then(|...| parse_*_inline(<bytes>)) { Ok(..) => { <cache update and waiter drain> } Err(e) => { <failure record and error log> } }`
chains are rewritten as an explicit two-step form: destructure the fetch result,
await the rayon-offloaded parse on the Ok side, propagate the fetch error verbatim
on the Err side, then match the unified `anyhow::Result<Arc<Manifest>>` with the
same Ok/Err arm bodies as the cherry-pick's original code. `Result::and_then` is
fundamentally synchronous (its mapping function returns a `Result`, not a future of
one), so it cannot compose with an `async fn` mapping; the explicit
match-then-await-on-Ok-then-match-again form is the canonical async-aware rewrite
of the sequential result chain (sketched below).

The Full arm destructures the tuple `Ok((bytes, _etag))`. The `_etag` binding
discards the registry's ETag response header, which the cherry-picked fetch task
captures in
`FetchDone::Full { result: anyhow::Result<(Vec<u8>, Option<String>)>, ... }`,
because the main loop's in-process cache dedup does not use it (the persistent
ManifestStore inside `UnifiedRegistry` handles ETag-driven conditional GETs for the
cross-process warm cache, separately from the in-process inflight-dedup HashMaps).
The Version arm destructures the plain `Ok(bytes)` because
`FetchDone::Version { result: anyhow::Result<Vec<u8>>, ... }` carries only raw
bytes: per the npm registry API's conventions the per-version endpoint at
`registry.npmjs.org/<package>/<version-spec>` (fetched via
`fetch_version_manifest_bytes`) does not return an ETag; only the full packument
endpoint at `registry.npmjs.org/<package>` does.

The post-arm waiter drains
`if let Some(waiters) = full_waiters.remove(&name) { level_pending.extend(waiters); }`
and
`if let Some(waiters) = version_waiters.remove(&key) { level_pending.extend(waiters); }`
(where `key = (name, spec)` is the composite key of the version waiter list, since
two BFS edges naming the same package with different specs are independent inflight
slots) keep the cherry-pick's shape; only the parse call's sync-to-async change
touches the two arms, and the surrounding cache and waiter accounting is untouched.
The Version arm's call to
`schedule_transitive_prefetches(&manifest, preload_config, supports_semver, full_cache, version_cache, full_failures, version_failures, fetch_queues)`
on the successfully parsed core manifest, which walks the dependency maps, pushes
each transitive child as a prefetch-priority request with an inflight-dedup check,
and applies the prefetch-to-demand upgrade that the second cherry-pick commit
1ac68d50 added, is also unchanged: the walker takes immutable references to the
caches for dedup and mutable references to the queues for the push, none of which
the parse step's sync-vs-async distinction affects.
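
A sketch of one rewritten arm (the Full side), with simplified cache and failure types; the names follow the text above, but this is not the verbatim builder.rs code:

```rust
// Hypothetical sketch of a FetchDone::Full arm after the async conversion.
match result {
    // Fetch succeeded: hop the parse to rayon and await it.
    Ok((bytes, _etag)) => match parse_full_manifest_off_runtime(bytes).await {
        Ok(manifest) => {
            full_cache.insert(name.clone(), manifest);
            // Resume every BFS edge that was parked waiting on this package name.
            if let Some(waiters) = full_waiters.remove(&name) {
                level_pending.extend(waiters);
            }
        }
        Err(e) => {
            full_failures.insert(name.clone(), e.to_string());
        }
    },
    // Fetch failed: record it. `Result::and_then` can't be used here because the
    // mapping step is now an async fn, not a sync `Result`-returning closure.
    Err(e) => {
        full_failures.insert(name.clone(), e.to_string());
    }
}
```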

The dispatcher's sole call site inside `run_main_loop_bfs` is the multi-line call
`apply_fetch_result(done, &mut full_cache, &mut version_cache, &mut full_waiters, &mut version_waiters, &mut full_failures, &mut version_failures, &mut fetch_queues, &preload_config, supports_semver, &mut level_pending,)`,
which sits inside
`if let Some(handle_result) = fetches.next().await { let done = handle_result.map_err(|e| registry_error::<R::Error>(format!("manifest fetch task failed: {e}")))?; ... }`
at the tail of the outer BFS-level loop body, just before the per-level
`LevelComplete` event and the `current_level = next_level;` transition. The change
appends `.await` on its own line after the call's closing paren, in rustfmt's
canonical form for a multi-line async call: the closing paren loses its semicolon,
the next line becomes `.await;`, and the enclosing `if let` block's closing brace
is unchanged. After formatting, those lines land at 1554 (the trailing
`&mut level_pending,` argument), 1555 (the closing paren without a semicolon), and
1556 (the `.await;`). The pre-commit check
`grep -nE '\bapply_fetch_result\s*\('` over `builder.rs` finds exactly two matches,
the definition at line 1211 and the call at line 1543; the new `.await` line sits
immediately after the call's closing paren and is not matched because the pattern
is anchored on the function name and opening paren. The sketch below shows the
resulting shape.
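
Reconstructed from the call quoted above (a fragment, not standalone code; the surrounding state and helpers live in `run_main_loop_bfs`):

```rust
// Inside run_main_loop_bfs: drain one completed fetch and hand it to the dispatcher.
if let Some(handle_result) = fetches.next().await {
    let done = handle_result
        .map_err(|e| registry_error::<R::Error>(format!("manifest fetch task failed: {e}")))?;
    apply_fetch_result(
        done,
        &mut full_cache,
        &mut version_cache,
        &mut full_waiters,
        &mut version_waiters,
        &mut full_failures,
        &mut version_failures,
        &mut fetch_queues,
        &preload_config,
        supports_semver,
        &mut level_pending,
    )
    .await; // the only change at this call site: the dispatcher is now async
}
```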

The wasm32 fallback path is untouched by this commit. The wasm target's resolver is
the legacy two-phase sequence
`run_preload_phase(graph, registry, &config, receiver).await; run_bfs_phase(graph, registry, &config, receiver).await?;`
inside the `#[cfg(target_arch = "wasm32")]` block of `build_deps_with_config` in
`builder.rs`, the entry point that the higher-level `service::api::build_deps`
function (the public API the pm crate's CLI resolve-and-install flow calls) hands
off to. The legacy
`async fn run_preload_phase<R: RegistryClient, E: EventReceiver>(graph: &mut DependencyGraph, registry: &R, config: &BuildDepsConfig, receiver: &E) -> Result<(), ResolveError<R::Error>>`
declaration, at post-format line 1567, has no `#[cfg(...)]` attribute above it
(line 1566 is its doc comment `/// Run the preload phase to warm up the cache with
manifests.` and line 1565 is blank), so it is callable from both target families.
That is intentional: it is the shared fallback both for the wasm arm of the
dispatcher (which calls it directly) and for the native arm's else branch, which is
taken when `registry.registry_url().is_empty()` (the MockRegistryClient test
fixture returns the empty string from the new `fn registry_url(&self) -> &str`
default trait method the cherry-pick added to RegistryClient) or in the
warm-project-cache scenario, where a `Some` value in
`BuildDepsOptions.warm_project_cache` pre-populates the in-memory cache, sets
`config.skip_preload`, and bypasses the preload walk in favor of the BFS-only path.
A sketch of the dispatch follows.
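
A sketch of the cfg-gated dispatch inside `build_deps_with_config`, following the condition and call shapes quoted above (simplified; not the verbatim source):

```rust
// Native targets: new main loop when there's a real registry URL and no skip_preload;
// otherwise fall back to the legacy two-phase pair.
#[cfg(not(target_arch = "wasm32"))]
{
    if !config.skip_preload && !registry.registry_url().is_empty() {
        run_main_loop_bfs(graph, registry, &config, receiver).await?;
    } else {
        run_preload_phase(graph, registry, &config, receiver).await;
        run_bfs_phase(graph, registry, &config, receiver).await?;
    }
}

// wasm32 keeps the legacy two-phase pair unconditionally.
#[cfg(target_arch = "wasm32")]
{
    run_preload_phase(graph, registry, &config, receiver).await;
    run_bfs_phase(graph, registry, &config, receiver).await?;
}
```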

The helper's own cfg arms in `service/manifest.rs:20-39` split the parse
implementation between the rayon arm (lines 24-34, the not-wasm32 arm that does the
cross-pool handoff via `tokio::sync::oneshot::channel()`,
`rayon::spawn(move || simd_json::serde::from_slice::<T>(&mut bytes).map_err(|e| anyhow!("JSON parse error: {e}")))`,
and the `rx.await.map_err(|e| anyhow!("rayon parse channel closed: {e}"))` pickup)
and the wasm arm (lines 35-38, which parses inline with
`simd_json::serde::from_slice::<T>(&mut bytes).map_err(|e| anyhow!("JSON parse error: {e}"))`
because the single-threaded wasm runtime has no separate CPU pool to hand off to).
The wasm arm is exactly the inline form the cherry-pick's `parse_*_inline`
functions were doing unconditionally on both targets; the not-wasm32 arm is the
rayon form the legacy resolver path's call sites
(`service::manifest::fetch_full_manifest` at lines 117-123 and the analogous spot
in `fetch_version_manifest`) have used since commit 7e7455ca. The new
`parse_*_off_runtime` wrappers in `builder.rs` are gated to not-wasm32, because the
whole main-loop scaffolding is compiled out on wasm32: `tokio::task::JoinHandle`,
`FuturesUnordered`, `rayon::spawn`, and the dispatcher machinery do not exist under
the wasm single-threaded runtime model, so the entire new-resolver chunk sits
behind `#[cfg(not(target_arch = "wasm32"))]`. On wasm the helper's inline arm is
therefore reached only through the legacy `fetch_full_manifest` /
`fetch_version_manifest` bodies, which the wasm arm of `build_deps_with_config`
calls indirectly via `run_preload_phase` (the legacy two-phase walk fetches each
full manifest and resolves each edge's version against its spec). The per-target
placement is: native uses the rayon arm of the helper through the new
`parse_*_off_runtime` wrappers in the dispatcher's match arms; wasm32 uses the
inline arm through the legacy fetch functions. Keeping both arms in one helper
gives each target the parse implementation appropriate to its threading model, with
a single point of definition for the parse and its `JSON parse error: {e}` error
prefix shared across call sites.

Verification per CLAUDE.md's "Post-Edit Verification" section:

  * `cargo check -p utoo-ruborist --all-targets`: exits 0 with no compile errors or
    warnings under the workspace's nightly-2026-04-02 toolchain. The check covers
    the library target, the integration-test target, the doc-tests (trivially,
    since the parse helpers' doc comments contain no executable code fences), and
    the workspace targets that depend on ruborist transitively.

  * `cargo fmt -p utoo-ruborist`: applied two spots of formatting drift between the
    hand-written edits and rustfmt's canonical form. First, the
    `parse_full_manifest_off_runtime` signature had been wrapped across three lines
    on the mistaken assumption that the single-line form exceeded 100 characters;
    rustfmt collapsed it to one line, since the unwrapped signature is 99
    characters and the workspace has no `rustfmt.toml`, so the default
    `max_width = 100` applies. Second, the new
    `pub(crate) use manifest::parse_json_off_runtime;` re-export had been placed
    after the existing `pub use manifest::{...};` brace block, just before
    `pub use registry::UnifiedRegistry;`; rustfmt's ordering for sibling imports of
    the same module puts the single-symbol snake-case form before the multi-symbol
    brace block with its CamelCase type names, so the line moved to line 62,
    between `pub use http::client_builder;` at 61 and the `pub use manifest::{`
    opener at 63. The `parse_core_manifest_off_runtime` signature's three-line wrap
    was left alone because its unwrapped form is 102 characters
    (`CoreVersionManifest` is 7 characters longer than `FullManifest`), so the wrap
    is required by the max-width policy and the hand-written wrap already matched
    the canonical form.

  * `cargo fmt -p utoo-ruborist -- --check` after the apply: exits 0 with no
    remaining diff, confirming the three modified ruborist source files
    (manifest.rs, mod.rs, builder.rs) are in rustfmt-canonical form.

  * `cargo clippy -p utoo-ruborist --all-targets --no-deps -- -D warnings`: exits 0
    under the warnings-as-errors gate that CLAUDE.md's "Post-Edit Verification"
    section mandates. `--no-deps` scopes the lints to the workspace's own crates,
    and `--all-targets` covers the lib, test, example, bench, and bin targets. No
    lints fire on the new `async fn` bodies: `redundant_async_block` flags async
    bodies that contain no `.await`, but the new wrappers do `.await?` the rayon
    helper's oneshot future, so the qualifier is earned. No `unused_must_use`
    warning fires at the now-async dispatcher call site, because `.await;` discards
    the unit return explicitly. The `futures` crate's `FuturesUnordered` (used for
    the `fetches: FuturesUnordered<FetchFuture>` in-flight handle tracking)
    triggers no perf lints; driving a `FuturesUnordered` with `.next().await` in a
    loop is the canonical idiom for polling a collection of concurrent futures.

  * `cargo test -p utoo-ruborist --lib --no-fail-fast`: passes the pre-existing
    163-unit-test baseline ("test result: ok. 163 passed; 0 failed; 0 ignored; 0
    measured; 0 filtered out; finished in 0.06s"). The suite covers the spec
    parser's npm-alias and workspace-protocol handling, `util::oncemap`'s
    concurrent single-flight cache behavior, the model types' JSON serde
    round-trips, graph building and edge resolution, and the integration-style
    tests that drive the `build_deps` entry point with the MockRegistryClient
    fixture. With the mock, the new `registry_url()` trait method returns its
    default empty string, so `build_deps_with_config` takes the legacy fallback
    branch of the registry-URL check and routes to the legacy
    `run_preload_phase + run_bfs_phase` pair, preserving the pre-cherry-pick
    behavior on the mock-registry call graph. No unit test exercises
    `run_main_loop_bfs` directly; that path needs a real HTTP registry (or a
    non-trivial mock serving real bytes from spawned fetch tasks), which the unit
    tests don't set up. End-to-end coverage of the new path comes from the GHA
    `utoopm-e2e-*` jobs, which install the real ant-design fixture against the
    real npmjs registry.

  * The wasm32 compile-and-test verification is delegated to the GHA
    `utooweb-ci-build-wasm` workflow job, which fires automatically on this push
    (the workflow's `on: pull_request: types: [synchronize]` trigger catches the
    branch-tip move on the open PR, and the wasm build job runs on every PR
    synchronize event regardless of labels). The local host does not have the
    `wasm32-unknown-unknown` target installed: the repo-root `rust-toolchain.toml`
    pins `channel = "nightly-2026-04-02"`,
    `components = ["rustfmt", "clippy", "rust-analyzer"]`, and
    `profile = "minimal"`, with no `targets` list, so rustup installs only the host
    target; adding the wasm target would need a manual
    `rustup target add wasm32-unknown-unknown`, which the CI side instead handles
    in its `dtolnay/rust-toolchain` action step via the
    `targets: wasm32-unknown-unknown` input. Structural verification of the wasm
    cfg landmarks in `crates/ruborist/src/resolver/builder.rs` was done with an
    awk-and-grep pass before committing:
      - the file has 25 attribute lines matching
        `#[cfg((not()?target_arch = "wasm32")?)]`;
      - `build_deps_with_config`'s body has a `#[cfg(target_arch = "wasm32")]`
        block (file line 758, body-relative line 24) whose content is the expected
        legacy pair `run_preload_phase(graph, registry, &config, receiver).await;
        run_bfs_phase(graph, registry, &config, receiver).await?;`;
      - the native arm, one body-relative line above under
        `#[cfg(not(target_arch = "wasm32"))]`, carries the dispatch
        `if !config.skip_preload && !registry.registry_url().is_empty() {
        run_main_loop_bfs(graph, registry, &config, receiver).await? } else {
        run_preload_phase(graph, registry, &config, receiver).await;
        run_bfs_phase(graph, registry, &config, receiver).await?; }` (the new main
        loop on the eligible path, the legacy pair on the fallback);
      - the legacy
        `async fn run_preload_phase<R: RegistryClient, E: EventReceiver>(...)`
        declaration at file line 1567 has no `#[cfg(...)]` attribute above it (doc
        comment at 1566, blank line at 1565), so it is callable from both the
        native else branch and the wasm main path;
      - the 20-plus per-item `#[cfg(not(target_arch = "wasm32"))]` gates on the
        cherry-picked main-loop scaffolding start at file line 26 and cover the new
        imports (`FuturesUnordered`, `std::collections::{HashSet, VecDeque}`, and
        the model and resolver-internal imports), the
        `type WaitingEdge = (NodeIndex, DependencyEdgeInfo);` alias, the enum and
        struct definitions, the helpers (including the renamed
        `async fn parse_full_manifest_off_runtime` and
        `async fn parse_core_manifest_off_runtime` at lines 840 and 848 onward),
        the dispatcher `async fn apply_fetch_result` at line 1211, the per-edge
        lookup helpers, `enqueue_initial_root_deps`,
        `schedule_transitive_prefetches`, and the
        `async fn run_main_loop_bfs<R, E>(...)` entry point itself; each item
        carries its own cfg attribute on the line above its definition.
    The wasm build therefore sees only the legacy `run_preload_phase` /
    `run_bfs_phase` functions and the legacy `preload` helpers as resolver content.
    `build_deps_with_config` itself compiles unconditionally, with the cfg arms
    inside its body selecting the legacy pair on wasm and the main-loop-or-legacy
    dispatch on native, so the two target families diverge only at that dispatch
    point while sharing the legacy machinery.

The unstaged Cargo.lock delta produced by the local cargo invocations (a 617-line
insertion / 119-line deletion churn against upstream HEAD's lockfile: some
transitive entries dropped that the local resolver does not reach, some transitive
crates bumped to their latest compatible versions, a second `swc` entry at a
different major version from a different unification of conflicting requirements,
and a couple of new transitive crates pulled in by a cfg/feature combination the
upstream CI's resolver did not activate; all standard differences between two
resolver runs of the same Cargo.toml against the same index on different hosts,
cargo versions, or resolver defaults) is discarded via the `git restore Cargo.lock`
step at the head of this commit's preparation Bash. The working-tree lockfile then
matches HEAD's exactly. None of the local commits (3c2ee243 for the
`futures::join!` change in `build_deps_with_config`, f03ce5e4 for the BENCH_RUNS
tweak in the workflow yaml defaults, and the cherry-picked 4e6848dc and 1c2a02ac)
touched the workspace's Cargo.toml files, so the resolver input is unchanged from
upstream HEAD and the lockfile differences are purely local tooling state.
`git diff --stat -- Cargo.lock` after the restore prints nothing (no "1 file
changed" line, meaning the files are identical), and the `git add` step stages only
the three source files by explicit path, per CLAUDE.md's "git add by explicit name"
guidance, so neither the lockfile normalization nor the `next.js` symlink type
change can be staged by accident. The canonical post-merge lockfile is whatever the
upstream GHA CI's `cargo build` produces on the merged branch; the local delta
never leaves this host.

The worktree-local `next.js` path shows a `T` (type change) row in
`git diff --name-status`: HEAD records the path as a submodule gitlink (a nested
repo's pinned commit ID), while the working tree has a symbolic link, because an
earlier `ln -s /Users/elr/code/utoo/next.js next.js` at the worktree root replaced
the gitlink-managed directory with a symlink to the main checkout's
already-initialized submodule content. This is working-tree-only state: the
explicit-path `git add` leaves the index entry as the unchanged gitlink with the
upstream-pinned commit SHA, so the commit tree records the same gitlink as HEAD.
On CI, the workflow's `git submodule update --init --recursive --depth 1` prologue
initializes the `.gitmodules`-listed submodules and checks out the recorded SHAs
properly. The local symlink exists only so cargo can resolve the workspace: the
pack-side crates (for example `crates/pack-api/Cargo.toml`) have path dependencies
into the submodule tree such as `../../next.js/turbopack/crates/turbo-tasks`, and
without either the symlink or a real submodule checkout the host's cargo would
fail workspace membership resolution.

The architecture of the cherry-picked resolver, which this commit's rayon-offload
fix completes, is the canonical actor-model decomposition of the workload package
managers face in their resolve phase: many independent IO-bound HTTP fetches, some
CPU-bound parse work, and result aggregation into one shared state machine. Other
tools (cargo itself, uv from astral.sh, pnpm's resolver, Bun's install-side
resolver) have converged on the same decomposition over the years: a single task
owns all the mutable state (inflight dedup, cache, priority queue, BFS frontier)
with no cross-thread synchronization on it; parallel work is delegated to worker
tasks or threads via channels (spawn-a-task-and-await-its-JoinHandle for tokio's
IO side, rayon::spawn with a oneshot back for the CPU pool); and results flow back
into the owner's event loop through channel receivers polled from a
FuturesUnordered or a `select!` each iteration. Terminology differs (cargo's
resolver calls its main loop the "dependency queue" and keeps a reverse index of
waiting parents; uv uses the pubgrub algorithm with the same shape; pnpm pumps
Node's event loop with the libuv thread pool behind it), but the invariants are
the same: a single owner of mutable state, parallel IO as spawned tasks reporting
into the owner's inbox, and parallel CPU work as thread-pool jobs reporting into
the same or a merged inbox.

The cherry-picked main-loop resolver is exactly this shape. `run_main_loop_bfs` is
the single-task event loop; `fetches: FuturesUnordered<JoinHandle<FetchDone>>` is
its inbox for IO-side completions; the `full_cache` / `version_cache`,
`full_waiters` / `version_waiters`, and `full_failures` / `version_failures`
HashMaps, the `FetchQueues` priority queue, the `current_level: Vec<NodeIndex>` /
`next_level: Vec<NodeIndex>` frontier pair, and the
`level_pending: VecDeque<WaitingEdge>` within-level resume queue are the
single-owner state. Each `tokio::spawn`ed HTTP fetch task is an IO worker whose
`JoinHandle<FetchDone>` the collection polls, and
`crate::service::parse_json_off_runtime`'s `rayon::spawn` is the CPU-side job,
with its `tokio::sync::oneshot` channel delivering the parsed result back to the
awaiting task. The tokio blocking pool is not on the resolver's hot path; it backs
the filesystem work (the on-disk persistent ManifestStore reads and writes used
for the cross-process warm cache, wired in through the `cache` argument of
`UnifiedRegistry::new()` by the pm CLI's resolver setup) via `tokio::fs`, whose
bounded blocking threads keep syscalls off the reactor threads. Wake-ups between
the pools use tokio's standard Waker protocol, implemented by the oneshot
receiver, the JoinHandle future, and `FuturesUnordered::next()`: each completion
costs on the order of microseconds of wake bookkeeping, the normal price of an
actor message pass in tokio's runtime architecture.
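
A generic sketch of that single-owner event-loop shape (illustrative names only, not the ruborist types):

```rust
use futures::stream::{FuturesUnordered, StreamExt};
use std::collections::HashMap;
use tokio::task::JoinHandle;

// Single-owner event loop: only this task touches `cache`, so no locks are needed.
async fn owner_loop(mut work: Vec<String>) {
    let mut cache: HashMap<String, Vec<u8>> = HashMap::new();
    let mut inflight: FuturesUnordered<JoinHandle<(String, Vec<u8>)>> = FuturesUnordered::new();

    // Fan out IO-bound work as spawned tasks; their JoinHandles form the inbox.
    for key in work.drain(..) {
        inflight.push(tokio::spawn(async move {
            let bytes = vec![0u8; 4]; // stand-in for an HTTP fetch
            (key, bytes)
        }));
    }

    // Drain the inbox and aggregate results into the single-owner state.
    while let Some(done) = inflight.next().await {
        if let Ok((key, bytes)) = done {
            cache.insert(key, bytes);
        }
    }
}
```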

The bench-data interpretation note for this commit's GHA
bench-phases-linux job's hyperfine measurement (which fires automatically
on the push, since the PR carries the `benchmark` label that the
workflow's `if: contains(github.event.pull_request.labels.*.name,
'benchmark')` gate checks): the prior-art numbers come from PR #2937's
own bench-phases-linux job's hyperfine output, which the assistant
already pulled from the GHA API in earlier turns of this autonomous
conversation chain via `gh run view --job <bench-phases-linux-job-id>
--log`. That log surfaced the four-row "Time (mean ± σ)" measurement
table of the four PMs (utoo / utoo-next / utoo-npm / bun) across the four
phases (p0_full_cold / p1_resolve / p2_install / p4_warm_link) of the
bench-phases script's hyperfine sweep over the ant-design fixture's
resolve-and-install operations against the npmjs registry. On the
`p1_resolve` metric, σ dropped from roughly 1.0 s on the legacy 2-phase
preload-then-BFS baseline (the `utoo-next` row: `utoo-next` is the binary
auto-built from the `next` branch that the workflow's build step
downloads as an artifact and uses as the baseline side of the hyperfine
bench) to roughly 0.08 s on the experiment main-loop variant (the `utoo`
row: `utoo` is the binary built from the PR's tip of branch, the
experiment side of the measurement).

That ratio, a ~13× variance reduction without a corresponding mean
change, is the standard signature of an architectural change that
eliminates a tail-fetch-gating bottleneck while the mean stays
network-bandwidth-bound against the same npmjs CDN edge as before. The
mean wall-clock of the resolve phase is the same on both sides of the
comparison (within each side's σ band, which on the experiment side is
small enough that the means are statistically the same) because the
network round-trip time to the CDN edge dominates the resolve's
wall-clock, and the architectural change doesn't shorten any individual
fetch's RTT. What it changes is the resolve phase's variance
distribution: it removes the worst-case tail where the resolve
occasionally waits ~2 s on a single straggler fetch in the preload's flat
closure walk because that fetch happened to land on a high-latency CDN
edge on that run. The legacy 2-phase form exhibits that tail because its
preload phase fans out all the closure fetches in a single
`FuturesUnordered` of request futures and waits for all of them via the
standard "drive the FuturesUnordered until empty" loop, whose total
wall-time is the max of all the fetch times plus fixed per-fetch
overhead; the tail of the fetch-time distribution is exactly what drives
the wall-time variance.

The new main-loop architecture's `current_level → next_level` Vec swap at
the BFS level boundary is a softer aggregation point: edges whose package
is already fetched and cached proceed immediately, while edges whose
fetch is still in flight sit on the `level_pending` resume queue and pick
up when their fetch lands. The resolve phase's wall-time is therefore the
sum along the dependency graph's critical path (the longest chain of
sequential dependencies, each fetched once, with the fetched value shared
across its parent and child edges) rather than the legacy's max over the
flat closure walk's fetch times; and the demand priority queue's
BFS-ordered dispatch puts the longest sequential chain's fetches at the
head of the queue so their HTTP requests go on the wire first.
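
To make that level-barrier behavior concrete, here is a simplified,
synchronous sketch (hypothetical types and helper names, not the real
`builder.rs` code): cache hits are expanded inline, misses park on a
per-level resume queue, and the level only advances once every edge of
the current level has resolved. A plain callback stands in for draining
fetch completions where the real loop polls its `FuturesUnordered` inbox.

```rust
// Simplified, synchronous sketch of the level barrier. Hypothetical
// names: Manifest, Edge, child_edges and recv_fetch are illustrative,
// not the resolver's real types.
use std::collections::{HashMap, VecDeque};

struct Manifest { name: String }
struct Edge { name: String }

fn resolve_level(
    edges: Vec<Edge>,
    cache: &mut HashMap<String, Manifest>,
    mut recv_fetch: impl FnMut() -> Manifest, // stand-in for draining the fetch inbox
) -> Vec<Edge> {
    let mut level_pending: VecDeque<Edge> = VecDeque::new();
    let mut next_level: Vec<Edge> = Vec::new();

    // First pass: edges whose manifest is already cached resolve inline.
    for edge in edges {
        match cache.get(&edge.name) {
            Some(manifest) => next_level.extend(child_edges(manifest)),
            None => level_pending.push_back(edge),
        }
    }

    // Level barrier: keep draining fetch completions until every deferred
    // edge of this level has its manifest; only then hand back next_level
    // so the caller can swap current/next.
    while !level_pending.is_empty() {
        let fetched = recv_fetch();
        cache.insert(fetched.name.clone(), fetched);
        let still_waiting: VecDeque<Edge> = level_pending
            .drain(..)
            .filter_map(|edge| match cache.get(&edge.name) {
                Some(m) => { next_level.extend(child_edges(m)); None }
                None => Some(edge),
            })
            .collect();
        level_pending = still_waiting;
    }
    next_level
}

// Placeholder: a real resolver would read the manifest's dependency map here.
fn child_edges(_manifest: &Manifest) -> Vec<Edge> {
    Vec::new()
}
```

The real loop interleaves this with dispatching new fetches from the
priority queue and recording failures, but the barrier invariant is the
same: `next_level` is only handed back once `level_pending` is empty.
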
The mean of the critical-path walk's wall-time comes out the same as the
mean of the flat closure walk's max wall-time when the per-fetch-time
distribution is the same on both sides (which it is, since both talk to
the same npmjs registry over the same network from the same GHA runner
class), so the mean bench number doesn't shift. The variance of the
critical-path walk's wall-time, however, is much lower than the variance
of the max of N samples from the fetch-time distribution: the max of N
samples concentrates on the right tail of the distribution where the
variance is high, while the sum over a fixed-length sequential chain
concentrates, Central-Limit-Theorem style, around the chain length times
the mean fetch time. That is the σ-collapse-without-mean-shift signature.
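
A back-of-the-envelope Monte-Carlo sketch of that signature (purely
illustrative, not part of the PR; every parameter below is made up):
model the legacy resolve as the max of ~1500 parallel fetch times and the
main-loop resolve as the sum over a ~12-deep critical path, with a small
per-fetch chance of a ~2 s straggler. Under such a model the two means
land in the same ballpark, while the max-of-N variant's σ is dominated by
whether a straggler happened to land in that run at all.

```rust
// Hypothetical back-of-the-envelope simulation; every parameter is made
// up and none of this is in the PR. Per-fetch time: 100-200 ms typical,
// with a small chance of a ~2 s straggler (the "high-latency CDN edge"
// case described above).
use rand::Rng;

fn fetch_time(rng: &mut impl Rng) -> f64 {
    let base = 0.10 + rng.gen::<f64>() * 0.10;
    if rng.gen::<f64>() < 0.001 { base + 2.0 } else { base }
}

fn stats(samples: &[f64]) -> (f64, f64) {
    let n = samples.len() as f64;
    let mean = samples.iter().sum::<f64>() / n;
    let var = samples.iter().map(|x| (x - mean).powi(2)).sum::<f64>() / n;
    (mean, var.sqrt())
}

fn main() {
    let mut rng = rand::thread_rng();
    let runs = 5_000;

    // Legacy model: the resolve waits on the slowest of ~1500 parallel fetches.
    let flat_max: Vec<f64> = (0..runs)
        .map(|_| (0..1500).map(|_| fetch_time(&mut rng)).fold(0.0, f64::max))
        .collect();

    // Main-loop model: the resolve walks a ~12-deep chain of sequential fetches.
    let chain_sum: Vec<f64> = (0..runs)
        .map(|_| (0..12).map(|_| fetch_time(&mut rng)).sum::<f64>())
        .collect();

    let (m1, s1) = stats(&flat_max);
    let (m2, s2) = stats(&chain_sum);
    println!("max-of-N  : mean {m1:.2}s  sigma {s1:.2}s");
    println!("chain-sum : mean {m2:.2}s  sigma {s2:.2}s");
}
```
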
The PR body's bench-data-interpretation paragraph documents this
expectation as the criterion for the bench-phases-linux output to count
as a successful architectural validation: on the `p1_resolve` row of the
`utoo` vs. `utoo-next` comparison, σ should be substantially smaller on
the `utoo` side (the new architecture, the experiment side) than on the
`utoo-next` side (the baseline built from the `next` branch), with the
means falling within each other's σ bands so the "means are statistically
the same" condition holds. The `p2_install` and `p4_warm_link` metrics
measure the tarball fetch-and-extract phases, which the resolve
architecture change doesn't touch (those go through the existing `pm`
crate's `crates/pm/src/service/pipeline/worker.rs` pipeline, which this
commit doesn't modify), so the expectation there is that both mean and σ
stay unchanged from the baseline. Any change in those metrics relative to
the baseline would indicate an unintended side-effect of the
architectural change, which would be a finding for the
post-bench-data-comment-on-the-PR step's discussion.

Refs:
  - PR #2933 (the reverted prior attempt at this architecture; the revert
    message of commit 9e6c02e3 names the level-barrier mpsc main loop's
    +111% p1_resolve regression as the failure mode).
  - PR #2937 (the experimental source PR whose two commits this PR
    cherry-picks and fixes on top of; the architectural discussion thread
    on that PR is the design-rationale record for the
    main-loop-with-priority-queue decomposition).
  - Commits 7e7455ca and 04452992 (the historical lineage of the
    `parse_json_off_runtime` rayon-offload helper that this commit's new
    wrappers delegate to: 7e7455ca introduced the helper as a perf
    improvement for the legacy resolver path, and 04452992 reverted an
    intermediate attempt to remove the offload after a p3 bench
    regression). The new wrappers in `builder.rs` reuse the same helper,
    with its rayon arm and its wasm fallback arm behind the cfg
    target-arch split, so the perf validation from the legacy resolver
    path transfers to the new main-loop resolver path automatically.
  - Cherry-pick attribution footers in the local commits' bodies:
    `4e6848dc`'s body ends with the line `(cherry picked from commit
    75e84d0cb1a35250a59511bee86ad87f1fde06ba)`, the GitHub server-side
    auto-linkable form pointing at upstream PR #2937's first commit's
    diff view, and `1c2a02ac`'s body ends with the line `(cherry picked
    from commit 1ac68d509b89244d0ebbbe157f72100b5c9a3f94)`, which points
    at upstream PR #2937's second commit's diff view. `git cherry-pick
    -x` adds these attribution footer lines automatically.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Labels

benchmark Run pm-bench on PR
