perf(pm): demand driver + cutover (#3028 integrated on select) #3084
elrrrrrrr wants to merge 4 commits into
Third leaf module of the demand resolver, after `state` (#3079) and `queue` (#3080). `select` is the pure per-edge resolution decision:

    fn select_edge(state, edge, name, spec, mode) -> EdgeStep
    // EdgeStep = Resolve | Skip | Fail | Park { wait, fetch }

It only reads `ManifestState` and returns a decision value (no `&mut`, no async, no I/O, no graph mutation), so it's unit-testable in isolation (7 tests covering semver/full-manifest cache hits, recorded failures, parks, client-side version resolution, and optional-skip).

Dead-code-staged (`#![allow(dead_code)]`); the driver that consumes it lands in the follow-up PR.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
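To make the "pure decision" shape concrete, here is a minimal sketch, not the PR's code. The state is reduced to two hypothetical maps, the `edge` and `mode` parameters are collapsed into a single `optional` flag, and specs are matched by exact string rather than by semver. The ordering (a cached manifest wins over a recorded failure) is an assumption of the sketch.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified stand-in for the resolver's read-only state.
#[derive(Default)]
pub struct ManifestState {
    /// (name, spec) -> resolved version, for manifests already cached.
    pub resolved: HashMap<(String, String), String>,
    /// name -> recorded failure message.
    pub failures: HashMap<String, String>,
}

/// Decision value for one edge. Producing it mutates nothing.
#[derive(Debug, PartialEq)]
pub enum EdgeStep {
    Resolve { version: String },
    Skip,
    Fail { reason: String },
    Park { wait: String, fetch: bool },
}

/// Pure function: takes `&ManifestState`, returns a decision, performs no I/O.
pub fn select_edge(state: &ManifestState, name: &str, spec: &str, optional: bool) -> EdgeStep {
    let key = (name.to_string(), spec.to_string());
    // Assumption of this sketch: a usable cached manifest is checked first.
    if let Some(version) = state.resolved.get(&key) {
        return EdgeStep::Resolve { version: version.clone() };
    }
    if let Some(reason) = state.failures.get(name) {
        // Optional dependencies are skipped instead of failing the resolve.
        return if optional {
            EdgeStep::Skip
        } else {
            EdgeStep::Fail { reason: reason.clone() }
        };
    }
    // Nothing known yet: park the edge and ask the caller to issue a fetch.
    EdgeStep::Park { wait: name.to_string(), fetch: true }
}
```

Because the function never touches the graph or the network, each branch can be exercised by building a small `ManifestState` by hand and comparing the returned `EdgeStep`.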
Address review: drop the one-line `resolve_version_from_full_manifest` wrapper (inline `resolve_version_from_versions`), and collapse the duplicated "recorded failure? -> cached manifest?" probe (which appeared in the semver path, the full-manifest early check, and the resolved-version check) into a single `settled_step(state, name, lookup_key, spec)`. The alias is now derived (`lookup_key != spec`) instead of hand-set per call.

No behaviour change: the 7 select unit tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…rsions)
Address review: deciding an edge had to probe three separate maps
(`full.cache`, `full.failures`, `versions_cache`) to learn a package's
status. A package has at most one of those, so fold them into one
enum-keyed map:
    enum PackageVersions { Failed(String), Full(Arc<FullManifest>), List(Arc<VersionsInfo>) }
    state.packages: HashMap<String, PackageVersions>
`select_full_manifest` becomes a single `state.package(name)` lookup + a
`match`, and the "at most one source" invariant is now enforced by the
type. `full.waiters` becomes `package_waiters`.
Also fixes a latent precedence bug: a cached version manifest now resolves
the edge even if the package's full-manifest fetch later failed (the old
order let the package failure shadow a usable cached manifest). New test
pins it.
178 tests pass (8 select unit tests).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
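The "one enum-keyed map plus a `match`" idea can be sketched as follows. The payload structs are placeholders reduced to a version list, and `known_versions` is a hypothetical helper added only to show the single lookup; the enum variants and the `set_package` / `package` names follow the commit message, but their signatures here are assumptions.

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Placeholder payloads; the real manifest types carry far more data.
pub struct FullManifest {
    pub versions: Vec<String>,
}
pub struct VersionsInfo {
    pub versions: Vec<String>,
}

/// A package has exactly one source at a time, so the type enforces
/// the "at most one of failed / full / list" invariant.
pub enum PackageVersions {
    Failed(String),
    Full(Arc<FullManifest>),
    List(Arc<VersionsInfo>),
}

#[derive(Default)]
pub struct ManifestState {
    packages: HashMap<String, PackageVersions>,
}

impl ManifestState {
    /// Inserting replaces any previous source for the same package.
    pub fn set_package(&mut self, name: &str, source: PackageVersions) {
        self.packages.insert(name.to_string(), source);
    }

    pub fn package(&self, name: &str) -> Option<&PackageVersions> {
        self.packages.get(name)
    }

    /// Hypothetical helper: one lookup and one `match` instead of three probes.
    /// `Ok(None)` means nothing is known yet; `Err` carries the recorded failure.
    pub fn known_versions(&self, name: &str) -> Result<Option<&[String]>, &str> {
        match self.package(name) {
            None => Ok(None),
            Some(PackageVersions::Failed(reason)) => Err(reason.as_str()),
            Some(PackageVersions::Full(manifest)) => Ok(Some(manifest.versions.as_slice())),
            Some(PackageVersions::List(info)) => Ok(Some(info.versions.as_slice())),
        }
    }
}
```

With three separate maps a package could be present in more than one of them; with a single map keyed by name that state cannot be represented.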
Stack the driver work on #3083 (select + PackageVersions) and adapt it to the new state API. This is the "#3028-on-#3083 refactored" integration candidate: the driver, the spawn/multi-core model, the cutover, and the perf tuning, now built on the clean leaf modules (state / queue / select).

- demand/driver.rs (+ provider impl, manifest helpers, http pool, cutover, pm wiring) brought from the unintegrated driver branch.
- driver adapted to PackageVersions: `state.full.cache.insert` / `versions_cache` / `full.failures` → `set_package(Full/List/Failed)`; `full.waiters`/`full.wake` → `park_on_package`/`wake_package`; `full.is_settled || versions_cache.contains` → `has_package_source`.

182 tests pass, clippy clean (default). Bench-gate next to confirm it still lands ~2.4s (the whole point: match #3028 on the refactored split).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
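The park/wake half of that API can be illustrated with a small sketch. Only the method names come from the commit message; the signatures, the `EdgeId` alias, the `settled` set, and the "first waiter issues the fetch" return value are assumptions made for the example.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical edge identifier for the sketch.
pub type EdgeId = usize;

#[derive(Default)]
pub struct ManifestState {
    /// Packages for which some source (full, list, or failure) is recorded.
    settled: HashSet<String>,
    /// Edges parked until the named package settles.
    package_waiters: HashMap<String, Vec<EdgeId>>,
}

impl ManifestState {
    pub fn has_package_source(&self, name: &str) -> bool {
        self.settled.contains(name)
    }

    /// Parks `edge` on `name`. Returns `true` for the first waiter only,
    /// which is the caller that should issue the fetch.
    pub fn park_on_package(&mut self, name: &str, edge: EdgeId) -> bool {
        let waiters = self.package_waiters.entry(name.to_string()).or_default();
        waiters.push(edge);
        waiters.len() == 1
    }

    /// Marks the package as settled and hands back every parked edge.
    pub fn wake_package(&mut self, name: &str) -> Vec<EdgeId> {
        self.settled.insert(name.to_string());
        self.package_waiters.remove(name).unwrap_or_default()
    }
}
```

In this shape a second edge that needs the same package joins the waiter list instead of triggering a second fetch, and both edges are re-examined once the package is woken.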
Code Review
This pull request introduces a demand-driven BFS dependency resolution loop, transitioning the resolver to a single-flight, level-by-level pipeline driven by a new ManifestProvider trait. It also optimizes HTTP client pooling with multiple independent connection pools on native targets and improves JSON parsing performance by using mutable buffers. The review feedback highlights a potential Denial of Service (DoS) vulnerability where vector capacity is pre-allocated from an untrusted Content-Length header without a cap, and recommends resuming unwinding on task panics instead of converting them into normal errors.
    let done = done.map_err(|e| {
        registry_error::<R::Error>(format!("manifest fetch task failed: {e}"))
    })?;
When a spawned task panics, the JoinHandle returns a JoinError indicating a panic. Converting this panic into a normal ResolveError hides the panic and implements a form of recovery logic, which violates the rule of treating panics as unrecoverable bugs.
Instead, resume the unwind using std::panic::resume_unwind(e.into_panic()) when a panic is detected.
    let done = match done {
        Ok(d) => d,
        Err(e) => {
            if e.is_panic() {
                std::panic::resume_unwind(e.into_panic());
            }
            return Err(registry_error::<R::Error>(format!("manifest fetch task failed: {e}")));
        }
    };

References
- Do not implement recovery logic for panics. Panics should be treated as unrecoverable bugs that need to be fixed, not as transient, recoverable errors.
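The same propagate-instead-of-recover pattern can be tried without an async runtime. The sketch below is a standard-library analogue, not the driver's code: it uses `std::thread` in place of spawned tokio tasks, and `join_or_propagate` is a hypothetical helper name.

```rust
use std::panic;
use std::thread;

/// Joins a worker thread. If the worker panicked, the panic is re-raised on
/// the calling thread rather than being converted into an ordinary error.
pub fn join_or_propagate<T>(handle: thread::JoinHandle<T>) -> T {
    match handle.join() {
        Ok(value) => value,
        // `join` hands back the original panic payload; `resume_unwind`
        // continues unwinding with it, so the bug stays visible to the caller.
        Err(payload) => panic::resume_unwind(payload),
    }
}
```

A caller that wraps `join_or_propagate` in `std::panic::catch_unwind` (as a test might) observes the worker's panic, which is the behaviour the review asks for.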
    let capacity = response
        .content_length()
        .and_then(|len| usize::try_from(len).ok())
        .unwrap_or(0);
Pre-allocating a vector's capacity directly from an untrusted Content-Length header can lead to Denial of Service (DoS) via Out Of Memory (OOM) panics if a malicious or misconfigured registry returns an extremely large value.
Capping the pre-allocated capacity to a reasonable maximum (e.g., 10MB) prevents this vulnerability while still allowing the vector to grow dynamically if the actual response body is larger.
Suggested change:

    - let capacity = response
    -     .content_length()
    -     .and_then(|len| usize::try_from(len).ok())
    -     .unwrap_or(0);
    + let capacity = response
    +     .content_length()
    +     .and_then(|len| usize::try_from(len).ok())
    +     .unwrap_or(0)
    +     .min(10 * 1024 * 1024);
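A self-contained version of the capped pre-allocation, for illustration only. `prealloc_capacity`, `read_body`, and `MAX_PREALLOC` are hypothetical names, and the chunk slice stands in for a streamed response body; the 10 MiB figure mirrors the reviewer's example rather than a measured limit.

```rust
use std::convert::TryFrom;

/// Upper bound for the up-front allocation (the reviewer's example value).
pub const MAX_PREALLOC: usize = 10 * 1024 * 1024;

/// Turns an untrusted `Content-Length` into a bounded capacity hint.
pub fn prealloc_capacity(content_length: Option<u64>) -> usize {
    content_length
        .and_then(|len| usize::try_from(len).ok())
        .unwrap_or(0)
        .min(MAX_PREALLOC)
}

/// Collects body chunks. The hint only bounds the initial allocation;
/// the vector still grows if the real body is larger than the cap.
pub fn read_body(content_length: Option<u64>, chunks: &[&[u8]]) -> Vec<u8> {
    let mut body = Vec::with_capacity(prealloc_capacity(content_length));
    for chunk in chunks {
        body.extend_from_slice(chunk);
    }
    body
}
```

The cap changes only how much memory is reserved before any byte arrives, so a header that overstates the size costs at most `MAX_PREALLOC` up front.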
📊 pm-bench-phases
| PM | wall | ±σ | user | sys | RSS | pgMinor |
|---|---|---|---|---|---|---|
| bun | 9.16s | 0.24s | 9.87s | 10.00s | 734M | 332.6K |
| utoo-next | 8.32s | 0.14s | 10.18s | 11.95s | 866M | 122.5K |
| utoo-npm | 9.28s | 0.99s | 10.60s | 12.45s | 948M | 127.1K |
| utoo | 8.25s | 0.37s | 11.02s | 12.04s | 941M | 146.5K |
| PM | vCtx | iCtx | netRX | netTX | cache | node_mod | lock |
|---|---|---|---|---|---|---|---|
| bun | 14.5K | 17.4K | 1.11G | 6M | 1.75G | 1.64G | 1M |
| utoo-next | 103.3K | 72.3K | 1.08G | 4M | 1.62G | 1.61G | 2M |
| utoo-npm | 135.4K | 103.2K | 1.08G | 5M | 1.62G | 1.61G | 2M |
| utoo | 101.1K | 62.1K | 1.08G | 5M | 1.62G | 1.61G | 2M |
p1_resolve
| PM | wall | ±σ | user | sys | RSS | pgMinor |
|---|---|---|---|---|---|---|
| bun | 1.95s | 0.04s | 4.08s | 0.99s | 521M | 162.4K |
| utoo-next | 2.91s | 0.09s | 5.31s | 1.53s | 624M | 85.0K |
| utoo-npm | 3.27s | 0.23s | 5.55s | 1.89s | 621M | 84.0K |
| utoo | 2.45s | 0.13s | 6.07s | 1.61s | 642M | 120.6K |
| PM | vCtx | iCtx | netRX | netTX | cache | node_mod | lock |
|---|---|---|---|---|---|---|---|
| bun | 8.1K | 4.6K | 205M | 3M | 111M | - | 1M |
| utoo-next | 44.1K | 83.3K | 202M | 2M | 7M | 3M | 2M |
| utoo-npm | 68.9K | 110.1K | 202M | 2M | 7M | 3M | 2M |
| utoo | 18.1K | 33.6K | 205M | 3M | 7M | 3M | 2M |
p3_cold_install
| PM | wall | ±σ | user | sys | RSS | pgMinor |
|---|---|---|---|---|---|---|
| bun | 6.71s | 0.10s | 5.82s | 9.74s | 560M | 182.5K |
| utoo-next | 7.16s | 1.82s | 4.74s | 10.88s | 424M | 64.3K |
| utoo-npm | 7.00s | 1.87s | 4.87s | 10.80s | 514M | 61.1K |
| utoo | 6.11s | 0.99s | 4.75s | 10.56s | 436M | 54.4K |
| PM | vCtx | iCtx | netRX | netTX | cache | node_mod | lock |
|---|---|---|---|---|---|---|---|
| bun | 3.5K | 7.1K | 934M | 3M | 1.67G | 1.67G | 1M |
| utoo-next | 107.1K | 49.1K | 904M | 3M | 1.61G | 1.61G | 2M |
| utoo-npm | 102.1K | 51.4K | 903M | 3M | 1.61G | 1.61G | 2M |
| utoo | 83.7K | 50.9K | 903M | 2M | 1.61G | 1.61G | 2M |
p4_warm_link
| PM | wall | ±σ | user | sys | RSS | pgMinor |
|---|---|---|---|---|---|---|
| bun | 3.47s | 0.08s | 0.21s | 2.43s | 135M | 33.8K |
| utoo-next | 2.60s | 0.39s | 0.53s | 3.83s | 80M | 19.3K |
| utoo-npm | 2.36s | 0.03s | 0.51s | 3.80s | 80M | 18.7K |
| utoo | 2.46s | 0.13s | 0.51s | 3.83s | 80M | 18.5K |
| PM | vCtx | iCtx | netRX | netTX | cache | node_mod | lock |
|---|---|---|---|---|---|---|---|
| bun | 364 | 25 | 5M | 28K | 1.82G | 1.64G | 1M |
| utoo-next | 44.1K | 17.4K | 308K | 13K | 1.61G | 1.61G | 2M |
| utoo-npm | 43.5K | 18.1K | 308K | 11K | 1.61G | 1.61G | 2M |
| utoo | 42.0K | 17.7K | 307K | 7K | 1.62G | 1.61G | 2M |
npmmirror.com: no output captured.
Splitting this further before review: the code volume is too dense for one read. Stack-of-3 plan:
Bench-gate on every PR in the stack to catch any cumulative drift. #3028 stays as reference baseline. Will update this PR's description to point at the stack as soon as PR-A opens.
Type-level scaffolding for the demand-driven resolver rework (#3028 / #3084). The trait the demand driver dispatches through, `service::ManifestProvider`, and its registry-backed adapter (`impl ManifestProvider for UnifiedRegistry` in the new `service::registry::provider` module) land here unreferenced.

The driver that consumes the trait is the next PR in this stack (`resolver/demand/driver.rs`, ~700 lines). The flip of the public entry-points (`build_deps_*` and `resolve_*` in `resolver::builder`, plus the `api::build_lockfile` host call site) from the legacy `RegistryClient` bound to the new `ManifestProvider` bound is the third PR. Two runtime tunings ride along with the cutover: the HTTP-client pool that fans connections across Cloudflare edge IPs, and the resolver-side `get_resolver_manifests_concurrency_limit` knob that raises the in-flight cap for non-semver registries (npmjs) from the existing 64 (tarball-side `get_manifests_concurrency_limit`) to 256. They're scoped that way because their payoff is the demand driver's single-flight de-duplication of concurrent fetches for the same package: landing the same tunings on the legacy two-phase resolver overcommits its non-deduplicated per-edge concurrency at the npmjs front door and regresses the resolve phase.

The bench-vs-`utoo-next` comparison on this PR is expected to sit at noise. The active runtime path of `utoo install` is byte-identical to `next` here, because nothing in this PR is reachable from `build_deps_with_config`'s body or from any other live entry, the HTTP client stays at its single shared instance, and `Context.concurrency` keeps reading the existing tarball-side knob.

File by file:

* `service/manifest_provider.rs` (new, +106 lines): the trait definition. One async method that takes a `ManifestJob` and returns a `ManifestJobDone` (the typed job-and-result shape; see `traits/registry.rs` below). Carries `Send + Sync` under a `#[cfg_attr(not(target_arch = "wasm32"), async_trait)]` and the `?Send` form for wasm: the native build can `tokio::spawn` the job futures across the multi-threaded runtime (which is where the demand driver's perf comes from), and the single-threaded wasm runtime still works via `spawn_local`.
* `service/registry/` (rename + new sibling): the existing flat `service/registry.rs` becomes `service/registry/mod.rs` so a new sibling file `service/registry/provider.rs` (+179 lines) can hold the `impl ManifestProvider for UnifiedRegistry` without bloating `mod.rs` and without widening the visibility of `UnifiedRegistry`'s private fields (`store`, `registry_url`, the supports-semver flag); child modules already see private items of their parent module. The `UnifiedRegistry` struct body itself is unchanged.
* `traits/registry.rs` (+67 lines): the trait's job and error shapes: `RegistryError`, `ManifestJob` (the `Versions`, `Full`, `ExactVersion` job kinds the driver issues), the paired `ManifestJobDone` and the `ManifestFullData` payload the full-manifest kind returns, and the `MetadataFormat` enum for the response-content-type negotiation (`application/vnd.npm.install-v1+json` vs the full form). Pulled out as named items so the trait's signatures don't leak any of the existing `service::fetch` module's internal types.
* `service/manifest.rs` (+183 lines): two new helpers the adapter uses to keep `simd_json::serde::from_slice`'s in-place buffer mutation off the tokio runtime. The existing `parse_json_off_runtime` (a borrowing form that copies the buffer inside the worker) gets a buffer-consuming sibling `parse_json_vec_off_runtime(Vec<u8>)` whose callers can hand ownership of the response-body bytes straight in. The full-manifest parse picks up a sibling `parse_full_manifest_with_core_off_runtime(bytes, spec)` that returns both the parsed `FullManifest` and, when a spec was supplied that names an exact version, the `CoreVersionManifest` slice for that version, so the adapter's `ManifestJob::ExactVersion` path can hand the per-version result back to the driver without a second pass over the full document. Both helpers dispatch the CPU-bound parse to `rayon::spawn` on native and inline it on wasm via a `#[cfg(target_arch)]` switch.
* `service/cache.rs` (+43 lines): two methods on `ProjectCacheData` that bridge the on-disk shape (a per-package map of specs and resolved-version manifests, the format the host serializes to the lockfile sidecar) and the resolver-owned `(name, spec, manifest)` tuples the demand loop emits. `resolved_manifests(&self)` flattens the on-disk map into the neutral tuple form for seeding a warm resolver run; `from_resolved(tuples)` rebuilds the on-disk shape from the tuples the resolver returned. The impl block carries `#[allow(dead_code)]` until the cutover PR points `api::build_lockfile` at them, the same dead-code-staging pattern as the earlier #3079 (state) and #3083 (select) splits.
* `service/mod.rs` (+4 lines): public re-exports of `ManifestProvider` and the supporting job types at the `crate::service::*` level so neither the demand module nor the pm binary has to reach into the sub-module path.
* `model/manifest.rs` (+5 lines) and `model/mod.rs` (+6 lines): the small additions on the model side that the new parse helpers consume: a flat-list shape for the versions-only abbreviated-metadata response and the corresponding `pub use`.
* `.github/workflows/pm-e2e-bench.yml` (+8 / -2): the bench-baseline build step (which overlays `origin/next`'s tracked files on top of the PR's tree with `git checkout origin/next -- .` so a `cargo build` against next's resolver runs against the PR's e2e harness) gets a cleanup of the paths the PR adds that don't exist in next: `git diff --no-renames --diff-filter=A --name-only origin/next HEAD -- crates/ | xargs -r rm -f`. Without it, this PR's new `service/registry/` directory ends up side-by-side with the overlay's flat `service/registry.rs`, and the build hits rustc's E0761 ("file for module `registry` found at both `mod.rs` and `registry.rs`"). The `--no-renames` flag is the load-bearing detail: under default rename detection git pairs the `registry.rs ↔ registry/mod.rs` rename as a single change, and the `--diff-filter=A` for the added side then reports zero added paths and misses the directory.

The benchmark label on the PR is on so the bench gate runs. The two scaffolding tunings the perf model needs (the HTTP-client pool and the resolver-side concurrency cap of 256 for non-semver registries) live in the cutover PR alongside the entry-point bound flip, because the demand driver's single-flight is what makes the higher cap a win rather than a wall-clock regression vs `next`. The bench numbers on this PR are expected to sit at the `next` baseline within noise.

Part 1/3 of the #3084 split. The remaining two are the demand driver (Part 2) and the entry-point cutover + the runtime tunings + the dead-code annotations coming off (Part 3, the bench-gated one).

Refs #3028, #3084.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
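The `service/cache.rs` bridge described above is essentially a flatten/rebuild pair. The sketch below shows that round trip under stated assumptions: the field name `packages`, the nested `BTreeMap` layout, and a `Manifest` reduced to a version string are all hypothetical, and only the two method names come from the commit message.

```rust
use std::collections::BTreeMap;

/// Stand-in for a resolved-version manifest (reduced to its version string).
pub type Manifest = String;

/// Hypothetical on-disk shape: package name -> (spec -> resolved manifest).
#[derive(Debug, Default, PartialEq)]
pub struct ProjectCacheData {
    pub packages: BTreeMap<String, BTreeMap<String, Manifest>>,
}

impl ProjectCacheData {
    /// Flattens the nested map into neutral `(name, spec, manifest)` tuples,
    /// the form a warm resolver run could be seeded with.
    pub fn resolved_manifests(&self) -> Vec<(String, String, Manifest)> {
        self.packages
            .iter()
            .flat_map(|(name, specs)| {
                specs
                    .iter()
                    .map(move |(spec, manifest)| (name.clone(), spec.clone(), manifest.clone()))
            })
            .collect()
    }

    /// Rebuilds the nested shape from the tuples a resolver run returned.
    pub fn from_resolved(tuples: impl IntoIterator<Item = (String, String, Manifest)>) -> Self {
        let mut data = Self::default();
        for (name, spec, manifest) in tuples {
            data.packages.entry(name).or_default().insert(spec, manifest);
        }
        data
    }
}
```

Keeping the resolver on plain tuples means it does not need to know the host's serialization layout; the two methods are the only place where the shapes meet.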
Lands the demand-driven BFS resolver loop on top of the `ManifestProvider` trait from the preceding PR in the stack. The driver and its graph-building helpers exist as dead code in this PR: the entry-point switch that points `api.rs` and `builder`'s public `build_deps_*` / `resolve_*` chain at them is the third PR. Same dead-code-staging idiom as the earlier `state.rs` (#3079) and `select.rs` (#3083) splits.

What lands here, file by file:

* `resolver/demand/driver.rs` (new, ~700 lines): the `run_main_loop_bfs` entry. It owns the per-run `ManifestState` (the cache + waiters + failures store from #3079) and the `FetchQueues` scheduler (the push/pop/complete state machine from #3080), and pumps the `ManifestProvider` job stream through a `FuturesUnordered` of `tokio::task::JoinHandle`s (the multi-core spawn that gives the resolver native fan-out: `tokio::spawn` on native targets, the single-threaded `tokio::task::spawn_local` on wasm via the `#[cfg_attr]` toggle on the trait's `Send + Sync` bound). The `apply_fetch_result` glue feeds resolved manifests back into the graph through the new helpers in `builder.rs` (see below); the `select_edge` decision step from #3083 picks the next action per-edge (cache hit, version-cache hit, wait on an in-flight job, fail). The `handle_processed` wrapper around the graph-mutation step emits the existing `BuildEvent::Resolved` / `Failed` so progress receivers don't see a discontinuity once the cutover lands. A `#[cfg(test)]` module at the bottom holds the driver's unit-test scaffolding (`MockRegistryClient`, `CountingRegistry` wrapper for the single-flight property, the `create_*_manifest` helpers). One of those tests, `test_non_semver_exact_version_extract_single_flight`, is `#[ignore]`d in this PR with a reason string: it asserts on the `ManifestProvider` job count produced by a full `resolve(pkg, registry)` pipeline, which still routes through the legacy `RegistryClient::fetch_version_manifest` path in this PR. The cutover PR removes the `#[ignore]` once `resolve` is pointed at the demand driver. The other driver tests cover the loop's invariants in isolation (state transitions, waiter wake-up, schedule fairness) and pass under PR-B.
* `resolver/demand/mod.rs`, `resolver/demand/queue.rs`: the small re-export and visibility adjustments to expose `run_main_loop_bfs` and `ResolverManifestCache` at the `crate::resolver::demand` level so `builder.rs` can name them, and the queue's `FetchKey` / `FetchDone` types in the shape the driver consumes.
* `resolver/demand/state.rs`: a single attribute, `#[allow(dead_code)]` on the `ResolverManifestCache.entries` field. The driver writes the field via `ManifestState::into_resolver_cache()` at the end of each run; the reader is `ProjectCacheData::from_resolved` in the cutover PR's `api.rs` edit. Mirrors the symmetric annotation on the `ProjectCacheData` bridges in `service/cache.rs` from PR-A; both annotations come off when the entry-point switch wires the writer-chain to the reader-chain in PR-C.
* `resolver/builder.rs`: four new graph-building helpers extracted from `process_dependency`'s internal logic so the driver can reuse them without going back through the legacy entry-points, plus the new `pub(crate) async fn build_deps_with_config_output` that wraps the demand loop with the existing tracing + receiver wiring and returns the `ResolverManifestCache` the host needs to persist:
  - `pub(crate) fn try_reuse_dependency(...)`: hits the graph's existing-node index before issuing a fetch, so repeat references to the same `(name, resolved-version)` share one node.
  - `pub fn process_dependency_with_resolved(...)`: the edge-resolution tail that runs once a manifest is in hand. It creates or reuses the dependent node, attaches the edge, and forwards the resolution mode flags.
  - `pub(crate) fn chain_err(...)`: lifts a `RegistryError` from the provider's job stream into the resolver's `ResolveError::WithChain` so the CLI's chain-aware error renderer still gets the parent → child causality string when the demand path fails the same way the legacy path used to.
  - `pub(crate) async fn handle_resolved_registry_manifest(...)`: the integration point between a resolved `CoreVersionManifest` and the graph. It caches under both the spec and the resolved version (so later lookups by either key hit memory), spawns the dependent-edge collection, and fires `BuildEvent::Resolved`.

All four are reachable only from the driver in this PR; the legacy `process_dependency` keeps its inline form and the legacy entry chain (`build_deps` / `build_deps_with_*` / `resolve` / `resolve_with_options`) keeps its old `R: RegistryClient` signatures. The new `build_deps_with_config_output` is the demand-side entry the cutover PR will route `build_deps_with_config` and `api.rs` through; it carries an `#[allow(dead_code)]` for this interim state with a one-line comment naming the next PR as its caller.

The three import-line tweaks at the top of `builder.rs` are the only edits to existing lines in this file: `CoreVersionManifest` joining the `crate::model::manifest` brace-group, the new `use` of `ResolverManifestCache` and `run_main_loop_bfs` from `crate::resolver::demand`, and `ManifestProvider` joining the `crate::service` brace-group. The orphaned preload-era functions (`gather_preload_deps`, `run_preload_phase`, `run_bfs_phase`) keep their existing signatures and live call paths; the cutover PR is what `#[allow(dead_code)]`-annotates them and the cleanup PR after the cutover deletes them.

The benchmark label is on this PR so the bench gate runs. Because the active resolver pipeline is unchanged in this PR (`resolve` still calls preload-then-BFS through the legacy `RegistryClient` interface), the expected bench numbers match PR-A on the standard npmjs workspace. The full `p1_resolve ≈ 2.4s / vCtx ≈ 18K` win shows up in PR-C alongside the entry-point flip.

Part 2/3 of the #3084 split.

Refs #3028 #3083 #3084 #3085

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
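The single-flight property that the driver tests count on can be shown with a deliberately small, synchronous model. This is a sketch under heavy assumptions, not `run_main_loop_bfs`: there is no async, no parking, and no version selection, and `CountingMock` and `resolve_bfs` are hypothetical names. It only demonstrates that a breadth-first walk which records every requested package fetches each one once, even when several edges point at it.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

/// Hypothetical in-memory registry: package name -> dependency names.
/// It counts fetches so the single-flight property can be asserted.
pub struct CountingMock {
    pub deps: HashMap<String, Vec<String>>,
    pub fetches: usize,
}

impl CountingMock {
    fn fetch(&mut self, name: &str) -> Vec<String> {
        self.fetches += 1;
        self.deps.get(name).cloned().unwrap_or_default()
    }
}

/// Walks the dependency graph breadth-first and returns the visit order.
/// Each package is fetched at most once, however many edges reference it.
pub fn resolve_bfs(registry: &mut CountingMock, root: &str) -> Vec<String> {
    let mut requested: HashSet<String> = HashSet::new();
    let mut queue: VecDeque<String> = VecDeque::new();
    let mut order = Vec::new();

    requested.insert(root.to_string());
    queue.push_back(root.to_string());

    while let Some(name) = queue.pop_front() {
        let deps = registry.fetch(&name);
        order.push(name);
        for dep in deps {
            // `insert` returns false when the package was already requested,
            // so a repeated edge never schedules a second fetch.
            if requested.insert(dep.clone()) {
                queue.push_back(dep);
            }
        }
    }
    order
}
```

On a diamond-shaped graph (two packages depending on the same third one), the fetch counter ends up equal to the number of distinct packages rather than the number of edges.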
What
The driver landing, stacked on #3083. This is the "#3028-on-#3083 refactored" integration candidate: #3028's proven-fast demand resolver, restacked on the clean leaf modules (state / queue / select + `PackageVersions`).

Contents (+1558 / −85, 17 files):
- `demand/driver.rs`: `run_main_loop_bfs` + fetch pipeline (`pump_fetches` / `apply_fetch_result`) + `schedule_*` / `enqueue_*` + `handle_processed`.
- `manifest_provider` (Send-trait, cfg-async_trait), `registry/{mod,provider}.rs` (`UnifiedRegistry::execute_manifest_job`, multi-core spawn), `manifest.rs` off-runtime parse helpers, `http.rs` 4-pool fan-out.
- `builder.rs` entry chain → `ManifestProvider`; `api.rs` → `build_deps_with_config_output`; `cache.rs` adapters; moved graph-build helpers (`process_dependency_with_resolved` etc.).
- `user_config.rs` npmjs concurrency 256 + `ruborist_context.rs` resolver-concurrency wiring.
- `PackageVersions` adaptation: `state.full.cache` / `versions_cache` / `full.failures` → `set_package(Full/List/Failed)`; `full.waiters` / `full.wake` → `park_on_package` / `wake_package`; `full.is_settled || versions_cache.contains` → `has_package_source`.

Status
Draft pending bench-gate. The whole point: confirm this still lands ~2.4s (matching #3028's proven 2.49s and the earlier #3081 2.40s). 182 tests pass + clippy clean. The `benchmark` label triggers `bench-phases-linux`; I'll mark ready for review once the bench confirms parity with #3028.

#3028 stays open as the reference baseline until this stack lands in `next`.

🤖 Generated with Claude Code