Skip to content

perf(pm): decouple manifest disk cache via ManifestStore trait#2866

Merged
elrrrrrrr merged 8 commits intonextfrom
refactor/manifest-store-decoupling
Apr 29, 2026
Merged

perf(pm): decouple manifest disk cache via ManifestStore trait#2866
elrrrrrrr merged 8 commits intonextfrom
refactor/manifest-store-decoupling

Conversation

@elrrrrrrr
Copy link
Copy Markdown
Contributor

Summary

  • Decouples the manifest disk-cache I/O from ruborist via a new ManifestStore trait. ruborist now owns only the in-memory tier; the host (pm) provides persistence.
  • Disk writes are fire-and-forget (tokio::spawn detached), so the resolver hot path no longer awaits disk I/O. ETag-validated reads stay synchronous (cold-start load required for ETag).
  • Project cache (node_modules/.utoo-manifest.json) load/save also moved out of ruborist; build_deps now returns BuildDepsOutput { lock, project_cache } and pm persists it in the background.

What changed

ruborist

  • New service::store (ManifestStore + NoopStore)
  • PackageCache collapsed into pure MemoryCache (alias kept)
  • UnifiedRegistry builder takes .store(Arc<dyn ManifestStore>) instead of .cache_dir(...)
  • build_deps returns BuildDepsOutput, no file I/O inside ruborist

pm

  • New util::manifest_store::DiskManifestStore — fire-and-forget writes
  • New util::project_cache for .utoo-manifest.json
  • helper/ruborist_context.rs wires the store + warm cache + background save
  • Callers updated: cmd/deps.rs, helper/lock.rs, service/pipeline/mod.rs

Test plan

  • cargo fmt --check
  • cargo clippy -p utoo-ruborist -p utoo-pm --all-targets -- -D warnings --no-deps
  • cargo test -p utoo-ruborist (162 passed)
  • cargo test -p utoo-pm (245 passed)
  • benchmark CI — checking for regressions vs next (especially cold install — disk writes are now off the critical path; should be neutral-to-better)

🤖 Generated with Claude Code

…e trait

Ruborist now owns only the in-memory manifest cache; persistence is
delegated to a host-supplied `ManifestStore`. Writes are fire-and-forget
(detached `tokio::spawn`), so the resolver hot path no longer awaits disk
I/O. ETag-validated reads stay synchronous (load_versions on cold start).

- ruborist: new `service::store` module (`ManifestStore` + `NoopStore`),
  `PackageCache` collapsed into pure `MemoryCache`, `build_deps` returns
  `BuildDepsOutput { lock, project_cache }` instead of writing to disk.
- pm: new `util::manifest_store::DiskManifestStore` (fire-and-forget
  spawn writers) and `util::project_cache` for the project-level
  `.utoo-manifest.json`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@elrrrrrrr elrrrrrrr added the benchmark Run pm-bench on PR label Apr 29, 2026
Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request refactors the manifest caching architecture by decoupling persistent storage from the core resolver through a new ManifestStore trait. It introduces DiskManifestStore for file-based persistence and updates the resolution pipeline to handle project-level warm caches. Review feedback suggests making file writes atomic to prevent corruption, refining the MemoryCache API to distinguish between new instances and the global singleton, and optimizing ManifestStore lookups to avoid querying with version ranges.

Comment thread crates/ruborist/src/service/registry.rs
Comment thread crates/ruborist/src/service/cache.rs Outdated
Comment thread crates/pm/src/util/manifest_store.rs Outdated
Comment thread crates/pm/src/util/project_cache.rs Outdated
elrrrrrrr and others added 2 commits April 29, 2026 14:52
- pm: lift `write_json_file` into util/json.rs; drop reinvented read/write
  helpers in manifest_store.rs and project_cache.rs. Read path now uses
  bytes + from_slice (skips UTF-8 validation). Write tries `write` first
  and only falls back to create_dir_all on NotFound, eliminating the
  per-store mkdir syscall on the warm-cache rewrite path.
- ruborist: drop dead aliases (`ManifestCache`, the `ProjectCache`
  wrapper struct) and the `MemoryCache::global` constructor — all three
  were unused after the disk-cache split. Update doc strings to reflect
  memory/store/network instead of "three-tier".
- ruborist: extract `prepopulate_warm_cache` helper to flatten the
  4-deep nesting in `build_deps`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…o-wasm

The manifest-store decoupling refactor missed the WASM caller — wire
NoopStore as the persistence backend (browser has no disk cache today)
and unwrap the new BuildDepsOutput.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions
Copy link
Copy Markdown

📊 pm-bench-phases · d9b6880 · mac (macos-latest)

Workflow run — ant-design

PMs: utoo (this branch) · utoo-npm (latest published) · bun (latest)

npmjs.org

p0_full_cold

PM wall ±σ user sys RSS pgMinor
bun 16.94s 0.54s 5.96s 16.90s 780M 50.3K
utoo-npm 18.33s 2.00s 9.02s 19.93s 1.03G 102.9K
utoo 21.29s 5.63s 11.06s 25.37s 1.14G 104.5K
PM vCtx iCtx netRX netTX cache node_mod lock
bun 16.3K 139.4K - - 1.76G 1.91G 1M
utoo-npm 14.3K 361.1K - - 1.63G 1.87G 2M
utoo 11.7K 398.7K - - 1.63G 1.87G 2M

p1_resolve

PM wall ±σ user sys RSS pgMinor
bun 2.52s 0.16s 2.95s 1.42s 484M 31.4K
utoo-npm 6.49s 0.92s 4.89s 2.84s 546M 37.1K
utoo 5.89s 1.58s 4.21s 2.43s 567M 38.5K
PM vCtx iCtx netRX netTX cache node_mod lock
bun 29 20.0K - - 110M - 1M
utoo-npm 16 71.3K - - 28M 5M 2M
utoo 39 131.3K - - 27M 3M 2M

p3_cold_install

PM wall ±σ user sys RSS pgMinor
bun 20.77s 3.81s 4.04s 21.62s 532M 34.6K
utoo-npm 18.90s 1.58s 4.83s 21.20s 829M 75.1K
utoo 16.48s 1.27s 4.59s 21.05s 744M 78.8K
PM vCtx iCtx netRX netTX cache node_mod lock
bun 6.2K 139.1K - - 1.70G 1.94G 1M
utoo-npm 1.5K 236.7K - - 1.61G 1.87G 2M
utoo 1.4K 231.1K - - 1.61G 1.87G 2M

p4_warm_link

PM wall ±σ user sys RSS pgMinor
bun 5.64s 0.76s 0.12s 2.31s 53M 4.0K
utoo-npm 4.78s 0.71s 0.63s 3.20s 94M 6.8K
utoo 4.59s 0.61s 0.59s 3.31s 89M 6.6K
PM vCtx iCtx netRX netTX cache node_mod lock
bun 17.7K 1.3K - - 1.86G 1.91G 1M
utoo-npm 13.6K 80.4K - - 1.61G 1.86G 2M
utoo 13.1K 73.6K - - 1.63G 1.86G 2M

npmmirror.com

elrrrrrrr and others added 5 commits April 29, 2026 15:22
Mac runners are scarce and slow, and the npmmirror leg of the phase-
isolated bench duplicates work for PR feedback. Restrict the auto-
trigger (benchmark label) path to linux + npmjs; manual workflow_dispatch
with target=pm-bench-phases still runs the full mac + npmjs/npmmirror
matrix as an escape hatch.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- registry.rs: skip the ManifestStore lookup for non-exact specs.
  Persisted keys are resolved versions, so range/tag specs would always
  miss; on Windows the range chars (`*`, `>`, `<`) aren't valid filenames
  either. Gate on `Version::parse_from_npm(spec).is_ok()`.
- cache.rs: drop `MemoryCache::new()`; the impl returned the global
  singleton, contradicting the usual `new()` semantics. Callers now use
  `PackageCache::default()` (same singleton, but the ambiguity is gone).
- project_cache.rs: persist atomically via tmp + rename so a crash
  mid-write keeps the previous good cache intact. Per-package manifest
  writes intentionally stay non-atomic — they're individually
  opportunistic and the extra rename per-package would add ~10k syscalls
  on a cold install for no real safety win.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
cargo fmt -p utoo-wasm only — the utoo-wasm crate sorts lowercase
before uppercase, opposite of the workspace default we accidentally
applied earlier.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The tmp + rename pattern raced with `service::clean::remove_stale_entries`,
which does `read_dir(node_modules)` followed by `symlink_metadata` per
entry. The spawned save creates `.utoo-manifest.json.tmp` between those
two calls; if the rename fires between read_dir and stat, the per-entry
stat hits ENOENT and install fails with `No such file or directory`.

Reverts to the simpler non-atomic write_json_file path. Gemini's
crash-safety concern remains valid in principle, but the manifest cache
is opportunistic — a corrupt half-written file simply fails the next
parse and falls through to a fresh resolve. That's a much smaller failure
mode than racing clean_deps and breaking install on every regen.

Reproduced locally via `e2e/pm/stale-lockfile`; reverting fixes it.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The helper bundled serialize + mkdir-retry + write to serve only two
fire-and-forget cache writers, while the rest of the codebase keeps
JSON writes inline (lock.rs, init.rs, rebuild.rs, migrate.rs). Drop
the helper:

- manifest_store.rs: inline `write_json` private fn that keeps the
  try-write-then-mkdir-on-NotFound trick (matters on cold installs
  with thousands of writes per package).
- project_cache.rs: plain `create_dir_all + to_vec + write`. One
  write per install, mkdir is trivial.

`read_json_file` stays — it's a real primitive with five+ callers.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@elrrrrrrr elrrrrrrr added the A-Pkg Manager Area: Package Manager label Apr 29, 2026
@elrrrrrrr elrrrrrrr marked this pull request as ready for review April 29, 2026 09:44
@elrrrrrrr elrrrrrrr enabled auto-merge (squash) April 29, 2026 09:50
@github-actions
Copy link
Copy Markdown

📊 pm-bench-phases · d6fe4fb · linux (ubuntu-latest)

Workflow run — ant-design

PMs: utoo (this branch) · utoo-npm (latest published) · bun (latest)

npmjs.org

p0_full_cold

PM wall ±σ user sys RSS pgMinor
bun 7.43s 0.07s 9.90s 5.93s 570M 278.0K
utoo-npm 9.28s 0.29s 10.75s 8.53s 1.34G 158.9K
utoo 8.85s 0.85s 10.90s 8.40s 1.24G 163.4K
PM vCtx iCtx netRX netTX cache node_mod lock
bun 27.3K 15.6K 1.16G 8M 1.83G 1.72G 1M
utoo-npm 176.8K 150.5K 1.14G 5M 1.68G 1.68G 2M
utoo 191.8K 139.6K 1.13G 5M 1.68G 1.67G 2M

p1_resolve

PM wall ±σ user sys RSS pgMinor
bun 2.51s 0.02s 4.04s 0.79s 469M 179.6K
utoo-npm 5.04s 0.03s 4.62s 1.20s 431M 71.9K
utoo 5.13s 0.10s 4.78s 1.43s 441M 80.8K
PM vCtx iCtx netRX netTX cache node_mod lock
bun 15.1K 3.6K 201M 3M 104M - 1M
utoo-npm 73.0K 1.9K 202M 2M 9M 5M 2M
utoo 107.1K 6.3K 197M 2M 7M 3M 2M

p3_cold_install

PM wall ±σ user sys RSS pgMinor
bun 5.99s 1.39s 5.74s 6.09s 539M 191.2K
utoo-npm 6.97s 1.79s 5.41s 7.01s 877M 111.0K
utoo 5.59s 0.54s 5.37s 6.95s 976M 113.3K
PM vCtx iCtx netRX netTX cache node_mod lock
bun 12.1K 7.2K 993M 5M 1.73G 1.73G 1M
utoo-npm 118.8K 67.0K 964M 3M 1.67G 1.67G 2M
utoo 109.0K 73.8K 964M 3M 1.67G 1.67G 2M

p4_warm_link

PM wall ±σ user sys RSS pgMinor
bun 2.03s 0.07s 0.16s 1.19s 137M 32.4K
utoo-npm 1.85s 0.08s 0.54s 2.39s 82M 19.0K
utoo 1.84s 0.28s 0.55s 2.40s 83M 18.8K
PM vCtx iCtx netRX netTX cache node_mod lock
bun 388 22 7M 62K 1.88G 1.72G 1M
utoo-npm 50.2K 18.4K 52K 17K 1.67G 1.67G 2M
utoo 50.4K 18.2K 50K 25K 1.68G 1.67G 2M

npmmirror.com: no output captured.

@elrrrrrrr elrrrrrrr merged commit b26ca67 into next Apr 29, 2026
46 checks passed
@elrrrrrrr elrrrrrrr deleted the refactor/manifest-store-decoupling branch April 29, 2026 10:10
elrrrrrrr added a commit that referenced this pull request Apr 29, 2026
PR #2866 (refactor/manifest-store-decoupling) landed on next while we were
iterating. It restructures the manifest cache surface that this PR also
heavily edits — `service/cache.rs` collapses to a pure `MemoryCache` (disk
I/O moves out into the host's `ManifestStore`), `service/registry.rs` takes
an injected `Arc<dyn ManifestStore>`, `build_deps` returns `BuildDepsOutput`
with the project cache, and `model::FullManifest::versions` becomes
`Vec<String>` (was the `Versions { keys, .. }` struct).

Resolution:
- service/{cache,registry,api}.rs, model/manifest.rs, traits/registry.rs:
  take next's verbatim — its API is the canonical post-#2866 shape and
  supersedes the older PR-side variants.
- traits/registry.rs: re-add the `MaybeSend` / `MaybeSync` markers + the
  `+ MaybeSend` bound on `fetch_full_manifest` / `fetch_version_manifest` /
  `resolve_package` so the native preload worker-pool can drive these
  futures through `tokio::spawn`. Also `#[derive(Clone)]` on
  `MockRegistryClient` + `MockPackage` (worker-pool requires `R: Clone`).
- resolver/registry.rs: realign `resolve_from_manifest` with the new
  model — `manifest.versions.clone()` instead of `.versions.keys.clone()`,
  and wrap `get_core_version`'s return with `Arc::new` for the new
  `ResolvedPackage::manifest: Arc<CoreVersionManifest>`.
- pm/cmd/view.rs: same `versions.keys` → `versions` adjustment.
- Cargo.toml: keep our `crossbeam-queue` (worker-pool SegQueue) + macOS-only
  aws-lc-rs + `rt-multi-thread` test feature alongside the new
  `async-trait` workspace dep (for `ManifestStore`). Drop the
  wasm-bindgen-futures dep we added in round-3 — `service/cache.rs` no
  longer spawns disk writes in ruborist (host's `DiskManifestStore` does
  it now), so the `spawn_disk_write` helper is gone.

cargo clippy --all-targets -- -D warnings clean.
245 pm + 162 ruborist + 10 doctests pass (+ flaky SOCKS network test).
elrrrrrrr added a commit that referenced this pull request Apr 29, 2026
A local symlink (`next.js -> /Users/elr/code/utoo/next.js`) used to share
the worktree's submodule with the main repo's checkout got committed in the
PR #2866 merge instead of the submodule gitlink, so CI couldn't find any
of the next.js sources (turbopack crates, etc.) and every job failed at
`failed to get turbo-bincode as a dependency`. Restore the gitlink to next's
ref `6f6135c5` (the same SHA upstream pins).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

A-Pkg Manager Area: Package Manager benchmark Run pm-bench on PR

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants