perf(pm): decouple manifest disk cache via ManifestStore trait#2866
perf(pm): decouple manifest disk cache via ManifestStore trait#2866
Conversation
…e trait
Ruborist now owns only the in-memory manifest cache; persistence is
delegated to a host-supplied `ManifestStore`. Writes are fire-and-forget
(detached `tokio::spawn`), so the resolver hot path no longer awaits disk
I/O. ETag-validated reads stay synchronous (load_versions on cold start).
- ruborist: new `service::store` module (`ManifestStore` + `NoopStore`),
`PackageCache` collapsed into pure `MemoryCache`, `build_deps` returns
`BuildDepsOutput { lock, project_cache }` instead of writing to disk.
- pm: new `util::manifest_store::DiskManifestStore` (fire-and-forget
spawn writers) and `util::project_cache` for the project-level
`.utoo-manifest.json`.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Code Review
This pull request refactors the manifest caching architecture by decoupling persistent storage from the core resolver through a new ManifestStore trait. It introduces DiskManifestStore for file-based persistence and updates the resolution pipeline to handle project-level warm caches. Review feedback suggests making file writes atomic to prevent corruption, refining the MemoryCache API to distinguish between new instances and the global singleton, and optimizing ManifestStore lookups to avoid querying with version ranges.
- pm: lift `write_json_file` into util/json.rs; drop reinvented read/write helpers in manifest_store.rs and project_cache.rs. Read path now uses bytes + from_slice (skips UTF-8 validation). Write tries `write` first and only falls back to create_dir_all on NotFound, eliminating the per-store mkdir syscall on the warm-cache rewrite path. - ruborist: drop dead aliases (`ManifestCache`, the `ProjectCache` wrapper struct) and the `MemoryCache::global` constructor — all three were unused after the disk-cache split. Update doc strings to reflect memory/store/network instead of "three-tier". - ruborist: extract `prepopulate_warm_cache` helper to flatten the 4-deep nesting in `build_deps`. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…o-wasm The manifest-store decoupling refactor missed the WASM caller — wire NoopStore as the persistence backend (browser has no disk cache today) and unwrap the new BuildDepsOutput. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
📊 pm-bench-phases ·
|
| PM | wall | ±σ | user | sys | RSS | pgMinor |
|---|---|---|---|---|---|---|
| bun | 16.94s | 0.54s | 5.96s | 16.90s | 780M | 50.3K |
| utoo-npm | 18.33s | 2.00s | 9.02s | 19.93s | 1.03G | 102.9K |
| utoo | 21.29s | 5.63s | 11.06s | 25.37s | 1.14G | 104.5K |
| PM | vCtx | iCtx | netRX | netTX | cache | node_mod | lock |
|---|---|---|---|---|---|---|---|
| bun | 16.3K | 139.4K | - | - | 1.76G | 1.91G | 1M |
| utoo-npm | 14.3K | 361.1K | - | - | 1.63G | 1.87G | 2M |
| utoo | 11.7K | 398.7K | - | - | 1.63G | 1.87G | 2M |
p1_resolve
| PM | wall | ±σ | user | sys | RSS | pgMinor |
|---|---|---|---|---|---|---|
| bun | 2.52s | 0.16s | 2.95s | 1.42s | 484M | 31.4K |
| utoo-npm | 6.49s | 0.92s | 4.89s | 2.84s | 546M | 37.1K |
| utoo | 5.89s | 1.58s | 4.21s | 2.43s | 567M | 38.5K |
| PM | vCtx | iCtx | netRX | netTX | cache | node_mod | lock |
|---|---|---|---|---|---|---|---|
| bun | 29 | 20.0K | - | - | 110M | - | 1M |
| utoo-npm | 16 | 71.3K | - | - | 28M | 5M | 2M |
| utoo | 39 | 131.3K | - | - | 27M | 3M | 2M |
p3_cold_install
| PM | wall | ±σ | user | sys | RSS | pgMinor |
|---|---|---|---|---|---|---|
| bun | 20.77s | 3.81s | 4.04s | 21.62s | 532M | 34.6K |
| utoo-npm | 18.90s | 1.58s | 4.83s | 21.20s | 829M | 75.1K |
| utoo | 16.48s | 1.27s | 4.59s | 21.05s | 744M | 78.8K |
| PM | vCtx | iCtx | netRX | netTX | cache | node_mod | lock |
|---|---|---|---|---|---|---|---|
| bun | 6.2K | 139.1K | - | - | 1.70G | 1.94G | 1M |
| utoo-npm | 1.5K | 236.7K | - | - | 1.61G | 1.87G | 2M |
| utoo | 1.4K | 231.1K | - | - | 1.61G | 1.87G | 2M |
p4_warm_link
| PM | wall | ±σ | user | sys | RSS | pgMinor |
|---|---|---|---|---|---|---|
| bun | 5.64s | 0.76s | 0.12s | 2.31s | 53M | 4.0K |
| utoo-npm | 4.78s | 0.71s | 0.63s | 3.20s | 94M | 6.8K |
| utoo | 4.59s | 0.61s | 0.59s | 3.31s | 89M | 6.6K |
| PM | vCtx | iCtx | netRX | netTX | cache | node_mod | lock |
|---|---|---|---|---|---|---|---|
| bun | 17.7K | 1.3K | - | - | 1.86G | 1.91G | 1M |
| utoo-npm | 13.6K | 80.4K | - | - | 1.61G | 1.86G | 2M |
| utoo | 13.1K | 73.6K | - | - | 1.63G | 1.86G | 2M |
npmmirror.com
Mac runners are scarce and slow, and the npmmirror leg of the phase- isolated bench duplicates work for PR feedback. Restrict the auto- trigger (benchmark label) path to linux + npmjs; manual workflow_dispatch with target=pm-bench-phases still runs the full mac + npmjs/npmmirror matrix as an escape hatch. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- registry.rs: skip the ManifestStore lookup for non-exact specs. Persisted keys are resolved versions, so range/tag specs would always miss; on Windows the range chars (`*`, `>`, `<`) aren't valid filenames either. Gate on `Version::parse_from_npm(spec).is_ok()`. - cache.rs: drop `MemoryCache::new()`; the impl returned the global singleton, contradicting the usual `new()` semantics. Callers now use `PackageCache::default()` (same singleton, but the ambiguity is gone). - project_cache.rs: persist atomically via tmp + rename so a crash mid-write keeps the previous good cache intact. Per-package manifest writes intentionally stay non-atomic — they're individually opportunistic and the extra rename per-package would add ~10k syscalls on a cold install for no real safety win. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
cargo fmt -p utoo-wasm only — the utoo-wasm crate sorts lowercase before uppercase, opposite of the workspace default we accidentally applied earlier. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The tmp + rename pattern raced with `service::clean::remove_stale_entries`, which does `read_dir(node_modules)` followed by `symlink_metadata` per entry. The spawned save creates `.utoo-manifest.json.tmp` between those two calls; if the rename fires between read_dir and stat, the per-entry stat hits ENOENT and install fails with `No such file or directory`. Reverts to the simpler non-atomic write_json_file path. Gemini's crash-safety concern remains valid in principle, but the manifest cache is opportunistic — a corrupt half-written file simply fails the next parse and falls through to a fresh resolve. That's a much smaller failure mode than racing clean_deps and breaking install on every regen. Reproduced locally via `e2e/pm/stale-lockfile`; reverting fixes it. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The helper bundled serialize + mkdir-retry + write to serve only two fire-and-forget cache writers, while the rest of the codebase keeps JSON writes inline (lock.rs, init.rs, rebuild.rs, migrate.rs). Drop the helper: - manifest_store.rs: inline `write_json` private fn that keeps the try-write-then-mkdir-on-NotFound trick (matters on cold installs with thousands of writes per package). - project_cache.rs: plain `create_dir_all + to_vec + write`. One write per install, mkdir is trivial. `read_json_file` stays — it's a real primitive with five+ callers. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
📊 pm-bench-phases ·
|
| PM | wall | ±σ | user | sys | RSS | pgMinor |
|---|---|---|---|---|---|---|
| bun | 7.43s | 0.07s | 9.90s | 5.93s | 570M | 278.0K |
| utoo-npm | 9.28s | 0.29s | 10.75s | 8.53s | 1.34G | 158.9K |
| utoo | 8.85s | 0.85s | 10.90s | 8.40s | 1.24G | 163.4K |
| PM | vCtx | iCtx | netRX | netTX | cache | node_mod | lock |
|---|---|---|---|---|---|---|---|
| bun | 27.3K | 15.6K | 1.16G | 8M | 1.83G | 1.72G | 1M |
| utoo-npm | 176.8K | 150.5K | 1.14G | 5M | 1.68G | 1.68G | 2M |
| utoo | 191.8K | 139.6K | 1.13G | 5M | 1.68G | 1.67G | 2M |
p1_resolve
| PM | wall | ±σ | user | sys | RSS | pgMinor |
|---|---|---|---|---|---|---|
| bun | 2.51s | 0.02s | 4.04s | 0.79s | 469M | 179.6K |
| utoo-npm | 5.04s | 0.03s | 4.62s | 1.20s | 431M | 71.9K |
| utoo | 5.13s | 0.10s | 4.78s | 1.43s | 441M | 80.8K |
| PM | vCtx | iCtx | netRX | netTX | cache | node_mod | lock |
|---|---|---|---|---|---|---|---|
| bun | 15.1K | 3.6K | 201M | 3M | 104M | - | 1M |
| utoo-npm | 73.0K | 1.9K | 202M | 2M | 9M | 5M | 2M |
| utoo | 107.1K | 6.3K | 197M | 2M | 7M | 3M | 2M |
p3_cold_install
| PM | wall | ±σ | user | sys | RSS | pgMinor |
|---|---|---|---|---|---|---|
| bun | 5.99s | 1.39s | 5.74s | 6.09s | 539M | 191.2K |
| utoo-npm | 6.97s | 1.79s | 5.41s | 7.01s | 877M | 111.0K |
| utoo | 5.59s | 0.54s | 5.37s | 6.95s | 976M | 113.3K |
| PM | vCtx | iCtx | netRX | netTX | cache | node_mod | lock |
|---|---|---|---|---|---|---|---|
| bun | 12.1K | 7.2K | 993M | 5M | 1.73G | 1.73G | 1M |
| utoo-npm | 118.8K | 67.0K | 964M | 3M | 1.67G | 1.67G | 2M |
| utoo | 109.0K | 73.8K | 964M | 3M | 1.67G | 1.67G | 2M |
p4_warm_link
| PM | wall | ±σ | user | sys | RSS | pgMinor |
|---|---|---|---|---|---|---|
| bun | 2.03s | 0.07s | 0.16s | 1.19s | 137M | 32.4K |
| utoo-npm | 1.85s | 0.08s | 0.54s | 2.39s | 82M | 19.0K |
| utoo | 1.84s | 0.28s | 0.55s | 2.40s | 83M | 18.8K |
| PM | vCtx | iCtx | netRX | netTX | cache | node_mod | lock |
|---|---|---|---|---|---|---|---|
| bun | 388 | 22 | 7M | 62K | 1.88G | 1.72G | 1M |
| utoo-npm | 50.2K | 18.4K | 52K | 17K | 1.67G | 1.67G | 2M |
| utoo | 50.4K | 18.2K | 50K | 25K | 1.68G | 1.67G | 2M |
npmmirror.com: no output captured.
PR #2866 (refactor/manifest-store-decoupling) landed on next while we were iterating. It restructures the manifest cache surface that this PR also heavily edits — `service/cache.rs` collapses to a pure `MemoryCache` (disk I/O moves out into the host's `ManifestStore`), `service/registry.rs` takes an injected `Arc<dyn ManifestStore>`, `build_deps` returns `BuildDepsOutput` with the project cache, and `model::FullManifest::versions` becomes `Vec<String>` (was the `Versions { keys, .. }` struct). Resolution: - service/{cache,registry,api}.rs, model/manifest.rs, traits/registry.rs: take next's verbatim — its API is the canonical post-#2866 shape and supersedes the older PR-side variants. - traits/registry.rs: re-add the `MaybeSend` / `MaybeSync` markers + the `+ MaybeSend` bound on `fetch_full_manifest` / `fetch_version_manifest` / `resolve_package` so the native preload worker-pool can drive these futures through `tokio::spawn`. Also `#[derive(Clone)]` on `MockRegistryClient` + `MockPackage` (worker-pool requires `R: Clone`). - resolver/registry.rs: realign `resolve_from_manifest` with the new model — `manifest.versions.clone()` instead of `.versions.keys.clone()`, and wrap `get_core_version`'s return with `Arc::new` for the new `ResolvedPackage::manifest: Arc<CoreVersionManifest>`. - pm/cmd/view.rs: same `versions.keys` → `versions` adjustment. - Cargo.toml: keep our `crossbeam-queue` (worker-pool SegQueue) + macOS-only aws-lc-rs + `rt-multi-thread` test feature alongside the new `async-trait` workspace dep (for `ManifestStore`). Drop the wasm-bindgen-futures dep we added in round-3 — `service/cache.rs` no longer spawns disk writes in ruborist (host's `DiskManifestStore` does it now), so the `spawn_disk_write` helper is gone. cargo clippy --all-targets -- -D warnings clean. 245 pm + 162 ruborist + 10 doctests pass (+ flaky SOCKS network test).
A local symlink (`next.js -> /Users/elr/code/utoo/next.js`) used to share the worktree's submodule with the main repo's checkout got committed in the PR #2866 merge instead of the submodule gitlink, so CI couldn't find any of the next.js sources (turbopack crates, etc.) and every job failed at `failed to get turbo-bincode as a dependency`. Restore the gitlink to next's ref `6f6135c5` (the same SHA upstream pins).
Summary
ruboristvia a newManifestStoretrait. ruborist now owns only the in-memory tier; the host (pm) provides persistence.tokio::spawndetached), so the resolver hot path no longer awaits disk I/O. ETag-validated reads stay synchronous (cold-start load required for ETag).node_modules/.utoo-manifest.json) load/save also moved out of ruborist;build_depsnow returnsBuildDepsOutput { lock, project_cache }and pm persists it in the background.What changed
ruborist
service::store(ManifestStore+NoopStore)PackageCachecollapsed into pureMemoryCache(alias kept)UnifiedRegistrybuilder takes.store(Arc<dyn ManifestStore>)instead of.cache_dir(...)build_depsreturnsBuildDepsOutput, no file I/O inside ruboristpm
util::manifest_store::DiskManifestStore— fire-and-forget writesutil::project_cachefor.utoo-manifest.jsonhelper/ruborist_context.rswires the store + warm cache + background savecmd/deps.rs,helper/lock.rs,service/pipeline/mod.rsTest plan
cargo fmt --checkcargo clippy -p utoo-ruborist -p utoo-pm --all-targets -- -D warnings --no-depscargo test -p utoo-ruborist(162 passed)cargo test -p utoo-pm(245 passed)next(especially cold install — disk writes are now off the critical path; should be neutral-to-better)🤖 Generated with Claude Code