perf(pm): prefetch lockfile registry downloads#2973

Draft
elrrrrrrr wants to merge 3 commits into perf/pm-resolver-demand-bfs from exp/pm-install-skip-seeded-registry-probe-b33d922

@elrrrrrrr
Contributor

@elrrrrrrr elrrrrrrr commented May 18, 2026

Summary

  • build on the clone-worker experiment from #2965 ("perf(pm): run install clone workers on rayon")
  • prefetch registry tarball downloads/cache lookups from an existing package-lock before depth-ordered cloning starts
  • when a clone target already has scheduler download state for the same name@version, attach directly to that state and skip the seeded HTTP slot probe
  • keep seeded cache probing for git/file/http tarball paths that are not recognized as current/npmjs/npmmirror registry tarballs
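
The last bullet implies a predicate that separates registry tarballs from git/file/http tarball paths. The following is a minimal sketch of such a check, not the PR's actual code: the test name `test_is_prefetchable_registry_tarball` appears under Validation, but the signature, the `current_registry` parameter, the `host_of` helper, and the exact matching rules here are assumptions made for illustration.

```rust
/// Hypothetical helper: extract the host part of an http(s) URL.
fn host_of(url: &str) -> Option<&str> {
    let rest = url
        .strip_prefix("https://")
        .or_else(|| url.strip_prefix("http://"))?;
    rest.split('/').next()
}

/// Hypothetical predicate (signature assumed, not taken from the PR).
/// Returns true only for tarball URLs served by the current registry,
/// npmjs, or npmmirror, using the usual `<pkg>/-/<file>.tgz` layout.
fn is_prefetchable_registry_tarball(resolved: &str, current_registry: &str) -> bool {
    const KNOWN_HOSTS: [&str; 2] = ["registry.npmjs.org", "registry.npmmirror.com"];

    // git+ssh:, file:, and other non-http schemes fall through to `false`.
    let Some(rest) = resolved
        .strip_prefix("https://")
        .or_else(|| resolved.strip_prefix("http://"))
    else {
        return false;
    };
    let Some((host, path)) = rest.split_once('/') else {
        return false;
    };

    let known_host =
        KNOWN_HOSTS.iter().any(|k| *k == host) || host_of(current_registry) == Some(host);

    // Plain http tarballs on unknown hosts keep the seeded cache probe.
    known_host && path.contains("/-/") && path.ends_with(".tgz")
}
```

Under these assumptions, a lockfile entry resolved to `https://registry.npmjs.org/lodash/-/lodash-4.17.21.tgz` would be prefetched, while a `git+ssh://` or `file:` entry would stay on the seeded-probe path.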

Hypothesis

For p3 (cold install), a lockfile install already knows every registry tarball URL, so downloads should start before the depth-ordered clone loop reaches each package. For p4 (warm link), cache-hit probes can complete ahead of clone requests, reducing per-package tokio seeded-probe churn.
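
The idea can be sketched as a map of download state keyed by `name@version`: prefetch fills it from the lockfile up front, and each clone target either attaches to an existing entry or falls back to the seeded probe. This is an illustrative model only; `Scheduler`, `DownloadState`, `ClonePlan`, and the method names are hypothetical and do not come from the PR.

```rust
use std::collections::HashMap;

/// Hypothetical state of one prefetched tarball.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum DownloadState {
    InFlight,
    Ready,
}

/// What a clone target should do when its turn comes.
#[derive(Debug, PartialEq, Eq)]
enum ClonePlan {
    /// Reuse the download/cache lookup that prefetch already started.
    Attach(DownloadState),
    /// No prefetch state exists: fall back to the seeded cache probe.
    SeededProbe,
}

#[derive(Default)]
struct Scheduler {
    states: HashMap<String, DownloadState>,
}

impl Scheduler {
    fn key(name: &str, version: &str) -> String {
        format!("{name}@{version}")
    }

    /// Called for every prefetchable lockfile entry before cloning starts.
    fn prefetch(&mut self, name: &str, version: &str) {
        self.states
            .entry(Self::key(name, version))
            .or_insert(DownloadState::InFlight);
    }

    /// Called when a download or cache hit completes.
    fn mark_ready(&mut self, name: &str, version: &str) {
        if let Some(state) = self.states.get_mut(&Self::key(name, version)) {
            *state = DownloadState::Ready;
        }
    }

    /// Called by the depth-ordered clone loop for each package.
    fn plan_clone(&self, name: &str, version: &str) -> ClonePlan {
        match self.states.get(&Self::key(name, version)) {
            Some(state) => ClonePlan::Attach(*state),
            None => ClonePlan::SeededProbe,
        }
    }
}
```

In this model a package that was prefetched never issues its own probe, which is the mechanism the hypothesis relies on for the p4 context-switch reduction.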

Validation

  • cargo fmt
  • cargo test -p utoo-pm service::install::tests::test_is_prefetchable_registry_tarball
  • cargo test -p utoo-pm service::install_scheduler::tests
  • cargo test -p utoo-pm util::cloner::tests
  • cargo clippy -p utoo-pm --all-targets -- -D warnings --no-deps

Benchmark

N=1 run 26018906981: p3 wall 4.98s, p4 wall 1.42s, p4 ctx (vCtx / iCtx) 17.5K / 10.7K.

N=2 run 26019469621 (npmjs):

| phase | utoo | same-run utoo-next | same-run utoo-npm | bun | read |
| --- | --- | --- | --- | --- | --- |
| p3 wall | 6.54s ±1.45 | 6.01s ±0.26 | 6.57s ±0.97 | 6.86s ±0.16 | mixed / unstable |
| p3 ctx | 88.4K / 53.7K | 101.2K / 48.3K | 102.3K / 53.2K | 3.7K / 7.0K | vCtx positive; iCtx mixed |
| p4 wall | 2.20s ±0.10 | 2.46s ±0.07 | 2.43s ±0.04 | 3.57s ±0.06 | stable positive |
| p4 ctx | 17.7K / 11.8K | 42.8K / 19.9K | 40.8K / 18.0K | 279 / 22 | stable positive |

Conclusion: keep as p4 candidate. For p3, test a combined branch with #2972's extract worker before folding.

@elrrrrrrr added the A-Pkg Manager (Area: Package Manager) and benchmark (Run pm-bench on PR) labels May 18, 2026
@elrrrrrrr changed the title from "perf(pm): skip seeded probe for registry installs" to "perf(pm): prefetch lockfile registry downloads" May 18, 2026
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request refactors the package cloning process by transitioning from asynchronous operations to a synchronous model executed within a Rayon worker pool. Key changes include the introduction of synchronous cloning primitives and the use of an MPSC channel for task completion signaling. Review feedback highlights a potential issue with error handling in directory creation and a regression in error context reporting when accessing source metadata.
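
As a rough illustration of the pattern the summary describes (synchronous work on a worker pool, with completion reported over an MPSC channel), here is a minimal sketch. It swaps in `std::thread::scope` for the Rayon pool so that it needs no external crates, and `clone_one` and `run_clone_tasks` are hypothetical names, not functions from the PR.

```rust
use std::sync::mpsc;
use std::thread;

/// Hypothetical stand-in for one synchronous clone operation.
fn clone_one(task: &str) -> Result<(), String> {
    if task.is_empty() {
        Err("empty clone target".to_string())
    } else {
        Ok(())
    }
}

/// Run every task on its own scoped thread and collect completion
/// messages from an MPSC channel. Results arrive in completion order,
/// not submission order.
fn run_clone_tasks(tasks: Vec<String>) -> Vec<(String, Result<(), String>)> {
    let (tx, rx) = mpsc::channel();
    thread::scope(|s| {
        for task in tasks {
            let tx = tx.clone();
            s.spawn(move || {
                let result = clone_one(&task);
                // The receiver outlives all workers, so a send error is ignored.
                let _ = tx.send((task, result));
            });
        }
        // Drop the original sender so the receiver loop ends once
        // every worker has sent its message and dropped its clone.
        drop(tx);
        rx.iter().collect()
    })
}
```

A real pool would bound the number of workers; this sketch spawns one thread per task only to keep the signaling logic visible.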

Comment on lines +128 to +134
    if created_dirs.insert(dir.clone())
        && let Err(e) = fs::create_dir_all(dir)
        && e.kind() != io::ErrorKind::AlreadyExists
    {
        return Err(e).with_context(|| err_msg.clone());
    }
}

high

The check `e.kind() != io::ErrorKind::AlreadyExists` is likely incorrect here. According to the `std::fs::create_dir_all` documentation, the function returns `Ok(())` if the directory already exists. If it returns an error with `AlreadyExists`, it typically means a non-directory file exists at that path. Swallowing this error will cause the clone operation to proceed and fail later with a more confusing error (e.g., `NotADirectory` when trying to create files inside it).
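
One way to act on this feedback is to drop the `AlreadyExists` filter and propagate every error from `create_dir_all`. The sketch below is an assumption about how that could look, not the PR's code: `ensure_dir` is a hypothetical helper, and it records a directory in `created_dirs` only after creation succeeds.

```rust
use std::collections::HashSet;
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical helper: create `dir` at most once per run and surface
/// every failure, including the case where a regular file occupies the path.
fn ensure_dir(created_dirs: &mut HashSet<PathBuf>, dir: &Path) -> io::Result<()> {
    if !created_dirs.contains(dir) {
        // `create_dir_all` already returns Ok(()) for an existing directory,
        // so no error kind needs to be filtered out here.
        fs::create_dir_all(dir)?;
        created_dirs.insert(dir.to_path_buf());
    }
    Ok(())
}
```

With this shape, a path that is occupied by a regular file fails at directory creation rather than later, when files are written into it.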

Comment on lines +107 to +116
    pub fn clone_dir_sync(src: &Path, dst: &Path) -> Result<()> {
        let err_msg = format!("Failed to clone {} to {}", src.display(), dst.display());

        if !fs::metadata(src)?.is_dir() {
            return Err(io::Error::new(
                io::ErrorKind::NotADirectory,
                "Source is not a directory",
            ))
            .with_context(|| err_msg);
        }

medium

The error returned by `fs::metadata(src)?` is not wrapped in the `err_msg` context. This is a regression from the previous implementation where the entire block was wrapped. If the source directory is missing, the error message will be a generic 'No such file or directory' instead of including the specific source and destination paths.

Suggested change

    -pub fn clone_dir_sync(src: &Path, dst: &Path) -> Result<()> {
    -    let err_msg = format!("Failed to clone {} to {}", src.display(), dst.display());
    -    if !fs::metadata(src)?.is_dir() {
    -        return Err(io::Error::new(
    -            io::ErrorKind::NotADirectory,
    -            "Source is not a directory",
    -        ))
    -        .with_context(|| err_msg);
    -    }
    +pub fn clone_dir_sync(src: &Path, dst: &Path) -> Result<()> {
    +    let err_msg = format!("Failed to clone {} to {}", src.display(), dst.display());
    +    let metadata = fs::metadata(src).with_context(|| err_msg.clone())?;
    +    if !metadata.is_dir() {
    +        return Err(io::Error::new(
    +            io::ErrorKind::NotADirectory,
    +            "Source is not a directory",
    +        ))
    +        .with_context(|| err_msg);
    +    }

@elrrrrrrr
Contributor Author

Benchmark readout for run 26018906981 (npmjs, label-triggered):

| phase | utoo | same-run utoo-next | same-run utoo-npm | bun | read |
| --- | --- | --- | --- | --- | --- |
| p3 wall | 4.98s ±0.32 | 5.90s ±0.33 | 9.10s ±5.59 | 5.31s ±0.23 | positive, current best wall |
| p3 ctx | 98.6K / 57.8K | 123.7K / 54.3K | 129.3K / 49.5K | 9.8K / 7.5K | vCtx improves; iCtx mixed |
| p4 wall | 1.42s ±0.06 | 1.79s ±0.05 | 1.99s ±0.19 | 2.06s ±0.03 | strong positive |
| p4 ctx | 17.5K / 10.7K | 39.6K / 16.3K | 39.9K / 16.5K | 303 / 16 | strong positive, better than #2965 |

Conclusion: this is the best p4 result so far and p3 wall is also the best observed in this loop. The remaining question is p3 iCtx: prefetching lockfile downloads starts network/extract earlier and wins wall, but #2972 had lower p3 iCtx. I am re-running N=2 before folding, and then we can decide whether to keep this alone or test it combined with extract-on-rayon.

@elrrrrrrr added the benchmark (Run pm-bench on PR) label and removed the benchmark (Run pm-bench on PR) label May 18, 2026
@elrrrrrrr
Contributor Author

N=2 readout for run 26019469621 (npmjs, label-triggered):

| phase | utoo | same-run utoo-next | same-run utoo-npm | bun | read |
| --- | --- | --- | --- | --- | --- |
| p3 wall | 6.54s ±1.45 | 6.01s ±0.26 | 6.57s ±0.97 | 6.86s ±0.16 | mixed / unstable |
| p3 ctx | 88.4K / 53.7K | 101.2K / 48.3K | 102.3K / 53.2K | 3.7K / 7.0K | vCtx positive; iCtx mixed |
| p4 wall | 2.20s ±0.10 | 2.46s ±0.07 | 2.43s ±0.04 | 3.57s ±0.06 | stable positive |
| p4 ctx | 17.7K / 11.8K | 42.8K / 19.9K | 40.8K / 18.0K | 279 / 22 | stable positive, current best ctx class |

Updated conclusion: lockfile registry prefetch is a strong p4 optimization and keeps p4 ctx below the clone-worker-only result. p3 wall was excellent in N=1 but not stable in N=2, so this should not be treated as the p3 answer by itself. Next experiment should combine this with #2972's extract worker to see if we can keep the p4 win and recover #2972-level p3 ctx.
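
One rough way to read the "unstable" versus "stable" labels is to check whether the mean ± σ ranges of two columns overlap. The helper below is a sketch of that heuristic only; it is not how pm-bench classifies results, and with N=2 the σ values are coarse. The function name is hypothetical.

```rust
/// Hypothetical helper: true if [mean_a ± sd_a] and [mean_b ± sd_b] overlap.
fn one_sigma_overlap(mean_a: f64, sd_a: f64, mean_b: f64, sd_b: f64) -> bool {
    let (lo_a, hi_a) = (mean_a - sd_a, mean_a + sd_a);
    let (lo_b, hi_b) = (mean_b - sd_b, mean_b + sd_b);
    lo_a <= hi_b && lo_b <= hi_a
}

/// N=2 run 26019469621, utoo vs same-run utoo-next.
fn p3_wall_overlaps() -> bool {
    // 6.54s ±1.45 vs 6.01s ±0.26: the ranges overlap.
    one_sigma_overlap(6.54, 1.45, 6.01, 0.26)
}

fn p4_wall_overlaps() -> bool {
    // 2.20s ±0.10 vs 2.46s ±0.07: the ranges are disjoint.
    one_sigma_overlap(2.20, 0.10, 2.46, 0.07)
}
```

Under this heuristic the p3 wall difference cannot be separated from run-to-run noise, while the p4 wall gap exceeds the spread of both measurements, which is consistent with the readout above.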

@github-actions

📊 pm-bench-phases · 84933fe · linux (ubuntu-latest)

Workflow run — ant-design

PMs: utoo (this branch) · utoo-npm (latest published) · bun (latest)

npmjs.org

p0_full_cold

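| PM | wall | ±σ | user | sys | RSS | pgMinor |
| --- | --- | --- | --- | --- | --- | --- |
| bun | 9.13s | 0.19s | 10.56s | 10.40s | 670M | 319.7K |
| utoo-next | 8.89s | 0.74s | 10.74s | 12.86s | 1010M | 122.3K |
| utoo-npm | 8.26s | 0.12s | 10.92s | 12.61s | 988M | 124.6K |
| utoo | 8.92s | 1.78s | 11.40s | 12.55s | 917M | 140.7K |

| PM | vCtx | iCtx | netRX | netTX | cache | node_mod | lock |
| --- | --- | --- | --- | --- | --- | --- | --- |
| bun | 17.1K | 19.4K | 1.20G | 7M | 1.88G | 1.76G | 1M |
| utoo-next | 142.6K | 105.4K | 1.17G | 5M | 1.73G | 1.72G | 2M |
| utoo-npm | 130.3K | 97.1K | 1.17G | 5M | 1.73G | 1.72G | 2M |
| utoo | 112.8K | 67.9K | 1.18G | 7M | 1.72G | 1.72G | 2M |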

p1_resolve

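| PM | wall | ±σ | user | sys | RSS | pgMinor |
| --- | --- | --- | --- | --- | --- | --- |
| bun | 2.41s | 0.10s | 3.97s | 1.13s | 480M | 185.8K |
| utoo-next | 3.32s | 0.06s | 5.75s | 2.23s | 624M | 90.8K |
| utoo-npm | 3.39s | 0.09s | 5.78s | 2.28s | 620M | 86.8K |
| utoo | 2.58s | 0.04s | 6.18s | 1.81s | 660M | 121.8K |

| PM | vCtx | iCtx | netRX | netTX | cache | node_mod | lock |
| --- | --- | --- | --- | --- | --- | --- | --- |
| bun | 12.4K | 3.7K | 203M | 3M | 108M | - | 1M |
| utoo-next | 79.4K | 91.9K | 201M | 3M | 7M | 3M | 2M |
| utoo-npm | 79.5K | 94.6K | 201M | 3M | 7M | 3M | 2M |
| utoo | 15.7K | 19.8K | 204M | 3M | 7M | 3M | 2M |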

p3_cold_install

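| PM | wall | ±σ | user | sys | RSS | pgMinor |
| --- | --- | --- | --- | --- | --- | --- |
| bun | 6.88s | 0.18s | 6.44s | 10.24s | 573M | 204.5K |
| utoo-next | 6.31s | 0.23s | 5.06s | 11.15s | 460M | 59.0K |
| utoo-npm | 6.43s | 0.20s | 5.21s | 10.97s | 484M | 59.8K |
| utoo | 5.91s | 0.15s | 4.96s | 10.90s | 484M | 58.9K |

| PM | vCtx | iCtx | netRX | netTX | cache | node_mod | lock |
| --- | --- | --- | --- | --- | --- | --- | --- |
| bun | 6.6K | 8.1K | 1.00G | 4M | 1.77G | 1.77G | 1M |
| utoo-next | 102.6K | 54.0K | 999M | 3M | 1.72G | 1.72G | 2M |
| utoo-npm | 99.3K | 51.7K | 999M | 2M | 1.72G | 1.72G | 2M |
| utoo | 85.2K | 55.4K | 999M | 3M | 1.72G | 1.72G | 2M |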

p4_warm_link

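| PM | wall | ±σ | user | sys | RSS | pgMinor |
| --- | --- | --- | --- | --- | --- | --- |
| bun | 3.34s | 0.01s | 0.20s | 2.35s | 135M | 32.3K |
| utoo-next | 2.37s | 0.02s | 0.50s | 3.88s | 81M | 18.9K |
| utoo-npm | 2.31s | 0.14s | 0.52s | 3.85s | 81M | 19.0K |
| utoo | 2.05s | 0.01s | 0.41s | 3.51s | 51M | 11.2K |

| PM | vCtx | iCtx | netRX | netTX | cache | node_mod | lock |
| --- | --- | --- | --- | --- | --- | --- | --- |
| bun | 343 | 25 | 5M | 23K | 1.93G | 1.74G | 1M |
| utoo-next | 43.8K | 20.2K | 308K | 28K | 1.72G | 1.72G | 3M |
| utoo-npm | 42.9K | 20.2K | 306K | 10K | 1.72G | 1.72G | 3M |
| utoo | 18.1K | 12.0K | 305K | 4K | 1.73G | 1.72G | 3M |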

npmmirror.com: no output captured.
