perf(pm): prefetch lockfile registry downloads#2973
Conversation
There was a problem hiding this comment.
Code Review
This pull request refactors the package cloning process by transitioning from asynchronous operations to a synchronous model executed within a Rayon worker pool. Key changes include the introduction of synchronous cloning primitives and the use of an MPSC channel for task completion signaling. Review feedback highlights a potential issue with error handling in directory creation and a regression in error context reporting when accessing source metadata.
| if created_dirs.insert(dir.clone()) | ||
| && let Err(e) = fs::create_dir_all(dir) | ||
| && e.kind() != io::ErrorKind::AlreadyExists | ||
| { | ||
| return Err(e).with_context(|| err_msg.clone()); | ||
| } | ||
| } |
There was a problem hiding this comment.
The check e.kind() != io::ErrorKind::AlreadyExists is likely incorrect here. According to the std::fs::create_dir_all documentation, the function returns Ok(()) if the directory already exists. If it returns an error with AlreadyExists, it typically means a non-directory file exists at that path. Swallowing this error will cause the clone operation to proceed and fail later with a more confusing error (e.g., NotADirectory when trying to create files inside it).
| pub fn clone_dir_sync(src: &Path, dst: &Path) -> Result<()> { | ||
| let err_msg = format!("Failed to clone {} to {}", src.display(), dst.display()); | ||
|
|
||
| if !fs::metadata(src)?.is_dir() { | ||
| return Err(io::Error::new( | ||
| io::ErrorKind::NotADirectory, | ||
| "Source is not a directory", | ||
| )) | ||
| .with_context(|| err_msg); | ||
| } |
There was a problem hiding this comment.
The error returned by fs::metadata(src)? is not wrapped in the err_msg context. This is a regression from the previous implementation where the entire block was wrapped. If the source directory is missing, the error message will be a generic 'No such file or directory' instead of including the specific source and destination paths.
| pub fn clone_dir_sync(src: &Path, dst: &Path) -> Result<()> { | |
| let err_msg = format!("Failed to clone {} to {}", src.display(), dst.display()); | |
| if !fs::metadata(src)?.is_dir() { | |
| return Err(io::Error::new( | |
| io::ErrorKind::NotADirectory, | |
| "Source is not a directory", | |
| )) | |
| .with_context(|| err_msg); | |
| } | |
| pub fn clone_dir_sync(src: &Path, dst: &Path) -> Result<()> { | |
| let err_msg = format!("Failed to clone {} to {}", src.display(), dst.display()); | |
| let metadata = fs::metadata(src).with_context(|| err_msg.clone())?; | |
| if !metadata.is_dir() { | |
| return Err(io::Error::new( | |
| io::ErrorKind::NotADirectory, | |
| "Source is not a directory", | |
| )) | |
| .with_context(|| err_msg); | |
| } |
|
Benchmark readout for run 26018906981 (npmjs, label-triggered):
Conclusion: this is the best p4 result so far and p3 wall is also the best observed in this loop. The remaining question is p3 iCtx: prefetching lockfile downloads starts network/extract earlier and wins wall, but #2972 had lower p3 iCtx. I am re-running N=2 before folding, and then we can decide whether to keep this alone or test it combined with extract-on-rayon. |
|
N=2 readout for run 26019469621 (npmjs, label-triggered):
Updated conclusion: lockfile registry prefetch is a strong p4 optimization and keeps p4 ctx below the clone-worker-only result. p3 wall was excellent in N=1 but not stable in N=2, so this should not be treated as the p3 answer by itself. Next experiment should combine this with #2972's extract worker to see if we can keep the p4 win and recover #2972-level p3 ctx. |
📊 pm-bench-phases ·
|
| PM | wall | ±σ | user | sys | RSS | pgMinor |
|---|---|---|---|---|---|---|
| bun | 9.13s | 0.19s | 10.56s | 10.40s | 670M | 319.7K |
| utoo-next | 8.89s | 0.74s | 10.74s | 12.86s | 1010M | 122.3K |
| utoo-npm | 8.26s | 0.12s | 10.92s | 12.61s | 988M | 124.6K |
| utoo | 8.92s | 1.78s | 11.40s | 12.55s | 917M | 140.7K |
| PM | vCtx | iCtx | netRX | netTX | cache | node_mod | lock |
|---|---|---|---|---|---|---|---|
| bun | 17.1K | 19.4K | 1.20G | 7M | 1.88G | 1.76G | 1M |
| utoo-next | 142.6K | 105.4K | 1.17G | 5M | 1.73G | 1.72G | 2M |
| utoo-npm | 130.3K | 97.1K | 1.17G | 5M | 1.73G | 1.72G | 2M |
| utoo | 112.8K | 67.9K | 1.18G | 7M | 1.72G | 1.72G | 2M |
p1_resolve
| PM | wall | ±σ | user | sys | RSS | pgMinor |
|---|---|---|---|---|---|---|
| bun | 2.41s | 0.10s | 3.97s | 1.13s | 480M | 185.8K |
| utoo-next | 3.32s | 0.06s | 5.75s | 2.23s | 624M | 90.8K |
| utoo-npm | 3.39s | 0.09s | 5.78s | 2.28s | 620M | 86.8K |
| utoo | 2.58s | 0.04s | 6.18s | 1.81s | 660M | 121.8K |
| PM | vCtx | iCtx | netRX | netTX | cache | node_mod | lock |
|---|---|---|---|---|---|---|---|
| bun | 12.4K | 3.7K | 203M | 3M | 108M | - | 1M |
| utoo-next | 79.4K | 91.9K | 201M | 3M | 7M | 3M | 2M |
| utoo-npm | 79.5K | 94.6K | 201M | 3M | 7M | 3M | 2M |
| utoo | 15.7K | 19.8K | 204M | 3M | 7M | 3M | 2M |
p3_cold_install
| PM | wall | ±σ | user | sys | RSS | pgMinor |
|---|---|---|---|---|---|---|
| bun | 6.88s | 0.18s | 6.44s | 10.24s | 573M | 204.5K |
| utoo-next | 6.31s | 0.23s | 5.06s | 11.15s | 460M | 59.0K |
| utoo-npm | 6.43s | 0.20s | 5.21s | 10.97s | 484M | 59.8K |
| utoo | 5.91s | 0.15s | 4.96s | 10.90s | 484M | 58.9K |
| PM | vCtx | iCtx | netRX | netTX | cache | node_mod | lock |
|---|---|---|---|---|---|---|---|
| bun | 6.6K | 8.1K | 1.00G | 4M | 1.77G | 1.77G | 1M |
| utoo-next | 102.6K | 54.0K | 999M | 3M | 1.72G | 1.72G | 2M |
| utoo-npm | 99.3K | 51.7K | 999M | 2M | 1.72G | 1.72G | 2M |
| utoo | 85.2K | 55.4K | 999M | 3M | 1.72G | 1.72G | 2M |
p4_warm_link
| PM | wall | ±σ | user | sys | RSS | pgMinor |
|---|---|---|---|---|---|---|
| bun | 3.34s | 0.01s | 0.20s | 2.35s | 135M | 32.3K |
| utoo-next | 2.37s | 0.02s | 0.50s | 3.88s | 81M | 18.9K |
| utoo-npm | 2.31s | 0.14s | 0.52s | 3.85s | 81M | 19.0K |
| utoo | 2.05s | 0.01s | 0.41s | 3.51s | 51M | 11.2K |
| PM | vCtx | iCtx | netRX | netTX | cache | node_mod | lock |
|---|---|---|---|---|---|---|---|
| bun | 343 | 25 | 5M | 23K | 1.93G | 1.74G | 1M |
| utoo-next | 43.8K | 20.2K | 308K | 28K | 1.72G | 1.72G | 3M |
| utoo-npm | 42.9K | 20.2K | 306K | 10K | 1.72G | 1.72G | 3M |
| utoo | 18.1K | 12.0K | 305K | 4K | 1.73G | 1.72G | 3M |
npmmirror.com: no output captured.
Summary
Hypothesis
For p3, lockfile install already knows every registry tarball URL, so downloads should start before the depth-ordered clone loop reaches each package. For p4, cache-hit probes can complete ahead of clone requests, reducing per-package tokio seeded-probe churn.
Validation
Benchmark
N=1 run 26018906981: p3 wall 4.98s, p4 wall 1.42s, p4 ctx 17.5K / 10.7K.
N=2 run 26019469621 (npmjs):
Conclusion: keep as p4 candidate. For p3, test a combined branch with #2972's extract worker before folding.