perf(pm): batch install clone completions #2989
Code Review
This pull request introduces batching for clone operations in the installation scheduler to improve performance by reducing scheduler wakeups. It replaces individual clone tasks with batches of up to three operations and introduces a worker-based concurrency limit. Feedback indicates that the heuristic used to calculate the worker limit causes total concurrency to scale non-linearly and potentially exceed intended limits. It is suggested to either refine the formula to account for the batch size or document the rationale for the current implementation.
    fn clone_worker_limit(clone_limit: usize) -> usize {
        clone_limit
            .saturating_div(2)
            .saturating_add(2)
            .clamp(1, clone_limit.max(1))
    }
The clone_worker_limit calculation uses a heuristic (limit / 2 + 2) that doesn't explicitly account for CLONE_BATCH_LIMIT. This results in a total potential concurrency (workers * batch size) that scales non-linearly with the original clone_limit. For example, a clone_limit of 4 results in 4 workers (up to 12 concurrent clones), while a clone_limit of 16 results in 10 workers (up to 30 concurrent clones).
If the intention is to maintain a total concurrency close to the original clone_limit while batching, consider a formula like (clone_limit / CLONE_BATCH_LIMIT).max(1). If the increased concurrency is intentional to saturate the Rayon pool, documenting the rationale for this specific heuristic would improve maintainability.
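The arithmetic in this comment can be checked with a small sketch. It assumes `CLONE_BATCH_LIMIT` is 3, per the "batches of up to three operations" wording in the review summary; `suggested_worker_limit` and `max_claimed_clones` are hypothetical helper names used only for illustration, and `workers * batch size` is the reviewer's upper bound rather than a measured concurrency figure.

```rust
/// Assumed batch size, taken from the review summary ("up to three operations").
const CLONE_BATCH_LIMIT: usize = 3;

/// Heuristic quoted in the review: limit / 2 + 2, clamped to [1, max(limit, 1)].
fn clone_worker_limit(clone_limit: usize) -> usize {
    (clone_limit / 2)
        .saturating_add(2)
        .clamp(1, clone_limit.max(1))
}

/// Alternative suggested in the review: keep workers * batch size near clone_limit.
/// Hypothetical name, not part of the PR.
fn suggested_worker_limit(clone_limit: usize) -> usize {
    (clone_limit / CLONE_BATCH_LIMIT).max(1)
}

/// Upper bound on clones claimed at once if every worker holds a full batch.
/// Hypothetical helper that mirrors the reviewer's workers * batch-size estimate.
fn max_claimed_clones(workers: usize) -> usize {
    workers.saturating_mul(CLONE_BATCH_LIMIT)
}
```

With these definitions, a `clone_limit` of 4 gives 4 workers and a bound of 12, and a `clone_limit` of 16 gives 10 workers and a bound of 30, matching the figures above. The suggested formula gives 1 and 5 workers for the same inputs (bounds of 3 and 15).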
📊 pm-bench-phases

| PM | wall | ±σ | user | sys | RSS | pgMinor |
|---|---|---|---|---|---|---|
| bun | 9.00s | 0.22s | 10.25s | 9.90s | 763M | 336.4K |
| utoo-next | 7.94s | 0.37s | 10.53s | 12.21s | 1002M | 123.0K |
| utoo-npm | 8.07s | 0.10s | 10.69s | 12.06s | 1.00G | 126.3K |
| utoo | 7.65s | 0.24s | 11.07s | 10.86s | 892M | 145.2K |

| PM | vCtx | iCtx | netRX | netTX | cache | node_mod | lock |
|---|---|---|---|---|---|---|---|
| bun | 15.6K | 18.2K | 1.19G | 6M | 1.86G | 1.75G | 1M |
| utoo-next | 129.1K | 96.1K | 1.16G | 5M | 1.71G | 1.70G | 2M |
| utoo-npm | 129.5K | 97.9K | 1.16G | 5M | 1.71G | 1.70G | 2M |
| utoo | 81.0K | 51.4K | 1.16G | 6M | 1.71G | 1.70G | 2M |

p1_resolve

| PM | wall | ±σ | user | sys | RSS | pgMinor |
|---|---|---|---|---|---|---|
| bun | 2.09s | 0.08s | 3.92s | 1.15s | 521M | 186.7K |
| utoo-next | 3.11s | 0.36s | 5.05s | 2.17s | 611M | 85.5K |
| utoo-npm | 3.00s | 0.02s | 5.21s | 2.14s | 606M | 81.0K |
| utoo | 2.30s | 0.03s | 5.71s | 1.65s | 644M | 120.4K |

| PM | vCtx | iCtx | netRX | netTX | cache | node_mod | lock |
|---|---|---|---|---|---|---|---|
| bun | 10.2K | 4.2K | 202M | 3M | 107M | - | 1M |
| utoo-next | 74.7K | 86.1K | 200M | 2M | 7M | 3M | 2M |
| utoo-npm | 73.9K | 92.3K | 200M | 2M | 7M | 3M | 2M |
| utoo | 15.8K | 19.7K | 202M | 3M | 7M | 3M | 2M |

p3_cold_install

| PM | wall | ±σ | user | sys | RSS | pgMinor |
|---|---|---|---|---|---|---|
| bun | 6.68s | 0.06s | 6.31s | 9.56s | 581M | 192.1K |
| utoo-next | 6.48s | 0.32s | 5.04s | 10.74s | 505M | 60.0K |
| utoo-npm | 6.14s | 0.22s | 5.13s | 10.61s | 475M | 60.4K |
| utoo | 5.47s | 0.15s | 5.04s | 9.36s | 557M | 63.7K |

| PM | vCtx | iCtx | netRX | netTX | cache | node_mod | lock |
|---|---|---|---|---|---|---|---|
| bun | 5.3K | 7.0K | 1019M | 4M | 1.76G | 1.76G | 1M |
| utoo-next | 115.0K | 56.6K | 989M | 3M | 1.70G | 1.70G | 2M |
| utoo-npm | 110.7K | 54.9K | 989M | 3M | 1.70G | 1.70G | 2M |
| utoo | 68.8K | 42.3K | 989M | 3M | 1.70G | 1.70G | 2M |

p4_warm_link

| PM | wall | ±σ | user | sys | RSS | pgMinor |
|---|---|---|---|---|---|---|
| bun | 3.54s | 0.07s | 0.19s | 2.42s | 135M | 33.5K |
| utoo-next | 2.48s | 0.29s | 0.49s | 3.80s | 79M | 18.5K |
| utoo-npm | 2.38s | 0.08s | 0.47s | 3.73s | 77M | 17.7K |
| utoo | 2.13s | 0.05s | 0.34s | 3.26s | 50M | 11.2K |

| PM | vCtx | iCtx | netRX | netTX | cache | node_mod | lock |
|---|---|---|---|---|---|---|---|
| bun | 241 | 25 | 5M | 37K | 1.91G | 1.75G | 1M |
| utoo-next | 43.7K | 21.4K | 5K | 4K | 1.70G | 1.70G | 2M |
| utoo-npm | 39.3K | 18.2K | 4K | 8K | 1.70G | 1.70G | 2M |
| utoo | 7.6K | 4.5K | 5K | 23K | 1.71G | 1.70G | 2M |

npmmirror.com: no output captured.
Summary
Why
p3/p4 are dominated by materialization pressure after the earlier install scheduler changes. Batching clone completions reduces scheduler wakeups, but parent packages are on the nested placement critical path. If a parent clone completes early inside a batch but its completion is reported only after other leaf clones finish, its children cannot enter the queue. The #2991 follow-up fixes that without disabling batching for ordinary leaf clones.
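One way to picture the ordering fix described above is the sketch below. It is an illustration under stated assumptions, not the scheduler code from #2989 or #2991: `CloneJob`, `has_children`, and `run_batch` are hypothetical names, and the clone work itself is elided. A worker walks its batch in order, reports a parent's completion as soon as that parent finishes, and still coalesces leaf completions into one report at the end of the batch.

```rust
/// Hypothetical clone job. `has_children` marks a parent package whose
/// completion unblocks nested placement of its children.
#[derive(Clone, Debug)]
struct CloneJob {
    name: String,
    has_children: bool,
}

/// Runs one batch in order and reports completions through `report`.
/// Parents are reported immediately so their children can be queued;
/// leaf completions are buffered and reported once, keeping wakeups low.
fn run_batch<F>(batch: &[CloneJob], mut report: F)
where
    F: FnMut(Vec<String>),
{
    let mut finished_leaves = Vec::new();
    for job in batch {
        // The actual clone/materialize work would happen here (elided).
        if job.has_children {
            // Critical path: do not wait for the rest of the batch.
            report(vec![job.name.clone()]);
        } else {
            finished_leaves.push(job.name.clone());
        }
    }
    if !finished_leaves.is_empty() {
        // One batched report for all ordinary leaf clones.
        report(finished_leaves);
    }
}
```

For a batch ordered leaf, parent, leaf, this emits two reports: the parent alone as soon as it finishes, then both leaves together. A batch containing only leaves still produces a single report.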
Validation
- `cargo fmt`
- `cargo test -p utoo-pm install_scheduler`
- `cargo clippy -p utoo-pm --all-targets -- -D warnings --no-deps`

Note: full workspace `cargo clippy --all-targets -- -D warnings --no-deps` is currently blocked in this worktree by a pack-core/next.js submodule API mismatch, unrelated to the PM scheduler change.

Benchmark
- Latest #2989 reference before #2991 integration: GHA Linux npmjs run 26089858682
- #2991 AB before integration: GHA Linux npmjs run 26093544293
- Integrated #2989 head (fd38cfff): GHA Linux npmjs run 26094747554
- Latest full green integrated run: 26095851789

Conclusion: #2991 remains directionally positive for p4/warm materialize (2.23s -> 2.07s/2.16s, ctx stable). p3 is neutral to slightly noisy, which is acceptable because this change targets clone/materialize ordering rather than cold extract/cache population.

Additional GHA Reruns
- Label-triggered Linux npmjs rerun 26102023925
- Label-triggered Linux npmjs rerun 26102983933
- Latest label-triggered Linux npmjs rerun 26110858368
- Label-triggered Linux npmjs rerun 26112228488

p4 stays stable across runs (2.07s/2.16s/2.08s/2.20s/2.11s, ctx about 7.5K/4.5K). p3 is noisier because it includes network download plus cache extraction writes. Use multiple runs for p3 rather than a single sample.

- Label-triggered Linux npmjs rerun 26115911963 (bench job succeeded)
- GHA pcap diagnostic run 26116045424 (diagnostic-only; pcap overhead/noise makes wall time unsuitable for ranking)

Conclusion from pcap: p3 cold install does not show an obvious socket-drain starvation issue for utoo; utoo has fewer zero-window/retransmit events than bun in the install capture. The stronger signal is disk-write tail latency (io_util_max=97.3%, w_await_max=481ms), so the remaining p3 work should focus on cache/extract/write scheduling rather than raising generic tarball concurrency.

- Label-triggered Linux npmjs rerun 26117122045 (bench job succeeded; p3 wall noisy)
- Label-triggered Linux npmjs rerun 26118716599 (bench job succeeded)

Rejected / Inconclusive Follow-ups
- (… 6.22s, 56.8K/36.5K); reject for now.
- (… 4.84s) but ctx regressed (59.0K/39.2K) and p4 no-op control moved heavily; inconclusive.
- (… p3 5.38s, p4 2.12s); extra completion state is not justified.
- (… 5.75s, 59.7K/33.9K); do not integrate.
- (… p3 5.68s, 62.3K/40.6K; p0 also regressed); do not integrate.
- (… 8.02s, 78.6K/46.1K); do not integrate.
- (… 8.03s, 77.1K/40.2K; rerun 6.35s, 68.7K/39.8K); do not use as a global/non-semver default.
- (… p3 7.88s, 82.2K/29.4K) and p4 control was noisy; do not integrate.
- (… 5.56s, 64.6K/41.2K) and p0 ctx regressed (90.5K/56.3K); poolab internal AB also showed higher ctx/sys. Do not integrate.
- (… `4 * extract_limit` … 6.12s, 74.5K/43.4K vs 5.83s, 59.7K/39.0K); poolab internal AB also showed slow-tarball tail risk. Do not integrate.

Current pick status: #2991 is already integrated into this PR. No other follow-up experiment has enough GHA evidence to pick into #2989.
Label-triggered Linux npmjs rerun 26119947793 (bench job succeeded).

This keeps #2989 in the same GHA/npmjs band: p3 remains noisy but competitive on wall time, while p4 stays stable around 2.1s and 7.5K/4.5K ctx. This run did not capture npmmirror output.
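The "stable around 2.1s" reading, and the advice to judge p3 over several runs, can be made concrete with a small summary helper. This is a minimal sketch: `mean_and_stddev` is a hypothetical name, not part of the benchmark tooling, and it uses the sample standard deviation (n - 1 denominator).

```rust
/// Sample mean and sample standard deviation (n - 1 denominator).
/// Returns None for fewer than two samples, where spread is undefined.
fn mean_and_stddev(samples: &[f64]) -> Option<(f64, f64)> {
    if samples.len() < 2 {
        return None;
    }
    let n = samples.len() as f64;
    let mean = samples.iter().sum::<f64>() / n;
    // Sum of squared deviations from the mean, divided by n - 1.
    let variance = samples
        .iter()
        .map(|x| (x - mean).powi(2))
        .sum::<f64>()
        / (n - 1.0);
    Some((mean, variance.sqrt()))
}
```

Applied to the five p4 wall times quoted in the reruns (2.07, 2.16, 2.08, 2.20, 2.11 seconds), this gives a mean of about 2.12s with a standard deviation of roughly 0.06s. The same helper can be run over several p3 samples before comparing wall times.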