perf(pm): batch install clone completions by elrrrrrrr · Pull Request #2989 · utooland/utoo

elrrrrrrr · 2026-05-19T08:55:46Z

Summary

Batch install clone completions so the scheduler receives one completion message for a small group of clone jobs instead of one per package.
Keep clone materialization on rayon workers; the scheduler only dispatches work and records completion state.
Use a bounded clone worker count derived from the existing clone concurrency limit, with a small fixed batch size to avoid long serialized hardlink runs.
Integrate #2991 (perf(pm): prioritize clone unblockers): prioritize parent clone unblockers and send them as single-job batches so nested dependency placement is not delayed behind unrelated hardlink work in the same batch.
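
The batching policy in the Summary can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in this PR: `CloneJob`, `unblocks_children`, and `plan_batches` are hypothetical names, and the batch size of 3 is taken from the review summary ("batches of up to three operations").

```rust
/// Hypothetical clone job; the scheduler's real types are not shown in this PR description.
#[derive(Debug, Clone, PartialEq)]
struct CloneJob {
    name: String,
    /// True when nested dependencies wait on this package being placed.
    unblocks_children: bool,
}

/// Assumed batch size ("batches of up to three operations" in the review summary).
const CLONE_BATCH_LIMIT: usize = 3;

/// Split pending jobs into batches: every parent unblocker gets its own
/// single-job batch and is ordered first; leaf clones are grouped in small chunks.
fn plan_batches(pending: Vec<CloneJob>) -> Vec<Vec<CloneJob>> {
    let (unblockers, leaves): (Vec<_>, Vec<_>) =
        pending.into_iter().partition(|job| job.unblocks_children);

    // One batch per unblocker, so its completion is never held back by a neighbor.
    let mut batches: Vec<Vec<CloneJob>> =
        unblockers.into_iter().map(|job| vec![job]).collect();

    // Leaf clones share one completion message per chunk.
    batches.extend(leaves.chunks(CLONE_BATCH_LIMIT).map(|chunk| chunk.to_vec()));
    batches
}
```

With one parent and four leaves pending, this sketch yields three batches: the parent alone, then a batch of three leaves, then the remaining leaf.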

Why

p3/p4 are dominated by materialization pressure after the earlier install scheduler changes. Batching clone completions reduces scheduler wakeups, but parent packages are on the nested placement critical path. If a parent clone finishes early inside a batch but its completion is reported only after the other leaf clones in that batch finish, its children cannot enter the queue until then. The #2991 follow-up fixes that without disabling batching for ordinary leaf clones.
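
A toy timing model makes the delay concrete. It assumes the jobs in a batch run serially on one worker; the function name `reported_at` and any durations passed to it are illustrative, not measurements from this PR.

```rust
/// Time (in ms from batch start) at which the scheduler learns that job
/// `index` finished, given per-job durations for one serialized batch.
/// With batched completions the report arrives only when the whole batch is done.
fn reported_at(durations_ms: &[u64], index: usize, batched: bool) -> u64 {
    if batched {
        // One completion message for the whole batch.
        durations_ms.iter().sum()
    } else {
        // Per-job completion: reported as soon as this job and its predecessors finish.
        durations_ms[..=index].iter().sum()
    }
}
```

For made-up durations of `[10, 40, 40]` ms with the parent first, the parent is reported at 90 ms when batched with two leaves and at 10 ms when sent as its own single-job batch, which is the gap #2991 is meant to close.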

Validation

cargo fmt
cargo test -p utoo-pm install_scheduler
cargo clippy -p utoo-pm --all-targets -- -D warnings --no-deps

Note: the full-workspace `cargo clippy --all-targets -- -D warnings --no-deps` run is currently blocked in this worktree by a pack-core/next.js submodule API mismatch, which is unrelated to the PM scheduler change.

Benchmark

Latest #2989 reference before #2991 integration, GHA Linux npmjs run 26089858682:

phase	utoo wall	utoo vCtx/iCtx
p0_full_cold	7.48s	66.5K / 43.2K
p1_resolve	2.37s	15.2K / 18.2K
p3_cold_install	5.29s	53.0K / 32.9K
p4_warm_link	2.23s	7.8K / 4.5K

#2991 AB before integration, GHA Linux npmjs run 26093544293:

phase	utoo wall	utoo vCtx/iCtx
p3_cold_install	5.21s	48.4K / 34.3K
p4_warm_link	1.86s	7.6K / 4.4K

Integrated #2989 head (fd38cfff), GHA Linux npmjs run 26094747554:

phase	utoo wall	utoo vCtx/iCtx
p0_full_cold	7.28s	67.4K / 44.2K
p1_resolve	2.39s	14.5K / 19.7K
p3_cold_install	5.30s	52.1K / 35.0K
p4_warm_link	2.07s	7.5K / 4.5K

Latest full green integrated run 26095851789:

phase	utoo wall	utoo vCtx/iCtx
p0_full_cold	10.51s	89.5K / 57.4K
p1_resolve	2.43s	15.0K / 21.2K
p3_cold_install	5.66s	54.7K / 37.6K
p4_warm_link	2.16s	7.8K / 4.6K

Conclusion: #2991 remains directionally positive for p4/warm materialize (2.23s -> 2.07s/2.16s, ctx stable). p3 is neutral to slightly noisy, which is acceptable because this change targets clone/materialize ordering rather than cold extract/cache population.

Additional GHA Reruns

Label-triggered Linux npmjs rerun 26102023925:

phase	bun	utoo-next	utoo-npm	utoo	utoo ctx
p0 full cold	9.15s	11.98s	8.32s	7.55s	69.8K / 45.2K
p1 resolve	2.09s	3.12s	3.10s	2.41s	14.1K / 18.5K
p3 cold install	6.68s	6.46s	6.13s	8.07s	71.9K / 42.5K
p4 warm link	3.38s	2.44s	2.35s	2.08s	7.6K / 4.4K

Label-triggered Linux npmjs rerun 26102983933:

phase	bun	utoo-next	utoo-npm	utoo	utoo ctx
p0 full cold	9.30s	8.06s	8.32s	7.34s	65.1K / 45.7K
p1 resolve	1.97s	3.05s	3.04s	2.39s	14.9K / 19.3K
p3 cold install	6.68s	9.25s	6.95s	5.30s	52.4K / 33.8K
p4 warm link	3.34s	2.37s	2.37s	2.20s	7.9K / 4.6K

Latest label-triggered Linux npmjs rerun 26110858368:

phase	bun	utoo-next	utoo-npm	utoo	utoo ctx
p0 full cold	8.88s	8.10s	9.20s	8.56s	80.8K / 51.0K
p1 resolve	1.94s	2.94s	3.00s	3.04s	14.2K / 19.7K
p3 cold install	6.46s	6.93s	5.96s	6.63s	60.9K / 39.8K
p4 warm link	3.35s	2.33s	2.30s	2.11s	7.5K / 4.5K

Label-triggered Linux npmjs rerun 26112228488:

phase	bun	utoo-next	utoo-npm	utoo	utoo ctx
p0 full cold	7.38s	7.44s	7.22s	6.79s	88.2K / 53.6K
p1 resolve	2.53s	3.09s	3.81s	2.69s	17.7K / 18.9K
p3 cold install	5.53s	5.63s	5.63s	6.17s	76.2K / 43.3K
p4 warm link	2.11s	1.74s	1.60s	1.67s	8.5K / 4.1K

p4 remains stable across reruns (`2.07s/2.16s/2.08s/2.20s/2.11s`, ctx about `7.5K/4.5K`). p3 is noisier because it includes network download plus cache extraction writes. Use multiple runs for p3 rather than a single sample.
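
One way to compare phases across reruns instead of relying on a single sample is to look at the median and the spread. The sketch below uses the five p4 values quoted above; the p3 sample is an assumption on my part, taking the utoo p3 wall times from the same five runs (26094747554, 26095851789, 26102023925, 26102983933, 26110858368). The helper names are hypothetical.

```rust
/// utoo p4 wall times (seconds) quoted in the paragraph above.
const P4_WALL: [f64; 5] = [2.07, 2.16, 2.08, 2.20, 2.11];
/// utoo p3 wall times (seconds) from the same five runs (assumed pairing).
const P3_WALL: [f64; 5] = [5.30, 5.66, 8.07, 5.30, 6.63];

/// Median of a non-empty sample; averages the two middle values for even lengths.
fn median(samples: &[f64]) -> f64 {
    let mut sorted = samples.to_vec();
    sorted.sort_by(|a, b| a.total_cmp(b));
    let mid = sorted.len() / 2;
    if sorted.len() % 2 == 0 {
        (sorted[mid - 1] + sorted[mid]) / 2.0
    } else {
        sorted[mid]
    }
}

/// Spread of a non-empty sample, measured as max - min.
fn range(samples: &[f64]) -> f64 {
    let max = samples.iter().cloned().fold(f64::MIN, f64::max);
    let min = samples.iter().cloned().fold(f64::MAX, f64::min);
    max - min
}
```

On these samples the p4 median is 2.11s with a 0.13s range, while the p3 median is 5.66s with a 2.77s range, which matches the observation that a single p3 run is not a reliable signal.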

Label-triggered Linux npmjs rerun 26115911963 (bench job succeeded):

phase	bun	utoo-next	utoo-npm	utoo	utoo ctx
p0 full cold	9.36s	8.13s	8.90s	7.36s	83.3K / 51.8K
p1 resolve	2.19s	3.26s	3.07s	2.50s	15.6K / 19.6K
p3 cold install	6.50s	7.44s	7.71s	5.90s	65.5K / 41.3K
p4 warm link	3.55s	2.17s	2.34s	2.17s	7.7K / 4.5K

GHA pcap diagnostic run 26116045424 (diagnostic-only; pcap overhead/noise makes wall time unsuitable for ranking):

phase	wall	streams	zero-window / retransmit	p99 stream gap	max IO util	max write await
utoo install	17.86s	70	6 / 48	1.96ms	97.3%	481ms
utoo-next install	12.45s	69	4 / 9	2.41ms	77.9%	215ms
bun install	13.58s	259	10 / 605	41.97ms	69.2%	123ms

Conclusion from pcap: p3 cold install does not show an obvious socket-drain starvation issue for utoo; utoo has fewer zero-window/retransmit events than bun in the install capture. The stronger signal is disk-write tail latency (io_util_max=97.3%, w_await_max=481ms), so the remaining p3 work should focus on cache/extract/write scheduling rather than raising generic tarball concurrency.

Label-triggered Linux npmjs rerun 26117122045 (bench job succeeded; p3 wall noisy):

phase	bun	utoo-next	utoo-npm	utoo	utoo ctx
p0 full cold	9.47s	9.28s	9.29s	11.05s	71.7K / 48.4K
p1 resolve	2.11s	3.10s	3.39s	2.49s	14.7K / 20.4K
p3 cold install	6.96s	10.11s	6.30s	8.32s	66.5K / 40.8K
p4 warm link	3.52s	2.27s	2.28s	1.93s	7.3K / 4.3K

Label-triggered Linux npmjs rerun 26118716599 (bench job succeeded):

phase	bun	utoo-next	utoo-npm	utoo	utoo ctx
p0 full cold	10.29s	8.36s	8.20s	7.94s	75.8K / 55.6K
p1 resolve	2.24s	3.28s	3.25s	2.46s	16.1K / 20.9K
p3 cold install	6.47s	6.62s	11.02s	5.83s	59.7K / 39.0K
p4 warm link	3.47s	2.41s	2.31s	2.18s	7.6K / 4.6K

Rejected / Inconclusive Follow-ups

PR	idea	result
#2992	Pump clones before extracts	p3 regressed (`6.22s`, `56.8K/36.5K`); reject for now.
#2993	Halve extract slots while clone work exists	p3 wall improved (`4.84s`) but ctx regressed (`59.0K/39.2K`) and p4 no-op control moved heavily; inconclusive.
#2994	Flush parent clone unblockers early from a worker	Does not beat #2991 clearly (`p3 5.38s`, `p4 2.12s`); extra completion state is not justified.
#2995	Clone first waiter immediately after extract	p3 does not improve and vCtx rises (`5.75s`, `59.7K/33.9K`); do not integrate.
#2996	Cap install worker pressure	GHA regression vs #2989 (`p3 5.68s`, `62.3K/40.6K`; p0 also regressed); do not integrate.
#2997	Prioritize demand downloads over preload queue	Two GHA runs did not improve p3; rerun regressed (`8.02s`, `78.6K/46.1K`); do not integrate.
#2998	Raise install tarball default concurrency to 256	N=2 GHA/npmjs did not validate p3 (`8.03s`, `77.1K/40.2K`; rerun `6.35s`, `68.7K/39.8K`); do not use as a global/non-semver default.
#2999	Spawn install download workers instead of polling download futures in scheduler	No target ctx/wall signal (`p3 7.88s`, `82.2K/29.4K`) and p4 control was noisy; do not integrate.
#3000	Use bounded chunked file writes for scheduler indexed extracts	GHA p3 wall improved but ctx was neutral (`5.56s`, `64.6K/41.2K`) and p0 ctx regressed (`90.5K/56.3K`); poolab internal AB also showed higher ctx/sys. Do not integrate.
#3001	Bound downloaded tarball backlog to `4 * extract_limit`	GHA p3 regressed vs #2989 (`6.12s`, `74.5K/43.4K` vs `5.83s`, `59.7K/39.0K`); poolab internal AB also showed slow-tarball tail risk. Do not integrate.

Current pick status: #2991 is already integrated into this PR. No other follow-up experiment has enough GHA evidence to pick into #2989.

Label-triggered Linux npmjs rerun 26119947793 (bench job succeeded):

phase	bun	utoo-next	utoo-npm	utoo	utoo ctx
p0 full cold	9.00s	7.94s	8.07s	7.65s	81.0K / 51.4K
p1 resolve	2.09s	3.11s	3.00s	2.30s	15.8K / 19.7K
p3 cold install	6.68s	6.48s	6.14s	5.47s	68.8K / 42.3K
p4 warm link	3.54s	2.48s	2.38s	2.13s	7.6K / 4.5K

This keeps #2989 in the same GHA/npmjs band: p3 remains noisy but competitive on wall time, while p4 stays stable around 2.1s and 7.5K/4.5K ctx. This run did not capture npmmirror output.

gemini-code-assist

Code Review

This pull request introduces batching for clone operations in the installation scheduler to improve performance by reducing scheduler wakeups. It replaces individual clone tasks with batches of up to three operations and introduces a worker-based concurrency limit. Feedback indicates that the heuristic used to calculate the worker limit causes total concurrency to scale non-linearly and potentially exceed intended limits. It is suggested to either refine the formula to account for the batch size or document the rationale for the current implementation.

gemini-code-assist · 2026-05-19T08:58:29Z

+fn clone_worker_limit(clone_limit: usize) -> usize {
+    clone_limit
+        .saturating_div(2)
+        .saturating_add(2)
+        .clamp(1, clone_limit.max(1))
+}


The clone_worker_limit calculation uses a heuristic (limit / 2 + 2) that doesn't explicitly account for CLONE_BATCH_LIMIT. This results in a total potential concurrency (workers * batch size) that scales non-linearly with the original clone_limit. For example, a clone_limit of 4 results in 4 workers (up to 12 concurrent clones), while a clone_limit of 16 results in 10 workers (up to 30 concurrent clones).

If the intention is to maintain a total concurrency close to the original clone_limit while batching, consider a formula like (clone_limit / CLONE_BATCH_LIMIT).max(1). If the increased concurrency is intentional to saturate the Rayon pool, documenting the rationale for this specific heuristic would improve maintainability.
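
The numbers in this comment can be checked with a small sketch. `clone_worker_limit` is copied from the diff above; `clone_worker_limit_by_batch` and `max_in_flight` are hypothetical helpers added only for comparison, and `CLONE_BATCH_LIMIT = 3` is assumed from the review summary ("batches of up to three operations").

```rust
/// Assumed batch size, from the review summary ("batches of up to three operations").
const CLONE_BATCH_LIMIT: usize = 3;

/// Heuristic quoted in the review: limit / 2 + 2, clamped to [1, max(limit, 1)].
fn clone_worker_limit(clone_limit: usize) -> usize {
    clone_limit
        .saturating_div(2)
        .saturating_add(2)
        .clamp(1, clone_limit.max(1))
}

/// Alternative suggested in the review: divide the limit by the batch size.
fn clone_worker_limit_by_batch(clone_limit: usize) -> usize {
    (clone_limit / CLONE_BATCH_LIMIT).max(1)
}

/// Upper bound on clone jobs held by workers at once (workers * batch size).
fn max_in_flight(workers: usize) -> usize {
    workers * CLONE_BATCH_LIMIT
}
```

For a `clone_limit` of 4 the current heuristic gives 4 workers (12 jobs held), and for 16 it gives 10 workers (30 jobs held); the suggested alternative gives 1 and 5 workers respectively. If jobs within a batch run serially on one worker, as the Summary's mention of "serialized hardlink runs" suggests, the number of clones executing at the same instant is bounded by the worker count rather than by `workers * CLONE_BATCH_LIMIT`.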

github-actions · 2026-05-19T19:38:44Z

📊 pm-bench-phases · `e4bbe06` · linux (`ubuntu-latest`)

Workflow run — ant-design

PMs: utoo (this branch) · utoo-npm (latest published) · bun (latest)

npmjs.org

p0_full_cold

PM	wall	±σ	user	sys	RSS	pgMinor
bun	9.00s	0.22s	10.25s	9.90s	763M	336.4K
utoo-next	7.94s	0.37s	10.53s	12.21s	1002M	123.0K
utoo-npm	8.07s	0.10s	10.69s	12.06s	1.00G	126.3K
utoo	7.65s	0.24s	11.07s	10.86s	892M	145.2K

PM	vCtx	iCtx	netRX	netTX	cache	node_mod	lock
bun	15.6K	18.2K	1.19G	6M	1.86G	1.75G	1M
utoo-next	129.1K	96.1K	1.16G	5M	1.71G	1.70G	2M
utoo-npm	129.5K	97.9K	1.16G	5M	1.71G	1.70G	2M
utoo	81.0K	51.4K	1.16G	6M	1.71G	1.70G	2M

p1_resolve

PM	wall	±σ	user	sys	RSS	pgMinor
bun	2.09s	0.08s	3.92s	1.15s	521M	186.7K
utoo-next	3.11s	0.36s	5.05s	2.17s	611M	85.5K
utoo-npm	3.00s	0.02s	5.21s	2.14s	606M	81.0K
utoo	2.30s	0.03s	5.71s	1.65s	644M	120.4K

PM	vCtx	iCtx	netRX	netTX	cache	node_mod	lock
bun	10.2K	4.2K	202M	3M	107M	-	1M
utoo-next	74.7K	86.1K	200M	2M	7M	3M	2M
utoo-npm	73.9K	92.3K	200M	2M	7M	3M	2M
utoo	15.8K	19.7K	202M	3M	7M	3M	2M

p3_cold_install

PM	wall	±σ	user	sys	RSS	pgMinor
bun	6.68s	0.06s	6.31s	9.56s	581M	192.1K
utoo-next	6.48s	0.32s	5.04s	10.74s	505M	60.0K
utoo-npm	6.14s	0.22s	5.13s	10.61s	475M	60.4K
utoo	5.47s	0.15s	5.04s	9.36s	557M	63.7K

PM	vCtx	iCtx	netRX	netTX	cache	node_mod	lock
bun	5.3K	7.0K	1019M	4M	1.76G	1.76G	1M
utoo-next	115.0K	56.6K	989M	3M	1.70G	1.70G	2M
utoo-npm	110.7K	54.9K	989M	3M	1.70G	1.70G	2M
utoo	68.8K	42.3K	989M	3M	1.70G	1.70G	2M

p4_warm_link

PM	wall	±σ	user	sys	RSS	pgMinor
bun	3.54s	0.07s	0.19s	2.42s	135M	33.5K
utoo-next	2.48s	0.29s	0.49s	3.80s	79M	18.5K
utoo-npm	2.38s	0.08s	0.47s	3.73s	77M	17.7K
utoo	2.13s	0.05s	0.34s	3.26s	50M	11.2K

PM	vCtx	iCtx	netRX	netTX	cache	node_mod	lock
bun	241	25	5M	37K	1.91G	1.75G	1M
utoo-next	43.7K	21.4K	5K	4K	1.70G	1.70G	2M
utoo-npm	39.3K	18.2K	4K	8K	1.70G	1.70G	2M
utoo	7.6K	4.5K	5K	23K	1.71G	1.70G	2M

npmmirror.com: no output captured.

perf(pm): batch install clone completions

22935e7

elrrrrrrr added the labels A-Pkg Manager (Area: Package Manager) and benchmark (Run pm-bench on PR) on May 19, 2026

gemini-code-assist Bot reviewed May 19, 2026


elrrrrrrr added and removed the benchmark label (Run pm-bench on PR) on May 19, 2026

This was referenced May 19, 2026

perf(pm): prioritize clone unblockers #2991

Draft

perf(pm): prioritize install clone pumping #2992

Draft

perf(pm): reserve install clone capacity #2993

Draft

perf(pm): prioritize clone unblockers

fd38cff

This was referenced May 19, 2026

perf(pm): flush clone unblockers early #2994

Draft

perf(pm): clone first waiter after extract #2995

Draft

elrrrrrrr added and removed the benchmark label (Run pm-bench on PR) on May 19, 2026

This was referenced May 19, 2026

perf(pm): cap install worker pressure #2996

Closed

perf(pm): prioritize demand install downloads #2997

Closed

elrrrrrrr added and removed the benchmark label (Run pm-bench on PR) on May 19, 2026

This was referenced May 19, 2026

perf(pm): raise install tarball concurrency #2998

Closed

perf(pm): spawn install download workers #2999

Closed

perf(pm): chunk indexed extract writes #3000

Closed

perf(pm): bound install extract backlog #3001

Closed

elrrrrrrr wants to merge 2 commits into exp/pm-install-inline-download-futures from exp/pm-install-clone-batch-completions
