perf(pm): prioritize clone unblockers by elrrrrrrr · Pull Request #2991 · utooland/utoo

elrrrrrrr · 2026-05-19T11:13:22Z

Summary

This is a p3/p4 scheduler experiment on top of #2989.

#2989 batches clone completions to reduce scheduler wakeups, but that can delay parent package completion inside a batch. A parent clone completion is on the materialize critical path because it releases blocked_by_parent children. This PR keeps ordinary leaf clones batched, while prioritizing ready parent unblockers and sending them as a single-job batch.

Hypothesis

p4 warm link should improve or stay neutral if nested placements enter the queue earlier.
p3 cold install may improve when extract/download work and clone materialization overlap, because clone critical-path parents no longer wait behind unrelated hardlink work in the same batch.
ctx may rise slightly if many parent clones become single-item batches; this is the tradeoff to validate.

Validation

cargo fmt
cargo test -p utoo-pm install_scheduler
cargo clippy -p utoo-pm --all-targets -- -D warnings --no-deps

Bench Result

GHA Linux npmjs run 26093544293:

phase	utoo wall	vCtx	iCtx
p3 cold install	5.21s	48.4K	34.3K
p4 warm link	1.86s	7.6K	4.4K

Compared with #2989 latest npmjs reference (p3 5.29s, 53.0K/32.9K; p4 2.23s, 7.8K/4.5K), this is directionally positive. p4 is the strongest signal because this PR directly changes warm materialize scheduling.

Conclusion: keep as a candidate; the hypothesis that batched parent clone completions delay nested placement is supported.

gemini-code-assist

Code Review

This pull request refactors the clone batching logic in install_scheduler.rs to prioritize 'unblocker' jobs—parent clones that unblock nested dependencies. It introduces take_clone_batch and pop_next_clone_job to handle this prioritization, ensuring that unblockers are processed individually to allow their children to enter the queue sooner. Unit tests were added to verify the prioritization and batching behavior. Review feedback highlights a potential performance bottleneck in pop_next_clone_job due to its O(N) complexity and suggests using separate queues for unblocker and leaf jobs to achieve O(1) performance.

gemini-code-assist · 2026-05-19T11:22:15Z

+    fn pop_next_clone_job(&mut self) -> Option<ReadyClone> {
+        let unblocker_index = if self.blocked_by_parent.is_empty() {
+            None
+        } else {
+            self.clone_queue
+                .iter()
+                .position(|job| self.blocked_by_parent.contains_key(&job.spec.target))
+        };
+        match unblocker_index {
+            Some(index) => self.clone_queue.remove(index),
+            None => self.clone_queue.pop_front(),
+        }
+    }


The current implementation of pop_next_clone_job iterates through the clone_queue to find a priority job. This is an O(N) operation, and VecDeque::remove() is also O(N) on average. This could introduce a performance bottleneck if the queue becomes large.

For a more efficient implementation, consider using two separate queues: one for high-priority "unblocker" jobs and one for regular "leaf" jobs.

In SchedulerState, replace clone_queue: VecDeque<ReadyClone> with:

unblocker_clone_queue: VecDeque<ReadyClone>, leaf_clone_queue: VecDeque<ReadyClone>,

When a job is ready to be cloned (e.g., in handle_done for SeededCache and in complete_download), classify it and push it to the appropriate queue:

let ready_clone = ReadyClone { spec, cache }; if self.blocked_by_parent.contains_key(&ready_clone.spec.target) { self.unblocker_clone_queue.push_back(ready_clone); } else { self.leaf_clone_queue.push_back(ready_clone); }

Then, pop_next_clone_job can be simplified to an O(1) operation. It could also return whether the job is an unblocker to avoid a second lookup in take_clone_batch:

fn pop_next_clone_job(&mut self) -> Option<(ReadyClone, bool)> { if let Some(job) = self.unblocker_clone_queue.pop_front() { return Some((job, true)); } if let Some(job) = self.leaf_clone_queue.pop_front() { return Some((job, false)); } None }

This would require adjusting take_clone_batch to handle the new return type.

This approach would eliminate the linear scan and make the scheduling logic more efficient, which could be important for the performance experiment you are running.

github-actions · 2026-05-19T11:47:22Z

📊 pm-bench-phases · `39262ac` · linux (`ubuntu-latest`)

Workflow run — ant-design

PMs: utoo (this branch) · utoo-npm (latest published) · bun (latest)

npmjs.org

p0_full_cold

PM	wall	±σ	user	sys	RSS	pgMinor
bun	8.99s	0.25s	10.36s	10.29s	753M	337.1K
utoo-next	8.95s	0.88s	10.47s	12.57s	1007M	125.2K
utoo-npm	7.92s	0.06s	10.51s	12.07s	980M	117.5K
utoo	8.52s	1.65s	11.20s	11.20s	924M	143.5K

PM	vCtx	iCtx	netRX	netTX	cache	node_mod	lock
bun	14.5K	18.4K	1.19G	6M	1.86G	1.75G	1M
utoo-next	141.1K	98.9K	1.16G	5M	1.71G	1.70G	2M
utoo-npm	118.3K	79.6K	1.16G	5M	1.71G	1.70G	2M
utoo	81.1K	53.9K	1.16G	7M	1.71G	1.70G	2M

p1_resolve

PM	wall	±σ	user	sys	RSS	pgMinor
bun	1.88s	0.03s	4.03s	1.04s	507M	163.7K
utoo-next	3.00s	0.01s	5.37s	2.09s	612M	88.7K
utoo-npm	2.99s	0.01s	5.53s	2.03s	604M	78.3K
utoo	2.37s	0.03s	5.95s	1.63s	644M	118.4K

PM	vCtx	iCtx	netRX	netTX	cache	node_mod	lock
bun	8.2K	4.6K	203M	3M	107M	-	1M
utoo-next	70.8K	87.2K	200M	2M	7M	3M	2M
utoo-npm	71.0K	91.1K	200M	2M	7M	3M	2M
utoo	14.0K	17.0K	202M	3M	7M	3M	2M

p3_cold_install

PM	wall	±σ	user	sys	RSS	pgMinor
bun	6.90s	0.30s	6.28s	9.95s	609M	203.8K
utoo-next	7.22s	2.10s	4.95s	10.93s	501M	63.8K
utoo-npm	6.65s	1.45s	5.08s	11.12s	501M	65.0K
utoo	5.12s	0.18s	4.88s	9.44s	517M	58.1K

PM	vCtx	iCtx	netRX	netTX	cache	node_mod	lock
bun	4.1K	7.0K	1018M	3M	1.76G	1.76G	1M
utoo-next	110.2K	48.4K	989M	3M	1.70G	1.70G	2M
utoo-npm	115.3K	53.0K	989M	3M	1.70G	1.70G	2M
utoo	50.3K	33.1K	989M	2M	1.70G	1.70G	2M

p4_warm_link

PM	wall	±σ	user	sys	RSS	pgMinor
bun	3.28s	0.03s	0.21s	2.35s	134M	32.9K
utoo-next	2.33s	0.12s	0.48s	3.76s	78M	18.4K
utoo-npm	2.26s	0.01s	0.49s	3.79s	80M	18.7K
utoo	2.08s	0.04s	0.33s	3.30s	50M	11.1K

PM	vCtx	iCtx	netRX	netTX	cache	node_mod	lock
bun	320	29	5M	33K	1.91G	1.75G	1M
utoo-next	40.0K	18.6K	2K	8K	1.70G	1.70G	2M
utoo-npm	44.5K	20.2K	1K	5K	1.70G	1.70G	2M
utoo	7.7K	4.6K	3K	24K	1.71G	1.70G	2M

npmmirror.com: no output captured.

perf(pm): prioritize clone unblockers

77fc37c

elrrrrrrr added A-Pkg Manager Area: Package Manager benchmark Run pm-bench on PR labels May 19, 2026

gemini-code-assist Bot reviewed May 19, 2026

View reviewed changes

elrrrrrrr mentioned this pull request May 19, 2026

perf(pm): batch install clone completions #2989

Draft

This was referenced May 19, 2026

perf(pm): flush clone unblockers early #2994

Draft

perf(pm): clone first waiter after extract #2995

Draft

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

perf(pm): prioritize clone unblockers#2991

perf(pm): prioritize clone unblockers#2991
elrrrrrrr wants to merge 1 commit into
exp/pm-install-clone-batch-completionsfrom
exp/pm-install-prioritize-clone-unblockers

elrrrrrrr commented May 19, 2026 •

edited

Loading

Uh oh!

gemini-code-assist Bot left a comment

Uh oh!

gemini-code-assist Bot May 19, 2026

Uh oh!

github-actions Bot commented May 19, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

elrrrrrrr commented May 19, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Hypothesis

Validation

Bench Result

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist Bot May 19, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions Bot commented May 19, 2026

📊 pm-bench-phases · 39262ac · linux (ubuntu-latest)

npmjs.org

p0_full_cold

p1_resolve

p3_cold_install

p4_warm_link

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

elrrrrrrr commented May 19, 2026 •

edited

Loading

📊 pm-bench-phases · `39262ac` · linux (`ubuntu-latest`)