Skip to content

perf(pm): spawn install download workers#2999

Closed
elrrrrrrr wants to merge 1 commit into
exp/pm-install-clone-batch-completionsfrom
exp/pm-install-spawn-download-workers
Closed

perf(pm): spawn install download workers#2999
elrrrrrrr wants to merge 1 commit into
exp/pm-install-clone-batch-completionsfrom
exp/pm-install-spawn-download-workers

Conversation

@elrrrrrrr
Copy link
Copy Markdown
Contributor

@elrrrrrrr elrrrrrrr commented May 19, 2026

Summary

  • Move install tarball downloads out of the scheduler FuturesUnordered and into short-lived tokio workers.
  • Keep scheduler ownership of dedupe, active sets, queues, extraction, and clone ordering.
  • Report download completion back through the existing OpDone::Download path.

Why

p3 cold install is dominated by registry tarball downloads plus extract/cache writes. Polling thousands of download futures directly inside the scheduler can wake the scheduler on network readiness even though the scheduler only needs the final downloaded bytes. This AB checks whether keeping network polling in worker tasks reduces scheduler interference and context-switch pressure.

Validation

  • cargo fmt
  • cargo test -p utoo-pm install_scheduler
  • cargo clippy -p utoo-pm --all-targets -- -D warnings --no-deps

Benchmark

GHA Linux npmjs run 26114547814:

phase bun utoo-next utoo-npm utoo utoo ctx
p0 full cold 12.64s 10.94s 13.93s 11.29s 104.2K / 41.9K
p1 resolve 1.77s 2.67s 2.56s 2.14s 14.9K / 19.1K
p3 cold install 10.40s 8.43s 10.85s 7.88s 82.2K / 29.4K
p4 warm link 4.93s 4.02s 3.09s 4.87s 7.7K / 4.2K

Windows e2e failed on npmjs HTTP 406 Not Acceptable while resolving @isaacs/cliui, matching known registry flake behavior and not specific to this install scheduler change.

Conclusion

Do not pick this into #2989. The intended signal was lower p3 scheduler/network context pressure, but the run shows higher p3 vCtx than #2989 and no wall improvement. p4 was also globally noisy, but the experiment still does not produce the target signal.

@elrrrrrrr elrrrrrrr added A-Pkg Manager Area: Package Manager benchmark Run pm-bench on PR labels May 19, 2026
Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request refactors the installation scheduler to use tokio::spawn for backgrounding package downloads instead of directly pinning them to the operations list. Feedback highlights a reliability risk where detached tasks could cause the scheduler to hang if a worker panics, as the download_active state would never be cleared. It is recommended to push the JoinHandle into the existing ops collection to ensure panics are correctly propagated and the scheduler remains robust.

Comment on lines +453 to +460
let done_tx = self.done_tx.clone();
tokio::spawn(async move {
let result = download_bytes(&package.tarball_url)
.await
.map(DownloadOutcome::Bytes)
.map_err(|e| format!("{e:#}"));
OpDone::Download { package, result }
}));
let _ = done_tx.send(OpDone::Download { package, result });
});
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

While moving downloads to background workers improves scheduler performance, using tokio::spawn with a detached channel send creates a reliability risk. If a download worker panics, the download_active entry will never be cleared, and the scheduler will hang during shutdown while waiting for a message that will never arrive.

Since self.ops is already used to track asynchronous operations, it is safer to push the JoinHandle into self.ops. This ensures that if a worker panics, the scheduler task will also panic (adhering to the project's policy of not recovering from panics) and prevents potential hangs. Polling a JoinHandle is extremely efficient and will not cause the scheduler interference issues described in the PR.

            let handle = tokio::spawn(async move {
                let result = download_bytes(&package.tarball_url)
                    .await
                    .map(DownloadOutcome::Bytes)
                    .map_err(|e| format!("{e:#}"));
                OpDone::Download { package, result }
            });
            self.ops.push(Box::pin(async move {
                handle.await.expect("Download worker panicked")
            }));
References
  1. Do not implement recovery logic for panics. Panics should be treated as unrecoverable bugs that need to be fixed, not as transient, recoverable errors.

@elrrrrrrr
Copy link
Copy Markdown
Contributor Author

Closing this AB: GHA did not show the target p3 ctx/wall improvement. The Windows e2e failure is the known npmjs 406 flake, but the perf signal is not good enough to keep.

@github-actions
Copy link
Copy Markdown

📊 pm-bench-phases · d038e89 · linux (ubuntu-latest)

Workflow run — ant-design

PMs: utoo (this branch) · utoo-npm (latest published) · bun (latest)

npmjs.org

p0_full_cold

PM wall ±σ user sys RSS pgMinor
bun 9.33s 0.39s 10.43s 10.38s 704M 331.6K
utoo-next 8.92s 1.31s 10.83s 12.82s 992M 128.2K
utoo-npm 9.04s 1.52s 10.65s 12.38s 961M 125.6K
utoo 9.90s 4.26s 11.28s 11.35s 905M 137.2K
PM vCtx iCtx netRX netTX cache node_mod lock
bun 14.7K 18.0K 1.19G 6M 1.88G 1.76G 1M
utoo-next 137.3K 98.7K 1.16G 5M 1.71G 1.70G 2M
utoo-npm 120.2K 84.4K 1.16G 5M 1.71G 1.70G 2M
utoo 86.5K 50.8K 1.16G 6M 1.71G 1.70G 2M

p1_resolve

PM wall ±σ user sys RSS pgMinor
bun 2.06s 0.02s 4.02s 1.04s 504M 171.4K
utoo-next 3.00s 0.04s 5.45s 2.06s 616M 82.3K
utoo-npm 3.09s 0.05s 5.48s 2.05s 623M 76.7K
utoo 2.35s 0.09s 5.96s 1.62s 645M 115.8K
PM vCtx iCtx netRX netTX cache node_mod lock
bun 9.4K 4.2K 203M 3M 108M - 1M
utoo-next 72.4K 88.4K 200M 2M 7M 3M 2M
utoo-npm 72.1K 91.0K 200M 2M 7M 3M 2M
utoo 13.9K 19.7K 202M 3M 7M 3M 2M

p3_cold_install

PM wall ±σ user sys RSS pgMinor
bun 6.80s 0.12s 6.32s 9.98s 600M 205.7K
utoo-next 6.09s 0.09s 4.97s 10.95s 489M 59.6K
utoo-npm 7.95s 3.46s 5.15s 11.41s 525M 61.0K
utoo 5.04s 0.11s 4.89s 9.73s 527M 57.9K
PM vCtx iCtx netRX netTX cache node_mod lock
bun 4.4K 6.8K 1018M 4M 1.76G 1.76G 1M
utoo-next 99.0K 51.2K 989M 3M 1.70G 1.70G 2M
utoo-npm 123.5K 48.2K 990M 3M 1.70G 1.70G 2M
utoo 59.2K 38.7K 989M 2M 1.70G 1.70G 2M

p4_warm_link

PM wall ±σ user sys RSS pgMinor
bun 3.48s 0.06s 0.20s 2.35s 133M 32.5K
utoo-next 2.21s 0.07s 0.49s 3.78s 80M 18.9K
utoo-npm 2.15s 0.03s 0.49s 3.75s 80M 18.9K
utoo 2.05s 0.06s 0.34s 3.25s 50M 11.3K
PM vCtx iCtx netRX netTX cache node_mod lock
bun 371 25 5M 18K 1.91G 1.75G 1M
utoo-next 43.1K 19.8K 6K 10K 1.70G 1.70G 2M
utoo-npm 43.9K 20.4K 5K 7K 1.70G 1.70G 2M
utoo 7.3K 4.2K 6K 6K 1.71G 1.70G 2M

npmmirror.com: no output captured.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

A-Pkg Manager Area: Package Manager benchmark Run pm-bench on PR

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant