CI pcap analysis against npmjs.org revealed TLS CPU as a major
preload bottleneck. Per-handshake timing on CI:
| Phase | utoo (ring) | bun (BoringSSL) |
|---|---|---|
| CH → SH (1 RTT, network) | 11 ms | 10 ms |
| CCS → first AppData (p50) | 78 ms | 12 ms |
| CCS → first AppData (max) | 154 ms | 17 ms |
| TOTAL CH → AppData | 162 ms | 46 ms |
The "CCS → AppData" phase is dominated by post-ChangeCipherSpec
client work (Finished MAC verify, state machine transition,
request dispatch). Observed in the CI pcap capture:
utoo: Client CCS spread over 154 ms (first conn done at 0.975 s, last
at 1.129 s), then the first AppData records *all* fire within 11 ms
at ~1.13 s. This is the classic CPU-saturation pattern: 128 parallel
TLS handshakes serialise across 4 blocking threads and HTTP
dispatch is starved until crypto drains.
bun: CCS spread 51 ms, AppData spread 43 ms. Dispatch flows
smoothly as each conn completes.
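A back-of-the-envelope model of that serialisation, as a sketch only: it assumes every handshake costs the same CPU time and that the 4 blocking threads work through them in strict batches. The 5 ms per-handshake cost and both function names are hypothetical, picked for illustration rather than measured.

```rust
/// Completion time (ms) of each job when `jobs` equal-cost jobs are
/// processed in batches of `threads` by blocking workers.
fn completion_times_ms(jobs: usize, threads: usize, cost_ms: f64) -> Vec<f64> {
    assert!(threads > 0, "need at least one worker");
    (0..jobs)
        // Job i runs in batch i / threads; batches run back to back.
        .map(|i| ((i / threads) + 1) as f64 * cost_ms)
        .collect()
}

/// Gap between the first and the last completion.
fn spread_ms(times: &[f64]) -> f64 {
    let first = times.iter().cloned().fold(f64::INFINITY, f64::min);
    let last = times.iter().cloned().fold(f64::NEG_INFINITY, f64::max);
    last - first
}

fn main() {
    // 128 handshakes on 4 threads are the figures from the capture.
    // 5 ms per handshake is an assumed cost, not a measurement.
    let times = completion_times_ms(128, 4, 5.0);
    println!("spread = {} ms", spread_ms(&times));
}
```

With these inputs the last of the 32 batches finishes 31 × 5 = 155 ms after the first, the same order as the 154 ms CCS spread in the capture.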
`ring` (the provider reqwest's `rustls-tls-*` features select) is
Rust + C + hand-tuned assembly; `aws-lc-rs` wraps AWS-LC, a BoringSSL
fork whose primitives are more aggressively optimised for x86_64
AES-NI + SHA-NI.
Reqwest 0.12's `rustls-tls-native-roots` feature pins `__rustls-ring`
via Cargo's feature unification — no way to override. Swap to
`rustls-tls-native-roots-no-provider`, add a direct `rustls` dep
with the `aws-lc-rs` feature, load native root certs via
`rustls-native-certs` and pass the `ClientConfig` into reqwest via
`use_preconfigured_tls`.
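In manifest terms, the swap might look like the fragment below. The feature names are the ones named above; the version numbers are assumptions and should be matched to whatever the workspace already pins.

```toml
[dependencies]
# "no-provider" variant: reqwest itself no longer enables `__rustls-ring`.
reqwest = { version = "0.12", features = ["rustls-tls-native-roots-no-provider"] }
# Direct dependency so the aws-lc-rs provider can be selected explicitly.
rustls = { version = "0.23", features = ["aws-lc-rs"] }
# Loads the platform trust store for the hand-built ClientConfig.
rustls-native-certs = "0.8"
```

Because Cargo unifies features across the whole dependency graph, any other crate that still enables one of reqwest's ring-backed rustls features would bring `ring` back in.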
Local M2 (ARM, where ring's hand-tuned ARM assembly is already
near-optimal) shows neutral perf (~2.9s both providers). Waiting
on CI to confirm the x86_64 CCS→AppData gap closes.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Code Review
This pull request switches the HTTP client's TLS backend from ring to aws-lc-rs to improve handshake performance, implementing a custom rustls configuration that loads native certificates. The review feedback suggests logging a warning if the root certificate store is empty to help diagnose environment issues and notes that the global installation of the crypto provider may be unnecessary and could cause conflicts if the crate is used as a library.
    for cert in roots.certs {
        // Best-effort: skip any cert rustls refuses (same tolerance
        // native-tls shows). A hard fail here would brick every
        // request on a box with one bad root in its trust store.
        let _ = root_store.add(cert);
    }
If `load_native_certs()` fails to find any valid certificates (e.g., in a minimal container environment), `root_store` will be empty, causing all subsequent HTTPS requests to fail with certificate verification errors. While the best-effort approach is appropriate for individual certificate failures, it would be beneficial to log a warning if the resulting `root_store` is completely empty to aid in troubleshooting environment issues.
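One way to keep the per-certificate tolerance while still surfacing an empty store is to count outcomes and warn only when nothing was accepted. The sketch below is standard-library only: `add_best_effort` and `empty_store_warning` are hypothetical helpers, and the closure stands in for `root_store.add(cert)`.

```rust
/// Feed every item to `add`, tolerating individual failures.
/// Returns (accepted, rejected).
fn add_best_effort<T, E>(
    items: impl IntoIterator<Item = T>,
    mut add: impl FnMut(T) -> Result<(), E>,
) -> (usize, usize) {
    let mut accepted = 0;
    let mut rejected = 0;
    for item in items {
        match add(item) {
            Ok(()) => accepted += 1,
            Err(_) => rejected += 1,
        }
    }
    (accepted, rejected)
}

/// Message to log when nothing was accepted, otherwise `None`.
fn empty_store_warning(accepted: usize, rejected: usize) -> Option<String> {
    (accepted == 0).then(|| {
        format!(
            "no usable root certificates loaded ({rejected} rejected); \
             HTTPS requests will fail certificate verification"
        )
    })
}

fn main() {
    // Stand-in data: byte vectors play the role of DER certificates,
    // and an empty vector plays the role of a cert the store refuses.
    let certs: Vec<Vec<u8>> = vec![Vec::new(), Vec::new()];
    let (ok, bad) =
        add_best_effort(certs, |c: Vec<u8>| if c.is_empty() { Err(()) } else { Ok(()) });
    if let Some(msg) = empty_store_warning(ok, bad) {
        eprintln!("warning: {msg}");
    }
}
```

A single bad root still does not abort the build of the config; only the all-rejected or nothing-found case produces a message.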
    fn build_rustls_config() -> Result<rustls::ClientConfig> {
        // Install aws-lc-rs as the default for any other rustls consumer in
        // the process. Idempotent: only the first call per process wins.
        let _ = rustls::crypto::aws_lc_rs::default_provider().install_default();
Calling `install_default()` sets the process-wide default crypto provider for rustls. While this is generally acceptable for a CLI tool, it can cause conflicts if this crate is ever used as a library within a larger application that has already initialized a different provider (like `ring`). Since you are already explicitly passing the provider via `builder_with_provider` on line 128, this global installation might be unnecessary unless you intend to affect other rustls consumers in the same process.
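The first-call-wins behaviour can be illustrated with a standard-library analogue. This is a sketch, not rustls code: `DEFAULT_PROVIDER` and this `install_default` are hypothetical stand-ins built on `std::sync::OnceLock`, which hands the rejected value back as `Err` much as a set-once global would.

```rust
use std::sync::OnceLock;

// Hypothetical stand-in for a process-wide default provider slot.
static DEFAULT_PROVIDER: OnceLock<&'static str> = OnceLock::new();

/// First caller wins; later callers get their value handed back as Err.
fn install_default(name: &'static str) -> Result<(), &'static str> {
    DEFAULT_PROVIDER.set(name)
}

fn main() {
    // A host application installs its provider first...
    let _ = install_default("ring");
    // ...so a later install from a library is rejected. Discarding the
    // Result with `let _ =` hides that the default is still "ring".
    let _ = install_default("aws-lc-rs");
    println!("default = {:?}", DEFAULT_PROVIDER.get());
}
```

Passing the provider explicitly to the config builder avoids depending on which caller reached the global slot first.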
📊 pm-bench-phases ·
| PM | wall | ±σ | user | sys | RSS | pgMinor |
|---|---|---|---|---|---|---|
| bun | 9.08s | 0.14s | 10.10s | 10.22s | 644M | 310.9K |
| utoo-npm | 10.84s | 0.65s | 11.91s | 13.76s | 1.24G | 159.4K |
| utoo | 10.69s | 0.48s | 12.45s | 13.91s | 1.27G | 155.5K |
| PM | vCtx | iCtx | netRX | netTX | cache | node_mod | lock |
|---|---|---|---|---|---|---|---|
| bun | 15.0K | 17.4K | 1.16G | 6M | 1.83G | 1.72G | 1M |
| utoo-npm | 175.4K | 166.0K | 1.14G | 5M | 1.68G | 1.68G | 2M |
| utoo | 195.1K | 158.5K | 1.14G | 6M | 1.68G | 1.68G | 2M |
p1_resolve
| PM | wall | ±σ | user | sys | RSS | pgMinor |
|---|---|---|---|---|---|---|
| bun | 2.27s | 0.02s | 3.81s | 1.08s | 501M | 185.3K |
| utoo-npm | 5.55s | 0.11s | 6.05s | 1.11s | 429M | 72.5K |
| utoo | 6.13s | 1.00s | 6.33s | 1.39s | 427M | 76.7K |
| PM | vCtx | iCtx | netRX | netTX | cache | node_mod | lock |
|---|---|---|---|---|---|---|---|
| bun | 10.1K | 3.6K | 200M | 3M | 104M | - | 1M |
| utoo-npm | 66.8K | 2.5K | 204M | 2M | 9M | 5M | 2M |
| utoo | 90.2K | 2.8K | 207M | 3M | 9M | 5M | 2M |
p3_cold_install
| PM | wall | ±σ | user | sys | RSS | pgMinor |
|---|---|---|---|---|---|---|
| bun | 7.17s | 0.78s | 6.07s | 9.94s | 575M | 192.4K |
| utoo-npm | 8.35s | 1.38s | 5.58s | 11.76s | 854M | 110.3K |
| utoo | 7.02s | 0.14s | 5.41s | 11.40s | 916M | 114.4K |
| PM | vCtx | iCtx | netRX | netTX | cache | node_mod | lock |
|---|---|---|---|---|---|---|---|
| bun | 5.0K | 7.0K | 993M | 4M | 1.73G | 1.73G | 1M |
| utoo-npm | 128.9K | 83.8K | 965M | 3M | 1.67G | 1.67G | 2M |
| utoo | 113.9K | 85.3K | 964M | 3M | 1.67G | 1.67G | 2M |
p4_warm_link
| PM | wall | ±σ | user | sys | RSS | pgMinor |
|---|---|---|---|---|---|---|
| bun | 3.57s | 0.06s | 0.17s | 2.41s | 135M | 31.1K |
| utoo-npm | 2.39s | 0.07s | 0.61s | 3.89s | 82M | 19.3K |
| utoo | 2.21s | 0.14s | 0.59s | 3.94s | 82M | 18.8K |
| PM | vCtx | iCtx | netRX | netTX | cache | node_mod | lock |
|---|---|---|---|---|---|---|---|
| bun | 323 | 26 | 25K | 19K | 1.84G | 1.73G | 1M |
| utoo-npm | 44.2K | 19.8K | 16K | 12K | 1.67G | 1.67G | 2M |
| utoo | 50.2K | 22.0K | 15K | 10K | 1.68G | 1.67G | 2M |
npmmirror.com
p0_full_cold
| PM | wall | ±σ | user | sys | RSS | pgMinor |
|---|---|---|---|---|---|---|
| bun | 43.67s | 28.17s | 9.44s | 9.79s | 567M | 428.9K |
| utoo-npm | 17.29s | 0.67s | 8.40s | 14.09s | 788M | 111.3K |
| utoo | 21.55s | 6.82s | 8.24s | 14.03s | 766M | 118.7K |
| PM | vCtx | iCtx | netRX | netTX | cache | node_mod | lock |
|---|---|---|---|---|---|---|---|
| bun | 57.5K | 4.8K | 1.12G | 10M | 1.83G | 1.72G | 2M |
| utoo-npm | 218.5K | 127.6K | 978M | 7M | 1.67G | 1.67G | 2M |
| utoo | 217.3K | 105.4K | 981M | 8M | 1.67G | 1.68G | 2M |
p1_resolve
| PM | wall | ±σ | user | sys | RSS | pgMinor |
|---|---|---|---|---|---|---|
| bun | 1.45s | 0.07s | 3.97s | 1.05s | 627M | 171.4K |
| utoo-npm | 10.99s | 6.82s | 2.26s | 0.58s | 76M | 16.2K |
| utoo | 1.20s | 0.05s | 1.64s | 0.44s | 79M | 16.0K |
| PM | vCtx | iCtx | netRX | netTX | cache | node_mod | lock |
|---|---|---|---|---|---|---|---|
| bun | 4.9K | 6.1K | 152M | 2M | 106M | - | 2M |
| utoo-npm | 46.0K | 745 | 12M | 2M | - | 4M | 2M |
| utoo | 30.5K | 1.4K | 16M | 2M | - | 4M | 2M |
p3_cold_install
| PM | wall | ±σ | user | sys | RSS | pgMinor |
|---|---|---|---|---|---|---|
| bun | 24.12s | 6.09s | 5.85s | 9.00s | 245M | 95.1K |
| utoo-npm | 30.21s | 7.45s | 6.32s | 12.93s | 619M | 89.5K |
| utoo | 27.71s | 9.73s | 6.20s | 12.96s | 628M | 93.2K |
| PM | vCtx | iCtx | netRX | netTX | cache | node_mod | lock |
|---|---|---|---|---|---|---|---|
| bun | 35.3K | 2.9K | 998M | 7M | 1.73G | 1.73G | 2M |
| utoo-npm | 188.3K | 102.9K | 965M | 6M | 1.67G | 1.67G | 2M |
| utoo | 186.8K | 100.3K | 979M | 6M | 1.67G | 1.67G | 2M |
p4_warm_link
| PM | wall | ±σ | user | sys | RSS | pgMinor |
|---|---|---|---|---|---|---|
| bun | 3.18s | 0.11s | 0.22s | 2.35s | 135M | 31.1K |
| utoo-npm | 2.19s | 0.10s | 0.56s | 3.90s | 83M | 18.7K |
| utoo | 2.11s | 0.02s | 0.56s | 3.95s | 82M | 19.1K |
| PM | vCtx | iCtx | netRX | netTX | cache | node_mod | lock |
|---|---|---|---|---|---|---|---|
| bun | 348 | 27 | 7M | 50K | 1.88G | 1.72G | 2M |
| utoo-npm | 49.6K | 20.6K | 48K | 12K | 1.67G | 1.67G | 2M |
| utoo | 48.5K | 21.9K | 49K | 18K | 1.67G | 1.67G | 2M |
📊 pm-bench-phases ·
| PM | wall | ±σ | user | sys | RSS | pgMinor |
|---|---|---|---|---|---|---|
| bun | 19.22s | 0.53s | 6.58s | 19.69s | 759M | 49.0K |
| utoo-npm | 19.28s | 2.70s | 9.41s | 20.74s | 965M | 104.3K |
| utoo | 19.33s | 2.12s | 9.43s | 20.01s | 1.01G | 102.6K |
| PM | vCtx | iCtx | netRX | netTX | cache | node_mod | lock |
|---|---|---|---|---|---|---|---|
| bun | 16.2K | 142.0K | - | - | 1.79G | 1.90G | 1M |
| utoo-npm | 13.2K | 373.2K | - | - | 1.63G | 1.84G | 2M |
| utoo | 13.9K | 391.2K | - | - | 1.63G | 1.83G | 2M |
p1_resolve
| PM | wall | ±σ | user | sys | RSS | pgMinor |
|---|---|---|---|---|---|---|
| bun | 2.25s | 0.10s | 2.71s | 1.24s | 506M | 32.8K |
| utoo-npm | 5.40s | 0.46s | 4.51s | 2.49s | 555M | 37.4K |
| utoo | 7.24s | 2.21s | 5.22s | 3.27s | 553M | 37.4K |
| PM | vCtx | iCtx | netRX | netTX | cache | node_mod | lock |
|---|---|---|---|---|---|---|---|
| bun | 28 | 20.2K | - | - | 110M | - | 1M |
| utoo-npm | 13 | 73.1K | - | - | 28M | 5M | 2M |
| utoo | 32 | 88.7K | - | - | 28M | 5M | 2M |
p3_cold_install
| PM | wall | ±σ | user | sys | RSS | pgMinor |
|---|---|---|---|---|---|---|
| bun | 20.43s | 4.37s | 3.67s | 19.79s | 543M | 35.4K |
| utoo-npm | 18.05s | 2.24s | 4.48s | 20.36s | 581M | 77.8K |
| utoo | 14.92s | 1.17s | 4.40s | 20.16s | 862M | 77.5K |
| PM | vCtx | iCtx | netRX | netTX | cache | node_mod | lock |
|---|---|---|---|---|---|---|---|
| bun | 6.1K | 135.8K | - | - | 1.70G | 1.94G | 1M |
| utoo-npm | 1.5K | 241.7K | - | - | 1.61G | 1.84G | 2M |
| utoo | 1.3K | 235.1K | - | - | 1.61G | 1.84G | 2M |
p4_warm_link
| PM | wall | ±σ | user | sys | RSS | pgMinor |
|---|---|---|---|---|---|---|
| bun | 6.14s | 1.57s | 0.14s | 2.77s | 51M | 3.9K |
| utoo-npm | 4.57s | 0.53s | 0.57s | 2.91s | 92M | 6.7K |
| utoo | 4.42s | 0.46s | 0.58s | 3.22s | 93M | 7.0K |
| PM | vCtx | iCtx | netRX | netTX | cache | node_mod | lock |
|---|---|---|---|---|---|---|---|
| bun | 17.9K | 1.2K | - | - | 1.86G | 1.91G | 1M |
| utoo-npm | 13.3K | 69.9K | - | - | 1.61G | 1.82G | 2M |
| utoo | 13.5K | 80.1K | - | - | 1.63G | 1.82G | 2M |
npmmirror.com
p0_full_cold
| PM | wall | ±σ | user | sys | RSS | pgMinor |
|---|---|---|---|---|---|---|
| bun | 35.75s | 6.20s | 7.22s | 21.25s | 552M | 35.7K |
| utoo-npm | 34.49s | 0.27s | 7.40s | 21.00s | 746M | 77.8K |
| utoo | 34.27s | 2.16s | 6.33s | 17.33s | 779M | 82.2K |
| PM | vCtx | iCtx | netRX | netTX | cache | node_mod | lock |
|---|---|---|---|---|---|---|---|
| bun | 14.7K | 172.7K | - | - | 1.79G | 1.90G | 2M |
| utoo-npm | 949 | 421.3K | - | - | 1.61G | 1.87G | 2M |
| utoo | 1.0K | 436.4K | - | - | 1.61G | 1.82G | 2M |
p1_resolve
| PM | wall | ±σ | user | sys | RSS | pgMinor |
|---|---|---|---|---|---|---|
| bun | 3.79s | 0.11s | 2.40s | 1.36s | 524M | 34.0K |
| utoo-npm | 6.76s | 0.18s | 1.31s | 0.69s | 79M | 5.7K |
| utoo | 29.18s | 40.71s | 1.34s | 0.57s | 81M | 5.9K |
| PM | vCtx | iCtx | netRX | netTX | cache | node_mod | lock |
|---|---|---|---|---|---|---|---|
| bun | 42 | 34.1K | - | - | 111M | - | 2M |
| utoo-npm | 12 | 51.7K | - | - | - | 4M | 2M |
| utoo | 40 | 40.1K | - | - | - | 4M | 2M |
p3_cold_install
| PM | wall | ±σ | user | sys | RSS | pgMinor |
|---|---|---|---|---|---|---|
| bun | 23.97s | 0.60s | 3.71s | 15.63s | 216M | 14.4K |
| utoo-npm | 36.95s | 4.67s | 4.35s | 14.32s | 672M | 74.7K |
| utoo | 39.31s | 3.17s | 4.23s | 14.08s | 750M | 76.5K |
| PM | vCtx | iCtx | netRX | netTX | cache | node_mod | lock |
|---|---|---|---|---|---|---|---|
| bun | 1.8K | 162.3K | - | - | 1.65G | 1.91G | 2M |
| utoo-npm | 1.5K | 358.3K | - | - | 1.61G | 1.87G | 2M |
| utoo | 1.9K | 362.9K | - | - | 1.61G | 1.87G | 2M |
p4_warm_link
| PM | wall | ±σ | user | sys | RSS | pgMinor |
|---|---|---|---|---|---|---|
| bun | 5.30s | 0.90s | 0.12s | 2.34s | 44M | 3.4K |
| utoo-npm | 6.23s | 0.19s | 0.92s | 4.74s | 92M | 6.8K |
| utoo | 5.35s | 0.51s | 0.72s | 4.04s | 90M | 6.8K |
| PM | vCtx | iCtx | netRX | netTX | cache | node_mod | lock |
|---|---|---|---|---|---|---|---|
| bun | 13.3K | 904 | - | - | 1.78G | 1.90G | 2M |
| utoo-npm | 12.1K | 81.9K | - | - | 1.61G | 1.84G | 2M |
| utoo | 12.3K | 79.1K | - | - | 1.61G | 1.84G | 2M |
Summary
Single-commit probe: cherry-pick `b167c977 perf(ruborist): rustls with aws-lc-rs crypto provider instead of ring` from #2818 onto fresh `origin/next`. Nothing else.
3 files, +66/-3.
Hypothesis
#2818 (whole bundle) showed p1_resolve 5.45s → 2.62s (-52%) on linux. Worker-pool alone (#2832) was a no-op, and so was the cap raise alone (#2830, cap=128). The most plausible remaining single-change driver is TLS handshake cost.
Original commit msg measured TLS handshake on CI:
128 parallel TLS handshakes serialised on 4-thread crypto pool with `ring` ⇒ HTTP dispatch starves until crypto drains. `aws-lc-rs` (BoringSSL primitives) has aggressive AES-NI / SHA-NI optimisations.
If this single commit reproduces #2818's -52% on p1_resolve, TLS crypto provider is the perf driver, not worker-pool / cap / parse architecture.
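For checking whether this probe reproduces the figure, the -52% is the plain relative change in wall time. A minimal sketch (`pct_change` is a hypothetical helper name):

```rust
/// Relative change from `before` to `after`, in percent.
fn pct_change(before: f64, after: f64) -> f64 {
    (after - before) / before * 100.0
}

fn main() {
    // p1_resolve wall times quoted for #2818 on linux.
    let delta = pct_change(5.45, 2.62);
    println!("{delta:.1}%");
}
```

The same function applied to this PR's p1_resolve wall time against the `origin/next` baseline gives the number to compare with.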
Test plan
🤖 Generated with Claude Code