perf(pm): probe — rustls aws-lc-rs alone #2835

Closed
elrrrrrrr wants to merge 1 commit into `next` from `perf/probe-aws-lc-rs`

Conversation

@elrrrrrrr

Summary

Single-commit probe: cherry-pick `b167c977 perf(ruborist): rustls with aws-lc-rs crypto provider instead of ring` from #2818 onto fresh `origin/next`. Nothing else.

3 files, +66/-3.

Hypothesis

#2818 (the whole bundle) showed p1_resolve dropping from 5.45s to 2.62s (-52%) on linux. Worker-pool alone (#2832) was a no-op, and so was the cap raise alone (#2830, cap=128). The most plausible single-change driver remaining is TLS handshake cost.

The original commit message measured per-handshake TLS timing on CI:

| Phase | utoo (ring) | bun (BoringSSL) |
|---|---|---|
| CH→SH | 11ms | 10ms |
| CCS→AppData | 78ms p50 / 154ms max | 12ms p50 / 17ms max |
| Total | 162ms | 46ms |

With `ring`, 128 parallel TLS handshakes serialise on a 4-thread crypto pool, so HTTP dispatch starves until the crypto work drains. `aws-lc-rs` (built on AWS-LC, which is derived from BoringSSL) has aggressive AES-NI / SHA-NI optimisations.

If this single commit reproduces #2818's -52% on p1_resolve, the TLS crypto provider is the perf driver, not the worker-pool / cap / parse architecture.
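The starvation argument can be made concrete with a toy model: if every handshake costs the same amount of CPU and only a fixed number of threads can run them, the last handshake completes after `ceil(n / threads)` waves. This is a minimal sketch under that equal-cost assumption. `drain_time_ms` is a hypothetical helper, and the 5 ms per-handshake cost is an illustrative figure, not a measurement from this PR.

```rust
/// Time until the last of `n` equal-cost jobs finishes on `threads` workers.
/// Toy model: no scheduling overhead, every job costs `cost_ms` of CPU.
fn drain_time_ms(n: u32, threads: u32, cost_ms: f64) -> f64 {
    // Ceiling division: number of "waves" needed to run all jobs.
    let waves = (n + threads - 1) / threads;
    f64::from(waves) * cost_ms
}

fn main() {
    // Illustrative per-handshake CPU cost (an assumption, not measured).
    let cost_ms = 5.0;

    // 128 handshakes on 4 blocking threads: 32 waves back to back.
    let serialised = drain_time_ms(128, 4, cost_ms);
    // Same work with one thread per handshake: a single wave.
    let parallel = drain_time_ms(128, 128, cost_ms);

    println!("4 threads:   {serialised} ms until the last handshake is done");
    println!("128 threads: {parallel} ms until the last handshake is done");
}
```

In this model a cheaper crypto provider shrinks `cost_ms`, and the drain time shrinks by the same factor, which is the effect the probe is trying to isolate.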

Test plan

  • CI `pm-e2e-*`
  • CI `utooweb-ci-build-wasm`
  • CI `pm-bench-phases-*` head-to-head vs `utoo-next`

🤖 Generated with Claude Code

CI pcap analysis against npmjs.org revealed TLS CPU as a major
preload bottleneck. Per-handshake timing on CI:

                          utoo (ring)    bun (BoringSSL)
  CH → SH (1 RTT)           11 ms          10 ms    (network)
  CCS → first AppData       78 ms p50      12 ms p50
                           154 ms max      17 ms max
  TOTAL CH → AppData       162 ms         46 ms

The "CCS → AppData" phase is dominated by post-ChangeCipherSpec
client work (Finished MAC verify, state machine transition,
request dispatch). Observed at CI pcap capture:

  utoo: Client CCS spread 154ms (first conn done at 0.975s, last
        at 1.129s), then first AppData *all* fire within 11ms at
        ~1.13s — classic CPU-saturation pattern where 128 parallel
        TLS handshakes serialise across 4 blocking threads and HTTP
        dispatch is starved until crypto drains.
  bun:  CCS spread 51ms, AppData spread 43ms — dispatch flows
        smoothly as each conn completes.

`ring` (the provider that reqwest's `rustls-tls-native-roots`
feature selects) is Rust plus C and hand-tuned assembly;
`aws-lc-rs` wraps AWS-LC, a BoringSSL derivative whose primitives
are more aggressively optimised for x86_64 AES-NI + SHA-NI.

Reqwest 0.12's `rustls-tls-native-roots` feature pins `__rustls-ring`
via Cargo's feature unification — no way to override. Swap to
`rustls-tls-native-roots-no-provider`, add a direct `rustls` dep
with the `aws-lc-rs` feature, load native root certs via
`rustls-native-certs` and pass the `ClientConfig` into reqwest via
`use_preconfigured_tls`.

Local M2 (ARM, where ring's hand-tuned ARM assembly is already
near-optimal) shows neutral perf (~2.9s both providers). Waiting
on CI to confirm the x86_64 CCS→AppData gap closes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
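
The dependency swap described in the commit message might look roughly like the Cargo.toml fragment below. This is a sketch, not the PR's actual diff: the `rustls` and `rustls-native-certs` version numbers are assumptions (reqwest 0.12 builds against rustls 0.23, and the reviewed code reads `roots.certs`), and the real manifest may enable other features.

```toml
[dependencies]
# Previously: features = ["rustls-tls-native-roots"], which pulls in `__rustls-ring`.
reqwest = { version = "0.12", features = ["rustls-tls-native-roots-no-provider"] }
# Direct dependency so the aws-lc-rs provider can be selected explicitly.
rustls = { version = "0.23", features = ["aws-lc-rs"] }  # version is an assumption
# Loads the platform trust store for the hand-built ClientConfig.
rustls-native-certs = "0.8"  # version is an assumption
```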
@elrrrrrrr added the `benchmark` (Run pm-bench on PR) label on Apr 27, 2026

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request switches the HTTP client's TLS backend from ring to aws-lc-rs to improve handshake performance, implementing a custom rustls configuration that loads native certificates. The review feedback suggests logging a warning if the root certificate store is empty to help diagnose environment issues and notes that the global installation of the crypto provider may be unnecessary and could cause conflicts if the crate is used as a library.

Comment on lines +115 to +120
    for cert in roots.certs {
        // Best-effort: skip any cert rustls refuses (same tolerance
        // native-tls shows). A hard fail here would brick every
        // request on a box with one bad root in its trust store.
        let _ = root_store.add(cert);
    }

medium

If load_native_certs() fails to find any valid certificates (e.g., in a minimal container environment), root_store will be empty, causing all subsequent HTTPS requests to fail with certificate verification errors. While the best-effort approach is appropriate for individual certificate failures, it would be beneficial to log a warning if the resulting root_store is completely empty to aid in troubleshooting environment issues.
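
One way to act on this suggestion is to count how many certificates the store accepted and warn when that count is zero. The sketch below uses only the standard library: `add_best_effort` is a hypothetical helper, and the closure stands in for `root_store.add(cert)`, so none of this is the PR's actual code.

```rust
/// Feeds every item to `add`, tolerating individual failures.
/// Returns (accepted, rejected) so the caller can detect an empty result.
fn add_best_effort<T, E>(
    items: impl IntoIterator<Item = T>,
    mut add: impl FnMut(T) -> Result<(), E>,
) -> (usize, usize) {
    let mut accepted = 0;
    let mut rejected = 0;
    for item in items {
        match add(item) {
            Ok(()) => accepted += 1,
            Err(_) => rejected += 1,
        }
    }
    (accepted, rejected)
}

fn main() {
    // Stand-in trust store that refuses empty byte strings (hypothetical rule).
    let mut store: Vec<Vec<u8>> = Vec::new();
    let certs: Vec<Vec<u8>> = vec![Vec::new(), Vec::new()];

    let (accepted, rejected) = add_best_effort(certs, |cert: Vec<u8>| {
        if cert.is_empty() {
            Err("empty certificate")
        } else {
            store.push(cert);
            Ok(())
        }
    });

    // Individual rejections stay silent; an empty store is worth a warning.
    if accepted == 0 {
        eprintln!(
            "warning: no usable root certificates ({rejected} rejected); \
             HTTPS requests will fail certificate verification"
        );
    }
    println!("store holds {} certificate(s)", store.len());
}
```

Keeping the per-certificate tolerance and adding a single check on the final count preserves the best-effort behaviour while making the minimal-container case visible.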

    fn build_rustls_config() -> Result<rustls::ClientConfig> {
        // Install aws-lc-rs as the default for any other rustls consumer in
        // the process. Idempotent — only the first call per process wins.
        let _ = rustls::crypto::aws_lc_rs::default_provider().install_default();

medium

Calling install_default() sets the process-wide default crypto provider for rustls. While this is generally acceptable for a CLI tool, it can cause conflicts if this crate is ever used as a library within a larger application that has already initialized a different provider (like ring). Since you are already explicitly passing the provider via builder_with_provider on line 128, this global installation might be unnecessary unless you intend to affect other rustls consumers in the same process.

@github-actions

📊 pm-bench-phases · 63b12a7 · linux (ubuntu-latest)

Workflow run — ant-design

PMs: utoo (this branch) · utoo-npm (latest published) · bun (latest)

npmjs.org

p0_full_cold

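| PM | wall | ±σ | user | sys | RSS | pgMinor |
|---|---|---|---|---|---|---|
| bun | 9.08s | 0.14s | 10.10s | 10.22s | 644M | 310.9K |
| utoo-npm | 10.84s | 0.65s | 11.91s | 13.76s | 1.24G | 159.4K |
| utoo | 10.69s | 0.48s | 12.45s | 13.91s | 1.27G | 155.5K |

| PM | vCtx | iCtx | netRX | netTX | cache | node_mod | lock |
|---|---|---|---|---|---|---|---|
| bun | 15.0K | 17.4K | 1.16G | 6M | 1.83G | 1.72G | 1M |
| utoo-npm | 175.4K | 166.0K | 1.14G | 5M | 1.68G | 1.68G | 2M |
| utoo | 195.1K | 158.5K | 1.14G | 6M | 1.68G | 1.68G | 2M |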

p1_resolve

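| PM | wall | ±σ | user | sys | RSS | pgMinor |
|---|---|---|---|---|---|---|
| bun | 2.27s | 0.02s | 3.81s | 1.08s | 501M | 185.3K |
| utoo-npm | 5.55s | 0.11s | 6.05s | 1.11s | 429M | 72.5K |
| utoo | 6.13s | 1.00s | 6.33s | 1.39s | 427M | 76.7K |

| PM | vCtx | iCtx | netRX | netTX | cache | node_mod | lock |
|---|---|---|---|---|---|---|---|
| bun | 10.1K | 3.6K | 200M | 3M | 104M | - | 1M |
| utoo-npm | 66.8K | 2.5K | 204M | 2M | 9M | 5M | 2M |
| utoo | 90.2K | 2.8K | 207M | 3M | 9M | 5M | 2M |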
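
To read these p1_resolve rows against the -52% figure from the Hypothesis section, the relative wall-time change can be computed as follows. This is a minimal sketch; `rel_change_pct` is an illustrative helper, not part of the bench tooling, and the inputs are copied from the numbers quoted in this PR.

```rust
/// Relative change of `candidate` against `baseline`, in percent.
/// Negative means the candidate is faster.
fn rel_change_pct(baseline: f64, candidate: f64) -> f64 {
    (candidate - baseline) / baseline * 100.0
}

fn main() {
    // #2818 whole bundle, as quoted in the Hypothesis: 5.45s -> 2.62s.
    let bundle = rel_change_pct(5.45, 2.62);
    // This run, linux / npmjs.org p1_resolve: utoo-npm 5.55s vs utoo 6.13s.
    let probe = rel_change_pct(5.55, 6.13);

    println!("#2818 bundle: {bundle:+.1}%");
    println!("this probe:   {probe:+.1}%");
}
```

On this run the probe branch is slower rather than faster on linux / npmjs.org, although the `utoo` row's ±σ of 1.00s is larger than the 0.58s gap, so the two rows are not clearly separated.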

p3_cold_install

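| PM | wall | ±σ | user | sys | RSS | pgMinor |
|---|---|---|---|---|---|---|
| bun | 7.17s | 0.78s | 6.07s | 9.94s | 575M | 192.4K |
| utoo-npm | 8.35s | 1.38s | 5.58s | 11.76s | 854M | 110.3K |
| utoo | 7.02s | 0.14s | 5.41s | 11.40s | 916M | 114.4K |

| PM | vCtx | iCtx | netRX | netTX | cache | node_mod | lock |
|---|---|---|---|---|---|---|---|
| bun | 5.0K | 7.0K | 993M | 4M | 1.73G | 1.73G | 1M |
| utoo-npm | 128.9K | 83.8K | 965M | 3M | 1.67G | 1.67G | 2M |
| utoo | 113.9K | 85.3K | 964M | 3M | 1.67G | 1.67G | 2M |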

p4_warm_link

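| PM | wall | ±σ | user | sys | RSS | pgMinor |
|---|---|---|---|---|---|---|
| bun | 3.57s | 0.06s | 0.17s | 2.41s | 135M | 31.1K |
| utoo-npm | 2.39s | 0.07s | 0.61s | 3.89s | 82M | 19.3K |
| utoo | 2.21s | 0.14s | 0.59s | 3.94s | 82M | 18.8K |

| PM | vCtx | iCtx | netRX | netTX | cache | node_mod | lock |
|---|---|---|---|---|---|---|---|
| bun | 323 | 26 | 25K | 19K | 1.84G | 1.73G | 1M |
| utoo-npm | 44.2K | 19.8K | 16K | 12K | 1.67G | 1.67G | 2M |
| utoo | 50.2K | 22.0K | 15K | 10K | 1.68G | 1.67G | 2M |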

npmmirror.com

p0_full_cold

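| PM | wall | ±σ | user | sys | RSS | pgMinor |
|---|---|---|---|---|---|---|
| bun | 43.67s | 28.17s | 9.44s | 9.79s | 567M | 428.9K |
| utoo-npm | 17.29s | 0.67s | 8.40s | 14.09s | 788M | 111.3K |
| utoo | 21.55s | 6.82s | 8.24s | 14.03s | 766M | 118.7K |

| PM | vCtx | iCtx | netRX | netTX | cache | node_mod | lock |
|---|---|---|---|---|---|---|---|
| bun | 57.5K | 4.8K | 1.12G | 10M | 1.83G | 1.72G | 2M |
| utoo-npm | 218.5K | 127.6K | 978M | 7M | 1.67G | 1.67G | 2M |
| utoo | 217.3K | 105.4K | 981M | 8M | 1.67G | 1.68G | 2M |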

p1_resolve

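| PM | wall | ±σ | user | sys | RSS | pgMinor |
|---|---|---|---|---|---|---|
| bun | 1.45s | 0.07s | 3.97s | 1.05s | 627M | 171.4K |
| utoo-npm | 10.99s | 6.82s | 2.26s | 0.58s | 76M | 16.2K |
| utoo | 1.20s | 0.05s | 1.64s | 0.44s | 79M | 16.0K |

| PM | vCtx | iCtx | netRX | netTX | cache | node_mod | lock |
|---|---|---|---|---|---|---|---|
| bun | 4.9K | 6.1K | 152M | 2M | 106M | - | 2M |
| utoo-npm | 46.0K | 745 | 12M | 2M | - | 4M | 2M |
| utoo | 30.5K | 1.4K | 16M | 2M | - | 4M | 2M |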

p3_cold_install

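| PM | wall | ±σ | user | sys | RSS | pgMinor |
|---|---|---|---|---|---|---|
| bun | 24.12s | 6.09s | 5.85s | 9.00s | 245M | 95.1K |
| utoo-npm | 30.21s | 7.45s | 6.32s | 12.93s | 619M | 89.5K |
| utoo | 27.71s | 9.73s | 6.20s | 12.96s | 628M | 93.2K |

| PM | vCtx | iCtx | netRX | netTX | cache | node_mod | lock |
|---|---|---|---|---|---|---|---|
| bun | 35.3K | 2.9K | 998M | 7M | 1.73G | 1.73G | 2M |
| utoo-npm | 188.3K | 102.9K | 965M | 6M | 1.67G | 1.67G | 2M |
| utoo | 186.8K | 100.3K | 979M | 6M | 1.67G | 1.67G | 2M |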

p4_warm_link

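| PM | wall | ±σ | user | sys | RSS | pgMinor |
|---|---|---|---|---|---|---|
| bun | 3.18s | 0.11s | 0.22s | 2.35s | 135M | 31.1K |
| utoo-npm | 2.19s | 0.10s | 0.56s | 3.90s | 83M | 18.7K |
| utoo | 2.11s | 0.02s | 0.56s | 3.95s | 82M | 19.1K |

| PM | vCtx | iCtx | netRX | netTX | cache | node_mod | lock |
|---|---|---|---|---|---|---|---|
| bun | 348 | 27 | 7M | 50K | 1.88G | 1.72G | 2M |
| utoo-npm | 49.6K | 20.6K | 48K | 12K | 1.67G | 1.67G | 2M |
| utoo | 48.5K | 21.9K | 49K | 18K | 1.67G | 1.67G | 2M |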

@github-actions

📊 pm-bench-phases · 63b12a7 · mac (macos-latest)

Workflow run — ant-design

PMs: utoo (this branch) · utoo-npm (latest published) · bun (latest)

npmjs.org

p0_full_cold

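| PM | wall | ±σ | user | sys | RSS | pgMinor |
|---|---|---|---|---|---|---|
| bun | 19.22s | 0.53s | 6.58s | 19.69s | 759M | 49.0K |
| utoo-npm | 19.28s | 2.70s | 9.41s | 20.74s | 965M | 104.3K |
| utoo | 19.33s | 2.12s | 9.43s | 20.01s | 1.01G | 102.6K |

| PM | vCtx | iCtx | netRX | netTX | cache | node_mod | lock |
|---|---|---|---|---|---|---|---|
| bun | 16.2K | 142.0K | - | - | 1.79G | 1.90G | 1M |
| utoo-npm | 13.2K | 373.2K | - | - | 1.63G | 1.84G | 2M |
| utoo | 13.9K | 391.2K | - | - | 1.63G | 1.83G | 2M |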

p1_resolve

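| PM | wall | ±σ | user | sys | RSS | pgMinor |
|---|---|---|---|---|---|---|
| bun | 2.25s | 0.10s | 2.71s | 1.24s | 506M | 32.8K |
| utoo-npm | 5.40s | 0.46s | 4.51s | 2.49s | 555M | 37.4K |
| utoo | 7.24s | 2.21s | 5.22s | 3.27s | 553M | 37.4K |

| PM | vCtx | iCtx | netRX | netTX | cache | node_mod | lock |
|---|---|---|---|---|---|---|---|
| bun | 28 | 20.2K | - | - | 110M | - | 1M |
| utoo-npm | 13 | 73.1K | - | - | 28M | 5M | 2M |
| utoo | 32 | 88.7K | - | - | 28M | 5M | 2M |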

p3_cold_install

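| PM | wall | ±σ | user | sys | RSS | pgMinor |
|---|---|---|---|---|---|---|
| bun | 20.43s | 4.37s | 3.67s | 19.79s | 543M | 35.4K |
| utoo-npm | 18.05s | 2.24s | 4.48s | 20.36s | 581M | 77.8K |
| utoo | 14.92s | 1.17s | 4.40s | 20.16s | 862M | 77.5K |

| PM | vCtx | iCtx | netRX | netTX | cache | node_mod | lock |
|---|---|---|---|---|---|---|---|
| bun | 6.1K | 135.8K | - | - | 1.70G | 1.94G | 1M |
| utoo-npm | 1.5K | 241.7K | - | - | 1.61G | 1.84G | 2M |
| utoo | 1.3K | 235.1K | - | - | 1.61G | 1.84G | 2M |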

p4_warm_link

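| PM | wall | ±σ | user | sys | RSS | pgMinor |
|---|---|---|---|---|---|---|
| bun | 6.14s | 1.57s | 0.14s | 2.77s | 51M | 3.9K |
| utoo-npm | 4.57s | 0.53s | 0.57s | 2.91s | 92M | 6.7K |
| utoo | 4.42s | 0.46s | 0.58s | 3.22s | 93M | 7.0K |

| PM | vCtx | iCtx | netRX | netTX | cache | node_mod | lock |
|---|---|---|---|---|---|---|---|
| bun | 17.9K | 1.2K | - | - | 1.86G | 1.91G | 1M |
| utoo-npm | 13.3K | 69.9K | - | - | 1.61G | 1.82G | 2M |
| utoo | 13.5K | 80.1K | - | - | 1.63G | 1.82G | 2M |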

npmmirror.com

p0_full_cold

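| PM | wall | ±σ | user | sys | RSS | pgMinor |
|---|---|---|---|---|---|---|
| bun | 35.75s | 6.20s | 7.22s | 21.25s | 552M | 35.7K |
| utoo-npm | 34.49s | 0.27s | 7.40s | 21.00s | 746M | 77.8K |
| utoo | 34.27s | 2.16s | 6.33s | 17.33s | 779M | 82.2K |

| PM | vCtx | iCtx | netRX | netTX | cache | node_mod | lock |
|---|---|---|---|---|---|---|---|
| bun | 14.7K | 172.7K | - | - | 1.79G | 1.90G | 2M |
| utoo-npm | 949 | 421.3K | - | - | 1.61G | 1.87G | 2M |
| utoo | 1.0K | 436.4K | - | - | 1.61G | 1.82G | 2M |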

p1_resolve

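| PM | wall | ±σ | user | sys | RSS | pgMinor |
|---|---|---|---|---|---|---|
| bun | 3.79s | 0.11s | 2.40s | 1.36s | 524M | 34.0K |
| utoo-npm | 6.76s | 0.18s | 1.31s | 0.69s | 79M | 5.7K |
| utoo | 29.18s | 40.71s | 1.34s | 0.57s | 81M | 5.9K |

| PM | vCtx | iCtx | netRX | netTX | cache | node_mod | lock |
|---|---|---|---|---|---|---|---|
| bun | 42 | 34.1K | - | - | 111M | - | 2M |
| utoo-npm | 12 | 51.7K | - | - | - | 4M | 2M |
| utoo | 40 | 40.1K | - | - | - | 4M | 2M |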

p3_cold_install

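| PM | wall | ±σ | user | sys | RSS | pgMinor |
|---|---|---|---|---|---|---|
| bun | 23.97s | 0.60s | 3.71s | 15.63s | 216M | 14.4K |
| utoo-npm | 36.95s | 4.67s | 4.35s | 14.32s | 672M | 74.7K |
| utoo | 39.31s | 3.17s | 4.23s | 14.08s | 750M | 76.5K |

| PM | vCtx | iCtx | netRX | netTX | cache | node_mod | lock |
|---|---|---|---|---|---|---|---|
| bun | 1.8K | 162.3K | - | - | 1.65G | 1.91G | 2M |
| utoo-npm | 1.5K | 358.3K | - | - | 1.61G | 1.87G | 2M |
| utoo | 1.9K | 362.9K | - | - | 1.61G | 1.87G | 2M |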

p4_warm_link

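| PM | wall | ±σ | user | sys | RSS | pgMinor |
|---|---|---|---|---|---|---|
| bun | 5.30s | 0.90s | 0.12s | 2.34s | 44M | 3.4K |
| utoo-npm | 6.23s | 0.19s | 0.92s | 4.74s | 92M | 6.8K |
| utoo | 5.35s | 0.51s | 0.72s | 4.04s | 90M | 6.8K |

| PM | vCtx | iCtx | netRX | netTX | cache | node_mod | lock |
|---|---|---|---|---|---|---|---|
| bun | 13.3K | 904 | - | - | 1.78G | 1.90G | 2M |
| utoo-npm | 12.1K | 81.9K | - | - | 1.61G | 1.84G | 2M |
| utoo | 12.3K | 79.1K | - | - | 1.61G | 1.84G | 2M |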
