perf(pm): iter1 — HTTP layer (gzip+brotli, DNS, conn pool, fire-and-forget)#2854

Closed
elrrrrrrr wants to merge 13 commits into next from perf/iter1-http

Conversation

@elrrrrrrr
Contributor

Iter1 on top of #2838 bundle. Adds 5 HTTP-layer commits: gzip+brotli, DNS round-robin + per-family rotate, hot HTTP/1.1 conn pool with no idle timeout + tcp_keepalive, fire-and-forget disk cache writes. Aim: close the 0.32s gap from bundle (3.03s) → #2834 (2.62s).

elrrrrrrr and others added 13 commits April 27, 2026 21:04
Copy the OnceMap util module from pm/util to ruborist/util so
the resolver can use it for per-name dedup of concurrent
manifest fetches. Foundation commit — no usage yet, that lands
in the worker-pool wiring commit.

The pm-side oncemap.rs stays for cloner/downloader users.

CI pcap analysis against npmjs.org revealed TLS CPU as a major
preload bottleneck. Per-handshake timing on CI:

                          utoo (ring)    bun (BoringSSL)
  CH → SH (1 RTT)           11 ms          10 ms    (network)
  CCS → first AppData       78 ms p50      12 ms p50
                           154 ms max      17 ms max
  TOTAL CH → AppData       162 ms         46 ms

The "CCS → AppData" phase is dominated by post-ChangeCipherSpec
client work (Finished MAC verify, state machine transition,
request dispatch). Observed at CI pcap capture:

  utoo: Client CCS spread 154ms (first conn done at 0.975s, last
        at 1.129s), then first AppData *all* fire within 11ms at
        ~1.13s — classic CPU-saturation pattern where 128 parallel
        TLS handshakes serialise across 4 blocking threads and HTTP
        dispatch is starved until crypto drains.
  bun:  CCS spread 51ms, AppData spread 43ms — dispatch flows
        smoothly as each conn completes.

`ring` (rustls' default provider) is pure Rust + hand-tuned
assembly; `aws-lc-rs` wraps BoringSSL's primitives which are more
aggressively optimised for x86_64 AES-NI + SHA-NI.

Reqwest 0.12's `rustls-tls-native-roots` feature pins `__rustls-ring`
via Cargo's feature unification — no way to override. Swap to
`rustls-tls-native-roots-no-provider`, add a direct `rustls` dep
with the `aws-lc-rs` feature, load native root certs via
`rustls-native-certs` and pass the `ClientConfig` into reqwest via
`use_preconfigured_tls`.
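
Concretely, the wiring looks something like this (a sketch assuming
rustls 0.23 with its aws-lc-rs feature, rustls-native-certs 0.7 and
reqwest 0.12; error handling trimmed, not the literal commit):

    use std::sync::Arc;
    use rustls::crypto::aws_lc_rs;
    use rustls::{ClientConfig, RootCertStore};

    fn client_with_aws_lc() -> Result<reqwest::Client, Box<dyn std::error::Error>> {
        // Load the platform trust store, as rustls-tls-native-roots did for us.
        let mut roots = RootCertStore::empty();
        for cert in rustls_native_certs::load_native_certs()? {
            roots.add(cert)?;
        }
        // Build the ClientConfig on the aws-lc-rs provider instead of ring.
        let tls = ClientConfig::builder_with_provider(Arc::new(aws_lc_rs::default_provider()))
            .with_safe_default_protocol_versions()?
            .with_root_certificates(roots)
            .with_no_client_auth();
        // Hand the preconfigured TLS stack to reqwest.
        Ok(reqwest::Client::builder()
            .use_preconfigured_tls(tls)
            .build()?)
    }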

Local M2 (ARM, where ring's hand-tuned ARM assembly is already
near-optimal) shows neutral perf (~2.9s both providers). Waiting
on CI to confirm the x86_64 CCS→AppData gap closes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Wire the OnceMap from #2832-foundation into UnifiedRegistry's
resolve_full_manifest. When N packages all transitively pull
on the same dep (e.g. 50 packages → react), only the first
hits the network; the others await the shared Arc<result>
on Notify and clone it on resolve.

DashMap + tokio::sync::Notify keeps the fast path (cache hit)
lock-free. cfg-gated on native; wasm path keeps the direct
fetch since dashmap + tokio sync are unnecessary there.

FullManifestResult derives Clone so OnceMap waiters can each
take owned copies.
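
The shape of the dedup, as a minimal sketch (the real OnceMap builds
on DashMap + Notify directly; here tokio::sync::OnceCell stands in
for the wait/wake bookkeeping):

    use std::future::Future;
    use std::sync::Arc;
    use dashmap::DashMap;
    use tokio::sync::OnceCell;

    pub struct OnceMap<T> {
        cells: DashMap<String, Arc<OnceCell<Arc<T>>>>,
    }

    impl<T> OnceMap<T> {
        pub fn new() -> Self {
            Self { cells: DashMap::new() }
        }

        /// First caller for `key` runs `fetch`; concurrent callers await
        /// the same in-flight fetch and clone the shared Arc on completion.
        pub async fn get_or_fetch<F, Fut>(&self, key: &str, fetch: F) -> Arc<T>
        where
            F: FnOnce() -> Fut,
            Fut: Future<Output = T>,
        {
            // Clone the cell out so no DashMap shard lock is held across .await.
            let cell = self
                .cells
                .entry(key.to_owned())
                .or_insert_with(|| Arc::new(OnceCell::new()))
                .clone();
            cell.get_or_init(|| async { Arc::new(fetch().await) })
                .await
                .clone()
        }
    }
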
N long-lived `tokio::spawn` workers pull work from a shared
lock-free `SegQueue`. Each worker is independent on tokio's
multi-thread runtime, so when one worker is parsing manifest
JSON (CPU-bound, simd_json), other workers continue driving
their network IO and other parses run on different cores.
Replaces `FuturesUnordered` where the main task owned all
preload futures and polled them cooperatively — every per-
future await continuation (including parse) ran serialised
in that single task.

The architectural fix alone (shown to be a no-op in PR #2830
with spawn_local) needs the network-layer drivers (aws-lc-rs +
OnceMap, earlier in this stack) to actually express its
benefit. With them in place: TLS handshake CPU is fast
(aws-lc-rs), duplicate fetches are deduped (OnceMap), so the
worker pool's N independent task slots are doing actual
unique work.
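
The topology, stripped down (a sketch with hypothetical names; the
real pool also handles jobs enqueued mid-flight as the BFS discovers
new deps):

    use std::sync::Arc;
    use crossbeam_queue::SegQueue;

    async fn run_preload(jobs: Vec<String>, n_workers: usize) {
        let queue = Arc::new(SegQueue::new());
        for job in jobs {
            queue.push(job);
        }
        let handles: Vec<_> = (0..n_workers)
            .map(|_| {
                let queue = Arc::clone(&queue);
                // Each worker is its own spawned task: a CPU-bound simd_json
                // parse in one doesn't block the IO continuations of the rest.
                tokio::spawn(async move {
                    while let Some(name) = queue.pop() {
                        fetch_and_parse(name).await;
                    }
                })
            })
            .collect();
        for h in handles {
            let _ = h.await;
        }
    }

    async fn fetch_and_parse(_name: String) {
        // hypothetical per-job work: HTTP GET the manifest, simd_json parse
    }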

## MaybeSend / MaybeSync shim

`tokio::spawn` requires `Future: Send + 'static`, which
forces the `RegistryClient` trait surface to grow `+ Send` on
its `impl Future` returns, plus `Self: Sync` / `Self::Error:
Send` on default impls. Native uses real Send/Sync; wasm32
uses no-op marker traits because `JsFuture` is `!Send` and
tasks run via `wasm_bindgen_futures::spawn_local`.

`PreloadRegistry` trait alias bundles `RegistryClient +
Clone + MaybeSend + MaybeSync + 'static` so the 5 resolver
entry points (`build_deps*`, `resolve*`, `run_preload_phase`)
read clean.
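
The shim itself is the usual pair of cfg-gated marker traits,
roughly (MaybeSync mirrors this with Sync):

    // Native: real bounds, so tokio::spawn's Future: Send + 'static holds.
    #[cfg(not(target_arch = "wasm32"))]
    pub trait MaybeSend: Send {}
    #[cfg(not(target_arch = "wasm32"))]
    impl<T: Send> MaybeSend for T {}

    // wasm32: no-op marker, since JsFuture is !Send and tasks run on
    // wasm_bindgen_futures::spawn_local anyway.
    #[cfg(target_arch = "wasm32")]
    pub trait MaybeSend {}
    #[cfg(target_arch = "wasm32")]
    impl<T> MaybeSend for T {}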

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

aws-lc-sys's cc_builder.rs:692 panics on the wasm32 target (no
wasm support). Move the explicit rustls + aws-lc-rs dependency
to native-only; reqwest's rustls-tls-native-roots-no-provider
feature is a no-op on wasm anyway, since wasm builds use the
fetch API and the browser's TLS stack, not rustls.

reqwest pins every new connection to the first resolved IP even when
DNS returns multiple A records. On registries backed by a CDN with
many IPs (antgroup returns 8, npm/Cloudflare returns 2-4) this means
all concurrent pool connections land on one IP, which caps effective
parallelism regardless of `pool_max_idle_per_host`.

Rotate the returned address list by an atomic counter on every
`resolve` call so reqwest's connect loop picks a different IP per
new connection. Connections end up uniformly distributed across all
A records returned by DNS.
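
Plugged in through reqwest's custom-resolver hook, the rotation
reads roughly like this (a sketch; this commit rotates the whole
list, the follow-up commit below splits it per address family):

    use std::net::ToSocketAddrs;
    use std::sync::atomic::{AtomicUsize, Ordering};

    use reqwest::dns::{Addrs, Name, Resolve, Resolving};

    #[derive(Default)]
    struct RotatingResolver {
        counter: AtomicUsize,
    }

    impl Resolve for RotatingResolver {
        fn resolve(&self, name: Name) -> Resolving {
            // One atomic bump per resolve call = one per new connection.
            let offset = self.counter.fetch_add(1, Ordering::Relaxed);
            Box::pin(async move {
                let host = format!("{}:0", name.as_str());
                // getaddrinfo blocks; keep it off the async worker threads.
                let mut addrs = tokio::task::spawn_blocking(move || {
                    host.to_socket_addrs().map(|it| it.collect::<Vec<_>>())
                })
                .await??;
                if !addrs.is_empty() {
                    let len = addrs.len();
                    addrs.rotate_left(offset % len);
                }
                Ok(Box::new(addrs.into_iter()) as Addrs)
            })
        }
    }

    // Wiring:
    // reqwest::Client::builder().dns_resolver(Arc::new(RotatingResolver::default()))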

Measured on ant-design / antgroup registry (cold deps, local):
- utoo-h1 (single IP): 5.38s HTTP phase, 120 conn on 1 IP
- utoo-h1 + DNS rotation: 3.95s HTTP phase, 8 IPs × 8 conn each
- bun baseline: 3.72s HTTP phase, 4 IPs × 64 conn each

Total deps-resolve wall time now matches bun (~3.3s vs 3.3s).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

pcap diff against bun revealed a bug in our DNS round-robin. bun
distributes its 256 preload connections cleanly as 64 per IP
across 4 Cloudflare edges. utoo dumped 66 of 128 connections
(51%) onto a single IP (104.16.5.34), leaving 11 other IPs with
only 5-6 connections each — effectively serialising a quarter of
the phase on one overloaded server-side instance.

Root cause: `getaddrinfo` for `registry.npmjs.org` returns 10
IPv6 + 12 IPv4 = 22 addresses, v6 first. Our rotation was
`offset % 22`, which for offsets 0..10 (= 11 of every 22) made
IPv6 the first candidate. On CI runners without IPv6 routing,
every one of those 11 offsets failed through the v6 prefix and
landed on the *same* first IPv4 entry (104.16.5.34). The other
11 v4 addresses only saw traffic from offsets 11..21 — one
connection each.

Math verified with the pcap: 128 offsets × (11/22) = 64
expected on v4[0]; observed 66 (plus some jitter from Happy
Eyeballs timing).

Fix: split addrs into v6 / v4, rotate each family independently,
then re-concatenate with the system resolver's original family
order preserved. Each new connection now cycles through all
reachable IPv4 addresses in turn instead of all piling up on the
first one after v6 gives up.

Expected p1_resolve impact: the overloaded server instance
currently serialising 66 TLS handshakes + request/response pairs
should see ~11 instead, and the 11 underutilised IPs pick up the
slack. Preliminary math: if server-side queue delay scaled
linearly with load, 66 → 11 could cut per-request tail latency
meaningfully, possibly closing a chunk of the 1.5 s p1 gap vs
bun (though some of that gap is still structural tokio/reqwest
overhead).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Run 8 at cap=128: utoo p1_resolve 4.237s vs bun 1.997s. `send_us`
avg 59ms, p99 292ms, max 420ms — the tail is substantial and
consistent with TLS handshakes happening mid-phase: reqwest
closes idle conns after its default 90s idle timeout, then has
to reopen when the next BFS wave reuses the same host/IP.

Three conn-pool adjustments that target that tail:

- `pool_idle_timeout(None)`: never close idle conns during a
  single resolve. ant-design pulls 2730 manifests across ~12
  IPv6 addresses at cap=128 → ~11 hot conns per IP. Preserving
  them avoids re-handshake cost mid-run.
- `pool_max_idle_per_host(256 -> 1024)`: oversized vs current
  cap so reqwest never evicts a hot conn under pressure.
- `tcp_keepalive(Some(30s))`: tell the kernel to probe the
  socket so intermediate NAT/firewalls don't silently drop the
  TCP state during quiet gaps, forcing a silent reconnect.

The histogram will tell us whether this shifts `send_us` tail
numbers (p90/p99) without regressing the body.
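
All three are stock ClientBuilder knobs; a sketch of just this
commit's settings:

    use std::time::Duration;

    fn tuned_client() -> reqwest::Result<reqwest::Client> {
        reqwest::Client::builder()
            // Never reap idle conns mid-resolve; the re-handshake is the tail.
            .pool_idle_timeout(None)
            // Oversized vs cap=128 so a hot conn is never evicted under pressure.
            .pool_max_idle_per_host(1024)
            // Kernel probes keep NAT/firewall state alive through quiet gaps.
            .tcp_keepalive(Some(Duration::from_secs(30)))
            .build()
    }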

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Profiling the remaining p1_resolve gap (utoo 5.57s vs bun 1.88s on CI)
pointed at disk cache serialisation as the last inline CPU sink on the
hot path: each resolved manifest called
`serde_json::to_string_pretty(...).await` from
`set_versions_to_disk` / `set_version_manifest_to_disk`, so the resolve
future couldn't return until the pretty-JSON encoder and fs write had
both finished. pcap showed the classic signature — active TCP streams
dipping to the mid-20s while the main preload task was busy serialising.

Three changes, all local to the two functions (a signature
sketch follows the list):
1. Convert both from `async fn` to plain `fn` that internally
   `tokio::spawn` the serialisation + write. The resolve path no longer
   awaits disk I/O; the cache populates in the background for the next
   run.
2. Swap `serde_json::to_string_pretty` for `serde_json::to_vec`. Compact
   encoding avoids the 2–3× overhead of pretty-printing, and skipping
   the intermediate `String` saves one allocation per write.
3. `set_version_manifest_to_disk` now takes `Arc<CoreVersionManifest>`
   instead of `&CoreVersionManifest`. Call sites already hold an
   `Arc<CoreVersionManifest>`, so they clone the Arc (atomic inc)
   instead of the spawn deep-cloning a 10–50 KB struct.
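
The signature change in sketch form (stand-in type; the spawned body
is the serialize-and-write shown in the review hunks below):

    use std::path::PathBuf;
    use std::sync::Arc;

    struct CoreVersionManifest; // stand-in for the real type

    // Before: `async fn` awaited by the resolve future. After: a plain fn
    // that spawns the write and returns immediately.
    fn set_version_manifest_to_disk(
        cache_dir: PathBuf,
        name: String,
        version: String,
        manifest: Arc<CoreVersionManifest>,
    ) {
        tokio::spawn(async move {
            // serde_json::to_vec(&*manifest) + tokio_fs_ext::write; errors
            // are logged, not returned.
            let _ = (cache_dir, name, version, manifest);
        });
    }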

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

@elrrrrrrr added the benchmark (Run pm-bench on PR) and bench-phases (Run pm-bench-phases workflow) labels Apr 28, 2026

@gemini-code-assist Bot left a comment


Code Review

This pull request optimizes dependency resolution by refactoring manifest preloading into a multi-threaded worker pool and implementing a OnceMap to deduplicate concurrent fetches. It also adopts the aws-lc-rs crypto provider for faster TLS handshakes and introduces round-robin DNS resolution. Feedback suggests optimizing the preloading statistics and DNS rotation logic to reduce allocations, and warns about the potential for lost cache updates when using fire-and-forget background tasks in a CLI context.

Comment on lines +301 to +307
stats.total_processed = {
    let mut set: HashSet<String> = HashSet::with_capacity(processed.len());
    for entry in processed.iter() {
        set.insert(entry.key().clone());
    }
    set.len()
};

medium

The DashSet already provides a .len() method which is much more efficient than cloning all entries into a new HashSet just to count them. This intermediate allocation is unnecessary.

    stats.total_processed = processed.len();

Comment on lines +113 to +140
fn rotate_addrs(addrs: &[SocketAddr], offset: usize) -> Vec<SocketAddr> {
    if addrs.is_empty() {
        return Vec::new();
    }
    let rotate = |slice: &[SocketAddr]| -> Vec<SocketAddr> {
        if slice.is_empty() {
            return Vec::new();
        }
        let start = offset % slice.len();
        slice[start..]
            .iter()
            .chain(&slice[..start])
            .copied()
            .collect()
    };
    let v6: Vec<SocketAddr> = addrs.iter().filter(|a| a.is_ipv6()).copied().collect();
    let v4: Vec<SocketAddr> = addrs.iter().filter(|a| a.is_ipv4()).copied().collect();
    let v6_rot = rotate(&v6);
    let v4_rot = rotate(&v4);
    // Preserve v6-first ordering if that's what the resolver gave us;
    // Happy Eyeballs will still prefer v6 when it's reachable.
    let v6_first = addrs.first().map(|a| a.is_ipv6()).unwrap_or(true);
    if v6_first {
        v6_rot.into_iter().chain(v4_rot).collect()
    } else {
        v4_rot.into_iter().chain(v6_rot).collect()
    }
}

medium

This implementation performs multiple intermediate allocations (filtering into new vectors, then rotating into more vectors). Since this is called on every DNS resolution in a performance-sensitive path, consider using rotate_left in-place to reduce overhead.

    fn rotate_addrs(addrs: &[SocketAddr], offset: usize) -> Vec<SocketAddr> {
        if addrs.is_empty() {
            return Vec::new();
        }
        let mut v6: Vec<_> = addrs.iter().filter(|a| a.is_ipv6()).copied().collect();
        let mut v4: Vec<_> = addrs.iter().filter(|a| a.is_ipv4()).copied().collect();
        if !v6.is_empty() {
            let mid = offset % v6.len();
            v6.rotate_left(mid);
        }
        if !v4.is_empty() {
            let mid = offset % v4.len();
            v4.rotate_left(mid);
        }
        let v6_first = addrs.first().map(|a| a.is_ipv6()).unwrap_or(true);
        if v6_first {
            v6.extend(v4);
            v6
        } else {
            v4.extend(v6);
            v4
        }
    }

Comment on lines +337 to +352
tokio::spawn(async move {
    let path = get_versions_cache_path(&cache_dir, &name);
    if let Some(parent) = path.parent() {
        let _ = tokio_fs_ext::create_dir_all(parent).await;
    }
    match serde_json::to_vec(&info) {
        Ok(bytes) => {
            if let Err(e) = tokio_fs_ext::write(&path, bytes).await {
                tracing::debug!("Failed to write versions cache for {name}: {e}");
            }
        }
        Err(e) => {
            tracing::debug!("Failed to serialize versions for {name}: {e}");
        }
    }
});

medium

Spawning a background task for cache writes is good for performance, but in a CLI context, these tasks may be terminated if the main process exits before they complete. Consider providing a way to join these tasks or using a more robust persistence mechanism if cache reliability is critical.
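
For what it's worth, one way to keep the fire-and-forget win without
the lost-write risk is to route the spawns through a tracker and
drain it before exit (a sketch with tokio-util's TaskTracker; not
something this PR does):

    use tokio_util::task::TaskTracker;

    #[tokio::main]
    async fn main() {
        let tracker = TaskTracker::new();

        // Spawn cache writes through the tracker instead of bare tokio::spawn.
        tracker.spawn(async {
            // ... serialize + write a cache entry ...
        });

        // Before the CLI exits: refuse new tasks, then drain what's in flight.
        tracker.close();
        tracker.wait().await;
    }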

Comment on lines +405 to +420
tokio::spawn(async move {
    let path = get_manifest_cache_path(&cache_dir, &name, &version);
    if let Some(parent) = path.parent() {
        let _ = tokio_fs_ext::create_dir_all(parent).await;
    }
    match serde_json::to_vec(&*manifest) {
        Ok(bytes) => {
            if let Err(e) = tokio_fs_ext::write(&path, bytes).await {
                tracing::debug!("Failed to write manifest cache for {name}@{version}: {e}");
            }
        }
        Err(e) => {
            tracing::debug!("Failed to serialize manifest for {name}@{version}: {e}");
        }
    }
});

medium

Same as set_versions_to_disk, this fire-and-forget approach might lead to lost cache updates if the process exits quickly.

@github-actions

📊 pm-bench-phases · 021e2e8 · linux (ubuntu-latest)

Workflow run — ant-design

PMs: utoo (this branch) · utoo-npm (latest published) · bun (latest)

npmjs.org

p0_full_cold

PM        wall    ±σ      user    sys     RSS     pgMinor
bun       8.99s   0.27s   10.01s  9.70s   617M    306.5K
utoo-npm  10.05s  0.16s   11.21s  12.92s  1.17G   159.4K
utoo      10.29s  1.02s   13.36s  12.91s  1.35G   166.8K

PM        vCtx    iCtx    netRX   netTX   cache   node_mod  lock
bun       17.2K   16.9K   1.16G   6M      1.83G   1.72G     1M
utoo-npm  175.6K  158.7K  1.14G   5M      1.68G   1.68G     2M
utoo      152.0K  108.6K  1.13G   5M      1.68G   1.68G     2M

p1_resolve

PM        wall    ±σ      user    sys     RSS     pgMinor
bun       2.45s   0.09s   3.78s   1.08s   481M    171.0K
utoo-npm  5.89s   0.21s   5.09s   1.73s   429M    75.3K
utoo      3.19s   0.06s   7.06s   1.99s   593M    73.6K

PM        vCtx    iCtx    netRX   netTX   cache   node_mod  lock
bun       11.9K   3.3K    201M    3M      104M    -         1M
utoo-npm  67.8K   2.5K    205M    2M      9M      5M        2M
utoo      41.2K   20.4K   196M    2M      7M      5M        2M

p3_cold_install

PM        wall    ±σ      user    sys     RSS     pgMinor
bun       7.35s   0.86s   6.19s   9.47s   575M    197.5K
utoo-npm  8.54s   1.21s   5.49s   11.13s  830M    114.9K
utoo      7.59s   1.75s   5.54s   11.12s  869M    106.1K

PM        vCtx    iCtx    netRX   netTX   cache   node_mod  lock
bun       6.0K    7.5K    993M    4M      1.73G   1.73G     1M
utoo-npm  135.7K  87.3K   965M    3M      1.67G   1.67G     2M
utoo      133.2K  98.8K   965M    3M      1.67G   1.67G     2M

p4_warm_link

PM        wall    ±σ      user    sys     RSS     pgMinor
bun       3.38s   0.15s   0.19s   2.41s   135M    32.4K
utoo-npm  2.33s   0.09s   0.57s   3.85s   82M     19.3K
utoo      2.18s   0.03s   0.56s   3.81s   84M     19.2K

PM        vCtx    iCtx    netRX   netTX   cache   node_mod  lock
bun       234     27      7M      18K     1.88G   1.72G     1M
utoo-npm  47.6K   21.3K   38K     25K     1.67G   1.67G     2M
utoo      49.7K   21.6K   41K     41K     1.68G   1.67G     2M

npmmirror.com

p0_full_cold

PM        wall    ±σ      user    sys     RSS     pgMinor
bun       30.11s  4.73s   9.38s   10.35s  528M    369.2K
utoo-npm  64.93s  39.06s  8.50s   14.18s  687M    106.2K
utoo      32.99s  12.14s  8.27s   13.39s  759M    116.2K

PM        vCtx    iCtx    netRX   netTX   cache   node_mod  lock
bun       108.2K  5.1K    1.12G   14M     1.84G   1.72G     2M
utoo-npm  296.7K  84.4K   1.00G   12M     1.67G   1.68G     2M
utoo      258.3K  98.3K   999M    10M     1.67G   1.67G     2M

p1_resolve

PM        wall    ±σ      user    sys     RSS     pgMinor
bun       1.63s   0.06s   3.92s   1.09s   559M    177.6K
utoo-npm  18.04s  20.80s  1.56s   0.81s   75M     16.1K
utoo      24.15s  30.71s  1.40s   0.68s   81M     17.5K

PM        vCtx    iCtx    netRX   netTX   cache   node_mod  lock
bun       5.4K    6.1K    152M    3M      107M    -         2M
utoo-npm  49.6K   290     12M     2M      -       4M        2M
utoo      35.2K   1.0K    16M     3M      -       4M        2M

p3_cold_install

PM        wall    ±σ      user    sys     RSS     pgMinor
bun       16.78s  0.46s   5.90s   9.43s   244M    92.9K
utoo-npm  21.33s  0.16s   6.14s   12.04s  613M    102.7K
utoo      22.75s  2.51s   6.31s   12.12s  750M    109.0K

PM        vCtx    iCtx    netRX   netTX   cache   node_mod  lock
bun       67.3K   4.6K    999M    9M      1.73G   1.73G     2M
utoo-npm  197.1K  101.5K  966M    7M      1.67G   1.67G     2M
utoo      198.2K  101.1K  985M    7M      1.67G   1.67G     2M

p4_warm_link

PM        wall    ±σ      user    sys     RSS     pgMinor
bun       3.18s   0.11s   0.19s   2.39s   136M    31.9K
utoo-npm  2.24s   0.04s   0.58s   3.81s   82M     18.6K
utoo      2.20s   0.10s   0.60s   3.83s   83M     19.2K

PM        vCtx    iCtx    netRX   netTX   cache   node_mod  lock
bun       338     28      7M      46K     1.88G   1.72G     2M
utoo-npm  47.8K   20.9K   35K     12K     1.67G   1.67G     2M
utoo      49.3K   22.5K   38K     13K     1.67G   1.67G     2M

@github-actions

📊 pm-bench-phases · 021e2e8 · mac (macos-latest)

Workflow run — ant-design

PMs: utoo (this branch) · utoo-npm (latest published) · bun (latest)

npmjs.org

p0_full_cold

PM        wall    ±σ      user    sys     RSS     pgMinor
bun       15.84s  2.09s   5.49s   14.28s  764M    49.3K
utoo-npm  14.32s  0.41s   7.37s   14.83s  1.04G   105.2K
utoo      13.80s  1.79s   8.41s   15.52s  1.28G   122.7K

PM        vCtx    iCtx    netRX   netTX   cache   node_mod  lock
bun       16.2K   150.9K  -       -       1.76G   1.91G     1M
utoo-npm  12.6K   367.2K  -       -       1.63G   1.87G     2M
utoo      5.9K    346.0K  -       -       1.63G   1.85G     2M

p1_resolve

PM        wall    ±σ      user    sys     RSS     pgMinor
bun       2.48s   0.19s   2.41s   0.97s   484M    31.6K
utoo-npm  5.30s   0.61s   3.73s   1.80s   539M    36.6K
utoo      4.39s   0.13s   4.83s   2.24s   650M    43.3K

PM        vCtx    iCtx    netRX   netTX   cache   node_mod  lock
bun       28      25.2K   -       -       110M    -         1M
utoo-npm  8       77.7K   -       -       28M     5M        2M
utoo      42      87.3K   -       -       27M     5M        2M

p3_cold_install

PM        wall    ±σ      user    sys     RSS     pgMinor
bun       14.61s  3.96s   3.20s   14.37s  506M    33.0K
utoo-npm  13.63s  2.81s   3.25s   13.12s  814M    78.7K
utoo      12.74s  3.65s   3.41s   13.87s  664M    74.3K

PM        vCtx    iCtx    netRX   netTX   cache   node_mod  lock
bun       2.5K    135.2K  -       -       1.70G   1.94G     1M
utoo-npm  1.5K    233.2K  -       -       1.61G   1.87G     2M
utoo      1.3K    234.4K  -       -       1.61G   1.87G     2M

p4_warm_link

PM        wall    ±σ      user    sys     RSS     pgMinor
bun       4.03s   0.61s   0.09s   1.97s   51M     3.8K
utoo-npm  2.98s   0.26s   0.44s   2.45s   96M     7.1K
utoo      3.16s   0.86s   0.39s   2.49s   87M     6.6K

PM        vCtx    iCtx    netRX   netTX   cache   node_mod  lock
bun       18.7K   949     -       -       1.86G   1.91G     1M
utoo-npm  12.6K   72.3K   -       -       1.61G   1.87G     2M
utoo      12.7K   72.1K   -       -       1.63G   1.87G     2M

npmmirror.com

p0_full_cold

PM        wall    ±σ      user    sys     RSS     pgMinor
bun       38.03s  16.34s  5.87s   15.70s  554M    35.8K
utoo-npm  22.89s  1.84s   8.54s   16.41s  866M    87.3K
utoo      40.68s  32.15s  9.87s   18.47s  1006M   96.4K

PM        vCtx    iCtx    netRX   netTX   cache   node_mod  lock
bun       13.7K   159.6K  -       -       1.78G   1.89G     2M
utoo-npm  9.1K    493.7K  -       -       1.63G   1.88G     2M
utoo      4.9K    488.8K  -       -       1.63G   1.83G     2M

p1_resolve

PM        wall    ±σ      user    sys     RSS     pgMinor
bun       3.06s   0.09s   2.29s   1.21s   510M    33.2K
utoo-npm  15.54s  7.92s   5.35s   2.62s   544M    37.4K
utoo      24.11s  35.04s  6.27s   3.77s   567M    38.5K

PM        vCtx    iCtx    netRX   netTX   cache   node_mod  lock
bun       11      33.7K   -       -       111M    -         2M
utoo-npm  20      81.2K   -       -       28M     4M        2M
utoo      41      102.4K  -       -       27M     4M        2M

p3_cold_install

PM        wall    ±σ      user    sys     RSS     pgMinor
bun       23.46s  3.52s   4.84s   27.03s  322M    21.2K
utoo-npm  35.29s  11.37s  6.02s   23.60s  693M    77.6K
utoo      25.71s  1.89s   4.57s   15.72s  691M    75.9K

PM        vCtx    iCtx    netRX   netTX   cache   node_mod  lock
bun       1.9K    168.3K  -       -       1.70G   1.94G     2M
utoo-npm  1.6K    330.3K  -       -       1.60G   1.83G     2M
utoo      1.3K    372.6K  -       -       1.60G   1.83G     2M

p4_warm_link

PM        wall    ±σ      user    sys     RSS     pgMinor
bun       5.12s   0.37s   0.12s   2.23s   49M     3.8K
utoo-npm  4.87s   1.78s   0.70s   3.51s   88M     6.5K
utoo      6.28s   0.38s   0.81s   4.71s   91M     6.7K

PM        vCtx    iCtx    netRX   netTX   cache   node_mod  lock
bun       14.5K   1.3K    -       -       1.87G   1.91G     2M
utoo-npm  13.0K   79.8K   -       -       1.61G   1.83G     2M
utoo      13.0K   81.8K   -       -       1.63G   1.83G     2M

@elrrrrrrr closed this Apr 28, 2026