
perf(cache): single tar decompress per cache hit (PR-A of 5)#88

Open
Exelord wants to merge 2 commits into main from claude/cache-single-decompress

Conversation


@Exelord Exelord commented May 16, 2026

Summary

Cache-hit path was decompressing the same tar.zst twice:

  • Cache.get(hash) reads + zstd-decompresses to extract stdout/stderr + entry list
  • Cache.restoreOutputs(hash, dir) reads + zstd-decompresses again to extract outputs/

The orchestrator (execute-task.ts:285-292) calls these back-to-back for the same hash on every cache hit. So every hit paid 2× the decompress cost.

Fix

Single-slot stash on Cache: get() decompresses once and parks bytes keyed by hash; restoreOutputs() consumes the slot when the hash matches, else falls back to a fresh decompress (so standalone callers + tests stay correct). Slot is evicted on consume and on close().
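The stash mechanics can be sketched as follows (a minimal illustration, not the real `Cache` class: `StashCache`, the `decompress` stub, and the call counter are hypothetical stand-ins for the tar.zst read plus `Bun.zstdDecompress` step):

```typescript
type Hash = string;

class StashCache {
  // Single slot: at most one decompressed artifact is parked at a time.
  private slot: { hash: Hash; bytes: Uint8Array } | null = null;
  decompressCalls = 0; // instrumentation for this example only

  // Stand-in for reading + zstd-decompressing the tar.zst artifact.
  private decompress(hash: Hash): Uint8Array {
    this.decompressCalls++;
    return new TextEncoder().encode(`artifact:${hash}`);
  }

  // get(): decompress once and park the bytes keyed by hash.
  get(hash: Hash): Uint8Array {
    const bytes = this.decompress(hash);
    this.slot = { hash, bytes };
    return bytes;
  }

  // restoreOutputs(): consume the slot on a hash match, else fall back
  // to a fresh decompress. The slot is evicted either way.
  restoreOutputs(hash: Hash): Uint8Array {
    if (this.slot && this.slot.hash === hash) {
      const { bytes } = this.slot;
      this.slot = null; // evict on consume
      return bytes;
    }
    this.slot = null; // stale slot: drop it, decompress fresh
    return this.decompress(hash);
  }

  close(): void {
    this.slot = null; // evict on close
  }
}

const c = new StashCache();
c.get("h1");
c.restoreOutputs("h1"); // hot path: slot hit, no second decompress
console.log(c.decompressCalls); // 1
```

Because the slot is consumed on first use, a stale or repeated `restoreOutputs` never serves the wrong bytes; at worst it pays one extra decompress, which matches the pre-fix behavior.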

Behavioral test

Counts Bun.zstdDecompress invocations across three scenarios:

| Scenario | Decompresses |
| --- | --- |
| `get(h)` → `restoreOutputs(h)` (the hot path) | 1 (was 2) |
| `restoreOutputs(h)` standalone (no prior `get`) | 1 |
| `get(h1)` → `restoreOutputs(h2)` (stale slot) | 2 (correct: fresh decompress for h2) |
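The counting technique behind these numbers can be sketched like this (illustrative only: `decompressImpl` and `countDecompresses` are hypothetical stand-ins; the real test monkey-patches `Bun.zstdDecompress` itself):

```typescript
// Stand-in for the patchable decompress entry point (Bun.zstdDecompress
// in the real test); here it is just an identity function.
let decompressImpl = (data: Uint8Array): Uint8Array => data;

// Wrap the entry point, run a scenario, return the invocation count,
// and always restore the original afterwards.
function countDecompresses(scenario: () => void): number {
  const original = decompressImpl;
  let calls = 0;
  decompressImpl = (data) => {
    calls++;
    return original(data);
  };
  try {
    scenario();
  } finally {
    decompressImpl = original; // restore even if the scenario throws
  }
  return calls;
}

// A scenario just exercises whatever code path is under test.
const n = countDecompresses(() => {
  decompressImpl(new Uint8Array([1, 2, 3]));
});
console.log(n); // 1
```

Counting invocations rather than timing them makes the invariant deterministic: the test fails on any change that reintroduces a second decompress, regardless of machine speed.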

Magnitude

Per-hit savings scale with artifact size: ~microseconds for tiny artifacts, ~1-5ms for typical build outputs (few hundred KB). On a fully-cached 200-task run that's ~200ms-1s back — a meaningful chunk of the gap to Turbo's 120ms baseline.

Test plan

  • All 19 baseline perf tests pass
  • 477/477 total tests green
  • Lint + format clean
  • New zstdDecompress-counting test pins the fix as a behavioral invariant, so a future regression (e.g. someone removing the slot) fails loud

What's next

This is PR-A of a 5-PR perf push:

  • A (this PR) — single decompress per cache hit
  • B — manifest-based per-file skip on restore (Turbo parity, biggest win)
  • C — single root git ls-files instead of per-project
  • D — reverse-dependency scheduling sort (Nx parity)
  • E — batch cache-hit lookups in prepareRun

The baseline test file (tests/cache-baseline.test.ts) is included here because it existed on the xxh3 branch but didn't make it into main via PR #87's merge.

https://claude.ai/code/session_016HXj6HW6bxSn8EYuKcxTD9


Generated by Claude Code

claude added 2 commits May 16, 2026 13:59
18 wall-clock budget tests covering every cache hot-path step:

  hash primitives:
    xxh3hex(64B string)          — median < 3µs
    xxh3hex(64KB Uint8Array)     — median < 100µs
    xxh3 seed-chain (10 fields)  — median < 20µs

  hashFile (mtime+size fast path):
    cold (fresh path each call)  — median < 8ms
    warm (1KB)                   — median < 30µs
    warm (1MB)                   — median < 30µs
    fast-path is ≥ 20× faster than cold (relative)

  Cache.key:
    empty inputs                 — median < 200µs
    10 files (warm)              — median < 2ms
    100 files (warm)             — median < 5ms
    1000 files (warm)            — median < 35ms
    scales near-linearly (1000/100 ratio ≤ 30×)

  save / restore (tar.zst):
    save empty archive           — median < 30ms
    save 10 small outputs        — median < 30ms
    restore 10 small outputs     — median < 30ms

  SQLite writes:
    recordRun single             — median < 5ms
    recordRuns batched (50)      — median < 30ms
    batched is ≥ 3× faster per row than single (relative)

Budgets are ~3-5× the observed p99 on the calibration box: generous
enough to absorb CI-runner variance, tight enough to fail loud if
someone accidentally swaps the hash back to SHA-256, drops a fast-
path early-return, or introduces an O(N²) into the input loop.

Median-over-N is the test signal (most noise-resistant); p99 / min /
max get printed for diagnostics on failure. `VX_PERF_SCALE` env var
multiplies every budget for slow CI runners; `VX_PERF=0` skips the
whole suite during local iteration.

Stable across 5 back-to-back runs on the calibration box.
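The median-over-N pattern described above can be sketched roughly like this (assumed shape only; `measure`, the placeholder workload, and the budget value are illustrative, not the actual test helpers):

```typescript
// Time a function N times and report median / min / max. The median is
// the pass/fail signal; min and max are printed for diagnostics.
function measure(
  fn: () => void,
  runs = 100
): { median: number; min: number; max: number } {
  const samples: number[] = [];
  for (let i = 0; i < runs; i++) {
    const t0 = performance.now();
    fn();
    samples.push(performance.now() - t0);
  }
  samples.sort((a, b) => a - b);
  return {
    median: samples[Math.floor(runs / 2)],
    min: samples[0],
    max: samples[runs - 1],
  };
}

// VX_PERF_SCALE multiplies every budget so slow CI runners still pass;
// read it defensively in case `process` typings are absent.
const scale = Number((globalThis as any).process?.env?.VX_PERF_SCALE ?? "1");

const stats = measure(() => JSON.stringify({ a: 1 })); // placeholder workload
const budgetMs = 1 * scale; // e.g. "median < 1ms" for this toy workload
if (stats.median >= budgetMs) {
  // On failure, the full stats get printed for diagnostics.
  console.error("budget exceeded", stats);
}
```

Using the median rather than the mean keeps a single GC pause or scheduler hiccup from failing the suite, while a genuine regression (say, a hash swapped back to SHA-256) shifts every sample and still trips the budget.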
`Cache.get()` and `Cache.restoreOutputs()` both decompressed the same
tar.zst artifact independently. The orchestrator's cache-hit path
(execute-task.ts:285-292) calls them back-to-back for the same hash —
so every cache hit did two zstd decompress rounds where one would do.

Fix: single-slot stash on the Cache instance. `get()` decompresses
once and parks the bytes; `restoreOutputs()` consumes them when the
hash matches, falling back to fresh decompress when it doesn't (so
standalone callers and the test surface stay correct). The slot is
keyed by hash, evicted on consume and on `close()`.

Behavioral test added to `tests/cache-baseline.test.ts` monkey-
patches `Bun.zstdDecompress` to count invocations across three
scenarios:
  1. get(h) → restoreOutputs(h)         → 1 decompress (was 2)
  2. restoreOutputs(h) standalone       → 1 decompress
  3. get(h1) → restoreOutputs(h2) stale → 2 decompresses (correct)

Magnitude: on small artifacts the saving is microseconds; on larger
artifacts (typical build outputs of a few hundred KB) it's 1-5ms per
hit. For a fully-cached 200-task run that's ~200ms-1s back.