perf(cache): single tar decompress per cache hit (PR-A of 5) #88
Open
Exelord wants to merge 2 commits into
Conversation
18 wall-clock budget tests covering every cache hot-path step:
hash primitives:
xxh3hex(64B string) — median < 3µs
xxh3hex(64KB Uint8Array) — median < 100µs
xxh3 seed-chain (10 fields) — median < 20µs
hashFile (mtime+size fast path):
cold (fresh path each call) — median < 8ms
warm (1KB) — median < 30µs
warm (1MB) — median < 30µs
fast-path is ≥ 20× faster than cold (relative)
Cache.key:
empty inputs — median < 200µs
10 files (warm) — median < 2ms
100 files (warm) — median < 5ms
1000 files (warm) — median < 35ms
scales near-linearly (1000/100 ratio ≤ 30×)
save / restore (tar.zst):
save empty archive — median < 30ms
save 10 small outputs — median < 30ms
restore 10 small outputs — median < 30ms
SQLite writes:
recordRun single — median < 5ms
recordRuns batched (50) — median < 30ms
batched is ≥ 3× faster per row than single (relative)
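Of the steps above, the `hashFile` mtime+size fast path is the one a refactor breaks most easily. A minimal sketch of that short-circuit, with illustrative names throughout: the `stat`/`read` callbacks are injected so the snippet stays self-contained, and a toy rolling hash stands in for xxh3 (Node's stdlib has none).

```typescript
type Stat = { mtimeMs: number; size: number };
type FileHash = Stat & { hash: string };

// In-memory stat cache keyed by path (illustrative, not the repo's actual structure).
const statCache = new Map<string, FileHash>();
let contentReads = 0; // counts cold-path content reads, for illustration only

function hashFile(
  path: string,
  stat: (p: string) => Stat,
  read: (p: string) => string,
): string {
  const st = stat(path);
  const cached = statCache.get(path);
  // Fast path: same mtime+size as last time → reuse the stored hash
  // without re-reading the file contents.
  if (cached && cached.mtimeMs === st.mtimeMs && cached.size === st.size) {
    return cached.hash;
  }
  // Cold path: read and hash the contents, then remember the stat key.
  contentReads++;
  const data = read(path);
  let h = 0; // toy 32-bit rolling hash standing in for xxh3
  for (let i = 0; i < data.length; i++) h = (h * 31 + data.charCodeAt(i)) >>> 0;
  const hash = h.toString(16);
  statCache.set(path, { mtimeMs: st.mtimeMs, size: st.size, hash });
  return hash;
}
```

The ≥ 20× cold-vs-warm budget in the list above is what guards this early return: delete the `if` and the warm case degrades to a full read+hash, blowing the 30µs budget.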
Budgets are ~3-5× the observed p99 on the calibration box: generous enough to absorb CI-runner variance, tight enough to fail loud if someone accidentally swaps the hash back to SHA-256, drops a fast-path early return, or introduces an O(N²) step into the input loop.
Median-over-N is the test signal (most noise-resistant); p99 / min / max get printed for diagnostics on failure. The `VX_PERF_SCALE` env var multiplies every budget for slow CI runners; `VX_PERF=0` skips the whole suite during local iteration.
Stable across 5 back-to-back runs on the calibration box.
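The median-over-N budget check can be sketched like so. This is a hedged illustration: `assertBudget` and `median` are made-up names, not the suite's actual helpers, and only the `VX_PERF_SCALE` behavior is taken from the description above.

```typescript
// Median of a sample set; most noise-resistant summary statistic here.
function median(xs: number[]): number {
  const s = [...xs].sort((a, b) => a - b);
  const mid = Math.floor(s.length / 2);
  return s.length % 2 ? s[mid] : (s[mid - 1] + s[mid]) / 2;
}

// Run `fn` N times and fail if the median exceeds the (scaled) budget.
// VX_PERF_SCALE multiplies every budget for slow CI runners.
function assertBudget(fn: () => void, budgetMs: number, runs = 50): void {
  const scale = Number(process.env.VX_PERF_SCALE ?? "1");
  const samples: number[] = [];
  for (let i = 0; i < runs; i++) {
    const t0 = performance.now();
    fn();
    samples.push(performance.now() - t0);
  }
  const med = median(samples);
  if (med > budgetMs * scale) {
    // p99 / min / max are only printed on failure, as diagnostics.
    const s = [...samples].sort((a, b) => a - b);
    throw new Error(
      `median ${med.toFixed(3)}ms > budget ${budgetMs * scale}ms ` +
        `(min ${s[0].toFixed(3)}, p99 ${s[Math.floor(0.99 * s.length)].toFixed(3)}, ` +
        `max ${s[s.length - 1].toFixed(3)})`,
    );
  }
}
```

Asserting on the median rather than the max is what lets a single GC pause or scheduler hiccup pass while a real regression still fails.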
`Cache.get()` and `Cache.restoreOutputs()` both decompressed the same tar.zst artifact independently. The orchestrator's cache-hit path (`execute-task.ts:285-292`) calls them back-to-back for the same hash, so every cache hit did two zstd decompress rounds where one would do.

Fix: single-slot stash on the Cache instance. `get()` decompresses once and parks the bytes; `restoreOutputs()` consumes them when the hash matches, falling back to a fresh decompress when it doesn't (so standalone callers and the test surface stay correct). The slot is keyed by hash, evicted on consume and on `close()`.

Behavioral test added to `tests/cache-baseline.test.ts` monkey-patches `Bun.zstdDecompress` to count invocations across three scenarios:

1. get(h) → restoreOutputs(h) → 1 decompress (was 2)
2. restoreOutputs(h) standalone → 1 decompress
3. get(h1) → restoreOutputs(h2) stale → 2 decompresses (correct)

Magnitude: on small artifacts the saving is microseconds; on larger artifacts (typical build outputs of a few hundred KB) it's 1-5ms per hit. For a fully-cached 200-task run that's ~200ms-1s back.
Summary
Cache-hit path was decompressing the same tar.zst twice:
- `Cache.get(hash)` reads + zstd-decompresses to extract stdout/stderr + entry list
- `Cache.restoreOutputs(hash, dir)` reads + zstd-decompresses again to extract `outputs/`

The orchestrator (`execute-task.ts:285-292`) calls these back-to-back for the same hash on every cache hit. So every hit paid 2× the decompress cost.

Fix

Single-slot stash on `Cache`: `get()` decompresses once and parks bytes keyed by hash; `restoreOutputs()` consumes the slot when the hash matches, else falls back to a fresh decompress (so standalone callers + tests stay correct). Slot is evicted on consume and on `close()`.

Behavioral test

Counts `Bun.zstdDecompress` invocations across three scenarios:

- `get(h) → restoreOutputs(h)` (the hot path)
- `restoreOutputs(h)` standalone (no prior `get`)
- `get(h1) → restoreOutputs(h2)` (stale slot)

Magnitude
Per-hit savings scale with artifact size: ~microseconds for tiny artifacts, ~1-5ms for typical build outputs (few hundred KB). On a fully-cached 200-task run that's ~200ms-1s back — a meaningful chunk of the gap to Turbo's 120ms baseline.
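The invocation-counting trick the behavioral test relies on can be sketched as a generic monkey-patch counter. Everything here is an assumption for illustration: `countCalls` is a made-up helper, and `api.zstdDecompress` is a local synchronous stand-in for `Bun.zstdDecompress` (the real Bun API, which the actual test patches).

```typescript
// Local stand-in object so the sketch doesn't depend on the Bun runtime.
const api = {
  zstdDecompress(buf: Uint8Array): Uint8Array {
    return buf; // real zstd decompression would happen here
  },
};

// Replace a method with a counting wrapper that delegates to the original.
function countCalls(obj: any, key: string) {
  const original = obj[key];
  let calls = 0;
  obj[key] = (...args: unknown[]) => {
    calls += 1;
    return original.apply(obj, args);
  };
  return {
    get count() {
      return calls;
    },
    restore() {
      obj[key] = original; // undo the patch after the test
    },
  };
}
```

The test then runs each scenario, reads `count`, and asserts 1, 1, and 2 decompressions respectively; `restore()` keeps the patch from leaking into other tests.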
Test plan
What's next
This is PR-A of a 5-PR perf push:
- `git ls-files` instead of per-project `prepareRun`

The baseline test file (`tests/cache-baseline.test.ts`) is included here as it was on the xxh3 branch but didn't make it into main via PR #87's merge.

https://claude.ai/code/session_016HXj6HW6bxSn8EYuKcxTD9
Generated by Claude Code