
perf(cache): single tar decompress per cache hit (PR-A of 5)#88

Open
Exelord wants to merge 2 commits into main from claude/cache-single-decompress

Conversation


@Exelord Exelord commented May 16, 2026

Summary

Cache-hit path was decompressing the same tar.zst twice:

  • Cache.get(hash) reads + zstd-decompresses to extract stdout/stderr + entry list
  • Cache.restoreOutputs(hash, dir) reads + zstd-decompresses again to extract outputs/

The orchestrator (execute-task.ts:285-292) calls these back-to-back for the same hash on every cache hit. So every hit paid 2× the decompress cost.

Fix

Single-slot stash on Cache: get() decompresses once and parks bytes keyed by hash; restoreOutputs() consumes the slot when the hash matches, else falls back to a fresh decompress (so standalone callers + tests stay correct). Slot is evicted on consume and on close().
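The stash mechanics can be sketched as follows (a minimal illustration, not the real `Cache` class: `StashCache`, the `decompress` stub, and the call counter are hypothetical stand-ins for the tar.zst read plus `Bun.zstdDecompress` step):

```typescript
type Hash = string;

class StashCache {
  // Single slot: at most one decompressed artifact is parked at a time.
  private slot: { hash: Hash; bytes: Uint8Array } | null = null;
  decompressCalls = 0; // instrumentation for this example only

  // Stand-in for reading + zstd-decompressing the tar.zst artifact.
  private decompress(hash: Hash): Uint8Array {
    this.decompressCalls++;
    return new TextEncoder().encode(`artifact:${hash}`);
  }

  // get(): decompress once and park the bytes keyed by hash.
  get(hash: Hash): Uint8Array {
    const bytes = this.decompress(hash);
    this.slot = { hash, bytes };
    return bytes;
  }

  // restoreOutputs(): consume the slot on a hash match, else fall back
  // to a fresh decompress. The slot is evicted either way.
  restoreOutputs(hash: Hash): Uint8Array {
    if (this.slot && this.slot.hash === hash) {
      const { bytes } = this.slot;
      this.slot = null; // evict on consume
      return bytes;
    }
    this.slot = null; // stale slot: drop it, decompress fresh
    return this.decompress(hash);
  }

  close(): void {
    this.slot = null; // evict on close
  }
}

const c = new StashCache();
c.get("h1");
c.restoreOutputs("h1"); // hot path: slot hit, no second decompress
console.log(c.decompressCalls); // 1
```

Because the slot is consumed on first use, a stale or repeated `restoreOutputs` never serves the wrong bytes; at worst it pays one extra decompress, which matches the pre-fix behavior.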

Behavioral test

Counts Bun.zstdDecompress invocations across three scenarios:

| Scenario | Decompresses |
| --- | --- |
| `get(h)` → `restoreOutputs(h)` (the hot path) | 1 (was 2) |
| `restoreOutputs(h)` standalone (no prior `get`) | 1 |
| `get(h1)` → `restoreOutputs(h2)` (stale slot) | 2 (correct: fresh decompress for h2) |
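The counting technique behind these numbers can be sketched like this (illustrative only: `decompressImpl` and `countDecompresses` are hypothetical stand-ins; the real test monkey-patches `Bun.zstdDecompress` itself):

```typescript
// Stand-in for the patchable decompress entry point (Bun.zstdDecompress
// in the real test); here it is just an identity function.
let decompressImpl = (data: Uint8Array): Uint8Array => data;

// Wrap the entry point, run a scenario, return the invocation count,
// and always restore the original afterwards.
function countDecompresses(scenario: () => void): number {
  const original = decompressImpl;
  let calls = 0;
  decompressImpl = (data) => {
    calls++;
    return original(data);
  };
  try {
    scenario();
  } finally {
    decompressImpl = original; // restore even if the scenario throws
  }
  return calls;
}

// A scenario just exercises whatever code path is under test.
const n = countDecompresses(() => {
  decompressImpl(new Uint8Array([1, 2, 3]));
});
console.log(n); // 1
```

Counting invocations rather than timing them makes the invariant deterministic: the test fails on any change that reintroduces a second decompress, regardless of machine speed.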

Magnitude

Per-hit savings scale with artifact size: ~microseconds for tiny artifacts, ~1-5ms for typical build outputs (few hundred KB). On a fully-cached 200-task run that's ~200ms-1s back — a meaningful chunk of the gap to Turbo's 120ms baseline.

Test plan

  • All 19 baseline perf tests pass
  • 477/477 total tests green
  • Lint + format clean
  • New zstdDecompress-counting test pins the fix as a behavioral invariant, so a future regression (e.g. someone removing the slot) fails loud

What's next

This is PR-A of a 5-PR perf push:

  • A (this PR) — single decompress per cache hit
  • B — manifest-based per-file skip on restore (Turbo parity, biggest win)
  • C — single root git ls-files instead of per-project
  • D — reverse-dependency scheduling sort (Nx parity)
  • E — batch cache-hit lookups in prepareRun

The baseline test file (tests/cache-baseline.test.ts) is included here because it existed on the xxh3 branch but didn't make it into main via PR #87's merge.

https://claude.ai/code/session_016HXj6HW6bxSn8EYuKcxTD9


Generated by Claude Code

claude added 2 commits May 16, 2026 13:59
18 wall-clock budget tests covering every cache hot-path step:

  hash primitives:
    xxh3hex(64B string)          — median < 3µs
    xxh3hex(64KB Uint8Array)     — median < 100µs
    xxh3 seed-chain (10 fields)  — median < 20µs

  hashFile (mtime+size fast path):
    cold (fresh path each call)  — median < 8ms
    warm (1KB)                   — median < 30µs
    warm (1MB)                   — median < 30µs
    fast-path is ≥ 20× faster than cold (relative)

  Cache.key:
    empty inputs                 — median < 200µs
    10 files (warm)              — median < 2ms
    100 files (warm)             — median < 5ms
    1000 files (warm)            — median < 35ms
    scales near-linearly (1000/100 ratio ≤ 30×)

  save / restore (tar.zst):
    save empty archive           — median < 30ms
    save 10 small outputs        — median < 30ms
    restore 10 small outputs     — median < 30ms

  SQLite writes:
    recordRun single             — median < 5ms
    recordRuns batched (50)      — median < 30ms
    batched is ≥ 3× faster per row than single (relative)

Budgets are ~3-5× the observed p99 on the calibration box: generous
enough to absorb CI-runner variance, tight enough to fail loud if
someone accidentally swaps the hash back to SHA-256, drops a fast-
path early-return, or introduces an O(N²) into the input loop.

Median-over-N is the test signal (most noise-resistant); p99 / min /
max get printed for diagnostics on failure. `VX_PERF_SCALE` env var
multiplies every budget for slow CI runners; `VX_PERF=0` skips the
whole suite during local iteration.

Stable across 5 back-to-back runs on the calibration box.
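The median-over-N pattern described above can be sketched roughly like this (assumed shape only; `measure`, the placeholder workload, and the budget value are illustrative, not the actual test helpers):

```typescript
// Time a function N times and report median / min / max. The median is
// the pass/fail signal; min and max are printed for diagnostics.
function measure(
  fn: () => void,
  runs = 100
): { median: number; min: number; max: number } {
  const samples: number[] = [];
  for (let i = 0; i < runs; i++) {
    const t0 = performance.now();
    fn();
    samples.push(performance.now() - t0);
  }
  samples.sort((a, b) => a - b);
  return {
    median: samples[Math.floor(runs / 2)],
    min: samples[0],
    max: samples[runs - 1],
  };
}

// VX_PERF_SCALE multiplies every budget so slow CI runners still pass;
// read it defensively in case `process` typings are absent.
const scale = Number((globalThis as any).process?.env?.VX_PERF_SCALE ?? "1");

const stats = measure(() => JSON.stringify({ a: 1 })); // placeholder workload
const budgetMs = 1 * scale; // e.g. "median < 1ms" for this toy workload
if (stats.median >= budgetMs) {
  // On failure, the full stats get printed for diagnostics.
  console.error("budget exceeded", stats);
}
```

Using the median rather than the mean keeps a single GC pause or scheduler hiccup from failing the suite, while a genuine regression (say, a hash swapped back to SHA-256) shifts every sample and still trips the budget.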
`Cache.get()` and `Cache.restoreOutputs()` both decompressed the same
tar.zst artifact independently. The orchestrator's cache-hit path
(execute-task.ts:285-292) calls them back-to-back for the same hash —
so every cache hit did two zstd decompress rounds where one would do.

Fix: single-slot stash on the Cache instance. `get()` decompresses
once and parks the bytes; `restoreOutputs()` consumes them when the
hash matches, falling back to fresh decompress when it doesn't (so
standalone callers and the test surface stay correct). The slot is
keyed by hash, evicted on consume and on `close()`.

Behavioral test added to `tests/cache-baseline.test.ts` monkey-
patches `Bun.zstdDecompress` to count invocations across three
scenarios:
  1. get(h) → restoreOutputs(h)         → 1 decompress (was 2)
  2. restoreOutputs(h) standalone       → 1 decompress
  3. get(h1) → restoreOutputs(h2) stale → 2 decompresses (correct)

Magnitude: on small artifacts the saving is microseconds; on larger
artifacts (typical build outputs of a few hundred KB) it's 1-5ms per
hit. For a fully-cached 200-task run that's ~200ms-1s back.