perf(cache): swap SHA-256 → xxHash3 (CACHE_VERSION v14→v15) #87

Merged: Exelord merged 1 commit into main from claude/xxh3-hasher on May 16, 2026

Exelord (Member) commented May 16, 2026

Summary

  • Swap cache-key hash from SHA-256 (Bun.CryptoHasher) to xxHash3 (Bun.hash.xxHash3) at every derivation site. ~5× faster on the cache-warm path that hashes hundreds of input files; cache keys shrink 64 hex → 16 hex (Turbo parity — Turbo uses xxh64 at the same width).
  • xxHash3 has no streaming Hasher API in Bun, so Cache.key() chains via the seed parameter: each xxh3(part, prevDigest) folds one field into the running digest. This plays the same role as the old CryptoHasher.update(...).digest() pattern, with no intermediate buffer.
  • hashFileFromDisk reads the whole file into memory before hashing, which is fine for source files (typically < 1 MB); the throughput win outweighs the loss of streaming.
  • New shared helper at src/util/hash.ts exporting xxh3 / xxh3hex / xxh3hexOf so cache, orchestrator, and workspace modules can all consume it.
  • SCHEMA_VERSION v12 → v13: file_hashes.sha256 column renamed to content_hash, and the migration path now DROPs stale tables before CREATE TABLE IF NOT EXISTS runs so column renames actually take effect on existing DBs.
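
The seed-chain pattern described in the summary can be sketched roughly as follows. This is an illustration rather than the code in src/util/hash.ts: `Hash64`, `chainKey`, and `toHex16` are hypothetical names, and the hash used here is a seeded FNV-1a stand-in (not xxHash3) so the snippet runs outside Bun. Under Bun, one would presumably pass a wrapper around `Bun.hash.xxHash3(data, seed)` instead.

```typescript
// Shape of a seeded 64-bit hash, mirroring a call like Bun.hash.xxHash3(data, seed).
type Hash64 = (data: string, seed: bigint) => bigint;

const MASK64 = (1n << 64n) - 1n;

// Stand-in hash: seeded FNV-1a (64-bit) over UTF-16 code units.
// This is NOT xxHash3; it only lets the sketch run without Bun.
const fnv1a64: Hash64 = (data, seed) => {
  let h = (0xcbf29ce484222325n ^ seed) & MASK64;
  for (let i = 0; i < data.length; i++) {
    h ^= BigInt(data.charCodeAt(i));
    h = (h * 0x100000001b3n) & MASK64;
  }
  return h;
};

// Fold each part into the running digest by passing the previous
// digest as the seed of the next call (hypothetical helper).
function chainKey(hash: Hash64, parts: readonly string[]): bigint {
  let digest = 0n;
  for (const part of parts) {
    digest = hash(part, digest);
  }
  return digest;
}

// Render a 64-bit digest as a fixed-width, 16-character hex string.
function toHex16(digest: bigint): string {
  return digest.toString(16).padStart(16, "0");
}

// Example parts are made up for illustration.
const key = toHex16(chainKey(fnv1a64, ["build", "src/index.ts", "v15"]));
console.log(key.length); // 16
```

One difference from a streaming hasher is worth keeping in mind: `update("ab").update("c")` and `update("a").update("bc")` feed the same byte stream, whereas a seed chain treats each part as its own hash call, so part boundaries and part order both affect the key.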

Sites swapped

| File | Function |
| --- | --- |
| src/cache/cache.ts | Cache.key(), hashFileFromDisk() |
| src/orchestrator/execute-task.ts | hashTaskConfig(), computeGroupHash() |
| src/orchestrator/fingerprint.ts | computeWorkspaceFingerprint() |
| src/workspace/project-loader.ts | config-load module cache-bust |

Why xxHash3 (not Turbo's xxh64)

Both are non-cryptographic 64-bit hashes with 16-hex output. xxh3 is the newer variant, ~5-6× faster than xxh64 on a modern x86 core. The bytes are opaque to the remote cache server — it's just a key string — so there's no parity reason to match Turbo's exact algorithm. Faster wins.
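
Because both candidates are 64-bit, the odds of an accidental collision are the same for either choice. A back-of-the-envelope estimate using the standard birthday-bound approximation is sketched below. It assumes the hash behaves like a uniform random function on honest input. `collisionProbability` is a hypothetical helper, and the one-million-key figure is an arbitrary illustration rather than a measurement from this repository.

```typescript
// Birthday-bound approximation for n uniformly random `bits`-bit values:
//   p ≈ 1 - exp(-n(n-1) / 2^(bits+1))
function collisionProbability(n: number, bits: number = 64): number {
  const pairsOverSpace = (n * (n - 1)) / 2 / Math.pow(2, bits);
  // -expm1(-x) equals 1 - exp(-x) but stays accurate when x is tiny.
  return -Math.expm1(-pairsOverSpace);
}

// Hypothetical workload: one million distinct cache keys.
const p = collisionProbability(1e6);
console.log(p.toExponential(2)); // roughly 2.7e-8
```

At that scale an accidental collision is very unlikely, which is the property a cache key needs. The estimate says nothing about deliberately constructed collisions, which a non-cryptographic hash does not defend against.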

Test plan

  • bun test src/ tests/ — 462/462 green
  • bun src/bin.ts run lint — clean (one pre-existing unused-vars warning in the hardlink test, unrelated)
  • bun src/bin.ts run format — clean
  • Hardcoded SHA-256 hex assertions in tests/cache.test.ts + tests/cache-perf.test.ts rewritten to xxh3 references
  • Schema migration path verified: existing v12 DBs DROP the stale file_hashes table and recreate with content_hash column on first open
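
The drop-then-create ordering matters because `CREATE TABLE IF NOT EXISTS` is a no-op when a table with that name already exists, so a renamed column would never appear on an existing DB. A minimal sketch of that ordering follows. It assumes a hypothetical `SqlExecutor` interface and records statements instead of talking to SQLite. `ensureSchema`, `RecordingDb`, and every column other than content_hash are illustrative assumptions, not the project's actual schema code.

```typescript
// Hypothetical minimal database handle; a real one would run the SQL.
interface SqlExecutor {
  exec(sql: string): void;
}

const SCHEMA_VERSION = 13;

// On a version mismatch, drop the stale table first so the CREATE below
// builds the new column layout instead of silently keeping the old one.
function ensureSchema(db: SqlExecutor, storedVersion: number): void {
  if (storedVersion !== SCHEMA_VERSION) {
    db.exec("DROP TABLE IF EXISTS file_hashes");
  }
  // Column list is assumed for illustration; only content_hash comes from the PR.
  db.exec(
    "CREATE TABLE IF NOT EXISTS file_hashes (path TEXT PRIMARY KEY, content_hash TEXT NOT NULL)"
  );
}

// Test double that records statements so their order can be inspected.
class RecordingDb implements SqlExecutor {
  readonly statements: string[] = [];
  exec(sql: string): void {
    this.statements.push(sql);
  }
}

const stale = new RecordingDb();
ensureSchema(stale, 12); // v12 DB: DROP, then CREATE

const current = new RecordingDb();
ensureSchema(current, 13); // up-to-date DB: CREATE only (a no-op in SQLite)

console.log(stale.statements.length, current.statements.length); // 2 1
```

Dropping the table discards the cached rows, which is acceptable for a cache that can be rebuilt. The same approach would not suit data that has to survive an upgrade.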

Docs

  • docs/caching.md — version history extended with the v14 → v15 entry
  • CLAUDE.md — decision log entry at the top
  • src/util/hash.ts — module-level comment explains the seed-chain pattern and why we picked xxh3

https://claude.ai/code/session_016HXj6HW6bxSn8EYuKcxTD9


Generated by Claude Code

…ON v14→v15)

xxHash3 is ~5× faster than SHA-256 on modern x86 and produces enough
entropy for our non-cryptographic uses: cache-key derivation, file
content fingerprinting, config-load module cache-busting. None need
collision resistance against an adversary — just uniqueness across
honest input. Cache keys shrink 64 hex → 16 hex; Turbo uses the same
width (xxh64 → hex(u64.to_be_bytes())).

`Bun.hash.xxHash3` has no streaming Hasher API, so `Cache.key()` chains
updates via the seed parameter — each `xxh3(part, prevDigest)` folds
one field into the running digest, equivalent to the old
`CryptoHasher.update(...).digest()` pattern with no intermediate
buffer. `hashFileFromDisk` reads the whole file before hashing — fine
for source files (typically < 1MB each); the throughput win dominates
on the cache-warm path that hashes hundreds of them.

Sites swapped:
- src/cache/cache.ts: Cache.key(), hashFileFromDisk()
- src/orchestrator/execute-task.ts: hashTaskConfig(), computeGroupHash()
- src/orchestrator/fingerprint.ts: computeWorkspaceFingerprint()
- src/workspace/project-loader.ts: config-load module cache-bust

New shared helper at src/util/hash.ts exports `xxh3`, `xxh3hex`,
`xxh3hexOf`.

SCHEMA_VERSION bumps v12 → v13 in the same change: `file_hashes.sha256`
column renamed to `content_hash`, and the schema-mismatch path now
DROPs stale tables before CREATE TABLE IF NOT EXISTS runs so column
renames take effect on existing DBs.

Exelord force-pushed the claude/xxh3-hasher branch from ac96db9 to 327abb5 on May 16, 2026, 13:35
Exelord merged commit 1acec50 into main on May 16, 2026
1 check passed