WIP: fast-index sidecar + parallel lstat for mega monorepos (9× faster git status) #1

Open
tobi wants to merge 3 commits into master from fast-index-sidecar


tobi (owner) commented Apr 29, 2026

Results

Tested on the Shopify monorepo (1.3M files, 236MB index, 219K directories):

Read operations

| Command | Before | After | Speedup |
|---|---|---|---|
| git status | 5.74s | 0.73s | 7.9× |
| git ls-files | 286ms | 185ms | 1.5× |
| git diff HEAD~10 | 2.18s | 0.46s | 4.7× |
| git diff HEAD~100 | 3.21s | 1.60s | 2.0× |
| Index load (isolated) | 166ms | 0.6ms | 273× |

Write operations

| Command | Before | After | Speedup |
|---|---|---|---|
| git commit | 4.7s | 1.2s | 3.9× |
| git status after commit | 2.6s | 0.71s | 3.7× |
| git add (single file) | 1.1s | 1.1s | 1.0× |
| Full add→commit→status | 8.4s | 3.0s | 2.8× |

What was slow and why

| Bottleneck | Root cause | Fix | Savings |
|---|---|---|---|
| Index parsing (166ms) | Big-endian byte-swap + alloc per entry × 1.3M | Zero-copy native-endian sidecar | 165ms |
| Tree traversal in diff (1.2s) | Decompress all tree objects for diff-cached | Load TREE extension → cache_tree has OIDs | 1.2s |
| Untracked scan (3.9s) | Serial dir lstat + no untracked cache | Load UNTR extension + parallel dir validation | 3.5s |
| cache_tree_update in commit (3.5s) | TREE extension not loaded (wrong offset) | Fix extension_offset to include TREE | 3.5s |
| Tree sidecar write (1.8s) | Rebuilt full HEAD tree after every status | Skip when cache_tree loaded (redundant) | 1.8s |
| fsmonitor overhead (129ms) | O(n) loops for non-functional daemon | Skip post_read_index_from for sidecar loads | 129ms |
| name-hash-init (215ms) | memihash computed on every load | Pre-computed in sidecar | 215ms |
| preload_index (340ms) | Only 20 threads for 1.3M lstats | 64 threads (machine has 63 CPUs) | 165ms |

Approach

.git/index.fast — Native-endian entry sidecar (273MB)

Store cache_entry structs in native format with an offset table. On read: mmap(MAP_PRIVATE) + one pointer-setup loop. Zero memcpy, zero byte-swapping, zero per-entry allocation.
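
The following minimal sketch illustrates the "offset table + one pointer-setup loop" idea. The layout (`struct sidecar_header`, `struct sidecar_entry`, `SIDECAR_MAGIC`) and `setup_entry_pointers()` are hypothetical and are not the PR's actual format, because real `cache_entry` structs carry more fields. The sketch shows only that reading turns each stored offset into a pointer into the mapping, with no byte-swapping or per-entry copying. In real use the buffer would come from `mmap(MAP_PRIVATE)`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical layout, native-endian throughout:
 *   struct sidecar_header
 *   uint64_t offsets[nr_entries]   (byte offset of each entry from file start)
 *   struct sidecar_entry ...       (each suitably aligned, path stored inline)
 */
#define SIDECAR_MAGIC 0x53494443u /* hypothetical magic value */

struct sidecar_header {
	uint32_t magic;
	uint32_t nr_entries;
};

struct sidecar_entry {
	uint32_t ce_mode;
	uint32_t ce_flags;
	uint32_t name_hash;	/* pre-computed at write time, e.g. a memihash */
	uint16_t name_len;
	char name[];		/* NUL-terminated path follows the fixed fields */
};

/*
 * Build an array of entry pointers into an already-mapped sidecar. No field
 * is parsed or copied; each pointer aims straight into the mapping.
 * Returns 0 on success, -1 if the buffer looks corrupt.
 */
static int setup_entry_pointers(const void *map, size_t map_len,
				const struct sidecar_entry ***out,
				uint32_t *nr_out)
{
	const struct sidecar_header *hdr = map;
	const uint64_t *offsets;
	const struct sidecar_entry **entries;
	const size_t min_entry = sizeof(struct sidecar_entry);

	if (map_len < sizeof(*hdr) || hdr->magic != SIDECAR_MAGIC)
		return -1;
	if ((map_len - sizeof(*hdr)) / sizeof(uint64_t) < hdr->nr_entries)
		return -1;
	offsets = (const uint64_t *)((const char *)map + sizeof(*hdr));

	/* The only allocation: one pointer array for all entries. */
	entries = malloc(sizeof(*entries) * (hdr->nr_entries ? hdr->nr_entries : 1));
	if (!entries)
		return -1;
	for (uint32_t i = 0; i < hdr->nr_entries; i++) {
		/* Reject out-of-bounds or misaligned offsets instead of crashing. */
		if (offsets[i] > map_len || map_len - offsets[i] < min_entry ||
		    offsets[i] % _Alignof(struct sidecar_entry) != 0) {
			free(entries);
			return -1;
		}
		entries[i] = (const struct sidecar_entry *)((const char *)map + offsets[i]);
	}
	*out = entries;
	*nr_out = hdr->nr_entries;
	return 0;
}
```

The caller still owns the mapping. Loading 1.3M entries therefore costs one allocation and one linear loop, instead of one allocation and one decode per entry.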

  • Pre-computed memihash values for instant name-hash-init
  • Entries stored with CE_UPTODATE and CE_FSMONITOR_VALID cleared, so every read re-validates files with lstat (correctness)
  • Auto-generated on first access, invalidated + regenerated on index writes
  • Stat-based staleness check (index mtime + size)
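
The staleness rule compares a stat fingerprint recorded when the sidecar was written against the index's current one. The minimal sketch below assumes the fingerprint is only whole-second mtime plus size, as the bullet describes. `struct index_fingerprint`, `fingerprint_file()` and `sidecar_is_fresh()` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/stat.h>

/* Hypothetical fingerprint of .git/index, stored in the sidecar header. */
struct index_fingerprint {
	int64_t mtime_sec;
	int64_t size;
};

/* Capture the current mtime and size of `path`; returns -1 if stat() fails. */
static int fingerprint_file(const char *path, struct index_fingerprint *fp)
{
	struct stat st;

	if (stat(path, &st) < 0)
		return -1;
	fp->mtime_sec = (int64_t)st.st_mtime;
	fp->size = (int64_t)st.st_size;
	return 0;
}

/*
 * 1 if the sidecar built for `recorded` still matches the index on disk,
 * 0 if it is stale (or the index cannot be stat'ed) and must be regenerated.
 */
static int sidecar_is_fresh(const char *index_path,
			    const struct index_fingerprint *recorded)
{
	struct index_fingerprint now;

	if (fingerprint_file(index_path, &now) < 0)
		return 0;
	return now.mtime_sec == recorded->mtime_sec &&
	       now.size == recorded->size;
}
```

A whole-second mtime cannot distinguish two same-size index writes within one second. A hardened version would plausibly also compare nanosecond timestamps or inode numbers, which is the same "racy" problem git's own index code has to handle.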

.git/head-tree.fast — Flattened HEAD tree (26MB, optional)

Only generated when cache_tree is unavailable. Stores a sorted (path, oid) array for an O(n) diff-cached merge-scan. With the TREE extension loaded correctly, this sidecar is rarely needed.
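
The merge-scan is O(n) because both the flattened HEAD tree and the index are sorted by path, so a single forward pass over both arrays classifies every path. The sketch below assumes byte-wise (`strcmp`) ordering and 20-byte SHA-1 object IDs. `struct path_oid`, `struct diff_counts` and `merge_scan()` are illustrative names, and the sketch only counts changes rather than emitting diff records.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define OID_RAWSZ 20	/* assumption: SHA-1 object IDs */

struct path_oid {
	const char *path;
	unsigned char oid[OID_RAWSZ];
};

struct diff_counts {
	size_t added, deleted, modified;
};

/* One O(nh + ni) pass over two arrays, both sorted by strcmp(path). */
static struct diff_counts merge_scan(const struct path_oid *head, size_t nh,
				     const struct path_oid *index, size_t ni)
{
	struct diff_counts c = { 0, 0, 0 };
	size_t i = 0, j = 0;

	while (i < nh || j < ni) {
		int cmp;

		if (i == nh)
			cmp = 1;	/* HEAD exhausted: rest are additions */
		else if (j == ni)
			cmp = -1;	/* index exhausted: rest are deletions */
		else
			cmp = strcmp(head[i].path, index[j].path);

		if (cmp < 0) {		/* path only in HEAD */
			c.deleted++;
			i++;
		} else if (cmp > 0) {	/* path only in the index */
			c.added++;
			j++;
		} else {		/* same path: compare object IDs */
			if (memcmp(head[i].oid, index[j].oid, OID_RAWSZ) != 0)
				c.modified++;
			i++;
			j++;
		}
	}
	return c;
}
```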

Extension loading

The sidecar stores the offset of the first extension (TREE) in the index file. On read, we mmap just the extension region (19MB: TREE 8MB + UNTR 11MB) instead of the full 237MB index. This loads:

  • TREE — cache_tree for instant cache_tree_update and fast traverse_trees
  • UNTR — untracked cache for skipping unchanged directories
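
Once the extension region is mapped, finding TREE and UNTR takes only a short walk. Per git's documented index format, each extension is a 4-byte signature, then a 32-bit big-endian size, then that many bytes of data, and the file ends with a trailing checksum. The sketch below assumes the caller passes the bytes between the first extension and that checksum. `find_extension()` and `read_be32()` are hypothetical helpers, and git's own `get_be32()` would serve the same purpose in-tree.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Decode a 32-bit big-endian integer from possibly unaligned bytes. */
static uint32_t read_be32(const unsigned char *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/*
 * Walk extension records (4-byte signature, 4-byte big-endian size, data)
 * in ext[0..len) and return a pointer to the data of the one whose signature
 * is `sig`, storing its size in *size_out. Returns NULL if absent or corrupt.
 */
static const unsigned char *find_extension(const unsigned char *ext, size_t len,
					   const char *sig, uint32_t *size_out)
{
	size_t pos = 0;

	while (len - pos >= 8) {
		uint32_t sz = read_be32(ext + pos + 4);

		if (sz > len - pos - 8)
			return NULL;	/* size runs past the region: corrupt */
		if (memcmp(ext + pos, sig, 4) == 0) {
			*size_out = sz;
			return ext + pos + 8;
		}
		pos += 8 + (size_t)sz;
	}
	return NULL;
}
```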

Parallel lstat

  • preload_index: 64 threads (was 20)
  • Parallel untracked cache dir validation: collect 219K dir paths, dispatch to 64 threads
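
Both bullets follow the same pattern. A large array of paths is split into contiguous slices, and each slice goes to a thread that calls `lstat()`. The minimal pthreads sketch below assumes results go into caller-provided arrays. `parallel_lstat()`, `struct lstat_job` and the 64-thread cap are illustrative, not the PR's code. If a thread cannot be created, its slice is simply processed on the calling thread.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <sys/stat.h>

#define MAX_LSTAT_THREADS 64	/* illustrative cap, matching the 64 threads above */

/* One contiguous slice of the path array, handled by one thread. */
struct lstat_job {
	const char **paths;
	struct stat *results;	/* results[i] is valid when ok[i] == 1 */
	int *ok;
	size_t begin, end;
};

static void *lstat_worker(void *arg)
{
	struct lstat_job *job = arg;

	for (size_t i = job->begin; i < job->end; i++)
		job->ok[i] = lstat(job->paths[i], &job->results[i]) == 0;
	return NULL;
}

/* lstat() every path using up to nr_threads threads; slices never overlap. */
static void parallel_lstat(const char **paths, size_t n, struct stat *results,
			   int *ok, int nr_threads)
{
	pthread_t tids[MAX_LSTAT_THREADS];
	struct lstat_job jobs[MAX_LSTAT_THREADS];
	int created[MAX_LSTAT_THREADS];

	if (nr_threads < 1)
		nr_threads = 1;
	if (nr_threads > MAX_LSTAT_THREADS)
		nr_threads = MAX_LSTAT_THREADS;

	for (int t = 0; t < nr_threads; t++) {
		size_t nt = (size_t)nr_threads;

		jobs[t] = (struct lstat_job){
			.paths = paths, .results = results, .ok = ok,
			.begin = n * (size_t)t / nt,
			.end = n * (size_t)(t + 1) / nt,
		};
		created[t] = pthread_create(&tids[t], NULL, lstat_worker,
					    &jobs[t]) == 0;
		if (!created[t])
			lstat_worker(&jobs[t]);	/* fall back: do this slice inline */
	}
	for (int t = 0; t < nr_threads; t++)
		if (created[t])
			pthread_join(tids[t], NULL);
}
```

Each thread writes only its own indices of `results` and `ok`, so no locking is needed.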

Write path

After every index write (git add, git commit, git status updating stat cache):

  • Sidecar is regenerated eagerly (273MB write, ~330ms)
  • Next read uses the fast sidecar path immediately
  • CE_UPTODATE and CE_FSMONITOR_VALID cleared in sidecar (must re-lstat on next read)
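
The last bullet is the correctness-critical one. CE_UPTODATE and CE_FSMONITOR_VALID record that the current process already validated a file, so persisting them would let the next process skip an lstat it actually needs. The minimal sketch of the masking step below copies the bit values from git's in-memory flag definitions only so the snippet compiles standalone, and in-tree code would use git's own macros. `sidecar_flags_for_write()` and `mask_volatile_flags()` are hypothetical helpers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative copies of git's in-memory cache_entry flag bits, repeated here
 * only so this sketch compiles on its own; in-tree code uses git's macros. */
#define CE_UPTODATE        (1u << 18)
#define CE_FSMONITOR_VALID (1u << 21)

/* Bits meaning "this process already checked the worktree file". */
#define SIDECAR_VOLATILE_FLAGS (CE_UPTODATE | CE_FSMONITOR_VALID)

/* Flags as they should be persisted: everything except the volatile bits. */
static uint32_t sidecar_flags_for_write(uint32_t ce_flags)
{
	return ce_flags & ~(uint32_t)SIDECAR_VOLATILE_FLAGS;
}

/* Apply the mask to a batch of entry flags before the sidecar is written. */
static void mask_volatile_flags(uint32_t *flags, size_t n)
{
	for (size_t i = 0; i < n; i++)
		flags[i] = sidecar_flags_for_write(flags[i]);
}
```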

Design

```
.git/
├── index              (236MB: standard git format, untouched)
├── index.fast         (273MB: native-endian entry sidecar)
└── head-tree.fast     (26MB: optional, only when cache_tree unavailable)
```

  • Fully compatible: standard index format untouched, all commands work
  • Optional: sidecars auto-generate on first use, graceful fallback if missing/stale
  • Self-healing: stat-based staleness check, auto-regenerate when stale
  • Zero configuration: drop-in replacement binary

Key Insight

The original hypothesis was that index loading is the bottleneck. In fact, it accounted for only 3% of git status time.

| Phase | Time | % of status |
|---|---|---|
| Untracked file scanning | 3.9s | 68% |
| Tree traversal (diff-cached) | 1.2s | 21% |
| Index write-back | 0.4s | 7% |
| Index loading | 0.17s | 3% |
| fsmonitor overhead | 0.13s | 2% |

The 273× index load speedup is impressive but only saves 165ms of a 5740ms operation. The real wins came from loading extensions (TREE, UNTR), fixing the extension offset bug, parallelizing lstats, and eliminating redundant tree sidecar writes.

Status

WIP — automated exploration (93 experiments, 30 kept). Code works and passes correctness checks. Needs:

  • Proper test coverage
  • Cold-cache benchmarks
  • Multi-platform testing (only tested Linux x86_64)
  • Review for edge cases (concurrent writers, NFS, partial clone)
  • Consider splitting into reviewable series

tobi force-pushed the fast-index-sidecar branch 4 times, most recently from cf008fa to 52cb91e (April 29, 2026 21:26)
This is a WIP exploration of making git dramatically faster for mega
monorepos (1M+ files). On Shopify's monorepo (1.3M files, 236MB index):

  git status:   5.74s → 0.63s  (9.1× faster)
  git ls-files: 286ms → 147ms  (1.9× faster)
  index load:   166ms → 0.6ms  (273× faster)

Key changes:

1. Native-endian entry sidecar (.git/index.fast)
   - Zero-copy mmap: cache_entry structs stored in native format
   - Offset table gives O(1) pointer setup per entry (no entry parsing)
   - Pre-computed memihash (eliminates 215ms name-hash-init)
   - Auto-generated on first read, stat-based staleness check

2. HEAD tree sidecar (.git/head-tree.fast)
   - Flattened (path, oid) array for O(n) diff-cached
   - Replaces 1.2s tree object decompression with 8ms merge-scan

3. Status optimizations
   - Load untracked cache extension via partial mmap (11MB region)
   - Skip index write when only UNTR/FSMN extensions changed
   - Skip redundant tree sidecar write when HEAD unchanged
   - Skip post_read_index_from for sidecar loads (saves 129ms)
   - Skip FSMN extension (fsmonitor daemon unavailable)

4. Parallel lstat
   - preload_index: 20 → 64 threads (matches available CPUs)
   - Parallel untracked cache dir validation (219K dirs, 64 threads)

All sidecars are optional. If missing or stale, git falls back to
standard code paths and regenerates opportunistically. Standard git
commands remain fully compatible.

Signed-off-by: Tobi Lütke <tobi@shopify.com>
tobi force-pushed the fast-index-sidecar branch from e52f39a to 9516f9e (April 29, 2026 21:42)
tobi added 2 commits April 29, 2026 18:13
MADV_HUGEPAGE is Linux-only (Transparent Huge Pages). Darwin's madvise(2)
has no equivalent — the kernel manages superpages on its own. Wrap the
hint in #ifdef so the build works on macOS.
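
The guard described above roughly takes the shape shown below. `hint_hugepages()` is a hypothetical wrapper. This sketch tests the `MADV_HUGEPAGE` macro itself rather than an OS name, because the commit does not say which condition it uses. With this approach, any platform that lacks the constant, Darwin included, compiles the hint away.

```c
#define _DEFAULT_SOURCE	/* glibc: expose madvise() and MADV_HUGEPAGE */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/*
 * Ask for transparent huge pages where the platform supports them.
 * MADV_HUGEPAGE is Linux-specific; where it is not defined this is a no-op
 * and the kernel manages large pages on its own. The hint is advisory, so
 * a failing madvise() (e.g. THP disabled) is deliberately ignored.
 */
static void hint_hugepages(void *addr, size_t len)
{
#ifdef MADV_HUGEPAGE
	(void)madvise(addr, len, MADV_HUGEPAGE);
#else
	(void)addr;
	(void)len;
#endif
}
```
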
Three bugs broke gitx for sparse-checkout + fsmonitor users (e.g. World):

1. mmap offset alignment was hardcoded to 4096. Apple Silicon uses 16K
   pages, so xmmap_gently() returned MAP_FAILED for any extension_offset
   not 16K-aligned and the entire extension block was skipped silently.
   Use sysconf(_SC_PAGESIZE) so it works on both 4K (Linux/x86_64) and
   16K (Apple Silicon) page sizes.

2. The sidecar load explicitly skipped the FSMN extension and set
   fsmonitor_has_run_once=1, which prevented post_read_index_from()
   from running tweak_fsmonitor(). With fsmonitor configured, status
   fell back to a full recursive readdir of every directory.

3. The skip also killed tweak_untracked_cache(), so UNTR was never
   attached to istate either — same recursive-readdir fallback even
   without fsmonitor.
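
Fix 1 comes down to two steps. The file offset is rounded down to the runtime page size, and the mapping records how far past its start the requested bytes begin. A sketch under those assumptions follows. `struct region_map`, `map_region()` and `unmap_region()` are hypothetical names and do not reflect git's `xmmap_gently()` interface.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* A read-only mapping whose start was rounded down to a page boundary. */
struct region_map {
	void *base;			/* address returned by mmap(), for munmap() */
	size_t map_len;			/* length actually mapped */
	const unsigned char *data;	/* first byte the caller asked for */
	size_t len;			/* number of bytes the caller asked for */
};

/*
 * Map bytes [offset, offset + len) of fd. mmap() only accepts offsets that
 * are multiples of the page size: 4 KiB on typical x86_64 Linux, 16 KiB on
 * Apple Silicon. Query it at runtime instead of hardcoding 4096.
 */
static int map_region(int fd, off_t offset, size_t len, struct region_map *out)
{
	long pagesize = sysconf(_SC_PAGESIZE);
	off_t aligned, delta;
	void *p;

	if (pagesize <= 0 || offset < 0 || len == 0)
		return -1;
	aligned = offset - offset % (off_t)pagesize;
	delta = offset - aligned;
	p = mmap(NULL, len + (size_t)delta, PROT_READ, MAP_PRIVATE, fd, aligned);
	if (p == MAP_FAILED)
		return -1;
	out->base = p;
	out->map_len = len + (size_t)delta;
	out->data = (const unsigned char *)p + delta;
	out->len = len;
	return 0;
}

static void unmap_region(struct region_map *m)
{
	munmap(m->base, m->map_len);
	m->base = NULL;
	m->data = NULL;
}
```

The sketch maps at most one extra page of leading bytes, so the saving from mapping only the extension region is preserved.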

After the fix on the Shopify monorepo (sparse, ~16% files present,
fsmonitor + untracked-cache enabled):

  git status:  340 ms
  gitx status: 355 ms (was 7.15 s, a 20× improvement)

Also tightens the ext_end calculation that was assigned twice (the
first computation was dead code).