WIP: fast-index sidecar + parallel lstat for mega monorepos (9× faster git status) #1
Open
tobi wants to merge 3 commits into
This is a WIP exploration of making git dramatically faster for mega monorepos (1M+ files).

On Shopify's monorepo (1.3M files, 236MB index):

  git status:   5.74s → 0.63s (9.1× faster)
  git ls-files: 286ms → 147ms (1.9× faster)
  index load:   166ms → 0.6ms (273× faster)

Key changes:

1. Native-endian entry sidecar (.git/index.fast)
   - Zero-copy mmap: cache_entry structs stored in native format
   - Offset table for O(1) pointer setup (no entry parsing)
   - Pre-computed memihash (eliminates 215ms name-hash-init)
   - Auto-generated on first read, stat-based staleness check

2. HEAD tree sidecar (.git/head-tree.fast)
   - Flattened (path, oid) array for O(n) diff-cached
   - Replaces 1.2s tree object decompression with 8ms merge-scan

3. Status optimizations
   - Load untracked cache extension via partial mmap (11MB region)
   - Skip index write when only UNTR/FSMN extensions changed
   - Skip redundant tree sidecar write when HEAD unchanged
   - Skip post_read_index_from for sidecar loads (saves 129ms)
   - Skip FSMN extension (fsmonitor daemon unavailable)

4. Parallel lstat
   - preload_index: 20 → 64 threads (matches available CPUs)
   - Parallel untracked cache dir validation (219K dirs, 64 threads)

All sidecars are optional. If missing or stale, git falls back to standard code paths and regenerates opportunistically. Standard git commands remain fully compatible.

Signed-off-by: Tobi Lütke <tobi@shopify.com>
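To make the "offset table for O(1) pointer setup" idea concrete, here is a minimal sketch. It is not the format this PR uses: `demo_header`, `demo_entry`, `demo_hash`, `demo_build`, and `demo_setup_pointers` are hypothetical names, the layout is invented for illustration, and the hash is a toy FNV-1a standing in for a pre-computed name hash. The point it shows is that loading only walks the offset table and assigns pointers into the blob, with no per-entry parsing, copying, or allocation.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical sidecar layout, NOT git's real format:
 * [demo_header][uint64_t offsets[count]][8-byte-aligned demo_entry records] */
struct demo_header {
	uint32_t count;        /* number of entries */
	uint32_t table_offset; /* byte offset of the offset table */
};

struct demo_entry {
	uint32_t name_hash;    /* stored pre-computed, so nothing is hashed on load */
	uint32_t name_len;
	char name[];           /* NUL-terminated path */
};

/* Toy FNV-1a hash, used only as a stand-in for a real name hash. */
uint32_t demo_hash(const char *s)
{
	uint32_t h = 2166136261u;
	while (*s)
		h = (h ^ (unsigned char)*s++) * 16777619u;
	return h;
}

static size_t align8(size_t n) { return (n + 7) & ~(size_t)7; }

/* Writer: serialize names into one native-endian blob. Caller frees it. */
void *demo_build(const char **names, uint32_t count, size_t *len_out)
{
	size_t table_off = sizeof(struct demo_header);
	size_t pos = table_off + (size_t)count * sizeof(uint64_t);
	size_t total = pos;
	struct demo_header *hdr;
	uint64_t *table;
	char *blob;
	uint32_t i;

	assert(names != NULL && len_out != NULL);
	for (i = 0; i < count; i++)
		total += align8(sizeof(struct demo_entry) + strlen(names[i]) + 1);
	blob = calloc(1, total);
	if (!blob)
		return NULL;
	hdr = (struct demo_header *)blob;
	hdr->count = count;
	hdr->table_offset = (uint32_t)table_off;
	table = (uint64_t *)(blob + table_off);
	for (i = 0; i < count; i++) {
		struct demo_entry *e = (struct demo_entry *)(blob + pos);
		size_t len = strlen(names[i]);
		e->name_hash = demo_hash(names[i]);
		e->name_len = (uint32_t)len;
		memcpy(e->name, names[i], len + 1);
		table[i] = pos;
		pos += align8(sizeof(*e) + len + 1);
	}
	*len_out = total;
	return blob;
}

/* Reader: one pointer-setup loop over the offset table. Returns the number
 * of entries set up, or 0 if the blob fails the basic bounds checks. */
size_t demo_setup_pointers(const void *blob, size_t blob_len,
			   const struct demo_entry **entries, size_t max_entries)
{
	const struct demo_header *hdr = blob;
	const uint64_t *table;
	size_t i;

	if (blob_len < sizeof(*hdr) || hdr->count > max_entries)
		return 0;
	if (hdr->table_offset > blob_len ||
	    (blob_len - hdr->table_offset) / sizeof(uint64_t) < hdr->count)
		return 0;
	table = (const uint64_t *)((const char *)blob + hdr->table_offset);
	for (i = 0; i < hdr->count; i++) {
		if (table[i] > blob_len - sizeof(struct demo_entry))
			return 0; /* offset points outside the blob */
		entries[i] = (const struct demo_entry *)((const char *)blob + table[i]);
	}
	return hdr->count;
}
```

In this sketch the blob lives on the heap so the example stays self-contained; the same reader would work unchanged on a region obtained from `mmap(MAP_PRIVATE)`, which is what makes the load effectively zero-copy.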
MADV_HUGEPAGE is Linux-only (Transparent Huge Pages). Darwin's madvise(2) has no equivalent — the kernel manages superpages on its own. Wrap the hint in #ifdef so the build works on macOS.
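A guard of the kind described might look like the sketch below. The helper name `hint_hugepages` is hypothetical and the PR's actual call site is not shown here; the only assumption is that the hint is best-effort, so a platform without `MADV_HUGEPAGE` (or a kernel that rejects it) simply proceeds without it.

```c
#define _DEFAULT_SOURCE /* expose madvise() and MADV_* on glibc */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/* Best-effort Transparent Huge Pages hint (hypothetical helper).
 * Returns 1 if the kernel accepted the hint, 0 if the platform has no
 * MADV_HUGEPAGE or the call was rejected. Callers may ignore the result. */
int hint_hugepages(void *addr, size_t len)
{
	assert(addr != NULL);
#ifdef MADV_HUGEPAGE
	return madvise(addr, len, MADV_HUGEPAGE) == 0;
#else
	(void)addr;
	(void)len;
	return 0; /* e.g. macOS: nothing to request */
#endif
}
```

Testing for the macro itself, rather than for `__linux__`, keeps the guard correct on any platform that later gains the flag.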
Three bugs broke gitx for sparse-checkout + fsmonitor users (e.g. World):

1. mmap offset alignment was hardcoded to 4096. Apple Silicon uses 16K pages, so xmmap_gently() returned MAP_FAILED for any extension_offset not 16K-aligned and the entire extension block was skipped silently. Use sysconf(_SC_PAGESIZE) so it works on both 4K (Linux/x86_64) and 16K (Apple Silicon) page sizes.

2. The sidecar load explicitly skipped the FSMN extension and set fsmonitor_has_run_once=1, which prevented post_read_index_from() from running tweak_fsmonitor(). With fsmonitor configured, status fell back to a full recursive readdir of every directory.

3. The skip also killed tweak_untracked_cache(), so UNTR was never attached to istate either, giving the same recursive-readdir fallback even without fsmonitor.

After the fix on the Shopify monorepo (sparse, ~16% files present, fsmonitor + untracked-cache enabled):

  git status:  340 ms
  gitx status: 355 ms (was 7.15 s, a 20x improvement)

Also tightens the ext_end calculation that was assigned twice (the first computation was dead code).
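The alignment fix in item 1 can be sketched as follows. The struct and function names (`map_window`, `window_for`, `window_for_this_host`) are hypothetical and not taken from the patch; the sketch only shows the arithmetic: round the wanted file offset down to a page boundary, map from there, and skip the leading bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper type: what to pass to mmap() for a partial mapping. */
struct map_window {
	off_t map_offset; /* page-aligned offset to pass to mmap() */
	size_t lead;      /* bytes between map_offset and the wanted data */
	size_t map_len;   /* length to map so the wanted region is covered */
};

/* Compute the window for a given page size (must be a power of two). */
struct map_window window_for(off_t want_offset, size_t want_len, long page_size)
{
	struct map_window w;

	assert(page_size > 0 && (page_size & (page_size - 1)) == 0);
	w.map_offset = want_offset & ~((off_t)page_size - 1);
	w.lead = (size_t)(want_offset - w.map_offset);
	w.map_len = want_len + w.lead;
	return w;
}

/* Same, but asks the running system instead of assuming 4096. */
struct map_window window_for_this_host(off_t want_offset, size_t want_len)
{
	return window_for(want_offset, want_len, sysconf(_SC_PAGESIZE));
}
```

With an offset of 5000, a 4K page size yields a mapping that starts at 4096, while a 16K page size yields one that starts at 0. Passing 4096 to `mmap` on a 16K-page system is the misaligned case the commit message describes.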
Results
Tested on Shopify monorepo (1.3M files, 236MB index, 219K directories):
Read operations
- git status
- git ls-files
- git diff HEAD~10
- git diff HEAD~100

Write operations
- git commit
- git status after commit
- git add (single file)

What was slow and why
Approach
.git/index.fast: Native-endian entry sidecar (273MB)
Store cache_entry structs in native format with an offset table. On read: mmap(MAP_PRIVATE) + one pointer-setup loop. Zero memcpy, zero byte-swapping, zero per-entry allocation.
- Pre-computed memihash values for instant name-hash-init
- Never persists CE_UPTODATE or CE_FSMONITOR_VALID (correctness)

.git/head-tree.fast: Flattened HEAD tree (26MB, optional)
Only generated when cache_tree is unavailable. Stores sorted (path, oid) array for O(n) diff-cached merge-scan. With TREE extension loaded correctly, this is rarely needed.

Extension loading
The sidecar stores the offset of the first extension (TREE) in the index file. On read, we mmap just the extension region (19MB: TREE 8MB + UNTR 11MB) instead of the full 237MB index. This loads:
- cache_tree_update and fast traverse_trees

Parallel lstat
- preload_index: 64 threads (was 20)

Write path
After every index write (git add, git commit, git status updating stat cache):
- CE_UPTODATE and CE_FSMONITOR_VALID cleared in sidecar (must re-lstat on next read)

Design
Key Insight
The original hypothesis was "index loading is the bottleneck." This was only 3% of git status time.

The 273× index load speedup is impressive but only saves 165ms of a 5740ms operation. The real wins came from loading extensions (TREE, UNTR), fixing the extension offset bug, parallelizing lstats, and eliminating redundant tree sidecar writes.
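The arithmetic behind this observation is Amdahl's law: speeding up one phase only helps in proportion to that phase's share of the total. A small sketch using the figures quoted in this PR (function names are illustrative only):

```c
#include <assert.h>

/* Fraction of the total run spent in one phase. */
double phase_share(double total_ms, double phase_ms)
{
	assert(total_ms > 0);
	return phase_ms / total_ms;
}

/* Overall speedup when only one phase gets faster and the rest is unchanged. */
double overall_speedup(double total_ms, double phase_before_ms, double phase_after_ms)
{
	assert(total_ms > 0 && phase_before_ms <= total_ms);
	return total_ms / (total_ms - phase_before_ms + phase_after_ms);
}
```

Plugging in a 5740ms `git status` with index load going from 166ms to 0.6ms, `phase_share(5740, 166)` is about 2.9%, and `overall_speedup(5740, 166, 0.6)` is roughly 1.03×. The index-load change on its own therefore accounts for very little of the 9.1× end-to-end result.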
Status
WIP — automated exploration (93 experiments, 30 kept). Code works and passes correctness checks. Needs: