Commit 3cead87

Git browser: per-file Modified in snapshots
- Snapshot listings (`.git/branches/main/src/`, `.git/commits/<sha>/`, etc.) now show each entry's last-touched commit date instead of the snapshot's commit date.
- New `git/snapshot_dates.rs` module: walks commits backwards from the snapshot, diffs each against its first parent, and attributes the committer time to any pending top-level entry the diff touches. Stops when every entry is dated, after 1000 commits, or when the rev-walk runs out.
- Initial commits short-circuit (every entry gets the initial commit's date).
- Subdirs get the date of the most recent commit that touched any file underneath.
- Process-global FIFO cache (50 keys), content-addressable so it never invalidates.
- Bench: 100 entries × 5000 commits cold p95=21 ms (budget 200 ms), warm p95=2 µs cache hit. New ignored bench at 50k commits stays inside the 500 ms budget.
- Tests: per-file dates on three-files / dir / cache-hit / cap-fallback / initial-commit fixtures.
- Falls back to the snapshot date when the cap fires so the cell never reads as blank.
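
The walk described in the commit message can be sketched without gix. This is a simplified model under stated assumptions, not the code in `snapshot_dates.rs`: `CommitInfo`, `top_level_entry`, and `attribute_dates` are hypothetical names, the history is assumed to be pre-computed newest-first along the first-parent chain, and each commit carries the paths its diff against the first parent touched (for an initial commit, every path in its tree).

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical stand-in for one commit on the first-parent chain: its
/// committer time and the paths its diff against the first parent touched.
struct CommitInfo {
    committer_time: i64,
    touched_paths: Vec<String>,
}

/// Maps a repo-relative `path` to the name of the top-level entry of
/// `dir_path` that contains it, or `None` if the path lies outside `dir_path`.
/// An empty `dir_path` means the repository root.
fn top_level_entry<'a>(path: &'a str, dir_path: &str) -> Option<&'a str> {
    let rest = if dir_path.is_empty() {
        path
    } else {
        path.strip_prefix(dir_path)?.strip_prefix('/')?
    };
    rest.split('/').next().filter(|name| !name.is_empty())
}

/// Walks `history` (newest first) once and dates every entry in `entries`.
/// The first commit seen touching an entry wins, which is the most recent one.
/// Entries still pending after `max_commits` fall back to `snapshot_time`.
fn attribute_dates(
    history: &[CommitInfo],
    dir_path: &str,
    entries: &[&str],
    snapshot_time: i64,
    max_commits: usize,
) -> HashMap<String, i64> {
    let mut pending: HashSet<&str> = entries.iter().copied().collect();
    let mut dates = HashMap::new();

    for commit in history.iter().take(max_commits) {
        if pending.is_empty() {
            break; // every entry is dated, stop early
        }
        for path in &commit.touched_paths {
            if let Some(name) = top_level_entry(path, dir_path) {
                // `remove` returns true only the first time an entry is seen.
                if pending.remove(name) {
                    dates.insert(name.to_string(), commit.committer_time);
                }
            }
        }
    }

    // Cap fired or history ran out: never leave a cell blank.
    for name in pending {
        dates.insert(name.to_string(), snapshot_time);
    }
    dates
}
```

Because a subdirectory is a single pending entry, any touched path underneath it dates the whole subdirectory, which matches the "most recent commit that touched any file underneath" rule.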
1 parent 04a4e59 commit 3cead87

6 files changed

Lines changed: 767 additions & 9 deletions

apps/desktop/src-tauri/src/file_system/git/CLAUDE.md

Lines changed: 7 additions & 3 deletions
@@ -27,7 +27,8 @@ three new error variants (`ShallowBoundary`, `MissingObject`,
 | `stash.rs` | `list_stashes(repo_root)`, `resolve_stash_commit(handle, n)` – shells out to `git stash list -z` and `git rev-parse stash@{n}` (gix has no public stash API). `list_stashes` doesn't take a `RepoHandle` because the shell-out only needs `repo_root` for `git -C` |
 | `worktrees.rs` | `list_worktrees` – gix `Repository::worktrees()`. Each entry sets `redirect_to_path` to the worktree's working dir |
 | `submodules.rs` | `list_submodules` – gix `Repository::submodules()`. Each entry sets `redirect_to_path` to `<repo_root>/<rel-path>` |
-| `tree.rs` | `list_tree`, `get_tree_entry`, `lookup_blob_id`, `read_blob` – gix tree walks. Permissions reflect `EntryKind::BlobExecutable` so cross-volume copy preserves the executable bit |
+| `tree.rs` | `list_tree`, `get_tree_entry`, `lookup_blob_id`, `read_blob` – gix tree walks. Permissions reflect `EntryKind::BlobExecutable` so cross-volume copy preserves the executable bit. `list_tree` calls `snapshot_dates::decode_per_file_dates` for per-file Modified dates, falling back to the snapshot date |
+| `snapshot_dates.rs` | `decode_per_file_dates(commit, dir_path)` walks commits backwards from `commit`, diffs each against its first parent, and attributes the committer time to any pending top-level entry the diff touches. Capped at `MAX_COMMITS_PER_WALK` (1000). FIFO-bounded process-global cache keyed on `(commit_id, dir_path)` — cache is content-addressable so it never goes stale |
 | `read_blob.rs` | `GitBlobReadStream` – owns the full `Vec<u8>` and yields 256 KB chunks. See *Honest blob streaming* below |
 | `status.rs` | `list_status(repo, dir)` runs a full-repo `git status --porcelain=v2 -z` once per `.git/index` mtime, caches the result in a process-global `RwLock<HashMap<RepoRoot, CachedStatus>>`, and slices it by `dir`. The watcher invalidates the snapshot whenever `.git/*` changes. Parses porcelain v2 in `parse_porcelain_v2`. |
 | `watcher.rs` | `GitWatcherRegistry` – per-repo notify-rs debouncer. `subscribe(app, root)` returns the current `RepoInfo` synchronously and emits `git-state-changed` on relevant `.git/*` mutations. 200 ms debounce. M2: also calls `notify_directory_changed(.., FullRefresh)` for any cached `.git/{branches,tags}/` listings on the local volume |
@@ -134,8 +135,8 @@ Every virtual entry carries a real `modified_at` and most carry a `display_size`
 | `stash/<n>/` | stash creation date | `on main` (parsed from stash subject) | 0 |
 | `worktrees/<name>` (redirect) | worktree HEAD date | `on feature-x` or short SHA | 0 |
 | `submodules/<name>` (redirect) | pinned commit date | short SHA | 0 |
-| inside snapshots — files | snapshot commit date | None (blob bytes) | blob bytes |
-| inside snapshots — subdirs | snapshot commit date | None (recursive bytes) | recursive blob bytes |
+| inside snapshots — files | most recent commit that touched the file (fallback: snapshot commit date) | None (blob bytes) | blob bytes |
+| inside snapshots — subdirs | most recent commit that touched any file underneath (fallback: snapshot commit date) | None (recursive bytes) | recursive blob bytes |

 Cross-category Size sort is meaningless (ahead-count vs files-changed vs item count); that's an honest tradeoff — each cell is self-explaining via `display_size_tooltip` (also used as the aria-label).

@@ -146,6 +147,9 @@ The frontend reads `display_size` / `display_size_tooltip` from `FileEntry`; the

 ## Decisions

+**Decision (M4 follow-up)**: Per-file Modified dates inside snapshot listings via walk-once batching
+**Why**: The snapshot date ("when this commit landed") is the same value for every file inside a `branches/main/`, `commits/<sha>/`, etc. listing — semantically correct as a "frozen point in time", but useless as a "when did I last work on this?" hint. We now run a single rev-walk per `(commit_id, dir_path)` listing: from the snapshot commit backwards by commit time, first-parent only, diffing each commit against its first parent (gix's `Tree::changes()::for_each_to_obtain_tree`). Each `Change.location` is matched against the directory's top-level entries; the first-seen commit's committer time wins. The walk stops early when every entry is dated, after `MAX_COMMITS_PER_WALK` (1000), or when the rev-walk exits. Initial commits short-circuit. Cache is process-global, FIFO-bounded at 50 keys, content-addressable so it never invalidates. Bench: 100 entries × 5000 commits cold p95=21 ms (budget 200 ms), warm p95=2 µs. 50k-commit fixture sits inside the 500 ms budget too. Entries that don't surface within the cap fall back to the snapshot date so the cell never reads as blank.
+
 **Decision (M4 follow-up)**: Cache `list_status` results keyed by `.git/index` mtime
 **Why**: Status used to walk the worktree on every `listing-complete` (every nav,
 every diff). On a 50k-file repo that's ~75 ms per nav. We now run one full-repo
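
The FIFO-bounded cache named in the `snapshot_dates.rs` decision can be illustrated with a small standalone structure. This is a minimal sketch, assuming a `VecDeque` for insertion order plus a `HashMap` for lookup; `FifoCache` is a hypothetical name, and the actual module's layout and locking are not shown in this diff. A process-global version would presumably sit behind a `Mutex` in a `static`.

```rust
use std::collections::{HashMap, VecDeque};
use std::hash::Hash;

/// Hypothetical FIFO-bounded cache: once `capacity` keys are stored, inserting
/// a new key evicts the oldest-inserted one. Reads never reorder entries.
struct FifoCache<K, V> {
    capacity: usize,
    order: VecDeque<K>, // insertion order, oldest at the front
    map: HashMap<K, V>,
}

impl<K: Clone + Eq + Hash, V: Clone> FifoCache<K, V> {
    fn new(capacity: usize) -> Self {
        Self {
            capacity,
            order: VecDeque::with_capacity(capacity),
            map: HashMap::with_capacity(capacity),
        }
    }

    /// Returns a clone of the cached value, if present.
    fn get(&self, key: &K) -> Option<V> {
        self.map.get(key).cloned()
    }

    /// Inserts `value`, evicting the oldest key when the cache is full.
    fn insert(&mut self, key: K, value: V) {
        if self.capacity == 0 {
            return; // a zero-capacity cache stores nothing
        }
        if self.map.contains_key(&key) {
            // Overwriting an existing key keeps its original FIFO position.
            self.map.insert(key, value);
            return;
        }
        if self.map.len() >= self.capacity {
            if let Some(oldest) = self.order.pop_front() {
                self.map.remove(&oldest);
            }
        }
        self.order.push_back(key.clone());
        self.map.insert(key, value);
    }

    fn len(&self) -> usize {
        self.map.len()
    }
}
```

Keying on `(commit_id, dir_path)` is what makes invalidation unnecessary: a commit id names immutable content, so a cached answer for that key can only be evicted, never become wrong.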

apps/desktop/src-tauri/src/file_system/git/bench.rs

Lines changed: 110 additions & 0 deletions
@@ -215,6 +215,116 @@ fn bench_list_branches_with_ahead_behind() {
     let _ = std::fs::remove_dir_all(&dir);
 }

+// ── Per-file Modified dates inside snapshots ────────────────────────
+
+/// Builds a deep-history fixture: `commits` commits, each touching one
+/// file in a 100-entry top-level dir. Used to bench the per-file date
+/// walk-once batching at "Cmdr-scale" (5000 commits) and "monorepo-scale"
+/// (50k commits).
+fn build_deep_history_fixture(commits: usize, top_level_files: usize, name: &str) -> PathBuf {
+    let dir = std::env::temp_dir().join(format!("cmdr_bench_per_file_dates_{}_{}", name, std::process::id()));
+    let _ = std::fs::remove_dir_all(&dir);
+    std::fs::create_dir_all(&dir).unwrap();
+    run(&dir, &["init", "-q", "-b", "main"]);
+    run(&dir, &["config", "user.name", "Bench"]);
+    run(&dir, &["config", "user.email", "bench@cmdr.local"]);
+    // Seed `top_level_files` files in one commit so the listing has
+    // something to walk over.
+    for f in 0..top_level_files {
+        std::fs::write(dir.join(format!("f{:04}.txt", f)), b"seed\n").unwrap();
+    }
+    run(&dir, &["add", "."]);
+    run(&dir, &["commit", "-q", "-m", "seed"]);
+    // Then `commits-1` follow-up commits. Each touches a single file in a
+    // round-robin pattern so each top-level file gets its date set fairly
+    // recently.
+    for n in 1..commits {
+        let f = n % top_level_files;
+        std::fs::write(dir.join(format!("f{:04}.txt", f)), format!("change {}\n", n).as_bytes()).unwrap();
+        run(&dir, &["add", "."]);
+        run(&dir, &["commit", "-q", "-m", &format!("c{}", n)]);
+    }
+    dir
+}
+
+#[test]
+#[ignore = "Slow: builds a 5000-commit fixture; opt-in via `cargo test -- --ignored`"]
+fn bench_per_file_dates_5k_commits_under_budget() {
+    use super::path::Cat;
+    use super::snapshot_dates;
+    use super::virtual_listing;
+    let dir = build_deep_history_fixture(5_000, 100, "5k");
+    let (handle, _) = discover_repo(&dir).expect("discover");
+    let commit = virtual_listing::resolve_ref_commit(&handle, Cat::Branches, "main").expect("resolve main");
+
+    // Cold: clear cache before each run.
+    let mut cold = Vec::with_capacity(RUNS);
+    for _ in 0..RUNS {
+        snapshot_dates::clear_cache();
+        let start = Instant::now();
+        let dates = snapshot_dates::decode_per_file_dates(&handle, commit, "").expect("dates");
+        cold.push(start.elapsed().as_micros());
+        assert_eq!(dates.len(), 100, "every top-level entry dated");
+    }
+    let cold_p95 = percentile(cold.clone(), 95.0);
+    let cold_p50 = percentile(cold.clone(), 50.0);
+    eprintln!(
+        "per_file_dates 100 entries / 5k commits (cold): p50={}ms p95={}ms (budget 200 ms)",
+        cold_p50 / 1000,
+        cold_p95 / 1000
+    );
+    assert!(cold_p95 / 1000 <= 200, "p95 over 200 ms budget: {}ms", cold_p95 / 1000);
+
+    // Warm: cache hit.
+    snapshot_dates::clear_cache();
+    let _ = snapshot_dates::decode_per_file_dates(&handle, commit, "").expect("warmup");
+    let mut warm = Vec::with_capacity(RUNS);
+    for _ in 0..RUNS {
+        let start = Instant::now();
+        let _ = snapshot_dates::decode_per_file_dates(&handle, commit, "").expect("dates");
+        warm.push(start.elapsed().as_micros());
+    }
+    let warm_p95 = percentile(warm.clone(), 95.0);
+    let warm_p50 = percentile(warm.clone(), 50.0);
+    eprintln!(
+        "per_file_dates (warm, cache hit): p50={}µs p95={}µs",
+        warm_p50, warm_p95
+    );
+    assert!(warm_p95 <= 5_000, "warm p95 over 5 ms: {}µs", warm_p95);
+
+    let _ = std::fs::remove_dir_all(&dir);
+}
+
+#[test]
+#[ignore = "Very slow: builds a 50k-commit fixture; opt-in via `cargo test -- --ignored`"]
+fn bench_per_file_dates_50k_commits_under_budget() {
+    use super::path::Cat;
+    use super::snapshot_dates;
+    use super::virtual_listing;
+    let dir = build_deep_history_fixture(50_000, 100, "50k");
+    let (handle, _) = discover_repo(&dir).expect("discover");
+    let commit = virtual_listing::resolve_ref_commit(&handle, Cat::Branches, "main").expect("resolve main");
+
+    let mut cold = Vec::with_capacity(RUNS);
+    for _ in 0..RUNS {
+        snapshot_dates::clear_cache();
+        let start = Instant::now();
+        let _ = snapshot_dates::decode_per_file_dates(&handle, commit, "").expect("dates");
+        cold.push(start.elapsed().as_micros());
+    }
+    let cold_p95 = percentile(cold.clone(), 95.0);
+    let cold_p50 = percentile(cold.clone(), 50.0);
+    eprintln!(
+        "per_file_dates 100 entries / 50k commits (cold, capped at {} commits walked): p50={}ms p95={}ms (budget 500 ms)",
+        snapshot_dates::MAX_COMMITS_PER_WALK,
+        cold_p50 / 1000,
+        cold_p95 / 1000
+    );
+    assert!(cold_p95 / 1000 <= 500, "p95 over 500 ms budget: {}ms", cold_p95 / 1000);
+
+    let _ = std::fs::remove_dir_all(&dir);
+}
+
 #[test]
 #[ignore = "Slow: builds a 200-commit fixture; opt-in via `cargo test -- --ignored`"]
 fn bench_list_commits_files_changed() {
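
The benches above call a `percentile` helper that this diff does not show. For readers following the p50/p95 assertions, here is one plausible shape for it: a nearest-rank percentile over microsecond samples. The signature is inferred from the call sites (`percentile(cold.clone(), 95.0)` on a `Vec<u128>`); the body is an assumption, and the real helper in `bench.rs` may interpolate or round differently.

```rust
/// Nearest-rank percentile, assumed shape only: sorts the samples and returns
/// the value at rank ceil(p / 100 * n), with ranks counted from 1.
/// Returns 0 for an empty sample set.
fn percentile(mut samples: Vec<u128>, p: f64) -> u128 {
    if samples.is_empty() {
        return 0;
    }
    samples.sort_unstable();
    let n = samples.len();
    // Multiply before dividing so whole-number ranks stay exact in f64.
    let rank = (p * n as f64 / 100.0).ceil() as usize;
    // Clamp so p = 0 maps to the minimum and p = 100 to the maximum.
    let index = rank.clamp(1, n) - 1;
    samples[index]
}
```

With nearest-rank, p95 over a small number of runs is simply the slowest or second-slowest sample, so the budget assertions behave as a near worst-case check.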

apps/desktop/src-tauri/src/file_system/git/mod.rs

Lines changed: 3 additions & 0 deletions
@@ -42,6 +42,7 @@ pub mod log;
 pub mod path;
 pub mod read_blob;
 pub mod repo;
+pub mod snapshot_dates;
 pub mod stash;
 pub mod status;
 pub mod submodules;
@@ -59,6 +60,8 @@ mod m3_tests;
 #[cfg(test)]
 mod m4_tests;
 #[cfg(test)]
+mod snapshot_dates_tests;
+#[cfg(test)]
 mod tests;

 #[allow(
