vdavid
diff --git a/apps/desktop/src-tauri/src/indexing/CLAUDE.md b/apps/desktop/src-tauri/src/indexing/CLAUDE.md
Lines changed: 5 additions & 4 deletions
diff --git a/apps/desktop/src-tauri/src/indexing/aggregator.rs b/apps/desktop/src-tauri/src/indexing/aggregator.rs
Lines changed: 2 additions & 0 deletions
diff --git a/apps/desktop/src-tauri/src/indexing/event_loop.rs b/apps/desktop/src-tauri/src/indexing/event_loop.rs
Lines changed: 10 additions & 0 deletions
diff --git a/apps/desktop/src-tauri/src/indexing/mod.rs b/apps/desktop/src-tauri/src/indexing/mod.rs
Lines changed: 22 additions & 1 deletion
@@ -12,10 +12,10 @@ Full design: `docs/specs/drive-indexing/plan.md`
 - **enrichment.rs** -- `ReadPool` (lock-free thread-local read connections for enrichment and verification), `enrich_entries_with_index()` (called when entries are stored in the listing cache — streaming, watcher update, re-sort — NOT on `get_file_range`; index freshness is handled by `index-dir-updated` → `refreshIndexSizes` → `getDirStatsBatch`). Integer-keyed fast path: resolve parent dir once → batch-fetch child dir stats by ID → match by name. Falls back to individual path resolution for edge cases.
 - **event_loop.rs** -- `run_live_event_loop` (real-time FSEvents/inotify processing after scan completes), `run_replay_event_loop` (cold-start journal replay with two-phase approach), `run_background_verification` (post-replay bidirectional readdir diff), `merge_fs_events` (deduplication with flag priority), `process_live_batch`. All bounded-buffer constants live here.
 - **events.rs** -- Tauri event payload structs (`IndexScanStartedEvent`, `IndexScanProgressEvent`, `IndexScanCompleteEvent`, `IndexDirUpdatedEvent`, `IndexReplayProgressEvent`, `IndexReplayCompleteEvent`), `RescanReason` enum, `emit_rescan_notification()`, IPC response types (`IndexStatusResponse`, `IndexDebugStatusResponse`). Also: `ActivityPhase` enum (Replaying/Scanning/Aggregating/Reconciling/Live/Idle) and `PhaseRecord` for the phase timeline system tracked in `DebugStats`.
-- **store.rs** -- SQLite schema v7 (integer-keyed entries with `name_folded` column on macOS, dir_stats by entry_id, meta), platform_case collation, read queries, DB open/migrate. `resolve_component` uses the composite index directly: on macOS queries by `(parent_id, name_folded)`, on Linux/Windows by `(parent_id, name)`. Schema version check: mismatch triggers drop+rebuild. v7 adds dual sizes (logical + physical). Both path-keyed (backward compat) and integer-keyed APIs.
+- **store.rs** -- SQLite schema v8 (integer-keyed entries with `name_folded` column on macOS, `inode` column for hardlink dedup, dir_stats by entry_id, meta), platform_case collation, read queries, DB open/migrate. `resolve_component` uses the composite index directly: on macOS queries by `(parent_id, name_folded)`, on Linux/Windows by `(parent_id, name)`. Schema version check: mismatch triggers drop+rebuild. v7 added dual sizes (logical + physical). v8 adds `inode INTEGER` column and `idx_inode` index for hardlink dedup at write time. `has_sized_entry_for_inode()` checks if another entry with the same inode already has non-NULL sizes. Both path-keyed (backward compat) and integer-keyed APIs.
 - **memory_watchdog.rs** -- Background task monitoring resident memory via `mach_task_info` (macOS). Warns at 8 GB, stops indexing at 16 GB, emits `index-memory-warning` event to frontend. No-op stub on non-macOS. Started from `start_indexing()`.
 - **writer.rs** -- Single writer thread, owns the write connection, processes `WriteMessage` channel (bounded `sync_channel`, 20K capacity, backpressure via blocking). `WRITER_GENERATION: AtomicU64` (initialized to 1) bumped on every mutation (`InsertEntriesV2`, `UpsertEntryV2`, `DeleteEntryById`, `DeleteSubtreeById`, `TruncateData`) for search index staleness detection. Priority: `UpdateDirStats` before `InsertEntries`. `Flush` variant + async `flush()` method let callers wait for all prior writes to commit. Has both integer-keyed variants (`InsertEntriesV2`, `UpsertEntryV2`, `DeleteEntryById`, `DeleteSubtreeById`, `PropagateDeltaById`) and path-keyed backward-compat variants. The integer-keyed delete/subtree-delete handlers auto-propagate negative deltas via the `parent_id` chain (same pattern as the path-keyed variants). `propagate_delta_by_id` walks the parent chain using `get_parent_id` lookups. `UpsertEntryV2` auto-propagates deltas on both insert and update: on insert, propagates the full size (+file_count or +dir_count); on update, reads the old entry first and propagates only the size difference. This means callers never need a separate `PropagateDeltaById` for upserted entries. For new directories, also initializes a zero-valued `dir_stats` row so enrichment always has a row. Maintains `AccumulatorMaps` during `InsertEntriesV2` processing (two HashMaps: direct children stats and child dir relationships + an `entries_inserted` counter), cleared on `TruncateData`. On `ComputeAllAggregates`, passes accumulated maps to `aggregator::compute_all_aggregates_with_maps()` to skip expensive full-table-scan SQL queries. Accepts an optional `AppHandle` at spawn time to emit `index-aggregation-progress` events during aggregation (phase, current, total). Also emits `saving_entries` phase progress during `InsertEntriesV2` processing when the expected total is set via `set_expected_total_entries()` (an `Arc<AtomicU64>` shared between the writer thread and the `IndexWriter` handle). No index drop/recreate dance — the composite indexes (`idx_parent_name_folded` on macOS, `idx_parent_name` on Linux) use binary collation and stay present during scans.
-- **scanner.rs** -- jwalk-based parallel directory walker. `scan_volume()` for full scan, `scan_subtree()` for targeted subtree rescans (used by post-replay background verification). Uses `ScanContext` (from store.rs) to assign integer IDs and parent IDs during the walk: maintains a `HashMap<PathBuf, i64>` mapping directory paths to assigned IDs. The scan root is mapped to `ROOT_ID` (1). Sends `InsertEntriesV2(Vec<EntryRow>)` batches to the writer. Platform-specific exclusion filters via `should_exclude` (`pub(super)`) — the single exclusion gate for all code paths (scanner, reconciler, event_loop verification, per-navigation verifier). `default_exclusions()` is `#[cfg(test)]` only. Physical sizes (`st_blocks * 512`). Hardlink inode dedup: files with `nlink > 1` are tracked in a `HashSet<u64>` by inode; only the first link's size is counted, subsequent links get `size = None`. Files with `nlink == 1` (vast majority) skip the set entirely.
+- **scanner.rs** -- jwalk-based parallel directory walker. `scan_volume()` for full scan, `scan_subtree()` for targeted subtree rescans (used by post-replay background verification). Uses `ScanContext` (from store.rs) to assign integer IDs and parent IDs during the walk: maintains a `HashMap<PathBuf, i64>` mapping directory paths to assigned IDs. The scan root is mapped to `ROOT_ID` (1). Sends `InsertEntriesV2(Vec<EntryRow>)` batches to the writer. Platform-specific exclusion filters via `should_exclude` (`pub(super)`) — the single exclusion gate for all code paths (scanner, reconciler, event_loop verification, per-navigation verifier). `default_exclusions()` is `#[cfg(test)]` only. Physical sizes (`st_blocks * 512`). Hardlink inode dedup: files with `nlink > 1` are tracked in a `HashSet<u64>` by inode; only the first link's size is counted, subsequent links get `size = None`. Files with `nlink == 1` (vast majority) skip the set entirely. All files store `inode` in `EntryRow.inode` (from `MetadataExt::ino()` on Unix, `None` on non-Unix). Directories and symlinks get `inode: None`.
 - **aggregator.rs** -- Dir stats computation. Bottom-up after full scan (O(N) single pass), per-subtree after subtree rescans, incremental delta propagation up ancestor chain for watcher events. Two entry points for full aggregation: `compute_all_aggregates_reported` (loads maps from SQL) and `compute_all_aggregates_with_maps` (accepts pre-built maps from the writer). Both accept an `on_progress: &mut dyn FnMut(AggregationProgress)` callback and delegate to `compute_and_write()` for the shared topological sort + bottom-up computation + batch write. Progress is reported at phase transitions and every ~1% during compute/write loops. `AggregationPhase` enum: `SavingEntries` (flushing writer channel), `LoadingDirectories`, `Sorting`, `Computing`, `Writing`. (The former `RebuildingIndex` phase was removed when the composite `idx_parent_name` index with `platform_case` collation was replaced — now uses binary-collation composite indexes that don't need rebuilding.) `backfill_missing_dir_stats` is a catch-up pass that finds directories without `dir_stats` rows and computes their stats bottom-up; triggered after reconciler replay and cold-start replay via `BackfillMissingDirStats` writer message.
 - **watcher.rs** -- Drive-level filesystem watcher. macOS: FSEvents via `cmdr-fsevent-stream` with event IDs and `sinceWhen` replay. Linux: `notify` crate (inotify backend) with recursive watching and synthetic event counter. Other platforms: stub. `supports_event_replay()` lets callers branch on whether journal replay is available.
 - **reconciler.rs** -- Buffers FSEvents during scan (capped at 500K events; overflow sets `buffer_overflow` flag forcing full rescan), replays after scan completes using event IDs to skip stale events. Processes live events for file creates/removes/modifies using integer-keyed write messages (`UpsertEntryV2`, `DeleteEntryById`, `DeleteSubtreeById`, `PropagateDeltaById`). Resolves filesystem paths to entry IDs via `store::resolve_path()` using a read connection passed by callers. Key functions (`process_fs_event`, `emit_dir_updated`) are `pub(super)` so `mod.rs` can call them directly during cold-start replay. `reconcile_subtree()` handles MustScanSubDirs by diffing filesystem vs DB directory-by-directory instead of delete-then-reinsert, making it safe to interrupt at any point.
@@ -76,14 +76,14 @@ All writes go through a dedicated `std::thread` via a bounded `sync_channel` (20
 
 Reads happen on separate WAL connections (any thread). A `ReadPool` provides thread-local read connections for enrichment and verification without contending on the `INDEXING` state-machine mutex.
 
-### SQLite schema (v7: integer-keyed, platform-conditional composite index)
+### SQLite schema (v8: integer-keyed, platform-conditional composite index, inode for hardlink dedup)
 
 One DB per volume. **Dev and prod use separate directories** (see AGENTS.md § Debugging):
 - **Prod**: `~/Library/Application Support/com.veszelovszki.cmdr/index-{volume_id}.db`
 - **Dev**: `~/Library/Application Support/com.veszelovszki.cmdr-dev/index-{volume_id}.db`
 
 Three tables:
-- `entries` (id INTEGER PK, parent_id, name COLLATE platform_case, [name_folded on macOS], is_directory, is_symlink, logical_size, physical_size, modified_at). Root sentinel: id=1, parent_id=0, name="".
+- `entries` (id INTEGER PK, parent_id, name COLLATE platform_case, [name_folded on macOS], is_directory, is_symlink, logical_size, physical_size, modified_at, inode). Root sentinel: id=1, parent_id=0, name="".
   - **macOS**: has a `name_folded TEXT NOT NULL` column storing `normalize_for_comparison(name)` (NFD + case fold). Index: `idx_parent_name_folded ON entries (parent_id, name_folded)`.
   - **Linux/Windows**: no `name_folded` column. Index: `idx_parent_name ON entries (parent_id, name)`.
   - The old `idx_parent(parent_id)` from v5 is removed; the composite indexes replace it.
@@ -98,6 +98,7 @@ History of changes:
 - **Schema v5**: Replaced composite `UNIQUE INDEX idx_parent_name(parent_id, name)` with simple `INDEX idx_parent(parent_id)`. The composite index with `platform_case` collation was extremely slow to build (~25 min for 5.1M entries). A simple integer index needs no drop/recreate dance during scans.
 - **Schema v6**: Added `name_folded` column (macOS only) storing pre-computed `normalize_for_comparison(name)`. Replaced `idx_parent` with platform-conditional composite indexes: `idx_parent_name_folded(parent_id, name_folded)` on macOS, `idx_parent_name(parent_id, name)` on Linux/Windows. `resolve_component` now queries the index directly instead of fetching all children and matching in Rust.
 - **Schema v7**: Dual sizes. `entries.size` renamed to `entries.logical_size`, added `entries.physical_size`. `dir_stats.recursive_size` renamed to `dir_stats.recursive_logical_size`, added `dir_stats.recursive_physical_size`. Logical size = `meta.len()`, physical size = `st_blocks * 512` on Unix (both = `meta.len()` on non-Unix). The IPC boundary (`DirStats` struct) still exposes `recursive_size` mapped from `recursive_logical_size` to avoid frontend churn. `AccumulatorMaps.direct_stats` changed to 4-tuple `(logical_size_sum, physical_size_sum, file_count, dir_count)`.
+- **Schema v8**: Added `inode INTEGER` column to `entries` (after `modified_at`) for hardlink dedup. Added `idx_inode ON entries (inode)` index. `EntryRow` gains `inode: Option<u64>`. The scanner populates inode from `MetadataExt::ino()` for all files on Unix; dirs/symlinks get `None`. `has_sized_entry_for_inode()` enables the writer to check at upsert time whether another entry for the same inode already has non-NULL sizes, preventing overcounting when reconciler/verifier events overwrite the scanner's NULL-size dedup.
 
 ## How to test
 
 
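The two dedup stages described in the CLAUDE.md changes above (scan-time `HashSet<u64>` tracking and the v8 write-time inode check) can be illustrated with a minimal sketch. This is not the project's code: `Row` is a hypothetical, trimmed-down stand-in for `EntryRow`, `scan_size` is an invented helper name, and this `has_sized_entry_for_inode` runs over an in-memory slice with an assumed `skip_name` parameter, whereas the real function presumably queries SQLite through `idx_inode` with a signature not shown in this diff.

```rust
use std::collections::HashSet;

/// Hypothetical, trimmed-down stand-in for the real `EntryRow`.
#[derive(Debug, Clone, PartialEq)]
struct Row {
    name: &'static str,
    inode: Option<u64>,
    logical_size: Option<u64>,
}

/// Scan-time dedup: only the first link to a multi-link inode keeps its size.
/// Files with `nlink == 1` never touch the set.
fn scan_size(seen: &mut HashSet<u64>, inode: u64, nlink: u64, size: u64) -> Option<u64> {
    // `insert` returns false when the inode was already recorded.
    if nlink > 1 && !seen.insert(inode) {
        return None; // a later link to an inode that was already counted
    }
    Some(size)
}

/// Write-time check, modelled over a slice instead of an indexed SQL query.
/// Returns true if some *other* entry with the same inode already has a size.
/// (`skip_name` is an assumption; the real code may exclude by entry ID.)
fn has_sized_entry_for_inode(rows: &[Row], inode: u64, skip_name: &str) -> bool {
    rows.iter()
        .any(|r| r.name != skip_name && r.inode == Some(inode) && r.logical_size.is_some())
}
```

Under these assumptions, an upsert for a second link would check `has_sized_entry_for_inode` first and store `None` sizes when it returns true, so a reconciler or verifier event cannot re-add a size the scanner deliberately left out.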
@@ -643,6 +643,7 @@ mod tests {
             logical_size: None,
             physical_size: None,
             modified_at: None,
+            inode: None,
         }
     }
 
@@ -656,6 +657,7 @@ mod tests {
             logical_size: Some(size),
             physical_size: Some(size),
             modified_at: None,
+            inode: None,
         }
     }
 
 
@@ -997,6 +997,14 @@ fn verify_affected_dirs(affected_paths: &HashSet<String>, writer: &IndexWriter)
                 reconciler::entry_size_and_mtime(&metadata)
             };
 
+            #[cfg(unix)]
+            let (inode, nlink) = {
+                use std::os::unix::fs::MetadataExt;
+                (Some(metadata.ino()), Some(metadata.nlink()))
+            };
+            #[cfg(not(unix))]
+            let (inode, nlink) = (None, None);
+
             let _ = writer.send(WriteMessage::UpsertEntryV2 {
                 parent_id: *parent_id,
                 name,
@@ -1005,6 +1013,8 @@ fn verify_affected_dirs(affected_paths: &HashSet<String>, writer: &IndexWriter)
                 logical_size,
                 physical_size,
                 modified_at,
+                inode,
+                nlink,
             });
 
             // UpsertEntryV2 auto-propagates deltas in the writer.
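
The platform-conditional inode/nlink extraction added in the hunk above can be tried in isolation with a small standalone sketch. The helper name `inode_and_nlink` is hypothetical (the diff inlines this logic inside `verify_affected_dirs`); only `MetadataExt::ino()` and `MetadataExt::nlink()` come from the source.

```rust
use std::fs::Metadata;

/// Returns `(inode, nlink)` on Unix and `(None, None)` elsewhere,
/// mirroring the `#[cfg]`-gated `let` bindings in the hunk above.
fn inode_and_nlink(metadata: &Metadata) -> (Option<u64>, Option<u64>) {
    #[cfg(unix)]
    let pair = {
        use std::os::unix::fs::MetadataExt;
        (Some(metadata.ino()), Some(metadata.nlink()))
    };
    #[cfg(not(unix))]
    let pair = {
        let _ = metadata; // no inode concept exposed here; silence unused warning
        (None, None)
    };
    pair
}
```

Gating the whole `let` binding keeps the Unix-only trait import out of non-Unix builds while leaving the call site identical on every platform.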
 
@@ -1403,6 +1403,7 @@ mod tests {
                 logical_size: None,
                 physical_size: None,
                 modified_at: None,
+                inode: None,
             },
             EntryRow {
                 id: 3,
@@ -1413,6 +1414,7 @@ mod tests {
                 logical_size: None,
                 physical_size: None,
                 modified_at: None,
+                inode: None,
             },
             EntryRow {
                 id: 4,
@@ -1423,6 +1425,7 @@ mod tests {
                 logical_size: Some(100),
                 physical_size: Some(100),
                 modified_at: None,
+                inode: None,
             },
             EntryRow {
                 id: 5,
@@ -1433,6 +1436,7 @@ mod tests {
                 logical_size: Some(200),
                 physical_size: Some(200),
                 modified_at: None,
+                inode: None,
             },
             EntryRow {
                 id: 6,
@@ -1443,6 +1447,7 @@ mod tests {
                 logical_size: None,
                 physical_size: None,
                 modified_at: None,
+                inode: None,
             },
             EntryRow {
                 id: 7,
@@ -1453,6 +1458,7 @@ mod tests {
                 logical_size: Some(300),
                 physical_size: Some(300),
                 modified_at: None,
+                inode: None,
             },
             EntryRow {
                 id: 8,
@@ -1463,6 +1469,7 @@ mod tests {
                 logical_size: Some(50),
                 physical_size: Some(50),
                 modified_at: None,
+                inode: None,
             },
         ];
         IndexStore::insert_entries_v2_batch(&conn, &entries).expect("insert entries");
@@ -1538,6 +1545,7 @@ mod tests {
                 logical_size: None,
                 physical_size: None,
                 modified_at: None,
+                inode: None,
             },
             EntryRow {
                 id: 3,
@@ -1548,6 +1556,7 @@ mod tests {
                 logical_size: Some(500),
                 physical_size: Some(500),
                 modified_at: None,
+                inode: None,
             },
         ];
         IndexStore::insert_entries_v2_batch(&conn, &entries).expect("insert");
@@ -1596,6 +1605,7 @@ mod tests {
                 logical_size: None,
                 physical_size: None,
                 modified_at: None,
+                inode: None,
             },
             EntryRow {
                 id: 3,
@@ -1606,6 +1616,7 @@ mod tests {
                 logical_size: None,
                 physical_size: None,
                 modified_at: None,
+                inode: None,
             },
             EntryRow {
                 id: 4,
@@ -1616,6 +1627,7 @@ mod tests {
                 logical_size: Some(10),
                 physical_size: Some(10),
                 modified_at: None,
+                inode: None,
             },
         ];
         IndexStore::insert_entries_v2_batch(&conn, &entries).expect("insert");
@@ -1644,6 +1656,7 @@ mod tests {
                 logical_size: None,
                 physical_size: None,
                 modified_at: None,
+                inode: None,
             },
             EntryRow {
                 id: 3,
@@ -1654,6 +1667,7 @@ mod tests {
                 logical_size: None,
                 physical_size: None,
                 modified_at: None,
+                inode: None,
             },
             EntryRow {
                 id: 4,
@@ -1664,6 +1678,7 @@ mod tests {
                 logical_size: Some(1000),
                 physical_size: Some(1000),
                 modified_at: None,
+                inode: None,
             },
         ];
         IndexStore::insert_entries_v2_batch(&conn, &entries).expect("insert");
@@ -1684,7 +1699,7 @@ mod tests {
         assert_eq!(listing[0].recursive_dir_count, Some(0));
 
         // Phase 3: Simulate a watcher event (new file added via reconciler)
-        IndexStore::insert_entry_v2(&conn, 3, "notes.txt", false, false, Some(500), Some(500), None)
+        IndexStore::insert_entry_v2(&conn, 3, "notes.txt", false, false, Some(500), Some(500), None, None)
             .expect("insert new file");
 
         // Simulate delta propagation (as the writer would do)
@@ -1736,6 +1751,7 @@ mod tests {
                 logical_size: None,
                 physical_size: None,
                 modified_at: None,
+                inode: None,
             },
             EntryRow {
                 id: 3,
@@ -1746,6 +1762,7 @@ mod tests {
                 logical_size: Some(5000),
                 physical_size: Some(5000),
                 modified_at: None,
+                inode: None,
             },
             EntryRow {
                 id: 4,
@@ -1756,6 +1773,7 @@ mod tests {
                 logical_size: None,
                 physical_size: None,
                 modified_at: None,
+                inode: None,
             },
             EntryRow {
                 id: 5,
@@ -1766,6 +1784,7 @@ mod tests {
                 logical_size: None,
                 physical_size: None,
                 modified_at: None,
+                inode: None,
             },
         ];
         IndexStore::insert_entries_v2_batch(&conn, &entries).expect("insert");
@@ -1806,6 +1825,7 @@ mod tests {
                 logical_size: None,
                 physical_size: None,
                 modified_at: None,
+                inode: None,
             },
             EntryRow {
                 id: 3,
@@ -1816,6 +1836,7 @@ mod tests {
                 logical_size: Some(42),
                 physical_size: Some(42),
                 modified_at: None,
+                inode: None,
             },
         ];
         IndexStore::insert_entries_v2_batch(&conn, &entries).expect("insert");