The `PathResolver` (a 50K-entry LRU cache for path→ID resolution) was redundant: `enrich_entries_with_index` already resolves paths uncached on every page fetch via `store::resolve_path`. The cache also had a latent staleness bug. Its invalidation methods were compiled only under `#[cfg(test)]`, so deleted or renamed paths returned stale IDs until LRU eviction.
- Delete `path_resolver.rs` (435 lines) and remove `lru` crate dependency
- `get_dir_stats`/`get_dir_stats_batch` now call `store::resolve_path` directly; their signatures change from `&mut self` to `&self`
- Module-level wrappers no longer need `&mut` on the `INDEXING` mutex guard
- Update CLAUDE.md: remove all `PathResolver` references
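
To illustrate the staleness bug described above, here is a minimal sketch using in-memory maps. `Store`, `CachedResolver`, and `stale_lookup_demo` are hypothetical stand-ins, not the project's types; the real store is SQLite and the real cache was an LRU. The sketch also shows why the cached path needed `&mut self` while the direct lookup only needs `&self`.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the index DB: full path -> entry ID.
struct Store {
    entries: HashMap<String, i64>,
}

impl Store {
    /// Uncached lookup. It reads only, so `&self` is enough.
    fn resolve_path(&self, path: &str) -> Option<i64> {
        self.entries.get(path).copied()
    }
}

/// Hypothetical cache with no invalidation in non-test builds.
struct CachedResolver {
    cache: HashMap<String, i64>,
}

impl CachedResolver {
    /// Needs `&mut self` because a miss writes into the cache.
    fn resolve(&mut self, store: &Store, path: &str) -> Option<i64> {
        if let Some(id) = self.cache.get(path) {
            return Some(*id);
        }
        let id = store.resolve_path(path)?;
        self.cache.insert(path.to_string(), id);
        Some(id)
    }
}

/// Returns (cached result, direct result) after the path has been deleted.
fn stale_lookup_demo() -> (Option<i64>, Option<i64>) {
    let mut store = Store { entries: HashMap::new() };
    store.entries.insert("/docs/a.txt".to_string(), 42);
    let mut cached = CachedResolver { cache: HashMap::new() };

    // Warm the cache while the entry still exists.
    let _ = cached.resolve(&store, "/docs/a.txt");

    // The entry is deleted, but nothing tells the cache.
    store.entries.remove("/docs/a.txt");

    (
        cached.resolve(&store, "/docs/a.txt"),
        store.resolve_path("/docs/a.txt"),
    )
}
```

In this sketch the cached lookup keeps answering with the old ID, while the direct lookup reports that the path is gone.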
`apps/desktop/src-tauri/src/indexing/CLAUDE.md`: 7 additions & 9 deletions
@@ -8,12 +8,11 @@ Full design: `docs/specs/drive-indexing/plan.md`
### Module structure
-
-**mod.rs** -- Public API (`init()`, `start_indexing()`, `stop_indexing()`, `clear_index()`), `IndexPhase` state machine, `IndexManager` (coordinates all subsystems, owns `PathResolver` for LRU-cached path→ID mapping), `DebugStats` (shared atomic counters for the debug window).
+
-**mod.rs** -- Public API (`init()`, `start_indexing()`, `stop_indexing()`, `clear_index()`), `IndexPhase` state machine, `IndexManager` (coordinates all subsystems), `DebugStats` (shared atomic counters for the debug window).
-**enrichment.rs** -- `ReadPool` (lock-free thread-local read connections for enrichment and verification), `enrich_entries_with_index()` (called every `get_file_range`). Integer-keyed fast path: resolve parent dir once → batch-fetch child dir stats by ID → match by name. Falls back to individual path resolution for edge cases.
-**event_loop.rs** -- `run_live_event_loop` (real-time FSEvents/inotify processing after scan completes), `run_replay_event_loop` (cold-start journal replay with two-phase approach), `run_background_verification` (post-replay bidirectional readdir diff), `merge_fs_events` (deduplication with flag priority), `process_live_batch`. All bounded-buffer constants live here.
-**store.rs** -- SQLite schema v2 (integer-keyed entries, dir_stats by entry_id, meta), platform_case collation, read queries, DB open/migrate. Schema version check: mismatch triggers drop+rebuild. Both path-keyed (backward compat) and integer-keyed APIs.
-
-**path_resolver.rs** -- `PathResolver`: resolves filesystem paths to integer entry IDs via component-by-component walk with full-path LRU cache (50K entries). Case-aware `CacheKey` on macOS (NFD + case fold). Prefix-based invalidation for deletes/renames.
-**memory_watchdog.rs** -- Background task monitoring resident memory via `mach_task_info` (macOS). Warns at 8 GB, stops indexing at 16 GB, emits `index-memory-warning` event to frontend. No-op stub on non-macOS. Started from `start_indexing()`.
- **writer.rs** -- Single writer thread, owns the write connection, processes `WriteMessage` channel (bounded `sync_channel`, 20K capacity, backpressure via blocking). Priority: `UpdateDirStats` before `InsertEntries`. `Flush` variant + async `flush()` method let callers wait for all prior writes to commit. Has both integer-keyed variants (`InsertEntriesV2`, `UpsertEntryV2`, `DeleteEntryById`, `DeleteSubtreeById`, `PropagateDeltaById`) and path-keyed backward-compat variants. The integer-keyed delete/subtree-delete handlers auto-propagate negative deltas via the `parent_id` chain (same pattern as the path-keyed variants). `propagate_delta_by_id` walks the parent chain using `get_parent_id` lookups. `UpsertEntryV2` initializes a zero-valued `dir_stats` row when inserting a NEW directory, so enrichment always has a row (subsequent `PropagateDeltaById` calls update it incrementally). Maintains `AccumulatorMaps` during `InsertEntriesV2` processing (two HashMaps: direct children stats and child dir relationships + an `entries_inserted` counter), cleared on `TruncateData`. On `ComputeAllAggregates`, passes accumulated maps to `aggregator::compute_all_aggregates_with_maps()` to skip expensive full-table-scan SQL queries. Accepts an optional `AppHandle` at spawn time to emit `index-aggregation-progress` events during aggregation (phase, current, total). Also emits `saving_entries` phase progress during `InsertEntriesV2` processing when the expected total is set via `set_expected_total_entries()` (an `Arc<AtomicU64>` shared between the writer thread and the `IndexWriter` handle).
-**scanner.rs** -- jwalk-based parallel directory walker. `scan_volume()` for full scan, `scan_subtree()` for targeted subtree rescans (used by post-replay background verification). Uses `ScanContext` (from store.rs) to assign integer IDs and parent IDs during the walk: maintains a `HashMap<PathBuf, i64>` mapping directory paths to assigned IDs. The scan root is mapped to `ROOT_ID` (1). Sends `InsertEntriesV2(Vec<EntryRow>)` batches to the writer. Platform-specific exclusion filters (macOS system paths, Linux virtual filesystems). Physical sizes (`st_blocks * 512`).
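
The writer's bounded channel and `Flush` variant described in the list above can be sketched with the standard library alone. This is a simplified illustration, not the project's code: the `WriteMessage` enum here has only two hypothetical variants, messages are handled strictly in arrival order (the real writer prioritizes `UpdateDirStats`), and "writing" just counts entries.

```rust
use std::sync::mpsc::{sync_channel, SyncSender};
use std::thread;

/// Hypothetical, trimmed-down message set for illustration only.
enum WriteMessage {
    InsertEntries(Vec<String>),
    /// Carries a reply channel so the caller can wait for earlier writes.
    Flush(SyncSender<usize>),
}

/// Spawns a writer thread behind a bounded channel.
/// `send` blocks once `capacity` messages are queued, which gives backpressure.
fn spawn_writer(capacity: usize) -> (SyncSender<WriteMessage>, thread::JoinHandle<usize>) {
    let (tx, rx) = sync_channel::<WriteMessage>(capacity);
    let handle = thread::spawn(move || {
        let mut written = 0usize;
        // The loop ends when every sender has been dropped.
        for msg in rx {
            match msg {
                WriteMessage::InsertEntries(batch) => written += batch.len(),
                WriteMessage::Flush(reply) => {
                    // Messages are handled in order here, so everything sent
                    // before this Flush has already been processed.
                    let _ = reply.send(written);
                }
            }
        }
        written
    });
    (tx, handle)
}

/// Sends two batches, flushes, and returns the entry count seen at flush time.
fn write_then_flush() -> usize {
    let (tx, handle) = spawn_writer(4);
    tx.send(WriteMessage::InsertEntries(vec!["a".into(), "b".into()])).unwrap();
    tx.send(WriteMessage::InsertEntries(vec!["c".into()])).unwrap();

    let (reply_tx, reply_rx) = sync_channel(1);
    tx.send(WriteMessage::Flush(reply_tx)).unwrap();
    let committed = reply_rx.recv().unwrap();

    drop(tx); // close the channel so the writer thread exits
    handle.join().unwrap();
    committed
}
```

A small capacity such as 4 makes the blocking behavior easy to observe in experiments; the document states the real channel uses a 20K capacity.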
@@ -82,7 +81,7 @@ Three tables:
WAL mode, 16 MB page cache, `auto_vacuum = INCREMENTAL` (free pages reclaimed via `PRAGMA incremental_vacuum` after truncation). Custom `platform_case` collation registered on every connection: case-insensitive + NFD normalization on macOS, binary on Linux. **Opening the DB with the sqlite3 CLI will fail** on queries touching the name column (the collation isn't registered).
History of changes:
-
-**Schema v3**: Bumped from v2 to force DB rebuild after fixing orphan entry bug. Scanner, writer, aggregator, reconciler, enrichment, and IPC commands all fully migrated to integer keys. `IndexManager` owns a `PathResolver` for LRU-cached path→ID resolution in IPC commands (`get_dir_stats`, `get_dir_stats_batch`). Enrichment uses integer-keyed fast path: resolve parent once → batch child dir stats by ID. Reconciler sends integer-keyed messages exclusively. Old path-keyed `WriteMessage` variants and backward-compat shims (`ScannedEntry`, `DirStats`) still exist for post-replay verification — cleanup in milestone 6.
+
-**Schema v3**: Bumped from v2 to force DB rebuild after fixing orphan entry bug. Scanner, writer, aggregator, reconciler, enrichment, and IPC commands all fully migrated to integer keys. Enrichment uses integer-keyed fast path: resolve parent once → batch child dir stats by ID. Reconciler sends integer-keyed messages exclusively. Old path-keyed `WriteMessage` variants and backward-compat shims (`ScannedEntry`, `DirStats`) still exist for post-replay verification — cleanup in milestone 6.
-**Schema v4**: Bumped from v3 to enable `auto_vacuum = INCREMENTAL` (requires DB rebuild since the pragma must be set before table creation).
## How to test
@@ -98,8 +97,7 @@ Key test files are alongside each module (test functions within `#[cfg(test)]` b
- Scanner: full scan with temp dir trees, exclusion filtering, cancellation
@@ -109,7 +107,7 @@ Key test files are alongside each module (test functions within `#[cfg(test)]` b
**Enrichment uses integer-keyed batch lookup**: Instead of N individual `resolve_path()` calls (one per directory in the listing), `enrich_entries_with_index` resolves the parent directory once, queries `list_child_dir_ids_and_names(parent_id)` for all child dir IDs, then `get_dir_stats_batch_by_ids()`. Two indexed queries total instead of N. Falls back to individual path resolution for edge cases (for example, mixed-parent entries).
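
The batch lookup described above can be sketched with in-memory maps standing in for the two indexed queries. The method names mirror the ones mentioned in the text, but their signatures, the `Index` and `DirStats` types, and `sample_index` are assumptions made for illustration; the real functions operate on a SQLite connection.

```rust
use std::collections::HashMap;

/// Hypothetical per-directory stats; the real `dir_stats` row has its own columns.
#[derive(Clone, Debug, PartialEq)]
struct DirStats {
    total_size: u64,
}

/// In-memory stand-in for the index.
struct Index {
    /// parent_id -> (child dir ID, child name)
    child_dirs: HashMap<i64, Vec<(i64, String)>>,
    /// entry_id -> stats
    stats: HashMap<i64, DirStats>,
}

impl Index {
    /// Query 1: all child directories of one parent.
    fn list_child_dir_ids_and_names(&self, parent_id: i64) -> Vec<(i64, String)> {
        self.child_dirs.get(&parent_id).cloned().unwrap_or_default()
    }

    /// Query 2: stats for many IDs at once.
    fn get_dir_stats_batch_by_ids(&self, ids: &[i64]) -> HashMap<i64, DirStats> {
        ids.iter()
            .filter_map(|id| self.stats.get(id).map(|s| (*id, s.clone())))
            .collect()
    }
}

/// Returns one result per listing name, matching child directories by name.
/// Names that are not indexed child directories yield `None`.
fn enrich(index: &Index, parent_id: i64, names: &[&str]) -> Vec<Option<DirStats>> {
    let children = index.list_child_dir_ids_and_names(parent_id);
    let ids: Vec<i64> = children.iter().map(|(id, _)| *id).collect();
    let stats = index.get_dir_stats_batch_by_ids(&ids);
    let by_name: HashMap<&str, i64> =
        children.iter().map(|(id, name)| (name.as_str(), *id)).collect();
    names
        .iter()
        .map(|name| by_name.get(name).and_then(|id| stats.get(id).cloned()))
        .collect()
}

/// Builds a tiny index: parent 1 has child dirs "src" (ID 2) and "docs" (ID 3).
fn sample_index() -> Index {
    let mut child_dirs = HashMap::new();
    child_dirs.insert(1, vec![(2, "src".to_string()), (3, "docs".to_string())]);
    let mut stats = HashMap::new();
    stats.insert(2, DirStats { total_size: 4096 });
    stats.insert(3, DirStats { total_size: 512 });
    Index { child_dirs, stats }
}
```

However many names the listing contains, the sketch issues exactly two lookups against the index, which is the point of the fast path.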
-
**IPC boundary stays path-based**: Frontend sends filesystem paths, backend resolves path→ID internally via `PathResolver`. No frontend changes needed. `IndexManager.get_dir_stats()` and `get_dir_stats_batch()` use the `PathResolver`'s LRU cache for efficient resolution.
+
**IPC boundary stays path-based**: Frontend sends filesystem paths, backend resolves path→ID internally via `store::resolve_path()`. No frontend changes needed.
**Physical sizes (`st_blocks * 512`)**: More meaningful for disk usage than logical size. May overcount ~10-20% for APFS clones (shared blocks). Volume usage bar uses `statfs()` for true totals.
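
As a rough illustration of the `st_blocks * 512` calculation, the following Unix-only sketch reads both sizes from file metadata. The helper names are hypothetical, and using `symlink_metadata` (which does not follow symlinks) is an assumption rather than something the document states about the scanner.

```rust
use std::fs;
use std::io;
use std::os::unix::fs::MetadataExt;
use std::path::Path;

/// `st_blocks` counts 512-byte units, independent of the filesystem block size.
fn physical_size_from_blocks(blocks: u64) -> u64 {
    blocks * 512
}

/// Returns (logical, physical) sizes in bytes for a path. Unix only.
fn sizes(path: &Path) -> io::Result<(u64, u64)> {
    // Assumption: do not follow symlinks, so a link is measured as itself.
    let meta = fs::symlink_metadata(path)?;
    Ok((meta.len(), physical_size_from_blocks(meta.blocks())))
}
```

For sparse files the physical size can be far below the logical size, and for small files it is usually rounded up to whole allocation blocks.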
@@ -155,15 +153,15 @@ Key test files are alongside each module (test functions within `#[cfg(test)]` b
**Scan cancellation leaves partial data**: By design. `scan_completed_at` not set in meta, so next startup detects incomplete scan and runs fresh. No cleanup needed.
-
**`ReadPool` replaces `INDEXING` lock for all read-only DB access**: Enrichment (`enrich_entries_with_index` in `enrichment.rs`), verification Phase 1 (`verify_affected_dirs` in `event_loop.rs`), and background verification dir-stat reads all use `get_read_pool()` + `pool.with_conn()` — thread-local SQLite connections with no lock contention. The `INDEXING` mutex now guards only lifecycle transitions and IPC commands that need `PathResolver`. `with_conn` uses `thread_local!` storage, so callers must not have `.await` points between obtaining the pool and completing the closure (async task migration would break thread affinity).
+
**`ReadPool` replaces `INDEXING` lock for all read-only DB access**: Enrichment (`enrich_entries_with_index` in `enrichment.rs`), verification Phase 1 (`verify_affected_dirs` in `event_loop.rs`), and background verification dir-stat reads all use `get_read_pool()` + `pool.with_conn()` — thread-local SQLite connections with no lock contention. The `INDEXING` mutex now guards only lifecycle transitions and IPC commands needing the `IndexManager`'s read connection. `with_conn` uses `thread_local!` storage, so callers must not have `.await` points between obtaining the pool and completing the closure (async task migration would break thread affinity).
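
The thread-local `with_conn` pattern described above might look roughly like this. It is a sketch under assumptions: `Conn` is a stand-in counter rather than a SQLite connection, and the free functions `with_conn` and `run_query` are hypothetical simplifications of the pool's API.

```rust
use std::cell::RefCell;

/// Stand-in for a SQLite read connection; it only counts queries.
struct Conn {
    queries: u32,
}

thread_local! {
    // One lazily created connection per OS thread.
    static READ_CONN: RefCell<Option<Conn>> = RefCell::new(None);
}

/// Runs `f` with this thread's connection, opening it on first use.
/// `f` is a plain synchronous closure, so it cannot contain `.await`.
fn with_conn<R>(f: impl FnOnce(&mut Conn) -> R) -> R {
    READ_CONN.with(|slot| {
        let mut slot = slot.borrow_mut();
        let conn = slot.get_or_insert_with(|| Conn { queries: 0 });
        f(conn)
    })
}

/// Counts one query on the calling thread's connection and returns its running total.
fn run_query() -> u32 {
    with_conn(|conn| {
        conn.queries += 1;
        conn.queries
    })
}
```

Each thread sees its own connection and counter, so no lock is needed. That is also why work that migrates between threads partway through would end up on a different connection.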
**Progress events use `tauri::async_runtime::spawn`**: Not `tokio::spawn`, because indexing can start from Tauri's synchronous `setup()` hook where no Tokio runtime context exists.
-
**`platform_case` collation must be registered on every connection**: The custom collation is not persisted in the DB file. Both `IndexStore::open()` and `open_write_connection()` register it. Forgetting to register before querying causes `no such collation sequence: platform_case` errors. On macOS it uses NFD normalization + case folding (matching APFS). On Linux it's binary (zero overhead). The `PathResolver`'s `CacheKey` uses the same normalization via `store::normalize_for_comparison()`.
+
**`platform_case` collation must be registered on every connection**: The custom collation is not persisted in the DB file. Both `IndexStore::open()` and `open_write_connection()` register it. Forgetting to register before querying causes `no such collation sequence: platform_case` errors. On macOS it uses NFD normalization + case folding (matching APFS). On Linux it's binary (zero overhead).
**Backward-compat shims resolve paths via component walk**: Old path-keyed functions (`get_entry`, `delete_entry`, `upsert_entry`, etc.) internally call `resolve_path()` which walks the tree component-by-component. This means parent directories MUST exist before inserting children. The aggregator's path-keyed `propagate_delta` and `compute_subtree_aggregates` also resolve paths internally. The reconciler no longer uses these shims -- it sends integer-keyed messages directly (milestone 4). Enrichment no longer uses the path-keyed `get_dir_stats_batch` -- it uses integer-keyed batch lookups via `list_child_dir_ids_and_names` + `get_dir_stats_batch_by_ids` (milestone 5). Remaining users of path-keyed shims: `verify_affected_dirs` (post-replay verification). Cleanup in milestone 6.
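
A component-by-component walk, and the resulting requirement that parents exist before children, can be sketched as follows. The `Entries` type and its `(parent_id, name)` map are hypothetical stand-ins for the SQLite table; only the name `resolve_path` and `ROOT_ID = 1` come from the text.

```rust
use std::collections::HashMap;

const ROOT_ID: i64 = 1;

/// Stand-in for the entries table, keyed by (parent_id, name).
struct Entries {
    by_parent_and_name: HashMap<(i64, String), i64>,
    next_id: i64,
}

impl Entries {
    fn new() -> Self {
        Entries { by_parent_and_name: HashMap::new(), next_id: ROOT_ID + 1 }
    }

    /// Walks the path one component at a time, starting at the root.
    /// Returns `None` as soon as any component is missing.
    fn resolve_path(&self, path: &str) -> Option<i64> {
        let mut current = ROOT_ID;
        for component in path.split('/').filter(|c| !c.is_empty()) {
            current = *self
                .by_parent_and_name
                .get(&(current, component.to_string()))?;
        }
        Some(current)
    }

    /// Path-keyed insert: returns `None` if the parent directory is not indexed yet.
    fn insert(&mut self, path: &str) -> Option<i64> {
        let trimmed = path.trim_end_matches('/');
        let (parent, name) = trimmed.rsplit_once('/')?;
        let parent_id = self.resolve_path(parent)?;
        let id = self.next_id;
        self.next_id += 1;
        self.by_parent_and_name.insert((parent_id, name.to_string()), id);
        Some(id)
    }
}
```

In this sketch, inserting `/a/b` before `/a` fails because the walk cannot find `a` under the root; inserting `/a` first makes the second insert succeed.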
-
**Reconciler holds a read connection**: `process_fs_event`, `replay`, and `process_live_event` all require a `&Connection` parameter for path-to-ID resolution. Callers (event loops in `event_loop.rs`) open a read connection via `IndexStore::open_write_connection(writer.db_path())` at loop start and pass it through. This is a WAL-mode connection so it doesn't block the writer. The `IndexManager` also owns a `PathResolver` with LRU cache, used by IPC commands (`get_dir_stats`, `get_dir_stats_batch`) for cached resolution. The event loops don't use the `PathResolver` yet because they run in separate async tasks -- could be migrated in a future optimization pass.
+
**Reconciler holds a read connection**: `process_fs_event`, `replay`, and `process_live_event` all require a `&Connection` parameter for path-to-ID resolution. Callers (event loops in `event_loop.rs`) open a read connection via `IndexStore::open_write_connection(writer.db_path())` at loop start and pass it through. This is a WAL-mode connection so it doesn't block the writer.
**ScanContext maps scan root to ROOT_ID**: Both `scan_volume` and `scan_subtree` create a `ScanContext` that maps the scan root directory to `ROOT_ID` (1). This means all top-level entries under any scan root get `parent_id = ROOT_ID` in the DB. For subtree scans, the root is resolved to its existing entry ID (not ROOT_ID), and `DeleteDescendantsById` is sent before the scan starts. The `ScanContext` opens a temporary read connection to the DB to fetch `next_id` via `get_next_id()`.
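
The ID assignment described above can be sketched with a `HashMap<PathBuf, i64>`. This is an illustrative simplification: the constructor takes `root_id` and `next_id` as parameters (the document says `next_id` is fetched from the DB via `get_next_id()`), and the `assign` method is a hypothetical name.

```rust
use std::collections::HashMap;
use std::path::{Path, PathBuf};

const ROOT_ID: i64 = 1;

/// Simplified stand-in for the scan context.
struct ScanContext {
    /// Directory path -> assigned entry ID.
    dir_ids: HashMap<PathBuf, i64>,
    next_id: i64,
}

impl ScanContext {
    /// Maps the scan root to `root_id`: `ROOT_ID` for a full scan,
    /// or the root's existing entry ID for a subtree scan.
    fn new(scan_root: &Path, root_id: i64, next_id: i64) -> Self {
        let mut dir_ids = HashMap::new();
        dir_ids.insert(scan_root.to_path_buf(), root_id);
        ScanContext { dir_ids, next_id }
    }

    /// Assigns an ID to `path` and returns `(id, parent_id)`.
    /// Returns `None` if the parent directory has not been seen yet.
    fn assign(&mut self, path: &Path, is_dir: bool) -> Option<(i64, i64)> {
        let parent_id = *self.dir_ids.get(path.parent()?)?;
        let id = self.next_id;
        self.next_id += 1;
        if is_dir {
            // Only directories can be parents, so only they are remembered.
            self.dir_ids.insert(path.to_path_buf(), id);
        }
        Some((id, parent_id))
    }
}
```

The sketch relies on the walker visiting a directory before its children, so that the parent lookup always finds an assigned ID.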