
Commit d125a24

Indexing: Eliminate enrichment lock contention
- `enrich_entries_with_index` no longer uses `try_lock` on `INDEXING`. Instead, a new `ReadPool` provides thread-local SQLite read connections; enrichment never blocks on the state-machine mutex.
- `verify_affected_dirs` Phase 1 and `run_background_verification` dir-stat reads also migrated to `ReadPool`, removing the two biggest contention sources.
- `ReadPool` uses a generation counter for invalidation: `stop_indexing` and `clear_index` bump the generation, causing thread-local connections to reopen on next use.
- Removed now-dead `IndexPhase::store()` and `IndexManager::store()`.
- `IndexStore::list_children` gated behind `#[cfg(test)]` (only used in tests; production code uses `list_children_on`).
- 5 new tests: enrichment under contention, connection reuse, generation invalidation, cross-thread reads, shutdown safety.
- Bugfix: `drag_image_detection::install(app.handle().clone())` → `app.clone()` (`app` is already `&AppHandle`).
1 parent 2680bae commit d125a24
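
The generation-counter invalidation described in the commit message can be sketched roughly as follows. This is a minimal illustration, not the code from this commit: `Conn` is a hypothetical stand-in for a real read-only SQLite connection, and `invalidate` and `open_count` are assumed names. Only `ReadPool` and `with_conn` appear in the source.

```rust
use std::cell::RefCell;
use std::sync::atomic::{AtomicU64, AtomicUsize, Ordering};

/// Hypothetical stand-in for a read-only SQLite connection.
pub struct Conn {
    /// Pool generation that was current when this connection was opened.
    pub opened_at_generation: u64,
}

thread_local! {
    // Each thread caches at most one connection; no mutex is involved.
    static LOCAL_CONN: RefCell<Option<Conn>> = RefCell::new(None);
}

/// Sketch of a pool that shares only a generation counter between threads.
pub struct ReadPool {
    generation: AtomicU64,
    opens: AtomicUsize, // counts (re)opens, kept only for illustration
}

impl ReadPool {
    pub fn new() -> Self {
        ReadPool {
            generation: AtomicU64::new(0),
            opens: AtomicUsize::new(0),
        }
    }

    /// Assumed name: what `stop_indexing` / `clear_index` would call to make
    /// every thread reopen its connection on next use.
    pub fn invalidate(&self) {
        self.generation.fetch_add(1, Ordering::SeqCst);
    }

    /// Number of connections opened so far, across all threads.
    pub fn open_count(&self) -> usize {
        self.opens.load(Ordering::SeqCst)
    }

    /// Runs `f` with this thread's connection, reopening it when it is
    /// missing or was opened under an older generation. `f` must not call
    /// `with_conn` again, because the thread-local slot is borrowed mutably
    /// for the duration of the call.
    pub fn with_conn<R>(&self, f: impl FnOnce(&Conn) -> R) -> R {
        let current = self.generation.load(Ordering::SeqCst);
        LOCAL_CONN.with(|slot| {
            let mut slot = slot.borrow_mut();
            let stale = match slot.as_ref() {
                Some(conn) => conn.opened_at_generation != current,
                None => true,
            };
            if stale {
                self.opens.fetch_add(1, Ordering::SeqCst);
                *slot = Some(Conn { opened_at_generation: current });
            }
            f(slot.as_ref().expect("connection was ensured above"))
        })
    }
}
```

Because `with_conn` takes a synchronous closure, the connection borrow cannot span an `.await`, which matches the thread-affinity caveat noted in the updated `CLAUDE.md`. The sketch assumes a single pool per process, since the thread-local slot is shared by every `ReadPool` value.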

5 files changed

Lines changed: 647 additions & 118 deletions

apps/desktop/src-tauri/src/indexing/CLAUDE.md

Lines changed: 4 additions & 4 deletions
@@ -8,7 +8,7 @@ Full design: `docs/specs/drive-indexing/plan.md`

 ### Module structure

-- **mod.rs** -- Public API: `init()`, `start_indexing()`, `stop_indexing()`, `clear_index()`, `enrich_entries_with_index()`. `IndexManager` coordinates all subsystems, owns a `PathResolver` (LRU-cached path→ID mapping) for IPC commands. Global read-only store for enrichment. Enrichment uses an integer-keyed fast path: resolve parent dir once → batch-fetch child dir stats by ID → match by name. Falls back to individual path resolution for edge cases.
+- **mod.rs** -- Public API: `init()`, `start_indexing()`, `stop_indexing()`, `clear_index()`, `enrich_entries_with_index()`. `IndexManager` coordinates all subsystems, owns a `PathResolver` (LRU-cached path→ID mapping) for IPC commands. `ReadPool` provides lock-free thread-local read connections for enrichment and verification. Enrichment uses an integer-keyed fast path: resolve parent dir once → batch-fetch child dir stats by ID → match by name. Falls back to individual path resolution for edge cases.
 - **store.rs** -- SQLite schema v2 (integer-keyed entries, dir_stats by entry_id, meta), platform_case collation, read queries, DB open/migrate. Schema version check: mismatch triggers drop+rebuild. Both path-keyed (backward compat) and integer-keyed APIs.
 - **path_resolver.rs** -- `PathResolver`: resolves filesystem paths to integer entry IDs via component-by-component walk with full-path LRU cache (50K entries). Case-aware `CacheKey` on macOS (NFD + case fold). Prefix-based invalidation for deletes/renames.
 - **memory_watchdog.rs** -- Background task monitoring resident memory via `mach_task_info` (macOS). Warns at 8 GB, stops indexing at 16 GB, emits `index-memory-warning` event to frontend. No-op stub on non-macOS. Started from `start_indexing()`.
@@ -64,7 +64,7 @@ Enrichment (every get_file_range call):

 All writes go through a dedicated `std::thread` via a bounded `sync_channel` (20K capacity). When the channel is full, senders block (backpressure). The writer thread owns the write connection and processes messages in order, prioritizing `UpdateDirStats` over `InsertEntries` for responsive micro-scan results.

-Reads happen on separate WAL connections (any thread). The global read-only store (`GLOBAL_INDEX_STORE`) provides enrichment without passing `AppHandle` through the listing pipeline.
+Reads happen on separate WAL connections (any thread). A `ReadPool` provides thread-local read connections for enrichment and verification without contending on the `INDEXING` state-machine mutex.

 ### SQLite schema (v4: integer-keyed, incremental vacuum)

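The single-writer design in the context lines above (bounded `sync_channel`, backpressure, `UpdateDirStats` ahead of `InsertEntries`) could look something like the sketch below. How the real writer prioritizes messages is not shown in this diff. Draining the pending messages and stable-sorting each batch is an assumption made for illustration, as are the names `WriteMessage`, `next_batch`, and `spawn_writer`.

```rust
use std::sync::mpsc::{sync_channel, Receiver, SyncSender};
use std::thread;

/// Hypothetical subset of the writer's message types.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum WriteMessage {
    InsertEntries(Vec<String>),
    UpdateDirStats { entry_id: i64, total_size: u64 },
}

/// Lower value means the message is handled earlier within a batch.
fn priority(msg: &WriteMessage) -> u8 {
    match msg {
        WriteMessage::UpdateDirStats { .. } => 0,
        WriteMessage::InsertEntries(_) => 1,
    }
}

/// Blocks for one message, drains whatever else is already queued, then
/// stable-sorts so dir-stat updates come first while arrival order is kept
/// within each kind. Returns `None` once all senders are dropped and the
/// queue is empty.
pub fn next_batch(rx: &Receiver<WriteMessage>) -> Option<Vec<WriteMessage>> {
    let first = rx.recv().ok()?;
    let mut batch = vec![first];
    while let Ok(msg) = rx.try_recv() {
        batch.push(msg);
    }
    batch.sort_by_key(priority);
    Some(batch)
}

/// Spawns the writer thread. `send` on the returned sender blocks when
/// `capacity` messages are queued, which is the backpressure described above.
pub fn spawn_writer(
    capacity: usize,
) -> (SyncSender<WriteMessage>, thread::JoinHandle<Vec<WriteMessage>>) {
    let (tx, rx) = sync_channel(capacity);
    let handle = thread::spawn(move || {
        let mut applied = Vec::new();
        while let Some(batch) = next_batch(&rx) {
            // A real writer would run SQL on its own write connection here.
            applied.extend(batch);
        }
        applied
    });
    (tx, handle)
}
```

Dropping every sender ends the loop, so the writer thread can be joined on shutdown.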
@@ -140,15 +140,15 @@ Key test files are alongside each module (test functions within `#[cfg(test)]` b

 **Writer-side delete-with-propagation**: Both path-keyed (`DeleteEntry`/`DeleteSubtree`) and integer-keyed (`DeleteEntryById`/`DeleteSubtreeById`) handlers in the writer automatically read old data before deleting and propagate accurate negative deltas. The integer-keyed variants use `propagate_delta_by_id` which walks the `parent_id` chain via `get_parent_id` lookups. This means every deletion -- replay, live, verification -- gets correct dir_stats updates without callers needing to send separate `PropagateDelta` messages.

-**Post-replay verification is bidirectional**: `verify_affected_dirs` checks both directions: (1) stale entries in DB but not on disk (sends `DeleteEntry`/`DeleteSubtree`), and (2) missing entries on disk but not in DB (sends `UpsertEntry` + `PropagateDelta` for files, collects directory paths for `scan_subtree`). New directories are scanned and their subtree totals propagated up the ancestor chain. Uses a two-phase pattern: Phase 1 holds the `GLOBAL_INDEX_STORE` lock for bulk SQLite reads into a `HashMap` (can take seconds with hundreds of affected dirs), Phase 2 does all disk I/O without any lock. `enrich_entries_with_index` uses `try_lock` to avoid blocking on Phase 1.
+**Post-replay verification is bidirectional**: `verify_affected_dirs` checks both directions: (1) stale entries in DB but not on disk (sends `DeleteEntry`/`DeleteSubtree`), and (2) missing entries on disk but not in DB (sends `UpsertEntry` + `PropagateDelta` for files, collects directory paths for `scan_subtree`). New directories are scanned and their subtree totals propagated up the ancestor chain. Uses a two-phase pattern: Phase 1 uses `ReadPool` for lock-free bulk SQLite reads into a `HashMap`, Phase 2 does all disk I/O without any lock. `run_background_verification`'s dir-stat reads also use `ReadPool`. No `INDEXING` lock is held during verification.

 **Schema version mismatch drops the DB**: If `schema_version` in meta doesn't match what the code expects, the entire DB is deleted and rebuilt. No migration path (it's a cache, not user data).

 **`verifier.rs` is a placeholder**: Per-navigation readdir diff is a future milestone. Currently just a TODO comment.

 **Scan cancellation leaves partial data**: By design. `scan_completed_at` not set in meta, so next startup detects incomplete scan and runs fresh. No cleanup needed.

-**Global read-only store uses `std::sync::Mutex`**: Not `RwLock`, because `rusqlite::Connection` is `Send` but not `Sync`. `enrich_entries_with_index` uses `try_lock` to avoid blocking the listing pipeline when `verify_affected_dirs` Phase 1 holds the lock during startup (hundreds of serial SQLite queries for affected dirs). If the lock is busy, enrichment is skipped and retried on subsequent `get_file_range` calls.
+**`ReadPool` replaces `INDEXING` lock for all read-only DB access**: Enrichment (`enrich_entries_with_index`), verification Phase 1 (`verify_affected_dirs`), and background verification dir-stat reads all use `get_read_pool()` + `pool.with_conn()` — thread-local SQLite connections with no lock contention. The `INDEXING` mutex now guards only lifecycle transitions and IPC commands that need `PathResolver`. `with_conn` uses `thread_local!` storage, so callers must not have `.await` points between obtaining the pool and completing the closure (async task migration would break thread affinity).

 **Progress events use `tauri::async_runtime::spawn`**: Not `tokio::spawn`, because indexing can start from Tauri's synchronous `setup()` hook where no Tokio runtime context exists.

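The two-phase, bidirectional check described for `verify_affected_dirs` can be illustrated with a small sketch. It is not the project's implementation: the `Fix` enum, `verify_dirs`, and the `list_on_disk` closure are hypothetical. The `HashMap` argument stands for the Phase 1 snapshot read from the DB, and the closure stands for the Phase 2 disk listing.

```rust
use std::collections::{BTreeSet, HashMap};

/// Hypothetical corrective actions, loosely mirroring the writer messages
/// named in the doc (`DeleteEntry`, `UpsertEntry`).
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Fix {
    /// Present in the DB snapshot but no longer on disk.
    DeleteStale { dir: String, name: String },
    /// Present on disk but missing from the DB snapshot.
    UpsertMissing { dir: String, name: String },
}

/// Phase 2 of the sketch: `db_snapshot` is the result of Phase 1 (child names
/// per directory, already read from SQLite), so no DB access or lock is
/// needed while `list_on_disk` performs the slow directory reads.
pub fn verify_dirs<F>(
    db_snapshot: &HashMap<String, BTreeSet<String>>,
    list_on_disk: F,
) -> Vec<Fix>
where
    F: Fn(&str) -> BTreeSet<String>,
{
    // Sort directories so the output order is deterministic.
    let mut dirs: Vec<&String> = db_snapshot.keys().collect();
    dirs.sort();

    let mut fixes = Vec::new();
    for dir in dirs {
        let in_db = &db_snapshot[dir];
        let on_disk = list_on_disk(dir);
        // Direction 1: stale entries (in DB, not on disk).
        for name in in_db.difference(&on_disk) {
            fixes.push(Fix::DeleteStale { dir: dir.clone(), name: name.clone() });
        }
        // Direction 2: missing entries (on disk, not in DB).
        for name in on_disk.difference(in_db) {
            fixes.push(Fix::UpsertMissing { dir: dir.clone(), name: name.clone() });
        }
    }
    fixes
}
```

In a real caller the closure would wrap something like `std::fs::read_dir`. Passing it in keeps the comparison logic testable without touching the filesystem.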