- Schema v6 → v7: `entries` table now has `logical_size` and `physical_size` columns (was single `size`), `dir_stats` has `recursive_logical_size` and `recursive_physical_size`
- Scanner collects both `meta.len()` (logical) and `st_blocks * 512` (physical) per file
- All propagation paths updated: full aggregation, subtree, delta, backfill, live events, verification
- IPC boundary (`DirStats`) keeps `recursive_size` field name, mapped from `recursive_logical_size` — zero frontend changes
- Logical size is now the default display, fixing overcounting from hardlinks/APFS clones that affected physical-only sizes
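The two per-file measurements described above can be sketched as follows. This is a minimal illustration, not the scanner's actual code: the `FileSizes` struct and `file_sizes` function are hypothetical names, and using `symlink_metadata` (which does not follow symlinks) is an assumption.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Logical and physical size of one file, in bytes (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct FileSizes {
    pub logical: u64,
    pub physical: u64,
}

/// Reads both sizes for `path`. Assumption: symlinks are not followed.
pub fn file_sizes(path: &Path) -> io::Result<FileSizes> {
    let meta = fs::symlink_metadata(path)?;
    // Logical size: the byte length reported by the filesystem.
    let logical = meta.len();

    // Physical size: allocated blocks. `st_blocks` is counted in 512-byte
    // units regardless of the filesystem's own block size.
    #[cfg(unix)]
    let physical = {
        use std::os::unix::fs::MetadataExt;
        meta.blocks() * 512
    };
    // On non-Unix platforms both values fall back to the logical length.
    #[cfg(not(unix))]
    let physical = logical;

    Ok(FileSizes { logical, physical })
}
```

Two hardlinks to the same inode each report the full `st_blocks` value, which is why summing physical sizes across entries can overcount.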
`apps/desktop/src-tauri/src/indexing/CLAUDE.md`: 6 additions, 5 deletions
@@ -12,7 +12,7 @@ Full design: `docs/specs/drive-indexing/plan.md`
- **enrichment.rs** -- `ReadPool` (lock-free thread-local read connections for enrichment and verification), `enrich_entries_with_index()` (called when entries are stored in the listing cache (streaming, watcher update, re-sort), NOT on `get_file_range`; index freshness is handled by `index-dir-updated` → `refreshIndexSizes` → `getDirStatsBatch`). Integer-keyed fast path: resolve parent dir once → batch-fetch child dir stats by ID → match by name. Falls back to individual path resolution for edge cases.
- **event_loop.rs** -- `run_live_event_loop` (real-time FSEvents/inotify processing after scan completes), `run_replay_event_loop` (cold-start journal replay with two-phase approach), `run_background_verification` (post-replay bidirectional readdir diff), `merge_fs_events` (deduplication with flag priority), `process_live_batch`. All bounded-buffer constants live here.
- **events.rs** -- Tauri event payload structs (`IndexScanStartedEvent`, `IndexScanProgressEvent`, `IndexScanCompleteEvent`, `IndexDirUpdatedEvent`, `IndexReplayProgressEvent`, `IndexReplayCompleteEvent`), `RescanReason` enum, `emit_rescan_notification()`, IPC response types (`IndexStatusResponse`, `IndexDebugStatusResponse`). Also: `ActivityPhase` enum (Replaying/Scanning/Aggregating/Reconciling/Live/Idle) and `PhaseRecord` for the phase timeline system tracked in `DebugStats`.
-
- **store.rs** -- SQLite schema v6 (integer-keyed entries with `name_folded` column on macOS, dir_stats by entry_id, meta), platform_case collation, read queries, DB open/migrate. `resolve_component` uses the composite index directly: on macOS queries by `(parent_id, name_folded)`, on Linux/Windows by `(parent_id, name)`. Schema version check: mismatch triggers drop+rebuild. Both path-keyed (backward compat) and integer-keyed APIs.
+
- **store.rs** -- SQLite schema v7 (integer-keyed entries with `name_folded` column on macOS, dir_stats by entry_id, meta), platform_case collation, read queries, DB open/migrate. `resolve_component` uses the composite index directly: on macOS queries by `(parent_id, name_folded)`, on Linux/Windows by `(parent_id, name)`. Schema version check: mismatch triggers drop+rebuild. v7 adds dual sizes (logical + physical). Both path-keyed (backward compat) and integer-keyed APIs.
- **memory_watchdog.rs** -- Background task monitoring resident memory via `mach_task_info` (macOS). Warns at 8 GB, stops indexing at 16 GB, emits `index-memory-warning` event to frontend. No-op stub on non-macOS. Started from `start_indexing()`.
- **writer.rs** -- Single writer thread, owns the write connection, processes `WriteMessage` channel (bounded `sync_channel`, 20K capacity, backpressure via blocking). `WRITER_GENERATION: AtomicU64` (initialized to 1) bumped on every mutation (`InsertEntriesV2`, `UpsertEntryV2`, `DeleteEntryById`, `DeleteSubtreeById`, `TruncateData`) for search index staleness detection. Priority: `UpdateDirStats` before `InsertEntries`. `Flush` variant + async `flush()` method let callers wait for all prior writes to commit. Has both integer-keyed variants (`InsertEntriesV2`, `UpsertEntryV2`, `DeleteEntryById`, `DeleteSubtreeById`, `PropagateDeltaById`) and path-keyed backward-compat variants. The integer-keyed delete/subtree-delete handlers auto-propagate negative deltas via the `parent_id` chain (same pattern as the path-keyed variants). `propagate_delta_by_id` walks the parent chain using `get_parent_id` lookups. `UpsertEntryV2` initializes a zero-valued `dir_stats` row when inserting a NEW directory, so enrichment always has a row (subsequent `PropagateDeltaById` calls update it incrementally). Maintains `AccumulatorMaps` during `InsertEntriesV2` processing (two HashMaps: direct children stats and child dir relationships + an `entries_inserted` counter), cleared on `TruncateData`. On `ComputeAllAggregates`, passes accumulated maps to `aggregator::compute_all_aggregates_with_maps()` to skip expensive full-table-scan SQL queries. Accepts an optional `AppHandle` at spawn time to emit `index-aggregation-progress` events during aggregation (phase, current, total). Also emits `saving_entries` phase progress during `InsertEntriesV2` processing when the expected total is set via `set_expected_total_entries()` (an `Arc<AtomicU64>` shared between the writer thread and the `IndexWriter` handle). No index drop/recreate dance — the composite indexes (`idx_parent_name_folded` on macOS, `idx_parent_name` on Linux) use binary collation and stay present during scans.
- **scanner.rs** -- jwalk-based parallel directory walker. `scan_volume()` for full scan, `scan_subtree()` for targeted subtree rescans (used by post-replay background verification). Uses `ScanContext` (from store.rs) to assign integer IDs and parent IDs during the walk: maintains a `HashMap<PathBuf, i64>` mapping directory paths to assigned IDs. The scan root is mapped to `ROOT_ID` (1). Sends `InsertEntriesV2(Vec<EntryRow>)` batches to the writer. Platform-specific exclusion filters via `should_exclude` (`pub(super)`), the single exclusion gate for all code paths (scanner, reconciler, event_loop verification, per-navigation verifier). `default_exclusions()` is `#[cfg(test)]` only. Physical sizes (`st_blocks * 512`).
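The single-writer pattern described for writer.rs (bounded `sync_channel`, a generation counter bumped on every mutation, and a `Flush` message that lets callers wait for prior writes) can be sketched with the standard library alone. This is an illustrative stand-in under stated assumptions: the `WriteMessage` variants are trimmed down and renamed, the "database" is a `Vec<String>`, and `spawn_writer` and `Writer` are hypothetical names.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::mpsc::{sync_channel, SyncSender};
use std::sync::Arc;
use std::thread::{self, JoinHandle};

/// Hypothetical, trimmed-down message set; the real enum has more variants.
pub enum WriteMessage {
    Insert(String),
    Delete(String),
    /// Replies once every earlier message has been processed.
    Flush(SyncSender<()>),
}

pub struct Writer {
    pub tx: SyncSender<WriteMessage>,
    pub generation: Arc<AtomicU64>,
    pub handle: JoinHandle<Vec<String>>,
}

/// Spawns the only thread allowed to mutate the store.
/// `capacity` bounds the queue; senders block when it is full (backpressure).
pub fn spawn_writer(capacity: usize) -> Writer {
    let (tx, rx) = sync_channel::<WriteMessage>(capacity);
    let generation = Arc::new(AtomicU64::new(1)); // starts at 1, as in the doc
    let gen_for_thread = Arc::clone(&generation);
    let handle = thread::spawn(move || {
        let mut rows: Vec<String> = Vec::new(); // stand-in for the write connection
        for msg in rx {
            match msg {
                WriteMessage::Insert(name) => {
                    rows.push(name);
                    gen_for_thread.fetch_add(1, Ordering::Release);
                }
                WriteMessage::Delete(name) => {
                    rows.retain(|r| r != &name);
                    gen_for_thread.fetch_add(1, Ordering::Release);
                }
                // Messages are handled in order, so replying here means all
                // earlier writes are done. Flush itself is not a mutation.
                WriteMessage::Flush(reply) => {
                    let _ = reply.send(());
                }
            }
        }
        rows // returned when every sender has been dropped
    });
    Writer { tx, generation, handle }
}
```

A reader that cached the generation value can compare it later to detect that its derived data (for example a search index) is stale.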
@@ -76,18 +76,18 @@ All writes go through a dedicated `std::thread` via a bounded `sync_channel` (20
Reads happen on separate WAL connections (any thread). A `ReadPool` provides thread-local read connections for enrichment and verification without contending on the `INDEXING` state-machine mutex.
-
- `entries` (id INTEGER PK, parent_id, name COLLATE platform_case, [name_folded on macOS], is_directory, is_symlink, size, modified_at). Root sentinel: id=1, parent_id=0, name="".
+
- `entries` (id INTEGER PK, parent_id, name COLLATE platform_case, [name_folded on macOS], is_directory, is_symlink, logical_size, physical_size, modified_at). Root sentinel: id=1, parent_id=0, name="".
- **macOS**: has a `name_folded TEXT NOT NULL` column storing `normalize_for_comparison(name)` (NFD + case fold). Index: `idx_parent_name_folded ON entries (parent_id, name_folded)`.
- **Linux/Windows**: no `name_folded` column. Index: `idx_parent_name ON entries (parent_id, name)`.
- The old `idx_parent(parent_id)` from v5 is removed; the composite indexes replace it.
WAL mode, 16 MB page cache, `auto_vacuum = INCREMENTAL` (free pages reclaimed via `PRAGMA incremental_vacuum` after truncation). Custom `platform_case` collation registered on every connection: case-insensitive + NFD normalization on macOS, binary on Linux. **Opening the DB with the sqlite3 CLI will fail** on queries touching the name column (the collation isn't registered).
@@ -97,6 +97,7 @@ History of changes:
- **Schema v4**: Bumped from v3 to enable `auto_vacuum = INCREMENTAL` (requires DB rebuild since the pragma must be set before table creation).
- **Schema v5**: Replaced composite `UNIQUE INDEX idx_parent_name(parent_id, name)` with simple `INDEX idx_parent(parent_id)`. The composite index with `platform_case` collation was extremely slow to build (~25 min for 5.1M entries). A simple integer index needs no drop/recreate dance during scans.
- **Schema v6**: Added `name_folded` column (macOS only) storing pre-computed `normalize_for_comparison(name)`. Replaced `idx_parent` with platform-conditional composite indexes: `idx_parent_name_folded(parent_id, name_folded)` on macOS, `idx_parent_name(parent_id, name)` on Linux/Windows. `resolve_component` now queries the index directly instead of fetching all children and matching in Rust.
+
- **Schema v7**: Dual sizes. `entries.size` renamed to `entries.logical_size`, added `entries.physical_size`. `dir_stats.recursive_size` renamed to `dir_stats.recursive_logical_size`, added `dir_stats.recursive_physical_size`. Logical size = `meta.len()`, physical size = `st_blocks * 512` on Unix (both = `meta.len()` on non-Unix). The IPC boundary (`DirStats` struct) still exposes `recursive_size` mapped from `recursive_logical_size` to avoid frontend churn. `AccumulatorMaps.direct_stats` changed to 4-tuple `(logical_size_sum, physical_size_sum, file_count, dir_count)`.
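How a dual-size delta might travel up the `parent_id` chain, and how the IPC-facing struct could keep its old field name, can be sketched as below. This is an in-memory approximation, not the project's SQLite-backed code: `DirStatsRow`, `propagate_delta`, the `i64` field types, and the `HashMap` stand-ins for the `entries` and `dir_stats` tables are all assumptions.

```rust
use std::collections::HashMap;

/// Stand-in for one `dir_stats` row after schema v7 (field types assumed).
#[derive(Debug, Default, Clone, Copy, PartialEq, Eq)]
pub struct DirStatsRow {
    pub recursive_logical_size: i64,
    pub recursive_physical_size: i64,
}

/// IPC-facing shape: keeps the pre-v7 field name so the frontend is unchanged.
#[derive(Debug, PartialEq, Eq)]
pub struct DirStats {
    pub recursive_size: i64,
}

impl From<&DirStatsRow> for DirStats {
    fn from(row: &DirStatsRow) -> Self {
        // Only the logical size crosses the boundary.
        DirStats { recursive_size: row.recursive_logical_size }
    }
}

/// Adds both deltas to `start_dir` and every ancestor.
/// `parents` maps entry id -> parent id; the root sentinel (id 1) has parent 0.
pub fn propagate_delta(
    parents: &HashMap<i64, i64>,
    stats: &mut HashMap<i64, DirStatsRow>,
    start_dir: i64,
    logical_delta: i64,
    physical_delta: i64,
) {
    let mut current = start_dir;
    // Parent id 0 means we have stepped past the root sentinel.
    while current != 0 {
        let row = stats.entry(current).or_default();
        row.recursive_logical_size += logical_delta;
        row.recursive_physical_size += physical_delta;
        match parents.get(&current) {
            Some(&parent) => current = parent,
            None => break, // unknown id: stop rather than loop
        }
    }
}
```

Negative deltas model deletions; both columns move together so the logical and physical totals never drift apart.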
## How to test
@@ -124,7 +125,7 @@ Key test files are alongside each module (test functions within `#[cfg(test)]` b
**IPC boundary stays path-based**: Frontend sends filesystem paths, backend resolves path→ID internally via `store::resolve_path()`. No frontend changes needed. IPC dir stats queries (`get_dir_stats`, `get_dir_stats_batch`) use `ReadPool` for lock-free reads, same as enrichment.
-
**Physical sizes (`st_blocks * 512`)**: More meaningful for disk usage than logical size. May overcount ~10-20% for APFS clones (shared blocks). Volume usage bar uses `statfs()` for true totals.
+
**Dual sizes (logical + physical)**: Both `meta.len()` (logical) and `st_blocks * 512` (physical) are stored. Logical size is displayed by default (mapped to `recursive_size` at the IPC boundary). Physical size is stored in DB but not yet exposed to the frontend. Physical sizes may overcount ~10-20% for APFS clones (shared blocks). Volume usage bar uses `statfs()` for true totals.
**MustScanSubDirs uses reconciliation, not delete-then-reinsert**: `reconcile_subtree()` diffs the filesystem against the DB directory-by-directory, only inserting/deleting/updating entries that changed. This is safe to interrupt at any point (no bulk delete phase that could leave the DB empty). For brand-new directories discovered during reconciliation, a `flush_blocking()` + re-resolve cycle ensures their IDs are available before recursing into them. `scanner::scan_subtree` (which uses destructive `DeleteDescendantsById`) is used by post-replay background verification for newly discovered directories.
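The per-directory diff at the heart of reconciliation can be illustrated with a small pure function. This is a sketch of the idea only, assuming each side is a name-to-snapshot map for a single directory; `Snapshot`, `DirDiff`, and `diff_directory` are hypothetical names, and the real `reconcile_subtree()` also handles recursion, flushing, and ID resolution.

```rust
use std::collections::BTreeMap;

/// Hypothetical per-entry fields compared during reconciliation.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Snapshot {
    pub logical_size: u64,
    pub modified_at: i64,
}

/// Names to insert, update, or delete for one directory.
#[derive(Debug, Default, PartialEq, Eq)]
pub struct DirDiff {
    pub to_insert: Vec<String>,
    pub to_update: Vec<String>,
    pub to_delete: Vec<String>,
}

/// Compares what is on disk with what the DB holds for the same directory.
/// Nothing is deleted up front, so stopping midway leaves the DB usable.
pub fn diff_directory(
    on_disk: &BTreeMap<String, Snapshot>,
    in_db: &BTreeMap<String, Snapshot>,
) -> DirDiff {
    let mut diff = DirDiff::default();
    for (name, disk) in on_disk {
        match in_db.get(name) {
            None => diff.to_insert.push(name.clone()),
            Some(db) if db != disk => diff.to_update.push(name.clone()),
            Some(_) => {} // unchanged: no write needed
        }
    }
    for name in in_db.keys() {
        if !on_disk.contains_key(name) {
            diff.to_delete.push(name.clone());
        }
    }
    diff
}
```

Because unchanged entries produce no writes, the cost of a rescan scales with the number of differences rather than the size of the subtree's index.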