Commit da74290

Indexing: Speed up full scan bulk inserts
- Drop `idx_parent_name` before bulk inserts, recreate after
- Eliminates ~110M `platform_case` collation calls from incremental B-tree maintenance during `INSERT OR REPLACE`
- Index is rebuilt in a single sort pass after all entries are inserted
- `RecreateNameIndex` sent unconditionally (even on cancellation) so live-mode writes always have the index
1 parent 2808249 commit da74290

4 files changed

Lines changed: 44 additions & 0 deletions

apps/desktop/src-tauri/src/indexing/CLAUDE.md

Lines changed: 2 additions & 0 deletions
@@ -117,6 +117,8 @@ Key test files are alongside each module (test functions within `#[cfg(test)]` b
**MustScanSubDirs uses reconciliation, not delete-then-reinsert**: `reconcile_subtree()` diffs the filesystem against the DB directory-by-directory, only inserting/deleting/updating entries that changed. This is safe to interrupt at any point (no bulk delete phase that could leave the DB empty). For brand-new directories discovered during reconciliation, a `flush_blocking()` + re-resolve cycle ensures their IDs are available before recursing into them. `scanner::scan_subtree` (which uses destructive `DeleteDescendantsById`) is still used by micro-scans but no longer by MustScanSubDirs.

+**Bulk insert drops `idx_parent_name`, recreates after**: During a full scan the table is truncated first, so the unique index serves no purpose during inserts: IDs are pre-assigned by `ScanContext` and there can't be conflicts. Dropping the index before bulk inserts eliminates ~110M `platform_case` collation calls (NFD + case fold on macOS) from incremental B-tree maintenance. The index is rebuilt in a single pass after all entries are inserted. `RecreateNameIndex` is sent unconditionally (even on cancellation) to ensure live-mode writes always have the index available. Enrichment queries during the scan may be slower (no `parent_id` prefix index), but the table starts empty and the UI shows "Scanning..." during this window.
+
**In-memory accumulation eliminates aggregation SQL queries**: During a full scan, the writer thread accumulates two HashMaps in `AccumulatorMaps` as `InsertEntriesV2` batches arrive: `direct_stats` (parent_id -> file size/count/dir count) and `child_dirs` (parent_id -> child dir IDs). When `ComputeAllAggregates` fires, these maps are passed to `compute_all_aggregates_with_maps()`, skipping the two expensive full-table-scan SQL queries (`bulk_get_children_stats_by_id` and `bulk_get_child_dir_ids`) that previously dominated aggregation time (~70%). Maps are cleared on `TruncateData` and after aggregation completes. Falls back to SQL queries if maps are empty.

**Subtree aggregation uses scoped queries**: `scoped_get_children_stats_by_id` and `scoped_get_child_dir_ids` in `aggregator.rs` use recursive CTEs scoped to the target subtree, not full-table scans. This keeps subtree aggregation O(subtree_size) regardless of total DB size.
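
The in-memory accumulation described in the context paragraph above can be sketched roughly as follows. This is a minimal illustration, not the project's code: only the names `AccumulatorMaps`, `direct_stats`, and `child_dirs` come from the source. The `DirectStats` and `Entry` types, their fields, and the `accumulate`, `is_empty`, and `clear` methods are assumptions made for the example.

```rust
use std::collections::HashMap;

/// Totals for the direct children of one directory.
/// Hypothetical type and field names; the real struct is not shown in this commit.
#[derive(Debug, Default, Clone, Copy, PartialEq)]
struct DirectStats {
    total_size: u64,
    file_count: u64,
    dir_count: u64,
}

/// Hypothetical, simplified entry as it might arrive in an insert batch.
struct Entry {
    id: i64,
    parent_id: i64,
    is_dir: bool,
    size: u64,
}

/// The two maps named in the source: parent_id -> direct stats, parent_id -> child dir IDs.
#[derive(Default)]
struct AccumulatorMaps {
    direct_stats: HashMap<i64, DirectStats>,
    child_dirs: HashMap<i64, Vec<i64>>,
}

impl AccumulatorMaps {
    /// Fold one batch into the maps as it passes through the writer.
    fn accumulate(&mut self, batch: &[Entry]) {
        for e in batch {
            let stats = self.direct_stats.entry(e.parent_id).or_default();
            if e.is_dir {
                stats.dir_count += 1;
                self.child_dirs.entry(e.parent_id).or_default().push(e.id);
            } else {
                stats.file_count += 1;
                stats.total_size += e.size;
            }
        }
    }

    /// Empty maps signal the fallback path (SQL queries) described in the source.
    fn is_empty(&self) -> bool {
        self.direct_stats.is_empty()
    }

    /// Reset on truncation and after aggregation completes.
    fn clear(&mut self) {
        self.direct_stats.clear();
        self.child_dirs.clear();
    }
}
```

Because every entry already passes through the writer once, folding it into the maps at that point avoids re-reading the whole table later, which is presumably why the two full-table-scan queries can be skipped.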

apps/desktop/src-tauri/src/indexing/mod.rs

Lines changed: 7 additions & 0 deletions
@@ -402,6 +402,13 @@ impl IndexManager {
             log::warn!("Failed to flush after TruncateData: {e}");
         }
 
+        // Drop the unique name index before bulk inserts. Without it, each INSERT
+        // only touches the integer PK B-tree. The index is recreated by the scanner
+        // thread after all entries are inserted (or after cancellation).
+        if let Err(e) = self.writer.send(WriteMessage::DropNameIndex) {
+            log::warn!("Failed to send DropNameIndex: {e}");
+        }
+
         // Step 1: Start the FSEvents watcher BEFORE the scan so we don't miss events
         let (event_tx, event_rx) = tokio::sync::mpsc::channel(WATCHER_CHANNEL_CAPACITY);
         let scan_start_event_id = watcher::current_event_id();

apps/desktop/src-tauri/src/indexing/scanner.rs

Lines changed: 6 additions & 0 deletions
@@ -201,6 +201,12 @@ pub fn scan_volume(
         true, // volume scan: root always maps to ROOT_ID
     );
 
+    // Always recreate the name index after bulk inserts, whether the
+    // scan completed or was cancelled. Live-mode writes need it.
+    if let Err(e) = writer.send(WriteMessage::RecreateNameIndex) {
+        log::warn!("Scanner: failed to send RecreateNameIndex: {e}");
+    }
+
     // Trigger full aggregation if scan completed without cancellation
     if let Ok(ref s) = summary
         && !s.was_cancelled
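
The ordering this commit relies on (drop the index, bulk insert, always recreate, aggregate only on success) can be sketched with a plain `std::sync::mpsc` channel. This is a simplified illustration under stated assumptions: the enum is a trimmed stand-in for the real `WriteMessage`, `InsertBatch`, `full_scan`, `collect_messages`, and the `cancel_after` parameter are hypothetical, and the real code splits these sends between `mod.rs` and `scanner.rs` rather than using one function.

```rust
use std::sync::mpsc::{channel, Sender};

/// Trimmed-down stand-in for the real `WriteMessage` enum.
/// `InsertBatch` is hypothetical and only carries a batch size.
#[derive(Debug, PartialEq)]
enum WriteMessage {
    TruncateData,
    DropNameIndex,
    InsertBatch(usize),
    RecreateNameIndex,
    ComputeAllAggregates,
}

/// Sends the messages for one full scan and returns whether it was cancelled.
/// `cancel_after` (illustrative only) stops the scan before the given batch index.
fn full_scan(
    writer: &Sender<WriteMessage>,
    batches: &[usize],
    cancel_after: Option<usize>,
) -> bool {
    // The table is emptied first, so the unique index has nothing to protect.
    let _ = writer.send(WriteMessage::TruncateData);
    let _ = writer.send(WriteMessage::DropNameIndex);

    let mut was_cancelled = false;
    for (i, &n) in batches.iter().enumerate() {
        if cancel_after == Some(i) {
            was_cancelled = true;
            break;
        }
        let _ = writer.send(WriteMessage::InsertBatch(n));
    }

    // Sent on both paths: live-mode writes need the index even after a cancel.
    let _ = writer.send(WriteMessage::RecreateNameIndex);

    // Aggregation only makes sense for a complete scan.
    if !was_cancelled {
        let _ = writer.send(WriteMessage::ComputeAllAggregates);
    }
    was_cancelled
}

/// Runs a scan against a fresh channel and returns the messages in send order.
fn collect_messages(batches: &[usize], cancel_after: Option<usize>) -> Vec<WriteMessage> {
    let (tx, rx) = channel();
    full_scan(&tx, batches, cancel_after);
    drop(tx); // close the channel so the iterator below terminates
    rx.iter().collect()
}
```

Since a single writer thread consumes the channel in order, sending `RecreateNameIndex` after the last batch is enough to guarantee the index exists before any later live-mode message is processed.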

apps/desktop/src-tauri/src/indexing/writer.rs

Lines changed: 29 additions & 0 deletions
@@ -92,6 +92,14 @@ pub enum WriteMessage {
     /// Used before a full rescan on a stale DB to avoid slow `INSERT OR REPLACE`
     /// on a populated table with the expensive `platform_case` collation.
     TruncateData,
+    /// Drop `idx_parent_name` before a full scan's bulk inserts.
+    /// Eliminates expensive `platform_case` collation during B-tree maintenance.
+    /// The scanner sends `RecreateNameIndex` after all entries are inserted.
+    DropNameIndex,
+    /// Recreate `idx_parent_name` after bulk inserts complete (or scan cancellation).
+    /// Must be sent before live-mode writes resume, since `UpsertEntryV2` and
+    /// `resolve_component` depend on this index.
+    RecreateNameIndex,
     /// Begin an explicit SQLite transaction.
     /// All subsequent writes are batched until `CommitTransaction`.
     /// Dramatically reduces fsync overhead for bulk operations (replay).
@@ -572,6 +580,27 @@ fn process_message(
                 Err(e) => log::warn!("Writer: truncate failed: {e}"),
             }
         }
+        WriteMessage::DropNameIndex => {
+            let t = Instant::now();
+            if let Err(e) = conn.execute_batch("DROP INDEX IF EXISTS idx_parent_name") {
+                log::warn!("Writer: DROP INDEX idx_parent_name failed: {e}");
+            } else {
+                log::info!("Writer: dropped idx_parent_name ({}ms)", t.elapsed().as_millis());
+            }
+        }
+        WriteMessage::RecreateNameIndex => {
+            let t = Instant::now();
+            if let Err(e) = conn.execute_batch(
+                "CREATE UNIQUE INDEX IF NOT EXISTS idx_parent_name ON entries (parent_id, name)",
+            ) {
+                log::warn!("Writer: CREATE INDEX idx_parent_name failed: {e}");
+            } else {
+                log::info!(
+                    "Writer: recreated idx_parent_name ({}ms)",
+                    t.elapsed().as_millis(),
+                );
+            }
+        }
         WriteMessage::ComputeAllAggregates => {
             let t = Instant::now();
             let use_maps = !accumulator.direct_stats.is_empty();
