- Replay (Phase 1 of `run_replay_event_loop`) now deduplicates FSEvents by normalized path before processing, using the same pattern as the live event loop
- Events accumulate in a `HashMap` and flush every 1,000 raw events via `flush_replay_batch()`
- High-churn files (SQLite journals, browser caches) that generated hundreds of identical events per second now collapse to a single `symlink_metadata()` + `resolve_path()` call per batch
- Added `REPLAY_DEDUP_BATCH_SIZE` constant (1,000) matching the existing `UpdateLastEventId` cadence
- Logs dedup ratio after replay completes (for example, "deduplicated 10000 raw events to 71 unique (99% reduction)")
- Four new tests covering single-path collapse, multi-path preservation, mixed event merging, and realistic event storm scenarios
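A minimal sketch of the replay batching described above. It assumes a simplified event that carries a single `Kind` instead of the real flag set. `Kind`, `normalize`, `replay_dedup`, `flush`, and `dedup_summary` are hypothetical names; only `REPLAY_DEDUP_BATCH_SIZE` and its value come from this PR. Events accumulate in a `HashMap` keyed by normalized path, the most significant kind wins on collision, and the map is drained every 1,000 raw events, plus once at the end for the partial batch.

```rust
use std::collections::HashMap;

// Constant named in the PR; the value is taken from the description.
const REPLAY_DEDUP_BATCH_SIZE: usize = 1_000;

// Hypothetical simplification: one significance level per event instead of flags.
// Declaration order gives the derived Ord: Modified < Created < Removed < MustScanSubDirs.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Kind {
    Modified,
    Created,
    Removed,
    MustScanSubDirs,
}

// Hypothetical normalization: drop trailing slashes so "/a/" and "/a" share a key.
fn normalize(path: &str) -> String {
    let t = path.trim_end_matches('/');
    if t.is_empty() { "/".to_string() } else { t.to_string() }
}

/// Deduplicates replayed events in count-based batches. `process` stands in for the
/// expensive per-path work (stat + path resolution) and runs once per unique path per batch.
/// Returns (raw event count, number of `process` calls).
fn replay_dedup<F>(events: &[(&str, Kind)], mut process: F) -> (usize, usize)
where
    F: FnMut(&str, Kind),
{
    let mut pending: HashMap<String, Kind> = HashMap::new();
    let mut raw = 0;
    let mut processed = 0;
    for &(path, kind) in events {
        raw += 1;
        // Keep the most significant kind seen for this path in the current batch.
        pending
            .entry(normalize(path))
            .and_modify(|k| *k = (*k).max(kind))
            .or_insert(kind);
        if raw % REPLAY_DEDUP_BATCH_SIZE == 0 {
            processed += flush(&mut pending, &mut process);
        }
    }
    // Flush whatever is left in the final, partial batch.
    processed += flush(&mut pending, &mut process);
    (raw, processed)
}

// Drain the batch, calling `process` once per unique path; returns how many were processed.
fn flush<F: FnMut(&str, Kind)>(pending: &mut HashMap<String, Kind>, process: &mut F) -> usize {
    let n = pending.len();
    for (path, kind) in pending.drain() {
        process(&path, kind);
    }
    n
}

// Formats a log line in the shape quoted in the PR description.
fn dedup_summary(raw: usize, unique: usize) -> String {
    let pct = if raw == 0 { 0 } else { (raw - unique) * 100 / raw };
    format!("deduplicated {raw} raw events to {unique} unique ({pct}% reduction)")
}
```

Because dedup is per batch in this sketch, a path that keeps recurring is processed once per batch rather than once overall; the batch size trades memory held in the map against how often the same path is re-stat'ed.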
`apps/desktop/src-tauri/src/indexing/CLAUDE.md` (1 addition, 1 deletion)
@@ -147,7 +147,7 @@ Key test files are alongside each module (test functions within `#[cfg(test)]` b
**FSEvents `item_removed` must be verified against disk**: macOS FSEvents can deliver `item_removed` for paths that still exist (atomic file swaps by editors/git, coalesced events with OR'd flags, `merge_fs_events` discarding `item_created` when `item_removed` is present). `handle_removal()` stats the path before deleting: if the file exists, it delegates to `handle_creation_or_modification()` (upsert) instead. Without this, false removals progressively delete live entries from the DB — especially damaging for directories since `DeleteSubtreeById` is recursive. `handle_creation_or_modification()` already has the inverse pattern: if stat fails, it deletes.
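The stat-before-delete rule can be sketched as below, assuming a hypothetical `Action` result in place of the writer messages the real handlers send. `on_removed` and `on_created_or_modified` are illustrative stand-ins for `handle_removal()` and `handle_creation_or_modification()`. The only real API used is `std::fs::symlink_metadata`, which does not follow symlinks, so a dangling symlink still counts as present.

```rust
use std::fs;
use std::path::Path;

/// Hypothetical outcome; the real handlers send writer messages instead.
#[derive(Debug, PartialEq, Eq)]
enum Action {
    Upsert,
    Delete,
}

/// Stand-in for `handle_removal()`: FSEvents may report `item_removed` for a path
/// that still exists, so stat it before deleting anything.
fn on_removed(path: &Path) -> Action {
    match fs::symlink_metadata(path) {
        // Still on disk: delegate to the create/modify handler (upsert).
        Ok(_) => on_created_or_modified(path),
        // Stat failed: treat the path as genuinely removed.
        Err(_) => Action::Delete,
    }
}

/// Stand-in for `handle_creation_or_modification()`: the inverse check.
fn on_created_or_modified(path: &Path) -> Action {
    match fs::symlink_metadata(path) {
        Ok(_) => Action::Upsert,
        // Path vanished between the event and the stat: delete it.
        Err(_) => Action::Delete,
    }
}
```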
-
**Live events are deduplicated and batched with a 1s window**: Both `run_live_event_loop` and the Phase 3 live loop in `run_replay_event_loop` (both in `event_loop.rs`) collect incoming events into a `HashMap<String, FsChangeEvent>` keyed by normalized path. On each 1s flush tick, only the deduplicated set is processed through `process_live_event`. `merge_fs_events` keeps the most significant flags when events collide: `must_scan_sub_dirs` always wins, then `removed`, then `created`, then `modified`. `UpdateLastEventId` is sent once per batch (in `process_live_batch`) instead of per-event, reducing writer channel pressure during event storms.
+
**Events are deduplicated and batched in all modes**: Live events (both `run_live_event_loop` and Phase 3 of `run_replay_event_loop`) use a 1s flush window. Replay events (Phase 1 of `run_replay_event_loop`) use `REPLAY_DEDUP_BATCH_SIZE` (1,000 events). Both collect events into a `HashMap<String, FsChangeEvent>` keyed by normalized path, merging colliding events with `merge_fs_events`. Flag priority: `must_scan_sub_dirs` always wins, then `removed`, then `created`, then `modified`. `UpdateLastEventId` is sent once per batch. The replay dedup is critical for performance: high-churn files (SQLite journals, browser caches) can generate hundreds of identical FSEvents per second, and without dedup each event triggers a `symlink_metadata()` syscall and a `resolve_path()` component walk.
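One reading of the flag priority, sketched with a hypothetical `Flags` struct. The real `FsChangeEvent` carries more fields, and the real `merge_fs_events` may preserve more than one flag. Here the merged event keeps only the single most significant flag present in either input, and the hypothetical `add_to_batch` shows how a path-keyed `HashMap` absorbs collisions.

```rust
use std::collections::HashMap;

/// Hypothetical flag set; only the four flags named in the priority order.
#[derive(Debug, Clone, Copy, Default, PartialEq, Eq)]
struct Flags {
    must_scan_sub_dirs: bool,
    removed: bool,
    created: bool,
    modified: bool,
}

/// Priority: must_scan_sub_dirs > removed > created > modified.
/// The result keeps only the highest-priority flag set in either input.
fn merge_flags(a: Flags, b: Flags) -> Flags {
    if a.must_scan_sub_dirs || b.must_scan_sub_dirs {
        Flags { must_scan_sub_dirs: true, ..Default::default() }
    } else if a.removed || b.removed {
        Flags { removed: true, ..Default::default() }
    } else if a.created || b.created {
        Flags { created: true, ..Default::default() }
    } else if a.modified || b.modified {
        Flags { modified: true, ..Default::default() }
    } else {
        Flags::default()
    }
}

/// Insert an event into the batch, merging with any event already queued for the path.
fn add_to_batch(batch: &mut HashMap<String, Flags>, path: &str, flags: Flags) {
    batch
        .entry(path.to_string())
        .and_modify(|existing| *existing = merge_flags(*existing, flags))
        .or_insert(flags);
}
```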
**Writer-side delete-with-propagation**: Both path-keyed (`DeleteEntry`/`DeleteSubtree`) and integer-keyed (`DeleteEntryById`/`DeleteSubtreeById`) handlers in the writer automatically read old data before deleting and propagate accurate negative deltas. The integer-keyed variants use `propagate_delta_by_id` which walks the `parent_id` chain via `get_parent_id` lookups. This means every deletion -- replay, live, verification -- gets correct dir_stats updates without callers needing to send separate `PropagateDelta` messages.
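A minimal in-memory sketch of delete-with-propagation, assuming a hypothetical `Tree` with an id-to-parent map and one aggregated size per entry standing in for `dir_stats`. The `propagate_delta_by_id` and `get_parent_id` here share names with the writer's helpers but use illustrative signatures: the walk follows parent lookups and applies the delta to each ancestor. The delete reads the old size and parent before removing the entry, which is the ordering the paragraph relies on. Subtree deletes are not modeled.

```rust
use std::collections::HashMap;

/// Hypothetical in-memory stand-in for the index DB.
struct Tree {
    // Entry id -> parent id (`None` for the root).
    parent: HashMap<u64, Option<u64>>,
    // Entry id -> size; for directories this is the aggregated subtree size.
    size: HashMap<u64, i64>,
}

impl Tree {
    fn new() -> Self {
        Tree { parent: HashMap::new(), size: HashMap::new() }
    }

    fn get_parent_id(&self, id: u64) -> Option<u64> {
        self.parent.get(&id).copied().flatten()
    }

    /// Apply `delta` to `start` and to every ancestor above it.
    fn propagate_delta_by_id(&mut self, start: Option<u64>, delta: i64) {
        let mut cur = start;
        while let Some(dir) = cur {
            if let Some(s) = self.size.get_mut(&dir) {
                *s += delta;
            }
            cur = self.get_parent_id(dir);
        }
    }

    /// Insert an entry and propagate its size to all ancestors.
    fn insert(&mut self, id: u64, parent: Option<u64>, size: i64) {
        self.parent.insert(id, parent);
        self.size.insert(id, size);
        self.propagate_delta_by_id(parent, size);
    }

    /// Read old data first, delete, then push the negative delta up the chain.
    fn delete_entry_by_id(&mut self, id: u64) {
        let old_size = self.size.get(&id).copied().unwrap_or(0);
        let parent = self.get_parent_id(id);
        self.size.remove(&id);
        self.parent.remove(&id);
        self.propagate_delta_by_id(parent, -old_size);
    }
}
```

Doing the read inside the delete handler means callers (replay, live, verification) never need to know the old size themselves.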