Commit 4d65dcd

Refactor: split indexing/store.rs into cohesive submodules (no behavior change)
The monolithic `store.rs` (2935 lines) was a grab-bag: schema/DDL, the `platform_case` collation, all entry/dir-stats/meta CRUD, the data types, and a 1515-line test module in one file. Split the `impl IndexStore` block (pure code movement, byte-identical method bodies verified) into a `store/` submodule by concern:

- `store/mod.rs`: shared core (schema, collation, path helpers, `IndexStore` struct + `with_savepoint`, the data types, and the `tests` module).
- `store/connection.rs`: open/recreate, connection factories, DB-size + status reads (`read_meta_value` promoted to `pub(super)` so the meta cluster can reach it).
- `store/entries.rs`: entry-tree reads and writes (listings, lookups by id/inode/component, insert/update/rename/move/delete, counts).
- `store/dir_stats.rs`: `dir_stats` reads/writes plus `recompute_min_subtree_epoch`.
- `store/meta.rs`: meta-table + epoch helpers, `mark_dirs_listed`, `get_all_directory_paths`, `clear_all`.

Each submodule is an `impl IndexStore { … }` over the struct in `mod.rs`, pulling shared items via `use super::*`. Only change beyond movement is the one `pub(super)` on `read_meta_value`.

`indexing/CLAUDE.md` + `DETAILS.md` updated to name the new layout; `file-length` allowlist auto-dropped the gone `store.rs` entry. `mod.rs` stays over the 800-line warn (it still carries the 1515-line `tests` module); left as a warn pending a separate test-extraction pass.
1 parent a6a2f58 commit 4d65dcd
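
For readers unfamiliar with the layout the message describes, here is a minimal sketch of one struct whose `impl` is spread over sibling modules. Inline modules stand in for the files `store/mod.rs`, `store/connection.rs`, and `store/meta.rs`; the `kv` field, the method bodies, and `schema_version()` are hypothetical and not taken from the commit.

```rust
// Sketch only: inline modules stand in for separate files under `store/`.
mod store {
    use std::collections::HashMap;

    // Shared core (the role of `store/mod.rs`): the struct itself.
    // The `kv` field is a hypothetical stand-in for the real SQLite handle.
    pub struct IndexStore {
        kv: HashMap<String, String>,
    }

    // Stand-in for `store/connection.rs`.
    mod connection {
        use super::*; // pulls in `IndexStore` and `HashMap` from the parent

        impl IndexStore {
            pub fn open() -> Self {
                let mut kv = HashMap::new();
                kv.insert("schema_version".to_string(), "1".to_string());
                IndexStore { kv }
            }

            // `pub(super)`: visible inside `store` and its child modules,
            // but not to code outside `store`.
            pub(super) fn read_meta_value(&self, key: &str) -> Option<String> {
                self.kv.get(key).cloned()
            }
        }
    }

    // Stand-in for `store/meta.rs`: a second `impl` block on the same struct
    // that reaches the sibling's `pub(super)` helper.
    mod meta {
        use super::*;

        impl IndexStore {
            pub fn schema_version(&self) -> Option<u32> {
                self.read_meta_value("schema_version")?.parse().ok()
            }
        }
    }
}
```

Code outside `store` can call `store::IndexStore::open()` and `schema_version()`, while a direct call to `read_meta_value` from outside would be rejected by the compiler. That is the visibility the `pub(super)` promotion is meant to give.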

8 files changed

Lines changed: 944 additions & 911 deletions

apps/desktop/src-tauri/src/indexing/CLAUDE.md

Lines changed: 1 addition & 1 deletion
@@ -11,7 +11,7 @@ Per-file roles: DETAILS § "Module structure" or `codegraph_search`. The load-be
 - **Write path**: `writer/`, `scanner.rs` (jwalk, LOCAL only), `volume_scanner.rs` (`Volume`-trait scan, SMB/MTP),
 `aggregator.rs`, `reconciler.rs` + `event_loop.rs`.
 - **SMB / MTP / freshness**: `freshness.rs`, `smb_index.rs` / `mtp_index.rs`, `smb_watch.rs` / `mtp_watch.rs`.
-- **Read path**: `enrichment.rs` (`ReadPool`), `store.rs`, `verifier.rs`, `expected_totals.rs`, `pending_sizes.rs`.
+- **Read path**: `enrichment.rs` (`ReadPool`), `store/`, `verifier.rs`, `expected_totals.rs`, `pending_sizes.rs`.
 - **Support**: `partial_agg.rs`, `metadata.rs`, `firmlinks.rs`, `watcher.rs`, `memory_watchdog.rs`, `events.rs`,
 `retention.rs`.

apps/desktop/src-tauri/src/indexing/DETAILS.md

Lines changed: 1 addition & 1 deletion
@@ -21,7 +21,7 @@ The key UX win: showing directory sizes in file listings. Design history is in g
 - **expected_totals.rs** -- `expected_totals_for_sources()` returns the index-derived `(file count, byte total)` for a set of source paths so write operations (copy/move/delete) can render a real scan-phase progress bar before the foolproof re-scan completes. Per source: resolve_path → get_entry_by_id → if dir use `dir_stats`, if file use the entry's `logical_size`. Returns `None` if *any* source isn't covered by the index (no pool, no entry, no `dir_stats`, no `logical_size`). Partial totals would let the progress bar overshoot 100%. Uses the same `ReadPool` as enrichment for lock-free reads. Used by `scan_preview.rs` and `scan.rs` in `write_operations/`.
 - **event_loop.rs** -- `run_live_event_loop` (real-time FSEvents/inotify processing after scan completes), `run_replay_event_loop` (cold-start journal replay with two-phase approach), `run_background_verification` (post-replay bidirectional readdir diff), `merge_fs_events` (deduplication with flag priority), `process_live_batch` (three-phase: directory creations first sorted by depth + flush, then `detect_renames_by_inode` rename pre-pass + flush, then remaining events). The rename pre-pass turns `item_renamed` events whose new path's inode already lives in the DB at a different `(parent_id, name)` into a single `MoveEntryV2`, preserving the entry's `dir_stats`. All bounded-buffer constants live here.
 - **events.rs** -- Tauri event payload structs (`IndexScanStartedEvent`, `IndexScanProgressEvent`, `IndexScanCompleteEvent`, `IndexDirUpdatedEvent`, `IndexReplayProgressEvent`, `IndexReplayCompleteEvent`, `IndexAggregationCompleteEvent` (payloadless unit struct), `IndexMemoryWarningEvent`), `RescanReason` enum, `emit_rescan_notification()`, IPC response types (`IndexStatusResponse`, `IndexDebugStatusResponse`). Also: `ActivityPhase` enum (Replaying/Scanning/Aggregating/Reconciling/Live/Idle), `PhaseRecord` for the phase timeline, and `DebugStats` (shared atomic counters for the debug window + phase timeline via `set_phase()`/`close_phase_with_stats()`). The event payload structs derive `tauri_specta::Event` with `#[tauri_specta(event_name = "…")]` (kebab wire name pinned, since the `…Event` suffix wouldn't kebab-case to the existing string) and emit via `payload.emit(app)`; they're registered in `ipc.rs`'s `collect_events!` and consumed via typed `on*` wrappers in `tauri-commands/indexing.ts`. The `index-aggregation-progress` payload (`AggregationProgressEvent`) lives in `writer/mod.rs`, and `search-index-ready` (`SearchIndexReadyEvent`) lives in `commands/search.rs`.
-- **store.rs** -- SQLite schema (integer-keyed entries with `name_folded` column on all platforms, `inode` column for hardlink dedup, `dir_stats` by entry_id, `meta`), `platform_case` collation, read queries, DB open/migrate. `resolve_component` always queries by `(parent_id, name_folded)` using the `idx_parent_name_folded` composite **UNIQUE** index. On Linux/Windows, `normalize_for_comparison()` is the identity function, so `name_folded = name` and the index behaves identically to a `(parent_id, name)` index. Schema version check: mismatch triggers drop+rebuild. `has_sized_entry_for_inode()` checks if another entry with the same inode already has non-NULL sizes; `find_entry_by_inode()` returns the first row with a given inode (used by the live event loop's rename pre-pass). Both path-keyed (backward compat) and integer-keyed APIs.
+- **store/** -- The `IndexStore` read/write handle and SQLite schema, split into a `store/` submodule by concern. `mod.rs` holds the shared core: the schema (integer-keyed entries with `name_folded` column on all platforms, `inode` column for hardlink dedup, `dir_stats` by entry_id, `meta`), `platform_case` collation, DDL/pragmas/reset, the path helpers (`resolve_path`, `reconstruct_path*`), the `IndexStore` struct + `with_savepoint`, the data types (`EntryRow`, `DirStats`, `DirStatsById`, `ScanContext`, `IndexStatus`, `ScanCalibration`, `IndexStoreError`), and the whole `tests` module. The `impl IndexStore` block is divided into four sibling files (each `impl IndexStore { … }` over the struct above, pulling shared items via `use super::*`): `connection.rs` (open/recreate, connection factories, DB-size + status reads, the `pub(super)` `read_meta_value` helper), `entries.rs` (entry-tree reads and writes: child listings, lookups by id / inode / component, insert/update/rename/move/delete, counts, `get_next_id`), `dir_stats.rs` (`dir_stats` reads and writes plus `recompute_min_subtree_epoch`), and `meta.rs` (meta-table + epoch helpers, `mark_dirs_listed`, `get_all_directory_paths`, `clear_all`). `resolve_component` always queries by `(parent_id, name_folded)` using the `idx_parent_name_folded` composite **UNIQUE** index. On Linux/Windows, `normalize_for_comparison()` is the identity function, so `name_folded = name` and the index behaves identically to a `(parent_id, name)` index. Schema version check: mismatch triggers drop+rebuild. `has_sized_entry_for_inode()` checks if another entry with the same inode already has non-NULL sizes; `find_entry_by_inode()` returns the first row with a given inode (used by the live event loop's rename pre-pass). Both path-keyed (backward compat) and integer-keyed APIs.
 - **metadata.rs** -- `MetadataSnapshot` struct and `extract_metadata()` function. Single location for all platform-specific metadata extraction (logical/physical size, mtime, inode, nlink). Used by scanner, reconciler, verifier, and event_loop. Symlinks get `None` everywhere. Files get sizes + inode + nlink. Directories get inode but no sizes/nlink. The inode is what the live event loop's rename pre-pass matches against to detect dir renames in place.
 - **memory_watchdog.rs** -- Background task monitoring resident memory via `mach_task_info` (macOS). Warns at 8 GB, stops indexing at 16 GB, emits `index-memory-warning` event to frontend. No-op stub on non-macOS. Started from `start_indexing()`.
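
The `store/` entry above says `resolve_component` always looks up by `(parent_id, name_folded)` through a UNIQUE composite index. The sketch below models that behavior with an in-memory map rather than SQLite. The `fold` function is a hypothetical stand-in for `normalize_for_comparison()`: the docs only state that it is the identity on Linux/Windows, so the lowercasing branch is an assumption made to illustrate a case-insensitive platform.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for `normalize_for_comparison()`.
/// Identity when `case_insensitive` is false (as the docs state for
/// Linux/Windows); lowercasing otherwise is an assumption for illustration.
fn fold(name: &str, case_insensitive: bool) -> String {
    if case_insensitive {
        name.to_lowercase()
    } else {
        name.to_string()
    }
}

/// In-memory model of a UNIQUE index on `(parent_id, name_folded)`.
struct Entries {
    case_insensitive: bool,
    by_parent_and_name: HashMap<(i64, String), i64>,
}

impl Entries {
    fn new(case_insensitive: bool) -> Self {
        Entries { case_insensitive, by_parent_and_name: HashMap::new() }
    }

    /// Returns false when the folded key already exists, mirroring a
    /// UNIQUE-constraint rejection.
    fn insert(&mut self, parent_id: i64, name: &str, id: i64) -> bool {
        let key = (parent_id, fold(name, self.case_insensitive));
        if self.by_parent_and_name.contains_key(&key) {
            return false;
        }
        self.by_parent_and_name.insert(key, id);
        true
    }

    /// Lookup by `(parent_id, folded name)`, like `resolve_component`.
    fn resolve_component(&self, parent_id: i64, name: &str) -> Option<i64> {
        let key = (parent_id, fold(name, self.case_insensitive));
        self.by_parent_and_name.get(&key).copied()
    }
}
```

With an identity fold, `Readme` and `README` are distinct siblings, which matches the docs' remark that the index then behaves like a plain `(parent_id, name)` index. With a folding platform they collide on the same key.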

Lines changed: 188 additions & 0 deletions
@@ -0,0 +1,188 @@
+//! `IndexStore` lifecycle: open/recreate, connection factories, DB-size and
+//! status reads. Pure code movement from the former monolithic `store.rs`.
+
+use super::*;
+
+impl IndexStore {
+    /// Open (or create) the index database at `db_path`.
+    ///
+    /// Registers the `platform_case` collation, runs WAL pragmas, creates tables
+    /// if missing, and checks the schema version. On version mismatch or corruption
+    /// the DB file is deleted and recreated.
+    pub fn open(db_path: &Path) -> Result<Self, IndexStoreError> {
+        match Self::try_open(db_path) {
+            Ok(store) => Ok(store),
+            Err(e) => {
+                log::warn!("Index DB open failed ({e}), deleting and recreating");
+                Self::delete_and_recreate(db_path)
+            }
+        }
+    }
+
+    /// Attempt to open the DB without the delete-and-recreate fallback.
+    fn try_open(db_path: &Path) -> Result<Self, IndexStoreError> {
+        let conn = Connection::open(db_path)?;
+        register_platform_case_collation(&conn)?;
+        apply_pragmas(&conn, false)?;
+        create_tables(&conn)?;
+
+        // Check schema version
+        let version = Self::read_meta_value(&conn, "schema_version")?;
+        match version {
+            Some(v) if v == SCHEMA_VERSION => { /* all good */ }
+            Some(v) => {
+                log::warn!("Schema version mismatch (expected {SCHEMA_VERSION}, found {v}), resetting");
+                reset_schema(&conn)?;
+            }
+            None => {
+                // Fresh DB, stamp the version
+                conn.execute(
+                    "INSERT OR REPLACE INTO meta (key, value) VALUES (?1, ?2)",
+                    params!["schema_version", SCHEMA_VERSION],
+                )?;
+            }
+        }
+
+        Ok(Self {
+            db_path: db_path.to_path_buf(),
+            read_conn: conn,
+        })
+    }
+
+    /// Delete the DB file and create a fresh one.
+    fn delete_and_recreate(db_path: &Path) -> Result<Self, IndexStoreError> {
+        // Remove the main DB file
+        if db_path.exists() {
+            std::fs::remove_file(db_path)?;
+        }
+        // Always attempt to remove WAL and SHM sidecars (they can be stale even
+        // if the base DB was already deleted).
+        let wal = db_path.with_extension("db-wal");
+        let shm = db_path.with_extension("db-shm");
+        if wal.exists() {
+            let _ = std::fs::remove_file(&wal);
+        }
+        if shm.exists() {
+            let _ = std::fs::remove_file(&shm);
+        }
+
+        let conn = Connection::open(db_path)?;
+        register_platform_case_collation(&conn)?;
+        apply_pragmas(&conn, false)?;
+        create_tables(&conn)?;
+        conn.execute(
+            "INSERT OR REPLACE INTO meta (key, value) VALUES (?1, ?2)",
+            params!["schema_version", SCHEMA_VERSION],
+        )?;
+        Ok(Self {
+            db_path: db_path.to_path_buf(),
+            read_conn: conn,
+        })
+    }
+
+    /// Open a separate write connection with WAL pragmas and `platform_case` collation.
+    ///
+    /// Used by the writer thread; callers own the returned connection.
+    pub fn open_write_connection(db_path: &Path) -> Result<Connection, IndexStoreError> {
+        let conn = Connection::open(db_path)?;
+        register_platform_case_collation(&conn)?;
+        apply_pragmas(&conn, false)?;
+        Ok(conn)
+    }
+
+    /// Open a read-only connection with per-connection pragmas and `platform_case` collation.
+    ///
+    /// Never contends with the writer thread's write lock.
+    pub fn open_read_connection(db_path: &Path) -> Result<Connection, IndexStoreError> {
+        let conn = Connection::open_with_flags(db_path, rusqlite::OpenFlags::SQLITE_OPEN_READ_ONLY)?;
+        register_platform_case_collation(&conn)?;
+        apply_pragmas(&conn, true)?;
+        Ok(conn)
+    }
+
+    /// Read all meta keys and return the index status.
+    pub fn get_index_status(&self) -> Result<IndexStatus, IndexStoreError> {
+        Ok(IndexStatus {
+            schema_version: Self::read_meta_value(&self.read_conn, "schema_version")?,
+            volume_path: Self::read_meta_value(&self.read_conn, "volume_path")?,
+            scan_completed_at: Self::read_meta_value(&self.read_conn, "scan_completed_at")?,
+            scan_duration_ms: Self::read_meta_value(&self.read_conn, "scan_duration_ms")?,
+            total_entries: Self::read_meta_value(&self.read_conn, "total_entries")?,
+            total_physical_bytes: Self::read_meta_value(&self.read_conn, "total_physical_bytes")?,
+            last_event_id: Self::read_meta_value(&self.read_conn, "last_event_id")?,
+        })
+    }
+
+    /// Read the previous completed scan's calibration from `meta` on the given
+    /// connection. Missing or unparseable keys map to `None`. Takes a connection
+    /// (rather than `&self`) so `start_scan` can read it off a fresh connection
+    /// before truncating; the keys survive `TruncateData` (it preserves `meta`).
+    pub fn read_scan_calibration(conn: &Connection) -> Result<ScanCalibration, IndexStoreError> {
+        let read_u64 = |key: &str| -> Result<Option<u64>, IndexStoreError> {
+            Ok(Self::read_meta_value(conn, key)?.and_then(|v| v.parse::<u64>().ok()))
+        };
+        Ok(ScanCalibration {
+            total_entries: read_u64("total_entries")?,
+            total_physical_bytes: read_u64("total_physical_bytes")?,
+            scan_duration_ms: read_u64("scan_duration_ms")?,
+        })
+    }
+
+    /// Return the path to the DB file.
+    pub fn db_path(&self) -> &Path {
+        &self.db_path
+    }
+
+    /// Borrow the underlying read connection for direct queries.
+    ///
+    /// Used by `enrich_entries_with_index` for integer-keyed lookups on the
+    /// global read-only store. The connection is WAL-mode, so reads don't
+    /// block the writer.
+    pub fn read_conn(&self) -> &Connection {
+        &self.read_conn
+    }
+
+    /// Return the total DB size on disk (main file + WAL + SHM sidecars).
+    pub fn db_file_size(&self) -> Result<u64, IndexStoreError> {
+        let main = std::fs::metadata(&self.db_path)?.len();
+        let wal = std::fs::metadata(format!("{}-wal", self.db_path.display()))
+            .map(|m| m.len())
+            .unwrap_or(0);
+        let shm = std::fs::metadata(format!("{}-shm", self.db_path.display()))
+            .map(|m| m.len())
+            .unwrap_or(0);
+        Ok(main + wal + shm)
+    }
+
+    /// Return the main DB file size (excluding WAL/SHM).
+    pub fn db_main_size(&self) -> Result<u64, IndexStoreError> {
+        Ok(std::fs::metadata(&self.db_path)?.len())
+    }
+
+    /// Return the WAL file size.
+    pub fn db_wal_size(&self) -> Result<u64, IndexStoreError> {
+        Ok(std::fs::metadata(format!("{}-wal", self.db_path.display()))
+            .map(|m| m.len())
+            .unwrap_or(0))
+    }
+
+    /// Return SQLite page_count and freelist_count.
+    pub fn db_page_stats(conn: &Connection) -> Result<(u64, u64), IndexStoreError> {
+        let page_count: u64 = conn.pragma_query_value(None, "page_count", |r| r.get(0))?;
+        let freelist: u64 = conn.pragma_query_value(None, "freelist_count", |r| r.get(0))?;
+        Ok((page_count, freelist))
+    }
+
+    // ── Internal helpers ─────────────────────────────────────────────
+
+    /// Read a single value from the meta table.
+    pub(super) fn read_meta_value(conn: &Connection, key: &str) -> Result<Option<String>, IndexStoreError> {
+        let mut stmt = conn.prepare_cached("SELECT value FROM meta WHERE key = ?1")?;
+        let mut rows = stmt.query_map(params![key], |row| row.get::<_, String>(0))?;
+        match rows.next() {
+            Some(Ok(val)) => Ok(Some(val)),
+            Some(Err(e)) => Err(e.into()),
+            None => Ok(None),
+        }
+    }
+}
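
Review note on sidecar paths: the file above derives the WAL/SHM names in two ways, `with_extension("db-wal")` in `delete_and_recreate` and `format!("{}-wal", …)` in the size helpers. SQLite itself names the sidecars by appending `-wal` / `-shm` to the database file name. The sketch below, using only `std::path`, shows that the two derivations agree when the file ends in `.db` and diverge otherwise. The helper names and example paths are hypothetical, and the hunk does not show which file name the app actually passes, so whether the divergence can occur in practice is an open assumption.

```rust
use std::path::{Path, PathBuf};

/// Sidecar path built by replacing the extension, as `delete_and_recreate` does.
fn wal_by_extension(db_path: &Path) -> PathBuf {
    db_path.with_extension("db-wal")
}

/// Sidecar path built by appending a suffix, as `db_file_size` does.
/// This matches SQLite's own naming (database file name + "-wal").
fn wal_by_suffix(db_path: &Path) -> PathBuf {
    PathBuf::from(format!("{}-wal", db_path.display()))
}
```

For `/tmp/index.db` both helpers yield `/tmp/index.db-wal`. For `/tmp/index.sqlite` the extension-based helper yields `/tmp/index.db-wal`, while the suffix-based one yields `/tmp/index.sqlite-wal`. Since the commit is a pure code move, this is an observation for a follow-up rather than something the commit changes.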

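
Review note on `read_scan_calibration`: its doc comment says missing or unparseable keys map to `None`. A minimal sketch of that rule follows, with a `HashMap` standing in for the `meta` table (an assumption made so the example runs without SQLite). The three field names come from the diff; everything else is illustrative.

```rust
use std::collections::HashMap;

/// Same three fields as the `ScanCalibration` built in the diff.
#[derive(Debug, PartialEq)]
struct ScanCalibration {
    total_entries: Option<u64>,
    total_physical_bytes: Option<u64>,
    scan_duration_ms: Option<u64>,
}

/// `meta` is a hypothetical in-memory stand-in for the SQLite `meta` table.
fn read_scan_calibration(meta: &HashMap<String, String>) -> ScanCalibration {
    // A missing key gives `None` from `get`; an unparseable value gives
    // `None` from `parse().ok()`. Neither case is treated as an error.
    let read_u64 = |key: &str| -> Option<u64> {
        meta.get(key).and_then(|v| v.parse::<u64>().ok())
    };
    ScanCalibration {
        total_entries: read_u64("total_entries"),
        total_physical_bytes: read_u64("total_physical_bytes"),
        scan_duration_ms: read_u64("scan_duration_ms"),
    }
}
```

In the real method only a failure of the query itself propagates as `IndexStoreError`. A bad or absent value quietly yields `None`, so a scan can still start without calibration data.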