Skip to content

Commit ba20ca3

Browse files
committed
Write ops: oracle-aware scan walker + scan-preview reuse (M2a)
- Add `scan_subtree_with_oracle` in `scan.rs`: oracle-first recursion that falls through to `volume.list_directory` on miss, skips recursion into symlinked dirs, respects cancellation. - Add `walk_cached_entries` helper inside `walk_dir_recursive`: when `volume_id` is passed and the oracle reports a watcher-backed listing, hydrate that level from cache instead of `fs::read_dir`. Local-FS scan paths now thread `Some("root")` through so an open pane's FSEvents-watched listing short-circuits the disk read. - Rewrite `run_volume_scan_preview` around `run_oracle_aware_batch_scan`: group input sources by parent dir, check the oracle per parent. On hit, build the per-path `BatchScanResult` slice from cached entries (size + `is_directory` for top-level files; recurse via `scan_subtree_with_oracle` for top-level dirs). On miss, fall through to the existing `volume.scan_for_copy_batch_with_progress` so MTP's parent-grouping and SMB's pipelined-stat optimizations still run for cold-cache parents. - Thread `source_volume_id: String` through `start_scan_preview` and the Tauri command wrapper. - Add 5 integration tests in a sibling `scan_preview_oracle_tests.rs` (volume_copy_tests.rs was already past its file-length allowlist): oracle hit skips `list_directory`, watcher-dead falls through, subfolder cached in another pane is reused, cached symlinked dirs aren't recursed into, listing closed mid-walk falls through to a real list. - MTP `scan_for_copy_batch_with_progress` and SMB scan override left unchanged: M2a only adds an oracle short-circuit before the volume call. - Docs: Decision entry "Scan preview reuses watched listings" in `write_operations/CLAUDE.md`, plus two Gotchas (hardlink dedup across the cache/walk boundary, volume-disconnect mid-walk race). `volume/CLAUDE.md` Tier 3 now mentions `listing_is_watched` and what it gates.
1 parent 9d43463 commit ba20ca3

7 files changed

Lines changed: 859 additions & 17 deletions

File tree

apps/desktop/src-tauri/src/commands/file_system/write_ops.rs

Lines changed: 9 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -301,7 +301,15 @@ pub async fn start_scan_preview(
301301
};
302302

303303
let progress_interval = progress_interval_ms.unwrap_or(500);
304-
ops_start_scan_preview(app, sources, source_volume, sort_column, sort_order, progress_interval)
304+
ops_start_scan_preview(
305+
app,
306+
sources,
307+
source_volume,
308+
volume_id,
309+
sort_column,
310+
sort_order,
311+
progress_interval,
312+
)
305313
}
306314

307315
#[tauri::command]

apps/desktop/src-tauri/src/file_system/volume/CLAUDE.md

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -84,6 +84,7 @@ Everything below is optional per the trait (methods default to `Err(NotSupported
8484
### Tier 3: integrate with the wider app (optional, but mostly expected)
8585

8686
- [ ] If the backend has its own change-notification channel, set `supports_watching() = true` and implement a watcher task that calls `notify_directory_changed` when things move. If you rely on the OS mount's FSEvents (like SmbVolume currently does), leave it `false`.
87+
- [ ] Implement `listing_is_watched(path)` if your backend can answer "is the cached listing for this path being kept in sync by a live watcher?" cheaply. Returning `true` from this gate opts the volume into the fresh-listing oracle: write-op pre-flight scans (copy/move scan preview) reuse cached entries from `LISTING_CACHE` for any path your volume reports as watched, skipping the redundant `list_directory` round-trip. Default `false` is the safe choice — without a real watcher, the cache may be arbitrarily stale. Path-level (LocalPosixVolume) is the most accurate signal; volume-level (MTP "device connected", SMB "Direct + watcher running") is fine when the underlying notification channel is volume-wide. Be honest about the per-backend debounce window in the doc comment; see `try_get_watched_listing` for the freshness contract.
8788
- [ ] If `std::fs` operations work on the volume's paths (you're a local FS with extra flavor), leave `supports_local_fs_access()` at the default `true`. Otherwise override to `false` so the legacy synthetic-diff path is skipped.
8889
- [ ] If `std::fs::copy` can target this volume's paths directly, return `Some(root)` from `local_path()`. The copy path will prefer `copyfile(3)` / `copy_file_range(2)` for same-device copies. Otherwise return `None` (the default).
8990
- [ ] Override `space_poll_interval()` to whatever polling cadence your backend can afford (local 2 s, network 5 s, none = don't poll).

apps/desktop/src-tauri/src/file_system/write_operations/CLAUDE.md

Lines changed: 9 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -230,6 +230,9 @@ exits, partial files or staging directories may remain on disk. These use the `.
230230
**Decision**: `copy_files_with_progress_inner` aligns `scan_result.files` to the driver's `&[PathBuf]` API via a paired `Vec<&FileInfo>` and a closure-captured `slice::Iter` advanced in lock-step with the driver iteration.
231231
**Why**: The sync driver iterates a generic `&[PathBuf]`, but the local-FS copy loop needs the full `FileInfo` (for `dest_path`, `is_symlink`, `size`, and the `SourceItemTracker` key). Three alternatives were rejected: (a) indexing into `scan_result.files` by `ctx.files_done_so_far` — wrong, the cumulative counter is bytes-affecting and includes bulk-skipped files, so the index would shift; (b) extending `TransferContext` with a generic associated payload — couples the driver to local-FS specifics; (c) cloning the `FileInfo` slice for `sources` — copies on the hot path. The iterator approach is O(0) memory beyond the path vec and matches the driver's iteration order exactly (`pre_skip_paths` is empty because we pre-filter `scan_result.files` ourselves, so the driver invokes the closure once per surviving file). The `.expect()` is justified inline; if the driver ever stopped calling the closure once per source the test suite would break.
232232

233+
**Decision**: Scan preview reuses watched listings (the "fresh-listing oracle").
234+
**Why**: Pre-flight scans for copy/move on MTP (and to a lesser degree SMB and big local trees) used to duplicate work the backend already had in `LISTING_CACHE`. Selecting 135 photos in a watched `/DCIM/Camera` (~15k entries) and pressing F5 would re-list the parent dir over USB just to look up size by name — ~17 s of "Verifying before copy…" while the listing was already fresh on the pane behind the dialog. `run_volume_scan_preview` now groups input sources by parent dir and consults `try_get_watched_listing(volume_id, parent)` first. On hit, sizes and `is_directory` flags come from the cached `FileEntry` for top-level files; top-level directories recurse via `scan_subtree_with_oracle`, which re-applies the oracle at every level (so a subfolder open in another pane also short-circuits). On miss, the call falls through to `volume.scan_for_copy_batch_with_progress(paths_in_group, ...)` — same code path as before — so MTP's parent-grouping and SMB's pipelined-stat optimizations still run for cold-cache parents. The local-FS walker (`walk_dir_recursive` in `scan.rs`) also takes an oracle check at the top of each recursive call, with `volume_id = "root"` plumbed through from `scan_sources_internal` and `run_scan_preview`. The freshness contract is bright-line at the watcher boundary: no "5 seconds is fresh enough" TTL, just "the volume's `listing_is_watched(path)` returned true." See `file_system/listing/caching.rs::try_get_watched_listing` for the per-backend debounce windows that contract tolerates.
235+
233236
**Decision**: `delete_files_start` routes to either `delete_files_with_progress` (local, uses `walkdir` + `fs::remove_file`) or `delete_volume_files_with_progress` (non-local, uses `Volume` trait) based on `volume_id`.
234237
**Why**: MTP volumes can't use `walkdir` or `fs::remove_*`. Rather than refactoring the existing local delete to go through the Volume trait (which would add overhead for local ops), we keep the fast local path and add a parallel volume-aware path. Both emit identical events so the frontend progress dialog works unchanged.
235238

@@ -262,6 +265,12 @@ and `statfs.f_fstypename` for APFS. See `copy_strategy.rs` for the implementatio
262265
**Gotcha**: On macOS, never use `statvfs` alone for disk space checks; use `NSURLVolumeAvailableCapacityForImportantUsageKey`
263266
**Why**: `statvfs` reports only physically free blocks. On APFS, purgeable space (iCloud caches, APFS snapshots) can account for tens of GB that macOS will reclaim on demand. Using `statvfs` causes the "insufficient space" error to reject copies that would actually succeed, and shows a different available-space number than the status bar (which uses the NSURL API). `validate_disk_space` in `helpers.rs` calls `crate::volumes::get_volume_space()` on macOS and falls back to `statvfs` on Linux.
264267

268+
**Gotcha**: Hardlink dedup doesn't straddle the oracle/walk boundary.
269+
**Why**: `walk_dir_recursive` dedupes hardlinks by inode for `total_bytes` (so a hardlink-heavy tree like cargo's `target/` reports correct bytes-to-free). `FileEntry` doesn't carry inode, so when the oracle supplies one half of a hardlink pair from the cached listing and a real walk supplies the other half, the dedup misses and bytes get over-counted. Direction is safe: over-counting → pessimistic ETA + conservative disk-space reject, never the other way. The walker's existing `walker_dedupes_hardlinks_by_inode` test still pins the same-side dedup. If true cross-boundary dedup ever becomes worth it, add `inode: Option<u64>` to `FileEntry`; not in this milestone.
270+
271+
**Gotcha**: Volume disconnect mid-walk races with the oracle.
272+
**Why**: The oracle returns `Some(entries)` when `listing_is_watched` is true at the moment of the check. Between that read and the recursive walk consuming the entries (and then issuing real `list_directory` calls for any sub-subfolders that aren't cached), the watcher can die (cable yanked, network drop). The synthesized totals for the cached level are correct — they reflect what the listing held — but recursion into now-disconnected sub-subfolders fails per-call, and the per-file copy/delete later then hits `DeviceDisconnected`-shaped errors instead of a single "device gone" message at the scan level. Same race that `scan_for_copy_batch` already had; the oracle doesn't widen it. Documented here so future investigation knows where to look.
273+
265274
**Gotcha**: Skip-All on volume copy/move with a top-level dir conflict still skips the entire dir subtree, even after the local-FS bulk-skip fix.
266275
**Why**: `build_pre_skip_set` now excludes top-level directories so non-conflicting children inside a conflicting dir get a chance to copy. For LOCAL copy this works because `copy_files_with_progress_inner` flattens dirs to per-file `FileInfo` entries pre-loop, and per-iter conflict resolution then evaluates each child individually. For VOLUME copy/move, the driver iterates top-level paths directly, and `resolve_volume_conflict` returns `Ok(None)` (= Skip) for ANY dir-vs-dir conflict under Skip mode without recursing — so the whole subtree is still dropped. Fixing this requires teaching `resolve_volume_conflict` (or the volume-side closure) to recurse-and-merge for dir conflicts under Skip, the same way `apply_volume_conflict_resolution` already does for Overwrite. Pinned by the Playwright spec `conflict-copy.spec.ts::Copy with Skip All preserves destination files` (local-FS path, currently green) — a volume-side equivalent test would catch the residual.
267276

apps/desktop/src-tauri/src/file_system/write_operations/mod.rs

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -368,6 +368,8 @@ mod delete_integration_test;
368368
#[cfg(test)]
369369
mod move_integration_test;
370370
#[cfg(test)]
371+
mod scan_preview_oracle_tests;
372+
#[cfg(test)]
371373
mod tests;
372374
#[cfg(test)]
373375
mod transaction_integration_test;

0 commit comments

Comments
 (0)