
Commit 03215d2

File ops: real scan progress + hardlink dedup
- `DeleteDialog` and `TransferProgressDialog` scan phase shows source path, running tallies, throughput, current directory, and a real progress bar capped at 100% when the drive index covers all sources ("X% of estimated"). Falls back to tallies-only when the index doesn't cover the sources (MTP, SMB, not-yet-indexed local).
- Scan titles named per op: "Verifying before copy...", "Counting items to delete...", etc.
- `walk_dir_recursive` dedupes hardlinks by inode (Unix). Mirrors `indexing/scanner.rs`'s pattern: fast path for `nlink == 1`, per-operation `HashSet<u64>` shared across source roots. Fixes `target/debug` showing 70 GB during the live scan when the indexer reports 49 GB - they now agree.
- New `indexing::expected_totals` reads per-source `recursive_file_count` + `recursive_logical_size` from `dir_stats` via the lock-free `ReadPool`. Returns `None` if any source isn't covered.
- `WriteProgressEvent::new` + `with_scan_meta` builder keeps the 20+ active-phase emit sites unchanged; the scan emit sites set `current_dir`, `expected_files_total`, `expected_bytes_total` via the builder.
- FE `ScanThroughput` computes `files/s` and `bytes/s` from tally deltas with a 2 s rolling window.
- 13 new tests covering the walker (current_dir + hardlink dedup variants), the event builder, expected-totals lookup edge cases, and the throughput helper.
1 parent 3112801 commit 03215d2

14 files changed

Lines changed: 1012 additions & 46 deletions


apps/desktop/src-tauri/src/file_system/write_operations/CLAUDE.md

Lines changed: 8 additions & 2 deletions
@@ -16,8 +16,8 @@ network mounts, cross-filesystem moves, and name/path length limits.
1616
| `types.rs` | All serializable types: events, config, errors, results. `WriteOperationConfig`, `ConflictResolution`, `WriteOperationError`, `DryRunResult`, scan preview events. Also: `OperationEventSink` trait (decouples event emission from `tauri::AppHandle`), `TauriEventSink` (production), `CollectorEventSink` (test-only). |
1717
| `state.rs` | Two `LazyLock<RwLock<HashMap>>` caches (`WRITE_OPERATION_STATE`, `OPERATION_STATUS_CACHE`). `WriteOperationState`, `CopyTransaction`, `ScanResult`, `FileInfo`. |
1818
| `helpers.rs` | Validation (`validate_sources`, `validate_destination_writable` via `libc::access`, `validate_disk_space` via `statvfs`). Conflict resolution (`tokio::sync::oneshot` channel wait for Stop mode). `safe_overwrite_file`/`safe_overwrite_dir` (temp+rename). `find_unique_name`. `run_cancellable`. `is_same_filesystem` (device IDs). Background cleanup helpers: `remove_file_in_background`, `remove_dir_all_in_background`. |
19-
| `scan.rs` | `scan_sources` (recursive walk, emits progress), `dry_run_scan`, shared `walk_dir_recursive` walker. |
20-
| `scan_preview.rs` | Scan preview subsystem for Copy dialog live stats: `start_scan_preview`, `cancel_scan_preview`, `is_scan_preview_complete`. Background scans (local and volume-based) with result caching. |
19+
| `scan.rs` | `scan_sources` (recursive walk, emits progress), `dry_run_scan`, shared `walk_dir_recursive` walker. The `on_progress` callback receives `(files, dirs, bytes, current_file, current_dir)`; the walker reads `current_dir` from `path.parent()` so the UI can show "in directory: …" alongside the filename. Scan emit sites populate `WriteProgressEvent.current_dir` plus index-derived `expected_files_total` / `expected_bytes_total` (via `WriteProgressEvent::with_scan_meta`) so the frontend renders a real progress bar during the foolproof re-scan. Expected totals come from `crate::indexing::expected_totals::expected_totals_for_sources`, which returns `None` when the index doesn't cover all sources; the FE then falls back to a tally-only display. |
20+
| `scan_preview.rs` | Scan preview subsystem for Copy dialog live stats: `start_scan_preview`, `cancel_scan_preview`, `is_scan_preview_complete`. Background scans (local and volume-based) with result caching. Emits `expected_files_total` / `expected_bytes_total` (sampled once at scan start from the drive index) on every `scan-preview-progress` event, alongside the running tallies and `current_dir`. |
2121
| `copy.rs` | `copy_files_with_progress`: scan → disk space check → per-file copy via `copy_single_item`. `CopyTransaction` for rollback. |
2222
| `move_op.rs` | Same-fs: `fs::rename`. Cross-fs: copy to `.cmdr-staging-<uuid>`, atomic rename, delete sources. |
2323
| `delete.rs` | Scan, delete files first, then directories in reverse/deepest-first order. Not rollbackable. Also contains `delete_volume_files_with_progress` for non-local volumes (MTP): scans via `volume.list_directory()`, deletes via `volume.delete()` per item. |
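
The "all sources covered or `None`" rule and the capped "X% of estimated" bar described in the `scan.rs` row can be sketched roughly as follows. This is an illustration only: `SourceTotals`, `combine_expected_totals`, and `percent_of_estimated` are hypothetical names, not the real `expected_totals_for_sources` API, and the percentage is presumably computed on the frontend rather than in Rust. Treating an expected total of zero as "no bar" is an assumption of this sketch.

```rust
/// Hypothetical per-source totals, standing in for what the index reports
/// (`recursive_file_count` and `recursive_logical_size`).
#[derive(Debug, Clone, Copy, PartialEq)]
struct SourceTotals {
    files: u64,
    bytes: u64,
}

/// Sums the per-source totals. Returns `None` as soon as one source is not
/// covered by the index, so the caller can fall back to tallies-only.
/// An empty slice yields `Some` with zero totals.
fn combine_expected_totals(per_source: &[Option<SourceTotals>]) -> Option<SourceTotals> {
    per_source
        .iter()
        .try_fold(SourceTotals { files: 0, bytes: 0 }, |acc, source| {
            let s = (*source)?; // an uncovered source short-circuits to None
            Some(SourceTotals {
                files: acc.files + s.files,
                bytes: acc.bytes + s.bytes,
            })
        })
}

/// Percentage of the estimate scanned so far, capped at 100.
/// Returns `None` when there is no usable estimate (assumption: a zero
/// estimate is treated the same as a missing one).
fn percent_of_estimated(scanned_bytes: u64, expected_bytes_total: Option<u64>) -> Option<u8> {
    let expected = expected_bytes_total?;
    if expected == 0 {
        return None;
    }
    // u128 avoids overflow when multiplying large byte counts by 100.
    let pct = (scanned_bytes as u128 * 100 / expected as u128).min(100);
    Some(pct as u8)
}
```

The cap matters when the live tally overshoots the index estimate, for example when files were added after the index was last refreshed.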
@@ -206,6 +206,12 @@ exits, partial files or staging directories may remain on disk. These use the `.
206206

207207
## Key decisions
208208

209+
**Decision**: `walk_dir_recursive` dedupes hardlinks by inode when summing `total_bytes`.
210+
**Why**: A naïve `*total_bytes += metadata.len()` per direntry over-counts on hardlink-heavy trees (cargo `target/`, sccache caches, deduplicated backups). Without dedup, a 49 GB `target/debug` reported 70+ GB to the scan UI, and the "X% of estimated" progress bar (denominator from the indexer's `dir_stats`, which already inode-dedupes) couldn't converge to 100%. Mirrors `indexing/scanner.rs`'s `seen_inodes: HashSet<u64>` pattern, with the same `nlink == 1` fast path. The set is operation-scoped (shared across all source roots in one scan, dropped when the scan ends) — so hardlinks crossing source roots still count once. **Unix-only**: `std::fs::Metadata` has no `nlink()` accessor outside Unix; non-Unix falls back to the old naïve sum. Doesn't apply to `dry_run_scan_recursive` (that path reports for conflict counts, not for a progress denominator).
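
A minimal sketch of the dedup rule, assuming a simplified walker. `bytes_to_count` and `sum_tree` are hypothetical names for illustration; the real `walk_dir_recursive` also tracks file and directory counts and reports progress, which is omitted here.

```rust
use std::collections::HashSet;
use std::fs;
use std::io;
use std::path::Path;

/// Returns how many bytes this entry should add to the running total.
/// `seen_inodes` is meant to be created once per operation and shared
/// across all source roots, so a hardlink spanning two roots counts once.
fn bytes_to_count(seen_inodes: &mut HashSet<u64>, ino: u64, nlink: u64, len: u64) -> u64 {
    // Fast path: a file with a single link cannot be double-counted,
    // so it never touches the set.
    if nlink == 1 {
        return len;
    }
    // `insert` returns false when the inode is already in the set.
    if seen_inodes.insert(ino) {
        len
    } else {
        0
    }
}

/// Hypothetical recursive walker (not the real `walk_dir_recursive`).
/// Unix-only because `ino()` and `nlink()` come from `MetadataExt`.
#[cfg(unix)]
fn sum_tree(dir: &Path, seen_inodes: &mut HashSet<u64>, total_bytes: &mut u64) -> io::Result<()> {
    use std::os::unix::fs::MetadataExt;

    for entry in fs::read_dir(dir)? {
        let path = entry?.path();
        // `symlink_metadata` does not follow symlinks, so links are skipped below.
        let meta = fs::symlink_metadata(&path)?;
        if meta.is_dir() {
            sum_tree(&path, seen_inodes, total_bytes)?;
        } else if meta.is_file() {
            *total_bytes += bytes_to_count(seen_inodes, meta.ino(), meta.nlink(), meta.len());
        }
    }
    Ok(())
}
```

Calling `sum_tree` once per source root with the same `HashSet` gives the operation-scoped behaviour described above; dropping the set at the end of the scan releases it.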
211+
212+
**Decision**: `WriteProgressEvent::with_scan_meta` is the only path that sets the scan-only fields (`current_dir`, `expected_files_total`, `expected_bytes_total`).
213+
**Why**: 20+ emit sites construct `WriteProgressEvent` literals for active-phase events. Adding three optional fields to the struct would force every site to write `current_dir: None, expected_files_total: None, expected_bytes_total: None,` — pure mechanical noise. The `new(...)` constructor takes the eight core counter fields and defaults the scan meta to `None`; the scan emit sites in `scan.rs` and `scan_preview.rs` opt in via `.with_scan_meta(...)`. Future scan-related fields go through the same builder. If a real refactor of the 20 literals to `new(...)` ever happens, the builder pattern still composes cleanly on top.
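
A reduced sketch of the constructor-plus-builder shape, under stated assumptions: the real struct has eight core counters that this note doesn't list, so `files_done` and `bytes_done` are stand-ins, and the exact parameter list of `with_scan_meta` is assumed.

```rust
#[derive(Debug, Clone, PartialEq)]
pub struct WriteProgressEvent {
    // Stand-ins for the eight core counter fields (names are hypothetical).
    pub files_done: u64,
    pub bytes_done: u64,
    // Scan-only metadata; stays `None` for active-phase events.
    pub current_dir: Option<String>,
    pub expected_files_total: Option<u64>,
    pub expected_bytes_total: Option<u64>,
}

impl WriteProgressEvent {
    /// Takes the core counters and defaults all scan metadata to `None`.
    pub fn new(files_done: u64, bytes_done: u64) -> Self {
        Self {
            files_done,
            bytes_done,
            current_dir: None,
            expected_files_total: None,
            expected_bytes_total: None,
        }
    }

    /// Opt-in builder used only by scan emit sites (assumed signature).
    pub fn with_scan_meta(
        mut self,
        current_dir: Option<String>,
        expected_files_total: Option<u64>,
        expected_bytes_total: Option<u64>,
    ) -> Self {
        self.current_dir = current_dir;
        self.expected_files_total = expected_files_total;
        self.expected_bytes_total = expected_bytes_total;
        self
    }
}
```

An active-phase site would call `WriteProgressEvent::new(...)` alone, while a scan site chains `.with_scan_meta(...)` onto it, so a later scan-only field only changes the builder and its callers.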
214+
209215
**Decision**: `copy_volumes_with_progress` scan phase calls `scan_for_copy_batch` once instead of `scan_for_copy` per source (Phase 4 Fix 4)
210216
**Why**: Network-backed volumes (SMB) pay 1 RTT per top-level source in the scan phase. Looping over sources made that serial — for 100 tiny files at ~60 ms RTT, ~5 s of pure stat latency before the copy phase started. `scan_for_copy_batch` surfaces both the aggregate (file/dir counts, total bytes) and a per-path vec (is_directory, size) in a single trait call; the copy engine folds the per-path vec into its `source_hints` map and skips the old per-source re-stat. `SmbVolume` overrides `scan_for_copy_batch` to pipeline N stats over one SMB session — measured 6.5× wall-clock win at 100 files (6.11 s → 947 ms) on a Tailscale link. `LocalPosixVolume` / `InMemoryVolume` inherit the default serial per-path loop; it's cheap for them. See `docs/notes/phase4-rtt-investigation.md`.
211217
