Skip to content

Commit 77ea6e8

Browse files
committed
Speedup: Pipeline SMB scan-phase stats (Phase 4 Fix 4)
- Override `Volume::scan_for_copy_batch` on `SmbVolume` to fan out all per-path stats over one SMB session instead of N sequential RTTs. Each path gets a cloned `smb2::Connection` under a brief client-mutex acquire (cheap `Arc::clone` — all clones multiplex over the same TCP session), then drives `tree.stat` inside a `FuturesUnordered`. - Trait change: `scan_for_copy_batch` now returns `BatchScanResult { aggregate, per_path }`. The copy engine's `source_hints` map is populated from `per_path`, saving the old per-source re-stat. `MtpVolume` override updated; default impl preserves the serial per-path loop for local backends. - Single-path batches fall through to `scan_recursive` (regression guard for drag-drop-one-file). Empty root paths skip the stat and route into the directory-recursion list directly. - Wire trace measured at N=10 on the Tailscale link: all 10 stat compounds dispatched back-to-back before any response arrives, then 10 compound reads dispatched the same way. - Bench result (100 × 10 KB, QNAP → local, ~60 ms RTT): 6.11 s → 947 ms, a 6.5× wall-clock win. Per-file: 61 ms → 9.5 ms. Effective concurrency 156 / 9.5 ≈ 16×. - Docs: Updated `volume/CLAUDE.md` + `write_operations/CLAUDE.md` with the `scan_for_copy_batch` signature change and the SMB override's rationale. Appended a "Phase 4 Fix 4 results (2026-04-21)" section to `docs/notes/phase4-rtt-investigation.md` with the bench numbers, wire trace, and effective-concurrency analysis. - Tooling: Marked `env_logger` as a `#[cfg(test)] use ... as _;` shim in `lib.rs` so the dev-dep doesn't trigger `unused_crate_dependencies` when the bench's optional `try_init` is inactive.
1 parent f7823a0 commit 77ea6e8

10 files changed

Lines changed: 511 additions & 58 deletions

File tree

apps/desktop/src-tauri/src/file_system/volume/CLAUDE.md

Lines changed: 6 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -363,6 +363,12 @@ spawned detached task. This is safe because the stream always lives in an async
363363
**Gotcha**: SMB write streaming fallback still holds the client mutex for the whole upload
364364
**Why**: `SmbClient::create_file_writer` / `Tree::create_file_writer` both return `FileWriter<'a>` which borrows `&'a mut Connection`. We can't release the client mutex while the writer is alive, so the streaming (large-file) write path serializes concurrent writes on one `SmbVolume`. This is acceptable because the compound fast-path in `write_from_stream` handles every file ≤ `max_write_size` (typically 1 MB on QNAP, the SMB2 spec ceiling is 8 MB) without touching the client mutex for the actual write. Large files are rare in the hot copy path; if this ever becomes a bottleneck, the fix is a future `smb2` release that exposes a `FileWriter` built from a cloned `Connection` + `Arc<Tree>` (both owned inside the writer) rather than borrowing.
365365

366+
**Decision**: `SmbVolume` overrides `scan_for_copy_batch` to pipeline per-path stats over a single SMB session
367+
**Why**: The copy pipeline's scan phase used to loop `scan_for_copy` per top-level source — N sequential RTTs on the wire before the copy phase could even start. For a 100-file copy over a ~60 ms Tailscale link that's ~5 s of serial stats. Fix 4 overrides `scan_for_copy_batch` to clone `smb2::Connection` per path under a brief client-mutex acquire (cheap `Arc::clone` — all clones multiplex over the same SMB session), release the lock, then drive `tree.stat(&mut conn, path)` on each clone inside a `FuturesUnordered`. Empty root paths skip the stat. Single-path batches fall through to `scan_recursive` so one-file drag-drops don't pay the batch machinery cost. Directories found during the stat phase recurse sequentially afterward — parallel directory recursion is a future "Fix 5". Measured 6.5× wall-clock win at 100 × 10 KB: 6.11 s → 947 ms. See `docs/notes/phase4-rtt-investigation.md` for the wire trace.
368+
369+
**Decision**: `Volume::scan_for_copy_batch` returns `BatchScanResult { aggregate, per_path }` (changed in Phase 4 Fix 4)
370+
**Why**: The copy engine needs per-source type+size hints (`is_directory`, `total_bytes`) for its `source_hints` map, which seeds conflict detection and feeds the SMB compound fast-path's size hint. Pre-Fix-4 it paid N separate `scan_for_copy` calls to collect both aggregate stats and per-path info. Returning a `BatchScanResult` lets the batch scan surface both at once — one trait call, one round-trip to each backend. Scan-preview callers that only want the aggregate just read `.aggregate`. `LocalPosixVolume` and `InMemoryVolume` inherit the default (serial per-path loop, cheap); `MtpVolume` preserves its "group by parent dir" batch; `SmbVolume` overrides with the pipelined stat path.
371+
366372
**Decision**: `SmbVolume` has a compound fast-path in `open_read_stream_with_hint` and `write_from_stream` for files ≤ `max_read_size` / `max_write_size`
367373
**Why**: The streaming open+read+close sequence costs 3 RTTs per file. For small files (typical 10 KB copies on a NAS) that dominates wall-clock at high-latency links (~60 ms RTT → ~180 ms/file just for protocol overhead, not data). `smb2` already exposes `Tree::read_file_compound` (CREATE+READ+CLOSE in a single compound frame = 1 RTT) and `Tree::write_file_compound` (CREATE+WRITE+FLUSH+CLOSE = 1 RTT). The copy pipeline feeds per-file size hints from the pre-copy scan; when the size is known and fits in one READ/WRITE, we take the compound path. Falls back cleanly to the streaming reader/writer when the hint is missing or the file is too big. Small compound reads return a `Vec<u8>` wrapped as a single-chunk `InlineReadStream` so the consumer API stays shaped the same. See `docs/notes/phase4-rtt-investigation.md` for the measurement.
368374

apps/desktop/src-tauri/src/file_system/volume/in_memory_test.rs

Lines changed: 33 additions & 14 deletions
Original file line numberDiff line numberDiff line change
@@ -517,9 +517,10 @@ async fn test_scan_for_copy_batch_multiple_files_same_dir() {
517517
PathBuf::from("/photos/c.jpg"),
518518
];
519519
let result = volume.scan_for_copy_batch(&paths).await.unwrap();
520-
assert_eq!(result.file_count, 3);
521-
assert_eq!(result.dir_count, 0);
522-
assert_eq!(result.total_bytes, 600);
520+
assert_eq!(result.aggregate.file_count, 3);
521+
assert_eq!(result.aggregate.dir_count, 0);
522+
assert_eq!(result.aggregate.total_bytes, 600);
523+
assert_eq!(result.per_path.len(), 3);
523524
}
524525

525526
#[tokio::test]
@@ -538,18 +539,34 @@ async fn test_scan_for_copy_batch_mixed_files_and_dirs() {
538539

539540
let paths = vec![PathBuf::from("/stuff/readme.txt"), PathBuf::from("/stuff/subdir")];
540541
let result = volume.scan_for_copy_batch(&paths).await.unwrap();
541-
assert_eq!(result.file_count, 2); // readme.txt + deep.txt
542-
assert_eq!(result.dir_count, 0); // subdir's children don't include extra dirs
543-
assert_eq!(result.total_bytes, 55); // 5 + 50
542+
assert_eq!(result.aggregate.file_count, 2); // readme.txt + deep.txt
543+
assert_eq!(result.aggregate.dir_count, 0); // subdir's children don't include extra dirs
544+
assert_eq!(result.aggregate.total_bytes, 55); // 5 + 50
545+
assert_eq!(result.per_path.len(), 2);
546+
// The file entry should report top_level_is_directory=false; the dir one true.
547+
let readme = result
548+
.per_path
549+
.iter()
550+
.find(|(p, _)| p == Path::new("/stuff/readme.txt"))
551+
.unwrap();
552+
assert!(!readme.1.top_level_is_directory);
553+
assert_eq!(readme.1.total_bytes, 5);
554+
let subdir = result
555+
.per_path
556+
.iter()
557+
.find(|(p, _)| p == Path::new("/stuff/subdir"))
558+
.unwrap();
559+
assert!(subdir.1.top_level_is_directory);
544560
}
545561

546562
#[tokio::test]
547563
async fn test_scan_for_copy_batch_empty_input() {
548564
let volume = InMemoryVolume::new("Test");
549565
let result = volume.scan_for_copy_batch(&[]).await.unwrap();
550-
assert_eq!(result.file_count, 0);
551-
assert_eq!(result.dir_count, 0);
552-
assert_eq!(result.total_bytes, 0);
566+
assert_eq!(result.aggregate.file_count, 0);
567+
assert_eq!(result.aggregate.dir_count, 0);
568+
assert_eq!(result.aggregate.total_bytes, 0);
569+
assert!(result.per_path.is_empty());
553570
}
554571

555572
#[tokio::test]
@@ -563,9 +580,10 @@ async fn test_scan_for_copy_batch_single_item_matches_single_scan() {
563580
.scan_for_copy_batch(&[PathBuf::from("/docs/a.txt")])
564581
.await
565582
.unwrap();
566-
assert_eq!(single.file_count, batch.file_count);
567-
assert_eq!(single.dir_count, batch.dir_count);
568-
assert_eq!(single.total_bytes, batch.total_bytes);
583+
assert_eq!(single.file_count, batch.aggregate.file_count);
584+
assert_eq!(single.dir_count, batch.aggregate.dir_count);
585+
assert_eq!(single.total_bytes, batch.aggregate.total_bytes);
586+
assert_eq!(batch.per_path.len(), 1);
569587
}
570588

571589
#[tokio::test]
@@ -578,8 +596,9 @@ async fn test_scan_for_copy_batch_files_from_different_dirs() {
578596

579597
let paths = vec![PathBuf::from("/a/file1.txt"), PathBuf::from("/b/file2.txt")];
580598
let result = volume.scan_for_copy_batch(&paths).await.unwrap();
581-
assert_eq!(result.file_count, 2);
582-
assert_eq!(result.total_bytes, 30);
599+
assert_eq!(result.aggregate.file_count, 2);
600+
assert_eq!(result.aggregate.total_bytes, 30);
601+
assert_eq!(result.per_path.len(), 2);
583602
}
584603

585604
// ============================================================================

apps/desktop/src-tauri/src/file_system/volume/mod.rs

Lines changed: 36 additions & 10 deletions
Original file line numberDiff line numberDiff line change
@@ -72,6 +72,24 @@ pub struct CopyScanResult {
7272
pub top_level_is_directory: bool,
7373
}
7474

75+
/// Result of a batch scan over multiple source paths.
76+
///
77+
/// Returned by `Volume::scan_for_copy_batch`. Bundles the aggregate stats that
78+
/// the pre-flight / scan-preview callers want with a per-path breakdown that
79+
/// the copy engine uses to seed its `source_hints` map (without re-issuing N
80+
/// stat probes). `per_path[i].0` is the caller's input path verbatim; `.1`
81+
/// carries `top_level_is_directory` and `total_bytes` (for top-level files,
82+
/// that's the file size — used by the SMB compound fast-path).
83+
#[derive(Debug, Clone)]
84+
pub struct BatchScanResult {
85+
/// Aggregate stats across all input paths.
86+
pub aggregate: CopyScanResult,
87+
/// Per-input-path result, in the same order as the `paths` slice the
88+
/// caller passed in. Paths that failed to scan won't appear — on a
89+
/// per-path failure the method returns `Err` without partial data.
90+
pub per_path: Vec<(PathBuf, CopyScanResult)>,
91+
}
92+
7593
/// A conflict detected during pre-copy scanning: a source item that already exists at the destination.
7694
#[derive(Debug, Clone, Serialize, Deserialize)]
7795
#[serde(rename_all = "camelCase")]
@@ -497,33 +515,41 @@ pub trait Volume: Send + Sync {
497515
Box::pin(async { Err(VolumeError::NotSupported) })
498516
}
499517

500-
/// Scans multiple paths to get aggregate copy statistics.
518+
/// Scans multiple paths to get aggregate + per-path copy statistics.
501519
///
502520
/// The default iterates over `scan_for_copy` per path, which is correct for
503521
/// volumes where per-path I/O is cheap (local FS, in-memory). Volume types
504522
/// with expensive per-path I/O (MTP, SMB, FTP, S3) should override this to
505-
/// batch — typically by grouping paths by parent directory, listing each
506-
/// parent once, and resolving files from the listing.
523+
/// batch — typically by pipelining per-path stats over a shared session
524+
/// (SMB) or grouping paths by parent directory and listing each parent
525+
/// once (MTP).
526+
///
527+
/// The returned `BatchScanResult` carries both the rolled-up `aggregate`
528+
/// (what the scan-preview / pre-flight checks want) and a `per_path` vec
529+
/// (what the copy engine uses to seed its per-source hints, so it doesn't
530+
/// have to re-probe each source's type and size with a separate stat).
507531
fn scan_for_copy_batch<'a>(
508532
&'a self,
509533
paths: &'a [PathBuf],
510-
) -> Pin<Box<dyn Future<Output = Result<CopyScanResult, VolumeError>> + Send + 'a>> {
534+
) -> Pin<Box<dyn Future<Output = Result<BatchScanResult, VolumeError>> + Send + 'a>> {
511535
Box::pin(async move {
512-
let mut result = CopyScanResult {
536+
let mut aggregate = CopyScanResult {
513537
file_count: 0,
514538
dir_count: 0,
515539
total_bytes: 0,
516540
// Aggregate over multiple paths — meaningless for a batch.
517-
// Callers that need per-path type should use `scan_for_copy`.
541+
// Callers that need per-path type should read `per_path`.
518542
top_level_is_directory: false,
519543
};
544+
let mut per_path = Vec::with_capacity(paths.len());
520545
for path in paths {
521546
let scan = self.scan_for_copy(path).await?;
522-
result.file_count += scan.file_count;
523-
result.dir_count += scan.dir_count;
524-
result.total_bytes += scan.total_bytes;
547+
aggregate.file_count += scan.file_count;
548+
aggregate.dir_count += scan.dir_count;
549+
aggregate.total_bytes += scan.total_bytes;
550+
per_path.push((path.clone(), scan));
525551
}
526-
Ok(result)
552+
Ok(BatchScanResult { aggregate, per_path })
527553
})
528554
}
529555

apps/desktop/src-tauri/src/file_system/volume/mtp.rs

Lines changed: 39 additions & 14 deletions
Original file line numberDiff line numberDiff line change
@@ -4,7 +4,8 @@
44
//! the standard file listing pipeline (same icons, sorting, view modes as local files).
55
66
use super::{
7-
CopyScanResult, MutationEvent, ScanConflict, SourceItemInfo, SpaceInfo, Volume, VolumeError, VolumeReadStream,
7+
BatchScanResult, CopyScanResult, MutationEvent, ScanConflict, SourceItemInfo, SpaceInfo, Volume, VolumeError,
8+
VolumeReadStream,
89
};
910
use crate::file_system::listing::FileEntry;
1011
use crate::mtp::connection::{MtpConnectionError, connection_manager};
@@ -461,14 +462,17 @@ impl Volume for MtpVolume {
461462
fn scan_for_copy_batch<'a>(
462463
&'a self,
463464
paths: &'a [PathBuf],
464-
) -> Pin<Box<dyn Future<Output = Result<CopyScanResult, VolumeError>> + Send + 'a>> {
465+
) -> Pin<Box<dyn Future<Output = Result<BatchScanResult, VolumeError>> + Send + 'a>> {
465466
Box::pin(async move {
466467
if paths.is_empty() {
467-
return Ok(CopyScanResult {
468-
file_count: 0,
469-
dir_count: 0,
470-
total_bytes: 0,
471-
top_level_is_directory: false,
468+
return Ok(BatchScanResult {
469+
aggregate: CopyScanResult {
470+
file_count: 0,
471+
dir_count: 0,
472+
total_bytes: 0,
473+
top_level_is_directory: false,
474+
},
475+
per_path: Vec::new(),
472476
});
473477
}
474478

@@ -487,7 +491,12 @@ impl Volume for MtpVolume {
487491
by_parent.len()
488492
);
489493

490-
let mut result = CopyScanResult {
494+
// Stage per-path results in a map so the final per_path vec
495+
// preserves the caller's input order.
496+
let mut per_path_results: std::collections::HashMap<PathBuf, CopyScanResult> =
497+
std::collections::HashMap::with_capacity(paths.len());
498+
499+
let mut aggregate = CopyScanResult {
491500
file_count: 0,
492501
dir_count: 0,
493502
total_bytes: 0,
@@ -507,18 +516,34 @@ impl Volume for MtpVolume {
507516
if let Some(entry) = entries.iter().find(|e| e.name == name) {
508517
if entry.is_directory {
509518
let scan = self.scan_for_copy(child_path).await?;
510-
result.file_count += scan.file_count;
511-
result.dir_count += scan.dir_count;
512-
result.total_bytes += scan.total_bytes;
519+
aggregate.file_count += scan.file_count;
520+
aggregate.dir_count += scan.dir_count;
521+
aggregate.total_bytes += scan.total_bytes;
522+
per_path_results.insert((*child_path).clone(), scan);
513523
} else {
514-
result.file_count += 1;
515-
result.total_bytes += entry.size.unwrap_or(0);
524+
let size = entry.size.unwrap_or(0);
525+
aggregate.file_count += 1;
526+
aggregate.total_bytes += size;
527+
per_path_results.insert(
528+
(*child_path).clone(),
529+
CopyScanResult {
530+
file_count: 1,
531+
dir_count: 0,
532+
total_bytes: size,
533+
top_level_is_directory: false,
534+
},
535+
);
516536
}
517537
}
518538
}
519539
}
520540

521-
Ok(result)
541+
let per_path = paths
542+
.iter()
543+
.filter_map(|p| per_path_results.remove(p).map(|r| (p.clone(), r)))
544+
.collect();
545+
546+
Ok(BatchScanResult { aggregate, per_path })
522547
})
523548
}
524549

0 commit comments

Comments
 (0)