Skip to content

Commit 432d13f

Browse files
committed
SMB watcher: bump smb2 to 0.10, drop in-watcher reconnect, keep dedicated session
`smb2 = "0.10.0"`. `Watcher` is now `'static` (owns a `Connection` clone) and keeps one CHANGE_NOTIFY pre-issued on the wire, closing the response→re-arm loss window that drops events between consecutive `next_events()` calls. In `smb_watcher.rs`: - Drop the internal reconnect-with-backoff loop. On any `next_events` error other than `NOTIFY_ENUM_DIR`, the task returns. `SmbVolume::attempt_reconnect` is now the single source of truth for session recovery — the FE backoff cycle calls it on the next mutation that hits the dead session, and it respawns the watcher. Removes the divergence risk between the two reconnect state machines (the watcher's used to swallow real disconnections the FE reconnect manager would have surfaced). - Keep the watcher on its **own** dedicated smb2 session (separate TCP from the volume's primary client). Tried sharing the volume's session via `clone_session`; empirically that wedges Samba when the watcher's CHANGE_NOTIFY long-polls multiplex with heavy concurrent writes on the same TCP — `smb_integration_concurrent_streaming_writes_no_deadlock` against `smb-consumer-maxreadsize` (64 KB max read/write, 8 concurrent × 200 × 1 MB) went from 1.8 s to 120 s timeout. The dedicated session matches the pre-0.10 isolation; the trade-off is one extra TCP+auth per share, which is fine. `spawn_watcher` keeps its `(addr, share, username, password, …)` signature for the dedicated session. CLAUDE.md updated with the empirical "shared session wedges Samba under load" rationale so future agents don't try the simplification again without re-running the deadlock test. Tested: full `./scripts/check.sh` (1983 unit + 32 integration + every linter), plus `smb_integration_concurrent_streaming_writes_no_deadlock` × 5 (all pass, 1.8–3.3 s) and `smb_integration_listing_is_watched_flips_with_connection` + both `smb_integration_attempt_reconnect_*` tests.
1 parent d00ba5b commit 432d13f

5 files changed

Lines changed: 199 additions & 231 deletions

File tree

Cargo.lock

Lines changed: 15 additions & 15 deletions
Some generated files are not rendered by default. Learn more about customizing how changed files appear on GitHub.

apps/desktop/src-tauri/Cargo.toml

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -170,7 +170,7 @@ mdns-sd = { version = "0.19", features = ["logging"] }
170170
# SMB2/3 protocol client for share enumeration (pure Rust, pipelined I/O).
171171
# When bumping: also re-vendor the test containers per
172172
# apps/desktop/test/smb-servers/.compose/VENDORED.md
173-
smb2 = "0.9.1"
173+
smb2 = "0.10.0"
174174
# NFD normalization for APFS collation and SMB path normalization
175175
unicode-normalization = "0.1"
176176

apps/desktop/src-tauri/src/file_system/volume/CLAUDE.md

Lines changed: 7 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -17,7 +17,7 @@ Every file system operation (listing, copy, rename, delete, indexing, watching)
1717
| `local_posix.rs` | `LocalPosixVolume`: real filesystem; delegates listing to `file_system::listing`, indexing to `indexing::scanner`, watching to `indexing::watcher` (FSEvents), copy scanning via `walkdir`. Uses `libc::statvfs` FFI for space info. |
1818
| `mtp.rs` | `MtpVolume`: MTP device storage; async `Volume` trait with direct async MTP calls. Uses `MtpReadStream` for streaming (calls `FileDownload::next_chunk().await` directly). Gated with `#[cfg(any(target_os = "macos", target_os = "linux"))]`. |
1919
| `smb.rs` | `SmbVolume`: SMB share storage; async `Volume` trait with direct async smb2 calls. Splits session storage into `Arc<Mutex<Option<SmbClient>>>` + `Arc<RwLock<Option<Arc<Tree>>>>` so the hot read/write paths can clone `Connection` under a brief lock and drive compound / download ops without serializing on the client mutex. `AtomicU8` connection state. Caches `SmbConnectionParams` (host, share, port, credentials) so `attempt_reconnect` can rebuild the session in place after a transient disconnect, single-flighted via `reconnect_lock`. Holds a global `AppHandle` (`set_app_handle` in `lib.rs::setup`) for emitting `smb-connection-changed` events. Also contains `connect_smb_volume()`. Gated with `#[cfg(any(target_os = "macos", target_os = "linux"))]`. |
20-
| `smb_watcher.rs` | Background SMB change watcher (`run_smb_watcher`). Owns a dedicated smb2 connection for `CHANGE_NOTIFY`, debounces events, feeds `notify_directory_changed`. Spawned by `connect_smb_volume()`. |
20+
| `smb_watcher.rs` | Background SMB change watcher (`run_smb_watcher`). Owns a dedicated smb2 session (separate TCP connection from the volume's primary client) and uses smb2 0.10's `'static` `Watcher` with pipelined CHANGE_NOTIFY (one request kept pre-issued on the wire so events arriving during consumer processing don't fall in a re-arm gap). Debounces events, feeds `notify_directory_changed`. Spawned by `connect_smb_volume()` and respawned by `attempt_reconnect`. No internal reconnect — bails on `next_events` errors and lets `attempt_reconnect` handle session recovery. |
2121
| `in_memory.rs` | `InMemoryVolume`: `RwLock<HashMap>` store for tests; also used for stress tests (`with_file_count`) |
2222

2323
## Architecture
@@ -436,11 +436,14 @@ spawned detached task. This is safe because the stream always lives in an async
436436
**Decision**: `on_unmount()` trait method instead of `Any` downcasting
437437
**Why**: Avoids runtime type checking, extensible for future volume types (S3, FTP might also need cleanup), consistent with the trait's design of optional methods with default no-ops.
438438

439-
**Decision**: SmbVolume background watcher uses a dedicated smb2 connection, not the main one
440-
**Why**: `smb2::Watcher<'a>` borrows `&'a mut Connection` for its lifetime (long-poll blocks until server reports changes). Using the main client would block all file operations. The watcher task owns its own `SmbClient` + `Tree`, and stats new/modified files through the main client via `VolumeManager::get(volume_id)`.
439+
**Decision**: SmbVolume background watcher runs on a dedicated smb2 session, not a clone of the volume's main connection
440+
**Why**: smb2 0.10 made `Watcher` `'static` (owns a `Connection` clone), so technically the watcher could share the volume's session via `clone_session`. Empirically it can't: stacking the watcher's CHANGE_NOTIFY long-polls on the same TCP session as heavy concurrent writes wedges Samba — `smb_integration_concurrent_streaming_writes_no_deadlock` hangs against `smb-consumer-maxreadsize` (64 KB max read/write, 8 concurrent writers, 200 × 1 MB files). The dedicated session keeps the watcher's traffic out of the writers' way at the cost of a separate TCP+auth, which is the same shape we had pre-0.10. What we *do* keep from the new API: the watcher is now `'static` (no borrow on the watcher task's `client`), and the pipelining (one CHANGE_NOTIFY pre-issued so events during consumer processing don't fall in a re-arm gap). Stat calls for new/modified files still go through `VolumeManager::get(volume_id).get_metadata(...)` (the main session), so the cmdr-side `notify_mutation` cache patch from our own writes lands first regardless.
441441

442442
**Decision**: Watcher task is not stored on `SmbVolume`, only the cancel sender is
443-
**Why**: `Watcher<'a>` borrows `&'a mut Connection`. Storing both the client and watcher on the struct would require self-referential types. Instead, the `tokio::spawn`ed task owns the client, creates the watcher, and runs the loop. The `watcher_cancel: Mutex<Option<oneshot::Sender<()>>>` on the struct provides clean shutdown.
443+
**Why**: The spawned task owns its own `Watcher` and `SmbClient`. Storing them on the struct alongside the cancel sender would just duplicate ownership without buying anything — `watcher.next_events()` is `&mut self`, so the task is the only thing that can drive it anyway. The `watcher_cancel: Mutex<Option<oneshot::Sender<()>>>` on the struct provides clean shutdown.
444+
445+
**Decision**: Watcher doesn't reconnect itself; it bails on connection errors (changed in 0.10 bump)
446+
**Why**: Pre-0.10 the watcher had its own reconnect-with-backoff loop, separate from `SmbVolume::attempt_reconnect`. Two state machines tracking the same "is the session alive" question is a recipe for divergence — the watcher's internal retries swallowed real disconnections the FE reconnect manager would have surfaced. New model: when `next_events` errors with anything but `NOTIFY_ENUM_DIR`, the watcher's task returns. The next hot-path op on the volume hits the dead main session, `handle_smb_result` flips to `Disconnected`, the FE backoff cycle calls `attempt_reconnect`, which respawns the watcher (with a fresh dedicated session). One reconnect path, one source of truth. The watcher's session being separate from the main session means a watcher-only failure (e.g., a TCP hiccup on the watcher's connection) doesn't surface as a volume disconnect until the next mutation; that's the trade-off for keeping the connections independent.
444447

445448
**Decision**: Watcher debounces 200ms per batch, `FullRefresh` above 50 events per directory
446449
**Why**: Prevents 1000 individual stat calls when 1000 files are copied. The 200ms window collects events that arrive in rapid succession. The 50-event threshold for `FullRefresh` avoids O(n) stat calls for bulk operations.

apps/desktop/src-tauri/src/file_system/volume/smb.rs

Lines changed: 17 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -736,9 +736,23 @@ impl SmbVolume {
736736
}
737737
}
738738

739-
/// Spawns the background watcher task using the current `params`. Replaces
740-
/// any prior `watcher_cancel`. Called from `connect_smb_volume` (initial
741-
/// setup) and from `attempt_reconnect` (after a session rebuild).
739+
/// Spawns the background watcher task on its own dedicated smb2 session.
740+
/// Replaces any prior `watcher_cancel`. Called from `connect_smb_volume`
741+
/// (initial setup) and from `attempt_reconnect` (after a session rebuild).
742+
///
743+
/// We could share the volume's session with the watcher (smb2 0.10's
744+
/// `Watcher` is `'static`, owns a `Connection` clone), but in practice
745+
/// stacking the watcher's CHANGE_NOTIFY long-polls on the same TCP
746+
/// session as heavy concurrent writes wedges Samba — the
747+
/// `smb_integration_concurrent_streaming_writes_no_deadlock` test
748+
/// hangs against `smb-consumer-maxreadsize` (64 KB max read/write) when
749+
/// the watcher shares the connection. Keeping the watcher on its own
750+
/// TCP+session matches the pre-smb2-0.10 isolation; what we *do* keep
751+
/// from the new API is the pipelining (`Watcher` keeps one CHANGE_NOTIFY
752+
/// pre-issued, closing the response→re-arm loss window) and the lack
753+
/// of internal reconnect (single source of truth is
754+
/// `SmbVolume::attempt_reconnect`; the watcher bails on errors and we
755+
/// respawn here on the next successful reconnect).
742756
fn spawn_watcher(&self, params: &SmbConnectionParams) {
743757
use crate::network::smb_connection::build_smb_addr;
744758

0 commit comments

Comments
 (0)