SMB watcher: bump smb2 to 0.10, drop in-watcher reconnect, keep dedicated session
`smb2 = "0.10.0"`. `Watcher` is now `'static` (owns a `Connection` clone) and keeps one CHANGE_NOTIFY pre-issued on the wire, closing the response→re-arm loss window that drops events between consecutive `next_events()` calls.
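The re-arm gap described above can be illustrated with a toy timeline model. This is only a sketch under loud assumptions: `lost_events`, the tick-based clock, and the rule that an event arriving with no request outstanding is dropped are all hypothetical simplifications for illustration, not smb2's API or real server behavior. A `rearm_delay` equal to the consumer's processing time stands in for "re-arm after the response is handled", and a delay of 0 stands in for "one request kept pre-issued".

```rust
/// Toy model: counts events that arrive while no request is outstanding.
/// `event_ticks` must be sorted ascending. `rearm_delay` is the number of
/// ticks between a response and the next request being on the wire.
fn lost_events(event_ticks: &[u32], rearm_delay: u32) -> usize {
    // First tick at which a request is outstanding again.
    let mut armed_from: u32 = 0;
    let mut lost = 0;
    for &t in event_ticks {
        if t >= armed_from {
            // A request is pending: the event is delivered, and the next
            // request becomes outstanding `rearm_delay` ticks later.
            armed_from = t + rearm_delay;
        } else {
            // Modeling assumption: nothing is outstanding, so the event
            // falls into the gap and is never reported.
            lost += 1;
        }
    }
    lost
}
```

With events at ticks `[0, 1, 2, 5, 6]`, `lost_events(&ticks, 3)` returns 3 and `lost_events(&ticks, 0)` returns 0 in this model, which is the difference the pre-issued request is meant to make.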
In `smb_watcher.rs`:
- Drop the internal reconnect-with-backoff loop. On any `next_events` error other than `NOTIFY_ENUM_DIR`, the task returns. `SmbVolume::attempt_reconnect` is now the single source of truth for session recovery — the FE backoff cycle calls it on the next mutation that hits the dead session, and it respawns the watcher. Removes the divergence risk between the two reconnect state machines (the watcher's used to swallow real disconnections the FE reconnect manager would have surfaced).
- Keep the watcher on its **own** dedicated smb2 session (separate TCP from the volume's primary client). Tried sharing the volume's session via `clone_session`; empirically that wedges Samba when the watcher's CHANGE_NOTIFY long-polls multiplex with heavy concurrent writes on the same TCP — `smb_integration_concurrent_streaming_writes_no_deadlock` against `smb-consumer-maxreadsize` (64 KB max read/write, 8 concurrent × 200 × 1 MB) went from 1.8 s to 120 s timeout. The dedicated session matches the pre-0.10 isolation; the trade-off is one extra TCP+auth per share, which is fine.
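The "bail instead of reconnect" behavior from the first bullet can be sketched roughly as follows. Everything here is hypothetical and exists only for illustration: `WatchError`, `Action`, `Exit`, and `run_watch_loop` are not the names used in `smb_watcher.rs`, the poll results are fed from a plain iterator rather than a live `next_events()` call, and treating `NOTIFY_ENUM_DIR` as a full refresh is an assumption.

```rust
/// Hypothetical stand-in for the error type returned by `next_events`.
#[derive(Debug, PartialEq)]
enum WatchError {
    /// Stand-in for the server's NOTIFY_ENUM_DIR status.
    NotifyEnumDir,
    /// Any other failure, for example a dropped TCP connection.
    Other(String),
}

/// What the loop asks the rest of the app to do for one poll result.
#[derive(Debug, PartialEq)]
enum Action {
    Changed(Vec<String>),
    FullRefresh,
}

/// Why the loop stopped.
#[derive(Debug, PartialEq)]
enum Exit {
    SourceEnded,
    Failed(WatchError),
}

fn run_watch_loop<I>(polls: I) -> (Vec<Action>, Exit)
where
    I: IntoIterator<Item = Result<Vec<String>, WatchError>>,
{
    let mut actions = Vec::new();
    for poll in polls {
        match poll {
            Ok(paths) => actions.push(Action::Changed(paths)),
            // Assumption: handled as "re-list the directory", keep watching.
            Err(WatchError::NotifyEnumDir) => actions.push(Action::FullRefresh),
            // No retry and no backoff here: return and leave session
            // recovery to the volume-level reconnect path.
            Err(other) => return (actions, Exit::Failed(other)),
        }
    }
    (actions, Exit::SourceEnded)
}
```

The point of the shape is that the loop has exactly one exit for real failures, so there is no second state machine that could disagree with `attempt_reconnect` about whether the session is alive.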
`spawn_watcher` keeps its `(addr, share, username, password, …)` signature for the dedicated session. CLAUDE.md updated with the empirical "shared session wedges Samba under load" rationale so future agents don't try the simplification again without re-running the deadlock test.
Tested: full `./scripts/check.sh` (1983 unit + 32 integration + every linter), plus `smb_integration_concurrent_streaming_writes_no_deadlock` × 5 (all pass, 1.8–3.3 s) and `smb_integration_listing_is_watched_flips_with_connection` + both `smb_integration_attempt_reconnect_*` tests.
apps/desktop/src-tauri/src/file_system/volume/CLAUDE.md: 7 additions & 4 deletions
@@ -17,7 +17,7 @@ Every file system operation (listing, copy, rename, delete, indexing, watching)
|`local_posix.rs`|`LocalPosixVolume`: real filesystem; delegates listing to `file_system::listing`, indexing to `indexing::scanner`, watching to `indexing::watcher` (FSEvents), copy scanning via `walkdir`. Uses `libc::statvfs` FFI for space info. |
|`mtp.rs`|`MtpVolume`: MTP device storage; async `Volume` trait with direct async MTP calls. Uses `MtpReadStream` for streaming (calls `FileDownload::next_chunk().await` directly). Gated with `#[cfg(any(target_os = "macos", target_os = "linux"))]`. |
|`smb.rs`|`SmbVolume`: SMB share storage; async `Volume` trait with direct async smb2 calls. Splits session storage into `Arc<Mutex<Option<SmbClient>>>` + `Arc<RwLock<Option<Arc<Tree>>>>` so the hot read/write paths can clone `Connection` under a brief lock and drive compound / download ops without serializing on the client mutex. `AtomicU8` connection state. Caches `SmbConnectionParams` (host, share, port, credentials) so `attempt_reconnect` can rebuild the session in place after a transient disconnect, single-flighted via `reconnect_lock`. Holds a global `AppHandle` (`set_app_handle` in `lib.rs::setup`) for emitting `smb-connection-changed` events. Also contains `connect_smb_volume()`. Gated with `#[cfg(any(target_os = "macos", target_os = "linux"))]`. |
-|`smb_watcher.rs`| Background SMB change watcher (`run_smb_watcher`). Owns a dedicated smb2 connection for `CHANGE_NOTIFY`, debounces events, feeds `notify_directory_changed`. Spawned by `connect_smb_volume()`. |
+|`smb_watcher.rs`| Background SMB change watcher (`run_smb_watcher`). Owns a dedicated smb2 session (separate TCP connection from the volume's primary client) and uses smb2 0.10's `'static` `Watcher` with pipelined CHANGE_NOTIFY (one request kept pre-issued on the wire so events arriving during consumer processing don't fall in a re-arm gap). Debounces events, feeds `notify_directory_changed`. Spawned by `connect_smb_volume()` and respawned by `attempt_reconnect`. No internal reconnect — bails on `next_events` errors and lets `attempt_reconnect` handle session recovery. |
|`in_memory.rs`|`InMemoryVolume`: `RwLock<HashMap>` store for tests; also used for stress tests (`with_file_count`) |

## Architecture
@@ -436,11 +436,14 @@ spawned detached task. This is safe because the stream always lives in an async
**Decision**: `on_unmount()` trait method instead of `Any` downcasting
**Why**: Avoids runtime type checking, extensible for future volume types (S3, FTP might also need cleanup), consistent with the trait's design of optional methods with default no-ops.

-**Decision**: SmbVolume background watcher uses a dedicated smb2 connection, not the main one
-**Why**: `smb2::Watcher<'a>` borrows `&'a mut Connection` for its lifetime (long-poll blocks until server reports changes). Using the main client would block all file operations. The watcher task owns its own `SmbClient` + `Tree`, and stats new/modified files through the main client via `VolumeManager::get(volume_id)`.
+**Decision**: SmbVolume background watcher runs on a dedicated smb2 session, not a clone of the volume's main connection
+**Why**: smb2 0.10 made `Watcher` `'static` (owns a `Connection` clone), so technically the watcher could share the volume's session via `clone_session`. Empirically it can't: stacking the watcher's CHANGE_NOTIFY long-polls on the same TCP session as heavy concurrent writes wedges Samba — `smb_integration_concurrent_streaming_writes_no_deadlock` hangs against `smb-consumer-maxreadsize` (64 KB max read/write, 8 concurrent writers, 200 × 1 MB files). The dedicated session keeps the watcher's traffic out of the writers' way at the cost of a separate TCP+auth, which is the same shape we had pre-0.10. What we *do* keep from the new API: the watcher is now `'static` (no borrow on the watcher task's `client`), and the pipelining (one CHANGE_NOTIFY pre-issued so events during consumer processing don't fall in a re-arm gap). Stat calls for new/modified files still go through `VolumeManager::get(volume_id).get_metadata(...)` (the main session), so the cmdr-side `notify_mutation` cache patch from our own writes lands first regardless.

**Decision**: Watcher task is not stored on `SmbVolume`, only the cancel sender is
-**Why**: `Watcher<'a>` borrows `&'a mut Connection`. Storing both the client and watcher on the struct would require self-referential types. Instead, the `tokio::spawn`ed task owns the client, creates the watcher, and runs the loop. The `watcher_cancel: Mutex<Option<oneshot::Sender<()>>>` on the struct provides clean shutdown.
+**Why**: The spawned task owns its own `Watcher` and `SmbClient`. Storing them on the struct alongside the cancel sender would just duplicate ownership without buying anything — `watcher.next_events()` is `&mut self`, so the task is the only thing that can drive it anyway. The `watcher_cancel: Mutex<Option<oneshot::Sender<()>>>` on the struct provides clean shutdown.
+
+**Decision**: Watcher doesn't reconnect itself; it bails on connection errors (changed in 0.10 bump)
+**Why**: Pre-0.10 the watcher had its own reconnect-with-backoff loop, separate from `SmbVolume::attempt_reconnect`. Two state machines tracking the same "is the session alive" question is a recipe for divergence — the watcher's internal retries swallowed real disconnections the FE reconnect manager would have surfaced. New model: when `next_events` errors with anything but `NOTIFY_ENUM_DIR`, the watcher's task returns. The next hot-path op on the volume hits the dead main session, `handle_smb_result` flips to `Disconnected`, the FE backoff cycle calls `attempt_reconnect`, which respawns the watcher (with a fresh dedicated session). One reconnect path, one source of truth. The watcher's session being separate from the main session means a watcher-only failure (e.g., a TCP hiccup on the watcher's connection) doesn't surface as a volume disconnect until the next mutation; that's the trade-off for keeping the connections independent.

**Decision**: Watcher debounces 200ms per batch, `FullRefresh` above 50 events per directory
**Why**: Prevents 1000 individual stat calls when 1000 files are copied. The 200ms window collects events that arrive in rapid succession. The 50-event threshold for `FullRefresh` avoids O(n) stat calls for bulk operations.
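The 50-event threshold can be sketched as a per-directory planning step. This is a minimal illustration, not the watcher's actual code: `plan_refresh`, `split_parent`, `Refresh`, and `FULL_REFRESH_THRESHOLD` are hypothetical names, paths are assumed to be `/`-separated strings, and the batch is assumed to have already been collected by the 200ms debounce window.

```rust
use std::collections::BTreeMap;

/// More than this many events in one directory triggers a full refresh
/// instead of per-file stat calls (value taken from the decision above).
const FULL_REFRESH_THRESHOLD: usize = 50;

#[derive(Debug, PartialEq)]
enum Refresh {
    /// Stat only these file names.
    Stat(Vec<String>),
    /// Re-list the whole directory.
    Full,
}

/// Splits "dir/name" into (dir, name); a path without '/' lives in "".
fn split_parent(path: &str) -> (&str, &str) {
    match path.rfind('/') {
        Some(i) => (&path[..i], &path[i + 1..]),
        None => ("", path),
    }
}

/// Groups one debounced batch by parent directory and picks a strategy
/// for each directory.
fn plan_refresh(batch: &[String]) -> BTreeMap<String, Refresh> {
    let mut by_dir: BTreeMap<String, Vec<String>> = BTreeMap::new();
    for path in batch {
        let (dir, name) = split_parent(path);
        by_dir.entry(dir.to_string()).or_default().push(name.to_string());
    }
    by_dir
        .into_iter()
        .map(|(dir, names)| {
            let plan = if names.len() > FULL_REFRESH_THRESHOLD {
                Refresh::Full
            } else {
                Refresh::Stat(names)
            };
            (dir, plan)
        })
        .collect()
}
```

For a batch of 51 events under `bulk/` plus one under `docs/`, this plan re-lists `bulk` once and stats a single file in `docs`, so a bulk copy costs one listing rather than one stat per file.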