Commit a003f00
Drive indexing: scan SMB/MTP volumes with bounded concurrency (~N× faster walks)
The Volume-trait scanner listed directories strictly one-at-a-time: pop a dir, await its full open+query+close round trips, then the next. Directory listing is latency-bound (each dir is a few LAN round trips over an otherwise-idle link), so a real NAS first-scan crawled at ~28 dirs/sec — 575k entries in 17 min and still going — purely from serialization, not the NAS or the link.
Both the fresh scan (`scan_volume_via_trait`) and the reconcile walk (`reconcile_volume_via_trait`) now keep up to `SCAN_CONCURRENCY` (32) `list_directory` round trips in flight via a `FuturesUnordered` pump. SMB2 multiplexes many in-flight requests over one session (credits + per-message IDs; `SmbVolume` already supports concurrent use), so overlapping the idle-link latency is a near-linear speedup until credits saturate — minutes-long scans drop to seconds.
Only the network I/O overlaps; result processing stays serial on the walk task, so the data-integrity guarantees hold unchanged:
- `ScanContext` id allocation (fresh) and the DB read connection + diff (reconcile) stay single-owner — no locking — and the "a dir's id is registered before its children are listed" invariant still holds (a child is enqueued only after its parent's result is processed).
- Cancel drops the in-flight set (smb2/MTP tolerate a dropped request waiter); a typed terminal disconnect stops topping up and runs the partial-preserving finish; the consecutive-failure backstop still trips on a real disconnect (failures pile up with no successes to reset the counter), now spanning up to `SCAN_CONCURRENCY` in-flight failures.
- The reconcile path resolves new-dir ids at a WAVE boundary (queue AND in-flight both drained) instead of per BFS level.
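The shape of that pump (top up to the cap, then process one result serially) can be sketched as below. This is only an illustration under stated assumptions, not the code from this commit: it swaps the async `FuturesUnordered` pump for std threads and an `mpsc` channel so it runs without external crates, and `Tree`, `WalkStats`, `sample_tree`, `walk`, and this `list_directory` are hypothetical stand-ins. Cancel, disconnect handling, and the failure backstop are not modeled.

```rust
use std::collections::{HashMap, VecDeque};
use std::sync::{mpsc, Arc};
use std::thread;
use std::time::Duration;

/// Hypothetical cap, mirroring the commit's `SCAN_CONCURRENCY`.
const SCAN_CONCURRENCY: usize = 32;

/// Hypothetical in-memory stand-in for a remote volume: dir path -> child dir paths.
type Tree = HashMap<String, Vec<String>>;

/// What the walk observed: processing order and the peak number of outstanding listings.
struct WalkStats {
    visited: Vec<String>,
    max_in_flight: usize,
}

/// Builds a root "/" with `fanout` leaf directories (illustration only).
fn sample_tree(fanout: usize) -> Tree {
    let children = (0..fanout).map(|i| format!("/d{i}")).collect();
    HashMap::from([("/".to_string(), children)])
}

/// Simulated listing: sleeps to imitate round-trip latency, then returns the child dirs.
fn list_directory(tree: &Tree, dir: &str) -> Vec<String> {
    thread::sleep(Duration::from_millis(5));
    tree.get(dir).cloned().unwrap_or_default()
}

/// Walks the tree with at most `SCAN_CONCURRENCY` listings outstanding.
/// Only the listing overlaps; results are handled one at a time on this thread.
fn walk(tree: Arc<Tree>, root: &str) -> WalkStats {
    let (tx, rx) = mpsc::channel::<(String, Vec<String>)>();
    let mut queue = VecDeque::from([root.to_string()]);
    let mut in_flight = 0usize;
    let mut stats = WalkStats { visited: Vec::new(), max_in_flight: 0 };

    loop {
        // Top up: start listings until the cap is reached or the queue is empty.
        while in_flight < SCAN_CONCURRENCY {
            let Some(dir) = queue.pop_front() else { break };
            let (tx, tree) = (tx.clone(), Arc::clone(&tree));
            thread::spawn(move || {
                let children = list_directory(&tree, &dir);
                let _ = tx.send((dir, children));
            });
            in_flight += 1;
        }
        stats.max_in_flight = stats.max_in_flight.max(in_flight);

        // Queue and in-flight set both drained: the walk is complete.
        if in_flight == 0 {
            break;
        }

        // Serial step: take one finished listing and process it here.
        let (dir, children) = rx.recv().expect("sender held by the walk thread");
        in_flight -= 1;
        stats.visited.push(dir);
        // Children are enqueued only after their parent's result is processed.
        queue.extend(children);
    }
    stats
}
```

Calling `walk(Arc::new(sample_tree(40)), "/")` processes the root first and then keeps 32 listings outstanding until the queue runs dry. Because `visited` and the queue are touched only by the walk thread, no locking is needed around them.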
Tests: new `walk_lists_directories_concurrently` proves max-in-flight > 1, capped at `SCAN_CONCURRENCY` (a serial revert would record 1). The disconnect/backstop tests now assert a bounded stop (no full-queue churn) instead of an exact serial call count; the reconcile-correctness suite still proves the concurrent reconcile yields an index identical to a from-scratch scan. Full check green.
1 parent 6c33dfb commit a003f00
2 files changed
Lines changed: 245 additions & 89 deletions