Replace `getFolderSuggestions` (await-array) with `streamFolderSuggestions`
in `NewFolderDialog`. The first suggestion now appears in <500 ms; the rest
trickle in as the LLM emits them. Direct application of two design
principles: "show progress, communicate what's actually happening" and
"long operations are immediately cancelable, stopping background work too."
Local LLMs benefit most: a 3B-parameter model on Apple Silicon emits
roughly 30-60 tok/s, so the perceived speedup is dramatic. Reasoning cloud
models (`gpt-5*`, `o3*`) gain almost as much: time-to-first-token can be
1-3 s, but with streaming the user sees text as soon as generation starts.
Architecture
- New backend: `client::chat_completion_stream` returns
`BoxStream<Result<String, AiError>>` of content chunks, filtering out
reasoning / thought-signature / tool-call chunks.
- New `StreamingSanitizer` line-buffers across chunk boundaries, runs the
existing `sanitize_one_line` per completed line, dedupes case-insensitively
against existing names + already-emitted, caps at MAX_SUGGESTIONS.
- Two new Tauri commands in `ai/suggestions.rs`:
  - `stream_folder_suggestions(request_id, listing_id, current_path,
    include_hidden, on_event: Channel<SuggestionStreamEvent>)`. Always
    returns `Ok(())`; all signaling goes through the Channel.
  - `cancel_folder_suggestions(request_id)`. Idempotent.
- Cancellation registry (`STREAM_CANCEL_TOKENS`) in `manager.rs` keyed by
request id, using `tokio_util::sync::CancellationToken`. The token is
registered synchronously in the command body, before any `.await`, so a
cancel cannot arrive while the command is running but not yet registered.
- `tokio::select!` between the stream and the token cancels mid-stream;
dropping the genai stream closes the reqwest body, cuts billing, frees
local-LLM compute.
- Frontend: `streamFolderSuggestions(...)` returns `{ promise, cancel }`.
`NewFolderDialog.onDestroy` calls cancel — the explicit signal Tauri 2
needs because `Channel::send` is fire-and-forget and can't detect
frontend drop.
UX
- "Loading..." text gone. Replaced with a pulsing skeleton chip at the end
of the list (same dimensions; no reflow on completion). Live region
(`aria-live="polite"`) so screen readers announce each new suggestion.
- Already-streamed suggestions are clickable while later ones still stream.
An empty stream collapses the section silently (graceful degradation).
Tests
- 11 new sanitizer unit tests (chunked splits, dedupe, cap, halt-on-emit-
false, finish() flushing trailing-no-newline).
- 5 registry tests covering concurrent ids, idempotent double-cancel,
unknown-id no-op.
- 4 integration tests against an axum-based mock SSE server (chunk order,
empty stream, drop-mid-stream, HTTP 500 → ServerError). Wiremock can't
chunk-deliver SSE bodies; axum can.
- 5 frontend vitest cases for the dialog (incremental render, failed/
cancelled keep already-streamed visible, empty hides section, unmount
cancels).
- 4 new real-API #[ignore]-gated smokes: 3 OpenAI streaming variants
(gpt-4o-mini, gpt-5-mini, o3-mini) plus claude-3-5-haiku for Anthropic
native streaming protocol coverage. All 3 OpenAI smokes pass live.
Plan: docs/specs/ai-streaming-suggestions-plan.md (3 review rounds).
|`manager.rs`| Central coordinator. Global `Mutex<Option<ManagerState>>` singleton. Most Tauri commands live here. Stores provider + cloud-AI config (`cloud_api_key`/`cloud_base_url`/`cloud_model`). Exposes `resolve_backend() -> BackendResolution` so callers don't reinvent provider routing. |
15
+
|`manager.rs`| Central coordinator. Global `Mutex<Option<ManagerState>>` singleton. Most Tauri commands live here. Stores provider + cloud-AI config (`cloud_api_key`/`cloud_base_url`/`cloud_model`). Exposes `resolve_backend() -> BackendResolution` so callers don't reinvent provider routing. Also owns the `STREAM_CANCEL_TOKENS` registry (`register_stream`/`unregister_stream`/`cancel_stream`) for in-flight `stream_folder_suggestions` cancellation. |
16
16
|`download.rs`| HTTP streaming download with Range-based resume. Emits `ai-download-progress` events (200ms throttle). Cooperative cancellation via function parameter (`Fn() -> bool`). |
17
17
|`extract.rs`| Copies bundled `llama-server` binary + dylibs from `resources/ai/` to the AI data dir. Sets Unix permissions, handles symlinks. |
18
18
|`process.rs`| Spawns child process with `DYLD_LIBRARY_PATH` set. Instant SIGKILL to stop (llama-server is stateless; macOS reclaims all GPU/mmap resources). `kill_process` for fire-and-forget (quit, orphans), `kill_and_reap_in_background` for normal operation (reaps zombie in bg thread). `kill_stale_llama_servers` for belt-and-suspenders orphan cleanup by process name. Port discovery via `bind(:0)`. |
19
-
|`client.rs`|`genai`-backed chat client. `AiBackend` is a struct bundling a long-lived `genai::Client` with a model name; built via `AiBackend::local(port)` or `AiBackend::remote(api_key, base_url, model)`. The model name picks the adapter (`claude-*` → Anthropic native, `gemini-*` → Gemini native, `gpt-5*`/`*-pro`/`*-codex` → OpenAI Responses API, etc.). Auto-omits `temperature`/`top_p` for OpenAI Responses adapter and for chat-completions reasoning models (`o1*`, `o3*`, `o4*`, `chatgpt-*`, `gpt-5*` defense-in-depth) and substitutes `ReasoningEffort::Low`. Local backend forces the OpenAI adapter via a `ServiceTargetResolver` pinning endpoint to `http://127.0.0.1:<port>/v1/`. |
19
+
|`client.rs`|`genai`-backed chat client. `AiBackend` is a struct bundling a long-lived `genai::Client` with a model name; built via `AiBackend::local(port)` or `AiBackend::remote(api_key, base_url, model)`. The model name picks the adapter (`claude-*` → Anthropic native, `gemini-*` → Gemini native, `gpt-5*`/`*-pro`/`*-codex` → OpenAI Responses API, etc.). Auto-omits `temperature`/`top_p` for OpenAI Responses adapter and for chat-completions reasoning models (`o1*`, `o3*`, `o4*`, `chatgpt-*`, `gpt-5*` defense-in-depth) and substitutes `ReasoningEffort::Low`. Local backend forces the OpenAI adapter via a `ServiceTargetResolver` pinning endpoint to `http://127.0.0.1:<port>/v1/`. Exposes both `chat_completion` (full response) and `chat_completion_stream` (returns a `BoxStream<Result<String, AiError>>` of content chunks; reasoning/thought-signature/tool-call chunks filtered out). |
20
20
|`client_integration_test.rs`|`wiremock`-based tests covering request shape per adapter (chat completions vs Responses API), parsing, error mapping. Always run in CI. |
21
-
|`client_real_openai_test.rs`|`#[ignore]`-gated smoke tests against `api.openai.com`. Run with `OPENAI_API_KEY=$(security find-generic-password -a "$USER" -s "OPENAI_API_KEY" -w) cargo nextest run --lib --run-ignored only ai::client_real_openai_test`. Costs ~$0.001 per full run. Use after refactors that touch `client.rs`. |
22
-
|`suggestions.rs`| Builds few-shot prompt from listing cache, routes to configured backend, sanitizes response. |
21
+
|`client_streaming_test.rs`|`axum`-based SSE mock server tests for `chat_completion_stream`: chunks arrive in order, empty streams end cleanly, drop-mid-stream closes the connection, HTTP 5xx maps to `ServerError`. Always run in CI. (Wiremock can't chunk-deliver SSE bodies — see Gotchas.) |
22
+
|`client_real_openai_test.rs`|`#[ignore]`-gated smoke tests against `api.openai.com`, including streaming variants for `gpt-4o-mini`, `gpt-5-mini`, `o3-mini`. Run with `OPENAI_API_KEY=$(security find-generic-password -a "$USER" -s "OPENAI_API_KEY" -w) cargo nextest run --lib --run-ignored only ai::client_real_openai_test`. Costs ~$0.001 per full run. |
23
+
|`client_real_anthropic_test.rs`|`#[ignore]`-gated smoke tests against `api.anthropic.com` (chat + streaming variants of `claude-3-5-haiku-latest`). Anthropic's native streaming protocol differs from OpenAI's SSE shape; without this we'd only test the OpenAI lineage. Run with `ANTHROPIC_API_KEY=$(security find-generic-password -a "$USER" -s "ANTHROPIC_API_KEY" -w) cargo nextest run --lib --run-ignored only ai::client_real_anthropic_test`. |
24
+
|`suggestions.rs`| Builds few-shot prompt from listing cache, routes to configured backend, sanitizes response. Also exposes `stream_folder_suggestions` + `cancel_folder_suggestions` Tauri commands and a `StreamingSanitizer` that runs the per-line sanitizer on streamed chunks (line-buffers across chunk boundaries, dedupes case-insensitively against existing names + already-emitted, caps at `MAX_SUGGESTIONS`). |
25
+
|`suggestions_streaming_test.rs`| Tests for the `manager::register_stream`/`unregister_stream`/`cancel_stream` registry — concurrent ids don't interfere, double-cancel is idempotent, missing id is a no-op. |
23
26
24
27
### Tauri commands
25
28
26
-
Core: `get_ai_status`, `get_ai_model_info`, `get_ai_runtime_status`, `configure_ai`, `start_ai_server`, `stop_ai_server`, `check_ai_connection`, `start_ai_download`, `cancel_ai_download`, `get_folder_suggestions`. Note: `get_system_memory_info` moved to top-level `system_memory.rs`.
29
+
Core: `get_ai_status`, `get_ai_model_info`, `get_ai_runtime_status`, `configure_ai`, `start_ai_server`, `stop_ai_server`, `check_ai_connection`, `start_ai_download`, `cancel_ai_download`, `get_folder_suggestions`, `stream_folder_suggestions`, `cancel_folder_suggestions`. Note: `get_system_memory_info` moved to top-level `system_memory.rs`.
27
30
Legacy (still wired, used by toast): `uninstall_ai`, `dismiss_ai_offer`, `opt_out_ai`, `opt_in_ai`, `is_ai_opted_out`.
28
31
29
32
## Startup flow
@@ -116,6 +119,21 @@ privacy-focused users. The architecture doesn't fight this switch — it's just
116
119
**Decision**: Use `genai` crate as the chat client instead of hand-rolled `reqwest` JSON.
117
120
**Why**: We hit two production bugs that were per-provider quirks: (1) GPT-5/o-series chat models reject any non-default `temperature` (HTTP 400), and (2) `gpt-*-pro` / `*-codex` models only respond on `/v1/responses`, not `/v1/chat/completions` (HTTP 404). Each new model adds another quirk. `genai` normalizes ~20 providers, auto-routes Responses-API models, and gives us Anthropic / Gemini / xAI / OpenRouter for free with the same code path. Tradeoff: pinned at `0.5.3` (stable, ~3 months old) with a solo maintainer; mitigated by it being MIT/Apache-2.0 + small enough to fork if needed.
118
121
122
+
**Decision**: Streaming uses `tauri::ipc::Channel<T>` per call, not the global `app.emit` pattern that downloads use.
123
+
**Why**: User can open the new-folder dialog, cancel, and reopen quickly. Two streams could overlap if we used a global event — listeners from the second open would see chunks from the first. Channel scopes the events to a single command invocation, eliminating the race. Tauri 2 docs explicitly recommend `Channel<T>` for streaming events from a command.
124
+
125
+
**Decision**: Streaming command `stream_folder_suggestions` always returns `Ok(())`; all signaling (suggestions, completion, cancellation, failure) goes through `Channel<SuggestionStreamEvent>`.
126
+
**Why**: Mixing IPC `Result<_, String>` with channel events would split the error contract. One signaling path is simpler for both Rust and TypeScript callers. `#[tauri::command]` requires the `Result` return type purely for syntactic reasons here.
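
The single-signaling-path contract above can be sketched without Tauri. This is a minimal illustration, not the real command: the variant names of `SuggestionStreamEvent`, the `run_suggestion_stream` helper, and the closure standing in for `Channel::send` are all assumptions made for the example.

```rust
/// Hypothetical event shape: one variant per kind of signal named above
/// (suggestion, completion, cancellation, failure).
#[derive(Debug, Clone, PartialEq)]
enum SuggestionStreamEvent {
    Suggestion(String),
    Done,
    Cancelled,
    Failed(String),
}

/// Hypothetical driver: every outcome is reported through `on_event`
/// (standing in for the Channel), so the return value is always `Ok(())`.
fn run_suggestion_stream<I>(
    items: I,
    is_cancelled: impl Fn() -> bool,
    mut on_event: impl FnMut(SuggestionStreamEvent),
) -> Result<(), String>
where
    I: IntoIterator<Item = Result<String, String>>,
{
    for item in items {
        // Cancellation is checked before each item is forwarded.
        if is_cancelled() {
            on_event(SuggestionStreamEvent::Cancelled);
            return Ok(());
        }
        match item {
            Ok(name) => on_event(SuggestionStreamEvent::Suggestion(name)),
            Err(message) => {
                // A failure is an event, not an `Err` return value.
                on_event(SuggestionStreamEvent::Failed(message));
                return Ok(());
            }
        }
    }
    on_event(SuggestionStreamEvent::Done);
    Ok(())
}
```

A caller only ever matches on events; the `Result` carries no information, which is the point of the decision.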
127
+
128
+
**Decision**: Line-buffering and sanitization happen in Rust (`StreamingSanitizer`), not in the frontend.
129
+
**Why**: AGENTS.md principle "smart backend, thin frontend." Sanitization rules (markdown stripping, numbering detection, dedupe by case-insensitive existing-names + emit-history) are non-trivial; replicating them in TypeScript would create two authorities that drift. Frontend just renders strings.
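
As a rough sketch of what line-buffering across chunk boundaries involves, the following std-only example may help. It is not the real `StreamingSanitizer`: the `sanitize_one_line` body is a simplified stand-in, the `MAX_SUGGESTIONS` value is made up, and returning a `Vec` from `push` replaces the real emit callback.

```rust
use std::collections::HashSet;

/// Assumed cap for illustration; the real value is not shown in this doc.
const MAX_SUGGESTIONS: usize = 5;

/// Simplified stand-in for the real per-line sanitizer: trims whitespace and
/// strips a leading `-`/`*` bullet or `1.` / `1)` style numbering.
fn sanitize_one_line(line: &str) -> Option<String> {
    let t = line.trim();
    let t = t.trim_start_matches(|c: char| c == '-' || c == '*').trim_start();
    let digits = t.chars().take_while(|c| c.is_ascii_digit()).count();
    let t = if digits > 0 {
        let rest = &t[digits..];
        match rest.strip_prefix('.').or_else(|| rest.strip_prefix(')')) {
            Some(r) => r.trim_start(),
            None => t, // digits belong to the name itself, e.g. "2024 photos"
        }
    } else {
        t
    };
    if t.is_empty() { None } else { Some(t.to_string()) }
}

struct StreamingSanitizer {
    buffer: String,        // partial line carried over between chunks
    seen: HashSet<String>, // lowercased existing names + already-emitted names
    emitted: usize,
}

impl StreamingSanitizer {
    fn new(existing_names: &[&str]) -> Self {
        Self {
            buffer: String::new(),
            seen: existing_names.iter().map(|n| n.to_lowercase()).collect(),
            emitted: 0,
        }
    }

    /// Feeds one chunk and returns the suggestions completed by it.
    fn push(&mut self, chunk: &str) -> Vec<String> {
        self.buffer.push_str(chunk);
        let mut out = Vec::new();
        while let Some(pos) = self.buffer.find('\n') {
            let line: String = self.buffer.drain(..=pos).collect();
            self.accept(&line, &mut out);
        }
        out
    }

    /// Flushes a trailing line that never received its newline.
    fn finish(&mut self) -> Vec<String> {
        let line = std::mem::take(&mut self.buffer);
        let mut out = Vec::new();
        self.accept(&line, &mut out);
        out
    }

    fn accept(&mut self, line: &str, out: &mut Vec<String>) {
        if self.emitted >= MAX_SUGGESTIONS {
            return;
        }
        if let Some(name) = sanitize_one_line(line) {
            // `insert` returns false for case-insensitive duplicates.
            if self.seen.insert(name.to_lowercase()) {
                self.emitted += 1;
                out.push(name);
            }
        }
    }
}
```

Keeping the buffer, the dedupe set, and the cap in one Rust struct is what lets the frontend stay a plain renderer of strings.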
130
+
131
+
**Decision**: Cancellation via explicit `cancel_folder_suggestions` command + `tokio_util::sync::CancellationToken`, not implicit drop detection on the Channel.
132
+
**Why**: Tauri 2's `Channel::send` is fire-and-forget into the IPC queue. It does NOT report frontend handler GC or webview destruction back to the backend. Without an explicit cancel signal, the backend would keep streaming after the user closes the dialog — billing cloud providers and pegging local-LLM compute. `CancellationToken::cancel` is itself idempotent, so the same token can be canceled by an explicit cancel call AND by an implicit `Channel::send` failure in the same tick — both succeed.
133
+
134
+
**Decision**: Cancel-token registry (`STREAM_CANCEL_TOKENS`) is a separate `LazyLock<Mutex<HashMap>>` in `manager.rs`, not part of `ManagerState`.
135
+
**Why**: Streaming task lifecycle is orthogonal to file-manager AI state. Keeping it isolated lets us drop entries on task end without holding the wider `MANAGER` lock and without inflating `ManagerState`.
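
A standalone registry of this shape might look like the sketch below. To stay std-only it swaps `tokio_util::sync::CancellationToken` for a small `CancelFlag` built on `AtomicBool`; that type, the `&str` request-id parameter, and the function bodies are assumptions for illustration. `LazyLock` needs Rust 1.80 or newer.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::{Arc, LazyLock, Mutex};

/// Stand-in for `CancellationToken`: clonable, and cancelling twice is harmless.
#[derive(Clone, Default)]
struct CancelFlag(Arc<AtomicBool>);

impl CancelFlag {
    fn cancel(&self) {
        self.0.store(true, Ordering::SeqCst);
    }
    fn is_cancelled(&self) -> bool {
        self.0.load(Ordering::SeqCst)
    }
}

/// Separate from any wider manager state, so it has its own lock.
static STREAM_CANCEL_TOKENS: LazyLock<Mutex<HashMap<String, CancelFlag>>> =
    LazyLock::new(|| Mutex::new(HashMap::new()));

/// Registers a flag for `request_id` and hands a clone to the streaming task.
fn register_stream(request_id: &str) -> CancelFlag {
    let flag = CancelFlag::default();
    STREAM_CANCEL_TOKENS
        .lock()
        .unwrap()
        .insert(request_id.to_string(), flag.clone());
    flag
}

/// Drops the entry when the task ends; a missing id is a no-op.
fn unregister_stream(request_id: &str) {
    STREAM_CANCEL_TOKENS.lock().unwrap().remove(request_id);
}

/// Cancels the stream if it is still registered; unknown ids are ignored.
fn cancel_stream(request_id: &str) {
    let tokens = STREAM_CANCEL_TOKENS.lock().unwrap();
    if let Some(flag) = tokens.get(request_id) {
        flag.cancel();
    }
}
```

Because each lock is held only for a map lookup or insert, concurrent streams with different ids do not block one another for long.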
136
+
119
137
## Gotchas
120
138
121
139
**Gotcha**: `genai` requires `base_url` to end with `/`. Without the trailing slash, `Url::join("chat/completions")` strips the last segment and you'd hit `https://api.openai.com/chat/completions` (404) instead of `/v1/chat/completions`. `client.rs::build_client` normalizes by appending `/` if missing.
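
The normalization described above amounts to something like the following. The helper name `normalize_base_url` is hypothetical; the real logic lives inside `client.rs::build_client` and may differ in detail.

```rust
/// Appends a trailing `/` if the base URL lacks one, so that a later
/// relative join keeps the final path segment (for example `/v1/`).
fn normalize_base_url(base_url: &str) -> String {
    if base_url.ends_with('/') {
        base_url.to_string()
    } else {
        format!("{base_url}/")
    }
}
```

With this in place, `https://api.openai.com/v1` and `https://api.openai.com/v1/` resolve to the same endpoint.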
@@ -139,8 +157,14 @@ privacy-focused users. The architecture doesn't fight this switch — it's just
139
157
**Gotcha**: `wait_for_server_health` kills the process on timeout or early death — don't remove that cleanup.
140
158
**Why**: Without it, a process that fails health check would be orphaned (PID tracked but never cleaned up until explicit stop).
141
159
160
+
**Gotcha**: `Channel::send` returns `Err` only when the webview itself is gone (window closed); it succeeds silently after the JS-side handler is GC'd. Don't rely on send failure for liveness — use the explicit `cancel_folder_suggestions` command. Send-error in the streaming-suggestion `try_emit` callback triggers the cancel token as defense-in-depth implicit cancel.
161
+
162
+
**Gotcha**: Cancel via `tokio::select!` drops the in-flight `stream.next()` future. For `genai`'s reqwest-backed SSE this is the desired terminal action — closes the connection, cuts billing. Single-poll cancel-safety is the only model we rely on; we never resume a previously-canceled stream.
163
+
164
+
**Gotcha**: `wiremock` does not chunk-deliver SSE bodies in distinct frames; it writes the whole body in one HTTP response. That gives false confidence we'd be exercising multi-chunk parse paths. `client_streaming_test.rs` uses an `axum`-based mock SSE server with `tokio::time::sleep` between frames instead.