
Commit b41365b

Add AI settings: off / OpenAI-compat / local
- Add AI section to settings with three-way provider toggle and rich tooltips
- `SettingPasswordInput` reusable component for API keys (masked with last-4-chars reveal)
- OpenAI-compatible path: API key, base URL, model fields; routes via `AiBackend` enum in Rust
- Local LLM: status card, start/stop server, download/delete model with confirmation, progress bar
- Context window size selector with live memory estimate, 2s debounced server restart
- `configure_ai` command: frontend pushes config to backend on startup and on change
- `init()` no longer auto-starts server; frontend drives lifecycle via `configure_ai`
- Rename `use_real_ai()` → `is_local_ai_supported()`, only gates local operations (not OpenAI path)
- Remove dev gate (`CMDR_REAL_AI`): dev mode now works like prod
- Toast respects `ai.provider`: hidden when off, sets provider to `local` on download accept
- Intel Macs: "Local LLM" toggle disabled with explanatory tooltip, OpenAI path works fine
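The commit message describes API keys as "masked with last-4-chars reveal". The real `SettingPasswordInput` is a Svelte component whose code is not part of this page, so what follows is only a sketch of that display rule written in Rust. `mask_api_key` is a hypothetical name, and fully masking keys of four characters or fewer is an assumption.

```rust
/// Hypothetical sketch: mask an API key for display, revealing only the last
/// four characters (the rule named in the commit message).
pub fn mask_api_key(key: &str) -> String {
    // Work on chars, not bytes, so multi-byte input never splits a code point.
    let chars: Vec<char> = key.chars().collect();
    if chars.len() <= 4 {
        // Assumption: keys this short are masked completely.
        return "•".repeat(chars.len());
    }
    let visible: String = chars[chars.len() - 4..].iter().collect();
    format!("{}{}", "•".repeat(chars.len() - 4), visible)
}
```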
1 parent cc80d28 commit b41365b

26 files changed

Lines changed: 2104 additions & 242 deletions

apps/desktop/coverage-allowlist.json

Lines changed: 6 additions & 0 deletions
```diff
@@ -195,6 +195,12 @@
     },
     "file-operations/delete/DeleteDialog.svelte": {
       "reason": "UI modal, logic tested in delete-dialog-utils.test.ts"
+    },
+    "settings/components/SettingPasswordInput.svelte": {
+      "reason": "UI component, simple show/hide toggle + settings store wiring"
+    },
+    "settings/sections/AiSection.svelte": {
+      "reason": "UI section, depends on Tauri commands and event listeners"
     }
   }
 }
```

apps/desktop/src-tauri/src/ai/CLAUDE.md

Lines changed: 57 additions & 54 deletions
```diff
@@ -1,104 +1,107 @@
 # AI subsystem
 
-Local on-device AI features powered by llama.cpp's `llama-server`. Currently used for folder name suggestions.
+AI features powered by local LLM (llama-server) or OpenAI-compatible APIs. Currently used for folder name suggestions.
 
-AI requires Apple Silicon (aarch64). Intel Macs are not supported — the bundled binary is ARM64-only.
+Three provider modes:
+- **Off**: No AI features.
+- **OpenAI-compatible** (BYOK): Any OpenAI-compatible API. Works on any hardware.
+- **Local LLM**: On-device llama-server. Requires Apple Silicon (aarch64).

```
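One way to model the three provider modes on the Rust side, assuming the settings store uses the same strings that appear in the routing notes (`off`, `local`, `openai-compatible`). `AiProvider`, `is_available`, and `local_ai_supported_here` are hypothetical names. The last one only approximates the documented hardware rule (Apple Silicon, aarch64) and is not the module's `is_local_ai_supported()`.

```rust
use std::str::FromStr;

/// Hypothetical mirror of the three provider modes.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum AiProvider {
    Off,
    OpenAiCompatible,
    Local,
}

impl FromStr for AiProvider {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "off" => Ok(AiProvider::Off),
            "openai-compatible" => Ok(AiProvider::OpenAiCompatible),
            "local" => Ok(AiProvider::Local),
            other => Err(format!("unknown AI provider: {other}")),
        }
    }
}

impl AiProvider {
    /// Only `Local` depends on hardware; `local_supported` stands in for the
    /// real gate. "Off" and the OpenAI-compatible path work everywhere.
    pub fn is_available(self, local_supported: bool) -> bool {
        match self {
            AiProvider::Off | AiProvider::OpenAiCompatible => true,
            AiProvider::Local => local_supported,
        }
    }
}

/// Rough, hypothetical stand-in for the hardware rule: macOS on aarch64.
pub fn local_ai_supported_here() -> bool {
    cfg!(all(target_os = "macos", target_arch = "aarch64"))
}
```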
```diff
 ## Key files
 
 | File | Purpose |
 |---|---|
-| `mod.rs` | Types (`AiStatus`, `AiState`, `DownloadProgress`, `ModelInfo`), model registry (`AVAILABLE_MODELS`, `DEFAULT_MODEL_ID`), gate functions |
-| `manager.rs` | Central coordinator. Global `Mutex<Option<ManagerState>>` singleton. Most Tauri commands live here. `get_folder_suggestions` is in `suggestions.rs`. Handles startup recovery. |
+| `mod.rs` | Types (`AiStatus`, `AiState`, `DownloadProgress`, `ModelInfo`), model registry (`AVAILABLE_MODELS`, `DEFAULT_MODEL_ID`), `is_local_ai_supported()` gate |
+| `manager.rs` | Central coordinator. Global `Mutex<Option<ManagerState>>` singleton. Most Tauri commands live here. Stores provider + OpenAI config in `ManagerState`. |
 | `download.rs` | HTTP streaming download with Range-based resume. Emits `ai-download-progress` events (200ms throttle). Cooperative cancellation via function parameter (`Fn() -> bool`). |
 | `extract.rs` | Copies bundled `llama-server` binary + dylibs from `resources/ai/` to the AI data dir. Sets Unix permissions, handles symlinks. |
-| `process.rs` | Spawns child process with `DYLD_LIBRARY_PATH` set. SIGTERM → 5s wait → SIGKILL. Port discovery via `bind(:0)`. |
-| `client.rs` | reqwest client: `POST /v1/chat/completions` (15s timeout), `GET /health` (2s timeout). |
-| `suggestions.rs` | Builds few-shot prompt from listing cache, calls LLM, sanitizes response (strips bullets/markdown/numbering, rejects `/` and `\0`, deduplicates case-insensitively, enforces 255-char limit). Also hosts `get_folder_suggestions` Tauri command. |
+| `process.rs` | Spawns child process with `DYLD_LIBRARY_PATH` set. SIGTERM -> 5s wait -> SIGKILL. Port discovery via `bind(:0)`. Takes `ctx_size` param. |
+| `client.rs` | reqwest client with `AiBackend` enum: `Local { port }` or `OpenAi { api_key, base_url, model }`. Routes requests accordingly. |
+| `suggestions.rs` | Builds few-shot prompt from listing cache, routes to configured backend, sanitizes response. |
 
-### Additional Tauri commands
+### Tauri commands
 
-Beyond the core start/stop/status flow, the module also exposes: `uninstall_ai`, `dismiss_ai_offer`, `opt_out_ai`, `opt_in_ai`, `is_ai_opted_out`, `get_ai_model_info`.
+Core: `get_ai_status`, `get_ai_model_info`, `get_ai_runtime_status`, `configure_ai`, `start_ai_server`, `stop_ai_server`, `start_ai_download`, `cancel_ai_download`, `get_folder_suggestions`.
+Legacy (still wired, used by toast): `uninstall_ai`, `dismiss_ai_offer`, `opt_out_ai`, `opt_in_ai`, `is_ai_opted_out`.

```
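The `download.rs` row describes cooperative cancellation through an `Fn() -> bool` parameter. Here is a minimal std-only sketch of that pattern, assuming a chunked copy from any `Read` into any `Write`. The real function streams an HTTP body with Range-based resume, which is left out. `copy_with_cancel` is a hypothetical name.

```rust
use std::io::{self, Read, Write};

/// Hypothetical sketch: copy in chunks, asking a caller-supplied closure
/// between chunks whether to stop. Returns `Ok(Some(bytes))` on completion
/// and `Ok(None)` if cancellation was requested.
pub fn copy_with_cancel<R: Read, W: Write>(
    mut reader: R,
    mut writer: W,
    is_cancel_requested: impl Fn() -> bool,
) -> io::Result<Option<u64>> {
    let mut buf = [0u8; 8192];
    let mut total: u64 = 0;
    loop {
        // The copy loop knows nothing about who owns the cancel flag.
        if is_cancel_requested() {
            return Ok(None);
        }
        let n = match reader.read(&mut buf) {
            Ok(n) => n,
            Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        };
        if n == 0 {
            writer.flush()?;
            return Ok(Some(total));
        }
        writer.write_all(&buf[..n])?;
        total += n as u64;
    }
}
```

Because the loop only sees a closure, the caller can back it with state behind the global manager mutex, an `AtomicBool`, or a counter in a test, and the copy logic stays the same.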
````diff
-## Dev gate
-
-`use_real_ai()` returns `false` in debug builds unless `CMDR_REAL_AI=1` is set. In release builds it returns `true` on supported hardware. All Tauri commands check this at entry and return `Unavailable`/empty when false.
-
-## Architecture / data flow
+## Startup flow
 
 ```
-Frontend                    manager.rs      process.rs / download.rs / client.rs
-|                           |
-|-- get_ai_status --------> |
-|<- AiStatus ────────────── |
-|                           |
-|-- start_ai_download ----> |
-|                           |-- extract_bundled_llama_server()
-|<- ai-download-progress    |-- download_file() (streams, emits events)
-|<- ai-installing           |-- spawn_llama_server()
-|                           |-- poll /health (up to 60s)
-|<- ai-install-complete     |
-|                           |
-|-- get_folder_suggestions  |  (suggestions.rs → client.rs → llama-server)
-|<- Vec<String>             |
+Tauri setup()
+  -> ai::manager::init()         <- sets up dirs, cleans stale PIDs. Does NOT start server.
+
+Frontend loads
+  -> initializeSettings()        <- loads settings from tauri-plugin-store
+  -> configureAi({               <- pushes AI config to backend
+       provider, contextSize,
+       openaiApiKey, openaiBaseUrl, openaiModel
+     })
+  -> backend: if provider === 'local' && model installed && local AI supported
+       -> start_server_inner(ctx_size)
+       -> emit 'ai-server-ready' when healthy
 ```
 
+## Provider routing in suggestions
+
+`get_folder_suggestions` reads `provider` from `ManagerState`:
+- `off` -> returns empty
+- `local` -> uses local llama-server (if running)
+- `openai-compatible` -> builds `AiBackend::OpenAi` from stored config, calls `chat_completion`
+
````
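The routing rules in "Provider routing in suggestions" can be read as a small pure function. This sketch assumes a hypothetical `StoredAiConfig` standing in for the fields `ManagerState` keeps, with a `local_port` that is `Some` only while llama-server is running. The enum is a local copy of `AiBackend` from `client.rs` so the block compiles on its own.

```rust
/// Local copy of the `AiBackend` enum from client.rs.
#[derive(Debug, PartialEq)]
pub enum AiBackend {
    Local { port: u16 },
    OpenAi { api_key: String, base_url: String, model: String },
}

/// Hypothetical stand-in for the provider + OpenAI fields in `ManagerState`.
pub struct StoredAiConfig {
    pub provider: String,
    pub local_port: Option<u16>, // Some(port) only while llama-server is running
    pub openai_api_key: String,
    pub openai_base_url: String,
    pub openai_model: String,
}

/// Picks a backend for folder suggestions; `None` means "return an empty list".
pub fn select_backend(cfg: &StoredAiConfig) -> Option<AiBackend> {
    match cfg.provider.as_str() {
        "local" => cfg.local_port.map(|port| AiBackend::Local { port }),
        "openai-compatible" => Some(AiBackend::OpenAi {
            api_key: cfg.openai_api_key.clone(),
            base_url: cfg.openai_base_url.clone(),
            model: cfg.openai_model.clone(),
        }),
        // "off" and anything unrecognised: no AI call at all.
        _ => None,
    }
}
```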
```diff
 ## Key patterns
 
-- Two install flags: `AiState.installed` AND `AiState.model_download_complete` — both must be true.
-- State persisted to `ai-state.json` in the app data dir (`~/Library/Application Support/…/ai/`).
-- Stale PIDs from previous sessions are stopped on startup (alive → SIGTERM/SIGKILL, dead → state cleared).
+- Two install flags: `AiState.installed` AND `AiState.model_download_complete` -- both must be true.
+- State persisted to `ai-state.json` in the app data dir (`~/Library/Application Support/.../ai/`).
+- Stale PIDs from previous sessions are stopped on startup (alive -> SIGTERM/SIGKILL, dead -> state cleared).
 - Stale partial downloads (>24 hours) cleaned up at startup.
 - Binary re-extraction is possible if model exists but binary is missing.
 - Download guard: `download_in_progress` flag prevents concurrent downloads.
 - Server logs written to `llama-server.log` in the AI dir for debugging.
+- `opted_out` field in `AiState` is legacy. `ai.provider` in frontend settings store is the source of truth.
+- OpenAI config (api_key, base_url, model) stored in `ManagerState` so suggestions.rs can read without settings files.
+- `configure_ai` is idempotent -- frontend calls it on startup and whenever any AI setting changes.
+- `ModelInfo` includes `kv_bytes_per_token` and `base_overhead_bytes` for frontend memory estimation.
 
 ## Adding a new model
 
 1. Find the GGUF on HuggingFace.
 2. Get exact file size: `curl -sIL "<url>" | grep -i content-length`
-3. Add entry to `AVAILABLE_MODELS` in `mod.rs`.
+3. Add entry to `AVAILABLE_MODELS` in `mod.rs` (including `kv_bytes_per_token` and `base_overhead_bytes`).
 4. Update `DEFAULT_MODEL_ID` if it should be the new default.

```
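`ModelInfo` gains `kv_bytes_per_token` and `base_overhead_bytes` so the frontend can show a live memory estimate for the chosen context window. The exact formula is not part of this diff. The sketch below assumes the simplest reading: a fixed overhead plus a KV cache that grows linearly with `ctx_size`. `ModelMemoryInfo`, `estimate_memory_bytes`, and `format_mib` are hypothetical names.

```rust
/// Hypothetical subset of `ModelInfo`: only the two fields this commit adds.
pub struct ModelMemoryInfo {
    pub kv_bytes_per_token: u64,
    pub base_overhead_bytes: u64,
}

/// Assumed estimate: fixed overhead plus a KV cache linear in the context
/// window. Saturates instead of overflowing on absurd inputs.
pub fn estimate_memory_bytes(info: &ModelMemoryInfo, ctx_size: u32) -> u64 {
    info.kv_bytes_per_token
        .saturating_mul(u64::from(ctx_size))
        .saturating_add(info.base_overhead_bytes)
}

/// Formats a byte count as whole mebibytes for display.
pub fn format_mib(bytes: u64) -> String {
    format!("{:.0} MiB", bytes as f64 / (1024.0 * 1024.0))
}
```

Under this assumption, doubling the context window doubles only the KV term, so the selector's estimate grows in equal steps per extra token of context.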
```diff
 ## Key decisions
 
 **Decision**: Global `Mutex<Option<ManagerState>>` singleton instead of Tauri managed state.
-**Why**: AI state needs to be accessed from both Tauri commands and internal init/shutdown paths. Tauri managed state requires an `AppHandle` to access, but `shutdown()` is called from the quit handler where threading constraints make it simpler to use a plain global. The `Option` allows lazy init — `None` until `init()` runs.
+**Why**: AI state needs to be accessed from both Tauri commands and internal init/shutdown paths. Tauri managed state requires an `AppHandle` to access, but `shutdown()` is called from the quit handler where threading constraints make it simpler to use a plain global. The `Option` allows lazy init -- `None` until `init()` runs.
 
 **Decision**: Two separate install flags (`installed` + `model_download_complete`) rather than a single boolean.
 **Why**: The download can be interrupted (crash, cancel, network loss). A partial 2 GB file on disk looks "installed" but is corrupt. `model_download_complete` is only set after file-size verification passes. This prevents launching llama-server with a truncated model, which would crash silently or produce garbage.
 
-**Decision**: Dev gate via `use_real_ai()` that returns `false` in debug builds unless `CMDR_REAL_AI=1`.
-**Why**: AI features spawn a child process, download multi-GB files, and consume GPU resources. Enabling this by default in dev would make every `cargo run` slow and resource-heavy. The env var opt-in keeps the dev loop fast while still allowing manual AI testing.
-
-**Decision**: Port discovery via `bind(:0)` then pass to llama-server, instead of letting llama-server pick its own port.
-**Why**: llama-server doesn't have a reliable way to report its chosen port back to the parent. Binding port 0, reading the OS-assigned port, closing the listener, then passing it to llama-server avoids the tiny race window while keeping the architecture simple. The 100ms startup delay before the health check loop makes collisions practically impossible.
-
-**Decision**: Cancellation via `Fn() -> bool` parameter rather than `Arc<AtomicBool>`.
-**Why**: `download_file` lives in a separate module from the manager's cancel state. Passing a closure (`is_cancel_requested`) decouples the download logic from the global `MANAGER` mutex — the download module doesn't need to know about `ManagerState` at all.
-
-**Decision**: `SIGTERM` then 5s wait then `SIGKILL` for process shutdown.
-**Why**: llama-server may be mid-inference holding GPU memory. `SIGTERM` gives it a chance to release resources cleanly. The 5s timeout prevents hanging on app quit if the server is stuck.
+**Decision**: Frontend pushes AI config to backend via `configure_ai` -- Rust never reads settings files.
+**Why**: The frontend is the single source of truth for settings via `tauri-plugin-store`. Having Rust also read `settings.json` directly would create a second reader with potential format/timing mismatches.
 
-**Decision**: `shutdown()` called from both `on_window_event` (CloseRequested/Destroyed) and `RunEvent::Exit`.
-**Why**: `on_window_event` handles normal quit, but force-quit/crash/SIGTERM bypass it. `RunEvent::Exit` fires on app-level exit regardless of how it was triggered. `shutdown()` is idempotent (`child_pid.take()` returns `None` on subsequent calls), so double-calling is safe.
+**Decision**: `init()` only sets up directories and cleans stale PIDs. Server start is deferred to `configure_ai`.
+**Why**: The frontend needs to load settings before the backend knows which provider to use. The ~500ms delay is negligible.
 
-**Decision**: Context window (`-c 4096`) explicitly set on llama-server.
-**Why**: Without `-c`, llama-server defaults to the model's trained max context (256K for Ministral), creating a ~27 GB KV cache. Folder suggestions need at most 2K context. 4K is generous and keeps memory under ~400 MB.
+**Decision**: Port discovery via `bind(:0)` then pass to llama-server, instead of letting llama-server pick its own port.
+**Why**: llama-server doesn't have a reliable way to report its chosen port back to the parent.

```
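The port-discovery decision (bind port 0, read the OS-assigned port, release it, pass it to llama-server) looks roughly like this with the standard library. `find_free_port` is a hypothetical name. Binding to 127.0.0.1 matches the URL that `client.rs` builds for the local backend.

```rust
use std::io;
use std::net::{Ipv4Addr, TcpListener};

/// Hypothetical sketch: ask the OS for a free loopback port by binding port 0,
/// then release it so a child process can bind the same port.
pub fn find_free_port() -> io::Result<u16> {
    let listener = TcpListener::bind((Ipv4Addr::LOCALHOST, 0))?;
    let port = listener.local_addr()?.port();
    drop(listener); // close before handing the port to the child
    Ok(port)
}
```

Between `drop(listener)` and the child binding the port, another process could take it. This approach can only make that unlikely, not rule it out.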
```diff
 **Decision**: Bundle pre-extracted individual binaries in `resources/ai/` instead of a `.tar.gz` archive.
-**Why**: Apple notarization inspects inside archives and rejects unsigned binaries. By extracting and signing at build time (in the Go download script when `APPLE_SIGNING_IDENTITY` is set), each binary is individually codesigned with hardened runtime + secure timestamp. This also removes the `tar` and `flate2` Rust dependencies — `extract.rs` just copies files instead of decompressing.
+**Why**: Apple notarization inspects inside archives and rejects unsigned binaries.
 
 **Decision**: Suggestion sanitization strips bullets, markdown, numbering, and deduplicates case-insensitively.
-**Why**: Small LLMs (3B params) inconsistently follow formatting instructions. The same model that returns clean `docs\ntests\n` on one prompt may return `1. **Docs**\n2. tests` on the next. Aggressive sanitization makes the output reliable regardless of LLM mood.
+**Why**: Small LLMs (3B params) inconsistently follow formatting instructions.

```
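The sanitization decision can be sketched by combining it with the rules listed in the old `suggestions.rs` row: strip bullets, markdown, and numbering; reject `/` and `\0`; deduplicate case-insensitively; apply a 255-character limit. This is not the module's code. Which bullet and markdown characters are handled, and rejecting rather than truncating over-long names, are assumptions.

```rust
use std::collections::HashSet;

/// Hypothetical sketch of the sanitization rules for LLM folder suggestions.
pub fn sanitize_suggestions(raw: &str) -> Vec<String> {
    let mut seen = HashSet::new();
    let mut out = Vec::new();
    for line in raw.lines() {
        // Strip list bullets such as "-", "*", "•".
        let mut s = line
            .trim()
            .trim_start_matches(|c: char| c == '-' || c == '*' || c == '•')
            .trim_start();
        // Strip numbering such as "1." or "2)" (ASCII digits are one byte each).
        let digits = s.len() - s.trim_start_matches(|c: char| c.is_ascii_digit()).len();
        if digits > 0 {
            let rest = &s[digits..];
            if let Some(r) = rest.strip_prefix('.').or_else(|| rest.strip_prefix(')')) {
                s = r.trim_start();
            }
        }
        // Strip markdown emphasis/code markers around the name.
        let name = s.trim_matches(|c: char| c == '*' || c == '_' || c == '`').trim();
        if name.is_empty() || name.contains('/') || name.contains('\0') {
            continue;
        }
        if name.chars().count() > 255 {
            continue; // assumption: reject rather than truncate
        }
        // Keep the first spelling seen; later case variants are duplicates.
        if seen.insert(name.to_lowercase()) {
            out.push(name.to_string());
        }
    }
    out
}
```

For the kind of drift the old note describes, `sanitize_suggestions("1. **Docs**\n2. tests\n- docs")` yields `["Docs", "tests"]`. The numbering and bold markers are stripped, and the second `docs` is dropped as a case-insensitive duplicate.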
```diff
 ## Gotchas
 
-**Gotcha**: `tauri::async_runtime::spawn` is used in `init()` instead of `tokio::spawn`.
-**Why**: `init()` runs during Tauri setup before the tokio runtime is fully available. `tauri::async_runtime::spawn` uses Tauri's own runtime which is always ready at that point.
+**Gotcha**: `tauri::async_runtime::spawn` is used in `configure_ai` and `start_ai_server` instead of `tokio::spawn`.
+**Why**: These may run during Tauri setup before the tokio runtime is fully available. `tauri::async_runtime::spawn` uses Tauri's own runtime which is always ready at that point.
+
+**Gotcha**: `get_folder_suggestions` returns `Ok(Vec::new())` on AI errors, not `Err`.
+**Why**: AI suggestions are a nice-to-have enhancement. Returning empty gracefully hides the failure.
 
-**Gotcha**: `get_folder_suggestions` returns `Ok(Vec::new())` on LLM errors, not `Err`.
-**Why**: AI suggestions are a nice-to-have enhancement. Propagating errors would force the frontend to show error UI for a non-critical feature. Returning empty gracefully hides the failure — the user just sees no suggestions, same as if AI were not installed.
+**Gotcha**: `configure_ai` must NOT block. Server start is spawned in background via `tauri::async_runtime::spawn`.
+**Why**: `start_server_inner` takes 5-60s for health check polling. Blocking would freeze the frontend on startup.
 
 ## Dependencies

```
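The "must NOT block" gotcha means the command hands the slow start-and-poll work to a background task and returns immediately. Below is a std-only sketch of that shape. It assumes `std::thread::spawn` as a stand-in for `tauri::async_runtime::spawn` and reduces the start condition to the provider check; the startup flow also requires an installed model and supported hardware. `configure_ai_sketch` is a hypothetical name.

```rust
use std::thread::{self, JoinHandle};

/// Hypothetical sketch: return as soon as the slow server start has been
/// handed to a background thread. `start_server` stands in for
/// `start_server_inner` and reports whether the server became healthy.
pub fn configure_ai_sketch<F>(provider: &str, start_server: F) -> Option<JoinHandle<bool>>
where
    F: FnOnce() -> bool + Send + 'static,
{
    if provider != "local" {
        return None; // nothing to start for "off" or "openai-compatible"
    }
    Some(thread::spawn(start_server))
}
```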
apps/desktop/src-tauri/src/ai/client.rs

Lines changed: 38 additions & 7 deletions
```diff
@@ -1,8 +1,20 @@
-//! HTTP client for the local llama-server (OpenAI-compatible API).
+//! HTTP client for AI chat completions (local llama-server and OpenAI-compatible APIs).
 
 use serde::{Deserialize, Serialize};
 use std::time::Duration;
 
+/// Backend target for AI requests.
+pub enum AiBackend {
+    /// Local llama-server on localhost
+    Local { port: u16 },
+    /// OpenAI-compatible remote API
+    OpenAi {
+        api_key: String,
+        base_url: String,
+        model: String,
+    },
+}
+
 /// Error types for AI client operations.
 #[derive(Debug, Clone)]
 pub enum AiError {
@@ -58,15 +70,29 @@ struct ChatChoiceMessage {
     content: String,
 }
 
-/// Sends a chat completion request to the local llama-server.
+/// Sends a chat completion request to an AI backend (local or OpenAI-compatible).
 ///
 /// Returns the assistant's response text, or an error.
-/// Times out after 10 seconds.
-pub async fn chat_completion(port: u16, prompt: &str) -> Result<String, AiError> {
-    let url = format!("http://127.0.0.1:{port}/v1/chat/completions");
+pub async fn chat_completion(backend: &AiBackend, prompt: &str) -> Result<String, AiError> {
+    let (url, model_name, auth_header) = match backend {
+        AiBackend::Local { port } => (
+            format!("http://127.0.0.1:{port}/v1/chat/completions"),
+            String::from("local-model"),
+            None,
+        ),
+        AiBackend::OpenAi {
+            api_key,
+            base_url,
+            model,
+        } => (
+            format!("{}/chat/completions", base_url.trim_end_matches('/')),
+            model.clone(),
+            Some(format!("Bearer {api_key}")),
+        ),
+    };
 
     let request_body = ChatCompletionRequest {
-        model: String::from("local-model"), // llama-server uses whatever model it loaded
+        model: model_name,
         messages: vec![
             ChatMessage {
                 role: String::from("system"),
@@ -90,7 +116,12 @@ pub async fn chat_completion(port: u16, prompt: &str) -> Result<String, AiError>
         .build()
         .map_err(|e| AiError::ServerError(e.to_string()))?;
 
-    let response = client.post(&url).json(&request_body).send().await.map_err(|e| {
+    let mut request = client.post(&url).json(&request_body);
+    if let Some(auth) = auth_header {
+        request = request.header("Authorization", auth);
+    }
+
+    let response = request.send().await.map_err(|e| {
         if e.is_timeout() {
             AiError::Timeout
         } else if e.is_connect() {
```
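The `match backend` at the top of `chat_completion` can be lifted into a pure function, which makes the URL and header rules testable without a network. This is a hypothetical refactor and not part of the commit. `Endpoint` and `resolve_endpoint` are invented names, and the enum is copied from the diff.

```rust
/// Copy of the `AiBackend` enum from the diff so this block compiles alone.
pub enum AiBackend {
    Local { port: u16 },
    OpenAi { api_key: String, base_url: String, model: String },
}

/// Hypothetical request target: URL, model name, optional auth header value.
#[derive(Debug, PartialEq)]
pub struct Endpoint {
    pub url: String,
    pub model: String,
    pub authorization: Option<String>,
}

/// Pure version of the backend match in `chat_completion`.
pub fn resolve_endpoint(backend: &AiBackend) -> Endpoint {
    match backend {
        AiBackend::Local { port } => Endpoint {
            url: format!("http://127.0.0.1:{port}/v1/chat/completions"),
            model: String::from("local-model"),
            authorization: None,
        },
        AiBackend::OpenAi { api_key, base_url, model } => Endpoint {
            // Trailing slashes are trimmed so ".../v1/" and ".../v1" agree.
            url: format!("{}/chat/completions", base_url.trim_end_matches('/')),
            model: model.clone(),
            authorization: Some(format!("Bearer {api_key}")),
        },
    }
}
```

The diff appends only `/chat/completions`, so the configured base URL must already include the API version segment (for example `https://api.openai.com/v1`).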
