|
1 | 1 | # AI subsystem |
2 | 2 |
|
3 | | -Local on-device AI features powered by llama.cpp's `llama-server`. Currently used for folder name suggestions. |
| 3 | +AI features powered by a local LLM (llama-server) or any OpenAI-compatible API. Currently used for folder name suggestions. |
4 | 4 |
|
5 | | -AI requires Apple Silicon (aarch64). Intel Macs are not supported — the bundled binary is ARM64-only. |
| 5 | +Three provider modes: |
| 6 | +- **Off**: No AI features. |
| 7 | +- **OpenAI-compatible** (BYOK, bring your own key): Any OpenAI-compatible API. Works on any hardware. |
| 8 | +- **Local LLM**: On-device llama-server. Requires Apple Silicon (aarch64). |
6 | 9 |
|
7 | 10 | ## Key files |
8 | 11 |
|
9 | 12 | | File | Purpose | |
10 | 13 | |---|---| |
11 | | -| `mod.rs` | Types (`AiStatus`, `AiState`, `DownloadProgress`, `ModelInfo`), model registry (`AVAILABLE_MODELS`, `DEFAULT_MODEL_ID`), gate functions | |
12 | | -| `manager.rs` | Central coordinator. Global `Mutex<Option<ManagerState>>` singleton. Most Tauri commands live here. `get_folder_suggestions` is in `suggestions.rs`. Handles startup recovery. | |
| 14 | +| `mod.rs` | Types (`AiStatus`, `AiState`, `DownloadProgress`, `ModelInfo`), model registry (`AVAILABLE_MODELS`, `DEFAULT_MODEL_ID`), `is_local_ai_supported()` gate | |
| 15 | +| `manager.rs` | Central coordinator. Global `Mutex<Option<ManagerState>>` singleton. Most Tauri commands live here. Stores provider + OpenAI config in `ManagerState`. | |
13 | 16 | | `download.rs` | HTTP streaming download with Range-based resume. Emits `ai-download-progress` events (200ms throttle). Cooperative cancellation via function parameter (`Fn() -> bool`). | |
14 | 17 | | `extract.rs` | Copies bundled `llama-server` binary + dylibs from `resources/ai/` to the AI data dir. Sets Unix permissions, handles symlinks. | |
15 | | -| `process.rs` | Spawns child process with `DYLD_LIBRARY_PATH` set. SIGTERM → 5s wait → SIGKILL. Port discovery via `bind(:0)`. | |
16 | | -| `client.rs` | reqwest client: `POST /v1/chat/completions` (15s timeout), `GET /health` (2s timeout). | |
17 | | -| `suggestions.rs` | Builds few-shot prompt from listing cache, calls LLM, sanitizes response (strips bullets/markdown/numbering, rejects `/` and `\0`, deduplicates case-insensitively, enforces 255-char limit). Also hosts `get_folder_suggestions` Tauri command. | |
| 18 | +| `process.rs` | Spawns child process with `DYLD_LIBRARY_PATH` set. SIGTERM -> 5s wait -> SIGKILL. Port discovery via `bind(:0)`. Takes `ctx_size` param. | |
| 19 | +| `client.rs` | reqwest client with `AiBackend` enum: `Local { port }` or `OpenAi { api_key, base_url, model }`. Routes requests accordingly. | |
| 20 | +| `suggestions.rs` | Builds few-shot prompt from listing cache, routes to configured backend, sanitizes response. | |
18 | 21 |
|
19 | | -### Additional Tauri commands |
| 22 | +### Tauri commands |
20 | 23 |
|
21 | | -Beyond the core start/stop/status flow, the module also exposes: `uninstall_ai`, `dismiss_ai_offer`, `opt_out_ai`, `opt_in_ai`, `is_ai_opted_out`, `get_ai_model_info`. |
| 24 | +Core: `get_ai_status`, `get_ai_model_info`, `get_ai_runtime_status`, `configure_ai`, `start_ai_server`, `stop_ai_server`, `start_ai_download`, `cancel_ai_download`, `get_folder_suggestions`. |
| 25 | +Legacy (still wired, used by toast): `uninstall_ai`, `dismiss_ai_offer`, `opt_out_ai`, `opt_in_ai`, `is_ai_opted_out`. |
22 | 26 |
|
23 | | -## Dev gate |
24 | | - |
25 | | -`use_real_ai()` returns `false` in debug builds unless `CMDR_REAL_AI=1` is set. In release builds it returns `true` on supported hardware. All Tauri commands check this at entry and return `Unavailable`/empty when false. |
26 | | - |
27 | | -## Architecture / data flow |
| 27 | +## Startup flow |
28 | 28 |
|
29 | 29 | ``` |
30 | | -Frontend manager.rs process.rs / download.rs / client.rs |
31 | | - | | |
32 | | - |-- get_ai_status --------> | |
33 | | - |<- AiStatus ───────────── | |
34 | | - | | |
35 | | - |-- start_ai_download ----> | |
36 | | - | |-- extract_bundled_llama_server() |
37 | | - |<- ai-download-progress |-- download_file() (streams, emits events) |
38 | | - |<- ai-installing |-- spawn_llama_server() |
39 | | - | |-- poll /health (up to 60s) |
40 | | - |<- ai-install-complete | |
41 | | - | | |
42 | | - |-- get_folder_suggestions | (suggestions.rs → client.rs → llama-server) |
43 | | - |<- Vec<String> | |
| 30 | +Tauri setup() |
| 31 | + -> ai::manager::init() <- sets up dirs, cleans stale PIDs. Does NOT start server. |
| 32 | +
|
| 33 | +Frontend loads |
| 34 | + -> initializeSettings() <- loads settings from tauri-plugin-store |
| 35 | + -> configureAi({ <- pushes AI config to backend |
| 36 | + provider, contextSize, |
| 37 | + openaiApiKey, openaiBaseUrl, openaiModel |
| 38 | + }) |
| 39 | + -> backend: if provider === 'local' && model installed && local AI supported |
| 40 | + -> start_server_inner(ctx_size) |
| 41 | + -> emit 'ai-server-ready' when healthy |
44 | 42 | ``` |
45 | 43 |
|
| 44 | +## Provider routing in suggestions |
| 45 | + |
| 46 | +`get_folder_suggestions` reads `provider` from `ManagerState`: |
| 47 | +- `off` -> returns empty |
| 48 | +- `local` -> uses local llama-server (if running) |
| 49 | +- `openai-compatible` -> builds `AiBackend::OpenAi` from stored config, calls `chat_completion` |
| 50 | + |
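The routing above can be sketched roughly as follows. This is a minimal illustration, not the code in `suggestions.rs`: the `Provider` enum, the `OpenAiConfig` struct, and the `select_backend` function are hypothetical names, and only the `AiBackend` variants follow the shape described for `client.rs`.

```rust
/// Provider setting (variant names are assumptions for this sketch).
#[derive(Debug, Clone, PartialEq)]
enum Provider {
    Off,
    Local,
    OpenAiCompatible,
}

/// Backend enum following the shape described for `client.rs`.
#[derive(Debug, Clone, PartialEq)]
enum AiBackend {
    Local { port: u16 },
    OpenAi { api_key: String, base_url: String, model: String },
}

/// Hypothetical container for the OpenAI settings kept in `ManagerState`.
#[derive(Debug, Clone)]
struct OpenAiConfig {
    api_key: String,
    base_url: String,
    model: String,
}

/// Picks the backend for a suggestion request.
/// `None` means the caller should return an empty suggestion list.
fn select_backend(
    provider: &Provider,
    local_port: Option<u16>,
    openai: &OpenAiConfig,
) -> Option<AiBackend> {
    match provider {
        Provider::Off => None,
        // Only usable when the local server is running, i.e. its port is known.
        Provider::Local => local_port.map(|port| AiBackend::Local { port }),
        Provider::OpenAiCompatible => Some(AiBackend::OpenAi {
            api_key: openai.api_key.clone(),
            base_url: openai.base_url.clone(),
            model: openai.model.clone(),
        }),
    }
}
```

Returning `Option` keeps the "no backend available" case in one place, which fits the documented behavior of returning an empty list rather than an error.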
46 | 51 | ## Key patterns |
47 | 52 |
|
48 | | -- Two install flags: `AiState.installed` AND `AiState.model_download_complete` — both must be true. |
49 | | -- State persisted to `ai-state.json` in the app data dir (`~/Library/Application Support/…/ai/`). |
50 | | -- Stale PIDs from previous sessions are stopped on startup (alive → SIGTERM/SIGKILL, dead → state cleared). |
| 53 | +- Two install flags: `AiState.installed` AND `AiState.model_download_complete` -- both must be true. |
| 54 | +- State persisted to `ai-state.json` in the app data dir (`~/Library/Application Support/.../ai/`). |
| 55 | +- Stale PIDs from previous sessions are stopped on startup (alive -> SIGTERM/SIGKILL, dead -> state cleared). |
51 | 56 | - Stale partial downloads (>24 hours) cleaned up at startup. |
52 | 57 | - Binary re-extraction is possible if model exists but binary is missing. |
53 | 58 | - Download guard: `download_in_progress` flag prevents concurrent downloads. |
54 | 59 | - Server logs written to `llama-server.log` in the AI dir for debugging. |
| 60 | +- `opted_out` field in `AiState` is legacy. `ai.provider` in frontend settings store is the source of truth. |
| 61 | +- OpenAI config (api_key, base_url, model) is stored in `ManagerState` so `suggestions.rs` can read it without touching settings files. |
| 62 | +- `configure_ai` is idempotent -- frontend calls it on startup and whenever any AI setting changes. |
| 63 | +- `ModelInfo` includes `kv_bytes_per_token` and `base_overhead_bytes` for frontend memory estimation. |
55 | 64 |
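As an illustration of how the two `ModelInfo` memory fields could feed an estimate, here is a minimal sketch. The formula (fixed overhead plus a KV cache that grows linearly with context size) is an assumption, as are the `u64` field types and the `estimate_memory_bytes` name; the actual estimation happens in the frontend and may differ.

```rust
/// Subset of `ModelInfo` relevant to memory estimation (field types assumed).
struct ModelInfo {
    kv_bytes_per_token: u64,
    base_overhead_bytes: u64,
}

/// Assumed formula: base overhead + KV bytes per token * context size.
/// Saturating arithmetic avoids overflow panics on extreme inputs.
fn estimate_memory_bytes(model: &ModelInfo, ctx_size: u64) -> u64 {
    model
        .base_overhead_bytes
        .saturating_add(model.kv_bytes_per_token.saturating_mul(ctx_size))
}
```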
|
56 | 65 | ## Adding a new model |
57 | 66 |
|
58 | 67 | 1. Find the GGUF on HuggingFace. |
59 | 68 | 2. Get exact file size: `curl -sIL "<url>" | grep -i content-length` |
60 | | -3. Add entry to `AVAILABLE_MODELS` in `mod.rs`. |
| 69 | +3. Add entry to `AVAILABLE_MODELS` in `mod.rs` (including `kv_bytes_per_token` and `base_overhead_bytes`). |
61 | 70 | 4. Update `DEFAULT_MODEL_ID` if it should be the new default. |
62 | 71 |
|
63 | 72 | ## Key decisions |
64 | 73 |
|
65 | 74 | **Decision**: Global `Mutex<Option<ManagerState>>` singleton instead of Tauri managed state. |
66 | | -**Why**: AI state needs to be accessed from both Tauri commands and internal init/shutdown paths. Tauri managed state requires an `AppHandle` to access, but `shutdown()` is called from the quit handler where threading constraints make it simpler to use a plain global. The `Option` allows lazy init — `None` until `init()` runs. |
| 75 | +**Why**: AI state needs to be accessed from both Tauri commands and internal init/shutdown paths. Tauri managed state requires an `AppHandle` to access, but `shutdown()` is called from the quit handler where threading constraints make it simpler to use a plain global. The `Option` allows lazy init -- `None` until `init()` runs. |
67 | 76 |
|
68 | 77 | **Decision**: Two separate install flags (`installed` + `model_download_complete`) rather than a single boolean. |
69 | 78 | **Why**: The download can be interrupted (crash, cancel, network loss). A partial 2 GB file on disk looks "installed" but is corrupt. `model_download_complete` is only set after file-size verification passes. This prevents launching llama-server with a truncated model, which would crash silently or produce garbage. |
70 | 79 |
|
71 | | -**Decision**: Dev gate via `use_real_ai()` that returns `false` in debug builds unless `CMDR_REAL_AI=1`. |
72 | | -**Why**: AI features spawn a child process, download multi-GB files, and consume GPU resources. Enabling this by default in dev would make every `cargo run` slow and resource-heavy. The env var opt-in keeps the dev loop fast while still allowing manual AI testing. |
73 | | - |
74 | | -**Decision**: Port discovery via `bind(:0)` then pass to llama-server, instead of letting llama-server pick its own port. |
75 | | -**Why**: llama-server doesn't have a reliable way to report its chosen port back to the parent. Binding port 0, reading the OS-assigned port, closing the listener, then passing it to llama-server avoids the tiny race window while keeping the architecture simple. The 100ms startup delay before the health check loop makes collisions practically impossible. |
76 | | - |
77 | | -**Decision**: Cancellation via `Fn() -> bool` parameter rather than `Arc<AtomicBool>`. |
78 | | -**Why**: `download_file` lives in a separate module from the manager's cancel state. Passing a closure (`is_cancel_requested`) decouples the download logic from the global `MANAGER` mutex — the download module doesn't need to know about `ManagerState` at all. |
79 | | - |
80 | | -**Decision**: `SIGTERM` then 5s wait then `SIGKILL` for process shutdown. |
81 | | -**Why**: llama-server may be mid-inference holding GPU memory. `SIGTERM` gives it a chance to release resources cleanly. The 5s timeout prevents hanging on app quit if the server is stuck. |
| 80 | +**Decision**: Frontend pushes AI config to backend via `configure_ai` -- Rust never reads settings files. |
| 81 | +**Why**: The frontend is the single source of truth for settings via `tauri-plugin-store`. Having Rust also read `settings.json` directly would create a second reader with potential format/timing mismatches. |
82 | 82 |
|
83 | | -**Decision**: `shutdown()` called from both `on_window_event` (CloseRequested/Destroyed) and `RunEvent::Exit`. |
84 | | -**Why**: `on_window_event` handles normal quit, but force-quit/crash/SIGTERM bypass it. `RunEvent::Exit` fires on app-level exit regardless of how it was triggered. `shutdown()` is idempotent (`child_pid.take()` returns `None` on subsequent calls), so double-calling is safe. |
| 83 | +**Decision**: `init()` only sets up directories and cleans stale PIDs. Server start is deferred to `configure_ai`. |
| 84 | +**Why**: The frontend needs to load settings before the backend knows which provider to use. The resulting ~500ms delay before the local server starts is negligible. |
85 | 85 |
|
86 | | -**Decision**: Context window (`-c 4096`) explicitly set on llama-server. |
87 | | -**Why**: Without `-c`, llama-server defaults to the model's trained max context (256K for Ministral), creating a ~27 GB KV cache. Folder suggestions need at most 2K context. 4K is generous and keeps memory under ~400 MB. |
| 86 | +**Decision**: Port discovery via `bind(:0)` then pass to llama-server, instead of letting llama-server pick its own port. |
| 87 | +**Why**: llama-server doesn't have a reliable way to report its chosen port back to the parent. |
88 | 88 |
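The `bind(:0)` technique can be sketched with the standard library alone. This is a minimal illustration, not the code in `process.rs`; the function name `find_free_port` is hypothetical.

```rust
use std::net::TcpListener;

/// Asks the OS for a currently unused TCP port on the loopback interface.
fn find_free_port() -> std::io::Result<u16> {
    // Port 0 tells the OS to assign any free port.
    let listener = TcpListener::bind("127.0.0.1:0")?;
    let port = listener.local_addr()?.port();
    // The listener is dropped on return, releasing the port so the child
    // process can bind it. Another process could claim it in between,
    // so this leaves a small race window.
    Ok(port)
}
```

The returned port would then be passed to the child process on its command line.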
|
89 | 89 | **Decision**: Bundle pre-extracted individual binaries in `resources/ai/` instead of a `.tar.gz` archive. |
90 | | -**Why**: Apple notarization inspects inside archives and rejects unsigned binaries. By extracting and signing at build time (in the Go download script when `APPLE_SIGNING_IDENTITY` is set), each binary is individually codesigned with hardened runtime + secure timestamp. This also removes the `tar` and `flate2` Rust dependencies — `extract.rs` just copies files instead of decompressing. |
| 90 | +**Why**: Apple notarization inspects inside archives and rejects unsigned binaries. |
91 | 91 |
|
92 | 92 | **Decision**: Suggestion sanitization strips bullets, markdown, numbering, and deduplicates case-insensitively. |
93 | | -**Why**: Small LLMs (3B params) inconsistently follow formatting instructions. The same model that returns clean `docs\ntests\n` on one prompt may return `1. **Docs**\n2. tests` on the next. Aggressive sanitization makes the output reliable regardless of LLM mood. |
| 93 | +**Why**: Small LLMs (3B params) inconsistently follow formatting instructions. |
94 | 94 |
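A sanitizer along these lines might look like the sketch below. It is an illustration under assumptions, not the code in `suggestions.rs`: the function name, the exact bullet and numbering patterns, and the choice to count the 255 limit in characters are all hypothetical.

```rust
use std::collections::HashSet;

/// Cleans raw LLM output into a list of folder-name candidates.
fn sanitize_suggestions(raw: &str) -> Vec<String> {
    let mut seen: HashSet<String> = HashSet::new();
    let mut out = Vec::new();
    for line in raw.lines() {
        let mut s = line.trim();
        // Strip one leading bullet marker, if present.
        for bullet in ["- ", "* ", "• "] {
            if let Some(rest) = s.strip_prefix(bullet) {
                s = rest.trim_start();
                break;
            }
        }
        // Strip leading numbering such as "1. " or "2) ".
        let digits = s.chars().take_while(|c| c.is_ascii_digit()).count();
        if digits > 0 {
            let rest = &s[digits..];
            if let Some(r) = rest.strip_prefix(". ").or(rest.strip_prefix(") ")) {
                s = r.trim_start();
            }
        }
        // Strip markdown emphasis and code markers around the name.
        let name = s.trim_matches(|c: char| c == '*' || c == '`').trim();
        // Reject empty names, path separators, NUL bytes, and overlong names.
        if name.is_empty()
            || name.contains('/')
            || name.contains('\0')
            || name.chars().count() > 255
        {
            continue;
        }
        // Deduplicate case-insensitively, keeping the first spelling seen.
        if seen.insert(name.to_lowercase()) {
            out.push(name.to_string());
        }
    }
    out
}
```

With the input `"1. **Docs**\n2. tests\n- docs"`, this sketch keeps `Docs` and `tests` and drops the later `docs` as a case-insensitive duplicate.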
|
95 | 95 | ## Gotchas |
96 | 96 |
|
97 | | -**Gotcha**: `tauri::async_runtime::spawn` is used in `init()` instead of `tokio::spawn`. |
98 | | -**Why**: `init()` runs during Tauri setup before the tokio runtime is fully available. `tauri::async_runtime::spawn` uses Tauri's own runtime which is always ready at that point. |
| 97 | +**Gotcha**: `tauri::async_runtime::spawn` is used in `configure_ai` and `start_ai_server` instead of `tokio::spawn`. |
| 98 | +**Why**: These may run during Tauri setup before the tokio runtime is fully available. `tauri::async_runtime::spawn` uses Tauri's own runtime which is always ready at that point. |
| 99 | + |
| 100 | +**Gotcha**: `get_folder_suggestions` returns `Ok(Vec::new())` on AI errors, not `Err`. |
| 101 | +**Why**: AI suggestions are a nice-to-have enhancement. Returning empty gracefully hides the failure. |
99 | 102 |
|
100 | | -**Gotcha**: `get_folder_suggestions` returns `Ok(Vec::new())` on LLM errors, not `Err`. |
101 | | -**Why**: AI suggestions are a nice-to-have enhancement. Propagating errors would force the frontend to show error UI for a non-critical feature. Returning empty gracefully hides the failure — the user just sees no suggestions, same as if AI were not installed. |
| 103 | +**Gotcha**: `configure_ai` must NOT block. Server start is spawned in background via `tauri::async_runtime::spawn`. |
| 104 | +**Why**: `start_server_inner` takes 5-60s for health check polling. Blocking would freeze the frontend on startup. |
102 | 105 |
|
103 | 106 | ## Dependencies |
104 | 107 |
|
|