- `configure_ai` had three separate lock/unlock cycles, creating a race window where a spawned process existed but `child_pid` wasn't set yet. Rapid provider switching (Local → OpenAI → Local → OpenAI) could orphan processes that survived app quit.
- Fix: spawn + `child_pid` assignment now happen synchronously inside a single MANAGER lock. Only the health check (up to 60s) runs async.
- Split `start_server_inner` into `spawn_and_track_server` (sync) + `wait_for_server_health` (async) + `cleanup_failed_server`.
- `wait_for_server_health` now kills the process on timeout or early death instead of returning `Err` and leaving it running.
- Belt-and-suspenders: `kill_stale_llama_servers` uses `pgrep -f` to find and stop any llama-server processes from our AI directory, called on startup and before every spawn.
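
The locking discipline described in the list above can be sketched roughly as follows. This is a minimal illustration, not the code from this commit: `ManagerState` is trimmed to a single field, the `kill` and `spawn` closures are hypothetical stand-ins for the real process helpers, and stopping the previously tracked PID inside the same lock is an assumption about how a provider switch is handled.

```rust
use std::io;
use std::sync::Mutex;

/// Hypothetical, trimmed-down stand-in for the real `ManagerState`.
#[derive(Default)]
struct ManagerState {
    child_pid: Option<u32>,
}

/// Replace the tracked server while holding the lock for the whole operation.
///
/// `kill` and `spawn` are injected (assumed helpers) so the sketch does not
/// depend on how the real code stops or launches `llama-server`.
fn spawn_and_track_server(
    manager: &Mutex<ManagerState>,
    kill: impl Fn(u32),
    spawn: impl FnOnce() -> io::Result<u32>,
) -> io::Result<u32> {
    // One lock for the whole critical section: no other caller can observe
    // a state where a process exists but `child_pid` is still unset.
    let mut state = manager.lock().unwrap();

    // Stop whatever was tracked before, so a rapid provider switch cannot
    // leave the previous server running untracked.
    if let Some(old_pid) = state.child_pid.take() {
        kill(old_pid);
    }

    // Spawn and record the PID before the guard is dropped.
    let pid = spawn()?;
    state.child_pid = Some(pid);
    Ok(pid)
}
```

Because the guard is held from the old-PID cleanup through the `child_pid` assignment, two rapid calls serialize: the second call always sees the PID recorded by the first. Only the slow health check would need to run outside the lock.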
apps/desktop/src-tauri/src/ai/CLAUDE.md
11 additions & 4 deletions
@@ -15,7 +15,7 @@ Three provider modes:
|`manager.rs`| Central coordinator. Global `Mutex<Option<ManagerState>>` singleton. Most Tauri commands live here. Stores provider + OpenAI config in `ManagerState`. |
|`download.rs`| HTTP streaming download with Range-based resume. Emits `ai-download-progress` events (200ms throttle). Cooperative cancellation via function parameter (`Fn() -> bool`). |
|`extract.rs`| Copies bundled `llama-server` binary + dylibs from `resources/ai/` to the AI data dir. Sets Unix permissions, handles symlinks. |
-|`process.rs`| Spawns child process with `DYLD_LIBRARY_PATH` set. Instant SIGKILL to stop (llama-server is stateless; macOS reclaims all GPU/mmap resources). `kill_process` for fire-and-forget (quit, orphans), `kill_and_reap_in_background` for normal operation (reaps zombie in bg thread). Port discovery via `bind(:0)`. Takes `ctx_size` param. |
+|`process.rs`| Spawns child process with `DYLD_LIBRARY_PATH` set. Instant SIGKILL to stop (llama-server is stateless; macOS reclaims all GPU/mmap resources). `kill_process` for fire-and-forget (quit, orphans), `kill_and_reap_in_background` for normal operation (reaps zombie in bg thread). `kill_stale_llama_servers` for belt-and-suspenders orphan cleanup by process name. Port discovery via `bind(:0)`. |
|`client.rs`| reqwest client with `AiBackend` enum: `Local { port }` or `OpenAi { api_key, base_url, model }`. Routes requests accordingly. |
|`suggestions.rs`| Builds few-shot prompt from listing cache, routes to configured backend, sanitizes response. |
@@ -37,7 +37,8 @@ Frontend loads
openaiApiKey, openaiBaseUrl, openaiModel
})
-> backend: if provider === 'local' && model installed && local AI supported
-> wait_for_server_health() (async — polls up to 60s)
-> emit 'ai-server-ready' when healthy
```
@@ -111,8 +112,14 @@ The frontend (`AiSection.svelte`) tracks `installStep` state and displays "Step
**Gotcha**: `get_folder_suggestions` returns `Ok(Vec::new())` on AI errors, not `Err`.
**Why**: AI suggestions are a nice-to-have enhancement. Returning empty gracefully hides the failure.
-**Gotcha**: `configure_ai` must NOT block. Server start is spawned in background via `tauri::async_runtime::spawn`.
-**Why**: `start_server_inner` takes 5-60s for health check polling. Blocking would freeze the frontend on startup.
+**Gotcha**: `configure_ai` must NOT block. Only the health check runs async via `tauri::async_runtime::spawn`.
+**Why**: Health check polling takes 5-60s. Blocking would freeze the frontend on startup.
+
+**Gotcha**: The process spawn and `child_pid` assignment must happen synchronously inside the MANAGER lock.
+**Why**: Previously, spawn happened inside an async task, creating a race window where a process existed but wasn't tracked in `child_pid`. Rapid provider switching (Local → OpenAI → Local → OpenAI) could orphan processes that survived app quit. Fixed by splitting into `spawn_and_track_server` (sync, inside lock) + `wait_for_server_health` (async).
+
+**Gotcha**: `wait_for_server_health` kills the process on timeout or early death — don't remove that cleanup.
+**Why**: Without it, a process that fails health check would be orphaned (PID tracked but never cleaned up until explicit stop).
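
The cleanup rule in the last gotcha can be illustrated with a small polling loop. This is a hedged sketch only: the real `wait_for_server_health` is async, while this version is synchronous and uses only the standard library. The `is_healthy`, `is_alive`, and `kill` closures and the `HealthError` type are hypothetical names introduced for the example.

```rust
use std::thread::sleep;
use std::time::{Duration, Instant};

/// Hypothetical error type for the sketch.
#[derive(Debug, PartialEq)]
enum HealthError {
    Timeout,
    ProcessDied,
}

/// Poll until the server reports healthy. On timeout or early death the
/// injected `kill` closure runs before the error is returned, so a failed
/// start never leaves a process behind.
fn wait_for_server_health(
    mut is_healthy: impl FnMut() -> bool,
    mut is_alive: impl FnMut() -> bool,
    kill: impl FnOnce(),
    timeout: Duration,
    poll_interval: Duration,
) -> Result<(), HealthError> {
    let deadline = Instant::now() + timeout;
    loop {
        if is_healthy() {
            return Ok(());
        }
        if !is_alive() {
            // Early death: still run cleanup so the PID is reaped and untracked.
            kill();
            return Err(HealthError::ProcessDied);
        }
        if Instant::now() >= deadline {
            // Timed out: stop the process instead of leaving it running.
            kill();
            return Err(HealthError::Timeout);
        }
        sleep(poll_interval);
    }
}
```

Both failure paths call `kill` before returning `Err`, which is the property the gotcha asks maintainers to preserve. In the real code the timeout would be the 60s budget mentioned above.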