Commit e45c742

AI: Limit llama memory use by setting low context
Context window was 256k by default, now it's 4k
1 parent 39cf7b4 commit e45c742

2 files changed

Lines changed: 5 additions & 0 deletions

apps/desktop/src-tauri/src/ai/CLAUDE.md

Lines changed: 3 additions & 0 deletions
@@ -80,6 +80,9 @@ Frontend manager.rs process.rs / download.rs / c
 **Decision**: `SIGTERM` then 5s wait then `SIGKILL` for process shutdown.
 **Why**: llama-server may be mid-inference holding GPU memory. `SIGTERM` gives it a chance to release resources cleanly. The 5s timeout prevents hanging on app quit if the server is stuck.

+**Decision**: Context window (`-c 4096`) explicitly set on llama-server.
+**Why**: Without `-c`, llama-server defaults to the model's trained max context (256K for Ministral), creating a ~27 GB KV cache. Folder suggestions need at most 2K context. 4K is generous and keeps memory under ~400 MB.
+
 **Decision**: Bundle pre-extracted individual binaries in `resources/ai/` instead of a `.tar.gz` archive.
 **Why**: Apple notarization inspects inside archives and rejects unsigned binaries. By extracting and signing at build time (in the Go download script when `APPLE_SIGNING_IDENTITY` is set), each binary is individually codesigned with hardened runtime + secure timestamp. This also removes the `tar` and `flate2` Rust dependencies — `extract.rs` just copies files instead of decompressing.
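
The memory figures in the new **Why** note can be sanity-checked with a rough sketch. It assumes the KV cache grows linearly with context length, which holds for standard attention but is an assumption about this model, and it reads the note's "~27 GB at 256K" as 27 × 10^9 bytes at 262,144 tokens. The function name `scale_kv_cache_bytes` is hypothetical and not part of the commit.

```rust
/// Estimate KV cache size at a new context length by scaling linearly
/// from one known (context length, cache size) data point.
/// Assumption: cache size is proportional to the number of context tokens.
fn scale_kv_cache_bytes(known_bytes: u64, known_ctx: u64, new_ctx: u64) -> u64 {
    // Use a u128 intermediate so the multiplication cannot overflow.
    ((known_bytes as u128 * new_ctx as u128) / known_ctx as u128) as u64
}

fn main() {
    // Figures taken from the note: ~27 GB at the 256K default context.
    let default_ctx_bytes: u64 = 27_000_000_000;
    let default_ctx: u64 = 256 * 1024;

    // Context length set by this commit via `-c 4096`.
    let estimate = scale_kv_cache_bytes(default_ctx_bytes, default_ctx, 4096);
    println!("estimated KV cache at 4K: {} MB", estimate / 1_000_000);
}
```

Under these assumptions the 4K cache is 1/64 of the 256K cache, about 0.42 GB, which is in the same range as the ~400 MB quoted in the note.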

apps/desktop/src-tauri/src/ai/process.rs

Lines changed: 2 additions & 0 deletions
@@ -48,6 +48,8 @@ pub fn spawn_llama_server(ai_dir: &Path, model_filename: &str, port: u16) -> Res
         .arg(port.to_string())
         .arg("--host")
         .arg("127.0.0.1")
+        .arg("-c")
+        .arg("4096") // Context window — 4K is plenty for folder suggestions, prevents 27 GB KV cache
         .arg("--temp")
         .arg("0.6")
         .arg("--top-p")
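
One possible follow-up, sketched here rather than taken from the commit, is to lift the literal `"4096"` into a named constant so the value and its rationale live in one place. The names `CONTEXT_TOKENS` and `context_args`, and the bare `llama-server` program name, are hypothetical. The sketch only builds the argument list and does not spawn a process.

```rust
use std::process::Command;

/// Context window passed to llama-server via `-c`.
/// Hypothetical constant; the commit uses the literal "4096" inline.
const CONTEXT_TOKENS: u32 = 4096;

/// Build the `-c <n>` argument pair for a given context size.
fn context_args(context_tokens: u32) -> [String; 2] {
    ["-c".to_string(), context_tokens.to_string()]
}

fn main() {
    // Placeholder program name; the real code resolves the binary from `ai_dir`.
    let mut cmd = Command::new("llama-server");
    cmd.args(context_args(CONTEXT_TOKENS));

    // Inspect the arguments without launching anything.
    let args: Vec<String> = cmd
        .get_args()
        .map(|a| a.to_string_lossy().into_owned())
        .collect();
    println!("{:?}", args);
}
```

Keeping the pair in a small helper also makes the context size easy to assert on in a unit test.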