v0.6.66b479
Bigger working context on bigger machines
After the b478 RAM cleanup, the chat worker's working context was a flat 8K (8192 tokens) everywhere. That's right for a 16 GB laptop but tight on a 32 GB or 64 GB box, which has plenty of headroom for the retrieved chunks, the running history, and a longer reply. b479 picks the working context based on total host RAM, so larger machines get more room automatically. 8K stays the floor; nothing on smaller hosts regresses.
Tier table
| Total RAM | Working context |
|---|---|
| under 16 GB | 8192 |
| 16 to 32 GB | 12288 |
| 32 to 64 GB | 16384 |
| 64 GB and up | 24576 |
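The table reads naturally as a threshold lookup. Below is a minimal sketch of that lookup, not lilbee's actual code: the names `TIERS`, `FLOOR_CTX`, and `working_context_for_ram` are hypothetical, and so is the choice to treat each lower bound as inclusive, since the table does not say which tier a host sitting exactly on a boundary falls into.

```python
GIB = 1024 ** 3

# (lower bound in GiB, working context in tokens), checked largest first.
# Assumption: lower bounds are inclusive; the release notes do not spell
# out how exact-boundary hosts are classified.
TIERS = [
    (64, 24576),
    (32, 16384),
    (16, 12288),
]
FLOOR_CTX = 8192  # hosts below the smallest tier keep the 8K floor


def working_context_for_ram(total_ram_bytes: int) -> int:
    """Map total host RAM to a working-context target, per the tier table."""
    total_gib = total_ram_bytes / GIB
    for lower_bound_gib, ctx in TIERS:
        if total_gib >= lower_bound_gib:
            return ctx
    return FLOOR_CTX


if __name__ == "__main__":
    for gib in (8, 24, 48, 128):
        print(f"{gib} GiB -> {working_context_for_ram(gib * GIB)}")
```

Checking from the largest tier down keeps the function a single loop, and adding a tier later would only mean adding one row to `TIERS`.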
What you'll notice
- TUI on a 32 GB or 64 GB host. Chat-with-RAG conversations stop clipping. The retrieved chunks, the running history, and the reply all have more room to share within one turn.
- TUI on a 16 GB or smaller host. Unchanged. Same 8K target as b478, same memory footprint.
- Ollama and frontier models. Unchanged. Lilbee passes through to those backends; they keep using their own context settings.
- RAM-constrained moments. The dynamic picker still clamps the target to the model's training window and to actual free RAM at chat-worker start, so the scaled target is what the picker aims for, not a guarantee.
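The clamping described in the last bullet amounts to taking the smallest of three limits. The sketch below is illustrative only: `effective_context` and the `kv_bytes_per_token` cost model are assumptions, and the real picker presumably accounts for more than a flat per-token cost.

```python
def effective_context(
    target: int,
    model_train_ctx: int,
    free_ram_bytes: int,
    kv_bytes_per_token: int,
) -> int:
    """Clamp the scaled target to the training window and to free RAM.

    kv_bytes_per_token is a hypothetical, simplified cost model: it
    pretends every token of context costs a fixed number of bytes.
    """
    if kv_bytes_per_token <= 0:
        raise ValueError("kv_bytes_per_token must be positive")
    fits_in_free_ram = max(0, free_ram_bytes) // kv_bytes_per_token
    # The target is an aim, not a guarantee: the smallest limit wins.
    return min(target, model_train_ctx, fits_in_free_ram)


if __name__ == "__main__":
    GIB = 1024 ** 3
    KIB = 1024
    # Plenty of free RAM: the target survives.
    print(effective_context(16384, 32768, 4 * GIB, 128 * KIB))
    # Only 1 GiB free: free RAM becomes the binding limit.
    print(effective_context(16384, 32768, 1 * GIB, 128 * KIB))
```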
Power-user knobs
Still tunable via LILBEE_* env vars or /settings:
- LILBEE_CHAT_N_CTX_TARGET. Working context the picker aims for. Default is now auto-scaled by host RAM; set explicitly to override.
- LILBEE_NUM_CTX_MAX. Explicit ceiling. Empty by default so the model's own training window is the cap; set to clamp below it on smaller hosts.
- LILBEE_KV_CACHE_TYPE. q8_0 (default), f16, q4_0, or f32.
- LILBEE_NUM_CTX. Pin an exact context size that wins over the picker.
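One way to read the precedence among the context-size knobs is sketched below. The environment variable names come from these notes; the function `resolve_context`, its parameters, and the exact ordering are assumptions for illustration. In particular, the sketch assumes a pinned LILBEE_NUM_CTX bypasses every other limit, and it leaves out the free-RAM clamp.

```python
from typing import Mapping, Optional


def _int_or_none(env: Mapping[str, str], name: str) -> Optional[int]:
    """Read an integer setting; unset or empty means 'not configured'."""
    raw = env.get(name, "").strip()
    return int(raw) if raw else None


def resolve_context(
    env: Mapping[str, str],
    auto_target: int,
    model_train_ctx: int,
) -> int:
    """Hypothetical resolution order for the context-size settings."""
    # 1. An exact pin wins over the picker (assumed to skip all clamping).
    pinned = _int_or_none(env, "LILBEE_NUM_CTX")
    if pinned is not None:
        return pinned

    # 2. An explicit target overrides the RAM-scaled default.
    target = _int_or_none(env, "LILBEE_CHAT_N_CTX_TARGET")
    if target is None:
        target = auto_target

    # 3. The cap is the training window unless a lower ceiling is set.
    ceiling = _int_or_none(env, "LILBEE_NUM_CTX_MAX")
    cap = model_train_ctx if ceiling is None else min(ceiling, model_train_ctx)
    return min(target, cap)


if __name__ == "__main__":
    import os

    # auto_target and model_train_ctx are made-up values for the demo.
    print(resolve_context(os.environ, auto_target=16384, model_train_ctx=32768))
```

Passing the environment in as a mapping, instead of reading `os.environ` inside the function, keeps the precedence easy to exercise with plain dictionaries.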