v0.6.66b479

github-actions released this 20 May 08:05 · 226 commits to main since this release

Bigger working context on bigger machines

After the b478 RAM cleanup, the chat worker's working context was a flat 8K everywhere. That's right for a 16 GB laptop but tight on a 32 GB or 64 GB box, where there's plenty of headroom for retrieved chunks, history, and a longer reply to share. b479 picks the working context based on total host RAM, so larger machines get more room automatically. 8K stays the floor; nothing on smaller hosts regresses.

Tier table

Total RAM        Working context (tokens)
under 16 GB      8192
16 to 32 GB      12288
32 to 64 GB      16384
64 GB and up     24576
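
For readers who want to predict which tier a host lands in, the table can be sketched as a small lookup. This is an illustration, not lilbee's actual code: the function name `working_context_for_ram` is hypothetical, and treating each lower bound as inclusive (so exactly 32 GB maps to 16384) is an assumption drawn from the "under 16 GB" and "64 GB and up" wording. How the real picker measures total RAM, and whether it counts in GB or GiB, is not stated here.

```python
from bisect import bisect_right

# Tier boundaries in GB and the working context (tokens) for each tier,
# copied from the tier table. Inclusive lower bounds are an assumption.
_TIER_BOUNDS_GB = (16, 32, 64)
_TIER_CONTEXTS = (8192, 12288, 16384, 24576)


def working_context_for_ram(total_ram_gb: float) -> int:
    """Return the target working context for a host with the given total RAM.

    Hypothetical helper: mirrors the tier table, nothing more.
    """
    if total_ram_gb < 0:
        raise ValueError("total_ram_gb must be non-negative")
    # bisect_right counts how many boundaries are <= total_ram_gb,
    # which is exactly the index of the matching tier.
    return _TIER_CONTEXTS[bisect_right(_TIER_BOUNDS_GB, total_ram_gb)]
```

Under these assumptions, `working_context_for_ram(8)` gives 8192 and `working_context_for_ram(64)` gives 24576.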

What you'll notice

  • TUI on a 32 GB or 64 GB host. Chat-with-RAG conversations stop clipping. The retrieved chunks, the running history, and the reply all have more room to share within one turn.
  • TUI on a 16 GB or smaller host. Unchanged. Same 8K target as b478, same memory footprint.
  • Ollama and frontier models. Unchanged. Lilbee passes through to those backends; they keep using their own context settings.
  • RAM-constrained moments. The dynamic picker still clamps the target to the model's training window and to actual free RAM at chat-worker start, so the scaled target is what the picker aims for, not a guarantee.
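
The clamping described in the last bullet can be pictured as taking the smallest of three limits. The sketch below is an assumption-heavy illustration: `clamp_context` and its parameters are hypothetical names, and modelling free RAM as a flat per-token cost (`kv_bytes_per_token`) is a simplification. The real picker's memory estimate is not documented in these notes.

```python
def clamp_context(
    target: int,
    training_window: int,
    free_ram_bytes: int,
    kv_bytes_per_token: int,
) -> int:
    """Clamp a target context to the model's window and to free RAM.

    Hypothetical helper. `kv_bytes_per_token` stands in for whatever
    memory estimate the real picker uses; it is an assumed parameter.
    """
    if kv_bytes_per_token <= 0:
        raise ValueError("kv_bytes_per_token must be positive")
    # How many tokens of context fit in the RAM that is free right now.
    fits_in_ram = free_ram_bytes // kv_bytes_per_token
    # The scaled target is only an aim: the tightest limit wins.
    return max(0, min(target, training_window, fits_in_ram))
```

So a 16384 target on a model trained with an 8192 window comes out at 8192, regardless of how much RAM the host has.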

Power-user knobs

Still tunable via LILBEE_* env vars or /settings:

  • LILBEE_CHAT_N_CTX_TARGET. Working context the picker aims for. Default is now auto-scaled by host RAM; set explicitly to override.
  • LILBEE_NUM_CTX_MAX. Explicit ceiling. Empty by default so the model's own training window is the cap; set to clamp below it on smaller hosts.
  • LILBEE_KV_CACHE_TYPE. q8_0 (default), f16, q4_0, or f32.
  • LILBEE_NUM_CTX. Pin an exact context size that wins over the picker.
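
One way to read how these knobs interact is sketched below. The env var names come from the list above; everything else is an assumption for illustration: `resolve_context` and `_int_or_none` are hypothetical helpers, the sketch assumes a pinned `LILBEE_NUM_CTX` bypasses the ceiling entirely, and it leaves out the free-RAM clamp for brevity.

```python
from typing import Mapping, Optional


def _int_or_none(env: Mapping[str, str], name: str) -> Optional[int]:
    """Read an integer env var, treating unset or empty as 'not set'."""
    raw = env.get(name, "").strip()
    return int(raw) if raw else None


def resolve_context(
    env: Mapping[str, str],
    auto_target: int,
    training_window: int,
) -> int:
    """Hypothetical resolution order for the context-size knobs."""
    # An explicit pin wins over the picker (assumed to skip the ceiling too).
    pinned = _int_or_none(env, "LILBEE_NUM_CTX")
    if pinned is not None:
        return pinned

    # Explicit target overrides the RAM-scaled default.
    target = _int_or_none(env, "LILBEE_CHAT_N_CTX_TARGET")
    if target is None:
        target = auto_target

    # Empty ceiling means the model's own training window is the cap.
    ceiling = _int_or_none(env, "LILBEE_NUM_CTX_MAX")
    cap = training_window if ceiling is None else min(ceiling, training_window)
    return min(target, cap)
```

Passing `os.environ` as `env` would resolve against the current process environment; passing a plain dict makes the behaviour easy to check by hand.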