Release v0.6.66b479 · tobocop2/lilbee

Bigger working context on bigger machines

After the b478 RAM cleanup, the chat worker's working context was a flat 8K everywhere. That's right for a 16 GB laptop but tight on a 32 GB or 64 GB box, where there's enough RAM to give retrieved chunks, history, and a longer reply more room. b479 picks the working context based on total host RAM, so larger machines get more room automatically. 8K stays the floor; nothing on smaller hosts regresses.

Tier table

Total RAM	Working context
under 16 GB	8192
16 to 32 GB	12288
32 to 64 GB	16384
64 GB and up	24576
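
As a rough illustration of the table above, the lookup could be sketched like this. It is not lilbee's actual code: the function name, the use of GiB as the unit, and the choice to treat each tier as lower-inclusive (so exactly 16 GiB lands in the 12288 tier) are all assumptions, since the table does not say which side a boundary value falls on.

```python
GIB = 1024 ** 3

# (exclusive upper bound in GiB, working context) pairs mirroring the tier table.
# Assumption: tiers are half-open ranges [low, high); the release notes do not
# specify how exact boundary values are handled.
_TIERS = [(16, 8192), (32, 12288), (64, 16384)]
_TOP_TIER_CONTEXT = 24576  # "64 GB and up"


def working_context_for_ram(total_ram_bytes: int) -> int:
    """Return the working-context target for a host with the given total RAM."""
    total_gib = total_ram_bytes / GIB
    for upper_bound_gib, context in _TIERS:
        if total_gib < upper_bound_gib:
            return context
    return _TOP_TIER_CONTEXT
```

For example, `working_context_for_ram(48 * GIB)` returns 16384 under these assumptions. A machine sold as "16 GB" often reports slightly less than 16 GiB of total RAM to the OS, in which case this sketch would keep it on the 8192 tier.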

What you'll notice

TUI on a 32 GB or 64 GB host. Chat-with-RAG conversations stop clipping. The retrieved chunks, the running history, and the reply all have more room to share within one turn.
TUI on a 16 GB or smaller host. Unchanged. Same 8K target as b478, same memory footprint.
Ollama and frontier models. Unchanged. Lilbee passes through to those backends; they keep using their own context settings.
RAM-constrained moments. The dynamic picker still clamps the target to the model's training window and to actual free RAM at chat-worker start, so the scaled target is what the picker aims for, not a guarantee.
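
The clamping described in the last item can be pictured as taking the smallest of three limits. This is a minimal sketch, not the picker's real implementation: the function and parameter names are hypothetical, and how the picker converts free RAM into a context size is not described here, so that value is passed in already computed.

```python
def clamp_working_context(
    target: int,
    training_window: int,
    context_fitting_free_ram: int,
) -> int:
    """Clamp the scaled target to the model's window and to what free RAM allows.

    All three arguments are context sizes in tokens. `context_fitting_free_ram`
    is a hypothetical input: the largest context that fits in free RAM when the
    chat worker starts.
    """
    return min(target, training_window, context_fitting_free_ram)
```

With a 16384 target, a 32768-token training window, and only enough free RAM for 12000 tokens, this returns 12000, which is why the scaled target is an aim rather than a guarantee.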

Power-user knobs

Still tunable via LILBEE_* env vars or /settings:

LILBEE_CHAT_N_CTX_TARGET. Working context the picker aims for. Default is now auto-scaled by host RAM; set explicitly to override.
LILBEE_NUM_CTX_MAX. Explicit ceiling. Empty by default so the model's own training window is the cap; set to clamp below it on smaller hosts.
LILBEE_KV_CACHE_TYPE. q8_0 (default), f16, q4_0, or f32.
LILBEE_NUM_CTX. Pin an exact context size that wins over the picker.
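
One way to read how these knobs interact is sketched below. The env var names come from the list above, but the precedence logic is an assumption drawn from their descriptions, not lilbee's actual resolution code. In particular, whether a pinned LILBEE_NUM_CTX is still subject to the ceiling is not stated; this sketch returns it unchanged. The helper names are hypothetical.

```python
from typing import Mapping, Optional


def _int_or_none(env: Mapping[str, str], name: str) -> Optional[int]:
    """Parse an integer env var, treating missing or empty values as unset."""
    raw = env.get(name, "").strip()
    return int(raw) if raw else None


def resolve_context(
    env: Mapping[str, str], auto_target: int, training_window: int
) -> int:
    """Resolve a context size from LILBEE_* settings (assumed precedence)."""
    # LILBEE_NUM_CTX pins an exact size that wins over the picker.
    # Assumption: the pin is returned as-is, without further clamping.
    pinned = _int_or_none(env, "LILBEE_NUM_CTX")
    if pinned is not None:
        return pinned

    # An explicit target overrides the RAM-scaled default.
    target = _int_or_none(env, "LILBEE_CHAT_N_CTX_TARGET")
    if target is None:
        target = auto_target

    # An empty ceiling means the model's training window is the only cap.
    ceiling = _int_or_none(env, "LILBEE_NUM_CTX_MAX")
    if ceiling is None:
        ceiling = training_window

    return min(target, ceiling, training_window)
```

Passing `os.environ` as `env` would read the live settings; a plain dict works for experimenting. The free-RAM clamp applied at chat-worker start is left out here to keep the sketch focused on the env vars.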
