Release v0.1.10 — bounded memory, never-drop capture, local extraction · thameema/memnos

Hardening release — validated on a clean Linux box, a Hermes integration, and the 16 GB host that filed #15.

Recall memory is now bounded and returned to the OS (#15). The recall path no longer climbs to gigabytes and holds them: the ONNX reranker arena is bounded, and freed memory is released back to the OS after each recall. Measured: a workload that previously climbed to ~2.6 GB and stayed there now plateaus at a few hundred MB under load and recedes when idle.
Slow extraction never drops a write. Capture (proxy / SDK / MCP remember) uses async writes, so a slow extraction backend can't time out and silently lose the assistant's turn; both sides of every exchange land. Default client timeouts are also higher.
Fully-local extraction. Set MEMNOS_EXTRACT_BASE_URL (+ optional MEMNOS_EXTRACT_MODEL) to any OpenAI-compatible endpoint — Ollama, vLLM, LM Studio — and fact extraction runs locally while embeddings stay free local-384. No OpenAI key required for a fully on-device install.
Long recall queries no longer crash the Postgres FTS parser (tsquery stack); they are now clamped safely.
agent-setup --namespace <ns> now grants the wired token on that namespace (no more silent 403 on writes).
SDK __version__ derives from package metadata (no drift); memnos status reports the real extraction backend.
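
For the fully-local extraction item above, a minimal sketch of the environment setup might look like the following. It assumes an Ollama instance serving its OpenAI-compatible API on the default local port; the model name `llama3.1` is only a placeholder, and whether memnos expects the `/v1` suffix on the base URL is an assumption, not something these notes confirm.

```shell
# Assumption: Ollama is running locally and exposes its OpenAI-compatible API here.
# Swap in the base URL of vLLM, LM Studio, or any other OpenAI-compatible server.
export MEMNOS_EXTRACT_BASE_URL="http://localhost:11434/v1"

# Optional. Placeholder model name; use a model your local server actually serves.
export MEMNOS_EXTRACT_MODEL="llama3.1"
```

With the variables visible to the memnos process, `memnos restart` followed by `memnos status` should show which extraction backend is in use. How the variables reach the service depends on how memnos is launched on your machine.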

Upgrade: memnos upgrade && memnos restart (or uv tool upgrade memnos).
