v0.1.8 — recall that stays fast on your hardware
- No more cold-start stall. The server accepts recalls immediately on startup; while the reranker warms in the background, recall returns best-available results (flagged degraded) instead of blocking. First-call latency drops from tens of seconds to under ~2s on every machine.
- Self-calibrating to your hardware. memnos measures rerank speed at startup and sizes reranking to a latency ceiling: capable machines keep full ranking depth (no accuracy change), CPU-only machines stay responsive instead of timing out. Tunable via MEMNOS_RERANK_BUDGET_MS / MEMNOS_RERANK_CAP; MEMNOS_RERANK=0 disables reranking entirely.
- Per-stage recall timings in the audit log (embed / sql / staleness / rerank) for diagnosing latency, plus a 60s query-embedding cache.
- Published benchmark: LongMemEval full-500 = 78.4% (gpt-4o as both answer model and judge, run on a competitor's open MemoryBench harness). Every prediction is published in benchmarks/results/.
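
For readers curious how "sizing reranking to a latency ceiling" can work, here is a minimal Python sketch. It is not the memnos implementation. The environment variable names come from the notes above, but how memnos interprets them, the default values (500 ms, 50 candidates), and every function name below are assumptions made for illustration.

```python
import os
import time
from typing import Callable, Mapping, Optional, Sequence


def measure_ms_per_doc(rerank: Callable[[Sequence[str]], object],
                       sample_docs: Sequence[str],
                       timer: Callable[[], float] = time.perf_counter) -> float:
    """Time one rerank pass over a sample and return milliseconds per document."""
    start = timer()
    rerank(sample_docs)
    elapsed_ms = (timer() - start) * 1000.0
    return elapsed_ms / max(len(sample_docs), 1)


def rerank_depth(ms_per_doc: float, budget_ms: float, cap: int) -> int:
    """Largest candidate count whose estimated rerank time fits the budget,
    never exceeding the cap (the "full ranking depth")."""
    if ms_per_doc <= 0:
        return cap
    return max(0, min(cap, int(budget_ms // ms_per_doc)))


def depth_from_env(ms_per_doc: float,
                   env: Optional[Mapping[str, str]] = None,
                   default_budget_ms: float = 500.0,  # hypothetical default
                   default_cap: int = 50) -> int:     # hypothetical default
    """Resolve the rerank depth from the tunables named in the release notes."""
    env = os.environ if env is None else env
    if env.get("MEMNOS_RERANK", "1") == "0":
        return 0  # reranking disabled entirely
    budget_ms = float(env.get("MEMNOS_RERANK_BUDGET_MS", default_budget_ms))
    cap = int(env.get("MEMNOS_RERANK_CAP", default_cap))
    return rerank_depth(ms_per_doc, budget_ms, cap)
```

Under these assumptions, a machine that reranks at 2 ms per candidate keeps the full depth of 50, while one that needs 40 ms per candidate is trimmed to 12 candidates so the rerank stage stays within 500 ms.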
Upgrade: memnos upgrade && memnos restart (or uv tool upgrade memnos).
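
As a footnote to the 60s query-embedding cache listed above, the following Python sketch shows one common way to build a time-bounded cache. It only illustrates the idea: the class name, the `embed` callable, and the eviction behavior are hypothetical and do not describe the memnos internals.

```python
import time
from typing import Callable, Dict, List, Tuple


class TTLEmbeddingCache:
    """Cache query embeddings for a fixed time-to-live (60 seconds by default)."""

    def __init__(self,
                 embed: Callable[[str], List[float]],
                 ttl_s: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._embed = embed    # hypothetical embedding function
        self._ttl_s = ttl_s
        self._clock = clock    # injectable so tests can control time
        self._entries: Dict[str, Tuple[float, List[float]]] = {}

    def get(self, query: str) -> List[float]:
        """Return a cached embedding if it is still fresh, else recompute and store it."""
        now = self._clock()
        hit = self._entries.get(query)
        if hit is not None and now - hit[0] < self._ttl_s:
            return hit[1]
        vector = self._embed(query)
        self._entries[query] = (now, vector)
        return vector
```

Repeating the same query within the TTL skips the embed stage; once the entry is older than 60 seconds, the next call recomputes it.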