
v0.1.8 — recall that stays fast on your hardware


thameema released this 12 Jun 19:15
  • No more cold-start stall. The server accepts recalls immediately on startup. While the reranker warms up in the background, recall returns the best available results (flagged as degraded) instead of blocking. First-call latency drops from tens of seconds to about 2s or less on every machine.
  • Self-calibrating to your hardware. memnos measures rerank speed at startup and sizes reranking to fit a latency ceiling: capable machines keep full ranking depth (no change in accuracy), while CPU-only machines stay responsive instead of timing out. Tunable via MEMNOS_RERANK_BUDGET_MS / MEMNOS_RERANK_CAP; set MEMNOS_RERANK=0 to disable reranking entirely.
  • Per-stage recall timings in the audit log (embed / sql / staleness / rerank) for diagnosing latency, plus a 60s query-embedding cache.
  • Published benchmark: LongMemEval full-500 = 78.4% (gpt-4o answer + judge, on a competitor's open MemoryBench harness). Every prediction is published in benchmarks/results/.
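
The cold-start behavior in the first bullet can be sketched roughly as follows. This is a minimal illustration, not memnos's code: `Recaller`, `RecallResult`, `wait_until_warm`, and the word-overlap scorer are hypothetical stand-ins, and the reranker warm-up is simulated with a sleep. The idea is that startup only spawns a background thread. Until that thread signals readiness, `recall` returns the first-stage candidates unchanged and marks the result as degraded.

```python
import threading
import time
from dataclasses import dataclass


@dataclass
class RecallResult:
    items: list
    degraded: bool  # True when reranking was skipped because the reranker was still warming


class Recaller:
    """Hypothetical sketch: recall never blocks on reranker warm-up."""

    def __init__(self, warmup_seconds: float = 0.5) -> None:
        self._ready = threading.Event()
        # Warm the stand-in reranker on a daemon thread so startup returns at once.
        threading.Thread(target=self._warm, args=(warmup_seconds,), daemon=True).start()

    def _warm(self, seconds: float) -> None:
        time.sleep(seconds)  # stand-in for loading reranker weights
        self._ready.set()

    def wait_until_warm(self, timeout: float) -> bool:
        # Returns True once the reranker is ready, False if the timeout elapses first.
        return self._ready.wait(timeout)

    def _rerank(self, query: str, candidates: list) -> list:
        # Stand-in scorer: order candidates by word overlap with the query.
        q = set(query.split())
        return sorted(candidates, key=lambda c: len(q & set(c.split())), reverse=True)

    def recall(self, query: str, candidates: list) -> RecallResult:
        if not self._ready.is_set():
            # Best available: keep the first-stage order and flag the result as degraded.
            return RecallResult(items=list(candidates), degraded=True)
        return RecallResult(items=self._rerank(query, candidates), degraded=False)
```

The design choice is that a caller always gets an answer immediately; the `degraded` flag lets it decide whether to retry once the reranker is warm.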
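
The self-calibration in the second bullet can be read as dividing a time budget by a measured per-candidate cost. The sketch below assumes that `MEMNOS_RERANK_BUDGET_MS` is a millisecond budget for the rerank stage and that `MEMNOS_RERANK_CAP` is a maximum number of candidates to rerank. Those readings, the 1500 ms default, and the function names `measure_ms_per_candidate` and `rerank_depth` are assumptions for illustration, not documented memnos behavior. Only `MEMNOS_RERANK=0` disabling reranking comes from the notes above.

```python
import os
import time
from typing import Callable, Mapping, Sequence


def measure_ms_per_candidate(score_fn: Callable[[str, str], float],
                             probe: Sequence[str], query: str = "warmup") -> float:
    """Time the scorer on a small probe batch; return milliseconds per candidate."""
    start = time.perf_counter()
    for doc in probe:
        score_fn(query, doc)
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    # Guard against an empty probe and against a zero measured cost.
    return max(elapsed_ms / max(len(probe), 1), 1e-6)


def rerank_depth(ms_per_candidate: float, requested: int,
                 env: Mapping[str, str] = os.environ) -> int:
    """How many candidates to rerank so the stage fits the (assumed) latency budget."""
    if env.get("MEMNOS_RERANK", "1") == "0":
        return 0  # reranking disabled entirely
    # Assumed default budget, for illustration only; memnos's real defaults may differ.
    budget_ms = float(env.get("MEMNOS_RERANK_BUDGET_MS", "1500"))
    hard_cap = int(env.get("MEMNOS_RERANK_CAP", str(requested)))
    affordable = int(budget_ms // ms_per_candidate)
    # Fast machines can afford at least `requested`, so they keep full depth.
    return max(0, min(requested, hard_cap, affordable))
```

With these assumed numbers, a machine that scores at 100 ms per candidate would rerank 15 of 50 requested candidates, while a machine at 2 ms per candidate keeps all 50.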
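
For the third bullet, a 60-second query-embedding cache and per-stage timers could look like the sketch below. `EmbeddingCache` and `stage` are hypothetical names rather than memnos APIs. The injectable clock exists only so that expiry can be tested. A production cache would also evict expired entries and bound its size, which this sketch does not do.

```python
import time
from contextlib import contextmanager
from typing import Callable, Dict, Iterator, List, Tuple


class EmbeddingCache:
    """Hypothetical TTL cache: reuse a query's embedding for `ttl` seconds."""

    def __init__(self, embed_fn: Callable[[str], List[float]], ttl: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._embed_fn = embed_fn
        self._ttl = ttl
        self._clock = clock
        # query -> (time the embedding was computed, embedding vector)
        self._entries: Dict[str, Tuple[float, List[float]]] = {}

    def get(self, query: str) -> List[float]:
        now = self._clock()
        hit = self._entries.get(query)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]  # still fresh: skip the embedding model
        vec = self._embed_fn(query)
        self._entries[query] = (now, vec)
        return vec


@contextmanager
def stage(timings: Dict[str, float], name: str) -> Iterator[None]:
    """Record wall-clock milliseconds for one recall stage, even if it raises."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[name] = (time.perf_counter() - start) * 1000.0
```

A recall path could wrap each step (embed, sql, staleness, rerank) in `with stage(timings, "...")` and write the resulting dict to an audit record.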

Upgrade: memnos upgrade && memnos restart (or uv tool upgrade memnos).