v0.1.5 — reliable capture, deterministic proxy, field-hardened performance
The reliability release: every fix in here came from real field deployments.
Capture — the answer is the memory
- CRITICAL fix: the Claude Code Stop hook now captures BOTH speakers. Previously only the user's prompt was saved, so everything the assistant said (decisions, ticket IDs, outcomes) was invisible across sessions. Assistant replies are now captured with identifiers preserved verbatim, and the extraction prompt keeps ticket keys, PR numbers, versions, and URLs intact.
- New: `memnos proxy`, deterministic both-sides capture for any OpenAI- or Anthropic-compatible client (Hermes, OpenClaw, Open WebUI, SDK apps). It is a transparent relay (streaming included), BYOK (keys are forwarded, never stored), and fail-open (a capture problem never breaks your agent). It also filters agent-loop noise (tool-call iterations, title calls, duplicates). A typed error taxonomy tells you whether a failure came from the proxy, the network, or the LLM.
- Session trust rails: every Claude Code session opens with a visible `memnos: memory ACTIVE` status line (or a loud warning plus the fix). `memnos status` shows the server, proxy, capture counters, and stale/unmanaged-process detection.
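The fail-open, both-sides capture described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the memnos proxy's implementation. `FailOpenRelay`, `forward`, and `capture` are hypothetical names. Messages are assumed to be OpenAI-style dicts with `role`/`content` (and `tool_calls` on replies). The upstream network call is an injected callable. Streaming and title-call filtering are omitted.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set, Tuple

Message = Dict[str, object]  # hypothetical OpenAI-style chat message


@dataclass
class CaptureStats:
    captured: int = 0
    skipped: int = 0
    failed: int = 0


@dataclass
class FailOpenRelay:
    # Hypothetical: calls the real LLM; the caller's key travels with the request and is not stored here.
    forward: Callable[[List[Message]], Message]
    # Hypothetical: stores (session, role, text) in memory.
    capture: Callable[[str, str, str], None]
    stats: CaptureStats = field(default_factory=CaptureStats)
    seen: Set[Tuple[str, str, str]] = field(default_factory=set)

    def is_noise(self, messages: List[Message], reply: Message) -> bool:
        # Tool-call iterations: the last inbound message is a tool result,
        # or the model is asking to call a tool rather than answering.
        if messages and messages[-1].get("role") == "tool":
            return True
        return bool(reply.get("tool_calls"))

    def handle(self, session: str, messages: List[Message]) -> Message:
        # Upstream errors propagate unchanged: the relay stays transparent.
        reply = self.forward(messages)
        try:
            if self.is_noise(messages, reply):
                self.stats.skipped += 1
                return reply
            user_text = str(messages[-1]["content"])
            assistant_text = str(reply["content"])
            key = (session, user_text, assistant_text)
            if key in self.seen:  # dedupe retried or repeated exchanges
                self.stats.skipped += 1
                return reply
            self.seen.add(key)
            # Both speakers are recorded, not just the user prompt.
            self.capture(session, "user", user_text)
            self.capture(session, "assistant", assistant_text)
            self.stats.captured += 1
        except Exception:
            # Fail-open: a capture problem is counted, never raised to the agent.
            self.stats.failed += 1
        return reply
```

The key design choice is that the forward call sits outside the `try`, while capture sits inside it. Model errors still reach the client as-is, but storage failures can only show up in counters.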
Integrations
- `memnos agent-setup` is now uniform across claude-code, claude-desktop, codex, cursor, windsurf, openclaw, and hermes (one grammar; `claude-setup` remains an alias).
- Claude Desktop: platform-aware config paths, an absolute command path (fixes `spawn memnos ENOENT`), and a memory skill for consistent recall/remember behavior.
- `memnos upgrade` re-wires previously installed integrations automatically.
- `memnos autostart [--proxy]`: launchd/systemd login services. The server waits for Postgres instead of crashing.
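Waiting for Postgres at startup, rather than crashing, amounts to a retry loop with capped backoff. The sketch below illustrates this under stated assumptions and is not memnos's startup code. `wait_for_postgres` is a hypothetical helper, and `connect` is an injected callable (for example, a lambda wrapping your driver's connect function). The backoff numbers are illustrative, not memnos defaults.

```python
import time
from typing import Callable, Tuple, Type


def wait_for_postgres(
    connect: Callable[[], object],
    timeout_s: float = 60.0,
    initial_delay_s: float = 0.5,
    max_delay_s: float = 5.0,
    # Exceptions meaning "not up yet"; with psycopg you would likely pass psycopg.OperationalError.
    retry_on: Tuple[Type[BaseException], ...] = (OSError,),
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
):
    """Retry `connect` with capped exponential backoff until it succeeds or the deadline passes."""
    deadline = clock() + timeout_s
    delay = initial_delay_s
    attempt = 0
    while True:
        attempt += 1
        try:
            return connect()
        except retry_on as exc:
            remaining = deadline - clock()
            if remaining <= 0:
                raise TimeoutError(f"Postgres not reachable after {attempt} attempts") from exc
            # Never sleep past the deadline; double the delay up to the cap.
            sleep(min(delay, remaining))
            delay = min(delay * 2, max_delay_s)
```

With this approach, a login service can start before the database is ready. The service manager only sees a failure if Postgres never comes up within the timeout.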
Performance (measured at 52k–104k row scale)
- `/recall`: 51% lower latency (single-pass retrieve+rerank); raw-turn BM25 index added.
- New default reranker: ms-marco-MiniLM-L-6. Measured on the identical full 10-conversation LoCoMo corpus, it delivers +6 pp accuracy over the 13× larger model, 8.4× faster reranks, ~660 MB less memory, and a 0.23 s cold start.
- No pool connection is ever held across LLM, embedding, or CPU work (storm test: a total write outage before, 24/24 success after, admin p95 3 ms). Also new: an async ingest option and statement timeouts.
- Supersession writes: 2,116 ms → 0.9 ms at 95k facts. Bounded embedding cache: a 35k-write ingest now costs +26 MB, down from ~1.9 GB. Processes show as `memnos-server` / `memnos-proxy` in ps/top.
- Admin console: loading/error/empty states everywhere, pagination, and retry, plus faster endpoints.
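The connection rule and the bounded cache fit together naturally in an ingest path. The sketch below is one hypothetical way to combine them; it is not memnos's code. All names (`BoundedEmbeddingCache`, `TinyPool`, `ingest`) are illustrative. The pool is an in-memory stand-in that only counts checked-out connections. The cache is assumed to be a plain LRU capped by entry count.

```python
from collections import OrderedDict
from contextlib import contextmanager
from typing import Callable, Dict, Iterator, List


class BoundedEmbeddingCache:
    """LRU cache with a hard entry cap, so memory stays flat on large ingests."""

    def __init__(self, max_entries: int):
        self.max_entries = max_entries
        self._data: "OrderedDict[str, List[float]]" = OrderedDict()

    def get_or_compute(self, text: str, embed: Callable[[str], List[float]]) -> List[float]:
        if text in self._data:
            self._data.move_to_end(text)  # mark as most recently used
            return self._data[text]
        vec = embed(text)
        self._data[text] = vec
        if len(self._data) > self.max_entries:
            self._data.popitem(last=False)  # evict the least recently used entry
        return vec

    def __len__(self) -> int:
        return len(self._data)


class TinyPool:
    """Stand-in for a DB pool that tracks how many connections are checked out."""

    def __init__(self) -> None:
        self.checked_out = 0
        self.rows: Dict[str, List[float]] = {}

    @contextmanager
    def connection(self) -> Iterator[Dict[str, List[float]]]:
        self.checked_out += 1
        try:
            yield self.rows
        finally:
            self.checked_out -= 1


def ingest(pool: TinyPool, cache: BoundedEmbeddingCache,
           texts: List[str], embed: Callable[[str], List[float]]) -> int:
    # 1. Short read: find texts not stored yet, then release the connection.
    with pool.connection() as conn:
        new = [t for t in texts if t not in conn]
    # 2. Slow work (embedding, LLM, CPU) runs with no connection held.
    vectors = {t: cache.get_or_compute(t, embed) for t in new}
    # 3. Short write with a freshly checked-out connection.
    with pool.connection() as conn:
        conn.update(vectors)
    return len(vectors)
```

A connection is checked out only for the two short statements. As a result, a slow or stalled embedding call cannot starve the pool for other writers or the admin endpoints.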
Benchmarks (reproducible, predictions published)
- Full LoCoMo (10 conversations, 1,542 questions, gpt-4o judge): 64–65% accuracy across three independent ingests with the new defaults (`benchmarks/results/`).
memnos-sdk 0.1.5 published in lockstep.