v0.1.5 — reliable capture, deterministic proxy, field-hardened performance

@thameema thameema released this 11 Jun 05:08
· 58 commits to master since this release

The reliability release: every fix in here came from real field deployments.

Capture — the answer is the memory

  • CRITICAL fix: the Claude Code Stop hook now captures BOTH speakers. Previously only the user's prompt was saved — everything the assistant said (decisions, ticket IDs, outcomes) was invisible across sessions. Assistant replies are now captured with identifiers preserved verbatim, and the extraction prompt keeps ticket keys / PR numbers / versions / URLs intact.
  • New: memnos proxy, deterministic both-sides capture for any OpenAI- or Anthropic-compatible client (Hermes, OpenClaw, Open WebUI, SDK apps). It is a transparent relay (streaming included); BYOK (keys are forwarded, never stored); fail-open (a capture problem never breaks your agent); and it filters agent-loop noise (tool-call iterations, title calls, dedupe). A typed error taxonomy tells you whether a failure came from the proxy, the network, or the LLM.
  • Session trust rails: every Claude Code session opens with a visible memnos: memory ACTIVE status line (or a loud warning + fix); memnos status shows server, proxy, capture counters, and stale/unmanaged-process detection.
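
The fail-open behavior described for memnos proxy can be illustrated with a minimal Python sketch. This is not the memnos implementation: relay_fail_open, forward, capture, and capture_errors are hypothetical names chosen for illustration, and the upstream call is a plain function so the example runs without a network. The idea is only that the upstream response is returned even when the capture step raises.

```python
from typing import Callable, Dict, List


def relay_fail_open(
    request: Dict,
    forward: Callable[[Dict], Dict],
    capture: Callable[[Dict, Dict], None],
    capture_errors: List[str],
) -> Dict:
    """Relay a request upstream and try to capture both sides.

    Hypothetical helper: upstream errors propagate unchanged, but a
    failure inside `capture` is recorded and never reaches the caller.
    """
    response = forward(request)  # the real call; its errors are not swallowed
    try:
        capture(request, response)  # best-effort: store prompt and reply
    except Exception as exc:
        # Fail-open: note the problem, keep serving the agent.
        capture_errors.append(f"{type(exc).__name__}: {exc}")
    return response


def fake_upstream(request: Dict) -> Dict:
    # Stand-in for an LLM endpoint (assumption for the demo).
    return {"reply": "echo: " + request["prompt"]}


def broken_capture(request: Dict, response: Dict) -> None:
    # Simulates the memory store being down.
    raise RuntimeError("memory store unavailable")


errors: List[str] = []
result = relay_fail_open({"prompt": "hi"}, fake_upstream, broken_capture, errors)
```

In this sketch the caller still receives the upstream reply, and the capture failure is only visible in the errors list, which is roughly where counters such as those shown by memnos status could be fed from.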

Integrations

  • memnos agent-setup now uniform across claude-code, claude-desktop, codex, cursor, windsurf, openclaw, hermes (one grammar; claude-setup remains an alias)
  • Claude Desktop: platform-aware config paths, absolute command path (fixes spawn memnos ENOENT), and a memory skill for consistent recall/remember behavior
  • memnos upgrade re-wires previously-installed integrations automatically
  • memnos autostart [--proxy] — launchd/systemd login services; the server waits for Postgres instead of crashing

Performance (measured at 52k–104k row scale)

  • /recall −51% latency (single-pass retrieve+rerank); raw-turn BM25 index added
  • New default reranker: ms-marco-MiniLM-L-6 — measured on the identical full-10 LoCoMo corpus: +6pp accuracy over the 13× larger model, 8.4× faster reranks, ~660MB less memory, 0.23s cold start
  • No pool connection is ever held across LLM/embedding/CPU work (storm test: total write outage → 24/24 success, admin p95 3ms); async ingest option; statement timeouts
  • Supersession writes 2,116ms → 0.9ms at 95k facts; bounded embedding cache (a 35k-write ingest now costs +26MB, was ~1.9GB); processes show as memnos-server/memnos-proxy in ps/top
  • Admin console: loading/error/empty states everywhere, pagination, retry — plus faster endpoints
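
As an illustration of the bounded embedding cache mentioned above, here is a minimal Python sketch. The release notes do not say which eviction policy memnos uses; least-recently-used eviction is an assumption here, and BoundedEmbeddingCache, toy_embed, and max_entries are hypothetical names. The point is only that memory stays capped however many texts are ingested.

```python
from collections import OrderedDict
from typing import Callable, List


class BoundedEmbeddingCache:
    """LRU cache for embeddings with a hard cap on entry count (assumed policy)."""

    def __init__(self, embed: Callable[[str], List[float]], max_entries: int) -> None:
        if max_entries < 1:
            raise ValueError("max_entries must be at least 1")
        self._embed = embed
        self._max_entries = max_entries
        self._entries: "OrderedDict[str, List[float]]" = OrderedDict()

    def get(self, text: str) -> List[float]:
        if text in self._entries:
            # Cache hit: mark as most recently used.
            self._entries.move_to_end(text)
            return self._entries[text]
        vector = self._embed(text)  # cache miss: compute once
        self._entries[text] = vector
        if len(self._entries) > self._max_entries:
            # Evict the least recently used entry to stay within the bound.
            self._entries.popitem(last=False)
        return vector

    def __len__(self) -> int:
        return len(self._entries)


calls: List[str] = []


def toy_embed(text: str) -> List[float]:
    # Stand-in for a real embedding model; records each call.
    calls.append(text)
    return [float(len(text))]


cache = BoundedEmbeddingCache(toy_embed, max_entries=2)
for text in ["a", "bb", "a", "ccc", "bb"]:
    cache.get(text)
```

With a cap of two entries, the second lookup of "a" is served from the cache, while "bb" has been evicted by the time it is requested again and is recomputed. A larger cap trades memory for fewer recomputations.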

Benchmarks (reproducible, predictions published)

  • Full LoCoMo (10 conversations, 1,542 questions, gpt-4o judge): 64–65% across three independent ingests with the new defaults (benchmarks/results/)

memnos-sdk 0.1.5 published in lockstep.