Releases: thameema/memnos
v0.1.11 — sharper recall
Sharper recall — entity-aware disambiguation, answer-quality ranking, and both-sides capture in headless Claude Code.
Recall quality
- Entity-aware recall (#17): recall now uses each fact's subject entity to separate semantically adjacent but distinct subjects, so a query about one project no longer pulls in another project's notes that merely share vocabulary. Adds an optional subject scope to hard-filter recall to a single entity. On by default; tune or disable with MEMNOS_RECALL_ENTITY_BOOST / MEMNOS_RECALL_ENTITY_SCOPE.
- Answer-quality ranking (#2): candidate dedup, fact-first context rendering, and a bounded fact preference, so distilled answers surface ahead of verbose meta-turns for list/broad queries. Verbatim queries still lead with the raw turn.
- Deterministic ordering — stable tie-break in the hybrid-RRF queries; recall order is now identical across PostgreSQL query plans.
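The sketch below is a rough illustration of this kind of ranking. It fuses a vector-ranked list and a keyword-ranked list with Reciprocal Rank Fusion, using the common k = 60 constant. It then multiplies the score of any candidate whose subject matches the query's entity, optionally hard-filters to one subject, and breaks ties on a stable id. Candidate, fuse, and the 1.5 boost default are hypothetical and are not memnos's API or defaults; in memnos the fusion runs inside PostgreSQL.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Candidate:
    id: int        # stable primary key, used as the final tie-break
    subject: str   # the fact's subject entity


def fuse(vector_ranked: list[Candidate],
         keyword_ranked: list[Candidate],
         query_entity: Optional[str] = None,
         entity_boost: float = 1.5,      # hypothetical default
         scope: Optional[str] = None,
         k: int = 60) -> list[Candidate]:
    """Reciprocal Rank Fusion over two ranked lists, with an entity boost."""
    scores: dict[Candidate, float] = {}
    for ranked in (vector_ranked, keyword_ranked):
        for rank, cand in enumerate(ranked, start=1):
            if scope is not None and cand.subject != scope:
                continue  # hard scope: other entities are dropped entirely
            scores[cand] = scores.get(cand, 0.0) + 1.0 / (k + rank)
    if query_entity is not None:
        for cand in scores:
            if cand.subject == query_entity:
                scores[cand] *= entity_boost  # soft preference for the queried entity
    # Score descending, then id ascending: equal scores always come back
    # in the same order, whatever order the input lists arrived in.
    return sorted(scores, key=lambda c: (-scores[c], c.id))
```

Because the final sort key is (-score, id), two candidates with equal scores always come back in id order, whatever order the underlying query plan produced them in.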
Capture
- Headless Claude Code (#18): claude -p (print mode) now captures both the user prompt and the assistant reply; previously only the user turn was saved.
Fixes
- Long or pathological queries are clamped before they reach the embedder and reranker, so they return fast.
- memnos status no longer reports a false "STALE pid file" under autostart.
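Here is one simple way to clamp a long query. It assumes a character cap for the model inputs and a term cap for full-text search. Both limits and the clamp_query name are hypothetical; these notes do not give memnos's actual limits.

```python
import re

MAX_QUERY_CHARS = 2000  # hypothetical limit, not memnos's actual value
MAX_FTS_TERMS = 64      # hypothetical limit


def clamp_query(text: str, max_chars: int = MAX_QUERY_CHARS,
                max_terms: int = MAX_FTS_TERMS) -> tuple[str, list[str]]:
    """Return (text for the embedder/reranker, terms for full-text search)."""
    clipped = text[:max_chars]
    # Keep only word-like tokens and cap how many there are, so a pathological
    # query cannot produce an arbitrarily large full-text query.
    terms = re.findall(r"\w+", clipped.lower())[:max_terms]
    return clipped, terms
```

The terms could then be joined with spaces and passed to PostgreSQL's plainto_tsquery. The same idea covers the tsquery stack clamp listed under v0.1.10.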
Quality
- LoCoMo 64–65% band held (65% on the release gate, gpt-4o judge). LongMemEval 78.4% (500q).
Upgrade: memnos upgrade (or uv tool install -U memnos).
v0.1.10 — bounded memory, never-drop capture, local extraction
Hardening release — validated on a clean Linux box, a Hermes integration, and the 16 GB host that filed #15.
- Recall memory is now bounded and returned to the OS (#15). The recall path no longer climbs into the gigabytes and holds there: the ONNX reranker arena is bounded, and freed memory is released back to the OS after each recall. Measured: a workload that previously climbed to ~2.6 GB and stayed there now plateaus at a few hundred MB under load and recedes when idle.
- Slow extraction never drops a write. Capture (proxy / SDK / MCP remember) uses async writes, so a slow extraction backend can't time out and silently lose the assistant's turn; both sides of every exchange land. Default client timeouts are also higher.
- Fully-local extraction. Set MEMNOS_EXTRACT_BASE_URL (plus optional MEMNOS_EXTRACT_MODEL) to any OpenAI-compatible endpoint (Ollama, vLLM, LM Studio) and fact extraction runs locally while embeddings stay on the free local 384-d model. No OpenAI key is required for a fully on-device install.
- Long recall queries no longer crash the Postgres FTS parser (tsquery stack); they are clamped safely.
- agent-setup --namespace <ns> now grants the wired token access to that namespace (no more silent 403 on writes).
- SDK __version__ derives from package metadata (no drift); memnos status reports the real extraction backend.
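The never-drop write path comes down to two steps: persist the raw turns immediately, then move slow extraction off the request path. Below is a minimal asyncio sketch, assuming an in-process queue. slow_extract, capture, and demo are hypothetical stand-ins, not memnos's capture code, which these notes don't describe.

```python
import asyncio


async def slow_extract(turn: str) -> str:
    # Stand-in for an LLM extraction call that can take a long time.
    await asyncio.sleep(0.2)
    return f"fact:{turn}"


async def extraction_worker(queue: asyncio.Queue, facts: list[str]) -> None:
    # Drains the queue in the background, one turn at a time.
    while True:
        turn = await queue.get()
        try:
            facts.append(await slow_extract(turn))
        finally:
            queue.task_done()


async def capture(queue: asyncio.Queue, raw_log: list[str],
                  user: str, assistant: str) -> None:
    # Persist both raw turns first, then enqueue extraction and return at once,
    # so the caller's timeout never depends on extraction speed.
    raw_log.extend([user, assistant])
    queue.put_nowait(user)
    queue.put_nowait(assistant)


async def demo() -> tuple[list[str], list[str]]:
    queue: asyncio.Queue = asyncio.Queue()
    raw_log: list[str] = []
    facts: list[str] = []
    worker = asyncio.create_task(extraction_worker(queue, facts))
    # The timeout is shorter than a single extraction, yet capture succeeds.
    await asyncio.wait_for(
        capture(queue, raw_log, "what's the rate limit?", "now 200"), timeout=0.1)
    await queue.join()  # wait for background extraction to finish
    worker.cancel()
    return raw_log, facts
```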
Upgrade: memnos upgrade && memnos restart (or uv tool upgrade memnos).
v0.1.9 — fresh-install & capture fixes
Bug-fix release hardening the install path and agent capture (found via a clean-Linux + Hermes field test).
- Linux install on more distros. memnos now works with pgvector 0.6.0: it feature-detects the installed pgvector and uses full-precision vector columns on <0.7 and halfvec automatically on ≥0.7. No source build is needed where your distro ships 0.6.
- memnos proxy handles gzip'd upstreams. Fixes a bug where a gzip-compressed response (e.g. via OpenRouter) broke both capture and a client behind the proxy. Both sides of every exchange are now reliably captured, verified end-to-end with a live agent.
- Write failures are never silent. MCP write tools (remember, memory_write, …) now raise an explicit error instead of reporting success when a write is rejected, so an agent can't believe it saved something it didn't.
- agent-setup wires each agent its own scoped principal + token (fixes writes silently failing with 403).
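The sketch below shows the pgvector feature detection, assuming the only decision is vector versus halfvec (the halfvec type first shipped in pgvector 0.7.0). The function names are hypothetical; the catalog query uses PostgreSQL's standard pg_extension table.

```python
# Reads the installed pgvector version from the system catalog.
VERSION_SQL = "SELECT extversion FROM pg_extension WHERE extname = 'vector'"


def parse_version(v: str) -> tuple[int, ...]:
    """'0.7.4' -> (0, 7, 4); stops at the first non-numeric component."""
    parts = []
    for piece in v.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def embedding_column_type(pgvector_version: str, dims: int) -> str:
    """halfvec needs pgvector >= 0.7; below that, use full-precision vector."""
    if parse_version(pgvector_version) >= (0, 7):
        return f"halfvec({dims})"
    return f"vector({dims})"
```

Comparing integer tuples rather than raw strings keeps "0.10" correctly ordered after "0.7".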
Upgrade: memnos upgrade && memnos restart (or uv tool upgrade memnos).
v0.1.8 — recall that stays fast on your hardware
- No more cold-start stall. The server accepts recalls immediately on startup; while the reranker warms in the background, recall returns best-available results (flagged degraded) instead of blocking. First-call latency drops from tens of seconds to under ~2s on every machine.
- Self-calibrating to your hardware. memnos measures rerank speed at startup and sizes reranking to a latency ceiling: capable machines keep full ranking depth (no accuracy change), while CPU-only machines stay responsive instead of timing out. Tunable via MEMNOS_RERANK_BUDGET_MS / MEMNOS_RERANK_CAP; MEMNOS_RERANK=0 disables reranking entirely.
- Per-stage recall timings in the audit log (embed / sql / staleness / rerank) for diagnosing latency, plus a 60s query-embedding cache.
- Published benchmark: LongMemEval full-500 = 78.4% (gpt-4o answer + judge, on a competitor's open MemoryBench harness), with every prediction in benchmarks/results/.
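This is a hedged sketch of budget-based rerank sizing plus a 60-second embedding cache. It assumes the cap is simply the budget divided by the measured per-candidate cost, clamped between a floor and the maximum depth. How memnos actually combines MEMNOS_RERANK_BUDGET_MS and MEMNOS_RERANK_CAP isn't specified, and every function and class name here is hypothetical.

```python
import time
from typing import Callable, Optional


def measure_ms_per_candidate(rerank: Callable[[str, list[str]], object],
                             query: str, docs: list[str]) -> float:
    """Time one warm-up rerank call and return milliseconds per candidate."""
    start = time.perf_counter()
    rerank(query, docs)
    return (time.perf_counter() - start) * 1000 / max(1, len(docs))


def rerank_cap(ms_per_candidate: float, budget_ms: float,
               max_depth: int, floor: int = 10) -> int:
    """Largest rerank depth that fits the budget, kept within [floor, max_depth]."""
    if ms_per_candidate <= 0:
        return max_depth
    affordable = int(budget_ms // ms_per_candidate)
    return min(max_depth, max(floor, affordable))


class TTLCache:
    """Query-embedding cache whose entries expire after ttl_s seconds.

    Unbounded for brevity; a real cache would also cap its size.
    """

    def __init__(self, ttl_s: float = 60.0,
                 clock: Callable[[], float] = time.monotonic):
        self.ttl_s = ttl_s
        self.clock = clock
        self._items: dict[str, tuple[float, list[float]]] = {}

    def get(self, key: str) -> Optional[list[float]]:
        hit = self._items.get(key)
        if hit is None:
            return None
        stored_at, value = hit
        if self.clock() - stored_at > self.ttl_s:
            del self._items[key]  # expired: force a fresh embedding
            return None
        return value

    def put(self, key: str, value: list[float]) -> None:
        self._items[key] = (self.clock(), value)
```

A fast machine measures a small per-candidate cost, so the cap hits max_depth and ranking depth is unchanged. A slow CPU-only machine gets a smaller depth that still fits the budget.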
Upgrade: memnos upgrade && memnos restart (or uv tool upgrade memnos).
v0.1.7 — what's true now
- Belief-change supersession on the write path: when a new memory contradicts an old one (status flips, negations, value updates like "rate limit is now 200"), the old fact is closed out with full provenance — recall returns what's true now, with the transition visible ("superseded as of ").
- Write-path dedupe: restating the same fact bumps its salience instead of stacking duplicate rows.
- memnos namespace reconcile <ns> applies the same rules to memories stored before 0.1.7 (--dry-run shows exact counts first; no LLM calls).
- Smarter ranking on broad questions: "where are we with X" now leads with distilled facts; verbatim questions still surface the original conversation. Tunable or disableable via environment variables.
- Windows installation guide (docs/guides/windows.md); LoCoMo full-10 band updated to 64–65% with results published.
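To make write-path supersession and dedupe concrete, here is a simplified model. It keys facts on a hypothetical (subject, attribute) pair and compares values exactly. memnos also detects status flips and negations, which this sketch does not attempt. Fact, write_fact, and current are illustrative names only.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass
class Fact:
    subject: str
    attribute: str
    value: str
    salience: int = 1
    valid_from: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    superseded_at: Optional[datetime] = None
    superseded_by: Optional["Fact"] = None


def write_fact(store: list[Fact], new: Fact) -> Fact:
    for old in store:
        if old.superseded_at is not None:
            continue  # already closed out
        if (old.subject, old.attribute) != (new.subject, new.attribute):
            continue
        if old.value == new.value:
            old.salience += 1  # restatement: bump salience, don't add a row
            return old
        # Contradiction: close the old fact but keep it for provenance.
        old.superseded_at = new.valid_from
        old.superseded_by = new
    store.append(new)
    return new


def current(store: list[Fact]) -> list[Fact]:
    """What's true now: facts that have not been superseded."""
    return [f for f in store if f.superseded_at is None]
```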
Closes #10, #11. Verified: full test suite + 3-OS CI matrix + LoCoMo full-10 held at 64–65% across the changes.
Upgrade: memnos upgrade (or uv tool upgrade memnos), then memnos restart.
v0.1.6 — the coordination release: attributed, grounded, governed
Memory as the coordination plane for multi-agent systems.
Author-attributed memory
Every memory now carries its author — stamped server-side from the authenticated token, impossible to forge from a request body. Recall and context blocks show who wrote what (- (decision, 2026-06-11, by arch-agent) ...), and /recall accepts an author filter. Multi-agent blackboard coordination: shared namespace + signed writes + webhook subscriptions = agents that hand off work with full provenance.
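Here is a sketch of server-side author stamping, assuming the auth layer has already resolved the bearer token to a principal. Principal, build_memory_record, render_line, and filter_by_author are hypothetical names. The key point is that any author field in the request body is discarded.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Principal:
    """Identity resolved from the bearer token by the auth layer (hypothetical)."""
    name: str


def build_memory_record(principal: Principal, body: dict) -> dict:
    # Drop any client-supplied "author" and stamp the authenticated one instead,
    # so a request body can never claim to be written by someone else.
    record = {k: v for k, v in body.items() if k != "author"}
    record["author"] = principal.name
    return record


def render_line(record: dict) -> str:
    # Same shape as the context line example in these notes.
    return f"- ({record['type']}, {record['date']}, by {record['author']}) {record['text']}"


def filter_by_author(records: list[dict], author: Optional[str] = None) -> list[dict]:
    return [r for r in records if author is None or r["author"] == author]
```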
Grounded recall (knowledge bases)
Tag any namespace kind=knowledge (no reserved prefixes) and link it to working namespaces: memnos namespace link proj:billing cms57-docs. Recall then automatically consults linked knowledge bases — enforced by the engine, not the prompt. Links are policy, grants are permission: both required, and skipped links are visible in the response (grounded_in / links_skipped).
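The "links are policy, grants are permission" rule can be sketched as a filter over a namespace's links. The data shapes (a links dict and a set of (principal, namespace) grants) and the hr-policies namespace are hypothetical. Only the grounded_in / links_skipped field names come from these notes.

```python
def resolve_grounding(namespace: str,
                      links: dict[str, list[str]],
                      grants: set[tuple[str, str]],
                      principal: str) -> dict[str, list[str]]:
    """Split a namespace's linked knowledge bases into consulted and skipped."""
    grounded_in: list[str] = []
    links_skipped: list[str] = []
    for kb in links.get(namespace, []):
        # A link alone is not enough: the caller also needs a grant on the KB.
        if (principal, kb) in grants:
            grounded_in.append(kb)
        else:
            links_skipped.append(kb)  # reported back, never silently dropped
    return {"grounded_in": grounded_in, "links_skipped": links_skipped}
```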
Typed memories + pinned constraints
memnos remember "..." --type decision|incident|constraint|skill|fact — types flow through extraction and recall filters. Constraints are always injected: type=constraint memories appear in every recall for their namespace (and granted linked KBs), rendered first as CONSTRAINT: lines regardless of the query — compliance rules your agents physically cannot forget. New "Memory feed" tab in the admin console (type badges, author, age).
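Below is a minimal rendering sketch, assuming memories are plain dicts with type and text keys. Constraints are pulled from the whole namespace rather than from the query hits, which is what makes them appear regardless of the question. render_context is a hypothetical name.

```python
def render_context(namespace_memories: list[dict], query_hits: list[dict]) -> str:
    """Render a context block with every constraint first, then query hits."""
    # Constraints come from the namespace, not the hits, so they always appear.
    constraints = [m for m in namespace_memories if m["type"] == "constraint"]
    lines = [f"CONSTRAINT: {m['text']}" for m in constraints]
    already_shown = {m["text"] for m in constraints}
    for m in query_hits:
        if m["text"] not in already_shown:  # don't repeat a constraint that also matched
            lines.append(f"- ({m['type']}) {m['text']}")
    return "\n".join(lines)
```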
Published, CI-enforced API contract
openapi.yaml — 58 operations, every one exercised and schema-validated against the real server in CI. The published spec cannot drift from the implementation. Human reference at docs/api.md with curl + SDK examples.
The memnos CLI grammar
Consistent noun–verb commands (principal create, token mint, grant add, namespace link) — every old form still works as an alias. The full CLI reference is auto-generated from the parser itself (staleness gated in CI), published at docs/cli.md and as a CLI Reference tab in the admin console. New cross-platform CI matrix (Linux/macOS/Windows) — which caught and fixed a real Windows console encoding bug before release.
memnos-sdk 0.1.6 published in lockstep.
v0.1.5 — reliable capture, deterministic proxy, field-hardened performance
The reliability release: every fix in here came from real field deployments.
Capture — the answer is the memory
- CRITICAL fix: the Claude Code Stop hook now captures BOTH speakers. Previously only the user's prompt was saved — everything the assistant said (decisions, ticket IDs, outcomes) was invisible across sessions. Assistant replies are now captured with identifiers preserved verbatim, and the extraction prompt keeps ticket keys / PR numbers / versions / URLs intact.
- New: memnos proxy, deterministic both-sides capture for any OpenAI- or Anthropic-compatible client (Hermes, OpenClaw, Open WebUI, SDK apps): transparent relay (streaming included), BYOK (keys forwarded, never stored), fail-open (a capture problem never breaks your agent), and agent-loop noise filtered (tool-call iterations, title calls, dedupe). A typed error taxonomy tells you whether a failure came from the proxy, the network, or the LLM.
- Session trust rails: every Claude Code session opens with a visible "memnos: memory ACTIVE" status line (or a loud warning + fix); memnos status shows server, proxy, capture counters, and stale/unmanaged-process detection.
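Fail-open capture can be sketched as a relay that returns the upstream response no matter what happens during capture. This assumes a non-streaming OpenAI-style chat response with string content. relay, forward, and capture are hypothetical names, and memnos proxy's real implementation (streaming, noise filtering, error taxonomy) is not shown.

```python
import logging
from typing import Callable

log = logging.getLogger("capture-sketch")


def relay(forward: Callable[[dict], dict],
          capture: Callable[..., None],
          request: dict) -> dict:
    """Forward a request upstream, then capture both sides without risking the reply."""
    response = forward(request)  # upstream errors still reach the client as-is
    try:
        capture(user=request["messages"][-1]["content"],
                assistant=response["choices"][0]["message"]["content"])
    except Exception:
        # Fail-open: a capture problem is logged, never surfaced to the agent.
        log.warning("capture failed", exc_info=True)
    return response
```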
Integrations
- memnos agent-setup is now uniform across claude-code, claude-desktop, codex, cursor, windsurf, openclaw, hermes (one grammar; claude-setup remains an alias)
- Claude Desktop: platform-aware config paths, absolute command path (fixes spawn memnos ENOENT), and a memory skill for consistent recall/remember behavior
- memnos upgrade re-wires previously installed integrations automatically
- memnos autostart [--proxy]: launchd/systemd login services; the server waits for Postgres instead of crashing
Performance (measured at 52k–104k row scale)
- /recall: −51% latency (single-pass retrieve+rerank); raw-turn BM25 index added
- New default reranker: ms-marco-MiniLM-L-6. Measured on the identical full-10 LoCoMo corpus: +6pp accuracy over the 13× larger model, 8.4× faster reranks, ~660MB less memory, 0.23s cold start
- No pool connection is ever held across LLM/embedding/CPU work (storm test: total write outage → 24/24 success, admin p95 3ms); async ingest option; statement timeouts
- Supersession writes 2,116ms → 0.9ms at 95k facts; bounded embedding cache (a 35k-write ingest now costs +26MB, was ~1.9GB); processes show as memnos-server/memnos-proxy in ps/top
- Admin console: loading/error/empty states everywhere, pagination, retry, plus faster endpoints
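The pool rule means the slow step runs before a connection is borrowed, not inside the borrow. The sketch below is a toy, single-threaded illustration. TinyPool and write_memory are hypothetical, and a real service would use its database driver's pool.

```python
import queue
from contextlib import contextmanager
from typing import Callable, Iterator


class TinyPool:
    """A toy connection pool: a fixed number of connection tokens."""

    def __init__(self, size: int):
        self._free: queue.Queue = queue.Queue()
        for i in range(size):
            self._free.put(f"conn-{i}")
        self.in_use = 0  # not thread-safe; fine for this single-threaded demo

    @contextmanager
    def connection(self, timeout: float = 1.0) -> Iterator[str]:
        conn = self._free.get(timeout=timeout)  # raises queue.Empty when exhausted
        self.in_use += 1
        try:
            yield conn
        finally:
            self.in_use -= 1
            self._free.put(conn)


def write_memory(pool: TinyPool, embed: Callable[[str], list[float]],
                 text: str, rows: list) -> None:
    vector = embed(text)             # slow work first, while holding no connection
    with pool.connection() as conn:  # short critical section: just the write
        rows.append((conn, text, vector))
```

Because the slow work happens outside the borrow, a pool of even one connection is held only for the write itself.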
Benchmarks (reproducible, predictions published)
- Full LoCoMo (10 conversations, 1,542 questions, gpt-4o judge): 64–65% across three independent ingests with the new defaults (benchmarks/results/)
memnos-sdk 0.1.5 published in lockstep.
v0.1.4 — friction-free CLI
Friction-free install & operations
Server lifecycle like any daemon
- memnos start/stop/restart/status: background server with pid + log management; memnos serve remains the foreground primitive for systemd/launchd/Docker
- memnos start shows live progress (the first start downloads the local ONNX models, ~1 GB; it tells you, tails the log, and surfaces startup errors inline)
- Prominent "START THE SERVER" guidance at the end of setup
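Here is a sketch of the pid-file check behind a status command, assuming a plain-text file holding one pid. It uses os.kill(pid, 0), which on POSIX only probes whether the process exists. On Windows os.kill does not treat 0 as a probe, so this sketch is POSIX-only. pid_is_alive and pid_file_state are hypothetical names.

```python
import os
from pathlib import Path


def pid_is_alive(pid: int) -> bool:
    """POSIX-only liveness probe for a process id."""
    if pid <= 0:
        return False  # 0 and negatives address process groups, not one process
    try:
        os.kill(pid, 0)  # signal 0: checks existence/permission, sends nothing
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # process exists but belongs to another user
    return True


def pid_file_state(path: Path) -> str:
    """Classify a pid file as absent, corrupt, running, or stale."""
    if not path.exists():
        return "absent"
    try:
        pid = int(path.read_text().strip())
    except ValueError:
        return "corrupt"
    return "running" if pid_is_alive(pid) else "stale"
```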
Setup
- memnos setup --docker: provisions a ready pgvector Postgres container (native Postgres remains the first-class path)
- Clear, actionable guidance when pgvector is missing or built for the wrong Postgres major version (brew version-mismatch gotcha included)
- OpenAI-key step: hidden input, whitespace-scrubbed, format-checked, live-validated against the OpenAI API before acceptance; blank entry now confirms before locking in free local 384-d mode; key stored AES-256-GCM encrypted in the vault
- Re-running setup is safe — the schema is additive/idempotent (never wipes data)
New: memnos migrate-embeddings
- Lossless migration between local 384-d and OpenAI 1536-d embeddings — re-embeds every memory from its stored source text, swaps the column type, rebuilds the HNSW indexes, and flips the server mode.
- Target dimension via --to {384,1536}, with cost and running-server warnings
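The column swap can be sketched as an ordered SQL plan that keeps the old vectors until the new ones exist. Several details are assumptions: the table, column, and index names, the id key, and the vector_cosine_ops operator class. The notes only say that memnos re-embeds from source text, swaps the column type, rebuilds the HNSW indexes, and flips the server mode (the mode flip is not shown here). On pgvector 0.7 or later memnos uses halfvec columns, which this sketch ignores.

```python
def migration_plan(table: str, column: str, index: str, dims: int) -> list[str]:
    """Ordered SQL for a lossless embedding-dimension swap (illustrative only).

    Identifiers are interpolated directly, so they must be trusted constants;
    real code would compose them with psycopg's sql.Identifier.
    """
    if dims not in (384, 1536):
        raise ValueError("dims must be 384 or 1536")
    new_col = f"{column}_new"
    return [
        # 1. Add the new column next to the old one; nothing is lost yet.
        f"ALTER TABLE {table} ADD COLUMN {new_col} vector({dims})",
        # 2. Re-embed each row from its stored source text (executed once per row).
        f"UPDATE {table} SET {new_col} = %s WHERE id = %s",
        # 3. Swap: drop the old index and column, then rename the new column.
        f"DROP INDEX IF EXISTS {index}",
        f"ALTER TABLE {table} DROP COLUMN {column}",
        f"ALTER TABLE {table} RENAME COLUMN {new_col} TO {column}",
        # 4. Rebuild the HNSW index on the new column.
        f"CREATE INDEX {index} ON {table} USING hnsw ({column} vector_cosine_ops)",
    ]
```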
Upgrades & versioning
- memnos upgrade (--check): detects uv/pipx/pip installs and upgrades in place
- Bare memnos and memnos -V show the version and hint when an upgrade is available
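These notes don't describe how memnos upgrade detects the installer. A plausible heuristic is to look at where the running environment lives. This sketch assumes the default uv and pipx tool directories; both can be relocated via UV_TOOL_DIR and PIPX_HOME, which a real check would honor. detect_installer and UPGRADE_COMMANDS are hypothetical names.

```python
import sys


def detect_installer(prefix: str) -> str:
    """Guess how this copy was installed from its environment prefix (heuristic)."""
    p = prefix.replace("\\", "/").lower() + "/"  # normalize Windows paths
    if "/uv/tools/" in p:
        return "uv"
    if "/pipx/venvs/" in p:
        return "pipx"
    return "pip"


# Upgrade command for each installer; pip runs under the current interpreter.
UPGRADE_COMMANDS = {
    "uv": ["uv", "tool", "upgrade", "memnos"],
    "pipx": ["pipx", "upgrade", "memnos"],
    "pip": [sys.executable, "-m", "pip", "install", "-U", "memnos"],
}
```

Running the chosen command would then be subprocess.run(UPGRADE_COMMANDS[detect_installer(sys.prefix)], check=True).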
Engine (since 0.1.3)
- Reranker + local embeddings on ONNX Runtime via fastembed — torch removed, install ~770 MB → ~236 MB, identical ranking (LoCoMo re-validated within the 57–61% band)
memnos-sdk 0.1.4 published in lockstep (no functional SDK changes).
memnos v0.1.3 — uv-first install docs
Docs refresh: uv is now the primary installer (uv tool install memnos for the CLI, uv pip install memnos-sdk for the library); pip/pipx are fallbacks. No engine changes. memnos + memnos-sdk both at 0.1.3.
memnos v0.1.2 — ONNX backend (no torch), ~2.4x smaller install
Reranker + local embeddings now run on ONNX Runtime (fastembed) instead of torch — install ~770MB to ~236MB. Same models, bit-identical reranking. LoCoMo re-validated: 57% (57-61% band). memnos + memnos-sdk both at 0.1.2.