Releases: thameema/memnos

v0.1.11 — sharper recall

15 Jun 03:25

Sharper recall — entity-aware disambiguation, answer-quality ranking, and both-sides capture in headless Claude Code.

Recall quality

  • Entity-aware recall (#17) — recall now uses each fact's subject entity to separate semantically-adjacent but distinct subjects, so a query about one project no longer pulls in another project's notes that merely share vocabulary. Adds an optional subject scope to hard-filter recall to a single entity. Default-on; tune/disable with MEMNOS_RECALL_ENTITY_BOOST / MEMNOS_RECALL_ENTITY_SCOPE.
  • Answer-quality ranking (#2) — candidate dedup, fact-first context rendering, and a bounded fact preference so distilled answers surface ahead of verbose meta-turns for list/broad queries. Verbatim queries still lead with the raw turn.
  • Deterministic ordering — stable tie-break in the hybrid-RRF queries; recall order is now identical across PostgreSQL query plans.
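
To illustrate the deterministic-ordering bullet above, here is a minimal sketch of reciprocal rank fusion (RRF) with a stable tie-break. It is not the project's actual query, which runs inside PostgreSQL. The function name `rrf_fuse`, the constant `k=60`, and breaking ties on the item id are assumptions made for illustration.

```python
from collections import defaultdict

def rrf_fuse(rankings, k=60):
    """Fuse several ranked id lists with reciprocal rank fusion.

    Ties on the fused score are broken by id, so the result does not
    depend on the order in which the input lists were produced.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, item_id in enumerate(ranking, start=1):
            scores[item_id] += 1.0 / (k + rank)
    # Score descending first, then id ascending as the stable tie-break.
    return sorted(scores, key=lambda item_id: (-scores[item_id], item_id))

vector_hits = ["b", "a"]    # hypothetical vector-search order
keyword_hits = ["a", "b"]   # hypothetical keyword-search order
fused = rrf_fuse([vector_hits, keyword_hits])
```

Without the second sort key, two items with equal fused scores could come back in either order depending on how the inputs happened to be iterated.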

Capture

  • Headless Claude Code (#18): claude -p (print mode) now captures both the user prompt and the assistant reply; previously only the user turn was saved.

Fixes

  • Long/pathological queries are clamped before they reach the embedder and reranker, and return fast.
  • memnos status no longer reports a false "STALE pid file" under autostart.

Quality

  • LoCoMo 64–65% band held (65% on the release gate, gpt-4o judge). LongMemEval 78.4% (500q).

Upgrade: memnos upgrade (or uv tool install -U memnos).

v0.1.10 — bounded memory, never-drop capture, local extraction

13 Jun 17:52

Hardening release — validated on a clean Linux box, a Hermes integration, and the 16 GB host that filed #15.

  • Recall memory is now bounded and returned to the OS (#15). The recall path no longer climbs to GBs and holds it: the ONNX reranker arena is bounded and freed memory is released back to the OS after each recall. Measured: a workload that previously climbed to ~2.6 GB and stuck there now plateaus at a few hundred MB under load and recedes when idle.
  • Slow extraction never drops a write. Capture (proxy / SDK / MCP remember) uses async writes, so a slow extraction backend can't time out and silently lose the assistant's turn — both sides of every exchange land. Higher default client timeouts too.
  • Fully-local extraction. Set MEMNOS_EXTRACT_BASE_URL (+ optional MEMNOS_EXTRACT_MODEL) to any OpenAI-compatible endpoint — Ollama, vLLM, LM Studio — and fact extraction runs locally while embeddings stay free local-384. No OpenAI key required for a fully on-device install.
  • Long recall queries no longer crash the Postgres FTS parser (tsquery stack) — clamped safely.
  • agent-setup --namespace <ns> now grants the wired token on that namespace (no more silent 403 on writes).
  • SDK __version__ derives from package metadata (no drift); memnos status reports the real extraction backend.
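
As a rough illustration of the fully-local extraction bullet above, the sketch below shows how a process might resolve the two environment variables into a backend choice. Only the variable names MEMNOS_EXTRACT_BASE_URL and MEMNOS_EXTRACT_MODEL come from these notes. The function `extraction_settings`, the returned dictionary shape, and the example URL and model are hypothetical and do not describe memnos internals.

```python
def extraction_settings(env):
    """Resolve extraction settings from an environment mapping.

    Hypothetical helper: a non-empty base URL selects a local
    OpenAI-compatible endpoint, otherwise the default backend is used.
    """
    base_url = env.get("MEMNOS_EXTRACT_BASE_URL", "").strip()
    model = env.get("MEMNOS_EXTRACT_MODEL", "").strip() or None
    if base_url:
        # Drop a trailing slash so paths can be appended consistently.
        return {"backend": "local", "base_url": base_url.rstrip("/"), "model": model}
    return {"backend": "default", "base_url": None, "model": model}

# Example values only: an Ollama-style local endpoint and model name.
local = extraction_settings({
    "MEMNOS_EXTRACT_BASE_URL": "http://localhost:11434/v1/",
    "MEMNOS_EXTRACT_MODEL": "llama3.1",
})
```

In practice the variables would be set in the server's environment before starting it; the dictionary passed here just stands in for `os.environ`.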

Upgrade: memnos upgrade && memnos restart (or uv tool upgrade memnos).

v0.1.9 — fresh-install & capture fixes

13 Jun 13:55

Bug-fix release hardening the install path and agent capture (found via a clean-Linux + Hermes field test).

  • Linux install on more distros. memnos now works with pgvector 0.6.0 — it feature-detects the installed pgvector and uses full-precision vector columns on <0.7, halfvec automatically on ≥0.7. No source build needed where your distro ships 0.6.
  • memnos proxy handles gzip'd upstreams. Fixes a bug where a gzip-compressed response (e.g. via OpenRouter) broke both capture and a client behind the proxy. Both sides of every exchange are now reliably captured — verified end-to-end with a live agent.
  • Write failures are never silent. MCP write tools (remember, memory_write, …) now raise an explicit error instead of reporting success when a write is rejected — so an agent can't believe it saved something it didn't.
  • agent-setup wires each agent its own scoped principal + token (fixes writes silently failing with 403).
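
The pgvector feature detection in the first bullet above can be sketched as a version check. This is an illustration under stated assumptions, not the project's code: the function name `vector_column_type` is hypothetical, and the version string is assumed to come from something like `SELECT extversion FROM pg_extension WHERE extname = 'vector'`.

```python
def vector_column_type(pgvector_version: str, dims: int) -> str:
    """Pick a column type from the installed pgvector version string."""
    major, minor = (int(part) for part in pgvector_version.split(".")[:2])
    if (major, minor) >= (0, 7):
        # halfvec (half-precision vectors) is available from pgvector 0.7.
        return f"halfvec({dims})"
    # Older releases only offer the full-precision vector type.
    return f"vector({dims})"

old_distro = vector_column_type("0.6.0", 384)
new_distro = vector_column_type("0.7.0", 384)
```

Comparing `(major, minor)` tuples rather than raw strings avoids mis-ordering versions such as 0.10 against 0.7.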

Upgrade: memnos upgrade && memnos restart (or uv tool upgrade memnos).

v0.1.8 — recall that stays fast on your hardware

12 Jun 19:15

  • No more cold-start stall. The server accepts recalls immediately on startup; while the reranker warms in the background, recall returns best-available results (flagged degraded) instead of blocking. First-call latency drops from tens of seconds to under ~2s on every machine.
  • Self-calibrating to your hardware. memnos measures rerank speed at startup and sizes reranking to a latency ceiling: capable machines keep full ranking depth (no accuracy change), CPU-only machines stay responsive instead of timing out. Tunable via MEMNOS_RERANK_BUDGET_MS / MEMNOS_RERANK_CAP; MEMNOS_RERANK=0 disables reranking entirely.
  • Per-stage recall timings in the audit log (embed / sql / staleness / rerank) for diagnosing latency, plus a 60s query-embedding cache.
  • Published benchmark: LongMemEval full-500 = 78.4% (gpt-4o answer + judge, on a competitor's open MemoryBench harness) — every prediction in benchmarks/results/.
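
The self-calibration bullet above (measure rerank speed, then size reranking to a latency ceiling) can be sketched as follows. The helper names `measure_per_item_ms` and `cap_for_budget` are hypothetical, and treating MEMNOS_RERANK_BUDGET_MS as the budget and MEMNOS_RERANK_CAP as the upper bound is an assumption about how those settings relate.

```python
import time

def measure_per_item_ms(score_fn, sample):
    """Time score_fn over a small non-empty sample; return mean ms per item."""
    start = time.perf_counter()
    for item in sample:
        score_fn(item)
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    # Guard against a zero reading on very fast machines.
    return max(elapsed_ms / len(sample), 1e-6)

def cap_for_budget(per_item_ms, budget_ms, max_cap):
    """How many candidates fit in the budget, clamped to [1, max_cap]."""
    return max(1, min(max_cap, int(budget_ms // per_item_ms)))

fast_machine = cap_for_budget(per_item_ms=2.0, budget_ms=200, max_cap=50)
cpu_only = cap_for_budget(per_item_ms=40.0, budget_ms=200, max_cap=50)
```

A fast machine hits the upper bound and keeps full ranking depth, while a slow one reranks fewer candidates so the call stays inside the budget.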

Upgrade: memnos upgrade && memnos restart (or uv tool upgrade memnos).

v0.1.7 — what's true now

11 Jun 21:06

  • Belief-change supersession on the write path: when a new memory contradicts an old one (status flips, negations, value updates like "rate limit is now 200"), the old fact is closed out with full provenance — recall returns what's true now, with the transition visible ("superseded as of ").
  • Write-path dedupe: restating the same fact bumps its salience instead of stacking duplicate rows.
  • memnos namespace reconcile <ns> — applies the same rules to memories stored before 0.1.7 (--dry-run shows exact counts first; no LLM calls).
  • Smarter ranking on broad questions: "where are we with X" now leads with distilled facts; verbatim questions still surface the original conversation. Tunable/disablable via env.
  • Windows installation guide (docs/guides/windows.md); LoCoMo full-10 band updated to 64–65% with results published.
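
The supersession and dedupe bullets above can be illustrated with a small in-memory sketch. It assumes facts are keyed by an exact (subject, attribute) pair, which is a simplification: the release describes detecting status flips and negations as well. The `Fact` class and `write_fact` function are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Fact:
    subject: str
    attribute: str
    value: str
    salience: int = 1
    superseded_by: Optional[int] = None  # index of the replacing fact

def write_fact(store, subject, attribute, value):
    """Write a fact; return the index of the fact that is now current."""
    for idx, fact in enumerate(store):
        if fact.superseded_by is not None:
            continue  # already closed out
        if (fact.subject, fact.attribute) != (subject, attribute):
            continue
        if fact.value == value:
            fact.salience += 1  # restatement: bump instead of stacking a row
            return idx
        # Contradiction: add the new fact and close out the old one.
        store.append(Fact(subject, attribute, value))
        fact.superseded_by = len(store) - 1
        return fact.superseded_by
    store.append(Fact(subject, attribute, value))
    return len(store) - 1

store = []
write_fact(store, "api", "rate limit", "100")
write_fact(store, "api", "rate limit", "100")            # same fact restated
current = write_fact(store, "api", "rate limit", "200")  # value update
```

The old row is kept with a pointer to its replacement, which is what lets recall show the transition rather than silently dropping history.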

Closes #10, #11. Verified: full test suite + 3-OS CI matrix + LoCoMo full-10 held at 64–65% across the changes.

Upgrade: memnos upgrade (or uv tool upgrade memnos), then memnos restart.

v0.1.6 — the coordination release: attributed, grounded, governed

11 Jun 05:51

Memory as the coordination plane for multi-agent systems.

Author-attributed memory

Every memory now carries its author — stamped server-side from the authenticated token, impossible to forge from a request body. Recall and context blocks show who wrote what (- (decision, 2026-06-11, by arch-agent) ...), and /recall accepts an author filter. Multi-agent blackboard coordination: shared namespace + signed writes + webhook subscriptions = agents that hand off work with full provenance.
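
A minimal sketch of the server-side stamping idea described above: the author is taken from the authenticated token, and any author field supplied in the request body is overwritten. The function `stamp_author`, the token table, and the field names are assumptions for illustration, not the actual server code.

```python
def stamp_author(request_body, token_principals, bearer_token):
    """Return the memory to store, with author set from the token only."""
    principal = token_principals.get(bearer_token)
    if principal is None:
        raise PermissionError("unknown or revoked token")
    memory = dict(request_body)    # copy so the caller's dict is untouched
    memory["author"] = principal   # overrides anything sent by the client
    return memory

tokens = {"tok-123": "arch-agent"}  # hypothetical token -> principal table
forged = {"text": "Use Postgres 16", "author": "someone-else"}
stored = stamp_author(forged, tokens, "tok-123")
```

Because the body value is never consulted, a client cannot claim another agent's identity by editing its payload.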

Grounded recall (knowledge bases)

Tag any namespace kind=knowledge (no reserved prefixes) and link it to working namespaces: memnos namespace link proj:billing cms57-docs. Recall then automatically consults linked knowledge bases — enforced by the engine, not the prompt. Links are policy, grants are permission: both required, and skipped links are visible in the response (grounded_in / links_skipped).
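
The "links are policy, grants are permission" rule above can be sketched as a simple resolution step. Only the response keys grounded_in and links_skipped and the namespaces proj:billing and cms57-docs come from the text. The function `resolve_grounding`, its data shapes, and the `hr-docs` namespace are hypothetical.

```python
def resolve_grounding(namespace, links, grants):
    """Split a namespace's linked knowledge bases by whether the caller may read them."""
    grounded_in, links_skipped = [], []
    for kb in links.get(namespace, []):
        if kb in grants:
            grounded_in.append(kb)    # linked and granted: consulted
        else:
            links_skipped.append(kb)  # linked but not granted: reported, not read
    return {"grounded_in": grounded_in, "links_skipped": links_skipped}

links = {"proj:billing": ["cms57-docs", "hr-docs"]}
grants = {"proj:billing", "cms57-docs"}  # namespaces this caller can read
result = resolve_grounding("proj:billing", links, grants)
```

Reporting skipped links instead of dropping them silently makes a missing grant visible to whoever reads the response.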

Typed memories + pinned constraints

memnos remember "..." --type decision|incident|constraint|skill|fact — types flow through extraction and recall filters. Constraints are always injected: type=constraint memories appear in every recall for their namespace (and granted linked KBs), rendered first as CONSTRAINT: lines regardless of the query — compliance rules your agents physically cannot forget. New "Memory feed" tab in the admin console (type badges, author, age).
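
The pinned-constraint behavior above can be illustrated with a short rendering sketch: constraints are emitted first on every recall, whether or not they matched the query. The function `render_context`, the memory dictionary shape, and the non-constraint line format are assumptions; only the CONSTRAINT: prefix comes from the text.

```python
def render_context(memories, matched_ids):
    """Render recall context with every constraint first, then the query matches."""
    constraints = [m for m in memories if m["type"] == "constraint"]
    matches = [
        m for m in memories
        if m["type"] != "constraint" and m["id"] in matched_ids
    ]
    lines = [f"CONSTRAINT: {m['text']}" for m in constraints]
    lines += [f"- ({m['type']}) {m['text']}" for m in matches]
    return lines

memories = [
    {"id": 1, "type": "decision", "text": "Ship billing v2 in Q3"},
    {"id": 2, "type": "constraint", "text": "Never log card numbers"},
    {"id": 3, "type": "fact", "text": "Staging runs on port 8443"},
]
context = render_context(memories, matched_ids={3})
```

Here the query matched only memory 3, yet the constraint still leads the rendered context.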

Published, CI-enforced API contract

openapi.yaml — 58 operations, every one exercised and schema-validated against the real server in CI. The published spec cannot drift from the implementation. Human reference at docs/api.md with curl + SDK examples.

The memnos CLI grammar

Consistent noun–verb commands (principal create, token mint, grant add, namespace link) — every old form still works as an alias. The full CLI reference is auto-generated from the parser itself (staleness gated in CI), published at docs/cli.md and as a CLI Reference tab in the admin console. New cross-platform CI matrix (Linux/macOS/Windows) — which caught and fixed a real Windows console encoding bug before release.

memnos-sdk 0.1.6 published in lockstep.

v0.1.5 — reliable capture, deterministic proxy, field-hardened performance

11 Jun 05:08

The reliability release: every fix in here came from real field deployments.

Capture — the answer is the memory

  • CRITICAL fix: the Claude Code Stop hook now captures BOTH speakers. Previously only the user's prompt was saved — everything the assistant said (decisions, ticket IDs, outcomes) was invisible across sessions. Assistant replies are now captured with identifiers preserved verbatim, and the extraction prompt keeps ticket keys / PR numbers / versions / URLs intact.
  • New: memnos proxy — deterministic both-sides capture for any OpenAI- or Anthropic-compatible client (Hermes, OpenClaw, Open WebUI, SDK apps): transparent relay (streaming included), BYOK (keys forwarded, never stored), fail-open (a capture problem never breaks your agent), agent-loop noise filtered (tool-call iterations, title calls, dedupe). Typed error taxonomy so you always know if a failure is the proxy, the network, or the LLM.
  • Session trust rails: every Claude Code session opens with a visible memnos: memory ACTIVE status line (or a loud warning + fix); memnos status shows server, proxy, capture counters, and stale/unmanaged-process detection.
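
The fail-open property of the proxy described above (a capture problem never breaks the agent) can be sketched as a wrapper around the relay step. The names `relay_with_capture`, `forward`, and `capture` are hypothetical, and the real proxy also handles streaming, which this sketch omits.

```python
import logging

logger = logging.getLogger("capture-sketch")

def relay_with_capture(forward, capture, request):
    """Relay a request upstream, then try to record both sides of the exchange."""
    response = forward(request)   # upstream errors propagate unchanged
    try:
        capture(request, response)
    except Exception:
        # Fail open: log the capture problem but still return the response.
        logger.exception("capture failed; relaying response anyway")
    return response

def broken_capture(request, response):
    raise RuntimeError("memory store unavailable")

reply = relay_with_capture(lambda req: {"content": "hi"}, broken_capture, {"prompt": "hello"})
```

Only the capture call sits inside the `try`, so a failing upstream still surfaces to the client as its own error rather than being masked.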

Integrations

  • memnos agent-setup now uniform across claude-code, claude-desktop, codex, cursor, windsurf, openclaw, hermes (one grammar; claude-setup remains an alias)
  • Claude Desktop: platform-aware config paths, absolute command path (fixes spawn memnos ENOENT), and a memory skill for consistent recall/remember behavior
  • memnos upgrade re-wires previously-installed integrations automatically
  • memnos autostart [--proxy] — launchd/systemd login services; the server waits for Postgres instead of crashing

Performance (measured at 52k–104k row scale)

  • /recall −51% latency (single-pass retrieve+rerank); raw-turn BM25 index added
  • New default reranker: ms-marco-MiniLM-L-6 — measured on the identical full-10 LoCoMo corpus: +6pp accuracy over the 13× larger model, 8.4× faster reranks, ~660MB less memory, 0.23s cold start
  • No pool connection is ever held across LLM/embedding/CPU work (storm test: total write outage → 24/24 success, admin p95 3ms); async ingest option; statement timeouts
  • Supersession writes 2,116ms → 0.9ms at 95k facts; bounded embedding cache (a 35k-write ingest now costs +26MB, was ~1.9GB); processes show as memnos-server/memnos-proxy in ps/top
  • Admin console: loading/error/empty states everywhere, pagination, retry — plus faster endpoints
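
The bounded embedding cache mentioned above can be illustrated with a least-recently-used cache that holds a fixed number of entries. The eviction policy, the class name `BoundedCache`, and the stand-in embedding function are assumptions; the release only states that the cache is bounded.

```python
from collections import OrderedDict

class BoundedCache:
    """Keep at most max_items results, evicting the least recently used."""

    def __init__(self, max_items):
        self.max_items = max_items
        self._items = OrderedDict()

    def get_or_compute(self, key, compute):
        if key in self._items:
            self._items.move_to_end(key)   # mark as recently used
            return self._items[key]
        value = compute(key)
        self._items[key] = value
        if len(self._items) > self.max_items:
            self._items.popitem(last=False)  # drop the oldest entry
        return value

    def __len__(self):
        return len(self._items)

calls = []
def fake_embed(text):
    calls.append(text)          # record each real computation
    return [float(len(text))]   # stand-in for an embedding vector

cache = BoundedCache(max_items=2)
for text in ["a", "bb", "a", "ccc", "bb"]:
    cache.get_or_compute(text, fake_embed)
```

Memory use is capped by `max_items` regardless of how many writes pass through, at the cost of recomputing entries that were evicted.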

Benchmarks (reproducible, predictions published)

  • Full LoCoMo (10 conversations, 1,542 questions, gpt-4o judge): 64–65% across three independent ingests with the new defaults (benchmarks/results/)

memnos-sdk 0.1.5 published in lockstep.

v0.1.4 — friction-free CLI

10 Jun 06:22

Friction-free install & operations

Server lifecycle like any daemon

  • memnos start / stop / restart / status — background server with pid + log management; memnos serve remains the foreground primitive for systemd/launchd/Docker
  • memnos start shows live progress (first start downloads the local ONNX models ~1 GB — it tells you, tails the log, and surfaces startup errors inline)
  • Prominent "START THE SERVER" guidance at the end of setup

Setup

  • memnos setup --docker — provisions a ready pgvector Postgres container (native Postgres remains the first-class path)
  • Clear, actionable guidance when pgvector is missing or built for the wrong Postgres major (brew version-mismatch gotcha included)
  • OpenAI-key step: hidden input, whitespace-scrubbed, format-checked, live-validated against the OpenAI API before acceptance; blank entry now confirms before locking in free local 384-d mode; key stored AES-256-GCM encrypted in the vault
  • Re-running setup is safe — the schema is additive/idempotent (never wipes data)

New: memnos migrate-embeddings

  • Lossless migration between local 384-d and OpenAI 1536-d embeddings — re-embeds every memory from its stored source text, swaps the column type, rebuilds the HNSW indexes, and flips the server mode. --to {384,1536}, with cost + running-server warnings

Upgrades & versioning

  • memnos upgrade (--check) — detects uv/pipx/pip installs and upgrades in place
  • Bare memnos and memnos -V show the version; an available upgrade is hinted

Engine (since 0.1.3)

  • Reranker + local embeddings on ONNX Runtime via fastembed — torch removed, install ~770 MB → ~236 MB, identical ranking (LoCoMo re-validated within the 57–61% band)

memnos-sdk 0.1.4 published in lockstep (no functional SDK changes).

memnos v0.1.3 — uv-first install docs

10 Jun 03:57

Docs refresh: uv is now the primary installer (uv tool install memnos for the CLI, uv pip install memnos-sdk for the library); pip/pipx are fallbacks. No engine changes. memnos + memnos-sdk both at 0.1.3.

memnos v0.1.2 — ONNX backend (no torch), ~2.4x smaller install

10 Jun 03:42

Reranker + local embeddings now run on ONNX Runtime (fastembed) instead of torch — install ~770MB to ~236MB. Same models, bit-identical reranking. LoCoMo re-validated: 57% (57-61% band). memnos + memnos-sdk both at 0.1.2.