Memory Arena v0.1.8

@xmpuspus xmpuspus released this 29 Jun 02:02

A 30-line vector store beats every funded agent-memory SDK on the same eval, and none of them comes close to solving the problem.

Memory Arena runs 20 agent-memory strategies through one lifecycle (setup -> ingest -> recall -> teardown) on the same LongMemEval-S smoke corpus, judged by the same model (Claude Opus 4.7), with the same embeddings and top_k.
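
To make the four-stage lifecycle concrete, here is a minimal sketch of how one strategy could be driven through setup, ingest, recall, and teardown. Everything in it is an assumption for illustration: `MemoryStrategy`, `TinyVectorStore`, `run_lifecycle`, and the toy bag-of-words `embed` are hypothetical names, not Memory Arena's actual API, and a real run would use a proper embedding model.

```python
import math
from collections import Counter
from typing import Protocol


class MemoryStrategy(Protocol):
    """Hypothetical interface for illustration; not Memory Arena's actual API."""

    def setup(self) -> None: ...
    def ingest(self, text: str) -> None: ...
    def recall(self, query: str, top_k: int) -> list[str]: ...
    def teardown(self) -> None: ...


def embed(text: str) -> Counter:
    """Toy bag-of-words embedding standing in for a real embedding model."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[term] * b[term] for term in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class TinyVectorStore:
    """In-memory store: embed on ingest, rank by cosine similarity on recall."""

    def setup(self) -> None:
        self.items: list[tuple[Counter, str]] = []

    def ingest(self, text: str) -> None:
        self.items.append((embed(text), text))

    def recall(self, query: str, top_k: int = 3) -> list[str]:
        q = embed(query)
        ranked = sorted(self.items, key=lambda item: cosine(q, item[0]), reverse=True)
        return [text for _, text in ranked[:top_k]]

    def teardown(self) -> None:
        self.items.clear()


def run_lifecycle(strategy: MemoryStrategy, sessions: list[str], query: str, top_k: int) -> list[str]:
    """Drive one strategy through setup -> ingest -> recall -> teardown."""
    strategy.setup()
    try:
        for session in sessions:
            strategy.ingest(session)
        return strategy.recall(query, top_k)
    finally:
        strategy.teardown()  # always release state, even if recall raises
```

Calling `run_lifecycle(TinyVectorStore(), sessions, "what is my dog named", top_k=1)` returns the single session whose words overlap most with the query. Keeping every strategy behind the same four methods is what lets the embeddings, top_k, and judge stay fixed while only the memory logic varies.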

What's in 0.1.8

  • 20 strategies: pure-Python baselines and retrievers, vendor SDKs (Mem0, Graphiti, Graphiti-on-FalkorDB, Cognee, LangMem, Memori), an LLM-maintained wiki, and two quantum rerankers over the same vector store.
  • Mem0 and LangMem are leveled to the same model (Claude Sonnet) as the baselines, so the comparison isn't skewed by a model handicap.
  • Bootstrap 95% confidence intervals; a 19-way GPT-4o cross-judge agrees on the ranking (Spearman +0.967) while scoring more leniently in absolute terms.
  • Next.js dashboard, Typer CLI, and reproducible result JSONs stamped with commit SHA, package versions, model IDs, and seed.
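
The two statistics named in the list above can be sketched in a few lines of standard-library Python. This is an illustration under stated assumptions, not the project's implementation: it assumes the bootstrap resamples per-question 0/1 judge outcomes with the percentile method, and `bootstrap_ci`, `ranks`, and `spearman` are hypothetical helper names.

```python
import random


def bootstrap_ci(outcomes: list[int], n_resamples: int = 2000,
                 alpha: float = 0.05, seed: int = 0) -> tuple[float, float]:
    """Percentile bootstrap CI for the mean of per-question 0/1 outcomes."""
    rng = random.Random(seed)  # fixed seed so the interval is reproducible
    n = len(outcomes)
    means = sorted(sum(rng.choices(outcomes, k=n)) / n for _ in range(n_resamples))
    lo = means[int((alpha / 2) * n_resamples)]
    hi = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lo, hi


def ranks(values: list[float]) -> list[float]:
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x: list[float], y: list[float]) -> float:
    """Spearman correlation: Pearson correlation computed on the ranks."""
    rx, ry = ranks(x), ranks(y)
    mean_x, mean_y = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5
```

Passing two judges' per-strategy scores to `spearman` compares only the ordering, which is why a second judge can agree closely on the ranking while still being more lenient in absolute terms.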

Quickstart

pip install memory-arena
memory-arena demo

memory-arena demo launches the bundled dashboard with pre-computed results. Full reproduction steps and the methodology notes are in the README.