A 30-line vector store beats every funded agent-memory SDK on the same eval, and no strategy comes close to solving the problem.
Memory Arena runs 20 agent-memory strategies through one lifecycle (setup -> ingest -> recall -> teardown) on the same LongMemEval-S smoke corpus, judged by the same model (Claude Opus 4.7), with the same embeddings and top_k.
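To make the lifecycle concrete, here is a minimal sketch of what a small vector-store strategy could look like when driven through setup -> ingest -> recall -> teardown. The class and method names (`MemoryStrategy`, `VectorStore`, `run_lifecycle`) are hypothetical and are not Memory Arena's actual API. A toy bag-of-words "embedding" with cosine similarity stands in for the shared embedding model, so the example runs with no dependencies.

```python
import math
import re
from collections import Counter
from typing import Protocol


class MemoryStrategy(Protocol):
    # Hypothetical interface mirroring setup -> ingest -> recall -> teardown.
    def setup(self) -> None: ...
    def ingest(self, text: str) -> None: ...
    def recall(self, query: str, top_k: int) -> list[str]: ...
    def teardown(self) -> None: ...


def embed(text: str) -> Counter:
    # Toy bag-of-words vector; an assumption standing in for a real embedding model.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


class VectorStore:
    def __init__(self) -> None:
        self.items: list[tuple[str, Counter]] = []

    def setup(self) -> None:
        self.items = []  # start each run from an empty store

    def ingest(self, text: str) -> None:
        self.items.append((text, embed(text)))  # store raw text with its vector

    def recall(self, query: str, top_k: int) -> list[str]:
        q = embed(query)
        ranked = sorted(self.items, key=lambda it: cosine(q, it[1]), reverse=True)
        return [text for text, _ in ranked[:top_k]]  # top_k most similar memories

    def teardown(self) -> None:
        self.items = []  # release everything ingested during the run


def run_lifecycle(strategy: MemoryStrategy, sessions: list[str],
                  question: str, top_k: int = 2) -> list[str]:
    # Every strategy sees the same sessions, question, and top_k.
    strategy.setup()
    try:
        for s in sessions:
            strategy.ingest(s)
        return strategy.recall(question, top_k)
    finally:
        strategy.teardown()  # teardown runs even if ingest or recall raises
```

Because the harness only talks to the four lifecycle methods, a vendor SDK and a 30-line baseline can be swapped in without changing how they are ingested, queried, or judged.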
What's in 0.1.8
- 20 strategies: pure-Python baselines and retrievers, vendor SDKs (Mem0, Graphiti, Graphiti-on-FalkorDB, Cognee, LangMem, Memori), an LLM-maintained wiki, and two quantum rerankers over the same vector store.
- mem0 and langmem leveled to the same model (Claude Sonnet) as the baselines, so neither is handicapped by a weaker model.
- Bootstrap 95% CIs; a 19-way GPT-4o cross-judge agrees on the ranking (Spearman +0.967), though GPT-4o is more lenient in absolute scores.
- Next.js dashboard, Typer CLI, and reproducible result JSONs stamped with commit SHA, package versions, model IDs, and seed.
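The two statistics in the list can be sketched in a few lines of standard-library Python. This assumes a percentile bootstrap over per-question judge scores and Spearman's rho computed as the Pearson correlation of (tie-averaged) ranks; the function names are illustrative, and the exact resampling scheme Memory Arena uses is not stated here.

```python
import random


def bootstrap_ci(scores: list[float], n_resamples: int = 10_000,
                 alpha: float = 0.05, seed: int = 0) -> tuple[float, float]:
    # Percentile bootstrap: resample per-question scores with replacement
    # and take the alpha/2 and 1 - alpha/2 quantiles of the resampled means.
    rng = random.Random(seed)
    n = len(scores)
    means = sorted(sum(rng.choices(scores, k=n)) / n for _ in range(n_resamples))
    lo = means[int((alpha / 2) * n_resamples)]
    hi = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lo, hi


def ranks(values: list[float]) -> list[float]:
    # 1-based ranks; tied values share the average of their positions.
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            r[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return r


def spearman(x: list[float], y: list[float]) -> float:
    # Spearman's rho = Pearson correlation of the two rank vectors.
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5
```

With made-up per-strategy accuracies from two judges, `spearman([0.62, 0.55, 0.48, 0.30], [0.66, 0.70, 0.51, 0.40])` swaps only the top two strategies and gives 0.8. This is why a cross-judge can agree on the ranking while still scoring every strategy higher in absolute terms: rank correlation ignores a uniform shift in leniency.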
Quickstart
pip install memory-arena
memory-arena demo
memory-arena demo launches the bundled dashboard with pre-computed results. Full reproduction steps and the methodology notes are in the README.