A curated list of papers on memory for LLM / multimodal agents — methods, benchmarks, and surveys — covering episodic, semantic, procedural, and multimodal memory, with both parametric (internal) and retrieval-based (external) storage, learned via prompting, supervised finetuning, or reinforcement learning.
90 papers · 7 surveys · 31 benchmarks · 52 methods · last updated 2026-04-21
Interactive dashboard with multi-tag filtering: https://yyyujintang.github.io/Awesome-Agent-Memory-Papers/
Contributions welcome — open an issue or PR with new papers.
- Surveys
- Benchmarks
- Methods
  - Multimodal Memory (16)
  - Procedural Memory (10)
  - Episodic Memory (18)
  - Semantic Memory (2)
  - Internal / Parametric Memory (4)
  - Other Methods (2)
- Tag Legend
- Rethinking Memory Mechanisms of Foundation Agents in the Second Half
2026-01-14 · Jiawei Han, Philip Yu
Survey
- AI Meets Brain: Memory Systems from Cognitive Neuroscience to Autonomous Agents
2025-12-29 · [code]
Survey
- Memory in the Age of AI Agents
2025-12-15 · Shuicheng Yan, Guibin Zhang · [code]
Survey
- Measuring Agents in Production
2025-12-02 · Shuicheng Yan, Guibin Zhang
Survey
- Retrieval Augmented Generation and Understanding in Vision: A Survey and New Outlook
2025-03-23 · Xuming Hu
Survey
- Episodic memory in AI agents poses risks that should be studied and mitigated
2025-01-20
Survey
- A Survey on the Memory Mechanism of Large Language Model based Agents
2024-04-21
Survey
Evaluation suites for agent memory, split by interaction mode.
- KnowMe-Bench: Benchmarking Person Understanding for Lifelong Digital Companions
2026-01-08
Benchmark · QA
- Mem-Gallery: Benchmarking Multimodal Long-Term Conversational Memory for MLLM Agents
2026-01-07
Benchmark · QA
- Evaluating Memory in LLM Agents via Incremental Multi-Turn Interactions
2025-07-07 · ICLR26 · Yuanzhe Hu · [code]
Benchmark · QA
- LongMemEval: Benchmarking Chat Assistants on Long-Term Interactive Memory
2024-10-14
Benchmark · QA
- (LoCoMo) Evaluating Very Long-Term Conversational Memory of LLM Agents
2024-02-27
Benchmark · QA
- WebChoreArena: Evaluating Web Browsing Agents on Realistic Tedious Web Tasks
2025-06-02
Benchmark · Web
- RealWebAssist: A Benchmark for Long-Horizon Web Assistance with Real-World Users
2025-04-14
Benchmark · Web
- The BrowserGym Ecosystem for Web Agent Research
2024-12-06
Benchmark · Web
- VisualWebArena: Evaluating Multimodal Agents on Realistic Visual Web Tasks
2024-01-24 · ACL24 · [code]
Benchmark · Web
- WebArena: A Realistic Web Environment for Building Autonomous Agents
2023-07-25 · ICLR24 · Shuyan Zhou, Duke
Benchmark · Web
- Mind2Web: Towards a Generalist Agent for the Web
2023-06-09 · NeurIPS23, Spotlight
Benchmark · Web
- WebShop: Towards Scalable Real-World Web Interaction with Grounded Language Agents
2022-07-04 · NeurIPS22 · [code]
Benchmark · Web
- Gym-Anything: Turn any Software into an Agent Environment
2026-04-07
Benchmark · GUI
- MemGUI-Bench: Benchmarking Memory of Mobile GUI Agents in Dynamic Environments
2026-02-03 · [code]
Benchmark · GUI
- OS-Marathon: Benchmarking Computer-Use Agents on Long-Horizon Repetitive Tasks
2026-01-28
Benchmark · GUI
- LongHorizonUI: A Unified Framework for Robust long-horizon Task Automation of GUI Agent
2026-01-26 · ICLR26
Benchmark · GUI
- VisualAgentBench: Towards Large Multimodal Models as Visual Foundation Agents
2024-08-12 · THU-Jie Tang · [code]
Benchmark · GUI
- OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments
2024-04-11 · NeurIPS24 · [code]
Benchmark · GUI
- AGENTVISTA: Evaluating Multimodal Agents in Ultra-Challenging Realistic Visual Scenarios
2026-02-26 · Junxian He, May Fung
Benchmark · Embodied
- MentisOculi: Revealing the Limits of Reasoning with Mental Imagery
2026-02-02
Benchmark · Embodied
- CAPTURe: Evaluating Spatial Reasoning in Vision Language Models via Occluded Object Counting
2025-04-21 · ICCV25 · Mohit Bansal
Benchmark · Embodied
- ALFWorld: Aligning Text and Embodied Environments for Interactive Learning
2020-10-08 · ICLR21
Benchmark · Embodied
- ALFRED: A Benchmark for Interpreting Grounded Instructions for Everyday Tasks
2019-12-03 · CVPR20
Benchmark · Embodied
- TextWorld: A Learning Environment for Text-based Games
2018-06-29 · IJCAI18
Benchmark · Embodied
- AMA-Bench: Evaluating Long-Horizon Memory for Agentic Applications
2026-02-26
Benchmark · Long-Horizon
- MemoryArena: Benchmarking Agent Memory in Interdependent Multi-Session Agentic Tasks
2026-02-12 · Yu Wang, Yuanzhe Hu
Benchmark · Long-Horizon
- A Framework for Studying AI Agent Behavior: Evidence from Consumer Choice Experiments
2025-09-30 · ICLR26 · Nikhil, ABxLab
Benchmark · Long-Horizon
- OdysseyBench: Evaluating LLM Agents on Long-Horizon Complex Office Application Workflows
2025-08-12
Benchmark · Long-Horizon
- TheAgentCompany: Benchmarking LLM Agents on Consequential Real World Tasks
2024-12-18
Benchmark · Long-Horizon
- MemSim: A Bayesian Simulator for Evaluating Memory of LLM-based Personal Assistants
2024-09-30 · NeurIPS25 · [code]
Benchmark · Long-Horizon
- AgentBench: Evaluating LLMs as Agents
2023-08-07 · ICLR24 · [code]
Benchmark · Long-Horizon
Each paper is placed in exactly one primary section, chosen by the first tag it carries in the priority order Multimodal > Procedural > Episodic > Semantic > External > Internal. The tags on each entry show its full tag vector; use the website for true multi-axis filtering.
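To make the placement rule concrete, here is a minimal Python sketch, assuming each paper's tags are available as a set of strings. The `PRIORITY` list, the `primary_section` function, and the `"Other"` fallback are illustrative assumptions rather than the repository's actual code; the fallback is inferred from the Other Methods entries below, which carry no storage or memory-type tag.

```python
# Hypothetical sketch: a paper goes to the section named by the first
# tag it carries in this priority order.
PRIORITY = ["Multimodal", "Procedural", "Episodic", "Semantic", "External", "Internal"]


def primary_section(tags: set[str]) -> str:
    """Return the highest-priority tag present in `tags`, or "Other"."""
    for tag in PRIORITY:
        if tag in tags:
            return tag
    # Assumed fallback for entries with none of the priority tags.
    return "Other"


# Tag sets copied from entries in this list.
print(primary_section({"Internal", "RL-based", "Episodic", "Multimodal"}))  # Multimodal (VAGEN)
print(primary_section({"External", "Prompt-based", "Episodic", "Semantic"}))  # Episodic (Mem0)
print(primary_section({"External", "Internal", "Prompt-based", "Semantic"}))  # Semantic (HippoRAG 2)
print(primary_section({"RL-based"}))  # Other (AgentRL)
```

Because only the first matching tag matters, a paper tagged both Episodic and Multimodal always lands under Multimodal Memory, which is why the Multimodal section absorbs many papers that also carry episodic or semantic tags.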
- Omni-SimpleMem: Autoresearch-Guided Discovery of Lifelong Multimodal Agent Memory
2026-04-01
Method · External · Prompt-based · Episodic · Multimodal · Procedural · Semantic
- Visual Generation Unlocks Human-Like Reasoning through Multimodal World Models
2026-01-27 · Mingsheng Long · ByteDance Seed
Method · Internal · SFT · Multimodal
- MemOCR: Layout-Aware Visual Memory for Efficient Long-Horizon Reasoning
2026-01-26
Method · External · Prompt-based · Episodic · Multimodal
- MemVerse: Multimodal Memory for Lifelong Learning Agents
2025-12-03 · [code]
Method · External · Prompt-based · Episodic · Multimodal · Procedural · Semantic
- ViLoMem: Agentic Learner with Grow-and-Refine Multimodal Semantic Memory
2025-11-26 · CVPR26 · [code]
Method · External · Multimodal · Semantic
- LongVT: Incentivizing "Thinking with Long Videos" via Native Tool Calling
2025-11-25 · CVPR26
Method · External · Prompt-based · Multimodal · Procedural
- VisMem: Latent Vision Memory Unlocks Potential of Vision-Language Models
2025-11-14 · Shuicheng Yan
Method · Internal · SFT · Multimodal
- VAGEN: Reinforcing World Model Reasoning for Multi-Turn VLM Agents
2025-10-19 · NeurIPS25 · [code]
Method · Internal · RL-based · Episodic · Multimodal
- VideoLucy: Deep Memory Backtracking for Long Video Understanding
2025-10-14 · NeurIPS25
Method · External · SFT · Episodic · Multimodal
- (M3-Agent) Seeing, Listening, Remembering, and Reasoning: A Multimodal Agent with Long-Term Memory
2025-08-13 · ICLR26 · ByteDance Seed · [code]
Method · External · SFT · Episodic · Multimodal · Semantic
- MAViS: A Multi-Agent Framework for Long-Sequence Video Storytelling
2025-08-11
Method · External · Prompt-based · Episodic · Multimodal
- Machine Mental Imagery: Empower Multimodal Reasoning with Latent Visual Token
2025-06-20 · Chuang Gan · [code]
Method · Internal · Multimodal
- 3DLLM-Mem: Long-Term Spatial-Temporal Memory for Embodied 3D Large Language Model
2025-05-28
Method · External · Prompt-based · Episodic · Multimodal · Semantic
- Towards General Continuous Memory for Vision-Language Models
2025-05-23 · NeurIPS25
Method · External · Internal · SFT · Episodic · Multimodal · Semantic
- SAM2Act: Integrating Visual Foundation Model with A Memory Architecture for Robotic Manipulation
2025-01-30 · [code]
Method · External · Prompt-based · Episodic · Multimodal
- Optimus-1: Hybrid Multimodal Memory Empowered Agents Excel in Long-Horizon Tasks
2024-08-07 · NeurIPS24 · [code]
Method · External · Prompt-based · Multimodal
- A Subgoal-driven Framework for Improving Long-Horizon LLM Agents
2026-03-20
Method · External · Prompt-based · Training-free · Episodic · Procedural
- Plan-MCTS: Plan Exploration for Action Exploitation in Web Navigation
2026-02-15 · Weinan Zhang
Method · RL-based · Procedural
- MemSkill: Learning and Evolving Memory Skills for Self-Evolving Agents
2026-02-02
Method · External · RL-based · Procedural
- TokMem: Tokenized Procedural Memory for Large Language Models
2025-10-01
Method · Internal · SFT · Procedural
- ReasoningBank: Scaling Agent Self-Evolving with Reasoning Memory
2025-09-29 · ICLR26 · Siru Ouyang
Method · External · Prompt-based · Episodic · Procedural
- Memory Management and Contextual Consistency for Long-Running Low-Code Agents
2025-09-27
Method · External · Prompt-based · Episodic · Procedural
- Memory OS of AI Agent
2025-05-30 · EMNLP25, Main
Method · External · Prompt-based · Episodic · Procedural · Semantic
- A-MEM: Agentic Memory for LLM Agents
2025-02-17 · NeurIPS25 · [code]
Method · External · Prompt-based · Episodic · Procedural · Semantic
- Agent Workflow Memory (AWM)
2024-09-11 · ICML26 · [code]
Method · External · Prompt-based · Procedural
- Synapse: Trajectory-as-Exemplar Prompting with Memory for Computer Control
2023-06-13 · [code]
Method · External · Prompt-based · Episodic · Procedural
- Gated Memory Policy
2026-04-21 · Shuran Song
Method · Internal · RL-based · Episodic
- HiGMem: A Hierarchical and LLM-Guided Memory System for Long-Term Conversational Agents
2026-04-20
Method · External · Prompt-based · Training-free · Episodic · Semantic
- PlugMem: A Task-Agnostic Plugin Memory Module for LLM Agents
2026-02-23 · [code]
Method · External · Prompt-based · Training-free · Episodic · Semantic
- Modeling Distinct Human Interaction in Web Agents
2026-02-19
Method · External · Prompt-based · Episodic
- REMem: Reasoning with Episodic Memory in Language Agent
2026-02-13 · Yu Su, Huan Sun
Method · External · Prompt-based · Episodic
- TraceMem: Weaving Narrative Memory Schemata from User Conversational Traces
2026-02-10 · HKU
Method · External · Prompt-based · Episodic · Semantic
- Learning to Continually Learn via Meta-learning Agentic Memory Designs
2026-02-08 · [code]
Method · External · RL-based · Episodic
- Dep-Search: Learning Dependency-Aware Reasoning Traces with Persistent Memory
2026-01-27
Method · External · Prompt-based · Episodic
- CAST: Character-and-Scene Episodic Memory for Agents
2026-01-14
Method · External · Prompt-based · Episodic
- SimpleMem: Efficient Lifelong Memory for LLM Agents
2026-01-05
Method · External · Prompt-based · Episodic · Semantic
- Hindsight is 20/20: Building Agent Memory that Retains, Recalls, and Reflects
2025-12-14
Method · External · Prompt-based · Episodic · Semantic
- A neural network model of free recall learns multiple memory strategies
2025-09-25 · [code]
Method · Internal · Episodic
- PRIME: Large Language Model Personalization with Cognitive Dual-Memory and Personalized Thought Process
2025-07-07 · EMNLP25, Main
Method · External · Prompt-based · Episodic · Semantic
- Ella: Embodied Social Agents with Lifelong Memory
2025-06-30 · Chuang Gan
Method · External · Prompt-based · Episodic · Semantic
- Mem0: Building Production-Ready AI Agents with Scalable Long-Term Memory
2025-04-28
Method · External · Prompt-based · Episodic · Semantic
- R3Mem: Bridging Memory Retention and Retrieval via Reversible Compression
2025-02-21
Method · External · Prompt-based · Episodic
- HippoRAG: Neurobiologically Inspired Long-Term Memory for Large Language Models
2024-05-23 · NeurIPS24 · Yu Su · [code]
Method · External · Prompt-based · Training-free · Episodic · Semantic
- MemoryBank: Enhancing Large Language Models with Long-Term Memory
2023-05-17
Method · External · Prompt-based · Episodic · Semantic
- Explicit v.s. Implicit Memory: Exploring Multi-hop Complex Reasoning Over Personalized Information
2025-08-15 · SIGKDD26 · Zeyu Zhang
Method · External · Internal · Prompt-based · Semantic
- From RAG to Memory: Non-Parametric Continual Learning for Large Language Models (HippoRAG 2)
2025-02-20 · ICML25 · [code]
Method · External · Internal · Prompt-based · Semantic
- When to Memorize and When to Stop: Gated Recurrent Memory for Long-Context Reasoning
2026-02-11 · ByteDance Seed
Method · Internal · SFT
- QwenLong-L1.5: Post-Training Recipe for Long-Context Reasoning and Memory Management
2025-12-25
Method · Internal · SFT
- MemGen: Weaving Generative Latent Memory for Self-Evolving Agents
2025-09-29 · Shuicheng Yan, Guibin Zhang · [code]
Method · Internal
- Scaling Test-time Compute for LLM Agents
2025-06-15 · ICLR26
Method · Internal · Prompt-based
- Agentic Reasoning for Large Language Models
2026-01-18 · Heng Ji
Method · Prompt-based · Training-free
- AgentRL: Scaling Agentic Reinforcement Learning with a Multi-Turn, Multi-Task Framework
2025-10-05 · Jie Tang
Method · RL-based
| Axis | Values |
|---|---|
| Category | Survey · Benchmark · Method |
| Benchmark Type | QA · Web · GUI · Embodied · Long-Horizon |
| Storage | Internal (parametric — weights / latent tokens) · External (non-parametric — retrieval) |
| Learning | Prompt-based · RL-based · SFT · Training-free |
| Memory Type | Episodic · Semantic · Procedural · Multimodal |
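Each entry's tags form a vector with one set of values per axis in the legend. As a rough sketch of multi-axis filtering over such vectors, assuming AND across axes and OR within a single axis (an assumption; the dashboard's actual semantics may differ), something like the following would work. `Entry`, `matches`, `filter_entries`, and `ENTRIES` are hypothetical names; the sample tags are copied from four entries in this list.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    title: str
    # Hypothetical layout: legend axis name -> set of tag values on that axis.
    tags: dict[str, set[str]] = field(default_factory=dict)


def matches(entry: Entry, query: dict[str, set[str]]) -> bool:
    """True if, for every queried axis, the entry carries at least one wanted value.

    Assumed semantics: AND across axes, OR within a single axis.
    An empty query matches every entry.
    """
    return all(entry.tags.get(axis, set()) & wanted for axis, wanted in query.items())


def filter_entries(entries: list[Entry], query: dict[str, set[str]]) -> list[str]:
    """Return the titles of matching entries, preserving list order."""
    return [e.title for e in entries if matches(e, query)]


# Sample tag vectors copied from entries in this list.
ENTRIES = [
    Entry("MemSkill", {"Category": {"Method"}, "Storage": {"External"},
                       "Learning": {"RL-based"}, "Memory Type": {"Procedural"}}),
    Entry("Mem0", {"Category": {"Method"}, "Storage": {"External"},
                   "Learning": {"Prompt-based"}, "Memory Type": {"Episodic", "Semantic"}}),
    Entry("VAGEN", {"Category": {"Method"}, "Storage": {"Internal"},
                    "Learning": {"RL-based"}, "Memory Type": {"Episodic", "Multimodal"}}),
    Entry("Learning to Continually Learn via Meta-learning Agentic Memory Designs",
          {"Category": {"Method"}, "Storage": {"External"},
           "Learning": {"RL-based"}, "Memory Type": {"Episodic"}}),
]

# External storage AND RL-based learning.
print(filter_entries(ENTRIES, {"Storage": {"External"}, "Learning": {"RL-based"}}))
# Multimodal OR Procedural memory type.
print(filter_entries(ENTRIES, {"Memory Type": {"Multimodal", "Procedural"}}))  # ['MemSkill', 'VAGEN']
```

Keeping each axis as its own set makes single-section placement and multi-axis filtering independent: an entry can sit under one primary section while still matching queries on any of its tags.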
If this list is useful in your work, please consider starring the repo.