Agentic Memory: Learning Unified Long-Term and Short-Term Memory Management for Large Language Model Agents
- 1. Overview
- 2. Installation
- 3. Project Structure
- 4. Data Preparation
- 5. Model Preparation
- 6. Configuration
- 7. Training
- 8. Evaluation
- 9. Standalone AgentScope Example
AgeMem is built on Trinity-RFT and performs reinforcement fine-tuning (RFT) on HotpotQA to train LLM agents with context management and long-term memory management capabilities.
The model uses six callable tools:
| Tool | Type | Function |
|---|---|---|
| `Summary_context` | STM (context) | Compresses historical dialogue to save tokens |
| `Clear_context` (`Filter_context`) | STM (context) | Removes irrelevant context by semantic criteria |
| `Retrieve_memory` | STM (context) | Retrieves relevant long-term memory into the current context |
| `Add_memory` | LTM | Adds a new memory to the vector store |
| `Update_memory` | LTM | Updates an existing memory |
| `Delete_memory` | LTM | Deletes a memory by ID |
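A minimal sketch of the three LTM tools, assuming a plain dict-backed store (the class name and method signatures here are illustrative; the actual implementation in `memory_store.py` is backed by a vector store):

```python
import uuid

class MemoryStore:
    """Toy long-term memory store standing in for the vector store in memory_store.py."""

    def __init__(self):
        self._memories = {}  # memory id -> memory text

    def add_memory(self, text: str) -> str:
        """Add_memory: store a new fact and return its id."""
        mem_id = str(uuid.uuid4())
        self._memories[mem_id] = text
        return mem_id

    def update_memory(self, mem_id: str, text: str) -> bool:
        """Update_memory: overwrite an existing memory; False if the id is unknown."""
        if mem_id not in self._memories:
            return False
        self._memories[mem_id] = text
        return True

    def delete_memory(self, mem_id: str) -> bool:
        """Delete_memory: remove a memory by id; False if the id is unknown."""
        return self._memories.pop(mem_id, None) is not None
```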
- **Stage 1 (casual interaction):** learn Add/Update/Delete memory behavior from facts in context
- **Stage 2 (distractor injection):** learn Clear/Summary behavior under noisy context
- **Stage 3 (formal QA):** learn integrated retrieval, reasoning, and context control
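The stage-to-tool mapping can be sketched as below; the tool sets are inferred from the stage descriptions above, not read from the training code:

```python
# Hypothetical mapping from curriculum stage to the tools that stage
# is expected to exercise (inferred from the stage descriptions).
STAGE_TOOLS = {
    1: {"Add_memory", "Update_memory", "Delete_memory"},         # casual interaction
    2: {"Clear_context", "Summary_context"},                     # distractor injection
    3: {"Retrieve_memory", "Summary_context", "Clear_context"},  # formal QA
}

def allowed_tools(stage: int) -> set:
    """Return the tool set a rollout at the given stage is expected to exercise."""
    return STAGE_TOOLS[stage]
```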
```shell
git clone https://github.com/y1y5/AgeMem
cd AgeMem
```

```shell
# Conda (recommended)
conda create -n trinity python=3.10.19
conda activate trinity

# Or venv
python3.10 -m venv .venv
source .venv/bin/activate
```

```shell
# Editable install (recommended)
pip install -e ".[dev]"

# Optional: flash-attn acceleration
pip install -e ".[flash_attn]"
# If build fails, try:
# pip install flash-attn==2.8.1 --no-build-isolation
```

```shell
# Base model path
export TRINITY_MODEL_PATH=/path/to/Qwen2.5-7B-Instruct

# Checkpoint root
export TRINITY_CHECKPOINT_ROOT_DIR=/path/to/checkpoints

# HotpotQA fullwiki path
export HOTPOTQA_PATH=/path/to/dataset/hotpot_qa/fullwiki

# DashScope API key (required for distractor generation and LLM-as-judge)
export DASHSCOPE_API_KEY=your_dashscope_api_key

# Tokenizer path (optional, defaults to bert-base-uncased)
export TOKENIZER_PATH=/path/to/bert-base-uncased

# WandB API key (optional)
export WANDB_API_KEY=your_wandb_api_key
```

```
AgeMem/
├── trinity/common/workflows/
│   ├── memory_context/
│   │   ├── train_hotpotQA.py
│   │   ├── eval_hotpotQA.py
│   │   ├── utils.py
│   │   ├── memory_store.py
│   │   ├── workflow_prompt.py
│   │   └── workflow_metrics.py
│   └── memory_reward/
│       └── my_reward.py
├── examples/
│   └── agemem_hotpotqa/
│       ├── agemem_train.yaml
│       ├── agemem_eval.yaml
│       └── README.md
├── AgeMem_code_agentscope/
├── docs/
│   └── AgeMem_README.md
└── pyproject.toml
```
AgeMem uses HotpotQA in fullwiki format.
Expected directory layout:
```
/path/to/dataset/hotpot_qa/
├── distractor/
├── fullwiki/
└── ...
```
| Field | Type | Description |
|---|---|---|
| `question` | `str` | Input question |
| `answer` | `str` | Ground-truth answer (can be missing in some test sets) |
| `context` | `dict` | `{"title": [...], "sentences": [[...], ...]}` |
| `supporting_facts` | `dict` (optional) | `{"title": [...], "sent_id": [...]}` |
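A quick way to sanity-check a record against this schema; the sample below is a standard HotpotQA example, and `check_sample` is a helper written for this README, not part of the codebase:

```python
def check_sample(sample: dict) -> None:
    """Validate one HotpotQA fullwiki record against the schema in the table above."""
    assert isinstance(sample["question"], str)
    ctx = sample["context"]
    # One list of sentences per title, aligned by index.
    assert len(ctx["title"]) == len(ctx["sentences"])
    if "supporting_facts" in sample:  # absent in some test sets
        sf = sample["supporting_facts"]
        assert len(sf["title"]) == len(sf["sent_id"])

sample = {
    "question": "Which magazine was started first, Arthur's Magazine or First for Women?",
    "answer": "Arthur's Magazine",
    "context": {
        "title": ["Arthur's Magazine", "First for Women"],
        "sentences": [
            ["Arthur's Magazine was an American literary periodical."],
            ["First for Women is a woman's magazine."],
        ],
    },
    "supporting_facts": {"title": ["Arthur's Magazine"], "sent_id": [0]},
}
check_sample(sample)
```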
```yaml
buffer:
  explorer_input:
    taskset:
      storage_type: file
      path: '/path/to/dataset/hotpot_qa/fullwiki'
      split: 'train'
      format:
        prompt_key: 'question'
        response_key: 'answer'
```

```shell
# HuggingFace
huggingface-cli download Qwen/Qwen2.5-7B-Instruct \
  --local-dir /path/to/model/Qwen2.5-7B-Instruct

# Or ModelScope
modelscope download Qwen/Qwen2.5-7B-Instruct \
  --local_dir /path/to/model/Qwen2.5-7B-Instruct
```

```yaml
model:
  model_path: ${oc.env:TRINITY_MODEL_PATH,/path/to/Qwen2.5-7B-Instruct}
```

Key fields:
| Field | Description |
|---|---|
| `buffer.explorer_input.taskset.path` | HotpotQA training set path |
| `buffer.explorer_input.default_workflow_type` | `AgeMem_hotpot_workflow_training` |
| `algorithm.algorithm_type` | `grpo` |
| `algorithm.repeat_times` | Rollouts per sample (default 8) |
| `workflow_args.stage2_distractor_messages` | Stage 2 distractor count |
| `workflow_args.stage3_max_rounds` | Stage 3 max rounds |
| `workflow_args.max_context_tokens` | Context token budget |
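One way `max_context_tokens` can act as a trigger, sketched with a whitespace tokenizer as a stand-in for the BERT tokenizer configured via `TOKENIZER_PATH` (the real workflow counts subword tokens, and the function names here are illustrative):

```python
def count_tokens(messages: list[str]) -> int:
    """Crude token count: whitespace split, standing in for a subword tokenizer."""
    return sum(len(m.split()) for m in messages)

def needs_compression(messages: list[str], max_context_tokens: int) -> bool:
    """True when the dialogue exceeds the budget, i.e. when the agent should
    consider firing Summary_context or Clear_context."""
    return count_tokens(messages) > max_context_tokens
```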
Key fields:
| Field | Description |
|---|---|
| `mode` | `bench` (evaluation mode) |
| `buffer.explorer_input.default_workflow_type` | `AgeMem_hotpot_workflow_evaluation` |
| `buffer.explorer_input.eval_tasksets` | Evaluation tasksets |
| `explorer.bench_on_latest_checkpoint` | Whether to evaluate the latest checkpoint |
| `explorer.eval_on_startup` | Run evaluation on startup |
| `explorer.env_vars.DASHSCOPE_API_KEY` | API key for the LLM judge |
| `workflow_args.use_context_tools` | Enable Summary/Clear/Retrieve |
| `workflow_args.enable_stage2_in_eval` | Enable Stage 2 distractors in eval |
```shell
# Single machine
ray start --head

# Worker node
ray start --address=<master_ip>:6379
```

```shell
trinity run --config examples/agemem_hotpotqa/agemem_train.yaml
```

Training loop:

- Explorer runs `AgeMem_hotpot_workflow_training` for three-stage rollouts
- Experiences are written into the buffer
- Trainer updates the policy with GRPO
- Checkpoints are synchronized at the configured interval
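The explore-buffer-train loop can be sketched as below; the class and function names are illustrative placeholders, not Trinity-RFT's actual API:

```python
from collections import deque

buffer = deque()  # experience buffer shared between explorer and trainer

def explore(task):
    """Explorer: run one three-stage rollout and emit an experience record."""
    return {"task": task, "reward": 1.0}  # placeholder experience

def train_step(batch):
    """Trainer: one GRPO update over a batch of experiences (placeholder)."""
    return len(batch)  # here: just report how many experiences were consumed

tasks = ["q1", "q2", "q3"]
for t in tasks:
    buffer.append(explore(t))
consumed = train_step(list(buffer))
```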
Set `continue_from_checkpoint: true` in the YAML, and make sure `checkpoint_root_dir` and the experiment name match the original run.
Enable the `monitor` section in the YAML and set `WANDB_API_KEY`.
```shell
trinity run --config examples/agemem_hotpotqa/agemem_eval.yaml
```

Before running:

- Ensure `model.lora_configs[].path` points to your checkpoint
- Ensure all `eval_tasksets` paths are correct
- Ensure `DASHSCOPE_API_KEY` is set
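These three checks can be automated with a small pre-flight helper; the function and its arguments are hypothetical, written for this README rather than taken from the codebase:

```python
import os

def preflight(env: dict, checkpoint_path: str, taskset_paths: list) -> list:
    """Return a list of problems that would make the eval run fail early."""
    problems = []
    if not env.get("DASHSCOPE_API_KEY"):
        problems.append("DASHSCOPE_API_KEY is not set")
    if not os.path.exists(checkpoint_path):
        problems.append(f"checkpoint not found: {checkpoint_path}")
    for p in taskset_paths:
        if not os.path.exists(p):
            problems.append(f"eval taskset not found: {p}")
    return problems
```

Call it with `os.environ`, the `model.lora_configs[].path` value, and the `eval_tasksets` paths before launching `trinity run`.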
AgeMem_code_agentscope/ provides a standalone demo that does not depend on the Trinity-RFT training pipeline.
```shell
pip install -r AgeMem_code_agentscope/requirements.txt
export DASHSCOPE_API_KEY=your_key
python -m AgeMem_code_agentscope.main
```

See `AgeMem_code_agentscope/README.md` for details.
This project is built on top of Trinity-RFT, an excellent open-source reinforcement fine-tuning framework for LLM agents. We sincerely thank the Trinity-RFT team for their outstanding contribution to the community.
If this codebase helps your research, please cite the AgeMem paper.
```bibtex
@article{yu2026agentic,
  title={Agentic memory: Learning unified long-term and short-term memory management for large language model agents},
  author={Yu, Yi and Yao, Liuyi and Xie, Yuexiang and Tan, Qingquan and Feng, Jiaqi and Li, Yaliang and Wu, Libing},
  journal={arXiv preprint arXiv:2601.01885},
  year={2026}
}
```