Agentic Memory: Learning Unified Long-Term and Short-Term Memory Management for Large Language Model Agents
- 1. Overview
- 2. Installation
- 3. Project Structure
- 4. Data Preparation
- 5. Model Preparation
- 6. Configuration
- 7. Training
- 8. Evaluation
- 9. Standalone AgentScope Example
AgeMem is built on Trinity-RFT and performs reinforcement fine-tuning (RFT) on HotpotQA to train LLM agents with context management and long-term memory management capabilities.
The model uses six callable tools:
| Tool | Type | Function |
|---|---|---|
| `Summary_context` | STM (context) | Compresses historical dialogue to save tokens |
| `Clear_context` (`Filter_context`) | STM (context) | Removes irrelevant context by semantic criteria |
| `Retrieve_memory` | STM (context) | Retrieves relevant long-term memory into the current context |
| `Add_memory` | LTM | Adds a new memory to the vector store |
| `Update_memory` | LTM | Updates an existing memory |
| `Delete_memory` | LTM | Deletes a memory by ID |
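A minimal sketch of the three LTM tools, assuming a plain dict-backed store (the class name and method signatures here are illustrative; the actual implementation in `memory_store.py` is backed by a vector store):

```python
import uuid

class MemoryStore:
    """Toy long-term memory store standing in for the vector store in memory_store.py."""

    def __init__(self):
        self._memories = {}  # memory id -> memory text

    def add_memory(self, text: str) -> str:
        """Add_memory: store a new fact and return its id."""
        mem_id = str(uuid.uuid4())
        self._memories[mem_id] = text
        return mem_id

    def update_memory(self, mem_id: str, text: str) -> bool:
        """Update_memory: overwrite an existing memory; False if the id is unknown."""
        if mem_id not in self._memories:
            return False
        self._memories[mem_id] = text
        return True

    def delete_memory(self, mem_id: str) -> bool:
        """Delete_memory: remove a memory by id; False if the id is unknown."""
        return self._memories.pop(mem_id, None) is not None
```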
- **Stage 1 (casual interaction):** learn Add/Update/Delete memory behavior from facts in context
- **Stage 2 (distractor injection):** learn Clear/Summary behavior under noisy context
- **Stage 3 (formal QA):** learn integrated retrieval, reasoning, and context control
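The stage-to-tool mapping can be sketched as below; the tool sets are inferred from the stage descriptions above, not read from the training code:

```python
# Hypothetical mapping from curriculum stage to the tools that stage
# is expected to exercise (inferred from the stage descriptions).
STAGE_TOOLS = {
    1: {"Add_memory", "Update_memory", "Delete_memory"},         # casual interaction
    2: {"Clear_context", "Summary_context"},                     # distractor injection
    3: {"Retrieve_memory", "Summary_context", "Clear_context"},  # formal QA
}

def allowed_tools(stage: int) -> set:
    """Return the tool set a rollout at the given stage is expected to exercise."""
    return STAGE_TOOLS[stage]
```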
```shell
git clone https://github.com/y1y5/AgeMem
cd AgeMem
```

```shell
# Conda (recommended)
conda create -n trinity python=3.10.19
conda activate trinity

# Or venv
python3.10 -m venv .venv
source .venv/bin/activate
```

```shell
# Editable install (recommended)
pip install -e ".[dev]"

# Optional: flash-attn acceleration
pip install -e ".[flash_attn]"
# If build fails, try:
# pip install flash-attn==2.8.1 --no-build-isolation
```

```shell
# Base model path
export TRINITY_MODEL_PATH=/path/to/Qwen2.5-7B-Instruct

# Checkpoint root
export TRINITY_CHECKPOINT_ROOT_DIR=/path/to/checkpoints

# HotpotQA fullwiki path
export HOTPOTQA_PATH=/path/to/dataset/hotpot_qa/fullwiki

# DashScope API key (required for distractor generation and LLM-as-judge)
export DASHSCOPE_API_KEY=your_dashscope_api_key

# Tokenizer path (optional, defaults to bert-base-uncased)
export TOKENIZER_PATH=/path/to/bert-base-uncased

# WandB API key (optional)
export WANDB_API_KEY=your_wandb_api_key
```

```
AgeMem/
├── trinity/common/workflows/
│   ├── memory_context/
│   │   ├── train_hotpotQA.py
│   │   ├── eval_hotpotQA.py
│   │   ├── utils.py
│   │   ├── memory_store.py
│   │   ├── workflow_prompt.py
│   │   └── workflow_metrics.py
│   └── memory_reward/
│       └── my_reward.py
├── examples/
│   └── agemem_hotpotqa/
│       ├── agemem_train.yaml
│       ├── agemem_eval.yaml
│       └── README.md
├── AgeMem_code_agentscope/
├── docs/
│   └── AgeMem_README.md
└── pyproject.toml
```
AgeMem uses HotpotQA in fullwiki format.
Expected directory layout:
```
/path/to/dataset/hotpot_qa/
├── distractor/
├── fullwiki/
└── ...
```
| Field | Type | Description |
|---|---|---|
| `question` | `str` | Input question |
| `answer` | `str` | Ground-truth answer (can be missing in some test sets) |
| `context` | `dict` | `{"title": [...], "sentences": [[...], ...]}` |
| `supporting_facts` | `dict` (optional) | `{"title": [...], "sent_id": [...]}` |
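A quick way to sanity-check a record against this schema; the sample below is a standard HotpotQA example, and `check_sample` is a helper written for this README, not part of the codebase:

```python
def check_sample(sample: dict) -> None:
    """Validate one HotpotQA fullwiki record against the schema in the table above."""
    assert isinstance(sample["question"], str)
    ctx = sample["context"]
    # One list of sentences per title, aligned by index.
    assert len(ctx["title"]) == len(ctx["sentences"])
    if "supporting_facts" in sample:  # absent in some test sets
        sf = sample["supporting_facts"]
        assert len(sf["title"]) == len(sf["sent_id"])

sample = {
    "question": "Which magazine was started first, Arthur's Magazine or First for Women?",
    "answer": "Arthur's Magazine",
    "context": {
        "title": ["Arthur's Magazine", "First for Women"],
        "sentences": [
            ["Arthur's Magazine was an American literary periodical."],
            ["First for Women is a woman's magazine."],
        ],
    },
    "supporting_facts": {"title": ["Arthur's Magazine"], "sent_id": [0]},
}
check_sample(sample)
```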
```yaml
buffer:
  explorer_input:
    taskset:
      storage_type: file
      path: '/path/to/dataset/hotpot_qa/fullwiki'
      split: 'train'
      format:
        prompt_key: 'question'
        response_key: 'answer'
```

```shell
# HuggingFace
huggingface-cli download Qwen/Qwen2.5-7B-Instruct \
  --local-dir /path/to/model/Qwen2.5-7B-Instruct

# Or ModelScope
modelscope download Qwen/Qwen2.5-7B-Instruct \
  --local_dir /path/to/model/Qwen2.5-7B-Instruct
```

```yaml
model:
  model_path: ${oc.env:TRINITY_MODEL_PATH,/path/to/Qwen2.5-7B-Instruct}
```

Key fields:
| Field | Description |
|---|---|
| `buffer.explorer_input.taskset.path` | HotpotQA training set path |
| `buffer.explorer_input.default_workflow_type` | `AgeMem_hotpot_workflow_training` |
| `algorithm.algorithm_type` | `grpo` |
| `algorithm.repeat_times` | Rollouts per sample (default 8) |
| `workflow_args.stage2_distractor_messages` | Stage 2 distractor count |
| `workflow_args.stage3_max_rounds` | Stage 3 max rounds |
| `workflow_args.max_context_tokens` | Context token budget |
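One way `max_context_tokens` can act as a trigger, sketched with a whitespace tokenizer as a stand-in for the BERT tokenizer configured via `TOKENIZER_PATH` (the real workflow counts subword tokens, and the function names here are illustrative):

```python
def count_tokens(messages: list[str]) -> int:
    """Crude token count: whitespace split, standing in for a subword tokenizer."""
    return sum(len(m.split()) for m in messages)

def needs_compression(messages: list[str], max_context_tokens: int) -> bool:
    """True when the dialogue exceeds the budget, i.e. when the agent should
    consider firing Summary_context or Clear_context."""
    return count_tokens(messages) > max_context_tokens
```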
Key fields:
| Field | Description |
|---|---|
| `mode` | `bench` (evaluation mode) |
| `buffer.explorer_input.default_workflow_type` | `AgeMem_hotpot_workflow_evaluation` |
| `buffer.explorer_input.eval_tasksets` | Evaluation tasksets |
| `explorer.bench_on_latest_checkpoint` | Whether to evaluate the latest checkpoint |
| `explorer.eval_on_startup` | Run evaluation on startup |
| `explorer.env_vars.DASHSCOPE_API_KEY` | API key for the LLM judge |
| `workflow_args.use_context_tools` | Enable Summary/Clear/Retrieve |
| `workflow_args.enable_stage2_in_eval` | Enable Stage 2 distractors in eval |
```shell
# Single machine
ray start --head

# Worker node
ray start --address=<master_ip>:6379
```

```shell
trinity run --config examples/agemem_hotpotqa/agemem_train.yaml
```

Training loop:

- Explorer runs `AgeMem_hotpot_workflow_training` for three-stage rollouts
- Experiences are written into the buffer
- Trainer updates the policy with GRPO
- Checkpoints are synchronized at the configured interval
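The explore-buffer-train loop can be sketched as below; the class and function names are illustrative placeholders, not Trinity-RFT's actual API:

```python
from collections import deque

buffer = deque()  # experience buffer shared between explorer and trainer

def explore(task):
    """Explorer: run one three-stage rollout and emit an experience record."""
    return {"task": task, "reward": 1.0}  # placeholder experience

def train_step(batch):
    """Trainer: one GRPO update over a batch of experiences (placeholder)."""
    return len(batch)  # here: just report how many experiences were consumed

tasks = ["q1", "q2", "q3"]
for t in tasks:
    buffer.append(explore(t))
consumed = train_step(list(buffer))
```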
Set `continue_from_checkpoint: true` in the YAML, and make sure `checkpoint_root_dir` and the experiment name match the original run.
Enable the `monitor` section in the YAML and set `WANDB_API_KEY`.
```shell
trinity run --config examples/agemem_hotpotqa/agemem_eval.yaml
```

Before running:

- Ensure `model.lora_configs[].path` points to your checkpoint
- Ensure all `eval_tasksets` paths are correct
- Ensure `DASHSCOPE_API_KEY` is set
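These three checks can be automated with a small pre-flight helper; the function and its arguments are hypothetical, written for this README rather than taken from the codebase:

```python
import os

def preflight(env: dict, checkpoint_path: str, taskset_paths: list) -> list:
    """Return a list of problems that would make the eval run fail early."""
    problems = []
    if not env.get("DASHSCOPE_API_KEY"):
        problems.append("DASHSCOPE_API_KEY is not set")
    if not os.path.exists(checkpoint_path):
        problems.append(f"checkpoint not found: {checkpoint_path}")
    for p in taskset_paths:
        if not os.path.exists(p):
            problems.append(f"eval taskset not found: {p}")
    return problems
```

Call it with `os.environ`, the `model.lora_configs[].path` value, and the `eval_tasksets` paths before launching `trinity run`.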
AgeMem_code_agentscope/ provides a standalone demo that does not depend on the Trinity-RFT training pipeline.
```shell
pip install -r AgeMem_code_agentscope/requirements.txt
export DASHSCOPE_API_KEY=your_key
python -m AgeMem_code_agentscope.main
```

See `AgeMem_code_agentscope/README.md` for details.
This project is built on top of Trinity-RFT, an excellent open-source reinforcement fine-tuning framework for LLM agents. We sincerely thank the Trinity-RFT team for their outstanding contribution to the community.
If this codebase helps your research, please cite the AgeMem paper.
```bibtex
@article{yu2026agentic,
  title={Agentic memory: Learning unified long-term and short-term memory management for large language model agents},
  author={Yu, Yi and Yao, Liuyi and Xie, Yuexiang and Tan, Qingquan and Feng, Jiaqi and Li, Yaliang and Wu, Libing},
  journal={arXiv preprint arXiv:2601.01885},
  year={2026}
}
```