Retroformer


[🏠Website & Demo] | [📄ICLR Spotlight] | [🤗HF Paper] | [📊Datasets] | [🤖Models] | [📧 Contact Us]

👋 Overview

This paper introduces a principled framework for reinforcing large language agents by learning a retrospective model, which automatically tunes the language agent prompts from environment feedback through policy gradient. Specifically, our proposed agent architecture learns from rewards across multiple environments and tasks to fine-tune a pre-trained language model that refines the language agent prompt by summarizing the root causes of prior failed attempts and proposing action plans. Experimental results on various tasks demonstrate that the language agents improve over time and that our approach considerably outperforms baselines that do not properly leverage gradients from the environment.

✨ Framework

Retroformer comprises two language model components: an actor LLM, denoted $M_a$, which generates reasoning thoughts and actions, and a retrospective LLM, denoted $M_r$, which generates verbal reinforcement cues to assist the actor in self-improvement by refining the actor prompt with reflection responses.

The actor model is regarded as a frozen LLM, such as GPT, with inaccessible model parameters. In this scenario, the most direct approach to enhancing actor performance in a given environment is to refine the actor LM's prompt. Consequently, the retrospective model, a smaller local language model, refines the actor's prompt by incorporating a concise summary of errors and valuable insights from failed attempts. We therefore aim to optimize the retrospective model using the environment reward. The desired behavior of $M_r$ is to improve the actor model $M_a$ in the next attempt. Hence, the difference in episode returns between two consecutive trials naturally serves as a reward signal for fine-tuning the retrospective model with reinforcement learning.
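To make the reward definition concrete, here is a minimal sketch of how the return difference between two consecutive trials could be turned into a scalar reward for a reflection response. This is an illustration of the idea above, not code from this repository; the function name and values are hypothetical.

# Minimal sketch (not the repo's code): the reward for the retrospective model
# M_r is the change in the actor's episode return after its reflection is
# appended to the actor's prompt.
def retrospective_reward(return_trial_k: float, return_trial_k_plus_1: float) -> float:
    """Reward = G_{k+1} - G_k, the improvement between consecutive trials."""
    return return_trial_k_plus_1 - return_trial_k

# Example: the actor's return rises from 0.2 to 0.6 after the reflection,
# so that reflection response receives a reward of +0.4.
print(retrospective_reward(0.2, 0.6))  # 0.4

A positive reward thus credits reflections that actually help the actor in the next trial, while a negative reward penalizes those that do not.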

Read our paper for more details.

@article{yao2023retroformer,
  title={Retroformer: Retrospective large language agents with policy gradient optimization},
  author={Yao, Weiran and Heinecke, Shelby and Niebles, Juan Carlos and Liu, Zhiwei and Feng, Yihao and Xue, Le and Murthy, Rithesh and Chen, Zeyuan and Zhang, Jianguo and Arpit, Devansh and others},
  journal={arXiv preprint arXiv:2308.02151},
  year={2023}
}

🚀 Setup

  1. Install Miniconda.
  2. Configure the training environment:
conda create -n train python=3.10 -y
conda activate train
pip install -r requirements.txt
  3. Configure the local LLM environment. Go to llm/serve.sh and run the commands one by one.

  4. Configure the evaluation environment. Install three separate Python environments, one for each evaluation task (HotPotQA, AlfWorld, WebShop). For example:

  • HotPotQA
conda create -n hotpotqa python=3.10 -y
conda activate hotpotqa
pip install -r experiments/hotpotqa_runs/requirements.txt
  5. Configure the environment variables in the .env file:
OPENAI_API_KEY='OpenAI API Key Here if using OpenAI Model (required for inference)'
OPENAI_MODEL='OpenAI MODEL NAME'
CKPT_DIR=/path/to/checkpoints
DATA_DIR=/path/to/data
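
For reference, here is a minimal sketch of reading these variables in Python, assuming the python-dotenv package; the repository's scripts may load them differently, and the fallback paths are placeholders.

# Minimal sketch, assuming python-dotenv (pip install python-dotenv); the
# actual scripts may read the .env file differently.
import os
from dotenv import load_dotenv

load_dotenv()  # loads key=value pairs from .env into the process environment

openai_api_key = os.environ["OPENAI_API_KEY"]      # required for OpenAI inference
openai_model = os.environ["OPENAI_MODEL"]
ckpt_dir = os.getenv("CKPT_DIR", "./checkpoints")  # placeholder fallback
data_dir = os.getenv("DATA_DIR", "./data")         # placeholder fallback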

👩‍💻 Experiments

  • Model training. Run these three scripts one by one: supervised fine-tuning (sft_run.py), reward model training (reward_run.py), and PPO fine-tuning (ppo_run.py). A conceptual sketch of the PPO stage appears after this list.
python sft_run.py
python reward_run.py
python ppo_run.py
  • Evaluation. Note that the evaluation environments are largely imported from the original Reflexion code repo. We thank the Reflexion team for providing the evaluation environments and the self-reflection agent architecture.
  • See the experiments/hotpotqa folder for running hotpotqa experiments.
  • See the experiments/alfworld folder for running alfworld experiments.
  • See the experiments/webshop folder for running webshop experiments.
  • See the data/ folder for details about the preference data and generation methods. We provide data samples for the HotPotQA environment in this repo.
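
For orientation, the sketch below shows the general shape of the PPO stage referenced above: the retrospective model generates a reflection for a failed trajectory, the environment-derived return difference serves as the scalar reward, and one PPO update is applied. It assumes the Hugging Face trl library (PPOConfig, PPOTrainer, AutoModelForCausalLMWithValueHead) and uses placeholder model names and hyperparameters; it is not the repository's ppo_run.py.

# Conceptual sketch of the PPO stage, assuming Hugging Face trl. This is NOT
# ppo_run.py; the model name, hyperparameters, and reward value are placeholders.
import torch
from transformers import AutoTokenizer
from trl import PPOConfig, PPOTrainer, AutoModelForCausalLMWithValueHead

config = PPOConfig(model_name="gpt2", learning_rate=1.41e-5, batch_size=1, mini_batch_size=1)
tokenizer = AutoTokenizer.from_pretrained(config.model_name)
tokenizer.pad_token = tokenizer.eos_token

# Policy (the retrospective model) plus a frozen reference copy for the KL penalty.
model = AutoModelForCausalLMWithValueHead.from_pretrained(config.model_name)
ref_model = AutoModelForCausalLMWithValueHead.from_pretrained(config.model_name)
ppo_trainer = PPOTrainer(config, model, ref_model, tokenizer)

# One PPO step: the query is the failed trajectory plus reflection instruction,
# the response is the generated reflection, and the reward is the change in
# episode return between consecutive trials (see the Framework section).
query = tokenizer.encode("Failed trajectory and reflection instruction ...", return_tensors="pt")[0]
generated = model.generate(query.unsqueeze(0), max_new_tokens=64, pad_token_id=tokenizer.eos_token_id)
response = generated[0, query.shape[0]:]               # keep only the newly generated tokens
reward = [torch.tensor(0.4)]                           # placeholder for G_{k+1} - G_k
stats = ppo_trainer.step([query], [response], reward)  # PPO update on the retrospective model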

🪪 License

Apache 2.0. See LICENSE.
