yongchao98/PROMST

PRompt Optimization in Multi-Step Tasks (PROMST): Integrating Human Feedback and Preference Alignment

Paper Link: https://arxiv.org/pdf/2402.08702.pdf

This project is for automatic prompt optimization with a focus on multi-step tasks.

Requirements

Please set up the environment with conda and pip as follows:

conda create -n PROMST python=3.10
conda activate PROMST
conda install pytorch torchvision torchaudio pytorch-cuda=12.1 -c pytorch -c nvidia
conda install conda-forge::tiktoken
pip install openai --upgrade
pip install pygame
conda install conda-forge::transformers
conda install anaconda::scikit-learn
pip install gym==0.26.2
pip install pddlgym
pip install tarski
pip install tiktoken

Note that for the Logistics and BlocksWorld tasks, we slightly modified the pddlgym files. You therefore need to replace the installed pddlgym directory in your environment's site-packages (e.g., /Users/Your user name/opt/anaconda3/envs/PROMST/lib/python3.10/site-packages/pddlgym) with the pddlgym directory in BlocksWorld/env_data_BlocksWorld/pddlgym. For example:

cd "/Users/Your user name/opt/anaconda3/envs/PROMST/lib/python3.10/site-packages"
cp -r "/Your path to PROMST/PROMST/BlocksWorld/env_data_BlocksWorld/pddlgym" ./
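
The same substitution can be scripted so that the site-packages path does not have to be typed by hand. The following is a minimal sketch, not part of the repository: replace_package_dir, find_installed_package and main are hypothetical helpers, and main assumes that pddlgym is already installed in the active environment and that the script is run from the PROMST repository root.

```python
import importlib.util
import shutil
from pathlib import Path
from typing import Optional


def replace_package_dir(modified_dir: Path, installed_dir: Path) -> Path:
    """Replace installed_dir with a copy of modified_dir, keeping a backup."""
    backup = installed_dir.with_name(installed_dir.name + "_backup")
    if installed_dir.exists():
        if backup.exists():
            shutil.rmtree(backup)  # drop a stale backup from an earlier run
        shutil.move(str(installed_dir), str(backup))
    shutil.copytree(modified_dir, installed_dir)
    return backup


def find_installed_package(name: str) -> Optional[Path]:
    """Return the directory of an installed package, or None if it is absent."""
    spec = importlib.util.find_spec(name)
    if spec is None or not spec.submodule_search_locations:
        return None
    return Path(list(spec.submodule_search_locations)[0])


def main() -> None:
    # Assumption: the current working directory is the PROMST repository root.
    modified = Path("BlocksWorld/env_data_BlocksWorld/pddlgym")
    installed = find_installed_package("pddlgym")
    if installed is None:
        raise SystemExit("pddlgym is not installed in this environment")
    backup = replace_package_dir(modified, installed)
    print("Original pddlgym kept at", backup)
```

Calling main() performs the swap; the original package is moved to a sibling directory with the suffix _backup, so it can be restored if needed.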

Usage

A GPU is only needed to train the score model; if you set with_score_model = 'False', no GPU is required. To run a task, enter its directory and set your OpenAI API key (sk-..) in LLM.py, line 11. Then you can run the following commands. The arguments can be changed to suit your preference.

python env3-box-arrange-train_MCTS.py \
-experiment_trial_num 1 \
-input_error_prompt_token_limit 15000 \
-model_name_promptLLM gpt-4-1106-preview \
-model_name_testLLM gpt-3.5-turbo-16k-0613 \
-min_level 2 \
-n_children 8 \
-n_selected 2 \
-prompt_method PROMST \
-with_score_model 'False' \
-Training_path ../BoxLift/train_set/ \
-Testing_path ../BoxLift/test_set/ \
-base_path ../BoxLift/
python env2-box-arrange-train_MCTS.py -experiment_trial_num 1 -input_error_prompt_token_limit 15000 -model_name_promptLLM claude-3-opus-20240229 -model_name_testLLM claude-3-opus-20240229 -min_level 2 -n_children 4 -n_selected 2 -prompt_method PROMST -with_score_model 'False' -Training_path ../BoxNet2/train_set/ -Testing_path ../BoxNet2/test_set/ -base_path ../BoxNet2/
python env2-box-arrange-train_MCTS.py -experiment_trial_num 1 -input_error_prompt_token_limit 15000 -model_name_promptLLM open-mixtral-8x7b -model_name_testLLM open-mixtral-8x7b -min_level 2 -n_children 4 -n_selected 2 -prompt_method PROMST -with_score_model 'False' -Training_path ../BoxNet2/train_set/ -Testing_path ../BoxNet2/test_set/ -base_path ../BoxNet2/
python env2-box-arrange-train_MCTS.py -experiment_trial_num 2 -input_error_prompt_token_limit 15000 -model_name_promptLLM mistral-large-latest -model_name_testLLM mistral-large-latest -min_level 2 -n_children 4 -n_selected 2 -prompt_method PROMST -with_score_model 'False' -Training_path ../BoxNet2/train_set/ -Testing_path ../BoxNet2/test_set/ -base_path ../BoxNet2/
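
The commands above differ only in a handful of arguments, so they can also be assembled programmatically, for example to run several trials in a row. The following is a sketch under stated assumptions: build_command is a hypothetical helper that is not part of the repository, and the argument names and values are copied from the commands above.

```python
import shlex
from typing import Dict, List


def build_command(script: str, args: Dict[str, object], prefix: str = "-") -> List[str]:
    """Turn a dict of argument names and values into an argv list."""
    cmd = ["python", script]
    for name, value in args.items():
        cmd += [f"{prefix}{name}", str(value)]
    return cmd


# Argument values taken from the BoxNet2 example commands above.
base_args = {
    "input_error_prompt_token_limit": 15000,
    "model_name_promptLLM": "open-mixtral-8x7b",
    "model_name_testLLM": "open-mixtral-8x7b",
    "min_level": 2,
    "n_children": 4,
    "n_selected": 2,
    "prompt_method": "PROMST",
    "with_score_model": "False",
    "Training_path": "../BoxNet2/train_set/",
    "Testing_path": "../BoxNet2/test_set/",
    "base_path": "../BoxNet2/",
}

for trial in (1, 2, 3):
    args = {"experiment_trial_num": trial, **base_args}
    # Print the command; pass the list to subprocess.run to execute it.
    print(shlex.join(build_command("env2-box-arrange-train_MCTS.py", args)))
```

To actually launch a run, pass the returned list to subprocess.run(cmd, check=True) from inside the task directory. For the Alfworld/Scienceworld/Webarena scripts, which take double-dash arguments, call build_command with prefix="--".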

Command for Alfworld/Scienceworld/Webarena

python agentboard/env9-box-arrange-train_MCTS.py --cfg-path eval_configs/main_results_all_tasks.yaml --tasks alfworld --log_path ./results/gpt-3.5-turbo-16k-0613 --project_name evaluate-gpt-4 --experiment_trial_num 1 --model_name_promptLLM gpt-4-1106-preview --model_name_testLLM gpt-3.5-turbo-16k-0613 --min_level 2 --n_children 8 --n_selected 2 --prompt_method PROMST --with_score_model 'False' --base_path ./alfworld_result/ --max_num_steps 30

Command to restart the training process for Alfworld/Scienceworld/Webarena

python agentboard/env9-box-arrange-train_MCTS_restart.py --cfg-path eval_configs/main_results_all_tasks.yaml --tasks webarena --log_path ./results/gpt-3.5-turbo-16k-0613 --project_name evaluate-gpt-4 --experiment_trial_num 3 --model_name_promptLLM gpt-4-1106-preview --model_name_testLLM gpt-3.5-turbo-16k-0613 --min_level 2 --n_children 4 --n_selected 2 --prompt_method PROMST --with_score_model 'False' --base_path ./webarena_result/ --max_num_steps 30

python agentboard/env9-box-arrange-train_MCTS.py --cfg-path eval_configs/main_results_all_tasks.yaml --tasks alfworld --log_path ./results/gpt-3.5-turbo-16k-0613 --project_name evaluate-gpt-3.5-turbo-16k-0613 --experiment_trial_num 2 --model_name_promptLLM gpt-4-1106-preview --model_name_testLLM gpt-3.5-turbo-16k-0613 --min_level 2 --n_children 4 --n_selected 2 --prompt_method APE --with_score_model 'False' --base_path ./alfworld_result/

python agentboard/env9-box-arrange-train_MCTS.py --cfg-path eval_configs/main_results_all_tasks.yaml --tasks alfworld --log_path ./results/gpt-4-1106-preview --project_name evaluate-gpt-4-1106-preview --experiment_trial_num 2 --model_name_promptLLM gpt-4-1106-preview --model_name_testLLM gpt-4-1106-preview --min_level 2 --n_children 4 --n_selected 2 --prompt_method APE --with_score_model 'False' --base_path ./alfworld_result/

python agentboard/env9-box-arrange-train_MCTS.py --cfg-path eval_configs/main_results_all_tasks.yaml --tasks alfworld --log_path ./results/claude-3-opus-20240229 --project_name evaluate-claude-3-opus-20240229 --experiment_trial_num 1 --model_name_promptLLM claude-3-opus-20240229 --model_name_testLLM claude-3-opus-20240229 --min_level 2 --n_children 4 --n_selected 2 --prompt_method PROMST --with_score_model 'False' --base_path ./alfworld_result/

| Args to be set | Choices | Explanation |
| --- | --- | --- |
| experiment_trial_num | 1, 2, 3, ... | The trial number used to record each trial |
| input_error_prompt_token_limit | 10000, 15000, 20000, ... | The maximum prompt length, in tokens, for the context sliding window |
| model_name_promptLLM | gpt-3.5-turbo-16k-0613, gpt-3.5-turbo-0301, gpt-4-1106-preview | LLM type of PromptLLM |
| model_name_testLLM | gpt-3.5-turbo-16k-0613, gpt-3.5-turbo-0301, gpt-4-1106-preview | LLM type of TestLLM |
| min_level | 1, 2, 3, ... | Minimum number of prompt levels to be explored |
| n_children | 4, 8, 12, ... | Number of prompts to be expanded in each level |
| n_selected | 2, 3, 4, ... | Number of best prompts to be further optimized in each level |
| prompt_method | PROMST, APE, APO | Prompt optimization method |
| with_score_model | 'False', 'True' | Whether to use the score model in PROMST |
| Training_path | ../BoxLift/train_set/ | The path to the created train_set directory |
| Testing_path | ../BoxLift/train_set/, ../BoxLift/test_set/ | The path to the created test_set directory (set it to the same value as Training_path during optimization) |
| base_path | ../BoxLift/ | The base path where the prompt optimization results are saved |
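
To make the roles of min_level, n_children and n_selected more concrete, here is a toy level-wise search. It is an illustrative sketch only, not the repository's implementation: expand and score are hypothetical stand-ins for whatever generates and evaluates candidate prompts, the sketch assumes that each selected prompt produces n_children new prompts, and it runs for exactly min_level levels, whereas the actual stopping rule may differ.

```python
from typing import Callable, List, Tuple


def level_search(
    initial_prompt: str,
    expand: Callable[[str, int], List[str]],  # hypothetical: parent -> children
    score: Callable[[str], float],            # hypothetical: prompt -> score
    n_children: int,
    n_selected: int,
    min_level: int,
) -> Tuple[float, str]:
    """Explore prompts level by level and return the best (score, prompt)."""
    frontier = [initial_prompt]
    best = (score(initial_prompt), initial_prompt)
    for _ in range(min_level):
        # Assumption: every prompt kept from the previous level is expanded.
        candidates = [c for parent in frontier for c in expand(parent, n_children)]
        scored = sorted(((score(c), c) for c in candidates),
                        key=lambda pair: pair[0], reverse=True)
        # Keep the n_selected best prompts for further optimization.
        frontier = [prompt for _, prompt in scored[:n_selected]]
        if scored and scored[0][0] > best[0]:
            best = scored[0]
    return best


# Toy stand-ins: a "prompt" is a string and its score is the number of 'a's.
def toy_expand(parent: str, n: int) -> List[str]:
    return [parent + chr(ord("a") + i) for i in range(n)]


def toy_score(prompt: str) -> float:
    return float(prompt.count("a"))


result = level_search("", toy_expand, toy_score,
                      n_children=3, n_selected=2, min_level=2)
print(result)
```

In this toy setting, larger n_children widens each level and larger n_selected keeps more branches alive, at the cost of more calls to score, which in a real run would correspond to more LLM queries.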

Recommended Work

Large language models are human-level prompt engineers

Automatic Prompt Optimization with “Gradient Descent” and Beam Search

Scalable Multi-Robot Collaboration with Large Language Models: Centralized or Decentralized Systems?
