PRompt Optimization in Multi-Step Tasks (PROMST): Integrating Human Feedback and Preference Alignment

(Website)

Paper Link: https://arxiv.org/pdf/2402.08702.pdf

This project is for automatic prompt optimization with a focus on multi-step tasks.

Requirements

Please set up the environment with conda and pip as follows:

conda create -n PROMST python=3.10
conda activate PROMST
conda install pytorch torchvision torchaudio pytorch-cuda=12.1 -c pytorch -c nvidia
conda install conda-forge::tiktoken
pip install openai --upgrade
pip install pygame
conda install conda-forge::transformers
conda install anaconda::scikit-learn
pip install gym==0.26.2
pip install pddlgym
pip install tarski
pip install tiktoken

Note that for the Logistics and BlocksWorld tasks we slightly modified the pddlgym files. You therefore need to replace the installed pddlgym directory in site-packages (e.g., /Users/Your user name/opt/anaconda3/envs/PROMST/lib/python3.10/site-packages/pddlgym) with the pddlgym directory in BlocksWorld/env_data_BlocksWorld/pddlgym, for example:

cd "/Users/Your user name/opt/anaconda3/envs/PROMST/lib/python3.10/site-packages"
cp -r "/Your path to PROMST/PROMST/BlocksWorld/env_data_BlocksWorld/pddlgym" ./
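The same replacement can be scripted in Python, which avoids typing the site-packages path by hand. This is a minimal sketch, not part of the repository: the helper names `site_packages_dir` and `replace_package` are hypothetical, and it assumes you run it with the interpreter of the PROMST conda environment so that `sysconfig` reports that environment's site-packages directory.

```python
import shutil
import sysconfig
from pathlib import Path


def site_packages_dir() -> Path:
    """Return the site-packages directory of the running interpreter."""
    return Path(sysconfig.get_paths()["purelib"])


def replace_package(modified_pkg: Path, target_root: Path) -> Path:
    """Copy modified_pkg into target_root, overwriting files of the same name."""
    if not modified_pkg.is_dir():
        raise FileNotFoundError(f"not a directory: {modified_pkg}")
    destination = target_root / modified_pkg.name
    # dirs_exist_ok=True (Python 3.8+) merges into an existing directory,
    # which mirrors what `cp -r <src> ./` does.
    shutil.copytree(modified_pkg, destination, dirs_exist_ok=True)
    return destination


# Example call (the checkout location "PROMST" is a placeholder, adjust it):
# src = Path("PROMST") / "BlocksWorld" / "env_data_BlocksWorld" / "pddlgym"
# replace_package(src, site_packages_dir())
```

Files that exist only in the installed copy are left in place; files present in both are overwritten by the modified versions.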

Usage

A GPU is only needed to train the score model, so none is required if you set with_score_model to 'False'. To run a task, enter its directory and set your OpenAI API key (sk-..) in LLM.py at line 11. You can then run commands such as the following; change the arguments as needed.

python env3-box-arrange-train_MCTS.py \
-experiment_trial_num 1 \
-input_error_prompt_token_limit 15000 \
-model_name_promptLLM gpt-4-1106-preview \
-model_name_testLLM gpt-3.5-turbo-16k-0613 \
-min_level 2 \
-n_children 8 \
-n_selected 2 \
-prompt_method PROMST \
-with_score_model 'False' \
-Training_path ../BoxLift/train_set/ \
-Testing_path ../BoxLift/test_set/ \
-base_path ../BoxLift/

python env2-box-arrange-train_MCTS.py -experiment_trial_num 1 -input_error_prompt_token_limit 15000 -model_name_promptLLM claude-3-opus-20240229 -model_name_testLLM claude-3-opus-20240229 -min_level 2 -n_children 4 -n_selected 2 -prompt_method PROMST -with_score_model 'False' -Training_path ../BoxNet2/train_set/ -Testing_path ../BoxNet2/test_set/ -base_path ../BoxNet2/

python env2-box-arrange-train_MCTS.py -experiment_trial_num 1 -input_error_prompt_token_limit 15000 -model_name_promptLLM open-mixtral-8x7b -model_name_testLLM open-mixtral-8x7b -min_level 2 -n_children 4 -n_selected 2 -prompt_method PROMST -with_score_model 'False' -Training_path ../BoxNet2/train_set/ -Testing_path ../BoxNet2/test_set/ -base_path ../BoxNet2/

python env2-box-arrange-train_MCTS.py -experiment_trial_num 2 -input_error_prompt_token_limit 15000 -model_name_promptLLM mistral-large-latest -model_name_testLLM mistral-large-latest -min_level 2 -n_children 4 -n_selected 2 -prompt_method PROMST -with_score_model 'False' -Training_path ../BoxNet2/train_set/ -Testing_path ../BoxNet2/test_set/ -base_path ../BoxNet2/
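Since the commands above differ only in a few arguments, they can be generated from a shared dictionary. The sketch below is an illustration only: `build_command` and `commands_for_models` are hypothetical helpers, not part of the repository, and the option values are copied from the BoxNet2 commands above. It prints the commands; the `subprocess.run` call that would launch them is left commented out.

```python
import subprocess
import sys
from typing import Dict, List


def build_command(script: str, options: Dict[str, object], dash: str = "-") -> List[str]:
    """Turn an options dict into an argument list suitable for subprocess.run."""
    command = [sys.executable, script]
    for name, value in options.items():
        command += [f"{dash}{name}", str(value)]
    return command


def commands_for_models(script: str, models: List[str], shared: Dict[str, object]) -> List[List[str]]:
    """Build one command per model, using it as both PromptLLM and TestLLM."""
    commands = []
    for model in models:
        options = dict(shared, model_name_promptLLM=model, model_name_testLLM=model)
        commands.append(build_command(script, options))
    return commands


# Options copied from the BoxNet2 commands above.
common = {
    "experiment_trial_num": 1,
    "input_error_prompt_token_limit": 15000,
    "min_level": 2,
    "n_children": 4,
    "n_selected": 2,
    "prompt_method": "PROMST",
    "with_score_model": "False",
    "Training_path": "../BoxNet2/train_set/",
    "Testing_path": "../BoxNet2/test_set/",
    "base_path": "../BoxNet2/",
}

commands = commands_for_models(
    "env2-box-arrange-train_MCTS.py",
    ["claude-3-opus-20240229", "open-mixtral-8x7b"],
    common,
)
for command in commands:
    print(" ".join(command))
    # Uncomment to launch; run from inside the task directory.
    # subprocess.run(command, check=True)
```

Passing `dash="--"` to `build_command` produces the double-dash style used by the AgentBoard-based scripts.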

Command for Alfworld/Scienceworld/Webarena

python agentboard/env9-box-arrange-train_MCTS.py --cfg-path eval_configs/main_results_all_tasks.yaml --tasks alfworld --log_path ./results/gpt-3.5-turbo-16k-0613 --project_name evaluate-gpt-4 --experiment_trial_num 1 --model_name_promptLLM gpt-4-1106-preview --model_name_testLLM gpt-3.5-turbo-16k-0613 --min_level 2 --n_children 8 --n_selected 2 --prompt_method PROMST --with_score_model 'False' --base_path ./alfworld_result/ --max_num_steps 30

Command for restarting the training process for Alfworld/Scienceworld/Webarena

python agentboard/env9-box-arrange-train_MCTS_restart.py --cfg-path eval_configs/main_results_all_tasks.yaml --tasks webarena --log_path ./results/gpt-3.5-turbo-16k-0613 --project_name evaluate-gpt-4 --experiment_trial_num 3 --model_name_promptLLM gpt-4-1106-preview --model_name_testLLM gpt-3.5-turbo-16k-0613 --min_level 2 --n_children 4 --n_selected 2 --prompt_method PROMST --with_score_model 'False' --base_path ./webarena_result/ --max_num_steps 30

python agentboard/env9-box-arrange-train_MCTS.py --cfg-path eval_configs/main_results_all_tasks.yaml --tasks alfworld --log_path ./results/gpt-3.5-turbo-16k-0613 --project_name evaluate-gpt-3.5-turbo-16k-0613 --experiment_trial_num 2 --model_name_promptLLM gpt-4-1106-preview --model_name_testLLM gpt-3.5-turbo-16k-0613 --min_level 2 --n_children 4 --n_selected 2 --prompt_method APE --with_score_model 'False' --base_path ./alfworld_result/

python agentboard/env9-box-arrange-train_MCTS.py --cfg-path eval_configs/main_results_all_tasks.yaml --tasks alfworld --log_path ./results/gpt-4-1106-preview --project_name evaluate-gpt-4-1106-preview --experiment_trial_num 2 --model_name_promptLLM gpt-4-1106-preview --model_name_testLLM gpt-4-1106-preview --min_level 2 --n_children 4 --n_selected 2 --prompt_method APE --with_score_model 'False' --base_path ./alfworld_result/

python agentboard/env9-box-arrange-train_MCTS.py --cfg-path eval_configs/main_results_all_tasks.yaml --tasks alfworld --log_path ./results/claude-3-opus-20240229 --project_name evaluate-claude-3-opus-20240229 --experiment_trial_num 1 --model_name_promptLLM claude-3-opus-20240229 --model_name_testLLM claude-3-opus-20240229 --min_level 2 --n_children 4 --n_selected 2 --prompt_method PROMST --with_score_model 'False' --base_path ./alfworld_result/

Args to be set	Choices	Explanation
experiment_trial_num	1, 2, 3, ...	Trial number used to label the records of each trial
input_error_prompt_token_limit	10000, 15000, 20000, ...	Maximum prompt token length for the context sliding window
model_name_promptLLM	gpt-3.5-turbo-16k-0613, gpt-3.5-turbo-0301, gpt-4-1106-preview	LLM type of PromptLLM
model_name_testLLM	gpt-3.5-turbo-16k-0613, gpt-3.5-turbo-0301, gpt-4-1106-preview	LLM type of TestLLM
min_level	1, 2, 3, ...	Minimum number of prompt levels to be explored
n_children	4, 8, 12, ...	Number of prompts to be expanded in each level
n_selected	2, 3, 4, ...	Number of best prompts to be further optimized in each level
prompt_method	PROMST, APE, APO	Prompt optimization method
with_score_model	'False', 'True'	Whether to use the score model in PROMST
Training_path	../BoxLift/train_set/	Path to the created train_set directory
Testing_path	../BoxLift/train_set/, ../BoxLift/test_set/	Path to the created test_set directory (set it to the same value as Training_path during optimization)
base_path	../BoxLift/	Base path where the prompt optimization results are saved
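To make the interplay of min_level, n_children and n_selected more concrete, here is a toy level-wise search. It is a sketch under stated assumptions, not the PROMST implementation: `propose` and `score` are hypothetical stand-ins for prompt generation by the PromptLLM and evaluation with the TestLLM, `max_level` is an invented safety cap, and both the round-robin expansion and the stopping rule are assumptions that the table above does not specify.

```python
from typing import Callable, List, Tuple


def level_search(
    initial_prompt: str,
    propose: Callable[[str, int], str],
    score: Callable[[str], float],
    min_level: int = 2,
    n_children: int = 8,
    n_selected: int = 2,
    max_level: int = 10,
) -> Tuple[str, float]:
    """Toy level-wise prompt search; returns the best (prompt, score) seen."""
    best = (initial_prompt, score(initial_prompt))
    selected: List[str] = [initial_prompt]
    for level in range(1, max_level + 1):
        # Assumption: n_children candidates per level in total,
        # drawn round-robin from the currently selected prompts.
        children = [
            propose(selected[i % len(selected)], i) for i in range(n_children)
        ]
        ranked = sorted(((score(c), c) for c in children), reverse=True)
        # Keep the n_selected best prompts as parents for the next level.
        selected = [prompt for _, prompt in ranked[:n_selected]]
        top_score, top_prompt = ranked[0]
        improved = top_score > best[1]
        if improved:
            best = (top_prompt, top_score)
        # Assumption: after min_level levels, stop once a level fails to improve.
        if level >= min_level and not improved:
            break
    return best


# Toy stand-ins: "propose" appends a letter, "score" counts the letter "a".
def toy_propose(prompt: str, index: int) -> str:
    return prompt + "ab"[index % 2]


def toy_score(prompt: str) -> float:
    return float(prompt.count("a"))


best_prompt, best_score = level_search("", toy_propose, toy_score, max_level=3)
print(best_prompt, best_score)
```

Larger n_children values widen each level at the cost of more TestLLM evaluations, while n_selected controls how many parents survive into the next level.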

Recommended Work

Large language models are human-level prompt engineers

Automatic Prompt Optimization with “Gradient Descent” and Beam Search

Scalable Multi-Robot Collaboration with Large Language Models: Centralized or Decentralized Systems?
