SIPO: Self-Improvement Towards Pareto Optimality

Official code for the paper "Self-Improvement Towards Pareto Optimality: Mitigating Preference Conflicts in Multi-Objective Alignment".

We provide instructions for running SIPO with MOD sampling on the HelpSteer dataset. For the BeaverTails-10K subset, please refer to scripts/examples/beavertails.sh.

Environment Setup

conda create -n SIPO python=3.10.15
conda activate SIPO
pip install -r requirements.txt

Dataset Preprocessing

python ./src/data/preprocess/HS_preprocess.py --tokenizer_path PKU-Alignment/alpaca-7b-reproduced

SFT

We follow the modpo codebase to perform SFT on Llama-2-7B. See that repository and scripts/sft for more details.

First-Time Alignment

  1. Perform first-time alignment on the SFT model using DPO on two conflicting objectives (correctness and verbosity for HelpSteer); a minimal sketch of the DPO loss follows the command below.
CUDA_VISIBLE_DEVICES=0 PYTHONPATH=. python scripts/baselines/dpo.py \
    --sft_model_name {path_to_the_sft_model} \
    --prompt_template "BEGINNING OF CONVERSATION: USER: {raw_prompt} ASSISTANT:" \
    --dataset_name "nvidia/HelpSteer-pairwise-{correctness|verbosity}-0.6" \
    --max_length 512 \
    --training_args.output_dir "./output/nvidia/HelpSteer/SIPO/{1.0|0.0}correctness" \
    --training_args.run_name "nvidia/HelpSteer/SIPO/{1.0|0.0}correctness" \
    --training_args.per_device_train_batch_size 1 \
    --training_args.per_device_eval_batch_size 6 \
    --training_args.gradient_accumulation_steps 2 \
    --training_args.learning_rate 5e-4 \
    --peft_config.r 64 \
    --peft_config.target_modules q_proj k_proj v_proj o_proj \
    --peft_config.lora_alpha 1 \
    --peft_config.lora_dropout 0
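
For orientation, the objective optimized by dpo.py in this step is the standard DPO loss. The following is only a minimal PyTorch sketch, not the repository's implementation; the function and variable names are ours, and beta is shown with a typical default rather than the value dpo.py actually uses.

import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    # DPO: increase the policy's log-prob margin between the chosen and rejected
    # responses, measured relative to the frozen SFT reference model.
    policy_margin = policy_chosen_logps - policy_rejected_logps
    ref_margin = ref_chosen_logps - ref_rejected_logps
    return -F.logsigmoid(beta * (policy_margin - ref_margin)).mean()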

MOD Sampling

  1. Use MOD sampling to sample responses from the two DPO-aligned models; a sketch of the decoding step follows the command below.
PYTHONPATH=. accelerate launch scripts/baselines/mod.py \
    --soup_weights 0.2 0.4 0.6 0.8 \
    --sft_model_name {path_to_the_sft_model} \
    --dpo_model_1_name "./output/nvidia/HelpSteer/SIPO/0.0correctness/best_checkpoint" \
    --dpo_model_2_name "./output/nvidia/HelpSteer/SIPO/1.0correctness/best_checkpoint" \
    --prompt_template "BEGINNING OF CONVERSATION: USER: {raw_prompt} ASSISTANT:" \
    --dataset_name "nvidia/HelpSteer-pairwise-correctness-0.6" \
    --output_dir "./output/nvidia/HelpSteer/SIPO/gen_sample" \
    --max_length 512 \
    --eval_size -1 \
    --split "train_conflict"
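
Conceptually, MOD combines the two single-objective DPO models at decoding time. Below is a minimal sketch of one decoding step, assuming a simple linear interpolation of next-token logits with the soup weight w; the exact combination rule (including how the SFT reference enters) is defined in mod.py, not here.

import torch

@torch.no_grad()
def mod_next_token_logits(input_ids, dpo_model_1, dpo_model_2, w):
    # Mix the next-token logits of the two single-objective DPO models.
    # w = 0.0 recovers model 1; w = 1.0 recovers model 2.
    logits_1 = dpo_model_1(input_ids).logits[:, -1, :]
    logits_2 = dpo_model_2(input_ids).logits[:, -1, :]
    return (1.0 - w) * logits_1 + w * logits_2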

Review Generation

  1. Generate reviews for the MOD-sampled responses.
CUDA_VISIBLE_DEVICES=0 PYTHONPATH=. python ./scripts/SIPO/review.py \
    --sft_model_name {path_to_the_sft_model} \
    --adapter_model_name "./output/nvidia/HelpSteer/SIPO/1.0correctness/best_checkpoint" \
    --prompt_template "BEGINNING OF CONVERSATION: USER: {raw_prompt} ASSISTANT:" \
    --input_dir "./output/nvidia/HelpSteer/SIPO/gen_sample" \
    --output_dir "./output/nvidia/HelpSteer/SIPO/review" \
    --max_length 1200 \
    --replication 4

Rewrite

  1. Rewrite the MOD-sampled responses based on the reviews.
PYTHONPATH=. accelerate launch ./scripts/SIPO/rewrite.py \
    --sft_model_name {path_to_the_sft_model} \
    --soup_weights 0.2 0.4 0.6 0.8 \
    --dpo_model_1_name "./output/nvidia/HelpSteer/SIPO/0.0correctness/best_checkpoint" \
    --dpo_model_2_name "./output/nvidia/HelpSteer/SIPO/1.0correctness/best_checkpoint" \
    --prompt_template "BEGINNING OF CONVERSATION: USER: {raw_prompt} ASSISTANT:" \
    --input_dir "./output/nvidia/HelpSteer/SIPO/review" \
    --output_dir "./output/nvidia/HelpSteer/SIPO/rewrite" \
    --max_length 1600

Filter

  1. Combine the rewritten responses that are no worse than the MOD-sampled responses, then filter the combined responses using weight-interpolated models; a sketch of one possible filtering score follows the two commands below.

combine rewritten responses

CUDA_VISIBLE_DEVICES=0 PYTHONPATH=. python3 scripts/SIPO/select_rewrite.py \
    --sft_model_name {path_to_the_sft_model} \
    --adapter_model_name "./output/nvidia/HelpSteer/SIPO/0.0correctness/best_checkpoint , ./output/nvidia/HelpSteer/SIPO/1.0correctness/best_checkpoint" \
    --prompt_template "BEGINNING OF CONVERSATION: USER: {raw_prompt} ASSISTANT:" \
    --input_mod_dir "./output/nvidia/HelpSteer/SIPO/gen_sample" \
    --input_dir "./output/nvidia/HelpSteer/SIPO/rewrite" \
    --output_dir "./output/nvidia/HelpSteer/SIPO/combine" \
    --replication 4 \
    --peft_config.r 64 \
    --peft_config.target_modules q_proj k_proj v_proj o_proj \
    --peft_config.lora_alpha 1 \
    --peft_config.lora_dropout 0

filter combined responses

CUDA_VISIBLE_DEVICES=0 PYTHONPATH=. python3 scripts/SIPO/select_sample.py \
    --soup_weights 0.2 0.4 0.6 0.8 \
    --sft_model_name {path_to_the_sft_model} \
    --adapter_model_name "./output/nvidia/HelpSteer/SIPO/0.0correctness/best_checkpoint , ./output/nvidia/HelpSteer/SIPO/1.0correctness/best_checkpoint" \
    --prompt_template "BEGINNING OF CONVERSATION: USER: {raw_prompt} ASSISTANT:" \
    --dataset_name "nvidia/HelpSteer-pairwise-correctness-0.6" \
    --input_dir "./output/nvidia/HelpSteer/SIPO/combine" \
    --output_dir "./output/nvidia/HelpSteer/SIPO/filter" \
    --replication 4 \
    --eval_size -1 \
    --peft_config.r 64 \
    --peft_config.target_modules q_proj k_proj v_proj o_proj \
    --peft_config.lora_alpha 1 \
    --peft_config.lora_dropout 0
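
One natural filtering score for this step is the implicit DPO reward of a weight-interpolated model, log pi_theta(y|x) - log pi_ref(y|x). The sketch below only illustrates that score under this assumption; the actual selection criteria are implemented in select_rewrite.py and select_sample.py.

import torch

@torch.no_grad()
def implicit_dpo_reward(policy, ref, input_ids, response_mask):
    # Sum of per-token log-prob ratios over the response tokens:
    # log pi_policy(y|x) - log pi_ref(y|x).
    def response_logp(model):
        logits = model(input_ids).logits[:, :-1, :]
        labels = input_ids[:, 1:]
        token_logps = torch.log_softmax(logits, dim=-1).gather(
            -1, labels.unsqueeze(-1)).squeeze(-1)
        return (token_logps * response_mask[:, 1:]).sum(dim=-1)
    return response_logp(policy) - response_logp(ref)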

Fine-tuning

  1. Fine-tune the models obtained in the initial alignment on the filtered responses, using a DPO loss with NLL-loss regularization; a sketch of the combined loss follows the commands below.

cache conflicted data locally

PYTHONPATH=. python scripts/SIPO/cache_conflict.py \
    --prompt_template "BEGINNING OF CONVERSATION: USER: {raw_prompt} ASSISTANT:" \
    --dataset_name "nvidia/HelpSteer-pairwise-correctness-0.6" \
    --eval_size -1

merge peft adapters into the model weights (a PEFT-merge sketch follows the command)

PYTHONPATH=. python src/tools/merge_peft_adapter.py \
    --adapter_model_name "./output/nvidia/HelpSteer/SIPO/{1.0|0.0}correctness" \
    --base_model_name {path_to_the_sft_model} \
    --output_name "/model/HelpSteer_{correct|verbose}_DPO_model" \
    --dtype bf16
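
This step corresponds to the standard PEFT merge-and-save pattern. A minimal sketch under that assumption, shown for the 1.0correctness adapter (merge_peft_adapter.py is the authoritative version):

import torch
from transformers import AutoModelForCausalLM
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained(
    "{path_to_the_sft_model}", torch_dtype=torch.bfloat16)
# Attach the DPO LoRA adapter and fold its weights into the base model.
model = PeftModel.from_pretrained(
    base, "./output/nvidia/HelpSteer/SIPO/1.0correctness")
merged = model.merge_and_unload()
merged.save_pretrained("/model/HelpSteer_correct_DPO_model")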

re-alignment

CUDA_VISIBLE_DEVICES=0 PYTHONPATH=. python scripts/SIPO/Iterative_DPO.py \
    --sft_model_name "/model/HelpSteer_{correct|verbose}_DPO_model" \
    --prompt_template "BEGINNING OF CONVERSATION: USER: {raw_prompt} ASSISTANT:" \
    --cached_data_dir "./output/cached_datasets/nvidia/HelpSteer-pairwise-correctness-0.6" \
    --dataset_name "./output/nvidia/HelpSteer/SIPO/filter" \
    --original_dataset_name "nvidia/HelpSteer-pairwise-{correctness|verbosity}-0.6" \
    --max_length 512 \
    --training_args.output_dir "./output/nvidia/HelpSteer/SIPO/re-alignment/{1.0|0.0}correctness" \
    --training_args.run_name "nvidia/HelpSteer/SIPO/re-alignment/{1.0|0.0}correctness" \
    --training_args.per_device_train_batch_size 1 \
    --training_args.per_device_eval_batch_size 6 \
    --training_args.gradient_accumulation_steps 2 \
    --training_args.learning_rate 5e-6 \
    --peft_config.r 64 \
    --peft_config.target_modules q_proj k_proj v_proj o_proj \
    --peft_config.lora_alpha 1 \
    --peft_config.lora_dropout 0 \
    --alpha 0.1
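
The re-alignment objective augments DPO with an NLL term on the chosen response, weighted by --alpha (0.1 above). A minimal sketch under that reading, with our own names; Iterative_DPO.py contains the actual implementation.

import torch.nn.functional as F

def dpo_with_nll_loss(policy_chosen_logps, policy_rejected_logps,
                      ref_chosen_logps, ref_rejected_logps,
                      chosen_nll, alpha=0.1, beta=0.1):
    # DPO preference term, identical in form to the first-time alignment loss.
    margin = (policy_chosen_logps - policy_rejected_logps) \
             - (ref_chosen_logps - ref_rejected_logps)
    preference_loss = -F.logsigmoid(beta * margin).mean()
    # NLL regularizer on the chosen (filtered) response, weighted by --alpha.
    return preference_loss + alpha * chosen_nll.mean()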

Evaluation

  1. Use MOD sampling with the re-aligned models to sample responses to the test questions, then evaluate the responses using the officially released reward models; a sketch of reward-model scoring follows the commands below.

MOD sampling on models after re-alignment

CUDA_VISIBLE_DEVICES=0 PYTHONPATH=. python3 scripts/utils/mod_general.py \
    --soup_weights {0.2|0.4|0.6|0.8} \
    --sft_model_1_name "/model/HelpSteer_correct_DPO_model" \
    --sft_model_2_name "/model/HelpSteer_verbose_DPO_model" \
    --dpo_model_1_name "./output/nvidia/HelpSteer/SIPO/re-alignment/1.0correctness/best_checkpoint" \
    --dpo_model_2_name "./output/nvidia/HelpSteer/SIPO/re-alignment/0.0correctness/best_checkpoint" \
    --prompt_template "BEGINNING OF CONVERSATION: USER: {raw_prompt} ASSISTANT:" \
    --dataset_name "nvidia/HelpSteer-pairwise-correctness" \
    --output_dir "./output/nvidia/HelpSteer/SIPO/re-alignment/{0.2|0.4|0.6|0.8}correctness/gen_test" \
    --max_length 512 \
    --eval_size -1 \
    --split "test"

evaluation using officially released RMs

CUDA_VISIBLE_DEVICES=0 PYTHONPATH=. python3 scripts/utils/HS_score.py \
    --input_dir "./output/nvidia/HelpSteer/SIPO/re-alignment/{0.2|0.4|0.6|0.8}correctness/gen_test" \
    --output_dir "./output/nvidia/HelpSteer/SIPO/re-alignment/{0.2|0.4|0.6|0.8}correctness/score"
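
HS_score.py drives this evaluation; the sketch below only illustrates the general pattern of scoring a response with a sequence-classification reward model. The model identifier is a placeholder, not the actual RM selected by the script.

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

RM_NAME = "path/to/official-helpsteer-reward-model"  # placeholder identifier
tokenizer = AutoTokenizer.from_pretrained(RM_NAME)
reward_model = AutoModelForSequenceClassification.from_pretrained(
    RM_NAME, torch_dtype=torch.bfloat16)

@torch.no_grad()
def score(prompt, response):
    # Score the concatenated prompt + response with the reward model head.
    inputs = tokenizer(prompt + response, return_tensors="pt", truncation=True)
    return reward_model(**inputs).logits[0].float().tolist()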
