This repository contains the code and data pipeline for "Recovering Diversity Without Losing Alignment: A DPO Recipe for Post-Trained LLMs" [UNDER REVIEW].
git clone <repo-url>
cd <repo-name>
pip install -r requirements.txt

Note: The IFEval dependency requires a vendored lm-evaluation-harness submodule under evals/. Make sure to install from the repo root as shown above so the -e ./evals/lm-evaluation-harness[ifeval] path resolves correctly.
After installing, download the spaCy English model:
python -m spacy download en_core_web_sm

Fill in your API keys in env.sh, then source it before running any script:
# env.sh
export HF_TOKEN="your_huggingface_token"
export OPENAI_API_KEY="your_openai_key"
export WANDB_API_KEY="your_wandb_key"

source env.sh

The pipeline takes a prompt dataset, generates diverse responses, filters and embeds them, scores diversity, and selects preference pairs for DPO training.
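Before launching any of the scripts below, you can confirm that sourcing env.sh actually exported the keys. This is a minimal Python sketch that is not part of the repository: missing_keys is a hypothetical helper, and treating values that start with "your_" as unfilled is an assumption based on the placeholders in env.sh.

```python
import os

# Key names taken from env.sh; which scripts need which key is not assumed here.
REQUIRED_KEYS = ("HF_TOKEN", "OPENAI_API_KEY", "WANDB_API_KEY")


def missing_keys(environ=os.environ, required=REQUIRED_KEYS):
    """Return the required keys that are unset, empty, or still placeholders."""
    missing = []
    for key in required:
        value = environ.get(key, "")
        # Treat the template values from env.sh ("your_...") as not filled in.
        if not value or value.startswith("your_"):
            missing.append(key)
    return missing
```

After running source env.sh in the same shell, missing_keys() should return an empty list; any names it returns still need real values.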
For Qwen models (recommended starting point):
bash scripts/end_to_end_subset.sh

For OLMo models:
bash scripts/unity_end_to_end.sh

Step 1 — Prepare prompts
Collect and filter prompts from source datasets:
python data_processing/prepare_prompts.py

Output: data/instruct_subset.jsonl
Step 2 — Generate responses
Generate k diverse responses per prompt using vLLM:
bash scripts/generate.sh
# or for Llama models:
python data_processing/generate.py \
--input_file data/instruct_subset.jsonl \
--models "meta-llama/Llama-3.1-8B-Instruct,meta-llama/Llama-3.1-8B" \
--k 16 \
--output_dir ./generated_data \
--output_file_name generations_llama.jsonl \
--temperature 0.9 --top_p 0.95 --max_new_tokens 1024

Output: generated_data/generations.jsonl (the Llama command above writes generated_data/generations_llama.jsonl instead)
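Since this step should produce k responses per prompt and model, a quick count can catch prompts where generation came up short. The sketch below assumes one JSON object per line with hypothetical "prompt" and "model" fields; the schema actually written by generate.py may differ, so adjust the keys accordingly.

```python
import json
from collections import Counter


def count_generations(lines):
    """Count responses per (prompt, model) from an iterable of JSONL lines.

    Assumes hypothetical "prompt" and "model" fields on every record.
    """
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:  # skip blank lines
            continue
        row = json.loads(line)
        counts[(row["prompt"], row["model"])] += 1
    return counts


def short_groups(counts, k=16):
    """Return the (prompt, model) pairs that received fewer than k responses."""
    return sorted(key for key, n in counts.items() if n < k)
```

Usage: with open("generated_data/generations.jsonl") as f: print(short_groups(count_generations(f), k=16)). An empty list means every prompt and model reached k.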
Step 3 — Clean responses
Lightly clean responses (fix truncation, punctuation) using an LLM:
bash scripts/cleanup.sh

Output: generated_data/generations_cleaned.jsonl
Filter base model outputs where the instruct model's response is clearly superior:
bash scripts/clean_base.sh

Output: generated_data/generations_full_cleaned.jsonl
Step 4 — Filter responses
Apply safety and instruction-following filters:
bash scripts/filter.sh

Output: filtered_data/pilot_hard_filtered.jsonl
Step 5 — Embed responses
Group responses by prompt and embed them:
bash scripts/embed.sh [input_file] [model]
# model options: openai | bge | both

Output: filtered_data/<basename>_embedded.jsonl
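The grouping half of this step can be pictured as collecting per-response rows into one record per prompt before any embedding call is made. This is a minimal sketch; the "prompt", "response" and "responses" field names are hypothetical, and embed.sh may use a different layout.

```python
from collections import defaultdict


def group_by_prompt(rows):
    """Collect responses sharing a prompt into one record, keeping first-seen order.

    Assumes hypothetical "prompt" and "response" fields on each input row.
    """
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["prompt"]].append(row["response"])
    # dicts preserve insertion order, so prompts appear in the order first seen
    return [{"prompt": p, "responses": rs} for p, rs in grouped.items()]
```

Grouping first means each prompt's responses can be embedded together and later compared only against one another, which is what a per-prompt diversity score needs.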
Step 6 — Score diversity
Compute marginal diversity scores for each response:
python data_processing/score_diversity.py \
--input_file filtered_data/<basename>_embedded.jsonl \
--embedding_method openai \
--diversity_method maxsim \
--output_file scored_data/scored.jsonl

Output: scored_data/scored.jsonl
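One plausible reading of --diversity_method maxsim is that a response's marginal diversity is one minus its highest cosine similarity to any other response for the same prompt, so near-duplicates score low. This is an assumption, not taken from score_diversity.py; the sketch below uses plain Python lists as embedding vectors.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, nonzero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def maxsim_diversity(embeddings):
    """Score each response as 1 - (max cosine similarity to any sibling response).

    A response that nearly duplicates some sibling scores close to 0; one that
    is far from all siblings scores higher. Needs at least two embeddings.
    """
    if len(embeddings) < 2:
        raise ValueError("need at least two responses per prompt")
    scores = []
    for i, e in enumerate(embeddings):
        nearest = max(cosine(e, other) for j, other in enumerate(embeddings) if j != i)
        scores.append(1.0 - nearest)
    return scores
```

For example, maxsim_diversity([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]) gives [0.0, 0.0, 1.0]: the two identical responses cancel each other out, while the orthogonal one is maximally novel.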
Step 7 — Select preference pairs
Select (chosen, rejected) pairs for DPO training:
bash scripts/select_pairs.sh [input_file] [output_file] [mode] [epsilon]
# mode options: epsilon | bin | weighted

Example:
bash scripts/select_pairs.sh \
scored_data/scored.jsonl \
preference_data/pairs.jsonl \
epsilon 6.0

Output: preference_data/pairs.jsonl
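The exact rules behind the epsilon, bin and weighted modes live in the selection script. As a hedged illustration only, an epsilon-style rule might take the most and least diverse responses for a prompt as chosen and rejected, and keep the pair only when their score gap is at least epsilon. select_pair_epsilon is a hypothetical name, and whether 6.0 is a sensible threshold depends on the scale of the diversity scores.

```python
def select_pair_epsilon(responses, scores, epsilon=6.0):
    """Hypothetical epsilon rule: pair the highest- and lowest-scoring responses.

    Returns (chosen, rejected), or None when the score gap is below epsilon.
    """
    if len(responses) != len(scores) or len(responses) < 2:
        raise ValueError("need matching responses and scores, at least two of each")
    best = max(range(len(scores)), key=scores.__getitem__)
    worst = min(range(len(scores)), key=scores.__getitem__)
    # Skip prompts whose responses are too similar in diversity to form a useful pair.
    if best == worst or scores[best] - scores[worst] < epsilon:
        return None
    return responses[best], responses[worst]
```

A larger epsilon keeps fewer pairs but makes each chosen/rejected contrast sharper; a smaller one keeps more data at the cost of noisier preferences.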
Train with DPO using the selected preference pairs. Edit configs/config.yaml (Qwen), configs/llama_config.yaml (Llama), or configs/olmo_config.yaml (OLMo) to set your model path, LoRA settings, and hyperparameters, then run:
bash scripts/train.sh

Or directly:
export WANDB_PROJECT=YOUR_PROJECT
python train/train.py --config configs/config.yaml --train_file preference_data/pairs.jsonl

We evaluate on five benchmarks: MT-Bench, AlpacaEval, IFEval, NoveltyBench, and HarmBench.
bash evals/run_all.sh <model_path> \
--model-type [base|instruct|lora] \
--lora <path/to/lora/adapter> \
--output-dir ./evals/results/<model_id> \
--test-cases-path ./evals/harmbench/data/test_cases_directrequest_test.json

Pass --lora only when --model-type is lora.

# Evaluate a base model
bash scripts/eval_qwen_base.sh
# Evaluate an instruct model
bash scripts/eval_qwen_instruct.sh
# Evaluate OLMo instruct
bash scripts/eval_olmo_instruct.sh

CHECKPOINT_DIR=/path/to/your/checkpoints bash scripts/eval_qwen_checkpoints.sh
CHECKPOINT_DIR=/path/to/your/checkpoints bash scripts/eval_olmo_checkpoints.sh

bash evals/mtbench/run_mtbench.sh <model_path> [options]
bash evals/ifeval/run_ifeval.sh <model_path> [options]
bash evals/alpaca_eval_suite/run_alpaca.sh <model_path> [options]
bash evals/harmbench/run_harmbench.sh <model_path> [options]
bash evals/novelty_bench/run_novelty.sh <model_path> [options]

Pass --help (or run without arguments) to any script to see its full option list.
