This repository contains the code to replicate the experiments in our paper.
Create a Python environment and then run:

```
pip install -r requirements.txt
```
To run our toy data experiment, run all cells in `/src/experiments/toy_data_experiments`.
To run our empirical experiment, we will need to:
- Generate strings with different sampling adaptors using an RLHF-tuned model.
- Score the generated corpora and compute their log-probabilities under the prior and RLHF-tuned model with the sampling adaptor.
- Run the Independent Metropolis-Hastings algorithm.
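A sampling adaptor in the first step is a transformation applied to the model's next-token distribution before sampling (e.g. top-k or nucleus truncation). As a hypothetical illustration of the idea (not the repository's actual API), a top-k adaptor could look like:

```python
import numpy as np

def top_k_adaptor(probs, k):
    """Truncate a next-token distribution to its k most probable tokens
    and renormalize. Hypothetical sketch, not the repo's implementation."""
    adapted = np.zeros_like(probs)
    top = np.argsort(probs)[-k:]  # indices of the k largest probabilities
    adapted[top] = probs[top]
    return adapted / adapted.sum()
```

For example, `top_k_adaptor(np.array([0.5, 0.3, 0.1, 0.1]), 2)` keeps the two most likely tokens and renormalizes to `[0.625, 0.375, 0, 0]`.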
To do so, you can either use our precomputed files or run the scripts to generate everything from scratch.
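The scoring step above amounts to summing per-token conditional log-probabilities over each generated string. A minimal sketch, assuming (hypothetically) access to the model's next-token distribution at each step:

```python
import math

def sequence_log_prob(token_ids, step_distributions):
    """Sum log p(token_t | context) over a generated sequence.

    step_distributions[t] is assumed to be the next-token distribution
    (token id -> probability) the model assigned at step t. This is a
    hypothetical helper for illustration, not the repository's API.
    """
    return sum(math.log(step_distributions[t][tok])
               for t, tok in enumerate(token_ids))
```

The repository's scoring scripts below produce these log-probabilities (under both the prior and the tuned model) as files.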
To use our files:

- Download `data.zip` from here and `data_dpo.zip` from here.
- Unzip them into `src/language_model_experiments/data` and `src/language_model_experiments/data_dpo`, respectively.
- Download the cached intermediate outputs `.cache.zip` from here and `.cache_dpo.zip` from here.
- Unzip them into `src/language_model_experiments/.cache` and `src/language_model_experiments/.cache_dpo`, respectively.
- Generate the figures with `bash create_figures.sh`.
To generate everything from scratch:

- Generate all 25 corpora with both an RLHF-tuned and a DPO-tuned model with `bash generate.sh`. This should produce 50 `.csv` files of generated text.
- Score all 25 corpora with `bash score_with_correction.sh`, commenting out the file and model settings in the script as needed before running it. For every `.csv` file, this should produce three files scoring the reward and the log-probabilities of the generated strings. When this is done, move all files to `src/language_model_experiments/data` and `src/language_model_experiments/data_dpo`, for the RLHF-tuned and DPO-tuned model, respectively.
- Run the Independent Metropolis-Hastings algorithm and generate the figures with `bash create_figures.sh`.
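For reference, the Independent Metropolis-Hastings step run by the script above can be sketched as follows. This is a minimal, self-contained illustration with toy distributions, not the repository's implementation: the proposal plays the role of the sampling-adapted model, and the target the role of the distribution we want to sample from.

```python
import math
import random

def independent_mh(log_target, log_proposal, proposal_samples):
    """Independent Metropolis-Hastings over pre-drawn proposal samples.

    Acceptance ratio for an independence sampler is
    min(1, [p(x')/q(x')] / [p(x)/q(x)]), computed here in log space.
    """
    current = proposal_samples[0]
    chain = []
    for candidate in proposal_samples[1:]:
        log_alpha = (log_target(candidate) - log_proposal(candidate)) \
                  - (log_target(current) - log_proposal(current))
        if math.log(random.random()) < log_alpha:
            current = candidate  # accept the proposed sample
        chain.append(current)    # otherwise the chain repeats the current one
    return chain
```

With a uniform proposal over `{0, 1}` and a target that puts probability 0.8 on `1`, the chain's empirical frequency of `1` converges toward 0.8, as expected for the stationary distribution.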