This repository contains the code to replicate the experiments in our paper.
Create a Python environment and then run:

```
pip install -r requirements.txt
```
To run our toy data experiment, run all cells in `/src/experiments/toy_data_experiments`.
To run our empirical experiment, we will need to:
- Generate strings with different sampling adaptors using an RLHF-tuned model.
- Score the generated corpora and compute their log-probabilities under the prior and RLHF-tuned model with the sampling adaptor.
- Run the Independent Metropolis-Hastings algorithm.
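A sampling adaptor in the first step is a transformation applied to the model's next-token distribution before sampling (e.g. top-k or nucleus truncation). As a hypothetical illustration of the idea (not the repository's actual API), a top-k adaptor could look like:

```python
import numpy as np

def top_k_adaptor(probs, k):
    """Truncate a next-token distribution to its k most probable tokens
    and renormalize. Hypothetical sketch, not the repo's implementation."""
    adapted = np.zeros_like(probs)
    top = np.argsort(probs)[-k:]  # indices of the k largest probabilities
    adapted[top] = probs[top]
    return adapted / adapted.sum()
```

For example, `top_k_adaptor(np.array([0.5, 0.3, 0.1, 0.1]), 2)` keeps the two most likely tokens and renormalizes to `[0.625, 0.375, 0, 0]`.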
To do so, you can either use our precomputed files or run the scripts to generate everything from scratch.
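The scoring step above amounts to summing per-token conditional log-probabilities over each generated string. A minimal sketch, assuming (hypothetically) access to the model's next-token distribution at each step:

```python
import math

def sequence_log_prob(token_ids, step_distributions):
    """Sum log p(token_t | context) over a generated sequence.

    step_distributions[t] is assumed to be the next-token distribution
    (token id -> probability) the model assigned at step t. This is a
    hypothetical helper for illustration, not the repository's API.
    """
    return sum(math.log(step_distributions[t][tok])
               for t, tok in enumerate(token_ids))
```

The repository's scoring scripts below produce these log-probabilities (under both the prior and the tuned model) as files.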
To use our files:

- Download `data.zip` from here and `data_dpo.zip` from here.
- Unzip them into `src/language_model_experiments/data` and `src/language_model_experiments/data_dpo`, respectively.
- Download the cached intermediate outputs `.cache.zip` from here and `.cache_dpo.zip` from here.
- Unzip them into `src/language_model_experiments/.cache` and `src/language_model_experiments/.cache_dpo`, respectively.
- Generate the figures with `bash create_figures.sh`.
To generate everything from scratch:

- Generate all 25 corpora with both an RLHF-tuned and a DPO-tuned model with `bash generate.sh`. This should produce 50 `.csv` files of generated text.
- Score all 25 corpora with `bash score_with_correction.sh`, commenting out the file and model settings in the script as needed before running it. For every `.csv` file, this should produce three files scoring the reward and the log-probabilities of the generated strings. When this is done, move all files to `src/language_model_experiments/data` and `src/language_model_experiments/data_dpo`, for the RLHF-tuned and DPO-tuned model, respectively.
- Run the Independent Metropolis-Hastings algorithm and generate the figures with `bash create_figures.sh`.
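For reference, the Independent Metropolis-Hastings step run by the script above can be sketched as follows. This is a minimal, self-contained illustration with toy distributions, not the repository's implementation: the proposal plays the role of the sampling-adapted model, and the target the role of the distribution we want to sample from.

```python
import math
import random

def independent_mh(log_target, log_proposal, proposal_samples):
    """Independent Metropolis-Hastings over pre-drawn proposal samples.

    Acceptance ratio for an independence sampler is
    min(1, [p(x')/q(x')] / [p(x)/q(x)]), computed here in log space.
    """
    current = proposal_samples[0]
    chain = []
    for candidate in proposal_samples[1:]:
        log_alpha = (log_target(candidate) - log_proposal(candidate)) \
                  - (log_target(current) - log_proposal(current))
        if math.log(random.random()) < log_alpha:
            current = candidate  # accept the proposed sample
        chain.append(current)    # otherwise the chain repeats the current one
    return chain
```

With a uniform proposal over `{0, 1}` and a target that puts probability 0.8 on `1`, the chain's empirical frequency of `1` converges toward 0.8, as expected for the stationary distribution.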