A Fundamental Trade-off in Aligned Language Models and its Relation to Sampling Adaptors

This repository contains the code to replicate the experiments in our paper.

Environment setup

Create a Python environment and then run:

pip install -r requirements.txt

Toy Experiment

To run our toy data experiment, run all cells in /src/experiments/toy_data_experiments.

Language Model Experiment

To run our empirical experiment, you need to:

1. Generate strings with different sampling adaptors using an RLHF-tuned model.
2. Score the generated corpora and compute their log-probabilities under both the prior and the RLHF-tuned model with the sampling adaptor.
3. Run the independent Metropolis-Hastings algorithm.

To do so, you can use our files or run the scripts to generate everything from scratch.
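For orientation, here is a minimal sketch of an independent Metropolis-Hastings sampler on a toy discrete distribution. It is not the implementation used in this repository: the names `independent_mh`, `run_toy_example`, `TOY_TARGET`, and the chosen probabilities and seed are assumptions made purely for illustration. In the independent variant, candidates are drawn from a fixed proposal q that does not depend on the current state, and a candidate y is accepted from state x with probability min(1, p(y) q(x) / (p(x) q(y))).

```python
import math
import random


def independent_mh(log_target, log_proposal, sample_proposal, n_steps, rng):
    """Run independent Metropolis-Hastings and return the list of visited states.

    log_target: log of the (possibly unnormalized) target p.
    log_proposal: log of the fixed proposal q.
    sample_proposal: callable drawing one sample from q using rng.
    """
    current = sample_proposal(rng)
    chain = []
    for _ in range(n_steps):
        candidate = sample_proposal(rng)
        # log of p(y) q(x) / (p(x) q(y)), with x = current and y = candidate
        log_ratio = (log_target(candidate) - log_target(current)
                     + log_proposal(current) - log_proposal(candidate))
        accept_prob = math.exp(min(0.0, log_ratio))
        if rng.random() < accept_prob:
            current = candidate
        chain.append(current)
    return chain


# Toy setup (illustrative values only, not taken from the paper).
TOY_TARGET = {"a": 0.7, "b": 0.2, "c": 0.1}
TOY_STATES = sorted(TOY_TARGET)


def run_toy_example(n_steps=20000, seed=0):
    """Sample from TOY_TARGET with a uniform proposal; return empirical frequencies."""
    rng = random.Random(seed)
    chain = independent_mh(
        log_target=lambda s: math.log(TOY_TARGET[s]),
        log_proposal=lambda s: -math.log(len(TOY_STATES)),
        sample_proposal=lambda r: r.choice(TOY_STATES),
        n_steps=n_steps,
        rng=rng,
    )
    return {s: chain.count(s) / len(chain) for s in TOY_STATES}


if __name__ == "__main__":
    print(run_toy_example())
```

With enough steps, the empirical frequencies should approach the target probabilities. Working in log space mirrors the fact that the scoring step produces log-probabilities.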

Option 1: Download our files and generate the figures

1. Download data.zip from here and data_dpo.zip from here.
2. Unzip the files into src/language_model_experiments/data and src/language_model_experiments/data_dpo, respectively.
3. Download the cached intermediate outputs .cache.zip from here and .cache_dpo.zip from here.
4. Unzip the files into src/language_model_experiments/.cache and src/language_model_experiments/.cache_dpo, respectively.
5. Generate the figures with:

bash create_figures.sh
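The unzip steps above can also be scripted. The following is a sketch only: the helper `extract_archives` and the `download_dir` and `repo_root` arguments are hypothetical, and it assumes each archive stores its files at the top level rather than inside a wrapping folder, so check the resulting layout after extraction.

```python
import zipfile
from pathlib import Path

# Archive name -> destination directory, as listed in the steps above.
ARCHIVES = {
    "data.zip": "src/language_model_experiments/data",
    "data_dpo.zip": "src/language_model_experiments/data_dpo",
    ".cache.zip": "src/language_model_experiments/.cache",
    ".cache_dpo.zip": "src/language_model_experiments/.cache_dpo",
}


def extract_archives(download_dir, repo_root, archives=ARCHIVES):
    """Extract each archive found in download_dir; return the names that were missing."""
    missing = []
    for name, destination in archives.items():
        archive_path = Path(download_dir) / name
        if not archive_path.is_file():
            missing.append(name)
            continue
        target = Path(repo_root) / destination
        target.mkdir(parents=True, exist_ok=True)
        with zipfile.ZipFile(archive_path) as zf:
            zf.extractall(target)
    return missing
```

Calling `extract_archives("path/to/downloads", ".")` from the repository root returns the archives it could not find, which makes an incomplete download easy to spot before running create_figures.sh.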

Option 2: Run the experiment from scratch

Generate all 25 corpora using both the RLHF-tuned and the DPO-tuned model:

bash generate.sh

This should produce 50 .csv files of generated text.
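A quick way to confirm the expected file count is sketched below. The directory that generate.sh writes to is not stated here, so `output_dir` is a placeholder you must supply, and the helper name `count_csv_files` is hypothetical. The search is recursive in case the two models write to separate subfolders.

```python
from pathlib import Path

EXPECTED_CSV_COUNT = 50  # 25 corpora x 2 models, per the text above


def count_csv_files(output_dir):
    """Count .csv files anywhere under output_dir."""
    return sum(1 for p in Path(output_dir).rglob("*.csv") if p.is_file())
```

Comparing `count_csv_files(output_dir)` against `EXPECTED_CSV_COUNT` before scoring catches a partially completed generation run early.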

Score all 25 corpora with:

bash score_with_correction.sh

Comment out the files and model settings as needed before running the script.

For every .csv file, this should produce three files containing the reward and log-probabilities of the generated strings. When this is done, move the files for the RLHF-tuned model to src/language_model_experiments/data and the files for the DPO-tuned model to src/language_model_experiments/data_dpo.
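The move step can be scripted along these lines. This is a sketch under assumptions: the helper `move_matching` is hypothetical, and because the naming scheme of the scored files is not described here, the glob `pattern` that separates RLHF outputs from DPO outputs must be supplied by you.

```python
import shutil
from pathlib import Path


def move_matching(source_dir, dest_dir, pattern):
    """Move files in source_dir whose names match the glob pattern into dest_dir."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    moved = []
    # sorted() materializes the listing before any file is moved
    for path in sorted(Path(source_dir).glob(pattern)):
        if path.is_file():
            shutil.move(str(path), str(dest / path.name))
            moved.append(path.name)
    return moved
```

The function returns the names it moved, so you can check that each destination received the files you expected.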

Run the independent Metropolis-Hastings algorithm and generate the figures with:

bash create_figures.sh

tanyjnaaman/probability-quality-paradox
