In [None]:
!pip install transformers==4.48.3
!pip install datasets==3.2.0
!pip install torch==2.5.1
!pip install numpy==1.25.0
!pip install pandas==2.2.2
!pip install peft==0.10.0
!pip install trl==0.14.0
!pip install huggingface-hub==0.26.1
!pip install google-generativeai==0.8.4
!pip install tqdm==4.67.7


In [None]:
%load_ext autoreload
%autoreload 2

In [117]:
import submission
import dpo
import pandas as pd

# Part 1: Generate Completions

Data quality is central to RLHF, and to generative AI in general. Training on synthetic, model-generated data runs against conventional statistical practice, yet it has turned out to be a powerful tool for training generative AI models.

In this assignment, you will generate completions for IMDB movie reviews. You are given a dataset of over 500 movie reviews, each truncated to its first four words. Your goal is to use DPO to train a model that continues those four words with a positive-sentiment completion.

Your task for Part 1 is as follows:

0. Download the `pretrained.zip` file from the assignment files and unzip it. Unzipping should produce `sft_models`. Make sure it is in your working directory.
1. Fill out the `huggingface_key()` and `gemini_api_key()` functions in the `submission.py` file so that the notebook can make the required API calls.
2. Fill out the `pair_generator()` function in the `submission.py` file. Your goal is to prompt Gemini to generate 1) a positive sentiment completion and 2) a negative sentiment completion for the first four words of each review. Be sure to instruct the model to keep the reviews short, 1-2 sentences at most.
3. Run the cells below to generate the completions. It is up to you to determine how many completions to generate. Generally, more data is better. 
4. Once the completions are generated, they will be saved to a csv file called `imdb_completions.csv`. Gemini will most likely return completions that repeat the first 4 words of the input text. Those 4 words must be removed from each completion so that it is in the correct format. A function called `fix_completions()` is provided in `submission.py` to do this. Run it to fix the completions; it saves the result to a csv file called `imdb_completions_edited.csv`.
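
To make step 4 concrete, here is a minimal sketch of the kind of prefix removal it describes. `strip_prompt_prefix` is a hypothetical helper written for illustration; it is not the provided `fix_completions()`, whose actual implementation may differ.

```python
def strip_prompt_prefix(prompt: str, completion: str, n_words: int = 4) -> str:
    """Drop the leading words of `completion` when they repeat `prompt`.

    Hypothetical illustration only; not the assignment's fix_completions().
    """
    prompt_words = prompt.split()[:n_words]
    completion_words = completion.split()

    def normalize(word: str) -> str:
        # Compare case-insensitively and ignore surrounding punctuation.
        return word.strip(".,!?;:\"'").lower()

    head = [normalize(w) for w in completion_words[:len(prompt_words)]]
    if head == [normalize(w) for w in prompt_words]:
        # The completion echoes the prompt, so keep only what follows it.
        return " ".join(completion_words[len(prompt_words):])
    # The completion does not start with the prompt, so leave it as is.
    return completion.strip()
```

With this sketch, the prompt `"This movie was a"` and the completion `"This movie was a delight from start to finish."` yield `"delight from start to finish."`, while a completion that does not echo the prompt is returned unchanged.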

If you have successfully completed Part 1, you should now have 2 csv files in your working directory: `imdb_completions.csv` and `imdb_completions_edited.csv` populated with an `Input_Text`, `Accepted_Completion`, and `Rejected_Completion` column.
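
A quick way to confirm that both files have the columns listed above is to inspect their header rows. The sketch below uses only the standard library; `missing_columns` is a hypothetical helper and is not part of `submission.py`.

```python
import csv

# Column names taken from the Part 1 description above.
EXPECTED_COLUMNS = {"Input_Text", "Accepted_Completion", "Rejected_Completion"}


def missing_columns(csv_path: str) -> set:
    """Return the expected column names that are absent from the header row."""
    with open(csv_path, newline="", encoding="utf-8") as handle:
        header = next(csv.reader(handle), [])
    return EXPECTED_COLUMNS - set(header)
```

Calling `missing_columns("imdb_completions.csv")` and `missing_columns("imdb_completions_edited.csv")` should each return an empty set if Part 1 completed as described. This only checks the header, not whether every row is populated.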

In [114]:
# verify your keys are set
assert submission.huggingface_key() != "", "You need to set your huggingface key."
assert submission.gemini_api_key() != "", "You need to set your gemini api key."

In [None]:
# Number of completions to generate; change this as needed (maximum 512).
# Generally, more data is better.
n = 100
df = pd.read_csv("imdb_kernels.csv")
df = df.iloc[:n]
submission.dataset_generator(df)

In [116]:
submission.fix_completions()

# Part 2: Train DPO
Now that you have the generated completions, you can train a DPO model.

Your task for Part 2 is as follows:

1. Fill out the `MyDPOConfig` class in the `submission.py` file. There are two parameters you can tune: `learning_rate` and `beta`.

2. After filling out these values, run the cell below to train the DPO model.
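
To build intuition for `beta`, the sketch below computes the standard DPO loss for a single preference pair from sequence log-probabilities. It is an illustration in plain Python, under the assumption that `dpo.train_dpo` optimizes this standard objective; the function name, argument names, and the default `beta=0.1` are illustrative and not taken from `dpo.py`.

```python
import math


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Standard DPO loss for one (chosen, rejected) pair: -log(sigmoid(margin))."""
    # How much more the policy favors each completion than the reference does.
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_logratio - rejected_logratio)
    # -log(sigmoid(margin)) = log(1 + exp(-margin)), in a numerically stable form.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))
```

When the policy matches the reference, the margin is zero and the loss is `log(2)`. The loss falls as the policy raises the accepted completion's likelihood relative to the rejected one, and `beta` scales that margin. In the DPO formulation, a larger `beta` corresponds to a stronger pull toward the reference model.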


In [90]:
config = submission.MyDPOConfig()
original_model, tokenizer = dpo.load_model(config)
trained_model = dpo.train_dpo(config)

# Part 3: Evaluate DPO

Your deliverable is the trained DPO model. It should have been trained in the cell above and the model should be in a variable called `trained_model`.

You will be scored on how positive your generated completions are.

To get a sense of how good your model is, you can run the cells below. They generate completions for the first 20 reviews in the dataset, display the first two from each model, and report the positive completion rate.

We will be scoring you on a holdout set of 100 reviews, so the scores you see below may not be indicative of your final score. It is up to you to determine how thoroughly you want to evaluate your model.


In [None]:
# this cell pushes your trained model to the hub; the pushed model is what will be graded.
trained_model.push_to_hub(submission.hub_model_name(), token=submission.huggingface_key())

In [109]:
# this cell generates completions using your DPO trained model and the base model from pretrained.zip
dpo_completions = []
original_model_completions = []
for _, row in df.iloc[:20].iterrows():
    prompt = row['Input_Text']
    dpo_completions.append(dpo.run_inference(config, trained_model, prompt))
    original_model_completions.append(dpo.run_inference(config, original_model, prompt))


In [None]:
# this cell displays the first two completions from each model
completions = {'DPO': dpo_completions[:2], 'Original Model': original_model_completions[:2]}
pd.set_option('display.max_colwidth', None)  # Allow full string display
pd.DataFrame(completions)


In [None]:
# this cell classifies the completions as positive or negative and gives the percentage of positive completions
classes = submission.batch_classification(dpo_completions)
print(f"DPO Positive Completion Rate: {sum([c == 'POSITIVE' for c in classes]) / len(classes)}")

classes = submission.batch_classification(original_model_completions)
print(f"Original Model Positive Completion Rate: {sum([c == 'POSITIVE' for c in classes]) / len(classes)}")


## Grading Rubric:

Your grade is based on how positive your DPO generated completions are.
- Positive completion rate >= 0.95: Score: 1.0
- Positive completion rate >= 0.85: Score: 0.9
- Positive completion rate >= 0.7: Score: 0.5
- Positive completion rate >= 0.6: Score: 0.4
- Positive completion rate >= 0.5: Score: 0.2
- Positive completion rate < 0.5: Score: 0.0
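
The rubric can be read as a lookup from the highest threshold down. The sketch below is a hypothetical helper for estimating your own score; it assumes the first threshold met, checked from the top, determines the score, and it is not the actual grading code.

```python
def rubric_score(positive_rate: float) -> float:
    """Map a positive completion rate to a score using the rubric above."""
    # (minimum rate, score) pairs, checked from the highest threshold down.
    thresholds = [(0.95, 1.0), (0.85, 0.9), (0.7, 0.5), (0.6, 0.4), (0.5, 0.2)]
    for minimum_rate, score in thresholds:
        if positive_rate >= minimum_rate:
            return score
    return 0.0
```

Note the large step between the 0.7 and 0.85 thresholds: a rate of 0.84 maps to 0.5, while 0.85 maps to 0.9.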

