# Set up
Run the cell below to install the necessary packages.


In [None]:
# Run this cell to install the necessary packages (may take a few minutes)
!rm -rf inverse-scaling-eval-pipeline
!git clone -b main --single-branch https://github.com/naimenz/inverse-scaling-eval-pipeline.git
!pip install git+https://github.com/naimenz/inverse-scaling-eval-pipeline.git@main &> /dev/null

# Workaround: pinning matplotlib to 3.1.3 fixes an import error that prevents plots from rendering in the notebook
# https://stackoverflow.com/questions/64862818/cannot-import-name-png-from-matplotlib
%matplotlib inline
!python -m pip uninstall matplotlib -y
!pip install matplotlib==3.1.3 &> /dev/null

# Running
You'll need to provide an [OpenAI API key](https://openai.com/blog/api-no-waitlist/) in the cell below (replace `sk-XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX` with your key).


IMPORTANT: Don't put quotes around your key. If you get your key wrong, you will need to go to `Runtime > Restart runtime` and run all your cells again.

In [None]:
%env OPENAI_API_KEY=sk-XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
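
Before launching a run, it can help to confirm that the key was set the way the note above describes. The following is a minimal sketch and not part of the pipeline: `check_api_key` is a hypothetical helper, and the `sk-` prefix check is an assumption based on the placeholder shown above. It never prints the key itself.

```python
import os


def check_api_key(key):
    """Return a list of problems found with the key (empty if none)."""
    problems = []
    if not key:
        problems.append("OPENAI_API_KEY is not set")
        return problems
    if key[0] in "'\"" or key[-1] in "'\"":
        problems.append("key is wrapped in quotes")
    bare = key.strip().strip("'\"")
    if not bare.startswith("sk-"):
        # Assumption: keys begin with "sk-", as in the placeholder above
        problems.append("key does not start with 'sk-'")
    elif set(bare[3:]) == {"X"}:
        problems.append("key is still the placeholder")
    return problems


issues = check_api_key(os.environ.get("OPENAI_API_KEY", ""))
print("Key looks OK" if not issues else "; ".join(issues))
```

If the check reports a problem, fix the `%env` line, restart the runtime as described above, and run the cells again.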

# A GPT-3 Run
Choose an evaluation metric from `classification`, `sequence_prob`, `logodds`, `absolute_logodds`, and `classification_acc`.

To upload a file to Colab, click `Files` (the folder icon) in the left sidebar.
Then click `Upload to session storage` (the file with an arrow icon) and choose your `.csv` from your computer.
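
After uploading, a quick check that the file is where the run cell expects it can save a failed run. This is a minimal sketch using only the standard library. `summarize_csv` is a hypothetical helper, and the path should match whatever you enter as `file_name` in the run cell. It reports the header and row count without assuming any particular column names.

```python
import csv
import os


def summarize_csv(path):
    """Return (header, number_of_data_rows) for the CSV at `path`."""
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        header = next(reader, [])  # first row, or [] for an empty file
        n_rows = sum(1 for _ in reader)
    return header, n_rows


path = "csvs/final_first_round.csv"  # replace with your uploaded file's path
if os.path.exists(path):
    header, n_rows = summarize_csv(path)
    print(f"{n_rows} rows, columns: {header}")
else:
    print(f"{path} not found; check the Files sidebar")
```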

Descriptions of the evaluation metrics can be found in [this section of the README](https://github.com/inverse-scaling/prize).

You can try running on the InstructGPT models to see if your task's inverse scaling is robust to RLHF. These models are called `text-ada-001`, `text-babbage-001`, `text-curie-001`, and `text-davinci-001`.

NOTE: For most metrics, an inverse scaling trend looks like a line that goes up (i.e. increasing loss with model size). For accuracy, an inverse scaling trend looks like a line that goes down (i.e. decreasing accuracy with model size).
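
To make the note above concrete, the sketch below classifies a sequence of per-model scores ordered from smallest to largest model. The function name `scaling_trend` and the example numbers are illustrative only and do not come from the pipeline. Real curves are often non-monotonic, which the sketch reports as "mixed".

```python
def scaling_trend(scores, higher_is_better=False):
    """Classify scores ordered from smallest to largest model.

    For loss-like metrics (higher_is_better=False), rising scores mean
    inverse scaling. For accuracy (higher_is_better=True), falling scores do.
    """
    if len(scores) < 2:
        return "undetermined"
    diffs = [b - a for a, b in zip(scores, scores[1:])]
    if higher_is_better:
        diffs = [-d for d in diffs]  # flip so that "worse" is always positive
    if all(d > 0 for d in diffs):
        return "inverse scaling"
    if all(d < 0 for d in diffs):
        return "standard scaling"
    return "mixed"


# Made-up numbers for ada, babbage, curie, davinci (illustration only)
print(scaling_trend([0.61, 0.68, 0.74, 0.90]))                         # loss rising
print(scaling_trend([0.72, 0.65, 0.51, 0.40], higher_is_better=True))  # accuracy falling
```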

In [None]:
import sys
sys.path.append('inverse-scaling-eval-pipeline/')

#@title Running GPT-3 and plotting the results { display-mode: "form" }
evaluation_metric = "classification" #@param ["classification", "sequence_prob", "logodds", "absolute_logodds", "classification_acc"]
file_name = "csvs/final_first_round.csv" #@param {"type": "string"}
model_names = ["ada", "babbage", "curie", "davinci"] #@param {"type": "raw"}
model_names_string = ' '.join(model_names)

#@markdown Once you've specified an evaluation metric, file name, and models, run this cell.
%run inverse-scaling-eval-pipeline/eval_pipeline/main.py \
  --dataset-path "$file_name" \
  --exp-dir results \
  --models $model_names_string \
  --task-type $evaluation_metric \
  --batch-size 100

# Use %run rather than !python: !python starts a separate process, so its results (including plots) are not available in the notebook kernel
%run inverse-scaling-eval-pipeline/eval_pipeline/plot_loss.py \
  results \
  --task-type $evaluation_metric