# Run Fine-tune CoT on OpenAI using our `oai` module

This notebook contains code to (1) generate reasoning samples from teacher models (e.g., GPT-3 175B `text-davinci-002`), (2) fine-tune student models (e.g., GPT-3 0.3B `ada`) and (3) generate and evaluate samples from fine-tuned student models.

- To run from scratch, first download and save original benchmark data (see README).
- To use existing teacher-generated samples, first download and save the original benchmark data and the teacher completion data (see README). Then make sure the `completion_key` in the code below is `zs_cot` rather than `zs_cot_test` (the cells in this notebook already use `zs_cot`).

### TODO: Set OpenAI Key

Create an account on OpenAI and retrieve your API key. Experiments will incur fees on your OpenAI account.

In [1]:
import openai
openai.api_key = ""
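
To avoid committing a key to the notebook, it can be read from the environment instead. The following is a minimal sketch, not part of the `oai` module: `load_api_key` is a hypothetical helper, and `OPENAI_API_KEY` is assumed to be the variable name you exported in your shell.

```python
import os


def load_api_key(env=os.environ, var_name="OPENAI_API_KEY"):
    """Return the API key stored in `env[var_name]`, or raise if it is missing.

    `load_api_key` is a hypothetical helper, not part of the `oai` module.
    """
    key = env.get(var_name, "").strip()
    if not key:
        raise RuntimeError(
            f"Set the {var_name} environment variable before running the notebook."
        )
    return key


# In the notebook this would replace the hard-coded empty string:
# openai.api_key = load_api_key()
```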

### Imports and Parameters

In [4]:
from data.completion_dataset import CompletionMetadata, CompletionDataset
from oai.inference import infer_completion_data

In [63]:
#teacher_base_model = "text-davinci-002"  # GPT-3 (175B)
teacher_base_model = "gpt-3.5-turbo-instruct"
#base_model = "ada"                       # GPT-3 (0.3B)
#base_model = "babbage"                   # GPT-3 (1.3B)
base_model = "curie"                     # GPT-3 (6.7B)
#dataset_key = "multiwd"
dataset_key = "lrf"
# dataset_key = "multiwdsmall"

## Infer teacher completions using OpenAI (generate CompletionDataset)

In [25]:
# Note, completion_key identifies the method used to generate completions
# Note, prediction_template selects the prediction template from those pre-defined in
#       `oai.data.format.Formatter`.
completion_metadata = CompletionMetadata(base_model=teacher_base_model, completion_key="zs_cot",
                                         dataset_key=dataset_key, prediction_template="zs_cot")

In [5]:
from data.split import load_train_test_split 
train, test = load_train_test_split(dataset_key)

In [6]:
# Run Zero-shot-CoT step 1 (rationale generation)
# Note, sample_indices=None would infer on all samples; here we pass the training split
completion_dataset = infer_completion_data(completion_metadata, zs_cot_step=1,
                                           sample_indices=train, augs=1, temperature=0,
                                           max_tokens=300)

Loaded 55184 samples from:
/home/azureuser/reasoning-teacher/saved/completion_data/B_gpt-3.5-turbo-instruct__C_zs_cot/D_lrf.json
All 3449 samples have been completed.


In [15]:
# Run Zero-shot-CoT step 2 (answer)
completion_dataset = infer_completion_data(completion_metadata, zs_cot_step=2,
                                           sample_indices=train, augs=1, temperature=0,
                                           max_tokens=20)

Loaded 4900 samples from:
/home/azureuser/reasoning-teacher/saved/completion_data/B_gpt-3.5-turbo-instruct__C_zs_cot/D_multiwd.json
Inferring completions for 520 remaining samples (total=4900)


Inferring completions via OpenAI: 100%|██████████| 520/520 [00:32<00:00, 15.78it/s]
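
The two `zs_cot_step` calls above follow the two-stage Zero-shot-CoT recipe: step 1 asks the teacher for a rationale, and step 2 appends that rationale and asks for the final answer. The sketch below illustrates the idea with the trigger phrases from the Zero-shot-CoT paper (Kojima et al., 2022). The function names are hypothetical, and the templates actually used by `oai.data.format.Formatter` may be worded differently.

```python
# Trigger phrases from the Zero-shot-CoT paper; the templates in
# `oai.data.format.Formatter` may differ (assumption).
REASONING_TRIGGER = "Let's think step by step."
ANSWER_TRIGGER = "Therefore, the answer is"


def build_step1_prompt(question):
    """Step 1: ask the model to produce a rationale for the question."""
    return f"Q: {question}\nA: {REASONING_TRIGGER}"


def build_step2_prompt(question, rationale):
    """Step 2: append the generated rationale and ask for the final answer."""
    return f"{build_step1_prompt(question)} {rationale.strip()}\n{ANSWER_TRIGGER}"


step1 = build_step1_prompt("Is 7 a prime number?")
step2 = build_step2_prompt("Is 7 a prime number?", " 7 has no divisors other than 1 and 7. ")
print(step2)
```

This also explains the different `max_tokens` values above: the rationale in step 1 needs room (300 tokens), while the answer in step 2 is short (20 tokens).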


## Load CompletionDataset and evaluate teacher completions (training split)

In [6]:
from data.completion_dataset import CompletionIdentifier
from data.split import load_train_test_split 
from evaluation.evaluator import Evaluator
from evaluation.summary import summarize_evaluation 

In [28]:
completion_identifier = CompletionIdentifier(teacher_base_model, "zs_cot", dataset_key)
completion_dataset = CompletionDataset.load(completion_identifier)
# Note, completion_metadata can be used instead of completion_identifier, as shown below
# completion_dataset = CompletionDataset.load(completion_metadata)
train, test = load_train_test_split(dataset_key)

In [29]:
evaluator = Evaluator.for_completion_dataset(completion_dataset)
evaluation = evaluator.evaluate_completion_dataset(completion_dataset, train)

In [11]:
evaluation.head()

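| | sample_index | completion_index | correct | contains_answer | correct_format | complete |
|---|---|---|---|---|---|---|
| 0 | 0 | 0 | True | True | False | True |
| 1 | 0 | 1 | True | True | False | True |
| 2 | 0 | 2 | True | True | False | True |
| 3 | 0 | 3 | True | True | False | True |
| 4 | 0 | 4 | True | True | False | True |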


In [30]:
summarize_evaluation(evaluation)

{'accuracy': 0.565765306122449,
 'contains_answer': 0.565765306122449,
 'correct_format': 1.0,
 'complete': 0.9997831632653061}
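
Each summary value reads as the fraction of evaluated completions for which the corresponding boolean column is true. The sketch below shows that computation on plain dictionaries. It assumes `summarize_evaluation` averages each column and reports `correct` as `accuracy`; the real implementation may differ, and `summarize` is a hypothetical stand-in.

```python
def summarize(rows, metrics=("correct", "contains_answer", "correct_format", "complete")):
    """Average each boolean column over all rows; `correct` is reported as `accuracy`.

    Hypothetical stand-in for `summarize_evaluation`, for illustration only.
    """
    if not rows:
        raise ValueError("cannot summarize an empty evaluation")
    summary = {}
    for metric in metrics:
        name = "accuracy" if metric == "correct" else metric
        summary[name] = sum(bool(row[metric]) for row in rows) / len(rows)
    return summary


rows = [
    {"correct": True, "contains_answer": True, "correct_format": True, "complete": True},
    {"correct": False, "contains_answer": True, "correct_format": True, "complete": True},
    {"correct": True, "contains_answer": True, "correct_format": True, "complete": False},
    {"correct": False, "contains_answer": False, "correct_format": True, "complete": True},
]
print(summarize(rows))
```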

## Create fine-tune `File` and `Finetune` using training set

In [7]:
from oai.finetune import init_finetune, generate_finetune_data_from_completion_dataset
from oai.utils.api_wrapper import fetch_model_ids

In [8]:
# Note, this cell already uses "zs_cot", which loads our teacher-generated completions (see README for how to download).
# If you generated your own completions under "zs_cot_test", use that key instead.
completion_identifier = CompletionIdentifier(teacher_base_model, "zs_cot", dataset_key)
completion_dataset = CompletionDataset.load(completion_identifier)
train, test = load_train_test_split(dataset_key)

In [54]:
#finetune_key = "zs_cot_{}".format(dataset_key)
#train_key = "ft_cot"
#finetune_key = "zs_cot_auto_j_rate_6_{}".format(dataset_key)
#finetune_key = "zs_cot_auto_j_rate_5_{}".format(dataset_key)
#finetune_key = "zs_cot_8_shots_{}".format(dataset_key)
#finetune_key = "zs_cot_16_shots_{}".format(dataset_key)
finetune_key = "zs_cot_baseline_{}".format(dataset_key)

In [55]:
finetune_key

'zs_cot_baseline_lrf'

In [None]:
# Note, finetune_key is a unique identifier for the finetuning data and should contain the source dataset
generate_finetune_data_from_completion_dataset(completion_dataset=completion_dataset,
                                               prediction_template="ft_cot_token",
                                               finetune_key=finetune_key,
                                               sample_indices=train,
                                               only_correct=True,  # default
                                              )

In [58]:
# Inspect finetune data
import json
from paths import get_finetune_data_path
with open(get_finetune_data_path("openai", finetune_key)) as f:
    print(json.dumps(json.loads(f.readline()), indent=4))

{
    "prompt": "All my life i've been going through shit (only 17 years old) and when things started to get better i crashed. I can't get myself to get out of bed no matter how much i try, my family understands but do still not approve since my grades dropped from all A's to E-C. It has been like this for 1-2 years now and none of my friends understands how It's like, I can't really blame them either since I don't like talking about it and i've always been taught to be a man and keep this stuff to myself. They just see a lazy fuck who is too irresponsible to go too school, same with my teachers. Idk if typing here is going to help at all but if anyone has some tips/advice on how to get motivated again i would be super happy. Does the post shows risk of thwarted belongingness? You only need to answer yes/no. ###",
    "completion": " --> yes END"
}
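
The record above suggests the shape of each fine-tuning line: the prompt ends with a ` ###` separator, and the completion carries the answer between ` -->` and ` END`. The sketch below reproduces that shape and the effect of `only_correct=True`. The helper names and the `question`/`answer`/`correct` field names are hypothetical, and the actual `ft_cot_token` template may include additional content such as the rationale.

```python
import json

# Separators inferred from the record printed above (assumption).
PROMPT_END = " ###"
ANSWER_MARK = "-->"
COMPLETION_END = "END"


def make_record(question, answer):
    """Build one prompt/completion pair in the shape shown above."""
    return {
        "prompt": f"{question}{PROMPT_END}",
        "completion": f" {ANSWER_MARK} {answer} {COMPLETION_END}",
    }


def extract_answer(completion):
    """Recover the answer from a completion, or None if the marker is missing."""
    if ANSWER_MARK not in completion:
        return None
    tail = completion.split(ANSWER_MARK, 1)[1]
    return tail.split(COMPLETION_END, 1)[0].strip()


def build_finetune_lines(samples, only_correct=True):
    """Serialize samples as JSONL lines, optionally keeping only correct ones."""
    kept = [s for s in samples if s["correct"] or not only_correct]
    return [json.dumps(make_record(s["question"], s["answer"])) for s in kept]


samples = [
    {"question": "Q1?", "answer": "yes", "correct": True},
    {"question": "Q2?", "answer": "no", "correct": False},
]
for line in build_finetune_lines(samples):
    print(line)
```

Filtering on correctness keeps the student from being trained on teacher completions that reached the wrong answer.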


In [56]:
train_key = "IRF_BASE"
#train_key = "MultiWD_BASE"

In [80]:
# Note, train_key identifies the method used to train the model, i.e., the method used to fine-tune the base model.
init_finetune(finetune_key, base_model, dataset_key, train_key)

Created OpenAI finetune `B_curie__D_lrf__T_IRF_BASE`: `ft-FbA1zT4l8KlXckX5BFhMy9XF`


'B_curie__D_lrf__T_IRF_BASE'

### Fetch fine-tuned `Model` id

Call this function repeatedly until your `Finetune` has finished. Fine-tuning typically takes between 5 minutes and 1 hour.

In [97]:
fetch_model_ids()

Fetching model ids from 8 finetunes
----------------------------------------------------------------------------------------------------
model_key                                                                       status              
----------------------------------------------------------------------------------------------------
B_curie__D_multiwd__T_multiwd                                                   failed              
B_curie__D_lrf__T_ft_cot_baseline                                               failed              
B_ada__D_lrf__T_ft_cot_baseline2                                                failed              
B_curie__D_lrf__T_ft_cot_baseline2                                              failed              
B_ada__D_lrf__T_ft_cot_baseline3                                                failed              
B_ada__D_lrf__T_ft_cot_baseline4                                                failed              
B_curie__D_lrf__T_IRF_16_Shots                         

False
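
Rather than re-running the cell by hand, the check can be wrapped in a polling loop. The sketch below is generic and does not depend on the `oai` module: `poll_until_done` is a hypothetical helper, and it assumes that `fetch_model_ids()` returns `False` while fine-tunes are still pending (as in the output above) and a truthy value once all model ids are available.

```python
import time


def poll_until_done(check, interval_seconds=60, timeout_seconds=3600, sleep=time.sleep):
    """Call `check()` until it returns a truthy value or the timeout elapses.

    Returns True if `check()` succeeded and False if the timeout was reached.
    """
    waited = 0
    while True:
        if check():
            return True
        if waited >= timeout_seconds:
            return False
        sleep(interval_seconds)
        waited += interval_seconds


# In the notebook this might be used as (assumption about the return value):
# finished = poll_until_done(fetch_model_ids, interval_seconds=60, timeout_seconds=3600)
```

The timeout matters here because a fine-tune with status `failed`, like several in the table above, will never complete.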

### Access OpenAI metadata

We use metadata files to map our identifiers (keys) to the identifiers (ids) used by OpenAI objects.
These can be accessed manually, as follows.

In [98]:
from oai.utils.metadata import get_file_id, get_finetune_id, get_model_id, get_model_key

In [100]:
#base_model = "ada"
#base_model = "babbage"
base_model = "curie"
dataset_key = "lrf"
#dataset_key = "multiwd"
#train_key = "MultiWD"
#train_key = "IRF"
#train_key = "IRF_Auto_J_Rate_6"
#train_key = "IRF_Auto_J_Rate_5"
#train_key = "MultiWD_Auto_J_Rate_6"
#train_key = "IRF_8_Shots"
#train_key = "MultiWD_8_Shots"
#train_key = "IRF_16_Shots"
#train_key = "MultiWD_16_Shots"
#train_key = "MultiWD_BASE"
train_key = "IRF_BASE"
#finetune_key = "zs_cot_{}".format(dataset_key)
#finetune_key = "zs_cot_auto_j_rate_6_{}".format(dataset_key)
#finetune_key = "zs_cot_auto_j_rate_5_{}".format(dataset_key)
#finetune_key = "zs_cot_8_shots_{}".format(dataset_key)
#finetune_key = "zs_cot_16_shots_{}".format(dataset_key)
finetune_key = "zs_cot_baseline_{}".format(dataset_key)

In [101]:
# Note that `base_model`, `dataset_key`, `train_key` are joined together to form a `model_key` which
# identifies fine-tuned models. There is a one-to-one-to-one mapping between a model_key, Finetune object,
# and Model object.

model_key = get_model_key(base_model, dataset_key, train_key)

In [102]:
model_key

'B_curie__D_lrf__T_IRF_BASE'
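
The key printed above follows a `B_<base_model>__D_<dataset_key>__T_<train_key>` pattern. The sketch below reproduces that pattern and its inverse. Both functions are hypothetical illustrations inferred from the output, not the implementation of `get_model_key`, and they assume that no component contains a double underscore.

```python
def make_model_key(base_model, dataset_key, train_key):
    """Join the three components in the format seen in the notebook output."""
    return f"B_{base_model}__D_{dataset_key}__T_{train_key}"


def parse_model_key(model_key):
    """Split a model key back into its components (inverse of make_model_key)."""
    parts = model_key.split("__", 2)
    prefixes = ("B_", "D_", "T_")
    names = ("base_model", "dataset_key", "train_key")
    if len(parts) != 3 or not all(p.startswith(pre) for p, pre in zip(parts, prefixes)):
        raise ValueError(f"unexpected model key format: {model_key!r}")
    return {name: part[len(pre):] for name, part, pre in zip(names, parts, prefixes)}


key = make_model_key("curie", "lrf", "IRF_BASE")
print(key)
print(parse_model_key(key))
```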

In [87]:
# Note that our `finetune_key` identifies the fine-tuning data and is therefore mapped to a File object
# rather than a Finetune object.
get_file_id(finetune_key)

'file-3u1xRGpDm8I8udvVoUzoBqEd'

In [104]:
get_finetune_id(model_key)

'ft-FbA1zT4l8KlXckX5BFhMy9XF'

In [103]:
get_model_id(model_key)  # fetched by `fetch_model_ids()`

'curie:ft-personal-2023-11-22-09-14-39'

## Infer student completions

We only infer test set samples for evaluation.

In [105]:
# Note, completion_key is "zs_cot_final" and train_key is the value set above (here "IRF_BASE").
# Recall that completion_key refers to the method used to generate completions by the student model,
# and train_key refers to the method used to train the student model.
completion_metadata = CompletionMetadata(base_model=base_model, completion_key="zs_cot_final",
                                         dataset_key=dataset_key, finetune_key=finetune_key,
                                         prediction_template="ft_cot_token",
                                         train_key=train_key, epoch=None)
train, test = load_train_test_split(dataset_key)

In [106]:
# Note, `infer_completion_data` will find our new student model (that we fetched above) by using
#       `base_model`, `dataset_key`, and `train_key`, which are specified in `completion_metadata`.
completion_dataset = infer_completion_data(completion_metadata, zs_cot_step=None,
                                           sample_indices=test, augs=1, temperature=0,
                                           max_tokens=1024)  # note, we use 1024 tokens for student inference

Initializing new CompletionDataset at:
/home/azureuser/reasoning-teacher/saved/completion_data/B_curie__C_zs_cot_final/D_lrf__T_IRF_BASE.json
Inferring completions for 1479 remaining samples (total=1479)


Inferring completions via OpenAI: 100%|██████████| 1479/1479 [00:27<00:00, 53.17it/s]
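
The log messages in this notebook ("Inferring completions for N remaining samples", "All N samples have been completed.") suggest that inference is resumable: only samples without a stored completion are sent to the API. The sketch below illustrates that bookkeeping. It is an assumption about the behaviour, not the code of `infer_completion_data`, and both function names are hypothetical.

```python
def remaining_indices(sample_indices, completed):
    """Return the requested indices that have no stored completion yet, in order."""
    done = set(completed)
    return [index for index in sample_indices if index not in done]


def status_message(sample_indices, completed):
    """Mimic the progress messages printed in the notebook output (assumed logic)."""
    todo = remaining_indices(sample_indices, completed)
    if not todo:
        return f"All {len(sample_indices)} samples have been completed."
    return f"Inferring completions for {len(todo)} remaining samples (total={len(sample_indices)})"


requested = [0, 1, 2, 3, 4]
print(status_message(requested, completed=[0, 1, 2]))
print(status_message(requested, completed=[0, 1, 2, 3, 4]))
```

Under this reading, re-running an inference cell after an interruption only pays for the samples that are still missing.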


## Evaluate student completions

In [107]:
completion_identifier = CompletionIdentifier(base_model, completion_key="zs_cot_final", dataset_key=dataset_key,
                                             train_key=train_key)
completion_dataset = CompletionDataset.load(completion_identifier)
train, test = load_train_test_split(dataset_key)

In [108]:
evaluator = Evaluator(dataset_key, "ft_cot_token")
evaluation = evaluator.evaluate_completion_dataset(completion_dataset, test)

In [109]:
summarize_evaluation(evaluation)

{'accuracy': 0.8174442190669371,
 'contains_answer': 0.8174442190669371,
 'correct_format': 1.0,
 'complete': 1.0}
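
The reported accuracy can be converted back into a count of correct predictions. The sketch below assumes one completion per test sample (consistent with `augs=1`) and uses the 1479 test samples reported during student inference; `correct_count` is a hypothetical helper.

```python
def correct_count(accuracy, n_samples):
    """Convert an accuracy fraction back into a whole number of correct samples."""
    return round(accuracy * n_samples)


n_test = 1479                    # test samples inferred by the student model
accuracy = 0.8174442190669371    # value reported by summarize_evaluation
n_correct = correct_count(accuracy, n_test)
print(n_correct, n_test - n_correct)  # 1209 270
```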