## Setup

You will need to install the following packages:

- openai
- pandas
- requests
- python-dotenv (provides the `dotenv` module imported below)

You will also need:

- OpenAI account (https://platform.openai.com/)
- OpenAI API key

In [None]:
import json

from openai import OpenAI
import pandas as pd

from dotenv import load_dotenv

In [None]:
# Load Environment Variables
load_dotenv()

In [None]:
# The client reads OPENAI_API_KEY from the environment, so put your OpenAI API key in .env
client = OpenAI()
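
If the key is missing, the first API call fails with an authentication error, so it can help to check the environment right after `load_dotenv()`. The helper below is a minimal sketch; `has_api_key` is a hypothetical name, not part of the `openai` or `dotenv` packages, and it only checks that the variable is set, not that the key is valid.

```python
import os


def has_api_key(env=None, name="OPENAI_API_KEY"):
    """Return True if `name` is set to a non-empty value in `env`.

    `env` defaults to the process environment; passing a dict makes the
    check easy to try out without touching real credentials.
    """
    env = os.environ if env is None else env
    return bool(env.get(name, "").strip())
```

Calling `has_api_key()` after `load_dotenv()` should return `True` when `.env` contains a line such as `OPENAI_API_KEY=...`.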

## Problem Definition: Insurance Support Ticket Classifier

*Note: The problem definition, data, and labels used in this example were synthetically generated using an LLM.*

In the insurance industry, customer support plays a crucial role in ensuring client satisfaction and retention. Insurance companies receive a high volume of support tickets daily, covering a wide range of topics such as billing, policy administration, claims assistance, and more. Manually categorizing these tickets can be time-consuming and inefficient, leading to longer response times and potentially impacting customer experience.

#### Labeled Data

The data can be found in the week-2 `data` folder.

We will use the following datasets (paths are relative to this notebook, matching the loading code below):
- `../data/train.tsv`
- `../data/test.tsv`

In [None]:
training_examples = pd.read_csv('../data/train.tsv', sep='\t')
test_examples = pd.read_csv('../data/test.tsv', sep='\t')

# To avoid leaking information about the test labels into our prompts, the list of possible
# categories is built from the training labels only.
categories = sorted(training_examples['label'].unique().tolist())
categories_str = '\n'.join(categories)

training_tickets = training_examples['text'].tolist()
training_labels = training_examples['label'].tolist()

test_tickets = test_examples['text'].tolist()
test_labels = test_examples['label'].tolist()

In [None]:
training_examples
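
Because the category list is built from the training labels only, any test label that never appears in the training set cannot be predicted correctly. The sketch below shows one way to detect that situation; `unseen_labels` is a hypothetical helper.

```python
def unseen_labels(train_labels, eval_labels):
    """Return the labels that occur in eval_labels but never in train_labels, sorted."""
    return sorted(set(eval_labels) - set(train_labels))
```

In this notebook it could be called as `unseen_labels(training_labels, test_labels)`. An empty list means every test label is covered by `categories`.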

### Dataset Curation

We first must transform our dataset into the format expected by OpenAI, and then upload the dataset. The dataset must conform to the schema expected by the Chat Completions API.

See https://platform.openai.com/docs/guides/fine-tuning/preparing-your-dataset for more details
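
In the chat format, each line of the training file is one JSON object with a `messages` list, and each message has a `role` and a `content` string. The sketch below is a local sanity check for the plain-text conversations used in this notebook. It is not OpenAI's own validation, and `check_chat_record` and `ALLOWED_ROLES` are hypothetical names. It deliberately ignores tool or function-calling messages, which this example does not use.

```python
import json

# Roles used by plain-text chat examples (an assumption for this notebook's data).
ALLOWED_ROLES = {"system", "user", "assistant"}


def check_chat_record(line):
    """Return a list of problems found in one JSONL line (empty list if none)."""
    try:
        record = json.loads(line)
    except json.JSONDecodeError:
        return ["not valid JSON"]

    messages = record.get("messages") if isinstance(record, dict) else None
    if not isinstance(messages, list) or not messages:
        return ["missing or empty 'messages' list"]

    problems = []
    for i, msg in enumerate(messages):
        if not isinstance(msg, dict):
            problems.append(f"message {i} is not an object")
            continue
        if msg.get("role") not in ALLOWED_ROLES:
            problems.append(f"message {i} has unexpected role {msg.get('role')!r}")
        if not isinstance(msg.get("content"), str):
            problems.append(f"message {i} has non-string content")

    # The final message is the target the model should learn to produce.
    last = messages[-1]
    if isinstance(last, dict) and last.get("role") != "assistant":
        problems.append("last message is not from the assistant")
    return problems
```

Running the check over every line of the file before uploading gives a quick signal that the export step produced what was intended.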

In [None]:
def create_prompt(ticket):
    return f"""classify a customer support ticket into one of the following categories:
                <categories>
                {categories_str}
                </categories>

                Here is the customer support ticket:    
                <ticket>{ticket}</ticket>

                Respond using this format:
                <category>The category label you chose goes here</category>
            """    

In [None]:
# Converts the training examples to the format expected by OpenAI.
def training_examples_to_json(examples):
    json_objs = list()
    for _, example in examples.iterrows():  
        user_msg = create_prompt(example['text'])
        asst_msg = f"<category>{example['label']}</category>"
        msg = {"messages": [
            {"role": "user", "content": user_msg}, 
            {"role": "assistant", "content": asst_msg}
        ]}
        json_objs.append(msg)
    
    return json_objs


training_json = training_examples_to_json(training_examples)

In [None]:
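# Writes the data to a JSONL file (one JSON object per line) and then uploads it to OpenAI
dataset_file_name = 'ticket-classification_training_data.jsonl'

with open(dataset_file_name, 'w') as f:
    for obj in training_json:
        json.dump(obj, f)
        f.write('\n')

# Open the file in a context manager so the handle is closed after the upload
with open(dataset_file_name, "rb") as f:
    training_file = client.files.create(
        file=f,
        purpose="fine-tune"
    )

# The returned file ID is needed when creating the fine-tuning job
print(training_file.id)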

### Fine-Tuning

We will now fine-tune a model using the OpenAI API. OpenAI supports creating fine-tuning jobs either through the fine-tuning UI or programmatically. The number of epochs, learning rate multiplier, and batch size can all be tuned for your use case. In this exercise, we leave the batch size and learning rate multiplier on `auto` and set the number of epochs to 5.

See https://platform.openai.com/docs/guides/fine-tuning/create-a-fine-tuned-model for more details

*Note: Uncomment the cell below if you want to fine-tune the model. Be mindful of the cost, and don't use large training datasets.*

In [None]:
# # Creates a training job; batch size and learning rate multiplier are left on "auto", n_epochs is set explicitly
# client.fine_tuning.jobs.create(
#   training_file='file-xxxxxxxxxxxxxxxx', # the file ID that was returned when the training file was uploaded to the OpenAI API.
#   model='gpt-4o-mini-2024-07-18',
#   method = {
#     "type": "supervised",
#     "supervised": {
#       "hyperparameters": {
#         "batch_size": "auto", # can be tuned
#         "learning_rate_multiplier": "auto", # can be tuned
#         "n_epochs": 5, # can be tuned
#       }
#     }
#   }
# )

In [None]:
# List all the fine-tuning jobs
client.fine_tuning.jobs.list()
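
A fine-tuning job runs asynchronously, so the model ID is only available once the job has finished. The sketch below polls a job until it reaches a terminal status. `wait_for_job` is a hypothetical helper, and the set of terminal statuses is an assumption based on the job statuses documented by OpenAI. The `retrieve` argument is passed in rather than hard-coded so the loop can be exercised without calling the API.

```python
import time

# Statuses after which a job no longer changes (assumed from the OpenAI docs).
TERMINAL_STATUSES = {"succeeded", "failed", "cancelled"}


def wait_for_job(retrieve, job_id, poll_seconds=30, sleep=time.sleep):
    """Call `retrieve(job_id)` repeatedly until the job reaches a terminal status.

    Returns the last job object. `sleep` is injectable so the wait can be
    skipped in tests.
    """
    while True:
        job = retrieve(job_id)
        if job.status in TERMINAL_STATUSES:
            return job
        sleep(poll_seconds)
```

With the client from this notebook, a call might look like `job = wait_for_job(client.fine_tuning.jobs.retrieve, 'ftjob-xxxxxxxx')`, where the job ID is a placeholder for the ID returned when the job was created. On success, `job.fine_tuned_model` holds the model ID to use for evaluation.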

### Evaluate Results

We will now evaluate the results by calculating the accuracy of two models on the test set:

- The base gpt-4o-mini model, without any fine-tuning.
- Our fine-tuned model.

See https://platform.openai.com/docs/guides/fine-tuning/use-a-fine-tuned-model for more details

In [None]:
# Uses an LLM to predict class labels for a list of support tickets
def classify_tickets(tickets, model):
    responses = list()

    for ticket in tickets:
        user_prompt = create_prompt(ticket)
    
        response = client.chat.completions.create(
            model=model,
            messages=[{ "role": "user", "content": user_prompt}],
            temperature=0, # setting temperature to 0 for this use case, so that responses are as deterministic as possible
            stop=["</category>"],
            max_tokens=2048,
        )

        response = response.choices[0].message.content.split("<category>")[-1].strip()
        responses.append(response)
        print(response)

    return responses


# Calculates the percent of predictions we classified correctly
def evaluate_accuracy(predicted, actual):
    num_correct = sum([predicted[i] == actual[i] for i in range(len(actual))])
    return round(100 * num_correct / len(actual), 2)
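
Overall accuracy can hide categories that the model consistently gets wrong. As a complement to `evaluate_accuracy`, the sketch below breaks accuracy down by true label; `accuracy_by_label` is a hypothetical helper.

```python
from collections import defaultdict


def accuracy_by_label(predicted, actual):
    """Return {label: accuracy in percent}, computed separately for each true label."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for pred, true in zip(predicted, actual):
        total[true] += 1
        correct[true] += int(pred == true)
    return {
        label: round(100 * correct[label] / total[label], 2)
        for label in sorted(total)
    }
```

It takes the same arguments as `evaluate_accuracy`, for example `accuracy_by_label(test_responses, test_labels[:20])`. With only 20 test tickets, per-category numbers rest on very few examples and should be read with caution.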

In [None]:
# Determine how the base model without any fine-tuning performs
model_id = 'gpt-4o-mini'

test_responses = classify_tickets(
    tickets=test_tickets[:20], 
    model=model_id
)

accuracy = evaluate_accuracy(test_responses, test_labels[:20])
print('-----------------')
print(f"Test Set Accuracy: {accuracy}%")

In [None]:
# Determine how the fine-tuned model performs
model_id = 'ft:gpt-4o-mini-2024-07-18:xxxxxxxxxx' # REPLACE THIS WITH THE OUTPUT MODEL ID IN THE OPENAI FINE-TUNING DASHBOARD

test_responses = classify_tickets(
    tickets=test_tickets[:20], 
    model=model_id
)

accuracy = evaluate_accuracy(test_responses, test_labels[:20])
print('-----------------')
print(f"Test Set Accuracy: {accuracy}%")