#Introduction

The goal of this exercise is to identify the existence of a few dimensions in letters written by CEOs to shareholders using OpenAI’s language model. The language model is accessed using OpenAI’s API. The model is first fine-tuned using two of the three training datasets provided with the leftover dataset used as a validation set. We check the predicted dimensions for the validation dataset and compare it with the true values. Similarly, we also predict the dimensions for in-sample data and compare with their true values. The accuracy of the prediction is calculated.




In [26]:
import pandas as pd
import requests
import nltk
from nltk.tokenize import word_tokenize
from nltk.corpus import stopwords
import string
import os
from sklearn.feature_extraction.text import TfidfVectorizer
from google.colab import drive
from openai import OpenAI
import json
import time

In [27]:
# Mount Google Drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


# Methodology

## Load and Preprocess the Data:
We are using the provided train2 and train3 datasets as the training set and the train1 dataset as the validation set. In this step, we concatenated the train2 and train3 datasets to create a single training set.

In [28]:
def load_and_preprocess_train_data(train_files):
  # Load data from Excel files
  df_train = pd.concat([pd.read_excel(file) for file in train_files])

  # Handling Missing Values
  # Replace missing values with a placeholder or drop rows with missing values
  df_train.dropna(subset=['paragraph'], inplace=True)

  """# Download stopwords corpus (if not already downloaded)
  nltk.download('stopwords')
  nltk.download('punkt')"""

  # Preprocess the text data
  # Lower-Case words
  df_train['processed_paragraph'] = df_train['paragraph'].apply(lambda text: text.lower())

  return df_train

In [29]:
folder_path = r'/content/drive/MyDrive/GRA_KU_Assessment/Test accuracy/training data'
# Get a list of all files in the folder
files_in_folder = os.listdir(folder_path)
# Filter Excel files with specific names (train1, train2, train3, etc.)
train_files = [os.path.join(folder_path, file) for file in files_in_folder if file.startswith('train') and file.endswith('.xlsx')]
df_train = load_and_preprocess_train_data(train_files)

## Prepare Data for Fine-tuning
In this step, we convert our data into the format required for the OpenAI API.

We convert the data from the table format (.xlsx format) provided to a Chat Completions API format that is accepted by OpenAI’s API. This format is a list of messages where each message has a role and content.

In our training set, for each message in the requirem format will have three components for each datapoint (sentence) with each playing a different role. The first part of the message plays the role of *system*. This is where we give precise instructions to the model as to what we expect it to do. The next part of the message plays the role of *user* which contains portions of the letters from CEOs. This is supposed to be the input for fine-tuning process. The last part of the message plays the role of *assistant*, which is the result from which we want it to fine-tune. This is the training part.

In [30]:
def prepare_data_for_fine_tuning(df_train):
  # Convert the 'paragraph' column to a list
  paragraphs = df_train['processed_paragraph'].tolist()

  # Create a list to store the formatted data
  train_data = []

  # Iterate through each row in the DataFrame and create prompt-completion pairs
  for index, row in df_train.iterrows():
      prompt = row['processed_paragraph']
      # Convert 'Yes' and 'No' to 1 and 0, respectively
      completion = ','.join(['1' if row[col] == 'Yes' else '0' for col in df_train.columns[1:]])
      train_data.append({"prompt": prompt, "completion": completion})

  # Display a few examples to verify the format
  for example in train_data[:1]:
      print(example)

  return train_data

In [31]:
def convert_to_chat_completion(prompt_completion_data):
    chat_completion_data = []

    for entry in prompt_completion_data:
        prompt = entry['prompt']
        completion = entry['completion']

        # Extracting the completion details and converting them into the desired format
        completion_details = [f"{key}: {'Yes' if value == '1' else 'No'}" for key, value in zip(['Goal', 'Activity', 'Strategy', 'Plan', 'Structure', 'Innovation', 'Tactics', 'Relevance'], completion.split(','))]

        # Joining the completion details into a single string
        completion_text = ', '.join(completion_details)

        # Creating the chat-completion format
        conversation = {
            "messages": [
                {"role": "system", "content": "Use the folowing step-by-step instructon to respond to the user inputs. Step 1 - In the user content which is taken from letters written by CEO to shareholders, you have to identify the existence of dimensions/qualities that are provided in this list given in brackets and that are seperated by commas ['Goal', 'Activity', 'Strategy', 'Plan', 'Structure', 'Innovation', 'Tactics', 'Relevance']. Step 2 - For each of these dimensions, if the dimension exists in the user prompt based on the assistant content I provide to you in the fine-tuning data, answer Yes, otherwise answer No. After step2, this is an example output whose template you must use to provide your answer - ['Goal: No, Activity: Yes, Strategy: Yes, Plan: Yes, Structure: Yes, Innovation: Yes, Tactics: No, Relevance: No']"},
                {"role": "user", "content": prompt},
                {"role": "assistant", "content": completion_text}
            ]
        }

        chat_completion_data.append(conversation)

    return chat_completion_data


In [32]:
train_data = prepare_data_for_fine_tuning(df_train)
train_data = convert_to_chat_completion(train_data)

{'prompt': 'to our shareowners:\nrandom forests, naïve bayesian estimators, restful services, gossip protocols, eventual consistency, data\nsharding, anti-entropy, byzantine quorum, erasure coding, vector clocks … walk into certain amazon meetings,\nand you may momentarily think you’ve stumbled into a computer science lecture', 'completion': '0,0,1,0,0,0,0,0,1,0'}


In [33]:
print(train_data[0])

{'messages': [{'role': 'system', 'content': "Use the folowing step-by-step instructon to respond to the user inputs. Step 1 - In the user content which is taken from letters written by CEO to\xa0shareholders, you have to identify the existence of dimensions/qualities that are provided in this list given in brackets and that are seperated by commas ['Goal', 'Activity', 'Strategy', 'Plan', 'Structure', 'Innovation', 'Tactics', 'Relevance']. Step 2 - For each of these dimensions, if the dimension exists in the user prompt based on the assistant content I provide to you in the fine-tuning data, answer Yes, otherwise answer No. After step2, this is an example output whose template you must use to provide your answer - ['Goal: No, Activity: Yes, Strategy: Yes, Plan: Yes, Structure: Yes, Innovation: Yes, Tactics: No, Relevance: No']"}, {'role': 'user', 'content': 'to our shareowners:\nrandom forests, naïve bayesian estimators, restful services, gossip protocols, eventual consistency, data\nsh

## Fine-tune the model
We invoke an OpenAI model, feed the training data and finetune it. The base model used for fine-tuning is gpt-3.5-turbo.

In [34]:
def fine_tune_model(train_data, api_key):
  # Assuming train_data contains your prompt-completion pairs
  # Save the train_data in JSON Lines format
  with open("/content/drive/MyDrive/GRA_KU_Assessment/Test accuracy/mydata.jsonl", "w") as file:
      for example in train_data:
          file.write(json.dumps(example) + "\n")

  # Initialize the OpenAI client
  client = OpenAI(api_key= api_key)

  # Upload the JSON Lines file for fine-tuning
  try:
    resp1 = client.files.create(
        file=open("/content/drive/MyDrive/GRA_KU_Assessment/Test accuracy/mydata.jsonl", "rb"),
        purpose="fine-tune"
    )
    print("File uploaded successfully.")
  except Exception as e:
    print("File upload failed:", e)
    return None, None

  # Create the fine-tuning job
  try:
    resp2 = client.fine_tuning.jobs.create(
    training_file=resp1.id,
    model="gpt-3.5-turbo"
    )
    print("Fine-tuning job created successfully.")
  except Exception as e:
    print("Fine-tuning job creation failed:", e)
    return None, None

  # Check the status of the fine-tuning job
  while True:
    resp3 = client.fine_tuning.jobs.retrieve(resp2.id)
    status = resp3.status
    print("Fine-tuning job status:", status)
    if status == "succeeded":
      print("Fine-tuning job completed successfully.")
      break
    elif status == "failed":
      print("Fine-tuning job failed:", resp3.error)
      break
    elif status == "cancelled":
      print("Fine-tuning job cancelled by user.")
      break
    else:
      print("Fine-tuning job in progress. Please wait...")
      time.sleep(60)

  return resp2, client


In [35]:
api_key = 'your-api-key'
response, client = fine_tune_model(train_data, api_key)

File uploaded successfully.
Fine-tuning job created successfully.
Fine-tuning job status: validating_files
Fine-tuning job in progress. Please wait...
Fine-tuning job status: running
Fine-tuning job in progress. Please wait...
Fine-tuning job status: running
Fine-tuning job in progress. Please wait...
Fine-tuning job status: running
Fine-tuning job in progress. Please wait...
Fine-tuning job status: running
Fine-tuning job in progress. Please wait...
Fine-tuning job status: running
Fine-tuning job in progress. Please wait...
Fine-tuning job status: running
Fine-tuning job in progress. Please wait...
Fine-tuning job status: running
Fine-tuning job in progress. Please wait...
Fine-tuning job status: running
Fine-tuning job in progress. Please wait...
Fine-tuning job status: running
Fine-tuning job in progress. Please wait...
Fine-tuning job status: running
Fine-tuning job in progress. Please wait...
Fine-tuning job status: running
Fine-tuning job in progress. Please wait...
Fine-tuning j

In [36]:
print(response)

FineTuningJob(id='ftjob-yI2sD8wHD0vKTdB0rBuOSAUa', created_at=1700583838, error=None, fine_tuned_model=None, finished_at=None, hyperparameters=Hyperparameters(n_epochs='auto', batch_size='auto', learning_rate_multiplier='auto'), model='gpt-3.5-turbo-0613', object='fine_tuning.job', organization_id='org-BNdO85mmDzyE4YWAlVvz9Z2A', result_files=[], status='validating_files', trained_tokens=None, training_file='file-mStBwGKPsCyJFcEk0FAXuQ4O', validation_file=None)


## Load and preprocess the validation data
We load the validation data and do the necessary preprocessing. This is the train1.xslx data.

In [37]:
def load_and_preprocess_test_data(test_file):
  # Load data from the test file
  df_test = pd.read_excel(test_file)

  # Preprocess the text data similar to the training data
  df_test['processed_paragraph'] = df_test['paragraph'].apply(lambda text: text.lower())

  return df_test

In [38]:
test_file = r"/content/drive/MyDrive/GRA_KU_Assessment/Test accuracy/test data/test_labels.xlsx"
df_test = load_and_preprocess_test_data(test_file)

## Make Predictions
We now use the fine-tuned model and feed in the validation set provided. We will use the results from the validation set to assess the model’s performance.



In [72]:
def make_predictions(df_test, fine_tuned_model_id, client):
  # Get the sentences to test from the dataframe
  sentences_to_test = df_test['processed_paragraph'].tolist()
  # Initialize an empty list to store the responses
  responses = []
  # Set the batch size for chat completions
  batch_size = 10
  # Loop through the sentences in batches
  for i in range(0, len(sentences_to_test), batch_size):
    # Get the current batch of sentences
    batch = sentences_to_test[i:i+batch_size]
    # Loop through the sentences in the batch
    for ind, sentence in enumerate(batch):
      # Create the system and user messages for each sentence
      messages = [
        {"role": "system", "content": "Use the folowing step-by-step instructon to respond to the user inputs. Step 1 - In the user content which is taken from letters written by CEO to shareholders, you have to identify the existence of dimensions/qualities that are provided in this list given in brackets and that are seperated by commas ['Goal', 'Activity', 'Strategy', 'Plan', 'Structure', 'Innovation', 'Tactics', 'Relevance']. Step 2 - For each of these dimensions, if the dimension exists in the user prompt based on the assistant content I provide to you in the fine-tuning data, answer Yes, otherwise answer No. After step2, this is an example output whose template you must use to provide your answer - ['Goal: No, Activity: Yes, Strategy: Yes, Plan: Yes, Structure: Yes, Innovation: Yes, Tactics: No, Relevance: No']"},
        {"role": "user", "content": sentence}
      ]
      # Try to make a chat completion for the sentence
      try:
        response = client.chat.completions.create(
        model=fine_tuned_model_id,
        messages=messages,
        seed=99,
        temperature=0
        )
        print("Chat completion succeeded for sentence", ind, " and batch ", i)
      # Handle any errors or exceptions
      except Exception as e:
        print("Chat completion failed for sentence", ind, " and batch ", i, ":", e)
        return None
      # Append the assistant message content to the responses list
      responses.append(response.choices[0].message.content)

  # Return the responses list
  return responses


In [73]:
fine_tuned_model_id = "gpt-3.5-turbo"
predictions = make_predictions(df_test, fine_tuned_model_id, client)

Chat completion succeeded for sentence 0  and batch  0
Chat completion succeeded for sentence 1  and batch  0
Chat completion succeeded for sentence 2  and batch  0
Chat completion succeeded for sentence 3  and batch  0
Chat completion succeeded for sentence 4  and batch  0
Chat completion succeeded for sentence 5  and batch  0
Chat completion succeeded for sentence 6  and batch  0
Chat completion succeeded for sentence 7  and batch  0
Chat completion succeeded for sentence 8  and batch  0
Chat completion succeeded for sentence 9  and batch  0
Chat completion succeeded for sentence 0  and batch  10
Chat completion succeeded for sentence 1  and batch  10
Chat completion succeeded for sentence 2  and batch  10
Chat completion succeeded for sentence 3  and batch  10
Chat completion succeeded for sentence 4  and batch  10
Chat completion succeeded for sentence 5  and batch  10
Chat completion succeeded for sentence 6  and batch  10
Chat completion succeeded for sentence 7  and batch  10
Ch

In [74]:
print(predictions)

["['Goal: No, Activity: No, Strategy: No, Plan: No, Structure: No, Innovation: No, Tactics: No, Relevance: No']", "['Goal: No, Activity: Yes, Strategy: Yes, Plan: Yes, Structure: Yes, Innovation: Yes, Tactics: No, Relevance: No']", "['Goal: No, Activity: No, Strategy: No, Plan: No, Structure: No, Innovation: No, Tactics: No, Relevance: No']", "['Goal: No, Activity: No, Strategy: No, Plan: No, Structure: No, Innovation: No, Tactics: No, Relevance: No']", "['Goal: No, Activity: Yes, Strategy: Yes, Plan: No, Structure: No, Innovation: Yes, Tactics: No, Relevance: No']", "['Goal: No, Activity: Yes, Strategy: Yes, Plan: Yes, Structure: Yes, Innovation: Yes, Tactics: No, Relevance: No']", "['Goal: No, Activity: Yes, Strategy: Yes, Plan: Yes, Structure: No, Innovation: Yes, Tactics: No, Relevance: No']", "['Goal: No, Activity: Yes, Strategy: Yes, Plan: Yes, Structure: Yes, Innovation: No, Tactics: No, Relevance: No']", "['Goal: Yes, Activity: No, Strategy: No, Plan: No, Structure: No, Innovat

## Evaluate performance of Fine-tuned Model:
We assess the performance of the fine-tuned model by comparing the results from the predictions step to the actual values provided in the validation set (out-of-sample) as well as the training set (in-sample).

In [75]:
import pandas as pd
from sklearn.metrics import accuracy_score as acc_score, f1_score as f1, classification_report as class_rep



def evaluate_classification(df_test, predictions):
    output_list = [item[1:-1].split(', ') for item in predictions]
    processed_data = []

    for row in output_list:
      if row == ['No dimensions identified in the user content.']:
        processed_data.append([0] * 8)  # Appending zeros for all attributes as it's an exception
      else:
        processed_row = [1 if item.split(': ')[1].strip("'") == 'Yes' else 0 for item in row]
        processed_data.append(processed_row)

    df_processed = pd.DataFrame(processed_data, columns=['Goal', 'Activity', 'Strategy', 'Plan', 'Structure', 'Innovation', 'Tactics', 'Relevance'])

    # Reset the index of df_test and df_processed before concatenation
    df_test.reset_index(drop=True, inplace=True)
    df_processed.reset_index(drop=True, inplace=True)

    final_df = pd.concat([df_test['paragraph'], df_processed, df_test['processed_paragraph']], axis=1)

    new_df_test = df_test.copy()
    cols_to_convert = new_df_test.columns[~new_df_test.columns.isin(['processed_paragraph', 'paragraph'])]
    new_df_test[cols_to_convert] = new_df_test[cols_to_convert].replace({'Yes': 1, 'No': 0})

    columns = ['Goal', 'Activity', 'Strategy', 'Plan', 'Structure', 'Innovation', 'Tactics', 'Relevance']
    # Renamed variables to avoid conflict
    accuracy_scores = {}
    f1_scores = {}
    classification_reports = {}

    for col in columns:
        y_pred = final_df[col]
        y_true = new_df_test[col]

        # Use the renamed function references
        accuracy = acc_score(y_true, y_pred)
        f1_value = f1(y_true, y_pred)

        accuracy_scores[col] = accuracy
        f1_scores[col] = f1_value

        report = class_rep(y_true, y_pred)
        classification_reports[col] = report

    return accuracy_scores, f1_scores, classification_reports


## Results


### Validation data
The accuracy scores, F1 scores and the classification reports for the validation dataset (out-sample dataset), are presented below.

In [76]:
accuracy_score, f1_score, classification_report = evaluate_classification(df_test, predictions)

In [77]:
# Print or use the collected metrics as needed
print("Accuracy Scores:", accuracy_score)
print("F1 Scores:", f1_score)
print("Classification Reports:", classification_report)

Accuracy Scores: {'Goal': 0.7692307692307693, 'Activity': 0.717948717948718, 'Strategy': 0.41025641025641024, 'Plan': 0.5384615384615384, 'Structure': 0.7948717948717948, 'Innovation': 0.7692307692307693, 'Tactics': 0.7435897435897436, 'Relevance': 0.28205128205128205}
F1 Scores: {'Goal': 0.1818181818181818, 'Activity': 0.7659574468085107, 'Strategy': 0.25806451612903225, 'Plan': 0.24999999999999997, 'Structure': 0.5, 'Innovation': 0.6666666666666665, 'Tactics': 0.0, 'Relevance': 0.125}
Classification Reports: {'Goal': '              precision    recall  f1-score   support\n\n           0       0.85      0.88      0.87        33\n           1       0.20      0.17      0.18         6\n\n    accuracy                           0.77        39\n   macro avg       0.53      0.52      0.52        39\nweighted avg       0.75      0.77      0.76        39\n', 'Activity': '              precision    recall  f1-score   support\n\n           0       0.71      0.59      0.65        17\n           1