# Kaggle LLM Science Exam Dataset Exploration and GPT Tests

- Manually get a feel for what the dataset is like and how GPT performs
- Experiment with different ways of using GPT (see the benchmark notebook for much more on this)

Mostly sourced from [Jeremy Howard's notebook](https://www.kaggle.com/code/jhoward/getting-started-with-llms), which introduces the competition using GPT.

In [1]:
# !pip install -Uq llm 'urllib3>2' fastkaggle

In [2]:
import time
import numpy as np
import pandas as pd
from fastcore.utils import *
from fastkaggle import *

In [3]:
import dotenv
dotenv.load_dotenv()

True

In [4]:
import os
import subprocess
from pathlib import Path
import zipfile

COMPETITION='kaggle-llm-science-exam'

# Adjust the data directory as needed
data_dir = Path(os.getcwd()) / 'data'
data_dir.mkdir(exist_ok=True)

file_path = data_dir / f'{COMPETITION}.zip'

if not file_path.exists():
    # Download the competition archive with the Kaggle CLI (requires Kaggle API credentials)
    subprocess.run(['kaggle', 'competitions', 'download', '-p', str(data_dir), '-c', COMPETITION], check=True)
    # Extract the archive into the data directory
    with zipfile.ZipFile(file_path, 'r') as zip_ref:
        zip_ref.extractall(data_dir)


In [5]:
trn = pd.read_csv(data_dir / 'train.csv')
trn

index,id,prompt,A,B,C,D,E,answer
0,0,Which of the following statements accurately d...,MOND is a theory that reduces the observed mis...,MOND is a theory that increases the discrepanc...,MOND is a theory that explains the missing bar...,MOND is a theory that reduces the discrepancy ...,MOND is a theory that eliminates the observed ...,D
1,1,Which of the following is an accurate definiti...,Dynamic scaling refers to the evolution of sel...,Dynamic scaling refers to the non-evolution of...,Dynamic scaling refers to the evolution of sel...,Dynamic scaling refers to the non-evolution of...,Dynamic scaling refers to the evolution of sel...,A
2,2,Which of the following statements accurately d...,The triskeles symbol was reconstructed as a fe...,The triskeles symbol is a representation of th...,The triskeles symbol is a representation of a ...,The triskeles symbol represents three interloc...,The triskeles symbol is a representation of th...,A
3,3,What is the significance of regularization in ...,Regularizing the mass-energy of an electron wi...,Regularizing the mass-energy of an electron wi...,Regularizing the mass-energy of an electron wi...,Regularizing the mass-energy of an electron wi...,Regularizing the mass-energy of an electron wi...,C
4,4,Which of the following statements accurately d...,The angular spacing of features in the diffrac...,The angular spacing of features in the diffrac...,The angular spacing of features in the diffrac...,The angular spacing of features in the diffrac...,The angular spacing of features in the diffrac...,D
...,...,...,...,...,...,...,...,...
195,195,What is the relation between the three moment ...,The three moment theorem expresses the relatio...,The three moment theorem is used to calculate ...,The three moment theorem describes the relatio...,The three moment theorem is used to calculate ...,The three moment theorem is used to derive the...,C
196,196,"What is the throttling process, and why is it ...",The throttling process is a steady flow of a f...,The throttling process is a steady adiabatic f...,The throttling process is a steady adiabatic f...,The throttling process is a steady flow of a f...,The throttling process is a steady adiabatic f...,B
197,197,What happens to excess base metal as a solutio...,"The excess base metal will often solidify, bec...",The excess base metal will often crystallize-o...,"The excess base metal will often dissolve, bec...","The excess base metal will often liquefy, beco...","The excess base metal will often evaporate, be...",B
198,198,"What is the relationship between mass, force, ...",Mass is a property that determines the weight ...,Mass is an inertial property that determines a...,Mass is an inertial property that determines a...,Mass is an inertial property that determines a...,Mass is a property that determines the size of...,D


Have a go at answering the questions manually to get a feel for what the data is like. Uncomment the cell below to run an interactive quiz.

In [6]:
# n_questions = 200
# for index, row in trn.head(n_questions).iterrows():
#     print(f"Question: {row.prompt}")
#     print(f"Option A: {row.A}")
#     print(f"Option B: {row.B}")
#     print(f"Option C: {row.C}")
#     print(f"Option D: {row.D}")
#     print(f"Option E: {row.E}", flush=True)
#     user_answer = input("Answer [A/B/C/D/E]: ")
#     if user_answer == row.answer:
#         print("Correct!")
#     else:
#         print("Incorrect! The correct answer is: ", row.answer)
#     print(flush=True)

## Using the OpenAI API

Skip the usual EDA and do some natural-language data exploration instead.

In [7]:
def prompt1(row):
    return f"""Question: {row.prompt}
A: {row.A}
B: {row.B}
C: {row.C}
D: {row.D}
E: {row.E}
Answer: """

In [8]:
r0 = trn.iloc[0]
print(prompt1(r0), r0.answer)

Question: Which of the following statements accurately describes the impact of Modified Newtonian Dynamics (MOND) on the observed "missing baryonic mass" discrepancy in galaxy clusters?
A: MOND is a theory that reduces the observed missing baryonic mass in galaxy clusters by postulating the existence of a new form of matter called "fuzzy dark matter."
B: MOND is a theory that increases the discrepancy between the observed missing baryonic mass in galaxy clusters and the measured velocity dispersions from a factor of around 10 to a factor of about 20.
C: MOND is a theory that explains the missing baryonic mass in galaxy clusters that was previously considered dark matter by demonstrating that the mass is in the form of neutrinos and axions.
D: MOND is a theory that reduces the discrepancy between the observed missing baryonic mass in galaxy clusters and the measured velocity dispersions from a factor of around 10 to a factor of about 2.
E: MOND is a theory that eliminates the observed m

Start with GPT-3.5 Turbo.

In [9]:
# Latest GPT-3.5 Turbo model as of 26/11/2023
GPT_MODEL='gpt-3.5-turbo-1106'
# Prices in USD per token (listed as $0.001 / $0.002 per 1,000 tokens)
cost_prompt_tokens = 0.001 / 1000
cost_completion_tokens = 0.002 / 1000
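
The per-token prices above make it easy to estimate what a request costs. A minimal sketch, assuming the token counts are already known; `estimate_cost` is a hypothetical helper and the counts passed to it are made up for illustration:

```python
# Prices copied from the cell above (USD per token for gpt-3.5-turbo-1106).
cost_prompt_tokens = 0.001 / 1000
cost_completion_tokens = 0.002 / 1000

def estimate_cost(n_prompt_tokens, n_completion_tokens):
    """Return the estimated USD cost of a single request."""
    return (n_prompt_tokens * cost_prompt_tokens
            + n_completion_tokens * cost_completion_tokens)

# Hypothetical token counts, not measured from the dataset.
print(f"${estimate_cost(300, 50):.6f}")
```

Multiplying the result by the number of rows in `trn` gives a rough budget for a full pass over the training set.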

This test uses Simon Willison's `llm` CLI and Python library, which provides the same interface to API-hosted models and locally hosted open-source models.

The installed version doesn't know about the latest GPT-3.5 model, though.

We can either register it programmatically or add it to a YAML file of extra OpenAI models (`extra-openai-models.yaml` in the `llm` configuration directory). I've done the latter so that it's also available via the CLI, but either works.

See [Adding more OpenAI models](https://llm.datasette.io/en/stable/other-models.html#adding-more-openai-models)

```yaml
- model_name: gpt-3.5-turbo-1106
  model_id: gpt-3.5-turbo-1106
  aliases: ["1106"]
```

In [11]:
import llm
# The installed llm version doesn't ship with the latest GPT-3.5 model,
# so this lookup relies on the YAML entry shown above
model = llm.get_model(GPT_MODEL)

In [13]:
response = model.prompt(prompt1(r0))
response.text()

'B: MOND is a theory that increases the discrepancy between the observed missing baryonic mass in galaxy clusters and the measured velocity dispersions from a factor of around 10 to a factor of about 20.'
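
To score a reply like this against the answer key, the option letter has to be pulled out of the free text. A minimal sketch, assuming replies begin with the letter followed by a colon, period or closing bracket, as in the output above; `extract_choice` is a hypothetical helper, not part of `llm`:

```python
import re

def extract_choice(text):
    """Return the leading option letter (A-E) of a reply, or None if absent."""
    # Require punctuation or end-of-string after the letter so that
    # ordinary words such as "A theory ..." are not mistaken for a choice.
    m = re.match(r"\s*([A-E])\s*(?:[:.)]|$)", text)
    return m.group(1) if m else None

# Reply text copied from the output above; "D" is the answer for row 0 in train.csv.
reply = "B: MOND is a theory that increases the discrepancy ..."
predicted = extract_choice(reply)
print(predicted, predicted == "D")
```

For this first question the model picked B while the key says D, so the check comes out false. Replies that don't follow the assumed format return `None` and would need separate handling.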