# Codealong Notebook

Use this notebook as your "scratch pad" as you go through the course contents. Feel free to copy any example code and tweak it to get a better understanding of how it works!

Use the **+** button or `Insert` menu to add additional code cells as needed.

In [21]:
import openai
import os

openai.api_base = "https://openai.vocareum.com/v1"
openai.api_key = os.getenv("VOCAREUM_OPENAI_API_KEY")

## Step 0: Inspecting Non-Customized Results

Before we perform any prompt engineering, **let's ask the OpenAI model some questions and see how it answers**.

(If you encounter an `AuthenticationError` when running this code, make sure that you have added a valid API key to the cell above and executed it.)

In [22]:
ukraine_prompt = """
Question: "When did Russia invade Ukraine?"
Answer:
"""
initial_ukraine_answer = openai.Completion.create(
    model="gpt-3.5-turbo-instruct",
    prompt=ukraine_prompt,
    max_tokens=150
)["choices"][0]["text"].strip()
print(initial_ukraine_answer)

Russia invaded Ukraine in February 2014. This invasion led to the annexation of Crimea and ongoing conflict in eastern Ukraine.


In [23]:
twitter_prompt = """
Question: "Who owns Twitter?"
Answer:
"""
initial_twitter_answer = openai.Completion.create(
    model="gpt-3.5-turbo-instruct",
    prompt=twitter_prompt,
    max_tokens=150
)["choices"][0]["text"].strip()
print(initial_twitter_answer)

Twitter is a publicly traded company and its ownership is divided among its shareholders. The majority shareholder is CEO and co-founder Jack Dorsey, who owns about 2% of the company's stock. Other major shareholders include venture capital firm Spark Capital, mutual fund companies Vanguard and BlackRock, and other technology companies like Google and Microsoft. Ultimately, the ownership of Twitter is constantly changing as shares are bought and sold on the stock market.


## Step 1

### Scrape Data from Wikipedia

In [7]:
import requests

params = {
    "action": "query", 
    "prop": "extracts",
    "exlimit": 1,
    "titles": "2022",
    "explaintext": 1,
    "formatversion": 2,
    "format": "json"
}
response = requests.get("https://en.wikipedia.org/w/api.php", params=params)
for event in response.json()['query']['pages'][0]['extract'].split('\n'):
    print(event)

2022 (MMXXII) was a common year starting on Saturday of the Gregorian calendar, the 2022nd year of the Common Era (CE) and Anno Domini (AD) designations, the 22nd  year of the 3rd millennium and the 21st century, and the  3rd   year of the 2020s decade.  
The year began with another wave in the COVID-19 pandemic, with Omicron spreading rapidly and becoming the dominant variant of the SARS-CoV-2 virus worldwide. Tracking a decrease in cases and deaths, 2022 saw the removal of most COVID-19 restrictions and the reopening of international borders in the vast majority of countries, while the global rollout of COVID-19 vaccines continued. The global economic recovery from the pandemic continued, though many countries experienced an ongoing inflation surge; in response, many central banks raised their interest rates to landmark levels. The world population reached eight billion people in 2022. The year also witnessed numerous natural disasters, including two devastating Atlantic hurricanes (

### Loading the Data with `pandas`

In [15]:
import pandas as pd

df = pd.DataFrame()
df['text'] = response.json()['query']['pages'][0]['extract'].split('\n')
df = df[(df["text"].str.len() > 0) & (~df["text"].str.startswith("=="))]
df.tail(20)

Unnamed: 0,text
243,November 19 – The 2022 Malaysian general elect...
244,November 19–November 26 – The 2022 Central Ame...
245,November 20–December 18 – The 2022 FIFA World ...
246,November 20 – 2022 Nepalese general election: ...
247,November 21 – A 5.6 earthquake strikes near Ci...
248,"November 30 – OpenAI releases ChatGPT, an arti..."
252,December 2 – The G7 and Australia join the EU ...
253,December 5 – The National Ignition Facility ac...
254,December 7
255,The Congress of Peru removes President Pedro C...


In [18]:
from dateutil.parser import parse

prefix = ""
for i, row in df.iterrows():
    # If the row already contains " – ", it already has the needed date prefix
    if " – " not in row["text"]:
        try:
            # If the row's text is a date, set it as the new prefix
            parse(row["text"])
            prefix = row["text"]
        except (ValueError, OverflowError):
            # If the row's text isn't a date, add the prefix.
            # Write through df.at: assigning to `row` may only modify a copy.
            df.at[i, "text"] = prefix + " – " + row["text"]
df = df[df["text"].str.contains(" – ")].reset_index(drop=True)
df.tail(20)

Unnamed: 0,text
178,"November 16 – NASA launches Artemis 1, the fir..."
179,November 19 – The 2022 Malaysian general elect...
180,November 19–November 26 – The 2022 Central Ame...
181,November 20–December 18 – The 2022 FIFA World ...
182,November 20 – 2022 Nepalese general election: ...
183,November 21 – A 5.6 earthquake strikes near Ci...
184,"November 30 – OpenAI releases ChatGPT, an arti..."
185,December 2 – The G7 and Australia join the EU ...
186,December 5 – The National Ignition Facility ac...
187,December 7 – The Congress of Peru removes Pres...


### Creating an Embeddings Index with `openai.Embedding`

In [30]:
response = openai.Embedding.create(
    input=df["text"].tolist(),
    model="text-embedding-ada-002",
)
print(type(response))
response.keys()

<class 'openai.openai_object.OpenAIObject'>


dict_keys(['object', 'data', 'model', 'usage'])

In [35]:
len(response['data'][0]['embedding'])

1536

In [40]:
embeddings = [ data['embedding'] for data in response['data']]
df["embeddings"] = embeddings
df

Unnamed: 0,text,embeddings
0,– 2022 (MMXXII) was a common year starting on...,"[4.6367837057914585e-05, -0.017940208315849304..."
1,– The year began with another wave in the COV...,"[-0.004722667392343283, -0.019994843751192093,..."
2,– 2022 was also dominated by wars and armed c...,"[-0.009606238454580307, -0.015301033854484558,..."
3,– The ongoing Russian invasion of Ukraine esc...,"[-0.014721134677529335, -0.007632689084857702,..."
4,January 1 – The Regional Comprehensive Econom...,"[-0.0005679309833794832, -0.02413112111389637,..."
...,...,...
193,December 24 – 2022 Fijian general election: Th...,"[-0.011724342592060566, -0.009384616278111935,..."
194,December 29 – Brazilian football legend Pelé d...,"[-0.007616756483912468, 0.004072672221809626, ..."
195,December 31 – Former Pope Benedict XVI dies at...,"[0.023532262071967125, 0.007705941330641508, -..."
196,December 7 – The world population was estimate...,"[-0.004041583277285099, -0.014363067224621773,..."


In [41]:
df.to_csv("embeddings.csv")
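
One thing to watch when reloading a file written this way: the CSV stores each embedding list as text, so it comes back as a string, not a list. The sketch below shows one way to convert it back. It uses a made-up two-row DataFrame and an in-memory buffer in place of `embeddings.csv`; `toy_df`, `buffer`, and `loaded` are illustrative names, not part of the course code.

```python
import ast
import io

import pandas as pd

# Toy stand-in for the notebook's DataFrame (made-up 3-dimensional vectors).
toy_df = pd.DataFrame({
    "text": ["January 1 – first toy event", "January 2 – second toy event"],
    "embeddings": [[0.1, -0.2, 0.3], [0.0, 0.5, -0.5]],
})

# Write to an in-memory buffer instead of embeddings.csv so the sketch is self-contained.
buffer = io.StringIO()
toy_df.to_csv(buffer)
buffer.seek(0)

# index_col=0 restores the saved index instead of creating an "Unnamed: 0" column.
loaded = pd.read_csv(buffer, index_col=0)
raw_type = type(loaded.loc[0, "embeddings"])  # str at this point, not list

# Convert each stringified list back into a Python list of floats.
loaded["embeddings"] = loaded["embeddings"].apply(ast.literal_eval)
```

With the real file, the same two steps would be `pd.read_csv("embeddings.csv", index_col=0)` followed by the `ast.literal_eval` conversion.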

## Step 2

### Finding Relevant Data with Cosine Similarity
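
As a starting point for this section, here is a minimal sketch of ranking rows by cosine similarity to a question's embedding. It is written in plain Python with tiny made-up vectors so that it runs without an API call. The helpers `cosine_similarity` and `rank_by_similarity` are hypothetical names. In the notebook, the question embedding would presumably come from `openai.Embedding.create` with the same model used for the index, and the rows from `df["text"]` and `df["embeddings"]`.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_by_similarity(query_embedding, rows):
    """Return (similarity, text) pairs sorted from most to least similar.

    `rows` is a list of (text, embedding) pairs.
    """
    scored = [
        (cosine_similarity(query_embedding, embedding), text)
        for text, embedding in rows
    ]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


# Made-up 2-dimensional vectors; real ada-002 embeddings have 1536 dimensions.
toy_rows = [
    ("orthogonal", [0.0, 1.0]),
    ("opposite", [-1.0, 0.0]),
    ("close", [0.9, 0.1]),
]
ranked = rank_by_similarity([1.0, 0.0], toy_rows)
for similarity, text in ranked:
    print(f"{similarity:+.3f}  {text}")
```

Sorting by descending similarity puts the most relevant rows first. Sorting by ascending cosine distance (1 minus similarity) gives the same order.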

## Step 3

### Tokenizing with `tiktoken`

### Composing a Custom Text Prompt
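
One possible shape for this step is sketched below: context rows, already sorted by relevance, are added to a prompt template until a token budget is reached. Everything here is an assumption for illustration. The template wording, the `"###"` separator, and the names `PROMPT_TEMPLATE`, `count_words`, and `build_prompt` are not from the course. `count_words` is a rough whitespace-based stand-in for a real tokenizer such as `tiktoken`, and a real token counter can be passed in through `count_tokens`.

```python
PROMPT_TEMPLATE = (
    "Answer the question based on the context below, and if the question "
    "can't be answered based on the context, say \"I don't know\"\n\n"
    "Context:\n\n{}\n\n---\n\nQuestion: {}\nAnswer:"
)


def count_words(text):
    """Rough stand-in for a token counter: counts whitespace-separated words."""
    return len(text.split())


def build_prompt(question, ranked_texts, max_tokens, count_tokens=count_words):
    """Fill the template with as many ranked texts as fit within max_tokens."""
    used = count_tokens(PROMPT_TEMPLATE) + count_tokens(question)
    context = []
    for text in ranked_texts:
        cost = count_tokens(text)
        if used + cost > max_tokens:
            break  # Budget exhausted: drop this and all less relevant rows
        context.append(text)
        used += cost
    return PROMPT_TEMPLATE.format("\n\n###\n\n".join(context), question)


# Placeholder rows standing in for the most relevant entries of df["text"].
ranked_texts = [
    "October 27 – first toy context row",
    "January 1 – second toy context row",
]
prompt = build_prompt("Who owns Twitter?", ranked_texts, max_tokens=1000)
print(prompt)
```

The budget check ignores the separators between rows, so the count is approximate. Leaving some headroom below the model's context limit, including the `max_tokens` reserved for the answer, keeps the request within bounds.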

## Step 4

### Getting a Custom Q&A Response with `openai.Completion`