<a href="https://colab.research.google.com/github/zacichan/LLMs/blob/dev/langchain_exploration.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Intro to the OpenAI API


Having a conversation with GPT involves a single function call of this form.

```python
response = openai.ChatCompletion.create(
    model="MODEL_NAME",
    messages=[
        {"role": "system", "content": 'SPECIFY HOW THE AI ASSISTANT SHOULD BEHAVE'},
        {"role": "user", "content": 'SPECIFY WHAT YOU WANT THE AI ASSISTANT TO SAY'}
    ]
)
```

There are a few things to unpack here.

The model names are listed in the [Model Overview](https://platform.openai.com/docs/models/overview) page of the developer documentation. We will use gpt-3.5-turbo, which is (as of March 2023) the latest model used by ChatGPT that has broad public API access.

There are three types of message, documented in the Introduction to the Chat documentation: 

*   system messages describe the behavior of the AI assistant. 
*   user messages describe what you want the AI assistant to say.
*   assistant messages describe previous responses in the conversation.

The first message should be a system message. Additional messages should alternate between user and assistant.
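The ordering rule above can be checked before a request is sent. The following is a minimal sketch, not part of the OpenAI library: `validate_messages` is a hypothetical helper name, and it enforces only the convention described in this notebook (the API itself may accept other orderings).

```python
def validate_messages(messages):
    """Check the convention: a system message first, then user/assistant alternating."""
    if not messages or messages[0]["role"] != "system":
        raise ValueError("the first message should have role 'system'")
    expected = "user"
    for position, message in enumerate(messages[1:], start=1):
        if message["role"] != expected:
            raise ValueError(
                f"message {position} has role {message['role']!r}, expected {expected!r}"
            )
        # Flip the expected role for the next message
        expected = "assistant" if expected == "user" else "user"
    return messages


conversation = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is a data frame?"},
    {"role": "assistant", "content": "A table with named columns."},
    {"role": "user", "content": "Show me an example."},
]
validate_messages(conversation)  # passes silently; a bad ordering raises ValueError
```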

In [1]:
# Install packages
!pip install openai
!pip install yfinance
!pip install IPython


In [3]:
# Import the os package
import os

# Import the openai package
import openai

# Import yfinance as yf
import yfinance as yf

# From the IPython.display package, import display and Markdown
from IPython.display import display, Markdown

# Read the API key from the OPENAI environment variable (never hardcode keys in a notebook)
openai.api_key = os.environ["OPENAI"]

# Using GPT-3.5 as a helpful assistant

We define our system message as:

*You are a helpful assistant who understands data science.*

And our first user message as:

*Create a small dataset of data about people. The format of the dataset should be a data frame with 5 rows and 3 columns. The columns should be called "name", "height_cm", and "eye_color". The "name" column should contain randomly chosen first names. The "height_cm" column should contain randomly chosen heights, given in centimeters. The "eye_color" column should contain randomly chosen eye colors, taken from a choice of "brown", "blue", and "green". Provide Python code to generate the dataset, then provide the output in the format of a markdown table*

In [4]:
# Define the system message
system_msg = 'You are a helpful assistant who understands data science.'

# Define the user message
user_msg = 'Create a small dataset of data about people. The format of the dataset should be a data frame with 5 rows and 3 columns. The columns should be called "name", "height_cm", and "eye_color". The "name" column should contain randomly chosen first names. The "height_cm" column should contain randomly chosen heights, given in centimeters. The "eye_color" column should contain randomly chosen eye colors, taken from a choice of "brown", "blue", and "green". Provide Python code to generate the dataset, then provide the output in the format of a markdown table.'

# Create a dataset using GPT
response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": system_msg},
        {"role": "user", "content": user_msg}
    ]
)

API calls can fail for reasons outside the notebook, such as connectivity issues, server errors, or insufficient API credits. It is advisable to check the response received from the server before using its content.
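One common way to cope with transient failures is to retry the call with an increasing delay. The sketch below is an assumption about how that might look, not something this notebook or the OpenAI library prescribes: `call_with_retries` is a hypothetical helper that wraps any zero-argument function, and it catches every exception for brevity, where real code would probably narrow this to the errors worth retrying.

```python
import time


def call_with_retries(make_call, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call make_call(), retrying on any exception with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return make_call()
        except Exception:
            # Give up and re-raise once the last attempt has failed
            if attempt == max_attempts:
                raise
            # Wait base_delay, then 2x, 4x, ... before the next attempt
            sleep(base_delay * 2 ** (attempt - 1))


# Intended use with the call from this notebook (not executed here):
# response = call_with_retries(
#     lambda: openai.ChatCompletion.create(model="gpt-3.5-turbo", messages=msgs)
# )
```

The `sleep` parameter exists only so the delay can be replaced in tests. Retrying does not help with problems such as exhausted credits, which need to be fixed outside the notebook.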

Each choice in the response includes a `finish_reason` field, which this notebook treats as a status code. Its possible values are `stop` (the API returned complete model output), `length` (the output is incomplete because of the `max_tokens` parameter or the token limit), `content_filter` (content was omitted because of a content filter flag), and `null` (`None` in Python; the response is still in progress or incomplete). These values are documented in the Response format section of the Chat documentation.

The response from the GPT API is sent in JSON format, resulting in deeply nested lists and dictionaries that can be challenging to work with.
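To make the nesting concrete, here is a minimal sketch that pulls the reply text out of a response-shaped dictionary and checks the status code on the way. The `example_response` dictionary is a hand-written stand-in containing only the fields used in this notebook, and `extract_reply` is a hypothetical helper name.

```python
# Hand-written stand-in with the same nesting as the fields used in this notebook
example_response = {
    "model": "gpt-3.5-turbo-0301",
    "choices": [
        {
            "index": 0,
            "finish_reason": "stop",
            "message": {"role": "assistant", "content": "Hello!"},
        }
    ],
}


def extract_reply(response):
    """Return the first choice's text, or raise if generation did not finish cleanly."""
    choice = response["choices"][0]
    finish_reason = choice["finish_reason"]
    if finish_reason == "length":
        raise RuntimeError("output was cut off by a token limit")
    if finish_reason == "content_filter":
        raise RuntimeError("output was omitted by the content filter")
    if finish_reason != "stop":
        raise RuntimeError(f"unexpected finish_reason: {finish_reason!r}")
    return choice["message"]["content"]


reply = extract_reply(example_response)
```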



In [5]:
import pandas as pd

# Use `pd.json_normalize()` to transform the `response` object into a Pandas DataFrame
# The DataFrame will contain one row for each choice made by the chatbot
# The columns will include the choice ID, object, creation time, model, and usage information
choices_df = pd.json_normalize(
    response,  # The response object returned by the OpenAI API
    "choices",  # The path to the nested list of choices in the response object
    ['id', 'object', 'created', 'model', 'usage']  # A list of top-level keys to include in the DataFrame
)

choices_df

|    | finish_reason | index | message.role | message.content | id | object | created | model | usage |
|---:|:--------------|------:|:-------------|:----------------|:---|:-------|--------:|:------|:------|
| 0 | stop | 0 | assistant | Sure, here's the code to generate the dataset:... | chatcmpl-72My9J2HBhZj0lXX1QLsV2CNY6eUh | chat.completion | 1680799305 | gpt-3.5-turbo-0301 | {'prompt_tokens': 145, 'completion_tokens': 22... |


The status code of the first choice is available as `response["choices"][0]["finish_reason"]`.

In [6]:
# Check the status code of the response variable
response["choices"][0]["finish_reason"]

'stop'

In [7]:
# Print the content generated by GPT.
print(response["choices"][0]["message"]["content"])

Sure, here's the code to generate the dataset:
```python
import random
import pandas as pd

names = ["Alice", "Bob", "Charlie", "David", "Emily"]
heights = [165, 170, 175, 180, 185]
eye_colors = ["brown", "blue", "green"]

data = {"name": [random.choice(names) for _ in range(5)],
        "height_cm": [random.choice(heights) for _ in range(5)],
        "eye_color": [random.choice(eye_colors) for _ in range(5)]}

df = pd.DataFrame(data)

print(df.to_markdown(index=False))
```

And here's the output in markdown table format:
| name     |   height_cm | eye_color   |
|:--------|------------:|:-----------|
| Alice   |         170 | brown      |
| Emily   |         175 | green      |
| Emily   |         180 | brown      |
| Charlie |         165 | brown      |
| Bob     |         170 | blue       |


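We can also define a helper function to streamline the process. It takes the system message and a list of alternating user and assistant messages, starting with a user message.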

In [9]:
# Create a function to avoid boilerplate code.

def chat(system, user_assistant):
    # Check that the inputs are of the expected types
    assert isinstance(system, str), "`system` should be a string"
    assert isinstance(user_assistant, list), "`user_assistant` should be a list"
    
    # Create a dictionary representing the initial message from the system
    system_msg = {"role": "system", "content": system}
    
    # Create a list of dictionaries representing the conversation between the user and assistant
    user_assistant_msgs = [
        # Even positions are user messages, odd positions are assistant messages
        {"role": "assistant" if i % 2 else "user", "content": content}
        for i, content in enumerate(user_assistant)
    ]
    
    # Combine the system message and the conversation messages into a single list
    msgs = [system_msg] + user_assistant_msgs
    
    # Call the OpenAI API to get a response from the chatbot
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",  # Specify the version of the GPT-3.5 model to use
        messages=msgs  # Pass the list of messages to the API as input
    )
    
    # Check that the chatbot finished generating a response
    status_code = response["choices"][0]["finish_reason"]
    assert status_code == "stop", f"The status code was {status_code}."
    
    # Return the generated response
    return response["choices"][0]["message"]["content"]


In [13]:
response_fn_test = chat(
    "You are a machine learning expert who writes tersely.", 
    ["Explain what Git is, but use the analogy of a family tree"]
)
display(Markdown(response_fn_test))

Git is like a big fish that saves snapshots of the pond at different points in time. Just like how a big fish can live for a long time and witness changes in the pond's ecosystem, Git allows you to track changes in your codebase. Each snapshot, or "commit", captures the current state of the code at a specific point in time. These commits can be compared to see how the codebase has changed over time, similar to how you can compare the snapshots of the pond to see how its ecosystem has evolved. Git also allows multiple people to work on the same codebase without overwriting each other's changes, just like how different fish can coexist and swim in the same pond.

In [14]:
# Assign the content of the first response (the dataset request) to assistant_msg
assistant_msg = response["choices"][0]["message"]["content"]

# Define a new user message
user_msg2 = 'Using the dataset you just created, write code to calculate the mean of the `height_cm` column. Also include the result of the calculation.'

# Create a list of alternating user and assistant messages
user_assistant_msgs = [user_msg, assistant_msg, user_msg2]

# Get GPT to perform the request
response_calc = chat(system_msg, user_assistant_msgs)

# Display the generated content
display(Markdown(response_calc))

Sure, here's the code to calculate the mean of the `height_cm` column:
```python
print("Mean height: ", df["height_cm"].mean())
```

And here's the output with the mean height:
```
Mean height:  172.0
``` 

So the mean height is 172cm.
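The example above builds the message history by hand. For longer exchanges, the history can be accumulated automatically. The following is a minimal sketch under stated assumptions: `Conversation` is a hypothetical class, and `send` stands for any function that takes the message list and returns the reply text. Here it is a fake that works offline, so no API call is made.

```python
class Conversation:
    """Keep the running message history so each request includes prior turns."""

    def __init__(self, system, send):
        # `send` is any function that takes the message list and returns the reply text
        self.messages = [{"role": "system", "content": system}]
        self.send = send

    def ask(self, user_text):
        # Record the user turn, get a reply, then record the assistant turn
        self.messages.append({"role": "user", "content": user_text})
        reply = self.send(self.messages)
        self.messages.append({"role": "assistant", "content": reply})
        return reply


def fake_send(messages):
    # Offline stand-in for the API call: reports how many messages it received
    return f"received {len(messages)} messages"


conversation = Conversation("You are a helpful assistant.", fake_send)
first = conversation.ask("Create a dataset.")              # sees system + 1 user message
second = conversation.ask("Now compute the mean height.")  # sees the full history so far
```

In real use, `send` would wrap the `openai.ChatCompletion.create` call and return the content of the first choice. Because every request carries the whole history, long conversations consume more prompt tokens.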

## Get Silicon Valley Bank stock data from Yahoo! Finance

In [None]:
# Create a Ticker object for SIVB
sivb = yf.Ticker("SIVB")

# Get the stock history for SIVB for the period of 1 month
sivb_history = sivb.history(period="1mo")

# Select the Close column and round it to two decimal places
sivb_close = sivb_history[["Close"]].round(2)

## Get GPT to write a financial report

In [None]:
# Define a system message
system_msg_sivb = 'You are a financial data expert who writes tersely.'

# Define a user message (including the dataset)
user_msg_sivb = '''The closing prices for the Silicon Valley Bank stock (ticker SIVB) are provided below. Provide Python code to analyze the data including the following metrics:

- The date of the highest closing price.
- The date of the lowest closing price.
- The date with the largest change from the previous closing price.

Also write a short report that includes the results of the calculations.

Here is the dataset:

''' + sivb_close.to_string()

# Get GPT to generate a response
response_sivb = chat(system_msg_sivb, [user_msg_sivb])

# Render the response as Markdown
display(Markdown(response_sivb))

```python
import pandas as pd

data = pd.read_csv('sivb_data.csv')
data['Date'] = pd.to_datetime(data['Date'])

# Highest closing price date
high_close = data[data['Close'] == data['Close'].max()]['Date']
print('Date of highest closing price:', high_close.values[0])

# Lowest closing price date
low_close = data[data['Close'] == data['Close'].min()]['Date']
print('Date of lowest closing price:', low_close.values[0])

# Date with largest change from previous closing price
data['Change'] = data['Close'].diff()
max_change = data[data['Change'] == data['Change'].max()]['Date']
print('Date of largest change from previous closing price:', max_change.values[0])
```

Report:

- The date of the highest closing price is March 6th, 2023.
- The date of the lowest closing price is March 28th, 2023.
- The date with the largest change from the previous closing price is March 29th, 2023. On this day, the closing price increased by $0.57 from the previous day's closing price of $0.40. This represents a percent increase of 142.5%, which is an unusually large increase. It is possible that there was some sort of corporate announcement or other event that caused the large increase.
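Generated analysis is worth recomputing locally before it is trusted. The generated code above reads a file, `sivb_data.csv`, that this notebook never creates, and `data['Change'].max()` finds only the largest increase, not the largest move in either direction. The sketch below shows the three requested metrics in plain Python. The prices are made up for illustration and are not real SIVB data.

```python
# Made-up closing prices keyed by ISO date (illustrative only, not real SIVB data)
closes = {
    "2023-03-06": 10.0,
    "2023-03-07": 9.5,
    "2023-03-08": 4.0,
    "2023-03-09": 4.6,
}

dates = sorted(closes)  # ISO dates sort chronologically as strings
highest_date = max(dates, key=closes.get)
lowest_date = min(dates, key=closes.get)

# Day-over-day change, keyed by the later of each pair of consecutive dates
changes = {
    later: closes[later] - closes[earlier]
    for earlier, later in zip(dates, dates[1:])
}
largest_rise_date = max(changes, key=changes.get)
largest_move_date = max(changes, key=lambda d: abs(changes[d]))

print(highest_date, lowest_date, largest_rise_date, largest_move_date)
```

With this sample, the largest rise and the largest absolute move fall on different dates, which is the distinction the generated code misses. On the `sivb_close` data frame, the equivalent checks would presumably be `idxmax()`, `idxmin()`, and `diff().abs().idxmax()` on the `Close` column.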