
# Using OpenAI API with Python

So far, we have run LLMs locally on our own machines.

This notebook demonstrates how to interact with OpenAI's powerful language models through their official API using the chat completions endpoint.

## 1. Prerequisites

Before running this notebook, you need to set up your OpenAI API access.

### 1.1 Getting an OpenAI API Key

To use the OpenAI API, you need an API key:

1. Create an account at [OpenAI's website](https://openai.com/)
2. Navigate to the [API keys page](https://platform.openai.com/api-keys)
3. Create a new secret key
4. Store this key securely - it's like a password!

### 1.2 Setting Up Your Environment

First, install the OpenAI Python package, along with `python-dotenv`, which the next cell uses to read the `.env` file:

In [1]:
%%bash
pip install openai python-dotenv



Then, set up your API key. For security, it's best to use environment variables:

In [2]:
import os
from dotenv import load_dotenv
from openai import OpenAI

# Load environment variables from .env file
load_dotenv()

# Option 1: Use environment variable from .env file
api_key = os.getenv("OPENAI_API_KEY")

# Option 2: If .env file doesn't exist, you can set the API key directly (not recommended for production)
if not api_key:
    # Uncomment the line below and replace with your actual API key
    # api_key = "your_actual_api_key_here"
    print("Warning: No API key found. Please set OPENAI_API_KEY in .env file or uncomment the line above.")

# Initialize OpenAI client
if api_key:
    client = OpenAI(api_key=api_key)
    print("OpenAI client initialized successfully!")
else:
    print("Please set your OpenAI API key to proceed.")

OpenAI client initialized successfully!
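
Note that the fallback above only prints a warning and leaves `client` undefined, so later cells would fail with a `NameError`. A minimal sketch of a stricter alternative is shown below. The helper names `require_api_key` and `mask_key` are hypothetical (they are not part of the `openai` package), and the demo uses a made-up variable `DEMO_API_KEY` so that it does not touch your real key:

```python
import os


def require_api_key(env_var="OPENAI_API_KEY"):
    """Return the API key from the environment or raise a clear error.

    Hypothetical helper for this notebook, not part of the openai package.
    """
    key = os.getenv(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"{env_var} is not set. Add it to your .env file or export it in your shell."
        )
    return key


def mask_key(key, visible=4):
    """Hide all but the last few characters so the key is safe to print."""
    if len(key) <= visible:
        return "*" * len(key)
    return "*" * (len(key) - visible) + key[-visible:]


# Demo with a placeholder value stored under a made-up variable name.
os.environ["DEMO_API_KEY"] = "sk-example-not-a-real-key"
print("Using key:", mask_key(require_api_key("DEMO_API_KEY")))
```

With a real key in place, the client could then be created with `OpenAI(api_key=require_api_key())`, which stops the notebook at the first cell instead of at the first API call.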


## 2. Using the Chat Completions API

The Chat Completions API is provided by OpenAI and is designed for conversational interactions. As input, we provide a system message (how the model should behave) and a user message (the input from the user).

In [4]:
# chat.completions.create accepts a model name and a list of messages, and returns a response object.
# Each message is a dictionary with two keys: role and content.
# Here the first message is the system message, which sets the behavior of the assistant,
# and the second is the user message, which carries the input from the user.
response = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "讲一个中文笑话."}  # English: "Tell a Chinese joke."
    ]
)

# The response object contains a list of choices; each choice has a message whose content holds the text we want.
# Here we print the content of the first choice.
print(response.choices[0].message.content)

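有一个人去餐厅吃饭，他对服务员说：“我想要一份鸡排。”服务员回答：“对不起，我们的鸡排卖完了。” 这个人想了一下，然后说：“那我就要鸡的排骨。” 就连服务员都忍不住笑出声来，因为鸡是没有排骨的。

(English translation: A man goes to a restaurant and tells the waiter, "I'd like a chicken cutlet." The waiter replies, "Sorry, we're sold out of chicken cutlets." The man thinks for a moment and says, "Then I'll have chicken spare ribs." Even the waiter can't help laughing out loud, because chickens don't have spare ribs.)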
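
Since every request is just a list of `role`/`content` dictionaries, it can help to check that list before sending it. The sketch below uses a hypothetical `validate_messages` helper (not part of the OpenAI library). It assumes plain-text content and only the roles `system`, `user`, and `assistant`:

```python
ALLOWED_ROLES = {"system", "user", "assistant"}  # assumption: plain chat roles only


def validate_messages(messages):
    """Raise ValueError if the list is not shaped like chat messages."""
    if not isinstance(messages, list) or not messages:
        raise ValueError("messages must be a non-empty list")
    for i, msg in enumerate(messages):
        if not isinstance(msg, dict):
            raise ValueError(f"message {i} is not a dict")
        if msg.get("role") not in ALLOWED_ROLES:
            raise ValueError(f"message {i} has unknown role: {msg.get('role')!r}")
        if not isinstance(msg.get("content"), str):
            raise ValueError(f"message {i} needs a string 'content'")
    return messages


messages = validate_messages([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Tell me a joke."},
])
print(f"{len(messages)} messages are ready to send")
```

A typo such as `"roel"` or `"usr"` is then caught locally, before any tokens are spent on a request.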


## 3. Understanding the Response Structure

The API returns a structured response with useful metadata which we will print:

In [None]:
response = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello, how are you?"}
    ]
)

# Print the full response structure
print("Full response object:")
print(response)

# Access specific parts
print("\nJust the message content:")
print(response.choices[0].message.content)

print("\nModel used:")
print(response.model)

print("\nUsage statistics:")
print(f"Prompt tokens: {response.usage.prompt_tokens}")
print(f"Completion tokens: {response.usage.completion_tokens}")
print(f"Total tokens: {response.usage.total_tokens}")

Full response object:
ChatCompletion(id='chatcmpl-BugbE4CsZZTy2VJ7JzEKdEY6KoT88', choices=[Choice(finish_reason='stop', index=0, logprobs=None, message=ChatCompletionMessage(content="Hello! As an artificial intelligence, I don't have feelings, but I'm here and ready to assist you. How can I help you today?", refusal=None, role='assistant', annotations=[], audio=None, function_call=None, tool_calls=None))], created=1752849820, model='gpt-4-0613', object='chat.completion', service_tier='default', system_fingerprint=None, usage=CompletionUsage(completion_tokens=30, prompt_tokens=23, total_tokens=53, completion_tokens_details=CompletionTokensDetails(accepted_prediction_tokens=0, audio_tokens=0, reasoning_tokens=0, rejected_prediction_tokens=0), prompt_tokens_details=PromptTokensDetails(audio_tokens=0, cached_tokens=0)))

Just the message content:
Hello! As an artificial intelligence, I don't have feelings, but I'm here and ready to assist you. How can I help you today?

Model used:
gpt-4-0613

Usage statistics:
Prompt tokens: 23
Completion tokens: 30
Total tokens: 53

As you can see, the response carries useful data. Most importantly, it tells us how many tokens this call used. Models are typically priced by a combination of input and output tokens, so in this case we paid for 53 tokens (23 prompt tokens plus 30 completion tokens). We will discuss pricing in more detail at the end of this lab.
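
As a rough illustration of how usage maps to cost, the sketch below multiplies the token counts from the response above by per-token prices. The two prices are placeholders assumed for this example, not OpenAI's current rates, and `estimate_cost` is a hypothetical helper:

```python
# Placeholder prices in USD per 1 million tokens (assumed for illustration only).
INPUT_PRICE_PER_M = 30.0
OUTPUT_PRICE_PER_M = 60.0


def estimate_cost(prompt_tokens, completion_tokens,
                  input_price=INPUT_PRICE_PER_M, output_price=OUTPUT_PRICE_PER_M):
    """Return the estimated cost in USD for one request."""
    return (prompt_tokens * input_price + completion_tokens * output_price) / 1_000_000


# Token counts taken from the usage field of the response printed above.
cost = estimate_cost(prompt_tokens=23, completion_tokens=30)
print(f"Estimated cost: ${cost:.5f}")
```

Input and output tokens are priced separately in this sketch because output tokens usually cost more than input tokens.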

## 4. Creating a Simple Chat Interface (Without Memory)

Let's build a chat interface without memory.

A notebook is not well suited to a back-and-forth chat loop, such as one built around `input()`, where we type a message, read the reply, and then type again.

Instead, let's create a simple function and use it in separate cells to simulate a chat without memory:

### Cell 1: Define the function

In [6]:
# This function accepts a user message (which we type in the next cell) and returns a response from the OpenAI API.
def get_response_no_memory(user_message):
    """Get a response from OpenAI (no conversation history)"""
    # The triple-quoted string above is a docstring: a structured comment that describes the function.
    # Docstrings are optional, but they make code easier to document and understand.

    # Call the model with the user message, which we set in the cells below
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": user_message}
        ]
    )
    
    return response.choices[0].message.content

### Cell 2: First interaction

In [7]:
# First question (you can insert anything)
question = "What is artificial intelligence?"
answer = get_response_no_memory(question)

print(f"You: {question}")
print(f"Assistant: {answer}")

You: What is artificial intelligence?
Assistant: Artificial Intelligence, often referred to as AI, is a branch of computer science that aims to imbue software with the ability to analyze its environment using either predetermined rules and strategies or patterns learned from previous experiences. AI software is designed to make decisions based on real-time data or inputs, interpreting, learning, planning, problem-solving, and making informed decisions. It's utilized in a wide range of applications, including virtual assistants like Siri and Alexa, recommendation systems like the ones used by Netflix and Amazon, and autonomous vehicles.


### Cell 3: Follow-up question (without memory)

In [8]:
# Second question (the AI won't remember the previous interaction)
question = "Can you elaborate more on that?"
answer = get_response_no_memory(question)

print(f"You: {question}")
print(f"Assistant: {answer}")

You: Can you elaborate more on that?
Assistant: Certainly! As a helpful assistant, my main role is to provide support and assistance across a range of tasks. I can assist with things like setting reminders, finding information online, answering questions, managing tasks, and even providing recommendations or advice based on user needs. I use advanced AI technology to understand and respond to requests, helping to facilitate your needs more efficiently and effectively. As an AI, I don’t need rest, I'm here for you 24/7, ready to help whenever you need me.


You'll notice that when you ask "Can you elaborate more on that?" the assistant doesn't know what "that" refers to, because it has no memory of the previous exchange.

## 5. Creating a Chat Interface With Memory

Now let's build a version that maintains conversation history:

### Cell 1: Setup conversation memory

In [9]:
# Initialize conversation with system message. We will add more messages to the conversation memory (history) as we go.
conversation_memory = [
    {"role": "system", "content": "You are a helpful assistant."}
]

#Function to add user message to conversation history and get a response from OpenAI
def chat_with_memory(user_message):
    """Chat with the AI while maintaining conversation history"""
    
    # Add user message to history. Now, the conversation memory contains the system message and the user message.
    conversation_memory.append({"role": "user", "content": user_message})
    
    # Get response from OpenAI
    response = client.chat.completions.create(
        model="gpt-4",
        messages=conversation_memory
    )
    
    # Extract assistant's response
    assistant_response = response.choices[0].message.content
    
    # Add the assistant's response to the conversation history so that it is sent along with future requests.
    conversation_memory.append({"role": "assistant", "content": assistant_response})
    
    # Return the response and token usage
    return assistant_response, response.usage.total_tokens

# Function to display the conversation history. We loop through the conversation memory and print each message. We skip the system message to keep the output clean.
def show_conversation():
    """Display the current conversation"""
    for message in conversation_memory:
        if message["role"] == "system":
            continue  # Skip system message
        # Capitalize the role (user or assistant) so the output looks nicer. This is for display only.
        print(f"{message['role'].capitalize()}: {message['content']}\n")

### Cell 2: First interaction with memory

In [10]:
# First question
question = "What is artificial intelligence?"
answer, tokens = chat_with_memory(question)

print(f"You: {question}")
print(f"Assistant: {answer}")
print(f"[Tokens used: {tokens}]")

You: What is artificial intelligence?
Assistant: Artificial Intelligence (AI) refers to the simulation of human intelligence processes by machines or computer systems. These processes include learning (the acquisition of information and rules for using the information), reasoning (using rules to reach approximate or definite conclusions), and self-correction. AI can involve anything from Google's search algorithms to IBM's Watson to autonomous weapons. It's often categorized as either weak AI, which is designed to perform a narrow task (like facial recognition), or strong AI, which is a system with generalized human cognitive abilities.
[Tokens used: 127]


### Cell 3: Follow-up question (with memory)

In [11]:
# Second question (now the AI remembers the previous interaction)
question = "Can you elaborate more on that?"
answer, tokens = chat_with_memory(question)

print(f"You: {question}")
print(f"Assistant: {answer}")
print(f"[Tokens used: {tokens}]")

You: Can you elaborate more on that?
Assistant: Certainly! Artificial Intelligence (AI) is a broad branch of computer science which is focused on building smart machines capable of performing tasks that typically require human intelligence. 

The goal of AI is to create systems that can function intelligently and independently. These can include specific applications such as:

- Gaming: AI technology is used to generate responsive or intelligent behaviors primarily in non-player characters (NPCs), similar to human-like intelligence.

- Natural Language Processing: Allows machines to understand and respond to voice commands or text inputs in natural human languages.

- Expert Systems: This involves programming computers to make decisions in real-life situations. They are used for complex problem-solving and decision-making processes.

- Vision Systems: These systems understand, interpret, and comprehend visual input on the computer.

AI can be categorized into two primary types: 

- Nar

### Cell 4: View the entire conversation

In [12]:
# See the entire conversation so far
print("Full conversation history:")
print("-" * 30)
show_conversation()

Full conversation history:
------------------------------
User: What is artificial intelligence?

Assistant: Artificial Intelligence (AI) refers to the simulation of human intelligence processes by machines or computer systems. These processes include learning (the acquisition of information and rules for using the information), reasoning (using rules to reach approximate or definite conclusions), and self-correction. AI can involve anything from Google's search algorithms to IBM's Watson to autonomous weapons. It's often categorized as either weak AI, which is designed to perform a narrow task (like facial recognition), or strong AI, which is a system with generalized human cognitive abilities.

User: Can you elaborate more on that?

Assistant: Certainly! Artificial Intelligence (AI) is a broad branch of computer science which is focused on building smart machines capable of performing tasks that typically require human intelligence. 

The goal of AI is to create systems that can func

### Cell 5: Reset conversation (if needed)

In [13]:
# Reset conversation if you want to start fresh
conversation_memory = [
    {"role": "system", "content": "You are a helpful assistant."}
]
print("Conversation has been reset!")

Conversation has been reset!
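
The global list works well in a notebook, but the same idea can be wrapped in a small class so that the history and the reset logic live together. This is only a sketch: the `Conversation` class and the `send` callable are hypothetical names, and `send` stands in for a function that calls `client.chat.completions.create` and returns the reply text. A fake `send` is used here so the example runs without an API key:

```python
class Conversation:
    """Keep chat history in one place (hypothetical helper, not an OpenAI class)."""

    def __init__(self, send, system_prompt="You are a helpful assistant."):
        self.send = send  # callable: list of messages -> reply text
        self.system_prompt = system_prompt
        self.reset()

    def reset(self):
        """Start over with only the system message."""
        self.messages = [{"role": "system", "content": self.system_prompt}]

    def ask(self, user_message):
        """Send the full history plus the new message and store the reply."""
        self.messages.append({"role": "user", "content": user_message})
        reply = self.send(self.messages)
        self.messages.append({"role": "assistant", "content": reply})
        return reply


def fake_send(messages):
    """Stand-in for the API call: report how many messages were received."""
    return f"(reply after seeing {len(messages)} messages)"


chat = Conversation(fake_send)
print(chat.ask("What is artificial intelligence?"))
print(chat.ask("Can you elaborate more on that?"))
chat.reset()
print(f"Messages after reset: {len(chat.messages)}")
```

To talk to the real API, `send` could be a function that passes `messages` to `client.chat.completions.create(model="gpt-4", messages=messages)` and returns `response.choices[0].message.content`. The fake version makes visible that the second request carries four messages, which is exactly what gives the model its "memory".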


## 6. Streaming Responses

Streaming allows you to see the response as it's being generated:

In [14]:
import time

def stream_response(user_message):
    """Stream a response from OpenAI without storing conversation history"""
    
    # Because stream=True, the model's response arrives in chunks. We store the resulting iterator in a variable called stream.
    stream = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": user_message}
        ],
        stream=True
    )
    
    print(f"You: {user_message}") # Print the user message
    print("Assistant: ", end="", flush=True)  # Print the assistant message without a newline. The flush=True argument makes sure the output is printed immediately.
    
    # Process the stream
    full_response = ""  # Starts empty; we append each chunk as it arrives.
    for chunk in stream:  # Each chunk is a small piece of the response. The variable name is arbitrary; "chunk" just makes the intent clear.
        if chunk.choices[0].delta.content is not None:  # Some chunks carry no text, so we skip those to avoid errors.
            content_chunk = chunk.choices[0].delta.content  # The piece of text carried by this chunk.
            full_response += content_chunk  # Build up the full response that we return at the end.
            print(content_chunk, end="", flush=True)  # Print without a newline; flush=True shows the text immediately.
            time.sleep(0.01)  # Small delay to make the output easier to follow
    
    print("\n")  # Add a newline after the response
    return full_response

Test the streaming function:

In [15]:
# Try the streaming function
stream_response("Write a short poem about programming")

You: Write a short poem about programming
Assistant: In the realm where logic is king,
Smitten by the coding string,
Programmers weave their skillful charm,
Letting no system succumb to harm.

Bits and bytes in lines compose,
A world unseen yet close,
In languages diverse they speak,
Solutions to problems that we seek.

Python, Java, C++, Shell,
In each one, they do dwell,
Binary tales they narrate,
Infinite loops they negate.

Through the wires, flows their art,
Striking like a coder's dart,
Debugging errors, squashing bugs,
Fuelled by coffee in hefty mugs.

Building bridges twixt man and machine,
Crafting realities once unseen,
In the heart of silicon they program,
Master creators in a digital realm.



"In the realm where logic is king,\nSmitten by the coding string,\nProgrammers weave their skillful charm,\nLetting no system succumb to harm.\n\nBits and bytes in lines compose,\nA world unseen yet close,\nIn languages diverse they speak,\nSolutions to problems that we seek.\n\nPython, Java, C++, Shell,\nIn each one, they do dwell,\nBinary tales they narrate,\nInfinite loops they negate.\n\nThrough the wires, flows their art,\nStriking like a coder's dart,\nDebugging errors, squashing bugs,\nFuelled by coffee in hefty mugs.\n\nBuilding bridges twixt man and machine,\nCrafting realities once unseen,\nIn the heart of silicon they program,\nMaster creators in a digital realm."

## 7. Streaming with Memory

Let's combine streaming with conversation memory:

In [16]:
# Initialize conversation with system message
streaming_conversation = [
    {"role": "system", "content": "You are a helpful assistant."}
]

def stream_chat_with_memory(user_message):
    """Chat with memory and stream the response"""
    
    # Add user message to history
    streaming_conversation.append({"role": "user", "content": user_message})
    
    # Get streaming response
    stream = client.chat.completions.create(
        model="gpt-4",
        messages=streaming_conversation,
        stream=True
    )
    
    print(f"You: {user_message}")
    print("Assistant: ", end="", flush=True)
    
    # Process the stream
    assistant_response = ""
    for chunk in stream:
        if chunk.choices[0].delta.content is not None:
            content_chunk = chunk.choices[0].delta.content
            assistant_response += content_chunk
            print(content_chunk, end="", flush=True)
            time.sleep(0.01)
    
    print("\n")  # Add a newline after the response
    
    # Add assistant response to conversation history
    streaming_conversation.append({"role": "assistant", "content": assistant_response})
    
    return assistant_response

Test the streaming chat with memory:

In [17]:
# First streaming question with memory
stream_chat_with_memory("What are the three laws of robotics?")

You: What are the three laws of robotics?
Assistant: The Three Laws of Robotics, as proposed by science fiction author Isaac Asimov, are:

1. A robot may not injure a human being or, through inaction, allow a human being to come to harm.
2. A robot must obey the orders given it by human beings except where such orders would conflict with the First Law.
3. A robot must protect its own existence as long as such protection does not conflict with the First or Second Laws.



'The Three Laws of Robotics, as proposed by science fiction author Isaac Asimov, are:\n\n1. A robot may not injure a human being or, through inaction, allow a human being to come to harm.\n2. A robot must obey the orders given it by human beings except where such orders would conflict with the First Law.\n3. A robot must protect its own existence as long as such protection does not conflict with the First or Second Laws.'

In [18]:
# Follow-up streaming question with memory
stream_chat_with_memory("Who created these laws?")

You: Who created these laws?
Assistant: These laws were created by the prolific science fiction author Isaac Asimov. He introduced them in his 1942 short story "Runaround," although they also prominently featured in many of his other works.



'These laws were created by the prolific science fiction author Isaac Asimov. He introduced them in his 1942 short story "Runaround," although they also prominently featured in many of his other works.'
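
Both streaming functions share the same inner loop: read each chunk, skip chunks without text, and append the rest. The sketch below isolates that loop in a hypothetical `collect_stream` helper and feeds it fake chunks built with `SimpleNamespace`, so it runs offline. The extra check for an empty `choices` list is a defensive assumption, since some stream configurations can emit chunks that carry no choices:

```python
from types import SimpleNamespace


def make_chunk(text):
    """Build a fake chunk shaped like chunk.choices[0].delta.content."""
    delta = SimpleNamespace(content=text)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


def collect_stream(stream, on_text=None):
    """Join the text pieces of a stream, calling on_text for each piece."""
    parts = []
    for chunk in stream:
        if not chunk.choices:  # defensive: chunk without any choices
            continue
        text = chunk.choices[0].delta.content
        if text is None:  # chunk that carries no text
            continue
        parts.append(text)
        if on_text is not None:
            on_text(text)
    return "".join(parts)


fake_stream = [
    make_chunk(None),
    make_chunk("Hello"),
    make_chunk(", "),
    make_chunk("world!"),
    SimpleNamespace(choices=[]),
]
result = collect_stream(fake_stream, on_text=lambda t: print(t, end="", flush=True))
print()
```

Separating the loop from the API call also means the printing behavior can be swapped out, for example to update a UI widget instead of calling `print`.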

## 8. Understanding the Different Message Roles

The OpenAI Chat API uses three main roles in messages:

In [19]:
response = client.chat.completions.create(
    model="gpt-4",
    messages=[
        # System message - sets behavior and context
        {"role": "system", "content": "You are a pirate who only speaks in pirate slang."},
        
        # User messages - what the user says
        {"role": "user", "content": "Hello, how are you today?"},
        
        # Assistant messages - previous responses from the assistant
        {"role": "assistant", "content": "Arrr! I be feelin' mighty fine today, me hearty!"},
        
        # Another user message
        {"role": "user", "content": "Tell me about the weather."}
    ]
)

print(response.choices[0].message.content)

Shiver me timbers! 'Tis a bonny day, matey. The sky be as blue as the open sea and the sun be shinin' bright. Perfect day for settin' sail, aye!
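
Because earlier `assistant` messages are just entries in the list, you can also write them yourself to show the model the style you want before asking the real question. A minimal sketch, assuming a hypothetical `few_shot_messages` helper that is not part of the OpenAI library:

```python
def few_shot_messages(system_prompt, examples, question):
    """Build a message list from (user, assistant) example pairs plus a question."""
    messages = [{"role": "system", "content": system_prompt}]
    for user_text, assistant_text in examples:
        messages.append({"role": "user", "content": user_text})
        messages.append({"role": "assistant", "content": assistant_text})
    messages.append({"role": "user", "content": question})
    return messages


messages = few_shot_messages(
    "You are a pirate who only speaks in pirate slang.",
    [("Hello, how are you today?", "Arrr! I be feelin' mighty fine today, me hearty!")],
    "Tell me about the weather.",
)
for m in messages:
    print(f"{m['role']:>9}: {m['content']}")
```

The resulting list has the same shape as the `messages` argument in the cell above and could be passed to `client.chat.completions.create` unchanged.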



## 9. Understanding the Context Window

OpenAI models have different context window limitations:

- **GPT-3.5-Turbo**: 4,096 or 16,384 tokens (depending on version)
- **GPT-4**: 8,192 or 32,768 tokens (depending on version)
- **GPT-4 Turbo**: Up to 128,000 tokens

Unlike with local models, the OpenAI API enforces the limit and reports token usage for you:
1. If you exceed the limit, the API will return an error
2. You're charged based on the number of tokens you use
3. The API provides token usage statistics with each request

Let's see tokens in action:

In [20]:
# Create a longer conversation
long_messages = [
    {"role": "system", "content": "You are a helpful assistant."}
]

# Add some messages to the history
for i in range(5):
    long_messages.append({"role": "user", "content": f"This is test message {i+1}. Tell me something interesting about space."})
    response = client.chat.completions.create(
        model="gpt-4",
        messages=long_messages
    )
    assistant_msg = response.choices[0].message.content
    long_messages.append({"role": "assistant", "content": assistant_msg})
    print(f"Exchange {i+1} - Total tokens: {response.usage.total_tokens}")

Exchange 1 - Total tokens: 88
Exchange 2 - Total tokens: 180
Exchange 3 - Total tokens: 293
Exchange 4 - Total tokens: 394
Exchange 5 - Total tokens: 508
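
Each `total_tokens` value above covers a single request, and every request resends the whole history. The tokens processed across the five requests are therefore the sum of those values, not just the last one. A short calculation using the numbers printed above:

```python
from itertools import accumulate

# total_tokens reported for each of the five requests above
per_request_totals = [88, 180, 293, 394, 508]

running_totals = list(accumulate(per_request_totals))
for i, (single, running) in enumerate(zip(per_request_totals, running_totals), start=1):
    print(f"Exchange {i}: this request {single:>4} tokens, processed so far {running:>5}")

print(f"Tokens processed across all requests: {running_totals[-1]}")
```

This is why long conversations get expensive faster than their length suggests: the per-request size grows roughly linearly, so the running total grows roughly quadratically.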


## 10. Managing Costs and Tokens

When using the OpenAI API, you need to be aware of costs:

1. **Token Counting**: Each request and response consumes tokens that you pay for
2. **Model Selection**: More powerful models cost more per token
3. **Context Window**: Longer conversations cost more because more tokens are sent

Tips for managing costs:

In [21]:
# Use a cheaper model for less complex tasks
response = client.chat.completions.create(
    model="gpt-3.5-turbo",  # Cheaper than GPT-4
    messages=[{"role": "user", "content": "Summarize the benefits of exercise."}]
)

# Control maximum tokens to limit response length
response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Tell me about quantum physics."}],
    max_tokens=100  # Limit response length
)

# Use temperature to control randomness. Higher values make the output more random, while lower values make it more focused and deterministic.
# Note that temperature changes the style of the output, not the price per token.
response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Write a creative story."}],
    temperature=0.7  # Higher for more creativity, lower for more deterministic
)
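
One side effect of `max_tokens` is that a reply can stop mid-sentence. The API reports this through `finish_reason`, which is `"length"` when the limit cut the reply short and `"stop"` when the model finished on its own. The sketch below checks that field. The `was_truncated` and `fake_response` names are hypothetical, and a fake response object is used so the example runs without an API call:

```python
from types import SimpleNamespace


def was_truncated(response):
    """Return True if the first choice stopped because it hit the token limit."""
    return response.choices[0].finish_reason == "length"


def fake_response(text, finish_reason):
    """Build a stand-in shaped like the parts of ChatCompletion used here."""
    message = SimpleNamespace(content=text)
    choice = SimpleNamespace(message=message, finish_reason=finish_reason)
    return SimpleNamespace(choices=[choice])


complete = fake_response("Quantum physics studies matter at tiny scales.", "stop")
cut_off = fake_response("Quantum physics studies matter at", "length")

for response in (complete, cut_off):
    note = " [truncated by max_tokens]" if was_truncated(response) else ""
    print(response.choices[0].message.content + note)
```

With a real response, the same check tells you whether to raise `max_tokens` or ask the model to continue.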

## 11. Handling Conversation History for Long Contexts

For long conversations, you need strategies to manage the context window:

In [22]:
# Example: Keep only the most recent N messages. N can be adjusted based on your needs. In this case, we keep the last 10 messages.
def trim_conversation(messages, max_messages=10):
    # Always keep the system message (first message)
    if len(messages) > max_messages + 1:
        system_message = messages[0]
        recent_messages = messages[-(max_messages):]
        return [system_message] + recent_messages
    return messages

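# Example: Summarize the conversation periodically. We ask the model to summarize the conversation and replace the history with a single summary message. This is useful for long conversations where you want to keep the context but reduce the number of messages.
def summarize_conversation(messages):
    # Create a summary request. The original system message (messages[0]) is left out
    # so that it does not compete with the summarization instruction, and a final
    # user message makes the request explicit.
    summary_request = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": "Summarize this conversation concisely."},
            *messages[1:],
            {"role": "user", "content": "Summarize the conversation above concisely."}
        ]
    )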
    summary = summary_request.choices[0].message.content
    
    # Replace the conversation with the summary
    return [
        messages[0],  # Keep system message
        {"role": "system", "content": f"Previous conversation summary: {summary}"}
    ]

# When to use:
# if len(messages) > 20:
#     messages = summarize_conversation(messages)
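
Counting messages ignores how long each message is. A variation on `trim_conversation` is to drop the oldest messages until an estimated token total fits a budget. The sketch below is hypothetical: `estimate_tokens` uses a rough rule of thumb of about four characters per token for English text, which is an assumption and not the model's real tokenizer, and `trim_to_budget` is not part of the OpenAI library:

```python
def estimate_tokens(text):
    """Very rough estimate: about 4 characters per token (assumption)."""
    return max(1, len(text) // 4)


def trim_to_budget(messages, max_tokens=1000):
    """Keep the system message plus the newest messages that fit the budget."""
    system_message, rest = messages[0], messages[1:]
    budget = max_tokens - estimate_tokens(system_message["content"])
    kept = []
    for message in reversed(rest):  # walk from newest to oldest
        cost = estimate_tokens(message["content"])
        if cost > budget:
            break
        kept.append(message)
        budget -= cost
    return [system_message] + kept[::-1]  # restore chronological order


history = [{"role": "system", "content": "You are a helpful assistant."}]
for i in range(6):
    history.append({"role": "user", "content": f"Question {i}: " + "x" * 80})
    history.append({"role": "assistant", "content": f"Answer {i}: " + "y" * 80})

trimmed = trim_to_budget(history, max_tokens=100)
print(f"Kept {len(trimmed)} of {len(history)} messages")
```

For accurate counts you would replace `estimate_tokens` with a real tokenizer, but the trimming logic stays the same.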

## 12. Comparing Local LLMs vs. OpenAI API

| Feature | Local LLMs (Ollama) | OpenAI API |
|---------|---------------------|------------|
| Setup | Download models locally | API key only |
| Cost | Free (after download) | Pay per token |
| Privacy | Data stays on your device | Data sent to OpenAI |
| Performance | Limited by your hardware | State-of-the-art models |
| Reliability | Depends on your system | Managed service |
| Context Window | Typically smaller | Up to 128K tokens |
| Memory Management | Manual implementation | Manual (resend the message history with each request) |

## 13. Resources for Further Learning

- [OpenAI API Documentation](https://platform.openai.com/docs/api-reference)
- [OpenAI Cookbook](https://github.com/openai/openai-cookbook)
- [OpenAI Python Library](https://github.com/openai/openai-python)
- [OpenAI Tokenizer](https://platform.openai.com/tokenizer)