
# Calling the OpenAI API with Python

We have already worked through running an LLM on a local machine.

This notebook shows how to interact with OpenAI's language models through the official OpenAI API, using the Chat Completions endpoint.

## 1. Prerequisites

Before running this notebook, you need to set up access to the OpenAI API.

### 1.1 Get an OpenAI API key

To use the OpenAI API, you need an API key:

1. Create an account on the [OpenAI website](https://openai.com/)
2. Go to the [API keys page](https://platform.openai.com/api-keys)
3. Create a new secret key
4. Store the key securely; treat it like a password!

### 1.2 Set up your environment

First, install the OpenAI Python package (plus `python-dotenv`, which the next cell uses to read a `.env` file):

In [None]:
%%bash
pip install openai python-dotenv

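Next, set your API key. For security, it is best to keep it in an environment variable (for example, `OPENAI_API_KEY` in a `.env` file) rather than in the code: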

In [None]:
import os
from dotenv import load_dotenv
from openai import OpenAI

# Load environment variables from .env file
load_dotenv()

# Option 1: Use environment variable from .env file
api_key = os.getenv("OPENAI_API_KEY")

# Option 2: If .env file doesn't exist, you can set the API key directly (not recommended for production)
if not api_key:
    # Uncomment the line below and replace with your actual API key
    # api_key = "your_actual_api_key_here"
    print("Warning: No API key found. Please set OPENAI_API_KEY in .env file or uncomment the line above.")

# Initialize OpenAI client
if api_key:
    client = OpenAI(api_key=api_key)
    print("OpenAI client initialized successfully!")
else:
    print("Please set your OpenAI API key to proceed.")
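If `python-dotenv` is not available, the core of what `load_dotenv()` does for simple files can be approximated with the standard library. This is a minimal sketch, not a replacement for `python-dotenv`: `load_env_file` is a hypothetical helper that only handles plain `KEY=VALUE` lines (optional quotes, `#` comments) and, like `load_dotenv()`'s default, does not overwrite variables that are already set.

```python
import os
from pathlib import Path


def load_env_file(path=".env"):
    """Hypothetical minimal .env loader: KEY=VALUE lines only, no interpolation.

    Returns the parsed key/value pairs; values already present in
    os.environ are left unchanged.
    """
    env_path = Path(path)
    if not env_path.is_file():
        return {}
    parsed = {}
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and lines without '='
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        key = key.strip()
        # Drop surrounding quotes, e.g. OPENAI_API_KEY="sk-..."
        value = value.strip().strip('"').strip("'")
        parsed[key] = value
        # Only set the variable if it is not already defined
        os.environ.setdefault(key, value)
    return parsed
```

After calling `load_env_file()`, `os.getenv("OPENAI_API_KEY")` works exactly as in the cell above.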

## 2. Using the Chat Completions API

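OpenAI's Chat Completions API is designed for conversational interactions. As input, we pass a system message (how the model should behave) and a user message (the user's input).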
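Every request repeats the same structure, so it can help to build the `messages` list in one place. A minimal sketch, assuming plain-text messages only; `build_messages` is a hypothetical helper, not part of the `openai` package:

```python
def build_messages(system_prompt, user_message, history=None):
    """Hypothetical helper: assemble a Chat Completions `messages` list."""
    messages = [{"role": "system", "content": system_prompt}]
    # Optional earlier turns, already in {"role": ..., "content": ...} form
    if history:
        messages.extend(history)
    messages.append({"role": "user", "content": user_message})
    return messages


messages = build_messages("You are a helpful assistant.", "Tell me a joke.")
# The result can be passed as:
# client.chat.completions.create(model="gpt-4", messages=messages)
```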

In [None]:
# client.chat.completions.create takes the model name and a list of messages, and returns a response object.
# Each message is a dictionary with two keys: "role" and "content". By convention the first message is the
# system message, which sets the assistant's behavior; the user message that follows is the user's input.
response = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Tell me a joke in Chinese."}
    ]
)

# The response holds a list of choices; print the message content of the first one.
print(response.choices[0].message.content)

## 3. Understanding the response structure

The API returns a structured response with useful metadata, which we print below:

In [None]:
response = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello, how are you?"}
    ]
)

# Print the full response structure
print("Full response object:")
print(response)

# Access specific parts
print("\nJust the message content:")
print(response.choices[0].message.content)

print("\nModel used:")
print(response.model)

print("\nUsage statistics:")
print(f"Prompt tokens: {response.usage.prompt_tokens}")
print(f"Completion tokens: {response.usage.completion_tokens}")
print(f"Total tokens: {response.usage.total_tokens}")

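As you can see, the response contains useful data and, more importantly, tells us how many tokens the call used. Model pricing is usually based on input and output tokens (often at different rates), so this call is billed for the `total_tokens` it reports (54 in our run; your numbers will vary). We will discuss pricing in more detail at the end of this lab.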
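Because input and output tokens are usually priced separately, a per-call estimate multiplies each count by its own rate. A minimal sketch; `estimate_cost` is a hypothetical helper, and the prices below are placeholders, not OpenAI's actual rates (check OpenAI's pricing page for your model):

```python
def estimate_cost(prompt_tokens, completion_tokens,
                  input_price_per_1m, output_price_per_1m):
    """Estimate the USD cost of one call from its token counts.

    Prices are per 1,000,000 tokens and must be looked up for your model;
    the values used below are placeholders.
    """
    input_cost = prompt_tokens / 1_000_000 * input_price_per_1m
    output_cost = completion_tokens / 1_000_000 * output_price_per_1m
    return input_cost + output_cost


# With a real response: estimate_cost(response.usage.prompt_tokens,
#                                     response.usage.completion_tokens, ...)
cost = estimate_cost(1_000, 500, input_price_per_1m=10.0, output_price_per_1m=30.0)
print(f"Estimated cost: ${cost:.4f}")  # Estimated cost: $0.0250
```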

## 4. Creating a simple chat interface (no memory)

Let's build a chat interface without memory.

In a notebook, a classic chat loop, where we type an input, read the output, and type again (typically with an `input()` loop), is awkward to run.
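For comparison, in a standalone terminal script an `input()`-based chat loop could look like the minimal sketch below. `run_chat_loop` is a hypothetical helper; `get_reply` stands for any function that takes the user's text and returns the model's reply, such as the `get_response_no_memory` function defined in the next cell. `read` and `write` default to `input` and `print` but can be swapped out, which also makes the loop testable without a keyboard.

```python
def run_chat_loop(get_reply, read=input, write=print, quit_word="quit"):
    """Hypothetical REPL-style chat loop for a terminal script.

    get_reply: any function str -> str (e.g. a wrapper around the API call)
    read/write: injectable input/output functions
    """
    while True:
        user_message = read("You: ")
        # Stop when the user types the quit word (case-insensitive)
        if user_message.strip().lower() == quit_word:
            break
        write(f"Assistant: {get_reply(user_message)}")
```

In a script you would call `run_chat_loop(get_response_no_memory)` and type `quit` to stop.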

Instead, let's create a simple function and call it from separate cells to simulate a chat without memory:

### Cell 1: Define the function

In [None]:
# This function accepts a user message as an input (we will type that in the next cell) and returns a response from the OpenAI API.
def get_response_no_memory(user_message):
    """Get a response from OpenAI (no conversation history)."""
    # The triple-quoted string above is a docstring: it describes what the function does.
    # Docstrings are optional, but like comments they make the code easier to understand, in a more structured way.

    #We call the model with the user message which we will set below
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": user_message}
        ]
    )
    
    return response.choices[0].message.content

### Cell 2: First interaction

In [None]:
# First question (you can insert anything)
question = "What is artificial intelligence?"
answer = get_response_no_memory(question)

print(f"You: {question}")
print(f"Assistant: {answer}")

### Cell 3: Follow-up question (no memory)

In [None]:
# Second question (the AI won't remember the previous interaction)
question = "Can you elaborate more on that?"
answer = get_response_no_memory(question)

print(f"You: {question}")
print(f"Assistant: {answer}")

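You'll notice that when you ask "Can you elaborate more on that?", the assistant doesn't know what "that" refers to, because it has no memory of the previous exchange: each call sends only the system message and the current question.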

## 5. Creating a chat interface with memory

Now let's build a version that maintains the conversation history:

### Cell 1: Set up conversation memory

In [None]:
# Initialize conversation with system message. We will add more messages to the conversation memory (history) as we go.
conversation_memory = [
    {"role": "system", "content": "You are a helpful assistant."}
]

#Function to add user message to conversation history and get a response from OpenAI
def chat_with_memory(user_message):
    """Chat with the AI while maintaining conversation history"""
    
    # Add user message to history. Now, the conversation memory contains the system message and the user message.
    conversation_memory.append({"role": "user", "content": user_message})
    
    # Get response from OpenAI
    response = client.chat.completions.create(
        model="gpt-4",
        messages=conversation_memory
    )
    
    # Extract assistant's response
    assistant_response = response.choices[0].message.content
    
    # Add the assistant's response to the conversation history so that it is sent along with every future request.
    conversation_memory.append({"role": "assistant", "content": assistant_response})
    
    # Return the response and token usage
    return assistant_response, response.usage.total_tokens

# Function to display the conversation history. We loop through the conversation memory and print each message. We skip the system message to keep the output clean.
def show_conversation():
    """Display the current conversation"""
    for message in conversation_memory:
        if message["role"] == "system":
            continue  # Skip system message
        print(f"{message['role'].capitalize()}: {message['content']}\n") #We capitalize the role to make it look nicer. This is just for display purposes. The role is either user or assistant.

### Cell 2: First interaction with memory

In [None]:
# First question
question = "What is artificial intelligence?"
answer, tokens = chat_with_memory(question)

print(f"You: {question}")
print(f"Assistant: {answer}")
print(f"[Tokens used: {tokens}]")

### Cell 3: Follow-up question (with memory)

In [None]:
# Second question (now the AI remembers the previous interaction)
question = "Can you elaborate more on that?"
answer, tokens = chat_with_memory(question)

print(f"You: {question}")
print(f"Assistant: {answer}")
print(f"[Tokens used: {tokens}]")

### Cell 4: View the entire conversation

In [None]:
# See the entire conversation so far
print("Full conversation history:")
print("-" * 30)
show_conversation()

### Cell 5: Reset the conversation (if needed)

In [None]:
# Reset conversation if you want to start fresh
conversation_memory = [
    {"role": "system", "content": "You are a helpful assistant."}
]
print("Conversation has been reset!")
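Keeping the history in a module-level list works, but every piece of code that resets it has to rebind the same global name. A minimal alternative sketch bundles the history and the reset logic in one object; `ChatMemory` is a hypothetical class that only manages the list, while the API call itself stays the same as in `chat_with_memory`:

```python
class ChatMemory:
    """Hypothetical wrapper around a Chat Completions message history."""

    def __init__(self, system_prompt="You are a helpful assistant."):
        self.system_prompt = system_prompt
        self.reset()

    def reset(self):
        # Start over with only the system message
        self.messages = [{"role": "system", "content": self.system_prompt}]

    def add_user(self, content):
        self.messages.append({"role": "user", "content": content})

    def add_assistant(self, content):
        self.messages.append({"role": "assistant", "content": content})


memory = ChatMemory()
memory.add_user("What is artificial intelligence?")
memory.add_assistant("AI is ...")  # in practice: response.choices[0].message.content
memory.add_user("Can you elaborate more on that?")
# memory.messages is what you would pass as messages=... on the next call
```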

## 6. Streaming responses

Streaming lets you see the response while it is being generated:

In [None]:
import time

def stream_response(user_message):
    """Stream a response from OpenAI without storing conversation history"""
    
    # Because stream=True, the model's response arrives in chunks. We store the stream object in a variable called stream.
    stream = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": user_message}
        ],
        stream=True
    )
    
    print(f"You: {user_message}") # Print the user message
    print("Assistant: ", end="", flush=True)  # Print the assistant message without a newline. The flush=True argument makes sure the output is printed immediately.
    
    # Process the stream
    full_response = "" # The response will be empty at first. We will add the chunks to this variable.
    for chunk in stream: # We loop through the stream and get each chunk of data. Each chunk is a part of the response. chunk can be called anything, but we call it chunk to make it clear that it is a part of the response.
        if chunk.choices and chunk.choices[0].delta.content is not None: # Skip chunks with no choices or no text (content can be None, e.g. in the final chunk).
            content_chunk = chunk.choices[0].delta.content # Get the content of the chunk. This is the part of the response we want to print.
            full_response += content_chunk # Add the chunk to the full response. This will be the final response we will return.
            print(content_chunk, end="", flush=True) # Print the chunk without a newline. The flush=True argument makes sure the output is printed immediately.
            time.sleep(0.01)  # Small delay to make it more readable
    
    print("\n")  # Add a newline after the response
    return full_response

Test the streaming function:

In [None]:
# Try the streaming function
stream_response("Write a short poem about programming")
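Each chunk carries only a small `delta` of new text, so the full reply is the concatenation of all the deltas. A minimal, offline-testable sketch: `collect_stream` and `fake_chunk` are hypothetical helpers, and the fake chunks only mimic the `chunk.choices[0].delta.content` shape used in `stream_response`. With a real call you would pass the object returned by `client.chat.completions.create(..., stream=True)` instead.

```python
from types import SimpleNamespace


def collect_stream(stream):
    """Concatenate the text deltas from a Chat Completions stream."""
    parts = []
    for chunk in stream:
        # Some chunks carry no text (content is None) or may have an empty choices list
        if chunk.choices and chunk.choices[0].delta.content is not None:
            parts.append(chunk.choices[0].delta.content)
    return "".join(parts)


def fake_chunk(text):
    """Hypothetical stand-in shaped like chunk.choices[0].delta.content."""
    return SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(content=text))])


fake_stream = [fake_chunk("Hel"), fake_chunk("lo"), SimpleNamespace(choices=[]), fake_chunk(None)]
print(collect_stream(fake_stream))  # Hello
```

Joining a list once at the end avoids building a new string on every `+=`, though for replies of this size either approach is fine.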

## 7. Streaming with memory

Let's combine streaming with conversation memory:

In [None]:
# Initialize conversation with system message
streaming_conversation = [
    {"role": "system", "content": "You are a helpful assistant."}
]

def stream_chat_with_memory(user_message):
    """Chat with memory and stream the response"""
    
    # Add user message to history
    streaming_conversation.append({"role": "user", "content": user_message})
    
    # Get streaming response
    stream = client.chat.completions.create(
        model="gpt-4",
        messages=streaming_conversation,
        stream=True
    )
    
    print(f"You: {user_message}")
    print("Assistant: ", end="", flush=True)
    
    # Process the stream
    assistant_response = ""
    for chunk in stream:
        if chunk.choices and chunk.choices[0].delta.content is not None:
            content_chunk = chunk.choices[0].delta.content
            assistant_response += content_chunk
            print(content_chunk, end="", flush=True)
            time.sleep(0.01)
    
    print("\n")  # Add a newline after the response
    
    # Add assistant response to conversation history
    streaming_conversation.append({"role": "assistant", "content": assistant_response})
    
    return assistant_response

Test the streaming chat with memory:

In [None]:
# First streaming question with memory
stream_chat_with_memory("What are the three laws of robotics?")

In [None]:
# Follow-up streaming question with memory
stream_chat_with_memory("Who created these laws?")

## 8. Understanding the different message roles

The OpenAI Chat API uses three main roles in messages, `system`, `user`, and `assistant`:

In [None]:
response = client.chat.completions.create(
    model="gpt-4",
    messages=[
        # System message - sets behavior and context
        {"role": "system", "content": "You are a pirate who only speaks in pirate slang."},
        
        # User messages - what the user says
        {"role": "user", "content": "Hello, how are you today?"},
        
        # Assistant messages - previous responses from the assistant
        {"role": "assistant", "content": "Arrr! I be feelin' mighty fine today, me hearty!"},
        
        # Another user message
        {"role": "user", "content": "Tell me about the weather."}
    ]
)

print(response.choices[0].message.content)
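When you assemble message lists by hand, as above, a quick check before sending can catch typos in role names. A minimal sketch covering only the three roles used in this notebook; `validate_messages` is a hypothetical helper. The real API also accepts other roles (such as `tool`) and non-string content (lists of content parts), which this sketch deliberately flags, and ending on a user message is a convention followed here, not an API requirement.

```python
ALLOWED_ROLES = {"system", "user", "assistant"}


def validate_messages(messages):
    """Hypothetical pre-flight check; returns a list of problems (empty = OK)."""
    problems = []
    for i, message in enumerate(messages):
        role = message.get("role")
        if role not in ALLOWED_ROLES:
            problems.append(f"message {i}: unexpected role {role!r}")
        if not isinstance(message.get("content"), str):
            problems.append(f"message {i}: content should be a string")
    # Convention in this notebook: the model answers the latest user message
    if messages and messages[-1].get("role") != "user":
        problems.append("last message is not from the user")
    return problems


pirate_messages = [
    {"role": "system", "content": "You are a pirate who only speaks in pirate slang."},
    {"role": "user", "content": "Hello, how are you today?"},
    {"role": "assistant", "content": "Arrr! I be feelin' mighty fine today, me hearty!"},
    {"role": "user", "content": "Tell me about the weather."},
]
print(validate_messages(pirate_messages))  # []
```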

## 9. Understanding context windows

OpenAI models have different context window limits:

- **GPT-3.5-Turbo**: 4,096 or 16,385 tokens (depending on the version)
- **GPT-4**: 8,192 or 32,768 tokens (depending on the version)
- **GPT-4 Turbo**: up to 128,000 tokens

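Unlike with local models, OpenAI's service enforces the limits and tracks token usage for you:
1. If a request exceeds the model's context limit, the API returns an error
2. You are billed by the number of tokens used
3. The API reports token usage statistics with every response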
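Before sending a long conversation, you can roughly check whether it will fit. A minimal sketch using the common rule of thumb that one token is about 4 characters of English text (it undercounts for languages such as Chinese); `approx_tokens` and `fits_context` are hypothetical helpers, and for exact counts a tokenizer such as OpenAI's `tiktoken` library is the better tool.

```python
import math


def approx_tokens(text):
    """Rough rule of thumb: ~4 characters per token for English text."""
    return math.ceil(len(text) / 4)


def fits_context(messages, context_limit, reserve_for_reply=500):
    """Hypothetical check: will the prompt plus a reply budget fit the window?

    Ignores the small per-message overhead the API adds, so it is only an estimate.
    """
    prompt_estimate = sum(approx_tokens(m["content"]) for m in messages)
    return prompt_estimate + reserve_for_reply <= context_limit


messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Tell me something interesting about space."},
]
print(sum(approx_tokens(m["content"]) for m in messages))  # 18
print(fits_context(messages, context_limit=8_192))  # True
```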

Let's watch token usage grow in practice:

In [None]:
# Create a longer conversation
long_messages = [
    {"role": "system", "content": "You are a helpful assistant."}
]

# Add some messages to the history
for i in range(5):
    long_messages.append({"role": "user", "content": f"This is test message {i+1}. Tell me something interesting about space."})
    response = client.chat.completions.create(
        model="gpt-4",
        messages=long_messages
    )
    assistant_msg = response.choices[0].message.content
    long_messages.append({"role": "assistant", "content": assistant_msg})
    print(f"Exchange {i+1} - Total tokens: {response.usage.total_tokens}")

## 10. Managing costs and tokens

When using the OpenAI API, keep these cost factors in mind:

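1. **Token count**: every request and response consumes tokens that you pay for
2. **Model choice**: more capable models cost more per token
3. **Context window**: longer conversations cost more, because the whole history is sent as input tokens with every request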
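To see why long conversations get expensive: each call resends the whole history, so the prompt grows every turn and the billed prompt tokens grow faster than linearly with the number of turns. A back-of-the-envelope sketch under a simplifying assumption (every user message and every reply has a fixed token count); `cumulative_prompt_tokens` is a hypothetical helper:

```python
def cumulative_prompt_tokens(system_tokens, user_tokens, reply_tokens, turns):
    """Total prompt tokens billed over `turns` calls when the full history is resent.

    Assumes every user message and every reply has a fixed size (a simplification).
    """
    total = 0
    for turn in range(1, turns + 1):
        # Turn n sends: system + n user messages + (n - 1) earlier replies
        total += system_tokens + turn * user_tokens + (turn - 1) * reply_tokens
    return total


print(cumulative_prompt_tokens(10, 20, 30, turns=3))  # 240
```

Here the three calls send 30, 80, and 130 prompt tokens; doubling the number of turns roughly quadruples the total, which is why trimming or summarizing history (section 11) pays off.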

Tips for managing costs (and controlling output):

In [None]:
# Use a cheaper model for less complex tasks
response = client.chat.completions.create(
    model="gpt-3.5-turbo",  # Cheaper than GPT-4
    messages=[{"role": "user", "content": "Summarize the benefits of exercise."}]
)

# Control maximum tokens to limit response length
response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Tell me about quantum physics."}],
    max_tokens=100  # Limit response length
)

# Use temperature to control randomness. Higher values make the output more random, while lower values make it more focused and deterministic.
response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Write a creative story."}],
    temperature=0.7  # Higher for more creativity, lower for more deterministic
)
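When `max_tokens` cuts a reply short, the choice's `finish_reason` is `"length"` instead of `"stop"`, so you can detect truncated answers. A minimal sketch; `was_truncated` is a hypothetical helper, and the `SimpleNamespace` object is an offline stand-in shaped like a Chat Completions response, not real API output:

```python
from types import SimpleNamespace


def was_truncated(response):
    """True if the model stopped because it hit the max_tokens limit."""
    return response.choices[0].finish_reason == "length"


# Offline stand-in shaped like a Chat Completions response (hypothetical data)
fake_response = SimpleNamespace(choices=[SimpleNamespace(finish_reason="length")])
print(was_truncated(fake_response))  # True
```

In the cell above, `response` is reassigned three times, so call `was_truncated` right after the `max_tokens=100` request if you want to check that one.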

## 11. Managing conversation history for long contexts

For long conversations, you need a strategy for staying within the context window:

In [None]:
# Example: Keep only the most recent N messages. N can be adjusted based on your needs. In this case, we keep the last 10 messages.
def trim_conversation(messages, max_messages=10):
    # Always keep the system message (first message)
    if len(messages) > max_messages + 1:
        system_message = messages[0]
        recent_messages = messages[-(max_messages):]
        return [system_message] + recent_messages
    return messages

# Example: Summarize the conversation periodically. We use AI to summarize the conversation and replace it with a single summary message. This is useful for long conversations where you want to keep the context but reduce the number of messages.
def summarize_conversation(messages):
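    # Create a summary request: send the conversation, then ask for a summary as the final user turn
    # (so the model summarizes instead of simply continuing the chat)
    summary_request = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            *messages,
            {"role": "user", "content": "Summarize this conversation concisely."}
        ]
    )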
    summary = summary_request.choices[0].message.content
    
    # Replace the conversation with the summary
    return [
        messages[0],  # Keep system message
        {"role": "system", "content": f"Previous conversation summary: {summary}"}
    ]

# When to use:
# if len(messages) > 20:
#     messages = summarize_conversation(messages)
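`trim_conversation` counts messages, but the context limit is measured in tokens. A hypothetical variant, `trim_to_token_budget`, keeps the system message and then as many of the most recent messages as fit in an estimated token budget. It uses the rough ~4-characters-per-token heuristic for English text, so it is an approximation; exact counts need a tokenizer such as OpenAI's `tiktoken`.

```python
import math


def approx_tokens(text):
    # Rough heuristic: ~4 characters per token for English text
    return math.ceil(len(text) / 4)


def trim_to_token_budget(messages, max_tokens):
    """Hypothetical variant of trim_conversation that trims by estimated tokens.

    Keeps the system message, then adds messages from newest to oldest
    until the next one would exceed the remaining budget.
    """
    system_message, rest = messages[0], messages[1:]
    budget = max_tokens - approx_tokens(system_message["content"])
    kept = []
    for message in reversed(rest):
        cost = approx_tokens(message["content"])
        if cost > budget:
            break
        kept.append(message)
        budget -= cost
    # Restore chronological order after walking backwards
    return [system_message] + list(reversed(kept))
```

Stopping at the first message that does not fit (rather than skipping it and continuing) keeps the retained history contiguous, so the model never sees a reply without the question that preceded it.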

## 12. Local LLMs vs. the OpenAI API

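| Feature | Local LLM (Ollama) | OpenAI API |
|---------|--------------------|------------|
| Setup | Download models locally | Only an API key needed |
| Cost | Free (after download) | Pay per token |
| Privacy | Data stays on your device | Data is sent to OpenAI |
| Performance | Limited by your hardware | State-of-the-art models |
| Reliability | Depends on your system | Managed service |
| Context window | Usually smaller | Up to 128K tokens |
| Memory management | Implemented manually | Implemented manually (Chat Completions is stateless; you resend the history) |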

## 13. Further learning resources

- [OpenAI API documentation](https://platform.openai.com/docs/api-reference)
- [OpenAI Cookbook](https://github.com/openai/openai-cookbook)
- [OpenAI Python library](https://github.com/openai/openai-python)
- [Tokenizer (token counting tool)](https://platform.openai.com/tokenizer)