# OpenAI API - Advanced Level Demo

Welcome to the **Advanced Level** OpenAI API tutorial! This notebook covers production-ready patterns:

1. **AI Agents** - Autonomous agents with tool loops
2. **Embeddings & Semantic Search** - Vector representations for similarity
3. **Async Operations** - Concurrent API calls
4. **Production Patterns** - Error handling, retries, cost optimization
5. **Audio Capabilities** - Speech-to-text and text-to-speech
6. **Batch Processing** - Efficient large-scale API usage

## Prerequisites
- Completed Entry and Middle Level notebooks
- Understanding of function calling and streaming
- OpenAI API key configured

---

## Reference Documentation
- [Agents Overview](https://platform.openai.com/docs/guides/agents)
- [Embeddings Guide](https://platform.openai.com/docs/guides/embeddings)
- [Batch API](https://platform.openai.com/docs/guides/batch)
- [Best Practices](https://platform.openai.com/docs/guides/production-best-practices)

In [3]:
# Setup - Import all required libraries
import os
import json
import time
import asyncio
from typing import List, Dict, Any, Optional
from dataclasses import dataclass
from openai import OpenAI, AsyncOpenAI
from dotenv import load_dotenv
import numpy as np
from tqdm import tqdm

load_dotenv()

# =============================================================================
# GLOBAL CONFIGURATION
# =============================================================================
# Set the model to use throughout this notebook
MODEL = "gpt-4o-mini"  # Change to "gpt-4o" for a more capable model
# =============================================================================

# Synchronous client
client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))

# Async client for concurrent operations
async_client = AsyncOpenAI(api_key=os.getenv("OPENAI_API_KEY"))

print("✓ OpenAI clients initialized (sync + async)!")
print(f"✓ Using model: {MODEL}")

✓ OpenAI clients initialized (sync + async)!
✓ Using model: gpt-4o-mini


---

## 1. Building AI Agents

An **agent** is an AI system that can:
- Use tools to gather information
- Make decisions based on results
- Loop until the task is complete

The key pattern: **Agentic Loop** - repeatedly call the model until it stops requesting tools.

In [2]:
# Define a comprehensive set of tools for our agent
agent_tools = [
    {
        "type": "function",
        "function": {
            "name": "search_database",
            "description": "Search a product database for items matching a query",
            "parameters": {
                "type": "object",
                "properties": {
                    "query": {"type": "string", "description": "Search query"},
                    "category": {"type": "string", "enum": ["electronics", "clothing", "books", "all"]},
                    "max_results": {"type": "integer", "description": "Maximum results to return"}
                },
                "required": ["query"]
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "get_product_details",
            "description": "Get detailed information about a specific product",
            "parameters": {
                "type": "object",
                "properties": {
                    "product_id": {"type": "string", "description": "The product ID"}
                },
                "required": ["product_id"]
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "check_inventory",
            "description": "Check if a product is in stock",
            "parameters": {
                "type": "object",
                "properties": {
                    "product_id": {"type": "string", "description": "The product ID"},
                    "quantity": {"type": "integer", "description": "Quantity needed"}
                },
                "required": ["product_id"]
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "place_order",
            "description": "Place an order for a product",
            "parameters": {
                "type": "object",
                "properties": {
                    "product_id": {"type": "string", "description": "The product ID"},
                    "quantity": {"type": "integer", "description": "Quantity to order"},
                    "shipping_address": {"type": "string", "description": "Delivery address"}
                },
                "required": ["product_id", "quantity", "shipping_address"]
            }
        }
    }
]

print(f"Defined {len(agent_tools)} tools for the agent")

Defined 4 tools for the agent


In [3]:
# Simulated tool implementations (in production, these would call real systems)
def search_database(query: str, category: str = "all", max_results: int = 5) -> str:
    """Simulated product search."""
    products = [
        {"id": "ELEC001", "name": "Wireless Headphones", "category": "electronics", "price": 79.99},
        {"id": "ELEC002", "name": "Bluetooth Speaker", "category": "electronics", "price": 49.99},
        {"id": "BOOK001", "name": "Python Programming", "category": "books", "price": 34.99},
        {"id": "CLTH001", "name": "Running Shoes", "category": "clothing", "price": 89.99},
    ]
    
    # Filter by category and query
    results = [p for p in products if category == "all" or p["category"] == category]
    results = [p for p in results if query.lower() in p["name"].lower()][:max_results]
    
    return json.dumps({"results": results, "total_found": len(results)})

def get_product_details(product_id: str) -> str:
    """Get detailed product info."""
    details = {
        "ELEC001": {"name": "Wireless Headphones", "price": 79.99, "description": "Premium noise-canceling headphones", "rating": 4.5},
        "ELEC002": {"name": "Bluetooth Speaker", "price": 49.99, "description": "Portable waterproof speaker", "rating": 4.2},
        "BOOK001": {"name": "Python Programming", "price": 34.99, "description": "Complete guide to Python", "rating": 4.8},
    }
    return json.dumps(details.get(product_id, {"error": "Product not found"}))

def check_inventory(product_id: str, quantity: int = 1) -> str:
    """Check product availability."""
    inventory = {"ELEC001": 50, "ELEC002": 0, "BOOK001": 100}
    stock = inventory.get(product_id, 0)
    return json.dumps({"product_id": product_id, "in_stock": stock >= quantity, "available": stock})

def place_order(product_id: str, quantity: int, shipping_address: str) -> str:
    """Place an order (simulated)."""
    order_id = f"ORD-{int(time.time())}"
    return json.dumps({"success": True, "order_id": order_id, "product_id": product_id, "quantity": quantity, "address": shipping_address})

# Map function names to implementations
tool_implementations = {
    "search_database": search_database,
    "get_product_details": get_product_details,
    "check_inventory": check_inventory,
    "place_order": place_order
}

print("Tool implementations ready!")

Tool implementations ready!


In [4]:
# The Agentic Loop - the core pattern for building agents
def run_agent(user_request: str, max_iterations: int = 10, verbose: bool = True) -> str:
    """
    Run an agent that can use tools to complete a task.
    
    The agent loop:
    1. Send request to model with tools
    2. If model wants to use tools, execute them
    3. Add tool results to conversation
    4. Repeat until model gives final answer
    """
    messages = [
        {
            "role": "system",
            "content": """You are a helpful shopping assistant. You can search for products, 
            get details, check inventory, and place orders. Always verify availability before 
            placing orders. Be thorough and provide helpful information to the user."""
        },
        {"role": "user", "content": user_request}
    ]
    
    iteration = 0
    while iteration < max_iterations:
        iteration += 1
        if verbose:
            print(f"\n--- Agent Iteration {iteration} ---")
        
        # Call the model
        response = client.chat.completions.create(
            model=MODEL,
            messages=messages,
            tools=agent_tools,
            tool_choice="auto"
        )
        
        assistant_message = response.choices[0].message
        
        # Check if we're done (no tool calls = final answer)
        if not assistant_message.tool_calls:
            if verbose:
                print("Agent finished with final response")
            return assistant_message.content
        
        # Process tool calls
        messages.append(assistant_message)
        
        for tool_call in assistant_message.tool_calls:
            func_name = tool_call.function.name
            func_args = json.loads(tool_call.function.arguments)
            
            if verbose:
                print(f"  Calling: {func_name}({func_args})")
            
            # Execute the function
            func_to_call = tool_implementations[func_name]
            result = func_to_call(**func_args)
            
            if verbose:
                print(f"  Result: {result[:100]}..." if len(result) > 100 else f"  Result: {result}")
            
            # Add result to messages
            messages.append({
                "role": "tool",
                "tool_call_id": tool_call.id,
                "content": result
            })
    
    return "Agent reached maximum iterations without completing the task."

print("Agent function defined!")

Agent function defined!
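
`run_agent` calls `json.loads` on the model's arguments and looks up `tool_implementations[func_name]` directly, so malformed JSON, an unknown tool name, or a wrong keyword argument raises an exception and ends the loop. A common hardening step is to catch those failures and send them back to the model as the tool result, so it can correct itself on the next iteration. The sketch below assumes a hypothetical helper, `execute_tool_call`, that is not part of the notebook or the OpenAI SDK. The `echo` tool is a stand-in used only for the example.

```python
import json
from typing import Callable, Dict


def execute_tool_call(
    name: str,
    arguments_json: str,
    registry: Dict[str, Callable[..., str]],
) -> str:
    """Run one tool call and always return a JSON string for the 'tool' message.

    Hypothetical helper: errors are reported to the model instead of raised,
    so the agent loop can keep going.
    """
    func = registry.get(name)
    if func is None:
        return json.dumps({"error": f"Unknown tool: {name}"})

    try:
        args = json.loads(arguments_json or "{}")
    except json.JSONDecodeError as exc:
        return json.dumps({"error": f"Invalid JSON arguments: {exc}"})
    if not isinstance(args, dict):
        return json.dumps({"error": "Arguments must be a JSON object"})

    try:
        return func(**args)
    except TypeError as exc:
        # Missing or unexpected keyword arguments usually end up here
        return json.dumps({"error": f"Bad arguments for {name}: {exc}"})
    except Exception as exc:  # Report any other tool failure to the model
        return json.dumps({"error": f"{name} failed: {exc}"})


# Stand-in tool for the example (not one of the notebook's tools)
def echo(text: str) -> str:
    return json.dumps({"echo": text})


registry = {"echo": echo}
print(execute_tool_call("echo", '{"text": "hi"}', registry))
print(execute_tool_call("missing", "{}", registry))
print(execute_tool_call("echo", "{not json", registry))
```

Inside `run_agent`, the argument parsing and function call could then become `result = execute_tool_call(tool_call.function.name, tool_call.function.arguments, tool_implementations)`.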


In [5]:
# Test the agent with a complex multi-step request
result = run_agent("I'm looking for headphones. Can you find some, check if they're in stock, and tell me about them?")

print("\n" + "=" * 60)
print("FINAL AGENT RESPONSE:")
print("=" * 60)
print(result)


--- Agent Iteration 1 ---
  Calling: search_database({'query': 'headphones'})
  Result: {"results": [{"id": "ELEC001", "name": "Wireless Headphones", "category": "electronics", "price": 79...

--- Agent Iteration 2 ---
  Calling: get_product_details({'product_id': 'ELEC001'})
  Result: {"name": "Wireless Headphones", "price": 79.99, "description": "Premium noise-canceling headphones",...
  Calling: check_inventory({'product_id': 'ELEC001'})
  Result: {"product_id": "ELEC001", "in_stock": true, "available": 50}

--- Agent Iteration 3 ---
Agent finished with final response

FINAL AGENT RESPONSE:
I found a pair of headphones that might interest you:

### Wireless Headphones
- **Price:** $79.99
- **Description:** Premium noise-canceling headphones
- **Rating:** 4.5 out of 5

#### Availability
These headphones are currently in stock, with 50 units available.

Would you like to place an order for them or need more information?


---

## 2. Embeddings & Semantic Search

**Embeddings** convert text into numerical vectors that capture semantic meaning. Similar texts have similar vectors, enabling:
- Semantic search (find related content)
- Clustering and classification
- Recommendation systems
- RAG (Retrieval-Augmented Generation)

In [6]:
# Generate embeddings for text
def get_embedding(text: str, model: str = "text-embedding-3-small") -> List[float]:
    """Get embedding vector for a text string."""
    response = client.embeddings.create(
        input=text,
        model=model
    )
    return response.data[0].embedding

# Example: Get embedding for a sentence
sample_text = "The quick brown fox jumps over the lazy dog."
embedding = get_embedding(sample_text)

print(f"Text: '{sample_text}'")
print(f"Embedding dimensions: {len(embedding)}")
print(f"First 10 values: {embedding[:10]}")

Text: 'The quick brown fox jumps over the lazy dog.'
Embedding dimensions: 1536
First 10 values: [-0.018449828028678894, -0.007201016414910555, 0.0036607810761779547, -0.05420747399330139, -0.022751403972506523, 0.036975789815187454, 0.029032466933131218, 0.023918794468045235, 0.011191711761057377, -0.020645027980208397]


In [None]:
# Semantic Search Implementation
def cosine_similarity(vec1: List[float], vec2: List[float]) -> float:
    """Calculate cosine similarity between two vectors."""
    vec1, vec2 = np.array(vec1), np.array(vec2)
    return np.dot(vec1, vec2) / (np.linalg.norm(vec1) * np.linalg.norm(vec2))

# Build a simple knowledge base
knowledge_base = [
    "Python is a popular programming language for data science and machine learning.",
    "JavaScript is primarily used for web development and runs in browsers.",
    "Machine learning models can recognize patterns in large datasets.",
    "Cloud computing provides on-demand access to computing resources.",
    "APIs allow different software applications to communicate with each other.",
    "Database optimization improves query performance and reduces latency.",
]

# Pre-compute embeddings for the knowledge base
print("Building knowledge base embeddings...")
kb_embeddings = [get_embedding(text) for text in knowledge_base]
print(f"Created {len(kb_embeddings)} embeddings")

def semantic_search(query: str, top_k: int = 3) -> List[tuple]:
    """Search the knowledge base for relevant content."""
    query_embedding = get_embedding(query)
    
    # Calculate similarities
    similarities = []
    for i, kb_emb in enumerate(kb_embeddings):
        sim = cosine_similarity(query_embedding, kb_emb)
        similarities.append((sim, knowledge_base[i]))
    
    # Sort by similarity and return top results
    similarities.sort(reverse=True)
    return similarities[:top_k]

# Test semantic search
query = "How can I build AI applications?"
results = semantic_search(query)

print(f"\nQuery: '{query}'")
print("\nTop results:")
for score, text in results:
    print(f"  [{score:.3f}] {text}")

Building knowledge base embeddings...
Created 6 embeddings

Query: 'How can I build AI applications?'

Top results:
  [0.417] APIs allow different software applications to communicate with each other.
  [0.327] Machine learning models can recognize patterns in large datasets.
  [0.272] Python is a popular programming language for data science and machine learning.
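
The cell above sends one embeddings request per document and compares vectors in a Python loop. The embeddings endpoint also accepts a list of strings as `input` and returns one item per string in `response.data`, each tagged with its `index`. This means the whole knowledge base can be embedded in a single request. OpenAI embeddings are normalized to length 1, so cosine similarity reduces to a dot product, and all scores can come from one matrix-vector product. This is a minimal sketch. `embed_texts` and `top_k_by_cosine` are hypothetical helper names. The explicit normalization is kept so the function also works on vectors that are not unit length, and the example uses toy 2-D vectors rather than real embeddings.

```python
from typing import List, Sequence, Tuple

import numpy as np


def embed_texts(client, texts: Sequence[str], model: str = "text-embedding-3-small") -> np.ndarray:
    """Embed many texts with one API call; returns an array of shape (len(texts), dim)."""
    response = client.embeddings.create(input=list(texts), model=model)
    # Sort by .index so row i always corresponds to texts[i]
    ordered = sorted(response.data, key=lambda item: item.index)
    return np.array([item.embedding for item in ordered], dtype=np.float32)


def top_k_by_cosine(
    query_vec: Sequence[float],
    doc_matrix: np.ndarray,
    docs: Sequence[str],
    k: int = 3,
) -> List[Tuple[float, str]]:
    """Return the k (score, doc) pairs with the highest cosine similarity."""
    q = np.asarray(query_vec, dtype=np.float32)
    # Normalize rows and query (a no-op for unit-length embeddings)
    m = doc_matrix / np.linalg.norm(doc_matrix, axis=1, keepdims=True)
    q = q / np.linalg.norm(q)
    scores = m @ q  # One cosine score per document
    best = np.argsort(-scores)[:k]
    return [(float(scores[i]), docs[i]) for i in best]


# Offline example with toy 2-D vectors instead of real embeddings
toy_docs = ["cats", "dogs", "cars"]
toy_matrix = np.array([[1.0, 0.0], [0.8, 0.6], [0.0, 1.0]])
print(top_k_by_cosine([1.0, 0.1], toy_matrix, toy_docs, k=2))
```

With the notebook's objects, this could be used as `kb_matrix = embed_texts(client, knowledge_base)` followed by `top_k_by_cosine(get_embedding(query), kb_matrix, knowledge_base)`.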


In [None]:
# RAG (Retrieval-Augmented Generation) Pattern
def rag_query(user_question: str) -> str:
    """Answer a question using RAG - retrieve relevant context, then generate."""
    # Step 1: Retrieve relevant context
    relevant_docs = semantic_search(user_question, top_k=2)
    context = "\n".join([doc for _, doc in relevant_docs])
    
    # Step 2: Generate answer with context
    response = client.chat.completions.create(
        model=MODEL,
        messages=[
            {
                "role": "system",
                "content": f"""Answer the user's question based on the following context. 
                If the context doesn't contain relevant information, say so.
                
                Context:
                {context}"""
            },
            {"role": "user", "content": user_question}
        ]
    )
    
    return response.choices[0].message.content

# Test RAG
answer = rag_query("What programming language should I learn for machine learning?")
print("RAG Answer:")
print(answer)

RAG Answer:
Based on the context, Python is a popular programming language for machine learning. It is widely used in the field and has many libraries and frameworks that support machine learning tasks.
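
`rag_query` always passes the top two documents to the model, even when their similarity is low. In the search example above, the best match scored only 0.417. One common refinement is to drop results below a minimum score and number the remaining passages. When nothing relevant survives the cutoff, the system prompt's "say so" instruction then has an explicit empty context to act on. This is a minimal sketch. The helper name `build_context` and the `0.3` cutoff are illustrative assumptions, and a useful threshold depends on the embedding model and your data.

```python
from typing import List, Tuple


def build_context(results: List[Tuple[float, str]], min_score: float = 0.3) -> str:
    """Format (score, text) search results as numbered passages above a cutoff."""
    kept = [text for score, text in results if score >= min_score]
    if not kept:
        return "No relevant context found."
    return "\n".join(f"[{i}] {text}" for i, text in enumerate(kept, start=1))


# Scores copied from the semantic search output shown earlier
example_results = [
    (0.417, "APIs allow different software applications to communicate with each other."),
    (0.327, "Machine learning models can recognize patterns in large datasets."),
    (0.272, "Python is a popular programming language for data science and machine learning."),
]
print(build_context(example_results))
```

In `rag_query`, this would replace the `"\n".join(...)` line, for example `context = build_context(semantic_search(user_question, top_k=3))`. Note that the 0.3 cutoff drops the Python passage here, which shows why the threshold needs tuning.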


---

## 3. Async Operations for Concurrency

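Use async operations to send many independent API requests concurrently. Each request spends most of its time waiting on the network, so running them concurrently brings the total wall-clock time closer to that of the slowest single request than to the sum of all of them.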

In [None]:
# Async API calls for concurrent processing
async def async_chat_completion(prompt: str, request_id: int) -> dict:
    """Make an async chat completion request."""
    response = await async_client.chat.completions.create(
        model=MODEL,
        messages=[{"role": "user", "content": prompt}],
        max_tokens=50
    )
    return {
        "id": request_id,
        "prompt": prompt,
        "response": response.choices[0].message.content
    }

async def process_batch_async(prompts: List[str]) -> List[dict]:
    """Process multiple prompts concurrently."""
    tasks = [
        async_chat_completion(prompt, i) 
        for i, prompt in enumerate(prompts)
    ]
    results = await asyncio.gather(*tasks)
    return results

# Test async batch processing
test_prompts = [
    "What is 2+2?",
    "Name a color.",
    "What is the capital of France?",
    "Name a fruit.",
    "What day comes after Monday?"
]

print(f"Processing {len(test_prompts)} prompts concurrently...")
start_time = time.time()

# Run async function (use await in Jupyter, asyncio.run() in scripts)
results = await process_batch_async(test_prompts)

elapsed = time.time() - start_time
print(f"Completed in {elapsed:.2f} seconds")
print("\nResults:")
for r in results:
    print(f"  [{r['id']}] {r['prompt'][:30]}... -> {r['response'][:50]}...")

Processing 5 prompts concurrently...
Completed in 1.22 seconds

Results:
  [0] What is 2+2?... -> 2 + 2 equals 4....
  [1] Name a color.... -> Azure....
  [2] What is the capital of France?... -> The capital of France is Paris....
  [3] Name a fruit.... -> Apple....
  [4] What day comes after Monday?... -> The day that comes after Monday is Tuesday....
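
`asyncio.gather` starts every request at once. That works for five prompts, but with hundreds of prompts it can exceed your rate limits. A common fix is to cap the number of in-flight requests with `asyncio.Semaphore`. This is a minimal sketch. `gather_bounded` is a hypothetical helper, and the demo uses a fake coroutine instead of the API so it runs offline.

```python
import asyncio
from typing import Awaitable, Callable, List, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


async def gather_bounded(
    func: Callable[[T], Awaitable[R]],
    items: Sequence[T],
    max_concurrency: int = 5,
) -> List[R]:
    """Run func(item) for every item, with at most max_concurrency running at once.

    Results come back in the same order as `items`, like asyncio.gather.
    """
    semaphore = asyncio.Semaphore(max_concurrency)

    async def run_one(item: T) -> R:
        async with semaphore:  # Waits here while max_concurrency calls are in flight
            return await func(item)

    return await asyncio.gather(*(run_one(item) for item in items))


# Offline demo: a fake "API call" that records how many calls overlap
in_flight = 0
peak = 0


async def fake_call(x: int) -> int:
    global in_flight, peak
    in_flight += 1
    peak = max(peak, in_flight)
    await asyncio.sleep(0.01)  # Stand-in for network latency
    in_flight -= 1
    return x * x


results_demo = asyncio.run(gather_bounded(fake_call, list(range(10)), max_concurrency=3))
print(results_demo, "peak concurrency:", peak)
```

In Jupyter, use `await` instead of `asyncio.run`. With the notebook's function, that could look like `await gather_bounded(lambda pair: async_chat_completion(pair[1], pair[0]), list(enumerate(test_prompts)), max_concurrency=5)`.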


---

## 4. Production Patterns

Essential patterns for production-grade applications: retries, rate limiting, cost tracking, and error handling.

In [6]:
# Production-ready API wrapper with retries and cost tracking
from openai import APIError, RateLimitError, APIConnectionError
import random

@dataclass
class APIUsageTracker:
    """Track API usage and costs."""
    total_prompt_tokens: int = 0
    total_completion_tokens: int = 0
    total_requests: int = 0
    failed_requests: int = 0
    
    # Approximate costs per 1M tokens (update based on current pricing)
    COST_PER_1M_PROMPT = 0.15  # for gpt-4o-mini
    COST_PER_1M_COMPLETION = 0.60
    
    def add_usage(self, prompt_tokens: int, completion_tokens: int):
        self.total_prompt_tokens += prompt_tokens
        self.total_completion_tokens += completion_tokens
        self.total_requests += 1
    
    def add_failure(self):
        self.failed_requests += 1
    
    @property
    def estimated_cost(self) -> float:
        prompt_cost = (self.total_prompt_tokens / 1_000_000) * self.COST_PER_1M_PROMPT
        completion_cost = (self.total_completion_tokens / 1_000_000) * self.COST_PER_1M_COMPLETION
        return prompt_cost + completion_cost
    
    def summary(self) -> str:
        return f"""
API Usage Summary:
  Requests: {self.total_requests} ({self.failed_requests} failed)
  Prompt tokens: {self.total_prompt_tokens:,}
  Completion tokens: {self.total_completion_tokens:,}
  Estimated cost: ${self.estimated_cost:.4f}
"""

# Initialize tracker
usage_tracker = APIUsageTracker()

def robust_chat_completion(
    messages: List[dict],
    max_retries: int = 3,
    base_delay: float = 1.0,
    track_usage: bool = True
) -> Optional[str]:
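    """
    Production-ready chat completion with:
    - Exponential backoff retry (max_retries is the total number of attempts)
    - Rate limit handling
    - Usage tracking
    - Error logging
    """
    for attempt in range(max_retries):
        is_last_attempt = attempt == max_retries - 1
        try:
            # The SDK client also retries some failures itself (2 times by default);
            # create it with OpenAI(max_retries=0) to make this loop the only retry layer.
            response = client.chat.completions.create(
                model=MODEL,
                messages=messages
            )
            
            # Track usage
            if track_usage and response.usage:
                usage_tracker.add_usage(
                    response.usage.prompt_tokens,
                    response.usage.completion_tokens
                )
            
            return response.choices[0].message.content
        
        # RateLimitError and APIConnectionError are subclasses of APIError,
        # so they must be caught before the generic APIError handler.
        except RateLimitError:
            if is_last_attempt:
                break  # No attempts left: skip the pointless final sleep
            # Exponential backoff plus random jitter to de-synchronize retries
            delay = base_delay * (2 ** attempt) + random.uniform(0, 1)
            print(f"Rate limited. Waiting {delay:.1f}s before attempt {attempt + 2}/{max_retries}")
            time.sleep(delay)
            
        except APIConnectionError:
            if is_last_attempt:
                break
            delay = base_delay * (2 ** attempt)
            print(f"Connection error. Waiting {delay:.1f}s before attempt {attempt + 2}/{max_retries}")
            time.sleep(delay)
            
        except APIError as e:
            # Any other API error (e.g. a 400 bad request) is treated as non-retryable
            print(f"API error: {e}")
            usage_tracker.add_failure()
            return None
    
    # All attempts failed
    usage_tracker.add_failure()
    return None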

# Test the robust function
result = robust_chat_completion([
    {"role": "user", "content": "What is the speed of light?"}
])
print("Response:", result)
print(usage_tracker.summary())

Response: The speed of light in a vacuum is approximately \( 299,792,458 \) meters per second, commonly rounded to \( 300,000 \) kilometers per second (or about \( 186,282 \) miles per second). This constant is denoted by the symbol \( c \) and is a fundamental constant in physics, forming the basis of Einstein's theory of relativity.

API Usage Summary:
  Requests: 1 (0 failed)
  Prompt tokens: 14
  Completion tokens: 80
  Estimated cost: $0.0001
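
You can check the estimated cost in this output by hand. It follows from the token counts and the per-million-token prices hard-coded in `APIUsageTracker`: 0.15 USD for prompt tokens and 0.60 USD for completion tokens. These are the notebook's approximate gpt-4o-mini figures, so check current pricing before relying on them.

$$
\text{cost} = \frac{14}{10^6}\times 0.15 + \frac{80}{10^6}\times 0.60 = 0.0000021 + 0.000048 = 0.0000501 \approx 0.0001 \text{ USD}
$$

The `:.4f` format rounds 0.0000501 up to 0.0001. Completion tokens account for almost all of the cost here, because they outnumber the prompt tokens and are priced 4× higher.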



In [None]:
# Rate Limiter for controlled throughput
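class RateLimiter:
    """Simple rate limiter that enforces a fixed minimum interval between requests.

    Unlike a token bucket, it never allows bursts: calls are spaced evenly.
    """
    def __init__(self, requests_per_minute: int = 60):
        self.requests_per_minute = requests_per_minute
        self.min_interval = 60.0 / requests_per_minute
        self.last_request_time = float("-inf")  # So the first call never waits
    
    def wait(self):
        """Sleep if necessary so consecutive calls are at least min_interval apart."""
        # time.monotonic() is not affected by system clock adjustments
        elapsed = time.monotonic() - self.last_request_time
        if elapsed < self.min_interval:
            time.sleep(self.min_interval - elapsed)
        self.last_request_time = time.monotonic()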

# Initialize rate limiter (adjust based on your tier)
rate_limiter = RateLimiter(requests_per_minute=60)

def rate_limited_completion(messages: List[dict]) -> str:
    """Make a rate-limited API call."""
    rate_limiter.wait()
    return robust_chat_completion(messages)

# Process multiple requests with rate limiting
prompts = ["Count to 3", "Name a planet", "What color is grass?"]
print("Processing with rate limiting...")
for prompt in prompts:
    result = rate_limited_completion([{"role": "user", "content": prompt}])
    print(f"  {prompt} -> {result[:50] if result else 'Failed'}...")

Processing with rate limiting...
  Count to 3 -> 1, 2, 3....
  Name a planet -> Mars....
  What color is grass? -> Grass is typically green, although its color can v...
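
`RateLimiter` spaces requests evenly and never allows bursts. A token bucket works differently. It holds up to `capacity` tokens that refill at a steady rate, and each request spends one token. Short bursts are therefore allowed while the long-run rate stays capped. This is a minimal sketch of that algorithm. `TokenBucket` is a hypothetical class, not part of the OpenAI SDK, and it is not thread-safe (add a lock before sharing one instance across threads).

```python
import time


class TokenBucket:
    """Token bucket: bursts of up to `capacity`, long-run rate of `rate_per_sec`."""

    def __init__(self, rate_per_sec: float, capacity: int):
        self.rate = rate_per_sec
        self.capacity = capacity
        self.tokens = float(capacity)  # Start full so an initial burst is allowed
        self.last_refill = time.monotonic()

    def _refill(self) -> None:
        now = time.monotonic()
        # Add tokens for the elapsed time, but never above capacity
        self.tokens = min(self.capacity, self.tokens + (now - self.last_refill) * self.rate)
        self.last_refill = now

    def try_acquire(self) -> bool:
        """Take one token if available; never blocks."""
        self._refill()
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

    def acquire(self) -> None:
        """Block until a token is available, then take it."""
        while not self.try_acquire():
            # Sleep roughly until the next whole token should have accumulated
            time.sleep((1 - self.tokens) / self.rate)


# 60 requests/minute on average, with bursts of up to 5
bucket = TokenBucket(rate_per_sec=1.0, capacity=5)
burst = [bucket.try_acquire() for _ in range(6)]
print(burst)
```

In `rate_limited_completion`, `bucket.acquire()` could stand in for `rate_limiter.wait()`.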


---

## 5. Audio Capabilities

OpenAI provides both **Text-to-Speech (TTS)** and **Speech-to-Text (STT/Whisper)** APIs.

In [None]:
# Text-to-Speech (TTS)
def text_to_speech(text: str, output_file: str = "output.mp3", voice: str = "alloy"):
    """
    Convert text to speech.
    
    Available voices: alloy, echo, fable, onyx, nova, shimmer
    """
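    # Streaming response: the audio is written to disk as it is received.
    # The SDK recommends this form over calling stream_to_file()
    # on the plain (non-streaming) response.
    with client.audio.speech.with_streaming_response.create(
        model="tts-1",  # or "tts-1-hd" for higher quality
        voice=voice,
        input=text
    ) as response:
        response.stream_to_file(output_file)
    return output_file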

# Generate speech (uncomment to run)
# output = text_to_speech(
#     "Hello! This is a demonstration of OpenAI's text to speech capabilities.",
#     "demo_speech.mp3",
#     voice="nova"
# )
# print(f"Audio saved to: {output}")

print("TTS function defined!")
print("Available voices: alloy, echo, fable, onyx, nova, shimmer")

TTS function defined!
Available voices: alloy, echo, fable, onyx, nova, shimmer


In [None]:
# Speech-to-Text (Whisper)
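def speech_to_text(audio_file: str, language: Optional[str] = None):
    """
    Transcribe audio to text using Whisper.
    
    Supports: mp3, mp4, mpeg, mpga, m4a, wav, webm
    With response_format="verbose_json", returns a transcription object
    with .text, .language, .duration and .segments (timestamps).
    """
    # Only send `language` when given (ISO-639-1 code such as "en" or "es");
    # otherwise Whisper detects the language itself
    optional_args = {"language": language} if language else {}
    with open(audio_file, "rb") as f:
        transcript = client.audio.transcriptions.create(
            model="whisper-1",
            file=f,
            response_format="verbose_json",  # Detailed output with timestamps
            **optional_args
        )
    return transcript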

# Transcribe audio (uncomment when you have an audio file)
# result = speech_to_text("your_audio.mp3")
# print(f"Transcription: {result.text}")
# print(f"Language: {result.language}")
# print(f"Duration: {result.duration}s")

print("STT (Whisper) function defined!")
print("Supported formats: mp3, mp4, mpeg, mpga, m4a, wav, webm")

STT (Whisper) function defined!
Supported formats: mp3, mp4, mpeg, mpga, m4a, wav, webm


---

## 6. Batch API (For Large-Scale Processing)

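The Batch API is designed for large-scale workloads that are not time-sensitive. Requests are processed asynchronously within a 24-hour completion window, at a 50% discount compared with the synchronous API.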

In [4]:
# Batch API - For processing large numbers of requests asynchronously
# Batch API provides 50% cost savings for non-time-sensitive workloads

def create_batch_file(requests: List[dict], filename: str = "batch_requests.jsonl"):
    """Create a JSONL file for batch processing."""
    with open(filename, 'w') as f:
        for i, req in enumerate(requests):
            batch_request = {
                "custom_id": f"request-{i}",
                "method": "POST",
                "url": "/v1/chat/completions",
                "body": {
                    "model": MODEL,
                    "messages": req["messages"],
                    "max_tokens": req.get("max_tokens", 100)
                }
            }
            f.write(json.dumps(batch_request) + "\n")
    return filename

def submit_batch(filename: str):
    """Submit a batch job to OpenAI."""
    # Upload the file
    with open(filename, "rb") as f:
        batch_file = client.files.create(file=f, purpose="batch")
    
    # Create the batch
    batch = client.batches.create(
        input_file_id=batch_file.id,
        endpoint="/v1/chat/completions",
        completion_window="24h"
    )
    
    return batch

# Example batch requests (note: running this cell submits a real batch job to your account)
batch_requests = [
    {"messages": [{"role": "user", "content": "What is 1+1?"}]},
    {"messages": [{"role": "user", "content": "What is 2+2?"}]},
    {"messages": [{"role": "user", "content": "What is 3+3?"}]},
]

filename = create_batch_file(batch_requests)
batch = submit_batch(filename)
print(f"Batch submitted: {batch.id}")
print(f"Status: {batch.status}")

print("Batch API functions defined!")
print("Use for large-scale processing with 50% cost savings")

Batch submitted: batch_6988eaf2cd508190a0dd7023047fedac
Status: validating
Batch API functions defined!
Use for large-scale processing with 50% cost savings
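
Submitting a batch only queues it. To get the answers, you poll `client.batches.retrieve(batch_id)` until `status` reaches a terminal value. You then download the JSONL file named by `output_file_id` with `client.files.content(...)`. Each output line carries your `custom_id` and, on success, a `response.body` shaped like a normal chat completion. Output order is not guaranteed to match input order, which is why `custom_id` exists. This is a minimal sketch. `wait_for_batch` and `parse_batch_output` are hypothetical helpers, and the 60-second poll interval is an arbitrary choice.

```python
import json
import time
from typing import Dict, Optional

TERMINAL_STATUSES = {"completed", "failed", "expired", "cancelled"}


def wait_for_batch(client, batch_id: str, poll_seconds: float = 60.0):
    """Poll a batch until it reaches a terminal status, then return it."""
    while True:
        batch = client.batches.retrieve(batch_id)
        if batch.status in TERMINAL_STATUSES:
            return batch
        time.sleep(poll_seconds)


def parse_batch_output(jsonl_text: str) -> Dict[str, Optional[str]]:
    """Map custom_id -> assistant message text (None if that request failed)."""
    answers: Dict[str, Optional[str]] = {}
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue  # Skip blank lines
        record = json.loads(line)
        response = record.get("response") or {}
        if record.get("error") or response.get("status_code") != 200:
            answers[record["custom_id"]] = None
            continue
        body = response["body"]
        answers[record["custom_id"]] = body["choices"][0]["message"]["content"]
    return answers


# Usage against the real API (commented out: a batch can take up to 24h):
# finished = wait_for_batch(client, batch.id)
# if finished.output_file_id:
#     text = client.files.content(finished.output_file_id).text
#     print(parse_batch_output(text))

# Offline example with a hand-written output line
sample = json.dumps({
    "custom_id": "request-0",
    "response": {"status_code": 200, "body": {"choices": [{"message": {"content": "2"}}]}},
    "error": None,
})
print(parse_batch_output(sample))
```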


In [7]:
# Final usage summary
print("=" * 60)
print("SESSION SUMMARY")
print("=" * 60)
print(usage_tracker.summary())

SESSION SUMMARY

API Usage Summary:
  Requests: 1 (0 failed)
  Prompt tokens: 14
  Completion tokens: 80
  Estimated cost: $0.0001



---

## Summary

Congratulations! You've completed the Advanced Level tutorial and learned:

1. **AI Agents**: Building autonomous agents with agentic loops that use tools iteratively
2. **Embeddings & RAG**: Semantic search, vector similarity, and retrieval-augmented generation
3. **Async Operations**: Concurrent API calls for improved throughput
4. **Production Patterns**: Retries, rate limiting, cost tracking, and error handling
5. **Audio APIs**: Text-to-speech and speech-to-text capabilities
6. **Batch API**: Large-scale processing with cost savings

### Best Practices Checklist

- Always implement retry logic with exponential backoff
- Track token usage and costs in production
- Use async for concurrent operations
- Pre-compute embeddings for static content
- Use Batch API for bulk processing (50% savings)
- Handle rate limits gracefully

### Further Learning
- OpenAI Cookbook: https://cookbook.openai.com/
- API Reference: https://platform.openai.com/docs/api-reference
- Best Practices: https://platform.openai.com/docs/guides/production-best-practices