# OpenAI Responses API

Welcome to AI Makerspace! This notebook demonstrates the **OpenAI Responses API**, which provides a streamlined interface for working with OpenAI's language models. For plain text generation you pass a model name and an input, with no tool schemas to define.

The Responses API offers several key advantages:
- **Simplified interface** - Direct text generation without tool schemas
- **Built-in reasoning controls** - Adjust effort levels for different use cases  
- **Structured output parsing** - Native Pydantic model support
- **Multimodal capabilities** - Text and image inputs in a single request
- **Streaming support** - Real-time response generation
- **Developer instructions** - System-level guidance separate from user content

This notebook walks through the core features with practical examples, showing how you can integrate this API into your AI applications for cleaner, more maintainable code.


### Setup and Authentication

First, we need to set up our OpenAI API credentials. We'll use `getpass` to securely input the API key without exposing it in the notebook.


In [None]:
import os
import getpass

os.environ["OPENAI_API_KEY"] = getpass.getpass("OpenAI API Key:")

OpenAI API Key:··········

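If the key may already be set (for example when the notebook is re-run, or when it is exported by the shell), prompting every time is unnecessary. The sketch below prompts only when the environment variable is missing. The helper name `ensure_api_key` and its `prompt` parameter are illustrative and not part of the OpenAI SDK.

```python
import getpass
import os


def ensure_api_key(env_var="OPENAI_API_KEY", prompt=getpass.getpass):
    """Return the API key, prompting only if the environment lacks one."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        # Nothing usable in the environment: ask interactively without echoing.
        key = prompt("OpenAI API Key:").strip()
        if not key:
            raise ValueError("No API key provided")
        os.environ[env_var] = key
    return key
```

Calling `ensure_api_key()` before creating the client has the same effect as the cell above, but skips the prompt on later runs.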

Now we'll initialize the OpenAI client using our API key. This client will be used for all subsequent API calls.

In [None]:
from openai import OpenAI
client = OpenAI()

### Basic Response Generation

Let's start with a simple example using the `responses.create()` method. This is the core interface for the Responses API. Notice how little it needs compared to the Chat Completions API: a model name and an input string go in, and the generated text comes back as `response.output_text`.


In [None]:
response = client.responses.create(
    model="gpt-5",
    input="Define what 'AI Engineering' is."
)

print(response.output_text)

AI engineering is the discipline of systematically designing, building, deploying, and operating AI-enabled systems so they are reliable, safe, maintainable, and valuable in real-world use. It integrates machine learning with software, data, and systems engineering, along with governance and human-centered practices.

Key elements include:
- End-to-end lifecycle: problem framing, data and model development, evaluation, deployment, monitoring, and continuous improvement.
- Architecture and infrastructure: data pipelines, feature stores, training/serving platforms, vector/RAG components, orchestration, and scalable infrastructure.
- MLOps/LLMOps: versioning, CI/CD/CT for models and prompts, automated evaluations, rollout strategies (shadow, canary, A/B).
- Quality, safety, and risk: robustness, reliability, security, privacy, fairness, interpretability, and rigorous testing/validation.
- Operations: observability, drift and performance monitoring, incident response, cost/latency manageme

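`output_text` is a convenience accessor. The response itself carries a list of output items, and for reasoning models that list can include non-message items. The sketch below walks that list by hand. It assumes message items expose `content` parts of type `output_text` with a `.text` field, and it uses a hand-built stand-in object rather than a real API response, so treat the exact shape as an assumption to check against the SDK.

```python
from types import SimpleNamespace


def collect_output_text(response):
    """Concatenate the text of every output_text part in a response-like object."""
    chunks = []
    for item in response.output:
        # Skip reasoning items and anything else that is not an assistant message.
        if getattr(item, "type", None) != "message":
            continue
        for part in getattr(item, "content", None) or []:
            if getattr(part, "type", None) == "output_text":
                chunks.append(part.text)
    return "".join(chunks)


# Stand-in object for illustration only; a real response comes from the API.
fake_response = SimpleNamespace(
    output=[
        SimpleNamespace(type="reasoning", content=None),
        SimpleNamespace(
            type="message",
            content=[SimpleNamespace(type="output_text", text="Hello, world.")],
        ),
    ]
)
print(collect_output_text(fake_response))
```
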
### Reasoning Control and Instructions

One of the powerful features of the Responses API is the ability to control the reasoning effort and provide developer instructions. Here we're using:

- `reasoning={"effort": "low"}` - Controls how much computational effort the model puts into reasoning
- `instructions` - Developer-level instructions that guide the model's behavior (separate from user content)


In [None]:
response = client.responses.create(
    model="gpt-5",
    reasoning={"effort": "low"},
    instructions="Talk like a wizard.",
    input="How to write an efficient loop with NumPy?",
)

print(response.output_text)

Ah, seeker of speed, heed these arcane laws of NumPy, where loops of Pythonland are but lumbering ogres, and vectorized spells dance like lightning.

Principles of swiftness
- Prefer vectorization over Python loops: operate on whole arrays at once. NumPy’s ufuncs (sin, exp, add, etc.) run in C and are vastly faster.
- Broadcast, don’t iterate: shape your arrays to align and let broadcasting do the work instead of nested loops.
- Reduce along axes: use sum, mean, max, any, argmax, etc., with axis=... to avoid manual accumulation.
- Preallocate: conjure the full output array once; do not grow lists or arrays inside loops.
- Keep data contiguous and typed: use a consistent dtype and contiguous memory (arr.flags). Copy only when needed.
- Use BLAS-backed ops: dot, matmul (@), einsum call into highly optimized libraries.
- Avoid fake magic: np.vectorize and apply_along_axis look fancy but are still Python loops; they’re for convenience, not speed.
- Minimize temporaries: fuse operations whe

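To see what a given effort level costs in practice, it helps to time the same prompt at several levels. The helper below is a rough sketch: `compare_efforts` and the `ask` callback are hypothetical names, and the `"medium"` and `"high"` values are assumptions not shown in this notebook, so check the supported levels for your model.

```python
import time


def compare_efforts(ask, efforts=("low", "medium", "high")):
    """Call ask(effort) once per effort level and record the wall-clock time."""
    results = {}
    for effort in efforts:
        start = time.perf_counter()
        text = ask(effort)
        results[effort] = {
            "seconds": time.perf_counter() - start,
            "characters": len(text),
        }
    return results
```

With a live client, `ask` could be a small function that calls `client.responses.create(...)` with `reasoning={"effort": effort}` and returns `response.output_text`. A single run per level is noisy, so read the numbers as a rough indication only.
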
### Message-Based Input Format

The Responses API also supports the familiar message format with roles. Here we demonstrate:

- Using `input` as a list of message objects instead of a simple string
- `developer` role - A new role type for system-level instructions
- `user` role - Standard user input

This approach gives you more granular control over the conversation structure while maintaining the simplified API interface.

> NOTE: This request should be *roughly* equivalent to the previous cell, because the `developer` message carries the same text we passed as `instructions` there.


In [None]:
response = client.responses.create(
    model="gpt-5",
    reasoning={"effort": "low"},
    input=[
        {
            "role": "developer",
            "content": "Talk like a wizard."
        },
        {
            "role": "user",
            "content": "How to write an efficient loop with NumPy?"
        }
    ]
)

print(response.output_text)

Hark, seeker of swiftness! In the realm of NumPy, thou dost not loop as in mortal Python; thou summonest arrays to dance as one through vectorized sorcery. Behold these spells for efficient “loops”:

- Prefer vectorization over Python loops
  - Replace elementwise for-loops with array-wide operations (ufuncs): +, -, *, /, **, sqrt, sin, etc.
  - Example: y = a*x + b is instant across the whole array x.

- Wield broadcasting to align shapes without copies
  - Let shapes expand with singleton dimensions instead of repeating data.
  - Example (pairwise squared distances):
    X: shape (n, d), Y: shape (m, d)
    D2 = ((X[:, None, :] - Y[None, :, :])**2).sum(axis=-1)

- Summon reductions along axes
  - Use axis to avoid loops: sum, mean, max, argmax, std, any, all.
  - Example: col_means = X.mean(axis=0)

- Favor in-place enchantments to save memory and time
  - a += b, a *= 2, or use out= and where= in ufuncs.
  - Example: np.add(a, b, out=a)

- Choose spells built on BLAS
  - dot, matmul

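The relationship between the two request styles can be sketched as a small conversion function. `to_messages` is an illustrative helper, not an SDK function. It only builds the list that would be passed as `input`.

```python
def to_messages(user_input, instructions=None):
    """Express a plain string request as a list of role-tagged messages."""
    messages = []
    if instructions:
        # Developer guidance goes first so it frames the user's request.
        messages.append({"role": "developer", "content": instructions})
    messages.append({"role": "user", "content": user_input})
    return messages


messages = to_messages(
    "How to write an efficient loop with NumPy?",
    instructions="Talk like a wizard.",
)
print(messages)
```

The resulting list matches the `input` used in the cell above.
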
### Structured Output with Pydantic

One of the most exciting features is native structured output parsing using Pydantic models. Instead of extracting JSON from free-form text yourself, you pass a Pydantic model, and the SDK returns the model's reply as a validated instance of that model.

Key features:
- `responses.parse()` method for structured output
- `text_format` parameter accepts Pydantic models
- Automatic validation and type checking
- Clean, typed responses that integrate seamlessly with Python applications


In [None]:
from pydantic import BaseModel

class CalendarEvent(BaseModel):
    name: str
    date: str
    participants: list[str]

response = client.responses.parse(
    model="gpt-5",
    input=[
        {
            "role": "developer",
            "content": "Extract the event information."
        },
        {
            "role": "user",
            "content": "Alice and Bob are going to the AI Engineering Bootcamp kickoff on September 9th at 7PM. Registration closes Tuesday (9/9) at Noon EDT.",
        },
    ],
    text_format=CalendarEvent,
)

print(response.output_parsed)

name='AI Engineering Bootcamp kickoff' date='2025-09-09T19:00:00' participants=['Alice', 'Bob']

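Note that `date` is declared as `str`, so the model chooses the format. In the run above it returned an ISO 8601 string and filled in a year that the input text never states. If downstream code needs a real `datetime`, a defensive conversion like the sketch below is one option. `parse_event_date` is a hypothetical helper that uses only the standard library.

```python
from datetime import datetime
from typing import Optional


def parse_event_date(value: str) -> Optional[datetime]:
    """Return a datetime for an ISO 8601 string, or None if it does not parse."""
    try:
        return datetime.fromisoformat(value)
    except ValueError:
        # e.g. "September 9th at 7PM": keep the raw string and handle it elsewhere.
        return None


print(parse_event_date("2025-09-09T19:00:00"))
print(parse_event_date("September 9th at 7PM"))
```
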

### Multimodal Input - Text and Images

The Responses API provides excellent support for multimodal inputs, allowing you to combine text and images in a single request. This example shows:

- `input_text` and `input_image` content types
- Direct image URL support
- Unified processing of text and visual information

This makes it easy to build applications that need to analyze images, documents, or other visual content alongside text instructions.


In [None]:
response = client.responses.create(
    model="gpt-5",
    input=[
        {
            "role": "user",
            "content": [
                {
                    "type": "input_text",
                    "text": "Describe the visual appearance of the people in this image.",
                },
                {
                    "type": "input_image",
                    "image_url": "https://d2426xcxuh3ht5.cloudfront.net/rnDUeTj9Sqysz9tCeGdB_AIE_Cohort_7_Banner.jpg"
                }
            ]
        }
    ]
)

print(response.output_text)

- Person on the left:
  - Adult man, light complexion, short facial hair/stubble.
  - Wearing a black hoodie and a black cap with a white speckled pattern on the front.
  - Smiling with teeth visible; facing slightly to the right.
  - Outlined with a neon pink glow.

- Person on the right:
  - Adult man with hair pulled back and a short goatee/mustache.
  - Wearing a blue-and-white cloud/tie-dye style polo shirt.
  - Arms crossed and smiling.
  - Wearing a dark smartwatch; outlined with a blue glow.

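When the image is a local file rather than a hosted URL, one common approach is to send it as a base64 data URL in the same `image_url` field. The sketch below builds the message structure used above. `to_data_url` and `image_question` are illustrative helpers, and the assumption that `image_url` accepts a data URL should be confirmed against the current API documentation.

```python
import base64


def to_data_url(image_bytes, mime_type="image/jpeg"):
    """Encode raw image bytes as a base64 data URL."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def image_question(prompt, image_url):
    """Build a single user message holding one text part and one image part."""
    return {
        "role": "user",
        "content": [
            {"type": "input_text", "text": prompt},
            {"type": "input_image", "image_url": image_url},
        ],
    }


# Placeholder bytes for illustration; in practice read them from an image file.
message = image_question("Describe this image.", to_data_url(b"\xff\xd8\xff"))
print(message["content"][1]["image_url"])
```

The returned dictionary can be placed in the `input` list exactly like the hand-written message in the cell above.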

### Streaming Responses with Structured Output

For real-time applications, the Responses API supports streaming even with structured output. This example demonstrates:

- `responses.stream()` context manager for streaming
- Event-based streaming with different event types:
  - `response.output_text.delta` - Incremental text updates
  - `response.refusal.delta` - Safety refusal messages  
  - `response.error` - Error handling
  - `response.completed` - Stream completion
- `get_final_response()` to retrieve the complete parsed result
- Structured output streaming with Pydantic models

This is perfect for building responsive UIs that show progress while maintaining type safety.


In [None]:
from typing import List

class EntitiesModel(BaseModel):
    attributes: List[str]
    colors: List[str]
    animals: List[str]

with client.responses.stream(
    model="gpt-5-mini",
    input=[
        {"role": "system", "content": "Extract entities from the input text"},
        {
            "role": "user",
            "content": "The quick brown fox jumps over the lazy dog with piercing blue eyes",
        },
    ],
    text_format=EntitiesModel,
) as stream:
    for event in stream:
        if event.type == "response.refusal.delta":
            print(event.delta, end="")
        elif event.type == "response.output_text.delta":
            print(event.delta, end="")
        elif event.type == "response.error":
            print(event.error, end="")
        elif event.type == "response.completed":
            print("Completed")
            # print(event.response.output)

    final_response = stream.get_final_response()
    print(final_response)

{"attributes":["quick","lazy","piercing"],"colors":["brown","blue"],"animals":["fox","dog"]}Completed
ParsedResponse[EntitiesModel](id='resp_68b880fc55588192a367c429226609220bab634bdea19ee3', created_at=1756922108.0, error=None, incomplete_details=None, instructions=None, metadata={}, model='gpt-5-mini-2025-08-07', object='response', output=[ResponseReasoningItem(id='rs_68b880fd9e148192afc973bb5dd588440bab634bdea19ee3', summary=[], type='reasoning', content=None, encrypted_content=None, status=None), ParsedResponseOutputMessage[EntitiesModel](id='msg_68b8810105348192a8e7641c0c1da5f90bab634bdea19ee3', content=[ParsedResponseOutputText[EntitiesModel](annotations=[], text='{"attributes":["quick","lazy","piercing"],"colors":["brown","blue"],"animals":["fox","dog"]}', type='output_text', logprobs=[], parsed=EntitiesModel(attributes=['quick', 'lazy', 'piercing'], colors=['brown', 'blue'], animals=['fox', 'dog']))], role='assistant', status='completed', type='message')], parallel_tool_calls=T

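In an application you usually want to accumulate the deltas instead of printing them. The sketch below separates that bookkeeping from the API call so it can be exercised without a network connection. `consume` and its `on_text` callback are hypothetical names, and the events are hand-made stand-ins that carry only the `type` and `delta` fields used in the loop above.

```python
from types import SimpleNamespace


def consume(events, on_text=lambda delta: None):
    """Accumulate streamed deltas and report whether the stream completed."""
    text, refusal, completed = [], [], False
    for event in events:
        if event.type == "response.output_text.delta":
            text.append(event.delta)
            on_text(event.delta)  # e.g. update a UI element
        elif event.type == "response.refusal.delta":
            refusal.append(event.delta)
        elif event.type == "response.completed":
            completed = True
    return "".join(text), "".join(refusal), completed


# Hand-made events standing in for a live stream.
fake_events = [
    SimpleNamespace(type="response.output_text.delta", delta='{"colors":'),
    SimpleNamespace(type="response.output_text.delta", delta='["brown"]}'),
    SimpleNamespace(type="response.completed"),
]
text, refusal, completed = consume(fake_events)
print(text, completed)
```

With a live stream, the `stream` object from the `with` block would be passed in place of `fake_events`.
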
## Summary

The OpenAI Responses API represents a significant step forward in making AI integration simpler and more powerful. Key takeaways:

✅ **Cleaner code** - Less boilerplate, more focus on your application logic  
✅ **Built-in structure** - Native Pydantic support eliminates JSON parsing headaches  
✅ **Flexible control** - Fine-tune reasoning effort and output style  
✅ **Multimodal ready** - Text and image inputs work seamlessly together  
✅ **Production ready** - Streaming support for responsive applications  

---

### Advanced Configuration Options

The final example showcases some advanced configuration options:

- `reasoning={"effort": "minimal"}` - Even lower computational effort for simple tasks
- `text={"verbosity": "low"}` - Control output verbosity for concise responses
- Fine-tuning the balance between speed, cost, and output quality

This final configuration trades reasoning depth for speed and brevity, so it is likely the closest you can get to the feel of a non-reasoning model such as GPT-4o, just in case you miss the old model.

In [None]:
result = client.responses.create(
    model="gpt-5",
    input="Write a haiku about code.",
    reasoning={ "effort": "minimal" },
    text={ "verbosity": "low" },
)

print(result.output_text)

Silent loops hum low,  
logic blossoms into light—  
bugs blink, then fade out.

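Because both settings are plain strings, a typo only surfaces when the request is sent. A small builder that validates them first can catch this earlier. This is a sketch: `build_request` is a hypothetical helper, and the value sets below are assumptions that should be confirmed against the documentation for your model.

```python
# Assumed value sets; confirm them against the documentation for your model.
EFFORT_LEVELS = ("minimal", "low", "medium", "high")
VERBOSITY_LEVELS = ("low", "medium", "high")


def build_request(model, user_input, effort=None, verbosity=None):
    """Return keyword arguments for responses.create(), validating the knobs."""
    if effort is not None and effort not in EFFORT_LEVELS:
        raise ValueError(f"unknown effort: {effort!r}")
    if verbosity is not None and verbosity not in VERBOSITY_LEVELS:
        raise ValueError(f"unknown verbosity: {verbosity!r}")
    request = {"model": model, "input": user_input}
    if effort is not None:
        request["reasoning"] = {"effort": effort}
    if verbosity is not None:
        request["text"] = {"verbosity": verbosity}
    return request


fast = build_request("gpt-5", "Write a haiku about code.", "minimal", "low")
print(fast)
```

The resulting dictionary can be unpacked into the call, as in `client.responses.create(**fast)`.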