# Structured Outputs with Gemini

This notebook explores two powerful features for building capable AI agents with Large Language Models (LLMs): **Tools (Function Calling)** and **Structured Outputs**. We will use the `google-genai` library to interact with Google's Gemini models.

**Learning Objectives:**

1.  **Understand and implement tool use (function calling)** to allow an LLM to interact with external systems.
2.  **Enforce structured data formats (JSON)** from an LLM for reliable data extraction.
3.  **Leverage Pydantic models** to define and manage complex data structures for both function arguments and structured outputs, improving code robustness and clarity.

## 1. Setup

First, let's install the necessary Python libraries.

!pip install -q google-genai pydantic python-dotenv

### Configure Gemini API Key

To use the Gemini API, you need an API key. 

1.  Get your key from [Google AI Studio](https://aistudio.google.com/app/apikey).
2.  Create a file named `.env` in the root of this project.
3.  Add the following line to the `.env` file, replacing `your_api_key_here` with your actual key:
    ```
    GOOGLE_API_KEY="your_api_key_here"
    ```
The code below will load this key from the `.env` file.

In [1]:
import os
from pathlib import Path

from dotenv import load_dotenv

REPOSITORY_ROOT_DIR = Path().absolute().parent.parent
DOTENV_FILE_PATH = REPOSITORY_ROOT_DIR / ".env"
print(f"Trying to load environment variables from `{DOTENV_FILE_PATH}`")

if not DOTENV_FILE_PATH.exists():
    raise FileNotFoundError(f"Environment file `{DOTENV_FILE_PATH}` not found.")

load_dotenv(dotenv_path=DOTENV_FILE_PATH)

assert "GOOGLE_API_KEY" in os.environ, "`GOOGLE_API_KEY` is not set"

print("Environment variables loaded successfully.")

Trying to load environment variables from `/Users/pauliusztin/Documents/01_projects/TAI/course-ai-agents/.env`
Environment variables loaded successfully.


In [2]:
import json

from google import genai
from google.genai import types
from pydantic import BaseModel, Field

### Initialize the Gemini Client

In [3]:
client = genai.Client()

### Define Constants

We will use the `gemini-2.5-flash` model, which is fast, cost-effective, and supports advanced features like tool use.

In [4]:
MODEL_ID = "gemini-2.5-flash"

## 2. Implementing structured outputs from scratch using JSON

Sometimes, you don't need the LLM to take an action, but you need its output in a specific, machine-readable format. Forcing the output to be JSON is a common way to achieve this.

We can instruct the model to do this through **prompting**: we describe the desired JSON structure clearly in the prompt itself.

### Example: Extracting Metadata from a Document

Let's imagine we have a markdown document and we want to extract key information like a summary, tags, and keywords into a clean JSON object.

In [5]:
DOCUMENT = """
# Q3 2023 Financial Performance Analysis

The Q3 earnings report shows a 20% increase in revenue and a 15% growth in user engagement, 
beating market expectations. These impressive results reflect our successful product strategy 
and strong market positioning.

Our core business segments demonstrated remarkable resilience, with digital services leading 
the growth at 25% year-over-year. The expansion into new markets has proven particularly 
successful, contributing to 30% of the total revenue increase.

Customer acquisition costs decreased by 10% while retention rates improved to 92%, 
marking our best performance to date. These metrics, combined with our healthy cash flow 
position, provide a strong foundation for continued growth into Q4 and beyond.
"""

prompt = f"""
Analyze the following document and extract metadata from it. 
The output must be a single, valid JSON object with the following structure:
<json>
{{ "summary": "A concise summary of the article.", "tags": ["list", "of", "relevant", "tags"], "keywords": ["list", "of", "key", "concepts"] }}
</json>

Here is the document:
<document>
{DOCUMENT}
</document>
"""

response = client.models.generate_content(model=MODEL_ID, contents=prompt)

print("--- Raw LLM Output ---")
print(response.text)

--- Raw LLM Output ---
```json
{
  "summary": "The Q3 2023 earnings report showcases a 20% increase in revenue and 15% growth in user engagement, surpassing market expectations. This performance is attributed to a successful product strategy, strong market positioning, significant growth in digital services, and effective new market expansion. The company also improved customer acquisition costs by 10% and achieved a 92% retention rate, maintaining a healthy cash flow for continued growth.",
  "tags": [
    "Financial Performance",
    "Q3 2023",
    "Earnings Report",
    "Business Growth",
    "Market Strategy"
  ],
  "keywords": [
    "revenue increase",
    "user engagement",
    "digital services",
    "new market expansion",
    "customer acquisition costs",
    "retention rates",
    "cash flow",
    "market expectations"
  ]
}
```
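The prompt above writes the JSON template inside an f-string, which is why its braces are doubled. The following minimal sketch shows that behavior in isolation, along with one possible alternative that serializes a dictionary instead of escaping braces by hand. The `document` text and `template` dictionary are hypothetical stand-ins, not values from this notebook.

```python
import json

# Hypothetical stand-in for the document text.
document = "Revenue grew 20% in Q3."

# Inside an f-string, `{{` and `}}` render as literal `{` and `}`,
# while `{document}` is replaced by the variable's value.
prompt = f"""
Return a JSON object with this structure:
{{ "summary": "A concise summary.", "tags": ["list", "of", "tags"] }}

<document>
{document}
</document>
"""

# Alternative that avoids manual escaping: serialize a Python dict.
template = {"summary": "A concise summary.", "tags": ["list", "of", "tags"]}
prompt_from_dict = (
    f"Return a JSON object with this structure:\n{json.dumps(template)}"
)

print(prompt)
print(prompt_from_dict)
```

Forgetting to double the braces makes Python treat the template as a replacement field, which typically fails before the prompt is ever sent.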


In [6]:
def extract_json_from_response(response: str) -> dict:
    """
    Extracts JSON from a response string that is wrapped in <json> or ```json tags.
    """
    response = response.replace("<json>", "").replace("</json>", "")
    response = response.replace("```json", "").replace("```", "")

    return json.loads(response)
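
The helper above assumes the reply contains nothing but the JSON payload and its wrapper. If the model adds a sentence before or after the payload, `json.loads` raises an error. A slightly more defensive variant could look like the sketch below. The name `extract_first_json_object` is hypothetical, and the approach (take the fenced or tagged block if present, then trim to the outermost braces) is only one of several reasonable strategies.

```python
import json
import re

FENCE = "`" * 3  # three backticks, built programmatically

# Matches a fenced block (optionally labeled json) or a <json>...</json> block.
BLOCK_PATTERN = re.compile(
    FENCE + r"(?:json)?\s*(.*?)" + FENCE + r"|<json>(.*?)</json>",
    re.DOTALL,
)


def extract_first_json_object(text: str) -> dict:
    """Parse the JSON object embedded in a model reply (hypothetical helper)."""
    match = BLOCK_PATTERN.search(text)
    if match:
        # Prefer the contents of the fenced or tagged block.
        candidate = match.group(1) or match.group(2) or ""
    else:
        candidate = text

    # Trim any prose before the first "{" and after the last "}".
    start, end = candidate.find("{"), candidate.rfind("}")
    if start == -1 or end < start:
        raise ValueError("No JSON object found in the response.")

    return json.loads(candidate[start : end + 1])


reply = (
    "Here is the metadata:\n"
    + FENCE + "json\n"
    + '{"summary": "Q3 results", "tags": ["finance"]}\n'
    + FENCE + "\nLet me know if you need more."
)
print(extract_first_json_object(reply))
```

This version still expects a single top-level JSON object; replies that contain several objects or a top-level array would need different handling.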

In [7]:
# Parse the raw model output into a Python dictionary
parsed_response = extract_json_from_response(response.text)

print("\n----- Parsed JSON Object -----")
print(f"Type of the parsed response: `{type(parsed_response)}`")
print("--------------------------------")
print(json.dumps(parsed_response, indent=2))


----- Parsed JSON Object -----
Type of the parsed response: `<class 'dict'>`
--------------------------------
{
  "summary": "The Q3 2023 earnings report showcases a 20% increase in revenue and 15% growth in user engagement, surpassing market expectations. This performance is attributed to a successful product strategy, strong market positioning, significant growth in digital services, and effective new market expansion. The company also improved customer acquisition costs by 10% and achieved a 92% retention rate, maintaining a healthy cash flow for continued growth.",
  "tags": [
    "Financial Performance",
    "Q3 2023",
    "Earnings Report",
    "Business Growth",
    "Market Strategy"
  ],
  "keywords": [
    "revenue increase",
    "user engagement",
    "digital services",
    "new market expansion",
    "customer acquisition costs",
    "retention rates",
    "cash flow",
    "market expectations"
  ]
}


## 3. Implementing structured outputs from scratch using Pydantic

While prompting for JSON is effective, it can be fragile. A more robust and modern approach is to use **Pydantic**. Pydantic allows you to define data structures as Python classes. This gives you:

- **A single source of truth**: The Pydantic model defines the structure.
- **Automatic schema generation**: You can easily generate a JSON Schema from the model.
- **Data validation**: You can validate the LLM's output against the model to ensure it conforms to the expected structure and types.

Let's recreate the previous example using Pydantic.

In [8]:
class DocumentMetadata(BaseModel):
    """A class to hold structured metadata for a document."""

    summary: str = Field(description="A concise, 1-2 sentence summary of the document.")
    tags: list[str] = Field(
        description="A list of 3-5 high-level tags relevant to the document."
    )
    keywords: list[str] = Field(
        description="A list of specific keywords or concepts mentioned."
    )

### Injecting Pydantic Schema into the Prompt

We can generate a JSON Schema from our Pydantic model and inject it directly into the prompt. This is a more formal way of telling the LLM what structure to follow.

In [9]:
schema = DocumentMetadata.model_json_schema()
schema

{'description': 'A class to hold structured metadata for a document.',
 'properties': {'summary': {'description': 'A concise, 1-2 sentence summary of the document.',
   'title': 'Summary',
   'type': 'string'},
  'tags': {'description': 'A list of 3-5 high-level tags relevant to the document.',
   'items': {'type': 'string'},
   'title': 'Tags',
   'type': 'array'},
  'keywords': {'description': 'A list of specific keywords or concepts mentioned.',
   'items': {'type': 'string'},
   'title': 'Keywords',
   'type': 'array'}},
 'required': ['summary', 'tags', 'keywords'],
 'title': 'DocumentMetadata',
 'type': 'object'}

In [10]:
prompt = f"""
Please analyze the following document and extract metadata from it. 
The output must be a single, valid JSON object that conforms to the following JSON Schema:
<json>
{json.dumps(schema, indent=2)}
</json>

Here is the document:
<document>
{DOCUMENT}
</document>
"""

response = client.models.generate_content(model=MODEL_ID, contents=prompt)

parsed_response = extract_json_from_response(response.text)

print("\n----- Parsed JSON Object -----")
print(f"Type of the parsed response: `{type(parsed_response)}`")
print("--------------------------------")
print(json.dumps(parsed_response, indent=2))


----- Parsed JSON Object -----
Type of the parsed response: `<class 'dict'>`
--------------------------------
{
  "summary": "The Q3 2023 earnings report showcases a 20% increase in revenue and 15% growth in user engagement, exceeding market expectations due to successful product strategy and market positioning. This strong financial performance is further supported by leading digital services growth, successful new market expansion, decreased customer acquisition costs, and improved retention rates, providing a solid foundation for future growth.",
  "tags": [
    "Financial Performance",
    "Earnings Report",
    "Business Growth",
    "Q3 2023",
    "Market Analysis"
  ],
  "keywords": [
    "Q3 2023",
    "revenue increase",
    "user engagement",
    "market expectations",
    "product strategy",
    "market positioning",
    "digital services",
    "new markets",
    "customer acquisition costs",
    "retention rates",
    "cash flow"
  ]
}


Now, we can validate the output with Pydantic:

In [11]:
try:
    document_metadata = DocumentMetadata.model_validate(parsed_response)
    print("\nValidation successful!")
    print("\n--- Pydantic Validated Object ---")
    print(f"Type of the validated response: `{type(document_metadata)}`")
    print(document_metadata.model_dump_json(indent=2))
except Exception as e:
    print(f"\nValidation failed: {e}")


Validation successful!

--- Pydantic Validated Object ---
Type of the validated response: `<class '__main__.DocumentMetadata'>`
{
  "summary": "The Q3 2023 earnings report showcases a 20% increase in revenue and 15% growth in user engagement, exceeding market expectations due to successful product strategy and market positioning. This strong financial performance is further supported by leading digital services growth, successful new market expansion, decreased customer acquisition costs, and improved retention rates, providing a solid foundation for future growth.",
  "tags": [
    "Financial Performance",
    "Earnings Report",
    "Business Growth",
    "Q3 2023",
    "Market Analysis"
  ],
  "keywords": [
    "Q3 2023",
    "revenue increase",
    "user engagement",
    "market expectations",
    "product strategy",
    "market positioning",
    "digital services",
    "new markets",
    "customer acquisition costs",
    "retention rates",
    "cash flow"
  ]
}
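
The run above shows only the success path. To see what validation buys us, the sketch below feeds a deliberately malformed dictionary to a model and inspects the resulting `ValidationError`. `ReportMetadata` and `bad_output` are hypothetical stand-ins that mirror the fields of `DocumentMetadata`; they are not part of the notebook's pipeline.

```python
from pydantic import BaseModel, Field, ValidationError


class ReportMetadata(BaseModel):
    """Hypothetical stand-in with the same fields as `DocumentMetadata`."""

    summary: str = Field(description="A concise summary of the document.")
    tags: list[str] = Field(description="High-level tags.")
    keywords: list[str] = Field(description="Specific keywords.")


# A malformed reply: `tags` is a string instead of a list, `keywords` is missing.
bad_output = {"summary": "Q3 results beat expectations.", "tags": "finance"}

failed_fields = []
try:
    ReportMetadata.model_validate(bad_output)
except ValidationError as e:
    # Each error entry records the field location and an error type.
    failed_fields = sorted(str(err["loc"][0]) for err in e.errors())
    print(f"Validation failed for fields: {failed_fields}")
```

Catching `ValidationError` specifically, instead of a bare `Exception`, keeps unrelated bugs from being reported as validation failures.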


## 4. Implementing structured outputs using Gemini and Pydantic

Gemini's `GenerateContentConfig` lets us enforce the output structure directly, without describing the schema in the prompt.

We set `response_mime_type` to `"application/json"`, which forces the model to return valid JSON, and pass our Pydantic model as `response_schema`. The SDK then exposes the result as a `DocumentMetadata` instance through `response.parsed`.

**Note:** If you set only `response_mime_type="application/json"` and omit `response_schema`, the model still returns JSON, but its structure is not constrained to a schema.

In [12]:
config = types.GenerateContentConfig(
    response_mime_type="application/json", response_schema=DocumentMetadata
)

prompt = f"""
Analyze the following document and extract its metadata.

Document:
--- 
{DOCUMENT}
--- 
"""

response = client.models.generate_content(
    model=MODEL_ID, contents=prompt, config=config
)
print(f"Type of the response: `{type(response.parsed)}`")
print(response.parsed.model_dump_json(indent=2))

Type of the response: `<class '__main__.DocumentMetadata'>`
{
  "summary": "The Q3 2023 earnings report details a strong financial performance, with a 20% revenue increase and 15% user engagement growth, exceeding market expectations. This success is attributed to effective product strategies, new market expansion, reduced customer acquisition costs, and improved retention rates.",
  "tags": [
    "Financial Performance",
    "Earnings Report",
    "Business Growth",
    "Market Analysis",
    "Q3 2023"
  ],
  "keywords": [
    "Q3 2023",
    "Revenue increase",
    "User engagement",
    "Market expectations",
    "Product strategy",
    "Digital services",
    "Customer acquisition costs",
    "Retention rates",
    "Cash flow"
  ]
}
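
The cell above calls `response.parsed.model_dump_json` directly, which assumes parsing succeeded. Assuming `response.parsed` can come back as `None` when the returned text does not match the schema, a small guard can re-validate the raw text so the failure is explicit. The sketch below is one way to do that; `ensure_parsed` and `Headline` are hypothetical names, and the example runs without calling the API.

```python
from typing import Optional, Type, TypeVar

from pydantic import BaseModel, ValidationError

ModelT = TypeVar("ModelT", bound=BaseModel)


def ensure_parsed(
    parsed: Optional[object], raw_text: str, model: Type[ModelT]
) -> ModelT:
    """Return a model instance or raise a descriptive ValidationError
    (hypothetical helper)."""
    if isinstance(parsed, model):
        return parsed
    # Re-validate the raw JSON text so a failure reports the exact field errors.
    return model.model_validate_json(raw_text)


class Headline(BaseModel):
    """Hypothetical stand-in for a response schema."""

    title: str
    tags: list[str]


# Simulated case: no parsed object is available, but the raw text is valid.
ok = ensure_parsed(None, '{"title": "Q3 results", "tags": ["finance"]}', Headline)
print(ok)

# Simulated case: the raw text is missing a required field.
try:
    ensure_parsed(None, '{"title": "Q3 results"}', Headline)
except ValidationError as e:
    print(f"Could not build Headline: {e.error_count()} error(s)")
```

With the notebook's objects, the call would be `ensure_parsed(response.parsed, response.text, DocumentMetadata)`.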
