# Video Description with Pydantic AI and Gemini

This notebook demonstrates how to use Pydantic AI with a Gemini 2.5 Flash model (`gemini-2.5-flash-lite` in the code below) to generate detailed, structured descriptions of video content.

## Import Libraries and Load Environment

Before starting, make sure you have placed your Google Gemini credentials in the `.env` file:

```bash
cp env.example .env
```
Then edit `.env` and set `GEMINI_API_KEY` to your key.

In [None]:
import os
from dotenv import load_dotenv
from pydantic import BaseModel, Field
from pydantic_ai import Agent, BinaryContent
from IPython.display import Video, display

# Load environment variables
assert load_dotenv(), "Please prepare a .env file with your GEMINI_API_KEY"
assert os.getenv("GEMINI_API_KEY"), "GEMINI_API_KEY not found in .env file"

# Needed to run asyncio code inside Jupyter, which already has a running event loop
import nest_asyncio

nest_asyncio.apply()

## Define Response Schema

We'll use Pydantic models to structure our video description output. This ensures consistent, structured responses that can be easily processed downstream. For example, you could use this in a content management system to automatically tag and categorize videos.

In [2]:
from typing import List
from typing_extensions import Literal


class VideoDescription(BaseModel):
    """Structured output for video content analysis."""

    summary: str = Field(
        description="A concise 2-3 sentence summary of what happens in the video"
    )

    quality: Literal[
        "poor",
        "ok",
        "good",
    ] = Field(description="Overall technical quality of the video (e.g., resolution, stability)")
    
    main_subjects: List[str] = Field(
        description="Key people, objects, or animals visible in the video"
    )
    
    setting: str = Field(
        description="Description of the location/environment where the video takes place"
    )
    
    visual_style: str = Field(
        description="Description of the video's visual characteristics (lighting, colors, camera work, etc.)"
    )
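
To illustrate why a typed schema helps downstream, here is a minimal sketch of how Pydantic rejects values outside the allowed `quality` labels. `MiniVideoDescription` is a hypothetical, trimmed stand-in for `VideoDescription`, and the input dictionaries are made up for illustration.

```python
from typing import List, Literal

from pydantic import BaseModel, ValidationError


class MiniVideoDescription(BaseModel):
    """Hypothetical, trimmed stand-in for VideoDescription."""

    summary: str
    quality: Literal["poor", "ok", "good"]
    main_subjects: List[str]


# A payload that matches the schema validates cleanly
valid = MiniVideoDescription.model_validate(
    {
        "summary": "A dog runs through cones.",
        "quality": "good",
        "main_subjects": ["dog"],
    }
)

# A quality label outside the Literal values is rejected
try:
    MiniVideoDescription.model_validate(
        {
            "summary": "A dog runs through cones.",
            "quality": "excellent",
            "main_subjects": ["dog"],
        }
    )
    rejected = False
except ValidationError:
    rejected = True

print(valid.quality, rejected)
```

Because invalid labels fail validation, downstream code can rely on `quality` being one of the three expected values.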

## Create Pydantic AI Agent

Set up the agent with Gemini for video content analysis.

In [3]:
from pydantic_ai.models.google import GoogleModelSettings


# Disable thinking (budget of 0 tokens) to avoid long delays and timeouts
settings = GoogleModelSettings(
    google_thinking_config={"thinking_budget": 0},
)

video_agent = Agent(
    model="google-gla:gemini-2.5-flash-lite",
    output_type=VideoDescription,
    instructions="""
    You are an expert video content analyzer. Watch the provided video carefully and provide:
    1. A clear summary of what happens
    2. Identification of main subjects (people, objects, animals)
    3. Description of the setting/environment
    4. Key actions taking place
    5. Visual style characteristics
    6. Overall mood and tone
    7. Duration estimate
    8. Any notable or interesting details

    Be detailed but concise. Focus on what you can actually observe in the video.
    """,
    model_settings=settings,
)

## Helper Function for Video Processing

Let's define a helper function to load and format video files for analysis.

In [4]:
def load_video_for_analysis(video_path):
    """Load video file and format it for Pydantic AI."""
    with open(video_path, 'rb') as f:
        video_bytes = f.read()
    
    # Create binary content for Pydantic AI
    # (see note at the end about using File API for longer videos)
    video_content = BinaryContent(
        data=video_bytes,
        media_type='video/mp4'  # Adjust based on your video format
    )
    
    return video_content
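
The helper above hardcodes `video/mp4`. One possible way to adjust the media type to the file format is to infer it from the file extension with the standard library. This is only a sketch: `guess_video_media_type` is a hypothetical helper, not part of Pydantic AI, and whether Gemini accepts a given format still needs to be checked separately.

```python
import mimetypes
from pathlib import Path


def guess_video_media_type(video_path: str) -> str:
    """Infer a video media type from the file extension (hypothetical helper)."""
    media_type, _ = mimetypes.guess_type(Path(video_path).name)
    # Fail early when the extension is unknown or does not map to a video type
    if media_type is None or not media_type.startswith("video/"):
        raise ValueError(f"Cannot infer a video media type for {video_path!r}")
    return media_type


print(guess_video_media_type("dog.mp4"))
```

The returned string could then be passed as `media_type` when building the `BinaryContent`.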

## Video Analysis

Analyze a video file to see the structured description output. Make sure to place your video file in the same directory as this notebook.

In [5]:
from pprint import pprint

# Replace with your video file path
video_path = "dog.mp4"  # Change this to your video file

# Display the video in the notebook
print("Original video:")
display(Video(video_path))

# Load and analyze the video
video_content = load_video_for_analysis(video_path)

print("\nAnalyzing video content...")
result = video_agent.run_sync(
    ["Please provide a detailed description of this video.", video_content]
)

print("\nVideo Analysis Results:")

pprint(result.output.model_dump(), width=80, depth=None)

Original video:



Analyzing video content...

Video Analysis Results:
{'main_subjects': ['dog', 'yellow cones', 'person'],
 'quality': 'good',
 'setting': 'outdoor grassy field with agility equipment',
 'summary': 'A dog is seen running through a series of yellow cones on a '
            'grassy field. A person is visible in the background, seemingly '
            'guiding or encouraging the dog.',
 'visual_style': 'daylight, eye-level shot, clear focus on the dog and cones'}
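
As a sketch of the tagging use case mentioned earlier, the dictionary returned by `model_dump()` can be turned into simple tags. `build_tags` and its tag format are assumptions made for illustration. The input reuses values from the output above.

```python
from typing import Dict, List


def build_tags(description: Dict[str, object]) -> List[str]:
    """Derive lowercase, de-duplicated tags from a description dict (hypothetical helper)."""
    tags: List[str] = []
    for subject in description.get("main_subjects", []):
        # Normalize each subject: trim, lowercase, and hyphenate spaces
        tag = str(subject).strip().lower().replace(" ", "-")
        if tag and tag not in tags:
            tags.append(tag)
    quality = description.get("quality")
    if quality:
        tags.append(f"quality:{quality}")
    return tags


analysis = {
    "main_subjects": ["dog", "yellow cones", "person"],
    "quality": "good",
    "setting": "outdoor grassy field with agility equipment",
}
tags = build_tags(analysis)
print(tags)  # ['dog', 'yellow-cones', 'person', 'quality:good']
```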


## Analyzing Longer Videos

To analyze a longer video, upload it through the Gemini File API first and pass the uploaded file to the agent instead of inline bytes:
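```python
from google import genai
from pydantic_ai import VideoUrl

client = genai.Client()  # assumed to read GEMINI_API_KEY from the environment
myfile = client.files.upload(file="path/to/video.mp4")

# Assumption: the upload is passed by URI, not as the raw File object
result = video_agent.run_sync(
    ["Please provide a detailed description of this video.",
     VideoUrl(url=myfile.uri, media_type=myfile.mime_type)]
)
```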
Unfortunately, the File API is not supported in our testing environment, so this snippet is untested.
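
To decide when a video counts as "longer", one option is to check the file size before choosing between inline bytes and the File API. The sketch below assumes a 20 MB cutoff for inline requests. Treat that figure as an assumption and confirm it against the current Gemini documentation. `should_use_file_api` is a hypothetical helper.

```python
import os
import tempfile

# Assumed cutoff for inline video data; verify against the current Gemini docs
INLINE_LIMIT_BYTES = 20 * 1024 * 1024


def should_use_file_api(video_path: str, limit_bytes: int = INLINE_LIMIT_BYTES) -> bool:
    """Return True when the file is larger than the inline size limit."""
    return os.path.getsize(video_path) > limit_bytes


# Demonstrate with a tiny temporary file standing in for a video
with tempfile.NamedTemporaryFile(suffix=".mp4", delete=False) as tmp:
    tmp.write(b"0" * 10)
    demo_path = tmp.name

small_file_needs_upload = should_use_file_api(demo_path)
forced_upload = should_use_file_api(demo_path, limit_bytes=5)
os.remove(demo_path)

print(small_file_needs_upload, forced_upload)
```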