# Exercise: Video Salient Moments Detection with Pydantic AI and Gemini

In this exercise, you will build a system that uses Google Gemini to detect salient moments in videos.

## Import Libraries and Load Environment

Before starting, make sure you have placed your Google Gemini credentials in the `.env` file:

```bash
cp env.example .env
```
Then edit `.env` and set `GEMINI_API_KEY` to your key.

In [None]:
import os
from dotenv import load_dotenv
from pydantic import BaseModel, Field
from pydantic_ai import Agent, BinaryContent
from IPython.display import Video, display
from typing import List, Optional

# Load environment variables
assert load_dotenv(), "Please prepare a .env file with your GEMINI_API_KEY"
assert os.getenv("GEMINI_API_KEY"), "GEMINI_API_KEY not found in .env file"

# This is needed to use asyncio within Jupyter
import nest_asyncio

nest_asyncio.apply()

## Define Response Schema

We'll use Pydantic models to structure our salient moments detection. This structured output makes it easy to create video highlights, generate thumbnails at key moments, or build interactive video players with chapter markers.

In this exercise we will build a nested schema: a video-level Pydantic model contains a list of `SalientMoment` objects, and each salient moment is described by its own schema. Let's do it together:

In [None]:
from typing import Literal


# TODO: Define a SalientMoment schema (aka data model) with the following fields:
# - timestamp: str - Approximate timestamp when this moment occurs
# - moment_type: Literal[...] - Category of salient moment. A classification from
#   "action_peak", "emotional_highlight", "visual_transition", "climax", "scene_change", "other"
# - description: str - Clear description of what makes this moment significant
# - visual_cues: List[str] - Visual elements that make this moment stand out
class SalientMoment(BaseModel):
    """A single salient moment in the video."""

    ... #complete


# TODO: This is now the video-level schema that contains a list of SalientMoment objects.
class VideoSalientMoments(BaseModel):
    """Complete analysis of salient moments in a video."""

    video_summary: str = Field(
        description="Brief 1-2 sentence summary of the overall video content"
    )
    
    total_moments_found: int = Field(
        description="Number of salient moments identified"
    )
    
    # TODO: make `moments` a List of SalientMoment objects, and use 
    # Field(...) to add an appropriate description.
    # HINT: use List[SalientMoment] as type
    moments: ... #complete
    
    most_significant_moment: Optional[str] = Field(
        default=None,
        description="Timestamp of the single most important moment, if one stands out",
    )
    
    pacing_analysis: str = Field(
        description="Brief analysis of how the salient moments are distributed throughout the video"
    )
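
Because `timestamp` is declared as a plain `str`, any code that later sorts moments or seeks into the video has to turn it into a number first. The following is a minimal sketch, assuming the model answers with timestamps shaped like `SS`, `MM:SS` or `HH:MM:SS` (the schema itself does not guarantee this). `timestamp_to_seconds` is a hypothetical helper, not part of Pydantic or Pydantic AI.

```python
def timestamp_to_seconds(timestamp: str) -> float:
    """Convert 'SS', 'MM:SS' or 'HH:MM:SS' into a number of seconds.

    Assumption: the model answers in one of these formats. Anything else
    raises ValueError so that unexpected output is noticed early.
    """
    parts = timestamp.strip().split(":")
    if not 1 <= len(parts) <= 3:
        raise ValueError(f"Unrecognized timestamp: {timestamp!r}")
    seconds = 0.0
    for part in parts:
        # float() raises ValueError on non-numeric text such as "soon"
        seconds = seconds * 60 + float(part)
    return seconds


print(timestamp_to_seconds("0:15"))      # 15.0
print(timestamp_to_seconds("01:02:03"))  # 3723.0
```

If the model turns out to return looser strings (for example "around 0:15"), tightening the `timestamp` field description is probably a better fix than making the parser more lenient.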

## Create Pydantic AI Agent

Set up the agent with Gemini for salient moments detection.

In [None]:
from pydantic_ai.models.google import GoogleModelSettings


# Disable thinking to avoid long delays and timeouts
settings = GoogleModelSettings(
    google_thinking_config={"thinking_budget": 0},
)

# TODO: set up the agent. Make sure to set:
# - model to "gemini-2.5-flash-lite"
# - output_type to VideoSalientMoments
# - instructions to guide the model to identify salient moments as per the exercise description.
#   We provide an example, but feel free to play with it!
# - model_settings to the settings defined above
moments_agent = Agent(
    model=..., #complete
    output_type=..., #complete
    instructions="""
    You are an expert video analyst specializing in identifying salient moments. 
    Watch the video carefully and identify:
    
    1. Key moments that would be most interesting to viewers
    2. Points where significant changes occur (visual, emotional, narrative)
    3. Peaks of action, emotion, or visual interest
    4. Important transitions or reveals
    5. Moments that would make good thumbnails or highlights
    
    Focus on moments that:
    - Capture viewer attention
    - Represent turning points
    - Show peak action or emotion
    - Have strong visual impact
    - Advance the story or message
    
    Provide approximate timestamps and be specific about why each moment is salient.
    """,
    model_settings=... #complete
)

## Helper Function for Video Processing

Let's define a helper function to load and format video files for salient moments analysis.

In [None]:
from pprint import pprint


def load_video_for_moments_analysis(video_path):
    """Load video file and format it for Pydantic AI."""
    with open(video_path, 'rb') as f:
        video_bytes = f.read()
    
    # Create binary content for Pydantic AI
    video_content = BinaryContent(
        data=video_bytes,
        media_type='video/mp4'  # Adjust based on your video format
    )
    
    return video_content
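
The helper above hard-codes `video/mp4`. One way to follow the "adjust based on your video format" comment is to infer the media type from the file extension with the standard-library `mimetypes` module. This is only a sketch: `guess_video_media_type` is a hypothetical name, it looks at the file name rather than the file contents, and it does not check whether Gemini accepts the resulting format.

```python
import mimetypes


def guess_video_media_type(video_path: str) -> str:
    """Infer a video MIME type from the file extension.

    Hypothetical helper: it only inspects the file name and rejects
    anything that is not recognized as a video type.
    """
    media_type, _encoding = mimetypes.guess_type(video_path)
    if media_type is None or not media_type.startswith("video/"):
        raise ValueError(f"Cannot infer a video media type for {video_path!r}")
    return media_type


print(guess_video_media_type("../cross.mp4"))  # video/mp4
```

The returned string could then be passed as `media_type` when building the `BinaryContent` object.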

## Video Analysis

Analyze a video to identify its salient moments. Use the `../cross.mp4` file that we provide.

In [None]:
# Path to the video to analyze (change it to try your own video)
video_path = "../cross.mp4"

# Display the video in the notebook
print("Original video:")
display(Video(video_path, width=400))

# Load and analyze the video for salient moments
video_content = load_video_for_moments_analysis(video_path)

print("\nAnalyzing salient moments...")
result = moments_agent.run_sync([
    "Identify and analyze the salient moments in this video.", 
    video_content
])

print("\nSalient Moments Analysis:")
pprint(result.output.model_dump())
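
The model fills in `total_moments_found` and `most_significant_moment` independently of the `moments` list, so nothing forces them to agree. A quick sanity check on the dumped dictionary can be sketched as follows, assuming the field names from the schema above. `check_consistency` and the sample data are hypothetical and made up for illustration.

```python
from typing import List


def check_consistency(analysis: dict) -> List[str]:
    """Return human-readable problems found in a dumped analysis dict."""
    problems = []
    moments = analysis.get("moments", [])

    # The reported count should match the number of moments actually returned.
    reported = analysis.get("total_moments_found")
    if reported != len(moments):
        problems.append(
            f"total_moments_found={reported} but {len(moments)} moments were returned"
        )

    # If a most significant moment is named, it should be one of the moments.
    top = analysis.get("most_significant_moment")
    timestamps = {m["timestamp"] for m in moments}
    if top is not None and top not in timestamps:
        problems.append(f"most_significant_moment {top!r} matches no moment")

    return problems


# Made-up sample shaped like result.output.model_dump()
sample = {
    "total_moments_found": 3,
    "moments": [{"timestamp": "0:05"}, {"timestamp": "0:12"}],
    "most_significant_moment": "0:12",
}
print(check_consistency(sample))
```

In the notebook, the same function could be called on `result.output.model_dump()`; an empty list means both checks passed.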

This structured approach to salient moment detection makes it easy to build automated video editing tools, create dynamic thumbnails, or generate chapter markers for long-form content.

For example, you could analyze hours of motocross footage, find salient moments like the ones in this video, then cut them out and splice them together into a highlight video.
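
To assemble a highlight reel, each detected moment first has to become a clip window, and overlapping windows should be merged so that no footage is repeated. The sketch below assumes the timestamps have already been converted to seconds and uses a hypothetical padding of 2 seconds on each side of a moment. `build_clip_windows` is an illustrative name, and the actual cutting with a video editing tool is left out.

```python
from typing import List, Tuple


def build_clip_windows(
    moment_seconds: List[float], padding: float = 2.0
) -> List[Tuple[float, float]]:
    """Pad each moment into a (start, end) window and merge overlaps."""
    windows = sorted((max(0.0, t - padding), t + padding) for t in moment_seconds)
    merged: List[Tuple[float, float]] = []
    for start, end in windows:
        if merged and start <= merged[-1][1]:
            # Overlaps (or touches) the previous window: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


print(build_clip_windows([1.0, 4.0, 20.0]))  # [(0.0, 6.0), (18.0, 22.0)]
```

Each resulting `(start, end)` pair could then be handed to whatever tool performs the cut.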