# Video Temporal Understanding with Amazon Nova Premier

This notebook demonstrates the video analysis capabilities of Amazon Nova Premier, focusing on temporal understanding: the model's ability to identify sequences of events, understand timing relationships, and extract insights from video content over time.

## What You'll Learn

In this notebook, you will explore how Amazon Nova Premier analyzes video content to extract temporal information, identify key events, and understand sequences of actions. The notebook uses:

1. **Amazon S3** - For storing and accessing video files
2. **Amazon Bedrock** - For invoking Amazon Nova Premier via the API

You'll learn how to:
- Process video content using S3 URI references
- Extract precise temporal information from videos
- Create structured outputs from video analysis
- Identify actions and events at specific timestamps
- Generate comprehensive action timelines

## Environment Setup

Let's begin by setting up our environment with the necessary dependencies and configurations.

### Install Required Packages

The Python packages this notebook needs for handling video and AWS services were already installed from `requirements.txt` in module 1, so no installation step is required here.

In [None]:
# All dependencies are installed from requirements.txt in module 1
# No need to install them again

In [None]:
# Restore variables saved with %store in earlier notebooks (including region_name, used below)
%store -r

In [None]:
import boto3
from IPython.display import Video
import json
import time
import os

# Set up AWS clients using stored region name
boto3.setup_default_session(region_name=region_name)

# Initialize AWS service clients
account_id = boto3.client("sts").get_caller_identity().get("Account")
s3_client = boto3.client("s3")
bedrock_client = boto3.client("bedrock-runtime", region_name="us-west-2") 

# Define bucket name and video path
bucket_name = f"mmu-workshop-{account_id}"
video_path = "video/Meridian_Clip.mp4"

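# Look up the workshop bucket by name prefix and create the video/ folder marker in it
r = s3_client.list_buckets(Prefix=bucket_name)
buckets = r.get("Buckets", [])
if buckets and buckets[0]["Name"].startswith(bucket_name):
    bucket_name = buckets[0]["Name"]
    s3_client.put_object(Bucket=bucket_name, Key="video/")
    print(f"Successfully created video/ folder in {bucket_name}")
else:
    # Fail with a clear message instead of an IndexError on an empty bucket list
    raise RuntimeError(f"No S3 bucket found with prefix {bucket_name}")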

### Upload Video to Amazon S3

Next, we'll upload our sample video to S3 so that Nova Premier can access it via S3 URI.

### About Our Sample Video

For this demonstration, we'll use a clip from "Meridian," a short film from Netflix Open Content. This video contains several interesting temporal elements:

- A vintage car driving on a winding mountain road
- Rain beginning to fall partway through the clip
- A character appearing in the rearview mirror
- Various camera angle changes

The video file `Meridian_Clip.mp4` will be uploaded to your S3 bucket for processing by Amazon Nova Premier. The notebook includes code to handle this upload process automatically.

In [None]:
def interactive_sleep(seconds: int):
    """Display an interactive progress indicator with dots."""
    dots = ""
    for _ in range(seconds):
        dots += "."
        print(dots, end="\r")
        time.sleep(1)


def upload_directory(path, bucket_name, s3_path):
    """
    Upload all files from a local directory to an S3 bucket.
    
    Args:
        path (str): Local directory path to upload
        bucket_name (str): S3 bucket name
        s3_path (str): Target path in S3 bucket
    """
    for root, _, files in os.walk(path):
        for file in files:
            local_file_path = os.path.join(root, file)
            print(local_file_path)
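            # Build the S3 key with forward slashes, whatever the local path separator is
            relative_path = os.path.relpath(local_file_path, path)
            s3_key = "/".join([s3_path.rstrip("/"), relative_path.replace(os.sep, "/")])
            # Upload the file under that key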
            s3_client.upload_file(local_file_path, bucket_name, s3_key)


# Upload video directory contents to S3
upload_directory("video", bucket_name, "video")

### Preview the Video

Let's view the video clip to get familiar with its content before analyzing it with Nova Premier.

In [None]:
# View the video within the notebook

Video(video_path, width=400)

## Video Analysis with Amazon Nova Premier

Amazon Nova Premier offers sophisticated capabilities for analyzing video content. Unlike simpler approaches that might just extract frames and process them individually, Nova Premier can understand the temporal relationships between events and the narrative flow of video content.

The model can process videos referenced by S3 URI, which lets it analyze large files (up to 1 GB) without hitting request payload size limits. This approach is particularly valuable for real-world applications where videos may be lengthy or high resolution.
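
Before uploading, it can help to confirm that a local file fits within the size limit mentioned above. The following is a minimal sketch rather than part of the workshop code: `check_video_size`, `build_s3_uri`, and `MAX_S3_VIDEO_BYTES` are hypothetical names, and treating 1 GB as 1024³ bytes is an assumption.

```python
import os
import tempfile

# Assumed limit for S3-referenced video input (1 GB, taken here as 1024**3 bytes)
MAX_S3_VIDEO_BYTES = 1024 ** 3


def check_video_size(path: str, limit_bytes: int = MAX_S3_VIDEO_BYTES) -> int:
    """Return the file size in bytes, or raise ValueError if it exceeds the limit."""
    size = os.path.getsize(path)
    if size > limit_bytes:
        raise ValueError(f"{path} is {size} bytes, above the {limit_bytes}-byte limit")
    return size


def build_s3_uri(bucket: str, key: str) -> str:
    """Build an s3:// URI from a bucket name and an object key."""
    return f"s3://{bucket}/{key.lstrip('/')}"


# Demonstrate on a small temporary file so the sketch runs without a real video
with tempfile.NamedTemporaryFile(suffix=".mp4", delete=False) as tmp:
    tmp.write(b"\x00" * 2048)
    tmp_path = tmp.name

size_ok = check_video_size(tmp_path)
os.remove(tmp_path)

# "my-example-bucket" is a placeholder bucket name
example_uri = build_s3_uri("my-example-bucket", "video/Meridian_Clip.mp4")
print(size_ok, example_uri)
```

In this notebook the same check could be run on the local `video/Meridian_Clip.mp4` file before it is uploaded.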

In [None]:
# Define the model ID for Amazon Nova Premier
PREMIER_MODEL_ID = "us.amazon.nova-premier-v1:0"

In [None]:
# Create the S3 URI for the video
# Nova Premier can access videos directly from S3 using this format
uri = f"s3://{bucket_name}/{video_path}"
print(f"Video S3 URI: {uri}")

### Task 1: Summarizing Video Content

First, let's ask Amazon Nova Premier to create a concise summary of the video, identifying key moments and elements in the scene. This demonstrates the model's ability to understand the overall narrative structure of video content.

Video summarization has many practical applications:
- Content cataloging and indexing
- Creating accessible descriptions for visually impaired users
- Generating metadata for search and recommendation systems
- Quick content previews for users

In [None]:
def invoke_nova_video(prompt, system_message=None, temperature=0.3):
    """
    Send a video to Amazon Nova Premier with a specific prompt.
    
    Args:
        prompt (str): The prompt to send to the model
        system_message (str, optional): System message for context
        temperature (float, optional): Temperature for model inference
        
    Returns:
        tuple: Full model response and the text content
    """
    # Default system message if none provided
    if system_message is None:
        system_message = """
        You are an expert video and media analyst. You analyze video
        to extract detailed, fact-based insights accurately.
        """
    
    # Format system message
    system_list = [{"text": system_message}]
    
    # Format user message with video and prompt
    message_list = [
        {
            "role": "user",
            "content": [
                {
                    "video": {
                        "format": "mp4",
                        "source": {
                            "s3Location": {
                                "uri": uri
                            }
                        }
                    }
                },
                {"text": prompt}
            ]
        }
    ]
    
    # Set inference parameters
    inf_params = {
        "maxTokens": 1024, 
        "topP": 0.1, 
        "topK": 20, 
        "temperature": temperature
    }
    
    # Prepare the request
    native_request = {
        "schemaVersion": "messages-v1",
        "messages": message_list,
        "system": system_list,
        "inferenceConfig": inf_params,
    }
    
    # Invoke the model
    response = bedrock_client.invoke_model(
        modelId=PREMIER_MODEL_ID, 
        body=json.dumps(native_request)
    )
    
    # Parse the response
    model_response = json.loads(response["body"].read())
    
    # Extract the text content
    content_text = model_response["output"]["message"]["content"][0]["text"]
    
    return model_response, content_text


# Define expert video analyst system message
system_message = """
You are an expert video and media analyst. You analyze video to extract detailed, fact-based insights accurately.
"""

# Ask for a concise summary of the video
prompt = "Create a concise summary of this video. Identify and describe the key moments or events, limiting your summary to 5 main points in bullet points."

# Invoke the model
model_response, content_text = invoke_nova_video(prompt, system_message)

# Display the results
print("[Full Response]")
print(json.dumps(model_response, indent=2))
print("\n[Response Content Text]")
print(content_text)
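
The full response printed above carries more than the answer text. The sketch below shows one way to collect the text together with the stop reason and token usage. The `output.message.content` path matches the parsing code in `invoke_nova_video`; the `stopReason` and `usage` keys and the `"max_tokens"` value are assumptions about the response shape and should be checked against the printed response. `summarize_response` is a hypothetical helper name.

```python
def summarize_response(model_response: dict) -> dict:
    """Collect the text, stop reason, and token usage from a response dictionary."""
    content = model_response.get("output", {}).get("message", {}).get("content", [])
    # Join every text block rather than assuming the answer is in content[0]
    text = "".join(block.get("text", "") for block in content)
    stop_reason = model_response.get("stopReason")  # assumed key name
    return {
        "text": text,
        "stop_reason": stop_reason,
        # Assumed value indicating the answer was cut off at maxTokens
        "truncated": stop_reason == "max_tokens",
        "usage": model_response.get("usage", {}),  # assumed key name
    }


# Hand-written sample in the assumed response shape (not real model output)
sample_response = {
    "output": {"message": {"role": "assistant", "content": [{"text": "- A car drives."}]}},
    "stopReason": "end_turn",
    "usage": {"inputTokens": 10, "outputTokens": 5, "totalTokens": 15},
}
summary = summarize_response(sample_response)
print(summary["text"], summary["truncated"])
```

If `truncated` is true, raising `maxTokens` in the inference parameters is the likely remedy.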

### Task 2: Temporal Event Detection

A key capability of Amazon Nova Premier is identifying precisely when specific events occur in a video. This temporal understanding is crucial for many applications:

- **Content Moderation**: Identifying when potentially problematic content appears
- **Highlight Extraction**: Pinpointing exciting moments in sports or entertainment content
- **Video Indexing**: Creating searchable timestamps for specific actions or events
- **Scene Detection**: Identifying transitions between different scenes or settings
- **Behavioral Analysis**: Tracking when specific actions or behaviors occur

Let's explore this capability through several examples, asking the model to identify when specific events happen in our sample video.

#### Example 1: Weather Event Detection

Let's ask Nova Premier to identify exactly when it begins to rain in the video. This tests the model's ability to detect subtle environmental changes.

In [None]:
# Detect when rain begins in the video
prompt = "Identify when it begins to rain in the video. Output your response as a timestamp with the format MM:SS"

# Invoke the model
model_response, content_text = invoke_nova_video(prompt, system_message)

# Display the results
print("[Full Response]")
print(json.dumps(model_response, indent=2))
print("\n[Response Content Text]")
print(content_text)
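
The model returns its answer as free text, so downstream code usually has to pull the `MM:SS` value out and convert it to seconds (for example, to seek a video player). A minimal sketch, assuming the answer contains a standalone `MM:SS` token; `first_timestamp_seconds` is a hypothetical helper and the sample strings are hand-written, not model output.

```python
import re
from typing import Optional

# Standalone MM:SS token with seconds in 00-59; tokens that are part of a longer
# colon-separated value such as 12:30:45 are deliberately not matched
TIMESTAMP_PATTERN = re.compile(r"(?<![\d:])(\d{1,2}):([0-5]\d)(?![\d:])")


def first_timestamp_seconds(text: str) -> Optional[int]:
    """Return the first MM:SS timestamp in the text as seconds, or None if absent."""
    match = TIMESTAMP_PATTERN.search(text)
    if match is None:
        return None
    minutes, seconds = int(match.group(1)), int(match.group(2))
    return minutes * 60 + seconds


print(first_timestamp_seconds("It begins to rain at 00:15."))
print(first_timestamp_seconds("No timestamp here"))
```

In the notebook, the helper would be called on `content_text` after each timestamp query.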

#### Example 2: Character Appearance Detection

Now let's ask when a specific character first appears in the video. This demonstrates the model's ability to track characters throughout a video sequence.

In [None]:
# Detect when a woman first appears in the video
prompt = "At what point in the video does a woman first appear? Output your response as a timestamp with the format MM:SS"

# Invoke the model
model_response, content_text = invoke_nova_video(prompt, system_message)

# Display the results
print("[Full Response]")
print(json.dumps(model_response, indent=2))
print("\n[Response Content Text]")
print(content_text)

#### Example 3: Cinematography Analysis

Let's explore the model's understanding of cinematography by asking it to identify when a specific type of camera shot appears in the video.

In [None]:
# Detect when we see a close-up shot of the man
prompt = "At what point in the video do we see a close-up shot of the man? Output your response as a timestamp with the format MM:SS"

# Invoke the model
model_response, content_text = invoke_nova_video(prompt, system_message)

# Display the results
print("[Full Response]")
print(json.dumps(model_response, indent=2))
print("\n[Response Content Text]")
print(content_text)
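
The three examples above share the same pattern: a question followed by the same output-format instruction. One way to reduce the repetition is a small loop over labeled questions. This is a sketch under stated assumptions: `run_timestamp_queries` is a hypothetical helper, and `fake_ask` is a stand-in that returns made-up values so the code runs without calling the model.

```python
from typing import Callable, Dict

TIMESTAMP_SUFFIX = " Output your response as a timestamp with the format MM:SS"


def run_timestamp_queries(questions: Dict[str, str], ask: Callable[[str], str]) -> Dict[str, str]:
    """Send each question with the shared format instruction and collect the answers."""
    results = {}
    for label, question in questions.items():
        results[label] = ask(question + TIMESTAMP_SUFFIX).strip()
    return results


def fake_ask(prompt: str) -> str:
    """Stand-in for a model call; the returned timestamps are placeholders."""
    return " 00:15 " if "rain" in prompt else " 00:42 "


questions = {
    "rain_start": "Identify when it begins to rain in the video.",
    "woman_appears": "At what point in the video does a woman first appear?",
}
results = run_timestamp_queries(questions, fake_ask)
print(results)
```

To use the real model, `fake_ask` could be replaced with `lambda p: invoke_nova_video(p, system_message)[1]`, which returns only the text part of the tuple.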

### Task 3: Comprehensive Action Mapping

For sophisticated video analysis applications, we often need to map all actions across an entire video. This creates a complete temporal understanding of the content, enabling:

- **Video Search**: Making video content searchable by action or event
- **Content Navigation**: Allowing users to jump to specific events
- **Automated Summarization**: Creating timestamped summaries of key moments
- **Accessibility Features**: Generating enhanced descriptions with timing information
- **Behavioral Analysis**: Understanding sequences and patterns of actions

Amazon Nova Premier can generate structured outputs that identify actions throughout a video's duration, creating a complete temporal map of events.

#### Structured Action Timeline Generation

Let's ask Amazon Nova Premier to create a comprehensive timeline of all human actions occurring throughout the video, with precise timestamps. 

For this example, we'll demonstrate how to guide the model to produce a structured JSON output format that could be easily consumed by downstream applications. This approach is particularly valuable for creating programmatically accessible video analysis results.
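
A downstream application would typically validate the structure before relying on it, since the model is guided by the prompt rather than constrained to a schema. The sketch below checks for an `actions` list whose entries have an `action` string and an `MM:SS` `timestamp`, then sorts them chronologically. `TimelineAction` and `validate_timeline` are hypothetical names, and the sample data is hand-written rather than real model output.

```python
import re
from typing import List, NamedTuple

MMSS = re.compile(r"\d{2}:[0-5]\d")


class TimelineAction(NamedTuple):
    action: str
    timestamp: str
    seconds: int


def validate_timeline(data: dict) -> List[TimelineAction]:
    """Check the expected structure and return the actions sorted by time."""
    if not isinstance(data, dict) or not isinstance(data.get("actions"), list):
        raise ValueError('expected an object with an "actions" list')
    validated = []
    for index, item in enumerate(data["actions"]):
        if not isinstance(item, dict):
            raise ValueError(f"actions[{index}] is not an object")
        action, timestamp = item.get("action"), item.get("timestamp")
        if not isinstance(action, str) or not action.strip():
            raise ValueError(f"actions[{index}] has no action text")
        if not isinstance(timestamp, str) or not MMSS.fullmatch(timestamp):
            raise ValueError(f"actions[{index}] timestamp is not MM:SS: {timestamp!r}")
        minutes, seconds = timestamp.split(":")
        validated.append(TimelineAction(action.strip(), timestamp, int(minutes) * 60 + int(seconds)))
    return sorted(validated, key=lambda entry: entry.seconds)


# Hand-written sample, deliberately out of order
sample = {
    "actions": [
        {"action": "the students sit down", "timestamp": "00:32"},
        {"action": "the teacher enters the room", "timestamp": "00:15"},
    ]
}
timeline = validate_timeline(sample)
print([(entry.timestamp, entry.action) for entry in timeline])
```

Raising on the first malformed entry keeps the sketch short; an application might instead skip bad entries and log them.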

In [None]:
# Define a prompt for generating a structured timeline of actions
prompt = """
Analyze the video and identify all human actions or activities occurring throughout its duration. 

Follow these guidelines for your task:
1. List each action with its corresponding timestamp.
2. Describe each action succinctly.
3. Output the timestamp in MM:SS format.
4. DO NOT list identical actions consecutively in your output.
5. Your output should follow this sample JSON schema:
{
    "actions": [
        {
            "action": "the teacher enters the room",
            "timestamp": "00:15"
        },
        {
            "action": "the students sit down",
            "timestamp": "00:32"
        }
    ]
}
"""

# Invoke the model with structured output request
model_response, content_text = invoke_nova_video(prompt, system_message)

# Display the results
print("[Full Response]")
print(json.dumps(model_response, indent=2))
print("\n[Response Content Text]")
print(content_text)

# Optional: Parse the JSON response to work with it programmatically
try:
    # Extract the JSON part from the response (removing code block markers if present)
    json_text = content_text.strip()
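    # Strip an opening fence, with or without the "json" language tag
    if json_text.startswith("```json"):
        json_text = json_text[len("```json"):]
    elif json_text.startswith("```"):
        json_text = json_text[len("```"):]
    # Strip the closing fence from the end only
    if json_text.endswith("```"):
        json_text = json_text[:-len("```")]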
    
    actions_data = json.loads(json_text)
    print("\n[Parsed Action Count]")
    print(f"Successfully parsed {len(actions_data['actions'])} actions from the timeline")
except Exception as e:
    print(f"\nError parsing JSON output: {e}")
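
Model output can also wrap the JSON in a sentence or two of prose, which fence stripping alone does not handle. The sketch below is one more tolerant approach, assuming the response contains a single top-level JSON object: it drops any Markdown fence and then parses the span between the first `{` and the last `}`. `extract_json` is a hypothetical helper name.

```python
import json

# Markdown code-fence marker, built this way to avoid writing a literal fence here
FENCE = "`" * 3


def extract_json(text: str) -> dict:
    """Parse a JSON object from text that may be fenced or surrounded by prose."""
    cleaned = text.strip()
    if cleaned.startswith(FENCE):
        # Drop the opening fence line, with or without a language tag such as "json"
        first_newline = cleaned.find("\n")
        cleaned = cleaned[first_newline + 1:] if first_newline != -1 else ""
    if cleaned.endswith(FENCE):
        cleaned = cleaned[: -len(FENCE)]
    # Keep only the outermost braces in case prose surrounds the object
    start, end = cleaned.find("{"), cleaned.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in model output")
    return json.loads(cleaned[start : end + 1])


fenced = FENCE + "json\n" + '{"actions": []}' + "\n" + FENCE
print(extract_json(fenced))
print(extract_json('Here is the timeline: {"actions": [{"action": "a", "timestamp": "00:01"}]}'))
```

The brace-based fallback would fail on responses containing several separate JSON objects, so it suits only the single-object schema requested in the prompt.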

# Conclusion

In this notebook, we've explored the powerful video analysis capabilities of Amazon Nova Premier, with a particular focus on temporal understanding. These capabilities enable a wide range of applications that require understanding when and how events unfold in video content.

## Key Capabilities Demonstrated

### 1. Comprehensive Video Content Understanding
The summarization task exercised Amazon Nova Premier's understanding of video content, including:
- Scene recognition and narrative comprehension
- Character identification and relationships
- Environmental context and setting details
- Cinematographic elements and techniques

### 2. Precise Temporal Localization
The model was asked to localize specific events and return MM:SS timestamps for:
- Environmental changes (rain starting)
- Character appearances and actions
- Camera technique transitions
- Narrative developments

### 3. Structured Output Generation
We demonstrated how to guide the model to produce structured outputs that:
- Follow specific JSON formats
- Include precise timing information
- Organize events chronologically
- Describe actions concisely and consistently

## Potential Applications

The capabilities demonstrated in this notebook can be applied to many real-world scenarios:

1. **Content Moderation**
   - Identifying timestamps of potentially problematic content
   - Flagging specific moments for human review

2. **Media Production**
   - Automated scene indexing and cataloging
   - Finding specific shots or camera angles

3. **Educational Content**
   - Creating navigation points for instructional videos
   - Generating timestamped summaries of presentations

4. **Accessibility**
   - Enhanced audio descriptions with precise timing
   - Better navigation for users with visual impairments

5. **Content Discovery**
   - Making video content searchable by specific actions or events
   - Generating preview highlights automatically

## Next Steps

For more information on video understanding with Amazon Nova models, refer to the [AWS Video Understanding documentation](https://docs.aws.amazon.com/nova/latest/userguide/prompting-video-understanding.html). This resource provides additional guidance on prompt engineering for optimal video analysis results.