# TwelveLabs Pegasus on Amazon Bedrock Workshop

TwelveLabs is a leading provider of multimodal AI models specializing in video understanding and analysis. TwelveLabs' advanced models enable sophisticated video search, analysis, and content generation capabilities through state-of-the-art computer vision and natural language processing technologies. Amazon Bedrock now offers two TwelveLabs models: TwelveLabs Pegasus 1.2, which provides comprehensive video understanding and analysis, and TwelveLabs Marengo Embed 2.7, which generates high-quality embeddings for video, text, audio, and image content. These models empower developers to build applications that can intelligently process, analyze, and derive insights from video data at scale.

### TwelveLabs Video Understanding Models
TwelveLabs’ video understanding models consist of a family of deep neural networks built on our multimodal foundation model for video understanding that you can use for the following downstream tasks:
- Search using natural language queries
- Analyze videos to generate text

Videos contain multiple types of information, including visuals, sounds, spoken words, and text. The human brain combines all of these types of information, and the relations between them, to comprehend the overall meaning of a scene. For example, suppose you’re watching a video of a person jumping and clapping (both visual cues) with the sound muted. You might realize they’re happy, but you can’t tell why without the sound. With the sound on, you could realize they’re cheering because the soccer team they support has just scored a goal.

Thus, an application that analyzes a single type of information can’t provide a comprehensive understanding of a video. TwelveLabs’ video understanding models, however, analyze and combine information from all the modalities to accurately interpret the meaning of a video holistically, similar to how humans watch, listen, and read simultaneously to understand videos.

Our video understanding models can identify, analyze, and interpret a variety of elements, including but not limited to the following:

| Element | Modality | Example |
|---------|----------|---------|
| People, including famous individuals | Visual | Michael Jordan, Steve Jobs |
| Actions | Visual | Running, dancing, kickboxing |
| Objects | Visual | Cars, computers, stadiums |
| Animals or pets | Visual | Monkeys, cats, horses |
| Nature | Visual | Mountains, lakes, forests |
| Text displayed on the screen (OCR) | Visual | License plates, handwritten words, number on a player's jersey |
| Brand logos | Visual | Nike, Starbucks, Mercedes |
| Shot techniques and effects | Visual | Aerial shots, slow motion, time-lapse |
| Counting objects | Visual | Number of people in a crowd, items on a shelf, vehicles in traffic |
| Sounds | Audio | Chirping (birds), applause, fireworks popping or exploding |
| Human speech | Audio | "Good morning. How may I help you?" |
| Music | Audio | Ominous music, whistling, lyrics |

### Modalities
Modalities represent the types of information that the models process and analyze in a video. These modalities are central to both indexing and searching video content.

The models support the following modalities: 

- **Visual**: Analyzes visual content in a video, including actions, objects, events, text (through Optical Character Recognition, or OCR), and brand logos.
- **Audio**: Analyzes audio content in a video, including ambient sounds, music, and human speech.

## Part 0: Setup

### Dependencies

In [None]:
%pip install -r requirements.txt -Uq

In [None]:
import boto3, botocore
import json
import re
from IPython.display import HTML, display

### Configure boto3

In [None]:
AWS_REGION = "us-west-2" # TODO: Replace with your AWS region

In [None]:
# Initialize AWS session
session = boto3.Session(profile_name='default') # TODO: Replace with your AWS profile

# Initialize AWS clients
bedrock_client = session.client('bedrock-runtime', region_name=AWS_REGION)
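s3_client = session.client('s3', region_name=AWS_REGION)  # Assumes your S3 bucket is in AWS_REGION; region-mismatched presigned URLs can fail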

### Configure S3

In [None]:
# S3 Configuration
S3_BUCKET_NAME = "<YOUR_S3_BUCKET_NAME>" # TODO: Replace with your S3 bucket name
S3_VIDEOS_PATH = "videos"

# Validate S3 bucket name
if S3_BUCKET_NAME == "<YOUR_S3_BUCKET_NAME>" or S3_BUCKET_NAME == "":
    raise ValueError("Please replace <YOUR_S3_BUCKET_NAME> with your S3 bucket name")

#### Copy the sample dataset to your S3 bucket

In [None]:
# AWS Account ID for S3 bucket ownership
aws_account_id = session.client('sts').get_caller_identity()["Account"]

print(f"AWS Account ID: {aws_account_id}")
print(f"S3 Bucket: {S3_BUCKET_NAME}")
print(f"S3 Videos Path: {S3_VIDEOS_PATH}")

# Verify bucket access
try:
    s3_client.head_bucket(Bucket=S3_BUCKET_NAME)
    print(f"✅ Successfully connected to S3 bucket: {S3_BUCKET_NAME}")
except Exception as e:
    print(f"❌ Error accessing S3 bucket: {e}")
    print("Please ensure the bucket exists and you have proper permissions.")

#### Netflix Open Content

[Netflix Open Content](https://opencontent.netflix.com/) is open-source content available under the [Creative Commons Attribution 4.0 International Public License](https://creativecommons.org/licenses/by/4.0/legalcode).

The assets are available for download at: http://download.opencontent.netflix.com/

We use a subset of these videos to demonstrate how to use the TwelveLabs models on Amazon Bedrock.

In [None]:
# Sample video S3 URIs
sample_videos = [
    # 's3://download.opencontent.netflix.com/TechblogAssets/CosmosLaundromat/encodes/CosmosLaundromat_2048x858_24fps_SDR.mp4',
    # 's3://download.opencontent.netflix.com/TechblogAssets/Meridian/encodes/Meridian_3840x2160_5994fps_SDR.mp4',
    's3://download.opencontent.netflix.com/TechblogAssets/Sparks/encodes/Sparks_4096x2160_5994fps_SDR.mp4'
]

In [None]:
# Unsigned S3 client
public_s3_client = boto3.client('s3', config=botocore.client.Config(signature_version=botocore.UNSIGNED))

In [None]:
def parse_s3_uri(s3_uri: str) -> tuple[str, str]:
    """
    Parses an S3 URI like s3://bucket-name/path/to/object and returns (bucket, key)

    Args:
        s3_uri (str): The S3 URI to parse
        
    Returns:
        tuple[str, str]: The bucket and key
    """
    pattern = r'^s3://([^/]+)/(.+)$'
    match = re.match(pattern, s3_uri)
    if not match:
        raise ValueError(f"Invalid S3 URI format: {s3_uri}")
    return match.group(1), match.group(2)


def copy_public_s3_object_to_private_bucket(public_s3_uri: str, dest_bucket: str, dest_key: str) -> None:
    """
    Copies a public S3 object to a private bucket

    Args:
        public_s3_uri (str): The S3 URI of the public object to copy
        dest_bucket (str): The name of the private bucket to copy to
        dest_key (str): The destination key in the private bucket
    """

    # Parse source bucket and key
    source_bucket, source_key = parse_s3_uri(public_s3_uri)

    # Read the public object with the unsigned (anonymous) client defined above
    print(f"Downloading from {public_s3_uri}...")
    response = public_s3_client.get_object(Bucket=source_bucket, Key=source_key)

    # Stream the body into the private bucket with the authenticated client.
    # upload_fileobj uploads in parts, so the whole video is not read into memory at once.
    print(f"Uploading to s3://{dest_bucket}/{dest_key} ...")
    s3_client.upload_fileobj(response["Body"], dest_bucket, dest_key)

    print("✅ Copy completed successfully!")

In [None]:
# Copy videos to the S3 bucket
for video_uri in sample_videos:
    # Extract the filename from the S3 key
    _, src_key = parse_s3_uri(video_uri)
    filename = src_key.split("/")[-1]
    dest_key = f"{S3_VIDEOS_PATH}/{filename}"
    copy_public_s3_object_to_private_bucket(
        public_s3_uri=video_uri,
        dest_bucket=S3_BUCKET_NAME,
        dest_key=dest_key
    )

### Bedrock model access


Follow the [Bedrock model access documentation](https://docs.aws.amazon.com/bedrock/latest/userguide/model-access-modify.html) to enable access to TwelveLabs models on Bedrock. Make sure to enable access in the same AWS Region in which you run this workshop.
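
If you are unsure whether the TwelveLabs models are offered in your Region, a minimal sketch like the following can list them with the `bedrock` control-plane client's `list_foundation_models` call. It assumes TwelveLabs model IDs start with `twelvelabs.`, as `twelvelabs.pegasus-1-2-v1:0` does, and the helper names are hypothetical. A model appearing in this list means it is offered in the Region, not that access has been granted to your account.

```python
def twelvelabs_model_ids(model_summaries: list[dict]) -> list[str]:
    """Return sorted TwelveLabs model IDs from ListFoundationModels summaries."""
    # Assumption: TwelveLabs model IDs share the "twelvelabs." prefix,
    # as in "twelvelabs.pegasus-1-2-v1:0"
    return sorted(
        summary["modelId"]
        for summary in model_summaries
        if summary.get("modelId", "").startswith("twelvelabs.")
    )


def list_twelvelabs_models(region: str) -> list[str]:
    """List the TwelveLabs models offered in a Region (needs AWS credentials)."""
    # Imported here so the pure filter above can be used without boto3 installed
    import boto3

    # Model listing lives on the "bedrock" control-plane client, not "bedrock-runtime"
    bedrock = boto3.client("bedrock", region_name=region)
    response = bedrock.list_foundation_models()
    return twelvelabs_model_ids(response["modelSummaries"])
```

In the notebook, `print(list_twelvelabs_models(AWS_REGION))` would show the IDs; you could also build the client from `session` instead of `boto3.client` to reuse your profile.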

In [None]:
MODEL_ID_REGIONS = {
    "us-west-2": "us.twelvelabs.pegasus-1-2-v1:0",
    "eu-west-1": "eu.twelvelabs.pegasus-1-2-v1:0"
}

In [None]:
try:
    MODEL_ID = MODEL_ID_REGIONS[AWS_REGION]
    print(MODEL_ID)
except KeyError:
    raise ValueError(f"Pegasus 1.2 is not supported for {AWS_REGION}")

## Part 1: Using Pegasus on Bedrock

#### Select the video to analyze

In [None]:
s3_response = s3_client.list_objects_v2(Bucket=S3_BUCKET_NAME, Prefix=S3_VIDEOS_PATH)

# List all object keys
if 'Contents' in s3_response:
    object_keys = [obj['Key'] for obj in s3_response['Contents']]
    for key in object_keys:
        print(key)
    print(f"\nTotal objects found: {len(object_keys)}")
else:
    print("No objects found in the specified bucket and prefix.")
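
Rather than copying a key by hand, you could filter the listing down to video files. This sketch assumes videos can be recognized by file extension; `video_keys` and the extension tuple are hypothetical choices, so adjust them to the formats you upload.

```python
# Hypothetical set of extensions treated as videos; extend as needed
VIDEO_EXTENSIONS = (".mp4", ".mov")


def video_keys(keys: list[str], extensions: tuple[str, ...] = VIDEO_EXTENSIONS) -> list[str]:
    """Keep only keys that look like video files, skipping folder placeholders."""
    return [key for key in keys if key.lower().endswith(extensions)]
```

For example, `video_s3_key = video_keys(object_keys)[0]` would pick the first video found under the prefix.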


In [None]:
video_s3_key = "<YOUR_VIDEO_S3_KEY>" # TODO: Replace with your video S3 key

# Validate video S3 key
if video_s3_key == "<YOUR_VIDEO_S3_KEY>" or video_s3_key == "":
    raise ValueError("Please replace <YOUR_VIDEO_S3_KEY> with your video S3 key")

In [None]:
def play_video(video_url: str, start_time: float) -> None:
    """
    Play a video at a specific start time.

    Args:
        video_url (str): The URL of the video to play.
        start_time (float): The start time of the video in seconds.
    """

    # HTML code for the video player
    html_code = f"""
    <video width="640" controls>
        <source src="{video_url}#t={start_time}" type="video/mp4">
    </video>
    """
    display(HTML(html_code))

#### View the video

In [None]:
# Generate presigned URL for the video
presigned_url = s3_client.generate_presigned_url(
    "get_object",
    Params={"Bucket": S3_BUCKET_NAME, "Key": video_s3_key},
    ExpiresIn=3600
)

# Play the video
play_video(presigned_url, 0)

### Part 1a: Analyze with Pegasus on Bedrock

#### TwelveLabs Pegasus

Pegasus is a generative model for video-to-text generation. Pegasus analyzes multiple modalities to generate contextually relevant text based on the content of your videos.

***Key features***
- **Video-to-text generation**: Creates detailed textual descriptions based on video content
- **Extended processing capacity**: Processes videos up to 1 hour in length
- **Granular visual comprehension**: Analyzes objects, on-screen text, and numerical content
- **Temporal grounding**: Accurately identifies timestamps of specific events
- **Multimodal understanding**: Combines visual, audio, and textual information for comprehensive analysis

***Use cases***
- **Content summarization**: Generate concise summaries of video content
- **Detailed descriptions**: Create comprehensive textual descriptions of visual scenes
- **Timestamp identification**: Answer questions about when specific events occur in videos
- **Content analysis**: Extract key information from video content for further processing
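
For timestamp questions, a small parser can turn the times in a Pegasus answer into seconds that `play_video` accepts as `start_time`. This is a sketch that assumes the answer writes times as `MM:SS` or `HH:MM:SS`; Pegasus does not guarantee that format, so treat an empty result as "no timestamp found". `timestamps_to_seconds` is a hypothetical helper.

```python
import re

# Matches MM:SS or HH:MM:SS timestamps such as "01:23" or "1:02:03"
TIMESTAMP_PATTERN = re.compile(r"\b(?:(\d{1,2}):)?(\d{1,2}):(\d{2})\b")


def timestamps_to_seconds(text: str) -> list[int]:
    """Extract timestamps from free text and convert each one to seconds."""
    seconds = []
    for hours, minutes, secs in TIMESTAMP_PATTERN.findall(text):
        # The hours group is an empty string for MM:SS matches
        seconds.append(int(hours or 0) * 3600 + int(minutes) * 60 + int(secs))
    return seconds
```

With an answer string `answer` from Pegasus, `play_video(presigned_url, timestamps_to_seconds(answer)[0])` would jump to the first time it mentions.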


#### Pegasus 1.2 on Bedrock

The TwelveLabs Pegasus 1.2 model provides comprehensive video understanding and analysis capabilities. It can analyze video content and generate textual descriptions, insights, and answers to questions about the video.

Use the [InvokeModel](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_InvokeModel.html) and [InvokeModelWithResponseStream](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_InvokeModelWithResponseStream.html) (streaming) operations to make inference calls to TwelveLabs models.

- Provider — TwelveLabs
- Categories — Video understanding, content analysis
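- Model ID — `twelvelabs.pegasus-1-2-v1:0` (this notebook calls it through the Region-prefixed IDs in `MODEL_ID_REGIONS`, such as `us.twelvelabs.pegasus-1-2-v1:0`)
- Input modality — Video
- Output modality — Text
- Max video size — up to 1 hour long, with a file size under 2 GB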

**Resources:**
- [AWS Docs: TwelveLabs Pegasus 1.2](https://docs.aws.amazon.com/bedrock/latest/userguide/model-parameters-pegasus.html)
- [TwelveLabs Docs: Pegasus](https://docs.twelvelabs.io/v1.3/docs/concepts/models/pegasus)
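
The 2 GB limit can be checked before calling the model. The sketch below reads the object size with an S3 `head_object` request (no download) and compares it with the limit. It assumes "2 GB" means 2 * 10**9 bytes, a conservative reading, and it cannot check the 1-hour duration limit, which would need a media tool. The helper names are hypothetical.

```python
# Assumption: "2 GB" is read conservatively as 2 * 10**9 bytes
MAX_VIDEO_BYTES = 2 * 10**9


def within_size_limit(num_bytes: int, limit: int = MAX_VIDEO_BYTES) -> bool:
    """Return True if a non-empty file of num_bytes is under the size limit."""
    return 0 < num_bytes < limit


def s3_object_size(client, bucket: str, key: str) -> int:
    """Return an S3 object's size in bytes using a HEAD request."""
    return client.head_object(Bucket=bucket, Key=key)["ContentLength"]
```

In the notebook, `within_size_limit(s3_object_size(s3_client, S3_BUCKET_NAME, video_s3_key))` would flag an oversized video before the Bedrock call.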

In [None]:
prompt = "What is the video about?"

request_body = {
    "inputPrompt": prompt,
    "mediaSource": {
        "s3Location": {
            "uri": f"s3://{S3_BUCKET_NAME}/{video_s3_key}",
            "bucketOwner": aws_account_id
        }
    },
    "temperature": 0
}

response = bedrock_client.invoke_model(
    modelId=MODEL_ID,
    body=json.dumps(request_body),
    contentType="application/json",
    accept="application/json"
)

# Parse the response and print the model outputs
response_body = json.loads(response.get("body").read())
print(response_body["message"])
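
Every cell in this notebook builds the same request body and changes only the prompt (and, in Part 2b, the response format). A hypothetical helper such as `build_pegasus_request` keeps that shape in one place. It reproduces only the fields this notebook already uses (`inputPrompt`, `mediaSource.s3Location`, `temperature`, `responseFormat`) and does not cover any other request options.

```python
from typing import Optional


def build_pegasus_request(
    prompt: str,
    s3_uri: str,
    bucket_owner: str,
    temperature: float = 0,
    json_schema: Optional[dict] = None,
) -> dict:
    """Build a Pegasus 1.2 request body with the fields used in this notebook."""
    body = {
        "inputPrompt": prompt,
        "mediaSource": {"s3Location": {"uri": s3_uri, "bucketOwner": bucket_owner}},
        "temperature": temperature,
    }
    if json_schema is not None:
        # Same shape as the structured-output request in Part 2b
        body["responseFormat"] = {"jsonSchema": json_schema}
    return body
```

For example, `build_pegasus_request("What is the video about?", f"s3://{S3_BUCKET_NAME}/{video_s3_key}", aws_account_id)` produces the same body as the cell above, ready for `json.dumps`.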

### Part 1b: Pegasus streaming response

In [None]:
prompt = "What is the video about?"

request_body = {
    "inputPrompt": prompt,
    "mediaSource": {
        "s3Location": {
            "uri": f"s3://{S3_BUCKET_NAME}/{video_s3_key}",
            "bucketOwner": aws_account_id
        }
    },
    "temperature": 0
}

streaming_response = bedrock_client.invoke_model_with_response_stream(
    modelId=MODEL_ID,
    body=json.dumps(request_body),
    contentType="application/json",
    accept="application/json"
)

# Print the response text in real time and accumulate the full message
message = ""
for event in streaming_response["body"]:
    chunk = json.loads(event["chunk"]["bytes"])
    print(chunk["message"], end="", flush=True)
    message += chunk["message"]

print()  # End the streamed output with a newline
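
The streaming loop can also live in a reusable function that prints each piece as it arrives and returns the full text. The sketch below assumes each payload chunk is JSON with a `message` field, as in the cell above; any event without a `chunk` key (for example, a stream error event) is raised instead of being silently dropped. `collect_stream` is a hypothetical name.

```python
import json
from typing import Iterable


def collect_stream(events: Iterable[dict], echo: bool = True) -> str:
    """Join the "message" text from InvokeModelWithResponseStream events."""
    parts = []
    for event in events:
        if "chunk" not in event:
            # Surface unexpected events (such as stream errors) instead of dropping them
            raise RuntimeError(f"Unexpected stream event: {event}")
        payload = json.loads(event["chunk"]["bytes"])
        text = payload.get("message") or ""
        if echo:
            print(text, end="", flush=True)  # Show text as it arrives
        parts.append(text)
    if echo:
        print()  # Finish the streamed line
    return "".join(parts)
```

With it, each streaming cell reduces to `message = collect_stream(streaming_response["body"])`.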

## Part 2: Video Analysis with Pegasus

Pegasus analyzes videos to generate text based on their content using a multimodal approach. This method analyzes the visuals, sounds, spoken words, and relationships between them. As a result, it provides a comprehensive understanding of your videos, capturing nuances that a unimodal interpretation might miss.

The platform generates the following types of text:
- **Topics and hashtags:** Provide a quick breakdown of the essence of a video.
- **Summaries:** Encapsulate the key points of a video, presenting the most important information clearly and concisely.
- **Highlights:** List the key events in chronological order. Unlike chapters, which cover the whole video, highlights capture only the most important moments.
- **Chapters:** A chapter in a video typically focuses on a particular topic or theme. The platform chronologically lists all the chapters in your video for a thorough content breakdown.
- **Open-ended text (your own prompt):** Custom outputs based on your prompts, including, but not limited to, tables of content, action items, memos, reports, marketing copy, and comprehensive analyses.


### Part 2a: Summaries, hashtags, and highlights

In [None]:
# Generate a summary of the video
prompt = "Summarize the video"

request_body = {
    "inputPrompt": prompt,
    "mediaSource": {
        "s3Location": {
            "uri": f"s3://{S3_BUCKET_NAME}/{video_s3_key}",
            "bucketOwner": aws_account_id
        }
    },
    "temperature": 0
}

streaming_response = bedrock_client.invoke_model_with_response_stream(
    modelId=MODEL_ID,
    body=json.dumps(request_body),
    contentType="application/json",
    accept="application/json"
)

# Print the response text in real time and accumulate the full message
message = ""
for event in streaming_response["body"]:
    chunk = json.loads(event["chunk"]["bytes"])
    print(chunk["message"], end="", flush=True)
    message += chunk["message"]

print()  # End the streamed output with a newline

In [None]:
# Generate relevant hashtags for the video
prompt = "Generate hashtags for the video"

request_body = {
    "inputPrompt": prompt,
    "mediaSource": {
        "s3Location": {
            "uri": f"s3://{S3_BUCKET_NAME}/{video_s3_key}",
            "bucketOwner": aws_account_id
        }
    },
    "temperature": 0
}

streaming_response = bedrock_client.invoke_model_with_response_stream(
    modelId=MODEL_ID,
    body=json.dumps(request_body),
    contentType="application/json",
    accept="application/json"
)

# Print the response text in real time and accumulate the full message
message = ""
for event in streaming_response["body"]:
    chunk = json.loads(event["chunk"]["bytes"])
    print(chunk["message"], end="", flush=True)
    message += chunk["message"]

print()  # End the streamed output with a newline

In [None]:
# Generate highlights of the video
prompt = "What are the highlighted moments of this video?"

request_body = {
    "inputPrompt": prompt,
    "mediaSource": {
        "s3Location": {
            "uri": f"s3://{S3_BUCKET_NAME}/{video_s3_key}",
            "bucketOwner": aws_account_id
        }
    },
    "temperature": 0
}

streaming_response = bedrock_client.invoke_model_with_response_stream(
    modelId=MODEL_ID,
    body=json.dumps(request_body),
    contentType="application/json",
    accept="application/json"
)

# Print the response text in real time and accumulate the full message
message = ""
for event in streaming_response["body"]:
    chunk = json.loads(event["chunk"]["bytes"])
    print(chunk["message"], end="", flush=True)
    message += chunk["message"]

print()  # End the streamed output with a newline

### Part 2b: Structured outputs

Structured outputs for Pegasus let you specify the output format as a JSON schema. They are useful for building automated integrations into application workflows, such as metadata enrichment.
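
The schema in the next cell repeats the same shape for every field. When all fields are required strings, a hypothetical helper can generate that `jsonSchema` entry from a mapping of field names to descriptions. It reproduces only the structure used in this notebook and does not cover other JSON Schema types.

```python
def string_fields_schema(name: str, fields: dict[str, str]) -> dict:
    """Build a jsonSchema entry in which every field is a required string."""
    return {
        "name": name,
        "schema": {
            "type": "object",
            "properties": {
                field: {"type": "string", "description": description}
                for field, description in fields.items()
            },
            "required": list(fields),
        },
    }
```

For example, `"responseFormat": {"jsonSchema": string_fields_schema("video_metadata", {"title": "The title of the video", "mood": "The mood of the video"})}` yields the same nested structure as the hand-written schema.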

In [None]:
# Using JSON Schema to generate structured output
prompt = """
Generate metadata for the video with the following fields:
- title
- description
- mood
- genre
"""

request_body = {
    "inputPrompt": prompt,
    "mediaSource": {
        "s3Location": {
            "uri": f"s3://{S3_BUCKET_NAME}/{video_s3_key}",
            "bucketOwner": aws_account_id
        }
    },
    "temperature": 0,
    "responseFormat": {
        "jsonSchema": {
            "name": "video_metadata",
            "schema": {
                "type": "object",
                "properties": {
                    "title": {
                        "type": "string",
                        "description": "The title of the video"
                    },
                    "description": {
                        "type": "string",
                        "description": "The description of the video"
                    },
                    "mood": {
                        "type": "string",
                        "description": "The mood of the video"
                    },
                    "genre": {
                        "type": "string",
                        "description": "The genre of the video"
                    }
                },
                "required": ["title", "description", "mood", "genre"]
            }
        }
    }
}

response = bedrock_client.invoke_model(
    modelId=MODEL_ID,
    body=json.dumps(request_body),
    contentType="application/json",
    accept="application/json"
)

# Parse the response and print the model outputs
response_body = json.loads(response.get("body").read())
message_data = json.loads(response_body["message"])
print(json.dumps(message_data, indent=4))
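
Before passing the parsed metadata to another system, it may be worth confirming that every required field came back as a non-empty string. This sketch assumes, as in the schema above, that all required fields are strings; `missing_or_invalid_fields` is a hypothetical helper, and a full JSON Schema validator would be the more general choice.

```python
def missing_or_invalid_fields(data: dict, schema: dict) -> list[str]:
    """List required fields that are absent, not strings, or blank."""
    problems = []
    for field in schema.get("required", []):
        value = data.get(field)
        if not isinstance(value, str) or not value.strip():
            problems.append(field)
    return problems
```

Here, `missing_or_invalid_fields(message_data, request_body["responseFormat"]["jsonSchema"]["schema"])` returns an empty list when every field is present.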

## Cleanup

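#### Delete the workshop videos from the S3 bucket

In [None]:
# Delete only the objects this workshop uploaded (under S3_VIDEOS_PATH),
# so any other data in the bucket is left untouched.
# The paginator handles more than 1,000 objects, and each page (at most 1,000 keys)
# fits within a single delete_objects request.
try:
    paginator = s3_client.get_paginator("list_objects_v2")
    deleted = 0
    for page in paginator.paginate(Bucket=S3_BUCKET_NAME, Prefix=f"{S3_VIDEOS_PATH}/"):
        objects_to_delete = [{"Key": obj["Key"]} for obj in page.get("Contents", [])]
        if objects_to_delete:
            s3_client.delete_objects(Bucket=S3_BUCKET_NAME, Delete={"Objects": objects_to_delete})
            deleted += len(objects_to_delete)
    print(f"✅ Deleted {deleted} object(s) under s3://{S3_BUCKET_NAME}/{S3_VIDEOS_PATH}/")
except Exception as e:
    print(f"❌ Error deleting objects from bucket '{S3_BUCKET_NAME}': {e}")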