# Amazon Nova Premier - Temporal Understanding

##### In this notebook, you will interact with Amazon Nova Premier to complete temporal understanding tasks on a video. You will need the following services to complete the notebook:

1. Amazon S3 - You will store your video in Amazon S3.

2. Amazon Bedrock - You will access Amazon Nova Premier using the Amazon Bedrock Invoke Model API.

3. Amazon Transcribe - You will use Amazon Transcribe to extract the video transcript for video Q&A, where Nova uses both the video's visuals and its transcript to answer questions.



## Setup

##### Install the Python packages that this notebook uses.

In [None]:
!pip install webvtt-py boto3

##### Import libraries 

In [None]:
import sagemaker
from sagemaker import get_execution_role

#Amazon Bedrock imports
import boto3
from botocore.exceptions import ClientError

#Transcript extraction import
import webvtt

#Helper utilities
from IPython.display import Video
import pprint
import shutil
import tempfile
import time
import json 
import base64
import logging



##### Upload your video to Amazon S3 and update the variables below


You will use a clip from Meridian, a film from Netflix Open Content, for this notebook.
In the 'Video' folder, locate the video 'Meridian_Clip.mp4'. Download it and store it in a folder inside an Amazon S3 bucket of your choice, creating a new bucket if needed. Also create a second folder in the same bucket to store the transcript. The transcript is extracted later in this notebook, so leave this folder empty for now.
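If you prefer to do the upload from code instead of the S3 console, the following is a minimal sketch. It assumes `s3` is a boto3 S3 client; the helper name `upload_video_and_create_folders` and the example bucket and folder names are hypothetical. Because S3 has no real folders, the empty transcript "folder" is created as a zero-byte object whose key ends in `/`.

```python
import os


def upload_video_and_create_folders(s3, local_file, bucket, video_folder, transcript_folder):
    """Upload a local video into video_folder and create an empty transcript folder.

    `s3` is assumed to be a boto3 S3 client (or any object exposing the same
    upload_file/put_object methods). Returns the S3 key of the uploaded video.
    """
    video_key = "{0}/{1}".format(video_folder.strip("/"), os.path.basename(local_file))
    # upload_file(Filename, Bucket, Key) transfers the local file to S3
    s3.upload_file(local_file, bucket, video_key)
    # A zero-byte object whose key ends in "/" acts as a folder placeholder
    s3.put_object(Bucket=bucket, Key="{0}/".format(transcript_folder.strip("/")))
    return video_key

# Example (hypothetical names):
# upload_video_and_create_folders(boto3.client("s3"), "Video/Meridian_Clip.mp4",
#                                 "my-example-bucket", "videos", "transcripts")
```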

In [None]:
bucket= "{bucket-name}"  #update with the name of your bucket
video_path= "{video-folder-name}"  #update with the folder you created to store the video
transcription_output_path= "{transcript-folder-name}" #update with the folder you created to store the transcript

Create the boto3 clients for the services used

In [None]:
#credentials and clients
aws_account_id  = boto3.client('sts').get_caller_identity()['Account']  

sess = sagemaker.Session()
role = get_execution_role()
print(sess)
print(role)


region = boto3.Session().region_name 
print(region)

s3_client = boto3.client('s3')
bedrock_client = boto3.client(service_name='bedrock-runtime', 
                              region_name=region)
transcribe_client = boto3.client('transcribe')

#### Note:
Ensure the IAM role your notebook uses, shown in the cell output above as `arn:aws:iam::{account ID}:role/{role name}`, has permissions to invoke Amazon Bedrock, run Amazon Transcribe jobs, and read from and write to your Amazon S3 bucket (Amazon Transcribe writes the transcript output to the bucket).
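As a starting point, here is a sketch of an identity-based policy document, built as a Python dict, that covers the calls this notebook makes. The helper name `notebook_policy` and the bucket name `my-example-bucket` are hypothetical, Bedrock and Transcribe resources are left as `"*"` for brevity, and the sketch assumes Amazon Transcribe writes the transcript to your bucket using the caller's permissions. Review and scope it down before attaching it to a role.

```python
import json


def notebook_policy(bucket):
    """Return a sketch of an IAM policy document for this notebook (hypothetical helper)."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            # Invoke Nova Premier (resources left broad for brevity)
            {"Effect": "Allow", "Action": ["bedrock:InvokeModel"], "Resource": "*"},
            # Start and poll transcription jobs
            {"Effect": "Allow",
             "Action": ["transcribe:StartTranscriptionJob", "transcribe:GetTranscriptionJob"],
             "Resource": "*"},
            # ListBucket applies to the bucket itself (used by list_objects_v2)
            {"Effect": "Allow", "Action": ["s3:ListBucket"],
             "Resource": "arn:aws:s3:::{0}".format(bucket)},
            # Object-level access: video reads, transcript writes and downloads
            {"Effect": "Allow", "Action": ["s3:GetObject", "s3:PutObject"],
             "Resource": "arn:aws:s3:::{0}/*".format(bucket)},
        ],
    }


print(json.dumps(notebook_policy("my-example-bucket"), indent=2))
```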

Now, we'll define a function that lists all object keys under your video folder so that you can select the Meridian video clip.

In [None]:
def get_videos(prefix):
    all_videos = []
    paginator = s3_client.get_paginator('list_objects_v2')
    pages = paginator.paginate(Bucket=bucket, Prefix=prefix)
    for page in pages:
        for obj in page.get('Contents', []):
            all_videos.append(obj['Key'])
    return all_videos

In [None]:
# Search for and select the Meridian video from your Amazon S3 bucket

videos=get_videos(video_path)
selected_video=videos[1] #change the index so it points to Meridian_Clip.mp4 (index 0 is often the empty folder placeholder key)
print(selected_video)
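Selecting by list index depends on what else is stored in the folder. As an alternative, this sketch picks the key by file name and skips folder placeholder keys ending in `/`. The helper name `find_video_key` is hypothetical.

```python
def find_video_key(keys, file_name):
    """Return the first S3 key whose last path segment equals file_name.

    Folder placeholder keys (ending in "/") are skipped.
    """
    for key in keys:
        if not key.endswith("/") and key.rsplit("/", 1)[-1] == file_name:
            return key
    raise ValueError("{0} not found among {1} keys".format(file_name, len(keys)))

# Example: selected_video = find_video_key(get_videos(video_path), "Meridian_Clip.mp4")
```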

In [None]:
#define a local path for the video 

local_path = selected_video.split('/')[-1]
print(local_path)

#download the video locally 

try:
    s3_client.download_file(bucket, selected_video, local_path)
    print(f"Successfully downloaded to {local_path}")
except Exception as e:
    print(f"Error downloading file: {e}")

In [None]:
#View the video within the notebook

Video(local_path)

### Analyze video with Nova Premier

In [None]:
#Model ID for Amazon Nova Premier (the "us." prefix selects the US cross-Region inference profile)

PREMIER_MODEL_ID= "us.amazon.nova-premier-v1:0"

In [None]:
#Store the Amazon S3 uri in a variable to use in the payload to Nova

uri = "s3://{0}/{1}".format(bucket, selected_video)
print(uri)


#### Task 1: Summarize the video

In [None]:
#define a system prompt

system_message= """

You are an expert video and media analyst. You analyze videos to accurately extract detailed, fact-based insights.

"""


#Send video using Amazon S3 location to Amazon Nova with InvokeModel API.

system_list = [
    {
        "text": system_message
    }
]

message_list = [
    {
        "role": "user",
        "content": [
            {
                "video": {
                    "format": "mp4",
                    "source": {
                        "s3Location": {
                            "uri": uri
                        }
                    }
                }
            },
            {
                "text": "Create a concise summary of this video. Identify and describe the key moments or events, limiting your summary to 5 main points in bullet points."
            }
        ]
    }
]

inf_params = {"maxTokens": 1024, "topP": 0.1, "topK": 20, "temperature": 0.3}


native_request = {
    "schemaVersion": "messages-v1",
    "messages": message_list,
    "system": system_list,
    "inferenceConfig": inf_params,
}

# Invoke the model and extract the response body.
response = bedrock_client.invoke_model(modelId=PREMIER_MODEL_ID, body=json.dumps(native_request))
model_response = json.loads(response["body"].read())
# Pretty print the response JSON.
print("[Full Response]")
print(json.dumps(model_response, indent=2))
# Print the text content for easy readability.
content_text = model_response["output"]["message"]["content"][0]["text"]
print("\n[Response Content Text]")
print(content_text)
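Each task in this notebook rebuilds the same request body and only the prompt changes. As an optional refactor, the hypothetical helpers `build_video_request` and `response_text` below capture that pattern, reusing the same `messages-v1` fields and inference parameters as the cell above.

```python
def build_video_request(uri, prompt, system_message, inf_params=None):
    """Build a messages-v1 request body with one S3-hosted MP4 and a text prompt."""
    if inf_params is None:
        # Same inference parameters used throughout this notebook
        inf_params = {"maxTokens": 1024, "topP": 0.1, "topK": 20, "temperature": 0.3}
    return {
        "schemaVersion": "messages-v1",
        "system": [{"text": system_message}],
        "messages": [{
            "role": "user",
            "content": [
                {"video": {"format": "mp4", "source": {"s3Location": {"uri": uri}}}},
                {"text": prompt},
            ],
        }],
        "inferenceConfig": inf_params,
    }


def response_text(model_response):
    """Return the first text block of a parsed InvokeModel response body."""
    return model_response["output"]["message"]["content"][0]["text"]

# Example usage with the clients defined earlier:
# body = build_video_request(uri, "Create a concise summary of this video.", system_message)
# response = bedrock_client.invoke_model(modelId=PREMIER_MODEL_ID, body=json.dumps(body))
# print(response_text(json.loads(response["body"].read())))
```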

#### Task 2: Identify events or items of interest

Prompt Amazon Nova Premier to identify when it begins to rain in the video

In [None]:
#Send video using Amazon S3 location to Amazon Nova with InvokeModel.

system_list = [
    {
        "text": system_message
    }
]

message_list = [
    {
        "role": "user",
        "content": [
            {
                "video": {
                    "format": "mp4",
                    "source": {
                        "s3Location": {
                            "uri": uri
                        }
                    }
                }
            },
            {
                "text": "Identify when it begins to rain in the video. Output your response as a timestamp with the format MM:SS"
            }
        ]
    }
]

inf_params = {"maxTokens": 1024, "topP": 0.1, "topK": 20, "temperature": 0.3}


native_request = {
    "schemaVersion": "messages-v1",
    "messages": message_list,
    "system": system_list,
    "inferenceConfig": inf_params,
}

# Invoke the model and extract the response body.
response = bedrock_client.invoke_model(modelId=PREMIER_MODEL_ID, body=json.dumps(native_request))
model_response = json.loads(response["body"].read())
# Pretty print the response JSON.
print("[Full Response]")
print(json.dumps(model_response, indent=2))
# Print the text content for easy readability.
content_text = model_response["output"]["message"]["content"][0]["text"]
print("\n[Response Content Text]")
print(content_text)
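To compare the model's answer against the video player, it can help to turn the returned `MM:SS` string into seconds. This sketch uses the hypothetical helper `parse_mmss` and assumes the answer contains a timestamp in the requested `MM:SS` format; an `HH:MM:SS` value would be misread, because only the first `NN:NN` pair is used.

```python
import re


def parse_mmss(text):
    """Return the first MM:SS timestamp in text as seconds, or None if absent."""
    match = re.search(r"\b(\d{1,2}):([0-5]\d)\b", text)
    if match is None:
        return None
    minutes, seconds = int(match.group(1)), int(match.group(2))
    return minutes * 60 + seconds

# Example: rain_start_seconds = parse_mmss(content_text)
```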

Prompt Amazon Nova Premier to identify when a character appears

In [None]:
#Send video using Amazon S3 location to Amazon Nova with InvokeModel.

system_list = [
    {
        "text": system_message
    }
]

message_list = [
    {
        "role": "user",
        "content": [
            {
                "video": {
                    "format": "mp4",
                    "source": {
                        "s3Location": {
                            "uri": uri
                        }
                    }
                }
            },
            {
                "text": "At what point in the video does a woman first appear? Output your response as a timestamp with the format MM:SS"
            }
        ]
    }
]

inf_params = {"maxTokens": 1024, "topP": 0.1, "topK": 20, "temperature": 0.3}


native_request = {
    "schemaVersion": "messages-v1",
    "messages": message_list,
    "system": system_list,
    "inferenceConfig": inf_params,
}

# Invoke the model and extract the response body.
response = bedrock_client.invoke_model(modelId=PREMIER_MODEL_ID, body=json.dumps(native_request))
model_response = json.loads(response["body"].read())
# Pretty print the response JSON.
print("[Full Response]")
print(json.dumps(model_response, indent=2))
# Print the text content for easy readability.
content_text = model_response["output"]["message"]["content"][0]["text"]
print("\n[Response Content Text]")
print(content_text)

Prompt Amazon Nova Premier to identify a specific type of camera shot

In [None]:
#Send video using Amazon S3 location to Amazon Nova with InvokeModel.

system_list = [
    {
        "text": system_message
    }
]

message_list = [
    {
        "role": "user",
        "content": [
            {
                "video": {
                    "format": "mp4",
                    "source": {
                        "s3Location": {
                            "uri": uri
                        }
                    }
                }
            },
            {
                "text": "At what point in the video do we see a close-up shot of the man? Output your response as a timestamp with the format MM:SS"
            }
        ]
    }
]

inf_params = {"maxTokens": 1024, "topP": 0.1, "topK": 20, "temperature": 0.3}


native_request = {
    "schemaVersion": "messages-v1",
    "messages": message_list,
    "system": system_list,
    "inferenceConfig": inf_params,
}

# Invoke the model and extract the response body.
response = bedrock_client.invoke_model(modelId=PREMIER_MODEL_ID, body=json.dumps(native_request))
model_response = json.loads(response["body"].read())
# Pretty print the response JSON.
print("[Full Response]")
print(json.dumps(model_response, indent=2))
# Print the text content for easy readability.
content_text = model_response["output"]["message"]["content"][0]["text"]
print("\n[Response Content Text]")
print(content_text)

#### Task 3: Identify possible segments in the video

Prompt Amazon Nova Premier to identify segments by actions across the video duration

In [None]:
prompt = """
Analyze the video and identify all human actions or activities occurring throughout its duration.

Follow these guidelines for your task:
1. List each action with the timestamp at which it begins.
2. Describe each action succinctly.
3. Output the timestamp in MM:SS format.
4. DO NOT list identical actions consecutively in your output.
5. Format your output as JSON following this sample schema:
{
    "actions": [
        {
            "action": "the teacher enters the room",
            "timestamp": "00:15"
        },
        {
            "action": "the students sit down",
            "timestamp": "00:32"
        }
    ]
}
"""


system_list = [
    {
        "text": system_message
    }
]

message_list = [
    {
        "role": "user",
        "content": [
            {
                "video": {
                    "format": "mp4",
                    "source": {
                        "s3Location": {
                            "uri": uri
                        }
                    }
                }
            },
            {
                "text": prompt
            }
        ]
    }
]

inf_params = {"maxTokens": 1024, "topP": 0.1, "topK": 20, "temperature": 0.3}


native_request = {
    "schemaVersion": "messages-v1",
    "messages": message_list,
    "system": system_list,
    "inferenceConfig": inf_params,
}

# Invoke the model and extract the response body.
response = bedrock_client.invoke_model(modelId=PREMIER_MODEL_ID, body=json.dumps(native_request))
model_response = json.loads(response["body"].read())
# Pretty print the response JSON.
print("[Full Response]")
print(json.dumps(model_response, indent=2))
# Print the text content for easy readability.
content_text = model_response["output"]["message"]["content"][0]["text"]
print("\n[Response Content Text]")
print(content_text)
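The sample schema gives one timestamp per action, so segments can be derived by assuming each action lasts until the next one starts. The sketch below uses the hypothetical helpers `parse_actions` and `to_segments`; it tolerates an optional Markdown code fence around the JSON, but it will still fail if the model adds other prose around the JSON.

```python
import json
import re


def parse_actions(text):
    """Parse the model's JSON answer, tolerating an optional Markdown code fence."""
    fenced = re.search(r"`{3}(?:json)?\s*(.*?)`{3}", text, re.DOTALL)
    payload = fenced.group(1) if fenced else text
    return json.loads(payload)["actions"]


def to_segments(actions):
    """Turn start timestamps into (start, end, action) tuples.

    Each segment is assumed to end where the next one starts; the last
    segment's end is unknown and returned as None.
    """
    segments = []
    for i, item in enumerate(actions):
        end = actions[i + 1]["timestamp"] if i + 1 < len(actions) else None
        segments.append((item["timestamp"], end, item["action"]))
    return segments

# Example: for start, end, action in to_segments(parse_actions(content_text)): print(start, end, action)
```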

#### Task 4: Analyze the video with its transcript

Some tasks require analyzing a video together with the speech heard in it. For example, answering questions about actions and intentions in video content can benefit from the video transcript. For this task, we use Amazon Transcribe to extract the video's dialogue and then pass the dialogue and the video to Amazon Nova Premier for analysis.

First, we define a function that transcribes the video with Amazon Transcribe and downloads the transcript locally as a WebVTT file.

In [None]:
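def getTranscript(videoFile):
    """Transcribe videoFile (an S3 key in `bucket`) with Amazon Transcribe and
    download the resulting WebVTT subtitle file locally. Returns the local file name."""
    file_name_parsed = videoFile.rsplit('/', 1)[-1]
    job_name = "transcription-{0}-{1}".format(file_name_parsed, round(time.time()))
    # Use the full S3 key of the selected video so nested folders also work
    job_uri = "s3://{0}/{1}".format(bucket, videoFile)

    transcribe_client.start_transcription_job(
        TranscriptionJobName=job_name,
        Media={'MediaFileUri': job_uri},
        OutputBucketName=bucket,
        # The trailing slash makes Transcribe treat the key as a folder prefix
        OutputKey="{0}/{1}/".format(transcription_output_path, file_name_parsed),
        LanguageCode='en-US',
        Subtitles={'Formats': ['vtt']}
    )

    # Poll until the job finishes; stop with a clear error if it fails
    while True:
        status = transcribe_client.get_transcription_job(TranscriptionJobName=job_name)
        job_status = status['TranscriptionJob']['TranscriptionJobStatus']
        if job_status == 'COMPLETED':
            print('Transcription for {0} complete'.format(videoFile))
            break
        if job_status == 'FAILED':
            raise RuntimeError('Transcription for {0} failed: {1}'.format(
                videoFile, status['TranscriptionJob'].get('FailureReason')))
        print("Processing {0}".format(videoFile))
        time.sleep(5)

    # The subtitle file name is the last segment of the subtitle file URI
    outputVTT = status["TranscriptionJob"]["Subtitles"]["SubtitleFileUris"][0].split('/')[-1]

    # Download the VTT file locally
    with open(outputVTT, 'wb') as f:
        s3_client.download_fileobj(bucket, '{}/{}/{}'.format(transcription_output_path, file_name_parsed, outputVTT), f)
    return outputVTT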
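The polling loop in `getTranscript` has no upper bound, so a job stuck in progress would keep the cell running. One option is a poll with a timeout, sketched below as the hypothetical helper `wait_for_status`; the default 900-second timeout and 5-second interval are assumptions to adjust for your video length.

```python
import time


def wait_for_status(get_status, done_states=("COMPLETED", "FAILED"), interval=5, timeout=900):
    """Call get_status() every `interval` seconds until it returns a state in
    done_states, or raise TimeoutError once `timeout` seconds have passed."""
    deadline = time.monotonic() + timeout
    while True:
        state = get_status()
        if state in done_states:
            return state
        if time.monotonic() >= deadline:
            raise TimeoutError("still {0} after {1} seconds".format(state, timeout))
        time.sleep(interval)

# Example, inside getTranscript where job_name is defined:
# state = wait_for_status(lambda: transcribe_client.get_transcription_job(
#     TranscriptionJobName=job_name)["TranscriptionJob"]["TranscriptionJobStatus"])
```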

In [None]:
#Use the previously defined function to extract the video transcript and store it locally
transcript_vtt = getTranscript(selected_video)
print(transcript_vtt)

#read the video transcript from the VTT file and store it in a variable to use in your prompt
with open(transcript_vtt, 'r', encoding='utf-8') as file:
    vtt_transcript = file.read()
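The notebook passes the raw VTT text to the model. As an optional alternative, this standard-library sketch condenses each cue into a `[MM:SS] text` line so the transcript uses the same `MM:SS` format the prompts ask for. The helper name `compact_vtt` is hypothetical, and it handles only basic cues (timing line followed by text lines); it ignores headers, cue numbers, and `NOTE` blocks, and folds hours into minutes.

```python
import re

# Matches a cue timing line such as "00:01:05.000 --> 00:01:07.000" or "01:05.000 --> ..."
CUE_TIMING = re.compile(r"(?:(\d+):)?(\d{2}):(\d{2})\.\d{3}\s+-->")


def compact_vtt(vtt_text):
    """Convert WebVTT cues into "[MM:SS] text" lines (hours folded into minutes)."""
    lines_out = []
    current_start = None
    current_text = []
    # The trailing "" flushes the final cue
    for line in vtt_text.splitlines() + [""]:
        line = line.strip()
        match = CUE_TIMING.match(line)
        if match:
            hours = int(match.group(1) or 0)
            minutes = hours * 60 + int(match.group(2))
            current_start = "{0:02d}:{1}".format(minutes, match.group(3))
            current_text = []
        elif line and current_start is not None:
            current_text.append(line)
        elif not line and current_start is not None:
            if current_text:
                lines_out.append("[{0}] {1}".format(current_start, " ".join(current_text)))
            current_start = None
    return "\n".join(lines_out)

# Example: print(compact_vtt(vtt_transcript))
```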


In [None]:
#define a new system prompt to give Nova Premier context

system_message= """

You are an expert video and media analyst. You analyze videos and their transcripts in WEBVTT format to extract detailed insights accurately based on user queries.

"""


#define your Q&A prompt, with the VTT transcript included in the prompt
prompt = """

Transcript:

""" + vtt_transcript + """

What is the man in the video determined to do? Explain your answer
"""


In [None]:
#view your prompt

print(prompt)

In [None]:
system_list = [
    {
        "text": system_message
    }
]

message_list = [
    {
        "role": "user",
        "content": [
            {
                "video": {
                    "format": "mp4",
                    "source": {
                        "s3Location": {
                            "uri": uri
                        }
                    }
                }
            },
            {
                "text": prompt
            }
        ]
    }
]

inf_params = {"maxTokens": 1024, "topP": 0.1, "topK": 20, "temperature": 0.3}


native_request = {
    "schemaVersion": "messages-v1",
    "messages": message_list,
    "system": system_list,
    "inferenceConfig": inf_params,
}

# Invoke the model and extract the response body.
response = bedrock_client.invoke_model(modelId=PREMIER_MODEL_ID, body=json.dumps(native_request))
model_response = json.loads(response["body"].read())
# Pretty print the response JSON.
print("[Full Response]")
print(json.dumps(model_response, indent=2))
# Print the text content for easy readability.
content_text = model_response["output"]["message"]["content"][0]["text"]
print("\n[Response Content Text]")
print(content_text)


In [None]:
prompt = """

Transcript:

""" + vtt_transcript + """

At what points in the video is the man thinking about the details of the case he is investigating?

Follow these guidelines for your task:
1. Use both the video and the transcript provided in this prompt.
2. Format timestamps with minutes and seconds as follows: "MM:SS"
"""

In [None]:
system_list = [
    {
        "text": system_message
    }
]

message_list = [
    {
        "role": "user",
        "content": [
            {
                "video": {
                    "format": "mp4",
                    "source": {
                        "s3Location": {
                            "uri": uri
                        }
                    }
                }
            },
            {
                "text": prompt
            }
        ]
    }
]

inf_params = {"maxTokens": 1024, "topP": 0.1, "topK": 20, "temperature": 0.3}


native_request = {
    "schemaVersion": "messages-v1",
    "messages": message_list,
    "system": system_list,
    "inferenceConfig": inf_params,
}

# Invoke the model and extract the response body.
response = bedrock_client.invoke_model(modelId=PREMIER_MODEL_ID, body=json.dumps(native_request))
model_response = json.loads(response["body"].read())
# Pretty print the response JSON.
print("[Full Response]")
print(json.dumps(model_response, indent=2))
# Print the text content for easy readability.
content_text = model_response["output"]["message"]["content"][0]["text"]
print("\n[Response Content Text]")
print(content_text)


## Conclusion

You've successfully tested some video understanding capabilities using Amazon Nova Premier. 

What you've accomplished:
- Tested prompts for temporal understanding tasks
- Explored video analysis capabilities
- Learned prompt patterns for video understanding

Build on these examples for your specific use cases. For advanced prompting techniques, see the [AWS Video Understanding documentation](https://docs.aws.amazon.com/nova/latest/userguide/prompting-video-understanding.html).
