# Semantic Video Search

Semantic video search is an advanced technology that enables users to find and retrieve video content based on its meaning and context, rather than relying solely on keywords or metadata. Content creators produce videos with rich, multidimensional information. The semantic search algorithm analyzes and understands this content using AI technologies like computer vision, natural language processing, and now, generative AI. This allows users to interact with the system, searching for specific concepts, objects, or actions within videos.

![Semantic Video Search](./static/images/04-semantic-video-search.png)

Semantic video search is crucial for the media and entertainment industry as it dramatically improves content discovery, user engagement, and the overall viewing experience. In an era of content overload, users demand more efficient ways to find relevant videos. Traditional search methods often fall short, leading to frustrated users and underutilized content libraries. Semantic search allows media companies to unlock the full potential of their video archives, improve recommendation systems, and create more personalized viewing experiences.

However, implementing effective semantic video search comes with significant challenges. The sheer volume and complexity of video data make it difficult to analyze and index content accurately. Variations in video quality, language, and cultural context can lead to misinterpretations. Generative AI offers a promising solution to enhance semantic video search capabilities. By leveraging large language models and multimodal AI, generative AI can provide more nuanced and context-aware analysis of video content. It can generate detailed descriptions of scenes, identify complex actions and emotions, and even understand subtle cultural references, bridging the gap between user intent and video content.

In this lab, you'll use the visual and audio metadata generated in the previous labs to build a multi-modal (MM) vector search database. By the end of the lab, you'll be able to query this database using natural language or images and quickly find the relevant shots in the video.

# Prerequisites

To run this notebook, you need to have run the previous notebook: [01-video-time-segmentation](01-video-time-segmentation.ipynb), where you segmented the video using audio, visual, and semantic information.

### Import python packages

In [None]:
import json
import boto3
from botocore.exceptions import ClientError
import os
import time
import re
from IPython.display import display, JSON, HTML
import subprocess
from PIL import Image
import base64
from termcolor import colored
from opensearchpy import OpenSearch, RequestsHttpConnection, AWSV4SignerAuth
import random
import datetime

### Retrieve saved values from previous notebooks
To run this notebook, you need to have run the previous notebook: 00_prerequisites.ipynb, where you installed package dependencies and gathered some information from the SageMaker environment.

In [None]:
%store -r

In [None]:
video_path = video['path']
rek_client = boto3.client("rekognition")
bedrock_runtime = boto3.client("bedrock-runtime")
region = sagemaker_resources['region']
oss_host = session['AOSSCollectionEndpoint']

# Architecture

This hands-on workflow uses AWS services from SageMaker. It takes video frames, shots, and the audio transcription as inputs and produces a searchable vector index in OpenSearch Serverless.

[TBD]()

## Randomly Sample a subset of shots from the video

For a smoother, uninterrupted lab experience, we will randomly sample 20 shots from the original video while keeping them in chronological order. This is necessary because of the limited capacity in the workshop environment. Doing so preserves the integrity of the exercise while ensuring that all participants can complete the lab without being throttled. If you are not in the workshop sandbox environment, feel free to increase the number of shots.

In [None]:
def sample_shots(original_list, sample_size=10):
    # Check if the sample size is larger than the list
    if sample_size >= len(original_list):
        return original_list

    # Create a set of random indices
    indices = set(random.sample(range(len(original_list)), sample_size))
    
    # Use a list comprehension to select items at these indices
    return [item for i, item in enumerate(original_list) if i in indices]
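
As a quick sanity check on the sampling helper above, the sketch below applies the same index-set technique to a plain list of integers (a stand-in for the shot list, not real shot data) and confirms that the sampled items keep their original order. The name `sample_in_order` and the fixed seed are illustrative assumptions.

```python
import random


def sample_in_order(items, sample_size):
    """Pick `sample_size` random items while preserving their original order."""
    if sample_size >= len(items):
        return list(items)
    # Choose distinct positions, then walk the list in order and keep those positions.
    chosen = set(random.sample(range(len(items)), sample_size))
    return [item for i, item in enumerate(items) if i in chosen]


random.seed(0)  # fixed seed only so the demo is repeatable
shot_ids = list(range(100))  # stand-in for shot ids 0..99
subset = sample_in_order(shot_ids, 20)
print(len(subset), subset == sorted(subset))
```

Because the positions are filtered during a single in-order pass, no separate sort step is needed to restore chronological order.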

In [None]:
sampled_shots = sample_shots(video['shots'].shots, sample_size=20)

for shot in sampled_shots:
    print(f"Sampled shot id: {shot['id']} ===================\n")
    display(Image.open(shot['composite_images'][0]['file']))

## Detect Celebrities Using Amazon Rekognition
[Amazon Rekognition](https://aws.amazon.com/rekognition/) can be used to recognize international, widely known celebrities like actors, sportspeople, and online content creators. The metadata provided by the celebrity recognition API significantly reduces the repetitive manual effort required to tag content and make it readily searchable. In the following section, we'll leverage this feature to help us detect any celebrities in the shots extracted in the previous step.

In [None]:
def detect_celebrities(shot):
    start_frame_id = shot['start_frame_id']
    end_frame_id = shot['end_frame_id']
    video_asset_dir = shot['video_asset_dir']

    frames = range(start_frame_id, end_frame_id + 1)

    celebrities = set()

    for frame_id in frames:
        try:
            image_path = f"{video_asset_dir}/frames/frames{frame_id+1:07d}.jpg"
            with open(image_path, 'rb') as image_file:
                image_bytes = image_file.read()      

            # Call Rekognition to detect celebrities
            response = rek_client.recognize_celebrities(
                Image={'Bytes': image_bytes}
            )

            min_confidence = 95.0 # change this value if the accuracy is low.

            for celebrity in response.get('CelebrityFaces', []):
                if celebrity.get('MatchConfidence', 0.0) >= min_confidence:
                    celebrities.add(celebrity['Name'])

        except ClientError as e:
            pass

    public_figures = ', '.join(celebrities)

    shot["public_figure"] = public_figures
    
    return {
            "shot_id": shot['id'],
            "public_figure": public_figures
        }

In [None]:
print("===== [Celebrities detected in each shot] ======\n")
for shot in sampled_shots:
    print(detect_celebrities(shot))
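
To see how the confidence threshold shapes the result without calling Rekognition, here is a minimal sketch that applies the same filter to a hand-written dictionary. The dictionary only mimics the `CelebrityFaces` / `MatchConfidence` / `Name` keys used above; the names and scores are invented, and `names_above_confidence` is a hypothetical helper.

```python
def names_above_confidence(response, min_confidence=95.0):
    """Return the set of celebrity names whose MatchConfidence meets the threshold."""
    return {
        face["Name"]
        for face in response.get("CelebrityFaces", [])
        if face.get("MatchConfidence", 0.0) >= min_confidence
    }


# Hand-written stand-in for a recognize_celebrities response (illustrative values only).
mock_response = {
    "CelebrityFaces": [
        {"Name": "Person A", "MatchConfidence": 99.2},
        {"Name": "Person B", "MatchConfidence": 80.0},
    ],
    "UnrecognizedFaces": [],
}

strict = names_above_confidence(mock_response)        # threshold 95
lenient = names_above_confidence(mock_response, 75.0)  # threshold 75
print(strict, lenient)
```

Lowering the threshold trades precision for recall: more names are tagged, but some of them may be false matches.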

## Process Audio Transcription

Convert the closed-caption cues into complete sentences, each with a start and end timestamp in milliseconds.

In [None]:
def process_transcript(s):
    subtitle_blocks = re.findall(
        r"(\d+\n(\d{2}:\d{2}:\d{2}.\d{3}) --> (\d{2}:\d{2}:\d{2}.\d{3})\n(.*?)(?=\n\d+\n|\Z))",
        s,
        re.DOTALL,
    )

    sentences = [block[3].replace("\n", " ").strip() for block in subtitle_blocks]
    startTimes = [block[1] for block in subtitle_blocks]
    endTimes = [block[2] for block in subtitle_blocks]

    startTimes_ms = [time_to_ms(time) for time in startTimes]
    endTimes_ms = [time_to_ms(time) for time in endTimes]

    filtered_sentences = []
    filtered_startTimes_ms = []
    filtered_endTimes_ms = []

    startTime_ms = -1
    endTime_ms = -1
    sentence = ""
    for i in range(len(sentences)):
        if startTime_ms == -1:
            startTime_ms = startTimes_ms[i]
        sentence += " " + sentences[i]
        if (
            sentences[i].endswith(".")
            or sentences[i].endswith("?")
            or sentences[i].endswith("!")
            or i == len(sentences) - 1
        ):
            endTime_ms = endTimes_ms[i]
            filtered_sentences.append(sentence.strip())
            filtered_startTimes_ms.append(startTime_ms)
            filtered_endTimes_ms.append(endTime_ms)
            startTime_ms = -1
            endTime_ms = -1
            sentence = ""

    processed_transcript = []
    for i in range(len(filtered_sentences)):
        processed_transcript.append(
            {
                "sentence_startTime": filtered_startTimes_ms[i],
                "sentence_endTime": filtered_endTimes_ms[i],
                "sentence": filtered_sentences[i],
            }
        )

    return processed_transcript

def time_to_ms(time_str):
    # Split "HH:MM:SS.mmm" on colons and the dot ("|" is not a separator inside a character class).
    h, m, s, ms = re.split(r"[:.]", time_str)
    return int(h) * 3600000 + int(m) * 60000 + int(s) * 1000 + int(ms)

In [None]:
with open(video['transcript'].vtt_file, encoding="utf-8") as f:
    transcript = f.read()

processed_transcript = process_transcript(transcript)

processed_transcript
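
The sentence-merging step can be hard to follow inside the larger function, so here is a compact sketch of just that logic. It assumes the cues have already been parsed into `(start_ms, end_ms, text)` tuples; `merge_cues` and the sample cues are illustrative and not part of the notebook.

```python
def merge_cues(cues):
    """Merge (start_ms, end_ms, text) cues until a cue ends with '.', '?' or '!'."""
    merged, start, parts = [], None, []
    for i, (cue_start, cue_end, text) in enumerate(cues):
        if start is None:
            start = cue_start  # first cue of a new sentence
        parts.append(text)
        is_last = i == len(cues) - 1
        if text.endswith((".", "?", "!")) or is_last:
            merged.append({
                "sentence_startTime": start,
                "sentence_endTime": cue_end,
                "sentence": " ".join(parts),
            })
            start, parts = None, []
    return merged


cues = [
    (0, 1500, "Welcome to the"),
    (1500, 3000, "semantic search lab."),
    (3000, 4200, "Ready?"),
]
sentences = merge_cues(cues)
print(sentences)
```

The first two cues collapse into one sentence spanning 0 to 3000 ms, while the third cue already ends with punctuation and stays on its own.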

## Align sentences to shots

In [None]:
def add_shot_transcript(shot_startTime, shot_endTime, transcript):
    relevant_transcript = ""
    for item in transcript:
        if item["sentence_startTime"] >= shot_endTime:
            break
        if item["sentence_endTime"] <= shot_startTime:
            continue
        delta_start = max(item["sentence_startTime"], shot_startTime)
        delta_end = min(item["sentence_endTime"], shot_endTime)
        if delta_end - delta_start >= 500:
            relevant_transcript += item["sentence"] + "; "
    return relevant_transcript

In [None]:
for shot in sampled_shots:
    relevant_transcript = add_shot_transcript(shot['start_ms'], shot['end_ms'], processed_transcript)
    shot['transcript'] = relevant_transcript
    print({
        'shot_id': shot['id'],
        'transcript': relevant_transcript
    })
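
The alignment rule above keeps a sentence only when it overlaps the shot by at least 500 ms. The following sketch isolates that interval arithmetic on made-up timings; `overlap_ms` is a hypothetical helper name, and the sentences are placeholders.

```python
def overlap_ms(a_start, a_end, b_start, b_end):
    """Length in milliseconds of the overlap between two intervals (0 if disjoint)."""
    return max(0, min(a_end, b_end) - max(a_start, b_start))


shot = (10_000, 14_000)  # shot start and end in milliseconds
sentences = [
    (8_000, 10_300, "Barely touches the shot"),  # overlaps by 300 ms
    (10_300, 13_000, "Fully inside the shot"),   # overlaps by 2700 ms
    (13_600, 16_000, "Starts near the end"),     # overlaps by 400 ms
]

kept = [text for start, end, text in sentences if overlap_ms(start, end, *shot) >= 500]
print(kept)
```

Sentences that only graze a shot boundary are dropped, which keeps dialogue from the neighboring shot out of the transcript for this one.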

## Create the Shot Description 
Given the images that belong to a shot within a video, use an LLM to extract the key elements from those images.

In [None]:
def get_shot_description(model_id, composite_images, celebrities):

    system_prompts = [{"text": """
    You are an expert video content analyst specializing in generating rich, contextual metadata for semantic search systems. 
    Your task is to analyze video shots presented in a sequence of frame images and provide a detailed but concise description 
    of a video shot based on the given frame images. Focus on creating a cohesive narrative of the entire shot rather than 
    describing each frame individually.
    """}]
    
    prompt = """
    <celebrities>
    {{CELEBRITIES}}
    </celebrities>
    
   Context:
    - Each image contains a sequence of consecutive video frames, read from left to right and top to bottom.
    - Your goal is to generate metadata that makes the video content easily discoverable through various search queries.
    - ALL identified <celebrities> MUST be integrated into descriptions.

    STRICT VALIDATION REQUIREMENTS:
    1. STOP AND CHECK BEFORE OUTPUTTING:
       - Are there any names in the "celebrities" field? 
       - If YES, verify these names appear in the description text
       - If NO match found, rewrite description to include celebrity names
       
    2. REQUIRED FORMAT FOR DESCRIPTIONS WITH CELEBRITIES:
       - MUST start with celebrity names and their actions
       - Example format: "[Celebrity Name] appears/is shown/portrays..."
       - NEVER output generic terms ("a man", "someone") when celebrity identity is known
    
    3. AUTOMATIC ERROR CHECKING:
       If (celebrities.length > 0):
          If (description does not contain ALL celebrity names):
             MUST rewrite description
    
    Description Template When Celebrities Present:
    "[Celebrity Name 1] [action/appearance], [clothing/setting details]. [Additional context]. [Celebrity Name 2 if present] [their action/appearance]..."

    REQUIRED PRE-SUBMISSION CHECKS:
    □ Are all celebrity names from <celebrities> present in description?
    □ Does description start with a celebrity name (not generic terms)?
    □ Are all celebrities actively described (not passively mentioned)?
    □ Have you avoided generic terms like "a man" or "someone"?
    
    INCORRECT (Reject):
    "A man in a white shirt and tie is shown..."
    (When celebrities field contains "Kevin Kilner")
    
    CORRECT (Accept):
    "Kevin Kilner appears in a white shirt and tie..."
    
    Input Description:
    <input_description>
    - Sequence of images representing video frames
    - List of known celebrities (if applicable)
    </input_description>

    Step-by-step Instructions:
    <instructions>
    1. Analyze the visual content:
       a. First priority: Identify any celebrities or notable individuals
       b. Check for dark/empty frames:
          - If frames are black or empty, use specialized template
          - Set appropriate technical descriptors
          - Mark confidence scores as 100 for verified empty content
          - Use "None" or "Undefined" for inapplicable categories
       c. If celebrities identified, prepare description using required template
       d. Identify key objects, actions, and settings in the scene
       e. Detect any text or graphics visible in the frames
       f. Recognize brands, logos, or products
    
    2. Determine temporal aspects:
       a. Identify any scene transitions or significant changes in the sequence
       b. Note any recurring elements across multiple frames
    
    3. Synthesize a detailed description:
       a. REQUIRED: If celebrities present, use template format
       b. MUST start with celebrity identification and actions
       c. Integrate setting, atmosphere, and context
       d. Include all identified celebrities in natural narrative flow
       e. Run pre-submission checks before finalizing
    
    4. Final Validation:
       a. Run through pre-submission checklist
       b. Verify celebrity integration in description (if applicable)
       c. Confirm no generic terms used for identified people
       d. For dark frames, verify all technical descriptors are accurate
    
    5. Special Cases Handling:
        a. For dark/empty frames:
           - Use technical description template
           - Set appropriate null values
           - Mark relevant technical indicators
           - Note possible transition purpose
        b. For partially visible content:
           - Note visibility issues
           - Describe what can be confidently identified
           - Adjust confidence scores accordingly
    
    6. Final Output Preparation:
        a. Skip the preamble; go straight into the description
        b. Check for proper formatting and syntax
    """.replace("{{CELEBRITIES}}", celebrities)

    
    message = {
        "role":"user",
        "content":[]
    }
                         
    for composite in composite_images:

        with open(composite['file'], "rb") as image_file:
            image_string = image_file.read()

        message["content"].append({
            "image":{
                "format": "jpeg",
                "source":{
                    "bytes": image_string
                }
            }
                
        })

    message["content"].append({
        "text": prompt
    })
    
    # Base inference parameters to use.
    inference_config = {"temperature": .1}
    
    # Additional inference parameters to use.
    additional_model_fields = {"top_k": 200}
    
    response = bedrock_runtime.converse(
        modelId=model_id,
        messages=[message],
        system=system_prompts,
        inferenceConfig=inference_config,
        additionalModelRequestFields=additional_model_fields
    )
    output_message = response['output']['message']
    
    return output_message["content"][0]["text"]
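
The message sent to the model is a list of image blocks followed by one text block. The sketch below rebuilds that structure from in-memory bytes so the shape can be inspected without calling Bedrock. `build_image_message` is a hypothetical helper, and the byte strings are placeholders rather than real JPEG data.

```python
def build_image_message(image_bytes_list, prompt, image_format="jpeg"):
    """Assemble a user message: one image block per composite, then the text prompt."""
    content = [
        {"image": {"format": image_format, "source": {"bytes": data}}}
        for data in image_bytes_list
    ]
    content.append({"text": prompt})
    return {"role": "user", "content": content}


message = build_image_message([b"fake-jpeg-1", b"fake-jpeg-2"], "Describe the shot.")
block_types = [next(iter(block)) for block in message["content"]]
print(block_types)
```

Placing the prompt after the images mirrors the order used in `get_shot_description` above.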

The following step takes a while to complete (more than 20 minutes). Consider using a smaller model (e.g. Haiku).

In [None]:
%%time
model_id = 'anthropic.claude-3-sonnet-20240229-v1:0'

for shot in sampled_shots:
    description = get_shot_description(
        model_id = model_id, 
        composite_images = shot['composite_images'], 
        celebrities = shot['public_figure']
    )
    shot['shot_description'] = description

## Generate Embeddings for Shots

In [None]:
def get_embedding(model_id, input_data):
    accept = "application/json"
    content_type = "application/json"

    if 'text' in model_id:
        body = json.dumps({
            "inputText": input_data,
            "dimensions": 1024,
            "normalize": True
        })
    elif 'image' in model_id:
        # Read image from file and encode it as base64 string.
        with open(input_data, "rb") as image_file:
            input_image = base64.b64encode(image_file.read()).decode('utf8')
        
        body = json.dumps({
            "inputImage": input_image,
            "embeddingConfig": {
                "outputEmbeddingLength": 1024
            }
        })
    else:
        raise ValueError("Unsupported model_id. Expected an ID containing 'text' or 'image'.")

    response = bedrock_runtime.invoke_model(
        body=body,
        modelId=model_id,
        accept=accept,
        contentType=content_type,
    )
    response_body = json.loads(response["body"].read())
    embedding = response_body.get("embedding")

    return embedding

## Building the OpenSearch Serverless Vector Index

OpenSearch Serverless (OSS) is a fully managed, on-demand search and analytics service provided by Amazon Web Services (AWS). It allows users to deploy, operate, and scale OpenSearch clusters without the need for infrastructure management.

An index in OpenSearch is a collection of documents that share similar characteristics. In this case, we're focusing on a vector index, which is designed to store and search vector embeddings efficiently.

### Here is the Index Configuration

The index contains the following attributes:
- `video_path`: Path to the video file (text field)
- `shot_id`: Unique identifier for each shot (text field)
- `shot_startTime`: Start time of the shot (text field)
- `shot_endTime`: End time of the shot (text field)
- `shot_description`: Description of the shot (text field)
- `shot_celebrities`: Celebrities identified in the shot (text field)
- `shot_transcript`: Audio transcript of the shot (text field)

These are metadata fields that we can retrieve with each search query and also use to filter results.

- `shot_image_vector`: Vector representation of the shot image
- `shot_desc_vector`: Vector representation of the shot description

Both `shot_image_vector` and `shot_desc_vector` are configured as `knn_vector` fields. You will use these two fields to run vector similarity searches that find the camera shots closest to your text query or input image.
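
To make "vector similarity search" concrete, here is a minimal brute-force sketch of cosine-similarity ranking in plain Python. It uses toy 3-dimensional vectors instead of the 1024-dimensional embeddings, and it compares the query with every vector, whereas the HNSW method configured for the index is an approximate search. All names and values are illustrative.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy "index": shot id -> embedding (made-up values).
index = {
    "shot-1": [1.0, 0.0, 0.0],
    "shot-2": [0.6, 0.8, 0.0],
    "shot-3": [0.0, 0.0, 1.0],
}
query = [0.8, 0.6, 0.0]

# Rank shots from most to least similar to the query.
ranked = sorted(index, key=lambda sid: cosine_similarity(query, index[sid]), reverse=True)
print(ranked)
```

A vector pointing in nearly the same direction as the query ranks first, regardless of its length, which is why cosine similarity suits normalized embeddings.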

In [None]:
# Establish a client connection to OpenSearch Serverless (OSS)
def get_opensearch_client(host, region):
    host = host.split("://")[1] if "://" in host else host
    credentials = boto3.Session().get_credentials()
    auth = AWSV4SignerAuth(credentials, region, "aoss")

    oss_client = OpenSearch(
        hosts=[{"host": host, "port": 443}],
        http_auth=auth,
        use_ssl=True,
        verify_certs=True,
        connection_class=RequestsHttpConnection,
        pool_maxsize=20,
    )

    return oss_client


# Create the OpenSearch Serverless index
def create_opensearch_index(oss_client, index_name, len_embedding=1024):

    exist = oss_client.indices.exists(index_name)
    if not exist:
        print("Creating index")
        index_body = {
            "mappings": {
                "properties": {
                    "video_path": {"type": "text"},
                    "shot_id": {"type": "text"},
                    "shot_startTime": {"type": "text"},
                    "shot_endTime": {"type": "text"},
                    "shot_description": {"type": "text"},
                    "shot_celebrities": {"type": "text"},
                    "shot_transcript": {"type": "text"},
                    "shot_image_vector": {
                        "type": "knn_vector",
                        "dimension": len_embedding,
                        "method": {
                            "engine": "nmslib",
                            "space_type": "cosinesimil",
                            "name": "hnsw",
                            "parameters": {"ef_construction": 512, "m": 16},
                        },
                    },
                    "shot_desc_vector": {
                        "type": "knn_vector",
                        "dimension": len_embedding,
                        "method": {
                            "engine": "nmslib",
                            "space_type": "cosinesimil",
                            "name": "hnsw",
                            "parameters": {"ef_construction": 512, "m": 16},
                        },
                    },
                }
            },
            "settings": {
                "index": {
                    "number_of_shards": 2,
                    "knn.algo_param": {"ef_search": 512},
                    "knn": True,
                }
            },
        }
        response = oss_client.indices.create(index_name, body=index_body)

        print(response)

In [None]:
index_name = "video_search_index"

oss_client = get_opensearch_client(oss_host, region)
create_opensearch_index(oss_client, index_name)

In [None]:
for shot in sampled_shots:

    # generate text embedding from description
    shot_desc_vector = get_embedding(
        model_id='amazon.titan-embed-text-v2:0',
        input_data=shot['shot_description']
    )
    
    # generate mm embedding from composite frames
    shot_image_vector = get_embedding(
        model_id='amazon.titan-embed-image-v1',
        input_data=shot['composite_images'][0]['file']
    )

    #build the payload to index in OSS
    payload = json.dumps(
            {
                "video_path": video_path,
                "shot_id": shot['id'],
                "shot_startTime": shot['start_ms'],
                "shot_endTime": shot['end_ms'],
                "shot_description": shot['shot_description'],
                "shot_celebrities": shot['public_figure'],
                "shot_transcript": shot['transcript'],
                "shot_desc_vector": shot_desc_vector,
                "shot_image_vector": shot_image_vector,
            }
        )
    response = oss_client.index(
                    index=index_name,
                    body=payload,
                    params={"timeout": 60},
                )

## Perform Video Semantic Search

First, make sure the data inserted into OpenSearch is ready to be searched.

In [None]:
print("Waiting for the recent inserted data to be searchable in OpenSearch...")

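while True:
    try:
        result = oss_client.search(index=index_name, body={"query": {"match_all": {}}})
        # Use >= so the loop also ends if the index already holds documents from an earlier run.
        if result['hits']['total']['value'] >= len(sampled_shots):
            print("\nData is now available for search!")
            break
    except Exception as e:
        pass  # the index may not be queryable yet; retry below
    # Wait between polls in both the "not ready" and the error case.
    print(".", end="", flush=True)
    time.sleep(10)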
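
A polling loop like the one above can wait forever if the documents never become searchable. As a hedged alternative, the sketch below wraps the same idea in a generic helper with a timeout. `wait_until` and the stand-in predicate are assumptions for illustration; in the notebook the predicate would wrap the `oss_client.search` check.

```python
import time


def wait_until(predicate, timeout_s=300.0, interval_s=10.0):
    """Call `predicate` until it returns True or `timeout_s` elapses; return success."""
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            if predicate():
                return True
        except Exception:
            pass  # treat transient errors as "not ready yet"
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)


# Stand-in predicate that becomes true on the third call.
calls = {"n": 0}


def ready():
    calls["n"] += 1
    return calls["n"] >= 3


succeeded = wait_until(ready, timeout_s=1.0, interval_s=0.01)
print(succeeded, calls["n"])
```

Returning a boolean instead of looping indefinitely lets the caller decide whether to retry, raise an error, or continue with partial data.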

In [None]:
user_query = "Scott driving a car"

query_embedding = get_embedding('amazon.titan-embed-text-v2:0', user_query)

In [None]:
aoss_query = {
        "size": 10,
        "query": {
            "script_score": {
                "query": {"bool": {"should": []}},
                "script": {
                    "lang": "knn",
                    "source": "knn_score",
                    "params": {
                        "field": "shot_desc_vector",
                        "query_value": query_embedding,
                        "space_type": "cosinesimil",
                    },
                },
            }
        },
        "_source": [
            "video_path",
            "shot_id",
            "shot_startTime",
            "shot_endTime",
            "shot_description",
            "shot_celebrities",
            "shot_transcript",
        ],
    }
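
The text and image searches in this notebook use the same query shape and differ only in the vector field and the query vector. One possible way to avoid repeating the dictionary is a small builder like the hypothetical `build_knn_query` below; the vectors shown are short placeholders, not real embeddings.

```python
def build_knn_query(field, vector, size=10):
    """Build a script_score k-NN query against one knn_vector field."""
    return {
        "size": size,
        "query": {
            "script_score": {
                "query": {"bool": {"should": []}},
                "script": {
                    "lang": "knn",
                    "source": "knn_score",
                    "params": {
                        "field": field,
                        "query_value": vector,
                        "space_type": "cosinesimil",
                    },
                },
            }
        },
        # Return only the metadata fields, not the stored vectors.
        "_source": [
            "video_path",
            "shot_id",
            "shot_startTime",
            "shot_endTime",
            "shot_description",
            "shot_celebrities",
            "shot_transcript",
        ],
    }


text_query = build_knn_query("shot_desc_vector", [0.1, 0.2, 0.3])
image_query = build_knn_query("shot_image_vector", [0.4, 0.5, 0.6], size=5)
print(text_query["query"]["script_score"]["script"]["params"]["field"])
```

Either dictionary can then be passed as the `body` argument of `oss_client.search`, in the same way as `aoss_query`.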

In [None]:
response = oss_client.search(body=aoss_query, index=index_name)
hits = response["hits"]["hits"]

responses = []
for hit in hits:
    if hit["_score"] >= 0:  # Set score threshold
        responses.append(
            {
                "video_path": hit["_source"]["video_path"],
                "shot_id": hit["_source"]["shot_id"],
                "shot_startTime": hit["_source"]["shot_startTime"],
                "shot_endTime": hit["_source"]["shot_endTime"],
                "shot_description": hit["_source"]["shot_description"],
                "shot_celebrities": hit["_source"]["shot_celebrities"],
                "shot_transcript": hit["_source"]["shot_transcript"],
                "score": hit["_score"],
            }
        )

## Show the top search results

In [None]:
def render_with_original_video(top_hit):
    
    video_path = top_hit['video_path']
    video_start = top_hit['shot_startTime']/1000
    
    display(HTML(f"""
    <video alt="test" controls id="{top_hit['shot_id']}" width="100" >
      <source src="{video_path}">
    </video>
    
    <script>
    video = document.getElementById("{top_hit['shot_id']}")
    video.currentTime = {video_start};
    </script>
    """))
    
    
def display_shot_segemnt_results(responses, top_results=2):

    css_style = """
    <style>
        .video-container {
            display: flex;
            justify-content: space-around;
            flex-wrap: wrap;
        }
        .video {
            flex: 1;
            min-width: 400px;
            margin: 10px;
        }
        video {
            width: 100%;
            height: auto;
        }
    </style>
    """

    html_content = "<div class='video-container'>\n"
    
    # Never index past the number of hits actually returned.
    for idx in range(min(top_results, len(responses))):
        # convert format of timestamps
        video_start = responses[idx]['shot_startTime']/1000
        video_end = responses[idx]['shot_endTime']/1000
    
        converted_start = str(datetime.timedelta(seconds = video_start))
        converted_end = str(datetime.timedelta(seconds = video_end))
        output_file = f"shot-{responses[idx]['shot_id']}.mp4"
        _ = subprocess.run(
            [
                "/usr/bin/ffmpeg",
                "-y",  # overwrite the clip if it already exists from an earlier run
                "-ss",
                converted_start,
                "-to",
                converted_end,
                "-i",
                responses[idx]['video_path'], # path to video
                "-c",
                "copy",
                output_file,
            ],
            stderr=subprocess.PIPE
        )
        html_content += f"""
            <div class="video">
                <h5>Shot Id: {responses[idx]['shot_id']}, Time Range: {video_start} s - {video_end} s</h5>
                <video controls>
                    <source src="{output_file}" type="video/mp4">
                    Your browser does not support the video tag.
                </video>
            </div>
        """
    # render the shots
    html_content += "</div>"
    
    display(HTML(css_style + html_content))
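
The clip extraction above turns shot boundaries stored in milliseconds into timestamp strings for ffmpeg's `-ss` and `-to` options by way of `datetime.timedelta`. The sketch below shows that conversion on its own; `ms_to_timestamp` is a hypothetical helper and the values are examples.

```python
import datetime


def ms_to_timestamp(ms):
    """Format a millisecond offset as an H:MM:SS[.ffffff] string."""
    return str(datetime.timedelta(milliseconds=ms))


print(ms_to_timestamp(83_500))     # 0:01:23.500000
print(ms_to_timestamp(3_600_000))  # 1:00:00
```

Passing milliseconds straight to `timedelta` avoids the intermediate division by 1000 and yields the same string as `timedelta(seconds=ms / 1000)`.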

In [None]:
print(colored("====== [TOP results] =======", 'green'))
display_shot_segemnt_results(responses, top_results=3)

print(colored("\n====== [Display top hit as part Of original video] =======\n", 'green'))

top_hit = responses[0]

video_start = top_hit['shot_startTime']/1000
video_end = top_hit['shot_endTime']/1000

print(f"Shot Id: {top_hit['shot_id']}, Time Range: {video_start} s - {video_end} s")
render_with_original_video(top_hit)

### Multi-Modal Search

In [None]:
def random_sample_image(shots):
    shot = random.choice(shots) if shots else None
    frame_locations = shot['composite_images'][0]['layout']
    frame_info = random.choice(frame_locations) if frame_locations else None
    return frame_info[0]

random_frame = random_sample_image(sampled_shots)
image = Image.open(random_frame)
display(image)  # render inline in the notebook instead of opening an external viewer

In [None]:
image_embedding = get_embedding('amazon.titan-embed-image-v1', random_frame)

In [None]:
aoss_query = {
        "size": 10,
        "query": {
            "script_score": {
                "query": {"bool": {"should": []}},
                "script": {
                    "lang": "knn",
                    "source": "knn_score",
                    "params": {
                        "field": "shot_image_vector",
                        "query_value": image_embedding,
                        "space_type": "cosinesimil",
                    },
                },
            }
        },
        "_source": [
            "video_path",
            "shot_id",
            "shot_startTime",
            "shot_endTime",
            "shot_description",
            "shot_celebrities",
            "shot_transcript",
        ],
    }

In [None]:
response = oss_client.search(body=aoss_query, index=index_name)
hits = response["hits"]["hits"]

responses = []
for hit in hits:
    if hit["_score"] >= 0:  # Set score threshold
        responses.append(
            {
                "video_path": hit["_source"]["video_path"],
                "shot_id": hit["_source"]["shot_id"],
                "shot_startTime": hit["_source"]["shot_startTime"],
                "shot_endTime": hit["_source"]["shot_endTime"],
                "shot_description": hit["_source"]["shot_description"],
                "shot_celebrities": hit["_source"]["shot_celebrities"],
                "shot_transcript": hit["_source"]["shot_transcript"],
                "score": hit["_score"],
            }
        )

In [None]:
print(colored("====== [TOP results] =======", 'green'))
display_shot_segemnt_results(responses, top_results=3)

print(colored("\n====== [Display top hit as part Of original video] =======\n", 'green'))

top_hit = responses[0]

video_start = top_hit['shot_startTime']/1000
video_end = top_hit['shot_endTime']/1000

print(f"Shot Id: {top_hit['shot_id']}, Time Range: {video_start} s - {video_end} s")
render_with_original_video(top_hit)

## Clean Up

In [None]:
# try:
#     response = oss_client.indices.delete(index=index_name)
#     print(f"Index '{index_name}' deleted successfully")
# except Exception as e:
#     print(f"Error deleting index '{index_name}': {str(e)}")