# Project Week 1: ActivityNet Video Data Preparation and Indexing

In this example we will use the ActivityNet dataset https://github.com/activitynet/ActivityNet. 

 - Select the 10 videos with more moments.
 - Download these videos onto your computer.
 - Extract the frames for every video.
 - Read the textual descriptions of each video.
 - Index the video data in OpenSearch.

 In this week, you will index the video data and make it searchable with OpenSearch. You should refer to the OpenSearch tutorial laboratory.

## Select videos
Download the `activity_net.v1-3.min.json` file containing the list of videos. The file is in the github repository of ActivityNet.
Parse this file and select the 10 videos with more moments.

In [14]:
import json
from pprint import pprint

with open('activity_net.v1-3.min.json', 'r') as json_data:
    data = json.load(json_data)

video_moments = []
for video_id, video_data in data['database'].items():
    num_moments = len(video_data.get('annotations', []))
    video_moments.append((video_id, num_moments))
    
video_moments.sort(key=lambda x: x[1], reverse=True)

top_x = 10
top_videos = video_moments[:top_x]

print(f"Top {top_x} videos with the most moments:")
for video_id, num_moments in top_videos:
    print(f"Video ID: {video_id}, Number of moments: {num_moments}")
print("\nTotal number of videos:", len(video_moments))


Top 10 videos with the most moments:
Video ID: o1WPnnvs00I, Number of moments: 23
Video ID: oGwn4NUeoy8, Number of moments: 23
Video ID: VEDRmPt_-Ms, Number of moments: 20
Video ID: qF3EbR8y8go, Number of moments: 19
Video ID: DLJqhYP-C0k, Number of moments: 18
Video ID: t6f_O8a4sSg, Number of moments: 18
Video ID: 6gyD-Mte2ZM, Number of moments: 18
Video ID: jBvGvVw3R-Q, Number of moments: 18
Video ID: PJ72Yl0B1rY, Number of moments: 17
Video ID: QHn9KyE-zZo, Number of moments: 17

Total number of videos: 19994


In [2]:
import json
from pprint import pprint

# Load both JSON files
with open('captions/val_1.json', 'r') as file1, open('captions/val_2.json', 'r') as file2:
    data1 = json.load(file1)
    data2 = json.load(file2)

# Combine the video data from both files
combined_data = {**data1, **data2}

# Extract the videos and their moments
video_moments = []
for video_id, video_data in combined_data.items():
    num_moments = len(video_data.get('timestamps', []))
    video_moments.append((video_id, num_moments))

# Sort videos by the number of moments in descending order
video_moments.sort(key=lambda x: x[1], reverse=True)

# Select the top X videos (e.g., top 10)
top_x = 20
top_videos = video_moments[:top_x]

# Print the top videos
print(f"Top {top_x} videos with the most moments:")
pprint(top_videos)

Top 20 videos with the most moments:
[('v_aPjbJ4ZNcVQ', 15),
 ('v_QKEFacWrn_8', 14),
 ('v__15t4WTR19s', 14),
 ('v_bG7hnpAeja0', 14),
 ('v_eXMF6Skt2To', 13),
 ('v_7H4-gDM3r0w', 13),
 ('v_PCoxnf59j5U', 12),
 ('v_4o8MaHTb7E4', 12),
 ('v_od1jHUzgrAU', 12),
 ('v_xsdrqauYhJs', 11),
 ('v_U0d68z5HTwE', 11),
 ('v_gXk9TiqGUHs', 11),
 ('v_DTWZhe352y8', 11),
 ('v_dSdZz_Royyc', 11),
 ('v_DQLotF3P9Fc', 11),
 ('v_TNFoUBRsngY', 11),
 ('v_nKa1e_CpvoY', 11),
 ('v_juiMCvZUYwk', 10),
 ('v_MwQTeFD0OKQ', 10),
 ('v_vc820BteGzY', 10)]


In [15]:
# Print the moments for the first given video
video_id = 't6f_O8a4sSg'
video_data = data['database'][video_id]
print(f"\nMoments for video ID {video_id}:")
for annotation in video_data.get('annotations', []):
    print(f"Start: {annotation['segment'][0]}, End: {annotation['segment'][1]}, Label: {annotation['label']}")


Moments for video ID t6f_O8a4sSg:
Start: 14.999980897195073, End: 30.681779107899008, Label: Skateboarding
Start: 34.431774332197776, End: 34.77268298895221, Label: Skateboarding
Start: 36.47722627272438, End: 37.159043586233246, Label: Skateboarding
Start: 38.86358687000541, End: 40.227221497023145, Label: Skateboarding
Start: 41.59085612404088, End: 43.63630806456748, Label: Skateboarding
Start: 45.681760005094084, End: 46.36357731860295, Label: Skateboarding
Start: 48.06812060237512, End: 49.43175522939285, Label: Skateboarding
Start: 51.136298513165016, End: 53.18175045369162, Label: Skateboarding
Start: 54.88629373746378, End: 55.90901970772708, Label: Skateboarding
Start: 57.95447164825369, End: 59.659014932025855, Label: Skateboarding
Start: 61.363558215798015, End: 63.40901015632462, Label: Skateboarding
Start: 66.47718806711453, End: 92.7271546372059, Label: Skateboarding
Start: 96.8180585182591, End: 102.27259702633003, Label: Skateboarding
Start: 105.68168359387437, End: 11

https://www.youtube.com/watch?v=QKEFacWrn_8
https://www.youtube.com/watch?v=_15t4WTR19s
https://www.youtube.com/watch?v=eXMF6Skt2To
https://www.youtube.com/watch?v=4o8MaHTb7E4
https://www.youtube.com/watch?v=od1jHUzgrAU
https://www.youtube.com/watch?v=U0d68z5HTwE
https://www.youtube.com/watch?v=IEqnfSiCIXc
https://www.youtube.com/watch?v=Ez7s36AwgLk
https://www.youtube.com/watch?v=mHVmDOxtVt0
https://www.youtube.com/watch?v=i2X7z9ywHV8


/home/wiirijo/anaconda3/bin/yt-dlp -f "bestvideo[ext=mp4]+bestaudio[ext=m4a]/best[ext=mp4]" -a vid_urls.txt

## Video frame extraction

PyAV is a wrapper library providing you access to `ffmpeg`, a command-line video processing tool. In the example below, you will be able to extract frames from the a video shot.

In [12]:
import av
import av.datasets

content = av.datasets.curated("pexels/time-lapse-video-of-night-sky-857195.mp4")
with av.open(content) as container:
    # Signal that we only want to look at keyframes.
    stream = container.streams.video[0]
    stream.codec_context.skip_frame = "NONKEY"

    for i, frame in enumerate(container.decode(stream)):
        print(frame)
        frame.to_image().save(f"night-sky.{i:04d}.jpg", quality=80)

<av.VideoFrame, pts=0 yuv420p 1280x720 at 0x70dbc8066f40>
<av.VideoFrame, pts=75 yuv420p 1280x720 at 0x70dbdda23040>
<av.VideoFrame, pts=150 yuv420p 1280x720 at 0x70dbb7eea880>


## Keyframe Extraction

Keyframes are the frames that describe a moment.
Usually are in the middle of the segment

- Middle Frame Calc:

$middle = \frac{start+end}{2}$

In [None]:
import av
import os
import json

#Load both val_1.json and val_2.json
with open('captions/val_1.json', 'r') as file1, open('captions/val_2.json', 'r') as file2:
    val1_data = json.load(file1)
    val2_data = json.load(file2)
# Combine the video data from both files
combined_data = {**val1_data, **val2_data}
# Load the JSON file containing video metadata
with open('activity_net.v1-3.min.json', 'r') as json_data:
    activity_data = json.load(json_data)



# Directory containing the video files
video_dir = 'videos'

# Output directory for frames
output_dir = 'keyframes'
os.makedirs(output_dir, exist_ok=True)

# List all video files in the directory (already downloaded)
video_files = [f for f in os.listdir(video_dir) if f.endswith('.mp4')]

for video_file in video_files:
    # Extract video ID from the filename
    video_id = video_file.split('[')[-1].split(']')[0]
    caption_ds_id = 'v_' + video_id
    
    if caption_ds_id not in val1_data and video_id not in val2_data:
        print(f"Video ID {video_id} not found in either val_1 or val_2 datasets.")
        continue
    
    # Load the video file
    video_path = os.path.join(video_dir, video_file)
    try:
        container = av.open(video_path)
    except Exception as e:
        print(f"Could not open video {video_file}: {e}")
        continue

    # Get metadata from activity_net.v1-3.min.json
    video_metadata = activity_data['database'].get(video_id, {})

    for source, dataset in zip(['val_1', 'val_2'], [val1_data, val2_data]):
        if caption_ds_id not in dataset:
            continue
        
        annotations = dataset[caption_ds_id]['timestamps']
        captions = dataset[caption_ds_id]['sentences']
        duration = dataset[caption_ds_id]['duration']

        for idx, (timestamp, caption) in enumerate(zip(annotations, captions)):
            start_time, end_time = timestamp
            middle_time = (start_time + end_time) / 2

            try:
                container.seek(int(middle_time * av.time_base))
                for frame in container.decode(video=0):
                    frame_image_path = os.path.join(output_dir, f"{video_id}_{source}_frame_{idx}.jpg")
                    frame.to_image().save(frame_image_path)
                    print(f"Saved keyframe for {video_id} at {middle_time:.2f}s from {source}")
                    break
            except Exception as e:
                print(f"Error extracting frame for {video_id} from {source}: {e}")
    
    container.close()

Saved keyframe for Ez7s36AwgLk at 2.92s from val_1
Saved keyframe for Ez7s36AwgLk at 9.95s from val_1
Saved keyframe for Ez7s36AwgLk at 22.23s from val_1
Saved keyframe for Ez7s36AwgLk at 38.02s from val_1
Saved keyframe for Ez7s36AwgLk at 56.14s from val_1
Saved keyframe for Ez7s36AwgLk at 78.37s from val_1
Saved keyframe for Ez7s36AwgLk at 117.55s from val_1
Saved keyframe for Ez7s36AwgLk at 161.42s from val_1
Saved keyframe for Ez7s36AwgLk at 194.76s from val_1
Saved keyframe for Ez7s36AwgLk at 216.98s from val_1
Saved keyframe for Ez7s36AwgLk at 229.26s from val_1
Saved keyframe for Ez7s36AwgLk at 7.02s from val_2
Saved keyframe for Ez7s36AwgLk at 112.29s from val_2
Saved keyframe for Ez7s36AwgLk at 117.56s from val_2
Saved keyframe for Ez7s36AwgLk at 163.76s from val_2
Saved keyframe for 4o8MaHTb7E4 at 26.38s from val_1
Saved keyframe for 4o8MaHTb7E4 at 61.72s from val_1
Saved keyframe for 4o8MaHTb7E4 at 88.10s from val_1
Saved keyframe for 4o8MaHTb7E4 at 147.00s from val_1
Saved 

## Video metadata

Process the video metadata provided in the `json` file and index the video data in OpenSearch.

### Check the current OpenSearch Index

In [3]:
import pprint as pp
from opensearchpy import OpenSearch
from opensearchpy import helpers
import requests

host = 'api.novasearch.org'
port = 443

user = 'user08' # Add your user name here.
password = '55LL.TTSS' # Add your user password here. For testing only. Don't store credentials in code. 
index_name = user

# Create the client with SSL/TLS enabled, but hostname verification disabled.
client = OpenSearch(
    hosts = [{'host': host, 'port': port}],
    http_compress = True, # enables gzip compression for request bodies
    http_auth = (user, password),
    use_ssl = True,
    url_prefix = 'opensearch_v2',
    verify_certs = False,
    ssl_assert_hostname = False,
    ssl_show_warn = False
)

# Check if the index exists
try:
    indices = client.indices.get(index=index_name)
    print(f"Index '{index_name}' exists.")
except Exception as e:
    print(f"Index '{index_name}' does not exist. Error: {e}")


Index 'user08' does not exist. Error: TransportError(503, '<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">\n<html><head>\n<title>503 Service Unavailable</title>\n</head><body>\n<h1>Service Unavailable</h1>\n<p>The server is temporarily unable to service your\nrequest due to maintenance downtime or capacity\nproblems. Please try again later.</p>\n</body></html>\n')


locally

In [7]:
import pprint as pp
from opensearchpy import OpenSearch
from opensearchpy import helpers
import requests

host = 'localhost'
port = 9200

# Define the connection to the local OpenSearch server
client = OpenSearch(
    hosts = [{'host': host, 'port': port}],
    http_auth = ('admin', '.Wiirijo321'),  
    http_compress = True,  # Enables gzip compression for request bodies
    use_ssl = True,
    verify_certs = False,
    ssl_assert_hostname = False,
)

index_name = "wiirijo"  # Replace with your actual index name

# Check if index exists
if client.indices.exists(index=index_name):
    resp = client.indices.open(index=index_name)
    print(resp)

    print('\n----------------------------------------------------------------------------------- INDEX SETTINGS')
    settings = client.indices.get_settings(index=index_name)
    pp.pprint(settings)

    print('\n----------------------------------------------------------------------------------- INDEX MAPPINGS')
    mappings = client.indices.get_mapping(index=index_name)
    pp.pprint(mappings)

    print('\n----------------------------------------------------------------------------------- INDEX #DOCs')
    print(client.count(index=index_name))
else:
    print("Index does not exist.")


Index does not exist.




### Delete Existing Index (if needed)

In [6]:
client.indices.delete(index=index_name, ignore=[400, 404])
print(f"Index '{index_name}' deleted.")

Index 'wiirijo' deleted.




### Creating new index with mappings
Play around here ig

In [8]:
index_body = {
   "settings": {
      "index": {
         "number_of_replicas": 0,
         "number_of_shards": 4,
         "refresh_interval": "-1", # Keep it off for now, change it to "1s" later (for searching)
         "knn": "true"
      }
   },
   "mappings": {
      "dynamic": "strict",
      "properties": {
         "video_id": {"type": "keyword"},
         "start_timestamp": {"type": "float"},
         "end_timestamp": {"type": "float"},
         "caption": {"type": "text"},
         "caption_bow": {"type": "text"},
         "caption_vec": {
            "type": "knn_vector",
            "dimension": 768,
            "method": {
               "name": "hnsw",
               "space_type": "innerproduct",
               "engine": "nmslib",
               "parameters": {
                  "m": 16,
                  "ef_construction": 200,
               }
            }
         },
         "duration": {"type": "float"},
         "resolution": {"type": "keyword"},
         "keyframe_path": {"type": "keyword"},
         "keyframe_vec": {
            "type": "knn_vector",
            "dimension": 512,
            "method": {
               "name": "hnsw",
               "space_type": "innerproduct",
               "engine": "nmslib",
               "parameters": {
                  "m": 16,
                  "ef_construction": 200,
               }
            }
         }
      }
   }
}



# Create the index
response = client.indices.create(index=index_name, body=index_body)
print(f"Index '{index_name}' created.")
# Check the index settings
settings = client.indices.get_settings(index=index_name)
print("Index settings:")
pp.pprint(settings)
# Check the index mappings
mappings = client.indices.get_mapping(index=index_name)
print("Index mappings:")
pp.pprint(mappings)



Index 'wiirijo' created.
Index settings:
{'wiirijo': {'settings': {'index': {'creation_date': '1744121578980',
                                    'knn': 'true',
                                    'number_of_replicas': '0',
                                    'number_of_shards': '4',
                                    'provided_name': 'wiirijo',
                                    'refresh_interval': '-1',
                                    'replication': {'type': 'DOCUMENT'},
                                    'uuid': 'yfQy-hbuTyulCFJr17rYfw',
                                    'version': {'created': '136407927'}}}}}
Index mappings:
{'wiirijo': {'mappings': {'dynamic': 'strict',
                          'properties': {'caption': {'type': 'text'},
                                         'caption_bow': {'type': 'text'},
                                         'caption_vec': {'dimension': 768,
                                                         'method': {'engine': 'nmslib',



## Video captions

The ActivityNetCaptions dataset https://cs.stanford.edu/people/ranjaykrishna/densevid/ dataset provides a textual description of each videos. Index the video captions on a text field of your OpenSearch index.

### Generating embeddings for Captions

In [1]:
from transformers import AutoTokenizer, AutoModel
import torch
import numpy as np

# Load the pre-trained model and tokenizer
model_name = 'sentence-transformers/all-mpnet-base-v2'
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)
model.eval()  # Set the model to evaluation mode

print(f"Loaded model type: {type(model)}") 

def generate_caption_embedding(caption):
    inputs = tokenizer(caption, return_tensors='pt', padding=True, truncation=True)
    with torch.no_grad():
        outputs = model(**inputs)  # Pass the entire dictionary
        # Mean pooling
        embeddings = outputs.last_hidden_state.mean(dim=1).squeeze().numpy()
    return embeddings
    



Loaded model type: <class 'transformers.models.mpnet.modeling_mpnet.MPNetModel'>


### Generate embeddings for Keyframes

dependency:

pip install openai-clip

In [4]:
from PIL import Image
import torch
import open_clip

# Load CLIP model
device = "cuda" if torch.cuda.is_available() else "cpu"
clip_model, _, preprocess = open_clip.create_model_and_transforms('ViT-B-32', pretrained='openai')
clip_model.to(device)
clip_model.eval()

def generate_keyframe_embedding(image_path):
    try:
        image = preprocess(Image.open(image_path)).unsqueeze(0).to(device)
        with torch.no_grad():
            image_features = clip_model.encode_image(image)
            image_features /= image_features.norm(dim=-1, keepdim=True)  # Normalize
        embedding_vector = image_features.cpu().numpy().flatten()
        if len(embedding_vector) == 0:
            print(f"Failed to generate embedding for {image_path}")
            return None
        return embedding_vector
    except Exception as e:
        print(f"Error processing {image_path}: {e}")
        return None




### Indexing Data

In [None]:
from opensearchpy import OpenSearch
from opensearchpy import helpers
import os
import json
import numpy as np
import requests

host = 'api.novasearch.org'
port = 443

user = 'user08' # Add your user name here.
password = '55LL.TTSS' # Add your user password here. For testing only. Don't store credentials in code. 
index_name = user

# Create the client with SSL/TLS enabled, but hostname verification disabled.
client = OpenSearch(
    hosts = [{'host': host, 'port': port}],
    http_compress = True, # enables gzip compression for request bodies
    http_auth = (user, password),
    use_ssl = True,
    url_prefix = 'opensearch_v2',
    verify_certs = False,
    ssl_assert_hostname = False,
    ssl_show_warn = False
)

# Load combined JSON data
with open('captions/val_1.json', 'r') as file1, open('captions/val_2.json', 'r') as file2:
    val1_data = json.load(file1)
    val2_data = json.load(file2)
combined_data = {**val1_data, **val2_data}

with open('activity_net.v1-3.min.json', 'r') as json_data:
    activity_data = json.load(json_data)

# Output directory for frames
video_dir = 'videos'
output_dir = 'keyframes'

# List all video files in the directory (already downloaded)
video_files = [f for f in os.listdir(video_dir) if f.endswith('.mp4')]

# extract video ID from the filename
selected_videos = []
for video_file in video_files:
    video_id = video_file.split('[')[-1].split(']')[0]
    selected_videos[video_id] = 'v_' + video_id


for clean_id, caption_ds_id in selected_videos.items():
    if caption_ds_id not in combined_data:
        print(f"Video ID {video_id} not found in either val_1 or val_2 datasets.")
        continue
    
    video_data = combined_data[caption_ds_id]
    duration = video_data['duration']
    timestamps = video_data['timestamps']
    captions = video_data['sentences']
    
    # fetch resolution from activity_net.v1-3.min.json
    if clean_id in activity_data['database']:
        video_metadata = activity_data['database'][clean_id]
        resolution = video_metadata.get('resolution', 'unknown')
    else:
        resolution = 'unknown'
        
        
    for idx, (timestamp, caption) in enumerate(zip(timestamps, captions)):
        start_time, end_time = timestamp
        caption_embedding = generate_caption_embedding(caption)
        caption_bow = ' '.join(caption.split())
        caption_vec = caption_embedding.tolist()
        
        # keyframe image extraction
        keyframe_path_val1 = os.path.join(output_dir, f"{video_id}_val_1_frame_{idx}.jpg")
        keyframe_path_val2 = os.path.join(output_dir, f"{video_id}_val_2_frame_{idx}.jpg")
        keyframe_path = keyframe_path_val1 if os.path.exists(keyframe_path_val1) else keyframe_path_val2
        if not os.path.exists(keyframe_path):
            print(f"Keyframe image not found for {video_id} at index {idx}.")
            continue

        keyframe_embedding = generate_keyframe_embedding(keyframe_path)
        keyframe_vec = keyframe_embedding.tolist()
        
        # Prepare the document to be indexed
        doc = {
            'video_id': video_id,
            'start_timestamp': start_time,
            'end_timestamp': end_time,
            'caption': caption,
            'caption_bow': caption_bow,
            'caption_vec': caption_vec,
            'duration': duration,
            'resolution': resolution,
            'keyframe_path': keyframe_path,
            'keyframe_vec': keyframe_vec
        }
        
        # Index the document
        try:
            response = client.index(index=index_name, body=doc)
            print(f"Indexed document for video {video_id} with caption '{caption}'")
        except Exception as e:
            print(f"Error indexing document for video {video_id}: {e}")
            continue
# Refresh the index to make the documents searchable
client.indices.refresh(index=index_name)
print("Index refreshed.")

Keyframe image not found for video v_uqiMw7tQ1Cc at index 0.
Keyframe image not found for video v_uqiMw7tQ1Cc at index 1.
Keyframe image not found for video v_uqiMw7tQ1Cc at index 2.
Keyframe image not found for video v_bXdq2zI1Ms0 at index 0.
Keyframe image not found for video v_bXdq2zI1Ms0 at index 1.
Keyframe image not found for video v_bXdq2zI1Ms0 at index 2.
Keyframe image not found for video v_FsS_NCZEfaI at index 0.
Keyframe image not found for video v_FsS_NCZEfaI at index 1.
Keyframe image not found for video v_FsS_NCZEfaI at index 2.
Keyframe image not found for video v_FsS_NCZEfaI at index 3.
Keyframe image not found for video v_K6Tm5xHkJ5c at index 0.
Keyframe image not found for video v_K6Tm5xHkJ5c at index 1.
Keyframe image not found for video v_K6Tm5xHkJ5c at index 2.
Keyframe image not found for video v_4Lu8ECLHvK4 at index 0.
Keyframe image not found for video v_4Lu8ECLHvK4 at index 1.
Keyframe image not found for video v_4Lu8ECLHvK4 at index 2.
Keyframe image not found

KeyboardInterrupt: 

locally

In [10]:
from opensearchpy import OpenSearch
from opensearchpy import helpers
import os
import json
import numpy as np
import requests

host = 'localhost'
port = 9200

# Define the connection to the local OpenSearch server
client = OpenSearch(
    hosts = [{'host': host, 'port': port}],
    http_auth = ('admin', '.Wiirijo321'),  
    http_compress = True,  # Enables gzip compression for request bodies
    use_ssl = True,
    verify_certs = False,
    ssl_assert_hostname = False,
)

index_name = "wiirijo"  # Replace with your actual index name


# Load combined JSON data
with open('captions/val_1.json', 'r') as file1, open('captions/val_2.json', 'r') as file2:
    val1_data = json.load(file1)
    val2_data = json.load(file2)
combined_data = {**val1_data, **val2_data}

with open('activity_net.v1-3.min.json', 'r') as json_data:
    activity_data = json.load(json_data)

# Output directory for frames
video_dir = 'videos'
output_dir = 'keyframes'

# List all video files in the directory (already downloaded)
video_files = [f for f in os.listdir(video_dir) if f.endswith('.mp4')]

# extract video ID from the filename
selected_videos = {}
for video_file in video_files:
    video_id = video_file.split('[')[-1].split(']')[0]
    selected_videos[video_id] = 'v_' + video_id


for clean_id, caption_ds_id in selected_videos.items():
    if caption_ds_id not in combined_data:
        print(f"Video ID {video_id} not found in either val_1 or val_2 datasets.")
        continue
    
    video_data = combined_data[caption_ds_id]
    duration = video_data['duration']
    timestamps = video_data['timestamps']
    captions = video_data['sentences']
    
    # fetch resolution from activity_net.v1-3.min.json
    if clean_id in activity_data['database']:
        video_metadata = activity_data['database'][clean_id]
        resolution = video_metadata.get('resolution', 'unknown')
    else:
        resolution = 'unknown'
        
        
    for idx, (timestamp, caption) in enumerate(zip(timestamps, captions)):
        start_time, end_time = timestamp
        caption_embedding = generate_caption_embedding(caption)
        caption_bow = ' '.join(caption.split())
        caption_vec = caption_embedding.tolist()
        
        # keyframe image extraction
        keyframe_path_val1 = os.path.join(output_dir, f"{video_id}_val_1_frame_{idx}.jpg")
        keyframe_path_val2 = os.path.join(output_dir, f"{video_id}_val_2_frame_{idx}.jpg")
        keyframe_path = keyframe_path_val1 if os.path.exists(keyframe_path_val1) else keyframe_path_val2
        if not os.path.exists(keyframe_path):
            print(f"Keyframe image not found for {video_id} at index {idx}.")
            continue

        keyframe_embedding = generate_keyframe_embedding(keyframe_path)
        if keyframe_embedding is None:
            print(f"Failed to generate keyframe embedding for {keyframe_path}.")
            continue
        
        keyframe_vec = keyframe_embedding.tolist()
        
        # Prepare the document to be indexed
        doc = {
            'video_id': video_id,
            'start_timestamp': start_time,
            'end_timestamp': end_time,
            'caption': caption,
            'caption_bow': caption_bow,
            'caption_vec': caption_vec,
            'duration': duration,
            'resolution': resolution,
            'keyframe_path': keyframe_path,
            'keyframe_vec': keyframe_vec
        }
        
        # Index the document
        try:
            response = client.index(index=index_name, body=doc)
            print(f"Indexed document for video {video_id} with caption '{caption}'")
        except Exception as e:
            print(f"Error indexing document for video {video_id}: {e}")
            continue
# Refresh the index to make the documents searchable
client.indices.refresh(index=index_name)
print("Index refreshed.")



Indexed document for video eXMF6Skt2To with caption 'A man is skating in a skate park.'
Indexed document for video eXMF6Skt2To with caption ' Several people watch from the sides.'
Indexed document for video eXMF6Skt2To with caption ' A lot of kids are signing paperwork.'




Indexed document for video eXMF6Skt2To with caption ' Skaters go up the ramps.'
Indexed document for video eXMF6Skt2To with caption 'A man with red hair makes a long jump in front of an audience.'
Indexed document for video eXMF6Skt2To with caption ' Several reaction scenes from other people are shown.'




Indexed document for video eXMF6Skt2To with caption ' The man walks away from the track.'
Indexed document for video eXMF6Skt2To with caption ' The man's long jump is replayed several times in various speeds.'
Indexed document for video eXMF6Skt2To with caption ' Another man fouls in his long jump attempt.'




Indexed document for video eXMF6Skt2To with caption ' The first man is shown alternatively bending over and walking around.'
Indexed document for video eXMF6Skt2To with caption ' The first man jogs and claps.'
Indexed document for video eXMF6Skt2To with caption ' The two men hug each other.'




Indexed document for video eXMF6Skt2To with caption ' The first man walks down the track celebrating, with brief audience shots.'
Indexed document for video eXMF6Skt2To with caption ' The first man receives a flag and hugs two other people.'
Indexed document for video eXMF6Skt2To with caption ' The first man unfolds the flag and holds it up.'




Indexed document for video eXMF6Skt2To with caption 'The first man runs around with the flag draped around him.'
Indexed document for video eXMF6Skt2To with caption 'A young man skateboards in the street, then enters in a shop.'
Indexed document for video eXMF6Skt2To with caption ' Young men skateboard in different parts of a city.'




Indexed document for video eXMF6Skt2To with caption ' After, a young men wearing black clothes flips his skateboard and jumps.'
Indexed document for video eXMF6Skt2To with caption ' Young man skates on the street passing over rails and flipping and turning.'
Indexed document for video eXMF6Skt2To with caption ' A juvenile skates on a big court, he pass over rails without falling.'




Indexed document for video eXMF6Skt2To with caption ' Other person skates on a park, then pass over the rails and turning and flipping teh skateboard.'
Indexed document for video eXMF6Skt2To with caption 'A group is gathered in a boxing rink.'
Indexed document for video eXMF6Skt2To with caption ' Two of the pairs are engaged in boxing.'




Indexed document for video eXMF6Skt2To with caption ' They punch and kick at each other.'
Indexed document for video eXMF6Skt2To with caption 'A closeup of a plate of cookies is shown, with more cookies and a jar of milk in the background.'
Indexed document for video eXMF6Skt2To with caption ' A bowl is shown with a series of ingredients added and then mixed.'




Indexed document for video eXMF6Skt2To with caption ' A second bowl is shown with ingredients added and then mixed.'
Indexed document for video eXMF6Skt2To with caption ' The second bowl is mixed into the first bowl.'
Indexed document for video eXMF6Skt2To with caption ' The mixture is covered by plastic wrap.'




Indexed document for video eXMF6Skt2To with caption ' The oven is shown being turned on.'
Indexed document for video eXMF6Skt2To with caption ' A baking parchment is placed on a baking tray.'
Indexed document for video eXMF6Skt2To with caption ' The mixture is shaped into balls.'




Indexed document for video eXMF6Skt2To with caption ' The balls are rolled in white powder and then placed on the baking sheet.'
Indexed document for video eXMF6Skt2To with caption ' The baked cookies are placed on a rack.'
Indexed document for video eXMF6Skt2To with caption ' The beginning scene is repeated.'




Indexed document for video eXMF6Skt2To with caption 'The video starts with footage of people outside with red,white and blue text explaining what the video is about.'
Indexed document for video eXMF6Skt2To with caption ' Men attempt various tricks using bowling balls.'
Indexed document for video eXMF6Skt2To with caption '  A man knocks down pins in a skating pool drop in.'




Indexed document for video eXMF6Skt2To with caption '  Another man bowls a ball around bowling pins lined up in a curve on a bowling lane and the bowling ball doesn't hit any of them.'
Indexed document for video eXMF6Skt2To with caption '  One man shows another how to hold the bowling ball.'
Indexed document for video eXMF6Skt2To with caption '  A bowling ball in a skate drop breaks something inside the drop.'




Indexed document for video eXMF6Skt2To with caption '  A man bowls two bowling balls at once.'
Indexed document for video eXMF6Skt2To with caption '  Two men sitting back on chairs talk to each other.'
Indexed document for video eXMF6Skt2To with caption '  A man bowls a strike in a bowling alley.'




Indexed document for video eXMF6Skt2To with caption '  A group of men are walking in a straight line while clapping bowling pins together.'
Indexed document for video eXMF6Skt2To with caption '  Two men interview each other on brown chairs.'
Indexed document for video eXMF6Skt2To with caption '  A bowling ball hits the camera.'




Indexed document for video eXMF6Skt2To with caption '  A blue screen appears and a red, white and blue logo with a W on it appears.'
Keyframe image not found for eXMF6Skt2To at index 13.
Indexed document for video eXMF6Skt2To with caption 'A chef stands as she talks near a kitchen island.'
Indexed document for video eXMF6Skt2To with caption ' The chef grabs a bowl of salad and shows it off.'




Indexed document for video eXMF6Skt2To with caption ' The chef puts down the bowl to grab another bowl of chopped apples which she throws into a bowl.'
Indexed document for video eXMF6Skt2To with caption ' The chef grabs a cup of nuts and throws it on top of a salad.'
Indexed document for video eXMF6Skt2To with caption ' The chef throws cheese into the salad.'




Indexed document for video eXMF6Skt2To with caption ' The chef shows off her salad before proceeding to pour olive oil into a cup.'
Indexed document for video eXMF6Skt2To with caption ' The chef squeezes a lemon into the same cup.'
Indexed document for video eXMF6Skt2To with caption ' The chef pours soy sauce into the cup, too.'




Indexed document for video eXMF6Skt2To with caption ' With a spoon, the chef pours in a spice and some salt into the cup.'
Indexed document for video eXMF6Skt2To with caption ' The chef shows off shredded garlic before throwing it into the cup.'
Indexed document for video eXMF6Skt2To with caption ' With a spoon, the chef stirs all of the ingredients in the cup.'




Indexed document for video eXMF6Skt2To with caption ' The chef pours the ingredients into the salad.'
Indexed document for video eXMF6Skt2To with caption ' The chef grabs two salad utensils and tosses the salad.'
Keyframe image not found for eXMF6Skt2To at index 13.
Indexed document for video eXMF6Skt2To with caption 'A man styles his hair in front of a mirror.'




Indexed document for video eXMF6Skt2To with caption ' The man styles his facial hair in front of a mirror.'
Indexed document for video eXMF6Skt2To with caption ' The man puts a jacket on in front of the mirror.'
Indexed document for video eXMF6Skt2To with caption ' A close up of the man's shoes are shown.'




Indexed document for video eXMF6Skt2To with caption ' The man points at the mirror.'
Indexed document for video eXMF6Skt2To with caption ' The man gathers materials to clean his shoes.'
Indexed document for video eXMF6Skt2To with caption ' The man cleans his shoes.'
Indexed document for video eXMF6Skt2To with caption ' The man smells a can of polish and makes a face.'




Indexed document for video eXMF6Skt2To with caption ' The man uses the polish on his shoes.'
Indexed document for video eXMF6Skt2To with caption ' The man brushes his shoes with occasional blowing.'
Indexed document for video eXMF6Skt2To with caption ' The man shines his shoes.'




Indexed document for video eXMF6Skt2To with caption ' The man talks to the camera.'
Indexed document for video eXMF6Skt2To with caption 'A young man is sitting down on a brown couch blowing out smoke from a hookah.'
Indexed document for video eXMF6Skt2To with caption 'After the smoke clears,he begins talking excessively and making several hand motions.'




Indexed document for video eXMF6Skt2To with caption 'When he removes his hand from his throat,he takes another puff of the hookah and blows it out again.'
Indexed document for video eXMF6Skt2To with caption 'All of a sudden,he appears with a plastic cup with holes on both ends in his left hand.'
Indexed document for video eXMF6Skt2To with caption 'Another pull is taken and he blows through the cup to create rings as he exhales.'




Indexed document for video eXMF6Skt2To with caption 'A large blue hookah then appears as well as the animated version of the man smoking on his website.'
Indexed document for video eXMF6Skt2To with caption 'We see a stadium and an opening screen.'
Indexed document for video eXMF6Skt2To with caption ' We see men at a table and shots of an arena and people preparing to ride dirt bikes.'




Indexed document for video eXMF6Skt2To with caption ' A half naked lady is spraying a car with water.'
Indexed document for video eXMF6Skt2To with caption ' A man has blood on his face  and we see a person fall off their bike.'
Indexed document for video eXMF6Skt2To with caption ' We see shots of men and see a lady hold a sign to signal the start of the race.'




Indexed document for video eXMF6Skt2To with caption ' The races start and we see fire pyrotechnics.'
Indexed document for video eXMF6Skt2To with caption ' A man is falling a man is dripping with water and the lady is spraying a water from a hose.'
Indexed document for video eXMF6Skt2To with caption ' We see a man fall off his bike and bikers on their bike the crowd and a lady poses for a photo and we see biker riding.'




Indexed document for video eXMF6Skt2To with caption '  We see a lady with a sponge washing a bike and men riding their bmx bikes in the dirt.'
Indexed document for video eXMF6Skt2To with caption ' We see a lady dance and a group of ladies posing.'
Indexed document for video eXMF6Skt2To with caption ' We see men ride on sand dunes and dirt hills before the stadium and a river.'




Indexed document for video eXMF6Skt2To with caption ' We see fireworks in the stadium and bikers.'
Indexed document for video eXMF6Skt2To with caption ' The title ending title screen loads.'
Index refreshed.


In [6]:
caption = "A person is skateboarding in a park."
embedding = generate_caption_embedding(caption)
print(embedding.shape) 

(768,)


In [9]:
mapping = client.indices.get_mapping(index=index_name)
print(mapping)


{'wiirijo': {'mappings': {'dynamic': 'strict', 'properties': {'caption': {'type': 'text'}, 'caption_bow': {'type': 'text'}, 'caption_vec': {'type': 'knn_vector', 'dimension': 768, 'method': {'engine': 'nmslib', 'space_type': 'innerproduct', 'name': 'hnsw', 'parameters': {'ef_construction': 200, 'm': 16}}}, 'duration': {'type': 'float'}, 'end_timestamp': {'type': 'float'}, 'keyframe_path': {'type': 'keyword'}, 'keyframe_vec': {'type': 'knn_vector', 'dimension': 512, 'method': {'engine': 'nmslib', 'space_type': 'innerproduct', 'name': 'hnsw', 'parameters': {'ef_construction': 200, 'm': 16}}}, 'resolution': {'type': 'keyword'}, 'start_timestamp': {'type': 'float'}, 'video_id': {'type': 'keyword'}}}}}




# Testing

## Check Index Health and Mapping

In [11]:
# Check the health of the OpenSearch cluster
cluster_health = client.cluster.health()
print("Cluster Health:")
pp.pprint(cluster_health)

# Retrieve and print the mapping of the index
index_mapping = client.indices.get_mapping(index=index_name)
print("\nIndex Mapping:")
pp.pprint(index_mapping)

Cluster Health:
{'active_primary_shards': 10,
 'active_shards': 10,
 'active_shards_percent_as_number': 83.33333333333334,
 'cluster_name': 'docker-cluster',
 'delayed_unassigned_shards': 0,
 'discovered_cluster_manager': True,
 'discovered_master': True,
 'initializing_shards': 0,
 'number_of_data_nodes': 1,
 'number_of_in_flight_fetch': 0,
 'number_of_nodes': 1,
 'number_of_pending_tasks': 0,
 'relocating_shards': 0,
 'status': 'yellow',
 'task_max_waiting_in_queue_millis': 0,
 'timed_out': False,
 'unassigned_shards': 2}

Index Mapping:
{'wiirijo': {'mappings': {'dynamic': 'strict',
                          'properties': {'caption': {'type': 'text'},
                                         'caption_bow': {'type': 'text'},
                                         'caption_vec': {'dimension': 768,
                                                         'method': {'engine': 'nmslib',
                                                                    'name': 'hnsw',
                



## Search the Index

In [12]:
query = {
    "query": {
        "match": {
            "caption": "skateboarding in a park"
        }
    }
}

response = client.search(index=index_name, body=query)
print("Search Results:")
for hit in response['hits']['hits']:
    print(f"Caption: {hit['_source']['caption']}, Score: {hit['_score']}")


Search Results:
Caption: A man is skating in a skate park., Score: 4.5011463
Caption: A group is gathered in a boxing rink., Score: 3.104506
Caption:   A man knocks down pins in a skating pool drop in., Score: 3.0368183
Caption:  Other person skates on a park, then pass over the rails and turning and flipping teh skateboard., Score: 3.0077438
Caption: A young man skateboards in the street, then enters in a shop., Score: 2.9684896
Caption:  The man puts a jacket on in front of the mirror., Score: 2.699169
Caption:   A group of men are walking in a straight line while clapping bowling pins together., Score: 2.6349707
Caption:   We see a lady with a sponge washing a bike and men riding their bmx bikes in the dirt., Score: 2.5260642
Caption: A closeup of a plate of cookies is shown, with more cookies and a jar of milk in the background., Score: 2.346759
Caption: All of a sudden,he appears with a plastic cup with holes on both ends in his left hand., Score: 2.0115752




## Semantic Search with KNN Vectors

In [13]:
import torch
from transformers import AutoTokenizer, AutoModel

# Generate embedding for your query
query_text = "A man skateboarding on a park"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)
model.eval()

inputs = tokenizer(query_text, return_tensors='pt', padding=True, truncation=True)
with torch.no_grad():
    outputs = model(**inputs)
    query_embedding = outputs.last_hidden_state.mean(dim=1).squeeze().numpy()

# Construct the query for OpenSearch
query = {
    "query": {
        "knn": {
            "caption_vec": {
                "vector": query_embedding.tolist(),
                "k": 5  # Number of nearest neighbors to retrieve
            }
        }
    }
}

response = client.search(index=index_name, body=query)
print("KNN Search Results:")
for hit in response['hits']['hits']:
    print(f"Caption: {hit['_source']['caption']}, Score: {hit['_score']}")




KNN Search Results:
Caption: A man is skating in a skate park., Score: 7.110288
Caption:  Other person skates on a park, then pass over the rails and turning and flipping teh skateboard., Score: 5.451238
Caption:  Young man skates on the street passing over rails and flipping and turning., Score: 4.1757374
Caption: A young man skateboards in the street, then enters in a shop., Score: 4.1705737
Caption:  Young men skateboard in different parts of a city., Score: 4.036891
Caption:   A man knocks down pins in a skating pool drop in., Score: 3.9605217
Caption:  Skaters go up the ramps., Score: 3.9509585
Caption:  After, a young men wearing black clothes flips his skateboard and jumps., Score: 3.9086497
Caption:   A man bowls a strike in a bowling alley., Score: 3.745032
Caption:  A juvenile skates on a big court, he pass over rails without falling., Score: 3.3034587


