# Deep Dive Reverse Video Search

In the [previous tutorial](./1_reverse_video_search_engine.ipynb), we learned how to build a reverse video search engine. Now let's make the solution more practical for production use.

## Preparation

Let's recall preparation steps first:
1. Install packages
2. Prepare data
3. Start Milvus

### Install packages

Make sure you have installed the required Python packages:

| package |
| -- |
| pymilvus |
| towhee |
| towhee.models |
| pillow |
| ipython |
| fastapi |

In [1]:
! python -m pip install -q pymilvus towhee towhee.models

### Prepare data

This tutorial uses a small dataset extracted from [Kinetics400](https://www.deepmind.com/open-source/kinetics). You can download the subset from [GitHub](https://github.com/towhee-io/examples/releases/download/data/reverse_video_search.zip).

The data is organized as follows:
- **train:** candidate videos, 20 classes, 10 videos per class (200 in total)
- **test:** query videos, same 20 classes as train data, 1 video per class (20 in total)
- **reverse_video_search.csv:** a csv file containing an ***id***, ***path***, and ***label*** for each video in train data

In [2]:
! curl -L https://github.com/towhee-io/examples/releases/download/data/reverse_video_search.zip -O
! unzip -q -o reverse_video_search.zip

To make it easier to look up videos and measure results in later steps, we define a helper function in advance:
- **ground_truth:** get ground-truth video ids for the query video by its path

In [5]:
import pandas as pd

df = pd.read_csv('./reverse_video_search.csv')

id_video = df.set_index('id')['path'].to_dict()
label_ids = {}
for label in set(df['label']):
    label_ids[label] = list(df[df['label']==label].id)
    

def ground_truth(path):
    label = path.split('/')[-2]
    return label_ids[label]
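
To see what this helper produces, here is a minimal self-contained sketch that rebuilds the same lookup on a tiny in-memory table instead of the real CSV. The ids, paths, and labels below are made up for illustration; only the `id`, `path`, and `label` column names come from the dataset description above.

```python
import pandas as pd

# Hypothetical rows mimicking reverse_video_search.csv (id, path, label).
toy_df = pd.DataFrame({
    'id': [0, 1, 2, 3],
    'path': [
        './train/chopping_wood/a.mp4',
        './train/chopping_wood/b.mp4',
        './train/eating_carrots/c.mp4',
        './train/eating_carrots/d.mp4',
    ],
    'label': ['chopping_wood', 'chopping_wood', 'eating_carrots', 'eating_carrots'],
})

# Map each label to the ids of all candidate videos with that label.
toy_label_ids = toy_df.groupby('label')['id'].apply(list).to_dict()


def toy_ground_truth(path):
    # The class label is the name of the video's parent directory.
    label = path.split('/')[-2]
    return toy_label_ids[label]


print(toy_ground_truth('./test/eating_carrots/query.mp4'))
```

A query video under `./test/eating_carrots/` is therefore matched against every candidate id that shares the `eating_carrots` label.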

### Start Milvus

Before getting started with the engine, we also need to get ready with Milvus. Please make sure that you have started a Milvus service ([Milvus Guide](https://milvus.io/docs/v2.0.x/install_standalone-docker.md)).
Here we prepare a function to work with a Milvus collection with the following parameters:
- [L2 distance metric](https://milvus.io/docs/v2.0.x/metric.md#Euclidean-distance-L2)
- [IVF_FLAT index](https://milvus.io/docs/v2.0.x/index.md#IVF_FLAT).

In [6]:
from pymilvus import connections, FieldSchema, CollectionSchema, DataType, Collection, utility

def create_milvus_collection(collection_name, dim):
    connections.connect(host='localhost', port='19530')
    
    if utility.has_collection(collection_name):
        utility.drop_collection(collection_name)
    
    fields = [
        FieldSchema(name='id', dtype=DataType.INT64, description='ids', is_primary=True, auto_id=False),
        FieldSchema(name='embedding', dtype=DataType.FLOAT_VECTOR, description='embedding vectors', dim=dim)
    ]
    schema = CollectionSchema(fields=fields, description='deep dive reverse video search')
    collection = Collection(name=collection_name, schema=schema)

    # create IVF_FLAT index for collection.
    index_params = {
        'metric_type':'L2',
        'index_type':"IVF_FLAT",
        'params':{"nlist": 400}
    }
    collection.create_index(field_name="embedding", index_params=index_params)
    return collection
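
The pipelines in this tutorial normalize each embedding before it reaches Milvus. Assuming that normalization rescales every vector to unit length, ranking by L2 distance gives the same order as ranking by cosine similarity. The sketch below checks this with NumPy on random stand-in vectors (not real embeddings) and does not need a running Milvus service.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins for embeddings: 5 candidates and 1 query, 768-d.
candidates = rng.normal(size=(5, 768))
query = rng.normal(size=768)

# L2-normalize so every vector has unit length.
candidates /= np.linalg.norm(candidates, axis=1, keepdims=True)
query /= np.linalg.norm(query)

squared_l2 = np.sum((candidates - query) ** 2, axis=1)
cosine = candidates @ query

# For unit vectors: ||a - b||^2 = 2 - 2 * (a . b)
identity_holds = np.allclose(squared_l2, 2 - 2 * cosine)

# Smallest distance first and largest similarity first give the same order.
rank_by_l2 = np.argsort(squared_l2)
rank_by_cosine = np.argsort(-cosine)
print(identity_holds, rank_by_l2, rank_by_cosine)
```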

### Build Engine

Now we are ready to build a reverse video search engine. Here we show an engine built with the MViT model, together with its performance, as a baseline for later comparison.

In [11]:
import towhee
import time

start = time.time()

collection = create_milvus_collection('mvit', 768)

dc = (
    towhee.read_csv('reverse_video_search.csv')
      .runas_op['id', 'id'](func=lambda x: int(x))
      .video_decode.ffmpeg['path', 'frames'](sample_type='uniform_temporal_subsample', args={'num_samples': 32})
      .video_classification['frames', 'vec'].pytorchvideo(
            model_name='mvit_base_32x3', predict=False, skip_preprocess=True)
      .tensor_normalize['vec', 'vec']()
      .to_milvus['id', 'vec'](collection=collection, batch=10)
)

end = time.time()

print('Total insert time: %.2fs'%(end-start))
print('Total number of inserted data is {}.'.format(collection.num_entities))

start = time.time()

benchmark = (
    towhee.glob['path']('./test/*/*.mp4')
        .video_decode.ffmpeg['path', 'frames'](sample_type='uniform_temporal_subsample', args={'num_samples': 32})
        .video_classification['frames', 'vec'].pytorchvideo(
            model_name='mvit_base_32x3', predict=False, skip_preprocess=True)
        .tensor_normalize['vec', 'vec']()
        .milvus_search['vec', 'result'](collection=collection, limit=10)
        .runas_op['path', 'ground_truth'](func=ground_truth)
        .runas_op['result', 'result'](func=lambda res: [x.id for x in res])
        .with_metrics(['mean_hit_ratio', 'mean_average_precision'])
        .evaluate['ground_truth', 'result']('mvit')
        .report()
)

end = time.time()

print('Total search time: %.2fs'%(end-start))

Using cache found in /home/mengjia.gu/.cache/torch/hub/facebookresearch_pytorchvideo_main


Total insert time: 169.33s
Total number of inserted data is 200.


| | mean_hit_ratio | mean_average_precision |
| -- | -- | -- |
| mvit | 0.785 | 0.864911 |


Total search time: 21.16s
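
To make the two reported metrics concrete, the sketch below computes a hit ratio and an average precision by hand for two toy queries. This is one common formulation and an assumption on our part: Towhee's `mean_hit_ratio` and `mean_average_precision` may differ in details such as the normalizing denominator. The function names and ids are hypothetical.

```python
def hit_ratio(ground_truth_ids, result_ids):
    # Fraction of ground-truth ids that appear anywhere in the results.
    hits = set(ground_truth_ids) & set(result_ids)
    return len(hits) / len(ground_truth_ids)


def average_precision(ground_truth_ids, result_ids):
    # Mean of precision@k over the ranks k at which a relevant id appears.
    relevant = set(ground_truth_ids)
    hits = 0
    precisions = []
    for k, rid in enumerate(result_ids, start=1):
        if rid in relevant:
            hits += 1
            precisions.append(hits / k)
    return sum(precisions) / len(precisions) if precisions else 0.0


# Two toy queries: (ground-truth ids, ranked search results).
queries = [
    ([1, 2, 3, 4], [1, 9, 2, 8]),
    ([5, 6], [7, 5]),
]

mean_hr = sum(hit_ratio(gt, res) for gt, res in queries) / len(queries)
mean_ap = sum(average_precision(gt, res) for gt, res in queries) / len(queries)
print(mean_hr, mean_ap)
```

Under this formulation, the hit ratio only counts how many relevant videos were retrieved, while average precision also rewards placing them near the top of the ranking.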


## Dimensionality Reduction

In production, memory consumption is always a major concern, and it can be relieved by reducing the embedding dimension. Random projection is a dimensionality reduction method for a set of vectors in Euclidean space. Since this method is fast and requires no training, we'll try it and compare the performance against the original MViT embeddings.

As a baseline, recall the engine performance above without dimensionality reduction, where the embedding dimension is 768.

To reduce the dimension, we multiply each original embedding by a projection matrix of the proper size. We can just add an operator `.runas_op['vec', 'vec'](func=lambda x: np.dot(x, projection_matrix))` right after a video embedding is generated. Let's see how the engine performs with the embedding dimension reduced to 128.

In [10]:
import numpy as np

projection_matrix = np.random.normal(scale=1.0, size=(768, 128))

# def dim_reduce(vec):
#     return np.dot(vec, projection_matrix)

start = time.time()

collection = create_milvus_collection('mvit_128', 128)

dc = (
    towhee.read_csv('reverse_video_search.csv')
      .runas_op['id', 'id'](func=lambda x: int(x))
      .video_decode.ffmpeg['path', 'frames'](sample_type='uniform_temporal_subsample', args={'num_samples': 32})
      .video_classification['frames', 'vec'].pytorchvideo(
            model_name='mvit_base_32x3', predict=False, skip_preprocess=True)
      .runas_op['vec', 'vec'](func=lambda x: np.dot(x, projection_matrix))
      .tensor_normalize['vec', 'vec']()
      .to_milvus['id', 'vec'](collection=collection, batch=10)
)

end = time.time()

print('Total insert time: %.2fs'%(end-start))
print('Total number of inserted data is {}.'.format(collection.num_entities))

start = time.time()

benchmark = (
    towhee.glob['path']('./test/*/*.mp4')
        .video_decode.ffmpeg['path', 'frames'](sample_type='uniform_temporal_subsample', args={'num_samples': 32})
        .video_classification['frames', 'vec'].pytorchvideo(
            model_name='mvit_base_32x3', predict=False, skip_preprocess=True)
        .runas_op['vec', 'vec'](func=lambda x: np.dot(x, projection_matrix))
        .tensor_normalize['vec', 'vec']()
        .milvus_search['vec', 'result'](collection=collection, limit=10)
        .runas_op['path', 'ground_truth'](func=ground_truth)
        .runas_op['result', 'result'](func=lambda res: [x.id for x in res])
        .with_metrics(['mean_hit_ratio', 'mean_average_precision'])
        .evaluate['ground_truth', 'result']('mvit_128')
        .report()
)

end = time.time()

print('Total search time: %.2fs'%(end-start))

Total insert time: 162.57s
Total number of inserted data is 200.


| | mean_hit_ratio | mean_average_precision |
| -- | -- | -- |
| mvit_128 | 0.775 | 0.845302 |


Total search time: 19.39s


Surprisingly, the performance is barely affected. mHR decreases by 0.01 (0.785 to 0.775) and mAP by about 0.02 (0.865 to 0.845), while the embedding size shrinks by a factor of 6 (dimension from 768 to 128).
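
One way to build intuition for this result is to check how well a Gaussian random projection preserves pairwise distances. The sketch below is a standalone NumPy experiment on random stand-in vectors, not real MViT embeddings. The `1/sqrt(reduced_dim)` scale is our choice so that projected distances are directly comparable with the originals, and the byte counts assume 4-byte float32 values.

```python
import numpy as np

rng = np.random.default_rng(42)

orig_dim, reduced_dim = 768, 128

# Hypothetical stand-ins for video embeddings (not real model output).
vectors = rng.normal(size=(50, orig_dim))

# Gaussian random projection. Scaling by 1/sqrt(reduced_dim) keeps expected
# squared lengths unchanged; any constant scale (such as scale=1.0) would
# leave the nearest-neighbor order of the projected vectors the same.
projection = rng.normal(scale=1.0 / np.sqrt(reduced_dim), size=(orig_dim, reduced_dim))
reduced = vectors @ projection


def pairwise_distances(x):
    # Euclidean distance between every pair of rows.
    diff = x[:, None, :] - x[None, :, :]
    return np.linalg.norm(diff, axis=-1)


upper = np.triu_indices(len(vectors), k=1)  # each pair once, no self-pairs
ratios = pairwise_distances(reduced)[upper] / pairwise_distances(vectors)[upper]

# Memory per vector before and after, assuming float32 storage.
bytes_before = orig_dim * 4
bytes_after = reduced_dim * 4

print(reduced.shape, ratios.min(), ratios.max(), bytes_before, bytes_after)
```

The ratios of projected to original distances cluster around 1, so nearest neighbors tend to stay nearest, while each vector needs 512 bytes instead of 3072.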

## Parallel Execution

We can enable parallel execution by simply calling `set_parallel` within the pipeline, which tells Towhee to process the data in parallel. The code below enables parallel execution on the example above. With 200 videos, the insert time drops by about 26% (from 169.33s to 125.62s). With a larger dataset, the improvement from parallel execution should be more obvious.

In [12]:
start = time.time()

collection = create_milvus_collection('mvit', 768)

dc = (
    towhee.read_csv('reverse_video_search.csv')
      .runas_op['id', 'id'](func=lambda x: int(x))
      .set_parallel(5)
      .video_decode.ffmpeg['path', 'frames'](sample_type='uniform_temporal_subsample', args={'num_samples': 32})
      .video_classification['frames', 'vec'].pytorchvideo(
            model_name='mvit_base_32x3', predict=False, skip_preprocess=True)
      .tensor_normalize['vec', 'vec']()
      .to_milvus['id', 'vec'](collection=collection, batch=10)
)

end = time.time()

print('Total insert time: %.2fs'%(end-start))
print('Total number of inserted data is {}.'.format(collection.num_entities))

Total insert time: 125.62s
Total number of inserted data is 200.
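
For intuition about what running a stage with five workers means, here is a minimal sketch using Python's standard `concurrent.futures`. It is not how Towhee implements `set_parallel` internally; `fake_embed` is a hypothetical stand-in for the decode and embedding steps, with a short sleep in place of real work.

```python
from concurrent.futures import ThreadPoolExecutor
import time


def fake_embed(video_id):
    # Hypothetical stand-in for decoding a video and running the model.
    time.sleep(0.05)
    return video_id * 2


video_ids = list(range(10))

start = time.time()
sequential = [fake_embed(v) for v in video_ids]
sequential_time = time.time() - start

start = time.time()
# Five workers, mirroring set_parallel(5); map() keeps the input order.
with ThreadPoolExecutor(max_workers=5) as pool:
    parallel = list(pool.map(fake_embed, video_ids))
parallel_time = time.time() - start

print('sequential: %.2fs, parallel: %.2fs' % (sequential_time, parallel_time))
```

Both runs return the same results in the same order; only the wall-clock time differs.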


## Exception Safe

With large-scale data, some bad data will inevitably cause errors. Typically, users don't want such errors to break the system in production. Instead, the pipeline should continue to process the rest of the videos and report the broken ones.

Towhee supports an `exception-safe` execution mode that allows the pipeline to continue on exceptions and represents the exceptions with `Empty` values. The user can choose how to deal with the empty values at the end of the pipeline. In the query below, there are 4 files in total under the `exception` folder, and one of them is broken. With `exception-safe`, the pipeline prints the ERROR but does NOT terminate the process. As you can see from the results, `drop_empty` removes the empty data.

In [21]:
(
    towhee.glob['path']('./exception/*')
          .exception_safe()
          .video_decode.ffmpeg['path', 'frames'](sample_type='uniform_temporal_subsample', args={'num_samples': 32})
          .video_classification['frames', 'vec'].pytorchvideo(
              model_name='mvit_base_32x3', predict=False, skip_preprocess=True)
          .tensor_normalize['vec', 'vec']()
          .milvus_search['vec', 'result'](collection=collection, limit=10)
          .runas_op['result', 'res_path'](func=lambda res: [id_video[x.id] for x in res])
          .drop_empty()
          .select['path', 'res_path']()
          .show()
)

2022-06-01 21:35:11,519 - 140445661360704 - video_decoder.py-video_decoder:121 - ERROR: moov atom not found


| path | res_path |
| -- | -- |
| ./exception/kDuAS29BCwk.mp4 | [./train/chopping_wood/Fq-N6hpOTB...,./train/chopping_wood/tVfchvUzas...,./train/chopping_wood/aXO-sNzhiE...,./train/chopping_wood/WjIEsPkw5R...,...] len=10 |
| ./exception/ty4UQlowp0c.mp4 | [./train/eating_carrots/9OZhQqMhX...,./train/eating_carrots/y1U6Z2ZYQ...,./train/eating_carrots/bTCznQiu0...,./train/eating_carrots/V7DUq0JJn...,...] len=10 |
| ./exception/rJu8mSNHX_8.mp4 | [./train/pumping_fist/FGZ_lEaHCws...,./train/eating_carrots/Ou1w86qEr...,./train/eating_carrots/WkwzsrDd-...,./train/eating_carrots/bTCznQiu0...,...] len=10 |
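
The idea behind exception-safe execution can be sketched in plain Python: wrap each stage so that a failure becomes a sentinel value, pass that sentinel through later stages untouched, and filter it out at the end. The `Empty` class, the `exception_safe` wrapper, and the `decode` function below are hypothetical illustrations of the pattern, not Towhee's actual implementation.

```python
class Empty:
    # Hypothetical sentinel marking an item whose processing failed.
    def __init__(self, error):
        self.error = error


def exception_safe(func):
    # Wrap func so failures become Empty values instead of raising.
    # Empty inputs are passed through so later stages skip them.
    def wrapper(item):
        if isinstance(item, Empty):
            return item
        try:
            return func(item)
        except Exception as exc:
            print('ERROR:', exc)
            return Empty(exc)
    return wrapper


def decode(path):
    # Stand-in for video decoding: fails on the "broken" file.
    if 'broken' in path:
        raise ValueError('moov atom not found')
    return path.upper()


paths = ['a.mp4', 'broken.mp4', 'b.mp4', 'c.mp4']
decoded = [exception_safe(decode)(p) for p in paths]
lengths = [exception_safe(len)(d) for d in decoded]

# Equivalent of drop_empty: keep only items that survived every stage.
results = [x for x in lengths if not isinstance(x, Empty)]
print(results)
```

As in the query above, four inputs go in, one error is reported, and three results come out.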
