# Database Introduction
[MUSCLE VCD](https://www.rocq.inria.fr/imedia/civr-bench/data.html) is a video copy detection dataset that contains duplicated videos and clips.
We have downloaded this dataset to our server.
It contains three subfolders: master, st1 and st2.
```
├── master
│   ├── movie100.mpg
│   ├── movie101.mpg
│   ├── movie10.mpg
│   ├── ...
│   ├── movie98.mpg
│   ├── movie99.mpg
│   └── movie9.mpg
├── st1
│   ├── ST1Query10.mpg
│   ├── ...
│   ├── ST1Query7.mpg
│   ├── ST1Query8.mpg
│   └── ST1Query9.mpg
└── st2
    ├── ST2Query1.mpg
    ├── ST2Query2.mpg
    └── ST2Query3.mpg
```

- Folder master:
Contains about 100 videos from different sources: web video clips, TV archives, and movies. The videos cover a wide range of programs: documentaries, movies, sports events, TV shows, cartoons, etc. They have different bitrates, resolutions, and video formats.
- Folder st1:
A set of videos used as queries. Each query has one answer: either the file is a copy of a video (or of part of a video) in the database, or it is not a copy. The copies may be re-encoded, noised, or slightly retouched.
The ground truth of the st1 set is as follows:
```
ST1Query1 movie27
ST1Query2 not_in_db
ST1Query3 movie8
ST1Query4 not_in_db
ST1Query5 movie44
ST1Query6 movie76
ST1Query7 not_in_db
ST1Query8 not_in_db
ST1Query9 movie9
ST1Query10 movie21
ST1Query11 movie37
ST1Query12 not_in_db
ST1Query13 movie11
ST1Query14 movie17
ST1Query15 movie68
```
The left column is the query id, and the right column is the id of the copied movie in the database. 'not_in_db' means the database contains no copy of the query.
- Folder st2:
Transformed extracts have been inserted into videos that are not in the database. The goal is to find these segments and their boundaries despite the transformations. The applied transformations can be very diverse: cropping; fade cuts; insertion of logos, borders, texts, moving texts, moving characters, etc.
The ground truth of the st2 set is as follows:
```
ST2Query1 0:01:01 0:01:43 Movie30 00:05:49
ST2Query1 0:02:52 0:04:07 Movie55 00:19:12
ST2Query1 0:07:49 0:08:44 Movie33 00:01:29
ST2Query1 0:09:09 0:11:01 Movie38 00:15:05
ST2Query1 0:11:20 0:12:47 Movie43 00:38:38
ST2Query1 0:13:30 0:14:11 Movie50 00:01:23
ST2Query2 0:00:40 0:02:05 movie98 00:08:04
ST2Query2 0:03:30 0:04:10 movie20 00:03:35
ST2Query2 0:04:30 0:05:34 movie27 00:01:43
ST2Query2 0:06:21 0:08:24 movie26 00:15:07
ST2Query2 0:08:40 0:09:17 movie89 00:08:06
ST2Query2 0:10:36 0:11:32 movie82 00:06:30
ST2Query2 0:12:41 0:14:24 movie59 00:13:39
ST2Query2 0:16:28 0:17:09 movie13 00:04:51
ST2Query3 0:01:08 0:02:06 movie46 00:31:34
ST2Query3 0:03:27 0:04:07 movie15 00:05:45
ST2Query3 0:04:54 0:05:19 movie16 00:40:36
ST2Query3 0:06:18 0:06:49 movie18 00:00:55
ST2Query3 0:07:56 0:08:24 movie99 00:48:27
ST2Query3 0:10:02 0:10:46 movie65 00:07:29
ST2Query3 0:11:23 0:11:52 movie23 00:04:25
```


The columns, from left to right, are:

query video id, query video start time, query video end time, copy video id, copy video start time.

The copy video end time is not listed because it can be derived: copy video end time = copy video start time + (query video end time - query video start time), since the query clip and the copied clip have the same length.
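The end-time rule above can be sketched in a few lines of Python. The helper names (`to_seconds`, `to_timestamp`, `copy_end_time`) are hypothetical and not part of the repository; the sketch assumes every ground-truth line has exactly the five whitespace-separated columns described above.

```python
def to_seconds(timestamp: str) -> int:
    """Convert 'H:MM:SS' or 'HH:MM:SS' to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in timestamp.split(':'))
    return hours * 3600 + minutes * 60 + seconds


def to_timestamp(total_seconds: int) -> str:
    """Convert a number of seconds back to 'HH:MM:SS'."""
    hours, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f'{hours:02d}:{minutes:02d}:{seconds:02d}'


def copy_end_time(gt_line: str) -> str:
    """Derive the copy end time from one st2 ground-truth line."""
    _query_id, query_start, query_end, _copy_id, copy_start = gt_line.split()
    # The query clip and the copied clip have the same duration.
    duration = to_seconds(query_end) - to_seconds(query_start)
    return to_timestamp(to_seconds(copy_start) + duration)


# First ground-truth line: 42 seconds long, starting at 00:05:49 in the copy.
print(copy_end_time('ST2Query1 0:01:01 0:01:43 Movie30 00:05:49'))  # 00:06:31
```

For the first ground-truth line, the query segment lasts 42 seconds, so the copied segment in Movie30 runs from 00:05:49 to 00:06:31.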

# St1 Task
The st1 task works at video granularity: given a query video from the st1 folder, retrieve its copy from the master folder.

[DnS (Distill-and-Select)](https://arxiv.org/abs/2106.13266) is a framework that provides fine-grained student models, which perform well on this kind of retrieval task.

We first define a similarity calculation method and then try the fine-grained attention model and the binary model with pretrained weights. The features of this dataset are extracted by [feature_extracting.py](../feature_extracting.py) and stored in the `../features/muscle_features.hdf5` file.

In [1]:
import sys

sys.path.append('../')
from tqdm import tqdm
from utils.data_utils import collate_eval
from torch.utils.data import DataLoader
import torch
from datasets.generators import DatasetGenerator
from model.students import FineGrainedStudent
import json


@torch.no_grad()
def calculate_similarities_to_queries(model, queries, target, args):
    similarities = []
    batch_sz = 2048 if 'batch_sz_sim' not in args else args.batch_sz_sim
    for i, query in enumerate(queries):
        if query.device.type == 'cpu':
            query = query.to(args.gpu_id)
        sim = []
        for b in range(target.shape[0] // batch_sz + 1):
            batch = target[b * batch_sz: (b + 1) * batch_sz]
            if batch.shape[0] >= 4:
                s = model.calculate_video_similarity(query, batch)
                sim.append(s)
        sim = torch.mean(torch.cat(sim, 0))
        similarities.append(sim.cpu().numpy())
    return similarities


@torch.no_grad()
def query_vs_target(model, dataset, args):
    # Create a video generator for the queries
    generator = DatasetGenerator(args.dataset_hdf5, dataset.get_queries())
    loader = DataLoader(generator, num_workers=args.workers, collate_fn=collate_eval)

    # Extract features of the queries
    all_db, queries, queries_ids = set(), [], []
    print('\n> Extract features of the query videos')
    for video in tqdm(loader):
        video_features = video[0][0]
        video_id = video[2][0]
        if video_id:
            # print('video_features.shape = ', video_features.shape)
            features = model.index_video(video_features.to(args.gpu_id))
            if 'load_queries' in args and not args.load_queries: features = features.cpu()
            all_db.add(video_id)
            queries.append(features)
            queries_ids.append(video_id)

    # Create a video generator for the database video
    generator = DatasetGenerator(args.dataset_hdf5, dataset.get_database())
    loader = DataLoader(generator, num_workers=args.workers, collate_fn=collate_eval)

    # Calculate similarities between the queries and the database videos
    similarities = dict({query: dict() for query in queries_ids})
    print('\n> Calculate query-target similarities')
    for video in tqdm(loader):
        video_features = video[0][0]
        video_id = video[2][0]
        if video_id:
            features = model.index_video(video_features.to(args.gpu_id))
            sims = calculate_similarities_to_queries(model, queries, features, args)
            all_db.add(video_id)
            for i, s in enumerate(sims):
                similarities[queries_ids[i]][video_id] = float(s)

    # with open(f'muscle_vcd_fg_student_{"attention" if args.attention else "binary"}.json', 'w') as f:
    #     f.write(json.dumps(similarities))
    return similarities

### Fine-grained attention student model

In [2]:
from datasets import MUSCLE_VCD
import argparse

args = argparse.Namespace(dataset_hdf5='../features/muscle_features.hdf5', workers=8, gpu_id='cuda:0', attention=True,
                          binarization=False)
dataset = MUSCLE_VCD(video_root='../muscle_vcd', query='st1')
fg_attention_student_model = FineGrainedStudent(attention=args.attention,
                                                binarization=args.binarization,
                                                pretrained=True).to(args.gpu_id)
fg_attention_student_similarities = query_vs_target(fg_attention_student_model, dataset, args)

len of master =  104
len of st1 =  18
len of st2 =  4
len of all_data_file_list =  126

> Extract features of the query videos


100%|██████████| 15/15 [00:01<00:00, 11.89it/s]



> Calculate query-target similarities


100%|██████████| 101/101 [00:11<00:00,  8.97it/s]


In [3]:
fg_attention_student_similarities

{'ST1Query10.mpg': {'movie98.mpg': -0.7432085275650024,
  'movie8.mpg': -0.6678217649459839,
  'movie99.mpg': -0.6354598999023438,
  'movie6.mpg': -0.5729501247406006,
  'movie22.mpg': -0.5899425745010376,
  'movie3.mpg': -0.8639711141586304,
  'movie23.mpg': -0.663490891456604,
  'movie34.mpg': -0.8173862099647522,
  'movie42.mpg': -0.8414761424064636,
  'movie92.mpg': -0.9124811887741089,
  'movie5.mpg': -0.7290552854537964,
  'movie80.mpg': -0.4866667091846466,
  'movie100.mpg': -0.9103272557258606,
  'movie63.mpg': -0.636135458946228,
  'movie93.mpg': -0.9143726825714111,
  'movie37.mpg': -0.7826518416404724,
  'movie4.mpg': -0.6173480153083801,
  'movie69.mpg': -0.8035256266593933,
  'movie49.mpg': -0.5180991291999817,
  'movie38.mpg': -0.7654179334640503,
  'movie59.mpg': -0.6346622705459595,
  'movie75.mpg': -0.6877409219741821,
  'movie60.mpg': -0.6740655899047852,
  'movie90.mpg': -0.8951758742332458,
  'movie16.mpg': -0.8612357378005981,
  'movie46.mpg': -0.6797227263450623,


In [4]:
thresh = 0

print('fine grain attention student predict result:\n')
for query_id, sim_dict in fg_attention_student_similarities.items():
    for ref_id, sim in sim_dict.items():
        if sim > thresh:
            print(f'{query_id}->{ref_id}')

fine grain attention student predict result:

ST1Query10.mpg->movie21.mpg
ST1Query11.mpg->movie37.mpg
ST1Query5.mpg->movie44.mpg
ST1Query6.mpg->movie76.mpg
ST1Query3.mpg->movie8.mpg
ST1Query1.mpg->movie27.mpg
ST1Query9.mpg->movie9.mpg
ST1Query13.mpg->movie11.mpg
ST1Query15.mpg->movie68.mpg
ST1Query14.mpg->movie17.mpg


The result matches the ground truth exactly: every query that has a copy in the database is paired with the correct movie, and no 'not_in_db' query is matched. Precision and recall are therefore both 100%.
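The comparison with the ground truth can also be done programmatically. The following is a minimal sketch, assuming that precision and recall are counted over (query, movie) pairs whose similarity exceeds the threshold. The names `ST1_GROUND_TRUTH`, `strip_extension` and `precision_recall` are hypothetical helpers, not part of the repository, and the toy scores are made up to show the input structure.

```python
# Ground truth copied from the st1 table; None stands for 'not_in_db'.
ST1_GROUND_TRUTH = {
    'ST1Query1': 'movie27', 'ST1Query2': None, 'ST1Query3': 'movie8',
    'ST1Query4': None, 'ST1Query5': 'movie44', 'ST1Query6': 'movie76',
    'ST1Query7': None, 'ST1Query8': None, 'ST1Query9': 'movie9',
    'ST1Query10': 'movie21', 'ST1Query11': 'movie37', 'ST1Query12': None,
    'ST1Query13': 'movie11', 'ST1Query14': 'movie17', 'ST1Query15': 'movie68',
}


def strip_extension(file_name):
    """'ST1Query1.mpg' -> 'ST1Query1'."""
    return file_name.rsplit('.', 1)[0]


def precision_recall(similarities, ground_truth, thresh=0.0):
    """Pair-level precision and recall of thresholded similarities."""
    predicted = {
        (strip_extension(query_id), strip_extension(ref_id))
        for query_id, sim_dict in similarities.items()
        for ref_id, sim in sim_dict.items()
        if sim > thresh
    }
    expected = {(q, r) for q, r in ground_truth.items() if r is not None}
    true_positives = len(predicted & expected)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(expected) if expected else 0.0
    return precision, recall


# Toy input with made-up scores, only to show the expected structure.
toy_similarities = {
    'ST1Query1.mpg': {'movie27.mpg': 0.8, 'movie8.mpg': -0.6},
    'ST1Query2.mpg': {'movie27.mpg': -0.7, 'movie8.mpg': 0.1},
}
print(precision_recall(toy_similarities, ST1_GROUND_TRUTH))  # (0.5, 0.1)
```

In the notebook, the same function could be called with `fg_attention_student_similarities` in place of the toy dictionary.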

### Fine-grained binary student model

In [5]:
args = argparse.Namespace(dataset_hdf5='../features/muscle_features.hdf5', workers=8, gpu_id='cuda:0', attention=False,
                          binarization=True)
dataset = MUSCLE_VCD(video_root='../muscle_vcd', query='st1')
fg_binary_student_model = FineGrainedStudent(attention=args.attention,
                                             binarization=args.binarization,
                                             pretrained=True).to(args.gpu_id)
fg_binary_student_similarities = query_vs_target(fg_binary_student_model, dataset, args)

len of master =  104
len of st1 =  18
len of st2 =  4
len of all_data_file_list =  126

> Extract features of the query videos


100%|██████████| 15/15 [00:03<00:00,  4.54it/s]



> Calculate query-target similarities


100%|██████████| 101/101 [00:32<00:00,  3.08it/s]


In [6]:
fg_binary_student_similarities

{'ST1Query10.mpg': {'movie98.mpg': -0.8027970194816589,
  'movie8.mpg': -0.7593562006950378,
  'movie99.mpg': -0.7795343399047852,
  'movie6.mpg': -0.7349483966827393,
  'movie22.mpg': -0.6819246411323547,
  'movie3.mpg': -0.9040968418121338,
  'movie23.mpg': -0.7317216396331787,
  'movie34.mpg': -0.8737401366233826,
  'movie42.mpg': -0.8739299774169922,
  'movie92.mpg': -0.9169495701789856,
  'movie5.mpg': -0.8654197454452515,
  'movie80.mpg': -0.6059709787368774,
  'movie100.mpg': -0.9080538749694824,
  'movie63.mpg': -0.7406996488571167,
  'movie93.mpg': -0.9176517128944397,
  'movie37.mpg': -0.8500936031341553,
  'movie4.mpg': -0.7576103210449219,
  'movie69.mpg': -0.8300360441207886,
  'movie49.mpg': -0.5875372886657715,
  'movie38.mpg': -0.8312634229660034,
  'movie59.mpg': -0.7337921857833862,
  'movie75.mpg': -0.776633620262146,
  'movie60.mpg': -0.7627720832824707,
  'movie90.mpg': -0.9056084752082825,
  'movie16.mpg': -0.8824436068534851,
  'movie46.mpg': -0.7757455706596375,

In [7]:
thresh = 0

print('fine grain binary student predict result:\n')
for query_id, sim_dict in fg_binary_student_similarities.items():
    for ref_id, sim in sim_dict.items():
        if sim > thresh:
            print(f'{query_id}->{ref_id}')

fine grain binary student predict result:

ST1Query10.mpg->movie21.mpg
ST1Query11.mpg->movie37.mpg
ST1Query5.mpg->movie44.mpg
ST1Query6.mpg->movie76.mpg
ST1Query3.mpg->movie8.mpg
ST1Query1.mpg->movie27.mpg
ST1Query9.mpg->movie9.mpg
ST1Query13.mpg->movie11.mpg
ST1Query15.mpg->movie68.mpg
ST1Query14.mpg->movie17.mpg


This result is also identical to the ground truth and to the output of the attention student model, so precision and recall are again both 100%.

Both pretrained DnS student models therefore solve the st1 query task perfectly.

# St2 Task

The st2 task works at clip granularity: given a query video from the st2 folder, retrieve the copied clips from the master folder. A copied clip can be a small part of a video, and a variety of transformations may have been applied to it, so this task is more challenging.
The DnS framework, however, does not address clip-granularity tasks, so we need another approach.

[VCSL](https://arxiv.org/abs/2203.02654) proposes a framework for retrieving copied clips. Roughly, it works as follows:

First, embed the sampled video frames; then compute the frame-by-frame similarity to get a similarity matrix; finally, apply Video Temporal Alignment to this matrix to find the corresponding copied time periods.

![](vcsl_framework.png)
In fact, we can combine the DnS models with the VCSL framework: compute the similarity matrix with a DnS student model, and then use different Video Temporal Alignment methods to retrieve the copied clips.
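To make the "similarity matrix" step concrete, here is a minimal sketch that uses plain cosine similarity between frame embeddings. This is a stand-in for illustration only: the DnS student models compute their own learned similarity, and the function name `cosine_similarity_matrix` and the random embeddings are assumptions, not part of the repository.

```python
import numpy as np


def cosine_similarity_matrix(query_frames, ref_frames):
    """Return a (num_query_frames, num_ref_frames) cosine similarity matrix."""
    # L2-normalise each frame embedding so the dot product equals the cosine.
    q = query_frames / np.linalg.norm(query_frames, axis=1, keepdims=True)
    r = ref_frames / np.linalg.norm(ref_frames, axis=1, keepdims=True)
    return q @ r.T


# Toy data: 8 reference frames with 16-dimensional embeddings.
rng = np.random.default_rng(0)
ref = rng.normal(size=(8, 16))
# The query is an exact copy of reference frames 2, 3 and 4.
query = ref[2:5]

sim = cosine_similarity_matrix(query, ref)
print(sim.shape)           # (3, 8)
print(sim.argmax(axis=1))  # [2 3 4]
```

A copied clip shows up as a run of high values along a diagonal of the matrix, which is the pattern the temporal alignment step looks for.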

### Calculate similarity matrix
We use the pretrained DnS attention student model to extract features and then calculate the DnS similarity matrix.

In [8]:
from calcu_similarity_matrix import calcu_similarity_matrix

dataset = MUSCLE_VCD(video_root='../muscle_vcd', query='st2')
args = argparse.Namespace(feature_path='../features/muscle_features.hdf5',
                          similarity_type='DnS',
                          dns_student_type='attention',
                          output_dir='./note_book_sim_matrix/muscle-dns_backbone-st2_pair-dns_sim',
                          workers=8,
                          device='cuda:0',
                          pair_file=None
                          )

calcu_similarity_matrix(dataset, args)

len of master =  104
len of st1 =  18
len of st2 =  4
len of all_data_file_list =  126
before filter num = 0, in_hdf5_num = 0, not_in_hdf5_num = 0

> Extract features of the query videos


100%|██████████| 3/3 [00:01<00:00,  2.34it/s]



> Calculate query-target similarities


100%|██████████| 101/101 [00:12<00:00,  8.09it/s]


Finish processing, exit...


### Video Temporal Alignment
VCSL lists several Video Temporal Alignment algorithms. In our experiments, the TN algorithm is a good choice: it achieves good results without pretraining or fine-tuning.
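To give an intuition for what a temporal alignment step does with a similarity matrix, the following is a deliberately simplified sketch. It is not the VCSL TN implementation: it only keeps cells above a similarity threshold and searches for the longest chain of cells that moves forward in both videos. The function name `longest_aligned_path` is hypothetical, and its `min_sim` and `max_step` parameters are assumptions that only loosely mirror the arguments used in the notebook.

```python
import numpy as np


def longest_aligned_path(sim, min_sim=0.2, max_step=2):
    """Return (q_start, q_end, r_start, r_end) of the longest forward path, or None."""
    n_query, n_ref = sim.shape
    keep = sim >= min_sim                            # candidate cells (graph nodes)
    length = np.zeros((n_query, n_ref), dtype=int)   # best path length ending at a cell
    origin = {}                                      # cell -> first cell of its best path
    best_cell, best_len = None, 0
    for i in range(n_query):
        for j in range(n_ref):
            if not keep[i, j]:
                continue
            length[i, j] = 1
            origin[(i, j)] = (i, j)
            # Link to cells that are earlier in both videos, at most max_step away.
            for pi in range(max(0, i - max_step), i):
                for pj in range(max(0, j - max_step), j):
                    if keep[pi, pj] and length[pi, pj] + 1 > length[i, j]:
                        length[i, j] = length[pi, pj] + 1
                        origin[(i, j)] = origin[(pi, pj)]
            if length[i, j] > best_len:
                best_cell, best_len = (i, j), length[i, j]
    if best_cell is None:
        return None
    (q_start, r_start), (q_end, r_end) = origin[best_cell], best_cell
    return q_start, q_end, r_start, r_end


# Toy matrix: query frames 1..4 match reference frames 4..7.
toy_sim = np.zeros((6, 10))
for k in range(4):
    toy_sim[1 + k, 4 + k] = 0.9
print(longest_aligned_path(toy_sim))  # (1, 4, 4, 7)
```

The returned frame indices would still need to be converted to timestamps using the frame sampling rate, which this sketch leaves out.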


In [9]:
from vcsl.vta import build_vta_model
from utils import DataType, build_writer
from vcsl.datasets import ItemDataset
import os

args = argparse.Namespace(
    input_root='./note_book_sim_matrix/muscle-dns_backbone-st2_pair-dns_sim',
    batch_size=32,
    data_workers=8,
    request_workers=8,
    alignment_method='TN',
    output_root='./note_book_default_pred',
    result_file='muscle-dns_backbone-st2_pairs-dns_sim-TN-pred.json',
    tn_max_step=10,
    tn_top_K=5,
    max_path=10,
    discontinue=3,
    max_iou=0.3,
    min_sim=0.2,
    min_length=5
)
pairs, files_dict, query, reference = None, None, None, None


data_list = []
sim_npy_files = os.listdir(args.input_root)
for sim_npy_file in sim_npy_files:
    file_name = sim_npy_file[:-4]
    query_id = file_name.split('-')[0]
    ref_id = file_name.split('-')[1]
    data_list.append((f"{query_id}-{ref_id}", f"{query_id}-{ref_id}"))

print('len(data_list) = ', len(data_list))
dataset = ItemDataset(data_list,
                      store_type='local',
                      data_type=DataType.NUMPY.type_name,
                      root=args.input_root,
                      trans_key_func=lambda x: x + '.npy',
                      )

print(f"Data to run {len(dataset)}")

loader = DataLoader(dataset, collate_fn=lambda x: x,
                    batch_size=args.batch_size,
                    num_workers=args.data_workers)

model_config = dict()

if args.alignment_method.startswith('TN'):
    model_config = dict(
        tn_max_step=args.tn_max_step, tn_top_k=args.tn_top_K, max_path=args.max_path,
        min_sim=args.min_sim, min_length=args.min_length, max_iou=args.max_iou
    )
else:
    raise ValueError(f"Unknown VTA method: {args.alignment_method}")


model = build_vta_model(method=args.alignment_method, concurrency=args.request_workers, **model_config)

total_result = dict()
for batch_data in loader:
    batch_result = model.forward_sim(batch_data)

    for pair_id, result in batch_result:
        total_result[pair_id] = result

output_store = 'local'
if output_store == 'local' and not os.path.exists(args.output_root):
    os.makedirs(args.output_root, exist_ok=True)
writer = build_writer(output_store, DataType.JSON.type_name)
writer.write(os.path.join(args.output_root, args.result_file), total_result)

len(data_list) =  303
Data to run 303


2022-08-12 07:53:24.078 | INFO     | vcsl.vta:tn:638 - Graph N 5271 E 873 for sim 1054x419
2022-08-12 07:53:24.361 | INFO     | vcsl.vta:tn:638 - Graph N 3721 E 0 for sim 744x2097
2022-08-12 07:53:24.387 | INFO     | vcsl.vta:tn:638 - Graph N 3721 E 109 for sim 744x6206
2022-08-12 07:53:24.793 | INFO     | vcsl.vta:tn:638 - Graph N 4471 E 0 for sim 894x4001
2022-08-12 07:53:25.239 | INFO     | vcsl.vta:tn:638 - Graph N 4471 E 907 for sim 894x2811
2022-08-12 07:53:25.306 | INFO     | vcsl.vta:tn:638 - Graph N 5271 E 153 for sim 1054x3929
2022-08-12 07:53:25.375 | INFO     | vcsl.vta:tn:638 - Graph N 5271 E 0 for sim 1054x3339
2022-08-12 07:53:25.455 | INFO     | vcsl.vta:tn:638 - Graph N 4471 E 0 for sim 894x1784
2022-08-12 07:53:25.459 | INFO     | vcsl.vta:tn:638 - Graph N 3721 E 0 for sim 744x3611
2022-08-12 07:53:25.573 | INFO     | vcsl.vta:tn:638 - Graph N 4471 E 203 for sim 894x5274
2022-08-12 07:53:25.577 | INFO     | vcsl.vta:tn:638 - Graph N 3721 E 0 for sim 744x3491
2022-08-1

2022-08-12 07:53:32.551 | INFO     | vcsl.vta:tn:638 - Graph N 3721 E 306 for sim 744x5923
2022-08-12 07:53:32.602 | INFO     | vcsl.vta:tn:638 - Graph N 4471 E 0 for sim 894x200
2022-08-12 07:53:32.833 | INFO     | vcsl.vta:tn:638 - Graph N 4471 E 8 for sim 894x6206
2022-08-12 07:53:32.967 | INFO     | vcsl.vta:tn:638 - Graph N 4471 E 3 for sim 894x5021
2022-08-12 07:53:33.276 | INFO     | vcsl.vta:tn:638 - Graph N 3721 E 0 for sim 744x31
2022-08-12 07:53:33.282 | INFO     | vcsl.vta:tn:638 - Graph N 3721 E 64 for sim 744x61
2022-08-12 07:53:33.320 | INFO     | vcsl.vta:tn:638 - Graph N 4471 E 0 for sim 894x28
2022-08-12 07:53:33.322 | INFO     | vcsl.vta:tn:638 - Graph N 3721 E 7 for sim 744x1103
2022-08-12 07:53:33.341 | INFO     | vcsl.vta:tn:638 - Graph N 4471 E 0 for sim 894x400
2022-08-12 07:53:33.391 | INFO     | vcsl.vta:tn:638 - Graph N 5271 E 0 for sim 1054x423
2022-08-12 07:53:33.612 | INFO     | vcsl.vta:tn:638 - Graph N 3721 E 8 for sim 744x468
2022-08-12 07:53:33.656 | I

2022-08-12 07:53:38.999 | INFO     | vcsl.vta:tn:638 - Graph N 3721 E 0 for sim 744x2320
2022-08-12 07:53:39.029 | INFO     | vcsl.vta:tn:638 - Graph N 4471 E 0 for sim 894x423
2022-08-12 07:53:39.063 | INFO     | vcsl.vta:tn:638 - Graph N 4471 E 0 for sim 894x18
2022-08-12 07:53:39.118 | INFO     | vcsl.vta:tn:638 - Graph N 4471 E 0 for sim 894x577
2022-08-12 07:53:39.354 | INFO     | vcsl.vta:tn:638 - Graph N 4471 E 0 for sim 894x733
2022-08-12 07:53:39.442 | INFO     | vcsl.vta:tn:638 - Graph N 4471 E 14 for sim 894x2556
2022-08-12 07:53:39.641 | INFO     | vcsl.vta:tn:638 - Graph N 5271 E 74 for sim 1054x5021
2022-08-12 07:53:39.679 | INFO     | vcsl.vta:tn:638 - Graph N 5271 E 50 for sim 1054x3731
2022-08-12 07:53:40.154 | INFO     | vcsl.vta:tn:638 - Graph N 4471 E 0 for sim 894x844
2022-08-12 07:53:40.180 | INFO     | vcsl.vta:tn:638 - Graph N 5271 E 0 for sim 1054x118
2022-08-12 07:53:40.191 | INFO     | vcsl.vta:tn:638 - Graph N 4471 E 0 for sim 894x1103
2022-08-12 07:53:40.23

2022-08-12 07:53:48.258 | INFO     | vcsl.vta:tn:638 - Graph N 3721 E 38 for sim 744x3655
2022-08-12 07:53:48.422 | INFO     | vcsl.vta:tn:638 - Graph N 3721 E 9 for sim 744x577
2022-08-12 07:53:48.450 | INFO     | vcsl.vta:tn:638 - Graph N 3721 E 5 for sim 744x1784
2022-08-12 07:53:48.588 | INFO     | vcsl.vta:tn:638 - Graph N 3721 E 1221 for sim 744x5021
2022-08-12 07:53:48.656 | INFO     | vcsl.vta:tn:638 - Graph N 5271 E 0 for sim 1054x380
2022-08-12 07:53:48.720 | INFO     | vcsl.vta:tn:638 - Graph N 3721 E 0 for sim 744x156
2022-08-12 07:53:48.755 | INFO     | vcsl.vta:tn:638 - Graph N 3721 E 8 for sim 744x818
2022-08-12 07:53:48.768 | INFO     | vcsl.vta:tn:638 - Graph N 5271 E 6 for sim 1054x3966
2022-08-12 07:53:48.906 | INFO     | vcsl.vta:tn:638 - Graph N 5271 E 139 for sim 1054x2897
2022-08-12 07:53:48.919 | INFO     | vcsl.vta:tn:638 - Graph N 5271 E 125 for sim 1054x5375
2022-08-12 07:53:49.099 | INFO     | vcsl.vta:tn:638 - Graph N 4471 E 0 for sim 894x2897
2022-08-12 07

10404

### Evaluate F1 metric
The output file is written to `result_file`, and we provide an st2 ground truth file in the same format in [gt_json](../muscle_vcd/st2/gt_json.json).

We can evaluate our result file against this ground truth file.

In [10]:
dataset = MUSCLE_VCD(video_root='../muscle_vcd', query='st2')

dataset.evaluate('./note_book_default_pred/muscle-dns_backbone-st2_pairs-dns_sim-TN-pred.json', 'f1')

len of master =  104
len of st1 =  18
len of st2 =  4
len of all_data_file_list =  126
 not in pred key num =  0
finish loading files, start evaluation...
Feature ./note_book_default_pred/muscle & VTA dns_backbone: 
Overall segment-level performance, Recall: 89.40%, Precision: 94.32%, F1: 91.79%, 
video-level performance, FRR: 0.66%, FAR: 0.66%, 
query set cnt 3, query macro-Recall: 99.27%, query macro-Precision: 99.61%, F1: 99.44%, 
