# How to Build a Video Segment Copy Detection System

In the previous example, we demonstrated how to use Milvus and Towhee to build a simple video copy detection system at the video level. In this tutorial, we take video copy detection down to the segment level.

**What's the difference between video-level deduplication and segment-level deduplication?**


- Video-level deduplication suits situations where videos are repeated almost in full. It finds duplicate videos by comparing the similarity between the embeddings of whole videos. Since only one embedding is extracted per video, this method is fast. Its limitation is just as clear: it is poor at detecting similar videos of different lengths. For example, if the first quarter of video A and video B are exactly the same, their whole-video embeddings may still not be similar. In that case, the infringing content goes undetected.
 
- Segment-level deduplication detects the specific start and end times of repeated segments, so it can handle complex clipping and insertion of video segments as well as videos of unequal length. It does so by comparing the similarity between video frames. This is the method needed for real-world, large-scale video duplication checking. The trade-off is that it is slower than the video-level method.
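
To make the difference concrete, here is a minimal NumPy sketch on synthetic data. It assumes each video is represented by one L2-normalized embedding per sampled frame, and that a video-level embedding is simply the mean of the frame embeddings. Both are illustrative assumptions, not how Towhee computes embeddings.

```python
import numpy as np

rng = np.random.default_rng(0)

def normalize(x):
    """L2-normalize each row of a 2-D array."""
    return x / np.linalg.norm(x, axis=1, keepdims=True)

dim = 512
# 10 frames that both videos contain, followed by 30 unrelated frames each.
shared = normalize(rng.normal(size=(10, dim)))
video_a = np.vstack([shared, normalize(rng.normal(size=(30, dim)))])
video_b = np.vstack([shared, normalize(rng.normal(size=(30, dim)))])

# Video level: one vector per video (hypothetical mean pooling).
mean_a = video_a.mean(axis=0)
mean_b = video_b.mean(axis=0)
video_level_sim = float(
    mean_a @ mean_b / (np.linalg.norm(mean_a) * np.linalg.norm(mean_b))
)

# Segment level: frame-by-frame similarity matrix between the two videos.
frame_sim = video_a @ video_b.T
matched = np.argwhere(frame_sim > 0.99)  # (frame in A, frame in B) pairs

print(f'video-level cosine similarity: {video_level_sim:.2f}')
print(f'near-identical frame pairs: {len(matched)}')
```

The pooled vectors are only weakly similar even though a quarter of each video is identical, while the frame-level matrix pins down exactly which frames match.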

**What are Milvus & Towhee?**

- [Milvus](https://milvus.io/) is the most advanced open-source vector database built for AI applications and supports nearest neighbor embedding search across tens of millions of entries.
- [Towhee](https://towhee.io/) is a framework that provides ETL for unstructured data using SoTA machine learning models.

In this tutorial, we will demonstrate video duplication detection at the segment level using Towhee and Milvus. The core functionality takes only a few lines of code, which you can use as a starting point for your own video deduplication engine.


## Preparation

### Install packages

Make sure you have installed the required Python packages:

| package |
| -- |
| towhee |
| pillow |
| ipython |
| numpy |
| plyvel / happybase |


In [1]:
! python -m pip install -q towhee pillow ipython numpy plyvel happybase

### Prepare the data

First, we need to prepare the dataset and Milvus environment.   

The VCDB core dataset consists almost entirely of full-length video copies, which makes it unsuitable for evaluating segment-level copy detection. In this tutorial, we use the [VCSL dataset](https://arxiv.org/abs/2203.02654).

VCSL is a large-scale, real-world dataset for video duplication detection collected from YouTube and Bilibili. Compared with VCDB, the transformations applied to its video frames are more complex, including cropping, filters, text overlays, background changes, camcording, picture-in-picture, and even recent deepfakes. VCSL covers a wide range of content transformations among over 280k segment copies, and these realistic, skillful transformations make segment-level copy detection challenging.

In this tutorial, we use only a mini set of VCSL. It contains 5 events with three videos in each event, which are copies of each other. There is also a broken video in folder `crashed_video` for robustness testing.

In [2]:
! curl -L https://github.com/towhee-io/examples/releases/download/data/VCSL-demo-en.zip -O
! unzip -q -o VCSL-demo-en.zip



The directory structure of this demo dataset is like this:
```
./VCSL-demo-en/
├── animation
│   ├── 1b4e048011714eab928650f64e6370a6-123-JqSBkvPsYBw.mkv
│   ├── 043bee7a71f347f18e8576bb1a01c86b-Langrisser-y3qAPvWnL18.mkv
│   └── 1134884a6c544a00b3bce99457a2a98d-LANGRISSER_Mobile_OP-EA4PmSWGr8o.mp4
├── baisui_mountain
│   ├── 41c4eaced0d24ebba50d180026531025-Baisui_Mountain_commercial-1UJ411s7we.flv
│   ├── 217a12c936414660a53b55b22e2aea59-20200607Baisui_Mountain-1kf4y127GB.flv
│   └── 03584e404c0847fcbe5f9c486e8f8fc7-Baisui_Mountain_means_this-1UE411V74D.flv
├── madongmei
│   ├── 8ad81fc9fe0a47dbaab1b4cdc40bf07b-go_and_cool_off_divine_comedy_Ma_Dongmei-1t54y117JK.flv
│   ├── 0640bd5d43d1499c962e275be6b804ef-Does_MaDongmei_live_here-1e64y1y799.flv
│   └── ad244c924f31461a9d809c77ae251ac1-the_classic_dialogue_what_is_Ma_Mei-1y7411n7y1.flv
├── the_wandering_earth
│   ├── d62ce5becff14a0c9c7dab5eea6647dc_the_wandering_earth_Wu_Jing_became_teenage_idol-1qf4y1G7gM.flv
│   ├── e5dc80abd7a24b47accde190c9fdbcdc-The_Wandering_Earth_is_on_CCTV-1db411m7f8.flv
│   └── ef65e0f662e646a88a13b6eddb640e48-News_Broadcast_on_Wandering_Earth-1xb411U7uE.flv
└── yellow_alien
    ├── 0fba4debd48245699971c4c18608cde3-Yellow_skinned_alien_low_cost_production-1vP4y1s79h.flv
    ├── 9ec6f467472a428e91fd45bb688e3be1-drunken_square_yellow_alien-1uU4y1E7n6.flv
    └── 01414786cbe74d2d95e2b71df26f39a3-Golden_Wheel_It_doesnt_matter_anymore-1sh411B7g1.flv
```

Define some helper functions to convert videos to GIFs so that we can take a look at them.

In [1]:
from IPython import display
from pathlib import Path
import towhee
from towhee import pipe, ops
from PIL import Image as PILImage
import os

def display_gif(video_path_list, text_list):
    html = ''
    for video_path, text in zip(video_path_list, text_list):
        html_line = '<img src="{}"> {} <br/><br/>'.format(video_path, text)
        html += html_line
    return display.HTML(html)

def convert_video2gif(video_path, output_gif_path, start_time=0.0, end_time=1000.0, num_samples=16):
    p = (
        pipe.input('video_file')
        .flat_map('video_file', 'frame', ops.video_decode.ffmpeg(start_time=start_time, end_time=end_time, sample_type='time_step_sample', args={'time_step': 3}))
             .output('frame')
    )
    frames = p(video_path).to_list()
    imgs = [PILImage.fromarray(frame[0]) for frame in frames]
    imgs = [img.resize((int(img.width/6), int(img.height/6)), PILImage.NEAREST) for img in imgs]
    imgs[0].save(fp=output_gif_path, format='GIF', append_images=imgs[1:], save_all=True, loop=0)


def display_gifs_from_video(video_path_list, text_list, start_time_list = None, end_time_list = None, tmpdirname = './tmp_gifs'):
    Path(tmpdirname).mkdir(exist_ok=True)
    gif_path_list = []
    for i, video_path in enumerate(video_path_list):
        video_name = str(Path(video_path).name).split('.')[0]
        gif_path = Path(tmpdirname) / (video_name + '.gif')
        if start_time_list is not None:
            convert_video2gif(video_path, gif_path, start_time=start_time_list[i], end_time=end_time_list[i])
        else:
            convert_video2gif(video_path, gif_path)
        gif_path_list.append(gif_path)
    return display_gif(gif_path_list, text_list)

In [12]:
import random
random.seed(9)
vcsl_demo_root = './VCSL-demo-en/'

event_list = os.listdir(vcsl_demo_root)

random_event = random.choice(event_list)
random_event_folder = os.path.join(vcsl_demo_root, random_event)
random_event_videos = [os.path.join(random_event_folder, video_file) for video_file in os.listdir(random_event_folder)]
tmpdirname = './tmp_gifs'
display_gifs_from_video(random_event_videos, random_event_videos, tmpdirname=tmpdirname)

In [3]:
import towhee

os.environ["CUDA_VISIBLE_DEVICES"] = '1'

def merge_ndarray(x):
    import numpy as np
    return np.concatenate(x).reshape(-1, x[0].shape[0])
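
The `merge_ndarray` helper stacks a list of per-frame embedding vectors into a single `(num_frames, dim)` matrix, which is what gets stored per video later on. A small sketch with made-up 4-dimensional vectors (the helper is repeated so the snippet runs on its own):

```python
import numpy as np

def merge_ndarray(x):
    # Same helper as above: concatenate 1-D vectors, then restore one row per frame.
    return np.concatenate(x).reshape(-1, x[0].shape[0])

# Three hypothetical frame embeddings of dimension 4.
frame_embs = [np.full(4, i, dtype=np.float32) for i in range(3)]
video_emb = merge_ndarray(frame_embs)

print(video_emb.shape)  # (3, 4)
```

For 1-D inputs of equal length this gives the same result as `np.stack(frame_embs)`.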

### Setup Milvus and create a Milvus Collection

The last thing to prepare is Milvus. For more options and detailed instructions, refer to the [Milvus documentation](https://milvus.io). If you need more help with Milvus, feel free to submit tickets or join the discussion on [Milvus GitHub](https://github.com/milvus-io/milvus).

Please make sure that you have started a [Milvus service](https://milvus.io/docs/install_standalone-docker.md). This notebook uses [Milvus 2.2.10](https://milvus.io/docs/v2.2.x/install_standalone-docker.md) and [pymilvus 2.2.11](https://milvus.io/docs/release_notes.md#2210).

In [4]:
# Download docker yaml for Milvus standalone
! wget https://github.com/milvus-io/milvus/releases/download/v2.2.10/milvus-standalone-docker-compose.yml -O docker-compose.yml
# Run command below under the same directory as the docker yaml
! docker-compose up -d
# Install pymilvus
! python -m pip install pymilvus==2.2.11

Let's first create a `video_copy_detection` collection that uses the [inner product (IP) metric](https://milvus.io/docs/metric.md) and an [IVF_FLAT index](https://milvus.io/docs/index.md#IVF_FLAT).

In [2]:
from pymilvus import connections, FieldSchema, CollectionSchema, DataType, Collection, utility

connections.connect(host='127.0.0.1', port='19530')


def create_milvus_collection(collection_name, dim):
    if utility.has_collection(collection_name):
        utility.drop_collection(collection_name)

    fields = [
        FieldSchema(name='id', dtype=DataType.INT64, description='the id of the embedding', is_primary=True, auto_id=True),
        FieldSchema(name='path', dtype=DataType.VARCHAR, description='the path of the embedding', max_length=500),
        FieldSchema(name='embedding', dtype=DataType.FLOAT_VECTOR, description='video embedding vectors', dim=dim)
    ]
    schema = CollectionSchema(fields=fields, description='video dedup')
    collection = Collection(name=collection_name, schema=schema)

    index_params = {'metric_type': 'IP', 'index_type': "IVF_FLAT", 'params': {"nlist": 1}}
    collection.create_index(field_name="embedding", index_params=index_params)
    return collection

collection = create_milvus_collection('video_copy_detection', 256)
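
The `index_params` in `create_milvus_collection` set `metric_type` to `'IP'` (inner product). The sketch below, on random data, illustrates why that choice is reasonable when embeddings are L2-normalized: inner product then equals cosine similarity, and ranking by descending inner product matches ranking by ascending L2 distance. That the embeddings are unit length is an assumption of this sketch; check it for the model you use.

```python
import numpy as np

rng = np.random.default_rng(42)

# 100 hypothetical database vectors and one query, all scaled to unit length.
db = rng.normal(size=(100, 256))
db /= np.linalg.norm(db, axis=1, keepdims=True)
query = rng.normal(size=256)
query /= np.linalg.norm(query)

ip = db @ query                           # higher means more similar
l2_sq = ((db - query) ** 2).sum(axis=1)   # lower means more similar

# For unit vectors: ||a - b||^2 = 2 - 2 * (a . b)
rank_by_ip = np.argsort(-ip)
rank_by_l2 = np.argsort(l2_sq)
print(np.array_equal(rank_by_ip[:5], rank_by_l2[:5]))
```

If the vectors are not normalized, the two metrics can rank results differently, so the metric used at search time should match the one the index was built with.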

## Video Copy Detection

In this section, we'll show how to build our video copy detection engine using Milvus. The basic idea is to extract embeddings from videos with a deep neural network and store them in Milvus, then extract embeddings from the query videos and compare them with those stored in Milvus.

We use [Towhee](https://towhee.io/), a machine learning framework that allows for creating data processing pipelines. [Towhee](https://towhee.io/) also provides predefined operators that implement insert and query operations in Milvus.


### Load Video Embeddings into Milvus

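For every video, we decode it into image frames and then use a neural network to extract their embeddings. The frame embeddings are inserted into Milvus, and the merged per-video embeddings are stored in a key-value store (HBase in the code below, with LevelDB as a commented-out alternative).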
![](video_decopy_insert.png)
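
The two stores hold the same embeddings at different granularities. The following sketch uses plain Python containers as stand-ins (they are not the Milvus or kv storage APIs) to show the shape of what each one holds after insertion:

```python
import numpy as np

frame_index = []   # stand-in for Milvus: one (path, embedding) row per frame
video_store = {}   # stand-in for the kv store: path -> (num_frames, dim) matrix

def insert_video(path, frame_embs):
    """Record every frame for nearest-neighbour search and the whole video by path."""
    for emb in frame_embs:
        frame_index.append((path, emb))
    video_store[path] = np.stack(frame_embs)

# Hypothetical 4-dimensional embeddings for two tiny videos.
insert_video('a.flv', [np.ones(4), np.zeros(4)])
insert_video('b.flv', [np.ones(4) * 2])

print(len(frame_index))              # 3
print(video_store['a.flv'].shape)    # (2, 4)
```

Frame-level rows support similarity search per query frame, while the per-video matrix lets a later step fetch all embeddings of a candidate video with a single lookup.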


In [6]:
%%time
from towhee import pipe, ops
import glob

emb_pipe = (
    pipe.input('url')
        .flat_map('url', 'frames', ops.video_decode.ffmpeg(sample_type='time_step_sample', args={'time_step': 1}))
        .map('frames', 'emb', ops.image_embedding.isc())
        .map(('url', 'emb'), 'insert_res', ops.ann_insert.milvus_client(host='127.0.0.1', port='19530', collection_name='video_copy_detection'))
        .window_all('emb', 'video_emb', merge_ndarray)
        .map(('url', 'video_emb'), ('url_vec_status'), ops.kvstorage.insert_hbase('127.0.0.1', 9090, 'video_copy_detection'))
        # .map(('url', 'video_emb'), ('url_vec_status'), ops.kvstorage.insert_leveldb('url_vec.db'))
        .output()
)

path = glob.glob('VCSL-demo-en/*/*')
for i in path:
    result = emb_pipe(i)

del emb_pipe

CPU times: user 7min 1s, sys: 9.84 s, total: 7min 11s
Wall time: 3min 12s


### Query videos

In theory, each query video would have to be matched against every video in the database, which causes huge overhead. In this tutorial, we solve this problem with a rough video selection step that filters out videos with low similarity.

First, for every query frame, we retrieve a certain number of similar frames from Milvus, each of which belongs to a stored video. The videos these frames belong to are then aggregated, sorted, and filtered. Finally, the video embeddings of the remaining videos and the embedding of the query video are processed to localize the copied segments. In this way, we filter out videos with low similarity and save a lot of computation for the whole pipeline.
![](video_decopy_query.png)
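
The aggregate, sort, and filter step can be sketched in plain Python as follows. This is an illustration of the idea only, assuming scores are summed per video and the highest totals win (mirroring the `reduce_function='sum'`, `reverse=True`, and `top_k` parameters used in the pipeline code in this section). It is not the implementation of the `select_video` operator, and `select_candidates` is a hypothetical helper name.

```python
from collections import defaultdict

def select_candidates(retrieved_urls, scores, top_k=5):
    """Sum frame-level scores per video and keep the top_k videos."""
    totals = defaultdict(float)
    for url, score in zip(retrieved_urls, scores):
        totals[url] += score
    ranked = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
    return [url for url, _ in ranked[:top_k]]

# Hypothetical frame-level hits: the video each hit belongs to and its score.
hits_urls = ['a.flv', 'b.flv', 'a.flv', 'c.flv', 'a.flv', 'b.flv']
hits_scores = [0.9, 0.4, 0.8, 0.3, 0.7, 0.5]

candidates = select_candidates(hits_urls, hits_scores, top_k=2)
print(candidates)  # ['a.flv', 'b.flv']
```

Only the surviving candidates go through the more expensive segment localization.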

In [7]:
%%time
from towhee.datacollection import DataCollection
collection.load()

search_pipe = (
    pipe.input('url')
        .flat_map('url', 'frames', ops.video_decode.ffmpeg(sample_type='time_step_sample', args={'time_step': 1}))
        .map('frames', 'emb', ops.image_embedding.isc())
        .flat_map('emb', 'res', ops.ann_search.milvus_client(host='127.0.0.1', port='19530', collection_name='video_copy_detection', limit=5, output_fields=['path'], metric_type='IP'))
        .window_all('res', ('retrieved_urls', 'score'), lambda x: ([i[2] for i in x], [i[1] for i in x]))
        .window_all('emb', 'video_emb', merge_ndarray)
        .flat_map(('retrieved_urls','score'), 'candidates', ops.video_copy_detection.select_video(top_k=5, reduce_function='sum', reverse=True))
        # .map('candidates', 'retrieved_emb', ops.kvstorage.from_leveldb(path = 'url_vec.db', is_ndarray=True))
        .map('candidates', 'retrieved_emb', ops.kvstorage.search_hbase('127.0.0.1', 9090, 'video_copy_detection', is_ndarray=True))
        .map(('video_emb', 'retrieved_emb'), ('similar_segment', 'segment_score'), ops.video_copy_detection.temporal_network(min_length=1))
        .output('url', 'candidates', 'similar_segment', 'segment_score')
)  

path = glob.glob('VCSL-demo-en/madongmei/*')
for i in path:
    result = search_pipe(i)
    DataCollection(result).show()

del search_pipe

url,candidates,similar_segment,segment_score
VCSL-demo-en/madongmei/ad244c924f31461a9d809c77ae251ac1-the_classic_dialogue_what_is_Ma_Mei-1y7411n7y1.flv,VCSL-demo-en/madongmei/ad244c924f31461a9d809c77ae251ac1-the_classic_dialogue_what_is_Ma_Mei-1y7411n7y1.flv,"[[0, 0, 52, 52]] len=1",[1.0192308185192256] len=1
VCSL-demo-en/madongmei/ad244c924f31461a9d809c77ae251ac1-the_classic_dialogue_what_is_Ma_Mei-1y7411n7y1.flv,VCSL-demo-en/madongmei/8ad81fc9fe0a47dbaab1b4cdc40bf07b-go_and_cool_off_divine_comedy_Ma_Dongmei-1t54y117JK.flv,"[[0, 1, 19, 25]] len=1",[0.6052339520565299] len=1
VCSL-demo-en/madongmei/ad244c924f31461a9d809c77ae251ac1-the_classic_dialogue_what_is_Ma_Mei-1y7411n7y1.flv,VCSL-demo-en/madongmei/0640bd5d43d1499c962e275be6b804ef-Does_MaDongmei_live_here-1e64y1y799.flv,"[[0, 143, 22, 164]] len=1",[0.7498454105022342] len=1
VCSL-demo-en/madongmei/ad244c924f31461a9d809c77ae251ac1-the_classic_dialogue_what_is_Ma_Mei-1y7411n7y1.flv,VCSL-demo-en/baisui_mountain/03584e404c0847fcbe5f9c486e8f8fc7-Baisui_Mountain_means_this-1UE411V74D.flv,,
VCSL-demo-en/madongmei/ad244c924f31461a9d809c77ae251ac1-the_classic_dialogue_what_is_Ma_Mei-1y7411n7y1.flv,VCSL-demo-en/baisui_mountain/217a12c936414660a53b55b22e2aea59-20200607Baisui_Mountain-1kf4y127GB.flv,,


url,candidates,similar_segment,segment_score
VCSL-demo-en/madongmei/0640bd5d43d1499c962e275be6b804ef-Does_MaDongmei_live_here-1e64y1y799.flv,VCSL-demo-en/madongmei/0640bd5d43d1499c962e275be6b804ef-Does_MaDongmei_live_here-1e64y1y799.flv,"[[0, 0, 167, 167]] len=1",[1.0059880678525228] len=1
VCSL-demo-en/madongmei/0640bd5d43d1499c962e275be6b804ef-Does_MaDongmei_live_here-1e64y1y799.flv,VCSL-demo-en/madongmei/8ad81fc9fe0a47dbaab1b4cdc40bf07b-go_and_cool_off_divine_comedy_Ma_Dongmei-1t54y117JK.flv,"[[136, 0, 164, 25],[138, 43, 166, 79]] len=2","[0.45078764096745905,0.2279533192049712] len=2"
VCSL-demo-en/madongmei/0640bd5d43d1499c962e275be6b804ef-Does_MaDongmei_live_here-1e64y1y799.flv,VCSL-demo-en/madongmei/ad244c924f31461a9d809c77ae251ac1-the_classic_dialogue_what_is_Ma_Mei-1y7411n7y1.flv,"[[141, 0, 164, 21]] len=1",[0.695049833167683] len=1
VCSL-demo-en/madongmei/0640bd5d43d1499c962e275be6b804ef-Does_MaDongmei_live_here-1e64y1y799.flv,VCSL-demo-en/baisui_mountain/03584e404c0847fcbe5f9c486e8f8fc7-Baisui_Mountain_means_this-1UE411V74D.flv,,
VCSL-demo-en/madongmei/0640bd5d43d1499c962e275be6b804ef-Does_MaDongmei_live_here-1e64y1y799.flv,VCSL-demo-en/the_wandering_earth/ef65e0f662e646a88a13b6eddb640e48-News_Broadcast_on_Wandering_Earth-1xb411U7uE.flv,,


url,candidates,similar_segment,segment_score
VCSL-demo-en/madongmei/8ad81fc9fe0a47dbaab1b4cdc40bf07b-go_and_cool_off_divine_comedy_Ma_Dongmei-1t54y117JK.flv,VCSL-demo-en/madongmei/8ad81fc9fe0a47dbaab1b4cdc40bf07b-go_and_cool_off_divine_comedy_Ma_Dongmei-1t54y117JK.flv,"[[0, 0, 83, 83]] len=1",[1.012048258120755] len=1
VCSL-demo-en/madongmei/8ad81fc9fe0a47dbaab1b4cdc40bf07b-go_and_cool_off_divine_comedy_Ma_Dongmei-1t54y117JK.flv,VCSL-demo-en/madongmei/ad244c924f31461a9d809c77ae251ac1-the_classic_dialogue_what_is_Ma_Mei-1y7411n7y1.flv,"[[1, 1, 25, 19],[23, 0, 76, 23]] len=2","[0.6449160008203416,0.3358775589026903] len=2"
VCSL-demo-en/madongmei/8ad81fc9fe0a47dbaab1b4cdc40bf07b-go_and_cool_off_divine_comedy_Ma_Dongmei-1t54y117JK.flv,VCSL-demo-en/madongmei/0640bd5d43d1499c962e275be6b804ef-Does_MaDongmei_live_here-1e64y1y799.flv,"[[1, 144, 25, 164],[40, 150, 83, 166]] len=2","[0.6091001223434102,0.3373360962180768] len=2"
VCSL-demo-en/madongmei/8ad81fc9fe0a47dbaab1b4cdc40bf07b-go_and_cool_off_divine_comedy_Ma_Dongmei-1t54y117JK.flv,VCSL-demo-en/yellow_alien/0fba4debd48245699971c4c18608cde3-Yellow_skinned_alien_low_cost_production-1vP4y1s79h.flv,,


CPU times: user 3min 5s, sys: 1min 14s, total: 4min 20s
Wall time: 49.8 s


For each frame of each query video, we retrieve the 5 most similar frames from Milvus. We aggregate and sort these results and select the top candidate videos (`top_k=5`). The `temporal_network` operator is then applied to the query video and its candidate videos, which yields the detected duplicate segments.

Note that our query uses the same dataset, in which there are 5 events, and each event has 3 videos, which are copies of each other. Using this dataset to query itself means that for each video, the correct query result should be the three videos under its own event. 

The output `similar_segment` column is the list of detected segments, each in the format `[query_start_second, ref_start_second, query_end_second, ref_end_second]`. The `segment_score` column holds the corresponding similarity score of each segment. We can observe that each query video yields detected segments only for the 3 videos of its own event, which is consistent with the ground truth.
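
As a small sketch of how to read these two columns, the hypothetical helper below turns each segment and its score into a readable line. The sample values are copied from one row of the output above.

```python
def describe_segments(similar_segment, segment_score):
    """Format [query_start, ref_start, query_end, ref_end] segments as text."""
    lines = []
    for (q_start, r_start, q_end, r_end), score in zip(similar_segment, segment_score):
        lines.append(
            f'query {q_start}-{q_end}s <-> reference {r_start}-{r_end}s (score {score:.2f})'
        )
    return lines

lines = describe_segments([[0, 143, 22, 164]], [0.7498454105022342])
print(lines[0])  # query 0-22s <-> reference 143-164s (score 0.75)
```

Note that the start times come first and the end times second, so the query range is built from positions 0 and 2 of each segment, not 0 and 1.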
 
Take the result shown below as an example: `similar_segment` = [0, 143, 22, 164] means that seconds 0 to 22 of the query video and seconds 143 to 164 of the reference video are copies of each other. We can display these clips.
![example](example.png)

In [3]:
event_videos = ['VCSL-demo-en/madongmei/ad244c924f31461a9d809c77ae251ac1-the_classic_dialogue_what_is_Ma_Mei-1y7411n7y1.flv',
               'VCSL-demo-en/madongmei/0640bd5d43d1499c962e275be6b804ef-Does_MaDongmei_live_here-1e64y1y799.flv']
tmpdirname2 = './tmp_gifs2'

display_gifs_from_video(event_videos, event_videos, start_time_list=[0, 143], end_time_list=[22, 164], tmpdirname=tmpdirname2)



### Options

For image embedding operator and kv storage operator, some options are provided:

- Image embedding operator: Check [Towhee Image Embedding](https://towhee.io/tasks/detail/operator?field_name=Computer-Vision&task_name=Image-Embedding) for more pre-trained models that Towhee encapsulates as operators. For the video copy detection task, we recommend `ISC`, a pre-trained model that works well for this kind of task. We also support the following models from timm:

	'isc',
	'gmixer_24_224',
	'resmlp_12_224',
	'coat_lite_mini',
	'deit_small_patch16_224',
	'pit_xs_224',
	'convit_small',
	'tnt_s_patch16_224',
	'pit_ti_224',
	'resmlp_36_distilled_224',
	'convit_tiny',
	'coat_lite_small',
	'coat_lite_tiny',
	'deit_tiny_patch16_224',
	'cait_xxs24_224',
	'cait_s24_224',
	'cait_xxs36_224',
	'vit_small_patch32_224',
	'vit_small_patch32_384',
	'vit_small_r26_s32_224',
	'vit_small_patch16_224'.

	Note that the vector dimension of the Milvus collection created earlier must match the output dimension of the embedding model (e.g. 384 for `gmixer_24_224`, 256 for `ISC`).


- kv storage: Check [Towhee kv storage](https://towhee.io/kvstorage) for the available kv databases. To run the pipeline on a large dataset, we recommend `hbase`; otherwise `leveldb` is enough. If you choose leveldb as the kv database, delete the pipeline with `del` after running it to release the leveldb lock.
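
For the dimension note in the first option above, a quick check before creating the collection can catch a mismatch early. The sketch below uses a hypothetical lookup table that contains only the two dimensions quoted in that note; `check_embedding_dim` is an illustrative helper, not part of Towhee or pymilvus.

```python
import numpy as np

# Dimensions quoted in the note above; extend this hypothetical table for other models.
EMBEDDING_DIMS = {'isc': 256, 'gmixer_24_224': 384}

def check_embedding_dim(embedding, model_name):
    """Return the expected dimension, or raise if the embedding does not match it."""
    expected = EMBEDDING_DIMS[model_name]
    actual = np.asarray(embedding).shape[-1]
    if actual != expected:
        raise ValueError(
            f'{model_name} should produce {expected}-d vectors, got {actual}'
        )
    return expected

dim = check_embedding_dim(np.zeros(256), 'isc')
print(dim)  # 256
```

The returned value can then be passed as the `dim` argument when the collection is created.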

### Towhee Built-in Pipeline

For users' convenience, Towhee encapsulates several built-in pipelines, including `video_embedding` and `video_copy_detection`, so you can create and run the pipelines above in a few lines of code.

In [1]:
import glob
from towhee import AutoPipes, AutoConfig

emb_conf = AutoConfig.load_config('video_embedding')
emb_conf.collection = 'video_copy_detection'
# emb_conf.leveldb_path = 'url_vec.db'
emb_conf.hbase_table = 'video_copy_detection'
emb_conf.device = 0
emb_pipe = AutoPipes.pipeline('video_embedding', emb_conf)

path = glob.glob('VCSL-demo-en/*/*')

for i in path:
    result = emb_pipe(i)

del emb_pipe

In [6]:
import glob
from pymilvus import connections, Collection
from towhee.datacollection import DataCollection

connections.connect(host='127.0.0.1', port='19530')
Collection('video_copy_detection').load()

search_conf = AutoConfig.load_config('video_copy_detection')
search_conf.collection = 'video_copy_detection'
# search_conf.leveldb_path = 'url_vec.db'
search_conf.hbase_table = 'video_copy_detection'
search_conf.topk = 5
search_conf.device = 0
search_pipe = AutoPipes.pipeline('video_copy_detection', search_conf)

path = glob.glob('VCSL-demo-en/madongmei/*')

for i in path:
    result = search_pipe(i)
    DataCollection(result).show()

del search_pipe

url,candidates,similar_segment,segment_score
VCSL-demo-en/madongmei/ad244c924f31461a9d809c77ae251ac1-the_classic_dialogue_what_is_Ma_Mei-1y7411n7y1.flv,VCSL-demo-en/madongmei/8ad81fc9fe0a47dbaab1b4cdc40bf07b-go_and_cool_off_divine_comedy_Ma_Dongmei-1t54y117JK.flv,"[[0, 1, 19, 25]] len=1",[0.6052339409672937] len=1
VCSL-demo-en/madongmei/ad244c924f31461a9d809c77ae251ac1-the_classic_dialogue_what_is_Ma_Mei-1y7411n7y1.flv,VCSL-demo-en/madongmei/ad244c924f31461a9d809c77ae251ac1-the_classic_dialogue_what_is_Ma_Mei-1y7411n7y1.flv,"[[0, 0, 52, 52]] len=1",[1.0192307784007146] len=1
VCSL-demo-en/madongmei/ad244c924f31461a9d809c77ae251ac1-the_classic_dialogue_what_is_Ma_Mei-1y7411n7y1.flv,VCSL-demo-en/madongmei/0640bd5d43d1499c962e275be6b804ef-Does_MaDongmei_live_here-1e64y1y799.flv,"[[0, 143, 22, 164]] len=1",[0.7498453827791436] len=1
VCSL-demo-en/madongmei/ad244c924f31461a9d809c77ae251ac1-the_classic_dialogue_what_is_Ma_Mei-1y7411n7y1.flv,VCSL-demo-en/animation/043bee7a71f347f18e8576bb1a01c86b-Langrisser-y3qAPvWnL18.mkv,,
VCSL-demo-en/madongmei/ad244c924f31461a9d809c77ae251ac1-the_classic_dialogue_what_is_Ma_Mei-1y7411n7y1.flv,VCSL-demo-en/the_wandering_earth/d62ce5becff14a0c9c7dab5eea6647dc_the_wandering_earth_Wu_Jing_became_teenage_idol-1qf4y1G7gM.flv,,


url,candidates,similar_segment,segment_score
VCSL-demo-en/madongmei/0640bd5d43d1499c962e275be6b804ef-Does_MaDongmei_live_here-1e64y1y799.flv,VCSL-demo-en/madongmei/0640bd5d43d1499c962e275be6b804ef-Does_MaDongmei_live_here-1e64y1y799.flv,"[[0, 0, 167, 167]] len=1",[1.0059880193122133] len=1
VCSL-demo-en/madongmei/0640bd5d43d1499c962e275be6b804ef-Does_MaDongmei_live_here-1e64y1y799.flv,VCSL-demo-en/madongmei/8ad81fc9fe0a47dbaab1b4cdc40bf07b-go_and_cool_off_divine_comedy_Ma_Dongmei-1t54y117JK.flv,"[[136, 0, 164, 25],[138, 43, 166, 79]] len=2","[0.45078762972129965,0.22795329638756812] len=2"
VCSL-demo-en/madongmei/0640bd5d43d1499c962e275be6b804ef-Does_MaDongmei_live_here-1e64y1y799.flv,VCSL-demo-en/madongmei/ad244c924f31461a9d809c77ae251ac1-the_classic_dialogue_what_is_Ma_Mei-1y7411n7y1.flv,"[[141, 0, 164, 21]] len=1",[0.6950498060746626] len=1
VCSL-demo-en/madongmei/0640bd5d43d1499c962e275be6b804ef-Does_MaDongmei_live_here-1e64y1y799.flv,VCSL-demo-en/animation/043bee7a71f347f18e8576bb1a01c86b-Langrisser-y3qAPvWnL18.mkv,,
VCSL-demo-en/madongmei/0640bd5d43d1499c962e275be6b804ef-Does_MaDongmei_live_here-1e64y1y799.flv,VCSL-demo-en/the_wandering_earth/ef65e0f662e646a88a13b6eddb640e48-News_Broadcast_on_Wandering_Earth-1xb411U7uE.flv,,


url,candidates,similar_segment,segment_score
VCSL-demo-en/madongmei/8ad81fc9fe0a47dbaab1b4cdc40bf07b-go_and_cool_off_divine_comedy_Ma_Dongmei-1t54y117JK.flv,VCSL-demo-en/madongmei/8ad81fc9fe0a47dbaab1b4cdc40bf07b-go_and_cool_off_divine_comedy_Ma_Dongmei-1t54y117JK.flv,"[[0, 0, 83, 83]] len=1",[1.0120482329862663] len=1
VCSL-demo-en/madongmei/8ad81fc9fe0a47dbaab1b4cdc40bf07b-go_and_cool_off_divine_comedy_Ma_Dongmei-1t54y117JK.flv,VCSL-demo-en/madongmei/0640bd5d43d1499c962e275be6b804ef-Does_MaDongmei_live_here-1e64y1y799.flv,"[[1, 144, 25, 164],[40, 150, 83, 166]] len=2","[0.609100111506202,0.3373360800541053] len=2"
VCSL-demo-en/madongmei/8ad81fc9fe0a47dbaab1b4cdc40bf07b-go_and_cool_off_divine_comedy_Ma_Dongmei-1t54y117JK.flv,VCSL-demo-en/madongmei/ad244c924f31461a9d809c77ae251ac1-the_classic_dialogue_what_is_Ma_Mei-1y7411n7y1.flv,"[[1, 1, 25, 19],[23, 0, 76, 23]] len=2","[0.6449159866287595,0.3358775534127888] len=2"
VCSL-demo-en/madongmei/8ad81fc9fe0a47dbaab1b4cdc40bf07b-go_and_cool_off_divine_comedy_Ma_Dongmei-1t54y117JK.flv,VCSL-demo-en/animation/043bee7a71f347f18e8576bb1a01c86b-Langrisser-y3qAPvWnL18.mkv,,
VCSL-demo-en/madongmei/8ad81fc9fe0a47dbaab1b4cdc40bf07b-go_and_cool_off_divine_comedy_Ma_Dongmei-1t54y117JK.flv,VCSL-demo-en/animation/1b4e048011714eab928650f64e6370a6-123-JqSBkvPsYBw.mkv,,


In [7]:
import shutil, os
from pathlib import Path

shutil.rmtree('VCSL-demo-en')
if Path('url_vec.db').exists():
    shutil.rmtree('url_vec.db')
os.remove('VCSL-demo-en.zip')