# Sponsor content detection in YouTube videos
## Transformers for binary text classification
This notebook tackles sponsored-content detection with a binary text classification model, built by fine-tuning a pre-trained DistilBERT model.

## Motivation
Two similar projects based on a BERT-type text classification model have been written about on the Internet. Unfortunately, in both instances the authors do not share details about the performance of their model. Instead, they use vague claims like "95% accuracy" without qualifying them in any meaningful way. What is more, the trained models in both instances demonstrably perform poorly on the downstream task of detecting sponsored segments, yet no exact numbers are reported.

We wanted to investigate how well a text classification model can perform on what is essentially a span extraction task.

In [7]:
import os
import sys

import numpy as np
import torch
from datasets import Dataset, IterableDataset, IterableDatasetDict, ClassLabel, load_dataset, load_from_disk, load_metric
from transformers import AutoTokenizer, AutoModelForSequenceClassification, TrainingArguments, Trainer, DataCollatorWithPadding
import pandas as pd
import pyarrow as pa

sys.path.append(os.path.dirname(os.path.realpath('..')))
from data_loader import load_examples_from_chunks, load_captions_from_chunks

os.environ["WANDB_DISABLED"] = "true"

# Prepare the data

Read the transcripts from the `data.N.json.gz` files and extract labeled examples with `load_examples_from_chunks`.

In [2]:
LABELS = {
    'content': 0,
    'sponsor': 1,
}

def load_examples(chunks=None):
    for example, label in load_examples_from_chunks(base_name='data', root_dir='./', chunks=chunks):
        yield example, LABELS[label]

def iterable_to_pandas(columns, iterable, max_length):
    from tqdm.auto import tqdm
    df = pd.DataFrame(columns=columns)
    for item in tqdm(iterable, total=max_length):
        df.loc[len(df)] = item
    
    return df

# Save prepared data to disk
The dataset returned by `load_examples_from_chunks` is much smaller than the original ~10 GiB dataset because it does not include full video transcripts. We read it entirely into a pandas `DataFrame`, which is easier to work with in memory, and then save it to disk for later use.

In [3]:
import itertools
for x in itertools.islice(load_examples(), 0, 50):
    print(x)

Opening ./data.1.json.gz for reading...


("a sponsor of this video I work with pet flow because I think they really do offer a way to make your life better and easier so needless to say I think you should get your dog food from pet flow the great thing about them is that you can go and you order your dog food one time and then it's just automatically there whenever you need it you just select how often you want to deliver they save you the hassle of having to drive to the store every week or two to get your dog food I love that they and you guys support content like this because I think it's so important now I'll have their link in the description along with a coupon code that will give you an awesome discount on your first order did you know that", 1)
("puppies and their parents by making a contribution of any amount you'd like to our patreon campaign setup automatic pet food delivery with Peplow I'll have a link in the description as well as a coupon code that'll give you a terrific discount on your first order see you guys

In [4]:
df = iterable_to_pandas(['text', 'label'], load_examples(range(1, 16)), 16 * 20_000)

  0%|          | 0/320000 [00:00<?, ?it/s]

Opening ./data.1.json.gz for reading...
Could not find a non-sponsored segment with the same duration as the sponsored segment for -9UVTcimhZY
Could not find a non-sponsored segment with the same duration as the sponsored segment for -qzgXC6ZF4s
Could not find a non-sponsored segment with the same duration as the sponsored segment for -VfozgRVt8E
Could not find a non-sponsored segment with the same duration as the sponsored segment for -XWMXrTfK4Q
Could not find a non-sponsored segment with the same duration as the sponsored segment for 04X5x4LDEDc
Could not find a non-sponsored segment with the same duration as the sponsored segment for 0CNdSMy2COs
Could not find a non-sponsored segment with the same duration as the sponsored segment for 0VSRMRh8fEs
Closed ./data.1.json.gz.
Opening ./data.2.json.gz for reading...
Could not find a non-sponsored segment with the same duration as the sponsored 

NOTE: The progress bar above counts up to 320,000 because that was a realistic upper bound on the number of samples we could extract from our dataset. The final dataset is smaller (216,242 examples, see the train/test split below). The red color is not an indicator of failure.

In [5]:
Dataset.from_pandas(df).remove_columns('__index_level_0__').save_to_disk('./classification-dataset')

# Read prepared data
Read the prepared dataset back with the 🤗 Datasets API and split it into a training set (80%) and a test set (20%).

In [4]:
raw_datasets = load_from_disk('./classification-dataset').train_test_split(test_size=0.2)
raw_datasets

DatasetDict({
    train: Dataset({
        features: ['text', 'label'],
        num_rows: 172993
    })
    test: Dataset({
        features: ['text', 'label'],
        num_rows: 43249
    })
})
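The split sizes follow from how `train_test_split` rounds. We assume here that it mirrors scikit-learn's rule: round the test share up and give the remainder to the training set. The following sketch checks that assumption against the row counts above. The `split_sizes` helper is hypothetical and exists only for illustration.

```python
import math
from typing import Tuple

def split_sizes(n_samples: int, test_size: float) -> Tuple[int, int]:
    """Return (n_train, n_test), assuming the test share is rounded up as in scikit-learn."""
    n_test = math.ceil(test_size * n_samples)
    return n_samples - n_test, n_test

total = 172_993 + 43_249  # train + test rows reported by the DatasetDict above
print(total, split_sizes(total, 0.2))  # 216242 (172993, 43249)
```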

In [5]:
raw_datasets['test'][:30]

{'text': ["my laptop that i used to record audio bricked itself so i'm recording in here hi welcome back twins you're probably thinking from the title of this video can he just say that twins weird amount and yes i can because if you didn't know i'm a twin now every time i've told somebody i'm a twin in my whole life i have to give a very important distinction right away i am not an identical twin i am a fraternal twin identical twins in my opinion and no hate to you guys fraternal twins for life i have to make that important distinction because the first time i tell somebody that i'm a twin their first thought is like oh is there another one of you creeping around like you do on the streets have i talked to you and thought it was you but it was your brother no you haven't uh this is my brother tony we look like brothers and we don't look alike sometimes people look at us and go i can i can kind of see it and what you're seeing is us being related if you can kind of see a resemblance b

In [6]:
# If we've arrived here, everything with the dataset is okay and it has been stored to disk. We
# can drop the in-memory `DataFrame` we constructed originally. 
df = None

# Tokenize inputs
Tokenize the dataset with the pre-trained tokenizer. Sequences are padded to the tokenizer's maximum length (512 tokens for DistilBERT) and truncated if they are longer.
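Fixed-length padding and truncation (`padding="max_length"`, `truncation=True`) can be illustrated on plain lists of token ids. This sketch is a simplification. The real tokenizer also inserts the special `[CLS]`/`[SEP]` tokens and preserves them when it truncates. The `pad_or_truncate` helper is hypothetical. Its pad id of 0 matches the `padding_idx=0` of DistilBERT's word embeddings.

```python
from typing import List, Tuple

def pad_or_truncate(ids: List[int], max_length: int, pad_id: int = 0) -> Tuple[List[int], List[int]]:
    """Return (input_ids, attention_mask), both exactly `max_length` long.

    Simplified illustration: special tokens are not handled here.
    """
    ids = ids[:max_length]             # truncation: drop everything past max_length
    mask = [1] * len(ids)              # 1 marks a real token the model should attend to
    padding = max_length - len(ids)
    # padding="max_length": fill with pad ids that the attention mask hides (0)
    return ids + [pad_id] * padding, mask + [0] * padding

print(pad_or_truncate([5, 6, 7], 5))           # ([5, 6, 7, 0, 0], [1, 1, 1, 0, 0])
print(pad_or_truncate(list(range(1, 8)), 5))   # ([1, 2, 3, 4, 5], [1, 1, 1, 1, 1])
```

Padding every sequence to the same length costs compute on short texts. In exchange, any batch of examples can be stacked directly into a tensor.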

In [14]:
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")

def tokenize_function(examples):
    return tokenizer(examples["text"], padding="max_length", truncation=True)

In [8]:
tokenized_datasets = raw_datasets.map(tokenize_function, batched=True)

  0%|          | 0/173 [00:00<?, ?ba/s]

  0%|          | 0/44 [00:00<?, ?ba/s]

In [9]:
cleaned_datasets = tokenized_datasets.remove_columns(['text'])
train_dataset = cleaned_datasets['train']
test_dataset = cleaned_datasets['test']

# Prepare for training
Set the training arguments, configure the evaluation metrics and create the `Trainer`.
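`compute_metrics` turns the logits into hard predictions with an argmax and then scores those predictions. For this binary task the positive class is `sponsor` (label 1), which we assume is the metrics' default positive label. Under that assumption, all three metrics reduce to simple counts. The following plain-Python sketch shows what they measure. The `binary_metrics` helper and the toy logits are made up for illustration.

```python
from typing import Dict, List, Sequence

def binary_metrics(logits: Sequence[Sequence[float]], labels: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision and recall with label 1 ('sponsor') as the positive class."""
    # argmax over the two logits of each example (first index wins on ties)
    predictions: List[int] = [max(range(len(row)), key=row.__getitem__) for row in logits]
    pairs = list(zip(predictions, labels))
    tp = sum(p == 1 and y == 1 for p, y in pairs)
    fp = sum(p == 1 and y == 0 for p, y in pairs)
    fn = sum(p == 0 and y == 1 for p, y in pairs)
    correct = sum(p == y for p, y in pairs)
    return {
        'accuracy': correct / len(pairs),
        'precision': tp / (tp + fp) if tp + fp else 0.0,  # of predicted sponsors, how many are right
        'recall': tp / (tp + fn) if tp + fn else 0.0,     # of actual sponsors, how many were found
    }

logits = [[2.0, -1.0], [0.1, 0.9], [-0.5, 1.5], [1.2, 0.3]]
labels = [0, 1, 1, 1]
print(binary_metrics(logits, labels))  # accuracy 0.75, precision 1.0, recall ~0.667
```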

In [10]:
torch.cuda.empty_cache()
model = AutoModelForSequenceClassification.from_pretrained("distilbert-base-uncased", num_labels=2)
training_args = TrainingArguments(
    output_dir="distilbert-classification-uncased", 
    per_device_train_batch_size=48, 
    per_device_eval_batch_size=48,
    save_total_limit=2, 
    save_strategy='epoch',
    evaluation_strategy='epoch')

accuracy_metric = load_metric("accuracy")
precision_metric = load_metric("precision")
recall_metric = load_metric("recall")

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)
    accuracy = accuracy_metric.compute(predictions=predictions, references=labels)
    precision = precision_metric.compute(predictions=predictions, references=labels)
    recall = recall_metric.compute(predictions=predictions, references=labels)
    return {**accuracy, **precision, **recall}

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=test_dataset,
    compute_metrics=compute_metrics
)

Some weights of the model checkpoint at distilbert-base-uncased were not used when initializing DistilBertForSequenceClassification: ['vocab_layer_norm.bias', 'vocab_transform.bias', 'vocab_projector.bias', 'vocab_projector.weight', 'vocab_transform.weight', 'vocab_layer_norm.weight']
- This IS expected if you are initializing DistilBertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing DistilBertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['pre_classifier.bias', 'classifier.bias', 'pre_classifi

# Train the model ⚡
We train for the default number of epochs (3), evaluating on the test split and saving a checkpoint after every epoch. The model performs very well on all metrics on the test split from the first epoch on. So instead of training longer, we pick the checkpoint at which training and validation loss are still comparable. At that point there is not much over- or under-fitting, and the model is unlikely to learn much more.
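The number of optimization steps in the training log below follows from the training-set size and the batch size. There is one step per batch, and the last partial batch counts as a full step. This also explains the checkpoint names `checkpoint-3605`, `checkpoint-7210` and `checkpoint-10815`. The sketch assumes a single device and no gradient accumulation, as the log reports.

```python
import math

num_examples = 172_993   # training rows
batch_size = 48          # per_device_train_batch_size on a single device
num_epochs = 3           # Trainer default

steps_per_epoch = math.ceil(num_examples / batch_size)
checkpoint_steps = [steps_per_epoch * (epoch + 1) for epoch in range(num_epochs)]
print(steps_per_epoch, checkpoint_steps)  # 3605 [3605, 7210, 10815]
```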

In [11]:
trainer.train()

***** Running training *****
  Num examples = 172993
  Num Epochs = 3
  Instantaneous batch size per device = 48
  Total train batch size (w. parallel, distributed & accumulation) = 48
  Gradient Accumulation steps = 1
  Total optimization steps = 10815


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall
1,0.1408,0.131712,0.951745,0.962096,0.940169
2,0.0956,0.137763,0.95512,0.957821,0.95182
3,0.0509,0.155389,0.956762,0.966651,0.945832


***** Running Evaluation *****
  Num examples = 43249
  Batch size = 48
Saving model checkpoint to distilbert-classification-uncased/checkpoint-3605
Configuration saved in distilbert-classification-uncased/checkpoint-3605/config.json
Model weights saved in distilbert-classification-uncased/checkpoint-3605/pytorch_model.bin
***** Running Evaluation *****
  Num examples = 43249
  Batch size = 48
Saving model checkpoint to distilbert-classification-uncased/checkpoint-7210
Configuration saved in distilbert-classification-uncased/checkpoint-7210/config.json
Model weights saved in distilbert-classification-uncased/checkpoint-7210/pytorch_model.bin
***** Running Evaluation *****
  Num examples = 43249
  Batch size = 48
Saving model checkpoint to distilbert-classification-uncased/checkpoint-10815
Configuration saved in distilbert-classification-uncased/checkpoint-10815/config.json
Model weights saved in distilbert-classification-uncased/checkpoint-10815/pytorch_model.bin
Deleting older checkpo

TrainOutput(global_step=10815, training_loss=0.10587699262290104, metrics={'train_runtime': 4844.9158, 'train_samples_per_second': 107.118, 'train_steps_per_second': 2.232, 'total_flos': 6.874779808709222e+16, 'train_loss': 0.10587699262290104, 'epoch': 3.0})

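| Epoch | Training loss | Validation loss | Accuracy | Precision | Recall |
|------:|--------------:|----------------:|---------:|----------:|-------:|
| 1 | 0.140800 | 0.131712 | 0.951745 | 0.962096 | 0.940169 |
| 2 | 0.095600 | 0.137763 | 0.955120 | 0.957821 | 0.951820 |
| 3 | 0.050900 | 0.155389 | 0.956762 | 0.966651 | 0.945832 |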
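We chose the checkpoint saved after epoch 2 (`checkpoint-7210`) because epoch 3 seems to overfit the training set. From epoch 2 to epoch 3 the training loss keeps dropping (from 0.0956 to 0.0509) while the validation loss keeps rising (from 0.1378 to 0.1554), even though validation accuracy still improves slightly. Epoch 2 also has the highest recall (0.9518).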
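The overfitting argument can be made concrete by computing the gap between validation and training loss for each epoch, using the values from the table above. This is only a rough diagnostic, not a formal model-selection rule.

```python
# (epoch, training loss, validation loss) copied from the training table
history = [
    (1, 0.140800, 0.131712),
    (2, 0.095600, 0.137763),
    (3, 0.050900, 0.155389),
]

# A widening gap between validation and training loss is a typical sign of overfitting.
gaps = {epoch: val_loss - train_loss for epoch, train_loss, val_loss in history}
for epoch, gap in gaps.items():
    print(f'epoch {epoch}: validation - training loss = {gap:+.4f}')
# epoch 1: -0.0091, epoch 2: +0.0422, epoch 3: +0.1045
```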

In [9]:
model = None
trainer = None
trained = None
torch.cuda.empty_cache()

def softmax_outputs(outputs) -> list:
    """Return the class probabilities [p_content, p_sponsor] of the first example in `outputs`."""
    return torch.nn.functional.softmax(outputs.logits, dim=-1)[0].tolist()

trained = AutoModelForSequenceClassification.from_pretrained('./distilbert-classification-uncased/checkpoint-7210')
trained.to('cuda')

DistilBertForSequenceClassification(
  (distilbert): DistilBertModel(
    (embeddings): Embeddings(
      (word_embeddings): Embedding(30522, 768, padding_idx=0)
      (position_embeddings): Embedding(512, 768)
      (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
      (dropout): Dropout(p=0.1, inplace=False)
    )
    (transformer): Transformer(
      (layer): ModuleList(
        (0): TransformerBlock(
          (attention): MultiHeadSelfAttention(
            (dropout): Dropout(p=0.1, inplace=False)
            (q_lin): Linear(in_features=768, out_features=768, bias=True)
            (k_lin): Linear(in_features=768, out_features=768, bias=True)
            (v_lin): Linear(in_features=768, out_features=768, bias=True)
            (out_lin): Linear(in_features=768, out_features=768, bias=True)
          )
          (sa_layer_norm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
          (ffn): FFN(
            (dropout): Dropout(p=0.1, inplace=False)
       

# Evaluate
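Evaluate the fine-tuned model on held-out videos from `data.16.json.gz` (chunk 16 was not used for training) and find the best `window_duration`.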
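The evaluation below turns window classification into span extraction. It proceeds in three steps:

1. Cut the captions into consecutive time windows.
2. Classify the text of each window.
3. Merge adjacent windows that carry the same label into spans.

The following self-contained sketch mimics that flow. Several pieces are hypothetical: the `Caption` tuple, the keyword-based `toy_classifier` (a stand-in for the fine-tuned model) and the helper names. In this simplified version a window never exceeds `duration`, and the session grouping done by `merge_predictions` is skipped.

```python
from typing import Callable, Iterator, List, NamedTuple, Sequence, Tuple

class Caption(NamedTuple):
    """Hypothetical caption record; the real one comes from data_loader."""
    start: float
    end: float
    text: str

def windows(captions: Sequence[Caption], duration: float) -> Iterator[List[Caption]]:
    """Consecutive, non-overlapping windows that span at most `duration` seconds each."""
    current: List[Caption] = []
    for caption in captions:
        if current and caption.end - current[0].start > duration:
            yield current
            current = []
        current.append(caption)
    if current:
        yield current

def toy_classifier(text: str) -> str:
    """Stand-in for the fine-tuned model: flags windows mentioning typical ad phrases."""
    return 'sponsor' if any(w in text for w in ('sponsor', 'coupon', 'discount')) else 'content'

def detect_spans(captions: Sequence[Caption], duration: float,
                 classify: Callable[[str], str]) -> List[Tuple[float, float, str]]:
    spans: List[Tuple[float, float, str]] = []
    for window in windows(captions, duration):
        label = classify(' '.join(c.text for c in window))
        if spans and spans[-1][2] == label:
            # Same label as the previous window: extend that span instead of starting a new one.
            spans[-1] = (spans[-1][0], window[-1].end, label)
        else:
            spans.append((window[0].start, window[-1].end, label))
    return spans

captions = [
    Caption(0, 4, 'welcome back to the channel'),
    Caption(4, 8, 'this video is sponsored by'),
    Caption(8, 12, 'use my coupon code for a discount'),
    Caption(12, 16, 'now back to the build'),
]
print(detect_spans(captions, 4, toy_classifier))
# [(0, 4, 'content'), (4, 12, 'sponsor'), (12, 16, 'content')]
```

With a longer `duration` the windows get coarser. For example, with `duration=8` the sponsor mention and the ordinary content before and after it end up in the same windows. This is the trade-off that the `window_duration` sweep below explores.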

In [10]:
import itertools
from collections import defaultdict

from data_loader import Caption, load_captions_from_chunks, segment_text, get_intersection_range

def caption_times(c):
    return c.start, c.end

def prediction_times(p):
    return tuple(p[0])

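def tumbling_time_window(captions, duration, key=caption_times):
    """Split consecutive items into non-overlapping windows of roughly `duration` seconds.

    A window is closed once it already spans more than `duration` seconds, so the
    last item added to a window may make it overshoot slightly.
    """
    captions_iter = iter(captions)
    first = next(captions_iter, None)
    if first is None:
        return
    # Start the first window with the first item; the loop continues from the second
    # item, so the first caption is not added twice.
    results = [first]
    for caption in captions_iter:
        if key(results[-1])[1] - key(results[0])[0] <= duration:
            results.append(caption)
        else:
            yield results
            results = [caption]

    yield results
    
def session_time_window(captions, duration, key=caption_times):
    """Group consecutive items whose gap to the previous item is at most `duration` seconds."""
    captions_iter = iter(captions)
    first = next(captions_iter, None)
    if first is None:
        return
    results = [first]
    for caption in captions_iter:
        # Gap = start of this item minus end of the previous one.
        if key(caption)[0] - key(results[-1])[1] <= duration:
            results.append(caption)
        else:
            yield results
            results = [caption]

    yield results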

def batch(iterable, n):
    length = len(iterable)
    for i in range(0, length, n):
        yield iterable[i:min(i + n, length)]
        
def decode_label(outputs):
    """Return (label, probability) of the most likely class; ties go to 'sponsor'."""
    content, sponsor = outputs
    if sponsor >= content:
        return 'sponsor', sponsor
    return 'content', content
        
def predict_in_batches(texts, batch_size: int = 8):
    """Yield [p_content, p_sponsor] for every text, running the model on `batch_size` texts at a time."""
    for b in batch(texts, batch_size):
        # Every sequence is padded to max_length, so the batch stacks into one tensor.
        tokenized = tokenize_function({'text': list(b)})
        inputs = {k: torch.tensor(v).cuda() for k, v in tokenized.items()}
        # Inference only: don't build the autograd graph, which saves GPU memory.
        with torch.no_grad():
            outputs = trained(**inputs)
        yield from torch.nn.functional.softmax(outputs.logits, dim=-1).tolist()
        
def predict_sponsor_segments(captions, window_duration=10):
    windows = list(tumbling_time_window(captions, window_duration))
    window_texts = [segment_text(window) for window in windows]
    predictions = predict_in_batches(window_texts, 4)
    
    for window, text, prediction in zip(windows, window_texts, predictions):
        yield [window[0].start, window[-1].end], text, *decode_label(prediction)
        
def merge_prediction_(predictions):
    assert len(set((label for _, _, label, _ in predictions))) == 1
    # All co-occurring predictions have the same label so we merge them
    merged_start, merged_end = predictions[0][0][0], predictions[-1][0][1]
    merged_text = ' '.join((text for _, text, _, _ in predictions))
    # Don't know what the correct way to compute the joint probability here is,
    # just assume they are independent; We don't really use this number anywhere
    prob = np.prod([prob for _, _, _, prob in predictions])
    return [merged_start, merged_end], merged_text, predictions[0][2], prob

def merge_predictions(predictions, within_duration=5):
    """Group predictions with `session_time_window`, then merge runs of consecutive predictions with the same label."""
    for co_occurring in session_time_window(predictions, within_duration, key=prediction_times):
        merged = [co_occurring[0]]
        for times, text, label, prob in co_occurring[1:]:
            _, _, prev_label, _ = merged[0]
            if label == prev_label:
                merged.append((times, text, label, prob))
            else:
                # The label changed: emit the current run and start a new one.
                yield merge_prediction_(merged)
                merged = [(times, text, label, prob)]
        
        if len(merged) > 0:
            yield merge_prediction_(merged)
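`merge_predictions` works in two stages. It first groups predictions into sessions, where a gap is measured from the end of one item to the start of the next. It then splits each session into runs with the same label. The following standalone sketch applies the same two-stage idea to plain `(start, end, label)` tuples. The names `sessions` and `merge` and the toy predictions are hypothetical. Merged probabilities are left out.

```python
from itertools import groupby
from typing import Iterable, Iterator, List, Tuple

Prediction = Tuple[float, float, str]  # (start, end, label)

def sessions(items: Iterable[Prediction], gap: float) -> Iterator[List[Prediction]]:
    """Group items whose start is at most `gap` seconds after the previous item's end."""
    group: List[Prediction] = []
    for item in items:
        if group and item[0] - group[-1][1] > gap:
            yield group
            group = []
        group.append(item)
    if group:
        yield group

def merge(items: Iterable[Prediction], gap: float) -> List[Prediction]:
    """Merge consecutive same-label predictions inside each session into one span."""
    merged: List[Prediction] = []
    for group in sessions(items, gap):
        # groupby splits a session into runs of identical labels.
        for label, run in groupby(group, key=lambda p: p[2]):
            run_items = list(run)
            merged.append((run_items[0][0], run_items[-1][1], label))
    return merged

predictions = [
    (0, 10, 'content'), (10, 20, 'sponsor'), (20, 30, 'sponsor'),
    (50, 60, 'sponsor'), (60, 70, 'content'),
]
print(merge(predictions, gap=5))
# [(0, 10, 'content'), (10, 30, 'sponsor'), (50, 60, 'sponsor'), (60, 70, 'content')]
```

The two sponsor runs stay separate because a 20-second gap lies between them, which exceeds the 5-second session gap.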
        

In [11]:
import itertools

def range_equals(left: 'Tuple[float, float]', right: 'Tuple[float, float]', eps: float) -> bool:
    left_start, left_end = left
    right_start, right_end = right
    
    return (abs(left_start - right_start) <= eps
        and abs(left_end - right_end) <= eps)

def count_range_equals(pairs, eps: float) -> int:
    cnt = 0
    for left, right in pairs:
        if range_equals(left, right, eps):
            cnt += 1
    return cnt

assert range_equals([0, 5], [0, 5], eps=0)
assert range_equals([1, 6], [0, 5], eps=1)
assert range_equals([-1, 4], [0, 5], eps=1)
assert not range_equals([-2, 4], [0, 5], eps=1)
assert not range_equals([1, 7], [0, 5], eps=1)

def range_negation(base: 'Tuple[float, float]', ranges: 'List[Tuple[float, float]]') -> 'List[Tuple[float, float]]':
    """
    base:    |-------------|
    ranges:  | ***   **    |
    Return:  |#   ###  ####|
    """
    results = []
    last_end = base[0]
    for r in ranges:
        if last_end != r[0]:
            results.append((last_end, r[0]))
        last_end = r[1]
    if last_end != base[1]:
        results.append((last_end, base[1]))
        
    return results
    
assert range_negation((2, 10), [(3,4), (5, 6)]) == [(2, 3), (4, 5), (6, 10)]
assert range_negation((2, 6), [(3,4), (5, 6)]) == [(2, 3), (4, 5)]
assert range_negation((3, 6), [(3,4), (5, 6)]) == [(4, 5)]

In [25]:
from termcolor import colored

def create_labels_from_range(captions, sponsor_ranges):
    caption_labels = np.zeros(len(captions), dtype=bool)
    for start_idx, end_idx in sponsor_ranges:
        if start_idx is None or end_idx is None:
            continue
        for i in range(start_idx, end_idx + 1):
            caption_labels[i] = True

    token_labels = []
    for i, caption in enumerate(captions):
        num_tokens = len(caption.text.split())
        token_labels.extend([caption_labels[i]] * num_tokens)
    return token_labels

def create_labels_from_times(captions, sponsor_times):
    ranges = [get_intersection_range(captions, *pair[1]) for pair in sponsor_times]
    return create_labels_from_range(captions, ranges)

def evaluate(videos, eps=5, window_duration=10):
    from tqdm.auto import tqdm
    
    predicted_labels = np.empty(0)
    actual_labels = np.empty(0)
    # Values for our close-match metric (exact match within an `eps`-second threshold)
    # Number of predicted sponsor ranges that closely match an actual sponsor range
    close_matches = 0
    # Number of predicted sponsor ranges
    total_predicted_ranges = 0
    
    for video_id, captions, sponsor_ranges in tqdm(videos):
        print(colored(f'{video_id} {sponsor_ranges}', None, 'on_magenta'))
        sponsor_times = [(captions[start].start, captions[end].end) for start, end in sponsor_ranges]
        predicted_sponsor_times = []

        for times, text, label, prob in merge_predictions(predict_sponsor_segments(captions, window_duration), window_duration):
            color = { 'sponsor': 'yellow', 'content': None }[label]
            # print(colored(f'{int(prob * 100)}% {times[0]} <--> {times[1]} {text}', color=color))

            if label != 'sponsor':
                continue
            predicted_sponsor_times.append((f'{int(prob * 100)}%', times))

            # Only predicted sponsor ranges count towards the close-match score.
            if any(range_equals(times, actual_times, eps) for actual_times in sponsor_times):
                close_matches += 1
            total_predicted_ranges += 1

        predicted_sponsor_ranges = [get_intersection_range(captions, *pair[1]) for pair in predicted_sponsor_times]
        predicted_labels = np.append(predicted_labels, create_labels_from_range(captions, predicted_sponsor_ranges))
        actual_labels = np.append(actual_labels, create_labels_from_range(captions, sponsor_ranges))
        
        print(f'\tPredicted={predicted_sponsor_ranges},\n\tExpected={sponsor_ranges}')
    
    from sklearn.metrics import confusion_matrix, accuracy_score, precision_score, recall_score, precision_recall_curve, roc_curve
    
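    # Guard against the case where no sponsor range was predicted at all.
    close_match_score = close_matches / total_predicted_ranges if total_predicted_ranges else 0.0
    print(f'Exact match (with {eps}s threshold)', close_match_score)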
    print('Confusion matrix', confusion_matrix(actual_labels, predicted_labels))
    print('Accuracy', accuracy_score(actual_labels, predicted_labels))
    print('Precision', precision_score(actual_labels, predicted_labels))
    print('Recall', recall_score(actual_labels, predicted_labels))
    print('P@R', precision_recall_curve(actual_labels, predicted_labels))
    print('RoC', roc_curve(actual_labels, predicted_labels))
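The token-level metrics in `evaluate` compare per-word labels. Every word of a caption inherits its caption's sponsor/content label, so long captions weigh more than short ones. Ranges that could not be mapped onto the captions, shown as `(None, None)`, are skipped. The following stdlib-only sketch illustrates the idea. The helper names and toy captions are illustrative; the notebook itself uses `create_labels_from_range` and scikit-learn.

```python
from typing import List, Optional, Sequence, Tuple

CaptionRange = Tuple[Optional[int], Optional[int]]  # inclusive caption indices

def word_labels(captions: Sequence[str], ranges: Sequence[CaptionRange]) -> List[bool]:
    """One boolean per word: True if the word's caption lies inside any range."""
    flagged = [False] * len(captions)
    for start, end in ranges:
        if start is None or end is None:
            continue  # range that could not be mapped onto the captions
        for i in range(start, end + 1):
            flagged[i] = True
    return [flagged[i] for i, text in enumerate(captions) for _ in text.split()]

def precision_recall(actual: Sequence[bool], predicted: Sequence[bool]) -> Tuple[float, float]:
    tp = sum(a and p for a, p in zip(actual, predicted))
    fp = sum(p and not a for a, p in zip(actual, predicted))
    fn = sum(a and not p for a, p in zip(actual, predicted))
    return (tp / (tp + fp) if tp + fp else 0.0, tp / (tp + fn) if tp + fn else 0.0)

captions = ['hi everyone', 'thanks to our sponsor', 'use code save ten', 'back now']
actual = word_labels(captions, [(1, 2)])
predicted = word_labels(captions, [(2, 3), (None, None)])
print(precision_recall(actual, predicted))  # precision 4/6, recall 0.5
```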

In [31]:
test_videos = list(itertools.islice(load_captions_from_chunks('data', './', [16]), 1, 200))

Opening ./data.16.json.gz for reading...


Dropping YzgTMh21zhI because sponsor times do not match the captions
Dropping yzhnRt6ZDKM because sponsor times do not match the captions
Dropping YZhVE7X0zwk because sponsor times do not match the captions
Dropping yzJokj2gelY because sponsor times do not match the captions
Dropping YzMAxdSdkzo because sponsor times do not match the captions
Dropping YZMrBCxarlk because sponsor times do not match the captions
Dropping YzreeM882Yw because sponsor times do not match the captions
Dropping YZRxX1XYgoQ because sponsor times do not match the captions
Dropping yZx29Zv6H74 because sponsor times do not match the captions
Dropping YZxRPdlozIA because sponsor times do not match the captions
Dropping yZZy57v-e1o because sponsor times do not match the captions
Dropping y_j04hL_-88 because sponsor times do not match the captions
Dropping z-0cQSmSZvA because sponsor times do not match the captions
Dropping Z-U_eZaTZ6k because sponsor times do not match the captions
Dropping z011vxNsnaM because spons

In [32]:
evaluate(
    videos=test_videos,
    window_duration=5,
    eps=5,
)

  0%|          | 0/199 [00:00<?, ?it/s]

YzGt5abcfbA [[21, 46]]
	Predicted=[(38, 40), (230, 236), (352, 355)],
	Expected=[[21, 46]]
YZhRjPrNY4k [[267, 300]]
	Predicted=[(265, 268), (276, 278), (287, 297), (337, 339), (372, 374), (396, 398)],
	Expected=[[267, 300]]
Yzi6lQjLjQ8 [[14, 41]]
	Predicted=[(14, 16), (37, 39), (292, 294), (334, 334)],
	Expected=[[14, 41]]
YzIdXOeqXig [[187, 231]]
	Predicted=[(44, 44), (189, 191), (194, 196), (245, 246), (255, 259), (292, 298)],
	Expected=[[187, 231]]
yZIT6ZtsNxI [[156, 196]]
	Predicted=[(124, 125), (176, 179), (184, 186), (189, 190), (225, 226), (268, 271), (372, 376), (435, 436), (596, 599), (839, 840), (959, 963), (1009, 1012), (1148, 1151), (1206, 1207), (1258, 1262)],
	Expected=[[156, 196]]
yzj6iuYIQVM [[145, 194]]
	Predicted=[(0, 5), (145, 148), (155, 166), (184, 193), (543, 546)],
	Expected=[[145, 194]]
YZJjFsAxN7s [[9, 32]]
	Predicted=[(11, 18), (26, 30), (87, 89), (192, 193)],
	Expected=[[9, 32]]
yZk-w1-5I2s [

	Predicted=[(144, 144)],
	Expected=[[185, 185]]
yZ_AbZtHjh0 [[0, 1], [374, 428]]
	Predicted=[(0, 2), (72, 75), (82, 82), (174, 174), (215, 222), (284, 285), (394, 396), (400, 402), (407, 409), (414, 416)],
	Expected=[[0, 1], [374, 428]]
Y_-Zab2OPhI [[38, 44]]
	Predicted=[(41, 42), (74, 74), (131, 132), (217, 218), (230, 231), (389, 392), (440, 441), (None, None)],
	Expected=[[38, 44]]
y_1Kg45APko [[4, 9]]
	Predicted=[(16, 18), (29, 30), (50, 50), (100, 101)],
	Expected=[[4, 9]]
y_1MJIQfhyk [[12, 49]]
	Predicted=[(3, 3), (6, 6), (11, 13), (18, 24), (35, 37), (41, 46), (88, 89), (122, 122)],
	Expected=[[12, 49]]
Y_1uPVoRdV8 [[0, 51]]
	Predicted=[(35, 37), (42, 51), (136, 139), (238, 240), (358, 363)],
	Expected=[[0, 51]]
y_3leYr24gs [[0, 5]]
	Predicted=[(0, 2), (52, 54), (71, 73), (77, 79), (87, 89), (107, 119), (180, 181), (193, 196), (209, 211), (238, 241), (256, 258), (274, 276), (283, 285), (289, 291), (312, 314)],
	Expected=[[0, 

	Predicted=[(0, 2)],
	Expected=[[1, 7]]
z-0IS-5eg3w [[28, 31], [349, 393]]
	Predicted=[(19, 22), (26, 32), (136, 138), (251, 253), (350, 352), (379, 384), (388, 393), (722, 725)],
	Expected=[[28, 31], [349, 393]]
Z-4qNCRDVyU [[0, 1], [138, 156]]
	Predicted=[(0, 2), (88, 89), (96, 98), (140, 142), (149, 150), (153, 154), (174, 175)],
	Expected=[[0, 1], [138, 156]]
z-81cV2GCmw [[8, 23]]
	Predicted=[(6, 10), (20, 21), (78, 80), (200, 202), (303, 304)],
	Expected=[[8, 23]]
z-b42cr85Bs [[0, 37]]
	Predicted=[(0, 2), (33, 35), (110, 111), (295, 298), (361, 363), (381, 383), (434, 436), (454, 457), (490, 492)],
	Expected=[[0, 37]]
z-BPTK2Z5qA [[4, 9]]
	Predicted=[(4, 7), (106, 108), (291, 291)],
	Expected=[[4, 9]]
Z-BSAcAxCpM [[0, 32]]
	Predicted=[(0, 5), (23, 32), (72, 73), (115, 117)],
	Expected=[[0, 32]]
Z-bTL7oN6B4 [[121, 142]]
	Predicted=[(11, 13), (20, 24), (139, 141), (380, 381)],
	Expected=[[121, 142]]
Z-cD4N0RI7w [[0,

	Predicted=[(8, 9), (72, 73), (236, 238)],
	Expected=[[72, 85], [115, 123]]
z04lXKvqAtg [[5, 7]]
	Predicted=[(2, 5), (35, 36), (193, 193), (304, 304), (312, 312), (381, 382), (765, 779)],
	Expected=[[5, 7]]
z051wqI1Zs4 [[3, 47]]
	Predicted=[(2, 5), (9, 19), (26, 28), (37, 39), (46, 49), (54, 56)],
	Expected=[[3, 47]]
z05kOWxDJfE [[36, 59]]
	Predicted=[(0, 4), (35, 43), (49, 51), (180, 190), (215, 217)],
	Expected=[[36, 59]]
z05uXZS0r6E [[234, 253]]
	Predicted=[(241, 243), (251, 253), (267, 269), (280, 283)],
	Expected=[[234, 253]]
z079fedyx7Y [[6, 6], [535, 540]]
	Predicted=[(58, 59), (64, 69), (75, 80), (84, 88), (495, 496), (535, 538)],
	Expected=[[6, 6], [535, 540]]
z08z8IfkLh4 [[0, 2], [39, 106]]
	Predicted=[(0, 3), (40, 43), (48, 53), (68, 71), (79, 80), (83, 85), (89, 91), (96, 99), (103, 106), (451, 453), (458, 460)],
	Expected=[[0, 2], [39, 106]]
Z0A5AESm2ow [[24, 34]]
	Predicted=[(22, 24), (29, 31), (48, 48), (54, 

In [33]:
evaluate(
    videos=test_videos,
    window_duration=10,
    eps=5,
)

  0%|          | 0/199 [00:00<?, ?it/s]

YzGt5abcfbA [[21, 46]]
	Predicted=[(26, 39), (230, 235), (351, 357), (380, 386)],
	Expected=[[21, 46]]
YZhRjPrNY4k [[267, 300]]
	Predicted=[(264, 270), (283, 295), (334, 339)],
	Expected=[[267, 300]]
Yzi6lQjLjQ8 [[14, 41]]
	Predicted=[(13, 17), (33, 38), (56, 58), (305, 305)],
	Expected=[[14, 41]]
YzIdXOeqXig [[187, 231]]
	Predicted=[(40, 44), (51, 56), (187, 192), (256, 259), (292, 297)],
	Expected=[[187, 231]]
yZIT6ZtsNxI [[156, 196]]
	Predicted=[(183, 191), (435, 437), (596, 599), (None, None)],
	Expected=[[156, 196]]
yzj6iuYIQVM [[145, 194]]
	Predicted=[(0, 10), (147, 153), (161, 167), (181, 193)],
	Expected=[[145, 194]]
YZJjFsAxN7s [[9, 32]]
	Predicted=[(9, 27), (91, 96)],
	Expected=[[9, 32]]
yZk-w1-5I2s [[0, 0]]
	Predicted=[],
	Expected=[[0, 0]]
yZk4a4Xx9FE [[0, 5], [97, 125]]
	Predicted=[(0, 4), (101, 115), (124, 128), (346, 350), (None, None)],
	Expected=[[0, 5], [97, 125]]
YZkXDuKto_Y [[20, 5

	Predicted=[(9, 19), (106, 106), (153, 154)],
	Expected=[[0, 22]]
y_fVUfzMw6o [[0, 110]]
	Predicted=[(0, 5), (13, 22), (33, 43), (49, 52), (65, 81), (87, 91), (752, 764), (1088, 1091), (1736, 1741), (3197, 3204)],
	Expected=[[0, 110]]
Y_GF69zGui4 [[0, 70]]
	Predicted=[(4, 9), (48, 69), (107, 107), (319, 320), (430, 432), (477, 479), (496, 496), (756, 760), (783, 785), (921, 921)],
	Expected=[[0, 70]]
Y_H5ofTzki8 [[9, 13]]
	Predicted=[(10, 13), (None, None)],
	Expected=[[9, 13]]
Y_hcz8CX9hA [[174, 212]]
	Predicted=[(0, 3), (173, 176), (206, 211), (255, 258), (339, 343)],
	Expected=[[174, 212]]
y_hdDtNHTRs [[0, 1]]
	Predicted=[(220, 222)],
	Expected=[[0, 1]]
Y_hy8ZB81L8 [[75, 112]]
	Predicted=[(81, 85), (236, 242)],
	Expected=[[75, 112]]
y_jw38QD5qY [[135, 157]]
	Predicted=[(0, 5), (137, 156)],
	Expected=[[135, 157]]
Y_K00erN1mA [[275, 313]]
	Predicted=[(279, 285), (294, 312)],
	Expected=[[275, 313]]
y_k0KZ2dQeM

	Predicted=[(10, 12), (108, 109), (131, 138), (None, None)],
	Expected=[[21, 33]]
Z-R4H-INsUY [[0, 9], [582, 628]]
	Predicted=[(7, 14), (367, 372), (568, 571), (590, 595), (602, 609)],
	Expected=[[0, 9], [582, 628]]
Z-rRAlexoeo [[0, 3], [797, 885]]
	Predicted=[(0, 4), (285, 289), (372, 376), (397, 410), (500, 505), (733, 738), (823, 885), (904, 909), (1006, 1011), (1029, 1033)],
	Expected=[[0, 3], [797, 885]]
z-SIiaTBv34 [[137, 141]]
	Predicted=[(79, 81)],
	Expected=[[137, 141]]
Z-SZFKM5gzo [[105, 142]]
	Predicted=[(103, 108), (138, 143)],
	Expected=[[105, 142]]
Z-tMp5-33k0 [[295, 295]]
	Predicted=[(126, 128), (136, 140)],
	Expected=[[295, 295]]
z-tMQ6AkXMI [[5, 5]]
	Predicted=[],
	Expected=[[5, 5]]
Z-VEbK8GPW0 [[12, 18]]
	Predicted=[(10, 15), (108, 125), (132, 143)],
	Expected=[[12, 18]]
Z-vyYcgwqgI [[10, 30]]
	Predicted=[(10, 13), (288, 293)],
	Expected=[[10, 30]]
Z-WnvXIGik0 [[4, 39]]
	Predicted=[(12, 2

In [34]:
evaluate(
    videos=test_videos,
    window_duration=10,
    eps=10,
)

  0%|          | 0/199 [00:00<?, ?it/s]

YzGt5abcfbA [[21, 46]]
	Predicted=[(26, 39), (230, 235), (351, 357), (380, 386)],
	Expected=[[21, 46]]
YZhRjPrNY4k [[267, 300]]
	Predicted=[(264, 270), (283, 295), (334, 339)],
	Expected=[[267, 300]]
Yzi6lQjLjQ8 [[14, 41]]
	Predicted=[(13, 17), (33, 38), (56, 58), (305, 305)],
	Expected=[[14, 41]]
YzIdXOeqXig [[187, 231]]
	Predicted=[(40, 44), (51, 56), (187, 192), (256, 259), (292, 297)],
	Expected=[[187, 231]]
yZIT6ZtsNxI [[156, 196]]
	Predicted=[(183, 191), (435, 437), (596, 599), (None, None)],
	Expected=[[156, 196]]
yzj6iuYIQVM [[145, 194]]
	Predicted=[(0, 10), (147, 153), (161, 167), (181, 193)],
	Expected=[[145, 194]]
YZJjFsAxN7s [[9, 32]]
	Predicted=[(9, 27), (91, 96)],
	Expected=[[9, 32]]
yZk-w1-5I2s [[0, 0]]
	Predicted=[],
	Expected=[[0, 0]]
yZk4a4Xx9FE [[0, 5], [97, 125]]
	Predicted=[(0, 4), (101, 115), (124, 128), (346, 350), (None, None)],
	Expected=[[0, 5], [97, 125]]
YZkXDuKto_Y [[20, 5

	Predicted=[(9, 19), (106, 106), (153, 154)],
	Expected=[[0, 22]]
y_fVUfzMw6o [[0, 110]]
	Predicted=[(0, 5), (13, 22), (33, 43), (49, 52), (65, 81), (87, 91), (752, 764), (1088, 1091), (1736, 1741), (3197, 3204)],
	Expected=[[0, 110]]
Y_GF69zGui4 [[0, 70]]
	Predicted=[(4, 9), (48, 69), (107, 107), (319, 320), (430, 432), (477, 479), (496, 496), (756, 760), (783, 785), (921, 921)],
	Expected=[[0, 70]]
Y_H5ofTzki8 [[9, 13]]
	Predicted=[(10, 13), (None, None)],
	Expected=[[9, 13]]
Y_hcz8CX9hA [[174, 212]]
	Predicted=[(0, 3), (173, 176), (206, 211), (255, 258), (339, 343)],
	Expected=[[174, 212]]
y_hdDtNHTRs [[0, 1]]
	Predicted=[(220, 222)],
	Expected=[[0, 1]]
Y_hy8ZB81L8 [[75, 112]]
	Predicted=[(81, 85), (236, 242)],
	Expected=[[75, 112]]
y_jw38QD5qY [[135, 157]]
	Predicted=[(0, 5), (137, 156)],
	Expected=[[135, 157]]
Y_K00erN1mA [[275, 313]]
	Predicted=[(279, 285), (294, 312)],
	Expected=[[275, 313]]
y_k0KZ2dQeM

	Predicted=[(10, 12), (108, 109), (131, 138), (None, None)],
	Expected=[[21, 33]]
Z-R4H-INsUY [[0, 9], [582, 628]]
	Predicted=[(7, 14), (367, 372), (568, 571), (590, 595), (602, 609)],
	Expected=[[0, 9], [582, 628]]
Z-rRAlexoeo [[0, 3], [797, 885]]
	Predicted=[(0, 4), (285, 289), (372, 376), (397, 410), (500, 505), (733, 738), (823, 885), (904, 909), (1006, 1011), (1029, 1033)],
	Expected=[[0, 3], [797, 885]]
z-SIiaTBv34 [[137, 141]]
	Predicted=[(79, 81)],
	Expected=[[137, 141]]
Z-SZFKM5gzo [[105, 142]]
	Predicted=[(103, 108), (138, 143)],
	Expected=[[105, 142]]
Z-tMp5-33k0 [[295, 295]]
	Predicted=[(126, 128), (136, 140)],
	Expected=[[295, 295]]
z-tMQ6AkXMI [[5, 5]]
	Predicted=[],
	Expected=[[5, 5]]
Z-VEbK8GPW0 [[12, 18]]
	Predicted=[(10, 15), (108, 125), (132, 143)],
	Expected=[[12, 18]]
Z-vyYcgwqgI [[10, 30]]
	Predicted=[(10, 13), (288, 293)],
	Expected=[[10, 30]]
Z-WnvXIGik0 [[4, 39]]
	Predicted=[(12, 2