# 🔍 Visual Plagiarism Detector - Complete Cookbook

**Protect your video content with AI-powered similarity detection**

This notebook teaches you to build a production-ready system that detects when your videos are stolen, re-uploaded, or used without permission.

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/video-db/videodb-cookbook/blob/main/editor/creative/Automated_Video_Copyright_Detection.ipynb)

---

## 📋 What You'll Build

✅ Index your video portfolio as searchable "fingerprints"  
✅ Detect visual similarity in suspect videos (even with edits)  
✅ Generate side-by-side comparison clips for DMCA takedowns  
✅ Identify sequential matches (stronger evidence)  
✅ Create comprehensive reports with confidence scores  

**Time to complete:** 30-45 minutes  
**Difficulty:** Intermediate  
**Cost:** Free tier available (50 videos)

---

## 🎯 Use Cases

- **YouTube Creators**: Detect reuploads and content theft
- **Production Companies**: Monitor unauthorized usage
- **Brand Agencies**: Protect client assets
- **Stock Platforms**: Prevent stolen submissions

---

## 📚 Table of Contents

1. [Setup & Installation](#setup)
2. [Understanding the System](#understanding)
3. [Part 1: Index Your Portfolio](#part1)
4. [Part 2: Detect Plagiarism](#part2)
5. [Part 3: Advanced Analysis](#part3)
6. [Part 4: Generate Evidence](#part4)
7. [Production Deployment](#production)
8. [Next Steps](#next)


---

# 📦 1. Setup & Installation

First, let's install all required packages and set up authentication.

In [2]:
# Install required packages
!pip install -q videodb numpy pandas scikit-learn matplotlib seaborn tqdm ipywidgets

print("✅ All packages installed successfully!")

✅ All packages installed successfully!


In [5]:
# Import libraries
import videodb
from videodb import connect, SceneExtractionType
from videodb.editor import Timeline, Track, Clip, VideoAsset, Position, Fit

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from datetime import datetime
from typing import List, Dict, Tuple, Optional
from dataclasses import dataclass, asdict
import json
from tqdm.auto import tqdm
import warnings
warnings.filterwarnings('ignore')
import os
from getpass import getpass

# Set style for plots
sns.set_style('whitegrid')
plt.rcParams['figure.figsize'] = (12, 6)

print("✅ Libraries imported successfully!")

✅ Libraries imported successfully!


## 🔑 Authentication

Get your free API key from [VideoDB Console](https://console.videodb.io/)

**Free Tier:** 50 video uploads, no credit card required

In [6]:
api_key = getpass("Enter your VideoDB API Key: ")
os.environ["VIDEO_DB_API_KEY"] = api_key

conn = videodb.connect()
collection = conn.get_collection()
print("✅ Connected to VideoDB successfully!")
print(f"📁 Collection ID: {collection.id}")

Enter your VideoDB API Key: ··········
✅ Connected to VideoDB successfully!
📁 Collection ID: c-81fc6459-fe30-44ac-8c5b-ea0898c2e152


---

# 🧠 2. Understanding the System

## How It Works

```
┌─────────────────────────────────────────────────┐
│           YOUR VIDEO PORTFOLIO                  │
│  (Original Content You Want to Protect)         │
└────────────────┬────────────────────────────────┘
                 │
                 ▼
┌─────────────────────────────────────────────────┐
│         SCENE INDEXING (One-Time)               │
│  • Detect shot changes                          │
│  • Generate visual descriptions (AI)            │
│  • Create vector embeddings                     │
└────────────────┬────────────────────────────────┘
                 │
                 ▼
┌─────────────────────────────────────────────────┐
│         PORTFOLIO DATABASE                      │
│  Scene 1: "Man in red jacket walking..."        │
│  Scene 2: "Close-up of product..."              │
│  Scene 3: "Aerial view of city..."              │
│  ... (each with vector embedding)               │
└─────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────┐
│           SUSPECT VIDEO                         │
│  (Potentially Stolen Content)                   │
└────────────────┬────────────────────────────────┘
                 │
                 ▼
┌─────────────────────────────────────────────────┐
│         SAME INDEXING PROCESS                   │
└────────────────┬────────────────────────────────┘
                 │
                 ▼
┌─────────────────────────────────────────────────┐
│      SIMILARITY COMPARISON                      │
│  • Compare embeddings (cosine similarity)       │
│  • Score: 0.95 = Nearly Identical               │
│  • Score: 0.60 = Different                      │
└────────────────┬────────────────────────────────┘
                 │
                 ▼
┌─────────────────────────────────────────────────┐
│         PLAGIARISM REPORT                       │
│  • High-confidence matches                      │
│  • Sequential segments                          │
│  • Evidence clips (side-by-side)                │
└─────────────────────────────────────────────────┘
```

## Key Concepts

### 1. Scene Indexing
VideoDB breaks videos into **scenes** and generates AI descriptions:
- Detects camera changes automatically
- Uses Vision Language Models (VLMs) to describe content
- Creates semantic embeddings (1536-dim vectors)

### 2. Similarity Scoring
Cosine similarity measures how "close" two scenes are:
```
0.95 - 1.00 → Identical/Near-Identical ⚠️ HIGH CONFIDENCE
0.85 - 0.95 → Very Similar ⚠️ Likely plagiarism
0.75 - 0.85 → Similar 🟡 Investigate further
0.60 - 0.75 → Somewhat Similar 🟢 May be coincidental
0.00 - 0.60 → Different ✅ Not a match
```
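
To make the score bands concrete, here is a minimal sketch in plain Python. It assumes scene embeddings are available as numeric lists (the toy 3-dimensional vectors are made up), and it assumes a score sitting exactly on a boundary belongs to the higher band, which the table does not specify. `cosine_similarity` and `classify_score` are illustrative helper names, not part of the VideoDB SDK.

```python
import math

# Lower bounds and labels taken from the similarity table in this section.
BANDS = [
    (0.95, "Identical/Near-Identical"),
    (0.85, "Very Similar"),
    (0.75, "Similar"),
    (0.60, "Somewhat Similar"),
    (0.00, "Different"),
]


def cosine_similarity(a, b) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # assumption: an all-zero vector matches nothing
    return dot / (norm_a * norm_b)


def classify_score(score: float) -> str:
    """Return the label of the first band whose lower bound the score reaches."""
    for lower, label in BANDS:
        if score >= lower:
            return label
    return "Different"  # negative scores fall below every band


# Toy vectors standing in for real scene embeddings.
original = [1.0, 0.0, 1.0]
reupload = [0.9, 0.1, 1.0]
unrelated = [0.0, 1.0, 0.0]

print(classify_score(cosine_similarity(original, reupload)))
print(classify_score(cosine_similarity(original, unrelated)))
```

The first pair points in almost the same direction and lands in the top band; the second pair is orthogonal and scores 0.0.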

### 3. Sequential Matching
A run of **multiple consecutive scenes** that all match is stronger evidence than isolated matches, since a single similar-looking shot can be coincidental.
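
As a rough illustration of the idea, the sketch below groups matches into runs on the suspect video's timeline. The `start`/`end` keys, the toy timestamps, and the helper name `group_consecutive` are assumptions for this example; the 10-second gap and 3-scene minimum mirror the defaults of the `_detect_sequences` method defined later in this notebook.

```python
from typing import Dict, List


def group_consecutive(
    matches: List[Dict],
    max_gap: float = 10.0,
    min_length: int = 3,
) -> List[List[Dict]]:
    """Group matches whose start lies within max_gap seconds of the previous end."""
    runs: List[List[Dict]] = []
    current: List[Dict] = []
    for match in sorted(matches, key=lambda m: m["start"]):
        # A large gap closes the current run; keep it only if it is long enough.
        if current and match["start"] - current[-1]["end"] > max_gap:
            if len(current) >= min_length:
                runs.append(current)
            current = []
        current.append(match)
    if len(current) >= min_length:
        runs.append(current)
    return runs


# Three back-to-back matches plus one isolated match two minutes in.
matches = [
    {"start": 0.0, "end": 4.0},
    {"start": 5.0, "end": 9.0},
    {"start": 10.0, "end": 15.0},
    {"start": 120.0, "end": 124.0},
]
runs = group_consecutive(matches)
print(len(runs), len(runs[0]))
```

Only the three adjacent matches form a run; the isolated match at 120 s is dropped because it cannot reach the minimum length on its own.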

---

# 🎬 3. Part 1: Index Your Portfolio

Let's build the core system to index your original videos.

## 📚 Portfolio Indexer Class

This class handles uploading and indexing your video portfolio.

In [None]:
class PortfolioIndexer:
    """
    Indexes your video portfolio for plagiarism detection.
    """

    DEFAULT_SCENE_PROMPT = (
        "Describe the visual content in detail: "
        "people, objects, actions, setting, colors, composition, "
        "camera angle, lighting, and any visible text or logos."
    )

    def __init__(self, collection):
        self.collection = collection
        self.portfolio_db = []

    def upload_and_index_video(
        self,
        video_path: str,
        metadata: Optional[Dict] = None,
        threshold: int = 20
    ) -> Tuple:
        """
        Upload and index a single video.

        Args:
            video_path: URL or local path
            metadata: Custom metadata dict
            threshold: Shot detection sensitivity (10-30)

        Returns:
            (video_object, index_id)
        """
        print(f"📤 Uploading: {video_path}")

        # Upload video: remote sources go through `url`, local files through `file_path`
        if video_path.startswith(("http://", "https://")):
            video = self.collection.upload(url=video_path)
        else:
            video = self.collection.upload(file_path=video_path)
        print(f"  ✅ Uploaded: {video.name} (ID: {video.id})")

        # Create scene index
        print("  🔍 Creating scene index...")
        index_id = video.index_scenes(
            extraction_type=SceneExtractionType.shot_based,
            extraction_config={"threshold": threshold},
            prompt=self.DEFAULT_SCENE_PROMPT
        )

        print(f"  ✅ Scene index: {index_id}")

        # Store metadata
        video_metadata = {
            "video_id": video.id,
            "video_name": video.name,
            "index_id": index_id,
            "upload_date": datetime.now().isoformat(),
            "custom_metadata": metadata or {}
        }

        self.portfolio_db.append(video_metadata)

        return video, index_id

    def batch_index_portfolio(self, video_urls: List[str]) -> List[Dict]:
        """
        Index multiple videos with progress tracking.
        """
        results = []

        for idx, url in enumerate(tqdm(video_urls, desc="Indexing portfolio"), 1):
            try:
                video, index_id = self.upload_and_index_video(url)
                results.append({
                    "status": "success",
                    "url": url,
                    "video_id": video.id,
                    "index_id": index_id
                })
            except Exception as e:
                print(f"  ❌ Error: {e}")
                results.append({
                    "status": "failed",
                    "url": url,
                    "error": str(e)
                })

        return results

    def save_portfolio(self, filename: str = "portfolio_index.json"):
        """Save portfolio to JSON file."""
        with open(filename, 'w') as f:
            json.dump(self.portfolio_db, f, indent=2)
        print(f"💾 Portfolio saved: {len(self.portfolio_db)} videos")

    def load_portfolio(self, filename: str = "portfolio_index.json"):
        """Load existing portfolio."""
        with open(filename, 'r') as f:
            self.portfolio_db = json.load(f)
        print(f"📂 Loaded: {len(self.portfolio_db)} videos")

    def get_stats(self) -> Dict:
        """Get portfolio statistics."""
        if not self.portfolio_db:
            return {"total_videos": 0}

        return {
            "total_videos": len(self.portfolio_db),
            "videos": [v["video_name"] for v in self.portfolio_db]
        }

print("✅ PortfolioIndexer class defined")

✅ PortfolioIndexer class defined
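
The portfolio itself is just a list of JSON-serializable records, so it can be inspected without calling VideoDB at all. The sketch below round-trips one record through a temporary file the same way `save_portfolio` and `load_portfolio` do. The field names come from the class above; the ID values and metadata are hypothetical placeholders.

```python
import json
import os
import tempfile
from datetime import datetime

# One record in the shape appended to `portfolio_db`; all values are made up.
record = {
    "video_id": "m-example-123",
    "video_name": "Example video",
    "index_id": "abc123",
    "upload_date": datetime(2025, 1, 1).isoformat(),
    "custom_metadata": {"owner": "me"},
}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "portfolio_index.json")
    # Write the portfolio as a JSON list, then read it back.
    with open(path, "w") as f:
        json.dump([record], f, indent=2)
    with open(path) as f:
        loaded = json.load(f)

print(loaded[0]["video_name"], loaded == [record])
```

Because every value is a plain string or dict, the loaded list compares equal to what was saved, which is what lets a later session resume from `portfolio_index.json` without re-indexing.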


## 🎥 Index Your Videos

Now let's index your video portfolio. Replace these URLs with your actual videos.

In [None]:
# Initialize indexer
indexer = PortfolioIndexer(collection)

# Define your video portfolio
# Replace these with your actual video URLs
my_videos = [
    "https://www.youtube.com/watch?v=dQw4w9WgXcQ",  # Example video 1
    # Add more videos here:
    "https://youtu.be/tNI57rl_Xoo?si=MP9O8n_XcdynSIh4",
    "https://youtu.be/RoN2LO5E-QA?si=5GrtNUHL5cWw5UZ0",
]

print(f"📚 Portfolio size: {len(my_videos)} videos")
print("\n⚠️  Note: Indexing takes ~5-10 minutes per hour of video")

📚 Portfolio size: 3 videos

⚠️  Note: Indexing takes ~5-10 minutes per hour of video


In [None]:
# ========================================
# IMPORT EXISTING VIDEOS
# ========================================
import re

print("🔍 Checking for existing videos in your VideoDB collection...\n")

# Get all videos from collection
all_videos = collection.get_videos()
print(f"📦 Total videos in collection: {len(all_videos)}")
print(f"📚 Videos in portfolio index: {len(indexer.portfolio_db)}")

# Collect the videos that are NOT in the portfolio yet
known_ids = {v['video_id'] for v in indexer.portfolio_db}
missing_videos = [video for video in all_videos if video.id not in known_ids]

if missing_videos:
    print(f"\n⚠️  Found {len(missing_videos)} videos NOT in portfolio:\n")
    for idx, video in enumerate(missing_videos, 1):
        print(f"  {idx}. {video.name}")
        print(f"     ID: {video.id}\n")

    print("="*70)
    print("💡 Importing existing videos to portfolio...")
    print("="*70)

    # Set to True to auto-import
    import_all = True

    if import_all:
        print("\n🔄 Importing all existing videos...\n")

        for video in tqdm(missing_videos, desc="Processing videos"):
            try:
                print(f"📹 {video.name}")

                # Try to create a scene index; this fails if one already exists
                try:
                    print("  🔍 Creating scene index...")
                    index_id = video.index_scenes(
                        extraction_type=SceneExtractionType.shot_based,
                        extraction_config={"threshold": 20},
                        prompt=PortfolioIndexer.DEFAULT_SCENE_PROMPT
                    )
                    print(f"  ✅ New index created: {index_id}")

                except Exception as e:
                    # On "already exists", recover the existing index_id from the message
                    error_msg = str(e)

                    if "already exists" in error_msg:
                        # Error format: "Scene index with id XXXXX already exists..."
                        match = re.search(r'id ([a-f0-9]+) already exists', error_msg)

                        if match:
                            index_id = match.group(1)
                            print(f"  ✅ Using existing index: {index_id}")
                        else:
                            print("  ⚠️  Has existing index but couldn't extract ID")
                            print(f"     Error: {error_msg}")
                            continue
                    else:
                        print(f"  ❌ Error: {e}")
                        continue

                # Add to portfolio
                indexer.portfolio_db.append({
                    "video_id": video.id,
                    "video_name": video.name,
                    "index_id": index_id,
                    "upload_date": datetime.now().isoformat(),
                    "custom_metadata": {"imported_from_existing": True}
                })

                print("  ✅ Added to portfolio\n")

            except Exception as e:
                print(f"  ❌ Unexpected error: {e}\n")

        # Save portfolio
        indexer.save_portfolio()

        print("\n" + "="*70)
        print("✅ Portfolio updated!")
        print(f"📊 Total videos in portfolio: {len(indexer.portfolio_db)}")
        print("="*70)
    else:
        print("\n💡 To import these videos, set 'import_all = True' above and re-run this cell")
else:
    print("\n✅ All collection videos are already in your portfolio!")

🔍 Checking for existing videos in your VideoDB collection...

📦 Total videos in collection: 16
📚 Videos in portfolio index: 0

⚠️  Found 16 videos NOT in portfolio:

  1. Phil Dunphy's Best Moments
     ID: m-z-019c2eb2-0c2e-7812-8b43-e65ac5d6a897

  2. Phil Dunphy's Best Moments
     ID: m-z-019c2ea5-f18b-7c43-a708-7a4689237e6f

  3. Phil Dunphy's Best Moments
     ID: m-z-019c2e8e-53fb-7b30-ad66-b48db33c4a2f

  4. Phil Dunphy's Best Moments
     ID: m-z-019c2e74-8c6f-72d1-b34d-dce1fdf08ebf

  5. Modern Family | The Best Advice from Phil Dunphy
     ID: m-z-019c2e6b-8633-7062-a5b6-03a49cd1c8af

  6. Modern Family | The Best Advice from Phil Dunphy
     ID: m-z-019c2e68-3f8e-7eb2-8794-006915aa7b20

  7. The BEST of Phil Dunphy (Mashup) | Modern Family | TBS
     ID: m-z-019c2e66-20c6-7d30-8e2e-91b9b4a01c8c

  8. Brooklyn Nine-Nine having world class writing for 23 minutes straight
     ID: m-z-019c2e65-3df2-78e1-9329-3503dea94911

  9. Rick Astley - Never Gonna Give You Up

Processing videos:   0%|          | 0/16 [00:00<?, ?it/s]

📹 Phil Dunphy's Best Moments
  🔍 Creating scene index...
  ✅ Using existing index: cfb207c0a588ecc6
  ✅ Added to portfolio

📹 Phil Dunphy's Best Moments
  🔍 Creating scene index...
  ✅ Using existing index: a7a5699dd0aa0436
  ✅ Added to portfolio

📹 Phil Dunphy's Best Moments
  🔍 Creating scene index...
  ✅ Using existing index: 27c935b50649dffd
  ✅ Added to portfolio

📹 Phil Dunphy's Best Moments
  🔍 Creating scene index...
  ✅ Using existing index: 6efe0930f74f02cb
  ✅ Added to portfolio

📹 Modern Family | The Best Advice from Phil Dunphy
  🔍 Creating scene index...
  ✅ Using existing index: 6e7dffe075ee9f36
  ✅ Added to portfolio

📹 Modern Family | The Best Advice from Phil Dunphy
  🔍 Creating scene index...
  ✅ Using existing index: 2c93b8f10a1b8c19
  ✅ Added to portfolio

📹 The BEST of Phil Dunphy (Mashup) | Modern Family | TBS
  🔍 Creating scene index...
  ✅ Using existing index: b2bac6faadb28a6c
  ✅ Added to portfolio
...
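
The import cell recovers an existing index ID by parsing the error text, which can be tried in isolation. The sketch below wraps the same regular expression in a hypothetical helper, `extract_existing_index_id`. The sample message is an assumption built from the error format quoted in the cell's comment ("Scene index with id XXXXX already exists..."); the real wording may differ, so treat text parsing like this as a fragile fallback.

```python
import re
from typing import Optional

# Same pattern as the import cell: a lowercase hex ID before "already exists".
INDEX_ID_PATTERN = re.compile(r"id ([a-f0-9]+) already exists")


def extract_existing_index_id(error_message: str) -> Optional[str]:
    """Return the index ID embedded in an 'already exists' error, or None."""
    match = INDEX_ID_PATTERN.search(error_message)
    return match.group(1) if match else None


# Hypothetical message text; only the "id <hex> already exists" part matters.
sample = "Scene index with id cfb207c0a588ecc6 already exists for this video"
print(extract_existing_index_id(sample))
print(extract_existing_index_id("Quota exceeded"))
```

Returning `None` for anything that does not match keeps the caller's "couldn't extract ID" branch explicit instead of raising a second exception inside the error handler.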

In [None]:
# Run the indexing
# WARNING: This will use your VideoDB quota
# Comment out this cell if you've already indexed

results = indexer.batch_index_portfolio(my_videos)

# Show results
print("\n" + "="*70)
print("INDEXING RESULTS")
print("="*70)

success_count = sum(1 for r in results if r['status'] == 'success')
failed_count = sum(1 for r in results if r['status'] == 'failed')

print(f"✅ Successful: {success_count}")
print(f"❌ Failed: {failed_count}")

# Save portfolio
indexer.save_portfolio()

Indexing portfolio:   0%|          | 0/3 [00:00<?, ?it/s]

📤 Uploading: https://www.youtube.com/watch?v=dQw4w9WgXcQ
  ✅ Uploaded: Rick Astley - Never Gonna Give You Up (Official Video) (4K Remaster) (ID: m-z-019c3128-a10b-7690-9da9-1b7d0811c6b3)
  🔍 Creating scene index...
  ✅ Scene index: 6632331b3aa073de
📤 Uploading: https://youtu.be/tNI57rl_Xoo?si=MP9O8n_XcdynSIh4
  ✅ Uploaded: Brooklyn Nine-Nine having world class writing for 23 minutes straight (ID: m-z-019c3129-31d4-7953-bb29-0248ef64cb7a)
  🔍 Creating scene index...
  ✅ Scene index: c0b974100b41e8de
📤 Uploading: https://youtu.be/RoN2LO5E-QA?si=5GrtNUHL5cWw5UZ0
  ✅ Uploaded: The BEST of Phil Dunphy (Mashup) | Modern Family | TBS (ID: m-z-019c312a-f37c-7460-b72b-05bf56f34914)
  🔍 Creating scene index...
  ✅ Scene index: 0738b23101110930

INDEXING RESULTS
✅ Successful: 3
❌ Failed: 0
💾 Portfolio saved: 21 videos


In [None]:
# View portfolio statistics
stats = indexer.get_stats()

print("📊 Portfolio Statistics")
print("="*50)
print(f"Total Videos: {stats['total_videos']}")
print("\nVideos in Portfolio:")
for idx, name in enumerate(stats['videos'], 1):
    print(f"  {idx}. {name}")

📊 Portfolio Statistics
Total Videos: 21

Videos in Portfolio:
  1. Phil Dunphy's Best Moments
  2. Phil Dunphy's Best Moments
  3. Phil Dunphy's Best Moments
  4. Phil Dunphy's Best Moments
  5. Modern Family | The Best Advice from Phil Dunphy
  6. Modern Family | The Best Advice from Phil Dunphy
  7. The BEST of Phil Dunphy (Mashup) | Modern Family | TBS
  8. Brooklyn Nine-Nine having world class writing for 23 minutes straight
  9. Rick Astley - Never Gonna Give You Up (Official Video) (4K Remaster)
  10. The BEST of Phil Dunphy (Mashup) | Modern Family | TBS
  11. Brooklyn Nine-Nine having world class writing for 23 minutes straight
  12. Rick Astley - Never Gonna Give You Up (Official Video) (4K Remaster)
  13. Rick Astley - Never Gonna Give You Up (Official Video) (4K Remaster)
  14. The Grand Finale | Arjun Erigaisi vs Vishy Anand | Jerusalem Masters 2025
  15. “Dumbest idea I’ve heard” to $100M ARR: Inside the rise of Gamma | Grant Lee (co-founder)
  16. VideoDB: The ul

---

# 🕵️ 4. Part 2: Detect Plagiarism

Now let's build the plagiarism detector to compare suspect videos against your portfolio.

In [None]:
from videodb import IndexType
import pandas as pd
import numpy as np
from tqdm.auto import tqdm
from typing import List, Dict

class FastPlagiarismDetector:
    """
    Generic plagiarism detector - works for ANY video content.
    """

    def __init__(self, collection, portfolio_db: List[Dict]):
        self.collection = collection
        self.portfolio = portfolio_db
        print(f"Loaded portfolio: {len(self.portfolio)} videos")

    def detect_plagiarism(
        self,
        suspect_video_id: str,
        suspect_index_id: str,
        sample_size: int = 30,
        similarity_threshold: float = 0.70,
        auto_expand: bool = True
    ) -> Dict:
        """
        Generic plagiarism detection using visual similarity only.
        """

        print("="*70)
        print("FAST PLAGIARISM DETECTION")
        print("="*70)

        suspect_video = self.collection.get_video(suspect_video_id)

        # Get ALL scenes
        suspect_scenes_all = suspect_video.get_scene_index(suspect_index_id)
        if not isinstance(suspect_scenes_all, list):
            suspect_scenes_all = suspect_scenes_all.get_scenes()

        total_scenes = len(suspect_scenes_all)
        print(f"\nSuspect video: {suspect_video.name}")
        print(f"Total scenes: {total_scenes}")

        # STEP 1: Smart sampling
        sample_indices = self._get_distributed_sample(total_scenes, sample_size)
        suspect_scenes_sample = [suspect_scenes_all[i] for i in sample_indices]

        print(f"Testing sample: {len(suspect_scenes_sample)} scenes")
        print(f"Portfolio: {len(self.portfolio)} videos")

        # STEP 2: Run comparison on sample
        print(f"\nPhase 1: Sample comparison...")

        sample_matches = self._compare_scenes_batch(
            suspect_video_id,
            suspect_scenes_sample,
            self.portfolio,
            max_results_per_scene=1
        )

        print(f"\nSample results: {len(sample_matches)} matches found")

        if len(sample_matches) == 0:
            return {
                "plagiarism_detected": False,
                "matches": pd.DataFrame(),
                "summary": "No similar scenes detected in sample"
            }

        # Analyze sample matches
        high_conf_sample = sample_matches[
            sample_matches['similarity_score'] >= similarity_threshold
        ]

        print(f"High-confidence in sample: {len(high_conf_sample)}")

        if len(high_conf_sample) == 0:
            return {
                "plagiarism_detected": False,
                "confidence": "LOW",
                "matches": sample_matches,
                "summary": f"Found {len(sample_matches)} low-confidence matches"
            }

        # Found strong matches! Determine which portfolio video(s)
        top_matches = high_conf_sample['portfolio_video_name'].value_counts()
        print(f"\nTop matching videos:")
        for video_name, count in top_matches.head(3).items():
            avg_score = high_conf_sample[
                high_conf_sample['portfolio_video_name'] == video_name
            ]['similarity_score'].mean()
            print(f"  - {video_name}: {count} matches (avg score: {avg_score:.3f})")

        if not auto_expand:
            return {
                "plagiarism_detected": True,
                "confidence": "MEDIUM",
                "top_matches": top_matches.to_dict(),
                "matches": high_conf_sample,
                "summary": f"Sample detected {len(high_conf_sample)} matches"
            }

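        # STEP 3: Run FULL comparison against top matching video only
        top_portfolio_name = top_matches.index[0]
        # Several portfolio entries can share a name, so resolve the entry by video ID
        top_portfolio_id = high_conf_sample.loc[
            high_conf_sample['portfolio_video_name'] == top_portfolio_name,
            'portfolio_video_id'
        ].value_counts().index[0]
        top_portfolio_item = next(
            p for p in self.portfolio
            if p['video_id'] == top_portfolio_id
        )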

        print(f"\nPhase 2: Full comparison against top match...")
        print(f"Target: {top_portfolio_name}")
        print(f"Comparing all {total_scenes} scenes...")

        full_matches = self._compare_scenes_batch(
            suspect_video_id,
            suspect_scenes_all,
            [top_portfolio_item],
            max_results_per_scene=1
        )

        high_conf_full = full_matches[
            full_matches['similarity_score'] >= similarity_threshold
        ]

        print(f"\nFull results: {len(full_matches)} total, {len(high_conf_full)} high-confidence")

        # Detect sequential matches
        sequences = self._detect_sequences(high_conf_full)

        return {
            "plagiarism_detected": True,
            "confidence": "HIGH" if len(sequences) > 0 else "MEDIUM",
            "top_match": top_portfolio_name,
            "total_matches": len(full_matches),
            "high_confidence_matches": len(high_conf_full),
            "sequential_segments": len(sequences),
            "matches": full_matches,
            "high_confidence_only": high_conf_full,
            "sequences": sequences,
            "summary": f"Found {len(high_conf_full)} high-confidence matches with {top_portfolio_name}"
        }

    def _get_distributed_sample(self, total: int, sample_size: int) -> List[int]:
        """Sample scenes evenly distributed across the video."""
        if total <= sample_size:
            return list(range(total))

        step = total / sample_size
        return [int(i * step) for i in range(sample_size)]

    def _compare_scenes_batch(
        self,
        suspect_video_id: str,
        suspect_scenes: List[Dict],
        portfolio_items: List[Dict],
        max_results_per_scene: int = 1
    ) -> pd.DataFrame:
        """Compare scenes against portfolio."""
        all_matches = []

        for suspect_scene in tqdm(suspect_scenes, desc="Comparing"):
            suspect_description = suspect_scene.get('description', '')
            if not suspect_description:
                continue

            for portfolio_item in portfolio_items:
                try:
                    portfolio_video = self.collection.get_video(portfolio_item["video_id"])

                    search_results = portfolio_video.search(
                        query=suspect_description,
                        search_type="semantic",
                        index_type=IndexType.scene,
                        index_id=portfolio_item["index_id"]
                    )

                    if hasattr(search_results, 'shots') and len(search_results.shots) > 0:
                        for shot in search_results.shots[:max_results_per_scene]:
                            # Each shot carries its relevance in `search_score`
                            shot_score = getattr(shot, 'search_score', 0.0)
                            shot_start = getattr(shot, 'start', 0)
                            shot_end = getattr(shot, 'end', 0)

                            if shot_score > 0:
                                all_matches.append({
                                    "suspect_video_id": suspect_video_id,
                                    "suspect_scene_start": suspect_scene.get('start', 0),
                                    "suspect_scene_end": suspect_scene.get('end', 0),
                                    "suspect_description": suspect_description[:150],
                                    "portfolio_video_id": portfolio_item["video_id"],
                                    "portfolio_video_name": portfolio_item["video_name"],
                                    "portfolio_scene_start": shot_start,
                                    "portfolio_scene_end": shot_end,
                                    "similarity_score": float(shot_score)
                                })

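                except Exception as e:
                    # "No results found" just means no match; report anything else
                    if "No results found" not in str(e):
                        print(f"  Search failed for {portfolio_item['video_name']}: {e}")
                    continue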

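        # Keep the column layout stable so callers can filter even when nothing matched
        columns = [
            "suspect_video_id", "suspect_scene_start", "suspect_scene_end",
            "suspect_description", "portfolio_video_id", "portfolio_video_name",
            "portfolio_scene_start", "portfolio_scene_end", "similarity_score"
        ]
        return pd.DataFrame(all_matches, columns=columns)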

    def _detect_sequences(
        self,
        matches_df: pd.DataFrame,
        max_gap: float = 10.0,
        min_length: int = 3
    ) -> List[Dict]:
        """Detect consecutive matching scenes."""
        if matches_df.empty:
            return []

        sequences = []
        sorted_matches = matches_df.sort_values('suspect_scene_start')

        current_seq = []

        for idx, row in sorted_matches.iterrows():
            if not current_seq:
                current_seq = [row.to_dict()]
            else:
                gap = row['suspect_scene_start'] - current_seq[-1]['suspect_scene_end']

                if gap <= max_gap:
                    current_seq.append(row.to_dict())
                else:
                    if len(current_seq) >= min_length:
                        sequences.append({
                            "num_scenes": len(current_seq),
                            "duration": current_seq[-1]['suspect_scene_end'] - current_seq[0]['suspect_scene_start'],
                            "avg_score": np.mean([s['similarity_score'] for s in current_seq]),
                            "start": current_seq[0]['suspect_scene_start'],
                            "end": current_seq[-1]['suspect_scene_end']
                        })
                    current_seq = [row.to_dict()]

        if len(current_seq) >= min_length:
            sequences.append({
                "num_scenes": len(current_seq),
                "duration": current_seq[-1]['suspect_scene_end'] - current_seq[0]['suspect_scene_start'],
                "avg_score": np.mean([s['similarity_score'] for s in current_seq]),
                "start": current_seq[0]['suspect_scene_start'],
                "end": current_seq[-1]['suspect_scene_end']
            })

        return sorted(sequences, key=lambda x: x['avg_score'], reverse=True)

    def generate_report(self, result: Dict) -> None:
        """Generate detailed report."""
        print("\n" + "="*70)
        print("PLAGIARISM DETECTION REPORT")
        print("="*70)

        if result['plagiarism_detected']:
            print(f"\nVERDICT: PLAGIARISM DETECTED")
            print(f"Confidence: {result.get('confidence', 'UNKNOWN')}")

            if 'top_match' in result:
                print(f"\nMost Similar Video: {result['top_match']}")

            if 'total_matches' in result:
                print(f"\nStatistics:")
                print(f"  Total matches: {result['total_matches']}")
                print(f"  High-confidence: {result['high_confidence_matches']}")
                print(f"  Sequential segments: {result['sequential_segments']}")

            if result.get('sequences'):
                print(f"\nSequential Matches (strong evidence):")
                for i, seq in enumerate(result['sequences'][:5], 1):
                    print(f"  {i}. {seq['num_scenes']} consecutive scenes")
                    print(f"     Duration: {seq['duration']:.1f}s")
                    print(f"     Avg score: {seq['avg_score']:.3f}")
                    print(f"     Time: {seq['start']:.1f}s - {seq['end']:.1f}s")

            if 'high_confidence_only' in result and not result['high_confidence_only'].empty:
                print(f"\nTop Individual Matches:")
                top = result['high_confidence_only'].nlargest(10, 'similarity_score')
                display(top[['portfolio_video_name', 'similarity_score', 'suspect_scene_start', 'portfolio_scene_start']])
        else:
            print(f"\nVERDICT: NO PLAGIARISM DETECTED")
            print(f"\n{result.get('summary', 'Videos appear to be different')}")

        print("="*70)


print("FastPlagiarismDetector ready")

FastPlagiarismDetector ready
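
To see how the sequential-match grouping behaves, the same idea can be exercised on a few hand-made rows. The sketch below is a standalone approximation, not the class method itself: it uses plain dictionaries instead of a DataFrame, it assumes the gap is measured from the end of one suspect scene to the start of the next, and the function name `group_sequences` and the defaults `max_gap=5.0` and `min_length=3` are illustrative choices.

```python
from statistics import mean
from typing import Dict, List


def group_sequences(matches: List[Dict], max_gap: float = 5.0,
                    min_length: int = 3) -> List[Dict]:
    """Group time-ordered matches into runs separated by at most max_gap seconds."""
    if not matches:
        return []

    ordered = sorted(matches, key=lambda m: m["suspect_scene_start"])
    runs, current = [], [ordered[0]]

    for match in ordered[1:]:
        # Assumption: gap = start of this scene minus end of the previous one
        gap = match["suspect_scene_start"] - current[-1]["suspect_scene_end"]
        if gap <= max_gap:
            current.append(match)
        else:
            runs.append(current)
            current = [match]
    runs.append(current)  # flush the final run

    sequences = [
        {
            "num_scenes": len(run),
            "duration": run[-1]["suspect_scene_end"] - run[0]["suspect_scene_start"],
            "avg_score": mean(m["similarity_score"] for m in run),
            "start": run[0]["suspect_scene_start"],
            "end": run[-1]["suspect_scene_end"],
        }
        for run in runs
        if len(run) >= min_length  # short runs are weak evidence, so drop them
    ]
    return sorted(sequences, key=lambda s: s["avg_score"], reverse=True)


# Toy data: three back-to-back scenes, a 20 s gap, then two more scenes
toy_matches = [
    {"suspect_scene_start": 0.0, "suspect_scene_end": 3.0, "similarity_score": 0.90},
    {"suspect_scene_start": 3.0, "suspect_scene_end": 6.5, "similarity_score": 0.80},
    {"suspect_scene_start": 7.0, "suspect_scene_end": 10.0, "similarity_score": 0.85},
    {"suspect_scene_start": 30.0, "suspect_scene_end": 33.0, "similarity_score": 0.95},
    {"suspect_scene_start": 33.0, "suspect_scene_end": 36.0, "similarity_score": 0.92},
]

toy_sequences = group_sequences(toy_matches)
print(toy_sequences)
```

With these toy rows, only the first three scenes form a qualifying sequence; the final pair is dropped because it is shorter than `min_length`.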


## 🔍 Run Plagiarism Detection

Now let's analyze a suspect video!

In [None]:
# Initialize fast detector with your portfolio
fast_detector = FastPlagiarismDetector(collection, indexer.portfolio_db)

# Enter suspect video URL
suspect_url = "https://youtu.be/jKBRCloTK4w?si=QKTg1nAZdE7zkmm9"

print(f"🕵️  Suspect video: {suspect_url}")

Loaded portfolio: 21 videos
🕵️  Suspect video: https://youtu.be/jKBRCloTK4w?si=QKTg1nAZdE7zkmm9


In [None]:
# Upload suspect video
print(f"🕵️  Uploading suspect video: {suspect_url}")
suspect_video = collection.upload(url=suspect_url)
print(f"✅ Uploaded: {suspect_video.name}")

# Index scenes
print("🔍 Creating scene index...")
suspect_index = suspect_video.index_scenes(
    extraction_type=SceneExtractionType.shot_based,
    extraction_config={"threshold": 20},
    prompt=PortfolioIndexer.DEFAULT_SCENE_PROMPT
)
print(f"✅ Scene index: {suspect_index}")

🕵️  Uploading suspect video: https://youtu.be/jKBRCloTK4w?si=QKTg1nAZdE7zkmm9
✅ Uploaded: Phil Dunphy's Best Moments
🔍 Creating scene index...
✅ Scene index: 07166fedb3bf7f46


In [None]:
# Run plagiarism detection
result = fast_detector.detect_plagiarism(
    suspect_video_id=suspect_video.id,
    suspect_index_id=suspect_index,
    sample_size=30,
    similarity_threshold=0.70,
    auto_expand=True
)

# Show report
fast_detector.generate_report(result)

FAST PLAGIARISM DETECTION

Suspect video: Phil Dunphy's Best Moments
Total scenes: 198
Testing sample: 30 scenes
Portfolio: 21 videos

Phase 1: Sample comparison...




Sample results: 630 matches found
High-confidence in sample: 348

Top matching videos:
  - Phil Dunphy's Best Moments: 108 matches (avg score: 0.811)
  - The BEST of Phil Dunphy (Mashup) | Modern Family | TBS: 90 matches (avg score: 0.789)
  - Brooklyn Nine-Nine having world class writing for 23 minutes straight: 67 matches (avg score: 0.790)

Phase 2: Full comparison against top match...
Target: Phil Dunphy's Best Moments
Comparing all 198 scenes...




Full results: 198 total, 177 high-confidence

PLAGIARISM DETECTION REPORT

VERDICT: PLAGIARISM DETECTED
Confidence: HIGH

Most Similar Video: Phil Dunphy's Best Moments

Statistics:
  Total matches: 198
  High-confidence: 177
  Sequential segments: 3

Sequential Matches (strong evidence):
  1. 62 consecutive scenes
     Duration: 178.4s
     Avg score: 0.815
     Time: 0.0s - 178.4s
  2. 64 consecutive scenes
     Duration: 181.3s
     Avg score: 0.803
     Time: 193.8s - 375.1s
  3. 51 consecutive scenes
     Duration: 146.0s
     Avg score: 0.795
     Time: 392.9s - 538.9s

Top Individual Matches:


| index | portfolio_video_name | similarity_score | suspect_scene_start | portfolio_scene_start |
|---|---|---|---|---|
| 90 | Phil Dunphy's Best Moments | 0.950986 | 244.411 | 243.91 |
| 117 | Phil Dunphy's Best Moments | 0.935215 | 299.9 | 299.9 |
| 108 | Phil Dunphy's Best Moments | 0.927552 | 279.479 | 277.544 |
| 49 | Phil Dunphy's Best Moments | 0.919779 | 137.771 | 137.771 |
| 0 | Phil Dunphy's Best Moments | 0.910751 | 0.0 | 0.0 |
| 129 | Phil Dunphy's Best Moments | 0.910268 | 343.744 | 342.676 |
| 109 | Phil Dunphy's Best Moments | 0.90978 | 280.18 | 280.18 |
| 145 | Phil Dunphy's Best Moments | 0.906769 | 411.345 | 12.079 |
| 14 | Phil Dunphy's Best Moments | 0.901291 | 29.162 | 29.162 |
| 29 | Phil Dunphy's Best Moments | 0.89491 | 67.668 | 67.668 |
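
One quick way to read the top matches is to compare where each scene starts in the suspect video against where it starts in the portfolio video. The sketch below copies the ten displayed rows by hand and counts how many are aligned within a tolerance. The 2-second `tolerance` and the helper name `count_aligned` are assumptions chosen for illustration; they are not part of the detector.

```python
from typing import List, Tuple

# (suspect_scene_start, portfolio_scene_start) copied from the ten rows shown above
top_matches: List[Tuple[float, float]] = [
    (244.411, 243.91), (299.9, 299.9), (279.479, 277.544), (137.771, 137.771),
    (0.0, 0.0), (343.744, 342.676), (280.18, 280.18), (411.345, 12.079),
    (29.162, 29.162), (67.668, 67.668),
]


def count_aligned(pairs: List[Tuple[float, float]], tolerance: float = 2.0) -> int:
    """Count pairs whose start times differ by at most `tolerance` seconds."""
    return sum(1 for suspect, portfolio in pairs
               if abs(suspect - portfolio) <= tolerance)


aligned = count_aligned(top_matches)
print(f"{aligned}/{len(top_matches)} top matches are time-aligned")
```

Matches that line up at nearly the same timestamp are consistent with a straight re-upload. A large offset, as in the row that maps 411.345s to 12.079s, may instead point to a repeated or visually similar shot, so it is worth reviewing by eye.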




## 📋 Generate Evidence Clips

In [None]:
# Generate side-by-side evidence clips for the high-confidence matches
if result['plagiarism_detected'] and 'high_confidence_only' in result and not result['high_confidence_only'].empty:
    print("🎬 Generating Evidence Clips")
    print("="*70)

    from videodb.editor import Timeline, Track, Clip, VideoAsset, Position, Fit

    high_conf = result['high_confidence_only']
    evidence_clips = []
    num_clips = min(5, len(high_conf))

    for idx, (_, match) in enumerate(high_conf.head(num_clips).iterrows(), 1):
        print(f"\nClip {idx}/{num_clips}:")
        print(f"  Similarity: {match['similarity_score']:.3f}")

        try:
            # Calculate duration
            duration = min(
                match['portfolio_scene_end'] - match['portfolio_scene_start'],
                match['suspect_scene_end'] - match['suspect_scene_start']
            )

            # Create timeline
            timeline = Timeline(conn)

            # Create assets
            portfolio_asset = VideoAsset(
                id=match['portfolio_video_id'],
                start=match['portfolio_scene_start']
            )

            suspect_asset = VideoAsset(
                id=match['suspect_video_id'],
                start=match['suspect_scene_start']
            )

            # Create clips
            left_clip = Clip(
                asset=portfolio_asset,
                duration=duration,
                position=Position.left,
                fit=Fit.crop,
                scale=0.5
            )

            right_clip = Clip(
                asset=suspect_asset,
                duration=duration,
                position=Position.right,
                fit=Fit.crop,
                scale=0.5
            )

            # Add both clips to the timeline; add_clip takes an explicit start time
            track = Track()
            track.add_clip(start=0, clip=left_clip)
            track.add_clip(start=0, clip=right_clip)  # Both start at 0 to play simultaneously
            timeline.add_track(track)

            # Generate stream
            stream_url = timeline.generate_stream()

            evidence_clips.append({
                "clip_number": idx,
                "url": stream_url,
                "similarity": match['similarity_score']
            })

            print(f"  ✅ Generated: {stream_url}")

        except Exception as e:
            print(f"  ❌ Error: {e}")
            import traceback
            traceback.print_exc()

    if evidence_clips:
        print(f"\n✅ Generated {len(evidence_clips)} evidence clips")
        print("\n📋 Evidence URLs:")
        for clip in evidence_clips:
            print(f"  {clip['clip_number']}. Score: {clip['similarity']:.3f}")
            print(f"     {clip['url']}")
    else:
        print("\n❌ No clips generated")
else:
    print("No high-confidence matches")

🎬 Generating Evidence Clips

Clip 1/5:
  Similarity: 0.911
  ✅ Generated: https://play.videodb.io/v1/a779fe53-bcc7-457e-a04f-4d818c031b26.m3u8

Clip 2/5:
  Similarity: 0.810
  ✅ Generated: https://play.videodb.io/v1/bad0df0d-47e7-46c6-82fd-5b9d8bb84b43.m3u8

Clip 3/5:
  Similarity: 0.826
  ✅ Generated: https://play.videodb.io/v1/77119c81-2119-43ca-b992-2f6a69672799.m3u8

Clip 4/5:
  Similarity: 0.746
  ✅ Generated: https://play.videodb.io/v1/aefb9d23-9f41-4b62-915e-9aa9ffefcd69.m3u8

Clip 5/5:
  Similarity: 0.802
  ✅ Generated: https://play.videodb.io/v1/f3617ec5-2f63-4fea-be3d-6f801734db0b.m3u8

✅ Generated 5 evidence clips

📋 Evidence URLs:
  1. Score: 0.911
     https://play.videodb.io/v1/a779fe53-bcc7-457e-a04f-4d818c031b26.m3u8
  2. Score: 0.810
     https://play.videodb.io/v1/bad0df0d-47e7-46c6-82fd-5b9d8bb84b43.m3u8
  3. Score: 0.826
     https://play.videodb.io/v1/77119c81-2119-43ca-b992-2f6a69672799.m3u8
  4. Score: 0.746
     https://play.videodb.io/v1/aefb9
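
If the clips are destined for a takedown request, it may help to keep the evidence list in a file. The following is a minimal sketch that writes a list shaped like `evidence_clips` as CSV using only the standard library. The function name `export_evidence` and the sample URLs are hypothetical, and the demo writes to an in-memory buffer rather than to disk.

```python
import csv
import io
from typing import Dict, List, TextIO


def export_evidence(clips: List[Dict], handle: TextIO) -> int:
    """Write clips as CSV rows, highest similarity first. Returns the row count."""
    ordered = sorted(clips, key=lambda c: c["similarity"], reverse=True)
    writer = csv.DictWriter(
        handle,
        fieldnames=["clip_number", "similarity", "url"],
        lineterminator="\n",
    )
    writer.writeheader()
    for clip in ordered:
        writer.writerow({
            "clip_number": clip["clip_number"],
            "similarity": f"{clip['similarity']:.3f}",
            "url": clip["url"],
        })
    return len(ordered)


# Placeholder data with the same keys as evidence_clips; the URLs are made up
sample_clips = [
    {"clip_number": 1, "url": "https://example.com/clip-1.m3u8", "similarity": 0.81},
    {"clip_number": 2, "url": "https://example.com/clip-2.m3u8", "similarity": 0.91},
]

buffer = io.StringIO()
rows_written = export_evidence(sample_clips, buffer)
report_lines = buffer.getvalue().splitlines()
print(rows_written, report_lines)
```

To write to disk instead, pass an open text file in place of the `StringIO` buffer, for example one opened with `open("evidence_report.csv", "w", newline="")`.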

### Preview the Evidence Clips

In [23]:
from videodb import play_stream

if evidence_clips:
    print("Playing the first evidence clip:")
    # Call play_stream only when a clip exists; otherwise there is no URL to play
    play_stream(evidence_clips[0]['url'])
else:
    print("No evidence clips available to play.")

Playing the first evidence clip:


---

# 🎯 5. Next Steps

## 🎓 What You Learned

✅ How to index video portfolios with scene detection  
✅ Semantic search for visual similarity detection  
✅ Filtering and analyzing plagiarism matches  
✅ Detecting sequential matches for stronger evidence  
✅ Generating side-by-side comparison clips  

## 🚀 Take It Further

1. **Scale Up**: Index your entire video portfolio
2. **Automate**: Set up scheduled monitoring
3. **Integrate**: Connect to your DMCA workflow
4. **Customize**: Adjust thresholds for your use case
5. **Deploy**: Move to production (API/Lambda)
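
For the threshold adjustment in step 4, one approach is to sweep a few candidate thresholds over scores you have already collected and see how many matches survive at each level. This is a rough sketch, not the detector's own logic: the function name `sweep_thresholds` and the candidate values are assumptions, and the scores are the five clip similarities printed earlier in this notebook.

```python
from typing import Dict, Iterable, List


def sweep_thresholds(scores: Iterable[float], thresholds: List[float]) -> Dict[float, int]:
    """Return, for each threshold, how many scores are at or above it."""
    score_list = list(scores)
    return {t: sum(1 for s in score_list if s >= t) for t in thresholds}


# Similarity scores of the five evidence clips generated earlier
clip_scores = [0.911, 0.810, 0.826, 0.746, 0.802]

counts = sweep_thresholds(clip_scores, thresholds=[0.70, 0.80, 0.90])
for threshold, kept in counts.items():
    print(f"threshold {threshold:.2f}: {kept} of {len(clip_scores)} clips kept")
```

A higher threshold keeps fewer but stronger matches, so the right value depends on how much manual review you can afford per suspect video.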

## 📚 Resources

- [VideoDB Documentation](https://docs.videodb.io)
- [Scene Indexing Guide](https://docs.videodb.io/visual-search-and-indexing-80)
- [Discord Community](https://discord.gg/py9P639jGz)
- [GitHub Examples](https://github.com/video-db/videodb-cookbook)

## 💬 Get Help

- **Discord**: [discord.gg/py9P639jGz](https://discord.gg/py9P639jGz)
- **Email**: support@videodb.io
- **Docs**: [docs.videodb.io](https://docs.videodb.io)

## 🎉 You're Done!

**Congratulations!** You've built a production-ready video plagiarism detector.

If this helped you, please:
- ⭐ Star the [VideoDB repository](https://github.com/video-db/videodb-python)
- 💬 Share your results in [Discord](https://discord.gg/py9P639jGz)
- 📝 Write about your experience

---

**Built with ❤️ using VideoDB**

*Protect your content. Enforce your rights. Scale your monitoring.*