## VideoDB - Effortlessly Remove Inappropriate Content from Video

<a href="https://colab.research.google.com/github/video-db/videodb-cookbook/blob/main/examples/Content%20Moderation.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>


## Overview

Content moderation usually involves complex pipelines.

**VideoDB** simplifies this into a "Prompt-and-Filter" workflow.

In this tutorial, we will:
1.  **Upload** a video containing mixed content (e.g., action scenes).
2.  **Index** the video with a **"Moderator Prompt"** that strictly labels scenes as `CONTENT_SAFE` or `CONTENT_UNSAFE`.
3.  **Filter** the video by searching for the `CONTENT_SAFE` keyword.
4.  **Stitch** the safe segments together instantly to create a clean version.

## Setup


---
### Install VideoDB


In [None]:
!pip -q install videodb


#### 🔑 API Key
You only need your **VideoDB API Key**.
> Get your API key from the [VideoDB Console](https://console.videodb.io) (free for the first 50 uploads, **no credit card required**).

In [None]:
import videodb
import os
from getpass import getpass

# Prompt user for API key securely
api_key = getpass("Please enter your VideoDB API Key: ")
os.environ["VIDEO_DB_API_KEY"] = api_key

Please enter your VideoDB API Key: ··········
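If you rerun the notebook often, you may prefer to prompt only when the key is not already set. The sketch below is one way to do that; the helper name `resolve_api_key` and its two parameters are our own for illustration and are not part of the VideoDB SDK. It relies only on the `VIDEO_DB_API_KEY` environment variable used in the cell above.

```python
import os
from getpass import getpass
from typing import Callable, MutableMapping


def resolve_api_key(
    env: MutableMapping[str, str] = os.environ,
    prompt: Callable[[str], str] = getpass,
) -> str:
    """Return the key from the environment, prompting only when it is absent."""
    key = env.get("VIDEO_DB_API_KEY", "").strip()
    if not key:
        # Hypothetical fallback: ask once, then store the key for later cells.
        key = prompt("Please enter your VideoDB API Key: ").strip()
        if not key:
            raise ValueError("A VideoDB API key is required.")
        env["VIDEO_DB_API_KEY"] = key
    return key
```

Calling `resolve_api_key()` with no arguments would replace the `getpass` lines above. Passing a plain dictionary and a stub prompt makes the helper easy to test without touching the real environment.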


---
## Implementation

### 🌐 Step 1: Connect to VideoDB

In [None]:
from videodb import connect

# Connect to VideoDB
conn = connect()
coll = conn.get_collection()

### 🎥 Step 2: Upload Video
We will use a clip from "Breaking Bad" that contains some intense scenes, which makes it a perfect candidate for moderation testing.

In [None]:
# Upload the video
video = coll.upload(url='https://www.youtube.com/watch?v=Xa7UaHgOGfM')
print(f"Uploaded Video ID: {video.id}")

Uploaded Video ID: m-z-019bf683-57d4-7470-8e7a-7555b5101682


Play the uploaded video to preview it:

In [None]:
video.play()

### 🔍 Step 3: Semantic Indexing with a "Moderator Prompt"
This is the core of our solution. Instead of a generic description, we will give the AI a **Role** and a **Strict Output Format**.

We instruct the AI to analyze 5-second chunks and label each one against specific criteria (violence, weapons, blood or gore, drug use, sexual content).
* If safe, it MUST output `CONTENT_SAFE`.
* If unsafe, it MUST output `CONTENT_UNSAFE`.

Because every description starts with one of two fixed labels, we can filter scenes later with a simple keyword search.
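A model may occasionally drift from the requested format, so it can help to check each description before trusting its label. The following is a minimal sketch of such a check in plain Python. The function `parse_label` and the rule of treating unrecognized text as unsafe are our own assumptions, not VideoDB behavior.

```python
from typing import Tuple

SAFE = "CONTENT_SAFE"
UNSAFE = "CONTENT_UNSAFE"


def parse_label(description: str) -> Tuple[str, str]:
    """Split a scene description into (label, detail).

    Text that does not start with one of the two expected labels is
    treated as CONTENT_UNSAFE, so a malformed response is never kept.
    """
    text = description.strip().lstrip('"')
    for label in (UNSAFE, SAFE):
        if text.startswith(label):
            # Drop the label, then the colon and spaces that follow it.
            detail = text[len(label):].lstrip(": ").strip()
            return label, detail
    return UNSAFE, "unrecognized format: " + text[:60]


# Hypothetical descriptions, for illustration only.
print(parse_label("CONTENT_SAFE: Two men talk in a store."))
print(parse_label("CONTENT_UNSAFE: A handgun is visible."))
print(parse_label("The scene shows a desert road."))
```

Failing closed is a deliberate choice here: for moderation, dropping a harmless scene is usually cheaper than keeping an unlabeled one.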

In [None]:
from videodb import SceneExtractionType

# Define strict instructions for the AI
moderation_prompt = """
You are a Content Moderator. Analyze the visual content for inappropriate elements:
1. Violence (fighting, hitting, shooting)
2. Weapons (guns, knives)
3. Blood or Gore
4. Drug use
5. Sexual content

If ANY of these are detected, your response must start with:
"CONTENT_UNSAFE: [brief reason]"

If the scene is clean and safe, your response must start with:
"CONTENT_SAFE: [brief description]"
"""

print("Indexing video for moderation... this might take some time")

# Index every 5 seconds to ensure granular moderation
scene_index_id = video.index_scenes(
    prompt=moderation_prompt,
    extraction_type=SceneExtractionType.time_based,
    extraction_config={
        "time": 5,        # Check every 5 seconds
        "frame_count": 3  # Look at 3 frames per segment
    }
)

print("✅ Moderation Indexing Complete!")

Indexing video for moderation... this might take some time
✅ Moderation Indexing Complete!


Let's look at a few of the indexed scenes:

In [None]:
import json

scene_indexes = video.get_scene_index(scene_index_id)

# Print the first 5 scene indexes as formatted JSON
for i, scene in enumerate(scene_indexes[:5]):
    print(f"Scene {i+1}:\n{json.dumps(scene, indent=2)}\n")

Scene 1:
{
  "description": "CONTENT_SAFE: The images display title cards for \"Breaking Bad compilation\" with a smoky green/brown background, and one solid black screen. No inappropriate elements are depicted.",
  "end": 5.005,
  "metadata": {},
  "scene_metadata": {},
  "start": 0.0
}

Scene 2:
{
  "description": "CONTENT_SAFE: The images show two men in indoor settings, one appearing to be in a store or office, and the other in what looks like a hallway or waiting area. No inappropriate elements are detected.",
  "end": 10.01,
  "metadata": {},
  "scene_metadata": {},
  "start": 5.005
}

Scene 3:
{
  "description": "CONTENT_SAFE: The images show two different men in indoor settings. One man, wearing glasses and a striped sweater, is seen in a corridor. The other man, with distinct grey hair and eyebrows, is seen smiling in what appears to be a store. There are no inappropriate elements present.",
  "end": 15.015,
  "metadata": {},
  "scene_metadata": {},
  "start": 10.01
}

Scene 4
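Before filtering, it can be useful to see how the labels are distributed across the video. The sketch below assumes each record is a dictionary with the `description`, `start`, and `end` fields shown in the output above. The `summarize` helper, the `UNLABELED` bucket, and the sample records are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def summarize(scenes: List[dict]) -> Dict[str, dict]:
    """Count scenes and total seconds per moderation label."""
    summary = defaultdict(lambda: {"scenes": 0, "seconds": 0.0})
    for scene in scenes:
        # Take the text before the first colon as the label.
        label = scene["description"].split(":", 1)[0].strip()
        if label not in ("CONTENT_SAFE", "CONTENT_UNSAFE"):
            label = "UNLABELED"
        summary[label]["scenes"] += 1
        summary[label]["seconds"] += scene["end"] - scene["start"]
    return dict(summary)


# Hypothetical records mirroring the fields printed above.
sample = [
    {"description": "CONTENT_SAFE: Title cards.", "start": 0.0, "end": 5.0},
    {"description": "CONTENT_UNSAFE: A gun is drawn.", "start": 5.0, "end": 10.0},
    {"description": "CONTENT_SAFE: Two men talk.", "start": 10.0, "end": 15.0},
]
for label, stats in summarize(sample).items():
    print(label, stats)
```

In the notebook, passing `scene_indexes` in place of `sample` should work as long as the records keep the same three fields.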

### 🔎 Step 4: Filter for Safe Content
Because we enforced the `CONTENT_SAFE` label in our prompt, we can now simply run a **Keyword Search** to find all the shots that passed the moderation check.

In [None]:
from videodb import SearchType, IndexType

# Search exclusively for our "Safe" tag
safe_results = video.search(
    query="CONTENT_SAFE",
    search_type=SearchType.keyword,
    index_type=IndexType.scene,
    scene_index_id=scene_index_id
)

safe_shots = safe_results.get_shots()

print(f"Found {len(safe_shots)} safe segments out of the total video.")

# Let's inspect the first few safe segments
for i, shot in enumerate(safe_shots[:3]):
    print(f"Segment {i+1} ({shot.start}s - {shot.end}s): {shot.text}")

Found 61 safe segments out of the total video.
Segment 1 (0.0s - 5.005s): CONTENT_SAFE: The images display title cards for "Breaking Bad compilation" with a smoky green/brown background, and one solid black screen. No inappropriate elements are depicted.
Segment 2 (5.005s - 10.01s): CONTENT_SAFE: The images show two men in indoor settings, one appearing to be in a store or office, and the other in what looks like a hallway or waiting area. No inappropriate elements are detected.
Segment 3 (10.01s - 15.015s): CONTENT_SAFE: The images show two different men in indoor settings. One man, wearing glasses and a striped sweater, is seen in a corridor. The other man, with distinct grey hair and eyebrows, is seen smiling in what appears to be a store. There are no inappropriate elements present.
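The safe segments above are back to back: each one starts where the previous one ends. If you want a compact view of the continuous safe stretches, adjacent segments can be merged. This is a minimal sketch in plain Python; `merge_segments` and its `gap` tolerance are our own illustrative names, and the last sample segment is hypothetical.

```python
from typing import List, Tuple

Segment = Tuple[float, float]


def merge_segments(segments: List[Segment], gap: float = 0.01) -> List[Segment]:
    """Merge segments whose boundaries touch or overlap (within `gap` seconds)."""
    merged: List[Segment] = []
    for start, end in sorted(segments):
        if merged and start - merged[-1][1] <= gap:
            # Extend the previous range instead of starting a new one.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


# The first three segments come from the output above; the last is hypothetical.
safe = [(0.0, 5.005), (5.005, 10.01), (10.01, 15.015), (25.025, 30.03)]
print(merge_segments(safe))
```

With the search results from this step, the input list could be built as `[(shot.start, shot.end) for shot in safe_shots]`.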


### ✂️ Step 5: Play the Clean Version
We now have the list of safe shots. All that remains is to get the stream URL for the search results and play the clean version.

In [None]:
print("Stream URL Clean Version: ", safe_results.stream_url)
safe_results.play()

Stream URL Clean Version:  https://stream.videodb.io/v3/published/manifests/9ff92994-de07-46b2-b2db-c652e22d2b5c.m3u8
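For an audit trail, you may also want to know which parts of the video were left out. One way to get them is to take the gaps between the safe segments. The sketch below is plain Python under our own assumptions: `removed_ranges` is a hypothetical helper, and the segment values and video length are made up for illustration.

```python
from typing import List, Tuple

Segment = Tuple[float, float]


def removed_ranges(safe: List[Segment], video_length: float) -> List[Segment]:
    """Return the gaps in [0, video_length] not covered by any safe segment."""
    gaps: List[Segment] = []
    cursor = 0.0
    for start, end in sorted(safe):
        if start > cursor:
            gaps.append((cursor, start))
        # Move the cursor forward, never backward, so overlaps are handled.
        cursor = max(cursor, end)
    if cursor < video_length:
        gaps.append((cursor, video_length))
    return gaps


# Hypothetical safe segments and video length, for illustration only.
print(removed_ranges([(0.0, 15.0), (25.0, 30.0)], video_length=40.0))
```

A human reviewer can then spot-check only the removed ranges instead of watching the whole video.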


### 🎉 Conclusion

You just built a **Content Moderation Engine** without using any external vision APIs.

**Key Takeaways:**
1.  **Prompt Engineering:** Directing the indexing AI with strict labels (`CONTENT_SAFE`) turns unstructured video into structured data.
2.  **Keyword Search:** Simple searches become powerful filters when the data is structured.
3.  **Instant Play:** No editing step is needed; VideoDB streams only the safe segments.

Explore more at [docs.videodb.io](https://docs.videodb.io/).