## Beep Curse Words
<a href="https://colab.research.google.com/github/video-db/videodb-cookbook/blob/main/examples/Beep%20Curse%20Words.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

VideoDB's Editor SDK makes it easy to personalize content to meet users' requirements. If users prefer not to include curse words in their content, VideoDB lets you either remove these words or replace them with a sound overlay such as a beep.

This task, typically complex for video editors, can be accomplished with just **a few lines of code** using VideoDB.

This technique can also serve as a valuable **Content Moderation** component for any social content platform, ensuring that content meets the preferences and standards of its audience.

Let's dive in!

In [None]:
!pip -q install -U videodb



### Prerequisites
Ensure you have VideoDB installed in your environment. If not, run `pip install videodb` in your terminal, or `!pip install videodb` in a notebook cell.

You'll also need a `VideoDB API key`, which can be obtained from the VideoDB console.

In [None]:
# create a new connection with your API key
import videodb
import os
from getpass import getpass

# Prompt user for API key securely
api_key = getpass("Please enter your VideoDB API Key: ")
os.environ["VIDEO_DB_API_KEY"] = api_key

from videodb import connect, play_stream
conn = connect()
coll = conn.get_collection()

Please enter your VideoDB API Key: ··········


### Source Content

For this tutorial, let's use a Joe Rogan clip in which he tries to trick Siri into using curse words 🤣

In [None]:
# Joe Rogan video clip
video = coll.upload(url='https://www.youtube.com/watch?v=7MV6tUCUd-c')
print("Video uploaded:", video.id)

Video uploaded: m-z-019b8e29-2663-76c1-bcf0-424f99679391


In [None]:
#watch the original video
video.play()

In [None]:
# index spoken content in the video
video.index_spoken_words()

100%|████████████████████████████████████████| 100/100 [00:00<00:00, 436.66it/s]


### Create `beep` Asset

We have a sample beep sound in this folder, `beep.wav`. For those looking to add a more playful or unique touch, replacing the beep with alternative sound effects, such as a quack or any other sound, can make the content more engaging and fun.

In [None]:
# upload beep sound - This is just a sample, you can replace it with quack or any other sound effect.
beep = conn.upload(file_path="beep.wav")

In [None]:
# Import Editor SDK components
from videodb.editor import VideoAsset, AudioAsset, Timeline, Track, Clip

In [None]:
# Create audio asset from beep sound
beep_asset = AudioAsset(id=beep.id)

### Moderation
To ensure appropriate content management, it's necessary to have a method for identifying profanity and applying a predefined overlay to censor it. In this tutorial, we've included a list of curse words. Feel free to customize this list according to your requirements.


In [None]:
curse_words_list = ['shit', 'ass', 'shity', 'fuck', 'motherfucker', 'damn', 'fucking', 'motherfuker', 'bitch', 'hole', 'mother']

In [None]:
# let's review transcript
print(video.get_transcript_text())

transcript = video.get_transcript()
print(transcript)

Do you have a lot of home like Siri and Alexa's at your house? No, I don't trust those. That thing. Or is that that you. You can make Siri say, oh, yeah. You can make Siri say, yeah. People think I was joking about that. Let me show you how to do that. Watch this, folks. Here we go. What is the definition? Taco bell. That's a mistake. What's the definition of mother? Come on, you piece of shit. There's no need for that late change. Sorry. I'm sorry about that. What's the definition of mother? As a noun, it means a woman in relation to child or children to whom she has given birth. Do you want to hear the next one? Yes, as a verb, it means bring up a child with care and affection. Do you want to hear the last one? Yes, as a verb, it means give birth to. Oh, my God. Sir, you. You took it from me. Yeah, they. They. They. They removed it. You know what they removed? Apple. Come on. It was funny. How did they leave it in there and then take it out? Did they take out the dead body one? Remem

### Find Curse Words
We'll use lemmatization, an NLP technique that reduces a word to its root form, to catch variations of an offensive word without listing each form by hand. Reviewing the transcript also shows how these words were actually transcribed, which matters because transcription errors are possible.
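To see why normalization is needed before the lookup, here is a minimal sketch that uses only the standard library. It strips punctuation and lowercases each token so that `shit.` and `Shit` both match `shit`. The helper name `normalize_token`, the sample tokens, and the `blocked` set are illustrative assumptions, and this sketch does not handle inflected forms the way a lemmatizer does.

```python
import re

def normalize_token(token: str) -> str:
    """Strip punctuation and lowercase a transcript token (illustrative helper)."""
    return re.sub(r"[^\w\s]", "", token).lower().strip()

# Hypothetical sample tokens, shaped like the words in a transcript
sample_tokens = ["mother?", "Shit.", "motherfucker,", "Taco", "-"]
blocked = {"shit", "mother", "motherfucker"}

# Keep the original token so its timestamps can be looked up later
flagged = [t for t in sample_tokens if normalize_token(t) in blocked]
print(flagged)  # ['mother?', 'Shit.', 'motherfucker,']
```

A token made only of punctuation, such as `-`, normalizes to an empty string and never matches.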

In [None]:
#install spacy
!pip -q install spacy

In [None]:
#install dataset english core
!python -m spacy download en_core_web_sm

Collecting en-core-web-sm==3.8.0
  Downloading https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-3.8.0/en_core_web_sm-3.8.0-py3-none-any.whl (12.8 MB)
✔ Download and installation successful
You can now load the package via spacy.load('en_core_web_sm')
⚠ Restart to reload dependencies
If you are in a Jupyter or Colab notebook, you may need to restart Python in
order to load all the package's dependencies. You can do this by selecting the
'Restart kernel' or 'Restart runtime' option.


In [None]:
# load the english corpus
import spacy
import re
nlp = spacy.load("en_core_web_sm")

In [None]:
def get_root_word(word):
    """
    Return the root form (lemma) of a single word.
    If lemmatization fails, return the word unchanged.
    """
    try:
        # Strip punctuation such as "?" or "."
        cleaned_word = re.sub(r'[^\w\s]', '', word)

        # Run the cleaned word through the spaCy pipeline
        doc = nlp(cleaned_word)

        # Take the lemma of the first token (single-word input is assumed)
        lemmatized_word = [token.lemma_ for token in doc][0]

        return lemmatized_word
    except Exception:
        # For example, a token made only of punctuation leaves nothing to lemmatize
        print(f"some issue with lemma for the word {word}")
        return word

### Create Fresh Timeline

Let's create a timeline using the `Track` and `Clip` pattern. Add the video clip to the main track, then loop through the transcript to add beep overlays wherever curse words are detected.

In [None]:
# Create a new Timeline
timeline = Timeline(conn)

# Create main track with video
main_track = Track()
video_asset = VideoAsset(id=video.id)
video_clip = Clip(asset=video_asset, duration=float(video.length))
main_track.add_clip(0, video_clip)
timeline.add_track(main_track)

# Create overlay track for beep sounds
beep_track = Track()

for word in transcript:
    text = word.get('text')
    if text not in ['-']:
        root_word = get_root_word(text)
        if root_word in curse_words_list:
            beep_start_time = float(word.get('start'))
            beep_end_time = float(word.get('end'))
            beep_duration = beep_end_time - beep_start_time

            # Add beep clip at this timestamp with increased volume
            print(f"beep the word: {text}, {beep_start_time}:{beep_end_time}")
            beep_clip = Clip(
                asset=AudioAsset(id=beep.id, start=0, volume=2.0),
                duration=beep_duration
            )
            # Start the beep 0.25 s early, but never before the start of the video
            beep_track.add_clip(max(0.0, beep_start_time - 0.25), beep_clip)
timeline.add_track(beep_track)
stream_url = timeline.generate_stream()

beep the word: mother?, 34.65:35.05
beep the word: shit., 36.73:37.13
beep the word: mother?, 44.73:45.13
beep the word: motherfucker., 82.93:83.61
beep the word: motherfucker., 105.41:106.29
beep the word: motherfucker., 111.72:112.44
beep the word: mother, 113.56:113.84
beep the word: motherfucker., 119.92:120.6
beep the word: mother, 138.94:139.26
beep the word: motherfucker,, 149.5:150.42
beep the word: shit, 157.1:157.3
beep the word: mother, 195.74:196.1
beep the word: motherfucker., 204.38:205.18
beep the word: motherfucker, 225.03:225.669
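The timing arithmetic in the loop above can be checked in isolation, without calling VideoDB. The sketch below is a minimal, hypothetical helper (`beep_windows` is not part of the SDK) that turns transcript entries into `(start, duration)` pairs. It assumes each entry is a dict with `text`, `start`, and `end` keys, as the loop reads them. Unlike the loop, it clamps the early start at zero and stretches the duration so the beep still reaches the end of the word; that is a design choice for illustration, not VideoDB behaviour.

```python
def beep_windows(words, blocked, lead=0.25):
    """Return (start, duration) pairs for every blocked word."""
    windows = []
    for w in words:
        token = w["text"].strip(".,?!").lower()
        if token in blocked:
            start = float(w["start"])
            end = float(w["end"])
            # Start slightly early, but never before the beginning of the video
            padded_start = max(0.0, start - lead)
            # Extend the duration so the beep still covers the end of the word
            windows.append((padded_start, end - padded_start))
    return windows

# Hypothetical sample entries for illustration
sample = [
    {"text": "shit.", "start": "0.10", "end": "0.50"},
    {"text": "hello", "start": "0.60", "end": "0.90"},
    {"text": "mother?", "start": "34.65", "end": "35.05"},
]
for start, duration in beep_windows(sample, {"shit", "mother"}):
    print(f"{start:.2f} {duration:.2f}")
# 0.00 0.50
# 34.40 0.65
```

Each pair could then be passed to `add_clip` and `Clip(duration=...)` in place of the inline arithmetic.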


### Review and Share Your Moderated Video
Finally, watch and share your new stream:

In [None]:
from videodb import play_stream

stream_url = timeline.generate_stream()
play_stream(stream_url)

### The Real Power of Programmable Streams
If your videos are already uploaded and indexed, this beep pipeline runs in real time. Based on your users' choices or your platform's policy, you can use information from the spoken content to moderate videos automatically.
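One way to wire user choices into the pipeline is to pick the word list per policy before building the timeline. The sketch below is only an illustration: the policy names, the word sets, and the `words_to_beep` helper are all hypothetical, and unknown policies fall back to the strictest list as a cautious default.

```python
# Hypothetical policy names and word sets; adjust to your platform's rules
POLICY_WORDS = {
    "strict": {"shit", "damn", "bitch", "mother", "motherfucker"},
    "relaxed": {"motherfucker"},
    "off": set(),
}

def words_to_beep(policy: str) -> set:
    """Return the word set for a policy, falling back to the strictest one."""
    return POLICY_WORDS.get(policy, POLICY_WORDS["strict"])

print(sorted(words_to_beep("relaxed")))  # ['motherfucker']
```

The returned set can stand in for `curse_words_list` when looping over the transcript.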