### Embedding Subtitles in Videos Automatically Using Python, OpenAI Whisper, and FFmpeg ###

**Steps to be followed**

1. Pick a video URL of interest.
2. Download and save the video using the pytube library.
3. Print some useful information about the video (title, views, description, length).
4. Extract the audio from the video file and save it as a .wav file using FFmpeg.
5. Run a local OpenAI Whisper model through faster-whisper to get the transcription.
6. Convert the segment timestamps to the SRT time format and write a subtitle file.
7. Use FFmpeg to embed the subtitles in the video.

**Flow**

![Visual Flow](subtitles.png)

In [2]:
import os
import pytube

In [3]:
url = "https://www.youtube.com/watch?v=DQacCB9tDaw"
yt = pytube.YouTube(url)

yt.streams.filter(progressive=True, file_extension='mp4').order_by('resolution').desc().first().download()


'/Users/vaibhavpandey/Desktop/generativegeek/add-subs-to-videos/Introducing GPT-4o.mp4'

In [4]:
print(f"Title: {yt.title}")
print(f"Views: {yt.views}")
print(f"Description: {yt.description}")
print(f"Length: {yt.length} seconds")


Title: Introducing GPT-4o
Views: 2847440
Description: OpenAI Spring Update – streamed live on Monday, May 13, 2024. 

Introducing GPT-4o, updates to ChatGPT, and more.
Length: 1573 seconds


In [5]:
# rename the .mp4 to the title of the video, without a file extension,
# so that later file names (audio-..., sub-..., output-...) can be built from the title
os.rename(yt.title + ".mp4", yt.title)
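
The rename above assumes the downloaded file is named exactly `yt.title + ".mp4"`. Titles containing characters such as `/`, `:` or `?` may not survive unchanged as file names, so a more defensive variant could derive its own file stem and pass it to the downloader. The sketch below is only an illustration: `safe_stem` is a hypothetical helper, not part of pytube, and the commented usage assumes that pytube's `Stream.download` accepts a `filename` argument.

```python
import re

def safe_stem(title, max_length=100):
    """Hypothetical helper: turn a video title into a conservative file stem."""
    # Replace anything other than word characters, space, dot, or hyphen.
    stem = re.sub(r"[^\w .-]", "_", title)
    # Collapse runs of whitespace and trim leading/trailing spaces and dots.
    stem = re.sub(r"\s+", " ", stem).strip(" .")
    # Fall back to a fixed name if nothing usable is left.
    return stem[:max_length] or "video"

print(safe_stem("Introducing GPT-4o"))
print(safe_stem("What is 1/2 + 1/2? A: 1"))

# Possible usage with pytube (assumes Stream.download accepts `filename`):
# stem = safe_stem(yt.title)
# stream.download(filename=stem)
```

With a stem chosen up front, the later steps can reuse it for the audio, subtitle, and output file names instead of relying on `yt.title`.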

In [9]:
# extract audio from the video
import time
import math 
import ffmpeg

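def extract_audio(input_file):
    # FFmpeg infers the WAV output format from the ".wav" extension.
    extracted_audio = f"audio-{input_file}.wav"
    stream = ffmpeg.input(input_file)
    stream = ffmpeg.output(stream, extracted_audio)
    # overwrite_output=True passes -y, so reruns replace the existing file.
    ffmpeg.run(stream, overwrite_output=True)
    return extracted_audio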
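
Whisper models operate on 16 kHz mono audio, so one possible refinement is to downsample during extraction, which also keeps the WAV file small. The sketch below builds the equivalent FFmpeg command line with the standard `-vn`, `-ac`, and `-ar` options instead of going through ffmpeg-python. The helper name `build_audio_extract_cmd` and the file names are illustrative assumptions.

```python
import subprocess

def build_audio_extract_cmd(input_file, output_file, sample_rate=16000):
    """Hypothetical helper: FFmpeg arguments for a mono WAV at `sample_rate` Hz."""
    return [
        "ffmpeg",
        "-y",                     # overwrite the output file if it exists
        "-i", input_file,         # input video
        "-vn",                    # drop the video stream
        "-ac", "1",               # downmix to one audio channel
        "-ar", str(sample_rate),  # resample the audio
        output_file,
    ]

cmd = build_audio_extract_cmd("Introducing GPT-4o", "audio-Introducing GPT-4o.wav")
print(cmd)

# To actually run it (requires the ffmpeg binary on PATH):
# subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.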
    

In [10]:
audio_extract = extract_audio(yt.title)

ffmpeg version 7.0 Copyright (c) 2000-2024 the FFmpeg developers
  built with Apple clang version 15.0.0 (clang-1500.3.9.4)
  configuration: --prefix=/opt/homebrew/Cellar/ffmpeg/7.0_1 --enable-shared --enable-pthreads --enable-version3 --cc=clang --host-cflags= --host-ldflags='-Wl,-ld_classic' --enable-ffplay --enable-gnutls --enable-gpl --enable-libaom --enable-libaribb24 --enable-libbluray --enable-libdav1d --enable-libharfbuzz --enable-libjxl --enable-libmp3lame --enable-libopus --enable-librav1e --enable-librist --enable-librubberband --enable-libsnappy --enable-libsrt --enable-libssh --enable-libsvtav1 --enable-libtesseract --enable-libtheora --enable-libvidstab --enable-libvmaf --enable-libvorbis --enable-libvpx --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxml2 --enable-libxvid --enable-lzma --enable-libfontconfig --enable-libfreetype --enable-frei0r --enable-libass --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenjpeg --enable-libopenvino -

In [12]:
# transcribe the audio
from faster_whisper import WhisperModel

def transcribe(audio):
    # "small" is the Whisper model size.
    model = WhisperModel("small")
    segments, info = model.transcribe(audio)
    # info also carries the detected language.
    language = info.language
    print(f" Transcription Language: {language}")
    # segments is a generator; the transcription actually runs when it is consumed.
    segments = list(segments)

    for segment in segments:
        print("[%.2fs -> %.2fs] %s" % (segment.start, segment.end, segment.text))
    return language, segments

In [13]:
%%time 

language, segments = transcribe(audio_extract)



 Transcription Language: en
[0.00s -> 20.96s]  Hi everyone, thank you, thank you, it's great to have you here today.
[20.96s -> 26.88s]  Today I'm going to talk about three things, that's it, we will start with why it's so
[26.88s -> 32.80s]  important to us to have a product that we can make freely available and broadly available to
[32.80s -> 39.52s]  everyone and we're always trying to find out ways to reduce friction so everyone can use
[39.52s -> 46.40s]  Chagivity wherever they are. So today we'll be releasing the desktop version of Chagivity
[46.40s -> 54.56s]  and the refreshed UI that makes it simpler to use much more natural as well. But the big news today
[54.56s -> 62.80s]  is that we are launching our new flagship model and we are calling it GPT-4O. The special thing
[62.80s -> 70.88s]  about GPT-4O is that it brings GPT-4 level intelligence to everyone including our free users.
[71.84s -> 77.68s]  We'll be showing some live demos today to show the full extent of the capab

In [None]:
# SubRip (SRT) entries have three parts:
# Subtitle index: 1, 2, 3, ...
# Timecode: start and end markers in HH:MM:SS,sss format, separated by " --> ".
# Text: the subtitle text.
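
To make the three-part layout concrete, the sketch below parses a small hand-written SRT sample back into its index, timecodes, and text. The sample text and the `parse_srt` helper are illustrative assumptions, and the parser only handles the simple, well-formed case described above.

```python
import re

# A hand-written two-entry SRT sample (illustrative, not taken from a real file).
SAMPLE_SRT = """1
00:00:00,000 --> 00:00:20,960
Hi everyone, thank you.

2
00:00:20,960 --> 00:00:26,880
Today I'm going to talk about three things.
"""

TIMECODE = re.compile(r"(\d{2}:\d{2}:\d{2},\d{3}) --> (\d{2}:\d{2}:\d{2},\d{3})")

def parse_srt(text):
    """Split SRT text into (index, start, end, text) tuples."""
    entries = []
    # Entries are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = block.splitlines()
        index = int(lines[0])
        start, end = TIMECODE.fullmatch(lines[1].strip()).groups()
        entries.append((index, start, end, "\n".join(lines[2:])))
    return entries

for entry in parse_srt(SAMPLE_SRT):
    print(entry)
```

Each entry is an index line, a timecode line, one or more text lines, and a blank separator line.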

In [14]:
# Helper function that takes a time in seconds and
# converts it to the HH:MM:SS,sss format used in SRT subtitle files.

def format_time_for_srt(seconds):
    # Round to whole milliseconds first so the millisecond field can never reach 1000
    # (e.g. 1.9996 s becomes 00:00:02,000 rather than 00:00:01,1000).
    total_ms = int(round(seconds * 1000))
    hours, total_ms = divmod(total_ms, 3_600_000)
    minutes, total_ms = divmod(total_ms, 60_000)
    secs, milliseconds = divmod(total_ms, 1000)
    formatted_time = f"{hours:02d}:{minutes:02d}:{secs:02d},{milliseconds:03d}"

    return formatted_time
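
As an independent cross-check of the timestamp arithmetic, the same conversion can be sketched with `datetime.timedelta`. This is an alternative formulation rather than the notebook's helper, and the function name `srt_time_via_timedelta` is made up for the illustration.

```python
from datetime import timedelta

def srt_time_via_timedelta(seconds):
    """Format seconds as HH:MM:SS,sss by letting timedelta split the value."""
    # Round to whole milliseconds before splitting into fields.
    td = timedelta(milliseconds=round(seconds * 1000))
    total_seconds = td.days * 86400 + td.seconds
    hours, rem = divmod(total_seconds, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{td.microseconds // 1000:03d}"

print(srt_time_via_timedelta(20.96))    # 00:00:20,960
print(srt_time_via_timedelta(3725.5))   # 01:02:05,500
print(srt_time_via_timedelta(1.9996))   # 00:00:02,000
```

The last call is the interesting edge case: a fractional part that rounds up to a full second has to carry into the seconds field instead of producing a four-digit millisecond value.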

In [18]:
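def generate_subtitle_file(input_file, language, segments):
    subtitle_file = f"sub-{input_file}.{language}.srt"
    text = ""
    for index, segment in enumerate(segments):
        segment_start = format_time_for_srt(segment.start)
        segment_end = format_time_for_srt(segment.end)

        # SRT indices start at 1; each entry ends with a blank line.
        text += f"{index + 1}\n"
        text += f"{segment_start} --> {segment_end}\n"
        # The segment text starts with a space (see the output above), so strip it.
        text += f"{segment.text.strip()}\n\n"

    # Write as UTF-8 so non-ASCII characters in the transcript survive.
    with open(subtitle_file, "w", encoding="utf-8") as f:
        f.write(text)

    return subtitle_file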


In [19]:
subtitle_file = generate_subtitle_file(yt.title, language, segments)

In [20]:
def add_subtitle_to_video(input_file, subtitle_file, subtitle_language):
    video_input_stream = ffmpeg.input(input_file)
    output_video = f"output-{input_file}-{subtitle_language}.mp4"
    # The subtitles filter burns the text into the video frames (hard subtitles).
    # Paths containing characters such as ":" or "'" need escaping inside the filter string.
    stream = ffmpeg.output(video_input_stream, output_video,
                           vf=f"subtitles={subtitle_file}")
    ffmpeg.run(stream, overwrite_output=True)
    return output_video
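
The `subtitles` filter used here draws the text onto the frames, so viewers cannot switch it off. A possible alternative is to mux the SRT file as a separate subtitle track that players can toggle; for MP4 output, FFmpeg's `mov_text` subtitle codec is the usual choice. The sketch below only builds the command line. `build_soft_subtitle_cmd` and the file names are illustrative assumptions, and the three-letter `eng` language tag is an assumption too (the notebook itself works with `en`).

```python
import subprocess

def build_soft_subtitle_cmd(video_file, subtitle_file, output_file, language="eng"):
    """Hypothetical helper: FFmpeg arguments that add an SRT file as a subtitle track."""
    return [
        "ffmpeg", "-y",
        "-i", video_file,        # input 0: the video
        "-i", subtitle_file,     # input 1: the SRT file
        "-map", "0:v",           # keep the video stream(s)
        "-map", "0:a?",          # keep audio if present
        "-map", "1:0",           # add the subtitle stream
        "-c:v", "copy",          # no re-encoding of video
        "-c:a", "copy",          # no re-encoding of audio
        "-c:s", "mov_text",      # MP4-compatible subtitle codec
        "-metadata:s:s:0", f"language={language}",
        output_file,
    ]

cmd = build_soft_subtitle_cmd(
    "Introducing GPT-4o",
    "sub-Introducing GPT-4o.en.srt",
    "output-Introducing GPT-4o-en-soft.mp4",
)
print(cmd)

# To actually run it (requires the ffmpeg binary on PATH):
# subprocess.run(cmd, check=True)
```

Because the video and audio streams are copied rather than re-encoded, this variant should finish much faster than burning the subtitles in, at the cost of depending on the player to render them.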

In [21]:
language

'en'

In [22]:
add_subtitle_to_video(yt.title, subtitle_file, language)
