<h3 align = center>How to make a simple video using FFMPEG</h3>
last update: Sept 2025<br>

Introduction: Recently, I helped someone make YouTube videos. Originally I used a GUI video editor, but the free version has a lot of limitations. Plus, a command-line tool can be faster for certain tasks (though for others a GUI is better). I used AI to write most of the code here. At first I thought it would be easy, but it turned out to be a lot more complicated than I expected, as I explain below. Though complicated, it taught me some technical things about video. <br>
And recently, I started learning data science and learnt to use Jupyter Notebook. I find Jupyter Notebook a lot easier when I need to run a script with many different functions. <br>
In my videos, a narrator tells a story, and background music, videos and images are overlaid to make the story more compelling. This tutorial covers everything my videos need. You will learn some basics of making a video with <i>FFMPEG</i>, including: <br>
- [how to overlay video, music and images on the main video](#Overlaying-videos,-images-and-music)
- [how to combine different videos](#concatenating-videos)
- [how to trim a video](#Trimming-a-video)
- [how to split a video into sections](#Splitting-a-video-into-sections-and-combining-with-black-stills)
- [how to use subtitles in ASS format and burn them into a video](#Burning-in-subtitle)
- [how to add a still mosaic](#Adding-Mosaics)
- [how to separate different audio channels](#Separating-different-audio-channels-from-a-video)
- [how to extract audio and convert to text (subtitle)](#Convert-audio-to-text-and-translate-the-subtitle-to-different-languages)
- [how to translate the subtitle to different languages](#Convert-audio-to-text-and-translate-the-subtitle-to-different-languages)
- [how to zoom in on an image](#Overlaying-images)
- [how to insert a black still (or other colors) with text when introducing a topic](#Creating-a-black-still)
- [how to create end credits](#Creating-End-Credits)
- [understanding the codec of a video](#Codec-of-a-video)

 
Finally, one small tip. If you use a Mac, open Jupyter Notebook in Chrome, not Safari. In Safari, new cells often get added in the wrong position, and there are always issues when you change a cell from Code to Markdown. That drives me crazy. Chrome doesn't have these problems.


<h3 align = center>Things you need to download</h3>
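The most important tool is <i><b>ffmpeg</b></i>, followed by <i>ffprobe</i> (ffprobe normally comes bundled with ffmpeg). <br>
Here are the Python packages I use:  <br><br>
<i>re, datetime, math, json, PIL, fractions, matplotlib, tempfile, shutil, numpy, openai-whisper, deep_translator, deepl, openai, demucs, opencc, opencv-python</i> <br><br>
Of these, <i>re, datetime, math, json, fractions, tempfile</i> and <i>shutil</i> are part of Python's standard library, so there is nothing to install for them. <i>PIL</i> is installed as <i>pillow</i>.<br><br>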

In my case, I installed these four under an older Python version (3.10): <i>openai-whisper</i> (for converting audio to subtitles), <i>demucs</i> (for separating different audio sources), and <i>deepl</i> and <i>openai</i> (for translation). They would not install for me under the latest Python version (3.13). Strangely, they installed successfully under Python 3.13 in GitHub Codespaces. If you use GitHub Codespaces, it is better to install all packages from within Jupyter Notebook, not from the terminal. In a cell, install like this (the <i>%pip</i> magic installs into the same environment the notebook kernel is running in):<br><br>
<i>%pip install openai-whisper</i>
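If you switch between Python versions, it is easy to lose track of which packages ended up in the current environment. Below is a minimal sketch of a check. It is not part of <i>video_commands.py</i>. The `check_environment()` helper and the pip-name-to-import-name mapping are my own assumptions.

```python
import importlib.util
import shutil

# Hypothetical mapping: pip package name -> module name used in `import`
# (standard-library modules such as re, json and math are left out)
IMPORT_NAMES = {
    "pillow": "PIL",
    "matplotlib": "matplotlib",
    "numpy": "numpy",
    "openai-whisper": "whisper",
    "deep_translator": "deep_translator",
    "deepl": "deepl",
    "openai": "openai",
    "demucs": "demucs",
    "opencc": "opencc",
    "opencv-python": "cv2",
}

def check_environment(import_names=IMPORT_NAMES, tools=("ffmpeg", "ffprobe")):
    """Return {name: True/False} for each Python package and command-line tool."""
    report = {}
    for pip_name, module in import_names.items():
        # find_spec returns None when the module is not importable here
        report[pip_name] = importlib.util.find_spec(module) is not None
    for tool in tools:
        # shutil.which returns the executable's path if it is on PATH, else None
        report[tool] = shutil.which(tool) is not None
    return report
```

Calling `check_environment()` in a cell returns one True/False value per package or tool. Run it in each kernel you use.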

The code we are using here is in <b><i>video_commands.py</i></b>.


<h3 align = center> Let's begin! </h3>
The media files used in this tutorial are in the folder <i>media_files</i>. The main video is a 1944 documentary called <i>Marines at Tarawa - Return to Guam</i>. The original is 40 minutes long, but I have already trimmed it to 5 minutes. This trimmed clip, <i><b>marines_5min.mov</b></i>, is the main video of this tutorial. If you want to trim it yourself, you can download the original here: <br>
<a href ="https://archive.org/details/publicmovies212/Marines_at_Tarawa_Return_to_Guam.webm">https://archive.org/details/publicmovies212/Marines_at_Tarawa_Return_to_Guam.webm</a>

### Trimming a video


Say we want to extract <i>11:00 to 16:00</i> from the video Marines_at_Tarawa_Return_to_Guam.mp4 (the original 40-minute video) and name the output <i>marines_5min.mov</i>. Here is the ffmpeg command (you can run it directly on the command line):<br><br>
<i>ffmpeg -y -ss 11:00 -to 16:00 -i Marines_at_Tarawa_Return_to_Guam.mp4 -c:v libx264 -preset veryfast -crf 18 -c:a copy marines_5min.mov</i> <br><br>
Pay attention to the following in the above ffmpeg command:<br><br>
<b><i>-c:v libx264 -preset veryfast -crf 18</i></b>  <br><br>
H.264 (encoded here with libx264) is a very common compression format and is compatible with most video players. An efficient compression format can give you similar video quality at a smaller file size. (This is something new I learnt. I used to think the larger the file, the better the quality!) <br><br>
The actual quality depends on <b><i>crf</i></b> and <b><i>preset</i></b>. For libx264, <b><i>crf</i></b> ranges from 0 to 51. The smaller the value, the better the quality and the bigger the file. <b><i>preset</i></b> takes values such as "ultrafast", "superfast", "veryfast", "faster", "fast", "medium" (the default), "slow", "slower" and "veryslow". Together these two determine the encoding time and the file size. A slower preset takes longer but gives a smaller file at the same <b><i>crf</i></b>. If you choose a very small <b><i>crf</i></b>, you can end up with a video a lot larger than the original.<br><br>

Usually, <b><i>-crf 18 (up to 23) -preset fast</i></b> is good enough. You can experiment by re-encoding a high-resolution video with different <i>crf</i> and <i>preset</i> values and seeing how the quality changes.<br><br>
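To make the effect of these flags concrete, here is a small sketch that builds the trim command above as an argument list and rejects out-of-range values. The `build_trim_cmd()` name is hypothetical and not the tutorial's `trim_video()`. The 0 to 51 range is the libx264 (8-bit) CRF range, and the preset names are x264's.

```python
# x264 preset names, fastest to slowest
X264_PRESETS = ["ultrafast", "superfast", "veryfast", "faster", "fast",
                "medium", "slow", "slower", "veryslow", "placebo"]

def build_trim_cmd(src, start, end, out, crf=23, preset="fast"):
    """Return the trim command as a list for subprocess.run (hypothetical helper)."""
    if not 0 <= crf <= 51:
        raise ValueError("libx264 crf must be between 0 and 51")
    if preset not in X264_PRESETS:
        raise ValueError(f"unknown x264 preset: {preset!r}")
    return ["ffmpeg", "-y", "-ss", str(start), "-to", str(end), "-i", src,
            "-c:v", "libx264", "-preset", preset, "-crf", str(crf),
            "-c:a", "copy", out]
```

Passing the list to `subprocess.run(cmd, check=True)` avoids shell-quoting problems with file names that contain spaces.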
Below is the Python code to extract 11:00-16:00 from <i>Marines_at_Tarawa_Return_to_Guam.mp4</i> and name it <i>marines_5min.mov</i>:



In [None]:
from video_commands import * 
main_video = "Marines_at_Tarawa_Return_to_Guam.mp4"
start_time = "11:00" # you can enter 1) an integer number of seconds, 2) mm:ss, or 3) hh:mm:ss
end_time = "16:00" 
output_file = "marines_5min.mov"
trim_video(main_video, start_time, end_time, output_file,crf=23, preset = "fast")
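The comment in the cell above says times can be given as seconds, mm:ss or hh:mm:ss. Here is a minimal sketch of how such a value can be turned into seconds. `to_seconds()` is a hypothetical helper; the actual parser in <i>video_commands.py</i> may differ.

```python
def to_seconds(value):
    """Convert 90, "1:30", "0:01:30" or "1:30.5" to seconds (float)."""
    if isinstance(value, (int, float)):
        return float(value)
    parts = [float(p) for p in str(value).strip().split(":")]
    if not 1 <= len(parts) <= 3:
        raise ValueError(f"unrecognised time: {value!r}")
    seconds = 0.0
    for part in parts:  # hours, minutes, seconds from left to right
        seconds = seconds * 60 + part
    return seconds
```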

## Codec of a video
Now that we have produced the main video, let's look at its codec and other stream properties. A video file has a video stream and an audio stream:<br>
ffprobe command for the video stream: <br>
<i>ffprobe -v error -select_streams v:0 -show_entries stream=codec_name,width,height,r_frame_rate,pix_fmt -of csv=p=0 marines_5min.mov</i><br>
ffprobe command for the audio stream: <br>
<i>ffprobe -v error -select_streams a:0 -show_entries stream=codec_name,sample_rate,channels -of csv=p=0 marines_5min.mov</i>
<br><br>
Now let's run the python code below:<br>

In [None]:
file_list = ["marines_5min.mov"]
print_media_info(file_list)

Here is the output on the screen when you run the python code above:<br>

---- media_files/marines_5min.mov ---- <br>
video info:  ffprobe -v error -select_streams v:0 -show_entries stream=codec_name,width,height,r_frame_rate,pix_fmt -of csv=p=0 <br> media_files/marines_5min.mov<br>
Video: h264,556,412,yuv420p,25/1 <br>
audio info:  ffprobe -v error -select_streams a:0 -show_entries stream=codec_name,sample_rate,channels -of csv=p=0 <br>media_files/marines_5min.mov<br>
Audio: aac,44100,2<br>
video start and end time:0.000000,300.000000<br>
audio start and end time:0.000000,300.009002<br>

For the video part: <br>
<i>h264</i> is the compression format I just talked about<br>
<i>556,412 </i>are the width and height of the video, respectively<br>
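<i>yuv420p</i> is the pixel format: YUV colour with 4:2:0 chroma subsampling. In other words, the colour information is stored at half the width and half the height of the brightness information. It is the most widely supported format for playback.<br>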
<i>25/1</i> is the frame rate<br>

For the audio part: <br>
<i>aac</i> is the most common audio codec<br>
<i>44100</i> (44.1 kHz) is the sample rate, the standard rate for audio CDs<br>
<i>2</i> is the number of audio channels<br>
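Here is a small sketch that parses these ffprobe lines so two files can be compared before concatenating. Note that ffprobe prints the fields in its own order (pix_fmt before r_frame_rate), not in the order listed after <i>-show_entries</i>, as the output above shows. The helper names are hypothetical.

```python
from fractions import Fraction

def parse_video_csv(line):
    """Parse 'h264,556,412,yuv420p,25/1' (codec, width, height, pix_fmt, frame rate)."""
    codec, width, height, pix_fmt, rate = line.strip().split(",")
    return {
        "codec": codec,
        "width": int(width),
        "height": int(height),
        "pix_fmt": pix_fmt,
        "fps": float(Fraction(rate)),  # "25/1" -> 25.0, "30000/1001" -> about 29.97
    }

def parse_audio_csv(line):
    """Parse 'aac,44100,2' (codec, sample rate, channels)."""
    codec, sample_rate, channels = line.strip().split(",")
    return {"codec": codec, "sample_rate": int(sample_rate), "channels": int(channels)}

def same_format(a, b, keys):
    """True if two parsed streams agree on every key in `keys`."""
    return all(a[k] == b[k] for k in keys)
```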

When concatenating videos with different codecs, you have to re-encode them to the same codec first, or the concatenation will fail. We will get back to this later. <br>
The most important thing to note here is the final two lines: <br>
<i>video start and end time:0.000000,300.000000<br>
audio start and end time:0.000000,300.009002</i><br>

You can see that the video and audio streams don't have the same duration. This causes desynchronization when concatenating videos: even though the difference is tiny (here 0.009 s), it is noticeable, and the video and audio no longer match. I struggled a lot to understand this. I used to think the desynchronization came from a fault in my <i>ffmpeg</i> command, but it does not. You have to trim or pad either the video or the audio so that the durations match. We will do this later.<br>
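Using the two end times printed above, here is a minimal sketch that measures the mismatch and builds an audio filter that pads or cuts the audio to the video's length. The same <i>apad,atrim</i> pair appears in step 4 of <i>reencode_to_match()</i> below. The helper names are hypothetical.

```python
def av_mismatch(video_end, audio_end):
    """Audio end minus video end, in seconds (positive means the audio is longer)."""
    return round(audio_end - video_end, 6)

def audio_fix_filter(video_end):
    """Audio filter that makes the audio exactly as long as the video."""
    # apad appends silence indefinitely; atrim then cuts everything after video_end
    return f"apad,atrim=0:{video_end:.6f}"
```

For the main video, `av_mismatch(300.0, 300.009002)` gives the 0.009002 s difference shown above.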


## Splitting a video into sections and combining with black stills
For my video, I need to insert a few black stills to separate the video into sections. The black still looks like this:
<p align="center">
    <img src="https://helen-poon.github.io/ffmpeg_video/black_screen.png" 
         alt="Black screen separator" 
         style="max-width: 30%; height: auto;">
</p>


Say the first black still appears at 1:30, the second at 2:30 and the third at 4:00. Then we have to split the video into these sections: <br>
1) 0:00 - 1:30 <br>
2) 1:30 - 2:30 <br>
3) 2:30 - 4:00 <br>
4) 4:00 - the end <br>

First, let's split the main video into 4 sections, namely <i>video1.mov</i>, <i>video2.mov</i>, <i>video3.mov</i> and <i>video4.mov</i><br><br>
<i>ffmpeg -y -ss 0.000 -i marines_5min.mov -t 90.000 -c:v libx264 -preset fast -c:a pcm_s16le -ar 48000 -ac 2 -movflags +faststart video1.mov</i><br>
<i>ffmpeg -y -ss 90.000 -i marines_5min.mov -t 60.000 -c:v libx264 -preset fast -c:a pcm_s16le -ar 48000 -ac 2 -movflags +faststart video2.mov</i><br>
<i>ffmpeg -y -ss 150.000 -i marines_5min.mov -t 90.000 -c:v libx264 -preset fast -c:a pcm_s16le -ar 48000 -ac 2 -movflags +faststart video3.mov</i><br>
<i>ffmpeg -y -ss 240.000 -i marines_5min.mov -t 60.009 -c:v libx264 -preset fast -c:a pcm_s16le -ar 48000 -ac 2 -movflags +faststart video4.mov</i><br><br>

Below is the Python code to do this. Let's run it.<br>


In [None]:
main_video = "marines_5min.mov"
the_end = get_video_length(main_video) # we retrieve the exact end time in hh:mm:ss by calling this function
sections = ["0:00 - 1:30","1:30 - 2:30","2:30 - 4:00",f"4:00-{the_end}"]
#Below are the output files. The length of this list has to match the length of the above list "sections"
output_files = ["video1.mov","video2.mov","video3.mov","video4.mov"] 
split_video(main_video, sections, output_files, audio_codec="wav")
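The four ffmpeg commands above follow a simple pattern: each section starts at a cut point and lasts until the next one. Here is a sketch that derives the <i>-ss</i>/<i>-t</i> pairs and commands from the cut points. It is a hypothetical helper that reuses the flags shown above, not the tutorial's `split_video()`.

```python
def split_plan(cut_points, total):
    """Turn cut points (seconds) into (start, duration) pairs for -ss / -t."""
    bounds = [0.0] + sorted(cut_points) + [total]
    return [(start, round(end - start, 3)) for start, end in zip(bounds, bounds[1:])]

def split_cmds(src, cut_points, total, outputs, preset="fast"):
    """One ffmpeg command per section, using the same flags as the commands above."""
    plan = split_plan(cut_points, total)
    if len(plan) != len(outputs):
        raise ValueError("need exactly one output name per section")
    return [
        ["ffmpeg", "-y", "-ss", f"{start:.3f}", "-i", src, "-t", f"{dur:.3f}",
         "-c:v", "libx264", "-preset", preset,
         "-c:a", "pcm_s16le", "-ar", "48000", "-ac", "2",
         "-movflags", "+faststart", out]
        for (start, dur), out in zip(plan, outputs)
    ]
```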

## Creating a black still
Now we are going to create some black stills to insert into the video. The main video has 4 sections, so we insert 3 black stills. Say the texts are "Section 1", "Section 2" and "Section 3", each lasting 5 seconds. We convert them to videos named <i>black1.mov</i>, <i>black2.mov</i> and <i>black3.mov</i>. First, we create a blank black still called <i>temp_black.mp4</i> with the same dimensions (556x412) as the main video:<br><br>

<i>ffmpeg -y -f lavfi -i color=c=black:s=556x412:r=25.0:d=5 -f lavfi -i anullsrc=r=44100:cl=stereo -shortest -c:v libx264 -pix_fmt yuv420p -c:a pcm_s16le temp_black.mp4</i><br>

Then Python creates an ASS file called <i>temp_sub.ass</i> that contains the text (ASS is a subtitle format stored as plain text). Finally, <i>temp_sub.ass</i> is burned into <i>temp_black.mp4</i>, and the output is what we want:

<i>ffmpeg -y -i temp_black.mp4 -vf ass=temp_sub.ass -c:v libx264 -pix_fmt yuv420p -c:a copy black1.mov</i>
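For reference, here is a minimal sketch of what such an ASS file can look like: one centred line of text shown for the whole duration. The `title_card_ass()` helper is hypothetical. The file <i>video_commands.py</i> actually writes may use different style values. Alignment 5 means middle centre, and colours use the ASS &HAABBGGRR format, so &H00FFFFFF is solid white.

```python
def ass_time(seconds):
    """ASS timestamps are H:MM:SS.cc (centiseconds)."""
    cs = round(seconds * 100)
    h, rem = divmod(cs, 360000)
    m, rem = divmod(rem, 6000)
    s, cs = divmod(rem, 100)
    return f"{h}:{m:02d}:{s:02d}.{cs:02d}"

def title_card_ass(text, duration, width, height,
                   font="Arial", size=72, colour="&H00FFFFFF"):
    """Return the text of an ASS file showing `text` centred for `duration` seconds."""
    style = [
        "Default", font, str(size), colour, "&H000000FF", "&H00000000", "&H00000000",
        "0", "0", "0", "0",        # Bold, Italic, Underline, StrikeOut
        "100", "100", "0", "0",    # ScaleX, ScaleY, Spacing, Angle
        "1", "0", "0",             # BorderStyle, Outline, Shadow
        "5",                       # Alignment 5 = middle centre
        "10", "10", "10", "1",     # MarginL, MarginR, MarginV, Encoding
    ]
    return "\n".join([
        "[Script Info]",
        "ScriptType: v4.00+",
        f"PlayResX: {width}",
        f"PlayResY: {height}",
        "",
        "[V4+ Styles]",
        "Format: Name, Fontname, Fontsize, PrimaryColour, SecondaryColour, OutlineColour, "
        "BackColour, Bold, Italic, Underline, StrikeOut, ScaleX, ScaleY, Spacing, Angle, "
        "BorderStyle, Outline, Shadow, Alignment, MarginL, MarginR, MarginV, Encoding",
        "Style: " + ",".join(style),
        "",
        "[Events]",
        "Format: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text",
        f"Dialogue: 0,{ass_time(0)},{ass_time(duration)},Default,,0,0,0,,{text}",
        "",
    ])
```

Writing `title_card_ass("Section 1", 5, 556, 412)` to <i>temp_sub.ass</i> gives a file you can pass to the <i>ass=</i> filter in the command above.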

<b>Note that if the numbers (like <i>1</i> in <i>"Section 1"</i>) look strange in <i>black1.mov</i> (for example, bigger than the rest of the text), you need to install some fonts. I don't remember exactly how I did it, but I remember asking ChatGPT.<br></b>


In [None]:
main_video = "marines_5min.mov"
text = ["Section 1","Section 2","Section 3"]
output_files = ["black1.mov","black2.mov","black3.mov"]
duration = 5 # in seconds
for txt, output_name in zip(text, output_files):
    # font_color uses the ASS colour format &HAABBGGRR; &H00FFFFFF is solid white
    output_file = create_black_still(main_video, txt, duration, output_name,
                                     font_name="Arial", font_size=72,
                                     font_color="&H00FFFFFF")

The black stills are produced with the same codec as the main video. We can check them:

In [None]:
file_list = ["marines_5min.mov","video1.mov","video2.mov","video3.mov",
            "black1.mov","black2.mov","black3.mov"]
print_media_info(file_list)

You can see that the video codecs are the same, but the audio codec is <i>pcm_s16le</i> for the segmented videos and the black stills. This format is lossless and preserves the quality, but for the final audio we will convert to <i>aac</i>. Also note that for the black stills, the audio and video durations do not match. We deal with this using the function <i>reencode_to_match()</i>. It first extracts the stream parameters from the main video, then re-encodes the other videos with the same parameters, and finally pads or trims the audio so that its duration matches the video. If a video has no audio stream, this function adds a silent one. <br>
<b>Note that calling <i>reencode_to_match()</i> yourself is actually unnecessary, because it is built into the next function, <i>combine_video()</i></b>. I am just showing what re-encoding does here. This process may take more than 10 minutes. To speed things up, use a larger <i>crf</i> (e.g. 40) and a faster <i>preset</i> (e.g. "ultrafast"), at the cost of quality.<br>

Here is the breakdown of <i>reencode_to_match()</i>:<br>
1) Get the main video's stream parameters (width, height, frame rate and duration): <br>
<i>ffprobe -v error -select_streams v:0 -show_entries stream=width,height,r_frame_rate,duration -of json marines_5min.mov
</i><br>

These values are passed to the next steps.<br>

2) Re-encode the video, converting the audio to PCM and the frame rate to the main video's: <br>
<i>ffmpeg -y -i media_files/marines_5min.mov -r 25.0 -c:v libx264 -c:a pcm_s16le -ar 44100 -ac 2 -preset fast -crf 18 marines_5min_tmp.mov</i><br><br>
3) Rescale the video to the main video's dimensions (yes, the main video is rescaled to its own size; the same is done to all other videos):<br>
<i>ffmpeg -y -i marines_5min_tmp.mov -vf scale=556:412,pad=556:412:0:0:black -c:v libx264 -preset fast -crf 18 -c:a copy marines_5min_reencoded.mov</i><br><br>
4) Pad or truncate the audio to match the video duration:<br>
<i>ffmpeg -y -i marines_5min_reencoded.mov -c:v copy -af apad,atrim=0:300.000000 marines_5min_reencoded_padded.mov</i><br>

So the final product we want is the last one, <i>marines_5min_reencoded_padded.mov</i>. The same procedure applies to the other videos.
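Here is a sketch of how steps 1 to 4 fit together. It parses step 1's JSON and builds the three ffmpeg commands with the same flags and file names as above. The helpers are hypothetical and not the actual <i>reencode_to_match()</i>. Note that step 4 uses each file's own video duration (300 s for the main video, 5 s for a black still), not the main video's.

```python
import json
import os
from fractions import Fraction

def main_params(ffprobe_json):
    """Width, height and frame rate from step 1's JSON (-of json) output."""
    stream = json.loads(ffprobe_json)["streams"][0]
    return {
        "width": stream["width"],
        "height": stream["height"],
        "fps": float(Fraction(stream["r_frame_rate"])),
    }

def reencode_cmds(src, width, height, fps, video_duration, crf=18, preset="fast"):
    """Steps 2-4 above as argument lists (hypothetical helper)."""
    stem = os.path.splitext(os.path.basename(src))[0]
    tmp = f"{stem}_tmp.mov"
    scaled = f"{stem}_reencoded.mov"
    padded = f"{stem}_reencoded_padded.mov"
    return [
        # step 2: common frame rate, PCM audio at 44.1 kHz stereo
        ["ffmpeg", "-y", "-i", src, "-r", str(fps), "-c:v", "libx264",
         "-c:a", "pcm_s16le", "-ar", "44100", "-ac", "2",
         "-preset", preset, "-crf", str(crf), tmp],
        # step 3: force the main video's width and height
        ["ffmpeg", "-y", "-i", tmp,
         "-vf", f"scale={width}:{height},pad={width}:{height}:0:0:black",
         "-c:v", "libx264", "-preset", preset, "-crf", str(crf), "-c:a", "copy", scaled],
        # step 4: pad with silence, then cut the audio at this file's video duration
        ["ffmpeg", "-y", "-i", scaled, "-c:v", "copy",
         "-af", f"apad,atrim=0:{video_duration:.6f}", padded],
    ]
```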



In [None]:
main_video = "marines_5min.mov"
# The main video serves as the "standard" codec for others to follow
list_to_reencode = [main_video,"video1.mov","video2.mov","video3.mov","video4.mov",
            "black1.mov","black2.mov","black3.mov"]
reencoded_file_names, reencoded_file_dict = reencode_to_match(main_video, list_to_reencode, crf="23", preset="fast")
# reencoded_file_names are the new file names; reencoded_file_dict is a dictionary
# with the old video names as keys and the re-encoded videos as values


Now that we have the re-encoded video names, we can check the codecs again. Each re-encoded video gets a default new name: if the original name is <i>xxx.mov</i>, the new name is <i>xxx_reencoded_padded.mov</i>.


In [None]:
list_of_videos = ["video1_reencoded_padded.mov", "video2_reencoded_padded.mov",
                  "video3_reencoded_padded.mov", "video4_reencoded_padded.mov",
                  "black1_reencoded_padded.mov", "black2_reencoded_padded.mov",
                  "black3_reencoded_padded.mov"]
print_media_info(list_of_videos)

You can see that the audio and video durations now match and the codecs are all the same, so we can combine the videos safely using <i>combine_video()</i>. It first re-encodes all the videos to the same codec, then combines them. The code creates a temporary folder (using the <i>tempfile</i> package) and puts all the intermediate files there; they are deleted at the end. <b>Remember that <i>reencode_to_match()</i> above is built into <i>combine_video()</i>.</b><br>
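One common way to join clips that already share the same codecs is ffmpeg's concat demuxer. It reads a text file listing the clips and can copy the streams without re-encoding. The sketch below writes that list and builds the command. This is an assumption about one possible approach; <i>combine_video()</i> may concatenate differently. Stream copy (<i>-c copy</i>) is only safe when every clip has identical codec parameters, which is why the re-encoding step matters.

```python
def concat_list(paths):
    """Text for ffmpeg's concat demuxer: one `file '...'` line per clip, in order."""
    lines = []
    for p in paths:
        escaped = p.replace("'", "'\\''")  # a ' inside the quotes is written as '\''
        lines.append(f"file '{escaped}'")
    return "\n".join(lines) + "\n"

def concat_cmd(list_file, out):
    # -safe 0 allows absolute paths and unusual file names in the list
    return ["ffmpeg", "-y", "-f", "concat", "-safe", "0", "-i", list_file, "-c", "copy", out]
```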
Now let's combine the videos:<br>

### concatenating videos

In [None]:
#input the video list in order of concatenation
video_list = ["video1.mov","black1.mov","video2.mov","black2.mov","video3.mov","black3.mov","video4.mov"]
# primary_index is the video that serves as the "model" for re-encoding: all other videos get the same codec as this one.
# Note that primary_index is 1-based: the first video has primary_index 1, not 0
combine_video(video_list, primary_index=1, output_file="marines_5min_new.mov", crf=23, preset="fast")

The output <i>marines_5min_new.mov</i> is what we want; this is our new main video. Run the code below and you can see the black stills at 1:30, 2:35 and 4:10, each lasting 5 s (each still pushes the later ones back by 5 s).

In [None]:
%%HTML
<iframe width="560" height="315" src="https://www.youtube.com/embed/Jou_m0InGRQ" frameborder="0" allowfullscreen></iframe>
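The positions 1:30, 2:35 and 4:10 follow from the cut points: the i-th still is delayed by all the stills inserted before it. Here is a small sketch of that arithmetic. The helpers are hypothetical.

```python
def fmt_mmss(seconds):
    """Format seconds as m:ss, rounded to the nearest second."""
    minutes, secs = divmod(int(round(seconds)), 60)
    return f"{minutes}:{secs:02d}"

def still_starts(cut_points, still_duration):
    """Start time of each inserted still in the combined video."""
    # the i-th still (0-based) is pushed back by the i stills inserted before it
    return [cut + i * still_duration for i, cut in enumerate(sorted(cut_points))]
```

For example, `[fmt_mmss(t) for t in still_starts([90, 150, 240], 5)]` reproduces the three times listed above.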

## Separating different audio channels from a video
Sometimes a video has several sound sources mixed together, e.g. background music and narration. This makes it difficult to convert the audio to subtitles, so we need to separate the sources first. <i>demucs</i> is a very powerful tool for this. You end up with a file called <i>vocals.wav</i>, which is the file to use for the subtitles. I had difficulty running <i>demucs</i> from Python, so I gave up and run it from the command line instead. <b>Remember to switch to the Python environment where <i>whisper</i> is installed before running it.</b> The process can take anywhere from 10 to 30 minutes.

Copy and paste the command below into the terminal; don't run it here! The output goes into the default folder <b><i>separated/htdemucs/marines_5min_new</i></b>.

<i><b>demucs marines_5min_new.mov</b></i>

Now in <i>separated/htdemucs/marines_5min_new</i>, we find <i>vocals.wav</i>. We use this file to extract the subtitles. <br><br>
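If you want the command and the expected output path in Python (for example, to paste the command into the terminal or to check that the file exists), here is a small sketch. The helpers are hypothetical. `-o` (output folder) and `--two-stems=vocals` (write only <i>vocals.wav</i> and <i>no_vocals.wav</i>) are options from the demucs README, but check `demucs --help` for your version.

```python
from pathlib import Path

def demucs_cmd(media, two_stems=True, out_dir="separated"):
    """Command line for demucs; run it in the terminal of the whisper environment."""
    cmd = ["demucs", "-o", out_dir]
    if two_stems:
        cmd.append("--two-stems=vocals")  # vocals.wav + no_vocals.wav only
    cmd.append(str(media))
    return cmd

def vocals_path(media, out_dir="separated", model="htdemucs"):
    """Where demucs writes the vocals: <out_dir>/<model>/<file name without extension>/vocals.wav."""
    return Path(out_dir) / model / Path(media).stem / "vocals.wav"
```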
<b>If you have a "clean" audio track, you don't need to run <i>demucs</i>; just input your video to the following step. Remember that this step also requires the <i>whisper</i> environment. When running the code below, you may need to comment out some imports in <i>video_commands.py</i> that are not available in the <i>whisper</i> environment.
</b>

In [None]:
# input can be a video or an audio file; if it is a video, make sure its audio is "clean"
video = "separated/htdemucs/marines_5min_new/vocals.mp3" # demucs writes vocals.wav; this example uses an MP3 copy of it
outputsrt = "subtitle.srt"
voice_to_srt(video, outputsrt)

If there are any mistakes in the subtitle file <i>subtitle.srt</i>, just correct them before proceeding to the next step.
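The SRT format itself is simple. Below is a sketch of how transcription segments can be written as SRT. It is not the tutorial's `voice_to_srt()`. It assumes segments shaped like Whisper's output: dicts with "start", "end" and "text". Note that SRT uses a comma before the milliseconds.

```python
def srt_time(seconds):
    """SRT timestamps are HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def segments_to_srt(segments):
    """segments: iterable of dicts with 'start', 'end' (seconds) and 'text'."""
    blocks = []
    for i, seg in enumerate(segments, start=1):
        timing = f"{srt_time(seg['start'])} --> {srt_time(seg['end'])}"
        blocks.append(f"{i}\n{timing}\n{seg['text'].strip()}\n")
    return "\n".join(blocks)  # a blank line between cues
```

With openai-whisper installed, `whisper.load_model("base").transcribe("vocals.wav")["segments"]` yields segments of this shape.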

## Convert audio to text and translate the subtitle to different languages
If you use <i>deepl</i>, you need an API key. DeepL offers a free API plan. I provide an empty file <i>api_deepl.txt</i> for you to put the key in. If you don't want to use DeepL, just use <i>Google Translate</i>.<br>
Here are the codes for some common languages, used as input in the Python code:<br><br>

<i>English = en<br>
Simplified Chinese = "zh-CN" for Google, "zh" for DeepL<br>
Traditional Chinese = "zh-TW" for Google, "zh" for DeepL<br>
Japanese = ja<br>
Korean = ko<br>
German = de<br>
French = fr<br></i>

In [None]:
subtitle_file = "subtitle.srt" # the file we extract from the step above voice_to_srt()
outputfile = "subtitle_google_ja.srt"
target_lang = "ja"
source_lang = "en" #not inputting this would set the value to auto-detect
translate_srt_google(subtitle_file,outputfile, target_lang, source_lang)
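To show the idea behind translating a subtitle file, here is a sketch that translates only the text of each cue and leaves the cue numbers and timings untouched. It is not the tutorial's `translate_srt_google()`. The translator is passed in as a plain function, so the logic can be tried without network access.

```python
import re

# a timing line such as "00:00:06,560 --> 00:00:09,500" (a dot before the milliseconds is accepted too)
TIMING = re.compile(r"^\d{2}:\d{2}:\d{2}[,.]\d{3} --> \d{2}:\d{2}:\d{2}[,.]\d{3}")

def translate_srt_text(srt_text, translate):
    """Apply translate(str) -> str to each cue's text; keep numbers and timings."""
    out_blocks = []
    for block in srt_text.replace("\r\n", "\n").strip().split("\n\n"):
        lines = block.split("\n")
        if len(lines) >= 3 and TIMING.match(lines[1]):
            text = " ".join(lines[2:])  # send a multi-line cue as one sentence
            lines = lines[:2] + [translate(text)]
        out_blocks.append("\n".join(lines))
    return "\n\n".join(out_blocks) + "\n"
```

With deep_translator, you could pass `GoogleTranslator(source="en", target="ja").translate` as `translate`. That sends one request per cue, so long files may be slow.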

If you have a DeepL API key, here is the DeepL version. When you open the output file <i>subtitle_deepl_ja.srt</i>, you may find that DeepL leaves the last sentence untranslated (it always did for me), so you have to translate it manually. If you translate into Chinese, the output can be Traditional Chinese, Simplified Chinese, or a mixture of both, so my code converts the output to Simplified Chinese.<br>
<b>You may need to switch your python version to a lower one when running <i>deepl</i>.</b>

In [None]:
subtitle_file = "subtitle.srt"
outputfile = "subtitle_deepl_ja.srt"
target_lang = "ja" 
source_lang = "en" # not passed to deepl_translate_srt() below
deepl_translate_srt(subtitle_file, outputfile, target_lang)

To make proofreading easier, you can combine the subtitles in the original language and the translated language into one file. The file looks like this:<br><br>
1<br>
00:00:00.000 --> 00:00:03.399<br>
about to our rectum pivions instead of machine guns.<br>
マシンガンの代わりに直腸のピビオンを使う。<br><br>

2<br>
00:00:06.560 --> 00:00:09.500<br>
They got a few of us, all we got them.<br>
彼らは私たち数人を、私たちは彼ら全員を捕まえた。<br><br>


In [None]:
file1_path = "subtitle.srt" #input file, make this the original language
file2_path = "subtitle_deepl_ja.srt" #input file, make this the translated language
output_path = "subtitle_en_ja.srt" #output file which combines two input files
combine_srt(file1_path, file2_path, output_path)

After proofreading, you can separate them again. Enter 1 for the language that appears first in <i>subtitle_en_ja.srt</i>, and 2 for the second.

In [None]:
input_file = "subtitle_en_ja.srt"
language_choice = 2
output_file = "subtitle_ja.srt"
separate_srt_languages(input_file, language_choice, output_file)
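Here is a sketch of the combine/separate round trip on SRT text. It assumes both files have the same cues in the same order and that each language fits on one line per cue in the combined file. The helpers are hypothetical and are not `combine_srt()` or `separate_srt_languages()`.

```python
def parse_srt(text):
    """Return [(index, timing, [text lines])] for each cue."""
    cues = []
    for block in text.replace("\r\n", "\n").strip().split("\n\n"):
        lines = block.split("\n")
        if len(lines) >= 3:
            cues.append((lines[0], lines[1], lines[2:]))
    return cues

def combine_bilingual(original, translated):
    """Cue number, timing, original line, then translated line."""
    a, b = parse_srt(original), parse_srt(translated)
    if len(a) != len(b):
        raise ValueError("the two subtitle files have different numbers of cues")
    return "\n\n".join(
        "\n".join([idx, timing, " ".join(t1), " ".join(t2)])
        for (idx, timing, t1), (_, _, t2) in zip(a, b)
    ) + "\n"

def separate_bilingual(bilingual, choice):
    """choice=1 keeps the first language line of each cue, choice=2 the second."""
    return "\n\n".join(
        "\n".join([idx, timing, text[choice - 1]])
        for idx, timing, text in parse_srt(bilingual)
    ) + "\n"
```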

We will leave the subtitles here for the moment. When the final video is done, we will convert them to ASS format and burn them into the video.

## Overlaying videos, images and music

<b>In the last step of this section, we overlay all the videos, images and music onto the main video in one go. Here I break it down step by step for tutorial purposes; you can go directly to [overlaying everything](#Overlaying-everything) and skip the details.</b><br><br>
Now we can overlay other videos, images and music on the main video. We can even zoom in on an image. Let's start by overlaying a video on the main video.<br>
<h5>1. Overlaying a video</h5>
For the overlay video, we need to remove the audio first. The first step is to check whether there is an audio stream (if the command prints nothing, there is none): <br> 
<i>ffprobe -i media_files/Aumun_Background.mov -show_streams -select_streams a -loglevel error</i><br> <br>

Next, the audio is muted. Alternatively, you can choose a <i>volume factor</i> to lower the volume instead:<br> 
<i>ffmpeg -y -i media_files/Aumun_Background.mov -filter:a <b>volume=0</b> -c:v copy Aumun_Background_muted.mov</i><br>

The default name for the muted video is <i>xxx_muted.mov</i>, or <i>xxx_0.2.mov</i> if you choose to lower the audio to 20% of the original. This file is used in the next step.<br>

Finally, the audio is saved as a separate wav file <i>Aumun_Background.wav</i>:<br>
<i>ffmpeg -i media_files/Aumun_Background.mov -vn -acodec pcm_s16le -ar 44100 -ac 2 ./Aumun_Background.wav</i><br><br>
If you don't want this wav file, you can discard it.<br>
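Here are the three commands above put together, with the naming rule just described. This is a sketch using hypothetical helpers, not the actual `separate_audio_video()`. The volume factor is written with Python's `g` format, so 0 becomes `volume=0` and 0.2 becomes `volume=0.2`.

```python
import os

def muted_name(video, volume_factor):
    """xxx_muted.mov for volume 0, otherwise e.g. xxx_0.2.mov."""
    stem, ext = os.path.splitext(os.path.basename(video))
    suffix = "muted" if volume_factor == 0 else f"{volume_factor:g}"
    return f"{stem}_{suffix}{ext}"

def separate_cmds(video, outfolder=".", volume_factor=0):
    stem = os.path.splitext(os.path.basename(video))[0]
    return [
        # 1) list audio streams; empty output means the video has no audio
        ["ffprobe", "-i", video, "-show_streams", "-select_streams", "a", "-loglevel", "error"],
        # 2) mute (or scale) the audio, copy the video stream unchanged
        ["ffmpeg", "-y", "-i", video, "-filter:a", f"volume={volume_factor:g}",
         "-c:v", "copy", os.path.join(outfolder, muted_name(video, volume_factor))],
        # 3) save the original audio as 16-bit, 44.1 kHz stereo WAV
        ["ffmpeg", "-y", "-i", video, "-vn", "-acodec", "pcm_s16le", "-ar", "44100", "-ac", "2",
         os.path.join(outfolder, f"{stem}.wav")],
    ]
```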

In [None]:
video = "media_files/Aumun_Background.mov"
outfolder = "."
volume_factor = 0
separate_audio_video(video,outfolder,volume_factor)

<i>Aumun_Background_muted.mov</i> is what we want. Next, do the re-encoding to match the audio and video codec to the main video. At the end, you can see they have the same codec.

In [None]:
#this is our new main video, with black stills
main_video = "marines_5min_new.mov"
list_to_reencode = [main_video,"Aumun_Background_muted.mov"]
reencoded_file_names, reencoded_file_dict =reencode_to_match(main_video, list_to_reencode,crf="23", preset="fast")
print(reencoded_file_names)
print_media_info(reencoded_file_names)


Finally, we can do the overlay. Note that the function <i>overlay_video_img_music()</i> already incorporates <i>reencode_to_match()</i>, so you just input the raw media files.<br>
What you need is a parameter list, given as a list of lists:<br>

<i>para_list = [["media_files/Aumun_Background.mov","0:30","0:45","0:05",3]]</i><br><br>
We want to overlay <i>media_files/Aumun_Background.mov (1st parameter)</i> from <i>time = 30 seconds (2nd parameter)</i> to <i>time = 45 seconds (3rd parameter)</i> of the main video. <i>media_files/Aumun_Background.mov</i> lasts 29 seconds, and we want to use the part starting at <i>5 s (4th parameter)</i>. That means we overlay 5-20 s of <i>media_files/Aumun_Background.mov</i> onto 30-45 s of the main video. The fade-out time is <i>3 s (final parameter)</i>.<br>

The time parameters (2nd, 3rd and 4th) can be input as an integer (seconds), or as "mm:ss" or "hh:mm:ss". Fade-in and fade-out should be in seconds (an integer).<br><br>
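To illustrate what such an entry can translate to, here is a sketch of one common ffmpeg filter graph for a timed video overlay. It is an assumption, not necessarily what <i>overlay_video_img_music()</i> generates. It assumes the overlay clip is the second input (<i>[1:v]</i>). The clip is trimmed, shifted with <i>setpts</i> to start at the right time, faded out through its alpha channel, and enabled only inside the time window.

```python
def to_seconds(value):
    """90, "1:30" or "0:01:30" -> seconds."""
    if isinstance(value, (int, float)):
        return float(value)
    total = 0.0
    for part in str(value).split(":"):
        total = total * 60 + float(part)
    return total

def video_overlay_filter(entry, width, height):
    """entry = [path, start, end, clip_start, fade_out], as in para_list."""
    _, start, end, clip_start, fade_out = entry
    t0, t1, c0 = to_seconds(start), to_seconds(end), to_seconds(clip_start)
    chain = [
        f"trim=start={c0:g}:duration={t1 - t0:g}",  # the part of the clip to use
        f"setpts=PTS-STARTPTS+{t0:g}/TB",           # move it to t0 on the main timeline
        f"scale={width}:{height}",
        "format=yuva420p",                          # alpha channel so the fade can go transparent
    ]
    if fade_out:
        chain.append(f"fade=t=out:st={t1 - fade_out:g}:d={fade_out:g}:alpha=1")
    return (f"[1:v]{','.join(chain)}[ov];"
            f"[0:v][ov]overlay=0:0:enable='between(t,{t0:g},{t1:g})'[v]")
```

The resulting string would go after `-filter_complex`, followed by `-map "[v]" -map 0:a`. For the entry above, the fade starts at 42 s (45 minus the 3-second fade).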

Now, let's do it:<br>


In [None]:
main_video = "marines_5min_new.mov"
para_list = [["media_files/Aumun_Background.mov","0:30","0:45","0:05",3]]
outputfile = "marines_5min_video_overlay.mov"
overlay_video_img_music(main_video, para_list, outputfile, crf=23, preset="fast")

Now run the code below to see the newly created file <i>marines_5min_video_overlay.mov</i>. Go to 0:30 and you can see the overlay until 0:45, with a 3-second fade-out starting at 0:42.

In [None]:
%%HTML
<iframe width="560" height="315" src="https://www.youtube.com/embed/PtlQhHtLjKs" frameborder="0" allowfullscreen></iframe>

### Overlaying images
Overlaying a still image is a lot easier. If we want a zoom-in effect, though, we need to turn the image into a video first. The first step is to decide the focus of the zoom, which is actually more complicated with <i>ffmpeg</i> than with GUI software. For the following image, suppose we want to zoom in on the boat. We need to rescale the image (<i>media_files/boat.jpg</i>) to match the main video and then add a grid to read off the coordinates.
<p align="center">
    <img src="https://helen-poon.github.io/ffmpeg_video/media_files/boat.jpg" 
         alt="Boat image" 
         style="max-width: 50%; height: auto;">
</p>


In [None]:
main_video = "marines_5min_new.mov" #we need to rescale the input image to match the size of the main video
video_width,video_height = get_media_dimensions(main_video) 
image_name = "media_files/boat.jpg" #the input image
#the rescaled image which matches the dimensions of the main video. This file is used as the overlay
output_filename = "boat_rescaled.jpg" 
make_rescaled_image(video_width, video_height, image_name, output_filename) #produce the rescaled image
input_file = output_filename
output_file = "boat_rescaled_grid.jpg"
# add a grid to the rescaled image to read off pixel positions; interval=50 means the grid lines are 50 pixels apart
add_numbered_grid(input_file, output_file,video_width,video_height,interval=50,line_color="red",number_color="yellow") 


Here is the newly created <i>boat_rescaled_grid.jpg</i>. <b>The top left corner is x = 0, y = 0</b>. Since the grid lines are 50 pixels apart, we can read off that the boat is at about x = 275, y = 275.
<p align="center">
    <img src="https://helen-poon.github.io/ffmpeg_video/boat_rescaled_grid.jpg" 
         alt="Rescaled boat image with numbered grid" 
         style="max-width: 50%; height: auto;">
</p>

Now we can produce the video with the zoom-in. Below we make a video out of <i>boat_rescaled.jpg</i>. The video lasts 5 seconds. Starting 2 seconds into the video, it zooms in towards x = 275, y = 275 (where the boat is). The maximum magnification is 2. The output file name is <i>boat_video.mov</i>.
We have to input the above as a list of lists.

In [None]:
x_pos,y_pos = 275,275
duration = 5
zoom_start = 2
zoom_max = 2
out_file = "boat_video.mov"
image_list = [["boat_rescaled.jpg", x_pos, y_pos, duration, zoom_start, zoom_max, out_file]]
create_zoom(image_list, main_video, crf=18, preset="fast")
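Here is a small model of the geometry behind a zoom like this. At each moment, the visible part of the image is a window that shrinks by the zoom factor and stays centred on (x, y), clamped so it never leaves the frame. The sketch assumes a linear ramp from 1x at <i>zoom_start</i> to <i>zoom_max</i> at the end. That is a hypothetical choice: <i>create_zoom()</i> may ease differently, and in ffmpeg this kind of effect is typically built with the zoompan filter or per-frame crop and scale.

```python
def zoom_window(t, x, y, width, height, zoom_start, zoom_max, duration):
    """Crop rectangle (left, top, w, h) at time t for a linear zoom towards (x, y)."""
    if t <= zoom_start or duration <= zoom_start:
        z = 1.0
    else:
        progress = min((t - zoom_start) / (duration - zoom_start), 1.0)
        z = 1.0 + (zoom_max - 1.0) * progress  # 1x at zoom_start, zoom_max at the end
    w, h = width / z, height / z
    # centre the window on (x, y) but keep it inside the frame
    left = min(max(x - w / 2, 0), width - w)
    top = min(max(y - h / 2, 0), height - h)
    return round(left), round(top), round(w), round(h)
```

With the values above (556x412 frame, focus at 275, 275), the window is the full frame until 2 s. At 5 s it is 278x206 pixels, with its top left corner at (136, 172).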

Now run the following and you can see the 5-second video zooming in on the boat from 2 s onwards. You can overlay this video on the main video just as we did above.

In [None]:
%%HTML
<iframe width="560" height="315" src="https://www.youtube.com/embed/sDrC1RSrT2o" frameborder="0" allowfullscreen></iframe>

To overlay a still image instead, we call the function <i>overlay_video_img_music()</i> again with a parameter list (a list of lists), this time in the image format. We want to overlay <i>boat_rescaled.jpg (1st parameter)</i> from <i>1:50 to 1:55 (2nd and 3rd parameters)</i> on the main video, with a <i>fade-out of 2 s (last parameter)</i>.<br>
<i>para_list = [["boat_rescaled.jpg","1:50","1:55","2"]]</i> <br>
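For a still image, ffmpeg needs the picture turned into a stream of the right length. A common way is `-loop 1 -t <seconds>` before the image input. Here is a sketch of a full command for the entry above. It is an assumption about one possible approach, not the code inside <i>overlay_video_img_music()</i>. The fade is timed relative to the image (3 s into a 5-second image), which lands at 1:53 on the main timeline.

```python
def to_seconds(value):
    """90, "1:50" or "0:01:50" -> seconds."""
    if isinstance(value, (int, float)):
        return float(value)
    total = 0.0
    for part in str(value).split(":"):
        total = total * 60 + float(part)
    return total

def image_overlay_cmd(main_video, entry, width, height, out):
    """entry = [image, start, end, fade_out], as in para_list."""
    image, start, end, fade_out = entry
    t0, t1, fade = to_seconds(start), to_seconds(end), float(fade_out)
    dur = t1 - t0
    chain = [f"scale={width}:{height}", "format=yuva420p"]
    if fade:
        # the looped image starts at 0 s, so the fade time is relative to the image
        chain.append(f"fade=t=out:st={dur - fade:g}:d={fade:g}:alpha=1")
    chain.append(f"setpts=PTS-STARTPTS+{t0:g}/TB")  # then move it to t0
    graph = (f"[1:v]{','.join(chain)}[img];"
             f"[0:v][img]overlay=0:0:enable='between(t,{t0:g},{t1:g})'[v]")
    return ["ffmpeg", "-y", "-i", main_video,
            "-loop", "1", "-t", f"{dur:g}", "-i", image,  # still image as a dur-second stream
            "-filter_complex", graph, "-map", "[v]", "-map", "0:a?",
            "-c:v", "libx264", "-c:a", "copy", out]
```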



 

In [None]:
para_list = [["boat_rescaled.jpg","1:50","1:55","2"]]
output_file = "marines_5min_img_overlay.mov"
overlay_video_img_music(main_video, para_list, output_file, crf=18, preset="fast")

Now run the following and you can see the overlay image from 1:50 to 1:55, with the fade-out starting at 1:53.

In [None]:
%%HTML
<iframe width="560" height="315" src="https://www.youtube.com/embed/EZ3qWLFfw2M" frameborder="0" allowfullscreen></iframe>

### Overlaying music
Finally, we overlay music. Here is the parameter list for music:<br><br>

<i>end_time = get_video_length(main_video)</i><br>
<i>para_list = [["media_files/Mozart.wav","0:30",f"{end_time}",3,4,2]]</i><br><br>
We want to overlay <i>media_files/Mozart.wav (1st parameter)</i> from <i>time = 30 seconds (2nd parameter)</i> to <i>the end (3rd parameter)</i> of the main video (the end time is retrieved via <i>get_video_length()</i>), with a <i>fade-in</i> of <i>3 seconds (4th parameter)</i> and a <i>fade-out</i> of <i>4 seconds (5th parameter)</i>. We also want the music volume to be <i>2 times</i> the original (6th parameter).<br>

If the music is shorter than the overlay period, it repeats. The overlay time parameters (2nd and 3rd) can be input as an integer (seconds), or as "mm:ss" or "hh:mm:ss". Fade-in and fade-out should be in seconds (an integer).<br><br>
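Here is a sketch of one possible audio filter graph for a music entry. It is an assumption, not necessarily what <i>overlay_video_img_music()</i> does. It assumes the music is the second input and was opened with `-stream_loop -1` so it repeats. The chain cuts the music to the overlay length, fades it in and out, scales the volume, delays it to the start time, and mixes it with the main audio. Note that by default amix scales its inputs down when mixing, so check the amix documentation for your FFmpeg version if the result sounds quieter than expected.

```python
def to_seconds(value):
    """90, "0:30" or "0:05:00" -> seconds."""
    if isinstance(value, (int, float)):
        return float(value)
    total = 0.0
    for part in str(value).split(":"):
        total = total * 60 + float(part)
    return total

def music_filter(entry, input_index=1):
    """entry = [music, start, end, fade_in, fade_out, volume], as in para_list."""
    _, start, end, fade_in, fade_out, volume = entry
    t0, t1 = to_seconds(start), to_seconds(end)
    dur = t1 - t0
    chain = [f"atrim=0:{dur:g}", "asetpts=PTS-STARTPTS"]  # length of the overlay period
    if fade_in:
        chain.append(f"afade=t=in:st=0:d={fade_in:g}")
    if fade_out:
        chain.append(f"afade=t=out:st={dur - fade_out:g}:d={fade_out:g}")
    chain.append(f"volume={volume:g}")
    delay_ms = round(t0 * 1000)
    chain.append(f"adelay={delay_ms}|{delay_ms}")  # one delay per channel (stereo)
    return (f"[{input_index}:a]{','.join(chain)}[m];"
            f"[0:a][m]amix=inputs=2:duration=first[a]")
```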

In [None]:
main_video = "marines_5min_new.mov"
end_time = get_video_length(main_video)
para_list = [["media_files/Mozart.wav","0:30",f"{end_time}",3,4,2]]
outputfile = "marines_5min_music.mov"
overlay_video_img_music(main_video, para_list, outputfile, crf=23, preset="fast")

Now run the following and you can hear the music from 30 s to the end.

In [None]:
%%HTML
<iframe width="560" height="315" src="https://www.youtube.com/embed/ane1G4ckd_Q" frameborder="0" allowfullscreen></iframe>

## Overlaying everything
Now we come to the most important part: overlaying everything. Our main video is <i>marines_5min_new.mov</i>.
Here is what we want to overlay:<br><br>
1. Overlay the video <i>media_files/Aumun_Background.mov</i> on 0:20-0:30 of the main video, using <i>media_files/Aumun_Background.mov</i> from 5 s onwards. No fade-out.<br>
2. Overlay the image-turned-video <i>boat_video.mov</i> on 3:30-3:35 of the main video. We already made <i>boat_video.mov</i> 5 s long.<br>
3. Overlay the video <i>media_files/Mountain_Forest.mov</i> on 4:00-4:15 of the main video, using <i>media_files/Mountain_Forest.mov</i> from the start. Fade-out = 3 s.<br>
4. Overlay the image <i>media_files/kochi.jpg</i> on 1:50-1:55 of the main video. Fade-out = 2 s.<br>
5. Overlay the image <i>media_files/bridge.jpg</i> on 2:00-2:15 of the main video. Fade-out = 5 s.<br>
6. Overlay the image <i>media_files/cat.jpg</i> from 3:00-3:03 of the main video. No fade-out.<br>
7. Overlay the music <i>media_files/Mozart.wav</i> from 0:10-3:00 of the main video. Fade-in and fade-out = 5s. Music volume 3 times the original.<br>
8. Overlay the music <i>media_files/Rachmaninoff.wav</i> from 3:10-the end of the main video. Neither fade-in nor fade-out. Original volume.<br><br>

When we define the parameter list, we first group the entries by type and then put each group in order of appearance. As above, I put the videos first, then the images, and finally the music; within each group, the entries are already in order of appearance. The parameter list looks like this:<br><br>
<i>
end_time = get_video_length(main_video)<br>
<br>
para_list = <br>
[["media_files/Aumun_Background.mov","0:20","0:30","0:05",0],&nbsp;&nbsp;&nbsp;&nbsp;#0:20-0:30 of the main video, extract 0:05-0:15 of the overlay, no fade-out<br>
&nbsp;["boat_video.mov","3:30","3:35","0:00",0],&nbsp;&nbsp;&nbsp;&nbsp;#3:30-3:35 of the main video, extract 0:00-0:05 of the overlay, no fade-out<br>
&nbsp;["media_files/Mountain_Forest.mov","4:00","4:15","0:00",3],&nbsp;&nbsp;&nbsp;&nbsp;#4:00-4:15 of the main video, extract 0:00-0:15 of the overlay, fade-out = 3s<br>
&nbsp;["media_files/kochi.jpg","1:50","1:55","2"],&nbsp;&nbsp;&nbsp;&nbsp;#1:50-1:55 of the main video, fade-out = 2s<br>
&nbsp;["media_files/bridge.jpg","2:00","2:15","5"],&nbsp;&nbsp;&nbsp;&nbsp;#2:00-2:15 of the main video, fade-out = 5s<br>
&nbsp;["media_files/cat.jpg","3:00","3:03","0"],&nbsp;&nbsp;&nbsp;&nbsp;#3:00-3:03 of the main video, no fade-out<br>
&nbsp;["media_files/Mozart.wav","0:10","3:00",5,5,3],&nbsp;&nbsp;&nbsp;&nbsp;#0:10-3:00 of the main video, fade-in = fade-out = 5s, 3 times the audio volume<br>
&nbsp;["media_files/Rachmaninoff.wav","3:10",f"{end_time}",0,0,1]]&nbsp;&nbsp;&nbsp;&nbsp;#3:10 to the end of the main video, no fade-in or fade-out, original audio volume<br>
</i>
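The "group by type, then order by start time" rule can be expressed in a few lines of Python. This is a minimal sketch of my own. It assumes the category is decided by the file extension, and the extension sets are my guess, not taken from `overlay_video_img_music()`.

```python
from pathlib import Path

# Assumed extensions for each category (not taken from the tutorial's code)
VIDEO_EXT = {".mov", ".mp4", ".mkv"}
IMAGE_EXT = {".jpg", ".jpeg", ".png"}
AUDIO_EXT = {".wav", ".mp3", ".m4a"}


def to_seconds(ts) -> float:
    """Convert "m:ss", "h:mm:ss" or plain seconds to a float number of seconds."""
    seconds = 0.0
    for part in str(ts).split(":"):
        seconds = seconds * 60 + float(part)
    return seconds


def order_para_list(entries):
    """Hypothetical helper: videos first, then images, then music, each sorted by start time."""
    groups = {"video": [], "image": [], "music": []}
    for entry in entries:
        ext = Path(entry[0]).suffix.lower()
        if ext in VIDEO_EXT:
            groups["video"].append(entry)
        elif ext in IMAGE_EXT:
            groups["image"].append(entry)
        elif ext in AUDIO_EXT:
            groups["music"].append(entry)
        else:
            raise ValueError(f"unknown media type: {entry[0]}")
    ordered = []
    for key in ("video", "image", "music"):
        ordered += sorted(groups[key], key=lambda e: to_seconds(e[1]))
    return ordered


mixed = [["media_files/Mozart.wav", "0:10", "3:00", 5, 5, 3],
         ["media_files/bridge.jpg", "2:00", "2:15", "5"],
         ["boat_video.mov", "3:30", "3:35", "0:00", 0],
         ["media_files/kochi.jpg", "1:50", "1:55", "2"],
         ["media_files/Aumun_Background.mov", "0:20", "0:30", "0:05", 0]]
for entry in order_para_list(mixed):
    print(entry[0], entry[1])
```

Converting "m:ss" strings to seconds before sorting matters. Sorting the raw strings would put "10:00" before "2:00".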

Let's create the final product.<br>

In [None]:
main_video = "marines_5min_new.mov"
end_time = get_video_length(main_video)  # length of the main video, used as the end of the last music track
para_list = [
    # videos: [file, start, end, start time inside the overlay, fade-out (s)]
    ["media_files/Aumun_Background.mov", "0:20", "0:30", "0:05", 0],
    ["boat_video.mov", "3:30", "3:35", "0:00", 0],
    ["media_files/Mountain_Forest.mov", "4:00", "4:15", "0:00", 3],
    # images: [file, start, end, fade-out (s)]
    ["media_files/kochi.jpg", "1:50", "1:55", "2"],
    ["media_files/bridge.jpg", "2:00", "2:15", "5"],
    ["media_files/cat.jpg", "3:00", "3:03", "0"],
    # music: [file, start, end, fade-in (s), fade-out (s), volume factor]
    ["media_files/Mozart.wav", "0:10", "3:00", 5, 5, 3],
    ["media_files/Rachmaninoff.wav", "3:10", f"{end_time}", 0, 0, 1],
]
output_file = "marines_5min_all_overlay.mov"
overlay_video_img_music(main_video, para_list, output_file, crf=23, preset="fast")

Now run the following cell to see:
1) 0:10 - 3:00 music
2) 0:20 - 0:30 an overlay video with no fade-out
3) 1:50 - 1:55 an overlay image with fade-out = 2s
4) 2:00 - 2:15 an overlay image with fade-out = 5s
5) 3:00 - 3:03 an overlay image with no fade-out
6) 3:10 - the end: music
7) 3:30 - 3:35 an overlay video that zooms in on the boat
8) 4:00 - 4:15 an overlay video with fade-out = 3s

In [None]:
%%HTML
<iframe width="560" height="315" src="https://www.youtube.com/embed/GYVAkAYAVwo" frameborder="0" allowfullscreen></iframe>
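To see how an image overlay with a fade-out can be written directly in ffmpeg, here is a sketch for the kochi.jpg entry (1:50-1:55, fade-out 2s). It is one possible approach, not the code inside `overlay_video_img_music()`. The approach works in four steps:
- the image is looped for the length of its slot;
- it is faded out through its alpha channel;
- it is shifted to its start time with `setpts`;
- it is shown only inside its window via the `enable` option of `overlay`.

The helper name and the output file name are hypothetical.

```python
import shlex


def to_seconds(ts) -> float:
    """Convert "m:ss", "h:mm:ss" or plain seconds to a float number of seconds."""
    seconds = 0.0
    for part in str(ts).split(":"):
        seconds = seconds * 60 + float(part)
    return seconds


def image_overlay_cmd(main_video, image, start, end, fade_out, output):
    """Hypothetical helper: build an ffmpeg command that overlays one still image."""
    t0, t1 = to_seconds(start), to_seconds(end)
    duration = t1 - t0
    fade_out = float(fade_out)                 # the para_list stores image fade-outs as strings
    chain = ["format=rgba"]                    # need an alpha channel to fade to transparent
    if fade_out > 0:
        # fade time is relative to the image stream, so it starts (duration - fade_out) in
        chain.append(f"fade=t=out:st={duration - fade_out:g}:d={fade_out:g}:alpha=1")
    chain.append(f"setpts=PTS-STARTPTS+{t0:g}/TB")   # move the image to its start time
    graph = (f"[1:v]{','.join(chain)}[img];"
             f"[0:v][img]overlay=enable='between(t,{t0:g},{t1:g})'[v]")
    return ["ffmpeg", "-y", "-i", main_video,
            "-loop", "1", "-t", f"{duration:g}", "-i", image,   # loop the still image
            "-filter_complex", graph,
            "-map", "[v]", "-map", "0:a?", "-c:a", "copy",      # keep the original audio
            "-c:v", "libx264", "-crf", "23", "-preset", "fast", output]


cmd = image_overlay_cmd("marines_5min_new.mov", "media_files/kochi.jpg",
                        "1:50", "1:55", "2", "kochi_test.mov")
print(shlex.join(cmd))
```

The command is only printed here. To run it, you would pass `cmd` to `subprocess.run(cmd, check=True)`. The image is overlaid at its own size in the top-left corner, so this sketch assumes it has already been scaled to the video size.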

## Adding Mosaics
Adding mosaics with <i>ffmpeg</i> is, again, more complicated than with interface software, and you can only add a still mosaic (a box at a fixed position that does not follow moving objects).<br>
First, extract a screenshot of the frame where you want the mosaic, then draw a numbered grid on it to read off the coordinates. In this example, we add one mosaic to the top-left corner from 0:10-0:15 and another to the middle of the frame from 1:30-1:35.

In [None]:
#extract screenshots at 1:32 (saved as mosaic1.png) and at 0:10 (saved as mosaic2.png)
main_video = "marines_5min_all_overlay.mov"
time_and_name_list = [["1:32","mosaic1.png"],["0:10","mosaic2.png"]] #list of list
extract_frames(main_video, time_and_name_list, fast_seek=True)
input_file = "mosaic1.png"
output_file = "mosaic1_grid.png"
video_width,video_height = get_media_dimensions(main_video)
add_numbered_grid(input_file, output_file,video_width,video_height,interval=50,line_color="red",number_color="black")
input_file = "mosaic2.png"
output_file = "mosaic2_grid.png"
add_numbered_grid(input_file, output_file,video_width,video_height,interval=50,line_color="red",number_color="black")
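`extract_frames()` is not shown in this section. A plain ffmpeg command that grabs a single frame looks like the sketch below. My assumption is that `fast_seek=True` corresponds to placing `-ss` before `-i` (input seeking, which jumps close to the timestamp). Placing `-ss` after `-i` (output seeking) is slower, because ffmpeg decodes and discards every frame up to that point. The helper `frame_cmd` is hypothetical.

```python
import shlex


def frame_cmd(video, timestamp, out_png, fast_seek=True):
    """Hypothetical helper: build an ffmpeg command that saves one frame as a PNG."""
    seek = ["-ss", timestamp]                        # ffmpeg accepts "1:32" as MM:SS
    if fast_seek:
        cmd = ["ffmpeg", "-y", *seek, "-i", video]   # input seeking: fast
    else:
        cmd = ["ffmpeg", "-y", "-i", video, *seek]   # output seeking: decodes up to the timestamp
    return cmd + ["-frames:v", "1", out_png]         # write exactly one video frame


for ts, name in [["1:32", "mosaic1.png"], ["0:10", "mosaic2.png"]]:
    print(shlex.join(frame_cmd("marines_5min_all_overlay.mov", ts, name)))
```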

Now open <i>mosaic1_grid.png</i> and <i>mosaic2_grid.png</i> and read off the top-left and bottom-right corners of each area you want to cover. Remember that the top-left corner of the frame is x=0, y=0, and y increases downwards.
For the first mosaic (read from <i>mosaic2_grid.png</i>, the frame at 0:10), the top-left corner of the box is at x=25, y=25. I want a box 100 pixels wide and 50 pixels high, with a pixelation of 15 (heavily blurred), shown from 10s to 15s. The parameter list is then:<br><br>
<i>[25, 25, 100, 50, 15, 10, 15]</i> <br><br>
The parameters, in order, are:
- x position (25)
- y position (25)
- box width (100)
- box height (50)
- pixelation (15)
- start time of the mosaic (10s)
- end time of the mosaic (15s)

The times can be given in seconds or as "m:ss" strings, as in the second mosaic below.<br><br>
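A common way to build a still mosaic in ffmpeg has four steps:
- crop the box;
- shrink it;
- scale it back up with nearest-neighbour sampling so it turns blocky;
- overlay it at the same position during the chosen time window.

The sketch below turns one parameter list into such a filter graph. It is my assumption of how this could be done, not the code of `apply_mosaics()`. It also assumes the pixelation value means "block size in pixels".

```python
def to_seconds(ts) -> float:
    """Convert "m:ss", "h:mm:ss" or plain seconds to a float number of seconds."""
    seconds = 0.0
    for part in str(ts).split(":"):
        seconds = seconds * 60 + float(part)
    return seconds


def mosaic_filter(x, y, w, h, pixel, start, end):
    """Hypothetical helper: ffmpeg filter graph for one still mosaic box."""
    t0, t1 = to_seconds(start), to_seconds(end)
    small_w, small_h = max(1, w // pixel), max(1, h // pixel)   # number of blocks across / down
    return (
        "[0:v]split[base][tmp];"                                 # two copies of the video
        f"[tmp]crop={w}:{h}:{x}:{y},"                           # cut out the box
        f"scale={small_w}:{small_h},"                            # shrink: lose detail
        f"scale={w}:{h}:flags=neighbor[mosaic];"                 # enlarge: blocky pixels
        f"[base][mosaic]overlay={x}:{y}:enable='between(t,{t0:g},{t1:g})'[v]"
    )


print(mosaic_filter(25, 25, 100, 50, 15, 10, 15))
print(mosaic_filter(130, 175, 270, 120, 15, "1:30", "1:35"))
```

A larger pixelation value gives fewer, bigger blocks. Several mosaics can be chained by feeding the `[v]` output of one graph into the next.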
Now let's add two mosaics.<br>

In [None]:
main_video = "marines_5min_all_overlay.mov"
mosaic_list=[
[25, 25, 100, 50, 15, 10, 15],
[130, 175, 270, 120, 15, "1:30", "1:35"]]
output_video = "marines_5min_all_overlay_mosaic.mov"
apply_mosaics(main_video, output_video, mosaic_list)

Now run the following cell to see the mosaics at 0:10 and 1:30.

In [None]:
%%HTML
<iframe width="560" height="315" src="https://www.youtube.com/embed/kVxSkbWuxhU" frameborder="0" allowfullscreen></iframe>

## Burning in subtitle
Now the video itself is done, and we can get back to the subtitle we created at the beginning. If you want a soft subtitle to upload to YouTube, an srt file is fine. If you want more complex subtitles with styling, an ass file serves the purpose. Let's convert the subtitle from srt to ass format:

In [None]:
srt_file = "subtitle.srt" #input file
ass_file = "subtitle.ass" #output file
main_video = "marines_5min_all_overlay_mosaic.mov"
#the ass file has a resolution which is the same as the dimension of the main video
video_width, video_height = get_media_dimensions(main_video)
srt_to_ass(srt_file, ass_file, video_width, video_height, fontname="Arial", fontsize=24)

I usually open the video with <i>Elmedia</i> and drag in the ass subtitle to check whether anything needs changing. Changing the font and font size is easy (the <i>fontname</i> and <i>fontsize</i> arguments of <i>srt_to_ass()</i>). If you want to change the vertical position of the subtitle, you have to go to the code (<i>video_commands.py</i>), find the function <i>srt_to_ass()</i>, and look for the line:<br>
<i>event = f"Dialogue: 0,{start_ass},{end_ass},Default,,0,0,0,,{text}"</i><br><br>
<i>0,0,0</i> are the left, right and vertical margins, respectively (in ASS, a margin of 0 means the style's own margin is used). To move the subtitle up, increase the vertical margin (the third "0"). Leaving the first two at "0" keeps the subtitle horizontally centered.<br>
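Such a Dialogue line is easy to generate with any vertical margin. The sketch below uses two hypothetical helpers, not the code in <i>video_commands.py</i>. The first converts seconds to the ASS timestamp format `H:MM:SS.cc` (centiseconds). The second fills in the fields in the order the ASS format expects: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text.

```python
def ass_time(seconds: float) -> str:
    """Format seconds as an ASS timestamp H:MM:SS.cc (centiseconds)."""
    cs = int(round(seconds * 100))
    h, cs = divmod(cs, 360000)     # 100 * 60 * 60 centiseconds per hour
    m, cs = divmod(cs, 6000)       # 100 * 60 centiseconds per minute
    s, cs = divmod(cs, 100)
    return f"{h}:{m:02d}:{s:02d}.{cs:02d}"


def dialogue(start_s, end_s, text, margin_v=0, style="Default"):
    """Hypothetical helper: one ASS Dialogue event with a custom vertical margin."""
    return (f"Dialogue: 0,{ass_time(start_s)},{ass_time(end_s)},{style},,"
            f"0,0,{margin_v},,{text}")


print(dialogue(1.5, 4.25, "Hello", margin_v=40))
```

With a bottom-aligned style, `margin_v=40` places the line 40 units above the bottom edge. Units are in the script's resolution, which here matches the video.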


Once the ass file looks right, we can burn it into the main video. The function <i>burn_subtitles()</i> works for both <i>srt</i> and <i>ass</i> files. <b>If some subtitle text (especially numbers) looks strange in the video, you may need to install additional fonts. Ask ChatGPT for help.</b>

In [None]:
subtitle_file = "subtitle.ass"
main_video = "marines_5min_all_overlay_mosaic.mov"
output_video = "marines_subtitle.mov"
#the font here is only for srt files
burn_subtitles(main_video, subtitle_file, output_video, font="Arial", crf=23, preset="fast")
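For reference, plain ffmpeg burns in ass files with the `ass` filter and srt files with the `subtitles` filter. For srt files, the `subtitles` filter's `force_style` option can set the font. This is a sketch of what a call like `burn_subtitles()` might run. It is my assumption, and the actual function may differ.

```python
import shlex
from pathlib import Path


def burn_cmd(video, subtitle, output, font="Arial", crf=23, preset="fast"):
    """Hypothetical helper: ffmpeg command that hard-codes a subtitle file into a video."""
    if Path(subtitle).suffix.lower() == ".ass":
        vf = f"ass={subtitle}"                                    # styling comes from the ass file
    else:
        vf = f"subtitles={subtitle}:force_style='FontName={font}'"  # srt: font is set here
    return ["ffmpeg", "-y", "-i", video, "-vf", vf,
            "-c:v", "libx264", "-crf", str(crf), "-preset", preset,
            "-c:a", "copy", output]                                # audio is copied unchanged


print(shlex.join(burn_cmd("marines_5min_all_overlay_mosaic.mov",
                          "subtitle.ass", "marines_subtitle.mov")))
```

This matches the comment in the cell above: the font argument only matters for srt files. File paths containing `:` or `'` need extra escaping inside `-vf`.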

Now run the following cell to see the video with the subtitle burned in (hard subtitle).

In [None]:
%%HTML
<iframe width="560" height="315" src="https://www.youtube.com/embed/e9DdwhEgCkc" frameborder="0" allowfullscreen></iframe>

## Creating End Credits
For my own video, there is one last part: the credits at the end. I first produce a template called <i>end_credits_template.ass</i>, containing some fixed lines that appear at the end. Then I keep a list of the people who contributed to the video in <i>contributors.txt</i>. The file looks like this:<br><br>
<i>
Director: XXX<br>
Production Crew: AAA<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;ABC<br>
Music: XYZ</i><br><br>

I set up the above as rolling subtitles. If you want something else, just modify the code (look for <i>end_credits_ass()</i>). The following cell first creates an ass file from <i>end_credits_template.ass</i> and <i>contributors.txt</i>. It then burns the ass file into a video with a black background. This end-credits video also has background music. <br><br>
Here are the parameters:<br><br>
Width and height are taken from the main video, so the ass file's resolution matches the size of the main video. The first line of <i>contributors.txt</i> appears at 1.4s (<i>first_start</i>). Each line is shown for 6s (<i>each_duration_s</i>) while it rolls up. As for <i>line_offset_s</i> and <i>fallback_duration_s</i>, I don't really remember what they do...<br><br>
<i>end_credits_ass(template_file, txt_file, output_file, width, height,
                   first_start="0:00:01.40", each_duration_s=6.0,
                   line_offset_s=0.60,
                   fallback_duration_s=4.0)</i>
<br><br>
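If you would rather write rolling credits by hand than modify `end_credits_ass()`, one standard ASS technique is the `\move(x1,y1,x2,y2)` override tag. It slides a line between two positions over its display time.

The sketch below is a hypothetical generator, not the tutorial's code. It scrolls each line from the bottom of the frame to just above the top in `each_duration_s` seconds, and each line starts `line_gap_s` seconds after the previous one. `line_gap_s` is my own parameter, and I am not claiming it is what `line_offset_s` means.

```python
def ass_time(seconds: float) -> str:
    """Format seconds as an ASS timestamp H:MM:SS.cc (centiseconds)."""
    cs = int(round(seconds * 100))
    h, cs = divmod(cs, 360000)
    m, cs = divmod(cs, 6000)
    s, cs = divmod(cs, 100)
    return f"{h}:{m:02d}:{s:02d}.{cs:02d}"


def rolling_credits(lines, width, height, first_start=1.4,
                    each_duration_s=6.0, line_gap_s=0.6):
    """Hypothetical generator: ASS Dialogue events that scroll each line upwards."""
    x = width // 2                        # horizontal centre of the frame
    events = []
    for i, text in enumerate(lines):
        start = first_start + i * line_gap_s
        end = start + each_duration_s
        # \an8 anchors the text at its top centre; \move slides it from the bottom
        # edge (y=height) to y=-60 (above the top for lines under 60 px tall)
        tag = f"{{\\an8\\move({x},{height},{x},-60)}}"
        events.append(f"Dialogue: 0,{ass_time(start)},{ass_time(end)},"
                      f"Default,,0,0,0,,{tag}{text}")
    return events


for event in rolling_credits(["Director: XXX", "Production Crew: AAA", "Music: XYZ"], 1280, 720):
    print(event)
```

Because every line moves at the same speed and starts a fixed time after the previous one, the lines stay evenly spaced as they roll.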
Next, create the video by burning in the above ass subtitle. Pass the background music as <i>song_file</i>; its <i>volume</i> defaults to 1.0. <i>fadein</i> and <i>fadeout</i> apply to the background music. <i>end_second</i> is the time between the end of the last credit line and the end of the video.<br><br>
<i>
create_ending_film(ass_file, song_file, output_file, width, height,
                       volume=1.0, end_second=5, fadein=5, fadeout=10,
                       bg_video=None,video_fadeout=4)

</i><br>
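A rough plain-ffmpeg equivalent of a black-background credits clip works like this:
- the `color` source from lavfi generates the black frames;
- the `ass` filter draws the credits;
- `volume` and `afade` shape the music.

This sketch assumes the total length (`total_s`) is already known. How `create_ending_film()` computes it from `end_second` is not shown here, so treat the helper name and the numbers as illustrative.

```python
import shlex


def ending_film_cmd(ass_file, song_file, output, width, height, total_s,
                    volume=1.0, fadein=5, fadeout=10):
    """Hypothetical helper: ffmpeg command for a black credits clip with music."""
    audio = (f"volume={volume:g},"
             f"afade=t=in:st=0:d={fadein:g},"
             f"afade=t=out:st={total_s - fadeout:g}:d={fadeout:g}")  # fade ends with the clip
    return ["ffmpeg", "-y",
            "-f", "lavfi", "-i", f"color=c=black:s={width}x{height}:d={total_s:g}",
            "-i", song_file,
            "-vf", f"ass={ass_file}",            # draw the credits on the black frames
            "-af", audio,
            "-map", "0:v", "-map", "1:a",
            "-t", f"{total_s:g}",                # cut the music at the end of the clip
            "-c:v", "libx264", "-crf", "23", "-preset", "fast", output]


print(shlex.join(ending_film_cmd("end_credits.ass", "media_files/Mozart.wav",
                                 "end_credits_video.mov", 1280, 720, total_s=40)))
```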
Now let's run the code.<br><br>

In [None]:
template_file = "end_credits_template.ass" # template reused every time; the contents of txt_file below are inserted into it
txt_file = "contributors.txt" # contents inserted into the template above
output_file = "end_credits.ass" # combines the two files above
width, height = get_media_dimensions(main_video)
#create an ass file
end_credits_ass(template_file, txt_file, output_file, width, height,
                   first_start="0:00:01.40", each_duration_s=6.0,
                   line_offset_s=0.60,
                   fallback_duration_s=4.0)
#embed the ass file to the video
ass_file = output_file
song_file = "media_files/Mozart.wav"
output_file = "end_credits_video.mov"
create_ending_film(ass_file, song_file, output_file, width, height,
                       volume=1.0, end_second=5, fadein=5, fadeout=10,
                       bg_video=None,video_fadeout=4)

Now run the following cell to see <i>end_credits_video.mov</i> (the end-credits video with the ass file burned in).

In [None]:
%%HTML
<iframe width="560" height="315" src="https://www.youtube.com/embed/7EEhY8OQPOw" frameborder="0" allowfullscreen></iframe>

Now for the real final step: combining the main video with the end-credits video. If you have an opening video, you can add it here too. We call the function <i>combine_video()</i> again.


In [None]:
main_video = "marines_subtitle.mov"
video_list = [main_video,"end_credits_video.mov"]
output_file = "marines_5min_final.mov"
combine_video(video_list, primary_index=1, output_file=output_file, crf=23, preset="fast")
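The internals of `combine_video()` are not covered in this section. One plain-ffmpeg way to join clips is the `concat` filter, which joins the video and audio streams of n inputs in order and re-encodes the result. The sketch below is my assumption of an equivalent, not the function's code. It requires every input to have both a video and an audio stream at the same resolution.

```python
import shlex


def concat_cmd(videos, output, crf=23, preset="fast"):
    """Hypothetical helper: join clips in order with ffmpeg's concat filter."""
    cmd = ["ffmpeg", "-y"]
    for v in videos:
        cmd += ["-i", v]
    # one [video][audio] pair per input, in playback order
    pads = "".join(f"[{i}:v][{i}:a]" for i in range(len(videos)))
    graph = f"{pads}concat=n={len(videos)}:v=1:a=1[v][a]"
    return cmd + ["-filter_complex", graph, "-map", "[v]", "-map", "[a]",
                  "-c:v", "libx264", "-crf", str(crf), "-preset", preset, output]


print(shlex.join(concat_cmd(["marines_subtitle.mov", "end_credits_video.mov"],
                            "marines_5min_final.mov")))
```

Because the end-credits clip was created with the main video's width and height, the resolution requirement is already met here.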

Finally, we are done! Now you can make your own video.

In [None]:
%%HTML
<iframe width="560" height="315" src="https://www.youtube.com/embed/_lC6NdYWs2g" frameborder="0" allowfullscreen></iframe>