# Dazbo's YouTube and Video Demos

## Overview

This notebook forms the first part of a walkthrough series.

The overall series covers:

1. Starting with an idea. Here, the goal is to work with videos, which could be on YouTube. We want to be able to download videos, extract audio, transcribe, translate, and potentially summarise the content.
1. Experimenting on this idea, using a Jupyter notebook, with Python.
1. Trying a few libraries and a couple of classical AI models. 
1. Building a solution that makes use of Google Gemini multimodal GenAI.
1. Turning the notebook into a web application, using Streamlit.
1. Packaging the application as a container.
1. Finally, hosting the application on Google Cloud's serverless Cloud Run service.

The code and notebooks are intended to be supplemented by these walkthroughs:

1. [Downloading YouTube Videos, Extracting Audio, and Generating Transcripts with Python and Jupyter Notebooks](https://medium.com/python-in-plain-english/downloading-youtube-videos-extracting-audio-and-generating-transcripts-with-python-and-jupyter-c3068f82bbe0)
1. [YouTube Video Downloader with Generative AI: Run Anywhere, Transcribe and Translate](https://python.plainenglish.io/youtube-video-downloader-with-generative-ai-and-python-run-anywhere-transcribe-and-translate-dec2e593dd58)
1. [Building and Running an AI YouTube and Video Processing as a Python Streamlit Web Application, on Serverless Google Cloud Run](https://medium.com/google-cloud/running-ai-youtube-and-video-processing-as-a-python-streamlit-web-application-and-hosting-on-748aae8e54b4)

Additionally, you will find supporting READMEs and scripts in my [GitHub repo](https://github.com/derailed-dash/youtube-and-video).

## This Notebook

Examples of how to work with YouTube videos using Python. Here I'll demonstrate:

- How to [download videos and extract audio](#downloading-videos-and-extracting-audio)
- How to [transcribe audio to text using a speech-to-text API](#extracting-audio-using-python-speech-recognition)
- How to [extract existing transcripts and translate](#extract-existing-transcripts-from-videos)

**To run this notebook, first execute the cells in the [Setup](#Setup) section, as described below.** Then you can experiment with any of the subsequent cells.

A few useful notes:

- The source for this notebook lives in my GitHub repo, <a href="https://github.com/derailed-dash/youtube-and-video" target="_blank">Youtube-and-Video</a>.
- Check out further guidance, including tips on how to run the notebook, in the project's `README.md`.
- For example, you could...
  - Run the notebook locally, in your own Jupyter environment.
  - Run the notebook in a cloud-based Jupyter environment, with no setup required on your part! For example, with **Google Colab**: <br><br><a href="https://colab.research.google.com/github/derailed-dash/youtube-and-video/blob/main/src/notebooks/youtube-demos.ipynb" target="_blank"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Google Colab"/></a><br><br>It looks like this:<br><br><img src="static/images/collab-view.png" width="640px"></img>
- For more ways to run Jupyter Notebooks, check out [my guide](https://medium.com/python-in-plain-english/five-ways-to-run-jupyter-labs-and-notebooks-23209f71e5c0).


## Setup

### Packages

First, let's install any dependent packages:

In [3]:
%pip install --upgrade --no-cache-dir python-dotenv dazbo-commons pytubefix moviepy yt_dlp

Note: you may need to restart the kernel to use updated packages.


In [4]:
import IPython
from IPython.display import display
from IPython.core.display import Markdown

import logging
import re
import io
import sys
from pathlib import Path
from dataclasses import dataclass
import dazbo_commons as dc
from dotenv import load_dotenv

In [5]:
# Colab requires an older version of ipykernel, so only upgrade it when not running in Colab
if "google.colab" not in sys.modules:
    %pip install --upgrade --no-cache-dir ipykernel

Note: you may need to restart the kernel to use updated packages.


### Logging

Now we'll set up logging. Here I'm using coloured logging from my [dazbo-commons](https://pypi.org/project/dazbo-commons/) package. Feel free to change the logging level.

In [6]:
# Setup logging
APP_NAME="dazbo-yt-demos"
logger = dc.retrieve_console_logger(APP_NAME)
logger.setLevel(logging.DEBUG)
logger.info("Logger initialised.")
logger.debug("DEBUG level logging enabled.")

19:22:03.003:dazbo-yt-demos - INF: Logger initialised.
19:22:03.004:dazbo-yt-demos - DBG: DEBUG level logging enabled.


### File Locations

Here we initialise some file path locations, e.g. an output folder.

In [7]:
locations = dc.get_locations(APP_NAME)
for attribute, value in vars(locations).items():
    logger.debug(f"{attribute}: {value}")

19:22:03.013:dazbo-yt-demos - DBG: script_name: dazbo-yt-demos
19:22:03.015:dazbo-yt-demos - DBG: script_dir: /Users/yingy/AI projects/youtube transcribe summary/youtube-and-video/src/notebooks/dazbo-yt-demos
19:22:03.016:dazbo-yt-demos - DBG: input_dir: /Users/yingy/AI projects/youtube transcribe summary/youtube-and-video/src/notebooks/dazbo-yt-demos/input
19:22:03.016:dazbo-yt-demos - DBG: output_dir: /Users/yingy/AI projects/youtube transcribe summary/youtube-and-video/src/notebooks/dazbo-yt-demos/output
19:22:03.017:dazbo-yt-demos - DBG: input_file: /Users/yingy/AI projects/youtube transcribe summary/youtube-and-video/src/notebooks/dazbo-yt-demos/input/input.txt


### Utility Functions

In [8]:
def clean_filename(filename):
    """ Create a clean filename by replacing disallowed characters with underscores. """
    pattern = r'[^a-zA-Z0-9._\s-]'
    return re.sub(pattern, '_', filename)
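To see what the cleaning does, here is a small self-contained sketch that applies the same pattern to two titles. The first title is the one that appears in this notebook's download logs; the second is a made-up title, included only to show a few more replaced characters.

```python
import re

def clean_filename(filename: str) -> str:
    """Replace anything other than letters, digits, '.', '_', '-' and whitespace with '_'."""
    pattern = r'[^a-zA-Z0-9._\s-]'
    return re.sub(pattern, '_', filename)

titles = [
    "I Tested The Top 35 Asian Sunscreens... Asian SPF Showdown!",
    "AC/DC: Live?",  # hypothetical title, for illustration only
]

for title in titles:
    print(f"{title!r} -> {clean_filename(title)!r}")
```

Note that dots and spaces are kept, so only the trailing `!` changes in the first title.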

### Install Additional Packages You May Need

**Bear in mind that `nodejs` is required by the `pytubefix` library, to prevent this application being detected as a bot.**

You can run the cell below, but it may not work in your environment. So you might need to install packages manually, e.g.

<table>
  <col style="width:10%">
  <col style="width:45%">
  <col style="width:45%"> <!-- Adjust as needed or remove for auto-sizing -->
  <tr>
    <th>Package</th>
    <th>Purpose</th>
    <th>Install Command</th>
  </tr>
  <tr>
    <td><a href="https://ffmpeg.org/">ffmpeg</a></td>
    <td>A useful utility for video and audio format conversion. Many Python libraries use it. It will not generally be used by this notebook, but if you run into errors requiring ffmpeg, you will want to run this section.</td>
    <td>Linux: <code>sudo apt install ffmpeg</code><br>Windows: <code>winget install ffmpeg</code></td>
  </tr>
  <tr>
    <td><a href="https://xiph.org/flac/download.html">FLAC</a></td>
    <td>The Python <code>speech_recognition</code> library uses the FLAC utility to convert audio files into a format that can be processed for speech recognition.</td>
    <td>Linux: <code>sudo apt install flac</code><br>Windows: Download the latest</td>
  </tr>
    <tr>
    <td>nodejs</td>
    <td>The pytubefix library can automatically create YouTube PO tokens, but this relies on nodejs being installed.</td>
    <td>Linux: <code>sudo apt install nodejs</code><br>Windows: <code>winget install node.js</code></td>
  </tr>
</table>


In [9]:
import os
import platform
import subprocess

def run_command(command):
    """Run a shell command and print its output in real-time."""
    process = subprocess.Popen(
        command, 
        shell=True, 
        stdout=subprocess.PIPE, 
        stderr=subprocess.STDOUT  # Merge stderr into stdout, so an unread stderr pipe can't block the process
    )
    
    # Read and print the output line by line
    if process.stdout is not None:
        for line in iter(process.stdout.readline, b''):
            logger.info(line.decode().strip())
        process.stdout.close()
        
    process.wait()
    
def install_software(appname: str):
    os_name = platform.system()
    logger.info(f"Installing {appname} on {os_name}...")
    
    # Mapping operating systems to their respective installation commands
    command_map = {
        "Windows": f"winget install {appname} --silent --no-upgrade",
        "Linux": f"apt -qq -y install {appname}",
        "Darwin": f"brew install {appname}"
    }
    command = command_map.get(os_name)
    if command:
        run_command(command)
        logger.info("Done.")
    else:
        logger.error(f"Unsupported operating system: {os_name}")

def check_installed(app_exec: str) -> bool:
    """ Run a version command, e.g. "ffmpeg -version", to check whether the app is on the path. """
    appname, *args = app_exec.split()
    logger.debug(f"Checking if {appname} is installed")
    
    try:
        # Pass each argument separately; joining them into one string breaks multi-argument commands
        output = subprocess.check_output([appname, *args], stderr=subprocess.STDOUT)
        logger.debug(f"{appname} version: {output.decode().strip()}")
        logger.debug(f"{appname} is already installed.")
        return True
    except (subprocess.CalledProcessError, FileNotFoundError):
        logger.debug(f"{appname} is not installed or absent from path.")
        
    return False

apps = [ ("ffmpeg", "ffmpeg -version"),
         ("flac", "flac --version"),
         ("nodejs" , "node --version"),]
          
for app_install, app_exec in apps:
    if not check_installed(app_exec):
        install_software(app_install)


19:22:03.038:dazbo-yt-demos - DBG: Checking if ffmpeg is installed
19:22:03.285:dazbo-yt-demos - DBG: ffmpeg version: ffmpeg version 7.1.1 Copyright (c) 2000-2025 the FFmpeg developers
built with Apple clang version 16.0.0 (clang-1600.0.26.6)
configuration: --prefix=/usr/local/Cellar/ffmpeg/7.1.1_3 --enable-shared --enable-pthreads --enable-version3 --cc=clang --host-cflags= --host-ldflags='-Wl,-ld_classic' --enable-ffplay --enable-gnutls --enable-gpl --enable-libaom --enable-libaribb24 --enable-libbluray --enable-libdav1d --enable-libharfbuzz --enable-libjxl --enable-libmp3lame --enable-libopus --enable-librav1e --enable-librist --enable-librubberband --enable-libsnappy --enable-libsrt --enable-libssh --enable-libsvtav1 --enable-libtesseract --enable-libtheora --enable-libvidstab --enable-libvmaf --enable-libvorbis --enable-libvpx --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxml2 --enable-libxvid --enable-lzma --enable-libfontconfig --enable-libfreetyp
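If you only want to know whether a tool is on the path, without running it, the standard library's `shutil.which` offers a lighter alternative to the version-command check above. This is a minimal sketch; the helper name `is_on_path` is my own and not part of the notebook.

```python
import shutil

def is_on_path(executable: str) -> bool:
    """Return True if `executable` resolves to a program on the current PATH."""
    return shutil.which(executable) is not None

# The executables this notebook cares about
for exe in ("ffmpeg", "flac", "node"):
    status = "found" if is_on_path(exe) else "not found"
    print(f"{exe}: {status}")
```

Unlike the version-command approach, this does not confirm that the program actually runs, only that it can be located.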

Now we'll check `ffmpeg` has been installed.

On Windows, this may not have been added to your path. If so, you can check your default install location using `winget --info`, and then add it to your path.

In [10]:
logger.info("Note that installed applications may not be immediately available after first installing.\n" \
            "It may be necessary to relaunch the notebook environment.")

!ffmpeg -version

19:22:03.455:dazbo-yt-demos - INF: Note that installed applications may not be immediately available after first installing.
It may be necessary to relaunch the notebook environment.


ffmpeg version 7.1.1 Copyright (c) 2000-2025 the FFmpeg developers
built with Apple clang version 16.0.0 (clang-1600.0.26.6)
configuration: --prefix=/usr/local/Cellar/ffmpeg/7.1.1_3 --enable-shared --enable-pthreads --enable-version3 --cc=clang --host-cflags= --host-ldflags='-Wl,-ld_classic' --enable-ffplay --enable-gnutls --enable-gpl --enable-libaom --enable-libaribb24 --enable-libbluray --enable-libdav1d --enable-libharfbuzz --enable-libjxl --enable-libmp3lame --enable-libopus --enable-librav1e --enable-librist --enable-librubberband --enable-libsnappy --enable-libsrt --enable-libssh --enable-libsvtav1 --enable-libtesseract --enable-libtheora --enable-libvidstab --enable-libvmaf --enable-libvorbis --enable-libvpx --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxml2 --enable-libxvid --enable-lzma --enable-libfontconfig --enable-libfreetype --enable-frei0r --enable-libass --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenjpeg --enable-libspeex --enab

### Videos to Work With

We start by defining a list of videos to test our application with, along with a function that takes a full YouTube URL and returns just the id portion.

I’ve used these videos because…

- The first is the fantastic [Burning Bridges](https://www.youtube.com/watch?v=udRAIF6MOm8) by Sigrid. The video has no embedded transcript.
- The second is the beautiful song [I Believe](https://www.youtube.com/watch?v=CiTn4j7gVvY) by Melissa Hollick. It’s one of my favourite songs of all time. When I get a migraine, I turn off the lights, and listen to this to feel better! And for those who enjoy gaming, this song plays over the end titles of the amazing Wolfenstein: The New Order game. This video has an embedded transcript.
- Then we have a short [Jim Carrey speech](https://www.youtube.com/watch?v=nLgHNu2N3JU), which gives us dialog without music or other ambient noise. It has an embedded transcript.
- And finally, a [Ukrainian song](https://www.youtube.com/watch?v=d4N82wPpdg8) from Eurovision 2024, by Jerry Heil and Alyona Alyona. This gives us an opportunity to test translation. It also has an embedded transcript.

In [11]:
# Videos to download
urls = [
    "https://www.youtube.com/watch?v=ZbXErZeW-Zo&t=1148s",  # sunscreen review
#     "https://www.youtube.com/watch?v=CiTn4j7gVvY",  # Melissa Hollick - I Believe (English)
#     "https://www.youtube.com/watch?v=nLgHNu2N3JU",  # Jim Carrey - Motivational speech (English)
#     "https://www.youtube.com/watch?v=d4N82wPpdg8",  # Jerry Heil & Alyona Alyona - Teresa & Maria (Ukrainian)
#     "https://www.youtube.com/shorts/41iWg91yFv0",   # Rick Astley short
 ]

from typing import Optional

def get_video_id(url: str) -> Optional[str]:
    """ Return the 11-character video ID that follows 'v=' or a '/', or None if no ID is found. """
    pattern = r'(?:v=|\/)([0-9A-Za-z_-]{11}).*'
    match = re.search(pattern, url)
    if match:
        return match.group(1)
    return None
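As a quick check of the ID extraction, this self-contained sketch runs the same regular expression against the URL styles used in this notebook: a watch URL with an extra query parameter, and a shorts URL. The final input is a deliberately invalid string.

```python
import re
from typing import Optional

def get_video_id(url: str) -> Optional[str]:
    """Return the 11-character ID that follows 'v=' or a '/', or None if there is none."""
    match = re.search(r'(?:v=|\/)([0-9A-Za-z_-]{11}).*', url)
    return match.group(1) if match else None

samples = [
    "https://www.youtube.com/watch?v=ZbXErZeW-Zo&t=1148s",
    "https://www.youtube.com/shorts/41iWg91yFv0",
    "not a url",
]

for sample in samples:
    print(f"{sample} -> {get_video_id(sample)}")
```

The pattern is permissive: any URL with an 11-character path segment would also match, so it is best used on URLs already known to point at YouTube.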

## Downloading Videos and Extracting Audio

Here I'll demonstrate a few different Python libraries for working with YouTube videos.

### Option 1 - With PyTubeFix

Here I'll use the [pytubefix](https://github.com/JuanBindez/pytubefix) library to download YouTube videos, and then to download mp3 audio-only streams as files.

This library is a community-maintained fork of `pytube`. It was created to provide quick fixes for issues that the official pytube library faced, particularly when YouTube's updates break `pytube`.

Pros:

- The library is very easy to use.
- We can work with video, audio, channels, playlists, and even search and filter.
- It is [well documented](https://pytubefix.readthedocs.io/en/latest/).
- It can be used from the command line, with its simple CLI.
- It is VERY FAST!

Cons:

- Does not offer some of the more sophisticated capabilities that are offered by `yt_dlp`.
- It does not appear to set mp3 headers correctly. The mp3s are actually encoded as mp4a. I don't think this is a problem, but it's worth bearing in mind!
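If you want to check for yourself what a downloaded `.mp3` file really contains, one rough approach is to look at its first few bytes. The sketch below is a heuristic of my own, not something `pytubefix` provides: MP4-family files carry an `ftyp` marker at byte offset 4, while MP3 files usually begin with an `ID3` tag or an MPEG frame sync. The demo uses fabricated headers rather than real downloads.

```python
import tempfile
from pathlib import Path

def sniff_audio_container(path) -> str:
    """Guess 'mp4', 'mp3' or 'unknown' from the first bytes of a file."""
    with open(path, "rb") as f:
        head = f.read(12)
    # MP4-family files (including .m4a) have an 'ftyp' box marker at offset 4
    if head[4:8] == b"ftyp":
        return "mp4"
    # MP3 files usually start with an ID3v2 tag...
    if head.startswith(b"ID3"):
        return "mp3"
    # ...or directly with a frame sync: 0xFF, then a byte with its top three bits set
    if len(head) >= 2 and head[0] == 0xFF and (head[1] & 0xE0) == 0xE0:
        return "mp3"
    return "unknown"

# Demo with fabricated headers (hypothetical files, for illustration only)
with tempfile.TemporaryDirectory() as tmp:
    mislabelled = Path(tmp) / "song.mp3"      # mp3 extension, MP4 content
    mislabelled.write_bytes(b"\x00\x00\x00\x20ftypM4A \x00\x00\x00\x00")
    genuine = Path(tmp) / "speech.mp3"
    genuine.write_bytes(b"ID3\x04\x00\x00\x00\x00\x00\x00")
    print(sniff_audio_container(mislabelled))
    print(sniff_audio_container(genuine))
```

This is only a sniff test. Some other formats, such as raw AAC streams, also begin with a sync pattern, so treat the result as a hint rather than proof.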

In [18]:

from pytubefix import YouTube
from pytubefix.cli import on_progress

output_locn = f"{locations.output_dir}/pytubefix"

def process_yt_videos():
    for i, url in enumerate(urls):
        logger.info(f"Downloads progress: {i+1}/{len(urls)}")

        try:
            # YouTube now requires a PO token to be passed in the request.
            # The library will automatically generate a PO token, 
            # but nodejs must be installed to do so.
            yt = YouTube(url, on_progress_callback=on_progress, client="WEB")
            logger.info(f"Getting: {yt.title}")
            video_stream = yt.streams.get_highest_resolution()
            if not video_stream:
                raise Exception("Stream not available.")
            
            # YouTube resource titles may contain special characters which 
            # can't be used when saving the file. So we need to clean the filename.
            cleaned = clean_filename(yt.title)
            
            video_output = f"{output_locn}/{cleaned}.mp4"
            logger.info(f"Downloading video {cleaned}.mp4 ...")
            video_stream.download(output_path=output_locn, filename=f"{cleaned}.mp4")
        
            logger.info(f"Creating audio...")
            audio_stream = yt.streams.get_audio_only()
            audio_stream.download(output_path=output_locn, filename=f"{cleaned}.mp3")
            
            logger.info("Done")
            
        except Exception as e:        
            logger.error(f"Error processing URL '{url}'.")
            logger.error(f"The cause was: {e}") 
            
    logger.info(f"Downloads finished. See files in {output_locn}.")
    
process_yt_videos()


08:25:24.724:dazbo-yt-demos - INF: Downloads progress: 1/1
08:25:32.564:dazbo-yt-demos - INF: Getting: I Tested The Top 35 Asian Sunscreens... Asian SPF Showdown!
08:25:36.108:dazbo-yt-demos - INF: Downloading video I Tested The Top 35 Asian Sunscreens... Asian SPF Showdown_.mp4 ...
08:25:36.177:dazbo-yt-demos - ERR: Error processing URL 'https://www.youtube.com/watch?v=ZbXErZeW-Zo&t=1148s'.
08:25:36.178:dazbo-yt-demos - ERR: The cause was: HTTP Error 403: Forbidden
08:25:36.179:dazbo-yt-demos - INF: Downloads finished. See files in /Users/yingy/AI projects/youtube transcribe summary/youtube-and-video/src/notebooks/dazbo-yt-demos/output/pytubefix.


### Option 2 - PyTubeFix and MoviePy

Here I'm doing the same as before, but I'm extracting the audio using the Python [MoviePy](https://github.com/Zulko/moviepy) library. This is a powerful video and audio editing library. 

Pros:

- We can extract audio as mp3 with correct headers.
- It is [well documented](https://zulko.github.io/moviepy/).
- It is powerful.

Cons:

- It is slower to extract the audio than using `pytubefix` alone.

In [13]:

from pytubefix import YouTube
from pytubefix.cli import on_progress
from moviepy import VideoFileClip

output_locn = f"{locations.output_dir}/pytubefix_with_moviepy"

def process_yt_videos():
    for i, url in enumerate(urls):
        logger.info(f"Downloads progress: {i+1}/{len(urls)}")

        try:
            yt = YouTube(url, on_progress_callback=on_progress, client="WEB")
            logger.info(f"Getting: {yt.title}")
            video_stream = yt.streams.get_highest_resolution()
            if not video_stream:
                raise Exception("Stream not available.")
            
            # YouTube resource titles may contain special characters which 
            # can't be used when saving the file. So we need to clean the filename.
            cleaned = clean_filename(yt.title)

            video_output = f"{output_locn}/{cleaned}.mp4"
            logger.info(f"Downloading video {cleaned}.mp4 ...")
            video_stream.download(output_path=output_locn, filename=f"{cleaned}.mp4")
        
            logger.info(f"Creating audio...")
            video_clip = VideoFileClip(video_output) # purely to give us access to methods
            assert video_clip.audio is not None
            video_clip.audio.write_audiofile(f"{output_locn}/{cleaned}.mp3")
            video_clip.close()
            
            logger.info("Done")
            
        except Exception as e:        
            logger.error(f"Error processing URL '{url}'.")
            logger.debug(f"The cause was: {e}") 
            
    logger.info(f"Downloads finished. See files in {output_locn}.")
    
process_yt_videos()

19:22:29.175:dazbo-yt-demos - INF: Downloads progress: 1/1
19:22:31.719:dazbo-yt-demos - INF: Getting: I Tested The Top 35 Asian Sunscreens... Asian SPF Showdown!
19:22:33.705:dazbo-yt-demos - INF: Downloading video I Tested The Top 35 Asian Sunscreens... Asian SPF Showdown_.mp4 ...


 ↳ |████████████████████████████████████████████| 100.0%

19:22:45.021:dazbo-yt-demos - INF: Creating audio...


MoviePy - Writing audio in /Users/yingy/AI projects/youtube transcribe summary/youtube-and-video/src/notebooks/dazbo-yt-demos/output/pytubefix_with_moviepy/I Tested The Top 35 Asian Sunscreens... Asian SPF Showdown_.mp3


19:23:19.642:dazbo-yt-demos - INF: Done
19:23:19.643:dazbo-yt-demos - INF: Downloads finished. See files in /Users/yingy/AI projects/youtube transcribe summary/youtube-and-video/src/notebooks/dazbo-yt-demos/output/pytubefix_with_moviepy.


MoviePy - Done.


### Option 3 - With YT_DLP

I wanted to try the other popular YouTube package: [yt-dlp](https://pypi.org/project/yt-dlp/). The [repo](https://github.com/yt-dlp/yt-dlp) is a fork of the now unmaintained `youtube-dl`. 

Pros:

- It is very powerful, with far more options and features than `pytubefix`.
- It can be installed as a standalone command-line executable, or as a pip-installable Python package.
- Sets mp3 headers properly!
- It has some powerful network and proxy settings. This can be useful if, for example, you are trying to download videos that are geo-restricted.

Cons:

- It is more complicated to use.
- The documentation is complex and somewhat hard to understand. And there's no real Python-specific documentation.
- It depends on having ffmpeg installed for some use cases.
- It is significantly slower than `pytubefix` for performing video download and audio extraction.


In [14]:
import yt_dlp

output_locn = f"{locations.output_dir}/yt_dlp"

def process_yt_videos():
    for i, url in enumerate(urls):
        logger.info(f"Downloads progress: {i+1}/{len(urls)}")

        try:
            # Options for downloading the video
            video_opts = {
                'format': 'best',  # Download the best quality video
                'outtmpl': f'{output_locn}/%(title)s.%(ext)s',  # Save video in output directory
            }
            
            # Download the video
            with yt_dlp.YoutubeDL(video_opts) as ydl:
                logger.info("Downloading video...")
                ydl.download([url])
            
            # Options for extracting audio and saving as MP3
            audio_opts = {
                'format': 'bestaudio',  # Download the best quality audio
                'outtmpl': f'{output_locn}/%(title)s.%(ext)s',  # Save audio in output directory
                'postprocessors': [{
                    'key': 'FFmpegExtractAudio',
                    'preferredcodec': 'mp3',
                }],
            }
            
            # Download and extract audio
            with yt_dlp.YoutubeDL(audio_opts) as ydl:
                logger.info("Extracting and saving audio as MP3...")
                ydl.download([url])
            
        except Exception as e:        
            logger.error(f"Error processing URL '{url}'.")
            logger.debug(f"The cause was: {e}") 
            
    logger.info(f"Downloads finished. Check out files at {output_locn}.")
    
process_yt_videos()

19:23:19.654:dazbo-yt-demos - INF: Downloads progress: 1/1
19:23:19.765:dazbo-yt-demos - INF: Downloading video...


[youtube] Extracting URL: https://www.youtube.com/watch?v=ZbXErZeW-Zo&t=1148s
[youtube] ZbXErZeW-Zo: Downloading webpage
[youtube] ZbXErZeW-Zo: Downloading tv client config
[youtube] ZbXErZeW-Zo: Downloading tv player API JSON
[youtube] ZbXErZeW-Zo: Downloading ios player API JSON
[youtube] ZbXErZeW-Zo: Downloading m3u8 information
[info] ZbXErZeW-Zo: Downloading 1 format(s): 18
[download] /Users/yingy/AI projects/youtube transcribe summary/youtube-and-video/src/notebooks/dazbo-yt-demos/output/yt_dlp/I Tested The Top 35 Asian Sunscreens... Asian SPF Showdown!.mp4 has already been downloaded
[download] 100% of  172.58MiB


19:23:23.809:dazbo-yt-demos - INF: Extracting and saving audio as MP3...


[youtube] Extracting URL: https://www.youtube.com/watch?v=ZbXErZeW-Zo&t=1148s
[youtube] ZbXErZeW-Zo: Downloading webpage
[youtube] ZbXErZeW-Zo: Downloading tv client config
[youtube] ZbXErZeW-Zo: Downloading tv player API JSON
[youtube] ZbXErZeW-Zo: Downloading ios player API JSON
[youtube] ZbXErZeW-Zo: Downloading m3u8 information
[info] ZbXErZeW-Zo: Downloading 1 format(s): 251-8
[download] Destination: /Users/yingy/AI projects/youtube transcribe summary/youtube-and-video/src/notebooks/dazbo-yt-demos/output/yt_dlp/I Tested The Top 35 Asian Sunscreens... Asian SPF Showdown!.webm
[download] 100% of   45.25MiB in 00:00:03 at 13.80MiB/s    
[ExtractAudio] Destination: /Users/yingy/AI projects/youtube transcribe summary/youtube-and-video/src/notebooks/dazbo-yt-demos/output/yt_dlp/I Tested The Top 35 Asian Sunscreens... Asian SPF Showdown!.mp3
Deleting original file /Users/yingy/AI projects/youtube transcribe summary/youtube-and-video/src/notebooks/dazbo-yt-demos/output/yt_dlp/I Tested The

19:24:17.642:dazbo-yt-demos - INF: Downloads finished. Check out files at /Users/yingy/AI projects/youtube transcribe summary/youtube-and-video/src/notebooks/dazbo-yt-demos/output/yt_dlp.


### Conclusion

If you:

- Want to just download the videos and/or audio in the simplest and fastest way possible, then go with [Option 1](#option-1---with-pytubefix).
- Want to download the videos and/or audio and then carry out some sort of manipulation or conversion of the media, go with [Option 2](#option-2---pytubefix-and-moviepy).
- Want out-of-the-box proxy configuration, e.g. to bypass geo-restrictions, then go with [Option 3](#option-3---with-yt_dlp).
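As a rough illustration of the proxy point, here is a sketch of how the audio options from Option 3 could be built with an optional proxy. The helper `build_audio_opts` and the proxy address are my own placeholders; `proxy` is, to my knowledge, the `YoutubeDL` option that takes a proxy URL, but check the `yt-dlp` documentation for your installed version.

```python
from typing import Optional

def build_audio_opts(output_dir: str, proxy: Optional[str] = None) -> dict:
    """Build a yt-dlp options dict for MP3 extraction, optionally routed via a proxy."""
    opts = {
        'format': 'bestaudio',                          # best quality audio stream
        'outtmpl': f'{output_dir}/%(title)s.%(ext)s',   # output filename template
        'postprocessors': [{
            'key': 'FFmpegExtractAudio',                # requires ffmpeg to be installed
            'preferredcodec': 'mp3',
        }],
    }
    if proxy:
        opts['proxy'] = proxy  # placeholder address below; substitute your own proxy
    return opts

opts = build_audio_opts("output/yt_dlp", proxy="socks5://127.0.0.1:1080")
print(opts)

# Usage, assuming yt_dlp is installed and `url` holds a video URL:
# import yt_dlp
# with yt_dlp.YoutubeDL(opts) as ydl:
#     ydl.download([url])
```

Keeping the options in a small builder function makes it easy to switch the proxy on or off without duplicating the rest of the configuration.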

## Transcribing Audio to Text

### Extracting Audio Using Python Speech Recognition

The Python `speech_recognition` package has a number of built-in `Recognizer` implementations. Here I'm using the [Google Web Speech API](https://wicg.github.io/speech-api/) `Recognizer`, which has its default API key hard-coded into the Python `speech_recognition` library. It is free, but has some limitations. For example, it only accepts audio segments of up to 60s, which is why the audio is split into chunks before transcription.

In [15]:
%pip install --upgrade --no-cache-dir pydub SpeechRecognition ffmpeg-python

Note: you may need to restart the kernel to use updated packages.


In [16]:
import speech_recognition as sr
from pydub import AudioSegment
import ffmpeg

In [17]:
def divide_chunks(sound, segment_size_secs=60):
    """ Split audio into chunks of segment_size_secs seconds (60s by default) """
    
    segment_size_ms = segment_size_secs*1000
    for start_idx in range(0, len(sound), segment_size_ms):
        # Yield a chunk of audio data from start_idx to start_idx + segment_size_ms
        yield sound[start_idx:start_idx + segment_size_ms]

def transcribe_audio():
    """ Use Speech Recognition API with Google Web Speech API
    to convert audio dialog to text """
    recogniser = sr.Recognizer()        
    for mp3_file in Path(output_locn).glob(f'*.mp3'):
        transcribe_audio_file(recogniser, mp3_file)

def transcribe_audio_file(recogniser, mp3_file, language="en-US"):
    logger.info(f"Converting {mp3_file}...")
    try:
        audio = AudioSegment.from_file(mp3_file)
        # If AudioSegment is not working - e.g. due to broken mp3 headers - we
        # can use ffmpeg as a workaround. However, it's a lot slower.
        # ffmpeg.input(mp3_file).output(wav_file).run() # Convert with ffmpeg
        # logger.info(f"Successfully converted {mp3_file} to {wav_file}.")
        # audio = AudioSegment.from_wav(wav_file) # Read the audio

        segments = list(divide_chunks(audio, segment_size_secs=60)) # split the wav into 60s segments     
        transcription_extracts = {}
        for index, chunk in enumerate(segments):
            with io.BytesIO() as wav_io:
                chunk.export(wav_io, format='wav')
                wav_io.seek(0)  # Move to the start of the BytesIO object before reading from it
                        
                with sr.AudioFile(wav_io) as source:
                    audio_data = recogniser.record(source)

                try:
                    extracted = recogniser.recognize_google(audio_data, language=language)
                    logger.debug(f"Chunk {index} extracted.")
                    transcription_extracts[index] = extracted
                except sr.UnknownValueError:
                    # Log the unknown value error and continue
                    logger.warning(f"Chunk {index}: Could not understand the audio. Maybe it was empty.")
            
        logger.info("Extract:")
        for idx, extract in transcription_extracts.items():
            logger.info(f"{idx}: {extract}")

    except ffmpeg.Error as e:
        logger.error(f"FFmpeg failed to convert {mp3_file}: {str(e)}")
    except Exception as e:
        logger.error("Unexpected error.", exc_info=True)
            
transcribe_audio()
logger.info("Done")

19:25:21.873:dazbo-yt-demos - INF: Converting /Users/yingy/AI projects/youtube transcribe summary/youtube-and-video/src/notebooks/dazbo-yt-demos/output/yt_dlp/I Tested The Top 35 Asian Sunscreens... Asian SPF Showdown!.mp3...
19:25:48.000:dazbo-yt-demos - DBG: Chunk 0 extracted.
19:26:04.972:dazbo-yt-demos - DBG: Chunk 1 extracted.
19:26:21.730:dazbo-yt-demos - DBG: Chunk 2 extracted.
19:26:40.974:dazbo-yt-demos - DBG: Chunk 3 extracted.
19:26:57.365:dazbo-yt-demos - DBG: Chunk 4 extracted.
19:27:16.002:dazbo-yt-demos - DBG: Chunk 5 extracted.
19:27:37.806:dazbo-yt-demos - DBG: Chunk 6 extracted.
19:27:59.420:dazbo-yt-demos - DBG: Chunk 7 extracted.
19:28:16.761:dazbo-yt-demos - DBG: Chunk 8 extracted.
19:28:33.828:dazbo-yt-demos - DBG: Chunk 9 extracted.
19:28:49.487:dazbo-yt-demos - DBG: Chunk 10 extracted.
19:29:06.858:dazbo-yt-demos - DBG: Chunk 11 extracted.

KeyboardInterrupt: 
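The chunking logic used in `divide_chunks` is easy to try without any audio at all. In this sketch a plain list stands in for the audio, with one item per second. That stand-in is my simplification: `pydub` audio segments are sliced in milliseconds, which is why the notebook multiplies the segment size by 1000.

```python
def divide_chunks(seq, chunk_size):
    """Yield consecutive slices of seq, each at most chunk_size items long."""
    for start in range(0, len(seq), chunk_size):
        yield seq[start:start + chunk_size]

# Stand-in for 150 seconds of audio, at one list item per second
fake_audio = list(range(150))

chunks = list(divide_chunks(fake_audio, chunk_size=60))
print([len(chunk) for chunk in chunks])  # [60, 60, 30]
```

The final chunk is simply shorter than the rest, so no padding is needed before sending it for recognition.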

### Results

It's a bit flaky! Sometimes it runs, but sometimes the API returns errors and fails to run.

When the API does run...

- It fails to transcribe the Ukrainian song. Not too surprising, since this API does not detect language automatically, and defaults to recognising English.
- It does an amazing job with the Jim Carrey speech.
- It is partially successful when transcribing songs.

### Conclusions

It's not great!  It's pretty good if there's no background sound or ambient noise.  But it's pretty poor when working with songs. And it seems unreliable.

### Transcribing Ukrainian

Let's try to transcribe the Ukrainian song:

In [None]:
def transcribe_ua():
    recogniser = sr.Recognizer()
    for mp3_file in Path(output_locn).glob(f'alyona*.mp3'):
        transcribe_audio_file(recogniser, mp3_file, language="uk-UA")
        
transcribe_ua()

### Results

Partial success.  But overall... Not great!

## Extract Existing Transcripts from Videos

Now I'm going to use the [youtube-transcript-api](https://github.com/jdepoix/youtube-transcript-api) to extract existing transcripts from YouTube videos. Not only will it return the transcript, but it can also be used to translate those transcripts into other languages. So now I can download my Ukrainian song, and see both the Ukrainian transcript and the English translation. This is pretty awesome!

However, some videos do not contain transcripts.

In [None]:
%pip install --upgrade --no-cache-dir youtube_transcript_api

Note: you may need to restart the kernel to use updated packages.


In [None]:
import youtube_transcript_api as yt_api
from pytubefix import YouTube
from pytubefix.cli import on_progress

def get_transcripts():
    """ Extract existing transcript data from videos """
    for url in urls:
        try: # Just so we can get the video title
            yt = YouTube(url, on_progress_callback=on_progress, client="WEB")
        except Exception as e:        
            logger.error(f"Error processing URL '{url}'.")
            logger.error(f"The cause was: {e}") 
            continue
        
        logger.info(f"Processing '{yt.title}'...")
        video_id = get_video_id(url)
        
        try:
            # By default, we get a list of 1: only get the preferred language transcript
            transcript_list = yt_api.YouTubeTranscriptApi.list_transcripts(video_id)
        except Exception as e:
            logger.error(f"Unable to extract transcript for '{yt.title}'.")
            logger.error(e)
            continue
        
        # iterate over all available transcripts
        for transcript in transcript_list:
            # The Transcript object provides metadata properties. Here are some...
            properties = {
                "video_id": transcript.video_id,
                "language": transcript.language,
                "language_code": transcript.language_code,
                "is_generated": transcript.is_generated,  # Whether it has been manually created or generated by YouTube
                "is_translatable": transcript.is_translatable,  # Whether this transcript can be translated or not
                "translation_languages": transcript.translation_languages,
            }
            
            for prop, value in properties.items():
                logger.info(f"{prop}: {value}")

            # Fetch the actual transcript data
            transcript_data = transcript.fetch() # returns a list of dicts
            logger.info(f"Raw transcript:\n{transcript_data}") 
            
            processed_transcript = process_transcript(transcript_data)
            logger.info(f"Processed transcript:\n{processed_transcript}")
            
            # Translate to en if we can
            if (transcript.language_code != "en" and 
                    transcript.is_translatable and 
                    any(lang['language_code'] == 'en' for lang in transcript.translation_languages)):
                transcript_data = transcript.translate('en').fetch() # translate to en
                processed_transcript = process_transcript(transcript_data)
                logger.info(f"Processed translated transcript:\n{processed_transcript}")

def process_transcript(transcript_data):
    """ Join the text of every entry whose text does NOT start with [ (e.g. [Music]) """
    return "\n".join([entry['text'] for entry in transcript_data 
                                     if not entry['text'].startswith("[")])
                
get_transcripts()

NameError: name 'urls' is not defined

How cool is this!?
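The filtering step in `process_transcript` can be tried offline on hand-written data. The sample entries below are invented, and their shape (dicts with `text`, `start` and `duration` keys) is an assumption based on the "list of dicts" noted in the code above. Newer releases of the library may return objects instead, in which case attribute access would be needed.

```python
def process_transcript(transcript_data):
    """Join the text of every entry whose text does not start with '['."""
    return "\n".join(entry["text"] for entry in transcript_data
                     if not entry["text"].startswith("["))

# Invented sample data, for illustration only
sample = [
    {"text": "[Music]", "start": 0.0, "duration": 4.2},
    {"text": "Hello and welcome", "start": 4.2, "duration": 2.0},
    {"text": "[Applause]", "start": 6.2, "duration": 1.5},
    {"text": "to the show", "start": 7.7, "duration": 1.8},
]

print(process_transcript(sample))
```

Using `startswith` rather than indexing the first character means an entry with empty text does not raise an error.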

## What's Next?

In the next notebook, we'll look at adding Google Smarts, with some Google AI.