# Dazbo's YouTube and Video Demos - with Google AI

## Overview

Welcome to notebook #2 in this tutorial guide. This notebook follows on from [YouTube and Video Demos #1](youtube-demos.ipynb). In the previous notebook I demonstrated:

- Multiple methods for downloading videos and extracting audio
- How to transcribe audio to text using a free speech-to-text API
- How to extract existing transcripts and translate to different languages

In this part we'll strip out parts of the first notebook we don't need, and add some smarts using Google technology.

## How to Launch and Run this Notebook

- The source for this notebook lives in my GitHub repo, <a href="https://github.com/derailed-dash/youtube-and-video" target="_blank">Youtube-and-Video</a>.
- You can run the notebook locally, or with any of the options shown below.
- Check out further guidance - including tips on how to run the notebook - in the project's `README.md`.
- For more ways to run Jupyter Notebooks, check out [my guide](https://medium.com/python-in-plain-english/five-ways-to-run-jupyter-labs-and-notebooks-23209f71e5c0).

**When running this notebook, first execute the cells in the [Setup](#Setup) section, as described below.** Then you can experiment with any of the subsequent cells.


<table width="800px" style="border-collapse: collapse;">
  <td style="text-align: center">
    <a href="https://colab.research.google.com/github/derailed-dash/youtube-and-video/blob/main/src/notebooks/youtube-demos-with-google-ai.ipynb">
      <img src="https://cloud.google.com/ml-engine/images/colab-logo-32px.png" alt="Google Colaboratory logo"><br>Run in Colab
    </a>
  </td>
  <td style="text-align: center">
    <a href="https://github.com/derailed-dash/youtube-and-video/blob/main/src/notebooks/youtube-demos-with-google-ai.ipynb">
      <img src="https://cloud.google.com/ml-engine/images/github-logo-32px.png" alt="GitHub logo"><br>View on GitHub
    </a>
  </td>
  <td style="text-align: center">
    <a href="https://console.cloud.google.com/vertex-ai/workbench/deploy-notebook?download_url=https://raw.githubusercontent.com/derailed-dash/youtube-and-video/main/src/notebooks/youtube-demos-with-google-ai.ipynb">
      <img src="https://lh3.googleusercontent.com/UiNooY4LUgW_oTvpsNhPpQzsstV5W8F7rYgxgGBD85cWJoLmrOzhVs_ksK_vgx40SHs7jCqkTkCk=e14-rj-sc0xffffff-h130-w32" alt="Vertex AI logo"><br> Open in Vertex AI Workbench
    </a>
  </td>
</table>

## Setup

### Packages

First, let's install any dependent packages:

In [None]:
%pip install --upgrade --no-cache-dir python-dotenv \
                                      dazbo-commons \
                                      pytubefix

In [2]:
from IPython.display import display, Markdown

import io
import logging
import os
import re
import sys
from pathlib import Path
from dataclasses import dataclass
from dotenv import load_dotenv
import dazbo_commons as dc

In [None]:
# Colab requires an older version of ipykernel, so only upgrade it outside of Colab
if "google.colab" not in sys.modules:
    %pip install --upgrade --no-cache-dir ipykernel

### Logging

Now we'll set up logging. Here I'm using coloured logging from my [dazbo-commons](https://pypi.org/project/dazbo-commons/) package. Feel free to change the logging level.

In [None]:
# Setup logging
APP_NAME="dazbo-yt-demos"
logger = dc.retrieve_console_logger(APP_NAME)
logger.setLevel(logging.DEBUG)
logger.info("Logger initialised.")
logger.debug("DEBUG level logging enabled.")

### File Locations

Here we initialise some file path locations, e.g. an output folder.

In [None]:
locations = dc.get_locations(APP_NAME)
for attribute, value in vars(locations).items():
    logger.debug(f"{attribute}: {value}")

### Utility Functions

In [6]:
def clean_filename(filename: str) -> str:
    """ Create a clean filename by replacing disallowed characters with underscores. """
    pattern = r'[^a-zA-Z0-9._\s-]'
    cleaned_name = re.sub(pattern, '_', filename).replace("_ _", "_").replace("__", "_")
    return cleaned_name
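Because `str.replace()` makes a single, non-overlapping pass, a run of three or more disallowed characters can still leave a double underscore behind (for example, `"a???b"` becomes `"a__b"`). Here's a minimal sketch of a variant, `clean_filename_collapsed` (a hypothetical name, not part of this project), that uses a second regex to collapse any run of underscores, optionally separated by whitespace, into a single underscore:

```python
import re

def clean_filename_collapsed(filename: str) -> str:
    """Hypothetical variant of clean_filename that fully collapses underscore runs."""
    # Replace anything that isn't a letter, digit, '.', '_', whitespace or '-' with '_'
    cleaned = re.sub(r"[^a-zA-Z0-9._\s-]", "_", filename)
    # Collapse runs such as '__', '___' or '_ _' into a single underscore
    return re.sub(r"_(\s*_)+", "_", cleaned)

print(clean_filename_collapsed("a???b"))  # a_b
```

Note that, like the original, this still replaces non-ASCII characters (such as Cyrillic titles) with underscores.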

### Videos to Work With

We start by defining a list of videos to test our application with, along with a function that takes a full YouTube URL and returns just the id portion.

I’ve used these videos because…

- The first is the fantastic [Burning Bridges](https://www.youtube.com/watch?v=udRAIF6MOm8) by Sigrid. The video has no embedded transcript.
- The second is the beautiful song [I Believe](https://www.youtube.com/watch?v=CiTn4j7gVvY) by Melissa Hollick. It’s one of my favourite songs of all time. When I get a migraine, I turn off the lights, and listen to this to feel better! And for those who enjoy gaming, this song plays over the end titles of the amazing Wolfenstein: The New Order game. This video has an embedded transcript.
- Then we have a short [Jim Carrey speech](https://www.youtube.com/watch?v=nLgHNu2N3JU), which gives us dialogue without music or other ambient noise. It has an embedded transcript.
- And finally, a [Ukrainian song](https://www.youtube.com/watch?v=d4N82wPpdg8) from Eurovision 2024, by Jerry Heil and Alyona Alyona. This gives us an opportunity to test translation. It also has an embedded transcript.

In [7]:
# Videos to download
urls = [
    "https://www.youtube.com/watch?v=udRAIF6MOm8",  # Sigrid - Burning Bridges (English)
    "https://www.youtube.com/watch?v=CiTn4j7gVvY",  # Melissa Hollick - I Believe (English)
    "https://www.youtube.com/watch?v=nLgHNu2N3JU",  # Jim Carrey - Motivational speech (English)
    "https://www.youtube.com/watch?v=d4N82wPpdg8",  # Jerry Heil & Alyona Alyona - Teresa & Maria (Ukrainian)
]

def get_video_id(url: str) -> str:
    """ Return the video ID, which is the part after 'v=' """
    return url.split("v=")[-1]

output_locn = f"{locations.output_dir}/pytubefix"
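`get_video_id()` simply takes everything after `v=`, so a URL with extra query parameters (e.g. `...watch?v=udRAIF6MOm8&t=42s`) returns `udRAIF6MOm8&t=42s`, and `youtu.be` short links aren't handled at all. Here's a minimal sketch of a more defensive alternative using only the standard library's `urllib.parse`. The name `get_video_id_robust` is hypothetical, and it assumes just two URL shapes (watch URLs and `youtu.be` short links), ignoring others such as `/shorts/` or `/embed/` paths:

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse

def get_video_id_robust(url: str) -> Optional[str]:
    """Hypothetical, more defensive alternative to get_video_id()."""
    parsed = urlparse(url)
    host = parsed.netloc.lower()
    if host == "youtu.be":
        # Short links carry the ID in the path, e.g. https://youtu.be/<id>
        return parsed.path.lstrip("/").split("/")[0] or None
    if host == "youtube.com" or host.endswith(".youtube.com"):
        # Watch URLs carry the ID in the 'v' query parameter
        return parse_qs(parsed.query).get("v", [None])[0]
    return None  # Not a URL shape we recognise

print(get_video_id_robust("https://www.youtube.com/watch?v=udRAIF6MOm8&t=42s"))  # udRAIF6MOm8
```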

## Downloading Videos and Extracting Audio

Let's use the [pytubefix](https://github.com/JuanBindez/pytubefix) library to download YouTube videos, and to save their audio-only streams as separate files.

This library is a community-maintained fork of `pytube`. It was created to provide quick fixes for issues that the official pytube library faced, particularly when YouTube's updates break `pytube`.

In [None]:

from pytubefix import YouTube
from pytubefix.cli import on_progress

def process_yt_videos():
    for i, url in enumerate(urls):
        logger.info(f"Downloads progress: {i+1}/{len(urls)}")

        try:
            # YouTube now requires a PO token to be passed in the request.
            # The library will automatically generate a PO token,
            # but Node.js must be installed to do so.
            yt = YouTube(url, on_progress_callback=on_progress, client="WEB")
            logger.info(f"Getting: {yt.title}")
            video_stream = yt.streams.get_highest_resolution()
            if not video_stream:
                raise Exception("Stream not available.")
            
            # YouTube resource titles may contain special characters which 
            # can't be used when saving the file. So we need to clean the filename.
            cleaned = clean_filename(yt.title)
            
            logger.info(f"Downloading video {cleaned}.mp4 ...")
            video_stream.download(output_path=output_locn, filename=f"{cleaned}.mp4")
        
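            logger.info("Downloading audio...")
            audio_stream = yt.streams.get_audio_only()
            if not audio_stream:
                raise Exception("Audio stream not available.")
            # The audio-only stream is AAC audio in an MP4 container, so save it as .m4a
            audio_stream.download(output_path=output_locn, filename=f"{cleaned}.m4a")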
            
            logger.info("Done")
            
        except Exception as e:        
            logger.error(f"Error processing URL '{url}'.")
            logger.error(f"The cause was: {e}") 
            
    logger.info(f"Downloads finished. See files in {output_locn}.")
    
process_yt_videos()


## Extract Existing Transcripts from Videos

Here I'm using the [youtube-transcript-api](https://github.com/jdepoix/youtube-transcript-api) to extract existing transcripts from YouTube videos. Not only will it return the transcript, but it can also translate transcripts into other languages. So now I can retrieve the transcript for my Ukrainian song, and see both the original Ukrainian and the English translation. This is pretty awesome!

In [None]:
%pip install --upgrade --no-cache-dir youtube_transcript_api

In [None]:
import youtube_transcript_api as yt_api
from pytubefix import YouTube
from pytubefix.cli import on_progress

def get_transcripts():
    """ Extract existing transcript data from videos """
    for url in urls:
        try: # Just so we can get the video title
            yt = YouTube(url, on_progress_callback=on_progress, client="WEB")
        except Exception as e:        
            logger.error(f"Error processing URL '{url}'.")
            logger.error(f"The cause was: {e}") 
            continue
        
        logger.info(f"Processing '{yt.title}'...")
        video_id = get_video_id(url)
        
        try:
            # List all transcripts (manually created and auto-generated) available for this video
            transcript_list = yt_api.YouTubeTranscriptApi.list_transcripts(video_id)
        except Exception as e:
            logger.error(f"Unable to extract transcript for '{yt.title}'.")
            logger.error(e)
            continue
        
        # iterate over all available transcripts
        for transcript in transcript_list:
            # The Transcript object provides metadata properties. Here are some...
            properties = {
                "video_id": transcript.video_id,
                "language": transcript.language,
                "language_code": transcript.language_code,
                "is_generated": transcript.is_generated,  # True if auto-generated by YouTube, False if manually created
                "is_translatable": transcript.is_translatable,  # Whether this transcript can be translated or not
                "translation_languages": transcript.translation_languages,
            }
            
            for prop, value in properties.items():
                logger.info(f"{prop}: {value}")

            # Fetch the actual transcript data
            transcript_data = transcript.fetch() # returns a list of dicts
            logger.info(f"Raw transcript:\n{transcript_data}") 
            
            processed_transcript = process_transcript(transcript_data)
            logger.info(f"Processed transcript:\n{processed_transcript}")
            
            # Translate to en if we can
            if (transcript.language_code != "en" and 
                    transcript.is_translatable and 
                    any(lang['language_code'] == 'en' for lang in transcript.translation_languages)):
                transcript_data = transcript.translate('en').fetch() # translate to en
                processed_transcript = process_transcript(transcript_data)
                logger.info(f"Processed translated transcript:\n{processed_transcript}")

def process_transcript(transcript_data):
    """ Join the text of all transcript entries, skipping annotations that start with '[' (e.g. [Music]). """
    return "\n".join([entry['text'] for entry in transcript_data
                      if entry['text'] and not entry['text'].startswith("[")])
                
get_transcripts()

How cool is this!?
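`process_transcript()` assumes every entry is a dict with a `'text'` key. Recent (1.x) releases of youtube-transcript-api appear to return snippet objects with a `.text` attribute instead, so treat the exact return type as an assumption to check against the version you install. Here's a minimal defensive sketch (the name `process_transcript_robust` is hypothetical) that accepts either shape and skips empty lines as well as bracketed annotations:

```python
from types import SimpleNamespace

def process_transcript_robust(transcript_data) -> str:
    """Hypothetical variant of process_transcript() that tolerates both
    dict entries and objects exposing a .text attribute."""
    lines = []
    for entry in transcript_data:
        # Older releases yield dicts; newer ones may yield snippet objects
        text = entry["text"] if isinstance(entry, dict) else getattr(entry, "text", "")
        text = (text or "").strip()
        # Skip empty lines and bracketed annotations such as [Music]
        if not text or text.startswith("["):
            continue
        lines.append(text)
    return "\n".join(lines)

# Example with made-up entries of both shapes
sample = [{"text": "[Music]"}, {"text": "Hello"}, {"text": ""}, SimpleNamespace(text="world")]
print(process_transcript_robust(sample))
```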

Alas, some videos don't have pre-existing transcripts. So let's see if we can improve our transcription capability using some Google Cloud AI...

## Adding Google Cloud Intelligence

We're going to use Google Cloud APIs. To do this, you'll first need a Google Cloud project. So, if you haven't already, create your project, attach it to a billing account, and then come back.

### How to Consume Google Cloud Services from your Notebook

Then, in order to give your notebook access to the Google Cloud APIs, you broadly have three options:

1. You can build and run your notebook locally.
1. You can build and run your notebook in Google Colab.
1. You can build and run your notebook in the Google Vertex AI Workbench environment.

Let's look at the options...

#### Local Notebook

For local development (e.g. a Jupyter notebook running on your own machine) you will need to:

1. Have the Google Cloud `gcloud` CLI installed. See instructions [here](https://cloud.google.com/sdk/docs/install).
2. Authenticate to `gcloud`, so that you can run `gcloud` commands from the notebook.
3. Set your Application Default Credentials (ADC) by authenticating to your gcloud environment, and set your quota project.

```bash
# From your terminal...
export PROJECT_ID=<your-project-id>
gcloud auth login # authenticate to gcloud
gcloud auth application-default login # set up ADC
gcloud auth application-default set-quota-project $PROJECT_ID
gcloud config set project $PROJECT_ID
```

4. Use [Application Default Credentials](https://cloud.google.com/docs/authentication/application-default-credentials) from within the notebook.

```python
from google.auth import default
credentials, _ = default()

PROJECT = !gcloud config get-value project
PROJECT_ID = PROJECT[0]
REGION = "europe-west2"

# Now use whatever Google services...
import vertexai
vertexai.init(project=PROJECT_ID, location=REGION)
```

#### Google Colab

This is a great way to create and run Jupyter notebooks in the Cloud, and it makes them super-easy to share.

The great thing about this approach is that Colab provides native integration to authenticate your user account and provide your Google project details to the Colab environment.

Notes:

- You don't need to install the Google `gcloud` CLI locally. It is pre-installed in the environment.
- You can share notebooks using Google Drive, with Drive-based access control.
- There are limitations for notebook size, and for notebook runtime instance size.

Check out [this guide](https://github.com/GoogleCloudPlatform/devrel-demos/blob/main/other/colab/Using%20Google%20Cloud%20from%20Colab.ipynb).

For example, in your notebook:

```python
import os
import sys

import google.auth

# First, set PROJECT_ID and REGION variables from environment variables or secrets
# Then...
PROJECT_ID = str(os.environ.get("PROJECT_ID"))
REGION = str(os.environ.get("REGION"))
!gcloud config set project $PROJECT_ID

# Check if we're running in the Colab environment, and if so
# Use Colab native authentication to Google Cloud
if "google.colab" in sys.modules:
    from google.colab import auth
    auth.authenticate_user()
    !gcloud auth application-default login # set ADC

credentials, _ = google.auth.default()

# Now use your Google services...
import vertexai
vertexai.init(
    project=PROJECT_ID,
    location=REGION,
    credentials=credentials
)
```

#### Vertex AI Workbench

[Vertex AI Workbench](https://cloud.google.com/vertex-ai/docs/workbench/introduction) is Google's most powerful managed enterprise Jupyter notebook hosting service. It is fully integrated with the Google Cloud and Vertex AI ecosystem.

Notes:

- The gcloud CLI is pre-installed in the environment.
- The JupyterLab environment is pre-installed.
- Access control and sharing is managed by Google Cloud IAM, rather than Google Drive.
- Because it is natively integrated with the Google Cloud environment, you don't need to provide any credentials or authenticate. You just need to provide your Google Cloud project ID and region to any services that require this information. For example:

```python
import os

# First, set PROJECT_ID and REGION variables from environment variables or secrets
# Then...
PROJECT_ID = str(os.environ.get("PROJECT_ID"))
REGION = str(os.environ.get("REGION"))

# Go ahead and use your Google services...
import vertexai
vertexai.init(project=PROJECT_ID, location=REGION)
```

### Retrieve Environment Variables

I put my environment variables in a `.env` file at the top level of my project. It looks like this...

```bash
PYTHONPATH=src;src/notebooks
PROJECT_ID=my-project-id
REGION=my-region
GEMINI_API_KEY=my-gemini-api-key
```

In [None]:
import sys
from getpass import getpass

# Retrieve PROJECT_ID and other variables from any .env we can find
try:
    dc.get_envs_from_file()
except ValueError as e:
    logger.error(f"Problem reading env file:\n{e}")

if not (PROJECT_ID := os.getenv("PROJECT_ID")):
    PROJECT_ID = input(f"Enter PROJECT_ID: ")

if not (REGION := os.getenv("REGION")):
    REGION = input(f"Enter REGION: ")
    
if not (GEMINI_API_KEY := os.getenv("GEMINI_API_KEY")):
    GEMINI_API_KEY = getpass(f"Enter Gemini API Key:")

logger.info(f"{PROJECT_ID=}")
logger.info(f"{REGION=}")
logger.info(f"GEMINI_API_KEY={GEMINI_API_KEY[:5]}...{GEMINI_API_KEY[-5:]}")
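The last log line shows the first and last five characters of the key. That's fine for a real API key, but if the value is ten characters or fewer, the two slices overlap and the whole secret ends up in the log. A small hypothetical helper, `mask_secret`, guards against that:

```python
def mask_secret(secret, visible: int = 5) -> str:
    """Hypothetical helper: show only the first/last few characters of a secret."""
    if not secret:
        return "<not set>"
    if len(secret) <= visible * 2:
        # Too short to partially reveal safely, so hide it entirely
        return "*****"
    return f"{secret[:visible]}...{secret[-visible:]}"

print(mask_secret("ABCDE1234567890VWXYZ"))  # ABCDE...VWXYZ
```

In the cell above, it could be used as `logger.info(f"GEMINI_API_KEY={mask_secret(GEMINI_API_KEY)}")`.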

### Clear Environment Variables

**Only run the next cell if you want to manually clear the environment variables** and then input new values. In this scenario, you'll also want to comment out any variables in your .env file.

In [None]:
# Only run this if we want to clear env vars
for var in ("PROJECT_ID", "REGION", "GEMINI_API_KEY"):
    os.environ.pop(var, None)  # pop() with a default avoids a KeyError if the variable isn't set

### A Notebook We Can Run in ANY Environment

Let's engineer the notebook to be agnostic of where it is hosted.

In [None]:
%pip install --upgrade google-auth \
                       google-auth-oauthlib \
                       google-auth-httplib2 \
                       google-cloud-storage

In [None]:
from google.auth import default
from google.auth.exceptions import DefaultCredentialsError
from google.cloud import storage # Enable 

# If we're running Google Colab, authenticate
if "google.colab" in sys.modules:
    from google.colab import auth # type: ignore
    auth.authenticate_user()
    !gcloud auth application-default login # set ADC
else: # If you're running locally, first run these from your terminal...
    # export PROJECT_ID=<your project ID>
    # gcloud auth login
    # gcloud auth application-default login
    # gcloud auth application-default set-quota-project $PROJECT_ID
    pass

try:
    credentials, _ = default() # Retrieve ADC
    !gcloud config set project $PROJECT_ID
except DefaultCredentialsError as e:
    logger.error(e)


### Video Transcription Using the Video Intelligence API

Recall that in the previous notebook, I tried to perform audio transcription using the Python `speech_recognition` package, and the built-in [Google Web Speech API](https://wicg.github.io/speech-api/) `Recognizer`. It wasn't great!

So now let's use Google's [Video Intelligence API](https://cloud.google.com/video-intelligence/docs) to perform transcription...

In [None]:
%pip install --upgrade google-cloud-videointelligence

In [None]:
from google.cloud import videointelligence

DEFAULT_MIN_CONFIDENCE=0.6

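def transcribe_video(video, minimum_confidence=DEFAULT_MIN_CONFIDENCE, src_language="en-US"):
    """ Transcribe a local video file using the Video Intelligence API.
    Returns the transcript as a string (empty if nothing meets minimum_confidence). """
    logger.info(f"Processing {video.name}...")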
    
    video_client = videointelligence.VideoIntelligenceServiceClient()

    # This API can do loads of things. Here I'll tell it to do speech transcription.
    features = [videointelligence.Feature.SPEECH_TRANSCRIPTION]
    config = videointelligence.SpeechTranscriptionConfig(
        language_code=src_language, enable_automatic_punctuation=True
    )
    video_context = videointelligence.VideoContext(speech_transcription_config=config)

    try:
        with io.open(video, "rb") as file:
            input_content = file.read()
            
        operation = video_client.annotate_video(
            request={
                "features": features,
                "input_content": input_content, # for local files
                # "input_uri": path, # for objects in GCS
                "video_context": video_context,
            }
        )

        result = operation.result(timeout=600)

        # There is only one annotation_result per video.
        annotation_results = result.annotation_results[0]
        complete_transcript = ""
        for speech_transcription in annotation_results.speech_transcriptions:
            # Each SpeechTranscription can contain multiple alternatives.
            # Each alternative is a different possible transcription and has its own confidence score.
            # They are ordered from most to least likely, so we only need the first.
            if not speech_transcription.alternatives:
                continue
            part = speech_transcription.alternatives[0]
            if part.confidence < minimum_confidence:
                logger.debug(f"Ignoring transcript alternative with confidence of {part.confidence}.")
                continue
                
            logger.debug("Part transcript: {}".format(part.transcript))
            logger.debug("Part confidence: {}\n".format(part.confidence))
            complete_transcript += part.transcript.strip() + "\n"
                
        return complete_transcript.strip()
    except Exception as e:
        logger.error(e)
        return None
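The confidence filtering inside `transcribe_video()` doesn't depend on the API itself, so it can be tried offline. This sketch uses a hypothetical `Alternative` dataclass in place of the real response objects (it only mirrors the `transcript` and `confidence` fields the function reads), and plain lists in place of `speech_transcriptions`:

```python
from dataclasses import dataclass

@dataclass
class Alternative:
    """Hypothetical stand-in for a speech transcription alternative."""
    transcript: str
    confidence: float

def join_confident(transcriptions, minimum_confidence: float = 0.6) -> str:
    """Keep the top alternative of each transcription if it meets the threshold."""
    lines = []
    for alternatives in transcriptions:
        if not alternatives:
            continue  # Nothing was recognised for this segment
        best = alternatives[0]  # Alternatives are ordered most-likely first
        if best.confidence < minimum_confidence:
            continue
        lines.append(best.transcript.strip())
    return "\n".join(lines)

segments = [
    [Alternative(" first line ", 0.92)],
    [Alternative("low-confidence noise", 0.31)],
    [],
    [Alternative("third line", 0.6)],
]
print(join_confident(segments))
```

Raising `minimum_confidence` trades recall for precision: fewer lines survive, but those that do are more likely to be right.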

It will take a few minutes to process these videos...

In [None]:
logger.debug(f"Looking for videos in {output_locn}...")
for video in Path(output_locn).glob(f'*.mp4'):
    # Fortunately, this API natively supports mp4 without any conversion
    transcript = transcribe_video(video)
    if transcript:
        logger.info(f"Transcript:\n{transcript}")
    else:
        logger.warning("Unable to retrieve a transcript above the minimum confidence.")

### Conclusions

It's pretty good!

- It's reliable and doesn't give random _pipe_ errors.
- It transcribes with higher accuracy than the Python `speech_recognition` package using the Google Web Speech API `Recognizer`. In particular, it does a much better job with the "I Believe" track.
- We don't need to split the video into chunks.
- The API returns a confidence score for each transcription, which we can use to filter out transcriptions that we don't want to keep.

Some minor issues:

- It takes a long time to process each video.
- It doesn't automatically detect the source language. So it fails with the Ukrainian music video.

Let's now try to transcribe the Ukrainian song by explicitly passing in its language code.

In [None]:
video = next(Path(output_locn).glob('alyona*.mp4'), None)

if video:
    transcript = transcribe_video(video, src_language="uk-UA")
    if transcript:
        logger.info(f"Transcript:\n{transcript}")
    else:
        logger.warning("Unable to retrieve a transcript above the minimum confidence.")
else:
    logger.warning(f"No matching video found in {output_locn}.")

Not great. Oh well!

### What Next?

Now I'm going to:

- See if we can use a Google Gemini generative AI model to extract, transcribe and translate for us.
- Use Gemini to do some summarising.

## Vertex Gemini Generative AI

Let's integrate some generative AI!

There are a couple of APIs we can use to do this.  Here are some pointers...

| API | Requires | Use Cases |
|-----|----------|-----------|
| [Google Vertex AI SDK](https://cloud.google.com/vertex-ai/docs/python-sdk/use-vertex-ai-python-sdk) | A Google project | Working with Google Vertex services, including Gemini models. |
| [Google Gemini API](https://ai.google.dev/api?lang=python) | [An API key](https://aistudio.google.com/app/apikey). You don't need a Google Cloud project! | Prototyping and experimentation. |

### Working with Google Cloud Storage (GCS)

When working with many Google Vertex AI APIs, it can be convenient to store files in GCS. So it will be useful to create a GCS bucket in our project (if we haven't already done so). Of course, you can do that in the Cloud Console, but here I'll show you how to do that from the notebook.

Before you proceed, don't forget to enable the Cloud Storage API (`storage.googleapis.com`).

#### Create / Check the Bucket

If our application needs a bucket, we can create one using the code here. Let's set the lifecycle policy so that files are deleted automatically after 3 days.

This cell is to create a policy for our bucket when we create it. You will only ever need to run this once.

In [None]:
%%writefile lifecycle-policy.json
{
  "rule": [
    {
      "action": {"type": "Delete"},
      "condition": {"age": 3}
    }
  ]
}


In [None]:
BUCKET_NAME = f"{PROJECT_ID}-bucket"
BUCKET_URI = f"gs://{BUCKET_NAME}"

In [None]:
bucket_check = !gcloud storage ls $BUCKET_URI # type: ignore
bucket_exists = True
for line in bucket_check:
    if "404" in line:
        bucket_exists = False
        break
        
if not bucket_exists:
    logger.info(f"Creating bucket {BUCKET_URI}")
    ! gcloud storage buckets create {BUCKET_URI} --location={REGION}
    
    # Set bucket so files are automatically deleted
    logger.info(f"Setting lifecycle policy for {BUCKET_URI}")
    ! gcloud storage buckets update {BUCKET_URI} --lifecycle-file=lifecycle-policy.json
else:
    logger.info(f"{BUCKET_URI} already exists.")

logger.info(f"{BUCKET_NAME=}")

#### Utility Function to Upload Files to GCS

Now let's create a utility function for uploading files from local storage to GCS.

In [None]:
from google.cloud import storage

def upload_to_gcs(bucket_name: str, src_file_name, dest_name: str):
    """ Upload a local file to a GCS bucket, and return the GCS URI of the uploaded object. """
    try:
        storage_client = storage.Client()
        bucket = storage_client.bucket(bucket_name)
        
        # Destination blob name
        blob_name = dest_name 
        blob = bucket.blob(blob_name)
        logger.info(f"Uploading {src_file_name} to gs://{bucket.name}/{blob_name}")
        blob.upload_from_filename(str(src_file_name))
        # We could also upload a file-like object, e.g. a BytesIO:
        # blob.upload_from_file(src_file)
        
        return f"gs://{bucket.name}/{blob_name}" # Return the full GCS URI
    except Exception as e:
        logger.exception(f"Error uploading {src_file_name} to GCS: {e}")
        return None

def get_gcs_uris(bucket_name:str, glob:str=None) -> list[str]:
    """
    Retrieves GCS URIs for files in a bucket, optionally matching a
    wildcard glob (which may include a folder prefix, e.g. 'video_files/*.mp4').

    Args:
        bucket_name: The name of the GCS bucket.
        glob: A wildcard string to match filenames (e.g., '*.mp4').

    Returns:
        A list of bucket URIs for matching files.
    """
    logger.debug(f"Listing blobs in {bucket_name}/{glob if glob else ''}...")
    client = storage.Client() # Uses ADC so we don't have to pass in credentials

    blobs = client.list_blobs(bucket_or_name=bucket_name, 
                              match_glob=glob)
    return [f"gs://{bucket_name}/{blob.name}" for blob in blobs]
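Both helpers above assemble `gs://` URIs with f-strings, and it's easy to interpolate the wrong thing (a `Bucket` object rather than its name, say). A pair of hypothetical helpers, `build_gcs_uri` and `parse_gcs_uri`, keeps that formatting in one place; the parser is also handy when you need the bucket and blob name back from a URI such as `sigrid_video_uri`. This is a plain string-handling sketch, not part of the `google-cloud-storage` API:

```python
def build_gcs_uri(bucket_name: str, blob_name: str) -> str:
    """Hypothetical helper: build a gs:// URI from a bucket name and blob name."""
    return f"gs://{bucket_name}/{blob_name.lstrip('/')}"

def parse_gcs_uri(uri: str) -> tuple:
    """Hypothetical helper: split a gs:// URI into (bucket_name, blob_name)."""
    prefix = "gs://"
    if not uri.startswith(prefix):
        raise ValueError(f"Not a GCS URI: {uri}")
    bucket_name, _, blob_name = uri[len(prefix):].partition("/")
    if not bucket_name:
        raise ValueError(f"Missing bucket name in: {uri}")
    return bucket_name, blob_name

uri = build_gcs_uri("my-project-bucket", "video_files/Sigrid.mp4")
print(uri)                 # gs://my-project-bucket/video_files/Sigrid.mp4
print(parse_gcs_uri(uri))  # ('my-project-bucket', 'video_files/Sigrid.mp4')
```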


### Upload our Videos to GCS

In [None]:
folder = "video_files"
logger.info(f"Uploading .mp4 files to a bucket folder called {folder}...")
for file in Path(output_locn).glob('*.mp4'):
    response = upload_to_gcs(bucket_name=BUCKET_NAME, 
                             src_file_name=file,
                             dest_name=f"{folder}/{file.name}")
    
folder = "audio_files"
logger.info(f"Uploading .mp3/.m4a files to a bucket folder called {folder}...")
for ext in ('*.mp3', '*.m4a'):
    for file in Path(output_locn).glob(ext):
        response = upload_to_gcs(bucket_name=BUCKET_NAME, 
                                 src_file_name=file,
                                 dest_name=f"{folder}/{file.name}")

In [None]:
# Test retrieval
matched_files = get_gcs_uris(BUCKET_NAME)
for file_uri in matched_files:
    logger.info(file_uri)
    
sigrid_video_uri = get_gcs_uris(BUCKET_NAME, glob="video_files/Sigrid*.mp4")[0]
logger.info(f"{sigrid_video_uri=}")

### Using the Vertex AI SDK

Start by installing the **Google Cloud Vertex AI SDK for Python**. 

From [Introduction to the Vertex AI SDK for Python](https://cloud.google.com/vertex-ai/docs/python-sdk/use-vertex-ai-python-sdk#sdk-vs-client-library):

> When you install the Vertex AI SDK for Python (`google.cloud.aiplatform`), the Vertex AI Python client library (`google.cloud.aiplatform.gapic`) is also installed. The Vertex AI SDK and the Vertex AI Python client library provide similar functionality with different levels of granularity. The Vertex AI SDK operates at a higher level of abstraction than the client library and is suitable for most common data science workflows. If you need lower-level functionality, then use the Vertex AI Python client library.

In [None]:
# Install Vertex AI SDK for Python
%pip install --upgrade google-cloud-aiplatform 


At this point, if this is the first time installing the Vertex AI SDK in your notebook, you might want to **restart your kernel / runtime.**

Now let's load a model. Gemini 1.5 Flash (here, the `gemini-1.5-flash-002` version) is a multimodal model: you can include text, images, audio, and video in your prompt requests and get text or code responses.

In [None]:
import vertexai # Google Cloud Vertex Generative AI SDK for Python
from vertexai.generative_models import GenerationConfig, GenerativeModel, Part
from google.api_core.exceptions import ResourceExhausted

In [None]:
vertexai.init(project=PROJECT_ID, location=REGION)
model = GenerativeModel("gemini-1.5-flash-002")

#### Test with a Simple Prompt

In [None]:
response = model.generate_content("Write a story about a silly black and white cat called Mycroft")
display(Markdown(f"### A Story About Mycroft\n\n{response.text}"))

#### Process a Video

Let's pass a video stored in GCS to the model and ask it some questions, including a request to transcribe the lyrics. This might take a minute or two, so be patient!

In [None]:
# The pro-vision model gave me nonsense lyrics
# model = GenerativeModel("gemini-1.0-pro-vision")

try:
    # See if already defined
    sigrid_video_uri # type: ignore
except NameError:
    sigrid_video_uri = get_gcs_uris(BUCKET_NAME, glob="video_files/Sigrid*.mp4")[0]
    
logger.info(f"{sigrid_video_uri=}")

video = Part.from_uri(
    uri=sigrid_video_uri,
    mime_type="video/mp4",
)

# But we could also do this...
# video = Part.from_data(data=video_bytes, mime_type="video/mp4")

prompt = """
What is shown in this video?
Who is the artist?
What are the lyrics?
Can you summarise them?
What is the meaning of the lyrics?
"""

contents = [prompt, video]

try:
    logger.info("Asking the model. Please wait...")
    display(Markdown(f"### Prompt:\n\n{prompt}"))
    response = model.generate_content(contents, stream=False)    

    display(Markdown(f"### Response:\n"))
    display(Markdown(f"{response.text}"))
except ResourceExhausted as e:
    logger.warning(f"Resource exhausted: {e}")
except Exception as e: # Handle other exceptions separately
    logger.exception(f"An unexpected error occurred: {e}")   


#### Results

Wow, that's pretty amazing! This seems like the perfect solution!

#### Will It Be Faster With Just Audio Files?

This works, but it is a little slow, and potentially costly. I wonder if we can do the same, but using just our audio files instead?

In [None]:
try:
    # See if already defined
    sigrid_audio_uri # type: ignore 
except NameError:
    sigrid_audio_uri = get_gcs_uris(BUCKET_NAME, glob="audio_files/Sigrid*.m4a")[0]
    
logger.info(f"{sigrid_audio_uri=}")

audio = Part.from_uri(
    uri=sigrid_audio_uri,
    mime_type="audio/mp4", # .m4a files are AAC audio in an MP4 container
)

# But we could also do this...
# audio = Part.from_data(data=audio_bytes, mime_type="audio/mp4")

prompt = """
In this audio file, please tell me:
- What are the lyrics?
- Can you summarise them?
- What is the meaning of the lyrics?
"""

contents = [prompt, audio] # multimodal input

try:
    logger.info("Asking the model. Please wait...")
    display(Markdown(f"### Prompt:\n\n{prompt}"))
    response = model.generate_content(contents, stream=False)    

    display(Markdown(f"### Response:\n"))
    display(Markdown(f"{response.text}"))
except ResourceExhausted as e:
    logger.warning(f"Resource exhausted: {e}")
except Exception as e: # Handle other exceptions separately
    logger.exception(f"An unexpected error occurred: {e}")   
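Each cell hard-codes its MIME type, which can easily drift out of step with the file actually being sent. Our audio files are AAC audio in an MP4 container (`.m4a`), for which `audio/mp4` is the standard MIME type. A small lookup keyed on the file extension is one way to keep the two consistent. The helper name `mime_type_for` is hypothetical, and the mapping is an assumption: check it against the MIME types the Vertex AI documentation lists for your model.

```python
from pathlib import PurePosixPath

# Assumed mapping; verify against the MIME types your Gemini model supports
MIME_TYPES = {
    ".mp4": "video/mp4",
    ".m4a": "audio/mp4",
    ".mp3": "audio/mpeg",
}

def mime_type_for(uri: str) -> str:
    """Hypothetical helper: pick a MIME type from a file path or gs:// URI extension."""
    suffix = PurePosixPath(uri).suffix.lower()
    try:
        return MIME_TYPES[suffix]
    except KeyError:
        raise ValueError(f"No MIME type known for '{suffix}' ({uri})") from None

print(mime_type_for("gs://my-bucket/audio_files/Sigrid.m4a"))  # audio/mp4
```

It would then be used as, for example, `Part.from_uri(uri=sigrid_audio_uri, mime_type=mime_type_for(sigrid_audio_uri))`.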


#### Results

Amazing! Super-easy to modify the code, and it runs in a fraction of the time!

#### Ukrainian Translation?

One last test... Can it do the same with my Ukrainian song?

In [None]:
try:
    # See if already defined
    ukrainian_audio_uri # type: ignore 
except NameError:
    ukrainian_audio_uri = get_gcs_uris(BUCKET_NAME, glob="audio_files/alyona*.m4a")[0]
    
logger.info(f"{ukrainian_audio_uri=}")

audio = Part.from_uri(
    uri=ukrainian_audio_uri,
    mime_type="audio/mp4", # .m4a files are AAC audio in an MP4 container
)

prompt = """
In this audio file, please tell me:
- What languages are being sung?
- What are the lyrics? Please show me in the native language, and translated to English.
- What is the meaning of the lyrics?
"""

contents = [prompt, audio] # multimodal input

try:
    logger.info("Asking the model. Please wait...")
    display(Markdown(f"### Prompt:\n\n{prompt}"))
    response = model.generate_content(contents, stream=False)    

    display(Markdown(f"### Response:\n"))
    display(Markdown(f"{response.text}"))
except ResourceExhausted as e:
    logger.warning(f"Resource exhausted: {e}")
except Exception as e: # Handle other exceptions separately
    logger.exception(f"An unexpected error occurred: {e}")   


This is amazing! It detects the correct language, transcribes it, and translates it.  And it does a MUCH BETTER JOB than the Video Intelligence API!!

## But Can I Do All This Without a Google Cloud Project?

### Yes - Use the Gemini API

All you need is an API key!

In [None]:
# Install the Gemini API SDK
%pip install --upgrade google-generativeai

#### Test a Simple Prompt

In [None]:
import google.generativeai as genai # Use the Gemini API

genai.configure(api_key=GEMINI_API_KEY)
model = genai.GenerativeModel("gemini-1.5-flash-002")

In [None]:
response = model.generate_content("Write a story about a silly black and white cat called Mycroft")
display(Markdown(f"### A Story About Mycroft\n\n{response.text}"))

#### Test with a Video

In [None]:
import time

video_file = next(Path(output_locn).glob(f'Sigrid*.mp4'), None)
uploaded = genai.upload_file(video_file)
logger.info(f"{uploaded=}")

# Videos need to be processed before you can use them.
while uploaded.state.name == "PROCESSING":
    logger.info("Processing video...")
    time.sleep(5)
    uploaded = genai.get_file(uploaded.name)

response = model.generate_content([uploaded, "Describe this video clip"])
display(Markdown(f"### Video Description\n\n{response.text}"))
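The polling loop above waits forever if a file never leaves the `PROCESSING` state. Here's a minimal sketch of a hypothetical helper, `wait_while_processing`, that adds a timeout. It takes any zero-argument function that returns the current state name, so it can be tried offline with simulated states:

```python
import time
from typing import Callable

def wait_while_processing(get_state: Callable[[], str],
                          poll_seconds: float = 5.0,
                          timeout_seconds: float = 600.0) -> str:
    """Hypothetical helper: poll get_state() until it stops returning
    'PROCESSING', raising TimeoutError if that takes too long."""
    deadline = time.monotonic() + timeout_seconds
    state = get_state()
    while state == "PROCESSING":
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Still processing after {timeout_seconds} seconds")
        time.sleep(poll_seconds)
        state = get_state()
    return state

# Simulated states, standing in for repeated genai.get_file() calls
states = iter(["PROCESSING", "PROCESSING", "ACTIVE"])
print(wait_while_processing(lambda: next(states), poll_seconds=0))  # ACTIVE
```

In the notebook, `get_state` could be `lambda: genai.get_file(uploaded.name).state.name`. Once it returns, re-fetch the file and check that its state is `ACTIVE` (rather than, for example, `FAILED`) before calling `generate_content()`.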


#### Try the Ukrainian Song

In [None]:
import time

audio_file = next(Path(output_locn).glob(f'alyona*.m4a'), None)
uploaded = genai.upload_file(audio_file)
logger.info(f"{uploaded=}")

# Uploaded files need to be processed before you can use them.
while uploaded.state.name == "PROCESSING":
    logger.info("Processing audio...")
    time.sleep(5)
    uploaded = genai.get_file(uploaded.name)

prompt = """
In this audio file, please tell me:
- What languages are being sung?
- What are the lyrics? Please show me in the native language, and translated to English.
- What is the meaning of the lyrics?
"""

response = model.generate_content([uploaded, prompt])
display(Markdown(f"### Audio Description\n\n{response.text}"))

## Conclusions

This was amazing!

- We can use the multimodal Gemini model to transcribe audio from our video clips, including songs, or from our audio files.
- It transcribes with a high degree of accuracy; much better than the Video Intelligence API.
- It automatically detects the language of the source material.
- It successfully translates.
- You can do this using the Vertex AI API, but you'll need a Google Cloud project with the necessary APIs enabled.
- Or, you can use the Gemini API. The capability is the same, but you don't need a Google Cloud project. All you need is an API key.