# 🔊 Adding AI-Generated Voiceovers to Silent Footage

<a href="https://colab.research.google.com/github/video-db/videodb-cookbook/blob/main/examples/AI_Voiceover.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Overview

Voiceovers are the secret sauce that turns silent footage into captivating stories. They add depth, emotion, and excitement, elevating the viewing experience.

Traditionally, this workflow required stitching together multiple tools: one for script writing (LLM), one for voice generation (TTS), and another for video editing.

**VideoDB** simplifies this by bringing everything under one roof. In this tutorial, we will:
1.  **Upload** a silent video.
2.  **Analyze** the video to understand its visual content.
3.  **Generate** a narration script using VideoDB's text generation.
4.  **Generate** a professional AI voiceover using VideoDB's voice generation.
5.  **Merge** them instantly into a final video.


We will take [this silent footage](https://youtu.be/RcRjY5kzia8) of underwater life and automatically generate a nature-documentary-style narration for it.

---

## Setup

### 📦 Installing VideoDB

In [None]:
%pip -q install videodb



### 🔑 API Keys
You only need your **VideoDB API Key**.
> Get your API key from the [VideoDB Console](https://console.videodb.io) (free for the first 50 uploads, **no credit card required**).

In [None]:
import videodb
import os
from getpass import getpass

# Prompt user for API key securely
api_key = getpass("Please enter your VideoDB API Key: ")
os.environ["VIDEO_DB_API_KEY"] = api_key

Please enter your VideoDB API Key: ··········
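
When the notebook is re-run, or executed without an interactive prompt, the key may already be set in the environment. A minimal sketch of checking `VIDEO_DB_API_KEY` first and prompting only when it is missing; `resolve_api_key` is a hypothetical helper written for this tutorial, not part of the VideoDB SDK:

```python
import os
from getpass import getpass


def resolve_api_key(env=None, prompt=getpass):
    """Return the API key from the environment, prompting only if it is missing.

    Illustrative helper (not a VideoDB function). `env` and `prompt` are
    injectable so the logic can be exercised without a real terminal.
    """
    env = os.environ if env is None else env
    key = env.get("VIDEO_DB_API_KEY", "").strip()
    if not key:
        key = prompt("Please enter your VideoDB API Key: ").strip()
        if not key:
            raise ValueError("A VideoDB API key is required.")
        # Store the key so later cells can pick it up from the environment.
        env["VIDEO_DB_API_KEY"] = key
    return key
```

Calling `api_key = resolve_api_key()` in place of the `getpass` line above would keep the rest of the notebook unchanged.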


---

## Implementation


### 🌐 Step 1: Connect to VideoDB
Connect to VideoDB to establish a session. `connect()` picks up the API key from the `VIDEO_DB_API_KEY` environment variable set above.

In [None]:
from videodb import connect

# Connect to VideoDB
conn = connect()
coll = conn.get_collection()

### 🎥 Step 2: Upload Video
We'll upload the silent underwater footage directly from YouTube.

In [None]:
# Upload a video by URL
video = coll.upload(url='https://youtu.be/RcRjY5kzia8')
print(f"Uploaded ID: {video.id}")

Uploaded ID: m-z-019beaef-56ce-7f73-b0a9-1b243ba3b5f3


Let's have a look at the video.

In [None]:
video.play()

### 🔍 Step 3: Analyze Visuals
We need to know what is happening in the video to write a script for it. We'll use `index_scenes()` to analyze the visual content.

In [None]:
print("Indexing scenes... this might take a moment.")
video_scenes_id = video.index_scenes()

Indexing scenes... this might take a moment.


Let's view the description of the first scene of the video.

In [None]:
import json

# Fetch the indexed scenes and pretty-print the first one
video_scenes = video.get_scene_index(video_scenes_id)
print(json.dumps(video_scenes[0], indent=2))

{
  "description": "The scene immerses the viewer in a vibrant, fluid expanse dominated by myriad blue and aqua forms. These countless, somewhat irregular shapes are densely packed, giving the impression of an immense, teeming mass in constant, gentle motion. Each form possesses a darker core that gradually lightens towards its edges, creating a translucent, almost glowing effect, as if illuminated from within. The varying shades, ranging from deep sapphire to brilliant turquoise, blend and shift across the frame, conjuring the image of a vast underwater environment. It evokes a colossal school of luminous marine creatures, perhaps fish or jellyfish, drifting together in a mesmerizing, organic dance, filling the visual field with their shimmering presence and dynamic, watery energy.",
  "end": 15.033,
  "metadata": {},
  "scene_metadata": {},
  "start": 0.0
}
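
Printing every scene in full gets long quickly. A minimal sketch of a compact overview, assuming each scene is a dict with `start`, `end`, and `description` keys as in the output above; `summarize_scenes` and the sample data are hypothetical and only illustrate the shape:

```python
def summarize_scenes(scenes, max_chars=60):
    """Return one line per scene: start and end in seconds plus a shortened description."""
    lines = []
    for scene in scenes:
        # Collapse whitespace so multi-line descriptions fit on one line.
        text = " ".join(scene["description"].split())
        if len(text) > max_chars:
            text = text[: max_chars - 3].rstrip() + "..."
        lines.append(f"{scene['start']:7.2f}s - {scene['end']:7.2f}s  {text}")
    return lines


# Sample data in the same shape as the scene shown above (values are made up).
sample_scenes = [
    {"start": 0.0, "end": 15.033, "description": "A dense school of blue forms drifting."},
    {"start": 15.033, "end": 28.5, "description": "Fish weave through a green underwater forest."},
]
for line in summarize_scenes(sample_scenes):
    print(line)
```

In the notebook, `summarize_scenes(video_scenes)` would give the same overview for the real index.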


### 📝 Step 4: Generate Script
Now, we use VideoDB's `generate_text` method to write a voiceover script based on the scene descriptions we just retrieved.

In [None]:
# Construct a prompt with the scene context
scene_context = "\n".join([f"- {scene['description']}" for scene in video_scenes])

prompt = f"""
Here is a visual description of a video about the underwater world:
{scene_context}

Based on this, write a short, engaging voiceover script in the style of a nature documentary narrator (like David Attenborough).
Keep it synced to the flow of the visuals described.
Return ONLY the raw text of the narration, no stage directions or titles.
"""

# Generate the script using VideoDB
script_response = coll.generate_text(
    prompt=prompt,
    model_name="pro"
)

# In this run generate_text returned a dict with the narration under the "output" key
# (see the printed result below). Keep the response as returned; Step 5 reads that key.
voiceover_script = script_response

print("--- Generated Script ---")
print(voiceover_script)

--- Generated Script ---
{'output': "In the vast, sunlit expanses of the ocean, life gathers in extraordinary numbers. Here, a living galaxy of countless beings drifts on the currents, each a shimmering pulse in a single, immense organism.\n\nFor many, survival depends on staying together. A school of fish moves as one mind, a mesmerising ballet of silver and blue. This is a fortress in motion, where the sheer confusion of numbers offers protection from any single threat.\n\nIn more sheltered waters, other communities thrive. Here, in a submerged forest, intricate gardens of green provide both food and sanctuary. Fish, great and small, navigate these verdant corridors—a delicate, perfectly balanced world built on the partnership between plant and animal.\n\nBut it is on the coral reef that life explodes into its most vibrant and spectacular forms. These are not rocks, but vast colonies of tiny animals, architects of sprawling cities that are home to a quarter of all marine species. A
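
The printed result is a dict rather than plain text, so downstream code has to know which key holds the narration. A minimal sketch of a normalizer that accepts either shape; `extract_script` is a hypothetical helper, `"output"` is the key seen in the run above, and `"text"` is only a fallback guess that other SDK versions might use:

```python
def extract_script(response):
    """Return the narration as a plain string from a dict or str response."""
    if isinstance(response, str):
        return response.strip()
    if isinstance(response, dict):
        # "output" matches the run shown above; "text" is an assumed alternative.
        for key in ("output", "text"):
            value = response.get(key)
            if isinstance(value, str):
                return value.strip()
    raise TypeError(f"Unexpected response type or shape: {type(response).__name__}")


# Works for both a dict response and a bare string.
print(extract_script({"output": "In the vast, sunlit expanses of the ocean..."}))
print(extract_script("A school of fish moves as one mind.\n"))
```

Failing loudly on an unknown shape is deliberate: passing a dict or an empty string on to voice generation would be harder to diagnose later.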

### 🎙️ Step 5: Generate Voiceover Audio
We can now turn that text into speech using `generate_voice`. This returns an Audio object directly, so we don't need to save or upload files manually.

In [None]:
print("Generating voiceover audio...")

# Generate speech directly as a VideoDB Audio Asset
audio = coll.generate_voice(
    text=voiceover_script['output'],
    voice_name="Default"
)

print(f"Generated Audio Asset ID: {audio.id}")

Generating voiceover audio...
Generated Audio Asset ID: a-z-019beb01-927e-7272-94e6-63ee182b971a
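
The narration can come out longer or shorter than the footage, since nothing in the prompt ties the script length to the video length. A rough sketch of a word budget, assuming a speaking rate of about 150 words per minute; that rate is an illustrative assumption, not a VideoDB value, and both helpers are hypothetical:

```python
def estimate_speech_seconds(text, words_per_minute=150):
    """Rough narration length in seconds at the assumed speaking rate."""
    words = len(text.split())
    return words / words_per_minute * 60


def max_words_for(duration_seconds, words_per_minute=150):
    """Largest word count expected to fit in the given duration at the assumed rate."""
    return int(duration_seconds * words_per_minute / 60)


# Example: budget for the 15.033-second first scene shown earlier.
budget = max_words_for(15.033)
print(f"Aim for at most {budget} words")
print(f"Estimated length: {estimate_speech_seconds('word ' * budget):.1f}s")
```

A budget like `max_words_for(float(video.length))` could be added to the Step 4 prompt as a word limit, and the actual result checked afterwards against `audio.length`.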


### 🎬 Step 6: Compose the Video
We have the video and the generated voiceover. Now we merge them using the Timeline Editor.

In [None]:
from videodb.editor import Timeline, Track, Clip, VideoAsset, AudioAsset

# Create a timeline
timeline = Timeline(conn)

# 1. Create a Video Track
video_track = Track()
video_asset = VideoAsset(id=video.id)
# Add the video clip
video_clip = Clip(asset=video_asset, duration=float(video.length))
video_track.add_clip(0, video_clip)

# 2. Create an Audio Track for the voiceover
audio_track = Track()
# Use the audio object we generated in Step 5
audio_asset = AudioAsset(id=audio.id)
audio_clip = Clip(asset=audio_asset, duration=float(audio.length))
audio_track.add_clip(0, audio_clip)

# Add tracks to timeline
timeline.add_track(video_track)
timeline.add_track(audio_track)
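
The two clips above use their own lengths, so the audio track can outlast the video track. The source does not say how the stream behaves when track lengths differ, so one cautious option is to cap the audio clip at the video length. A minimal sketch; `fit_audio_duration` is a hypothetical helper:

```python
def fit_audio_duration(video_length, audio_length):
    """Return the audio clip duration in seconds, capped at the video length."""
    # Lengths may arrive as strings or numbers, so convert before comparing.
    video_length = float(video_length)
    audio_length = float(audio_length)
    if video_length <= 0 or audio_length <= 0:
        raise ValueError("Lengths must be positive.")
    return min(video_length, audio_length)


# Made-up lengths: a 60-second video with a 72.5-second narration.
print(fit_audio_duration("60.0", 72.5))
```

The result would replace `float(audio.length)` as the `duration` of the audio clip. Capping cuts the narration off at the end of the footage, so a shorter script is usually the better fix.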

### 🪄 Step 7: Review and Share
Generate the final stream URL and watch your AI-narrated video!

In [None]:
from videodb import play_stream

stream_url = timeline.generate_stream()
print(f"Stream URL: {stream_url}")
play_stream(stream_url)

Stream URL: https://play.videodb.io/v1/920e9b2d-a122-4d8a-8d25-15d02aea2637.m3u8


---

### 🎉 Conclusion
Congratulations! You have automated the creation of a custom voiceover from a simple prompt and raw, silent footage, with scene analysis, script writing, voice generation, and editing all handled by VideoDB.

Experiment with different prompts and scene analysis techniques to improve how closely the narration matches the footage. Enjoy creating captivating narratives with AI-powered voiceovers using VideoDB!

Explore more at [docs.videodb.io](https://docs.videodb.io/).