vladkol/video-avatars-agent

Multi-Agent Team for Creating Long-form Videos

This repository demonstrates how to build a long-form video generation agent using the Google Agent Development Kit (ADK), Gemini 3.1 Flash Image (Nano Banana 2), and Veo 3.1.

The purpose of the agent is to generate long-form videos with custom avatars delivering educational content.

It demonstrates character and environment consistency techniques that allow producing long videos as a series of 8-second chunks.

It also shows how to convert technical documentation into video scripts that sound natural and engaging.

We provide demo assets in the assets sub-folder:

  • 4 images to use as starting frames.
  • A full prompt with the following sections:
    1. Character Description.
    2. Voice Description.
    3. Visual Appearance.
    4. Video Shot Instructions.
    5. Document to adapt and split across video chunks.

Features

  • Conversion of the original text content into a script that sounds natural when spoken
  • Continuous video generation with character and scene consistency

Architecture

The agent is a full-stack web application designed to be deployed on Google Cloud Run with the ADK Web UI. It uses the Vertex AI Agent Engine Sessions Service for session management and Google Cloud Storage for storing artifacts.

Agents

  • Orchestrator (root agent) - The main agent that orchestrates the video generation process. It takes user input and calls sub-agents to perform specific tasks.
  • Script Sequencer - Adapts the content into a script that sounds natural when delivered by a speaker, then splits the script into chunks of up to 8 seconds each.
  • Video Agent - Generates videos according to the instructions and input provided by the Orchestrator.
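
The chunking step performed by the Script Sequencer can be illustrated with a minimal sketch. In the repository the splitting is done by the agent itself, so the function below is not the project's implementation. It only shows the idea of packing whole sentences into chunks that fit an 8-second budget. The speaking rate of 2.5 words per second and all function names are assumptions.

```python
import re

# Assumed average speaking rate; not taken from the repository.
WORDS_PER_SECOND = 2.5
# Chunk length limit stated in this README.
MAX_CHUNK_SECONDS = 8.0


def estimate_seconds(text: str, words_per_second: float = WORDS_PER_SECOND) -> float:
    """Roughly estimate the spoken duration of text from its word count."""
    return len(text.split()) / words_per_second


def split_script(script: str, max_seconds: float = MAX_CHUNK_SECONDS) -> list[str]:
    """Greedily pack whole sentences into chunks that fit max_seconds.

    A single sentence longer than the budget is kept as its own
    oversized chunk rather than being cut mid-sentence.
    """
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", script.strip()) if s]
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and estimate_seconds(candidate) > max_seconds:
            # Adding this sentence would exceed the budget: close the chunk.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    demo = (
        "ADK is a framework for building agents. "
        "It supports tools and sub-agents. "
        "Veo generates short video clips. "
        "Each clip is stitched into a longer video."
    )
    for index, chunk in enumerate(split_script(demo), start=1):
        print(f"{index}: ({estimate_seconds(chunk):.1f}s) {chunk}")
```

Splitting on sentence boundaries keeps each chunk self-contained, which matters when every chunk becomes a separate video clip.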

MCP Server

The MediaGenerators MCP server provides 2 tools:

  1. generate_video - uses the Veo 3.1 model to generate videos. It can use a start frame in addition to the text prompt.
  2. generate_image - uses Gemini 3.1 Flash Image (Nano Banana 2 🍌) to generate images. It can use a source image as a reference. This tool is not used by the repo's agent.
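
For orientation, here is a sketch of what a tool invocation looks like at the protocol level. MCP clients call tools with a JSON-RPC 2.0 `tools/call` request that carries the tool name and an arguments object. The argument names below (`prompt`, `start_frame`) and the bucket path are hypothetical placeholders. The actual parameters are defined by the server code in the mcp folder.

```python
import json
from itertools import count

# Monotonically increasing JSON-RPC request ids.
_request_ids = count(1)


def build_tool_call(tool_name: str, arguments: dict) -> dict:
    """Build the body of an MCP `tools/call` JSON-RPC 2.0 request."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


# Hypothetical argument names and values, for illustration only.
request = build_tool_call(
    "generate_video",
    {
        "prompt": "The presenter greets the audience and introduces the topic.",
        "start_frame": "gs://your-bucket/view1.jpeg",
    },
)
print(json.dumps(request, indent=2))
```

In practice the ADK agent sends these requests through its MCP tool integration, so you would not normally build them by hand.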

Prerequisites

Installation

  1. Clone the repository:

    git clone https://github.com/vladkol/video-avatars-agent
    cd video-avatars-agent
  2. Create a Python virtual environment and activate it:

    We recommend using uv

    uv venv .venv
    source .venv/bin/activate
  3. Install the Python dependencies:

    uv pip install pip
    uv pip install -r agents/video_avatar_agent/requirements.txt
    uv pip install -r mcp/requirements.txt

Configuration

  1. Create a .env file in the root of the project by copying the .env-template file:

    cp .env-template .env
  2. Update the .env file with your Google Cloud project ID, location, and the name of your GCS bucket for AI assets.
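
Before running the scripts, it can help to confirm that no setting in the .env file was left empty. The sketch below is a small standalone checker and is not part of the repository. It handles only simple `KEY=value` lines, and the key names in the sample are illustrative assumptions. Use the names that appear in .env-template.

```python
def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=value lines, skipping blanks and # comments."""
    values: dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def find_unset_keys(text: str) -> list[str]:
    """Return the keys whose values are empty, in file order."""
    return [key for key, value in parse_env(text).items() if not value]


# Illustrative sample; the key names are assumptions, not the template's.
sample = """
# Google Cloud settings
GOOGLE_CLOUD_PROJECT=my-project
GOOGLE_CLOUD_LOCATION=
AI_ASSETS_BUCKET=""
"""
print(find_unset_keys(sample))
```

To check a real file, pass its content instead of the sample, for example `find_unset_keys(pathlib.Path(".env").read_text())`.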

Running Locally

To start the MCP server, run:

./deployment/run_mcp_local.sh

The MCP server will run on http://localhost:8080.
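
A quick way to confirm that the server is accepting connections before starting the agent is a plain TCP check. This is a generic sketch, not a script from the repository. It only verifies that something is listening on the port, not that the MCP endpoints respond correctly.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    # Port 8080 is the local MCP server port mentioned above.
    status = "reachable" if is_port_open("localhost", 8080) else "not reachable"
    print(f"MCP server on localhost:8080 is {status}")
```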

To run the agent locally, use the run_agent_local.sh script:

./deployment/run_agent_local.sh

This will:

  1. Register an Agent Engine resource for use with the session service.
  2. Start a local web server with the ADK Web UI, which you can access in your browser.

Deployment

To deploy the MCP server and the agent to Cloud Run, use the deploy.sh script:

./deployment/deploy.sh

This script will:

  1. Register an Agent Engine resource for use with the session service.
  2. Deploy the MCP server to Cloud Run.
  3. Deploy the agent to Cloud Run, with the ADK Web UI.

How to use the agent

  1. Open the agent's ADK Web UI.

  2. Paste the content of the assets/prompt.md file into the chat box.

  3. Click on the paperclip button 📎, and attach 4 source strip files:

    Note: It is important to select the view1.jpeg file first. The first view is the frame the video starts with.

  4. Press Enter to submit the request. The agent will start converting the script and generating videos.

License

This repository is licensed under the Apache 2.0 License - see the LICENSE file for details.

Disclaimers

This is not an officially supported Google product. This project is not eligible for the Google Open Source Software Vulnerability Rewards Program.

Code and data from this repository are intended for demonstration purposes only. They are not intended for use in a production environment.
