This repository demonstrates how to build a long-form video generation agent using the Google Agent Development Kit (ADK), Gemini 3.1 Flash Image (Nano Banana 2), and Veo 3.1.
The purpose of the agent is to generate long-form videos with custom avatars delivering educational content.
It demonstrates character and environment consistency techniques that allow producing long videos as a series of 8-second chunks.
It also shows how to convert technical documentation into video scripts that sound natural and engaging.
We provide demo assets in the assets sub-folder:
- 4 images to use as starting frames.
- Full prompt with the following sections:
  - Character Description
  - Voice Description
  - Visual Appearance
  - Video Shot Instructions
  - Document to adapt and split across video chunks
- 📄 Original documentation to deliver as a training video: Safety and Security for AI Agents.
- 🎬 Final demo video.
- Conversion of the original text content so that it sounds natural when spoken
- Continuous video generation with character and scene consistency
It is a full-stack web application designed to be deployed on Google Cloud Run, with ADK Web UI, using Vertex AI Agent Engine Sessions Service for session management and Google Cloud Storage for storing artifacts.
- Orchestrator (root agent) - The main agent that orchestrates the video generation process. It takes user input and calls sub-agents to perform specific tasks.
- Script Sequencer - Adapts the content into a script that sounds natural when delivered by a speaker. Splits the script into chunks of up to 8 seconds each.
- Video Agent - Facilitates video generation according to the instructions and input provided by the Orchestrator.
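
The chunking step performed by the Script Sequencer can be illustrated with a minimal sketch. In this repository the splitting is done by an agent, so the code below is not its implementation; it only shows the idea of packing whole sentences into chunks that fit the 8-second limit. The speaking rate of 2.5 words per second and the function names are assumptions made for illustration.

```python
import re

WORDS_PER_SECOND = 2.5   # assumed speaking rate; tune for the avatar's voice
MAX_CHUNK_SECONDS = 8.0  # chunk length limit stated in this README


def estimate_seconds(text: str) -> float:
    """Rough spoken duration of text, based on word count only."""
    return len(text.split()) / WORDS_PER_SECOND


def split_script(script: str, max_seconds: float = MAX_CHUNK_SECONDS) -> list[str]:
    """Greedily pack whole sentences into chunks that fit max_seconds.

    A single sentence longer than max_seconds is kept whole in its own
    chunk; a real pipeline would need to rephrase or split it further.
    """
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", script) if s.strip()]
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and estimate_seconds(candidate) > max_seconds:
            # Adding this sentence would exceed the limit: close the chunk.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

Splitting on sentence boundaries keeps each chunk a complete thought, which helps the speech sound natural when the generated clips are joined.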
MediaGenerators MCP Server with 2 tools:
- generate_video - Uses the Veo 3.1 model to generate videos. It can use start and end frames in addition to the text prompt.
- generate_image - Uses Gemini 3.1 Flash Image (Nano Banana 2 🍌) to generate images. It can use a source image as a reference. This tool is not used by the repo's agent.
- An existing Google Cloud Project. New customers get $300 in free credits to run, test, and deploy workloads.
- Google Cloud SDK.
- Python 3.11+.
- Clone the repository:

      git clone https://github.com/vladkol/video-avatars-agent
      cd video-avatars-agent

- Create a Python virtual environment and activate it. We recommend using uv:

      uv venv .venv
      source .venv/bin/activate

- Install the Python dependencies:

      uv pip install pip
      uv pip install -r agents/video_avatar_agent/requirements.txt
      uv pip install -r mcp/requirements.txt

- Create a .env file in the root of the project by copying the .env-template file:

      cp .env-template .env

- Update the .env file with your Google Cloud project ID, location, and the name of your GCS bucket for AI assets.
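
Before starting the servers, it can help to confirm that the .env file has no empty values. The sketch below parses simple KEY=VALUE lines and reports missing entries. The variable names in REQUIRED_KEYS are hypothetical placeholders; check .env-template for the names this project actually uses.

```python
from pathlib import Path

# Hypothetical variable names; replace with the ones from .env-template.
REQUIRED_KEYS = ("GOOGLE_CLOUD_PROJECT", "GOOGLE_CLOUD_LOCATION", "AI_ASSETS_BUCKET")


def parse_env(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip("\"'")
    return values


def missing_keys(values: dict[str, str], required=REQUIRED_KEYS) -> list[str]:
    """Return the required keys that are absent or empty."""
    return [key for key in required if not values.get(key)]


if __name__ == "__main__":
    env_path = Path(".env")
    if env_path.exists():
        missing = missing_keys(parse_env(env_path.read_text()))
        print("Missing values:", ", ".join(missing) if missing else "none")
    else:
        print(".env not found; copy .env-template first")
```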
To start the MCP server, run:

    ./deployment/run_mcp_local.sh

The MCP server will run on http://localhost:8080.
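
A quick way to confirm that something is listening on that port is a plain TCP connection check, sketched below. It only verifies that the port accepts connections; it does not speak the MCP protocol or validate the tools. The function name is an illustrative choice, not part of this repository.

```python
import socket


def is_port_open(host: str = "localhost", port: int = 8080, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name resolution failures.
        return False


if __name__ == "__main__":
    state = "reachable" if is_port_open() else "not reachable"
    print(f"MCP server port 8080 is {state}")
```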
To run the agent locally, use the run_agent_local.sh script:

    ./deployment/run_agent_local.sh

This will:
- Register an Agent Engine resource for use with the session service.
- Start a local web server with the ADK Web UI, which you can access in your browser.
To deploy the MCP server and the agent to Cloud Run, use the deploy.sh script:

    ./deployment/deploy.sh

This script will:
- Register an Agent Engine resource for use with the session service.
- Deploy the MCP server to Cloud Run.
- Deploy the agent to Cloud Run, with the ADK Web UI.
- Open the agent's ADK Web UI.
- Paste the content of the assets/prompt.md file into the chat box.
- Click the paperclip button 📎 and attach the 4 source strip files.

  Note: It is important to select the view1.jpeg file first. The first view is what the video starts with.
- Press Enter to submit the request. The agent will start converting the script and generating videos.
This repository is licensed under the Apache 2.0 License - see the LICENSE file for details.
This is not an officially supported Google product. This project is not eligible for the Google Open Source Software Vulnerability Rewards Program.
Code and data from this repository are intended for demonstration purposes only. They are not intended for use in a production environment.