# media-tools at VeUP
A command-line tool that uses a multi-modal Large Language Model (LLM) to automatically analyze video files and generate descriptive metadata in a structured JSON format.
Given a video file, the tool extracts a series of frames, sends them to an OpenAI-compatible API, and saves the AI-generated analysis—including a title, summary, tags, and more—as a JSON file.
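For orientation, the core of such a pipeline can be sketched as follows. This is a minimal illustration rather than the tool's actual source: the prompt text is invented, and `sample_frames` is a hypothetical helper returning JPEG-encoded frames (an OpenCV sketch of it appears under Supported Formats below).

```python
# Minimal sketch of the analysis step, assuming the OpenAI Python SDK.
# Not the tool's actual code: the prompt is invented and sample_frames
# is a hypothetical helper that returns JPEG-encoded frames.
import base64
import json
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def analyze_frames(frames):
    """Send JPEG-encoded frames to a multi-modal model and parse its JSON reply."""
    content = [{"type": "text",
                "text": "Describe this video as JSON with title, summary, and tags."}]
    for jpeg in frames:
        b64 = base64.b64encode(jpeg).decode("ascii")
        content.append({"type": "image_url",
                        "image_url": {"url": f"data:image/jpeg;base64,{b64}"}})
    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": content}],
        response_format={"type": "json_object"},  # request machine-readable output
    )
    return json.loads(resp.choices[0].message.content)
```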
## Features

- Automated Video Analysis: Leverages models like GPT-4o to understand video content.
- Structured JSON Output: Generates detailed, easy-to-parse JSON metadata for each video.
- Batch Processing: Analyze multiple videos in a single command.
- Flexible Configuration: Works with the official OpenAI API or any OpenAI-compatible endpoint.
- Easy to Use: Simple command-line interface.
## Requirements

- Python 3.7+
- An API key for an OpenAI-compatible service that provides a multi-modal model (e.g., GPT-4o).
## Installation

1. Clone the repository or download the script:

   ```bash
   git clone <repository_url>
   cd videotool
   ```

2. Create and activate a Python virtual environment (recommended, and required on many modern systems):

   ```bash
   python3 -m venv venv
   source venv/bin/activate  # On Windows, use: venv\Scripts\activate
   ```

3. Install the required Python libraries:

   ```bash
   pip install -r requirements.txt
   ```

4. Configure your API credentials: create a file named `.env` in the project root and add your API key (see the sketch after these steps for how the script might read it).

   ```bash
   # .env
   OPENAI_API_KEY="your_api_key_here"
   # Optional: If using a custom endpoint, specify the base URL
   # OPENAI_BASE_URL="https://api.your-custom-provider.com/v1"
   ```
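Inside the script, these variables would typically be picked up with python-dotenv and passed to the OpenAI client. The snippet below is an illustrative sketch of that pattern, not the script's actual code:

```python
# Illustrative: resolving credentials from .env (assumes python-dotenv is installed).
import os
from dotenv import load_dotenv
from openai import OpenAI

load_dotenv()  # loads OPENAI_API_KEY / OPENAI_BASE_URL from .env into os.environ

client = OpenAI(
    api_key=os.environ["OPENAI_API_KEY"],
    base_url=os.getenv("OPENAI_BASE_URL"),  # None falls back to the official endpoint
)
```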
## Usage

The script is run from the command line, passing the paths to the video files you want to analyze.

```bash
./video_tagger.py [OPTIONS] <VIDEO_FILE_1> [VIDEO_FILE_2] ...
```

Analyze a single video:

```bash
./video_tagger.py /path/to/my_video.mp4
```

This will create `my_video.json` in the same directory as the video file.

Analyze multiple videos:

```bash
./video_tagger.py clip1.mov clip2.webm
```

Save all output to a specific directory with the `-o` or `--output-dir` flag:

```bash
mkdir analysis_results
./video_tagger.py videos/*.mp4 -o analysis_results
```

### Arguments and Options

- `videos`: (Required) One or more paths to video files.
- `-o, --output-dir`: (Optional) Directory to save the output JSON files. If not provided, JSON files are saved next to their corresponding video files.
- `--api-key`: (Optional) Your API key. Overrides the key in the `.env` file.
- `--base-url`: (Optional) The base URL for the API. Overrides the URL in the `.env` file.
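For reference, a CLI with exactly these arguments could be declared with argparse along the following lines. This is a sketch of one plausible setup; the real script's internals may differ.

```python
# Illustrative argparse setup matching the documented options; not the actual source.
import argparse

parser = argparse.ArgumentParser(
    description="Analyze videos with a multi-modal LLM and write JSON metadata.")
parser.add_argument("videos", nargs="+",
                    help="One or more paths to video files")
parser.add_argument("-o", "--output-dir",
                    help="Directory for output JSON (default: next to each video)")
parser.add_argument("--api-key", help="API key; overrides the .env file")
parser.add_argument("--base-url", help="API base URL; overrides the .env file")
args = parser.parse_args()
```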
## Supported Formats

This tool uses OpenCV for video processing, which in turn relies on FFmpeg. As a result, it supports a wide variety of common video formats and containers, including but not limited to:

- MP4 (`.mp4`, `.m4v`)
- MOV (`.mov`)
- AVI (`.avi`)
- MKV (`.mkv`)
- WebM (`.webm`)
- WMV (`.wmv`)
If OpenCV and its FFmpeg backend are correctly installed on your system, most standard video files should work without issue.
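For illustration, evenly spaced frames can be sampled with OpenCV roughly as follows. This is a sketch under the assumption that the tool samples a fixed number of frames; the function and parameter names are hypothetical.

```python
# Hypothetical frame-sampling helper using OpenCV; illustrative only.
import cv2

def sample_frames(video_path, num_frames=8):
    """Return up to num_frames evenly spaced frames as JPEG-encoded bytes."""
    cap = cv2.VideoCapture(video_path)
    if not cap.isOpened():
        raise ValueError(f"OpenCV could not open {video_path}")
    total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    frames = []
    for i in range(num_frames):
        # Seek to an evenly spaced position in the video.
        cap.set(cv2.CAP_PROP_POS_FRAMES, int(i * total / num_frames))
        ok, frame = cap.read()
        if not ok:
            break
        ok, jpeg = cv2.imencode(".jpg", frame)
        if ok:
            frames.append(jpeg.tobytes())
    cap.release()
    return frames
```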
## Example Output

For a video of a historical speech, the tool will generate a JSON file (`<video_name>.json`) with content similar to this:

```json
{
  "title": "Martin Luther King Jr.'s 'I Have a Dream' Speech",
  "summary": "The video appears to be archival footage of Dr. Martin Luther King Jr. delivering a powerful and passionate speech to a large crowd. The setting seems to be a major public gathering, likely the March on Washington. The speaker is animated and uses strong gestures, indicating a speech of great significance.",
  "tags": [
    "speech",
    "historical",
    "civil rights",
    "Martin Luther King Jr.",
    "I Have a Dream",
    "activism"
  ],
  "visual_style": "black and white",
  "estimated_date": "c. 1963",
  "sentiment": "inspirational",
  "key_elements": [
    "Martin Luther King Jr.",
    "podium",
    "large crowd",
    "Washington Monument (inferred)",
    "microphones"
  ],
  "audio_inference": "A historic and influential political speech about civil rights, equality, and freedom."
}
```
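Because the metadata is a plain JSON sidecar file, downstream scripts can consume it directly. For example (illustrative; `load_metadata` is not part of the tool):

```python
# Illustrative consumer of the generated sidecar JSON.
import json
from pathlib import Path

def load_metadata(video_path):
    """Read the <video_name>.json file written next to a video."""
    json_path = Path(video_path).with_suffix(".json")
    with json_path.open(encoding="utf-8") as f:
        return json.load(f)

meta = load_metadata("my_video.mp4")
print(meta["title"])
print(", ".join(meta.get("tags", [])))
```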