Skip to content

veup-engineering/media-tools

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

5 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

#media-tools at VeUP

Video AI Tagger

A command-line tool that uses a multi-modal Large Language Model (LLM) to automatically analyze video files and generate descriptive metadata in a structured JSON format.

Given a video file, the tool extracts a series of frames, sends them to an OpenAI-compatible API, and saves the AI-generated analysis—including a title, summary, tags, and more—as a JSON file.

Features

  • Automated Video Analysis: Leverages models like GPT-4o to understand video content.
  • Structured JSON Output: Generates detailed, easy-to-parse JSON metadata for each video.
  • Batch Processing: Analyze multiple videos in a single command.
  • Flexible Configuration: Works with the official OpenAI API or any OpenAI-compatible endpoint.
  • Easy to Use: Simple command-line interface.

Requirements

  • Python 3.7+
  • An API key for an OpenAI-compatible service that provides a multi-modal model (e.g., GPT-4o).

Setup & Installation

  1. Clone the repository or download the script:

    git clone <repository_url>
    cd videotool
  2. Create and activate a Python virtual environment (recommended):

  3. Create and activate a Python virtual environment (required on many modern systems):

    python3 -m venv venv
    source venv/bin/activate
    # On Windows, use: venv\Scripts\activate
  4. Install the required Python libraries:

    pip install -r requirements.txt
  5. Configure your API credentials: Create a file named .env in the project root and add your API key.

    # .env
    OPENAI_API_KEY="your_api_key_here"
    
    # Optional: If using a custom endpoint, specify the base URL
    # OPENAI_BASE_URL="https://api.your-custom-provider.com/v1"

Usage

The script is run from the command line, passing the paths to the video files you want to analyze.

./video_tagger.py [OPTIONS] <VIDEO_FILE_1> [VIDEO_FILE_2] ...

Examples

Analyze a single video: This will create my_video.json in the same directory as the video file.

./video_tagger.py /path/to/my_video.mp4

Analyze multiple videos:

./video_tagger.py clip1.mov clip2.webm

Save all output to a specific directory: Use the -o or --output-dir flag.

mkdir analysis_results
./video_tagger.py videos/*.mp4 -o analysis_results

Command-Line Arguments

  • videos: (Required) One or more paths to video files.
  • -o, --output-dir: (Optional) Directory to save the output JSON files. If not provided, JSON files are saved next to their corresponding video files.
  • --api-key: (Optional) Your API key. Overrides the key in the .env file.
  • --base-url: (Optional) The base URL for the API. Overrides the URL in the .env file.

Supported Video Formats

This tool uses OpenCV for video processing, which in turn relies on FFmpeg. As a result, it supports a wide variety of common video formats and containers, including but not limited to:

  • MP4 (.mp4, .m4v)
  • MOV (.mov)
  • AVI (.avi)
  • MKV (.mkv)
  • WebM (.webm)
  • WMV (.wmv)

If OpenCV and its FFmpeg backend are correctly installed on your system, most standard video files should work without issue.

Example Output

For a video of a historical speech, the tool will generate a JSON file (<video_name>.json) with content similar to this:

{
  "title": "Martin Luther King Jr.'s 'I Have a Dream' Speech",
  "summary": "The video appears to be archival footage of Dr. Martin Luther King Jr. delivering a powerful and passionate speech to a large crowd. The setting seems to be a major public gathering, likely the March on Washington. The speaker is animated and uses strong gestures, indicating a speech of great significance.",
  "tags": [
    "speech",
    "historical",
    "civil rights",
    "Martin Luther King Jr.",
    "I Have a Dream",
    "activism"
  ],
  "visual_style": "black and white",
  "estimated_date": "c. 1963",
  "sentiment": "inspirational",
  "key_elements": [
    "Martin Luther King Jr.",
    "podium",
    "large crowd",
    "Washington Monument (inferred)",
    "microphones"
  ],
  "audio_inference": "A historic and influential political speech about civil rights, equality, and freedom."
}

About

No description, website, or topics provided.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published