Extract frames from videos and analyze them using AI-powered image recognition.
VisionFrameAnalyzer is a Go-based tool that:
- Extracts frames from a video at set intervals using ffmpeg
- Uses an AI-powered vision model to analyze and describe each frame
- Provides a structured pipeline for video-to-image processing
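The frame-extraction step can be sketched in Go by shelling out to ffmpeg. The `fps` filter value and the `frame_%04d.jpg` output pattern below are assumptions chosen to match the output layout described later, not necessarily the project's exact flags:

```go
package main

import (
	"fmt"
	"os/exec"
)

// ffmpegArgs builds the argument list for extracting one frame every
// `interval` seconds from `video` into `outDir`. The fps filter and the
// frame_%04d.jpg naming pattern are illustrative assumptions.
func ffmpegArgs(video, outDir string, interval int) []string {
	return []string{
		"-i", video,
		"-vf", fmt.Sprintf("fps=1/%d", interval),
		fmt.Sprintf("%s/frame_%%04d.jpg", outDir),
	}
}

// extractFrames runs ffmpeg, which must be on PATH (brew install ffmpeg).
func extractFrames(video, outDir string, interval int) error {
	return exec.Command("ffmpeg", ffmpegArgs(video, outDir, interval)...).Run()
}
```

Keeping argument construction separate from execution makes the command easy to inspect and test without invoking ffmpeg.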
- Frame Extraction: converts video frames into images
- AI-Powered Analysis: describes each frame using an LLM vision model
- Multi-Frame Processing: handles multiple frames efficiently
- Detailed Logging: provides structured logs for debugging
- Go (Golang)
- FFmpeg (Frame Extraction)
- Ollama (LLM-powered image analysis)
- Slog + Tint (Logging)
- Kubernetes Ready (Optional Multi-Cluster Support)
brew install ffmpeg
brew install ollama
go mod tidy
# Build the container
docker build -t vision-analyzer .
# Run the container
docker run -v $(pwd):/data vision-analyzer --video /data/input.mp4 --output /data/frames
- Ensure Ollama is running locally on port 11434
- The tool uses the `llama3.2-vision:11b` model by default
- `--video`: Path to input video file (required)
- `--output`: Output directory for frames (default: "output_frames")
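The two flags above can be parsed with Go's standard `flag` package. A sketch using a dedicated `flag.FlagSet` so the logic is testable without touching `os.Args`; the function name is hypothetical:

```go
package main

import (
	"flag"
	"fmt"
)

// parseFlags handles the two documented flags: --video (required) and
// --output (defaulting to "output_frames").
func parseFlags(args []string) (video, output string, err error) {
	fs := flag.NewFlagSet("vision-analyzer", flag.ContinueOnError)
	fs.StringVar(&video, "video", "", "path to input video file (required)")
	fs.StringVar(&output, "output", "output_frames", "output directory for frames")
	if err = fs.Parse(args); err != nil {
		return
	}
	if video == "" {
		err = fmt.Errorf("--video is required")
	}
	return
}
```

The `flag` package accepts both `-video` and `--video`, so the usage examples below work unchanged.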
# Basic usage
go run main.go --video path/to/video.mp4
# Specify custom output directory
go run main.go --video path/to/video.mp4 --output custom_output
# Show help
go run main.go --help
output_frames/
└── video_name/
    ├── frame_0001.jpg
    ├── frame_0002.jpg
    ├── analysis_results.json
    └── ...
The `analysis_results.json` file contains a frame-by-frame analysis:
[
{
"frame": "frame_0001.jpg",
"content": "Detailed analysis of frame contents..."
}
]
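The results file maps directly onto a small Go struct, which downstream tooling can use to read the analysis back. The type and function names here are illustrative, not the project's actual identifiers:

```go
package main

import (
	"encoding/json"
)

// FrameAnalysis matches one entry in analysis_results.json.
type FrameAnalysis struct {
	Frame   string `json:"frame"`
	Content string `json:"content"`
}

// loadResults decodes the full results array from raw JSON bytes,
// e.g. the contents of output_frames/video_name/analysis_results.json.
func loadResults(data []byte) ([]FrameAnalysis, error) {
	var results []FrameAnalysis
	err := json.Unmarshal(data, &results)
	return results, err
}
```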
- Automated Video Analysis: extract insights from video feeds
- Content Moderation: detect and describe images in video content
- Machine Learning Pipelines: pre-process video datasets for AI models
MIT License. See LICENSE for details.