Claude Code plugin that lets Claude "watch" videos.
Claude can read images natively but not video. This plugin extracts scene-change frames + audio transcript from any video — local file, public URL (Loom/YouTube/Vimeo/raw mp4), or private URL with auth — and feeds them to Claude as images + text.
/plugin marketplace add https://github.com/vusallyv/video-context-plugin.git
/plugin install video-context@video-context-marketplace
/video-context:setup
/video-context:setup installs ffmpeg, yt-dlp, whisper-cpp, and the whisper model. The step is optional: on macOS, extract.sh auto-installs missing binaries on first use, but the whisper model (~150MB) is fetched only by setup.
Private repo: works with your gh auth (or GITHUB_TOKEN).
Just paste a video link or path and ask Claude to analyze it:
Analyze this recording: https://www.loom.com/share/abc123
What does this video show? /Users/me/Downloads/bug-repro.mp4
Claude detects the video and runs the skill automatically.
Set VIDEO_AUTH_HEADER before invoking Claude (or tell Claude what it is in chat):
export VIDEO_AUTH_HEADER="Authorization: Bearer $TOKEN"
For Atlassian/Jira basic auth:
export VIDEO_AUTH_HEADER="Authorization: Basic $(printf '%s' "$EMAIL:$JIRA_API_TOKEN" | base64 | tr -d '\n')"
- ffmpeg: required
- yt-dlp: recommended (handles Loom/YouTube/Vimeo)
- whisper-cpp + model: optional, for audio transcript
/video-context:setup installs everything on macOS (Homebrew) and Linux (apt/dnf/pacman/zypper; builds whisper-cpp from source). On macOS, extract.sh also auto-installs missing binaries on first use via brew. Windows: install manually.
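If you install the tools manually, a quick check of what is on your PATH can help before the first run. The sketch below is illustrative and not part of the plugin: `missing_tools` is a hypothetical helper, and it checks only ffmpeg and yt-dlp because the whisper-cpp binary name can differ between installs.

```shell
# Hypothetical helper (not shipped with the plugin):
# prints each given command name that cannot be found on PATH.
missing_tools() {
  local tool
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# ffmpeg is required and yt-dlp is recommended, per the list above.
missing=$(missing_tools ffmpeg yt-dlp)
if [ -n "$missing" ]; then
  echo "Missing tools:" $missing
else
  echo "ffmpeg and yt-dlp found"
fi
```

An empty result means both tools were found; anything listed needs to be installed by /video-context:setup or by hand.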
| Env var | Default | Effect |
|---|---|---|
| SCENE_THRESHOLD | 0.4 | Lower = more frames. 0.2 for screen recordings. |
| MAX_FRAMES | 20 | Hard cap; trimmed evenly across timeline. |
| FRAME_WIDTH | 1280 | Downscale to save tokens. |
| WHISPER_MODEL | /opt/homebrew/share/whisper-cpp/ggml-base.en.bin | Whisper model path. |
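To show how the defaults in the table interact with overrides, here is a minimal sketch using shell parameter expansion. It assumes the variables are read from the environment in the usual way; `resolve_config` is a hypothetical name, and the real extract.sh may read its settings differently.

```shell
# Sketch only: prints the effective settings, falling back to the
# documented defaults when a variable is unset or empty.
# resolve_config is a hypothetical helper, not part of the plugin.
resolve_config() {
  local threshold="${SCENE_THRESHOLD:-0.4}"
  local max_frames="${MAX_FRAMES:-20}"
  local width="${FRAME_WIDTH:-1280}"
  printf 'SCENE_THRESHOLD=%s MAX_FRAMES=%s FRAME_WIDTH=%s\n' \
    "$threshold" "$max_frames" "$width"
}

resolve_config
```

For a screen recording, for example, you could run `export SCENE_THRESHOLD=0.2` before starting Claude so that more frames are captured.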
- Resolves source: local path used as-is; URLs fetched via yt-dlp (or curl with auth).
- ffmpeg scene-detect extracts keyframes at SCENE_THRESHOLD, plus the first + last frame.
- Caps to MAX_FRAMES evenly across timeline.
- Extracts audio → whisper-cpp → transcript.txt.
- Prints frame paths + transcript so Claude can Read each frame and the text.
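The scene-detect and capping steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the plugin's actual extract.sh: `extract_scene_frames` and `cap_frames` are hypothetical helpers, the frame file naming is made up, and the even-spacing rule shown here (always keeping the first and last entry) is one plausible reading of "trimmed evenly across timeline".

```shell
# Hypothetical helper: writes scene-change frames from video "$1" into
# directory "$2" using ffmpeg's select filter and its scene score.
# Assumes ffmpeg is installed; newer builds may prefer -fps_mode vfr
# over -vsync vfr. Defined here but not called.
extract_scene_frames() {
  local src="$1" out_dir="$2"
  mkdir -p "$out_dir"
  ffmpeg -hide_banner -loglevel error -i "$src" \
    -vf "select='gt(scene,${SCENE_THRESHOLD:-0.4})',scale=${FRAME_WIDTH:-1280}:-2" \
    -vsync vfr "$out_dir/frame_%04d.jpg"
}

# Hypothetical helper: reads one frame path per line on stdin and prints
# at most "$1" of them, evenly spaced, keeping the first and last line.
cap_frames() {
  local max="$1"
  awk -v max="$max" '
    { lines[NR] = $0 }
    END {
      if (NR <= max) { for (i = 1; i <= NR; i++) print lines[i]; exit }
      if (max == 1) { print lines[1]; exit }
      for (k = 0; k < max; k++) {
        # Round to the nearest line index between 1 and NR.
        idx = 1 + int(k * (NR - 1) / (max - 1) + 0.5)
        print lines[idx]
      }
    }'
}

# Example: cap ten placeholder frame names to four.
printf 'frame_%02d.jpg\n' 1 2 3 4 5 6 7 8 9 10 | cap_frames 4
```

With a list shorter than the cap, `cap_frames` passes every line through unchanged, so a short clip keeps all of its detected frames.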
MIT