v0.1.0
First public release of video-recap-skills — a Claude Code plugin that turns any video into a Chinese-narration recap, running on just ffmpeg and one Xiaomi MiMo API key (no GPU, no model downloads; macOS / Linux / Windows).
Highlights
- One key, whole pipeline. ASR (mimo-v2.5-asr), VLM (mimo-v2.5), and TTS (mimo-v2.5-tts) all go through Xiaomi MiMo.
- Research-first understanding. Story/character research feeds the VLM, so it names people on screen instead of falling back to generic labels like "黑衣男子" ("man in black").
- Agent writes, scripts execute. Five small independent skills + a thin orchestrator, communicating only via JSON/MP4 in a shared work_dir. An LLM review gate gives feedback before TTS.
- Dynamic mix. Narration over a gap-fill–ducked original (the source/BGM swell back in the gaps, no dead air), with optional looped BGM and burned subtitles.
- Cut mode. --edit-mode cut condenses a long video into a shorter narrated edit.
- Multi-track timeline + optional 剪映 export. Assembly emits a backend-neutral timeline.json; --export-jianying writes an editable 剪映/JianYing draft (original clips, separate audio tracks, volume keyframes). Fully decoupled: the core render never depends on 剪映. Media is bundled by default so the draft opens on sandboxed macOS 剪映.
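The "Dynamic mix" and "volume keyframes" items can be pictured with a small sketch. This is not the plugin's code: the function names, the gain values (0.2 under narration, 1.0 in gaps), and the 0.3 s fade are all assumptions chosen for illustration. It turns a list of narration intervals into gain keyframes for the original track, so the source audio dips under speech and swells back in the gaps.

```python
def merge_segments(segments, min_gap):
    """Merge narration intervals whose gap is too short to fade up and down in."""
    merged = []
    for start, end in sorted(segments):
        if merged and start - merged[-1][1] < min_gap:
            # Gap too short: treat both intervals as one ducked region.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def duck_keyframes(narration, duration, duck=0.2, full=1.0, fade=0.3):
    """Return (time_seconds, gain) keyframes for the original audio track.

    All parameter names and defaults are hypothetical, not the plugin's API.
    """
    keys = []
    for start, end in merge_segments(narration, min_gap=2 * fade):
        keys += [
            (max(0.0, start - fade), full),     # begin fading down
            (start, duck),                      # fully ducked when speech starts
            (end, duck),                        # stay ducked until speech ends
            (min(duration, end + fade), full),  # swell back in the gap
        ]
    # Pin both ends of the track at full volume if nothing covers them.
    if not keys or keys[0][0] > 0.0:
        keys.insert(0, (0.0, full))
    if keys[-1][0] < duration:
        keys.append((duration, full))
    return keys


if __name__ == "__main__":
    for time, gain in duck_keyframes([(1.0, 3.0), (3.2, 5.0), (8.0, 9.0)], 10.0):
        print(f"{time:5.2f}s -> gain {gain}")
```

Merging intervals that are closer together than two fade lengths keeps the keyframes in time order and avoids a brief, audible pump between back-to-back narration lines.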
Requirements
- ffmpeg on PATH, Python 3.10+ (standard library only; the pipeline needs no pip install)
- A Xiaomi MiMo API key (MIMO_API_KEY)
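A quick way to check these requirements before a first run is a short preflight script. The sketch below is not part of the plugin; the `preflight` helper is hypothetical, and it assumes MIMO_API_KEY is read from the environment. It uses only the standard library, in keeping with the no-pip-install constraint.

```python
import os
import shutil
import sys


def preflight(env=None):
    """Return a list of human-readable problems; an empty list means ready."""
    env = os.environ if env is None else env
    problems = []
    # ffmpeg must be discoverable on PATH.
    if shutil.which("ffmpeg") is None:
        problems.append("ffmpeg not found on PATH")
    # The release notes state Python 3.10+.
    if sys.version_info < (3, 10):
        problems.append("Python 3.10+ is required")
    # Assumption: the key is supplied as an environment variable.
    if not env.get("MIMO_API_KEY"):
        problems.append("MIMO_API_KEY is not set")
    return problems


if __name__ == "__main__":
    issues = preflight()
    for issue in issues:
        print("missing:", issue)
    if not issues:
        print("all requirements satisfied")
```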
Install
Ask Claude Code:
Install this plugin: https://github.com/worldwonderer/video-recap-skills
CI is green on Ubuntu / macOS / Windows. See the README to get started.