v0.1.0

@worldwonderer released this 14 Jun 04:10
c7ab23d

First public release of video-recap-skills, a Claude Code plugin that turns any video into a recap with Chinese narration. It needs only ffmpeg and one Xiaomi MiMo API key (no GPU, no model downloads) and runs on macOS, Linux, and Windows.

Highlights

  • One key, whole pipeline. ASR (mimo-v2.5-asr), VLM (mimo-v2.5), and TTS (mimo-v2.5-tts) all go through Xiaomi MiMo.
  • Research-first understanding. Story/character research feeds the VLM, so it names the people on screen instead of falling back on generic labels like "黑衣男子" ("man in black").
  • Agent writes, scripts execute. Five small independent skills + a thin orchestrator, communicating only via JSON/MP4 in a shared work_dir. An LLM review gate gives feedback before TTS.
  • Dynamic mix. Narration plays over the original audio, which is ducked under each narration line and swells back in during the gaps (so there is no dead air), with optional looped BGM and burned-in subtitles.
  • Cut mode. --edit-mode cut condenses a long video into a shorter narrated edit.
  • Multi-track timeline + optional 剪映 export. Assembly emits a backend-neutral timeline.json; --export-jianying writes an editable 剪映/JianYing draft (original clips, separate audio tracks, volume keyframes). The export is fully decoupled: the core render never depends on 剪映. Media is bundled by default so the draft opens in the sandboxed macOS build of 剪映.
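
To make the gap-fill ducking idea concrete, here is a minimal sketch of a gain envelope for the original track. It is an illustration only, not the plugin's actual code: the function name original_gain, the ducked level of 0.25, and the 0.5 s linear fade are all assumptions chosen for the example.

```python
from typing import List, Tuple

# (start_sec, end_sec) of one narration line; a hypothetical representation.
Interval = Tuple[float, float]


def original_gain(t: float, narration: List[Interval],
                  ducked: float = 0.25, fade: float = 0.5) -> float:
    """Gain for the original track at time t (1.0 = full volume).

    Inside a narration line the original is held at `ducked`. Within
    `fade` seconds of a line's edge it ramps linearly between `ducked`
    and 1.0, so the source swells back in during the gaps.
    """
    gain = 1.0
    for start, end in narration:
        if start <= t <= end:
            return ducked
        # Distance from t to the nearest edge of this narration line.
        dist = start - t if t < start else t - end
        if dist < fade:
            # Linear ramp: `ducked` at the edge, 1.0 at `fade` seconds away.
            gain = min(gain, ducked + (1.0 - ducked) * dist / fade)
    return gain


narration = [(1.0, 3.0), (6.0, 8.0)]
for t in (2.0, 3.25, 4.5):
    print(t, original_gain(t, narration))
```

Sampling a function like this at regular intervals would give the kind of volume keyframes that a multi-track timeline can carry, though how the plugin actually derives its keyframes is not stated here.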

Requirements

  • ffmpeg on PATH, Python 3.10+ (standard library only — the pipeline needs no pip install)
  • A Xiaomi MiMo API key (MIMO_API_KEY)
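
The requirements above can be checked before a first run with a short standard-library script. This is a sketch under stated assumptions, not something shipped with the plugin: the helper name missing_requirements and its injectable parameters are hypothetical, while ffmpeg, Python 3.10+, and MIMO_API_KEY come from the list above.

```python
import os
import shutil
import sys


def missing_requirements(environ=None, which=shutil.which, version=None):
    """Return a list of human-readable problems; an empty list means ready.

    The parameters exist only so the check can be exercised without
    touching the real environment (hypothetical helper, not plugin code).
    """
    environ = os.environ if environ is None else environ
    version = sys.version_info if version is None else version
    problems = []
    if which("ffmpeg") is None:
        problems.append("ffmpeg not found on PATH")
    if tuple(version[:2]) < (3, 10):
        problems.append("Python 3.10+ required")
    if not environ.get("MIMO_API_KEY"):
        problems.append("MIMO_API_KEY is not set")
    return problems


for problem in missing_requirements():
    print("missing:", problem)
```

Run as-is, the script prints one line per unmet requirement and nothing when the machine is ready.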

Install

Ask Claude Code:

Install this plugin: https://github.com/worldwonderer/video-recap-skills

CI is green on Ubuntu / macOS / Windows. See the README to get started.