Releases · worldwonderer/video-recap-skills

27 Jun 16:53

worldwonderer

v0.3.3

e0d5345

v0.3.3: Multi-video recap + filesystem material-library reuse · mixed-source editing fixes · cross-harness

Latest

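This release centers on multi-source video editing with narration and on filesystem material-library reuse. It also merges cross-harness support, output-compatibility and vertical-video subtitle fixes, and a hard gate for narration review.

Added

Multi-video edit-and-narrate (cut mode). Pass several source videos at once and select clips by source_id to cut a single output. A project-level multi_source_manifest.json serves as the source contract for recap / cut / assemble; every clip carries a source_id, and overlap detection is isolated per source. The multi-video MVP supports only --edit-mode cut; single-video full / cut / dub remain compatible.
Filesystem material-library reuse. --material-library-dir with --save-materials / --use-materials persists each source video's analysis as grep-friendly material.json / material.md plus an append-only materials_index.jsonl, without copying the original media. Restoring is gated on a source fingerprint plus a settings fingerprint. No DB, embeddings, or semantic search: just the filesystem and grep.
Multi-source provenance surfaced. assemble / inspect keep source_id / source_path in the timeline and in the 剪映 draft; when an individual source is missing, the affected clips are degraded and flagged while the sources that are present are kept.
video-understanding --brief-only: rebuilds the OUTPUT-timeline brief from restored / cached artifacts without re-running frame extraction / ASR / VLM.
Cross-harness support + Claude Code marketplace (#50): Codex and OpenClaw read the .claude-plugin package directly.
Narration-review scorecard + dub-lint hard gate + partial-TTS visibility (#49).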
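
The per-source overlap check described above can be pictured with a small sketch. It is not the project's implementation: the clip keys start and end, and the function name find_overlaps, are assumptions made for illustration. Only source_id comes from the release notes.

```python
from collections import defaultdict


def find_overlaps(clips):
    """Return (earlier_index, later_index) pairs of clips that overlap
    within the same source. Clips from different sources never conflict.

    Each clip is a dict with hypothetical keys: source_id, start, end (seconds).
    """
    by_source = defaultdict(list)
    for index, clip in enumerate(clips):
        by_source[clip["source_id"]].append((clip["start"], clip["end"], index))

    overlaps = []
    for spans in by_source.values():
        spans.sort()
        latest_end, latest_index = spans[0][1], spans[0][2]
        for start, end, index in spans[1:]:
            if start < latest_end:
                # This clip starts before an earlier clip of the same source ends.
                overlaps.append((latest_index, index))
            if end > latest_end:
                latest_end, latest_index = end, index
    return overlaps


clips = [
    {"source_id": "a", "start": 0.0, "end": 10.0},
    {"source_id": "b", "start": 5.0, "end": 12.0},  # other source: no conflict
    {"source_id": "a", "start": 8.0, "end": 15.0},  # overlaps clip 0 in source "a"
]
print(find_overlaps(clips))  # [(0, 2)]
```

Grouping by source_id first is what keeps identical timestamps in two different videos from being reported as a conflict.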

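Fixed

Geometry normalization for mixed-source concat. Multi-source clips are first normalized to a common canvas (scale / pad / setsar / fps / yuv420p) and then concatenated, so videos with different resolutions or frame rates can be combined into one output instead of making ffmpeg fail.
Multi-source audio is handled per source. An individual silent source no longer mutes the entire output.
More precise secret redaction. Only credential-shaped values (tp- / sk- / gh*_ / AKIA / JWT and KEY=VALUE) and credential-named JSON keys are redacted; ordinary words such as secret / token in a transcript / summary are no longer caught by mistake.
Output is forced to yuv420p + faststart, so it plays in WeChat and on phones and can start playing while still downloading (#51).
Subtitles scale to the probed canvas, fixing stretched subtitles on vertical (9:16) video (#53).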
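
As a rough illustration of the geometry normalization, the sketch below builds an ffmpeg filter chain from the five steps named above (scale / pad / setsar / fps / yuv420p). The function name and the canvas values are assumptions; the release notes do not say how the project picks the canvas or orders the filters.

```python
def normalize_filter(width, height, fps):
    """Build an ffmpeg -vf chain that fits any input onto a width x height
    canvas: scale to fit, pad to center, square pixels, fixed fps, yuv420p.

    The canvas size and frame rate are caller-chosen (hypothetical here).
    """
    return ",".join([
        f"scale={width}:{height}:force_original_aspect_ratio=decrease",
        f"pad={width}:{height}:(ow-iw)/2:(oh-ih)/2",
        "setsar=1",
        f"fps={fps}",
        "format=yuv420p",
    ])


print(normalize_filter(1920, 1080, 30))
```

Applying the same chain to every clip before concat gives all inputs identical resolution, sample aspect ratio, frame rate, and pixel format, which is what the concat step needs.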

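Verification

The full python3 scripts/test.py run passed for all skill groups (551 tests); ruff / compileall are clean. New tests cover real ffmpeg rendering with mixed resolutions and mixed audio, plus secret redaction and provenance degradation.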

Full Changelog: v0.3.2...v0.3.3


22 Jun 17:21

worldwonderer

v0.3.2

9a3b152

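v0.3.2: New 剪映 draft schema export

Added

New schema-driven 剪映 draft export (#48): the 剪映 export moves from single-file JSON assembly to a layered schema / model / builder / track / writer design. The draft baseline is upgraded to version: 360000, new_version: 111.0.0, app_version: 5.9.5-beta1, and the new materials skeleton, including common_mask, is filled in.
Material-type registry and capability list (#48): explicitly separates the supported video / audio / text / subtitle / speed types from categories that are reserved but not yet written, such as image / sticker / effect / mask. Unknown or not-yet-supported categories emit a note and are skipped, which avoids generating a malformed draft.

Improved

剪映 export stays optional, lazily loaded, and stdlib-only (#48): export_jianying.py is now just a thin facade, and the core ffmpeg render path imports no jianying_* modules. timeline.json remains the backend-neutral canonical input, and the ffmpeg render remains the standard by which the final output is judged.
Safer draft writing (#48): keeps non-empty-directory avoidance, media bundling, path rewriting, and atomic replacement via a temporary directory. Adds draft_name validation that rejects empty names, absolute paths, .., and path separators, so a bad name cannot escape the draft parent directory.
More complete BGM looping and volume automation (#48): looped BGM is split into multiple segments that cover the timeline, and KFTypeVolume volume keyframes inside a window are placed on the corresponding segment.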
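
The draft_name rules listed above can be sketched as follows. This is an illustrative check written against those four rejection rules only; the function name, error messages, and the choice to also reject a bare "." are assumptions, not the project's code.

```python
from pathlib import PurePosixPath, PureWindowsPath


def validate_draft_name(name: str) -> str:
    """Return name unchanged if it is safe to use as a single directory
    component under the draft parent directory; raise ValueError otherwise."""
    if not name.strip():
        raise ValueError("draft_name must not be empty")
    # Check both flavours so the rule holds regardless of the host OS.
    if PurePosixPath(name).is_absolute() or PureWindowsPath(name).drive:
        raise ValueError("draft_name must not be an absolute path")
    if "/" in name or "\\" in name:
        raise ValueError("draft_name must not contain path separators")
    if name in (".", ".."):
        raise ValueError("draft_name must not be '.' or '..'")
    return name


print(validate_draft_name("longvacation_recap"))
```

Rejecting separators and .. up front means the writer can join the name onto the parent directory without a later containment check being the only line of defence.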

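Verification

ruff / py_compile / mypy --ignore-missing-imports cover the 剪映 export modules.
Related assemble / timeline tests: 84 passed; the project-wide scripts/test.py passed for all skill groups.
GitHub Actions: validate passed on ubuntu / macOS / Windows.
Tested locally with 剪映 Pro 10.8.7: the schema E2E draft was generated and opened; a draft converted from the historical longvacation_2min_work/timeline.json was also registered by 剪映 and can be opened.

Full comparison: v0.3.1...v0.3.2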


22 Jun 01:49

worldwonderer

v0.3.1

2ae220e

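v0.3.1: English-to-Chinese dubbing mode (cloned original voice · follows the original pacing) · speech-rate calibration · doctor preflight

Added

English video → Chinese dubbing with the original voice preserved (#47): the orchestrator adds --edit-mode dub. A natural-language request is enough to translate an English video into Chinese and dub it with a clone of the original speaker's voice (mimo-v2.5-tts-voiceclone, the same MIMO_API_KEY, pure stdlib + ffmpeg, no GPU, no new vendor). As with recap, the run pauses once for the Agent to write the script (dub_script.json): a faithful sentence-by-sentence translation that follows the original pacing. Each sentence is placed at the time the original line was spoken and fitted to that sentence's interval, with no global speed-up and no early finish; the dub follows the original audio beat for beat (it replaces the original audio rather than layering over it). v1: single speaker, whole-track replacement (background music is not preserved yet).

Improved

Speech-rate baseline calibration (#45): the speech_rate default goes from 3.5 to 3.9 (the measured mimo-tts median; the old value was systematically about 10-17% too low), with SPEECH_RATE / SPEECH_SAFETY_MARGIN environment-variable overrides (consistent across 5 skills). Character budgets and the 7:3 coverage check now track real speech rate more closely.
Over-length truncation is no longer silent (#45): when narration exceeds the clip duration and is truncated at a sentence end, a log message is emitted so the line can be found and rewritten.
doctor preflight capability list (#45): --doctor adds a summary grouped into ready / blocked / degraded / optional upgrades, with an actionable suggestion for each item.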
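
To make the character-budget idea concrete, here is a minimal sketch of how a budget could be derived from speech_rate. The 3.9 default and the two environment-variable names come from the notes above; treating SPEECH_SAFETY_MARGIN as a fractional reduction, its default of 0, and the function name char_budget are assumptions.

```python
import math
import os


def char_budget(duration_seconds, env=os.environ):
    """Estimate how many characters of narration fit in duration_seconds.

    Assumption: the safety margin is a fraction (0.1 = keep 10% headroom).
    """
    rate = float(env.get("SPEECH_RATE", 3.9))             # characters per second
    margin = float(env.get("SPEECH_SAFETY_MARGIN", 0.0))  # hypothetical default
    return math.floor(duration_seconds * rate * (1.0 - margin))


print(char_budget(10.5, env={}))
print(char_budget(10, env={"SPEECH_RATE": "4", "SPEECH_SAFETY_MARGIN": "0.5"}))
```

With the old 3.5 default the same slot would have been given a smaller budget, which is the systematic under-estimate the calibration addresses.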

Full comparison: v0.3.0...v0.3.1


19 Jun 18:21

worldwonderer

v0.3.0

9a2c1c9

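v0.3.0: Steadier long videos · cleaner cross-language output · smoother cuts · new recap navigation and compression

This round focuses on long-video stability, cross-language output quality, and how the cuts look, and adds recap navigation and output compression tools. Merged from #28–#39, plus pre-release review fixes.

Added

VLM checkpoint resume + 429 self-healing (#33): per-scene caching plus retries at reduced concurrency when rate-limited; a long video that hits an occasional rate limit no longer restarts from scratch.
FOREIGN_SOURCE_AUDIO (#34): for cross-language recaps, ducks the original audio under the narration to near silence, eliminating the "weird audio".
SCENE_CUT_SNAP (#35): snaps edit boundaries to hard cuts in the source, eliminating flicker at edit points.
OUTPUT_CRF / OUTPUT_PRESET / OUTPUT_MAX_HEIGHT (#36): output compression parameters (demo 119MB → 16.9MB).
Advisory inspect + storyboard (#28, #29): read-only navigation (state / clip-map / thumbnail overview) that degrades when something is missing and never blocks the pipeline.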
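
The snapping behaviour behind SCENE_CUT_SNAP can be sketched as a nearest-neighbour lookup over detected hard cuts. The function name and the max_shift tolerance are hypothetical; the notes do not state how far a boundary may move.

```python
import bisect


def snap_to_cut(t, cuts, max_shift=0.5):
    """Move boundary t (seconds) to the nearest hard cut if one lies within
    max_shift seconds; otherwise leave it alone. cuts must be sorted."""
    if not cuts:
        return t
    i = bisect.bisect_left(cuts, t)
    # Only the neighbours on either side of t can be the nearest cut.
    candidates = cuts[max(0, i - 1): i + 1]
    nearest = min(candidates, key=lambda c: abs(c - t))
    return nearest if abs(nearest - t) <= max_shift else t


print(snap_to_cut(10.2, [0.0, 10.0, 20.0]))  # 10.0
print(snap_to_cut(15.0, [0.0, 10.0, 20.0]))  # 15.0 (no cut close enough)
```

Landing exactly on a hard cut avoids leaving a few frames of the neighbouring shot at the edit point, which is the flicker the feature removes.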

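Changed

review.py automatically reviews against the output or source timeline (#31) · ASR default segment length 30 → 15s (#32) · user-supplied subtitles are also shown when masking is off (#37).

Fixed

Cut-mode pass2 brief crash on int/str scene_id (#30) · lagging subtitles in original-audio gaps (#32).

Pre-release review fixes

#36 robustness: an odd OUTPUT_MAX_HEIGHT produced an odd height that crashed libx264 (empty output) → width and height are now forced to even values; OUTPUT_CRF=0 (lossless) was treated as falsy and changed to 18 → it is now kept as-is (found by adversarial review of the diff, reproduced and verified with a real ffmpeg run).
CI gap: the inspect test group existed only in scripts/test.sh while CI runs scripts/test.py, so the 22 tests from #28 never ran in CI → now wired into the runner.
#33 Windows CI: two resume-cache tests lacked encoding="utf-8" and crashed under Windows cp1252 → fixed before merge.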
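
The two #36 bugs are classic enough to show in miniature. The sketch below is not the project's code: the function names are invented, and rounding down to even is one of several reasonable choices. The fallback value 18 is the one mentioned in the notes.

```python
def resolve_crf(value, default=18):
    """Keep an explicit 0 (lossless). Writing `value or default` here would
    treat 0 as falsy and silently replace it with the default."""
    return default if value is None else int(value)


def even_dimensions(src_w, src_h, max_height):
    """Scale down to at most max_height, then round both sides down to even
    numbers, since yuv420p encoding with libx264 rejects odd dimensions."""
    height = min(src_h, max_height)
    width = src_w * height / src_h
    return int(width) // 2 * 2, int(height) // 2 * 2


print(resolve_crf(0))                    # 0
print(resolve_crf(None))                 # 18
print(even_dimensions(1920, 1080, 721))  # (1280, 720)
```

Comparing against None rather than relying on truthiness is the general fix whenever 0 is a meaningful setting.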

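Other

Demo replaced with a 2-minute recap of 《悠长假日》 (Long Vacation) episode 1 (#38) + README demo link (#39).

Full test run (each skill in its own process, CI passing on three platforms): understanding 133 · cut 31 · voiceover 29 · assemble 129 · script 61 · orchestrator 59 · inspect 22.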

Full Changelog: v0.2.3...v0.3.0


19 Jun 11:52

worldwonderer

v0.2.3

d104333

v0.2.3

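A round of output-quality polish: more accurate subtitles in original-audio gaps (you can supply your own), denser visual understanding, narration without dashes, and steadier review.

Added

Bring-your-own original-audio subtitles (more accurate). For the original-audio subtitles shown in narration gaps, besides Agent proofreading and the ASR fallback, you can now drop in an accurate subtitle file as the preferred source: work_dir/user_subtitles.json ([{start,end,text}], on the output timeline by default; or {"timeline":"source","lines":[...]} to use the source timeline, mapped automatically through the edit plan) or user_subtitles.srt / .ass. Priority: user-supplied subtitles > Agent-proofread original_subtitles.json > ASR fallback.
Per-scene frame sampling scales with scene duration. The hard cap of 6 frames per scene is removed (now about one frame every 4 seconds, minimum 3, maximum 16; VLM_SECONDS_PER_FRAME / VLM_MAX_FRAMES), so visual understanding of long scenes is no longer starved; VLM max_tokens 800 → 1500.
MiMo video overview can serve as the primary understanding source. When enabled (--mimo-video-overview), it becomes the main description of each scene, with per-frame frame_facts still used as anchors and fallback; the overview remains off by default.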
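
The duration-scaled sampling rule (about one frame per 4 seconds, minimum 3, maximum 16) amounts to a clamp. A minimal sketch follows; the function and parameter names are illustrative and rounding up with ceil is an assumption, since the notes only say "about".

```python
import math


def frames_for_scene(duration_seconds, seconds_per_frame=4.0,
                     min_frames=3, max_frames=16):
    """Number of frames to sample from a scene of the given length."""
    wanted = math.ceil(duration_seconds / seconds_per_frame)
    return max(min_frames, min(max_frames, wanted))


for seconds in (5, 40, 120):
    print(seconds, frames_for_scene(seconds))  # 5 -> 3, 40 -> 10, 120 -> 16
```

Under the old fixed cap a 40-second scene got at most 6 frames; here it gets 10, while very short scenes still get the minimum of 3.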

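Changed

Narration no longer uses dashes. The writing rules forbid dashes (——／—), and rendering normalizes any that remain to commas as a fallback; this changes only the subtitle display, not what TTS reads.
Narration review is more deterministic and blocks only on hard defects. The judge uses a fixed temperature=0 plus a seed; only hallucination / incompleteness can block in strict mode, and style comments are downgraded to hints; background_research is accepted as valid evidence alongside the picture and the dialogue.
The coverage metric is computed at the same rate as the writing budget (a unified 3.87 characters/second, including speech_safety_margin), so it no longer falsely reports "too little narration"; the threshold is promoted to a real CONFIG item.
ASR personal names are corrected against the background material (叶青眉 → 叶轻眉), strictly limited to a one-character difference where the window itself is not a known name, to avoid wrong corrections.
When part of the video overview is blocked by moderation, it degrades (output is produced from the available shards, and uncovered scenes fall back to per-frame analysis) instead of aborting entirely; overview frame rate mimo_video_fps 2 → 3.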
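
The render-time dash fallback can be approximated with a single substitution. This sketch assumes the display text is a plain string and that a full-width comma is the intended replacement; the function name is invented.

```python
import re

# U+2014 is the dash character the writing rules forbid; a run of one or
# more (the doubled Chinese dash or a single one) collapses to one comma.
_DASH_RUN = re.compile("\u2014+")


def normalize_dashes(display_text):
    """Replace dash runs with a full-width comma for subtitle display only.
    The text sent to TTS is left untouched by design."""
    return _DASH_RUN.sub("\uff0c", display_text)


print(normalize_dashes("他回来了\u2014\u2014却没人认得"))
```

Keeping the substitution on the display path only matches the note that TTS reading is not changed.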

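Fixed

Original-audio gap subtitles did not line up with the original audio. Precise sources (user-supplied subtitles / the Agent-proofread file) are clipped by sentence interval so they land exactly in the gap: a sentence that spans narration blocks is sliced in proportion to time instead of being repeated whole, and overly dense lines are shown truncated rather than dropped to blank.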

Full Changelog: v0.2.2...v0.2.3


17 Jun 16:49

worldwonderer

v0.2.2

94ae33d

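v0.2.2: Proofread subtitles in original-audio gaps · natural handoff between narration and original audio · cuts that do not break a line of dialogue

Makes block-narrated output more coherent and better looking.

Main updates

Original-audio gaps now get burned-in subtitles too, and the Agent can proofread them. The gaps left for original audio between narration blocks are no longer subtitle-free: the original dialogue is burned in and set off from the narration with 「」. An Agent-proofread original_subtitles.json is preferred (it corrects ASR typos and names and keeps only dialogue that is actually spoken); without it, a conservative ASR fallback is used (each sentence is assigned to the gap it falls in, and lines too dense to read are skipped). This fixes the earlier over-rendering that stuffed unspoken original audio and whole ASR segments into small gaps.
Natural handoff between narration and original audio. The narration block before an original-audio gap leads into the original audio, and the block after the gap picks up what the original audio just presented, so the two no longer talk past each other (review adds a disjoint_handoff check).
Cuts no longer break a line of dialogue (cut mode). A clip's end snaps forward to the nearest natural pause (silence_periods.json, up to 2 seconds) so the original audio can finish its sentence; the clip-selection brief also prompts ending on a complete sentence.
Subtitle burn-in preflight (fail fast). When ffmpeg lacks libass (the subtitles filter), the pipeline reports an error with a remedy before it starts, rather than running understanding / VLM / ASR / TTS and only failing at the final render.
The narration-review entry point is offered directly at output time (advisory; the hard gate is still validate.py). The docs are also corrected: subtitle burn-in is on by default (turn it off with --no-burn-subtitles).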
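
The forward snap to a natural pause can be sketched like this. The structure of silence_periods.json is not given in the notes, so the list of (start, end) pairs is an assumption, as is the function name; the 2-second limit is from the notes.

```python
def snap_end_to_silence(clip_end, silences, max_extend=2.0):
    """Extend clip_end to the start of the next silence if one begins within
    max_extend seconds. silences: (start, end) pairs in seconds (assumed)."""
    upcoming = []
    for start, end in silences:
        if start <= clip_end <= end:
            return clip_end  # already ends inside a pause
        if clip_end < start <= clip_end + max_extend:
            upcoming.append(start)
    return min(upcoming) if upcoming else clip_end


pauses = [(5.0, 6.0), (11.2, 11.8), (11.9, 12.5)]
print(snap_end_to_silence(10.0, pauses))          # 11.2
print(snap_end_to_silence(10.0, [(13.0, 14.0)]))  # 10.0 (pause too far away)
```

The cap keeps a clip from growing unboundedly when the speaker does not pause soon after the planned end.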

Verified end to end on 庆余年 EP01 → a 5-minute recap short. See CHANGELOG.md for the full record. Merged PRs: #20 #21 #22.


17 Jun 00:47

worldwonderer

v0.2.1

961390f

v0.2.1 — block delivery: full-volume original blocks + one-line subtitle band

A delivery-quality release: narration now plays in blocks with the original audio breathing between them at full volume, and the burned-in subtitle band no longer compresses the picture.

Changed

Narration is delivered in BLOCKS, ~7:3. Each beat is a few sentences written as one continuous thought and synthesized as a single fluent TTS utterance — fixing the choppy, sentence-by-sentence delivery. Between blocks the recap leaves deliberate original-audio blocks (~30% of the timeline) where the original scene plays at full volume.
Original-audio blocks play at full volume. idle_orig_volume now defaults to 1.0 and duck_bridge_seconds to 1.5 (was 12) — the original is ducked only under a narration block and swells back to full in the gaps, instead of one permanent low bed. Reverses the 0.2.0 "continuous bed". Tune with IDLE_ORIG_VOLUME / DUCK_BRIDGE_SECONDS.
Burned-in subtitles split into short one-line chunks timed karaoke-style across each block; the source-subtitle masking band sizes for ONE line (~14% of height) instead of two (~23%) — no longer compresses the picture.
Brief + lint steer block authoring, with a block-coverage lint (no_original_blocks / under_narrated / no_original_breaks / fragmented_beats) replacing the per-sentence density lint.

Fixed

Blocks are no longer truncated by the speed-up. voiceover sized a segment against the raw TTS duration, ignoring the narration_speed (1.3×) atempo assemble applies before placement — clipping well-sized blocks into fragments. The truncation budget now accounts for narration_speed.
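
A small sketch of the corrected budget, under the assumption that the placed duration is simply the raw TTS duration divided by narration_speed (which is how an atempo speed-up behaves). The function names are illustrative.

```python
def placed_duration(raw_tts_seconds, narration_speed=1.3):
    """Duration of a block after the atempo speed-up is applied."""
    return raw_tts_seconds / narration_speed


def fits_slot(raw_tts_seconds, slot_seconds, narration_speed=1.3):
    """True if the block fits its slot once sped up. Comparing the raw
    duration against the slot (the old behaviour) rejects blocks that fit."""
    return placed_duration(raw_tts_seconds, narration_speed) <= slot_seconds


print(fits_slot(12.0, 10.0))                       # True: about 9.2s once sped up
print(fits_slot(12.0, 10.0, narration_speed=1.0))  # False: what the raw check saw
```

A 12-second raw utterance fits a 10-second slot at 1.3x, so truncating it against the raw length would have cut a block that needed no cutting.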

275 tests green on macOS / Linux / Windows; brief.py⇄narration.py parity preserved. See PR #17.

Full changelog: v0.2.0...v0.2.1


15 Jun 17:20

worldwonderer

v0.2.0

187dd2b

v0.2.0 — agentic re-architecture + continuous-bed mix

A quality-focused release: cut mode is in sync by construction, the narration mix is a continuous bed, the narration itself is less "cold", and full-episode understanding survives the MiMo cluster's rate limit.

Changed

Cut mode is cut-first / narrate-second. The cut is rendered from clip_plan.json first, then narration is written against the real output timeline — no source→output remap, so narration and picture stay in sync by construction. Full mode unchanged.
Continuous original-audio bed. The original is ducked into one held low bed under the narration; inter-beat gaps shorter than duck_bridge_seconds (default 12s) no longer pop the source dialogue back up. Tune with DUCK_BRIDGE_SECONDS.
Narration density is a guide, not a quota — no padding with filler/pixel-description to hit a number.
--consolidate story index on by default (backward-compatible manifest shim).
Research directive only fires when the substrate is thin/empty.
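
The continuous-bed behaviour can be read as interval merging: narration beats separated by less than duck_bridge_seconds share one ducked window. The sketch below assumes beats are (start, end) pairs in seconds; the function name is invented.

```python
def duck_windows(narration, bridge_seconds=12.0):
    """Merge narration intervals whose gaps are shorter than bridge_seconds
    into continuous windows where the original audio stays ducked."""
    windows = []
    for start, end in sorted(narration):
        if windows and start - windows[-1][1] < bridge_seconds:
            windows[-1][1] = max(windows[-1][1], end)  # bridge the short gap
        else:
            windows.append([start, end])
    return [tuple(w) for w in windows]


beats = [(0, 5), (10, 15), (40, 45)]
print(duck_windows(beats))                      # [(0, 15), (40, 45)]
print(duck_windows(beats, bridge_seconds=1.5))  # [(0, 5), (10, 15), (40, 45)]
```

The second call uses the 1.5s value that v0.2.1 later adopted as the default, which shows how the same knob switches between one held bed and audio that swells back in every gap.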

Added

Cut-desync floor: normalized-plan lint + blocking preflight before TTS, with --allow-sparse-cut.
Phase ledger (recap_phase.json) for deterministic cut-mode resume.
duck_bridge_seconds config knob.

Fixed

Long-video understanding rides out MiMo cluster rate limits (retries 10× / 60s cap / 10s floor; optional ASR_THROTTLE_SECONDS).
Resume cannot reuse stale artifacts (proves current bytes/settings before reuse).
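
The retry limits quoted above (10 retries, 60s cap, 10s floor) could produce a delay schedule like the one sketched here. Exponential growth and the 1-second base are assumptions; the notes give only the three limits.

```python
def retry_delays(retries=10, floor=10.0, cap=60.0, base=1.0):
    """Seconds to wait before each retry: exponential growth (assumed),
    never below floor and never above cap."""
    return [min(cap, max(floor, base * 2 ** attempt)) for attempt in range(retries)]


print(retry_delays())
# [10.0, 10.0, 10.0, 10.0, 16.0, 32.0, 60.0, 60.0, 60.0, 60.0]
```

The floor keeps early retries from hammering a cluster that is already rate-limiting, and the cap bounds the total wait for a long video.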

Full notes: CHANGELOG.md. Verified: 261 tests across 6 groups, ruff clean, no-ffmpeg CI green on macOS/Linux/Windows.


14 Jun 04:10

worldwonderer

v0.1.0

c7ab23d

v0.1.0

First public release of video-recap-skills — a Claude Code plugin that turns any video into a Chinese-narration recap, running on just ffmpeg and one Xiaomi MiMo API key (no GPU, no model downloads; macOS / Linux / Windows).

Highlights

One key, whole pipeline. ASR (mimo-v2.5-asr), VLM (mimo-v2.5), and TTS (mimo-v2.5-tts) all go through Xiaomi MiMo.
Research-first understanding. Story/character research feeds the VLM, so it names people on screen instead of "黑衣男子" ("man in black").
Agent writes, scripts execute. Five small independent skills + a thin orchestrator, communicating only via JSON/MP4 in a shared work_dir. An LLM review gate gives feedback before TTS.
Dynamic mix. Narration over a gap-fill–ducked original (the source/BGM swell back in the gaps, no dead air), with optional looped BGM and burned subtitles.
Cut mode. --edit-mode cut condenses a long video into a shorter narrated edit.
Multi-track timeline + optional 剪映 export. Assembly emits a backend-neutral timeline.json; --export-jianying writes an editable 剪映/JianYing draft (original clips, separate audio tracks, volume keyframes). Fully decoupled — the core render never depends on 剪映. Media is bundled by default so the draft opens on sandboxed macOS 剪映.

Requirements

ffmpeg on PATH, Python 3.10+ (standard library only — the pipeline needs no pip install)
A Xiaomi MiMo API key (MIMO_API_KEY)

Install

Ask Claude Code:

Install this plugin: https://github.com/worldwonderer/video-recap-skills

CI is green on Ubuntu / macOS / Windows. See the README to get started.
