video-transcript turns public video URLs into structured transcript bundles, bilingual lecture notes, visual summaries, and optional short-form newsroom-style video assets.
Source files: examples/skhynix_warroom_v5_en_shorts.mp4 · examples/skhynix_warroom_v5_en_shorts_preview.mp4
It is built around a simple priority order:
- Reuse the best available subtitles first.
- Fall back to Whisper only when subtitles are missing.
- Preserve timestamps so the transcript stays referenceable.
- Split the transcript into sections and export publishable artifacts.
- Optionally hand the result to Pixelle for short-video rendering.
- Extracts transcripts from public video URLs supported by yt-dlp
- Prefers human subtitles, then auto subtitles, then audio transcription
- Preserves timestamps and emits structured transcript JSON
- Splits long transcripts into logical sections
- Captures section images from the source video with ffmpeg
- Falls back to OpenAI image generation when frames are unavailable
- Exports clean Markdown and Word (.docx) documents
- Supports bilingual output when the source is not Chinese
- Builds newsroom-style frame payloads for downstream Pixelle rendering
Anything supported by yt-dlp, including common workflows for:
- YouTube
- Bilibili
- TikTok / Douyin
- Vimeo
- TED
- Coursera
.
├── examples/
│ ├── demo_player.html
│ ├── skhynix_warroom_v5_en_shorts_preview.mp4
│ └── skhynix_warroom_v5_en_shorts.mp4
├── README.md
├── SKILL.md
├── references/
│ └── platforms_and_formats.md
├── scripts/
│ ├── capture_frames.py
│ ├── extract_transcript.py
│ ├── generate_docx.py
│ ├── newsroom_story_builder.py
│ ├── pixelle_end_to_end.py
│ ├── section_splitter.py
│ └── whisper_api.py
└── tests/
├── conftest.py
├── test_newsroom_story_builder.py
└── test_pixelle_end_to_end.py
scripts/extract_transcript.py tries, in order:
- Manual subtitles
- Auto-generated subtitles
- Audio download + Whisper transcription
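The fallback chain above can be sketched in a few lines. The helper names (`fetch_manual_subs`, `fetch_auto_subs`, `whisper_transcribe`) are hypothetical stand-ins for the script's real internals, not its actual API:

```python
from typing import Callable, Optional

def extract_transcript(url: str,
                       fetch_manual_subs: Callable[[str], Optional[str]],
                       fetch_auto_subs: Callable[[str], Optional[str]],
                       whisper_transcribe: Callable[[str], Optional[str]]) -> str:
    """Try each transcript source in priority order; return the first non-empty result."""
    for source in (fetch_manual_subs, fetch_auto_subs, whisper_transcribe):
        text = source(url)
        if text:
            return text
    raise RuntimeError(f"No transcript available for {url}")
```

The point of the ordering is cost and fidelity: human subtitles are the cheapest and most accurate, so transcription is attempted only as a last resort.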
Typical outputs include:
- metadata.json
- raw_transcript.txt
- timestamped_transcript.json
- sections.json
- transcript_enhanced.md
scripts/section_splitter.py groups transcript segments into topic-level sections using pauses, transitions, and size heuristics.
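A minimal sketch of the pause-and-size heuristic, assuming segments carry `start`, `end`, and `text` keys; the threshold values here are illustrative, not the script's defaults:

```python
def split_on_pauses(segments, pause_threshold=3.0, max_section_len=20):
    """Group timestamped segments into sections.

    A new section starts when the silence gap between consecutive segments
    exceeds pause_threshold seconds, or when a section grows past
    max_section_len segments.
    """
    sections = []
    current = []
    for seg in segments:
        if current:
            gap = seg["start"] - current[-1]["end"]
            if gap > pause_threshold or len(current) >= max_section_len:
                sections.append(current)
                current = []
        current.append(seg)
    if current:
        sections.append(current)
    return sections
```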
scripts/capture_frames.py captures frames near section timestamps with ffmpeg. If frame extraction is not possible and OPENAI_API_KEY is available, it can fall back to AI-generated concept images.
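As an illustration of the frame-capture step, a sketch of how one might shell out to ffmpeg for a single frame; the function names and flags chosen here are one reasonable approach, not the script's actual implementation:

```python
import subprocess
from pathlib import Path

def frame_capture_cmd(video_path: str, timestamp: float, out_path: str) -> list[str]:
    """Build an ffmpeg command that grabs one frame near `timestamp` (seconds)."""
    return [
        "ffmpeg", "-y",
        "-ss", f"{timestamp:.2f}",   # seek to the section timestamp before decoding
        "-i", video_path,
        "-frames:v", "1",            # emit a single frame
        "-q:v", "2",                 # high JPEG quality
        out_path,
    ]

def capture_frame(video_path: str, timestamp: float, out_path: str) -> bool:
    """Return False instead of raising when ffmpeg fails, so the caller
    can fall back to AI-generated concept images."""
    result = subprocess.run(frame_capture_cmd(video_path, timestamp, out_path),
                            capture_output=True)
    return result.returncode == 0 and Path(out_path).exists()
```

Returning a boolean rather than raising keeps the fallback decision (real frame vs. generated image) in the caller's hands.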
scripts/generate_docx.py converts the enhanced Markdown output into a professionally formatted .docx document with headings, metadata, and embedded images.
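The Markdown-to-.docx mapping can be pictured as a two-stage pipeline: parse the enhanced Markdown into typed blocks, then replay each block as a python-docx call (`add_heading`, `add_paragraph`, `add_picture`). A stdlib-only sketch of the parsing half, under the assumption of simple ATX headings and image links:

```python
def parse_markdown_blocks(md_text: str):
    """Split enhanced Markdown into (kind, text) blocks that map naturally
    onto python-docx calls: heading1/heading2 -> add_heading,
    image -> add_picture, paragraph -> add_paragraph."""
    blocks = []
    for line in md_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("## "):
            blocks.append(("heading2", line[3:]))
        elif line.startswith("# "):
            blocks.append(("heading1", line[2:]))
        elif line.startswith("![") and "](" in line:
            # Image reference: ![alt](path) -- keep only the path
            path = line.split("](", 1)[1].rstrip(")")
            blocks.append(("image", path))
        else:
            blocks.append(("paragraph", line))
    return blocks
```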
scripts/newsroom_story_builder.py converts transcript content into newsroom-style cards.
scripts/pixelle_end_to_end.py reuses the transcript bundle, prepares story frames, and can pass the result to a sibling Pixelle-Video repo for rendering.
Python 3.11+ is recommended.
Install Python dependencies:
```bash
python3 -m pip install yt-dlp python-docx openai
python3 -m pip install openai-whisper
```

System dependencies:

```bash
brew install ffmpeg
yt-dlp --version
ffmpeg -version
```

Notes:
- openai-whisper is only needed when subtitles are unavailable and you want local transcription.
- openai is needed for Whisper API usage and the image-generation fallback.
- Keep credentials out of the repository. Use environment variables only.
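The credentials rule can be enforced mechanically: read keys from the environment at startup and fail fast when they are missing. A minimal sketch (the helper name is illustrative):

```python
import os

def require_env(name: str) -> str:
    """Fetch a required credential from the shell environment, never from source."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"Set {name} in your shell environment, e.g. export {name}=...")
    return value
```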
Pixelle-Video is not bundled with this repository and is not installed automatically when someone clones this repo or installs the skill.
If you only need transcript extraction, section splitting, Markdown export, or .docx export, you can ignore Pixelle entirely.
If you want to run scripts/pixelle_end_to_end.py, prepare Pixelle-Video separately:
```text
# Example workspace layout
workspace/
├── video-transcript/
└── Pixelle-Video/
```

Recommended setup steps:

```bash
# 1. Create a workspace directory
mkdir -p workspace
cd workspace

# 2. Clone this repository
git clone https://github.com/ylouis83/video_transcript.git

# 3. Clone or place Pixelle-Video beside it
#    (replace <pixelle-video-repo> with your actual Pixelle-Video source)
git clone <pixelle-video-repo> Pixelle-Video

# 4. Install Pixelle-Video using its own README / setup instructions
cd Pixelle-Video
# ...install Pixelle-Video dependencies here...
```

By default, scripts/pixelle_end_to_end.py looks for a sibling directory named Pixelle-Video.
If your Pixelle checkout lives somewhere else, pass it explicitly:
```bash
python3 scripts/pixelle_end_to_end.py \
  "https://www.youtube.com/watch?v=VIDEO_ID" \
  --output-dir ./output/pixelle_demo \
  --story-mode newsroom \
  --pixelle-repo /absolute/path/to/Pixelle-Video
```

Extract a transcript bundle:
```bash
python3 scripts/extract_transcript.py "https://www.youtube.com/watch?v=VIDEO_ID" \
  --output-dir ./output/example \
  --no-images
```

Generate a Word document from an enhanced Markdown transcript:
```bash
python3 scripts/generate_docx.py \
  --input ./output/example/transcript_enhanced.md \
  --output ./output/example/transcript.docx \
  --title "Transcript" \
  --base-dir ./output/example
```

Run the Pixelle-oriented end-to-end flow:
```bash
python3 scripts/pixelle_end_to_end.py \
  "https://www.youtube.com/watch?v=VIDEO_ID" \
  --output-dir ./output/pixelle_demo \
  --story-mode newsroom
```

If Pixelle-Video lives beside this repository, the default path works automatically. Otherwise pass --pixelle-repo /path/to/Pixelle-Video.
Direct demo files:
- examples/demo_player.html
- examples/skhynix_warroom_v5_en_shorts_preview.mp4
- examples/skhynix_warroom_v5_en_shorts.mp4
The pipeline is designed to produce reusable transcript bundles rather than a single text file. Depending on the flags and source quality, the output directory may include:
- metadata.json
- raw_transcript.txt
- timestamped_transcript.json
- sections.json
- transcript_enhanced.md
- images/
- .docx exports
- Pixelle payload and final rendered video artifacts
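Because the bundle is plain JSON and text, downstream tools can reuse it directly. The exact JSON shape is internal to the scripts; assuming timestamped_transcript.json holds a list of `{"start", "end", "text"}` segments, reuse might look like:

```python
import json
from pathlib import Path

def load_segments(bundle_dir: str):
    """Load the timestamped transcript from an output bundle.
    Assumes a list of {"start", "end", "text"} objects."""
    path = Path(bundle_dir) / "timestamped_transcript.json"
    return json.loads(path.read_text(encoding="utf-8"))

def slice_by_time(segments, start_s: float, end_s: float):
    """Return the segments that overlap the [start_s, end_s) window."""
    return [s for s in segments if s["end"] > start_s and s["start"] < end_s]
```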
The repository includes focused tests for the newsroom formatter and Pixelle helper logic:
```bash
python3 -m pytest tests
```

If pytest is not installed in your current environment:

```bash
python3 -m pip install pytest
```

- Do not commit API keys, tokens, or generated deliverables.
- Keep local virtual environments and caches ignored.
- OPENAI_API_KEY should be provided via the shell environment, not hardcoded.
- Pixelle-Video should be installed separately and kept out of this repository unless you intentionally choose a monorepo layout.
- Public uploads should include only source files, references, tests, and docs.
No license file is included yet. Add one before wider third-party reuse.