v1.4.0 — Major Defuddle Upgrade
What's New
YouTube Transcript Extraction — Save a YouTube video page and get the full transcript with timestamps, speaker-turn detection, and chapter markers, all in clean Org-mode format.
26 Site-Specific Extractors — Defuddle now has specialized extractors for:
- Video: YouTube (transcript!), Bilibili
- AI Chat: ChatGPT, Claude, Gemini, Grok
- Social: Twitter/X, Reddit (threaded comments), Hacker News, LinkedIn, Bluesky, Mastodon, Threads
- Articles: Wikipedia, Medium, Substack, NY Times, GitHub (README/Issues/PRs), LeetCode
- Forums: Discourse, LWN.net
When no extractor matches, the general extraction algorithm is significantly more aggressive at removing clutter than Readability.
New Setting: Language for Transcriptions & Extraction — Set a BCP 47 language code (e.g. zh-Hans, en, fr) in Options. This selects YouTube subtitle tracks and sets Accept-Language headers for site extractors.
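As a rough illustration of how a BCP 47 preference could drive subtitle selection and the Accept-Language header, here is a minimal sketch. It is not Defuddle's actual logic: the `CaptionTrack` shape, the `pickTrack` and `acceptLanguage` helpers, and the q-values are all hypothetical, and the fallback order (exact tag, shorter prefixes, same primary language, first track) is just one reasonable choice.

```typescript
// Hypothetical shape of a subtitle track; real InnerTube responses differ.
interface CaptionTrack {
  languageCode: string;
}

// Pick the track that best matches a BCP 47 preference:
// exact match first, then progressively shorter prefixes (zh-Hans-CN -> zh-Hans -> zh),
// then any track sharing the primary language subtag, else the first track.
function pickTrack(tracks: CaptionTrack[], preferred: string): CaptionTrack | undefined {
  if (tracks.length === 0) return undefined;
  const norm = (s: string) => s.toLowerCase();
  const parts = norm(preferred).split("-");
  for (let n = parts.length; n > 0; n--) {
    const candidate = parts.slice(0, n).join("-");
    const hit = tracks.find((t) => norm(t.languageCode) === candidate);
    if (hit) return hit;
  }
  const primary = parts[0];
  const sameLanguage = tracks.find((t) => norm(t.languageCode).split("-")[0] === primary);
  return sameLanguage ?? tracks[0];
}

// Build an Accept-Language value: the preferred tag, its primary subtag, then a wildcard.
// The q-values here are illustrative assumptions.
function acceptLanguage(preferred: string): string {
  const primary = preferred.split("-")[0];
  return primary === preferred
    ? `${preferred}, *;q=0.5`
    : `${preferred}, ${primary};q=0.9, *;q=0.5`;
}
```

With a preference of zh-Hans and tracks for en and zh, this sketch falls back to the zh track rather than en, which is the behavior most users would likely expect from a language setting.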
How It Works
Previously, Defuddle was called synchronously without a URL, so none of the site-specific extractors ever triggered. The integration now:
- Passes document.URL so Defuddle knows what site it's on
- Calls parseAsync() to allow API-fetched content (YouTube transcripts via InnerTube API)
- Passes the user's language preference for subtitle selection
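The three steps above can be sketched roughly as follows. To keep the example self-contained, the extractor is injected through a hypothetical `ExtractorFactory` that stands in for constructing Defuddle. The option names `url` and `language`, the `ExtractResult` shape, and the `extractPage` helper are assumptions for illustration, so check the Defuddle documentation for the real option names.

```typescript
// Minimal structural types so the sketch compiles without the defuddle package.
// The option names `url` and `language` are assumptions for illustration.
interface ExtractOptions {
  url: string;
  language?: string;
}

interface ExtractResult {
  title: string;
  content: string;
}

interface AsyncExtractor {
  parseAsync(): Promise<ExtractResult>;
}

// Hypothetical factory standing in for the Defuddle constructor.
type ExtractorFactory<Doc> = (doc: Doc, options: ExtractOptions) => AsyncExtractor;

async function extractPage<Doc extends { URL: string }>(
  doc: Doc,
  makeExtractor: ExtractorFactory<Doc>,
  language?: string,
): Promise<ExtractResult> {
  // 1. Pass the page URL so a site-specific extractor can be matched.
  // 2. Pass the user's language preference, if one is set.
  const options: ExtractOptions = { url: doc.URL, ...(language ? { language } : {}) };
  // 3. Await the async parse so API-fetched content (e.g. transcripts) is included.
  return makeExtractor(doc, options).parseAsync();
}
```

In a real page context the first argument would be the global `document`, and the factory would wrap the actual Defuddle instance. Injecting it also makes the call site easy to exercise with a fake extractor.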
Full Changes
- Upgrade defuddle 0.6 → 0.19 (self-contained, no extra dependencies)
- All 26 site extractors now active
- YouTube InnerTube API transcript fetching with speaker diarization
- New options page field: Language for Transcriptions & Extraction
- parseAsync() pipeline for async content fetching
- Add site extractor table to README
- Comprehensive README rewrite with feature documentation