v1.4.0

@yibie released this 23 Jun 08:40

v1.4.0 — Major Defuddle Upgrade

What's New

YouTube Transcript Extraction — Save a YouTube video page and get the full transcript with timestamps, speaker-turn detection, and chapter markers, all in clean Org-mode format.

26 Site-Specific Extractors — Defuddle now has specialized extractors for:

  • Video: YouTube (transcript!), Bilibili
  • AI Chat: ChatGPT, Claude, Gemini, Grok
  • Social: Twitter/X, Reddit (threaded comments), Hacker News, LinkedIn, Bluesky, Mastodon, Threads
  • Articles: Wikipedia, Medium, Substack, NY Times, GitHub (README/Issues/PRs), LeetCode
  • Forums: Discourse, LWN.net

When no extractor matches, the general extraction algorithm is significantly more aggressive at removing clutter than Readability.

New Setting: Language for Transcriptions & Extraction — Set a BCP 47 language code (e.g. zh-Hans, en, fr) in Options. This selects YouTube subtitle tracks and sets Accept-Language headers for site extractors.
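
To illustrate how a single BCP 47 preference could drive both behaviours, here is a minimal sketch. It is not the project's code: `CaptionTrack`, `pickCaptionTrack`, and `acceptLanguage` are hypothetical names, and the fallback order (exact tag, then primary language, then first track) is an assumption.

```typescript
// Hypothetical shape of a subtitle track; the real InnerTube response differs.
interface CaptionTrack {
  languageCode: string; // BCP 47 tag, e.g. "zh-Hans"
  url: string;
}

// Primary language subtag of a BCP 47 tag: "zh-Hans" -> "zh".
function primarySubtag(tag: string): string {
  return tag.toLowerCase().split("-", 1)[0] ?? "";
}

// Assumed fallback order: exact tag (case-insensitive), then same primary
// language, then the first available track.
function pickCaptionTrack(
  tracks: CaptionTrack[],
  preferred: string,
): CaptionTrack | undefined {
  const want = preferred.toLowerCase();
  return (
    tracks.find((t) => t.languageCode.toLowerCase() === want) ??
    tracks.find((t) => primarySubtag(t.languageCode) === primarySubtag(want)) ??
    tracks[0]
  );
}

// Build an Accept-Language value: the full tag first, then the bare
// language with a lower weight when the tag has extra subtags.
function acceptLanguage(preferred: string): string {
  const primary = primarySubtag(preferred);
  return primary === preferred.toLowerCase()
    ? preferred
    : `${preferred}, ${primary};q=0.8`;
}
```

With a preference of `fr`, this sketch would pick an `fr-CA` track when no plain `fr` track exists, and would send `fr` unchanged as the header value.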

How It Works

Previously, Defuddle was called synchronously and without a URL, so none of the site-specific extractors ever triggered. The integration now:

  1. Passes document.URL so Defuddle knows what site it's on
  2. Calls parseAsync() to allow API-fetched content (YouTube transcripts via InnerTube API)
  3. Passes the user's language preference for subtitle selection
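
The three steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's actual code: the interfaces are minimal stand-ins so the block compiles without the real package, `extractPage` and `ExtractorFactory` are hypothetical names, and `language` as an option name is an assumption. Only `document.URL` and `parseAsync()` come from the notes above.

```typescript
// Minimal stand-ins so the sketch is self-contained.
interface ExtractOptions {
  url: string;       // step 1: lets site-specific extractors match the page
  language?: string; // step 3: hypothetical option name for the BCP 47 preference
}

interface ExtractResult {
  title: string;
  content: string;
}

interface Extractor {
  // step 2: async entry point, so extractors may fetch extra content
  parseAsync(): Promise<ExtractResult>;
}

type ExtractorFactory = (
  doc: { URL: string },
  options: ExtractOptions,
) => Extractor;

// Hypothetical wrapper: builds the options from the page and the user's
// setting, then awaits the asynchronous parse.
async function extractPage(
  doc: { URL: string },
  language: string | undefined,
  createExtractor: ExtractorFactory,
): Promise<ExtractResult> {
  const options: ExtractOptions = { url: doc.URL };
  if (language) {
    options.language = language;
  }
  return createExtractor(doc, options).parseAsync();
}
```

In a real page context the factory would presumably wrap the Defuddle constructor; it is injected here so the flow can be exercised with a fake extractor.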

Full Changes

  • Upgrade defuddle 0.6 → 0.19 (self-contained, no extra dependencies)
  • All 26 site extractors now active
  • YouTube InnerTube API transcript fetching with speaker diarization
  • New options page field: Language for Transcriptions & Extraction
  • parseAsync() pipeline for async content fetching
  • Add site extractor table to README
  • Comprehensive README rewrite with feature documentation