Releases: vitormf/submatch

v0.7.0

04 Jun 16:22

What's new

New features

  • Per-segment cross-language scoring: each segment now detects its own audio language via Whisper, and cross-language scoring is activated per segment rather than all-or-nothing for the whole file. Dubbed or mixed-language files are therefore handled correctly even when only some segments are cross-language
  • Per-segment audio language in output: --verbose now shows asr[lang] for each segment; segment audio languages are also included in CSV and HTML reports
  • Language confidence gate: segments where Whisper reports low language confidence are excluded from audio language voting; when the audio is detected as an unsupported language (Basque, Filipino), submatch bails out early rather than wasting time on unreliable transcription
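
The per-segment behaviour described above can be pictured with a small sketch. The `Segment` type, the function names, and the 0.5 confidence cutoff are assumptions made for illustration; the release notes do not state the actual cutoff or internal structure.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Segment:
    lang: str          # language Whisper detected for this segment
    confidence: float  # Whisper's probability for that language guess


MIN_LANG_CONFIDENCE = 0.5  # assumed cutoff, not taken from submatch


def scoring_mode(segment: Segment, sub_lang: str) -> str:
    """Choose the scorer per segment instead of once per file."""
    return "semantic" if segment.lang != sub_lang else "token_f1"


def vote_audio_language(segments: List[Segment]) -> Optional[str]:
    """Vote on the file-level audio language, skipping low-confidence segments."""
    votes = Counter(
        s.lang for s in segments if s.confidence >= MIN_LANG_CONFIDENCE
    )
    if not votes:
        return None  # no segment was confident enough to vote
    return votes.most_common(1)[0][0]
```

With this shape, a dubbed file can have some segments scored by token F1 and others by semantic similarity, while the file-level language is decided only by segments that passed the gate.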

Bug fixes

  • Fixed crash on video containers that omit duration at the format level (e.g., raw MPEG-TS recordings)
  • Fixed audio candidate positions exceeding the audio track duration, which caused ffmpeg errors on recordings padded with video after audio ends
  • Fixed audio language voting to only count segments that pass the quality gate
  • Fixed segment_langs padding in cache store when fewer segments were transcribed than expected
  • Fixed quality gate not applying to the --no-cache transcription voting path
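
One way to keep candidate positions inside the audio track is sketched below. Whether submatch clamps late candidates or drops them is not stated, so the clamping strategy and the function name are assumptions.

```python
from typing import List


def clamp_candidates(
    starts: List[float], segment_len: float, audio_duration: float
) -> List[float]:
    """Return start times whose full window fits inside the audio track."""
    latest_start = audio_duration - segment_len
    if latest_start < 0:
        return []  # the audio is shorter than a single segment
    # Pull late candidates back to the last valid start position.
    clamped = [min(max(s, 0.0), latest_start) for s in starts]
    # Remove duplicates created by clamping while keeping the original order.
    return list(dict.fromkeys(clamped))
```

For a recording whose video runs past the end of the audio, candidates derived from the video duration are pulled back so that ffmpeg is never asked to seek beyond the audio.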

v0.6.1

03 Jun 12:58

Bug fixes

  • Fix crash on embedded image track extraction — when ffmpeg failed to extract a VOBSUB or PGS subtitle track (e.g. corrupted stream, unsupported mux format), the CalledProcessError propagated and aborted extraction for all remaining tracks on that video. The fix catches the error per-track and skips the failing one, so other tracks continue normally. Text tracks (SRT, ASS) are unaffected.
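
The per-track error handling described above follows a common pattern, sketched here. `extract_tracks` and the `extract_one` callback are hypothetical names, not submatch's actual functions; only `subprocess.CalledProcessError` comes from the release note.

```python
import subprocess
from typing import Callable, Dict, Iterable


def extract_tracks(
    track_ids: Iterable[int], extract_one: Callable[[int], str]
) -> Dict[int, str]:
    """Extract each track independently; a failing track is skipped, not fatal."""
    extracted: Dict[int, str] = {}
    for track_id in track_ids:
        try:
            extracted[track_id] = extract_one(track_id)
        except subprocess.CalledProcessError as err:
            # Only this track is skipped; the remaining tracks are still attempted.
            print(f"skipping track {track_id}: ffmpeg exited with {err.returncode}")
    return extracted
```

Placing the `try` inside the loop rather than around it is what keeps one corrupted stream from aborting the rest.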

v0.6.0

03 Jun 09:00

What's new

New features

  • Image-based subtitle support (VOBSUB/PGS): submatch can now score bitmap subtitle formats. pytesseract is bundled with pip install submatch; only the Tesseract engine binary needs to be installed separately. If Tesseract is missing when an image subtitle is processed, submatch exits with code 2 and prints installation instructions.
  • Cross-language threshold now defaults to 0.20: The --cross-threshold default has been recalibrated from 0.35 to 0.20, based on empirical data showing true positive cross-language pairs typically score 0.24–0.49 while false positives peak at 0.18. Use --cross-threshold to override.
  • Lazy sync: ffsubsync now runs only when the initial score is FAIL, cutting runtime for passing pairs.
  • GPU mismatch detection: warns when CPU-only PyTorch is installed on a machine with an NVIDIA GPU, with instructions for installing the CUDA-enabled build.
  • Crash telemetry: pipeline errors are reported to Sentry to help improve reliability. No file paths or personal data are transmitted. Opt out with SUBMATCH_NO_TELEMETRY=1 or telemetry = false in config.

Bug fixes

  • Audio language detection: plurality rule now accepts ≥50% (was >50%), fixing edge cases where the correct audio language was rejected in content with mixed-language segments (e.g. segments that confuse Whisper into tagging parts as a different language).
  • Temp file cleanup: resync temp files are cleaned up on copy failure.
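
The difference between the old (>50%) and new (≥50%) plurality rules can be shown with a short sketch. The function name and the `inclusive` switch are illustrative assumptions.

```python
from collections import Counter
from typing import List, Optional


def pick_audio_lang(segment_langs: List[str], inclusive: bool = True) -> Optional[str]:
    """Return the leading language if its share of segments clears 50%."""
    if not segment_langs:
        return None
    lang, count = Counter(segment_langs).most_common(1)[0]
    share = count / len(segment_langs)
    # inclusive=True models the new rule (>= 50%); False models the old one (> 50%).
    accepted = share >= 0.5 if inclusive else share > 0.5
    return lang if accepted else None
```

With four segments tagged `en, en, pt, es`, the old rule rejects English at exactly 50% and the new rule accepts it. In an exact 50/50 tie this sketch returns whichever language was seen first; how submatch breaks such ties is not stated.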

Improvements

  • Embedded subtitle tracks are extracted in a single ffmpeg pass (faster batch processing).
  • Telemetry is automatically disabled on editable installs to avoid sending development errors to production.

v0.5.0

01 Jun 21:14

What's new

New features

  • Persistent transcription cache — Whisper transcriptions are now saved to ~/.cache/submatch/ keyed by video path, modification time, model, and segment count. Repeated runs on the same video skip audio extraction and Whisper entirely, making it fast to test multiple subtitles against the same video.
  • Audio-driven segment selection — Segments are now chosen using ffmpeg silencedetect to locate speech-rich regions, independent of any subtitle file. This lets the cache work across all subtitle files tested against the same video.
  • Transcription quality gate — After each Whisper call, segments are validated (no_speech_prob < 0.6, word count ≥ 3). If a candidate fails (silence, music, noise), the next candidate in the zone is tried automatically. The best available candidate is used as a fallback if all fail.
  • --no-cache — Bypass the cache entirely and use the original subtitle-driven segment selection for a single run.
  • --clear-cache — Delete all cached transcriptions and exit.
  • Cache configuration — Three new config keys: cache_ttl_days (default: 30), cache_max_mb (default: 200), cache_dir (default: ~/.cache/submatch). Cache is automatically evicted by TTL then LRU when limits are exceeded.
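
The transcription quality gate above can be sketched as follows. The thresholds (`no_speech_prob < 0.6`, at least 3 words) come from the release note; the `Transcript` type, the whitespace word count, and ranking the fallback by lowest `no_speech_prob` are assumptions.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Transcript:
    text: str
    no_speech_prob: float


def passes_gate(t: Transcript) -> bool:
    """Gate from the release note: likely speech and at least three words."""
    return t.no_speech_prob < 0.6 and len(t.text.split()) >= 3


def first_usable(candidates: Iterable[Transcript]) -> Optional[Transcript]:
    """Return the first candidate that passes; otherwise the best failing one."""
    best: Optional[Transcript] = None
    for t in candidates:
        if passes_gate(t):
            return t
        # Assumed fallback ranking: prefer the candidate most likely to be speech.
        if best is None or t.no_speech_prob < best.no_speech_prob:
            best = t
    return best
```

If `candidates` is a generator that transcribes lazily, Whisper is only invoked for as many candidates as it takes to find one that passes.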

Bug fixes

  • Language detection across zones now requires a strict majority (>50% of zones) before setting audio_lang, preventing a false cross_language flag when some zones hit music or noise.
  • Cache hits are now returned correctly even when the last_used write-back fails (e.g. read-only filesystem).

v0.4.0

31 May 15:23

What's new

New features

  • --json FILE, --csv FILE, --html FILE: write results to JSON, CSV, or self-contained HTML report files. Breaking change: --json previously printed JSON to stdout; it now requires a file path. Update scripts from --json to --json output.json.
  • --embedded: score subtitle tracks embedded in the video container (MKV, MP4, etc.) without needing external SRT files
  • --watch: monitor a directory for new video/subtitle pairs and score them as they appear; --poll and --interval for network mounts (NFS, SMB)
  • Config file support: set persistent defaults in ~/.config/submatch/config.toml or ./submatch.toml

Bug fixes

  • Terminate child process groups (ffmpeg, ffs) on Ctrl+C to prevent orphan processes
  • Fix config file validation for --model / --device choices and sub_lang string values
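
Terminating a whole process group on Ctrl+C is typically done along these lines on POSIX systems. This is a generic sketch, not submatch's implementation; the function names and the 5-second grace period are assumptions, and `os.killpg` is not available on Windows.

```python
import os
import signal
import subprocess
from typing import List


def run_in_own_group(cmd: List[str]) -> subprocess.Popen:
    # start_new_session=True makes the child the leader of a new process group,
    # so the child and anything it spawns can be signalled together.
    return subprocess.Popen(cmd, start_new_session=True)


def terminate_group(proc: subprocess.Popen, timeout: float = 5.0) -> None:
    """Send SIGTERM to the child's process group, escalating to SIGKILL."""
    try:
        os.killpg(os.getpgid(proc.pid), signal.SIGTERM)
    except ProcessLookupError:
        pass  # the process has already gone away
    try:
        proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        os.killpg(os.getpgid(proc.pid), signal.SIGKILL)
        proc.wait()


def run_interruptibly(cmd: List[str]) -> int:
    """Run cmd; on Ctrl+C, clean up the whole group before re-raising."""
    proc = run_in_own_group(cmd)
    try:
        return proc.wait()
    except KeyboardInterrupt:
        terminate_group(proc)
        raise
```

Because the child runs in its own session, it no longer receives the terminal's SIGINT directly, which is why the parent has to forward the termination itself.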

v0.3.0

29 May 18:48

What's new

New features

  • Fractional progress bar updates per segment and dynamic terminal resize support
  • Transcription caching to skip re-transcribing already-processed segments
  • ISO 639-2 language code normalisation
  • Batch report headers showing source directory and pair count

Improvements

  • Cross-language subtitle matching using multilingual sentence embeddings (paraphrase-multilingual-MiniLM-L12-v2)

v0.2.0

29 May 00:19

What's new

New features

  • Flexible input: pass any mix of video files, subtitle files, and directories — submatch auto-pairs them
  • --no-recursive flag to disable recursive directory scanning (directories are scanned recursively by default)
  • ffmpeg is now bundled via static-ffmpeg — no system ffmpeg install required
  • --drift-threshold flag to control how many seconds of offset trigger a drift warning (default: 2.0)

Bug fixes

  • Chinese, Japanese, and Korean subtitles now score correctly (character-level tokenization)
  • Unknown file types (.DS_Store, .nfo, images) are no longer misclassified as video when scanning directories
  • Spurious "no subtitles found" warnings are suppressed when inputs come from directory scans
  • UTF-8 output on Windows no longer crashes with UnicodeEncodeError when piped
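
Character-level tokenization for Chinese, Japanese, and Korean text can be sketched as below. The Unicode ranges and the `tokenize` name are assumptions chosen for illustration; submatch's actual tokenizer may cover different ranges.

```python
import re
from typing import List

# Han ideographs, Hiragana + Katakana, and Hangul syllables (assumed coverage).
_CJK = r"\u4e00-\u9fff\u3040-\u30ff\uac00-\ud7af"

# Either one CJK character, or a run of word characters that are not CJK.
_TOKEN = re.compile(rf"[{_CJK}]|[^\W_{_CJK}]+")


def tokenize(text: str) -> List[str]:
    """Split CJK text per character and other scripts per word."""
    return _TOKEN.findall(text.lower())
```

Whitespace-based splitting would treat an unspaced Chinese or Japanese sentence as a single token, so any small transcription difference would zero out the overlap; per-character tokens avoid that.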

Improvements

  • Batch mode now defaults to at most 4 parallel workers, regardless of device

Install / upgrade

pip install --upgrade submatch

v0.1.0 — Initial release

28 May 13:46

`submatch` verifies that a subtitle file actually matches the audio content of a video — catching the case where subtitle tools like subliminal or Bazarr return correctly-timed but wrong-content subtitles.

Install

```bash
pip install submatch
```

System dependencies: `ffmpeg` (`brew install ffmpeg`) and `ffsubsync` (`pip install ffsubsync`).

What's in this release

Core

  • Transcribes short audio segments with Whisper and scores against subtitle text using token F1
  • Dialogue-density segment sampling — picks the 30s windows with the most subtitle words per zone, skipping intros/credits
  • Timing drift detection via ffsubsync, flagging offsets > 2s
  • Three language signals: Whisper audio language, langdetect on subtitle text, filename convention + ffprobe metadata
  • 4-state result system: PASS, DRIFT (content matches but timing drift detected), FAIL (wrong content), UNSURE (insufficient transcription data)
  • --resync: auto-correct drift in place on DRIFT; --pass-unsure: exit 0 for UNSURE results
  • --keep-synced: save the timing-corrected subtitle to disk; --delete-failures: remove subtitle files that fail the match check
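
Token F1, the scoring metric named in the list above, is the harmonic mean of token precision and recall. A minimal sketch follows; the function name and the choice of which side is the reference are assumptions, and tokenization is left to the caller.

```python
from collections import Counter
from typing import Sequence


def token_f1(reference: Sequence[str], hypothesis: Sequence[str]) -> float:
    """Harmonic mean of precision and recall over multiset token overlap."""
    if not reference or not hypothesis:
        return 0.0
    # Counter intersection keeps the minimum count of each shared token.
    overlap = sum((Counter(reference) & Counter(hypothesis)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(hypothesis)
    recall = overlap / len(reference)
    return 2 * precision * recall / (precision + recall)
```

For example, comparing the subtitle tokens `we need to go now` with the transcript tokens `we need to leave` gives 3 shared tokens, precision 3/4, recall 3/5, and an F1 of 2/3.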

Cross-language matching

When subtitle and audio languages differ (e.g. English audio + Portuguese subtitles), scoring automatically switches from token F1 to multilingual semantic similarity via paraphrase-multilingual-MiniLM-L12-v2. Use --cross-threshold to tune the cutoff independently.
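
Semantic similarity between sentence embeddings is usually measured with cosine similarity, sketched here on plain vectors. In practice the vectors would come from encoding the transcript and the subtitle text with the model named above (for instance through the sentence-transformers library); how submatch aggregates similarities across segments, and whether the comparison is `>=` or `>`, are assumptions.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def is_cross_language_match(
    asr_vec: Sequence[float], sub_vec: Sequence[float], cross_threshold: float
) -> bool:
    """Compare the similarity against the value given to --cross-threshold."""
    return cosine_similarity(asr_vec, sub_vec) >= cross_threshold
```

Because multilingual embeddings place translations of the same sentence close together, this comparison works even when the transcript and the subtitle share no tokens.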

Batch mode

  • Pair a directory of videos with their same-stem subtitles, or score one video against a subtitle directory
  • --recursive / -r for Plex/Kodi/Jellyfin nested library layouts
  • --sub-lang CODE to filter by language tag (e.g. pt, en, pt-BR)
  • --filter GLOB to filter by filename pattern
  • --workers for parallel processing; --device to target CPU, MPS (Apple Silicon), or CUDA
  • Live progress with ETA, in-place result lines, and --compact one-line-per-pair summary
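
Same-stem pairing can be sketched with `pathlib`. The extension sets, the function name, and the handling of language suffixes such as `Movie.pt-BR.srt` are assumptions; submatch's real pairing rules may differ.

```python
from pathlib import Path
from typing import Iterable, List, Tuple

VIDEO_EXTS = {".mkv", ".mp4"}                 # assumed subset of video types
SUB_EXTS = {".srt", ".vtt", ".ass", ".ssa"}   # formats listed in these notes


def pair_by_stem(paths: Iterable[Path]) -> List[Tuple[Path, Path]]:
    """Pair each subtitle with the video that shares its filename stem."""
    paths = list(paths)
    videos = {p.stem: p for p in paths if p.suffix.lower() in VIDEO_EXTS}
    pairs = []
    for sub in paths:
        if sub.suffix.lower() not in SUB_EXTS:
            continue
        # Try "Movie.pt-BR" first, then strip dotted suffixes down to "Movie".
        stem = sub.stem
        while stem:
            if stem in videos:
                pairs.append((videos[stem], sub))
                break
            stem = stem.rpartition(".")[0]
    return pairs
```

Stripping dotted suffixes lets language-tagged subtitles pair with their video, at the cost of possible false pairs when a video stem is a prefix of an unrelated dotted filename.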

Subtitle formats

SRT, WebVTT, and ASS/SSA — via pysubs2.

Output

Human-readable with ANSI colour, or --json for machine-readable output. Transcription results are cached per video so re-runs against a different subtitle skip re-transcription.

States and exit codes

| State | Meaning | Exit code |
|---|---|---|
| PASS | Content matches, no timing drift | 0 |
| DRIFT | Content matches, but timing drift detected | 1 (use --resync to fix in place) |
| FAIL | Content does not match | 1 |
| UNSURE | Not enough transcription data to decide | 1 (use --pass-unsure to exit 0) |
| Error | Missing dependency, unreadable file, no audio track | 2 |
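
The mapping from result state to exit code can be written out as a short sketch. The `State` enum and `exit_code` function are illustrative names, not part of submatch; errors (exit code 2) are assumed to be raised separately and are not modelled, nor is any effect of --resync on the exit code.

```python
from enum import Enum


class State(Enum):
    PASS = "PASS"
    DRIFT = "DRIFT"
    FAIL = "FAIL"
    UNSURE = "UNSURE"


def exit_code(state: State, pass_unsure: bool = False) -> int:
    """Map a result state to the process exit code listed in the table."""
    if state is State.PASS:
        return 0
    if state is State.UNSURE and pass_unsure:
        return 0  # behaviour of --pass-unsure
    return 1      # DRIFT, FAIL, and UNSURE all share exit code 1
```

Since DRIFT, FAIL, and UNSURE all exit with 1, a calling script that needs to tell them apart has to read the machine-readable output rather than rely on the exit code alone.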