
v0.6.0


vitormf released this 03 Jun 09:00

What's new

New features

  • Image-based subtitle support (VOBSUB/PGS): submatch can now score bitmap subtitle formats. The pytesseract package is installed automatically by pip install submatch; only the Tesseract OCR engine binary must be installed separately. If Tesseract is missing when an image-based subtitle is processed, submatch exits with code 2 and prints installation instructions.
  • Cross-language threshold now defaults to 0.20: the --cross-threshold default has been lowered from 0.35 to 0.20. Empirical data showed that true-positive cross-language pairs typically score 0.24–0.49, while false positives peak at 0.18. The old default therefore rejected correct pairs scoring between 0.24 and 0.35. Pass --cross-threshold to override.
  • Lazy sync: ffsubsync now runs only when the initial verdict is FAIL, cutting runtime for pairs that already pass.
  • GPU mismatch detection: submatch now warns when a CPU-only PyTorch build is installed on a machine with an NVIDIA GPU, and prints instructions for installing the CUDA-enabled build.
  • Crash telemetry: pipeline errors are reported to Sentry to help improve reliability. No file paths or personal data are transmitted. Opt out by setting SUBMATCH_NO_TELEMETRY=1 or telemetry = false in the config file.
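The threshold change and lazy sync both hinge on one PASS/FAIL decision. The following is an illustrative sketch, not submatch's code. It assumes a pair passes when its score is at least the threshold; the exact comparison is not stated in these notes. It also treats sync as a hypothetical callable standing in for an ffsubsync run that returns a new score.

```python
from typing import Callable, Tuple

# Hypothetical sketch, not submatch's implementation.
DEFAULT_CROSS_THRESHOLD = 0.20  # v0.6.0 default (previously 0.35)


def verdict(score: float, threshold: float = DEFAULT_CROSS_THRESHOLD) -> str:
    # Assumption: a score equal to the threshold counts as a PASS.
    return "PASS" if score >= threshold else "FAIL"


def score_pair(initial_score: float, sync: Callable[[], float],
               threshold: float = DEFAULT_CROSS_THRESHOLD) -> Tuple[str, bool]:
    """Return (verdict, synced). `sync` is a hypothetical stand-in for ffsubsync."""
    if verdict(initial_score, threshold) == "PASS":
        return "PASS", False  # lazy: the expensive sync step is skipped
    resynced_score = sync()  # only reached when the initial verdict is FAIL
    return verdict(resynced_score, threshold), True


print(verdict(0.24, threshold=0.35))  # FAIL (old default)
print(verdict(0.24))                  # PASS (new default)
print(verdict(0.18))                  # FAIL
```

With the reported figures, a true positive scoring 0.24 fails under the old 0.35 default but passes at 0.20. A false positive at the 0.18 peak still fails.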
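A Tesseract preflight check of this kind could look roughly like the sketch below. It rests on several assumptions. Tesseract is located by looking for a tesseract executable on PATH. Image-based tracks are recognised by their ffprobe codec names: dvd_subtitle for VOBSUB and hdmv_pgs_subtitle for PGS. The name require_tesseract and the message text are hypothetical.

```python
import shutil
import sys

# Assumption: ffprobe codec names for VOBSUB and PGS tracks.
IMAGE_SUBTITLE_CODECS = {"dvd_subtitle", "hdmv_pgs_subtitle"}


def require_tesseract(codec_name: str) -> None:
    """Hypothetical preflight: exit with code 2 if OCR is needed but unavailable."""
    if codec_name not in IMAGE_SUBTITLE_CODECS:
        return  # text-based subtitles need no OCR
    if shutil.which("tesseract") is None:  # assumption: engine found via PATH
        print(
            "Tesseract OCR is required for image-based subtitles (VOBSUB/PGS).\n"
            "Install the engine, for example:\n"
            "  Debian/Ubuntu:     sudo apt install tesseract-ocr\n"
            "  macOS (Homebrew):  brew install tesseract",
            file=sys.stderr,
        )
        sys.exit(2)
```

A dedicated exit code lets batch scripts tell "OCR engine missing" apart from other failures.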
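Here is a minimal sketch of the GPU mismatch check. It assumes an NVIDIA GPU is inferred from nvidia-smi being on PATH, and that a CPU-only PyTorch build is recognised by torch.version.cuda being None. The names gpu_mismatch and check_torch_build are hypothetical. The comparison lives in a pure function so it can be tested on machines without a GPU or PyTorch.

```python
import shutil
from typing import Optional


def gpu_mismatch(torch_cuda_version: Optional[str], nvidia_smi_path: Optional[str]) -> bool:
    """True when an NVIDIA GPU appears present but PyTorch lacks CUDA support."""
    return nvidia_smi_path is not None and torch_cuda_version is None


def check_torch_build() -> None:
    """Hypothetical startup check; prints a warning instead of failing."""
    try:
        import torch
    except ImportError:
        return  # PyTorch not installed: nothing to compare
    # torch.version.cuda is None for CPU-only builds.
    # Assumption: an NVIDIA GPU is present if nvidia-smi is on PATH.
    if gpu_mismatch(torch.version.cuda, shutil.which("nvidia-smi")):
        print(
            "Warning: CPU-only PyTorch detected on a machine with an NVIDIA GPU.\n"
            "Install the CUDA-enabled build using the selector at "
            "https://pytorch.org/get-started/locally/"
        )
```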

Bug fixes

  • Audio language detection: the plurality rule now accepts a share of ≥50% (previously >50%). This fixes edge cases where the correct audio language was rejected in content with mixed-language segments, for example passages that lead Whisper to tag parts of the audio as a different language.
  • Temp file cleanup: resync temp files are now removed when copying the result fails.
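The old and new plurality rules differ only when the leading language has exactly a 50% share. This sketch assumes the input is a list of per-segment language tags, for example from Whisper. How submatch actually counts or weights segments is not stated in these notes, and plurality_language is a hypothetical name.

```python
from collections import Counter
from typing import List, Optional


def plurality_language(segment_langs: List[str], inclusive: bool = True) -> Optional[str]:
    """Return the most common tag if its share passes the 50% rule, else None.

    inclusive=True models the new >=50% rule; False models the old >50% rule.
    Assumption: every segment counts equally.
    """
    if not segment_langs:
        return None
    lang, count = Counter(segment_langs).most_common(1)[0]
    share = count / len(segment_langs)
    accepted = share >= 0.5 if inclusive else share > 0.5
    return lang if accepted else None


# Half the segments are Japanese; the rest are mis-tagged as two other languages.
tags = ["ja", "ja", "en", "ko"]
print(plurality_language(tags, inclusive=False))  # None (old rule rejects 50%)
print(plurality_language(tags))                    # ja (new rule accepts 50%)
```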
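A try/finally around the copy is one way to guarantee this cleanup. The sketch assumes the resynced subtitle is first written to a temporary file and then copied to its destination. The name install_resynced is hypothetical.

```python
import os
import shutil
import tempfile


def install_resynced(content: str, dest: str) -> None:
    """Hypothetical helper: write resynced subtitles to a temp file, then copy to dest."""
    fd, tmp_path = tempfile.mkstemp(suffix=".srt")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            tmp.write(content)
        shutil.copyfile(tmp_path, dest)  # may raise, e.g. if dest's directory is missing
    finally:
        # Runs on success and on failure, so the temp file never leaks.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
```

Using finally rather than an except clause also removes the temp file on success, and the original exception still propagates to the caller.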

Improvements

  • Embedded subtitle tracks are now extracted in a single ffmpeg pass, speeding up batch processing.
  • Telemetry is automatically disabled on editable installs to avoid sending development errors to production.
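The telemetry opt-out rules from this release can be combined into a single gate. This sketch assumes config is a dict already loaded from the config file. It also assumes editable installs are detected through the PEP 610 direct_url.json metadata that pip records for them ("dir_info": {"editable": true}). submatch may detect them differently, and the function names are hypothetical.

```python
import json
import os
from importlib import metadata


def is_editable_install(dist_name: str = "submatch") -> bool:
    """Assumption: editable installs are flagged in PEP 610 direct_url.json."""
    try:
        raw = metadata.distribution(dist_name).read_text("direct_url.json")
    except metadata.PackageNotFoundError:
        return False
    if not raw:
        return False  # regular index install: no direct_url.json recorded
    return bool(json.loads(raw).get("dir_info", {}).get("editable", False))


def telemetry_enabled(config: dict, dist_name: str = "submatch") -> bool:
    """Hypothetical gate combining the env var, the config flag and the install type."""
    if os.environ.get("SUBMATCH_NO_TELEMETRY") == "1":
        return False
    if config.get("telemetry") is False:
        return False
    return not is_editable_install(dist_name)
```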
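Extracting several tracks in one pass means ffmpeg reads the input once and writes one output per track. Below is a sketch of the command construction. It assumes relative subtitle-stream indices (0:s:N) and text-based tracks that ffmpeg can write as SRT; the output names are illustrative. The names build_extract_cmd and extract_all are hypothetical.

```python
import subprocess
from typing import List


def build_extract_cmd(video: str, track_indices: List[int], out_prefix: str) -> List[str]:
    """Build one ffmpeg invocation that writes every requested track."""
    cmd = ["ffmpeg", "-y", "-i", video]
    for idx in track_indices:
        # Each "-map <stream> <output>" pair defines a separate output file.
        # Assumption: text-based tracks, so writing .srt works.
        cmd += ["-map", f"0:s:{idx}", f"{out_prefix}.{idx}.srt"]
    return cmd


def extract_all(video: str, track_indices: List[int], out_prefix: str) -> None:
    """Run the single-pass extraction; raises CalledProcessError on failure."""
    subprocess.run(build_extract_cmd(video, track_indices, out_prefix), check=True)
```

For example, build_extract_cmd("movie.mkv", [0, 2], "movie") produces a single command that writes movie.0.srt and movie.2.srt. The alternative would run ffmpeg once per track and demux the whole file each time.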