Releases: vitormf/submatch
v0.7.0
What's new
New features
- Per-segment cross-language scoring: each segment now detects its own audio language via Whisper; cross-language scoring activates per segment rather than all-or-nothing, so dubbed or mixed-language files are handled correctly even when not every segment is cross-language
- Per-segment audio language in output: `--verbose` now shows `asr[lang]` for each segment; segment audio languages are also included in CSV and HTML reports
- Language confidence gate: segments where Whisper reports low confidence are excluded from audio language voting; unsupported languages (Basque, Filipino) bail out early to avoid wasting time on unreliable transcription
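As a rough illustration of per-segment scoring and gated voting, here is a minimal sketch. The `Segment` type, the function names, and the `min_confidence` cutoff of 0.5 are assumptions made for the example, not submatch's actual internals or defaults.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class Segment:
    lang: str          # language Whisper detected for this segment
    confidence: float  # Whisper's probability for that language (0..1)


def scoring_mode(segment: Segment, sub_lang: str) -> str:
    """Choose the scorer per segment instead of once per file."""
    return "cross-language" if segment.lang != sub_lang else "same-language"


def vote_audio_language(segments: list[Segment],
                        min_confidence: float = 0.5) -> Optional[str]:
    """Most common language among segments that clear the confidence gate."""
    votes = Counter(s.lang for s in segments if s.confidence >= min_confidence)
    return votes.most_common(1)[0][0] if votes else None


segments = [Segment("en", 0.97), Segment("pt", 0.93),
            Segment("en", 0.88), Segment("ja", 0.21)]
print([scoring_mode(s, sub_lang="pt") for s in segments])
print(vote_audio_language(segments))
```

In this toy input the low-confidence `ja` segment does not take part in the vote, and only the `pt` segment is scored in same-language mode.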
Bug fixes
- Fixed crash on video containers that omit duration at the format level (e.g., raw MPEG-TS recordings)
- Fixed audio candidate positions exceeding the audio track duration, which caused ffmpeg errors on recordings padded with video after audio ends
- Fixed audio language voting to only count segments that pass the quality gate
- Fixed `segment_langs` padding in cache store when fewer segments were transcribed than expected
- Fixed quality gate not applying to the `--no-cache` transcription voting path
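One way to keep candidate positions inside the audio track is sketched below. The function name is hypothetical, and the 30-second window length is borrowed from the v0.1.0 notes; whether submatch drops or shifts out-of-range candidates is not stated, so dropping them is an assumption.

```python
def filter_candidates(starts: list[float], audio_duration: float,
                      segment_len: float = 30.0) -> list[float]:
    """Keep only start times whose full window fits inside the audio track."""
    latest_start = audio_duration - segment_len
    return [s for s in starts if 0.0 <= s <= latest_start]


# A recording with 100 s of audio; the video keeps running afterwards.
print(filter_candidates([10.0, 65.0, 95.0, 130.0], audio_duration=100.0))
```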
v0.6.1
Bug fixes
- Fix crash on embedded image track extraction: when ffmpeg failed to extract a VOBSUB or PGS subtitle track (e.g. corrupted stream, unsupported mux format), the `CalledProcessError` propagated and aborted extraction for all remaining tracks on that video. The fix catches the error per-track and skips the failing one, so other tracks continue normally. Text tracks (SRT, ASS) are unaffected.
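The per-track handling described above can be sketched roughly as follows. `extract_tracks` and `fake_extract` are hypothetical names; the stand-in extractor simulates a failing ffmpeg call so the example runs without ffmpeg.

```python
import subprocess
from typing import Callable


def extract_tracks(track_ids: list[int],
                   extract_one: Callable[[int], str]) -> dict[int, str]:
    """Extract each track independently; a failing track is skipped, not fatal."""
    extracted: dict[int, str] = {}
    for track_id in track_ids:
        try:
            extracted[track_id] = extract_one(track_id)
        except subprocess.CalledProcessError as err:
            # Skip only this track; the remaining tracks are still attempted.
            print(f"skipping track {track_id}: exit code {err.returncode}")
    return extracted


def fake_extract(track_id: int) -> str:
    """Stand-in for the real ffmpeg call; track 1 simulates a corrupted stream."""
    if track_id == 1:
        raise subprocess.CalledProcessError(returncode=1, cmd=["ffmpeg"])
    return f"track{track_id}.sup"


print(extract_tracks([0, 1, 2], fake_extract))
```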
v0.6.0
What's new
New features
- Image-based subtitle support (VOBSUB/PGS): submatch can now score bitmap subtitle formats. `pytesseract` is bundled with `pip install submatch`; only the Tesseract engine binary needs to be installed separately. If Tesseract is missing when an image subtitle is processed, submatch exits with code 2 and prints installation instructions.
- Cross-language threshold now defaults to 0.20: The `--cross-threshold` default has been recalibrated from 0.35 to 0.20, based on empirical data showing true positive cross-language pairs typically score 0.24–0.49 while false positives peak at 0.18. Use `--cross-threshold` to override.
- Lazy sync: ffsubsync now runs only when the initial score is FAIL, cutting runtime for passing pairs.
- GPU mismatch detection: warns when CPU-only PyTorch is installed on a machine with an NVIDIA GPU, with instructions for installing the CUDA-enabled build.
- Crash telemetry: pipeline errors are reported to Sentry to help improve reliability. No file paths or personal data are transmitted. Opt out with `SUBMATCH_NO_TELEMETRY=1` or `telemetry = false` in config.
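A minimal sketch of how the two telemetry opt-outs might be combined. The function name is hypothetical, and treating only the exact value `1` as an opt-out is an assumption; the notes do not say how other values are handled.

```python
import os
from typing import Mapping


def telemetry_enabled(config: Mapping[str, object],
                      environ: Mapping[str, str] = os.environ) -> bool:
    """Telemetry stays on unless the environment variable or config key opts out."""
    if environ.get("SUBMATCH_NO_TELEMETRY") == "1":
        return False
    return config.get("telemetry", True) is not False


print(telemetry_enabled({"telemetry": False}, environ={}))
print(telemetry_enabled({}, environ={"SUBMATCH_NO_TELEMETRY": "1"}))
```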
Bug fixes
- Audio language detection: plurality rule now accepts ≥50% (was >50%), fixing edge cases where the correct audio language was rejected in content with mixed-language segments (e.g. segments that confuse Whisper into tagging parts as a different language).
- Temp file cleanup: resync temp files are cleaned up on copy failure.
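The change to the plurality rule comes down to one comparison operator. The sketch below assumes one language vote per segment; the function name is hypothetical.

```python
from collections import Counter
from typing import Optional


def pick_audio_language(segment_langs: list[str]) -> Optional[str]:
    """Accept the top language when it holds at least half of the votes."""
    if not segment_langs:
        return None
    lang, count = Counter(segment_langs).most_common(1)[0]
    # ">=" is the v0.6.0 rule; the earlier strict rule was "count * 2 > total".
    return lang if count * 2 >= len(segment_langs) else None


# 2 of 4 votes: rejected by the strict rule, accepted by the new one.
print(pick_audio_language(["en", "en", "pt", "es"]))
```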
Improvements
- Embedded subtitle tracks are extracted in a single ffmpeg pass (faster batch processing).
- Telemetry is automatically disabled on editable installs to avoid sending development errors to production.
v0.5.0
What's new
New features
- Persistent transcription cache: Whisper transcriptions are now saved to `~/.cache/submatch/` keyed by video path, modification time, model, and segment count. Repeated runs on the same video skip audio extraction and Whisper entirely, making it fast to test multiple subtitles against the same video.
- Audio-driven segment selection: Segments are now chosen using ffmpeg `silencedetect` to locate speech-rich regions, independent of any subtitle file. This lets the cache work across all subtitle files tested against the same video.
- Transcription quality gate: After each Whisper call, segments are validated (`no_speech_prob < 0.6`, word count ≥ 3). If a candidate fails (silence, music, noise), the next candidate in the zone is tried automatically. The best available candidate is used as a fallback if all fail.
- `--no-cache`: Bypass the cache entirely and use the original subtitle-driven segment selection for a single run.
- `--clear-cache`: Delete all cached transcriptions and exit.
- Cache configuration: Three new config keys: `cache_ttl_days` (default: 30), `cache_max_mb` (default: 200), `cache_dir` (default: `~/.cache/submatch`). Cache is automatically evicted by TTL then LRU when limits are exceeded.
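The quality gate and its fallback can be sketched as below. The two thresholds come from the notes above; the `Transcription` type, the function names, and the fallback ranking (lowest `no_speech_prob`) are assumptions. For simplicity the sketch takes already-transcribed candidates, whereas the notes describe trying the next candidate only after one fails.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Transcription:
    text: str
    no_speech_prob: float  # reported by Whisper for the segment


def passes_gate(t: Transcription) -> bool:
    """Gate from the notes: no_speech_prob < 0.6 and at least 3 words."""
    return t.no_speech_prob < 0.6 and len(t.text.split()) >= 3


def pick_candidate(candidates: list[Transcription]) -> Optional[Transcription]:
    """Return the first candidate that passes; otherwise the best available one."""
    for t in candidates:
        if passes_gate(t):
            return t
    # Fallback ranking is an assumption: prefer the lowest no_speech_prob.
    return min(candidates, key=lambda t: t.no_speech_prob, default=None)


zone = [
    Transcription("", 0.9),                      # silence
    Transcription("la la", 0.4),                 # too few words
    Transcription("where did you put it", 0.1),  # passes
]
print(pick_candidate(zone).text)
```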
Bug fixes
- Language detection across zones now requires a strict majority (>50% of zones) before setting `audio_lang`, preventing a false `cross_language` flag when some zones hit music or noise.
- Cache hits are now returned correctly even when the `last_used` write-back fails (e.g. read-only filesystem).
v0.4.0
What's new
New features
- `--json FILE`, `--csv FILE`, `--html FILE`: write results to JSON, CSV, or self-contained HTML report files. Breaking change: `--json` previously printed JSON to stdout; it now requires a file path. Update scripts from `--json` to `--json output.json`.
- `--embedded`: score subtitle tracks embedded in the video container (MKV, MP4, etc.) without needing external SRT files
- `--watch`: monitor a directory for new video/subtitle pairs and score them as they appear; `--poll` and `--interval` for network mounts (NFS, SMB)
- Config file support: set persistent defaults in `~/.config/submatch/config.toml` or `./submatch.toml`
Bug fixes
- Terminate child process groups (ffmpeg, ffs) on Ctrl+C to prevent orphan processes
- Fix config file validation for `--model`/`--device` choices and `sub_lang` string values
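Terminating a whole process group, rather than a single child, is a common POSIX pattern for avoiding orphans. The sketch below shows that general pattern, not submatch's code; the function names and the 5-second grace period are assumptions, and it does not apply to Windows.

```python
import os
import signal
import subprocess
import sys


def run_in_own_group(cmd: list[str]) -> subprocess.Popen:
    """Start the child in a new session so it leads its own process group."""
    return subprocess.Popen(cmd, start_new_session=True)


def terminate_group(proc: subprocess.Popen, timeout: float = 5.0) -> None:
    """Send SIGTERM to the child's whole group, escalating to SIGKILL if needed."""
    try:
        pgid = os.getpgid(proc.pid)
        os.killpg(pgid, signal.SIGTERM)
    except ProcessLookupError:
        return  # the group already exited
    try:
        proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        os.killpg(pgid, signal.SIGKILL)
        proc.wait()


# Stand-in for a long-running ffmpeg call; stopped right away for the demo.
child = run_in_own_group([sys.executable, "-c", "import time; time.sleep(60)"])
terminate_group(child)
print(child.returncode)
```

In a real tool, `terminate_group` would typically be called from a `KeyboardInterrupt` handler so that Ctrl+C also stops any grandchildren the child spawned.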
v0.3.0
What's new
New features
- Fractional progress bar updates per segment and dynamic terminal resize support
- Transcription caching to skip re-transcribing already-processed segments
- ISO 639-2 language code normalisation
- Batch report headers showing source directory and pair count
Improvements
- Cross-language subtitle matching using multilingual sentence embeddings (paraphrase-multilingual-MiniLM-L12-v2)
v0.2.0
What's new
New features
- Flexible input: pass any mix of video files, subtitle files, and directories — submatch auto-pairs them
- `--no-recursive` flag to disable recursive directory scanning (directories are scanned recursively by default)
- ffmpeg is now bundled via `static-ffmpeg`; no system ffmpeg install required
- `--drift-threshold` flag to control how many seconds of offset trigger a drift warning (default: 2.0)
Bug fixes
- Chinese, Japanese, and Korean subtitles now score correctly (character-level tokenization)
- Unknown file types (`.DS_Store`, `.nfo`, images) are no longer misclassified as video when scanning directories
- Spurious "no subtitles found" warnings are suppressed when inputs come from directory scans
- UTF-8 output on Windows no longer crashes with UnicodeEncodeError when piped
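Character-level tokenization matters for CJK because these scripts are usually written without spaces, so whitespace splitting yields one giant token per line. A minimal sketch follows; the Unicode ranges are a rough approximation chosen for the example, and the function name is hypothetical.

```python
import re

# Rough CJK coverage: Han ideographs, Hiragana/Katakana, Hangul syllables.
_CJK_RANGES = "\u4e00-\u9fff\u3040-\u30ff\uac00-\ud7af"
# Either one CJK character, or a run of word characters that are not CJK.
_TOKEN = re.compile(f"[{_CJK_RANGES}]|[^\\W{_CJK_RANGES}]+")


def tokenize(text: str) -> list[str]:
    """Split CJK text into single characters and other text into words."""
    return _TOKEN.findall(text.lower())


print(tokenize("Hello, world"))
print(tokenize("今日は晴れ"))
```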
Improvements
- Parallel batch workers now default to up to 4 regardless of device
Install / upgrade
`pip install --upgrade submatch`
v0.1.0: Initial release
`submatch` verifies that a subtitle file actually matches the audio content of a video — catching the case where subtitle tools like subliminal or Bazarr return correctly-timed but wrong-content subtitles.
Install
```bash
pip install submatch
```
System dependencies: `ffmpeg` (`brew install ffmpeg`) and `ffsubsync` (`pip install ffsubsync`).
What's in this release
Core
- Transcribes short audio segments with Whisper and scores against subtitle text using token F1
- Dialogue-density segment sampling — picks the 30s windows with the most subtitle words per zone, skipping intros/credits
- Timing drift detection via ffsubsync, flagging offsets > 2s
- Three language signals: Whisper audio language, langdetect on subtitle text, filename convention + ffprobe metadata
- 4-state result system: `PASS`, `DRIFT` (content matches but timing drift detected), `FAIL` (wrong content), `UNSURE` (insufficient transcription data)
- `--resync`: auto-correct drift in place on DRIFT; `--pass-unsure`: exit 0 for UNSURE results
- `--keep-synced`: save the timing-corrected subtitle to disk; `--delete-failures`: remove subtitle files that fail the match check
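For readers unfamiliar with token F1, here is a minimal sketch of the metric using lowercase whitespace tokens. The tokenization and the function name are assumptions; submatch's own normalisation may differ.

```python
from collections import Counter


def token_f1(reference: str, hypothesis: str) -> float:
    """F1 over the multiset overlap of lowercase whitespace tokens."""
    ref = Counter(reference.lower().split())
    hyp = Counter(hypothesis.lower().split())
    overlap = sum((ref & hyp).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(hyp.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


# Subtitle text vs. Whisper transcript: 4 shared tokens out of 5 and 6.
print(round(token_f1("where did you put it", "where did you put the keys"), 3))
```

Here precision is 4/6 and recall is 4/5, giving an F1 of 8/11, or about 0.727.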
Cross-language matching
When subtitle and audio languages differ (e.g. English audio + Portuguese subtitles), scoring automatically switches from token F1 to multilingual semantic similarity via paraphrase-multilingual-MiniLM-L12-v2. Use `--cross-threshold` to tune the cutoff independently.
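A minimal sketch of semantic-similarity scoring with the model named above. The function names are hypothetical, the use of cosine similarity and the `>=` comparison are assumptions, and `embed` needs the `sentence-transformers` package (it downloads the model on first use).

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def embed(texts: list[str]) -> list[list[float]]:
    """Encode texts with the multilingual sentence-embedding model."""
    from sentence_transformers import SentenceTransformer  # imported lazily
    model = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")
    return model.encode(texts).tolist()


def is_cross_language_match(asr_text: str, sub_text: str, threshold: float) -> bool:
    """Compare transcript and subtitle meaning against the chosen cutoff."""
    asr_vec, sub_vec = embed([asr_text, sub_text])
    return cosine_similarity(asr_vec, sub_vec) >= threshold
```

For example, `is_cross_language_match("where are my keys", "onde estão as minhas chaves", threshold=0.35)` would compare an English transcript with a Portuguese subtitle line without any token overlap being required.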
Batch mode
- Pair a directory of videos with their same-stem subtitles, or score one video against a subtitle directory
- `--recursive`/`-r` for Plex/Kodi/Jellyfin nested library layouts
- `--sub-lang CODE` to filter by language tag (e.g. `pt`, `en`, `pt-BR`)
- `--filter GLOB` to filter by filename pattern
- `--workers` for parallel processing; `--device` to target CPU, MPS (Apple Silicon), or CUDA
- Live progress with ETA, in-place result lines, and `--compact` one-line-per-pair summary
Subtitle formats
SRT, WebVTT, and ASS/SSA — via pysubs2.
Output
Human-readable with ANSI colour, or `--json` for machine-readable output. Transcription results are cached per video so re-runs against a different subtitle skip re-transcription.
States and exit codes
| State | Meaning | Exit code |
|---|---|---|
| `PASS` | Content matches, no timing drift | 0 |
| `DRIFT` | Content matches, but timing drift detected | 1 (use `--resync` to fix in place) |
| `FAIL` | Content does not match | 1 |
| `UNSURE` | Not enough transcription data to decide | 1 (use `--pass-unsure` to exit 0) |
| — | Error (missing dependency, unreadable file, no audio track) | 2 |
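Scripts that wrap submatch can branch on these exit codes. The sketch below is one possible wrapper; the function names are hypothetical, and invoking the CLI as `submatch VIDEO SUBTITLE` with positional arguments is an assumption.

```python
import subprocess


def describe_exit(code: int) -> str:
    """Map a submatch exit code to the outcome listed in the table."""
    if code == 0:
        return "match"     # PASS (or UNSURE when --pass-unsure is set)
    if code == 1:
        return "no match"  # DRIFT, FAIL, or UNSURE
    if code == 2:
        return "error"     # missing dependency, unreadable file, no audio track
    return "unknown"


def check_pair(video: str, subtitle: str) -> str:
    """Run submatch on one pair; the argument order is an assumption."""
    result = subprocess.run(["submatch", video, subtitle])
    return describe_exit(result.returncode)


print(describe_exit(1))
```

Exit code 1 alone does not distinguish DRIFT from FAIL or UNSURE, so a wrapper that needs the exact state has to read the tool's output as well.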