CLI tool to transcribe audio or video files using faster-whisper.
- Python >=3.13
- `ffmpeg` available on `PATH` (required for video inputs)
- `yt-dlp` available on `PATH` (optional, for URL transcription)
- PortAudio runtime (required by `sounddevice` for `tsr watch`)
On macOS:

    brew install ffmpeg yt-dlp portaudio

Global install (recommended):

    uv tool install git+https://github.com/ynbh/tsr.git

Then use `tsr` directly:

    tsr --help

Local development:

    uv sync
    uv run tsr --help

- Download a model:

      tsr download base

- (Optional) Set default model:

      tsr model base

- Transcribe a file:

      tsr run path/to/audio_or_video.mp4

This writes an `.srt` file by default next to the input.
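For orientation, an `.srt` file is a sequence of numbered cues, each with a `HH:MM:SS,mmm --> HH:MM:SS,mmm` time range followed by the cue text. The sketch below renders such a file from `(start, end, text)` tuples. The function names are hypothetical and are not taken from the tsr source; it only illustrates the output format.

```python
def format_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = max(0, round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def render_srt(segments) -> str:
    """Render (start, end, text) tuples as SRT cues, numbered from 1."""
    cues = []
    for index, (start, end, text) in enumerate(segments, start=1):
        time_range = f"{format_timestamp(start)} --> {format_timestamp(end)}"
        cues.append(f"{index}\n{time_range}\n{text.strip()}\n")
    # Cues are separated by a blank line.
    return "\n".join(cues)
```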
tsr download [tiny|base|small|medium|large]
- Prefetches and caches the mapped `faster-whisper` model into `~/.config/tsr/models`.
- `large` maps to `large-v3`.
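A minimal sketch of how the size-to-model mapping and prefetch could look. Only the `large` to `large-v3` mapping is stated here; the other entries, the names `MODEL_NAMES`, `resolve_model_name`, and `prefetch`, and the choice of `cache_dir` are assumptions for illustration, not the tool's actual code.

```python
from pathlib import Path

# Only large -> large-v3 is documented; the other sizes are assumed to map
# to the identically named faster-whisper models.
MODEL_NAMES = {
    "tiny": "tiny",
    "base": "base",
    "small": "small",
    "medium": "medium",
    "large": "large-v3",
}
MODELS_DIR = Path.home() / ".config" / "tsr" / "models"


def resolve_model_name(size: str) -> str:
    """Translate a CLI size into a faster-whisper model name."""
    try:
        return MODEL_NAMES[size]
    except KeyError:
        raise ValueError(f"unknown model size: {size!r}") from None


def prefetch(size: str) -> str:
    """Download the mapped model into MODELS_DIR and return its local path."""
    # Imported lazily so the mapping can be used without faster-whisper installed.
    from faster_whisper import download_model

    MODELS_DIR.mkdir(parents=True, exist_ok=True)
    return download_model(resolve_model_name(size), cache_dir=str(MODELS_DIR))
```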
tsr model [size]
- Without `size`, opens an interactive model list (arrow keys + Enter) and saves your selection as the default.
- With `size`, updates the default model in `~/.config/tsr/config.json`.
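As an illustration of persisting the default, the sketch below reads and writes a JSON config file. The `"model"` key, the `"base"` fallback, and both function names are assumptions; the actual layout of `config.json` is not documented here.

```python
import json
from pathlib import Path

CONFIG_PATH = Path.home() / ".config" / "tsr" / "config.json"


def load_default_model(path: Path = CONFIG_PATH, fallback: str = "base") -> str:
    """Return the saved default model, or `fallback` if none is stored."""
    try:
        data = json.loads(path.read_text(encoding="utf-8"))
    except (FileNotFoundError, json.JSONDecodeError):
        return fallback
    if not isinstance(data, dict):
        return fallback
    # "model" is a hypothetical key name.
    return data.get("model", fallback)


def save_default_model(size: str, path: Path = CONFIG_PATH) -> None:
    """Store `size` as the default model, creating the config directory if needed."""
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps({"model": size}, indent=2) + "\n", encoding="utf-8")
```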
tsr run <input> [--output <path>] [--format srt|json] [--model <size>] [--plain]
- Transcribes supported audio/video input.
- If the selected model is missing locally, it is downloaded automatically by `faster-whisper`.
- If `--output` is omitted, the output path is inferred from the input extension.
- `--plain` prints the plaintext transcript to the terminal and skips writing a file.
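One plausible reading of the output inference, sketched below, is to swap the input's extension for the chosen format so the transcript lands next to the input. `infer_output_path` is a hypothetical helper, and the real rule may differ.

```python
from pathlib import Path


def infer_output_path(input_path: str, fmt: str = "srt") -> Path:
    """Place the transcript next to the input, replacing its extension."""
    if fmt not in {"srt", "json"}:
        raise ValueError(f"unsupported format: {fmt!r}")
    return Path(input_path).with_suffix(f".{fmt}")
```

Under this assumption, `tsr run clips/talk.mp4` would write `clips/talk.srt`, and adding `--format json` would write `clips/talk.json`.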
tsr watch [--wav-output <path>] [--output <path>] [--format srt|json] [--model <size>] [--chunk-seconds <n>] [--sample-rate <hz>] [--device <id|name>] [--plain/--no-plain]
- Records microphone audio with `sounddevice`.
- Writes a `.wav` file as audio arrives.
- Transcribes fixed chunks and rewrites the transcript file until you stop with `Ctrl+C`.
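The bookkeeping behind fixed-size chunks can be sketched with the standard library alone. The example below opens a mono PCM16 WAV for incremental writing and computes which frame ranges form complete chunks. The helper names and the 16000 Hz default are assumptions, and microphone capture via `sounddevice` is deliberately left out.

```python
import wave


def open_pcm16_wav(path: str, sample_rate: int = 16000) -> wave.Wave_write:
    """Open a mono, 16-bit PCM WAV file that can be appended to as audio arrives."""
    wav = wave.open(path, "wb")
    wav.setnchannels(1)   # mono
    wav.setsampwidth(2)   # 2 bytes per sample = PCM16
    wav.setframerate(sample_rate)
    return wav


def complete_chunks(total_frames: int, sample_rate: int, chunk_seconds: float):
    """Return (start, end) frame ranges for every full chunk recorded so far."""
    frames_per_chunk = int(sample_rate * chunk_seconds)
    if frames_per_chunk <= 0:
        raise ValueError("chunk must contain at least one frame")
    last_start = total_frames - frames_per_chunk
    return [
        (start, start + frames_per_chunk)
        for start in range(0, last_start + 1, frames_per_chunk)
    ]
```

With 40000 frames recorded at 16000 Hz and one-second chunks, two chunks are complete and the trailing 8000 frames wait for more audio.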
- Audio: `.wav`, `.mp3`, `.m4a`, `.flac`, `.ogg`, `.opus`
- Video: `.mp4`, `.mkv`, `.avi`, `.mov`, `.webm`
- URLs: Any site supported by yt-dlp (YouTube, Vimeo, etc.)
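The list above suggests a simple dispatch on the input string. The following sketch classifies an input as a URL, audio file, or video file; `classify_input` and the case-insensitive extension match are assumptions made for illustration.

```python
from pathlib import Path
from urllib.parse import urlparse

AUDIO_EXTENSIONS = {".wav", ".mp3", ".m4a", ".flac", ".ogg", ".opus"}
VIDEO_EXTENSIONS = {".mp4", ".mkv", ".avi", ".mov", ".webm"}


def classify_input(source: str) -> str:
    """Return "url", "audio", or "video" for a supported input."""
    if urlparse(source).scheme in {"http", "https"}:
        return "url"
    suffix = Path(source).suffix.lower()
    if suffix in AUDIO_EXTENSIONS:
        return "audio"
    if suffix in VIDEO_EXTENSIONS:
        return "video"
    raise ValueError(f"unsupported input: {source!r}")
```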
flowchart LR
subgraph Input
URL[URL]
FILE[Local File]
MIC[Microphone]
end
subgraph Processing
URL --> YTDLP[yt-dlp]
YTDLP --> AUDIO[audio.wav]
FILE --> FFMPEG[ffmpeg]
FFMPEG --> AUDIO
MIC --> AUDIO
AUDIO --> WHISPER[faster-whisper]
end
subgraph Output
WHISPER --> SRT[.srt]
WHISPER --> JSON[.json]
WHISPER --> PLAIN[plaintext]
end
subgraph Config
MODELS[(~/.config/tsr/models cache)]
CFG[(config.json)]
end
MODELS --> WHISPER
CFG --> WHISPER
- Video files are converted to mono 16 kHz WAV with `ffmpeg` before transcription.
- URLs are downloaded as audio-only via `yt-dlp`.
- `watch` records mono PCM16 WAV and performs progressive chunk transcription.
- `faster-whisper` loads models by name and caches them under `~/.config/tsr/models`.
- Subprocess failures from `ffmpeg` or `yt-dlp` are surfaced directly.
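The external-tool steps in these notes could be driven roughly as follows. This is a sketch under assumptions: the exact flags tsr passes are not documented here, and the helper names are hypothetical. The flags themselves are standard `ffmpeg` and `yt-dlp` options.

```python
import subprocess


def ffmpeg_to_wav_cmd(src: str, dst: str) -> list[str]:
    """Build an ffmpeg command that extracts mono 16 kHz PCM16 audio."""
    return [
        "ffmpeg", "-y",        # overwrite dst if it exists
        "-i", src,
        "-vn",                 # drop the video stream
        "-ac", "1",            # one audio channel (mono)
        "-ar", "16000",        # 16 kHz sample rate
        "-c:a", "pcm_s16le",   # 16-bit little-endian PCM
        dst,
    ]


def ytdlp_audio_cmd(url: str, out_template: str) -> list[str]:
    """Build a yt-dlp command that downloads audio only and converts it to WAV."""
    return [
        "yt-dlp",
        "-f", "bestaudio/best",   # prefer an audio-only format
        "-x", "--audio-format", "wav",
        "-o", out_template,
        url,
    ]


def run(cmd: list[str]) -> None:
    """Run a tool and let a non-zero exit raise CalledProcessError."""
    subprocess.run(cmd, check=True)
```

Using `check=True` is one way to surface failures directly: the tool's own stderr reaches the terminal, and the exception carries the exit code.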