tsr

CLI tool to transcribe audio files, video files, URLs, and live microphone input using faster-whisper.

Requirements

Python >=3.13
ffmpeg available on PATH (required for video inputs)
yt-dlp available on PATH (optional, for URL transcription)
PortAudio runtime (required by sounddevice for tsr watch)

On macOS:

brew install ffmpeg yt-dlp portaudio

Install

Global install (recommended):

uv tool install git+https://github.com/ynbh/tsr.git

Then use tsr directly:

tsr --help

Local development:

uv sync
uv run tsr --help

Quick Start

Download a model:

tsr download base

(Optional) Set default model:

tsr model base

Transcribe a file:

tsr run path/to/audio_or_video.mp4

By default, this writes an .srt file next to the input.
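
A minimal sketch of how SRT output can be rendered from transcription segments. It assumes each segment exposes start and end (in seconds) and text, as faster-whisper's segments do. The Segment dataclass, srt_timestamp, and to_srt are hypothetical names, not tsr's actual code.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """Hypothetical stand-in for a transcribed segment (times in seconds)."""
    start: float
    end: float
    text: str


def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments) -> str:
    """Render segments as numbered SRT cues separated by blank lines."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n"
            f"{srt_timestamp(seg.start)} --> {srt_timestamp(seg.end)}\n"
            f"{seg.text.strip()}\n"
        )
    return "\n".join(blocks)
```

SRT cues are 1-indexed, use a comma before the milliseconds, and are separated by a blank line.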

Commands

tsr download [tiny|base|small|medium|large]

Prefetches the corresponding faster-whisper model and caches it in ~/.config/tsr/models.
The size large maps to large-v3.
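
A minimal sketch of the size-to-model mapping and cached loading. It assumes tsr passes the mapped name to faster-whisper's WhisperModel with download_root pointed at the cache directory. MODEL_NAMES, resolve_model, and load_model are hypothetical names. Only the large → large-v3 mapping and the cache path come from this README.

```python
from pathlib import Path

# Hypothetical mapping for the sizes accepted by `tsr download`.
MODEL_NAMES = {
    "tiny": "tiny",
    "base": "base",
    "small": "small",
    "medium": "medium",
    "large": "large-v3",  # documented: large maps to large-v3
}

# Cache directory documented in this README.
MODELS_DIR = Path.home() / ".config" / "tsr" / "models"


def resolve_model(size: str) -> str:
    """Map a tsr size name to a faster-whisper model name."""
    try:
        return MODEL_NAMES[size]
    except KeyError:
        raise ValueError(f"unknown model size: {size!r}") from None


def load_model(size: str):
    """Load a model, downloading it into the tsr cache on first use."""
    # Imported lazily so resolve_model works without faster-whisper installed.
    from faster_whisper import WhisperModel

    MODELS_DIR.mkdir(parents=True, exist_ok=True)
    return WhisperModel(resolve_model(size), download_root=str(MODELS_DIR))
```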

tsr model [size]

Without a size, opens an interactive model list (arrow keys + Enter) and saves your selection as the default.
With a size, updates the default model in ~/.config/tsr/config.json.
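
A minimal sketch of persisting the default model in config.json. The "model" key and the "base" fallback are assumptions; tsr's actual schema and default may differ. Other keys in the file are preserved on save.

```python
import json
from pathlib import Path

CONFIG_PATH = Path.home() / ".config" / "tsr" / "config.json"
DEFAULT_MODEL = "base"  # hypothetical fallback when nothing has been saved


def load_default_model(path: Path = CONFIG_PATH) -> str:
    """Return the saved default model, or the fallback if none is saved."""
    if not path.exists():
        return DEFAULT_MODEL
    data = json.loads(path.read_text())
    return data.get("model", DEFAULT_MODEL)


def save_default_model(size: str, path: Path = CONFIG_PATH) -> None:
    """Persist the default model, keeping any other keys in the file."""
    data = json.loads(path.read_text()) if path.exists() else {}
    data["model"] = size  # hypothetical key name
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(data, indent=2))
```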

tsr run <input> [--output <path>] [--format srt|json] [--model <size>] [--plain]

Transcribes a supported audio or video file, or a URL.
If the selected model is missing locally, faster-whisper downloads it automatically.
If --output is omitted, the output path is inferred from the input path, so the transcript is written next to the input.
--plain prints a plaintext transcript to the terminal and skips writing a file.

tsr watch [--wav-output <path>] [--output <path>] [--format srt|json] [--model <size>] [--chunk-seconds <n>] [--sample-rate <hz>] [--device <id|name>] [--plain/--no-plain]

Records microphone audio with sounddevice.
Writes a .wav file as audio arrives.
Transcribes fixed-length chunks and rewrites the transcript file as it goes, until you stop with Ctrl+C.
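
A minimal sketch of fixed-length chunking. Incoming mono samples are buffered, and a chunk is released each time sample_rate × chunk_seconds samples have accumulated. ChunkBuffer is a hypothetical name. The defaults of 16000 Hz and 5 seconds are assumptions, not tsr's documented values.

```python
class ChunkBuffer:
    """Collect incoming mono samples and release fixed-length chunks."""

    def __init__(self, sample_rate: int = 16_000, chunk_seconds: float = 5.0) -> None:
        # Number of samples that make up one transcription chunk.
        self.chunk_frames = int(sample_rate * chunk_seconds)
        self._pending: list[int] = []

    def push(self, samples: list[int]) -> list[list[int]]:
        """Add samples and return every chunk that is now complete."""
        self._pending.extend(samples)
        ready = []
        while len(self._pending) >= self.chunk_frames:
            ready.append(self._pending[: self.chunk_frames])
            del self._pending[: self.chunk_frames]
        return ready

    def flush(self) -> list[int]:
        """Return the remaining partial chunk, e.g. after Ctrl+C."""
        rest, self._pending = self._pending, []
        return rest
```

In a recorder, push could be fed from the callback of a sounddevice.InputStream opened with channels=1 and dtype="int16". Each returned chunk would then be transcribed and the transcript file rewritten.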

Supported Input Types

Audio: .wav, .mp3, .m4a, .flac, .ogg, .opus
Video: .mp4, .mkv, .avi, .mov, .webm
URLs: Any site supported by yt-dlp (YouTube, Vimeo, etc.)
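
A minimal sketch of routing an input to the right pipeline using the extensions listed above. It assumes any http(s) URL is handed to yt-dlp, and that extension matching is case-insensitive. classify_input is a hypothetical name.

```python
from pathlib import Path
from urllib.parse import urlparse

AUDIO_EXTENSIONS = {".wav", ".mp3", ".m4a", ".flac", ".ogg", ".opus"}
VIDEO_EXTENSIONS = {".mp4", ".mkv", ".avi", ".mov", ".webm"}


def classify_input(source: str) -> str:
    """Return "url", "audio", or "video"; raise ValueError for anything else."""
    if urlparse(source).scheme in {"http", "https"}:
        return "url"  # assumed to be handed to yt-dlp
    suffix = Path(source).suffix.lower()
    if suffix in AUDIO_EXTENSIONS:
        return "audio"
    if suffix in VIDEO_EXTENSIONS:
        return "video"
    raise ValueError(f"unsupported input type: {source}")
```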

Architecture

flowchart LR
    subgraph Input
        URL[URL]
        FILE[Local File]
        MIC[Microphone]
    end

    subgraph Processing
        URL --> YTDLP[yt-dlp]
        YTDLP --> AUDIO[audio.wav]
        FILE --> FFMPEG[ffmpeg]
        FFMPEG --> AUDIO
        MIC --> AUDIO
        AUDIO --> WHISPER[faster-whisper]
    end

    subgraph Output
        WHISPER --> SRT[.srt]
        WHISPER --> JSON[.json]
        WHISPER --> PLAIN[plaintext]
    end

    subgraph Config
        MODELS[(~/.config/tsr/models cache)]
        CFG[(config.json)]
    end

    MODELS --> WHISPER
    CFG --> WHISPER
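
A minimal sketch of the Processing stage in the diagram. It builds ffmpeg and yt-dlp command lines and runs them with check=True, so a non-zero exit raises subprocess.CalledProcessError instead of failing silently. The exact flags tsr passes are not documented. These builders use standard ffmpeg and yt-dlp options and are assumptions, as are the function names.

```python
import subprocess
from pathlib import Path


def ffmpeg_to_wav_cmd(src: Path, dst: Path) -> list[str]:
    """ffmpeg argv converting an input to mono 16 kHz 16-bit PCM WAV."""
    return [
        "ffmpeg", "-y",       # overwrite dst if it already exists
        "-i", str(src),
        "-vn",                # drop any video stream
        "-ac", "1",           # mono
        "-ar", "16000",       # 16 kHz sample rate
        "-c:a", "pcm_s16le",  # 16-bit little-endian PCM
        str(dst),
    ]


def ytdlp_audio_cmd(url: str, out_template: str) -> list[str]:
    """yt-dlp argv extracting audio only and converting it to WAV."""
    return ["yt-dlp", "-x", "--audio-format", "wav", "-o", out_template, url]


def run_tool(cmd: list[str]) -> None:
    """Run an external tool; check=True raises CalledProcessError on failure."""
    subprocess.run(cmd, check=True)
```

Note that yt-dlp's audio conversion itself relies on ffmpeg being on PATH.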

Notes

Video files are converted to mono 16 kHz WAV with ffmpeg before transcription.
URLs are downloaded as audio-only via yt-dlp.
tsr watch records mono 16-bit PCM (PCM16) WAV and transcribes it progressively, chunk by chunk.
faster-whisper loads models by name and caches them under ~/.config/tsr/models.
Subprocess failures from ffmpeg or yt-dlp are surfaced directly.
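
A minimal sketch of writing mono PCM16 WAV incrementally with Python's standard wave module. open_wav_writer and append_samples are hypothetical names. The 16000 Hz default is an assumption.

```python
import sys
import wave
from array import array


def open_wav_writer(path: str, sample_rate: int = 16_000) -> wave.Wave_write:
    """Open a mono, 16-bit PCM WAV file for incremental writes."""
    wav = wave.open(path, "wb")
    wav.setnchannels(1)   # mono
    wav.setsampwidth(2)   # 2 bytes per sample = PCM16
    wav.setframerate(sample_rate)
    return wav


def append_samples(wav: wave.Wave_write, samples: list[int]) -> None:
    """Append signed 16-bit samples to the open file."""
    data = array("h", samples)
    if sys.byteorder == "big":
        data.byteswap()  # WAV stores samples little-endian
    # writeframes() also patches the header's length fields on seekable files.
    wav.writeframes(data.tobytes())
```

Call close() on the writer when recording stops so the final header is written.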
