Transcribe audio/video via faster-whisper from the terminal (auto model download, SRT/JSON/plain output).

tsr

CLI tool to transcribe audio or video files using faster-whisper.

Requirements

  • Python >=3.13
  • ffmpeg available on PATH (required for video inputs)
  • yt-dlp available on PATH (optional, for URL transcription)
  • PortAudio runtime (required by sounddevice for tsr watch)

On macOS:

brew install ffmpeg yt-dlp portaudio

Install

Global install (recommended):

uv tool install git+https://github.com/ynbh/tsr.git

Then use tsr directly:

tsr --help

Local development:

uv sync
uv run tsr --help

Quick Start

  1. Download a model:
     tsr download base
  2. (Optional) Set default model:
     tsr model base
  3. Transcribe a file:
     tsr run path/to/audio_or_video.mp4

This writes an .srt file by default next to the input.
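
For reference, SRT is a plain-text format made of numbered cues with `HH:MM:SS,mmm` timestamps. The sketch below shows one way timed segments could be rendered as SRT. The `Segment` tuple and both function names are hypothetical and are not taken from tsr's source.

```python
from typing import NamedTuple


class Segment(NamedTuple):
    """Hypothetical transcript segment: start/end in seconds plus text."""

    start: float
    end: float
    text: str


def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments: list[Segment]) -> str:
    """Render segments as numbered SRT cues separated by blank lines."""
    cues = []
    for index, seg in enumerate(segments, start=1):
        cues.append(
            f"{index}\n"
            f"{srt_timestamp(seg.start)} --> {srt_timestamp(seg.end)}\n"
            f"{seg.text.strip()}\n"
        )
    return "\n".join(cues)
```

Working in integer milliseconds avoids floating-point drift when splitting a time into hours, minutes, and seconds.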

Commands

tsr download [tiny|base|small|medium|large]

  • Prefetches and caches the mapped faster-whisper model into ~/.config/tsr/models.
  • large maps to large-v3.
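
A minimal sketch of the size-to-model mapping described above. Only the `large` to `large-v3` alias is stated in this README; the assumption here is that the other sizes map to themselves. `MODEL_ALIASES` and `resolve_model_name` are hypothetical names.

```python
# Hypothetical alias table. Only "large" -> "large-v3" comes from the README;
# the remaining entries are assumed to map to themselves.
MODEL_ALIASES = {
    "tiny": "tiny",
    "base": "base",
    "small": "small",
    "medium": "medium",
    "large": "large-v3",
}


def resolve_model_name(size: str) -> str:
    """Return the model name for a CLI size, or raise ValueError."""
    key = size.strip().lower()
    if key not in MODEL_ALIASES:
        choices = ", ".join(MODEL_ALIASES)
        raise ValueError(f"unknown model size {size!r}; expected one of: {choices}")
    return MODEL_ALIASES[key]
```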

tsr model [size]

  • Without size, opens an interactive model list (arrow keys + Enter) and saves your selection as default.
  • With size, updates the default model in ~/.config/tsr/config.json.
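
The default-model setting could be persisted roughly as follows. This is a sketch under assumptions: the `default_model` key, the `"base"` fallback, and both helper names are hypothetical, since the README does not document the layout of config.json.

```python
import json
from pathlib import Path

# Path taken from the README; the JSON key below is an assumption.
CONFIG_PATH = Path.home() / ".config" / "tsr" / "config.json"


def load_default_model(path: Path = CONFIG_PATH, fallback: str = "base") -> str:
    """Read the saved default model, falling back if the file is missing or invalid."""
    try:
        data = json.loads(path.read_text(encoding="utf-8"))
    except (FileNotFoundError, json.JSONDecodeError):
        return fallback
    if not isinstance(data, dict):
        return fallback
    return data.get("default_model", fallback)


def save_default_model(size: str, path: Path = CONFIG_PATH) -> None:
    """Write the default model, creating the config directory if needed."""
    path.parent.mkdir(parents=True, exist_ok=True)
    payload = json.dumps({"default_model": size}, indent=2) + "\n"
    path.write_text(payload, encoding="utf-8")
```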

tsr run <input> [--output <path>] [--format srt|json] [--model <size>] [--plain]

  • Transcribes supported audio/video input.
  • If the selected model is missing locally, it is downloaded automatically by faster-whisper.
  • If --output is omitted, the output path is derived from the input path: the transcript is written next to the input, with an .srt extension by default.
  • --plain prints plaintext transcript to terminal and skips writing a file.
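
A minimal sketch of how the output path could be derived when --output is omitted. It assumes the input's extension is simply swapped for the chosen format, which matches the Quick Start behaviour for .srt but is not confirmed for .json. `infer_output_path` is a hypothetical name.

```python
from pathlib import Path


def infer_output_path(input_path: str, fmt: str = "srt") -> Path:
    """Place the transcript next to the input, swapping the extension."""
    if fmt not in {"srt", "json"}:
        raise ValueError(f"unsupported format: {fmt!r}")
    # with_suffix replaces only the final extension: talk.mp4 -> talk.srt
    return Path(input_path).with_suffix(f".{fmt}")
```

For example, `infer_output_path("talks/demo.mp4")` yields `talks/demo.srt`, and passing `fmt="json"` yields `talks/demo.json`.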

tsr watch [--wav-output <path>] [--output <path>] [--format srt|json] [--model <size>] [--chunk-seconds <n>] [--sample-rate <hz>] [--device <id|name>] [--plain/--no-plain]

  • Records microphone audio with sounddevice.
  • Writes a .wav file as audio arrives.
  • Transcribes fixed chunks and rewrites the transcript file until you stop with Ctrl+C.
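
The fixed-chunk behaviour can be illustrated without any audio hardware. The sketch below buffers incoming mono PCM16 bytes and releases a chunk each time `chunk_seconds` worth of audio has accumulated. `ChunkBuffer` is a hypothetical helper; how tsr actually buffers audio from sounddevice is not documented here.

```python
class ChunkBuffer:
    """Accumulate mono PCM16 bytes and release fixed-length chunks."""

    BYTES_PER_SAMPLE = 2  # 16-bit mono PCM

    def __init__(self, sample_rate: int, chunk_seconds: float) -> None:
        self.chunk_bytes = int(sample_rate * chunk_seconds) * self.BYTES_PER_SAMPLE
        if self.chunk_bytes <= 0:
            raise ValueError("sample_rate * chunk_seconds must be at least one sample")
        self._pending = bytearray()

    def feed(self, data: bytes) -> list[bytes]:
        """Add recorded bytes and return every chunk that is now complete."""
        self._pending.extend(data)
        chunks = []
        while len(self._pending) >= self.chunk_bytes:
            chunks.append(bytes(self._pending[: self.chunk_bytes]))
            del self._pending[: self.chunk_bytes]
        return chunks

    def flush(self) -> bytes:
        """Return the trailing partial chunk, e.g. when recording stops."""
        rest = bytes(self._pending)
        self._pending.clear()
        return rest
```

Each chunk returned by `feed` would be handed to the transcriber, and `flush` covers the partial chunk left over when recording is interrupted.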

Supported Input Types

  • Audio: .wav, .mp3, .m4a, .flac, .ogg, .opus
  • Video: .mp4, .mkv, .avi, .mov, .webm
  • URLs: Any site supported by yt-dlp (YouTube, Vimeo, etc.)
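
One way to route an input to the right handler is to check for a URL scheme first and then fall back to the file extension. This is a sketch: the extension sets are copied from the list above, while `classify_input` and its return values are hypothetical.

```python
from pathlib import Path
from urllib.parse import urlparse

AUDIO_EXTENSIONS = {".wav", ".mp3", ".m4a", ".flac", ".ogg", ".opus"}
VIDEO_EXTENSIONS = {".mp4", ".mkv", ".avi", ".mov", ".webm"}


def classify_input(source: str) -> str:
    """Return 'url', 'audio', or 'video'; raise ValueError otherwise."""
    if urlparse(source).scheme in {"http", "https"}:
        return "url"
    suffix = Path(source).suffix.lower()  # compare case-insensitively
    if suffix in AUDIO_EXTENSIONS:
        return "audio"
    if suffix in VIDEO_EXTENSIONS:
        return "video"
    raise ValueError(f"unsupported input: {source}")
```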

Architecture

flowchart LR
    subgraph Input
        URL[URL]
        FILE[Local File]
        MIC[Microphone]
    end

    subgraph Processing
        URL --> YTDLP[yt-dlp]
        YTDLP --> AUDIO[audio.wav]
        FILE --> FFMPEG[ffmpeg]
        FFMPEG --> AUDIO
        MIC --> AUDIO
        AUDIO --> WHISPER[faster-whisper]
    end

    subgraph Output
        WHISPER --> SRT[.srt]
        WHISPER --> JSON[.json]
        WHISPER --> PLAIN[plaintext]
    end

    subgraph Config
        MODELS[(~/.config/tsr/models cache)]
        CFG[(config.json)]
    end

    MODELS --> WHISPER
    CFG --> WHISPER

Notes

  • Video files are converted to mono 16kHz WAV with ffmpeg before transcription.
  • URLs are downloaded as audio-only via yt-dlp.
  • watch records mono PCM16 WAV and performs progressive chunk transcription.
  • faster-whisper loads models by name and caches them under ~/.config/tsr/models.
  • Subprocess failures from ffmpeg or yt-dlp are surfaced directly.
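
The conversion and download steps in these notes can be sketched as plain command lists. The flags shown are standard ffmpeg and yt-dlp options, but the exact commands tsr runs are not given in this README, so treat the argument choices (including the WAV output for yt-dlp) and all function names as assumptions.

```python
import subprocess


def ffmpeg_to_wav_cmd(src: str, dst: str) -> list[str]:
    """Build an ffmpeg command that extracts mono 16 kHz PCM16 WAV audio."""
    return [
        "ffmpeg", "-y",        # overwrite the output if it exists
        "-i", src,
        "-vn",                 # drop the video stream
        "-ac", "1",            # mono
        "-ar", "16000",        # 16 kHz sample rate
        "-c:a", "pcm_s16le",   # 16-bit little-endian PCM
        dst,
    ]


def ytdlp_audio_cmd(url: str, out_template: str) -> list[str]:
    """Build a yt-dlp command that downloads audio only (assumed WAV output)."""
    return ["yt-dlp", "-x", "--audio-format", "wav", "-o", out_template, url]


def run_tool(cmd: list[str]) -> None:
    """Run a command; a non-zero exit raises CalledProcessError to the caller."""
    subprocess.run(cmd, check=True)
```

Using `check=True` without catching the exception is one way to surface tool failures directly, as the last note describes.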
