CapGen

A fast, cross-platform, CPU-first, English-only video/audio transcriber for generating caption files with Whisper and CTranslate2, hosted on Hugging Face Spaces. A pip-installable offline CLI tool with CUDA support is also provided. Voice Activity Detection (VAD) preprocessing is always enabled.

Requirements

  • Python 3.11
  • 4 GB RAM

Usage (API)

Send a request to the endpoint with cURL as follows. Currently, the only available caption formats are srt, vtt and txt.

curl "https://winstxnhdw-CapGen.hf.space/api/v1/transcribe?caption_format=$CAPTION_FORMAT" \
  -F "file=@$AUDIO_FILE_PATH"
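
The same request can also be issued from Python. The following is a minimal sketch and not part of CapGen itself: build_transcribe_url and transcribe are hypothetical helper names, the accepted formats are taken from the list above, and the upload relies on the third-party requests package, which is assumed to be installed.

```python
from urllib.parse import urlencode

# Endpoint taken from the cURL example; the helper names are illustrative only.
BASE_URL = "https://winstxnhdw-CapGen.hf.space/api/v1/transcribe"
CAPTION_FORMATS = ("srt", "vtt", "txt")


def build_transcribe_url(caption_format: str) -> str:
    """Return the transcription URL for a supported caption format."""
    if caption_format not in CAPTION_FORMATS:
        raise ValueError(f"unsupported caption format: {caption_format!r}")
    query = urlencode({"caption_format": caption_format})
    return f"{BASE_URL}?{query}"


def transcribe(audio_path: str, caption_format: str = "srt") -> str:
    """Upload a file as multipart form data, mirroring cURL's -F "file=@..."."""
    import requests  # third-party; assumed to be installed

    with open(audio_path, "rb") as audio:
        response = requests.post(
            build_transcribe_url(caption_format),
            files={"file": audio},
            timeout=600,  # arbitrary generous timeout for long recordings
        )
    response.raise_for_status()
    return response.text  # raw response body, as cURL would print it
```

Validating the format before uploading avoids sending a large file only to have the request rejected.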

You can also extract the result field with jq and redirect it to a file.

curl "https://winstxnhdw-CapGen.hf.space/api/v1/transcribe?caption_format=$CAPTION_FORMAT" \
  -F "file=@$AUDIO_FILE_PATH" | jq -r ".result" > result.srt
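
The jq filter above suggests that the response is a JSON object whose result field holds the caption text. Under that assumption, which is inferred from the command rather than from any documented schema, the extraction step might look like this in Python. The name save_result is hypothetical.

```python
import json
from pathlib import Path


def save_result(response_body: str, output_path: str) -> str:
    """Write the "result" field of a JSON response body to a file."""
    payload = json.loads(response_body)
    # Assumed field name, inferred from the jq filter ".result".
    captions = payload["result"]
    Path(output_path).write_text(captions, encoding="utf-8")
    return captions
```

A missing result field raises KeyError here instead of silently writing an empty file.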

You can stream the captions in real time with the following.

curl -N "https://winstxnhdw-CapGen.hf.space/api/v1/transcribe/stream?caption_format=$CAPTION_FORMAT" \
  -F "file=@$AUDIO_FILE_PATH"
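
When consuming the stream from Python instead of cURL, chunks may arrive split in the middle of a multi-byte character. The sketch below shows one way to decode such chunks incrementally. It makes no assumption about the wire format of the stream, which is not described here, and decode_stream is a hypothetical helper name.

```python
import codecs
from typing import Iterable, Iterator


def decode_stream(chunks: Iterable[bytes]) -> Iterator[str]:
    """Yield decoded text as soon as complete UTF-8 characters are available."""
    decoder = codecs.getincrementaldecoder("utf-8")()
    for chunk in chunks:
        text = decoder.decode(chunk)  # buffers any incomplete trailing bytes
        if text:
            yield text
    # Flush the decoder; raises UnicodeDecodeError if the stream ended mid-character.
    tail = decoder.decode(b"", final=True)
    if tail:
        yield tail
```

With the third-party requests package, the chunks could come from a response opened with stream=True and read through iter_content(chunk_size=None), printing each piece as it is yielded.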

Usage (CLI)

CapGen is available as a CLI tool with CUDA support. You can install it with pip.

pip install git+https://github.com/winstxnhdw/CapGen

You may also install CapGen with the necessary CUDA binaries.

pip install "capgen[cuda] @ git+https://github.com/winstxnhdw/CapGen"

Now, you can run the CLI tool with the following command.

capgen -c srt -o ./result.srt --cuda < ~/Downloads/audio.mp3

usage: capgen [-h] [-g] [-t] [-w] -c  -o  [file]

transcribe a compatible audio/video file into a chosen caption file format

positional arguments:
  file            the file path to a compatible audio/video

options:
  -h, --help      show this help message and exit
  -g, --cuda      whether to use CUDA for inference

cpu:
  -t, --threads   the number of CPU threads
  -w, --workers   the number of CPU workers

required:
  -c, --caption   the chosen caption file format
  -o, --output    the output file path
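
For scripted or batch use, the CLI can be driven from Python. This is a minimal sketch based only on the flags listed in the help text above. The helper names are hypothetical, and it assumes that --threads and --workers each take an integer value, which the help text implies but does not show.

```python
import subprocess
from typing import List, Optional


def build_capgen_args(
    caption: str,
    output: str,
    cuda: bool = False,
    threads: Optional[int] = None,
    workers: Optional[int] = None,
) -> List[str]:
    """Assemble a capgen command line from the documented options."""
    args = ["capgen", "-c", caption, "-o", output]
    if cuda:
        args.append("--cuda")
    if threads is not None:
        args += ["--threads", str(threads)]  # assumed to accept an integer
    if workers is not None:
        args += ["--workers", str(workers)]  # assumed to accept an integer
    return args


def run_capgen(audio_path: str, caption: str, output: str, **options) -> None:
    """Feed the audio file on stdin, like `capgen ... < audio.mp3`."""
    with open(audio_path, "rb") as audio:
        subprocess.run(
            build_capgen_args(caption, output, **options),
            stdin=audio,
            check=True,  # raise CalledProcessError on a non-zero exit code
        )
```

Passing the arguments as a list avoids shell quoting problems with file paths that contain spaces.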

Development

You can install the required dependencies for your editor with the following.

poetry install

You can spin the server up locally with the following. Once it is running, the Swagger UI is available at localhost:7860/api/docs.

docker build -f Dockerfile.build -t capgen .
docker run --rm -e SERVER_PORT=7860 -p 7860:7860 capgen