CapGen

A fast, cross-platform, CPU-first, English-only video/audio transcriber for generating caption files with Whisper and CTranslate2, hosted on Hugging Face Spaces. A pip-installable offline CLI tool with CUDA support is also provided. Voice Activity Detection (VAD) preprocessing is always enabled.

Requirements

Python 3.11
4 GB RAM
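If you want to check a machine against these requirements before installing, the following is a minimal preflight sketch. It is not part of CapGen. It assumes "Python 3.11" means 3.11 or newer and reads "4 GB" as 4 × 10^9 bytes. Total RAM is read through `os.sysconf`, which is only available on POSIX systems, so the RAM check is skipped elsewhere (for example, on Windows).

```python
import os
import sys
from typing import List, Optional

MIN_PYTHON = (3, 11)  # assumption: 3.11 or newer is acceptable
# Assumption: "4 GB" read as decimal gigabytes. The OS usually reports slightly
# less than the installed amount, so a binary 4 GiB threshold would be too strict.
MIN_RAM_BYTES = 4 * 1000**3


def total_ram_bytes() -> Optional[int]:
    """Return physical memory in bytes, or None where sysconf cannot report it."""
    try:
        total = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
    except (AttributeError, ValueError, OSError):
        # AttributeError: no os.sysconf (Windows); ValueError: name unsupported.
        return None
    return total if total > 0 else None


def check_requirements() -> List[str]:
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    if sys.version_info[:2] < MIN_PYTHON:
        problems.append(f"Python 3.11 required, found {sys.version.split()[0]}")
    ram = total_ram_bytes()
    if ram is not None and ram < MIN_RAM_BYTES:
        problems.append(f"4 GB RAM required, found {ram / 1000**3:.1f} GB")
    return problems
```

Calling `check_requirements()` returns an empty list on a machine that passes both checks.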

Usage (API)

Simply cURL the endpoint as shown below. Currently, the only available caption formats are srt, vtt and txt.

curl "https://winstxnhdw-CapGen.hf.space/api/v1/transcribe?caption_format=$CAPTION_FORMAT" \
  -F "file=@$AUDIO_FILE_PATH"

The endpoint responds with JSON. To save only the captions to a file, extract the result field with jq and redirect the output.

curl "https://winstxnhdw-CapGen.hf.space/api/v1/transcribe?caption_format=$CAPTION_FORMAT" \
  -F "file=@$AUDIO_FILE_PATH" | jq -r ".result" > result.srt
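For calling the API from Python instead of cURL, here is a minimal standard-library sketch that mirrors the two commands above: it uploads the file as multipart form data under the field name `file` (like `curl -F "file=@..."`) and returns the `result` field of the JSON reply (like `jq -r ".result"`). The helper names `build_url`, `encode_multipart` and `transcribe` are hypothetical, and the sketch assumes the reply is a JSON object whose `result` field holds the caption text.

```python
import json
import mimetypes
import urllib.parse
import urllib.request
import uuid
from pathlib import Path
from typing import Tuple

BASE_URL = "https://winstxnhdw-CapGen.hf.space/api/v1"
CAPTION_FORMATS = ("srt", "vtt", "txt")


def build_url(endpoint: str, caption_format: str) -> str:
    """Build an endpoint URL carrying the caption_format query parameter."""
    if caption_format not in CAPTION_FORMATS:
        raise ValueError(f"unsupported caption format: {caption_format!r}")
    query = urllib.parse.urlencode({"caption_format": caption_format})
    return f"{BASE_URL}/{endpoint}?{query}"


def encode_multipart(field: str, filename: str, data: bytes) -> Tuple[bytes, str]:
    """Encode a single file as a multipart/form-data body, as curl -F does."""
    boundary = uuid.uuid4().hex
    content_type = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode()
    tail = f"\r\n--{boundary}--\r\n".encode()
    return head + data + tail, f"multipart/form-data; boundary={boundary}"


def transcribe(path: str, caption_format: str = "srt") -> str:
    """POST a file to /transcribe and return the "result" field of the JSON reply."""
    file_path = Path(path)
    body, content_type = encode_multipart("file", file_path.name, file_path.read_bytes())
    request = urllib.request.Request(
        build_url("transcribe", caption_format),
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)["result"]
```

For example, `Path("result.srt").write_text(transcribe("audio.mp3", "srt"), encoding="utf-8")` is the rough equivalent of the cURL and jq pipeline. The standard library is used here only to avoid extra dependencies; a client such as `requests` would shorten the multipart code considerably.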

You can stream the captions in real time with the following.

curl -N "https://winstxnhdw-CapGen.hf.space/api/v1/transcribe/stream?caption_format=$CAPTION_FORMAT" \
  -F "file=@$AUDIO_FILE_PATH"
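Consuming the stream from Python requires reading the response incrementally rather than all at once, which is what `curl -N` does by disabling buffering. The sketch below is a hypothetical helper, `iter_text_chunks`, that yields text as soon as bytes arrive. It assumes the stream is plain UTF-8 text; the exact framing of the streaming endpoint is not documented here. An incremental decoder is used so that a multi-byte character split across two network chunks is not garbled.

```python
import codecs
import io
from typing import Iterator


def iter_text_chunks(stream: io.BufferedIOBase, chunk_size: int = 1024) -> Iterator[str]:
    """Yield decoded text as soon as bytes arrive, without waiting for the full body."""
    # Buffers incomplete multi-byte UTF-8 sequences between reads.
    decoder = codecs.getincrementaldecoder("utf-8")()
    while True:
        # read1() returns what is available (up to chunk_size) instead of
        # blocking until exactly chunk_size bytes have been received.
        chunk = stream.read1(chunk_size)
        if not chunk:
            break
        text = decoder.decode(chunk)
        if text:
            yield text
    # Flush the decoder; raises UnicodeDecodeError if the stream ended mid-character.
    tail = decoder.decode(b"", final=True)
    if tail:
        yield tail
```

In practice you would pass it the response object returned by `urllib.request.urlopen` for a multipart POST to the `/transcribe/stream` endpoint, and print each piece with `print(text, end="", flush=True)`.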

Usage (CLI)

CapGen is available as a CLI tool with CUDA support. You can install it with pip.

pip install git+https://github.com/winstxnhdw/CapGen

You may also install CapGen with the necessary CUDA binaries.

pip install "capgen[cuda] @ git+https://github.com/winstxnhdw/CapGen"
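To confirm the installation, and whether `--cuda` is worth passing, you can query the installed package and CTranslate2's view of the GPU. This is a sketch with two assumptions: the distribution name is `capgen` (taken from the install command above), and the `ctranslate2` Python package is present as a dependency. `ctranslate2.get_cuda_device_count()` is part of CTranslate2's Python API; the wrapper functions are hypothetical.

```python
from importlib import metadata
from typing import Optional


def installed_version(distribution: str = "capgen") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


def cuda_device_count() -> int:
    """Number of CUDA devices CTranslate2 can see; 0 if it is not installed."""
    try:
        import ctranslate2
    except ImportError:
        return 0
    return ctranslate2.get_cuda_device_count()
```

If `cuda_device_count()` returns 0, run the CLI without `--cuda` and it will fall back to CPU inference.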

Now, you can run the CLI tool with the following command.

capgen -c srt -o ./result.srt --cuda < ~/Downloads/audio.mp3

usage: capgen [-h] [-g] [-t] [-w] -c  -o  [file]

transcribe a compatible audio/video file into a chosen caption file format

positional arguments:
  file            the file path to a compatible audio/video

options:
  -h, --help      show this help message and exit
  -g, --cuda      whether to use CUDA for inference

cpu:
  -t, --threads   the number of CPU threads
  -w, --workers   the number of CPU workers

required:
  -c, --caption   the chosen caption file format
  -o, --output    the output file path
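When scripting the CLI, for example to caption a whole folder, it helps to map options onto the flags in the help output above. The sketch below does that with `subprocess`. The helper names are hypothetical, the `*.mp3` default pattern is only an example, and it assumes the CLI accepts the same caption formats as the API (srt, vtt, txt).

```python
import shutil
import subprocess
from pathlib import Path
from typing import List, Optional

# Assumption: the CLI supports the same caption formats as the API.
CAPTION_FORMATS = ("srt", "vtt", "txt")


def build_command(
    source: Path,
    caption: str,
    output: Path,
    cuda: bool = False,
    threads: Optional[int] = None,
    workers: Optional[int] = None,
) -> List[str]:
    """Translate keyword options into the capgen flags from the help output."""
    if caption not in CAPTION_FORMATS:
        raise ValueError(f"unsupported caption format: {caption!r}")
    command = ["capgen", "-c", caption, "-o", str(output)]
    if cuda:
        command.append("--cuda")
    if threads is not None:
        command += ["-t", str(threads)]
    if workers is not None:
        command += ["-w", str(workers)]
    command.append(str(source))  # positional file argument
    return command


def transcribe_folder(folder: Path, caption: str = "srt", pattern: str = "*.mp3", **options) -> List[Path]:
    """Run capgen once per matching file, writing captions next to each source file."""
    if shutil.which("capgen") is None:
        raise FileNotFoundError("capgen is not on PATH; install it with pip first")
    outputs = []
    for source in sorted(folder.glob(pattern)):
        output = source.with_suffix(f".{caption}")  # talk.mp3 -> talk.srt
        subprocess.run(build_command(source, caption, output, **options), check=True)
        outputs.append(output)
    return outputs
```

For instance, `transcribe_folder(Path("~/Downloads").expanduser(), "vtt", threads=4)` runs one CPU transcription per MP3 file. Files are processed sequentially, so a single transcription never competes with another for threads.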

Development

You can install the required dependencies for your editor with the following.

poetry install

You can spin the server up locally with the following. You can access the Swagger UI at localhost:7860/api/docs.

docker build -f Dockerfile.build -t capgen .
docker run --rm -e SERVER_PORT=7860 -p 7860:7860 capgen
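The container can take a while to load the model before it answers requests. A small readiness check such as the hypothetical `wait_for_server` below polls the Swagger UI URL mentioned above until it returns HTTP 200, which is handy before running requests or tests against the local server. It assumes the default port 7860 used in the `docker run` command.

```python
import time
import urllib.request


def wait_for_server(
    url: str = "http://localhost:7860/api/docs",
    timeout: float = 60.0,
    interval: float = 1.0,
) -> bool:
    """Poll url until it answers with HTTP 200; return False after timeout seconds."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=interval) as response:
                if response.status == 200:
                    return True
        except OSError:
            # URLError, refused connections and socket timeouts are all OSError subclasses.
            pass
        time.sleep(interval)
    return False
```

Once `wait_for_server()` returns `True`, the API examples can be pointed at `http://localhost:7860/api/v1` instead of the hosted Space.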
