WhisperX API

The WhisperX API is a containerized service for transcribing audio files with speaker diarization, built on the whisperX project. It exposes an HTTP endpoint for audio transcription and is packaged as a Docker image for easy deployment.

Prerequisites

Docker with GPU support. Follow the instructions to install NVIDIA Docker.
A Huggingface API token.

Huggingface Token

Provide a Huggingface access token, which you can generate in your Hugging Face account settings. After generating the token, accept the user agreement for each gated model that the pipeline uses.

Building the Docker Image

Rename config.py.example to config.py and update it with your Huggingface token:

mv config.py.example config.py
echo "HF_TOKEN = '<my-hf-token>'" > config.py

Replace <my-hf-token> with your actual Huggingface token.
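
As a quick sanity check before building, the following is a minimal sketch that reads config.py and confirms HF_TOKEN is no longer the placeholder. The helper name read_hf_token is hypothetical and not part of this repository; it assumes only the single HF_TOKEN assignment shown above.

```python
import runpy

# Placeholder value written by the example command above.
PLACEHOLDER = "<my-hf-token>"


def read_hf_token(path="config.py"):
    """Return HF_TOKEN from a config file, rejecting a missing or placeholder value.

    Hypothetical helper for illustration; not part of the repository.
    """
    # run_path executes the file and returns its module-level names as a dict.
    settings = runpy.run_path(path)
    token = settings.get("HF_TOKEN")
    if not token or token == PLACEHOLDER:
        raise ValueError(f"{path} does not contain a real HF_TOKEN yet")
    return token
```

Calling read_hf_token() from the repository root raises ValueError while the placeholder is still in place.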

Build the Docker image for the WhisperX API:

docker build -t whisperx-api --network=host --build-arg hftoken=<my-hf-token> .

Again, replace <my-hf-token> with your Huggingface token. The build might take a while.

Running the API

After building the Docker image, you can run the WhisperX API with:

docker run --gpus all -p 5000:5000 whisperx-api

This will start the API and make it accessible on port 5000.
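
The container may need some time before it accepts connections. The following is a minimal sketch of a readiness check that polls the published port using only the Python standard library. The function name wait_for_port and its defaults are assumptions for illustration. An open port shows only that something is listening, not that the models have finished loading.

```python
import socket
import time


def wait_for_port(host="127.0.0.1", port=5000, timeout=60.0, interval=1.0):
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect means a process is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Connection refused or timed out: wait and retry.
            time.sleep(interval)
    return False
```

For example, wait_for_port(port=5000, timeout=120) can be called before sending the first request.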

Using the API

To transcribe an audio file, send a POST request to the API endpoint. Here's an example using curl:

curl http://127.0.0.1:5000/transcribe -X POST -F "file=@./audio_en.mp3"

Replace ./audio_en.mp3 with the path to your audio file.
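
The same request can be sent from Python. The following is a minimal sketch using only the standard library. It assumes, as in the curl command, that the endpoint expects a multipart form field named file and responds with JSON. The helper names encode_multipart and transcribe are hypothetical.

```python
import json
import urllib.request
import uuid
from pathlib import Path


def encode_multipart(field, filename, data):
    """Build a multipart/form-data body holding a single file field."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + data + tail, f"multipart/form-data; boundary={boundary}"


def transcribe(path, url="http://127.0.0.1:5000/transcribe"):
    """POST an audio file to the endpoint and return the parsed JSON response."""
    audio = Path(path)
    body, content_type = encode_multipart("file", audio.name, audio.read_bytes())
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": content_type}, method="POST"
    )
    # Assumption: the server answers with a JSON document.
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

With the container running, result = transcribe("./audio_en.mp3") returns the response as a Python dictionary.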

The output looks like the following:

{
   "segments" : [
      {
         "end" : 10.192,
         "speaker" : "SPEAKER_01",
         "start" : 2.883,
         "text" : " This is a test audio file of about phone line quality in English.",
         "words" : [
            {
               "end" : 3.043,
               "score" : 0.718,
               "speaker" : "SPEAKER_00",
               "start" : 2.883,
               "word" : "This"
            },
            {
               "end" : 3.163,
               "score" : 0.096,
               "speaker" : "SPEAKER_00",
               "start" : 3.123,
               "word" : "is"
            },
            {
               "end" : 3.344,
               "score" : 0.456,
               "speaker" : "SPEAKER_00",
               "start" : 3.324,
               "word" : "a"
            },
            <...>
         ]
      }
   ],
   "word_segments" : [
      {
         "end" : 3.043,
         "score" : 0.718,
         "speaker" : "SPEAKER_00",
         "start" : 2.883,
         "word" : "This"
      },
      {
         "end" : 3.163,
         "score" : 0.096,
         "speaker" : "SPEAKER_00",
         "start" : 3.123,
         "word" : "is"
      },
      {
         "end" : 3.344,
         "score" : 0.456,
         "speaker" : "SPEAKER_00",
         "start" : 3.324,
         "word" : "a"
      },
      <...>
   ]
}
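
To turn this response into a readable transcript, the segments can be grouped by speaker. The following is a minimal sketch that assumes the response has been parsed into a dictionary with the fields shown above. The function names and the "UNKNOWN" fallback label are hypothetical and not part of the API.

```python
def speaker_turns(result):
    """Merge consecutive segments by the same speaker into (speaker, start, end, text) turns."""
    turns = []
    for segment in result.get("segments", []):
        # Assumption: fall back to a made-up label if a segment has no speaker.
        speaker = segment.get("speaker", "UNKNOWN")
        text = segment.get("text", "").strip()
        if turns and turns[-1][0] == speaker:
            # Same speaker keeps talking: extend the previous turn.
            _, start, _, previous = turns[-1]
            turns[-1] = (speaker, start, segment["end"], f"{previous} {text}")
        else:
            turns.append((speaker, segment["start"], segment["end"], text))
    return turns


def format_turn(turn):
    """Render one turn as a single transcript line with start and end times in seconds."""
    speaker, start, end, text = turn
    return f"[{start:7.2f} - {end:7.2f}] {speaker}: {text}"
```

Printing format_turn(turn) for each entry of speaker_turns(result) gives one line per speaker turn.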

tijszwinkels/whisperX-api
