<a target="_blank" href="https://colab.research.google.com/github/vizagite/transcribe_whisper/blob/main/transcribe.ipynb">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>


We use the distilled Whisper large-v3 model (`distil-whisper/distil-large-v3`) in fp16 (half precision), which is fast at inference and gives good results: roughly 2 minutes to transcribe 2 hours of audio on the free Colab T4 GPU, with 90%+ accuracy.

In [None]:
!pip install -q pipx && apt install -y python3.11-venv

In [None]:
!pipx install insanely-fast-whisper --force --pip-args="--ignore-requires-python"  # use the packaged CLI instead of rewriting the model pipeline

In [None]:
# Install flash attention only if you are on a Colab Pro plan; it won't work on the free tier or on GPUs older than Ampere.
# !pipx runpip insanely-fast-whisper install flash-attn --no-build-isolation --ignore-requires-python
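
Whether flash attention is worth installing can be decided from the GPU's compute capability. The sketch below is a minimal illustration, assuming that "Ampere or newer" corresponds to a compute capability major version of 8 or higher; `supports_flash_attention` is a hypothetical helper, not part of insanely-fast-whisper.

```python
AMPERE_MAJOR = 8  # assumption: Ampere and newer NVIDIA GPUs report compute capability 8.0+


def supports_flash_attention(capability):
    """Return True if a (major, minor) compute capability is Ampere or newer."""
    major, _minor = capability
    return major >= AMPERE_MAJOR


# On Colab the tuple can be read with torch.cuda.get_device_capability();
# the values below are typed in by hand so the sketch runs without a GPU.
t4 = (7, 5)    # Tesla T4 (Turing), the free Colab GPU
a100 = (8, 0)  # A100 (Ampere)

print(supports_flash_attention(t4))    # False
print(supports_flash_attention(a100))  # True
```

If the check returns `False`, leave the install line above commented out.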

In [None]:
import os

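# We run headless, so use matplotlib's non-interactive Agg backend instead of the inline one.
os.environ["MPLBACKEND"] = "Agg"
# pipx installs command-line tools into /root/.local/bin, so add it to PATH.
os.environ["PATH"] += ":/root/.local/bin"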
import matplotlib

In [None]:
# test if setup is good
!insanely-fast-whisper --model-name distil-whisper/distil-large-v3 --file-name https://huggingface.co/datasets/reach-vb/random-audios/resolve/main/ted_60.wav

In [None]:
# For a diarized transcript, pass --hf-token ... (your Hugging Face token from https://hf.co/settings/tokens) after filling in the form at https://huggingface.co/pyannote/speaker-diarization-3.1.
# Upload the audio file to Colab and use the same filename below.

!insanely-fast-whisper --model-name distil-whisper/distil-large-v3 --file-name audio_file.mp3 --task transcribe --transcript-path transcript.json --min-speakers 2
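
When the same command has to be run with and without diarization, it can help to assemble the arguments in Python. This is a minimal sketch that uses only the flags that appear in this notebook; `build_command` is a hypothetical helper, and passing `--min-speakers` only together with a token is an assumption of this sketch, not documented behavior.

```python
import shlex


def build_command(file_name, transcript_path="transcript.json",
                  model_name="distil-whisper/distil-large-v3",
                  hf_token=None, min_speakers=None):
    """Assemble an insanely-fast-whisper argument list (hypothetical helper)."""
    cmd = [
        "insanely-fast-whisper",
        "--model-name", model_name,
        "--file-name", file_name,
        "--task", "transcribe",
        "--transcript-path", transcript_path,
    ]
    if hf_token:
        # Assumption: speaker-count hints are only useful when diarization is on.
        cmd += ["--hf-token", hf_token]
        if min_speakers is not None:
            cmd += ["--min-speakers", str(min_speakers)]
    return cmd


# Plain transcription, no diarization.
cmd = build_command("audio_file.mp3")
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` once the CLI is on `PATH`.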

In [None]:
# Let's print the transcript JSON from above in a readable format

import json
import datetime

def format_timestamp(seconds):
    # Drop fractional seconds and render as H:MM:SS
    seconds = int(seconds)
    return str(datetime.timedelta(seconds=seconds))

def process_json(file_path="transcript.json"):
    try:
        with open(file_path, 'r') as f:
            data = json.load(f)
    except FileNotFoundError:
        print(f"Error: File '{file_path}' not found.")
        return
    except json.JSONDecodeError:
        print(f"Error: Invalid JSON format in '{file_path}'.")
        return

    # Optional mapping from diarization labels to display names, e.g.
    # person_dict = {"SPEAKER_01": "some name", "SPEAKER_02": "..", "SPEAKER_03": ".."}
    person_dict = {}

    # Prefer diarized segments when present and non-empty; otherwise use the plain chunks
    segments = data.get("speakers") or data.get("chunks", [])
    for chunk in segments:
        if "timestamp" in chunk and "text" in chunk:
            start_time = chunk["timestamp"][0]
            formatted_time = format_timestamp(start_time)
            if "speaker" in chunk:
                # Fall back to the raw label when no display name is mapped
                name = person_dict.get(chunk["speaker"], chunk["speaker"])
                print(f"{formatted_time} - {name} - {chunk['text']}")
            else:
                print(f"{formatted_time} - {chunk['text']}")

process_json()
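
To make the expected input concrete, here is a minimal sketch of the transcript shape the cell above reads, with made-up sample data. The field names (`chunks`, `speakers`, `timestamp`, `text`, `speaker`) follow the code above; the sample values are illustrative only and do not come from a real run.

```python
import datetime


def format_timestamp(seconds):
    # Same conversion as above: whole seconds rendered as H:MM:SS
    return str(datetime.timedelta(seconds=int(seconds)))


# Illustrative sample, not real output
sample = {
    "text": " Hello there. Nice to meet you.",
    "chunks": [
        {"timestamp": [0.0, 2.5], "text": " Hello there."},
        {"timestamp": [3725.4, 3727.0], "text": " Nice to meet you."},
    ],
    "speakers": [
        {"speaker": "SPEAKER_01", "timestamp": [0.0, 2.5], "text": " Hello there."},
    ],
}

lines = [
    f"{format_timestamp(chunk['timestamp'][0])} - {chunk['text']}"
    for chunk in sample["chunks"]
]
for line in lines:
    print(line)
# 0:00:00 -  Hello there.
# 1:02:05 -  Nice to meet you.
```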