### Deepgram Live Transcription API Demo

This notebook demonstrates the live transcription API offered by Deepgram, using their in-house Nova-2 model. You can use it to transcribe live audio from your device's microphone or from an audio stream on the Internet, over a websocket connection to Deepgram's API server.

Unfortunately, transcribing from a microphone only works when the notebook runs on your local machine, and only on macOS or Linux, because it requires portaudio.

Speaker diarization is not available for live transcription.

The radio streaming function works when the notebook runs on JupyterLab / Vertex AI Workbench. Unfortunately, there are some display issues on Colab.

Running the service requires Deepgram API credits (provided by a trial account). Please get in touch with @yaufai.chau on Slack if the credits run out or you have further enquiries.


In [None]:
# Install portaudio on macOS (comment this line out on Linux)
!brew install portaudio

# Uncomment the line below to install portaudio on Linux
# !sudo apt-get install portaudio19-dev


Install dependencies

In [None]:
%pip install deepgram-sdk httpx ipywidgets pyaudio
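
Before moving on, it can help to confirm that the installed packages are importable from the current kernel. The following is a minimal sketch using only the standard library; `missing_modules` and `REQUIRED_MODULES` are hypothetical names introduced here for illustration, and the module names are the import names used later in this notebook.

```python
import importlib.util

# Import names used by this notebook (the pip package deepgram-sdk is imported as `deepgram`).
REQUIRED_MODULES = ["deepgram", "httpx", "ipywidgets", "pyaudio"]


def missing_modules(names):
    """Return the names from `names` that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules(REQUIRED_MODULES)
if missing:
    print(f"Missing modules: {', '.join(missing)}")
else:
    print("All required modules are installed.")
```

If `pyaudio` is reported as missing, the portaudio install step above is a likely place to look first.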

API Key and imports

In [7]:
from deepgram import (
    DeepgramClient,
    LiveTranscriptionEvents,
    LiveOptions,
    Microphone,
)
import httpx
from IPython.display import display, HTML, Markdown
import threading
import ipywidgets as widgets
import time

# Please create your own free API key at https://console.deepgram.com/signup
DEEPGRAM_API_KEY = "YOUR_DEEPGRAM_API_KEY"
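
As an alternative to pasting the key into the notebook, the key could be read from an environment variable so that it does not end up in a shared copy. This is only a sketch: `load_api_key` is a hypothetical helper, and the variable name `DEEPGRAM_API_KEY` is an assumption chosen to match the constant above.

```python
import os


def load_api_key(env_var="DEEPGRAM_API_KEY", placeholder="YOUR_DEEPGRAM_API_KEY"):
    """Return the key stored in `env_var`; raise if it is unset or still the placeholder."""
    key = os.environ.get(env_var, "").strip()
    if not key or key == placeholder:
        raise RuntimeError(
            f"Set the {env_var} environment variable to your Deepgram API key."
        )
    return key


# Usage (assumes the variable was exported before starting Jupyter):
# DEEPGRAM_API_KEY = load_api_key()
```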

The main transcription code

In [3]:
is_finals = []
def live_transcription(streaming_url=None):
    global is_finals
    try:
        deepgram = DeepgramClient(DEEPGRAM_API_KEY)
        dg_connection = deepgram.listen.websocket.v('1')
        stop_event = threading.Event()
        
        def on_open(self, open, **kwargs):
            print("Connection Open")
            
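        def on_message_mic(self, result, **kwargs):
            global is_finals
            sentence = result.channel.alternatives[0].transcript
            if len(sentence) == 0:
                return
            # Buffer finalized segments only; interim results are ignored
            if result.is_final:
                is_finals.append(sentence)
                # speech_final marks the end of an utterance: display it and reset the buffer
                if result.speech_final: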
                    utterance = " ".join(is_finals)
                    display(Markdown(f'**Transcript:** {utterance}'))
                    is_finals = []

        def on_message_radio_stream(self, result, **kwargs):
            sentence = result.channel.alternatives[0].transcript
            if len(sentence) == 0:
                return
            display(Markdown(f'**Transcript:** {sentence}'))
            # Throttle output: wait RADIO_TRANSCRIPT_INTERVAL seconds before the next transcript
            time.sleep(RADIO_TRANSCRIPT_INTERVAL)

        def on_metadata(self, metadata, **kwargs):
            print(f"Metadata: {metadata}")

        def on_speech_started(self, speech_started, **kwargs):
            print("Speech Started")

        def on_utterance_end(self, utterance_end, **kwargs):
            global is_finals
            # print("Utterance End")
            if len(is_finals) > 0:
                utterance = " ".join(is_finals)
                # print(f"Utterance End: {utterance}")
                display(Markdown(f'**Transcript:** {utterance}'))
                is_finals = []

        def on_close(self, close, **kwargs):
            print("Connection Closed")

        def on_error(self, error, **kwargs):
            print(f"Handled Error: {error}")

        def on_unhandled(self, unhandled, **kwargs):
            print(f"Unhandled Websocket Message: {unhandled}")

        # dg_connection.on(LiveTranscriptionEvents.Open, on_open)
        # dg_connection.on(LiveTranscriptionEvents.Metadata, on_metadata)
        # dg_connection.on(LiveTranscriptionEvents.SpeechStarted, on_speech_started)

        # dg_connection.on(LiveTranscriptionEvents.Close, on_close)
        dg_connection.on(LiveTranscriptionEvents.Error, on_error)
        dg_connection.on(LiveTranscriptionEvents.Unhandled, on_unhandled)

        # Shared options for radio or microphone transcription
        options: LiveOptions = LiveOptions(
            model="nova-2",
            language="en-US",
            smart_format=True,
        )

        if streaming_url:
            # Send streaming audio from the URL to Deepgram
            # Define output format for radio streaming transcription
            dg_connection.on(LiveTranscriptionEvents.Transcript, on_message_radio_stream)

            def stream_audio_url():
                print("Audio stream started...")
                with httpx.stream('GET', streaming_url) as r:
                    for data in r.iter_bytes():
                        if stop_event.is_set():
                            break
                        dg_connection.send(data)
                print("Audio stream thread stopping...")

            audio_thread = threading.Thread(target=stream_audio_url)
        else:
            # Send streaming audio from the microphone to Deepgram
            # Define output format for microphone transcription
            dg_connection.on(LiveTranscriptionEvents.Transcript, on_message_mic)
            dg_connection.on(LiveTranscriptionEvents.UtteranceEnd, on_utterance_end)

            # Additional options for microphone transcription
            additional_options = {
                "encoding": "linear16",
                "channels": 1,
                "sample_rate": 16000,
                "interim_results": True,
                "utterance_end_ms": "1000",
                "vad_events": True,
                "endpointing": 100,
            }

            for key, value in additional_options.items():
                setattr(options, key, value)

            def stream_audio_mic():
                print("Mic stream started...")
                # Open a microphone stream on the default input device
                microphone = Microphone(dg_connection.send)
                microphone.start()
                while not stop_event.is_set():
                    time.sleep(0.1)
                microphone.finish()
                print("Mic stream thread stopping...")

            audio_thread = threading.Thread(target=stream_audio_mic)
        
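        # Stop early if the websocket connection could not be opened
        if dg_connection.start(options) is False:
            print("Failed to connect to Deepgram")
            return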
        audio_thread.start()

        # Create a button to stop transcription
        stop_button = widgets.Button(
            description='Stop Transcription',
            disabled=False,
            button_style='',  # 'success', 'info', 'warning', 'danger' or ''
            tooltip='Click to stop transcription',
            icon='stop'  # (FontAwesome names without the `fa-` prefix)
        )
        display(stop_button)

        def on_button_click(b):
            stop_event.set()  # Signal all threads to stop
            stop_button.description = "Stopping..."
            stop_button.disabled = True
            audio_thread.join()
            dg_connection.finish()
            print('Finished')
            stop_button.description = "Stopped"

        stop_button.on_click(on_button_click)

    except Exception as e:
        print(f'Error occurred: {e}')
        return
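
The buffering done in `on_message_mic` and `on_utterance_end` can be easier to follow in isolation. The sketch below reproduces only that logic on made-up data: `FakeResult` and `collect_utterances` are hypothetical names and are not Deepgram types, and the final flush stands in for what `on_utterance_end` does when segments are still buffered.

```python
from dataclasses import dataclass


@dataclass
class FakeResult:
    """Hypothetical stand-in for one transcript message (not a Deepgram type)."""
    transcript: str
    is_final: bool
    speech_final: bool


def collect_utterances(results):
    """Join finalized segments into utterances, flushing when speech_final is set."""
    buffer, utterances = [], []
    for r in results:
        if not r.transcript:
            continue  # empty transcripts are skipped, as in on_message_mic
        if r.is_final:
            buffer.append(r.transcript)
            if r.speech_final:
                utterances.append(" ".join(buffer))
                buffer = []
    if buffer:
        # Leftover segments are flushed, mirroring on_utterance_end
        utterances.append(" ".join(buffer))
    return utterances


messages = [
    FakeResult("Hello", False, False),          # interim result: ignored
    FakeResult("Hello.", True, False),          # final segment: buffered
    FakeResult("This is a test", True, True),   # final and end of speech: flush
    FakeResult("", True, True),                 # empty transcript: skipped
    FakeResult("using this API", True, False),  # buffered, flushed at the end
]
print(collect_utterances(messages))
# → ['Hello. This is a test', 'using this API']
```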

Run the cell and press the Live transcribe button to transcribe microphone input  
(Requires running the notebook on your local machine)

In [8]:
# Create a button to start mic transcription
start_button = widgets.Button(
    description='Live transcribe',
    disabled=False,
    button_style='info',  # 'success', 'info', 'warning', 'danger' or ''
    tooltip='Click to start transcription',
    icon='play'
)
display(start_button)

def on_start_button_click(b):
    live_transcription()  # Start transcription from mic

start_button.on_click(on_start_button_click)

Mic stream started...


**Transcript:** Hello. This is a test of the live transcription capability of Deepgram API. You can transcribe your live voice or a radio stream

**Transcript:** using this API service

Mic stream thread stopping...
Finished


Start a radio stream from the Internet  
(Display will not work on Colab)

In [9]:
# Times Radio streaming URL
URL = 'https://timesradio.wireless.radio/stream'

# Create an audio player using HTML for a live stream with autoplay
audio_html = f"""
<audio controls autoplay>
  <source src="{URL}" type="audio/mpeg">
  Your browser does not support the audio element.
</audio>
"""

display(HTML(audio_html))

RADIO_TRANSCRIPT_INTERVAL = 3 # seconds
# Start live transcription
live_transcription(URL)
# It might take a few seconds for the first sentence to appear.

Audio stream started...


**Transcript:** The one

**Transcript:** election campaign

**Transcript:** and it was the first one I directed for the Labour Party in 1987. That was a campaign

Audio stream thread stopping...
Finished
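
The "thread stopping" message in the output comes from the stop signal shared between the Stop button and the audio thread. The pattern can be sketched without any network or Deepgram dependency, as below; `stream_chunks` and `endless_audio` are hypothetical names, and the dummy byte chunks stand in for real audio data.

```python
import threading
import time


def stream_chunks(chunks, send, stop_event):
    """Forward chunks to `send` until the source ends or `stop_event` is set."""
    sent = 0
    for chunk in chunks:
        if stop_event.is_set():
            break
        send(chunk)
        sent += 1
    return sent


def endless_audio():
    """Hypothetical stand-in for a live stream: yields dummy bytes forever."""
    while True:
        time.sleep(0.01)
        yield b"\x00" * 320


received = []
stop_event = threading.Event()
worker = threading.Thread(
    target=stream_chunks, args=(endless_audio(), received.append, stop_event)
)
worker.start()
time.sleep(0.1)         # let the worker forward a few chunks
stop_event.set()        # what the Stop button does
worker.join(timeout=2)  # wait for the worker to notice the signal
print(f"thread alive: {worker.is_alive()}, chunks forwarded: {len(received)}")
```

Note that the stop signal is only checked when a new chunk arrives, so with this pattern a stalled stream would delay the shutdown until the next chunk or until the source ends.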
