<a href="https://colab.research.google.com/github/wassimchouchen/Automatic-Speech-Recognition-/blob/main/Deepgram_Speaker_Labeling_Notebook.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Transcribe any audio file with Deepgram!

**Make a copy of this notebook into your own drive, and follow the instructions below!** 🥳🥳🥳


----------------------------

# Get started:

1) Copy this notebook (`File > Save a copy in Drive`) or download the .ipynb (`File > Download > Download as .ipynb`).

2) Follow the instructions below!

----------------------------
# Instructions:

Running the following three cells will transcribe and diarize any audio you choose; that is, your output will contain labeled speakers. The comments in the code below point out the variables you can change to adjust the output.

Before running this notebook, have a couple of audio files on hand that you wish to transcribe. Once those files are in a folder, just specify the file paths as outlined below and transcribe as you please!

And by the way, if you haven't yet signed up for Deepgram, check out this link here: https://dpgr.am/7407694

# Step 1: Dependencies

Run this cell to download all necessary dependencies.

Note: You can run a cell by clicking the play button on its left, or by clicking on the cell and pressing `Shift` + `Enter` at the same time (`Shift` + `Return` on Mac).

In [1]:
! pip install requests ffmpeg-python
! pip install deepgram-sdk --upgrade

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting ffmpeg-python
  Downloading ffmpeg_python-0.2.0-py3-none-any.whl (25 kB)
Installing collected packages: ffmpeg-python
Successfully installed ffmpeg-python-0.2.0
Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting deepgram-sdk
  Downloading deepgram_sdk-2.5.0-py3-none-any.whl (17 kB)
Collecting aiohttp (from deepgram-sdk)
  Downloading aiohttp-3.8.4-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.0 MB)
Collecting websockets (from deepgram-sdk)
  Downloading websockets-11.0.3-cp310-cp310-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (129 kB)

# Step 2: Upload audio files to this Colab!

On the left, you'll see a sidebar with a folder icon. Click that icon, and you'll see a series of folders. This is where you'll upload your audio files.

You can upload your files directly into this directory by clicking the upload icon in the top left. The icon looks like a sheet of paper with an upwards-pointing arrow on it.

Click the upload icon and select the audio file you wish to transcribe. It may take a few moments for the audio to appear, but once it does, move on to Step 3.
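If you would rather confirm the upload from code than from the sidebar, the sketch below lists the files in a folder that carry a given extension. The helper name `find_audio_files` is made up for this example and is not part of the notebook or of Deepgram's SDK.

```python
import os

# Hypothetical helper (not part of the notebook): list the files in
# `directory` whose names end with the given extension, ignoring case.
def find_audio_files(directory=".", extension="wav"):
    suffix = "." + extension.lower().lstrip(".")
    return sorted(
        name
        for name in os.listdir(directory)
        if name.lower().endswith(suffix)
        and os.path.isfile(os.path.join(directory, name))
    )

# An empty list means the upload has not finished yet (or the extension differs).
print(find_audio_files(".", "wav"))
```

If the list is empty, wait a moment and run it again, or double-check the extension of the file you uploaded.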

In [None]:
# Have you completed Step 2 above? 👀
# Do you see all your desired audio file(s) in the folder on the left? 📂

# Step 3: Transcription

Fill in the following variables:


* `dg_key` = Your personal Deepgram API key
* `MIMETYPE` = The type of audio file you're working with (`mp3`, `mp4`, `m4a`, etc.)
* `DIRECTORY` = The name of the folder that contains the audio file(s) you wish to transcribe. Note: unless you created a new folder for your audio files, the default `'.'` value should be fine. (Or, if you placed your audio file in the default `sample_data` folder, set the value of `DIRECTORY` to `'sample_data'`.)
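
Since `dg_key` is a personal secret, you may prefer not to paste it into a notebook that you later share. Here is a minimal sketch of one alternative: read the key from an environment variable and fall back to a hidden prompt. The variable name `DEEPGRAM_API_KEY` and the helper `load_api_key` are assumptions chosen for this example, not something the notebook or the SDK requires.

```python
import os
from getpass import getpass

# Hypothetical helper: the environment variable name is an assumption
# made for this sketch; use whatever name suits your setup.
def load_api_key(env_var="DEEPGRAM_API_KEY"):
    key = os.environ.get(env_var, "").strip()
    if not key:
        # getpass hides the typed value, so the key never shows up in cell output
        key = getpass("Deepgram API key: ").strip()
    return key

# Usage (uncomment in the transcription cell):
# dg_key = load_api_key()
```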


Now run the cell! (`Shift` + `Enter`)

-----------



And by the way, if you're already a Deepgram user and you're getting an error in this cell, the most common fixes are:

1. You may need to update your installation of the deepgram-sdk.
2. You may need to check how many credits you have left in your Deepgram account.
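
For the first fix, it helps to know which version of the SDK is actually installed. The sketch below uses Python's standard `importlib.metadata` module; the helper name `installed_version` is made up for this example.

```python
from importlib.metadata import PackageNotFoundError, version

# Hypothetical helper: return the installed version string of a package,
# or None if the package is not installed in this runtime.
def installed_version(package):
    try:
        return version(package)
    except PackageNotFoundError:
        return None

print("deepgram-sdk:", installed_version("deepgram-sdk"))
```

A result of `None` means the install cell in Step 1 has not been run in the current runtime.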

In [5]:
from deepgram import Deepgram
import asyncio, json, os

'''
 Sign up at https://dpgr.am/7407694
 to get an API key and 45,000 minutes
 for free!
'''
dg_key = 'YOUR_DEEPGRAM_API_KEY'  # Paste your own key here, and never share a notebook that contains it
dg = Deepgram(dg_key)

'''
The most common audio formats and encodings we support
include mp3, mp4, mp2, aac, wav, flac, pcm, m4a, ogg, opus, and webm,
so feel free to adjust the `MIMETYPE` variable as needed.
'''
MIMETYPE = 'wav'

# Note: You can use '.' if your audio is in the root folder
DIRECTORY = '.'


# Feel free to modify your model's parameters as you wish!
options = {
    "punctuate": True,
    "diarize": True,
    "model": 'general',
    "tier": 'nova'
}

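# This function sends each matching audio file to Deepgram for transcription
def main():
    for audio_file in os.listdir(DIRECTORY):
        # Match on the full extension (e.g. '.wav'), not just the trailing letters
        if audio_file.endswith('.' + MIMETYPE):
            audio_path = os.path.join(DIRECTORY, audio_file)
            with open(audio_path, "rb") as f:
                source = {"buffer": f, "mimetype": 'audio/' + MIMETYPE}
                res = dg.transcription.sync_prerecorded(source, options)
            # splitext handles extensions of any length (.flac, .webm, ...),
            # and the JSON is written next to its audio file
            json_path = os.path.splitext(audio_path)[0] + ".json"
            with open(json_path, "w") as transcript:
                json.dump(res, transcript, indent=4)

main()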

In [None]:
'''
If the cell above succeeds, you should see JSON output file(s) as siblings
next to your audio files.

Note: There may be a small delay between when the cell finishes running
and when the JSON appears. This is normal. Just wait a few moments for
the JSON(s) to appear. It should take less than a minute, depending on
the size of your file(s).
'''
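
Each JSON file is named after the audio file it came from. As a minimal sketch of deriving that name for an extension of any length (such as `.flac` or `.webm`), the hypothetical helper `json_path_for` below relies on `os.path.splitext` rather than on slicing off a fixed number of characters.

```python
import os

# Hypothetical helper: swap an audio file's extension for '.json',
# whatever the length of the original extension.
def json_path_for(audio_path):
    root, _ext = os.path.splitext(audio_path)
    return root + ".json"

print(json_path_for("./interview.flac"))
print(json_path_for("sample_data/call.wav"))
```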

# Step 4: Check out your transcription!

The function below parses each output JSON file and writes the plain, speaker-labeled transcription to a `.txt` file. (Make sure the JSON you're trying to examine has indeed already appeared in the folder on the left!)

Run this cell (`Shift` + `Enter`) to generate a speaker-labeled transcription of your audio!


----

Note: Due to Colab's underlying functionality, there will be a slight delay
between the moment this cell finishes running and the moment that the output
.txt file appears in the folder on the left-hand side of your screen.

This delay is usually less than a minute, depending on how large the file is. Just wait a moment, and your transcript will appear!
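
To see the idea behind the speaker labeling in isolation, here is a small sketch that groups consecutive words by speaker. It runs on a hand-written list shaped like the `words` entries in the JSON (each with a `speaker` and a `punctuated_word` field), so it needs no API call. The helper `group_by_speaker` and the sample data are made up for illustration.

```python
from itertools import groupby

# Hypothetical helper: merge consecutive words from the same speaker
# into one (speaker, text) turn.
def group_by_speaker(words):
    turns = []
    for speaker, run in groupby(words, key=lambda w: w["speaker"]):
        text = " ".join(w["punctuated_word"] for w in run)
        turns.append((speaker, text))
    return turns

# Hand-written sample data, not real Deepgram output
sample_words = [
    {"speaker": 0, "punctuated_word": "Hello,"},
    {"speaker": 0, "punctuated_word": "there."},
    {"speaker": 1, "punctuated_word": "Hi!"},
    {"speaker": 0, "punctuated_word": "Bye."},
]

for speaker, text in group_by_speaker(sample_words):
    print(f"SPEAKER {speaker}: {text}")
```

`groupby` only merges adjacent items, which is what a transcript needs: when a speaker talks again later, that starts a new line rather than being folded into their earlier one.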

In [6]:
'''
The JSON is loaded with information, but if you just want to read the
transcript, run the code below!

One .txt file will be generated per JSON; this .txt file will contain
the diarized, human-readable transcript.
'''

TAG = 'SPEAKER '

def create_transcript(output_json, output_transcript):
  lines = []
  with open(output_json, "r") as file:
    words = json.load(file)["results"]["channels"][0]["alternatives"][0]["words"]
  # Start from the first word's speaker so that no empty 'SPEAKER 0:' line
  # is emitted when someone other than speaker 0 talks first
  curr_speaker = words[0]["speaker"] if words else 0
  curr_line = ''
  for word_struct in words:
    word_speaker = word_struct["speaker"]
    word = word_struct["punctuated_word"]
    if word_speaker == curr_speaker:
      curr_line += ' ' + word
    else:
      # The speaker changed: close out the previous speaker's line
      lines.append(TAG + str(curr_speaker) + ':' + curr_line + '\n')
      curr_speaker = word_speaker
      curr_line = ' ' + word
  lines.append(TAG + str(curr_speaker) + ':' + curr_line)
  with open(output_transcript, 'w') as f:
    for line in lines:
      f.write(line)
      f.write('\n')

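# Writes one .txt transcript per JSON file found in DIRECTORY
def write_transcripts():
  for filename in os.listdir(DIRECTORY):
    if filename.endswith('.json'):
      # Build full paths so this also works when DIRECTORY is not '.'
      json_path = os.path.join(DIRECTORY, filename)
      output_transcript = os.path.splitext(json_path)[0] + '.txt'
      create_transcript(json_path, output_transcript)

write_transcripts()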

In [7]:
'''
If you see a .txt file appear in the folder on the left-hand
side of the screen, that means your speaker-labeled (read: diarized)
transcript has been generated!

Click into that file to read it. Or, if you wish to print the
transcript down below, run this cell!
'''

SEPARATOR = '--------------------------'

def print_lines(filename):
  with open(filename, 'r') as f:
    for line in f:
      print(line)

def print_transcript():
  for filename in os.listdir(DIRECTORY):
    if filename.endswith('.txt'):
      # Join with DIRECTORY so the file is found even when DIRECTORY is not '.'
      print_lines(os.path.join(DIRECTORY, filename))
      print(SEPARATOR)

print_transcript()

SPEAKER 0: Contest k t n rendez vous. Deux, Esque de voir, doctor. Catra. The contest is going to be proposed the veneer, sank,

--------------------------
