<a href="https://colab.research.google.com/github/tosinadegoke/Speech_To_Text-Project-with-DeepSpeech/blob/main/STT_Project_with_DeepSpeech.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Deep Speech

[DeepSpeech on GitHub](https://github.com/mozilla/DeepSpeech)

#### Installing the Package

In [None]:
!pip install deepspeech

Collecting deepspeech
  Downloading deepspeech-0.9.3-cp37-cp37m-manylinux1_x86_64.whl (9.2 MB)
Installing collected packages: deepspeech
Successfully installed deepspeech-0.9.3


#### Importing Packages

In [None]:
import pandas as pd
import tensorflow as tf
import random
import numpy as np
import os
import wave

from deepspeech import Model
from IPython.display import Audio

#### Downloading the DeepSpeech Acoustic Model and Scorer

In [None]:
!wget https://github.com/mozilla/DeepSpeech/releases/download/v0.9.3/deepspeech-0.9.3-models.pbmm
!wget https://github.com/mozilla/DeepSpeech/releases/download/v0.9.3/deepspeech-0.9.3-models.scorer

--2022-04-07 20:59:54--  https://github.com/mozilla/DeepSpeech/releases/download/v0.9.3/deepspeech-0.9.3-models.pbmm
Resolving github.com (github.com)... 140.82.121.4
Connecting to github.com (github.com)|140.82.121.4|:443... connected.
HTTP request sent, awaiting response... 302 Found

In [None]:
!ls

#### Setting Up the Environment and Getting an Instance of the Model

In [None]:
model_file_path = 'deepspeech-0.9.3-models.pbmm'
lm_file_path = 'deepspeech-0.9.3-models.scorer'
beam_width = 100
lm_alpha = 0.93
lm_beta = 1.18

model = Model(model_file_path)                    # Acoustic model
model.enableExternalScorer(lm_file_path)          # External scorer (language model)

In [None]:
model.setScorerAlphaBeta(lm_alpha, lm_beta)
model.setBeamWidth(beam_width)

0
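The three decoder settings above can be read roughly as follows: in DeepSpeech's CTC beam search, `lm_alpha` weights the external scorer's language-model score, `lm_beta` is a bonus added for each emitted word, and `beam_width` limits how many candidate transcriptions are kept while decoding. The sketch below is a deliberately simplified illustration of that trade-off, not DeepSpeech's actual decoder: `Hypothesis`, `combined_score`, and `rank_hypotheses` are hypothetical names, and the log-probabilities are made-up numbers.

```python
from dataclasses import dataclass


@dataclass
class Hypothesis:
    text: str
    acoustic_log_prob: float  # log P(audio | text) from the acoustic model (made-up value)
    lm_log_prob: float        # log P(text) from the language model (made-up value)


def combined_score(h: Hypothesis, alpha: float, beta: float) -> float:
    """Simplified decoder score: acoustic + alpha * LM + beta * word count."""
    word_count = len(h.text.split())
    return h.acoustic_log_prob + alpha * h.lm_log_prob + beta * word_count


def rank_hypotheses(hyps, alpha, beta, beam_width):
    """Keep only the `beam_width` best-scoring hypotheses, best first."""
    ranked = sorted(hyps, key=lambda h: combined_score(h, alpha, beta), reverse=True)
    return ranked[:beam_width]


# Hypothetical candidates with invented scores
candidates = [
    Hypothesis("days were passed in idleness", -12.0, -20.0),
    Hypothesis("the days were passed in idleness", -12.5, -21.0),
    Hypothesis("daze were past in idle ness", -11.8, -35.0),
]

for h in rank_hypotheses(candidates, alpha=0.93, beta=1.18, beam_width=2):
    print(f"{combined_score(h, 0.93, 1.18):8.3f}  {h.text}")
```

With these invented numbers, the third candidate has the best acoustic score but is pushed out of the beam by its poor language-model score, which is the kind of correction the scorer is meant to provide.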

#### Mount Google Drive

In [None]:
from google.colab import drive

drive.mount('/content/gdrive')

Mounted at /content/gdrive


#### Getting the Dataset Directory and Metadata

In [None]:
dataset = '/content/gdrive/MyDrive/LJSpeech-1.1'
wavs_path = dataset + "/wavs/"
metadata_path = dataset + "/metadata.csv"


# Read metadata file and parse it
metadata_df = pd.read_csv(metadata_path, sep="|", header=None, quoting=3)
metadata_df.columns = ["file_name", "transcription", "normalized_transcription"]
metadata_df = metadata_df[["file_name", "normalized_transcription"]]
metadata_df = metadata_df.sample(frac=1).reset_index(drop=True)
metadata_df.head(6)


| | file_name | normalized_transcription |
|---|---|---|
| 0 | LJ016-0361 | the moment too that the condemned man had pass... |
| 1 | LJ006-0188 | The days were passed in idleness, debauchery, ... |
| 2 | LJ001-0139 | a blemish which can be nearly, though not whol... |
| 3 | LJ024-0063 | President Wilson, three; President Harding, fo... |
| 4 | LJ014-0162 | But it may be mentioned that the concourse was... |
| 5 | LJ013-0204 | he admitted that he had been justly convicted,... |
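Two arguments in the loading cell are easy to misread. `quoting=3` is the numeric value of `csv.QUOTE_NONE`, so double quotes inside transcriptions stay literal characters rather than being treated as CSV quoting, and `sample(frac=1)` returns every row in random order, which differs on each run unless a `random_state` is passed. The sketch below checks both on two made-up lines in the same `id|raw|normalized` layout; the IDs and sentences are hypothetical, not taken from LJSpeech.

```python
import csv
import io

import pandas as pd

# Two hypothetical lines in LJSpeech's metadata.csv layout:
# file id | raw transcription | normalized transcription
sample_metadata = (
    'LJ000-0001|Dr. Smith paid $5.|Doctor Smith paid five dollars.\n'
    'LJ000-0002|He said "stop" twice.|He said "stop" twice.\n'
)

df = pd.read_csv(
    io.StringIO(sample_metadata),
    sep="|",
    header=None,
    quoting=csv.QUOTE_NONE,  # equivalent to quoting=3: '"' is an ordinary character
)
df.columns = ["file_name", "transcription", "normalized_transcription"]

# Seeded shuffle: the same random_state always yields the same row order.
shuffled_a = df.sample(frac=1, random_state=42).reset_index(drop=True)
shuffled_b = df.sample(frac=1, random_state=42).reset_index(drop=True)

print(df["normalized_transcription"].tolist())
print(shuffled_a.equals(shuffled_b))  # True
```

Passing a fixed `random_state` to the notebook's `sample(frac=1)` call would make the printed order, and therefore which 100 clips get transcribed later, repeatable across runs.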


In [None]:
print(f"Total size of the Dataset: {len(metadata_df)}")
metadata_df   # Display the rows of the dataset

Total size of the Dataset: 6914


| | file_name | normalized_transcription |
|---|---|---|
| 0 | LJ016-0361 | the moment too that the condemned man had pass... |
| 1 | LJ006-0188 | The days were passed in idleness, debauchery, ... |
| 2 | LJ001-0139 | a blemish which can be nearly, though not whol... |
| 3 | LJ024-0063 | President Wilson, three; President Harding, fo... |
| 4 | LJ014-0162 | But it may be mentioned that the concourse was... |
| ... | ... | ... |
| 6909 | LJ012-0038 | but could not positively identify it, and Ikey... |
| 6910 | LJ016-0360 | The change added greatly to the responsibiliti... |
| 6911 | LJ013-0182 | took Courvoisier into custody, and placed the ... |
| 6912 | LJ001-0140 | the desirable thing being "the breaking of the... |


#### Preprocessing of the Audio

In [None]:
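# 1. Read the first WAV file listed in the (shuffled) metadata
file = tf.io.read_file(wavs_path + metadata_df["file_name"].iloc[0] + ".wav")
# 2. Decode it: decode_wav returns float32 samples in [-1.0, 1.0] and the sample rate
audio, sample_rate = tf.audio.decode_wav(file)
# 3. Drop the channel axis: shape (samples, 1) -> (samples,)
audio = tf.squeeze(audio, axis=-1)
# 4. Play it back at the file's own sample rate rather than a hard-coded 16000
Audio(audio.numpy(), rate=int(sample_rate.numpy()))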
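The released DeepSpeech 0.9.3 model expects 16 kHz, 16-bit mono audio, whereas the original LJSpeech 1.1 recordings are 22,050 Hz. If the copy on Drive has not already been converted, the samples need resampling before they reach `model.stt`. The sketch below assumes plain linear interpolation via `np.interp` is acceptable; it applies no anti-aliasing low-pass filter, so a polyphase resampler such as `scipy.signal.resample_poly` or a tool like `sox` would give better quality. The helper name `resample_int16` is hypothetical.

```python
import numpy as np


def resample_int16(samples: np.ndarray, src_rate: int, dst_rate: int = 16000) -> np.ndarray:
    """Resample 16-bit PCM by linear interpolation (no anti-aliasing filter)."""
    if src_rate == dst_rate:
        return samples
    duration = len(samples) / src_rate
    n_out = int(round(duration * dst_rate))
    # Time stamps (in seconds) of the input samples and of the desired output samples
    t_in = np.arange(len(samples)) / src_rate
    t_out = np.arange(n_out) / dst_rate
    resampled = np.interp(t_out, t_in, samples.astype(np.float64))
    # Round and clip back into the int16 range that Model.stt consumes
    return np.clip(np.round(resampled), -32768, 32767).astype(np.int16)


# Usage: one second of a synthetic 440 Hz tone at LJSpeech's 22,050 Hz
src_rate = 22050
t = np.arange(src_rate) / src_rate
tone = (0.3 * 32767 * np.sin(2 * np.pi * 440 * t)).astype(np.int16)
tone_16k = resample_int16(tone, src_rate)
print(len(tone), len(tone_16k))  # 22050 16000
```

Inside the notebook's `transcribe_batch`, the resampled array could be passed as `model.stt(resample_int16(data16, rate))`; when the files are already 16 kHz the helper returns its input unchanged.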


#### Batch Mode

In [None]:
def read_wav_file(filename):
    """Return the raw PCM bytes and the sample rate of a WAV file."""
    with wave.open(filename, 'rb') as w:
        rate = w.getframerate()
        frames = w.getnframes()
        buffer = w.readframes(frames)

    return buffer, rate
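`read_wav_file` returns the raw bytes and the frame rate, but `transcribe_batch` reinterprets those bytes as `np.int16`, which is only correct for 16-bit (2-byte) mono PCM. A small pre-flight check, sketched below with only the standard `wave` module, makes that assumption explicit. `check_wav_format` and `make_silent_wav` are hypothetical helpers, and the expected values (mono, 16-bit, 16,000 Hz) assume the released DeepSpeech 0.9.3 English model.

```python
import io
import wave


def check_wav_format(source, expected_rate=16000):
    """Return a list of reasons `source` does not match DeepSpeech's expected input.

    `source` may be a file path or a binary file-like object.
    """
    problems = []
    with wave.open(source, "rb") as w:
        if w.getnchannels() != 1:
            problems.append(f"expected mono, got {w.getnchannels()} channels")
        if w.getsampwidth() != 2:
            problems.append(f"expected 16-bit samples, got {8 * w.getsampwidth()}-bit")
        if w.getframerate() != expected_rate:
            problems.append(f"expected {expected_rate} Hz, got {w.getframerate()} Hz")
    return problems


def make_silent_wav(rate, channels=1, sampwidth=2, seconds=0.1):
    """Build a silent in-memory WAV file for trying out the check."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(channels)
        w.setsampwidth(sampwidth)
        w.setframerate(rate)
        w.writeframes(b"\x00" * int(rate * seconds) * channels * sampwidth)
    buf.seek(0)  # rewind so the file can be read back
    return buf


print(check_wav_format(make_silent_wav(16000)))  # []
print(check_wav_format(make_silent_wav(22050)))  # ['expected 16000 Hz, got 22050 Hz']
```

Calling it on a handful of files from `wavs_path` before the batch loop would surface a format mismatch up front instead of as degraded transcriptions.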

In [None]:
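def transcribe_batch(audio_file):
    buffer, rate = read_wav_file(audio_file)
    if rate != model.sampleRate():
        print(f"Warning: {audio_file} is {rate} Hz; the model expects {model.sampleRate()} Hz")
    # DeepSpeech consumes 16-bit mono PCM samples as an int16 array
    data16 = np.frombuffer(buffer, dtype=np.int16)
    return model.stt(data16)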
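The loop below compares predicted and normalized transcriptions by eye. A common numeric summary is word error rate (WER): the word-level edit distance (substitutions + deletions + insertions) divided by the number of reference words. Because DeepSpeech emits lowercase text without punctuation while `normalized_transcription` keeps case and punctuation, both sides are normalized first. This is a minimal sketch; `normalize` and `word_error_rate` are hypothetical helpers, and the normalization rule (lowercase, keep only letters, apostrophes, and spaces) is an assumption a careful evaluation would refine.

```python
import re
from typing import List


def normalize(text: str) -> List[str]:
    """Lowercase, replace anything but a-z, apostrophes and spaces with a space, split into words."""
    text = text.lower()
    text = re.sub(r"[^a-z' ]+", " ", text)
    return text.split()


def word_error_rate(reference: str, hypothesis: str) -> float:
    """(substitutions + deletions + insertions) / number of reference words."""
    ref, hyp = normalize(reference), normalize(hypothesis)
    # prev[j]: edit distance between the reference words processed so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # reference word deleted
                curr[j - 1] + 1,     # extra hypothesis word inserted
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / max(len(ref), 1)


# Sample No. 3 from the notebook output: one deleted word ("a"), one substitution
reference = "a blemish which can be nearly, though not wholly, avoided by care and forethought"
hypothesis = "blemish which can be nearly though not coldly avoided by care and forethought"
print(f"WER: {word_error_rate(reference, hypothesis):.3f}")  # WER: 0.143
```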

In [None]:
for i, row in enumerate(metadata_df.head(100).itertuples(index=False), start=1):
    print(f'No. {i}')
    print('*** Predicted Transcription ***')
    print(transcribe_batch(wavs_path + row.file_name + ".wav"))

    print('*** Normalized Transcription ***')
    print(row.normalized_transcription)

    print('-' * 100)
  

No. 1
*** Predicted Transcription ***
two that the condemned man had passed through the debtors doors on to the scaffold the prison had done with him
*** Normalized Transcription ***
the moment too that the condemned man had passed through the debtors' door on to the scaffold the prison had done with him,
----------------------------------------------------------------------------------------------------
No. 2
*** Predicted Transcription ***
days were passed in idleness debauchery rhinosceri immoral conversation
*** Normalized Transcription ***
The days were passed in idleness, debauchery, riotous quarreling, immoral conversation,
----------------------------------------------------------------------------------------------------
No. 3
*** Predicted Transcription ***
blemish which can be nearly though not coldly avoided by care and forethought
*** Normalized Transcription ***
a blemish which can be nearly, though not wholly, avoided by care and forethought
-----------------------------