# <center> <font size = 24 color = 'steelblue'>**Speech Recognition**

<center><img src= 'https://drive.google.com/uc?export=view&id=1ACG-2N9lRygp6no9MJTJJYbeAuuOiUtW' height = 500, width = 800>

<div class="alert alert-block alert-info">
    
<font size = 4> 

**By the end of this notebook you will be able to:**

* Preprocess audio files
* Understand audio features
* Identify methods of speech recognition

# <a id= 's0'> 
<font size = 4>
    
**Table of contents:**<br>
[1. Data description](#s1)<br>
[2. Preprocessing of the audio files](#s2)<br>
[3. General audio features exploration](#s3)<br>
[4. Speech analysis](#s4)<br>
[5. Sentiment analysis](#s5)

###### <a id = 's1'>
<font size = 10 color = 'midnightblue'> **Data Description**

<div class="alert alert-block alert-success"> 
<font size = 4>
    
**Audio files used in this analysis are available on Wikimedia Commons:**

- VOA News Report: "FIFA to Re-Examine Video Replays After World Cup Referee Mistakes" https://commons.wikimedia.org/wiki/File:2010-06-29_VOA_News_report_-_FIFA_to_Re-Examine_Video_Replays_After_World_Cup_Referee_Mistakes.ogg
- Aldi speaking Bible verse in Bahasa Indonesia https://commons.wikimedia.org/wiki/File:Aldi_-_Indonesian_language_-_Bible_Verse_John_3-16.ogg

<font size = 5 color = seagreen > **Load the libraries**

In [None]:
# Import generic libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import IPython.display as ipd
from pydub import AudioSegment

[top](#s0)

###### <a id = 's2'>
<font size = 10 color = 'midnightblue'> **Preprocessing**

<div class="alert alert-block alert-success"> 
<font size = 4>
    
- Many Python audio libraries expect uncompressed waveform (.wav) input. For example, `speech_recognition.AudioFile` reads only WAV, AIFF/AIFF-C, and native FLAC files.
- WAV stores uncompressed PCM samples, so it is lossless, unlike lossy compressed formats such as MP3, M4A (AAC), or OGG Vorbis.
- Hence, convert your audio file to WAV before analysis if it is not already in that format.
- In Python, the pydub package simplifies audio conversion (it relies on FFmpeg to decode formats other than WAV).
- To convert, create an `AudioSegment` from the input file and call its `export` method with the desired output format.
</div>
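To make the "lossless" point concrete, here is a minimal sketch that uses only Python's standard-library `wave` module (not pydub). It writes synthetic 16-bit PCM samples, a 440 Hz tone chosen purely for illustration, into an in-memory WAV file and reads them back. Because WAV stores the raw samples, they come back bit-for-bit identical. A lossy codec such as MP3 or Vorbis would not guarantee this.

```python
import io
import math
import struct
import wave

def make_sine_pcm16(freq_hz=440.0, sr=16000, seconds=0.25, amplitude=0.5):
    """Return 16-bit integer samples of a sine tone (synthetic test signal)."""
    n = int(sr * seconds)
    return [int(amplitude * 32767 * math.sin(2 * math.pi * freq_hz * i / sr)) for i in range(n)]

def wav_roundtrip(samples, sr=16000):
    """Write mono 16-bit PCM samples to an in-memory WAV file and read them back."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)   # mono
        w.setsampwidth(2)   # 2 bytes per sample = 16-bit PCM
        w.setframerate(sr)
        w.writeframes(struct.pack(f"<{len(samples)}h", *samples))  # WAV is little-endian
    buf.seek(0)
    with wave.open(buf, "rb") as r:
        frames = r.readframes(r.getnframes())
        params = (r.getnchannels(), r.getsampwidth(), r.getframerate())
    return list(struct.unpack(f"<{len(frames) // 2}h", frames)), params

samples = make_sine_pcm16()
decoded, params = wav_roundtrip(samples)
print(params, decoded == samples)  # → (1, 2, 16000) True
```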

In [None]:
# %pwd   # IPython magic: show the current working directory

<font size = 6 color = powderblue> **FFmpeg is required for `ogg` to `wav` conversion**

In [None]:
#!git clone https://git.ffmpeg.org/ffmpeg.git ffmpeg   # source code only: pydub needs a built ffmpeg executable on PATH
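pydub calls an external `ffmpeg` executable to decode OGG files. Cloning the FFmpeg repository only downloads the source, so the program still has to be built or installed (for example, through a system package manager). The sketch below is a quick pre-flight check that uses the standard library's `shutil.which`. The helper name `find_ffmpeg` is hypothetical.

```python
import shutil

def find_ffmpeg():
    """Return the path of the ffmpeg executable on PATH, or None if it is missing."""
    return shutil.which("ffmpeg")

ffmpeg_path = find_ffmpeg()
if ffmpeg_path is None:
    print("ffmpeg not found on PATH: pydub will not be able to decode OGG files.")
else:
    print(f"ffmpeg found at {ffmpeg_path}")
```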

In [None]:
from pydub import AudioSegment

# Load the OGG file (pydub uses ffmpeg to decode it)
ogg_file = "/voc/work/2010-06-29_VOA_News_report_-_FIFA_to_Re-Examine_Video_Replays_After_World_Cup_Referee_Mistakes.ogg"
audio = AudioSegment.from_file(ogg_file, format="ogg")

# Export as WAV next to the source file, so later cells find it at the same absolute path
wav_file = "/voc/work/2010-06-29_VOA_News_report_-_FIFA_to_Re-Examine_Video_Replays_After_World_Cup_Referee_Mistakes.wav"
audio.export(wav_file, format="wav")

print(f"Conversion complete. WAV file saved at {wav_file}")

In [None]:
ipd.Audio("/voc/work/2010-06-29_VOA_News_report_-_FIFA_to_Re-Examine_Video_Replays_After_World_Cup_Referee_Mistakes.wav")

[top](#s0)

###### <a id = 's3'>
<font size = 10 color = 'midnightblue'> **General audio features exploration**

<font size = 5 color = seagreen> **Amplitude**

In [None]:
import librosa
import librosa.display

In [None]:
# mono=False keeps all channels if the file has more than one
y, sr = librosa.load('/voc/work/2010-06-29_VOA_News_report_-_FIFA_to_Re-Examine_Video_Replays_After_World_Cup_Referee_Mistakes.wav', mono=False)

fig, ax = plt.subplots(figsize=(15, 3))
img = librosa.display.waveshow(y, sr=sr, ax=ax)

ax.set(title='Waveform (envelope view)', xlabel='Time (s)', ylabel='Amplitude')
ax.label_outer()

plt.show()
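The waveform plot shows how amplitude changes over time. A common way to summarise this as a single loudness curve is frame-wise root-mean-square (RMS) energy. The following is a minimal NumPy sketch of that idea, checked on a synthetic tone. The frame and hop sizes are assumptions. `librosa.feature.rms` computes a similar quantity, but it centres and pads frames by default, so its values will differ slightly at the edges. For a stereo array from `mono=False`, average the channels first (e.g. `np.mean(y, axis=0)`).

```python
import numpy as np

def rms_envelope(y, frame_length=2048, hop_length=512):
    """Frame-wise root-mean-square amplitude of a mono signal (a simple loudness envelope)."""
    y = np.asarray(y, dtype=float)
    if len(y) < frame_length:                      # pad very short signals to one full frame
        y = np.pad(y, (0, frame_length - len(y)))
    n_frames = 1 + (len(y) - frame_length) // hop_length
    frames = np.stack([y[i * hop_length : i * hop_length + frame_length] for i in range(n_frames)])
    return np.sqrt(np.mean(frames ** 2, axis=1))

# Synthetic check: one second of a 440 Hz tone with amplitude 0.5, sampled at 22,050 Hz
sr_demo = 22050
t = np.arange(sr_demo) / sr_demo
tone = 0.5 * np.sin(2 * np.pi * 440 * t)
env = rms_envelope(tone)
print(env.mean())  # close to 0.5 / sqrt(2) ≈ 0.354
```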

<font size = 5 color = seagreen> **Frequency**

In [None]:
# Load audio file with librosa (down-mixed to mono and resampled to 22,050 Hz by default)
y, sr = librosa.load('/voc/work/2010-06-29_VOA_News_report_-_FIFA_to_Re-Examine_Video_Replays_After_World_Cup_Referee_Mistakes.wav')

# Short-time Fourier transform (STFT): split the signal into short overlapping frames and
# Fourier-transform each one, giving the amplitude of each frequency at each point in time.
# amplitude_to_db converts magnitudes to decibels relative to the loudest bin (ref=np.max).
D = librosa.amplitude_to_db(np.abs(librosa.stft(y)), ref=np.max)

# Display spectrogram
fig, ax = plt.subplots(figsize=(15, 3))
img = librosa.display.specshow(D, y_axis='linear', x_axis='time', sr=sr, ax=ax)
ax.set(title='Linear-frequency power spectrogram')
ax.label_outer()

fig.colorbar(img, ax=ax, format='%+2.0f dB')
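To show what `librosa.stft` followed by `amplitude_to_db(..., ref=np.max)` computes, here is a simplified NumPy sketch. It frames the signal, applies a Hann window, takes a real FFT of each frame, and converts magnitudes to dB relative to the loudest bin with an 80 dB floor. This is an illustration, not librosa's implementation. librosa pads and centres frames by default and uses its own defaults (`n_fft=2048`). The `n_fft=512`, hop size, and the 1 kHz test tone are assumptions chosen so that the peak lands exactly on one frequency bin.

```python
import numpy as np

def stft_db(y, n_fft=512, hop_length=128):
    """Magnitude STFT in decibels relative to the loudest bin (0 dB = maximum)."""
    y = np.asarray(y, dtype=float)
    if len(y) < n_fft:                                   # pad very short signals to one frame
        y = np.pad(y, (0, n_fft - len(y)))
    window = np.hanning(n_fft)                           # taper frames to reduce spectral leakage
    n_frames = 1 + (len(y) - n_fft) // hop_length
    frames = np.stack([y[i * hop_length : i * hop_length + n_fft] * window for i in range(n_frames)])
    mag = np.abs(np.fft.rfft(frames, axis=1)).T          # shape: (1 + n_fft // 2, n_frames)
    ref = max(mag.max(), 1e-10)                          # avoid dividing by zero on silence
    db = 20 * np.log10(np.maximum(mag, 1e-10) / ref)
    return np.maximum(db, -80.0)                         # clip the floor at -80 dB

sr_demo = 8000
t = np.arange(sr_demo) / sr_demo
tone = np.sin(2 * np.pi * 1000 * t)                      # 1 kHz test tone
D_demo = stft_db(tone)
freqs = np.fft.rfftfreq(512, d=1 / sr_demo)              # frequency (Hz) of each row
peak_row = D_demo.mean(axis=1).argmax()
print(freqs[peak_row])  # → 1000.0
```

Each row of the result corresponds to one frequency bin, spaced `sr / n_fft` = 15.625 Hz apart here, and each column corresponds to one time frame. `specshow` draws this layout.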

In [None]:
fig, ax = plt.subplots(figsize=(15, 3))
img = librosa.display.specshow(D, y_axis='log', x_axis='time', sr=sr, ax=ax)
ax.set(title='Logarithmic-frequency power spectrogram')
ax.label_outer()

fig.colorbar(img, ax=ax, format='%+2.0f dB')

[top](#s0)

###### <a id = 's4'>
<font size = 10 color = 'midnightblue'> **Speech Analysis**

In [None]:
#pip install --user SpeechRecognition

<font size = 4 color = seagreen><b>Speech recognition</b>

<div class="alert alert-block alert-success"> 
<font size = 4> 
    
- Pass the audio file(s) to a speech-recognition engine, which transcribes the speech to text.
- In Python, the `SpeechRecognition` library (imported as `speech_recognition`) wraps several such engines behind one interface.
- Supported engines include CMU Sphinx, Google Cloud Speech API, Microsoft Bing Voice Recognition, and IBM Speech to Text.
- Most cloud engines require API keys or credentials; CMU Sphinx runs offline.
- For this project, the free **Google Speech Recognition** engine is used with the library's built-in default API key. That key is intended for testing and may be rate-limited or revoked by Google at any time.
</div>

<font size = 4 color = seagreen> **Key Features of Speech Recognition:**

<div class="alert alert-block alert-success"> 
<font size = 4> 
    
> The library is built around the `Recognizer` class. Its attributes (such as `energy_threshold`) configure recognition settings, and its methods (such as `record` and `recognize_google`) capture audio and send it to an engine.
    
- To begin, import the library and create a `Recognizer`. <br>
- Then, open the audio file to be transcribed.

In [None]:
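import speech_recognition as sr   # note: this rebinds `sr`, which previously held librosa's sample rate

# Set up recognizer
r = sr.Recognizer()
r.energy_threshold = 300   # used by listen(); adjust_for_ambient_noise() overwrites it

# Import Audio data
test_audio = sr.AudioFile('/voc/work/2010-06-29_VOA_News_report_-_FIFA_to_Re-Examine_Video_Replays_After_World_Cup_Referee_Mistakes.wav')
with test_audio as source:
    # Calibrates on the first second (duration=1 by default); that second is consumed,
    # so record() starts reading after it
    r.adjust_for_ambient_noise(source)
    audio = r.record(source)
    
# Raw response with all alternatives; long recordings may fail on the free endpoint
r.recognize_google(audio, show_all=True)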

<div class="alert alert-block alert-success"> 
<font size = 4> 

- Transcription can now be performed directly with `recognize_google`. It takes the recorded `AudioData` as input and returns the transcript as a string.
- Depending on the audio quality, other `Recognizer` features can help produce a cleaner transcription.
- If parts of the file do not need transcribing, use the `offset` and `duration` parameters of `record()` (both in seconds) to select the segment to transcribe.
- For a noisy file, `adjust_for_ambient_noise()` calibrates `energy_threshold` to the ambient noise level so that `listen()` can separate speech from background noise. Note that `record()` does not use this threshold. Calibrating on an `AudioFile` also consumes its first second (the default `duration=1`).
- The VOA News file is then transcribed in 60-second chunks, with the remainder in a final chunk.
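A rough sketch of the kind of calibration `adjust_for_ambient_noise` performs follows. It is not the library's code. The sketch assumes an exponentially weighted update that pulls the threshold toward a multiple of each chunk's RMS energy. The defaults used here (`damping=0.15`, `ratio=1.5`, 1024-sample chunks) reflect our understanding of the library's `dynamic_energy_adjustment_damping` and `dynamic_energy_ratio` settings and should be treated as assumptions, as should the 16 kHz sample rate and the synthetic noise. The helper name `calibrate_energy_threshold` is hypothetical.

```python
import numpy as np

def calibrate_energy_threshold(chunks, threshold=300.0, damping=0.15, ratio=1.5,
                               seconds_per_chunk=1024 / 16000):
    """Move an energy threshold toward ratio * RMS(noise) over a series of noise-only chunks.

    `damping` is raised to the chunk duration so that the adaptation speed per second
    does not depend on the chunk size.
    """
    decay = damping ** seconds_per_chunk
    for chunk in chunks:
        energy = np.sqrt(np.mean(np.asarray(chunk, dtype=float) ** 2))  # RMS of this chunk
        threshold = threshold * decay + ratio * energy * (1 - decay)
    return threshold

rng = np.random.default_rng(0)
noise_chunks = [rng.normal(0.0, 100.0, 1024) for _ in range(15)]   # ~1 s of synthetic noise at 16 kHz
print(calibrate_energy_threshold(noise_chunks))   # moves from 300 toward 1.5 * 100 = 150
```

With only about one second of audio, the threshold moves most of the way toward `ratio` times the noise RMS without fully converging. A longer `duration` gives a steadier estimate, at the cost of skipping more audio when recording from a file.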

In [None]:
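# Import Audio data in consecutive chunks: each record() call on the open source
# continues from where the previous call stopped
test_audio2 = sr.AudioFile('/voc/work/2010-06-29_VOA_News_report_-_FIFA_to_Re-Examine_Video_Replays_After_World_Cup_Referee_Mistakes.wav')
with test_audio2 as source2:
    audio2 = r.record(source2, duration=60)   # first 60 s
    audio3 = r.record(source2, duration=60)   # next 60 s
    audio4 = r.record(source2)                # remainder of the file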
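The three hard-coded `record()` calls generalise to files of any length. A minimal sketch follows: a hypothetical helper `chunk_spans` that computes `(offset, duration)` pairs in seconds. Each pair could then be passed as `r.record(source, offset=off, duration=dur)`, opening the file afresh in a new `with sr.AudioFile(...)` block for each span, because `offset` counts from the current read position. The total length could come from pydub, for example `len(AudioSegment.from_file(path)) / 1000`, since pydub reports lengths in milliseconds.

```python
import math

def chunk_spans(total_seconds, chunk_seconds=60.0):
    """Return (offset, duration) pairs that tile [0, total_seconds) with fixed-size chunks."""
    if total_seconds <= 0 or chunk_seconds <= 0:
        return []
    n_chunks = math.ceil(total_seconds / chunk_seconds)
    return [
        (i * chunk_seconds, min(chunk_seconds, total_seconds - i * chunk_seconds))
        for i in range(n_chunks)
    ]

print(chunk_spans(150, 60))  # → [(0, 60), (60, 60), (120, 30)]
```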

<div class="alert alert-block alert-success"> 
<font size = 4> 
    
- The Google engine can return several alternative transcriptions (hypotheses) for the same audio.
- By default, `recognize_google` returns only the most likely one.
- To examine all alternatives, pass `show_all=True` to `recognize_google`. It then returns the raw response instead of a single string.
</div>
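When `show_all=True` is used, the result has to be unpacked by hand. The sketch below assumes the response shape `{"alternative": [{"transcript": ..., "confidence": ...}, ...], "final": ...}`, with an empty list when nothing was recognised. This shape is our understanding of what the library returns and should be checked against real output. The helper `best_transcript` and the sample response are hypothetical, written by hand for illustration rather than taken from the API.

```python
def best_transcript(response):
    """Pick a transcript from a raw show_all=True response.

    Prefers the alternative with the highest confidence; falls back to the first one.
    """
    if not response or not response.get("alternative"):
        return None  # nothing recognised (an empty list) or an unexpected shape
    alternatives = response["alternative"]
    scored = [a for a in alternatives if "confidence" in a]
    best = max(scored, key=lambda a: a["confidence"]) if scored else alternatives[0]
    return best["transcript"]

# Hypothetical response, written by hand (not real API output)
sample = {
    "alternative": [
        {"transcript": "fifa to re-examine video replays", "confidence": 0.91},
        {"transcript": "fifa to reexamine video replay"},
    ],
    "final": True,
}
print(best_transcript(sample))  # → fifa to re-examine video replays
```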

In [None]:
# Transcribe each chunk and join the pieces with spaces
text_audio = " ".join(r.recognize_google(chunk) for chunk in (audio2, audio3, audio4))
text_audio

<font size = 5 color = seagreen> <b>Restore punctuation</b>

In [None]:
#!pip install deepmultilingualpunctuation

In [None]:
from deepmultilingualpunctuation import PunctuationModel

In [None]:
model = PunctuationModel()

In [None]:
text_audio_punc = model.restore_punctuation(text_audio)
text_audio_punc

<font size = 5 color = seagreen> <b>Recognition of non-English speech</b>

<div class="alert alert-block alert-success"> 
<font size = 4>
    
- Speech Recognition can also transcribe non-English speech.
- Pass a language tag through the `language` parameter of `recognize_google` (the default is `"en-US"`). This notebook uses `language="id-ID"` for Indonesian.
</div>

In [None]:
# Convert the Indonesian OGG recording to WAV with pydub, as for the VOA file
raw_file = AudioSegment.from_file(file="/voc/work/Aldi_-_Indonesian_language_-_Bible_Verse_John_3-16.ogg", format="ogg")
raw_file.export(out_f="/voc/work/Aldi_-_Indonesian_language_-_Bible_Verse_John_3-16.wav", format="wav")

In [None]:
# Set up recognizer
r = sr.Recognizer()
r.energy_threshold = 300   # used by listen(); adjust_for_ambient_noise() overwrites it

# Import Audio data
test_audio = sr.AudioFile('/voc/work/Aldi_-_Indonesian_language_-_Bible_Verse_John_3-16.wav')
with test_audio as source:
    # Calibration consumes the first second of this short clip, which record() then skips
    r.adjust_for_ambient_noise(source)
    audio = r.record(source)
    
r.recognize_google(audio, language="id-ID", show_all=True)

In [None]:
text_audio_id = r.recognize_google(audio, language="id-ID")
text_audio_id

[top](#s0)

###### <a id = 's5'>
<font size = 10 color = 'midnightblue'> <b>Sentiment Analysis using VADER

<div class="alert alert-block alert-info"> 
<font size = 4> 
    
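<center><b>Sentiment analysis using the VADER library has already been covered in previous chapters. VADER's lexicon and rules are built for English, so interpret its scores for the Indonesian transcript with caution.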

In [None]:
import nltk
nltk.download("punkt")
nltk.download("vader_lexicon")

In [None]:
from nltk.sentiment.vader import SentimentIntensityAnalyzer

sid = SentimentIntensityAnalyzer()

In [None]:
print(sid.polarity_scores(text_audio_punc))

In [None]:
print(sid.polarity_scores(text_audio_id))
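`polarity_scores` returns a `compound` value in [-1, 1]. The sketch below shows how VADER squashes its summed valence into that range, using the normalisation `x / sqrt(x² + alpha)` with `alpha = 15`. It also applies the thresholds commonly recommended by VADER's authors: at least 0.05 is positive, at most -0.05 is negative, and anything in between is neutral. The helper names `normalize` and `label` are our own. As a usage example, `label(sid.polarity_scores(text_audio_punc)["compound"])` turns a score from the cells above into a category. For the Indonesian transcript, most words are likely missing from the English lexicon, so a near-zero, neutral result would not be surprising.

```python
import math

def normalize(score, alpha=15):
    """Map an unbounded sum of valence scores into [-1, 1], as VADER does for 'compound'."""
    norm = score / math.sqrt(score * score + alpha)
    return max(-1.0, min(1.0, norm))

def label(compound, threshold=0.05):
    """Conventional VADER thresholds: >= 0.05 positive, <= -0.05 negative, else neutral."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"

print(normalize(0.0), label(normalize(1.9)), label(-0.02))  # → 0.0 positive neutral
```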

[top](#s0)