# Transcription of an mp3 file into text (Speech Recognition)

In this tutorial I explain, step by step, how to use Python to transcribe an mp3 file named p1.mp3 into text.
The mp3 file is stored in input_dir.

The tools and packages used are:
1. Tools:
    - ffmpeg: converts the mp3 file to a wave file
    - FLAC command line tools: used by the SpeechRecognition package to encode the audio as FLAC before it is sent to the recognizer
2. Packages:
    - librosa: gets the length of the wave file
    - pydub: manipulates wave audio files; here it splits a large wave file into multiple small wave files
    - speech_recognition: recognizes the speech using the Google Web Speech API recognizer

#### 1. Convert mp3 file to wave file format using ffmpeg
In this step you need to set three variables: input_dir, output_dir and path_to_ffmpeg_exe.
input_dir is the folder containing the mp3 audio files.
output_dir is the folder where ffmpeg stores the converted wave files.
path_to_ffmpeg_exe is the path to ffmpeg.exe, which is in the bin folder of ffmpeg.
You need to have ffmpeg on your system. You can download the latest ffmpeg-master-latest-win64-gpl.zip from https://github.com/BtbN/FFmpeg-Builds/releases, extract it to a folder, and assign the path of ffmpeg.exe (in the bin folder) to the path_to_ffmpeg_exe variable.

In [None]:
import os
import subprocess


input_dir = 'audio-examples/f/'
output_dir = 'audio-examples/f/'
parts_dir = "audio-examples/f/parts"
# raw string, so single backslashes are enough
path_to_ffmpeg_exe = r'C:\ffmpeg2023\ffmpeg-master-latest-win64-gpl\bin\ffmpeg.exe'

# collect only the mp3 files in input_dir (skip subfolders and other file types)
files_list = []

for path in os.listdir(input_dir):
    if os.path.isfile(os.path.join(input_dir, path)) and path.lower().endswith('.mp3'):
        files_list.append(path)

# convert each mp3 file to a wave file with the same base name
for file_nm in files_list:
    print(file_nm)
    wav_nm = os.path.splitext(file_nm)[0] + ".wav"
    subprocess.call([path_to_ffmpeg_exe, '-i', os.path.join(input_dir, file_nm), os.path.join(output_dir, wav_nm)])
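
Before launching ffmpeg it can help to preview the commands that will be run. The sketch below only builds the argument lists and executes nothing. `plan_conversions` is a hypothetical helper, not part of any package, and the file names in the demo are made up. It derives the output name with `os.path.splitext`, because taking `split(".")[0]` would shorten a name such as `talk.v2.mp3` to `talk`.

```python
import os


def plan_conversions(file_names, input_dir, output_dir, ffmpeg_path="ffmpeg"):
    """Return one ffmpeg argument list per mp3 file name (nothing is executed)."""
    commands = []
    for name in file_names:
        base, ext = os.path.splitext(name)
        if ext.lower() != ".mp3":
            continue  # skip files that are not mp3
        source = os.path.join(input_dir, name)
        target = os.path.join(output_dir, base + ".wav")
        commands.append([ffmpeg_path, "-i", source, target])
    return commands


# Made-up file names, used only to show the resulting commands
for cmd in plan_conversions(["p1.mp3", "talk.v2.mp3", "notes.txt"],
                            "audio-examples/f", "audio-examples/f"):
    print(" ".join(cmd))
```

Each returned list can be passed to `subprocess.call` as in the cell above.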

#### 2. Get the length of the wav file you want to transcribe, in milliseconds

In [None]:
import librosa
# filename= works in older librosa releases; librosa 0.10 and later prefer path=
l = librosa.get_duration(filename="audio-examples/f/p1.wav") # get length in seconds
lms = int(l*1000) # get length in milliseconds
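
If installing librosa only for this step feels heavy, the length can also be read with Python's standard `wave` module. This is a minimal sketch that assumes the file is uncompressed PCM, which is the kind of wave file the `wave` module can read. `wav_length_ms` is a hypothetical helper name.

```python
import wave


def wav_length_ms(wav_path):
    """Return the duration of an uncompressed PCM wave file in milliseconds."""
    with wave.open(wav_path, "rb") as wav_file:
        frames = wav_file.getnframes()   # number of sample frames in the file
        rate = wav_file.getframerate()   # frames per second
    return int(frames * 1000 / rate)
```

With this helper, `lms = wav_length_ms("audio-examples/f/p1.wav")` would play the same role as the librosa call above.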

#### 3. Split the single wave file into multiple small wave files
This transcription relies on requests to the Google Web Speech API, and a request fails when the audio file exceeds a certain size (I think the limit is around 10MB). Converting an mp3 file to wave increases the file size dramatically; for example, an mp3 of around 14MB can produce a wave file of about 150MB. Hence, in this step we split the single wave file into multiple small wave files and store them in a subfolder called parts.

In [None]:
import os
from pydub import AudioSegment
step = 60*1000 # 60 seconds as milliseconds

audio = AudioSegment.from_wav("audio-examples/f/p1.wav")
audio_parts=[]

# slice the audio into consecutive parts of at most `step` milliseconds
for i in range(0, lms, step):
    t1 = i
    t2 = i+step
    audio_parts.append(audio[t1:t2])

# make sure the parts folder exists before exporting
os.makedirs('audio-examples/f/parts', exist_ok=True)

k=1
for audio_part in audio_parts:
    audio_part.export('audio-examples/f/parts/p1_part'+str(k).zfill(5)+'.wav', format="wav") #Exports to a wav file
    k=k+1
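
To judge whether a chosen step keeps each part below the size limit mentioned above, the part boundaries and their approximate sizes can be worked out without touching any audio. The sketch below is plain arithmetic. The format (44.1 kHz, stereo, 16-bit samples) and the 150-second total length are assumptions for illustration; the real values depend on your mp3. Both function names are hypothetical.

```python
def chunk_ranges(total_ms, step_ms):
    """Return (start, end) pairs in milliseconds covering [0, total_ms)."""
    return [(start, min(start + step_ms, total_ms))
            for start in range(0, total_ms, step_ms)]


def pcm_wav_bytes(duration_ms, sample_rate, channels, sample_width):
    """Approximate size of uncompressed PCM audio data (file header ignored)."""
    return duration_ms * sample_rate * channels * sample_width // 1000


# Assumed format: 44.1 kHz, stereo, 16-bit (2 bytes per sample)
for start, end in chunk_ranges(150_000, 60_000):
    size = pcm_wav_bytes(end - start, 44_100, 2, 2)
    print(start, end, size)
```

Under these assumptions a full 60-second part holds roughly 10.6 million bytes of audio data, so a shorter step may be needed if your files use this format and requests fail.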

#### 4. Get the list of small wave files
Build the full list of wave files to be transcribed and store it in the audiofiles list:

In [None]:
from os import listdir
from os.path import isfile, join
mypath = 'audio-examples/f/parts/'
# listdir returns names in arbitrary order, so sort them to keep the parts in sequence
audiofiles = sorted(f for f in listdir(mypath) if isfile(join(mypath, f)))
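
The order of this list decides the order of the transcript. The part names written in step 3 are zero-padded, so sorting them as plain strings already matches their numeric order. If your part names are not padded, a sort key based on the part number is a safer choice. The sketch below assumes names ending in `part<number>.wav`; `part_number` is a hypothetical helper and the names in the demo are made up.

```python
import re


def part_number(file_name):
    """Extract the trailing part index from names like 'p1_part00012.wav'."""
    match = re.search(r"part(\d+)\.wav$", file_name)
    return int(match.group(1)) if match else -1


# Made-up, non-padded names to show the difference between the two orderings
names = ["p1_part10.wav", "p1_part2.wav", "p1_part1.wav"]
print(sorted(names))                    # plain string sort
print(sorted(names, key=part_number))   # sort by part number
```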

#### 5. Transcribe each small wave file
Use the SpeechRecognition package to transcribe each wave file. This code uses the recognize_google() method, which sends the audio to the Google Web Speech API, so you need to be online.

Note: in this section the SpeechRecognition package relies on the FLAC command line tools to encode the audio as FLAC before sending it to the API, so you need to have them in your System32 folder. You can do this as follows (the steps are from https://stackoverflow.com/a/71228053):
1. Go to the flac download page.
2. Choose your OS (Windows, in this case).
3. Download the latest win.zip version, probably at the end of the page.
4. Go to the download directory and extract it (you can simply use Extract Here).
5. Go to the win64 or win32 folder, according to your system architecture.
6. Copy flac.exe and paste it into the C:\Windows\System32 directory.
7. Try running your code. If it executes successfully, there is no need for the last step.
8. If not, remove the .exe extension, i.e., rename the file to flac.

In my case, I downloaded flac-1.4.2-win.zip (https://ftp.osuosl.org/pub/xiph/releases/flac/), copied the contents of the Win64 folder (flac.exe, libFLAC.dll, libFLAC++.dll and metaflac.exe) into my System32 folder, and then renamed flac.exe to flac (without extension).
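
If the transcription cell complains about a missing FLAC tool, it can help to check which of the two file names (`flac` or `flac.exe`) is actually visible in the folders listed in PATH. The sketch below is only a diagnostic aid: `find_on_path` is a hypothetical helper that looks for an exact file name, and it makes no claim about how SpeechRecognition itself locates the tool.

```python
import os


def find_on_path(file_name):
    """Return the first PATH location holding exactly `file_name`, else None."""
    for folder in os.environ.get("PATH", "").split(os.pathsep):
        candidate = os.path.join(folder, file_name)
        if os.path.isfile(candidate):
            return candidate
    return None


# Report both spellings, since the steps above may leave either one in place
for name in ("flac", "flac.exe"):
    print(name, "->", find_on_path(name))
```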

In [None]:
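import speech_recognition as sr

r = sr.Recognizer()

for i in range(len(audiofiles)):
    # transcribe audio file
    AUDIO_FILE = "audio-examples/f/parts/"+audiofiles[i]

    # use the audio file as the audio source
    with sr.AudioFile(AUDIO_FILE) as source:
        audio = r.record(source)  # read the entire audio file
    try:
        print("part "+str(i)+": " + r.recognize_google(audio)+"\n")
    except sr.UnknownValueError:
        # raised when the speech in this part could not be understood
        print("part "+str(i)+": [unintelligible]\n")
    except sr.RequestError as e:
        # raised when the API request fails (e.g., no internet connection)
        print("part "+str(i)+": request failed: "+str(e)+"\n")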
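
Printing each part is fine for a quick look, but you may prefer to collect the parts into one string so a single failed request does not interrupt the run. The sketch below shows one way to do that. `collect_transcript` is a hypothetical helper, and `fake_transcribe` is a stand-in that performs no speech recognition; it exists only so the example runs offline.

```python
def collect_transcript(part_names, transcribe_part):
    """Join per-part texts in order; failed parts are marked, not dropped."""
    lines = []
    for index, name in enumerate(part_names, start=1):
        try:
            text = transcribe_part(name)
        except Exception as error:  # e.g. a failed request or unintelligible audio
            text = "[part failed: " + type(error).__name__ + "]"
        lines.append("part " + str(index) + ": " + text)
    return "\n".join(lines)


# Stand-in transcriber for demonstration; it performs no speech recognition.
def fake_transcribe(name):
    if "00002" in name:
        raise ValueError("no speech detected")
    return "text of " + name


transcript = collect_transcript(["p1_part00001.wav", "p1_part00002.wav"], fake_transcribe)
print(transcript)
```

In the notebook, `transcribe_part` could be a small function that wraps the `sr.AudioFile`, `r.record` and `r.recognize_google` calls from the cell above, and the returned string could then be written to a text file with `open(...)`.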

#### 6. Just finished.
After running the section above, you will have the transcript of your audio file. It takes some time to finish.