# The overall approach of the notebook is as follows:

1. Load the data and assign each audio clip one of the following labels (every other word is grouped under 'unknown'):

    ['yes', 'no', 'up', 'down', 'left', 'right', 'on', 'off', 'stop', 'go']
2. Resample the audio to a sample rate of 8000 Hz, mix it with background noise, and train a model on the result.
3. Reduce the number of features from 8000 to 4000 and observe the metrics.
4. Apply cross-validation to both the original and the dimensionality-reduced data, and compare the results.

In [1]:
!pip install pyunpack
!pip install patool
# Extracting the .7z file, as the given file is in .7z format and the notebook is running on kaggle
import os
from pyunpack import Archive
os.system('apt-get install -y p7zip')  # -y so that the install does not wait for a confirmation
# Create the output folder only if it does not exist yet
os.makedirs('/kaggle/working/train/', exist_ok=True)

# Extracting the .7z file
Archive('../input/tensorflow-speech-recognition-challenge/train.7z').extractall('/kaggle/working/train/')


# Counting the files in each label folder
path = os.listdir('./train/train/audio/')
size = {}
for i in path:
    size[i] = len(os.listdir('./train/train/audio/'+i))
print(size)

Collecting pyunpack
  Downloading pyunpack-0.2.2-py2.py3-none-any.whl (3.8 kB)
Collecting entrypoint2
  Downloading entrypoint2-0.2.4-py3-none-any.whl (6.2 kB)
Collecting easyprocess
  Downloading EasyProcess-0.3-py2.py3-none-any.whl (7.9 kB)
Installing collected packages: entrypoint2, easyprocess, pyunpack
Successfully installed easyprocess-0.3 entrypoint2-0.2.4 pyunpack-0.2.2
Collecting patool
  Downloading patool-1.12-py2.py3-none-any.whl (77 kB)
Installing collected packages: patool
Successfully installed patool-1.12
{'right': 2367, 'on': 2367, 'four': 2372, 'no': 2375, 'happy': 1742, 'six': 2369, 'up': 2375, 'one': 2370, 'marvin': 1746, 'yes': 2377, 'stop': 2380, '_background_noise_': 7, 'three': 2356, 'dog': 1746, 'eight': 2352, 'bird': 1731, 'wow': 1745, 'house': 1750, 'seven': 2377, 'nine': 2364, 'five': 2357, 'bed': 1713, 'off': 2357, 'cat': 1733, 'zero': 2376, 'tree': 1733, 'two': 2373, 'down': 2359

In [2]:
# The total categories of labels
print("The total labels are: ",len(os.listdir('./train/train/audio'))-1)  # excluding the _background_noise_

The total labels are:  30


## Making the labelled dataset

The audio files are organised in one folder per word. Most clips are exactly one second long, but many are shorter. The dataset was recorded in quiet surroundings, whereas real-life audio usually contains some background noise, so we mix a fraction of background noise into the original audio to make the model more robust.
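
The loading cell in this notebook keeps only the clips that come out at exactly 8000 samples, so the shorter ones are dropped. An alternative that this notebook does not use is to zero-pad short clips up to one second. The sketch below shows that idea on a synthetic array; `pad_or_trim` and `TARGET_LEN` are hypothetical names introduced only for illustration.

```python
import numpy as np

TARGET_LEN = 8000  # one second at 8000 Hz, the length used in this notebook

def pad_or_trim(samples, target_len=TARGET_LEN):
    """Return an array that is exactly `target_len` samples long (hypothetical helper)."""
    samples = np.asarray(samples, dtype=np.float32)
    if len(samples) >= target_len:
        return samples[:target_len]          # cut off the tail of long clips
    padding = target_len - len(samples)
    return np.pad(samples, (0, padding))     # append zeros (silence) to short clips

# Synthetic stand-in for a clip that is shorter than one second
short_clip = np.ones(6000, dtype=np.float32)
fixed = pad_or_trim(short_clip)
print(fixed.shape)   # (8000,)
```

Whether padding helps depends on the model; it keeps more training data, but the padded silence becomes part of every short example.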


In [3]:
import librosa # For loading the audio file

labels_to_consider = ['yes', 'no', 'up', 'down', 'left', 'right', 'on', 'off', 'stop', 'go']
unknown = [i for i in os.listdir('./train/train/audio') if i not in labels_to_consider and i != '_background_noise_' ]

def label_encoder(directory):
    '''
    Input: directory -> the folder that holds one sub-folder of .wav files per label

    Output: labelled_wave -> list containing the entries as: [samples, label name]
            training_data -> list containing the samples only
            label_encoder -> dict mapping each label name to its group number
    '''
    i = 0
    label_encoder = {}
    labelled_wave = []
    training_data = []
    for label in labels_to_consider:
        path = os.path.join(directory,label)
        label_encoder[label] = i    # Label to encoder
        print("The current label is: "+str(label)+" of: "+str(i))
        i+=1
        for audio_file in os.listdir(path):
            if not audio_file.endswith('.wav'):
                continue    # Skip anything that is not an audio clip
            samples, sample_rate = librosa.load(os.path.join(path,audio_file))
            samples = librosa.resample(samples,orig_sr=sample_rate,target_sr=8000)
            if len(samples)!=8000:
                continue    # Keep only the clips that are exactly one second long
            labelled_wave.append([samples,label])
            training_data.append(samples)
    # Every remaining word shares the single 'unknown' group
    label_encoder['unknown'] = i
    for label in unknown:
        print("The current label is: unknown of: "+str(i))
        path = os.path.join(directory,label)
        for audio_file in os.listdir(path):
            if not audio_file.endswith('.wav'):
                continue
            samples, sample_rate = librosa.load(os.path.join(path,audio_file))
            samples = librosa.resample(samples,orig_sr=sample_rate,target_sr=8000)
            if len(samples)!=8000:
                continue
            labelled_wave.append([samples,'unknown'])
            training_data.append(samples)
    return labelled_wave,training_data,label_encoder
# Note: this rebinds the name label_encoder from the function to the returned dict
labelled_wave,training_data,label_encoder = label_encoder('./train/train/audio')

The current label is: yes of: 0
The current label is: no of: 1
The current label is: up of: 2
The current label is: down of: 3
The current label is: left of: 4
The current label is: right of: 5
The current label is: on of: 6
The current label is: off of: 7
The current label is: stop of: 8
The current label is: go of: 9
The current label is: unknown of: 10
The current label is: unknown of: 10
The current label is: unknown of: 10
The current label is: unknown of: 10
The current label is: unknown of: 10
The current label is: unknown of: 10
The current label is: unknown of: 10
The current label is: unknown of: 10
The current label is: unknown of: 10
The current label is: unknown of: 10
The current label is: unknown of: 10
The current label is: unknown of: 10
The current label is: unknown of: 10
The current label is: unknown of: 10
The current label is: unknown of: 10
The current label is: unknown of: 10
The current label is: unknown of: 10
The current label is: unknown of: 10
The current l

## Adding the background noise

We add a randomly chosen segment of background noise to every audio clip, to make the model more robust.

In [4]:
import numpy as np
import random
def get_random_noise():
    ''' Returns a random one-second (8000 samples) segment of a random background noise file
    '''
    noise_dir = './train/train/audio/_background_noise_'
    # Only the .wav files are audio; skip anything else in the folder
    audios = [f for f in os.listdir(noise_dir) if f.endswith('.wav')]
    noise,sr = librosa.load(os.path.join(noise_dir,random.choice(audios)))
    noise = librosa.resample(noise,orig_sr=sr,target_sr=8000)
    start = random.randint(0,noise.shape[0]-8000-1)
    return noise[start:start+8000]


def mix_audio(data,ratio = 0.1):
    ''' Mixes every clip with the same random noise segment, scaled by ratio
    '''
    noise = get_random_noise()    # Drawn once and shared by all the clips
    final_data = []
    for clip in data:
        final_data.append(clip + (ratio*noise))
    return final_data
final_data = mix_audio(training_data)
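
The cell above draws a single noise segment and adds it to every clip. If more variety is wanted, a fresh segment could be drawn per clip instead. The sketch below illustrates this on synthetic arrays; `random_segment`, `mix_with_fresh_noise`, and the made-up `noise` and `clips` arrays are assumptions for illustration and are not part of the notebook.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_segment(noise, length, rng):
    """Cut a random window of `length` samples out of a longer noise recording."""
    start = rng.integers(0, len(noise) - length + 1)   # upper bound is exclusive
    return noise[start:start + length]

def mix_with_fresh_noise(clips, noise, ratio=0.1, rng=rng):
    """Add a different random noise window to each clip, scaled by `ratio`."""
    return [clip + ratio * random_segment(noise, len(clip), rng) for clip in clips]

# Synthetic stand-ins for the real data (assumptions for illustration only)
noise = rng.standard_normal(8000 * 60).astype(np.float32)     # 60 "seconds" of noise
clips = [np.zeros(8000, dtype=np.float32) for _ in range(3)]  # three silent clips

mixed = mix_with_fresh_noise(clips, noise)
print(len(mixed), mixed[0].shape)
```

Here the noise recording is loaded once and only the window changes per clip, which avoids reading the file from disk for every example.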

In [5]:
labels = [i[1] for i in labelled_wave]
def remove_not_equal_length(data,labels):
    '''Drops every sample whose length is not 8000 and pairs the rest with their labels
    '''
    final_data = []
    for i,j in zip(data,labels):
        if len(i)!=8000:
            continue
        else:
            final_data.append([i,j])
    return final_data
dataset = remove_not_equal_length(final_data,labels)

In [6]:
# Training data = wave_file
# Labels = target
wave_file = [i[0] for i in dataset]
wave_file = np.reshape(np.array(wave_file),(-1,8000,1))
target = [label_encoder[i[1]] for i in dataset]
target = np.reshape(target,(-1,1))

## Train Test split

In [7]:
from sklearn.model_selection import train_test_split
X_train,X_test,y_train,y_test = train_test_split(wave_file,target,test_size= 0.15,random_state=98,shuffle=True)
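
The split above shuffles at random, so the share of each label in the test set can drift from its share in the full dataset. As a minimal sketch on made-up data (the arrays below are assumptions, not the notebook's), `train_test_split` also accepts a `stratify` argument that keeps the class proportions the same on both sides of the split.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Made-up imbalanced data: 80 samples of class 0 and 20 samples of class 1
X = np.arange(200).reshape(100, 2)
y = np.array([0] * 80 + [1] * 20)

X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.15, random_state=98, shuffle=True,
    stratify=y,   # keep the 80/20 class ratio in both parts
)
print(np.bincount(y_tr), np.bincount(y_te))
```

With this notebook's `target`, which has shape (-1, 1), one would likely pass `stratify=target.ravel()`.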

## One Hot Encoding

In [8]:
import keras
# labels holds 11 distinct values (10 keywords + 'unknown'), encoded as 0..10
y_train = keras.utils.to_categorical(y_train, len(set(labels)))
y_test = keras.utils.to_categorical(y_test, len(set(labels)))

In [9]:
training_labels = [np.argmax(i) for i in y_train]
testing_labels = [np.argmax(i) for i in y_test]
train  = X_train.reshape(-1,8000)
test  = X_test.reshape(-1,8000)
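
Taking `np.argmax` of a one-hot vector returns the integer label it was built from, which is the form `RandomForestClassifier` expects. The round trip can be sketched with NumPy alone; `one_hot` is a hypothetical helper written for this illustration and stands in for `keras.utils.to_categorical`.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Rows of the identity matrix act as one-hot vectors (hypothetical helper)."""
    return np.eye(num_classes, dtype=np.float32)[np.asarray(labels).ravel()]

y = np.array([[0], [10], [3]])          # column of integer labels, shaped like `target`
encoded = one_hot(y, num_classes=11)    # 10 keywords + 'unknown'
decoded = encoded.argmax(axis=1)        # back to integer labels

print(encoded.shape)      # (3, 11)
print(decoded.tolist())   # [0, 10, 3]
```

Since the decoded labels equal the original ones, the integer labels could also be passed to the classifier directly.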

## Making the model

In [10]:
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix
classifier = RandomForestClassifier(n_estimators = 10, criterion = 'entropy', random_state = 42,max_depth = 20)
classifier.fit(train,training_labels)

RandomForestClassifier(criterion='entropy', max_depth=20, n_estimators=10,
                       random_state=42)

In [11]:
from sklearn.metrics import classification_report,accuracy_score,f1_score
print(classification_report(testing_labels,classifier.predict(test)))
print("The accuracy score is:",accuracy_score(testing_labels,classifier.predict(test)))
#print("The F1 score is:",f1_score(testing_labels,classifier.predict(test),average='macro'))

              precision    recall  f1-score   support

           0       0.33      0.03      0.05       338
           1       0.23      0.04      0.07       307
           2       0.50      0.03      0.05       300
           3       0.16      0.02      0.03       336
           4       0.25      0.01      0.02       319
           5       0.23      0.01      0.02       306
           6       0.08      0.01      0.01       330
           7       0.25      0.01      0.02       310
           8       0.33      0.01      0.02       324
           9       0.41      0.03      0.06       323
          10       0.64      0.98      0.77      5545

    accuracy                           0.63      8738
   macro avg       0.31      0.11      0.10      8738
weighted avg       0.51      0.63      0.50      8738

The accuracy score is: 0.6279468986037995
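
The report shows a strong class imbalance: class 10 ('unknown') accounts for 5545 of the 8738 test samples. A useful reference point is the accuracy of a model that always predicts the majority class, which can be computed from the support column alone:

```python
# Support counts copied from the classification report above (classes 0..10)
support = [338, 307, 300, 336, 319, 306, 330, 310, 324, 323, 5545]

total = sum(support)
majority_baseline = max(support) / total   # accuracy of always predicting class 10

print(total)                         # 8738
print(round(majority_baseline, 4))   # 0.6346
```

The reported accuracy of about 0.628 is slightly below this baseline, so accuracy alone overstates how well the model works here. The macro-averaged F1 score of 0.10 gives a more honest picture of the per-class performance.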


## Dimensionality Reduction (8000 -> 4000 Features)

In [12]:
from sklearn.decomposition import PCA
pca = PCA(n_components = 4000)
train_pca = pca.fit_transform(train)
test_pca = pca.transform(test)
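
The value of 4000 components is a fixed choice. One common way to check such a choice is to look at the cumulative `explained_variance_ratio_` of a fitted PCA. The sketch below does this on a small synthetic matrix (an assumption standing in for `train`), and the 95% threshold is only an example value.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Synthetic stand-in for `train`: 200 samples, 50 features built from 5 latent factors
latent = rng.standard_normal((200, 5))
X = latent @ rng.standard_normal((5, 50)) + 0.01 * rng.standard_normal((200, 50))

pca_full = PCA().fit(X)                                  # keep all components
cumulative = np.cumsum(pca_full.explained_variance_ratio_)
n_95 = int(np.searchsorted(cumulative, 0.95) + 1)        # components needed for 95% variance

print(n_95)
```

On the real data the same two lines after the fit would show how much variance the first 4000 components retain.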

In [13]:
model_pca = RandomForestClassifier(n_estimators = 10, criterion = 'entropy', random_state = 42,max_depth = 20)
model_pca.fit(train_pca,training_labels)

RandomForestClassifier(criterion='entropy', max_depth=20, n_estimators=10,
                       random_state=42)

In [14]:
print(classification_report(testing_labels,model_pca.predict(test_pca)))
print("The accuracy score is:",accuracy_score(testing_labels,model_pca.predict(test_pca)))

              precision    recall  f1-score   support

           0       0.25      0.02      0.04       338
           1       0.22      0.03      0.05       307
           2       0.26      0.03      0.05       300
           3       0.19      0.02      0.04       336
           4       0.10      0.01      0.02       319
           5       0.08      0.01      0.02       306
           6       0.12      0.01      0.01       330
           7       0.17      0.01      0.02       310
           8       0.20      0.02      0.03       324
           9       0.33      0.03      0.05       323
          10       0.64      0.97      0.77      5545

    accuracy                           0.62      8738
   macro avg       0.23      0.11      0.10      8738
weighted avg       0.48      0.62      0.50      8738

The accuracy score is: 0.6242847333485924


## Cross Validation (Comparison with and without Dimensionality reduction)
We run cross-validation on two datasets: one with dimensionality reduction and one without.

In [15]:
from sklearn.model_selection import cross_val_score
# without dimensionality reduction
scores = cross_val_score(classifier, train, training_labels, cv=5)
print("%0.2f accuracy with a standard deviation of %0.2f" % (scores.mean(), scores.std()))

0.62 accuracy with a standard deviation of 0.00


In [16]:
from sklearn.model_selection import cross_val_score
# with dimensionality reduction
scores = cross_val_score(model_pca, train_pca, training_labels, cv=5)
print("%0.2f accuracy with a standard deviation of %0.2f" % (scores.mean(), scores.std()))

0.62 accuracy with a standard deviation of 0.00
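
In the cells above, PCA is fitted on all of `train` before `cross_val_score` splits it, so every validation fold has already influenced the projection. A possible way to avoid this is to wrap PCA and the classifier in a scikit-learn pipeline, so that PCA is refitted inside each fold. The sketch below uses small synthetic data (an assumption standing in for `train` and `training_labels`) and macro F1 as the score, since accuracy is dominated by the 'unknown' class.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
# Small synthetic stand-in for (`train`, `training_labels`)
X = rng.standard_normal((120, 40))
y = np.repeat([0, 1, 2], 40)

# PCA is part of the estimator, so it is fitted on the training folds only
pipeline = make_pipeline(
    PCA(n_components=20),
    RandomForestClassifier(n_estimators=10, criterion='entropy',
                           max_depth=20, random_state=42),
)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(pipeline, X, y, cv=cv, scoring='f1_macro')
print("%0.2f macro F1 with a standard deviation of %0.2f" % (scores.mean(), scores.std()))
```

The synthetic data is pure noise, so the score itself carries no meaning; only the structure of the evaluation is of interest.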
