## Speech recognition
### Training the network using a training set. Demonstrating the ability of the network to recognize words not in the training set and at the end, reporting the accuracy of the model.

### Preparing the data and importing functions from preproces.py

In [4]:
%load_ext autoreload
%autoreload 2

from preprocess import *
import keras
from keras.models import Sequential
from keras.layers import Dense, Dropout, Flatten, Conv2D, MaxPooling2D
from keras.utils import to_categorical

# Second dimension of the feature is dim2
feature_dim_2 = 11

# Save data to array file first
save_data_to_array(max_len=feature_dim_2)

# # Loading train set and test set
X_train, X_test, y_train, y_test = get_train_test()

# # Feature dimension
feature_dim_1 = 20
channel = 1
epochs = 50
batch_size = 100
verbose = 1
num_classes = 5

# Reshaping to perform 2D convolution
X_train = X_train.reshape(X_train.shape[0], feature_dim_1, feature_dim_2, channel)
X_test = X_test.reshape(X_test.shape[0], feature_dim_1, feature_dim_2, channel)

y_train_hot = to_categorical(y_train)
y_test_hot = to_categorical(y_test)

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


Saving vectors of label - 'bed': 100%|█████| 1713/1713 [00:37<00:00, 45.94it/s]
Saving vectors of label - 'cat': 100%|█████| 1733/1733 [00:25<00:00, 67.65it/s]
Saving vectors of label - 'dog': 100%|█████| 1746/1746 [00:31<00:00, 54.90it/s]
Saving vectors of label - 'happy': 100%|███| 1742/1742 [00:29<00:00, 58.45it/s]
Saving vectors of label - 'yes': 100%|█████| 2377/2377 [00:55<00:00, 43.05it/s]


In [5]:
def get_model():
    model = Sequential()
    model.add(Conv2D(32, kernel_size=(2, 2), activation='relu', input_shape=(feature_dim_1, feature_dim_2, channel)))
    model.add(Conv2D(48, kernel_size=(2, 2), activation='relu'))
    model.add(Conv2D(120, kernel_size=(2, 2), activation='relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Dropout(0.25))
    model.add(Flatten())
    model.add(Dense(128, activation='relu'))
    model.add(Dropout(0.25))
    model.add(Dense(64, activation='relu'))
    model.add(Dropout(0.4))
    model.add(Dense(num_classes, activation='softmax'))
    model.compile(loss=keras.losses.categorical_crossentropy,
                  optimizer=keras.optimizers.Adadelta(),
                  metrics=['accuracy'])
    return model

# Predicts one sample
def predict(filepath, model):
    sample = wav2mfcc(filepath)
    sample_reshaped = sample.reshape(1, feature_dim_1, feature_dim_2, channel)
    return get_labels()[0][
            np.argmax(model.predict(sample_reshaped))
    ]

# Building The Model Then Training it

In [6]:
model = get_model()
model.fit(X_train, y_train_hot, batch_size=batch_size, epochs=epochs, verbose=verbose, validation_data=(X_test, y_test_hot))

Train on 5586 samples, validate on 3725 samples
Epoch 1/50
Epoch 2/50
Epoch 3/50
Epoch 4/50
Epoch 5/50
Epoch 6/50
Epoch 7/50
Epoch 8/50
Epoch 9/50
Epoch 10/50
Epoch 11/50
Epoch 12/50
Epoch 13/50
Epoch 14/50
Epoch 15/50
Epoch 16/50
Epoch 17/50
Epoch 18/50
Epoch 19/50
Epoch 20/50
Epoch 21/50
Epoch 22/50
Epoch 23/50
Epoch 24/50
Epoch 25/50
Epoch 26/50
Epoch 27/50
Epoch 28/50
Epoch 29/50
Epoch 30/50
Epoch 31/50
Epoch 32/50
Epoch 33/50
Epoch 34/50
Epoch 35/50
Epoch 36/50
Epoch 37/50
Epoch 38/50
Epoch 39/50
Epoch 40/50
Epoch 41/50
Epoch 42/50
Epoch 43/50
Epoch 44/50
Epoch 45/50
Epoch 46/50
Epoch 47/50
Epoch 48/50
Epoch 49/50
Epoch 50/50


<keras.callbacks.History at 0x1bfe84562e8>

## Prediction: trying with the word "dog"

In [25]:
print(predict('/users/Victor/documents/Master/CSCI E-89 Deep Learning/Assignments/Assignment 12/DeadSimpleSpeechRecognizer-master/data/dog/0b09edd3_nohash_1.wav', model=model))

dog


## Second try with the word "dog"

In [27]:
print(predict('/users/Victor/documents/Master/CSCI E-89 Deep Learning/Assignments/Assignment 12/DeadSimpleSpeechRecognizer-master/data/dog/0ff728b5_nohash_0.wav', model=model))

dog


## Trying with the word "yes"

In [23]:
print(predict('/users/Victor/documents/Master/CSCI E-89 Deep Learning/Assignments/Assignment 12/DeadSimpleSpeechRecognizer-master/data/yes/01bb6a2a_nohash_2.wav', model=model))


yes


## Second try with the word "yes"

In [24]:
print(predict('/users/Victor/documents/Master/CSCI E-89 Deep Learning/Assignments/Assignment 12/DeadSimpleSpeechRecognizer-master/data/yes/0f7dc557_nohash_1.wav', model=model))

yes


## Checking with the word "down" not included in the training set:

In [29]:
print(predict('/users/Victor/documents/Master/CSCI E-89 Deep Learning/Assignments/Assignment 12/DeadSimpleSpeechRecognizer-master/data/down/98447c43_nohash_0.wav', model=model))

dog


We can see that the network identified the word as "dog". 

## Checking with the word "house" not included in the training set:

In [31]:
print(predict('/users/Victor/documents/Master/CSCI E-89 Deep Learning/Assignments/Assignment 12/DeadSimpleSpeechRecognizer-master/data/house/0d393936_nohash_0.wav', model=model))

down


We can see that the network identified the word as "down".