#3: Deep Learning Models
# Data Science - Capstone Project Submission

* Student Name: **James Toop**
* Student Pace: **Self Paced**
* Scheduled project review date/time: **29th October 2021 @ 21:30 BST**
* Instructor name: **Jeff Herman / James Irving**
* Blog URL: **https://toopster.github.io/**

---

## Table of Contents
1. [Business Case, Project Purpose and Approach](1_business_case.ipynb#business-case)
    1. [The importance of communication for people with severe learning disabilities](1_business_case.ipynb#communication-and-learning-disabilities)
    2. [Types of communication](1_business_case.ipynb#types-of-communication)
    3. [Communication techniques for people with learning disabilities](1_business_case.ipynb#communication-techniques)
    4. [Project purpose & approach](1_business_case.ipynb#project-purpose)
2. [Exploratory Data Analysis](2_eda.ipynb#eda)
    1. [The Datasets](2_eda.ipynb#the-datasets)
    2. [Discovery](2_eda.ipynb#data-discovery)
    3. [Preprocessing - Stage One](2_eda.ipynb#data-preprocessing-stage-one)
3. [Deep Learning Models for Speech Recognition](#deep-learning-models)
    1. [Preprocessing - Stage Two](#data-preprocessing-stage-two)
    2. [Simple Baseline Model](#simple-baseline-model)
    3. [Advanced Model using MFCCs](#model-2)
4. [Final Model Performance Evaluation](#final-model-performance-evaluation)

---
<a name="deep-learning-models"></a>
# 3. Deep Learning Models for Speech Recognition

In [54]:
# Import relevant libraries and modules for creating and training neural networks
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import wave
import soundfile as sf
import librosa, librosa.display
import IPython.display as ipd
import os
import json

import tensorflow as tf
from tensorflow.keras.layers.experimental import preprocessing
from tensorflow.keras import layers
from tensorflow.keras import models
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder

import logging
logging.getLogger("tensorflow").setLevel(logging.ERROR)

import pathlib
from pathlib import Path

In [2]:
# Set seed for reproducibility
seed = 123
tf.random.set_seed(seed)
np.random.seed(seed)

<a name="data-preprocessing-stage-two"></a>
### 3A. Preprocessing - Stage Two

In [17]:
# Function to extract MFCCs from each audio file and store them in a JSON file for use in the models
def preprocess_dataset(dataset_path, json_path, num_samples, num_mfcc=13, n_fft=2048, hop_length=512):

    # Dictionary to temporarily store mapping, labels, MFCCs and filenames
    data = {
        "mapping": [],
        "labels": [],
        "MFCCs": [],
        "files": []
    }

    # Loop through all sub directories
    for i, (dirpath, dirnames, filenames) in enumerate(os.walk(dataset_path)):

        # Ensure we're at sub-folder level
        if dirpath != dataset_path:

            # Save label in the mapping
            label = os.path.basename(dirpath)
            data["mapping"].append(label)
            print("\nProcessing: '{}'".format(label))

            # Process all audio files in sub directory and store MFCCs
            for f in filenames:
                file_path = os.path.join(dirpath, f)

                # Load audio file (librosa resamples to 22,050 Hz by default)
                signal, sample_rate = librosa.load(file_path)

                # Skip audio files with fewer than the required number of samples
                if len(signal) >= num_samples:

                    # Truncate the signal so every example has the same length
                    signal = signal[:num_samples]

                    # Extract MFCCs (recent librosa releases require keyword arguments here)
                    MFCCs = librosa.feature.mfcc(y=signal,
                                                 sr=sample_rate,
                                                 n_mfcc=num_mfcc,
                                                 n_fft=n_fft,
                                                 hop_length=hop_length)

                    # Append data in dictionary
                    data["MFCCs"].append(MFCCs.T.tolist())
                    data["labels"].append(i-1)
                    data["files"].append(file_path)
                    print("{}: {}".format(file_path, i-1))

    # Save data in JSON file for re-use later
    with open(json_path, "w") as fp:
        json.dump(data, fp, indent=4)

In [18]:
us_dataset_path = 'data/ultrasuite_test'
us_json_path = 'ultrasuite_data.json'
num_samples = 11025
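
With these settings each clip is trimmed to 11,025 samples, which is half a second of audio at librosa's default sample rate of 22,050 Hz. The following is a rough sketch of how that length translates into the shape of each stored MFCC matrix. It assumes librosa's default centred framing (one frame per hop, plus one), and the helper name `expected_mfcc_shape` is hypothetical rather than part of this notebook.

```python
def expected_mfcc_shape(num_samples, hop_length=512, num_mfcc=13):
    """Return (frames, coefficients) for a transposed MFCC matrix.

    Assumes librosa's default center=True framing, where the number of
    frames is 1 + num_samples // hop_length.
    """
    num_frames = 1 + num_samples // hop_length
    return num_frames, num_mfcc


default_sample_rate = 22050  # librosa.load default when sr is not specified
clip_seconds = 11025 / default_sample_rate

print(clip_seconds)                # 0.5
print(expected_mfcc_shape(11025))  # (22, 13)
```

Under those assumptions every example is a 22 x 13 matrix, which agrees with the array shapes printed in section 3B.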

In [19]:
preprocess_dataset(us_dataset_path, us_json_path, num_samples)


Processing: 'crab'
data/ultrasuite_test/crab/crab_upx-05M-BL1-005A.wav: 0
data/ultrasuite_test/crab/crab_uxtd-30F-039A.wav: 0
data/ultrasuite_test/crab/crab_uxtd-27M-037A.wav: 0
data/ultrasuite_test/crab/crab_upx-12M-BL1-005A.wav: 0
data/ultrasuite_test/crab/crab_uxssd-08M-BL1-024A.wav: 0
data/ultrasuite_test/crab/crab_uxssd-04M-BL2-004A.wav: 0
data/ultrasuite_test/crab/crab_uxssd-04M-Mid-004A.wav: 0
data/ultrasuite_test/crab/crab_uxssd-02M-Post-004A.wav: 0
data/ultrasuite_test/crab/crab_uxssd-08M-BL1-008A.wav: 0
data/ultrasuite_test/crab/crab_uxtd-12M-037A.wav: 0
data/ultrasuite_test/crab/crab_uxtd-08M-039A.wav: 0
data/ultrasuite_test/crab/crab_uxssd-04M-Post-005A.wav: 0
data/ultrasuite_test/crab/crab_uxssd-03F-BL1-006A.wav: 0
data/ultrasuite_test/crab/crab_uxssd-07F-Maint-005A.wav: 0
data/ultrasuite_test/crab/crab_uxssd-01M-Therapy_01-007A.wav: 0
data/ultrasuite_test/crab/crab_uxtd-05M-039A.wav: 0
data/ultrasuite_test/crab/crab_uxssd-05M-BL2-011A.wav: 0
data/ultrasuite_test/crab/cra

data/ultrasuite_test/boy/boy_uxssd-02M-BL1-090A.wav: 1
data/ultrasuite_test/boy/boy_uxtd-08M-021A.wav: 1
data/ultrasuite_test/boy/boy_uxtd-19M-018A.wav: 1
data/ultrasuite_test/boy/boy_upx-08M-BL1-015A.wav: 1
data/ultrasuite_test/boy/boy_uxssd-06M-BL1-008A.wav: 1
data/ultrasuite_test/boy/boy_uxtd-25M-046A.wav: 1
data/ultrasuite_test/boy/boy_uxtd-27M-020A.wav: 1
data/ultrasuite_test/boy/boy_uxtd-09F-045A.wav: 1
data/ultrasuite_test/boy/boy_uxssd-02M-BL1-066A.wav: 1
data/ultrasuite_test/boy/boy_uxtd-12M-020A.wav: 1
data/ultrasuite_test/boy/boy_upx-01F-BL2-044A.wav: 1
data/ultrasuite_test/boy/boy_uxtd-01M-027A.wav: 1
data/ultrasuite_test/boy/boy_uxssd-05M-Post_round2-016A.wav: 1
data/ultrasuite_test/boy/boy_uxtd-30F-022A.wav: 1
data/ultrasuite_test/boy/boy_upx-14M-BL2-040A.wav: 1
data/ultrasuite_test/boy/boy_uxssd-03F-Post-020A.wav: 1
data/ultrasuite_test/boy/boy_uxssd-04M-Maint2-015A.wav: 1
data/ultrasuite_test/boy/boy_upx-01F-BL1-016A.wav: 1
data/ultrasuite_test/boy/boy_uxtd-02M-028A.wav

data/ultrasuite_test/bridge/bridge_upx-07M-BL1-034A.wav: 2
data/ultrasuite_test/bridge/bridge_upx-15M-Mid-045C.wav: 2
data/ultrasuite_test/bridge/bridge_upx-19M-Post-034C.wav: 2
data/ultrasuite_test/bridge/bridge_upx-11M-Post-038A.wav: 2
data/ultrasuite_test/bridge/bridge_uxtd-38M-009A.wav: 2
data/ultrasuite_test/bridge/bridge_uxtd-14M-029A.wav: 2
data/ultrasuite_test/bridge/bridge_uxtd-42M-009A.wav: 2
data/ultrasuite_test/bridge/bridge_uxtd-12M-032A.wav: 2
data/ultrasuite_test/bridge/bridge_upx-07M-Post-024A.wav: 2
data/ultrasuite_test/bridge/bridge_upx-20M-BL3-040C.wav: 2
data/ultrasuite_test/bridge/bridge_uxtd-50F-009A.wav: 2
data/ultrasuite_test/bridge/bridge_uxssd-07F-Post-002A.wav: 2
data/ultrasuite_test/bridge/bridge_uxtd-44F-009A.wav: 2
data/ultrasuite_test/bridge/bridge_upx-06M-Maint-027A.wav: 2
data/ultrasuite_test/bridge/bridge_upx-17M-Post-043C.wav: 2
data/ultrasuite_test/bridge/bridge_uxtd-08M-031A.wav: 2
data/ultrasuite_test/bridge/bridge_uxtd-12M-029A.wav: 2
data/ultrasu

data/ultrasuite_test/book/book_upx-08M-Suit-014A.wav: 4
data/ultrasuite_test/book/book_uxtd-16F-046A.wav: 4
data/ultrasuite_test/book/book_uxssd-04M-Mid-013A.wav: 4
data/ultrasuite_test/book/book_uxssd-04M-BL2-013A.wav: 4
data/ultrasuite_test/book/book_uxtd-23F-046A.wav: 4
data/ultrasuite_test/book/book_uxssd-03F-BL1-027A.wav: 4
data/ultrasuite_test/book/book_uxssd-07F-BL2-014A.wav: 4
data/ultrasuite_test/book/book_uxssd-07F-Mid-014A.wav: 4
data/ultrasuite_test/book/book_uxssd-06M-Maint1-041A.wav: 4
data/ultrasuite_test/book/book_uxssd-05M-Mid_round2-014A.wav: 4
data/ultrasuite_test/book/book_upx-14M-BL1-015A.wav: 4
data/ultrasuite_test/book/book_upx-10M-BL1-015A.wav: 4
data/ultrasuite_test/book/book_uxtd-19M-043A.wav: 4
data/ultrasuite_test/book/book_uxssd-05M-BL2-020A.wav: 4
data/ultrasuite_test/book/book_uxssd-01M-BL1-013A.wav: 4
data/ultrasuite_test/book/book_upx-17M-Post-049A.wav: 4
data/ultrasuite_test/book/book_uxtd-24F-047A.wav: 4
data/ultrasuite_test/book/book_uxtd-22M-046A.wa

In [20]:
# Function to load the data from the JSON file depending on selected feature
def load_data(data_path, feature):

    with open(data_path, 'r') as file_path:
        data = json.load(file_path)

    X = np.array(data[feature])
    y = np.array(data['labels'])

    print('Datasets loaded...')
    
    return X, y

In [21]:
# Function to create training, test and validation datasets
def create_train_test(data_path, feature, test_size=0.2, val_size=0.2):

    # Load dataset
    X, y = load_data(data_path, feature)

    # Create train, test and validation splits
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=test_size)
    X_train, X_val, y_train, y_val = train_test_split(X_train, y_train, test_size=val_size)

    # Add a trailing channel axis to each split so the arrays have shape
    # (samples, frames, coefficients, 1), the layout Keras 2D convolutional layers expect
    X_train = X_train[..., np.newaxis]
    X_test = X_test[..., np.newaxis]
    X_val = X_val[..., np.newaxis]

    return X_train, y_train, X_val, y_val, X_test, y_test
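
As a small illustration of the final step in `create_train_test`, indexing with `[..., np.newaxis]` appends an axis of length 1 without copying or changing the data. The dummy array below is an assumption standing in for the real MFCC data.

```python
import numpy as np

# Dummy batch standing in for the MFCC array: 4 clips, 22 frames, 13 coefficients
X = np.zeros((4, 22, 13))

# Append a trailing axis of length 1 to act as the channel dimension
X_with_channel = X[..., np.newaxis]

print(X.shape)               # (4, 22, 13)
print(X_with_channel.shape)  # (4, 22, 13, 1)
```

Each example then looks like a single-channel image, which is the input layout that image-style convolutional layers work with.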

In [22]:
# Function for visualising results
def visualise_results(results):
    history = results.history

    plt.figure(figsize=(20,8))

    plt.subplot(1, 2, 1)
    plt.plot(history['val_loss'])
    plt.plot(history['loss'])
    plt.legend(['Validation Loss', 'Training Loss'], fontsize=12)
    plt.title('Loss', fontsize=18)
    plt.xlabel('Epochs', fontsize=14)
    plt.ylabel('Loss', fontsize=14)
    plt.xticks(fontsize=12)
    plt.yticks(fontsize=12)

    plt.subplot(1, 2, 2)
    # The history keys follow the metric name passed to compile(), i.e. 'accuracy'
    plt.plot(history['val_accuracy'])
    plt.plot(history['accuracy'])
    plt.legend(['Validation Accuracy', 'Training Accuracy'], fontsize=12)
    plt.title('Accuracy', fontsize=18)
    plt.xlabel('Epochs', fontsize=14)
    plt.ylabel('Accuracy', fontsize=14)
    plt.xticks(fontsize=12)
    plt.yticks(fontsize=12)
    plt.show()

In [25]:
us_data_path = 'ultrasuite_data.json'
X_train, y_train, X_val, y_val, X_test, y_test = create_train_test(us_data_path, 'MFCCs')

Datasets loaded...


<a name="simple-baseline-model"></a>
### 3B. Simple Baseline Model

In [48]:
# Explore the shapes of the training, test and validation splits
m_train = X_train.shape[0]
num_frames = X_train.shape[1]
m_test = X_test.shape[0]
m_val = X_val.shape[0]

print ("Number of training samples: " + str(m_train))
print ("Number of testing samples: " + str(m_test))
print ("Number of validation samples: " + str(m_val))
print ("X_train shape: " + str(X_train.shape))
print ("y_train shape: " + str(y_train.shape))
print ("X_test shape: " + str(X_test.shape))
print ("y_test shape: " + str(y_test.shape))
print ("X_val shape: " + str(X_val.shape))
print ("y_val shape: " + str(y_val.shape))

Number of training samples: 316
Number of testing samples: 99
Number of validation samples: 80
X_train shape: (316, 22, 13, 1)
y_train shape: (316,)
X_test shape: (99, 22, 13, 1)
y_test shape: (99,)
X_val shape: (80, 22, 13, 1)
y_val shape: (80,)
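
The split sizes above follow from applying the two 20% fractions in sequence: the validation set is 20% of what remains after the test set is removed, so roughly 16% of the whole dataset. A minimal sketch of that arithmetic, assuming scikit-learn rounds a fractional `test_size` up to a whole number of samples (the helper `split_sizes` is hypothetical):

```python
import math


def split_sizes(n_samples, test_size=0.2, val_size=0.2):
    # A fractional test_size is rounded up to a whole number of samples
    n_test = math.ceil(test_size * n_samples)
    remaining = n_samples - n_test
    # The validation fraction applies to what is left, not to the full dataset
    n_val = math.ceil(val_size * remaining)
    n_train = remaining - n_val
    return n_train, n_val, n_test


total = 316 + 80 + 99
print(total)               # 495
print(split_sizes(total))  # (316, 80, 99)
```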


In [51]:
y_test[:10]

array([1, 4, 0, 0, 2, 0, 2, 1, 0, 1])

In [57]:
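# Fit the encoder once on the training labels so every split shares the same class mapping
label_encoder = LabelEncoder().fit(y_train)

def reformat_y(y):
    y = label_encoder.transform(y)
    y = tf.keras.utils.to_categorical(y, num_classes=len(label_encoder.classes_))
    return y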

In [58]:
train_y = reformat_y(y_train)
test_y = reformat_y(y_test)
val_y = reformat_y(y_val)

In [59]:
test_y.shape

(99, 5)
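
To make the one-hot format concrete, here is a NumPy-only sketch of the same idea applied to the first ten test labels shown above. The `one_hot` helper is hypothetical and stands in for `to_categorical`; it assumes the labels are already integers in the range 0 to 4.

```python
import numpy as np


def one_hot(labels, num_classes):
    # Row i of the identity matrix is the one-hot vector for class i
    return np.eye(num_classes, dtype="float32")[labels]


y_sample = np.array([1, 4, 0, 0, 2, 0, 2, 1, 0, 1])
encoded = one_hot(y_sample, num_classes=5)

print(encoded.shape)  # (10, 5)
print(encoded[1])     # [0. 0. 0. 0. 1.]
```

Each row contains a single 1 in the column of its class, which is the target format `categorical_crossentropy` expects.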

In [60]:
baseline_model = tf.keras.models.Sequential()
baseline_model.add(tf.keras.layers.InputLayer(input_shape=(X_train[0].shape)))
baseline_model.add(tf.keras.layers.Flatten())
baseline_model.add(tf.keras.layers.BatchNormalization())
baseline_model.add(tf.keras.layers.Dense(5, activation='softmax'))
baseline_model.compile(loss='categorical_crossentropy',
                       optimizer='adam',
                       metrics=['accuracy'])
baseline_model.fit(X_train, 
                   train_y, 
                   batch_size=128, 
                   epochs=30,
                   validation_data=(X_val, val_y), 
                   callbacks=[tf.keras.callbacks.EarlyStopping(patience=3)])

Epoch 1/30
Please report this to the TensorFlow team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output.
Cause: 'arguments' object has no attribute 'posonlyargs'
Epoch 2/30
Epoch 3/30
Epoch 4/30
Epoch 5/30
Epoch 6/30
Epoch 7/30
Epoch 8/30
Epoch 9/30
Epoch 10/30
Epoch 11/30
Epoch 12/30
Epoch 13/30
Epoch 14/30
Epoch 15/30
Epoch 16/30
Epoch 17/30
Epoch 18/30
Epoch 19/30
Epoch 20/30
Epoch 21/30
Epoch 22/30
Epoch 23/30
Epoch 24/30
Epoch 25/30
Epoch 26/30
Epoch 27/30
Epoch 28/30
Epoch 29/30
Epoch 30/30


<keras.callbacks.History at 0x7ff46ce92d10>
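
With its default settings, `EarlyStopping(patience=3)` monitors the validation loss and halts training once it has failed to improve for three consecutive epochs. The log above lists all 30 epochs, which suggests the callback did not trigger in this run. The plain-Python sketch below illustrates the patience rule only. It is a simplification and not the Keras implementation, and the loss values are hypothetical.

```python
def early_stopping_epoch(val_losses, patience=3):
    """Return the zero-based epoch at which training would stop, or None."""
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            # Improvement: remember the new best value and reset the counter
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None


# Hypothetical validation losses, not taken from the run above
print(early_stopping_epoch([1.0, 0.9, 0.95, 0.92, 0.91, 0.93]))  # 4
print(early_stopping_epoch([1.0, 0.9, 0.8, 0.7]))                # None
```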

In [61]:
baseline_model.summary()

Model: "sequential_10"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
flatten_8 (Flatten)          (None, 286)               0         
_________________________________________________________________
batch_normalization_8 (Batch (None, 286)               1144      
_________________________________________________________________
dense_8 (Dense)              (None, 5)                 1435      
=================================================================
Total params: 2,579
Trainable params: 2,007
Non-trainable params: 572
_________________________________________________________________
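
The parameter counts in the summary can be checked by hand. The sketch below reproduces them, assuming the standard layer definitions: batch normalisation keeps four values per feature (a trainable scale and offset, plus a non-trainable moving mean and variance), and a dense layer has one weight per input-output pair plus one bias per output. The function name `baseline_param_counts` is hypothetical.

```python
def baseline_param_counts(input_shape=(22, 13, 1), num_classes=5):
    # Flatten: 22 * 13 * 1 = 286 features, no parameters of its own
    flat = 1
    for dim in input_shape:
        flat *= dim

    bn_trainable = 2 * flat      # gamma and beta
    bn_non_trainable = 2 * flat  # moving mean and moving variance
    dense = flat * num_classes + num_classes  # weights plus biases

    return {
        "flatten_features": flat,
        "batch_norm": bn_trainable + bn_non_trainable,
        "dense": dense,
        "total": bn_trainable + bn_non_trainable + dense,
        "trainable": bn_trainable + dense,
        "non_trainable": bn_non_trainable,
    }


counts = baseline_param_counts()
print(counts["batch_norm"], counts["dense"])      # 1144 1435
print(counts["total"], counts["trainable"], counts["non_trainable"])  # 2579 2007 572
```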


In [None]:
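# Plot the training history recorded by the most recent call to baseline_model.fit()
visualise_results(baseline_model.history)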