# Question A4

In this section, we will understand the utility of such a neural network in real world scenarios.

#### Please use the real record data named ‘record.wav’  as a test sample. Preprocess the data using the provided preprocessing script (data_preprocess.ipynb) and prepare the dataset.
Do a model prediction on the sample test dataset and obtain the predicted label using a threshold of 0.5. The model used is the optimized pretrained model using the selected optimal batch size and optimal number of neurons.
Find the most important features on the model prediction for the test sample using SHAP. Plot the local feature importance with a force plot and explain your observations.  (Refer to the documentation and these three useful references:
https://christophm.github.io/interpretable-ml-book/shap.html#examples-5,
https://towardsdatascience.com/deep-learning-model-interpretation-using-shap-a21786e91d16,  
https://medium.com/mlearning-ai/shap-force-plots-for-classification-d30be430e195)



1. Firstly, we import relevant libraries.

In [1]:
import tqdm
import time
import random
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

import torch
from torch import nn
from torch.utils.data import Dataset
from torch.utils.data import DataLoader

from scipy.io import wavfile as wav

from sklearn import preprocessing
from sklearn.model_selection import KFold
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score, precision_score, recall_score, confusion_matrix
from common_utils import set_seed

# setting seed
set_seed()

To reduce repeated code, place your
network (MLP defined in QA1)
torch datasets (CustomDataset defined in QA1)
loss function (loss_fn defined in QA1)
in a separate file called common_utils.py

Import them into this file. You will not be repenalised for any error in QA1 here as the code in QA1 will not be remarked.

The following code cell will not be marked.


In [2]:
# YOUR CODE HERE
import librosa
import soundfile as sf

import numpy as np
import pandas as pd

import os
from os import listdir
from os.path import isfile, join

from collections import OrderedDict

import json

def extract_features(filepath):
    
    '''
    Source: https://github.com/danz1ka19/Music-Emotion-Recognition/blob/master/Feature-Extraction.py
    Modified to process a single file

        function: extract_features
        input: path to mp3 files
        output: csv file containing features extracted

        This function reads the content in a directory and for each audio file detected
        reads the file and extracts relevant features using librosa library for audio
        signal processing
    '''

    feature_set = {}  # Features

    # Reading audio file
    y, sr = librosa.load(filepath)
    S = np.abs(librosa.stft(y, n_fft=512)) 
    # https://librosa.org/doc/main/generated/librosa.stft.html (set 512 for speech processing)

    # Extracting Features
    tempo, beats = librosa.beat.beat_track(y=y, sr=sr)
    chroma_stft = librosa.feature.chroma_stft(y=y, sr=sr, n_fft=512)
    
    chroma_cq = librosa.feature.chroma_cqt(y=y, sr=sr)
    
    chroma_cens = librosa.feature.chroma_cens(y=y, sr=sr)
    melspectrogram = librosa.feature.melspectrogram(y=y, sr=sr, n_fft=512)
    rmse = librosa.feature.rms(y=y)[0]
    cent = librosa.feature.spectral_centroid(y=y, sr=sr, n_fft=512)
    spec_bw = librosa.feature.spectral_bandwidth(y=y, sr=sr, n_fft=512)
    contrast = librosa.feature.spectral_contrast(S=S, sr=sr, n_fft=512)
    rolloff = librosa.feature.spectral_rolloff(y=y, sr=sr, n_fft=512)
    poly_features = librosa.feature.poly_features(S=S, sr=sr, n_fft=512)
    
    tonnetz = librosa.feature.tonnetz(y=y, sr=sr)
    
    zcr = librosa.feature.zero_crossing_rate(y)
    harmonic = librosa.effects.harmonic(y)
    percussive = librosa.effects.percussive(y)

    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_fft=512)
    mfcc_delta = librosa.feature.delta(mfcc)

    onset_frames = librosa.onset.onset_detect(y=y, sr=sr)
    frames_to_time = librosa.frames_to_time(onset_frames[:20], sr=sr)

    # Concatenating Features into one csv and json format
    feature_set['filename'] = filepath  # song name
    feature_set['tempo'] = tempo  # tempo 
    feature_set['total_beats'] = sum(beats)  # beats
    feature_set['average_beats'] = np.average(beats)
    feature_set['chroma_stft_mean'] = np.mean(chroma_stft)  # chroma stft
    feature_set['chroma_stft_var'] = np.var(chroma_stft)
    
    feature_set['chroma_cq_mean'] = np.mean(chroma_cq)  # chroma cq
    feature_set['chroma_cq_var'] = np.var(chroma_cq)
    
    feature_set['chroma_cens_mean'] = np.mean(chroma_cens)  # chroma cens
    feature_set['chroma_cens_var'] = np.var(chroma_cens)
    feature_set['melspectrogram_mean'] = np.mean(melspectrogram)  # melspectrogram
    feature_set['melspectrogram_var'] = np.var(melspectrogram)
    feature_set['mfcc_mean'] = np.mean(mfcc)  # mfcc
    feature_set['mfcc_var'] = np.var(mfcc)
    feature_set['mfcc_delta_mean'] = np.mean(mfcc_delta)  # mfcc delta
    feature_set['mfcc_delta_var'] = np.var(mfcc_delta)
    feature_set['rmse_mean'] = np.mean(rmse)  # rmse
    feature_set['rmse_var'] = np.var(rmse)
    feature_set['cent_mean'] = np.mean(cent)  # cent
    feature_set['cent_var'] = np.var(cent)
    feature_set['spec_bw_mean'] = np.mean(spec_bw)  # spectral bandwidth
    feature_set['spec_bw_var'] = np.var(spec_bw)
    feature_set['contrast_mean'] = np.mean(contrast)  # contrast
    feature_set['contrast_var'] = np.var(contrast)
    feature_set['rolloff_mean'] = np.mean(rolloff)  # rolloff
    feature_set['rolloff_var'] = np.mean(rolloff)
    feature_set['poly_mean'] = np.mean(poly_features)  # poly features
    feature_set['poly_var'] = np.mean(poly_features)
    
    feature_set['tonnetz_mean'] = np.mean(tonnetz)  # tonnetz
    feature_set['tonnetz_var'] = np.var(tonnetz)
    
    feature_set['zcr_mean'] = np.mean(zcr)  # zero crossing rate
    feature_set['zcr_var'] = np.var(zcr)
    feature_set['harm_mean'] = np.mean(harmonic)  # harmonic
    feature_set['harm_var'] = np.var(harmonic)
    feature_set['perc_mean'] = np.mean(percussive)  # percussive
    feature_set['perc_var'] = np.var(percussive)
    feature_set['frame_mean'] = np.mean(frames_to_time)  # frames
    feature_set['frame_var'] = np.var(frames_to_time)
    
    for ix, coeff in enumerate(mfcc):
        feature_set['mfcc' + str(ix) + '_mean'] = coeff.mean()
        feature_set['mfcc' + str(ix) + '_var'] = coeff.var()
    
    return feature_set


2. Install and import shap

In [3]:
# YOUR CODE HERE
import shap

Using `tqdm.autonotebook.tqdm` in notebook mode. Use `tqdm.tqdm` instead to force console mode (e.g. in jupyter console)


3. Read the csv data preprocessed from 'record.wav', using variable name 'df', and fill the size of 'df' in 'size_row' and 'size_column'.

In [4]:
new_features_dict = extract_features('record.wav')
df = pd.DataFrame([new_features_dict])
df.to_csv('./new_record.csv', index=False)

df = pd.read_csv('new_record.csv')
size_row, size_column = df.shape

 4.  Preprocess to obtain the test data, save the test data as numpy array.

In [7]:
from common_utils import preprocess_dataset, split_dataset

class CustomDataset(Dataset):
    def __init__(self, X, y):
        self.X = torch.tensor(X, dtype=torch.float)
        self.y = torch.tensor(y, dtype=torch.long)

    def __len__(self):
        return len(self.y)

    def __getitem__(self, idx):
        return self.X[idx], self.y[idx]

def preprocess(X_train, df):
    """preprocess your dataset to obtain your test dataset, remember to remove the 'filename' as Q1
    """
    df = df.drop(columns=['filename'])
    X_train_scaled, X_test_scaled_eg = preprocess_dataset(X_train, df)
    return X_test_scaled_eg

# for training the model
simplified = pd.read_csv('simplified.csv')
simplified['label'] = simplified['filename'].str.split('_').str[-2]
simplified['label'].value_counts()

X_train_simplified, y_train_simplified, X_test_simplified, y_test_simplified = split_dataset(simplified, 'filename', 0.30, 1)
X_train_simplified = X_train_simplified.drop(columns=['label'])
X_test_simplified = X_test_simplified.drop(columns=['label'])
X_train_scaled_simplified, X_test_scaled_simplified = preprocess_dataset(X_train_simplified, X_test_simplified)
# for training the model

X_test_scaled_eg = preprocess(X_train_scaled_simplified, df)

def train(model, X_train_scaled, y_train2, X_val_scaled, y_val2, batch_size):
    
    epochs = 100
    times = []
    train_dataloader, test_dataloader = intialise_loaders(X_train_scaled, y_train2, X_val_scaled, y_val2)
    
    optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
    loss_fn = nn.CrossEntropyLoss()

    tr_loss, tr_correct = [], []
    te_loss, te_correct = [], []
    for t in range(epochs):
        train_loss, train_correct = train_loop(train_dataloader, model, loss_fn, optimizer)
        test_loss, test_correct = test_loop(test_dataloader, model, loss_fn)

        tr_loss.append(train_loss), tr_correct.append(train_correct)
        te_loss.append(test_loss), te_correct.append(test_correct)
        times.append(t+1)
        
        print(f"Epoch {t+1}: Train_accuracy: {(100*train_correct):>0.2f}%, Train_loss: {train_loss:>8f}, Test_accuracy: {(100*test_correct):>0.2f}%, Test_loss: {test_loss:>8f}")
        
    train_accuracies = tr_correct
    train_losses = tr_loss
    test_accuracies = te_correct
    test_losses = te_loss

    return train_accuracies, train_losses, test_accuracies, test_losses, times

def intialise_loaders(X_train_scaled, y_train, X_test_scaled, y_test):

    train_data = CustomDataset(X_train_scaled,y_train)
    test_data = CustomDataset(X_test_scaled,y_test)
    
    train_dataloader = DataLoader(train_data, batch_size=1024, shuffle=True)
    test_dataloader = DataLoader(test_data, batch_size=1024, shuffle=True)
    
    return train_dataloader, test_dataloader

def train_loop(dataloader, model, loss_fn, optimizer):
    size = len(dataloader.dataset)
    num_batches = len(dataloader)
    train_loss, train_correct = 0, 0
    for batch, (X, y) in enumerate(dataloader):
        # Compute prediction and loss
        pred = model(X)
        loss = loss_fn(pred, y)

        # Backpropagation
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        
        train_loss += loss.item()
        train_correct += (pred.argmax(1) == y).type(torch.float).sum().item()
    
    train_loss /= num_batches
    train_correct /=size

    return train_loss, train_correct

def test_loop(dataloader, model, loss_fn):
    size = len(dataloader.dataset)
    num_batches = len(dataloader)
    test_loss, test_correct = 0, 0

    with torch.no_grad():
        for X, y in dataloader:
            pred = model(X)
            test_loss += loss_fn(pred, y).item()
            test_correct += (pred.argmax(1) == y).type(torch.float).sum().item()

    test_loss /= num_batches
    test_correct /= size
    
    return test_loss, test_correct

class FirstHiddenLayerMLP(nn.Module):

    def __init__(self, no_features, no_hidden_first_layer, no_labels):
        super().__init__()
        self.mlp_stack = nn.Sequential(
            nn.Linear(no_features, no_hidden_first_layer),
            nn.ReLU(),
            nn.Dropout(p=0.2),
            nn.Linear(no_hidden_first_layer, 128),
            nn.ReLU(),
            nn.Dropout(p=0.2),
            nn.Linear(128, 128),
            nn.ReLU(),
            nn.Dropout(p=0.2),
            nn.Linear(128, no_labels),
            nn.Sigmoid()
        )

    def forward(self, x):
         logits = self.mlp_stack(x)
         return logits

model = FirstHiddenLayerMLP(77,256,2)
train_accuracies, train_losses, test_accuracies, test_losses, times = train(model, X_train_scaled_simplified, y_train_simplified, X_test_scaled_simplified, y_test_simplified, 1024)

X has feature names, but StandardScaler was fitted without feature names


Epoch 1: Train_accuracy: 51.55%, Train_loss: 0.690996, Test_accuracy: 52.65%, Test_loss: 0.689275
Epoch 2: Train_accuracy: 55.48%, Train_loss: 0.684479, Test_accuracy: 55.61%, Test_loss: 0.683961
Epoch 3: Train_accuracy: 58.12%, Train_loss: 0.677480, Test_accuracy: 57.74%, Test_loss: 0.675777
Epoch 4: Train_accuracy: 58.75%, Train_loss: 0.669288, Test_accuracy: 58.90%, Test_loss: 0.670894
Epoch 5: Train_accuracy: 60.32%, Train_loss: 0.662362, Test_accuracy: 59.54%, Test_loss: 0.668089
Epoch 6: Train_accuracy: 61.90%, Train_loss: 0.652254, Test_accuracy: 60.31%, Test_loss: 0.665861
Epoch 7: Train_accuracy: 62.97%, Train_loss: 0.650646, Test_accuracy: 60.12%, Test_loss: 0.666380
Epoch 8: Train_accuracy: 64.12%, Train_loss: 0.637956, Test_accuracy: 62.02%, Test_loss: 0.653357
Epoch 9: Train_accuracy: 64.75%, Train_loss: 0.633035, Test_accuracy: 60.78%, Test_loss: 0.662860
Epoch 10: Train_accuracy: 64.89%, Train_loss: 0.630436, Test_accuracy: 61.53%, Test_loss: 0.658837
Epoch 11: Train_acc

5. Do a model prediction on the sample test dataset and obtain the predicted label using a threshold of 0.5. The model used is the optimized pretrained model using the selected optimal batch size and optimal number of neurons. Note: Please define the variable of your final predicted label as 'pred_label'.

In [9]:
X_test_scaled_eg = torch.tensor(X_test_scaled_eg, dtype=torch.float32)
model.eval()
with torch.no_grad():
    predictions = model(X_test_scaled_eg)
threshold = 0.5
predicted_labels = (predictions > threshold).type(torch.long)

predicted_label_array = predicted_labels.numpy()
pred_label = np.argmax(predicted_label_array)
print("pred_label is " + str(pred_label))

# The one hot vector has the value of 1 on the zeroth column.
# Zeroth column is for negative class and first column is for positive class.
# So record.wav has been classified as having negative sentiment.

pred_label is 0


To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).


6. Find the most important features on the model prediction for your test sample using SHAP. Create an instance of the DeepSHAP which is called DeepExplainer using traianing dataset: https://shap-lrjball.readthedocs.io/en/latest/generated/shap.DeepExplainer.html.

Plot the local feature importance with a force plot and explain your observations.  (Refer to the documentation and these three useful references:
https://christophm.github.io/interpretable-ml-book/shap.html#examples-5,
https://towardsdatascience.com/deep-learning-model-interpretation-using-shap-a21786e91d16,  
https://medium.com/mlearning-ai/shap-force-plots-for-classification-d30be430e195)


In [13]:
'''
Fit the explainer on a subset of the data (you can try all but then gets slower)
Return approximate SHAP values for the model applied to the data given by X.
Plot the local feature importance with a force plot and explain your observations.
'''

'\nFit the explainer on a subset of the data (you can try all but then gets slower)\nReturn approximate SHAP values for the model applied to the data given by X.\nPlot the local feature importance with a force plot and explain your observations.\n'