# Classifying Urban Sounds Using Deep Learning

## 2. Model Training and Evaluation

In [171]:
# retrieve the preprocessed data from previous notebook
%store -r x_train 
%store -r x_test 
%store -r y_train 
%store -r y_test 
%store -r yy 
%store -r le

### Initial model architecture - MLP
We will start by constructing a Multilayer Perceptron (MLP) neural network using Keras with a TensorFlow backend.

We use a sequential model so that we can build the network layer by layer.

We will begin with a simple model architecture consisting of three layers: an input layer, a hidden layer and an output layer. All three are dense (fully connected) layers, a standard layer type in which every node is connected to every node of the previous layer.

The first layer will receive the input shape. Each sample contains 40 MFCCs (Mel Frequency Cepstral Coefficients), one per column, so each sample has a shape of (1x40) and we start with an input shape of 40.

The first two layers will have 256 nodes each. The activation function for these two layers is ReLU (Rectified Linear Unit), which outputs max(0, x) and is a widely used default for the hidden layers of neural networks.

We will also apply a Dropout value of 20% on our first two layers, matching the Dropout(0.2) calls in the code. During training this randomly sets a fraction of each layer's outputs to zero on every update, which tends to produce a network that generalises better and is less likely to overfit the training data.
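To make the effect of dropout concrete, here is a minimal pure-Python sketch of inverted dropout, the scheme Keras applies at training time. The function name `dropout` and the sample activations are illustrative assumptions and are not part of the notebook; the rate of 0.2 is the value passed to Dropout in the model code.

```python
import random

def dropout(activations, rate, rng):
    """Inverted dropout: zero each value with probability `rate` and
    scale the survivors by 1 / (1 - rate) so the expected sum is unchanged."""
    keep_scale = 1.0 / (1.0 - rate)
    return [0.0 if rng.random() < rate else a * keep_scale for a in activations]

rng = random.Random(0)       # fixed seed so the sketch is repeatable
layer_output = [1.0] * 10    # hypothetical activations from a hidden layer
dropped = dropout(layer_output, rate=0.2, rng=rng)
print(dropped)               # every entry is either 0.0 or 1.25
```

Dropout is only active during training. When the model is evaluated or used for prediction, Keras passes the activations through unchanged.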

Our output layer will have 10 nodes (num_labels), which matches the number of possible classifications. The activation for our output layer is softmax. Softmax makes the outputs sum to 1 so that they can be interpreted as probabilities. The model then predicts the class with the highest probability.
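The softmax step and the final "pick the highest probability" step can be sketched in a few lines of plain Python. The helper `softmax` and the three example scores are assumptions made for illustration; the real model produces ten scores, one per class.

```python
import math

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    largest = max(logits)
    exps = [math.exp(z - largest) for z in logits]  # subtract the max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]                    # hypothetical raw scores for three classes
probs = softmax(logits)
predicted_index = probs.index(max(probs))   # index of the most probable class
print(probs, predicted_index)
```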

In [223]:
import numpy as np
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation

# Number of output classes, taken from the one-hot encoded labels
num_labels = yy.shape[1]

# Construct model 
model = Sequential()

model.add(Dense(256, input_shape=(40,)))
model.add(Activation('relu'))
model.add(Dropout(0.2))

model.add(Dense(256))
model.add(Activation('relu'))
model.add(Dropout(0.2))

model.add(Dense(num_labels))
model.add(Activation('softmax'))

## Compiling the model
For compiling our model, we will use the following three parameters:

Loss function - we will use categorical_crossentropy, a standard choice for multi-class classification with one-hot encoded labels. A lower score indicates that the model is performing better.

Metrics - we will use the accuracy metric which will allow us to view the accuracy score on the validation data when we train the model.

Optimizer - here we will use adam which is a generally good optimizer for many use cases.
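As a rough illustration of why a lower categorical cross-entropy means a better model, the sketch below computes the loss for a single sample by hand. The function, the one-hot label and both prediction vectors are hypothetical, and the small `eps` clip is an assumption added here to avoid taking log(0). Keras reports the average of this quantity over the samples it is evaluated on.

```python
import math

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Cross-entropy between a one-hot label and predicted probabilities."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(y_true, y_pred))

y_true = [0, 1, 0]                # hypothetical one-hot label: class 1 is correct
confident = [0.05, 0.90, 0.05]    # most of the probability is on the correct class
unsure = [0.30, 0.40, 0.30]       # correct class only narrowly ahead

loss_confident = categorical_crossentropy(y_true, confident)
loss_unsure = categorical_crossentropy(y_true, unsure)
print(loss_confident, loss_unsure)   # the confident prediction has the lower loss
```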

In [224]:
# Compile the model
model.compile(loss='categorical_crossentropy', metrics=['accuracy'], optimizer='adam')

In [225]:
# Display model architecture summary 
model.summary()

# Calculate pre-training accuracy 
score = model.evaluate(x_test, y_test, verbose=0)
accuracy = 100*score[1]

print("Pre-training accuracy: %.4f%%" % accuracy)

Model: "sequential_5"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense_13 (Dense)             (None, 256)               10496     
_________________________________________________________________
activation_13 (Activation)   (None, 256)               0         
_________________________________________________________________
dropout_9 (Dropout)          (None, 256)               0         
_________________________________________________________________
dense_14 (Dense)             (None, 256)               65792     
_________________________________________________________________
activation_14 (Activation)   (None, 256)               0         
_________________________________________________________________
dropout_10 (Dropout)         (None, 256)               0         
_________________________________________________________________
dense_15 (Dense)             (None, 10)               
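
The parameter counts in the summary can be checked by hand: a dense layer has one weight per input-output pair plus one bias per node. The helper `dense_params` below is a hypothetical name used only for this check. The summary output above is cut off before the count for the final layer; the last line simply applies the same formula under the assumption that num_labels is 10.

```python
def dense_params(n_inputs, n_units):
    """Weights (n_inputs * n_units) plus one bias per unit."""
    return n_inputs * n_units + n_units

first_hidden = dense_params(40, 256)     # 40 MFCC inputs -> 256 nodes
second_hidden = dense_params(256, 256)   # 256 nodes -> 256 nodes
output_layer = dense_params(256, 10)     # 256 nodes -> 10 classes (assumed num_labels)
print(first_hidden, second_hidden, output_layer)
```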

### Training
Here we will train the model.

We will start with 100 epochs, where an epoch is one full pass of the model through the training data. The model typically improves with each epoch until the validation loss stops decreasing.

We will also start with a low batch size of 32, as a large batch size can reduce the generalisation ability of the model.
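To get a feel for what these two settings mean in practice, the following sketch works out how many weight updates the training run performs. The sample count comes from the "Train on 3804 samples" line of the training log; the variable names are chosen here for illustration.

```python
import math

train_samples = 3804   # from the training log
batch_size = 32
epochs = 100

steps_per_epoch = math.ceil(train_samples / batch_size)       # the last batch is smaller
last_batch = train_samples - (steps_per_epoch - 1) * batch_size
total_updates = steps_per_epoch * epochs
print(steps_per_epoch, last_batch, total_updates)
```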

In [226]:
from keras.callbacks import ModelCheckpoint 
from datetime import datetime 

num_epochs = 100
num_batch_size = 32

checkpointer = ModelCheckpoint(filepath='../Capstone_Project_2/weights.best.basic_mlp.hdf5', 
                               verbose=1, save_best_only=True)
start = datetime.now()

model.fit(x_train, y_train, batch_size=num_batch_size, epochs=num_epochs, validation_data=(x_test, y_test), callbacks=[checkpointer], verbose=1)


duration = datetime.now() - start
print("Training completed in time: ", duration)

Train on 3804 samples, validate on 1631 samples
Epoch 1/100

Epoch 00001: val_loss improved from inf to 1.88409, saving model to ../Capstone_Project_2/weights.best.basic_mlp.hdf5
Epoch 2/100

Epoch 00002: val_loss improved from 1.88409 to 1.63949, saving model to ../Capstone_Project_2/weights.best.basic_mlp.hdf5
Epoch 3/100

Epoch 00003: val_loss improved from 1.63949 to 1.40477, saving model to ../Capstone_Project_2/weights.best.basic_mlp.hdf5
Epoch 4/100

Epoch 00004: val_loss improved from 1.40477 to 1.22706, saving model to ../Capstone_Project_2/weights.best.basic_mlp.hdf5
Epoch 5/100

Epoch 00005: val_loss improved from 1.22706 to 1.12922, saving model to ../Capstone_Project_2/weights.best.basic_mlp.hdf5
Epoch 6/100

Epoch 00006: val_loss improved from 1.12922 to 0.99106, saving model to ../Capstone_Project_2/weights.best.basic_mlp.hdf5
Epoch 7/100

Epoch 00007: val_loss improved from 0.99106 to 0.90834, saving model to ../Capstone_Project_2/weights.best.basic_mlp.hdf5
Epoch 8/100
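
The "val_loss improved ... saving model" lines come from the ModelCheckpoint callback with save_best_only=True, which monitors val_loss by default. Its decision rule can be approximated by the simplified sketch below. The function name is hypothetical, and only the first four loss values are taken from the log; the last two are made up to show an epoch that does not trigger a save.

```python
def epochs_that_save(val_losses):
    """Return the 1-based epochs at which a best-only checkpoint would be written."""
    best = float("inf")
    saved = []
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:      # only a strict improvement triggers a save
            best = loss
            saved.append(epoch)
    return saved

# First four values are from the log above; the last two are hypothetical
val_losses = [1.88409, 1.63949, 1.40477, 1.22706, 0.95, 0.97]
print(epochs_that_save(val_losses))
```

Note that the file on disk holds the best weights, while the in-memory model after fit holds the weights from the final epoch. To evaluate the best checkpoint, the saved file would first have to be loaded with model.load_weights.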

## Test the model
Here we will review the accuracy of the model on both the training and test data sets.

In [227]:
# Evaluating the model on the training and testing set
score = model.evaluate(x_train, y_train, verbose=0)
print("Training Accuracy: ", score[1])

score = model.evaluate(x_test, y_test, verbose=0)
print("Testing Accuracy: ", score[1])

Training Accuracy:  0.9978969693183899
Testing Accuracy:  0.9270386099815369
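
These accuracies can be translated back into sample counts, which makes the difference between the two sets easier to judge. The sketch below assumes the split sizes printed in the training log (3804 training and 1631 test samples); the variable names are illustrative.

```python
train_samples, test_samples = 3804, 1631     # from the training log
train_acc, test_acc = 0.9978969693183899, 0.9270386099815369

train_errors = round((1 - train_acc) * train_samples)   # misclassified training samples
test_errors = round((1 - test_acc) * test_samples)      # misclassified test samples
gap = train_acc - test_acc                               # train/test accuracy gap
print(train_errors, test_errors, round(gap, 4))
```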


In [228]:

import librosa 
import numpy as np 

def extract_feature(file_name):
    """Return a (1, 40) array of time-averaged MFCCs, or None if the file cannot be parsed."""
    try:
        audio_data, sample_rate = librosa.load(file_name, res_type='kaiser_fast') 
        mfccs = librosa.feature.mfcc(y=audio_data, sr=sample_rate, n_mfcc=40)
        # Average each coefficient over time to get one 40-value vector per file
        mfccsscaled = np.mean(mfccs.T, axis=0)
        
    except Exception as e:
        print("Error encountered while parsing file: ", file_name)
        return None

    return np.array([mfccsscaled])

In [229]:

def print_prediction(file_name):
    prediction_feature = extract_feature(file_name) 

    # One forward pass returns the softmax probabilities for every class
    predicted_proba = model.predict(prediction_feature)[0]

    # argmax over predict() replaces predict_classes/predict_proba,
    # which newer Keras releases no longer provide
    predicted_vector = np.array([np.argmax(predicted_proba)])
    predicted_class = le.inverse_transform(predicted_vector) 
    print("The predicted class is:", predicted_class[0], '\n') 

    for i in range(len(predicted_proba)): 
        category = le.inverse_transform(np.array([i]))
        print(category[0], "\t\t : ", format(predicted_proba[i], '.32f') )

### Test each class

In [253]:
# Drilling
filename = '../Capstone_Project_2/test/Test/4060.wav' 
print_prediction(filename)

The predicted class is: drilling 

air_conditioner 		 :  0.00000000000000000000000000000027
car_horn 		 :  0.00000000000000002903503873479124
children_playing 		 :  0.00000000000000000000115494052993
dog_bark 		 :  0.00000000000001462043410871503163
drilling 		 :  0.99821376800537109375000000000000
engine_idling 		 :  0.00000000000000000000000000000002
gun_shot 		 :  0.00000000000000000000092428992010
jackhammer 		 :  0.00178629334550350904464721679688
siren 		 :  0.00000000000000000000000000000324
street_music 		 :  0.00000000000000039606056238639207


In [254]:
# Children Playing
filename = '../Capstone_Project_2/test/Test/7128.wav' 
print_prediction(filename)

The predicted class is: children_playing 

air_conditioner 		 :  0.00000000000032833062440715266028
car_horn 		 :  0.00000015141870335355633869767189
children_playing 		 :  0.98311859369277954101562500000000
dog_bark 		 :  0.00007684771117055788636207580566
drilling 		 :  0.00000007332023699291312368586659
engine_idling 		 :  0.00000000000007794520508631191946
gun_shot 		 :  0.01669478975236415863037109375000
jackhammer 		 :  0.00000000000000237326074557211910
siren 		 :  0.00000000000653985763113262841273
street_music 		 :  0.00010941340588033199310302734375


In [258]:
# Street Music
filename = '../Capstone_Project_2/test/Test/3335.wav' 
print_prediction(filename)

The predicted class is: street_music 

air_conditioner 		 :  0.00770732527598738670349121093750
car_horn 		 :  0.00000438925690104952082037925720
children_playing 		 :  0.00181827484630048274993896484375
dog_bark 		 :  0.00021549727534875273704528808594
drilling 		 :  0.23018267750740051269531250000000
engine_idling 		 :  0.00080198544310405850410461425781
gun_shot 		 :  0.00001067862194759072735905647278
jackhammer 		 :  0.00843258295208215713500976562500
siren 		 :  0.00002845075505319982767105102539
street_music 		 :  0.75079816579818725585937500000000


In [259]:
# Dog bark
filename = '../Capstone_Project_2/test/Test/281.wav' 
print_prediction(filename)

The predicted class is: dog_bark 

air_conditioner 		 :  0.00000000000000000000000000000000
car_horn 		 :  0.00000000000000000254738789596014
children_playing 		 :  0.00000000000000000005841554009493
dog_bark 		 :  1.00000000000000000000000000000000
drilling 		 :  0.00000000000000000017729706531400
engine_idling 		 :  0.00000000000000000000000000000000
gun_shot 		 :  0.00000000000044505501714531270352
jackhammer 		 :  0.00000000000000000000000000000000
siren 		 :  0.00000000000000000002195405523493
street_music 		 :  0.00000000000000000000000000000002


In [264]:
# Gun shot
filename = '../Capstone_Project_2/test/Test/7117.wav' 
print_prediction(filename)

The predicted class is: gun_shot 

air_conditioner 		 :  0.00000000000000007833223053353841
car_horn 		 :  0.00000000000000000905931515590526
children_playing 		 :  0.00001230783709615934640169143677
dog_bark 		 :  0.05733761936426162719726562500000
drilling 		 :  0.00000017433109178455197252333164
engine_idling 		 :  0.00000000000000000000000000216941
gun_shot 		 :  0.94264984130859375000000000000000
jackhammer 		 :  0.00000000000000000000018524611116
siren 		 :  0.00000000002287933384415019588687
street_music 		 :  0.00000000005439900266357433622488


In [281]:
#air conditioner 
filename = '../Capstone_Project_2/test/Test/1127.wav' 
print_prediction(filename)

The predicted class is: air_conditioner 

air_conditioner 		 :  0.41991049051284790039062500000000
car_horn 		 :  0.00000048313881961803417652845383
children_playing 		 :  0.24721004068851470947265625000000
dog_bark 		 :  0.10683305561542510986328125000000
drilling 		 :  0.00000125928022498555947095155716
engine_idling 		 :  0.01999645680189132690429687500000
gun_shot 		 :  0.00118373206350952386856079101562
jackhammer 		 :  0.00002046230110863689333200454712
siren 		 :  0.00000095013041345737292431294918
street_music 		 :  0.20484319329261779785156250000000


In [283]:
#car horn
filename = '../Capstone_Project_2/test/Test/1102.wav' 
print_prediction(filename)

The predicted class is: car_horn 

air_conditioner 		 :  0.00000000000000000000000000000001
car_horn 		 :  1.00000000000000000000000000000000
children_playing 		 :  0.00000000000000000000000000000000
dog_bark 		 :  0.00000000000000000000000000000000
drilling 		 :  0.00000000000000000000000000000000
engine_idling 		 :  0.00000000000000000000000000000000
gun_shot 		 :  0.00000000000000000000000000000000
jackhammer 		 :  0.00000000000000000000000000000000
siren 		 :  0.00000000000000000000000000000000
street_music 		 :  0.00000000000000000000000000000000


In [272]:
#engine idling
filename = '../Capstone_Project_2/test/Test/107.wav' 
print_prediction(filename)

The predicted class is: engine_idling 

air_conditioner 		 :  0.00023651641095057129859924316406
car_horn 		 :  0.00000987872226687613874673843384
children_playing 		 :  0.00071573175955563783645629882812
dog_bark 		 :  0.00007891857967479154467582702637
drilling 		 :  0.00000005278201697933582181576639
engine_idling 		 :  0.98152655363082885742187500000000
gun_shot 		 :  0.00001562062607263214886188507080
jackhammer 		 :  0.00000114825729724543634802103043
siren 		 :  0.00003644346361397765576839447021
street_music 		 :  0.01737919263541698455810546875000


In [273]:
#siren
filename = '../Capstone_Project_2/test/Test/106.wav' 
print_prediction(filename)

The predicted class is: siren 

air_conditioner 		 :  0.00000000000002614255552757949880
car_horn 		 :  0.00000000000044638866713263281039
children_playing 		 :  0.00000000130105382023515403489000
dog_bark 		 :  0.00026945961872115731239318847656
drilling 		 :  0.00000000001841969979321511630133
engine_idling 		 :  0.00000000000063600188483781128213
gun_shot 		 :  0.00000000128683586009259443017072
jackhammer 		 :  0.00000000000066077492869714982149
siren 		 :  0.99973052740097045898437500000000
street_music 		 :  0.00000001034495422658210372901522


In [279]:
#jackhammer
filename = '../Capstone_Project_2/test/Test/1099.wav' 
print_prediction(filename)

The predicted class is: jackhammer 

air_conditioner 		 :  0.00000092970526566205080598592758
car_horn 		 :  0.00000000000003573262359287711354
children_playing 		 :  0.00000000005342382439210702216315
dog_bark 		 :  0.00000000000010555459636777189680
drilling 		 :  0.00000000000000000582313272310129
engine_idling 		 :  0.00000000000000328454156539213175
gun_shot 		 :  0.00000000000102864331635865724479
jackhammer 		 :  0.99999904632568359375000000000000
siren 		 :  0.00000000000015835591301791018815
street_music 		 :  0.00000000000242116613216603049352


### Observations
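The performance of our initial model is satisfactory: it reaches 99.8% accuracy on the training set and 92.7% on the test set, and for each of the ten sample files above it predicted the class named in the cell comment. Confidence varies between samples, however. Most predictions have a probability above 0.94, but the street music sample was predicted with 0.75 and the air conditioner sample with only 0.42.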
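
One way to summarise the spot checks is to collect the top probability of each prediction and flag the ones below a cut-off. The sketch below does this with the values printed above, rounded to four decimals. The dictionary name and the 0.8 threshold are assumptions chosen for illustration, not values used elsewhere in the notebook.

```python
# Top-1 probabilities copied from the spot checks above, rounded to four decimals
top1 = {
    "drilling": 0.9982, "children_playing": 0.9831, "street_music": 0.7508,
    "dog_bark": 1.0000, "gun_shot": 0.9426, "air_conditioner": 0.4199,
    "car_horn": 1.0000, "engine_idling": 0.9815, "siren": 0.9997,
    "jackhammer": 1.0000,
}

threshold = 0.8   # hypothetical cut-off for flagging uncertain predictions
uncertain = sorted(name for name, p in top1.items() if p < threshold)
least_confident = min(top1, key=top1.get)
print(uncertain, least_confident)
```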

In [240]:
import pandas as pd

testdata = pd.read_csv('../Capstone_Project_2/test/test.csv')

In [241]:
# Append the file extension so each ID matches its audio file name
testdata['ID'] = testdata['ID'].astype(str) + '.wav'

In [242]:
testdata.to_csv('test.csv')

## Predict the classes in the given test set (test.csv)

In [243]:
Class_Name = []
def print_pred(file_name):
    prediction_feature = extract_feature(file_name) 

    # argmax over the softmax output replaces predict_classes,
    # which newer Keras releases no longer provide
    predicted_vector = np.argmax(model.predict(prediction_feature), axis=-1)
    predicted_class = le.inverse_transform(predicted_vector) 
    Class_Name.append(predicted_class[0])

In [244]:
import pandas as pd
import os
import librosa

# Set the path to the folder containing the test audio files
fulldatasetpath = '../Capstone_Project_2/test/Test/'

testdata = pd.read_csv('../Capstone_Project_2/test.csv')

# Iterate through each sound file and record its predicted class in Class_Name
for index, row in testdata.iterrows():
    
    file_name = os.path.join(os.path.abspath(fulldatasetpath), str(row["ID"]))
    print_pred(file_name)



In [245]:
Class=pd.DataFrame(Class_Name)

In [246]:
testdata['Class']=Class

In [247]:
testdata = testdata[['Class', 'ID']]

In [248]:
# Strip the ".wav" extension so that only the numeric ID remains
testdata["ID"] = testdata["ID"].str.split(".", n=1).str[0]


In [249]:
testdata.to_csv('test.csv')

### The submission score on https://datahack.analyticsvidhya.com/contest/practice-problem-urban-sound-classification/my-submissions is 0.88