# Classifying Urban Sounds Using Deep Learning

## 2. Model Training and Evaluation

In [171]:
# retrieve the preprocessed data from previous notebook
%store -r x_train 
%store -r x_test 
%store -r y_train 
%store -r y_test 
%store -r yy 
%store -r le

### Initial model architecture - MLP
We will start by constructing a Multilayer Perceptron (MLP) neural network using Keras with a TensorFlow backend.

We use a sequential model so that we can build the network layer by layer.

We will begin with a simple model architecture consisting of three layers: an input layer, a hidden layer and an output layer. All three are dense (fully connected) layers, a standard layer type used in many neural networks.

The first layer will receive the input shape. Each sample contains 40 MFCCs (Mel-Frequency Cepstral Coefficients), one per column, so each sample has a shape of (1x40) and we start with an input shape of 40.

The first two layers will have 256 nodes each. The activation function for these two layers is ReLU (rectified linear activation), a common default for hidden layers that works well in practice in many neural networks.
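As a minimal sketch of what the ReLU activation computes (the input values here are made up for illustration and are not taken from the model), negative pre-activations are set to zero and positive ones pass through unchanged:

```python
import numpy as np

def relu(x):
    """Rectified linear activation: negatives become 0, the rest pass through."""
    return np.maximum(0.0, x)

# Hypothetical pre-activation values for five nodes
pre_activations = np.array([-2.0, -0.5, 0.0, 1.5, 3.0])
activations = relu(pre_activations)
print(activations)
```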

We will also apply dropout with a rate of 50% to our first two layers. During training this randomly excludes nodes from each update cycle, which results in a network that generalises better and is less likely to overfit the training data.
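The effect of a 50% dropout rate can be sketched in NumPy. This is an illustrative re-implementation of "inverted" dropout at training time, not the Keras layer itself; the function name `dropout_train` and the all-ones input are assumptions made for the example:

```python
import numpy as np

def dropout_train(x, rate, rng):
    """Zero each unit with probability `rate` and rescale the survivors
    by 1 / (1 - rate) so the expected activation is unchanged."""
    keep_mask = rng.random(x.shape) >= rate
    return x * keep_mask / (1.0 - rate)

rng = np.random.default_rng(0)
hidden = np.ones((1000, 256))          # hypothetical activations of a 256-node layer
dropped = dropout_train(hidden, rate=0.5, rng=rng)

print("fraction zeroed:", np.mean(dropped == 0.0))
print("mean activation:", dropped.mean())
```

At inference time no units are dropped, which is why the rescaling during training matters: it keeps the average activation on the same scale in both modes.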

Our output layer will have 10 nodes (num_labels), which matches the number of possible classes. The activation for our output layer is softmax. Softmax makes the outputs sum to 1, so they can be interpreted as probabilities. The model then predicts the class with the highest probability.
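A small sketch of softmax followed by picking the most probable class. The three logits are hypothetical; the real output layer produces one value per class:

```python
import numpy as np

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    shifted = logits - np.max(logits)   # subtract the max for numerical stability
    exps = np.exp(shifted)
    return exps / exps.sum()

logits = np.array([2.0, 1.0, 0.1])      # hypothetical raw outputs for 3 classes
probs = softmax(logits)
predicted_index = int(np.argmax(probs))

print(probs, probs.sum(), predicted_index)
```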

In [172]:
import numpy as np
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation, Flatten
from keras.layers import Convolution2D, MaxPooling2D
from keras.optimizers import Adam
from keras.utils import np_utils
from sklearn import metrics 

num_labels = yy.shape[1]
filter_size = 4

# Construct model 
model = Sequential()

model.add(Dense(256, input_shape=(40,)))
model.add(Activation('relu'))
model.add(Dropout(0.5))

model.add(Dense(256))
model.add(Activation('relu'))
model.add(Dropout(0.5))

model.add(Dense(num_labels))
model.add(Activation('softmax'))

## Compiling the model
For compiling our model, we will use the following three parameters:

Loss function - we will use categorical_crossentropy, the usual choice for multi-class classification with one-hot encoded labels. A lower score indicates that the model is performing better.

Metrics - we will use the accuracy metric which will allow us to view the accuracy score on the validation data when we train the model.

Optimizer - here we will use adam, which is a good general-purpose optimizer for many use cases.
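To make the "lower is better" reading of the loss concrete, here is a hedged NumPy sketch of categorical cross-entropy on one-hot labels. The two prediction arrays are invented for illustration, and the clipping constant `eps` is an assumption used only to avoid taking log(0):

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean over samples of -sum(y_true * log(y_pred))."""
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    return float(np.mean(-np.sum(y_true * np.log(y_pred), axis=1)))

y_true = np.array([[0, 1, 0], [1, 0, 0]])                      # one-hot labels
confident = np.array([[0.05, 0.90, 0.05], [0.80, 0.10, 0.10]])  # mostly right
unsure = np.array([[0.30, 0.40, 0.30], [0.40, 0.30, 0.30]])     # barely right

loss_confident = categorical_crossentropy(y_true, confident)
loss_unsure = categorical_crossentropy(y_true, unsure)
print(loss_confident, loss_unsure)
```

Both prediction sets yield the same accuracy, but the more confident one has the lower loss, which is why the loss and the accuracy metric are tracked separately.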

In [173]:
# Compile the model
model.compile(loss='categorical_crossentropy', metrics=['accuracy'], optimizer='adam')

In [176]:
# Display model architecture summary 
model.summary()

# Calculate pre-training accuracy 
score = model.evaluate(x_test, y_test, verbose=0)
accuracy = 100*score[1]

print("Pre-training accuracy: %.4f%%" % accuracy)

Model: "sequential_4"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense_10 (Dense)             (None, 256)               10496     
_________________________________________________________________
activation_10 (Activation)   (None, 256)               0         
_________________________________________________________________
dropout_7 (Dropout)          (None, 256)               0         
_________________________________________________________________
dense_11 (Dense)             (None, 256)               65792     
_________________________________________________________________
activation_11 (Activation)   (None, 256)               0         
_________________________________________________________________
dropout_8 (Dropout)          (None, 256)               0         
_________________________________________________________________
dense_12 (Dense)             (None, 10)               
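
The parameter counts shown in the summary for the first two dense layers can be checked by hand: a dense layer has one weight per input-unit pair plus one bias per unit. A short sketch (the helper name `dense_params` is ours, not part of Keras):

```python
def dense_params(n_inputs, n_units):
    """Weights (n_inputs * n_units) plus one bias per unit."""
    return n_inputs * n_units + n_units

first_layer = dense_params(40, 256)    # 40 MFCC inputs -> 256 nodes
second_layer = dense_params(256, 256)  # 256 nodes -> 256 nodes

print(first_layer, second_layer)
```

The same formula applies to the output layer; activation and dropout layers add no trainable parameters, which is why they show 0 in the summary.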

In [177]:
from keras.callbacks import ModelCheckpoint 
from datetime import datetime 

num_epochs = 100
num_batch_size = 32

checkpointer = ModelCheckpoint(filepath='../Capstone_Project_2/weights.best.basic_mlp.hdf5', 
                               verbose=1, save_best_only=True)
start = datetime.now()

model.fit(x_train, y_train, batch_size=num_batch_size, epochs=num_epochs, validation_data=(x_test, y_test), callbacks=[checkpointer], verbose=1)


duration = datetime.now() - start
print("Training completed in time: ", duration)

Train on 3804 samples, validate on 1631 samples
Epoch 1/100

Epoch 00001: val_loss improved from inf to 0.39590, saving model to ../Capstone_Project_2/weights.best.basic_mlp.hdf5
Epoch 2/100

Epoch 00002: val_loss did not improve from 0.39590
Epoch 3/100

Epoch 00003: val_loss improved from 0.39590 to 0.37018, saving model to ../Capstone_Project_2/weights.best.basic_mlp.hdf5
Epoch 4/100

Epoch 00004: val_loss did not improve from 0.37018
Epoch 5/100

Epoch 00005: val_loss did not improve from 0.37018
Epoch 6/100

Epoch 00006: val_loss did not improve from 0.37018
Epoch 7/100

Epoch 00007: val_loss did not improve from 0.37018
Epoch 8/100

Epoch 00008: val_loss did not improve from 0.37018
Epoch 9/100

Epoch 00009: val_loss did not improve from 0.37018
Epoch 10/100

Epoch 00010: val_loss did not improve from 0.37018
Epoch 11/100

Epoch 00011: val_loss did not improve from 0.37018
Epoch 12/100

Epoch 00012: val_loss did not improve from 0.37018
Epoch 13/100

Epoch 00013: val_loss did not
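
The checkpoint messages in the log follow a simple rule: with `save_best_only=True`, the weights are written only when the monitored value (by default `val_loss`) drops below the best value seen so far. A plain-Python sketch of that rule, where the values for epochs 1 and 3 come from the log and the remaining values are hypothetical placeholders:

```python
def best_epochs(val_losses):
    """Return (epoch, val_loss) pairs at which a checkpoint would be saved."""
    best = float("inf")
    saved = []
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:          # strictly better than the best so far
            best = loss
            saved.append((epoch, loss))
    return saved

# Epochs 1 and 3 match the log; epochs 2, 4 and 5 are made-up values
val_losses = [0.39590, 0.41, 0.37018, 0.38, 0.40]
saved = best_epochs(val_losses)
print(saved)
```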

## Test the model
Here we will review the accuracy of the model on both the training and test data sets.

In [178]:
# Evaluating the model on the training and testing set
score = model.evaluate(x_train, y_train, verbose=0)
print("Training Accuracy: ", score[1])

score = model.evaluate(x_test, y_test, verbose=0)
print("Testing Accuracy: ", score[1])

Training Accuracy:  0.9726603627204895
Testing Accuracy:  0.8976088166236877
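
For reference, the accuracy value reported in `score[1]` is the fraction of samples whose most probable predicted class matches the true class. A minimal NumPy sketch on invented toy data (four samples, three classes):

```python
import numpy as np

def accuracy(y_true_onehot, y_prob):
    """Fraction of rows where argmax of the prediction equals the true class."""
    true_idx = np.argmax(y_true_onehot, axis=1)
    pred_idx = np.argmax(y_prob, axis=1)
    return float(np.mean(true_idx == pred_idx))

# Hypothetical one-hot labels and predicted probabilities
y_true = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 1, 0]])
y_prob = np.array([[0.7, 0.2, 0.1],
                   [0.1, 0.8, 0.1],
                   [0.3, 0.4, 0.3],   # wrong: predicts class 1, truth is class 2
                   [0.2, 0.5, 0.3]])
acc = accuracy(y_true, y_prob)
print(acc)
```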


In [179]:

import librosa 
import numpy as np 

def extract_feature(file_name):
    """Return a (1, 40) array of time-averaged MFCCs, or None if the file cannot be parsed."""
    try:
        audio_data, sample_rate = librosa.load(file_name, res_type='kaiser_fast')
        mfccs = librosa.feature.mfcc(y=audio_data, sr=sample_rate, n_mfcc=40)
        # Average each coefficient over time to get one value per MFCC
        mfccsscaled = np.mean(mfccs.T, axis=0)

    except Exception as e:
        print("Error encountered while parsing file: ", file_name)
        return None

    return np.array([mfccsscaled])
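
The shape handling in `extract_feature` can be illustrated without any audio file. In this sketch a random matrix stands in for the MFCC output, which has one row per coefficient and one column per time frame; the frame count of 173 is an arbitrary assumption:

```python
import numpy as np

rng = np.random.default_rng(0)
n_mfcc, n_frames = 40, 173                   # frame count is hypothetical
mfccs = rng.normal(size=(n_mfcc, n_frames))  # stand-in for librosa.feature.mfcc output

# Average over time: transpose to (frames, 40), then take the mean of each column
feature_vector = np.mean(mfccs.T, axis=0)

# Wrap in an outer array so the model sees a batch of one sample
model_input = np.array([feature_vector])

print(feature_vector.shape, model_input.shape)
```

Averaging over time discards the temporal ordering of the clip, which is what lets clips of different lengths share the same fixed input shape of 40.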

In [180]:

def print_prediction(file_name):
    prediction_feature = extract_feature(file_name) 

    predicted_vector = model.predict_classes(prediction_feature)
    predicted_class = le.inverse_transform(predicted_vector) 
    print("The predicted class is:", predicted_class[0], '\n') 

    predicted_proba_vector = model.predict_proba(prediction_feature) 
    predicted_proba = predicted_proba_vector[0]
    for i in range(len(predicted_proba)): 
        category = le.inverse_transform(np.array([i]))
        print(category[0], "\t\t : ", format(predicted_proba[i], '.32f') )
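
Recent TensorFlow/Keras releases no longer provide `predict_classes` and `predict_proba` on `Sequential` models. The same result can be obtained from the probabilities returned by `model.predict` by taking the argmax. The sketch below shows only that step: `proba_row` is a made-up stand-in for `model.predict(prediction_feature)[0]`, and `class_names` is a plain list assumed to be in the same alphabetical order that the label encoder uses in the outputs of this notebook:

```python
import numpy as np

class_names = ["air_conditioner", "car_horn", "children_playing", "dog_bark",
               "drilling", "engine_idling", "gun_shot", "jackhammer",
               "siren", "street_music"]

def describe_prediction(proba_row, class_names):
    """Return the most probable label and printable per-class lines."""
    best = int(np.argmax(proba_row))
    lines = [f"The predicted class is: {class_names[best]}"]
    for name, p in zip(class_names, proba_row):
        lines.append(f"{name:<18}: {p:.6f}")
    return class_names[best], lines

# Hypothetical probabilities for one clip (they sum to 1)
proba_row = np.array([0.01, 0.02, 0.03, 0.80, 0.02, 0.04, 0.02, 0.02, 0.02, 0.02])
label, lines = describe_prediction(proba_row, class_names)
print("\n".join(lines))
```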

### Test each class

In [181]:
# Air Conditioner
filename = '../Capstone_Project_2/train/Train/24.wav' 
print_prediction(filename)

The predicted class is: air_conditioner 

air_conditioner 		 :  0.99999701976776123046875000000000
car_horn 		 :  0.00000000050721826738708841730841
children_playing 		 :  0.00000000006495886467616784898382
dog_bark 		 :  0.00000012448894892713724402710795
drilling 		 :  0.00000000008195875228489413188981
engine_idling 		 :  0.00000012672303739691415103152394
gun_shot 		 :  0.00000000144970890847417877012049
jackhammer 		 :  0.00000000492667195928220280620735
siren 		 :  0.00000174470915226265788078308105
street_music 		 :  0.00000099620274340850301086902618


In [182]:
# Car Horn
filename = '../Capstone_Project_2/train/Train/48.wav' 
print_prediction(filename)

The predicted class is: car_horn 

air_conditioner 		 :  0.00000000024389193442608814166306
car_horn 		 :  0.99830448627471923828125000000000
children_playing 		 :  0.00000000001709420625883861788452
dog_bark 		 :  0.00000230480623031326103955507278
drilling 		 :  0.00001547173815197311341762542725
engine_idling 		 :  0.00000000329104077323449928371701
gun_shot 		 :  0.00000000000043488492971685555055
jackhammer 		 :  0.00000226493989430309738963842392
siren 		 :  0.00000284834754893381614238023758
street_music 		 :  0.00167265301570296287536621093750


In [183]:
# Children Playing
filename = '../Capstone_Project_2/train/Train/6.wav' 
print_prediction(filename)

The predicted class is: children_playing 

air_conditioner 		 :  0.00000238092434301506727933883667
car_horn 		 :  0.00001049465845426311716437339783
children_playing 		 :  0.96449857950210571289062500000000
dog_bark 		 :  0.02274562045931816101074218750000
drilling 		 :  0.00015408261970151215791702270508
engine_idling 		 :  0.00019491935381665825843811035156
gun_shot 		 :  0.00047619448741897940635681152344
jackhammer 		 :  0.00000021041492459517030511051416
siren 		 :  0.00335997086949646472930908203125
street_music 		 :  0.00855746027082204818725585937500


In [184]:
# Dog bark
filename = '../Capstone_Project_2/train/Train/4.wav' 
print_prediction(filename)

The predicted class is: dog_bark 

air_conditioner 		 :  0.00125140685122460126876831054688
car_horn 		 :  0.00000000431539559642146741680335
children_playing 		 :  0.00000002903416529420610459055752
dog_bark 		 :  0.97260832786560058593750000000000
drilling 		 :  0.00004252087092027068138122558594
engine_idling 		 :  0.01917824894189834594726562500000
gun_shot 		 :  0.00006298487278399989008903503418
jackhammer 		 :  0.00000019236412640566413756459951
siren 		 :  0.00000428389421358588151633739471
street_music 		 :  0.00685194041579961776733398437500


In [185]:
# Drilling
filename = '../Capstone_Project_2/train/Train/11.wav' 
print_prediction(filename)

The predicted class is: drilling 

air_conditioner 		 :  0.00000000000000000000000000000000
car_horn 		 :  0.00000000000000000000000000002018
children_playing 		 :  0.00000000000000000000000000001705
dog_bark 		 :  0.00000000000000000000000333972981
drilling 		 :  1.00000000000000000000000000000000
engine_idling 		 :  0.00000000000000000000000000000000
gun_shot 		 :  0.00000000000000000000000000000071
jackhammer 		 :  0.00000000000000000000654697016237
siren 		 :  0.00000000000000000000000000000000
street_music 		 :  0.00000000000000000000000222743160


In [186]:
#Engine Idling
filename = '../Capstone_Project_2/train/Train/17.wav' 
print_prediction(filename)

The predicted class is: engine_idling 

air_conditioner 		 :  0.00000004768581618463940685614944
car_horn 		 :  0.00000000000015433634188363742901
children_playing 		 :  0.00000000108367392837038778452552
dog_bark 		 :  0.00000002748255312212677381467074
drilling 		 :  0.00000048815002173796528950333595
engine_idling 		 :  0.99999928474426269531250000000000
gun_shot 		 :  0.00000000000062812591499622483227
jackhammer 		 :  0.00000000019466518674793320542449
siren 		 :  0.00000000001433714952314701918112
street_music 		 :  0.00000015006331466338451718911529


In [187]:
#Gun shot
filename = '../Capstone_Project_2/train/Train/12.wav' 
print_prediction(filename)

The predicted class is: gun_shot 

air_conditioner 		 :  0.00000000000058845826432160630581
car_horn 		 :  0.00000000015187752444578705990352
children_playing 		 :  0.00000516155478180735372006893158
dog_bark 		 :  0.00027733063325285911560058593750
drilling 		 :  0.00000954388724494492635130882263
engine_idling 		 :  0.00000000000037171334803590139195
gun_shot 		 :  0.99970799684524536132812500000000
jackhammer 		 :  0.00000000000006174672318587370867
siren 		 :  0.00000000002131903160951242881538
street_music 		 :  0.00000000023853163888531980774133


In [188]:
#jackhammer
filename = '../Capstone_Project_2/train/Train/33.wav' 
print_prediction(filename)

The predicted class is: jackhammer 

air_conditioner 		 :  0.00001163196793640963733196258545
car_horn 		 :  0.00072179472772404551506042480469
children_playing 		 :  0.00000007185435890733060659840703
dog_bark 		 :  0.00000002709861490757248247973621
drilling 		 :  0.00003591745553421787917613983154
engine_idling 		 :  0.00000067514997681428212672472000
gun_shot 		 :  0.00000335822119268414098769426346
jackhammer 		 :  0.99804413318634033203125000000000
siren 		 :  0.00000027558306214814365375787020
street_music 		 :  0.00118217396084219217300415039062


In [189]:
#siren
filename = '../Capstone_Project_2/train/Train/0.wav' 
print_prediction(filename)

The predicted class is: siren 

air_conditioner 		 :  0.00006121601472841575741767883301
car_horn 		 :  0.00003773438220378011465072631836
children_playing 		 :  0.00002691800727916415780782699585
dog_bark 		 :  0.00152355106547474861145019531250
drilling 		 :  0.00000040077469520838349126279354
engine_idling 		 :  0.00046968838432803750038146972656
gun_shot 		 :  0.00001833475835155695676803588867
jackhammer 		 :  0.00040259279194287955760955810547
siren 		 :  0.99739754199981689453125000000000
street_music 		 :  0.00006207113619893789291381835938


In [190]:
#street_music
filename = '../Capstone_Project_2/train/Train/1.wav' 
print_prediction(filename)

The predicted class is: street_music 

air_conditioner 		 :  0.00000952461596170905977487564087
car_horn 		 :  0.00000125327187561197206377983093
children_playing 		 :  0.00000330592069985868874937295914
dog_bark 		 :  0.00000172932868736097589135169983
drilling 		 :  0.00000392400670534698292613029480
engine_idling 		 :  0.00001271152632398298010230064392
gun_shot 		 :  0.00001185752171295462176203727722
jackhammer 		 :  0.00004030571290059015154838562012
siren 		 :  0.00000001730891163731484994059429
street_music 		 :  0.99991536140441894531250000000000


### Observations
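The performance of our initial model is satisfactory. It reaches about 97% accuracy on the training split and about 90% on the held-out test split, so it appears to generalise reasonably well, and it predicts the correct class for each of the sample files above.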

In [191]:
import pandas as pd

testdata = pd.read_csv('../Capstone_Project_2/test/test.csv')

In [192]:
# Append the file extension so each ID matches its audio file name
testdata['ID'] = testdata['ID'].astype(str) + '.wav'

In [193]:
testdata.to_csv('test.csv')

## Predict the classes in the given test set (test.csv)

In [194]:
Class_Name = []
def print_pred(file_name):
    prediction_feature = extract_feature(file_name) 

    predicted_vector = model.predict_classes(prediction_feature)
    predicted_class = le.inverse_transform(predicted_vector) 
    Class_Name.append(predicted_class[0])

In [195]:
import pandas as pd
import os
import librosa

# Set the path to the test audio files
fulldatasetpath = '../Capstone_Project_2/test/Test/'

testdata = pd.read_csv('../Capstone_Project_2/test.csv')

features = []

# Iterate through each sound file and predict its class
for index, row in testdata.iterrows():

    file_name = os.path.join(os.path.abspath(fulldatasetpath), str(row["ID"]))
    # print_pred appends the predicted label to Class_Name and returns nothing
    print_pred(file_name)



In [196]:
Class=pd.DataFrame(Class_Name)

In [197]:
testdata['Class']=Class

In [204]:
testdata = testdata[['Class', 'ID']]

In [205]:
# Strip the ".wav" extension: split once on the first dot and keep the first piece
testdata["ID"] = testdata["ID"].str.split(".", n=1).str[0]
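
The ID handling in this section is a round trip: the extension is appended so the rows match the audio file names, then stripped again before the results are written out. A small pandas sketch on a made-up three-row frame (the IDs are hypothetical):

```python
import pandas as pd

# Hypothetical miniature version of test.csv with integer clip IDs
testdata = pd.DataFrame({"ID": [5, 7, 12]})

# Build file names for loading the audio
testdata["ID"] = testdata["ID"].astype(str) + ".wav"
file_names = testdata["ID"].tolist()

# Recover the bare ID: split once on the first dot and keep the first piece
testdata["ID"] = testdata["ID"].str.split(".", n=1).str[0]
bare_ids = testdata["ID"].tolist()

print(file_names, bare_ids)
```

Note that after the round trip the IDs are strings rather than integers, which makes no difference once they are written to CSV.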


In [209]:
testdata.to_csv('test.csv')

### The submission score on https://datahack.analyticsvidhya.com/contest/practice-problem-urban-sound-classification/my-submissions is 0.88