# SOUND SIGNAL CLASSIFICATION USING DEEP LEARNING PROJECT

This project consists of 3 main steps:

- Step 1. We prepare our dataset for analysis and extract sound signal features from the audio files using Mel-Frequency Cepstral Coefficients (MFCC).
- Step 2. We build a Convolutional Neural Network (CNN) model and train it on our dataset.
- Step 3. Finally, we predict an audio file's class using our CNN deep learning model.



We will use the UrbanSound8K dataset. The download link is here: https://urbansounddataset.weebly.com/download-urbansound8k.html

The dataset folder and this source code should be in the same directory.

Don't forget to install the librosa library from the Anaconda Prompt with the following command:

conda install -c conda-forge librosa

<IMG src="audiosignal.png" width="500" height="250">
    

        

### Step 1: We will prepare our dataset for analysis and extract sound signal features from audio files using Mel-Frequency Cepstral Coefficients (MFCC).

Every signal has its own characteristics. In sound processing, the mel-frequency cepstrum (MFC) is a representation of the short-term power spectrum of a sound. Mel-frequency cepstral coefficients (MFCCs) are coefficients that collectively make up an MFC.
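
To give a feel for the "mel" part of the name, here is a minimal sketch of one common Hz-to-mel conversion (the HTK-style formula). This is only an illustration: librosa's default mel scale is a slightly different variant, and the function names `hz_to_mel` and `mel_to_hz` below are our own, not part of this project.

```python
import math

def hz_to_mel(freq_hz):
    """Convert a frequency in Hz to mels using the HTK-style formula."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)

def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

# The same 500 Hz step covers fewer mels at high frequencies than at low ones,
# which mirrors how human hearing resolves low frequencies more finely.
low_step = hz_to_mel(1000) - hz_to_mel(500)
high_step = hz_to_mel(8000) - hz_to_mel(7500)
print(low_step, high_step)
```
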

You can find detailed information about MFC at: https://www.youtube.com/watch?v=4_SH2nfbQZ8&t=0s

So, using the librosa library, we will extract the characteristics of every audio signal in our dataset and store them in a table (a pandas DataFrame).

In [None]:
import tensorflow as tf
print(tf.__version__)

In [None]:
import matplotlib.pyplot as plt
import pandas as pd
import librosa
import numpy as np
import os, fnmatch
from sklearn.model_selection import train_test_split
from tensorflow.keras.utils import to_categorical
from sklearn.preprocessing import LabelEncoder
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense,Dropout,Activation,Flatten
from tensorflow.keras.optimizers import Adam
from sklearn import metrics
from tensorflow.keras.callbacks import ModelCheckpoint



In [None]:
# First, I want to show how librosa handles sound signals.
# Let's read an example audio signal using librosa
audio_file_path='UrbanSound8K/17973-2-0-32.wav'

librosa_audio_data, librosa_sample_rate = librosa.load(audio_file_path)

In [None]:
# Two important things to know about librosa.load with its default arguments:
# it mixes any stereo (2-channel) signal down to mono (single channel), and it resamples the audio to 22050 Hz.
# So the loaded signal is a one-dimensional array, and the signal characteristics are computed from this mono form.

print(librosa_audio_data)
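
The stereo-to-mono idea can be sketched with plain NumPy. We assume here that the downmix is a simple average of the channels, and the tiny `stereo` array is made-up data for illustration only.

```python
import numpy as np

# Hypothetical stereo signal: 2 channels, 4 samples each
stereo = np.array([[0.2, 0.4, -0.6, 1.0],
                   [0.0, 0.2, -0.2, 0.0]])

# Downmix by averaging the two channels sample by sample
mono = stereo.mean(axis=0)

print(stereo.shape)  # two-dimensional: (channels, samples)
print(mono.shape)    # one-dimensional: (samples,)
print(mono)
```
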

In [None]:
librosa_audio_data.shape

In [None]:
# Plot the librosa audio data
# Audio with 1 channel 
plt.figure(figsize=(10, 4))
plt.plot(librosa_audio_data)
plt.show()

In [None]:
librosa_sample_rate

#### Here We Will Extract Features of All Sound Signals in the UrbanSound8K Dataset

Now we will calculate the Mel-Frequency Cepstral Coefficients (MFCC) of the audio samples. MFCCs summarize the frequency distribution within each short analysis window, so the resulting matrix captures both the frequency and the time characteristics of the sound. We can use these characteristics as features for classification.


In [None]:
mfccs = librosa.feature.mfcc(y=librosa_audio_data, sr=librosa_sample_rate, n_mfcc=45)   #n_mfcc: number of MFCCs to return 
print(mfccs.shape)
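
The shape printed above has one row per coefficient and one column per time frame. As a rough sketch, assuming librosa's default hop length of 512 samples and centered frames, the number of columns can be estimated from the signal length. The helper name `expected_mfcc_shape` is ours.

```python
def expected_mfcc_shape(n_samples, n_mfcc=45, hop_length=512):
    """Estimate the MFCC matrix shape for a signal of n_samples samples,
    assuming centered frames (one frame every hop_length samples)."""
    n_frames = 1 + n_samples // hop_length
    return (n_mfcc, n_frames)

# A hypothetical 4-second clip at 22050 Hz has 88200 samples
print(expected_mfcc_shape(4 * 22050))
```
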

In [None]:
mfccs

In [None]:
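# The function for extracting MFC coefficients from a signal using librosa:

def features_extractor(file_name):
    # Load as mono at librosa's default sample rate; 'kaiser_fast' favors speed over resampling quality
    audio, sample_rate = librosa.load(file_name, res_type='kaiser_fast') 
    # Result has shape (45, number of time frames)
    mfccs_features = librosa.feature.mfcc(y=audio, sr=sample_rate, n_mfcc=45)
    # Average each coefficient over time, giving one fixed-length vector of 45 values per file
    mfccs_scaled_features = np.mean(mfccs_features.T,axis=0)    
    return mfccs_scaled_features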
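
Averaging over the time axis is what lets clips of different lengths feed the same fixed-size model input. Here is a minimal sketch with random stand-in matrices (not real MFCCs) to show the shapes involved.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for the MFCC matrices of a short and a long clip: (n_mfcc, n_frames)
short_clip = rng.normal(size=(45, 40))
long_clip = rng.normal(size=(45, 173))

# Transpose to (n_frames, n_mfcc), then average over the frame axis
short_vec = np.mean(short_clip.T, axis=0)
long_vec = np.mean(long_clip.T, axis=0)

# Both clips end up as vectors of the same length
print(short_vec.shape, long_vec.shape)
```

The trade-off is that the time ordering of the frames is discarded; only the average value of each coefficient survives.
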

In [None]:
# In order to find all the files in directory:
def find_files(directory, pattern):
    for root, dirs, files in os.walk(directory):
        for basename in files:
            if fnmatch.fnmatch(basename, pattern):
                filename = os.path.join(root, basename)
                yield filename


dataset = []

for filename in find_files("UrbanSound8K/audio", "*.wav"):
    # UrbanSound8K file names look like [fsID]-[classID]-[occurrenceID]-[sliceID].wav,
    # so the class label is the second dash-separated field of the base name.
    label = os.path.basename(filename).split("-")[1]
    dataset.append({"file_name" : filename, "label" : label})

dataset
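
UrbanSound8K file names follow the pattern `[fsID]-[classID]-[occurrenceID]-[sliceID].wav`. As a small sketch, all four fields can be pulled out by splitting the base name on dashes. The helper `parse_urbansound_name` and the folder in the example path are hypothetical.

```python
import os

def parse_urbansound_name(path):
    """Split an UrbanSound8K file name into its four ID fields."""
    stem = os.path.splitext(os.path.basename(path))[0]
    fs_id, class_id, occurrence_id, slice_id = stem.split("-")
    return {
        "fs_id": fs_id,
        "class_id": int(class_id),
        "occurrence_id": int(occurrence_id),
        "slice_id": int(slice_id),
    }

# Hypothetical location; only the file name itself matters here
info = parse_urbansound_name("some_folder/17973-2-0-32.wav")
print(info)
```

Splitting on dashes keeps working when the occurrence or slice IDs have more than one digit, which fixed character positions would not.
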

In [None]:
dataset = pd.DataFrame(dataset)

dataset.head()


In [None]:
dataset.shape

In [None]:
# Let's iterate over every sound file and extract its MFCC features
# using the features_extractor function we defined above:
dataset['data'] = dataset['file_name'].apply(features_extractor)


In [None]:
dataset.head()

In [None]:
# Let's change column names:
dataset = dataset.rename(columns={'label': 'class'})
dataset = dataset.rename(columns={'data': 'feature'})

In [None]:
dataset.head()

In [None]:
# Drop the file_name column, which we no longer need
dataset.drop(['file_name'], axis=1, inplace=True)

In [None]:
dataset.head()

In [None]:
# Keep only the class and feature columns in a new pandas DataFrame
extracted_features_df = pd.DataFrame(dataset,columns=['class','feature'])
extracted_features_df.head()

### Defining Train and Validation Test Subsets

In [None]:
# Separate the dataset into the independent variables (features, X) and the dependent variable (class labels, y)
X=np.array(extracted_features_df['feature'].tolist())
y=np.array(extracted_features_df['class'].tolist())

In [None]:
X.shape

In [None]:
X

In [None]:
y

In [None]:
y.shape

In [None]:
# Our model needs one-hot encoded output classes (1s and 0s), so we first label-encode
# the class labels and then convert them with to_categorical.

# As a reminder, one-hot encoding looks like this:
# 1 0 0 0 0 0 0 0 0 0 => air_conditioner
# 0 1 0 0 0 0 0 0 0 0 => car_horn
# 0 0 1 0 0 0 0 0 0 0 => children_playing
# 0 0 0 1 0 0 0 0 0 0 => dog_bark
# ...
# 0 0 0 0 0 0 0 0 0 1 => street_music

labelencoder=LabelEncoder()
y=to_categorical(labelencoder.fit_transform(y))
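
The two steps (label encoding, then one-hot encoding) can be sketched with plain NumPy. This is only an illustration of the idea on a few made-up labels, not the code path that `LabelEncoder` and `to_categorical` actually run.

```python
import numpy as np

# Hypothetical class IDs, stored as strings like in our dataset
labels = np.array(['3', '0', '9', '3'])

# Step 1: map each distinct label to an integer index (labels are sorted first)
classes, encoded = np.unique(labels, return_inverse=True)

# Step 2: pick the matching row of an identity matrix for each index
one_hot = np.eye(len(classes))[encoded]

print(classes)
print(encoded)
print(one_hot)
```
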

In [None]:
y

In [None]:
y[0]

In [None]:
# We split dataset as Train and Test

X_train,X_test,y_train,y_test=train_test_split(X,y,test_size=0.2,random_state=0)
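
As a quick sanity check on the shapes we expect next, here is a small sketch of how an 80/20 split divides the samples. We assume the test size is rounded up and the training set gets the remainder, and that all 8732 clips of UrbanSound8K were found; the helper `split_sizes` is ours.

```python
import math

def split_sizes(n_samples, test_size=0.2):
    """Return (n_train, n_test), rounding the test share up."""
    n_test = math.ceil(n_samples * test_size)
    return n_samples - n_test, n_test

n_train, n_test = split_sizes(8732)
print(n_train, n_test)
```
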

In [None]:

X_train

In [None]:
y

In [None]:
X_train.shape

In [None]:
X_test.shape

In [None]:
y_train.shape

In [None]:
y_test.shape

### Step 2: We will Build a Convolutional Neural Network (CNN) Model and Train Our Model with processed sound signals of UrbanSound8K Dataset.


In [None]:
# How many classes do we have? We need this number for the output layer of our model
num_labels = 10

In [None]:
# Now we start building our model.
# Note: the layers below are fully connected (Dense) layers; no convolutional layers are used here.

model=Sequential()
# 1. hidden layer
model.add(Dense(125,input_shape=(45,)))
model.add(Activation('relu'))
model.add(Dropout(0.5))
# 2. hidden layer
model.add(Dense(250))
model.add(Activation('relu'))
model.add(Dropout(0.5))
# 3. hidden layer
model.add(Dense(125))
model.add(Activation('relu'))
model.add(Dropout(0.5))

# output layer
model.add(Dense(num_labels))
model.add(Activation('softmax'))

In [None]:
model.summary()
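
The parameter counts in the summary can be checked by hand: a Dense layer has one weight per input-unit pair plus one bias per unit, and the Activation and Dropout layers add no parameters. A minimal sketch for the layer sizes used above:

```python
def dense_params(n_inputs, n_units):
    """Weights plus biases of a fully connected layer."""
    return n_inputs * n_units + n_units

# Input size followed by the units of each Dense layer in our model
layer_sizes = [45, 125, 250, 125, 10]

per_layer = [dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]
total = sum(per_layer)

print(per_layer)
print(total)
```
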

In [None]:
model.compile(loss='categorical_crossentropy', metrics=['accuracy'], optimizer='adam')
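
To make the loss less of a black box, here is a minimal NumPy sketch of what softmax and categorical cross-entropy compute for a single sample. The numbers are made up, and this is an illustration of the formulas rather than the Keras implementation.

```python
import numpy as np

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    shifted = logits - np.max(logits, axis=-1, keepdims=True)  # for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

def categorical_crossentropy(y_true, y_prob, eps=1e-7):
    """Negative log-probability assigned to the true class."""
    y_prob = np.clip(y_prob, eps, 1.0)
    return -np.sum(y_true * np.log(y_prob), axis=-1)

logits = np.array([[2.0, 1.0, 0.1]])   # hypothetical raw outputs for 3 classes
y_true = np.array([[1.0, 0.0, 0.0]])   # one-hot label: the first class is correct

probs = softmax(logits)
loss = categorical_crossentropy(y_true, probs)
print(probs, loss)
```

The loss shrinks toward 0 as the probability given to the correct class approaches 1.
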

In [None]:
# Training the model

epochscount = 300
num_batch_size = 32

model.fit(X_train, y_train, batch_size=num_batch_size, epochs=epochscount, validation_data=(X_test, y_test), verbose=1)


In [None]:
validation_test_set_accuracy = model.evaluate(X_test,y_test,verbose=0)
print(validation_test_set_accuracy[1])

In [None]:
X_test[1]

In [None]:
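# predict_classes was removed in TensorFlow 2.6; take the argmax of the predicted probabilities instead
np.argmax(model.predict(X_test), axis=1)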
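
Turning predicted probabilities into class indices, and then into an accuracy score, can be sketched with plain NumPy. The `probs` and `y_true` arrays below are small made-up examples with 3 classes instead of our 10.

```python
import numpy as np

# Hypothetical model outputs for 3 samples (each row sums to 1)
probs = np.array([[0.1, 0.7, 0.2],
                  [0.8, 0.1, 0.1],
                  [0.3, 0.3, 0.4]])

# One-hot encoded true labels for the same samples
y_true = np.array([[0, 1, 0],
                   [0, 0, 1],
                   [0, 0, 1]])

predicted = np.argmax(probs, axis=1)   # most probable class per sample
actual = np.argmax(y_true, axis=1)     # position of the 1 in each one-hot row

accuracy = np.mean(predicted == actual)
print(predicted, actual, accuracy)
```
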

### Step 3: Finally We Predict an Audio File's Class Using Our CNN Deep Learning Model

We first preprocess the new audio data and then predict the class.


You can download example_sound1_children_playing.wav from the link here: https://www.epidemicsound.com/track/qDBYeKjWF0/

You can download example_sound2_siren.wav from the link here: https://www.epidemicsound.com/track/ByOBJyDp8P/


In [None]:

filename="UrbanSound8K/example_sound1_children_playing.wav"
sound_signal, sample_rate = librosa.load(filename, res_type='kaiser_fast') 
mfccs_features = librosa.feature.mfcc(y=sound_signal, sr=sample_rate, n_mfcc=45)
mfccs_scaled_features = np.mean(mfccs_features.T,axis=0)

In [None]:
print(mfccs_scaled_features)

In [None]:
mfccs_scaled_features = mfccs_scaled_features.reshape(1,-1)
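
The reshape is needed because the model expects a batch of samples, even when we only have one. A minimal sketch with a stand-in feature vector:

```python
import numpy as np

# Stand-in for the 45 averaged MFCC values of a single file
features = np.arange(45, dtype=float)

# reshape(1, -1): one row, and let NumPy infer the number of columns
batch = features.reshape(1, -1)

print(features.shape)  # a single vector
print(batch.shape)     # a batch containing one sample
```
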

In [None]:
mfccs_scaled_features.shape

In [None]:
print(mfccs_scaled_features)

In [None]:
print(mfccs_scaled_features.shape)

In [None]:
result_array = model.predict(mfccs_scaled_features)

In [None]:
result_array

In [None]:
result_classes = ["air_conditioner","car_horn","children_playing","dog_bark","drilling", "engine_idling", "gun_shot", "jackhammer", "siren", "street_music"]

result = np.argmax(result_array[0])
print(result_classes[result]) 
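
Besides the single best class, it is often useful to look at the runner-up classes as well. Here is a small sketch that lists the three most probable classes; the `probs` array is made up for illustration and stands in for the model's output.

```python
import numpy as np

class_names = ["air_conditioner", "car_horn", "children_playing", "dog_bark", "drilling",
               "engine_idling", "gun_shot", "jackhammer", "siren", "street_music"]

# Hypothetical prediction for one audio file: shape (1, 10)
probs = np.array([[0.02, 0.01, 0.60, 0.06, 0.02, 0.03, 0.01, 0.01, 0.04, 0.20]])

# Sort class indices by probability, highest first, and keep the top 3
top3 = np.argsort(probs[0])[::-1][:3]

for index in top3:
    print(class_names[index], probs[0][index])
```
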