# Advanced Certification in AIML
## A Program by IIIT-H and TalentSprint

In [None]:
#@title Explanation Video
from IPython.display import HTML

HTML("""<video width="854" height="480" controls>
  <source src="https://cdn.iiith.talentsprint.com/aiml/Experiment_related_data/Walkthrough/Hackathon_Voice_based.mp4" type="video/mp4">
</video>
""")

# Hackathon: Voice commands based E-commerce ordering system
The goal of the hackathon is to train your model on different types of voice data (such as studio data and your own team data) so that it can place orders based on user preferences.

## Grading = 40 Marks

### **Objectives:**

Stage 0 - Obtain Features from Audio samples

Stage 1 (22 Marks) - Define and train a CNN model on Studio data and deploy the model in the server 

Stage 2 (18 Marks) - Collect your voice samples (team data) and refine the classifier trained on Studio_data. Deploy the model in the server.

## Dataset Description

The data contains voice samples of classes - Zero, One, Two, Three, Four, Five. Each class is denoted by a numerical label from 0 to 5.

The audio files collected in a Studio dataset contain very few noise samples and all the files are in wav format.

The audio files recorded for the studio are saved with the following naming convention: 

● Class Representation + user_id + sample_ID (or noise + sample_ID)

> For example: a voice sample in which user b2 says “Zero” is saved as 0_b2_35.wav. Here ‘0’ is the label of the sample, b2 is the user ID, and 35 is the sample ID.
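
The naming convention above can be unpacked in a few lines. This is a minimal sketch that only handles the `label_user_sample.wav` form; the exact spelling of noise file names is not shown here, so they are deliberately not covered. The names `SampleName` and `parse_sample_name` are illustrative and not part of the notebook.

```python
from typing import NamedTuple


class SampleName(NamedTuple):
    label: int      # class label, 0 to 5
    user_id: str    # e.g. "b2"
    sample_id: int  # e.g. 35


def parse_sample_name(file_name: str) -> SampleName:
    """Split '<label>_<user_id>_<sample_id>.wav' into its three parts."""
    stem = file_name.rsplit(".", 1)[0]           # drop the '.wav' extension
    label, user_id, sample_id = stem.split("_")  # exactly three fields expected
    return SampleName(int(label), user_id, int(sample_id))


parsed = parse_sample_name("0_b2_35.wav")
print(parsed.label, parsed.user_id, parsed.sample_id)  # prints: 0 b2 35
```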




In [1]:
#@title Please run the setup to download the dataset

from IPython import get_ipython
ipython = get_ipython()
  
notebook= "Hackathon3 - Voice E-commerce Ordering System" #name of the notebook

def setup():
    # run_line_magic replaces the deprecated ipython.magic() call
    ipython.run_line_magic("sx", "wget https://cdn.iiith.talentsprint.com/aiml/Hackathon_data/B17_studio_rev_data.zip")
    ipython.run_line_magic("sx", "unzip B17_studio_rev_data.zip")
    print("Setup completed successfully")

setup()

Setup completed successfully


In [2]:
import os
import sys
import glob
import torch
import librosa
import warnings
import numpy as np
import torch.nn as nn
from time import sleep
from torch import optim
import torch.nn.functional as F
from torch.autograd import Variable
from sklearn.model_selection import train_test_split
warnings.filterwarnings('ignore')

## **Stage 0:** Obtain Features from Audio samples
---

### Generate features from an audio sample of '.wav' format
- Code is available to extract the features
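
Before reading the provided function, it helps to see why its output always has the same length. With the default parameters, each sample is forced to 15 frames of 30 MFCCs, and the first- and second-order deltas are stacked, giving 2 × 30 × 15 = 900 values. The following is a minimal pure-Python sketch of the pad-or-truncate step and of that arithmetic; the helper names `pad_or_truncate` and `feature_length` are illustrative and not part of the notebook.

```python
def pad_or_truncate(rows, frames=15):
    """Zero-pad or cut every row so that it holds exactly `frames` values."""
    fixed = []
    for row in rows:
        row = list(row[:frames])                # truncate rows that are too long
        row += [0.0] * (frames - len(row))      # zero-pad rows that are too short
        fixed.append(row)
    return fixed


def feature_length(n_mfcc=30, frames=15):
    """Length of the flattened vector: two delta orders stacked together."""
    return 2 * n_mfcc * frames


short = pad_or_truncate([[1.0, 2.0]] * 30)          # 2 frames, padded to 15
long = pad_or_truncate([list(range(40))] * 30)      # 40 frames, cut to 15
print(len(short[0]), len(long[0]), feature_length())  # prints: 15 15 900
```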

In [3]:
# Caution: Do not change the default parameters
def get_features(filepath, sr=8000, n_mfcc=30, n_mels=128, frames = 15):
    # The following function contains code to produce features of the audio sample.  
    y, sr = librosa.load(filepath, sr=sr)
    # Mel-scaled power spectrogram computed directly from the waveform
    S = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels)
    log_S = librosa.power_to_db(S,ref=np.max)
    features = librosa.feature.mfcc(S=log_S, n_mfcc=n_mfcc)
    if features.shape[1] < frames :
        features = np.hstack((features, np.zeros((n_mfcc, frames - features.shape[1]))))
    elif features.shape[1] > frames:
        features = features[:, :frames]

    # Find 1st order delta_mfcc
    delta1_mfcc = librosa.feature.delta(features, order=1)

    # Find 2nd order delta_mfcc
    delta2_mfcc = librosa.feature.delta(features, order=2)

    # Stacking delta_mfcc features in sequence horizontally (column wise)
    features = np.hstack((delta1_mfcc.flatten(), delta2_mfcc.flatten()))

    # Increase the dimension by inserting an axis along second dimension
    features = features.flatten()[:,np.newaxis]
    
    # Convert the numpy.ndarray to a float32 Tensor (the Variable wrapper is no longer needed)
    features = torch.from_numpy(features).float()
    return features

All the voice samples needed for training are present in the folder `"studio_data"`

In [4]:
%ls

B17_studio_rev_data.zip  sample_data/  studio_data/


## **Stage 1:** Define and train a CNN model on Studio data and deploy the model in the server

---


### a) Extract features of Studio data (4 Marks)

Load 'Studio data' and extract MFCC features

**Evaluation Criteria:**

* Complete the code in the load_data function
* The function should take the path of the folder containing the audio samples as input
* It should return the features of all audio samples in the specified folder as a single array (a list of lists or a 2-D NumPy array), together with their respective labels

In [5]:
def load_data(folder_path):
    """Return (features, labels) for every .wav file in folder_path."""
    # YOUR CODE HERE
    features = []
    labels = []
    # Loop through all files in the folder
    for file_name in os.listdir(folder_path):
        if file_name.endswith('.wav'):
            # Extract the (900, 1) feature tensor and store it as a NumPy array
            file_path = os.path.join(folder_path, file_name)
            feature_vector = get_features(file_path)
            features.append(feature_vector.numpy())
            # The class label is the part of the file name before the first underscore
            labels.append(int(file_name.split("_")[0]))
    return features, labels
    

Load data from studio_data folder for extracting all features and labels

In [6]:
studio_recorded_features, studio_recorded_labels = load_data('/content/studio_data')

Use train_test_split for splitting the train and test data

In [7]:
# YOUR CODE HERE
X_train, X_test, y_train, y_test = train_test_split(
    studio_recorded_features, studio_recorded_labels, test_size=0.2, random_state=40)


Load the dataset with DataLoader
- Refer to [torch.utils.data.TensorDataset](https://pytorch.org/docs/stable/data.html#torch.utils.data.TensorDataset)
- Refer to [torch.utils.data.DataLoader](https://pytorch.org/docs/stable/data.html#torch.utils.data.DataLoader)

In [8]:
# YOUR CODE HERE for the DataLoader
from torch.utils.data import DataLoader, TensorDataset
batch_size = 28

# Stack the per-sample arrays first; building a tensor from a list of arrays is slow
train_set = TensorDataset(torch.FloatTensor(np.array(X_train)), torch.LongTensor(y_train))
test_set = TensorDataset(torch.FloatTensor(np.array(X_test)), torch.LongTensor(y_test))

# Shuffle only the training data; evaluation does not depend on sample order
train_loader = DataLoader(train_set, batch_size=batch_size, shuffle=True)
test_loader = DataLoader(test_set, batch_size=batch_size, shuffle=False)

In [9]:
torch.FloatTensor(X_train).shape

torch.Size([3183, 900, 1])

### b) Define your CNN architecture (4 Marks)

[Hint](https://pytorch.org/docs/stable/generated/torch.nn.Conv1d.html)
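
Since each sample has shape (900, 1), `nn.Conv1d` reads it as 900 input channels with a sequence length of 1. The output length of a 1-D convolution follows the formula in the PyTorch documentation, L_out = floor((L_in + 2·padding − dilation·(kernel_size − 1) − 1) / stride + 1). The sketch below evaluates that formula in plain Python; the function name `conv1d_out_length` is illustrative. It shows that with length 1 only `kernel_size=1` fits without padding, whereas a hypothetical (1, 900) layout would leave room for wider kernels.

```python
def conv1d_out_length(l_in, kernel_size, stride=1, padding=0, dilation=1):
    """Output length of a 1-D convolution (same formula applies to MaxPool1d)."""
    return (l_in + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1


# Layout used in this notebook: 900 channels, sequence length 1
print(conv1d_out_length(1, kernel_size=1))               # prints: 1

# Hypothetical alternative layout: 1 channel, sequence length 900
print(conv1d_out_length(900, kernel_size=3))             # prints: 898
print(conv1d_out_length(900, kernel_size=3, padding=1))  # prints: 900
print(conv1d_out_length(900, kernel_size=3, stride=2, padding=1))  # prints: 450
```

With a sequence length of 1, a kernel of size 1 acts on each sample like a fully connected layer across the 900 channels, and a max-pool of size 1 leaves the values unchanged.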

In [10]:
## Define your CNN Architecture
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()

        # Sample Convolution Layer 1 
        
        self.conv1 = nn.Conv1d(in_channels=900, out_channels=400, kernel_size=1)
        self.bn1 = nn.BatchNorm1d(400)
        self.relu1 = nn.ReLU()

        # Sample Maxpool for the Convolutional Layer 1
        self.maxpool1 = nn.MaxPool1d(kernel_size=1)

        # Sample Dropout Layer
        self.dropout = nn.Dropout(p=0.25)

        # YOUR CODE HERE for defining more number of Convolutional layers with Maxpool as required (Hint: Use at least 2 more convolutional layers for better performance)
        self.conv2 = nn.Conv1d(in_channels=400,out_channels=200, kernel_size=1) 
        self.bn2 = nn.BatchNorm1d(200)
        self.relu2 = nn.ReLU()
        self.maxpool2 = nn.MaxPool1d(kernel_size=1)
        
        self.conv3 = nn.Conv1d(in_channels=200, out_channels=100, kernel_size=1) 
        self.bn3 = nn.BatchNorm1d(100)
        self.relu3 = nn.ReLU()
        self.maxpool3 = nn.MaxPool1d(kernel_size=1)
        # YOUR CODE HERE for defining the Fully Connected Layer and also define LogSoftmax

        self.fc1 = nn.Linear(100,50)
        self.fc2 = nn.Linear(50,25)
        self.fc3 = nn.Linear(25,6)
        self.logsoftmax = nn.LogSoftmax(dim=1)

    def forward(self, x):
        # x has shape (batch, 900, 1): 900 channels, sequence length 1
        out = self.maxpool1(self.relu1(self.bn1(self.conv1(x))))
        out = self.dropout(out)

        out = self.maxpool2(self.relu2(self.bn2(self.conv2(out))))
        out = self.dropout(out)

        out = self.maxpool3(self.relu3(self.bn3(self.conv3(out))))
        out = self.dropout(out)

        # Flatten (batch, 100, 1) to (batch, 100) for the fully connected layers
        out = out.view(-1, 100)
        out = F.relu(self.fc1(out))
        out = F.relu(self.fc2(out))
        out = self.fc3(out)

        # YOUR CODE HERE for returning the output of LogSoftmax after applying Fully Connected Layer
        out = self.logsoftmax(out)
        return out

In [11]:
# To run the training on GPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(device)

cuda


In [12]:
model = Net()
model = model.to(device)

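#criterion = # YOUR CODE HERE : Explore and declare loss function
# The network already ends in LogSoftmax, so NLLLoss is the matching criterion
# (CrossEntropyLoss would apply log-softmax a second time, with the same result)
criterion = nn.NLLLoss()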

#optimizer = # YOUR CODE HERE : Explore on the optimizer and define with the learning rate
optimizer = optim.Adam(model.parameters(), lr = 0.001)
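
The network ends in `LogSoftmax`, and `nn.CrossEntropyLoss` applies a log-softmax of its own, so the two are sometimes paired by accident. The sketch below checks in plain Python that log-softmax is idempotent, which is why `nn.NLLLoss` and `nn.CrossEntropyLoss` give the same loss value on outputs that are already log-probabilities. The helper names and the example scores are illustrative only.

```python
import math


def log_softmax(scores):
    """Numerically stable log-softmax of a list of scores."""
    m = max(scores)
    log_sum = m + math.log(sum(math.exp(s - m) for s in scores))
    return [s - log_sum for s in scores]


def nll(log_probs, target):
    """Negative log-likelihood of the target class."""
    return -log_probs[target]


logits = [2.0, -1.0, 0.5, 0.0, 1.5, -0.5]   # made-up scores for six classes
once = log_softmax(logits)                  # what the network outputs
twice = log_softmax(once)                   # what a second log-softmax would see

# Both values agree up to floating-point rounding
print(nll(once, target=0), nll(twice, target=0))
```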

### c) Train and classify on the studio_data (3 Marks)

The goal here is to train the Model on voice samples collected in studio data and validate it continuously to calculate the loss and accuracy for the train dataset across each epoch.

Iterate over the batches of voice samples in the train_loader and perform the following steps.

1. First, zero out the gradients using zero_grad()

2. Move the data to the GPU (if available) and pass it to the model

3. Calculate the loss using a loss function

4. Perform the backward pass using backward() to compute the gradients

5. Update the weights with optimizer.step() and get the predictions using torch.max()

6. Calculate the accuracy on the train dataset
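
One detail of step 3 is easy to miss when reporting an epoch loss: `nn.CrossEntropyLoss` and `nn.NLLLoss` return the mean over the batch by default, so summing these values and dividing by the number of samples gives a figure roughly batch-size times smaller than the true per-sample mean. A minimal sketch of a sample-weighted average, using made-up batch losses; the function name `epoch_mean_loss` is illustrative.

```python
def epoch_mean_loss(batch_mean_losses, batch_sizes):
    """Per-sample mean loss of an epoch from per-batch mean losses."""
    total = sum(loss * n for loss, n in zip(batch_mean_losses, batch_sizes))
    return total / sum(batch_sizes)


# Made-up values: two full batches of 28 and a smaller final batch of 19
losses = [1.5, 1.2, 0.9]
sizes = [28, 28, 19]

weighted = epoch_mean_loss(losses, sizes)   # true per-sample mean
naive = sum(losses) / sum(sizes)            # batch means divided by sample count
print(round(weighted, 3), round(naive, 3))  # prints: 1.236 0.048
```

Either number can be used to monitor whether training is converging; only the weighted one is comparable across different batch sizes.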


In [14]:
# YOUR CODE HERE. This will take time

# Record loss and accuracy of the train dataset
def train_model(num_epochs, model, train_loader, train_set):
    """Train `model` using the global `criterion`, `optimizer` and `device`."""
    model.train()
    # Lists to store the loss and accuracy of each epoch
    train_loss, train_accuracy = [], []
    for epoch in range(num_epochs):
        correct, iter_loss = 0, 0.0
        for inputs, labels in train_loader:
            inputs, labels = inputs.to(device), labels.to(device)
            optimizer.zero_grad()
            outputs = model(inputs)
            loss = criterion(outputs, labels)
            iter_loss += loss.item()
            loss.backward()
            optimizer.step()
            _, predicted = torch.max(outputs, 1)
            correct += (predicted == labels).sum().item()
        # Sum of batch-mean losses divided by the number of samples,
        # so the reported value is roughly (mean loss / batch size)
        train_loss.append(iter_loss / len(train_set))
        train_accuracy.append(100 * correct / len(train_set))
        print('Epoch {}/{}, Training Loss: {:.3f}, Training Accuracy: {:.3f}'.format(epoch+1, num_epochs, train_loss[-1], train_accuracy[-1]))
    return model

In [15]:
trained_model = train_model(20,model,train_loader, train_set)

Epoch 1/20, Training Loss: 0.054, Training Accuracy: 37.606
Epoch 2/20, Training Loss: 0.038, Training Accuracy: 57.053
Epoch 3/20, Training Loss: 0.032, Training Accuracy: 65.253
Epoch 4/20, Training Loss: 0.029, Training Accuracy: 67.923
Epoch 5/20, Training Loss: 0.026, Training Accuracy: 72.604
Epoch 6/20, Training Loss: 0.024, Training Accuracy: 76.249
Epoch 7/20, Training Loss: 0.022, Training Accuracy: 77.160
Epoch 8/20, Training Loss: 0.020, Training Accuracy: 80.773
Epoch 9/20, Training Loss: 0.019, Training Accuracy: 80.867
Epoch 10/20, Training Loss: 0.017, Training Accuracy: 82.532
Epoch 11/20, Training Loss: 0.016, Training Accuracy: 84.354
Epoch 12/20, Training Loss: 0.016, Training Accuracy: 84.386
Epoch 13/20, Training Loss: 0.016, Training Accuracy: 83.412
Epoch 14/20, Training Loss: 0.015, Training Accuracy: 85.328
Epoch 15/20, Training Loss: 0.013, Training Accuracy: 87.842
Epoch 16/20, Training Loss: 0.014, Training Accuracy: 86.773
Epoch 17/20, Training Loss: 0.012

### d) Testing Evaluation for CNN model (3 Marks)

Evaluate model with the given test data

1. Transform and load the test samples.

2. Pass the test data through the model (network) to get the outputs

3. Get the predictions from a maximum value using torch.max

4. Compare with the actual labels and get the count of the correct labels

5. Calculate the accuracy based on the count of correct labels
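
Overall accuracy hides which digits get confused with each other. Once the true and predicted labels are available as plain lists, a confusion matrix can be tallied in a few lines. This is a minimal sketch with made-up labels; the helper names `confusion_matrix` and `accuracy` are illustrative and not part of the notebook.

```python
def confusion_matrix(true_labels, predicted_labels, num_classes=6):
    """Rows are true classes, columns are predicted classes."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(true_labels, predicted_labels):
        matrix[t][p] += 1
    return matrix


def accuracy(matrix):
    """Percentage of samples on the diagonal of the confusion matrix."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return 100.0 * correct / total


# Made-up labels: one "Zero" sample is misheard as "One"
y_true = [0, 1, 2, 3, 4, 5, 0, 1]
y_pred = [0, 1, 2, 3, 4, 5, 1, 1]

cm = confusion_matrix(y_true, y_pred)
print(cm[0])         # prints: [1, 1, 0, 0, 0, 0]
print(accuracy(cm))  # prints: 87.5
```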

### **Expected testing accuracy is above 80%**

In [16]:
# YOUR CODE HERE to test the model
def test_model(model, test_loader, test_set):
    """Print the test accuracy and return the true and predicted labels as lists."""
    model.eval()
    correct = 0
    test_labels, test_predictions = [], []
    # Gradients are not needed for evaluation
    with torch.no_grad():
        for inputs, labels in test_loader:
            inputs = inputs.to(device)
            labels = labels.to(device)
            output = model(inputs)
            _, predicted = torch.max(output, 1)
            correct += (predicted == labels).sum().item()
            test_labels.extend(labels.tolist())
            test_predictions.extend(predicted.tolist())
    accuracy = 100 * (correct / len(test_set))
    print(" Accuracy of test data", accuracy)
    return test_labels, test_predictions


In [17]:
test_labels, test_predictions = test_model(trained_model, test_loader, test_set)

 Accuracy of test data 84.17085427135679


### e) Save and download your model (2 Marks)

**Save your model trained on studio data**

* Save the state dictionary of the classifier (use PyTorch only). It will be useful when integrating the model into the web application

 [Hint](https://pytorch.org/tutorials/beginner/saving_loading_models.html)
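
A saved checkpoint is only useful if it can be loaded back into a model of the same architecture. The sketch below shows one way to do the round trip when the state dictionary is wrapped under a `'net_dict'` key. `TinyNet` is a hypothetical stand-in for your own network, and the helper names and file name are assumptions made for illustration.

```python
import os
import tempfile

import torch
import torch.nn as nn


class TinyNet(nn.Module):
    """Hypothetical stand-in for the notebook's network, kept small on purpose."""

    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(4, 6)

    def forward(self, x):
        return self.fc(x)


def save_checkpoint(model, path):
    # Store only the parameters, wrapped in a dict under the 'net_dict' key
    torch.save({'net_dict': model.state_dict()}, path)


def load_checkpoint(model, path, device='cpu'):
    # map_location lets a model trained on GPU be loaded on a CPU-only machine
    checkpoint = torch.load(path, map_location=device)
    model.load_state_dict(checkpoint['net_dict'])
    model.eval()  # switch BatchNorm and Dropout layers to inference behaviour
    return model


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'demo_model.t1')
    original = TinyNet()
    save_checkpoint(original, path)
    restored = load_checkpoint(TinyNet(), path)

print(torch.equal(original.fc.weight, restored.fc.weight))
```

The model instance passed to the loader must be built from the same class definition that was used for training, since the state dictionary contains parameters only.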

In [18]:
### YOUR CODE HERE for saving the CNN model
state = {'net_dict' : trained_model.state_dict()}
torch.save(state, 'studio_model.t1')

Download your trained model using the code below
* Give the path of model file to download through the browser

In [None]:
from google.colab import files
files.download('studio_model.t1')

### f) Deploy and evaluate your model trained on Studio Data in the server (6 Marks).

(This can be done on the day of the Hackathon once the login username and password are provided by the mentors in the lab)

Deploy your model on the server; check the hackathon document (2-Server Access and File transfer For Voice based e-commerce ordering.pdf) for details.

To order a product in the user interface, go through the document (3-Hackathon_II Application Interface Documentation.pdf) for details.


**Evaluation Criteria: Four consecutive utterances should be predicted correctly by the model**

- There are two stages in the e-commerce ordering application    
    - Ordering Product
    - Selecting the e-commerce platform
- If both stages are cleared as per the evaluation criteria, you will get full marks. Otherwise, you will see a reduction in marks
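
To get a feel for the "four consecutive utterances" criterion before going to the server, a local self-check can count the longest run of correct predictions. This is only a sketch of one possible reading of the criterion; the actual evaluation happens in the web application, and the function names and example sequences are made up for illustration.

```python
def longest_correct_streak(expected, predicted):
    """Length of the longest run of consecutive correct predictions."""
    best = current = 0
    for e, p in zip(expected, predicted):
        current = current + 1 if e == p else 0
        best = max(best, current)
    return best


def passes_stage(expected, predicted, required=4):
    """True if at least `required` consecutive utterances were recognised."""
    return longest_correct_streak(expected, predicted) >= required


spoken = [3, 1, 4, 1, 5, 2]
print(passes_stage(spoken, [3, 0, 4, 1, 5, 2]))  # prints: True
print(passes_stage(spoken, [3, 1, 4, 0, 5, 2]))  # prints: False
```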

## **Stage 2:** Collect your voice samples and refine the classifier trained on studio_data and Team_data
---

### a) Collect your Team Voice Samples and extract features (6 Marks)

(This can be done on the day of the Hackathon once the login username and password are given by the mentors in the lab)

* In order to collect the team data, ensure the server is active (2-Server Access and File transfer For Voice based e-commerce ordering.pdf)

* Refer to the document "3-Hackathon_II Application Interface Documentation.pdf" for collecting your team voice samples. These will be stored on your server

**Evaluation Criteria:**
* Load 'Team_data' and extract features
* Combine features of team data with the extracted features of studio data
* Split the combined features into train and test data
* Load the dataset with DataLoader
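
When combining the two datasets, it is worth building new lists rather than modifying the studio lists, so that re-running a cell cannot append the team data twice, and then checking how many samples each class has. A minimal sketch with placeholder data; the function name `combine` and the tiny example lists are illustrative only.

```python
from collections import Counter


def combine(studio_features, studio_labels, team_features, team_labels):
    """Return new combined lists and leave the input lists untouched."""
    features = list(studio_features) + list(team_features)
    labels = list(studio_labels) + list(team_labels)
    return features, labels


# Placeholder data standing in for the real feature arrays
studio_x, studio_y = ["s0", "s1", "s2"], [0, 1, 2]
team_x, team_y = ["t0", "t1"], [0, 5]

all_x, all_y = combine(studio_x, studio_y, team_x, team_y)
print(len(all_x), len(studio_x))       # prints: 5 3
print(sorted(Counter(all_y).items()))  # prints: [(0, 2), (1, 1), (2, 1), (5, 1)]
```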

In [20]:
!mkdir team_data

In [21]:
# Replace <YOUR_GROUP_ID> with your Username given in the lab
!wget -r -A .wav https://aiml-sandbox1.talentsprint.com/audio_recorder/b20h2g13/team_data/ -nH --cut-dirs=100  -P ./team_data

--2023-04-04 14:55:59--  https://aiml-sandbox1.talentsprint.com/audio_recorder/b20h2g13/team_data/
Resolving aiml-sandbox1.talentsprint.com (aiml-sandbox1.talentsprint.com)... 139.162.203.12
Connecting to aiml-sandbox1.talentsprint.com (aiml-sandbox1.talentsprint.com)|139.162.203.12|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
Saving to: ‘./team_data/index.html.tmp’

index.html.tmp          [ <=>                ]  22.43K  --.-KB/s    in 0s      

2023-04-04 14:55:59 (304 MB/s) - ‘./team_data/index.html.tmp’ saved [22971]

Loading robots.txt; please ignore errors.
--2023-04-04 14:55:59--  https://aiml-sandbox1.talentsprint.com/robots.txt
Reusing existing connection to aiml-sandbox1.talentsprint.com:443.
HTTP request sent, awaiting response... 404 Not Found
2023-04-04 14:55:59 ERROR 404: Not Found.

Removing ./team_data/index.html.tmp since it should be rejected.

--2023-04-04 14:55:59--  https://aiml-sandbox1.talentsprint.com/audio_re

In [22]:
# YOUR CODE HERE to Load data from teamdata folder for extracting all features and labels
team_recorded_features, team_recorded_labels = load_data('/content/team_data')
#team_recorded_labels = np.array(team_recorded_labels).astype('float32')

In [23]:
print(len(team_recorded_features))
len(studio_recorded_features)

175


3979

In [24]:
# Combine the features of all voice samples (studio_data and teamdata)
# YOUR CODE HERE
# Note: extend() modifies the studio lists in place, so run this cell only once
studio_recorded_features.extend(team_recorded_features)
studio_recorded_labels.extend(team_recorded_labels)
print(len(studio_recorded_features))
len(studio_recorded_labels)

4154


4154

In [25]:
# YOUR CODE HERE to split the combined features into train and test data (Hint: Use train_test_split)
X_train,X_test,y_train, y_test= train_test_split(studio_recorded_features,studio_recorded_labels,test_size=0.2,random_state=40)

In [26]:
# YOUR CODE HERE to load the dataset with DataLoader
train_set = TensorDataset(torch.FloatTensor(np.array(X_train)), torch.LongTensor(y_train))
test_set = TensorDataset(torch.FloatTensor(np.array(X_test)), torch.LongTensor(y_test))
# Shuffle only the training data; evaluation does not depend on sample order
train_loader = DataLoader(train_set, batch_size=batch_size, shuffle=True)
test_loader = DataLoader(test_set, batch_size=batch_size, shuffle=False)

### b) Classify and download the model (6 Marks)

The goal here is to train and test your model on all voice samples collected in studio and team data

**Evaluation Criteria:**
* Refine your classifier (if needed)
* Train your model on the extracted train data
* Test your model on the extracted test data
* Save and download the trained model

### **Expected testing accuracy is above 80%**

In [27]:
# YOUR CODE HERE for refining your classifier (if needed)
model = Net()
model = model.to(device)
criterion = nn.NLLLoss()  # matches the LogSoftmax output of Net
optimizer = optim.Adam(model.parameters(), lr = 0.001)

In [28]:
# YOUR CODE HERE to train your model
trained_model = train_model(50,model,train_loader, train_set)
# Record loss and accuracy of the train dataset

Epoch 1/50, Training Loss: 0.056, Training Accuracy: 35.841
Epoch 2/50, Training Loss: 0.037, Training Accuracy: 59.103
Epoch 3/50, Training Loss: 0.032, Training Accuracy: 65.995
Epoch 4/50, Training Loss: 0.028, Training Accuracy: 71.351
Epoch 5/50, Training Loss: 0.025, Training Accuracy: 73.187
Epoch 6/50, Training Loss: 0.023, Training Accuracy: 77.189
Epoch 7/50, Training Loss: 0.023, Training Accuracy: 76.587
Epoch 8/50, Training Loss: 0.020, Training Accuracy: 80.319
Epoch 9/50, Training Loss: 0.019, Training Accuracy: 80.620
Epoch 10/50, Training Loss: 0.018, Training Accuracy: 81.162
Epoch 11/50, Training Loss: 0.018, Training Accuracy: 82.395
Epoch 12/50, Training Loss: 0.017, Training Accuracy: 83.057
Epoch 13/50, Training Loss: 0.017, Training Accuracy: 83.780
Epoch 14/50, Training Loss: 0.014, Training Accuracy: 86.277
Epoch 15/50, Training Loss: 0.015, Training Accuracy: 85.495
Epoch 16/50, Training Loss: 0.014, Training Accuracy: 86.639
Epoch 17/50, Training Loss: 0.014

In [29]:
# YOUR CODE HERE to test your model
test_labels, test_predictions = test_model(trained_model, test_loader, test_set)

 Accuracy of test data 84.11552346570397


**Save your trained model**

* Save the state dictionary of the classifier (use PyTorch only). It will be useful when integrating the model into the web application

 [Hint](https://pytorch.org/tutorials/beginner/saving_loading_models.html)

In [30]:
### YOUR CODE HERE for saving the CNN model
state = {'net_dict' : trained_model.state_dict()}
torch.save(state, 'final_model.t1')

Download your trained model using the code below
* Give the path of model file to download through the browser

In [31]:
from google.colab import files
files.download('final_model.t1')

### c) Deploy and evaluate your model trained on Studio Data + Team Data in the server (6 Marks).

(This can be done on the day of the Hackathon once the login username and password are provided by the mentors in the lab)

Deploy your model on the server; check the hackathon document (2-Server Access and File transfer For Voice based e-commerce ordering.pdf) for details.

To order a product in the user interface, go through the document (3-Hackathon_II Application Interface Documentation.pdf) for details.


**Evaluation Criteria: Four consecutive utterances should be predicted correctly by the model**

- There are two stages in the e-commerce ordering application    
    - Ordering Product
    - Selecting the e-commerce platform
- If both stages are cleared as per the evaluation criteria, you will get full marks. Otherwise, you will see a reduction in marks