# Assignment 2: Spectrogram classification

**Deadline**: 21/2/24

**Submission: Submit a PDF export of the completed notebook as well as the .ipynb file.**


**General**:
This assignment aims to practice designing and training neural networks. The task the networks solve is "predicting"/"inferring" a signal type from its spectrogram image.
You will explore two neural network architectures. Starter code is provided to help with data processing and make it a bit easier.

You may modify the starter code as you see fit, including changing the signatures of functions and adding/removing helper functions. However, please ensure you adequately explain what you are doing and why.

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import collections
import scipy.io
import cv2
from google.colab.patches import cv2_imshow

import torch
import torch.nn as nn
import torch.optim as optim

## Question 1. Data (15%)

With any machine learning problem, the first thing that we would want to do
is to get an intuitive understanding of what our data looks like. Download the file
`Data set` from the course page on Moodle and upload it to Google Drive.
Then, mount Google Drive from your Google Colab notebook:

In [None]:
from google.colab import drive
drive.mount('/content/gdrive')

# Find the path to the file:
path = '/content/gdrive/My Drive/assignment2' # TODO - UPDATE ME!

### Part (a) -- 3%

Load the training and test data, and separate the training data into training and validation sets.
Create the following NumPy arrays for each of the train, validation, and test splits:

1. `train_data`, `valid_data`, `test_data`, all of which should be of shape `[N, 128, 128]`. The images are grayscale, so there is a single color channel and no separate channel dimension. The dimensions of these NumPy arrays are as follows:

- `N` - the number of rows allocated to train, valid, or test
- `128` - the height of each spectrogram (i.e., the number of freq. points)
- `128` - the width of each spectrogram (i.e., the number of time samples)

2. `train_labels`, `valid_labels`, `test_labels`, all of which should be of shape `[N,]`. The dimension of these NumPy arrays is as follows:

- `N` - the number of rows allocated to train, valid, or test







The pixel intensities are stored as an integer between 0 and 255.
Make sure you normalize your images, namely, divide the intensities by 255 so that you have floating-point values between 0 and 1. Then, subtract 0.5
so that the elements of `train_data`, `valid_data` and `test_data` are between -0.5 and 0.5.
**Note that this step actually makes a huge difference in training!**
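
As a minimal sketch of the normalization described above, the snippet below applies it to a toy 2x2 array. The array `raw` and its values are made up for illustration and are not part of the data set.

```python
import numpy as np

# Toy 2x2 "image" with 8-bit intensities (hypothetical values).
raw = np.array([[0, 64], [128, 255]], dtype=np.uint8)

# Scale to [0, 1], then shift so the values lie in [-0.5, 0.5].
normalized = raw.astype(np.float32) / 255.0 - 0.5

print(normalized.min(), normalized.max())
```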

This function might take a while to run, and it can take several minutes just to load the files from Google Drive. If you want to avoid running this code multiple times, you can save your NumPy arrays and load them later:
https://docs.scipy.org/doc/numpy/reference/generated/numpy.save.html
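
One possible way to cache an array with `np.save` and `np.load` is sketched below. The file name `train_data.npy` and the temporary directory are assumptions made so the sketch runs anywhere; on Colab you would more likely point the path at a folder on the mounted Drive.

```python
import os
import tempfile

import numpy as np

# Stand-in for an array returned by the loading function (hypothetical size).
train_data = np.random.rand(4, 128, 128).astype(np.float32) - 0.5

# A temporary directory is used here; replace it with a Drive folder on Colab.
cache_dir = tempfile.mkdtemp()
cache_file = os.path.join(cache_dir, "train_data.npy")

np.save(cache_file, train_data)   # write the array once
restored = np.load(cache_file)    # reload it in a later session

print(restored.shape)
```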

In [None]:
import glob
from PIL import Image
folder_path = '/content/gdrive/My Drive/assignment2'
def sort_data(folder_path):
  train_path = f'{folder_path}/train_data/*.jpg'
  test_path = f'{folder_path}/test_data/*.jpg'
  train_images = {}
  test_images = {}
  for file in glob.glob(train_path):
      filename = file.split("/")[-1] # get the name of the .jpg file (the label is the part before the first '_')
      image = cv2.imread(file, cv2.IMREAD_GRAYSCALE)
      img = cv2.resize(image, dsize=(128, 128), interpolation=cv2.INTER_CUBIC)
      img = (img/255) - 0.5
      train_images[filename] = img

  for file in glob.glob(test_path):
      filename = file.split("/")[-1] # get the name of the .jpg file (the label is the part before the first '_')
      image = cv2.imread(file, cv2.IMREAD_GRAYSCALE)
      img = cv2.resize(image, dsize=(128, 128), interpolation=cv2.INTER_CUBIC)
      img = (img/255) - 0.5
      test_images[filename] = img

  sort_train_dict = dict(sorted(train_images.items()))
  sort_test_dict = dict(sorted(test_images.items()))
  train = np.array(list(sort_train_dict.values()))
  test = np.array(list(sort_test_dict.values()))
  train_label = np.array(list(sort_train_dict.keys()))
  test_labels = np.array(list(sort_test_dict.keys()))
  reindex = np.random.permutation(len(train))
  train = train[reindex]
  train_label = train_label[reindex]
  tr_label, ts_label = [], []
  for ii in range(len(train_label)):
    tr_label.append(train_label[ii].split('_')[0])
  for ii in range(len(test_labels)):
    ts_label.append(test_labels[ii].split('_')[0])
  train_label = np.array(tr_label)
  test_labels = np.array(ts_label)
  train_data, valid_data = train[int(train.shape[0] * 0.15):], train[:int(train.shape[0] * 0.15)]
  train_labels, valid_labels = train_label[int(train_label.shape[0] * 0.15):], train_label[:int(train_label.shape[0] * 0.15)]
  return train_data, train_labels, valid_data, valid_labels, test, test_labels

### Part (b) -- 3%

We want to train a model that determines the signal type from a spectrogram. Therefore, our model will take in a spectrogram image.

Write a function `generate_plots()` that takes one of the data sets that you produced in part (a) and plots spectrograms from the different classes. The function should produce 12 subplots of spectrogram images that together cover all classes.

Note: While at this stage we are working with NumPy arrays, later on, we will need to convert this NumPy array into a PyTorch tensor with shape [N, 128, 128].

Include the result with your PDF submission.


In [None]:
# Your code goes here



# Run this code, include the result with your PDF submission!!
print(train_data.shape) # should be [N, 128, 128]
print(generate_plots(train_data).shape) # should be [N, 128, 128]
# Please set idx to the first 2 digits of your ID (if both of your IDs start with 0, change it to 1)
plt.imshow(generate_plots(train_data[idx*30])) # should show a spectrogram; 30 is just an example.


### Part (c) -- 3%

Why is it important that our data set be ***balanced***? In other words, suppose we created
a data set where 99% of the images are Gaussian spectrograms, and
1% of the images belong to the other classes. Why could this be a problem?
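
Before answering, it can help to check how balanced a label array actually is. The sketch below counts labels with `collections.Counter`. The list `labels` is a made-up stand-in; in the notebook you would presumably pass `train_labels` instead.

```python
import collections

# Hypothetical label list standing in for `train_labels`.
labels = (["Gaussian"] * 5 + ["Pulse"] * 4
          + ["SingleFrequency"] * 6 + ["ThreeFrequency"] * 5)

counts = collections.Counter(labels)
total = sum(counts.values())

# Fraction of the data set that belongs to each class.
fractions = {name: count / total for name, count in counts.items()}

for name in sorted(fractions):
    print(f"{name}: {counts[name]} samples ({fractions[name]:.0%})")
```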

**Write your explanation here:**

\

\

### Part (d) -- 3%

Our neural network will take spectrogram images as input and predict their class. Since the four classes are given as strings, we want to convert them into numbers, with a unique number assigned to each class.

**Complete** the helper function `convert_class_to_number` so that it replaces each class label with its assigned number.
Examples of how this function should operate are detailed in the code below.

You can use the defined `vocab`, `labels2num_vocab`,
and `num2labels_vocab` in your code.
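
To make the three objects concrete, here is a small sketch of what they contain for a toy label list. The list `labels` is hypothetical; the class names are the ones used in the docstring example of `convert_class_to_number`.

```python
# Hypothetical labels; in the notebook the vocabulary is built from `train_labels`.
labels = ["Pulse", "Gaussian", "ThreeFrequency", "SingleFrequency", "Pulse"]

vocab = sorted(set(labels))                  # unique labels in alphabetical order
num2labels_vocab = dict(enumerate(vocab))    # index => label
labels2num_vocab = {label: index for index, label in num2labels_vocab.items()}  # label => index

print(vocab)
print(labels2num_vocab["Pulse"], num2labels_vocab[3])
```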

In [None]:
# A list of all the labels in the data set. We will assign a unique
# identifier for each of these labels.
vocab = sorted(set(train_labels))
num2labels_vocab = dict(enumerate(vocab)) # A mapping of index => label (string)
labels2num_vocab = {word:index for index, word in num2labels_vocab.items()} # A mapping of label => its index

def convert_class_to_number(labels):
    """
    This function takes a list of labels
    and returns a new list with the same structure, but where each label
    is replaced by its index in `labels2num_vocab`.

    Example:
    >>> convert_class_to_number([['Pulse', 'SingleFrequency', 'Pulse', 'Gaussian', 'ThreeFrequency'], ['ThreeFrequency', 'Pulse', 'Gaussian', 'SingleFrequency']])
    [[1, 2, 1, 0, 3], [3, 1, 0, 2]]
    """

    # Write your code here



### Part (e) -- 3%
Since the labels in the data comprise $4$ distinct classes, our task boils down to classification where the label space $\mathcal{S}$ is of cardinality $|\mathcal{S}|=4$, while our input, which consists of spectrogram data, is treated as a vector of size $16384\times 1$.


**Implement** a function `create_onehot` yourself, which takes the data in index notation and outputs it in one-hot notation.
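
To illustrate the two notations, the sketch below converts a short index vector into one-hot rows by indexing an identity matrix. This is only one possible approach, shown on made-up indices; it is not necessarily how your `create_onehot` needs to be written, and it does not handle the 2D case.

```python
import numpy as np

indices = np.array([1, 2, 1, 0, 3])   # index notation for 4 classes

# Row i of the 4x4 identity matrix is the one-hot vector of class i.
onehot = np.eye(4)[indices]

print(onehot)
```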

Start by reviewing the helper function, which is given to you:

In [None]:
def create_onehot(data):
    """
    Convert one batch of data in the index notation into its corresponding onehot
    notation. Remember, the function should work for `st`, the label batch produced in `get_batch`.

    input - vector with shape D (1D or 2D)
    output - vector with shape (D,4)
    """

    # Write your code here



## Question 2. Model architecture (30%)

In this part we will look at two model architectures: a MultiLayer Perceptron (MLP) and a Convolutional Neural Network (CNN).

Since the labels comprise $4$ distinct classes, our task boils down to classification where the label space $\mathcal{S}$ is of cardinality $|\mathcal{S}|=4$, while our input is a spectrogram matrix of size $128 \times 128$.

We build the model in PyTorch. Since PyTorch uses automatic
differentiation, we only need to write the *forward pass* of our
model.

### Part (a) -- Multilayer perceptron (MLP) (15%)

Please provide a detailed diagram that best describes this model’s architecture. Specify the number of layers, weights, etc.



This link will help you understand how to upload an image to Google Colab:
[https://medium.com/analytics-vidhya/embedding-your-image-in-google-colab-markdown-3998d5ac2684](https://medium.com/analytics-vidhya/embedding-your-image-in-google-colab-markdown-3998d5ac2684)

This is an example of how to change the width and height of the image:

`<img src="image path" width="400px" height="200px" />`




In [None]:
class PyTorchMLP(nn.Module):
    def __init__(self, num_hidden=100):
        super(PyTorchMLP, self).__init__()
        self.layer1 = nn.Linear(128*128, num_hidden)
        self.layer2 = nn.Linear(num_hidden, 4)
        self.num_hidden = num_hidden
    def forward(self, inp):
        inp = inp.reshape([-1, 128*128])
        # Note that we will be using the nn.CrossEntropyLoss(), which computes the softmax operation internally, as loss criterion,
        # so the model returns the raw logits.
        hidden = self.layer1(inp)
        output = self.layer2(hidden)
        return output
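
When annotating the diagram with the number of weights, a quick count like the one below may be useful. It is a plain-Python sketch that assumes the default `num_hidden=100` and counts one weight matrix and one bias vector per `nn.Linear` layer.

```python
in_features = 128 * 128   # flattened spectrogram
num_hidden = 100          # default value in PyTorchMLP
num_classes = 4

# Each linear layer has (inputs x outputs) weights plus one bias per output.
layer1_params = in_features * num_hidden + num_hidden
layer2_params = num_hidden * num_classes + num_classes
total_params = layer1_params + layer2_params

print(layer1_params, layer2_params, total_params)
```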

**Please show the model scheme here:**

\

\

### Part (b) -- Convolutional Neural Network (CNN) (15%)


The CNN model is given below. Please provide a detailed diagram that best describes this model's architecture. Specify the number of layers, kernel size, weights, etc.

This link will help you understand how to upload an image to Google Colab:
[https://medium.com/analytics-vidhya/embedding-your-image-in-google-colab-markdown-3998d5ac2684](https://medium.com/analytics-vidhya/embedding-your-image-in-google-colab-markdown-3998d5ac2684)

This is an example of how to change the width and height of the image:

`<img src="image path" width="400px" height="200px" />`

In [None]:
class CNNChannel(nn.Module):
    def __init__(self, n=8):
        super(CNNChannel, self).__init__()
        self.n = n
        self.conv1 = nn.Conv2d(in_channels=1, out_channels=n, kernel_size=3, stride=1, padding=2)
        self.conv2 = nn.Conv2d(in_channels=n, out_channels=2*n, kernel_size=3, stride=1, padding=2)
        self.conv3 = nn.Conv2d(in_channels=2*n, out_channels=4*n, kernel_size=3, stride=1, padding=2)
        self.conv4 = nn.Conv2d(in_channels=4*n, out_channels=8*n, kernel_size=3, stride=1, padding=2)
        self.fc1 = nn.Linear(5184, 100)
        self.fc2 = nn.Linear(100, 4)

    def forward(self, xs, verbose=False):
      # add the channel dimension: [N, 128, 128] => [N, 1, 128, 128]
      x = torch.as_tensor(xs, dtype=torch.float32).unsqueeze(1)
      x = self.conv1(x)
      x = nn.functional.relu(x)
      x = nn.functional.max_pool2d(x, kernel_size=2, stride=2)
      x = self.conv2(x)
      x = nn.functional.relu(x)
      x = nn.functional.max_pool2d(x, kernel_size=2, stride=2)
      x = self.conv3(x)
      x = nn.functional.relu(x)
      x = nn.functional.max_pool2d(x, kernel_size=2, stride=2)
      x = self.conv4(x)
      x = nn.functional.relu(x)
      x = nn.functional.max_pool2d(x, kernel_size=2, stride=2)
      x = x.view(x.size(0), -1)
      x = self.fc1(x)
      x = nn.functional.relu(x)
      x = self.fc2(x)
      return x
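
The input size of `fc1` (5184) follows from the spatial size after the four convolution and pooling stages. The sketch below traces it in plain Python using the standard output-size formulas for a convolution and for max pooling without ceiling mode; the helper names `conv_out` and `pool_out` are introduced here only for illustration.

```python
def conv_out(size, kernel=3, stride=1, padding=2):
    # Output length of a convolution along one spatial dimension.
    return (size + 2 * padding - kernel) // stride + 1

def pool_out(size, kernel=2, stride=2):
    # Output length of max pooling along one spatial dimension.
    return (size - kernel) // stride + 1

n = 8          # default value of `n` in CNNChannel
size = 128     # input height and width
channels = 1

for out_channels in (n, 2 * n, 4 * n, 8 * n):
    size = pool_out(conv_out(size))
    channels = out_channels
    print(f"after block: {channels} channels, {size}x{size}")

flattened = channels * size * size
print(flattened)
```

Note that changing `n` or the kernel size changes `flattened`, so the input size of `fc1` would have to be updated accordingly.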

**Please show the model's scheme here:**

\

\

The function `estimate_accuracy` is written for you. Depending on how you set up your model and training, you may need to modify this function.

In [None]:
def estimate_accuracy(model, data, label, batch_size=100, max_N=100000):
    """
    Estimate the accuracy of the model on the data. To reduce
    computation time, use at most `max_N` elements of `data` to
    produce the estimate.
    """
    model.eval()
    correct = 0
    N = 0
    for i in range(0, data.shape[0], batch_size):
        # get a batch of data; one-hot targets are requested so that
        # argmax below recovers the class index
        xt, st = get_batch(data, label, i, i + batch_size, onehot=True)
        # forward pass prediction
        y = model(torch.Tensor(xt))
        y = y.detach().numpy() # convert the PyTorch tensor => numpy array
        pred = np.argmax(y, axis=1)
        true = np.argmax(st, axis=1)
        correct += int(np.sum(pred == true))
        N += st.shape[0]

        if N > max_N:
            break
    return correct / N

The following function `get_batch` will take as input the whole dataset and output a single batch for the training. The output size of the batch is explained below.

In [None]:
def get_batch(data, label, range_min, range_max, onehot):
    """
    Convert one batch of data into input and output
    data and return the training data (xt, st) where:
     - `xt` is a numpy array of spectrograms of shape [batch_size, 128, 128]
     - `st` is either
            - a numpy array of shape [batch_size, 4] if onehot is True,
            - a numpy array of shape [batch_size] containing indices otherwise

    Preconditions:
     - `data` is a numpy array of shape [N, 128, 128] produced by a call
        to `sort_data`
     - range_max > range_min
    """
    xt = data[range_min:range_max]
    st = label[range_min:range_max]
    st = convert_class_to_number(st)
    if onehot:
        st = create_onehot(st).reshape(-1, 4)
    return xt, st
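
The slicing used in `get_batch` can be sketched on its own as follows. The array `data` is a made-up stand-in with 250 rows, chosen so that the last batch is shorter than `batch_size`.

```python
import numpy as np

# Hypothetical data set of 250 blank "spectrograms".
data = np.zeros((250, 128, 128), dtype=np.float32)
batch_size = 100

batch_sizes = []
for start in range(0, data.shape[0], batch_size):
    xt = data[start:start + batch_size]   # slicing past the end is safe in NumPy
    batch_sizes.append(xt.shape[0])

print(batch_sizes)
```

Here the final batch holds only 50 rows. `estimate_accuracy` keeps such a short batch, while the training loop in `train_model` skips it.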

## Question 3. Training (34%)

Now, we will write the functions required to train the PyTorch models using the Adam optimizer and the cross entropy loss.

Our task is a multi-class classification problem. Therefore, we will use a one-hot vector to represent our target.


### Part (a) -- 15%

**Complete** the function `train_model`, and use it to train your PyTorch MLP and CNN models.


Plot the learning curve using the `plot_learning_curve` function provided
to you, and include your plot in your PDF submission.

It is also recommended to checkpoint your model (save a copy) after every epoch.

In [None]:
def train_model(model,
                train_data=train_data,
                train_label=train_labels,
                validation_data=valid_data,
                validation_label=valid_labels,
                batch_size=100,
                learning_rate=0.001,
                weight_decay=0,
                max_iters=1000,
                checkpoint_path=None):

    """
    Train the PyTorch model on the dataset `train_data`, reporting
    the validation accuracy on `validation_data`, for `max_iters`
    iterations.

    If you want to **checkpoint** your model weights (i.e. save the
    model weights to Google Drive), then the parameter
    `checkpoint_path` should be a string path with `{}` to be replaced
    by the iteration count:

    For example, calling

    >>> train_model(model, ...,
            checkpoint_path = '/content/gdrive/My Drive/assignment2/mlp/ckpt-{}.pk')

    will save the model parameters in Google Drive every 100 iterations.
    You will have to make sure that the path exists (i.e. you'll need to create
    the folder assignment2, mlp or cnn, etc...). Your Google Drive will be populated with files:

    - /content/gdrive/My Drive/assignment2/mlp/ckpt-500.pk
    - /content/gdrive/My Drive/assignment2/mlp/ckpt-1000.pk
    - ...

    To load the weights at a later time, you can run:

    >>> model.load_state_dict(torch.load('/content/gdrive/My Drive/assignment2/mlp/ckpt-500.pk'))

    This function returns the training loss, and the training/validation accuracy,
    which we can use to plot the learning curve.
    """
    criterion = nn.CrossEntropyLoss()
    optimizer = optim.Adam(model.parameters(),
                           lr=learning_rate,
                           weight_decay=weight_decay)

    iters, losses = [], []
    iters_sub, train_accs, val_accs  = [], [] ,[]

    n = 0 # the number of iterations
    while True:
      reindex = np.random.permutation(len(train_data))
      train_data = train_data[reindex]
      train_label = train_label[reindex]  # shuffle the labels with the same permutation as the data
      for i in range(0, train_data.shape[0], batch_size):
          if (i + batch_size) > train_data.shape[0]:
              break
          model.train()
          # get the input and targets of a minibatch
          xt, st = get_batch(train_data, train_label, i, i + batch_size, onehot=False)
          # convert from numpy arrays to PyTorch tensors
          xt = torch.Tensor(xt)
          st = torch.tensor(st, dtype=torch.long)  # nn.CrossEntropyLoss expects integer class indices

          zs = ...                 # compute prediction logit
          loss = ...               # compute the total loss
          ...                      # compute updates for each parameter
          ...                      # make the updates for each parameter
          ...                      # a clean up step for PyTorch

          # save the current training information
          iters.append(n)
          losses.append(float(loss))  # nn.CrossEntropyLoss already returns the *average* loss over the batch
          if n % 100 == 0:
              iters_sub.append(n)
              train_cost = float(loss.detach().numpy())
              train_acc = estimate_accuracy(model, train_data, train_label)
              train_accs.append(train_acc)
              val_acc = estimate_accuracy(model, validation_data, validation_label)
              val_accs.append(val_acc)
              print("Iter %d. [Val Acc %.0f%%] [Train Acc %.0f%%, Loss %f]" % (
                    n, val_acc * 100, train_acc * 100, train_cost))

              if (checkpoint_path is not None) and n > 0:
                  torch.save(model.state_dict(), checkpoint_path.format(n))

          # increment the iteration number
          n += 1

          if n > max_iters:
              return iters, losses, iters_sub, train_accs, val_accs


def plot_learning_curve(iters, losses, iters_sub, train_accs, val_accs):
    """
    Plot the learning curve.
    """
    plt.title("Learning Curve: Loss per Iteration")
    plt.plot(iters, losses, label="Train")
    plt.xlabel("Iterations")
    plt.ylabel("Loss")
    plt.show()

    plt.title("Learning Curve: Accuracy per Iteration")
    plt.plot(iters_sub, train_accs, label="Train")
    plt.plot(iters_sub, val_accs, label="Validation")
    plt.xlabel("Iterations")
    plt.ylabel("Accuracy")
    plt.legend(loc='best')
    plt.show()
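
The commented steps inside `train_model` follow the usual PyTorch optimization pattern. The sketch below shows that pattern on a toy problem. The `nn.Linear(8, 4)` model, the random inputs, and the learning rate are assumptions made purely for illustration and are not the assignment's models or data.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)

model = nn.Linear(8, 4)                  # toy stand-in for the MLP/CNN
criterion = nn.CrossEntropyLoss()        # averages over the batch by default
optimizer = optim.Adam(model.parameters(), lr=0.01)

xt = torch.randn(16, 8)                  # toy inputs
st = torch.randint(0, 4, (16,))          # integer class indices (dtype long)

losses = []
for _ in range(50):
    zs = model(xt)                       # forward pass: prediction logits
    loss = criterion(zs, st)             # cross-entropy loss
    loss.backward()                      # compute gradients for each parameter
    optimizer.step()                     # update the parameters
    optimizer.zero_grad()                # clear gradients before the next step
    losses.append(loss.item())

print(losses[0], losses[-1])
```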

### Part (b) -- 15%

Train your models from Questions 2(a) and 2(b). Change the values of a few
hyperparameters, including the learning rate, batch size, choice of `n` and the kernel size in the CNN model, and choice of `num_hidden` in the MLP model. You do not need to check all values for all hyperparameters. Instead, try to make significant changes to see how each change affects your scores
(try to start with finding a reasonable learning rate for each network, then start changing the other parameters).

In this section, explain how you tuned your hyperparameters.

**Write your explanation here:**

\

\

**Include the training curves for the two models:**

In [None]:
pytorch_mlp = PyTorchMLP()
# learning_curve_info = train_model(pytorch_mlp, ...)


# plot_learning_curve(*learning_curve_info)

In [None]:
model_cnn_ch = CNNChannel()
# learning_curve_info = train_model(model_cnn_ch, ...)


# plot_learning_curve(*learning_curve_info)

### Part (c) -- 4%

Include your training curves for the **best** models from each MLP and CNN.
These are the models that you will use in Question 4.

In [None]:
pytorch_mlp = PyTorchMLP()
print('The model we used here is the MLP model')
learning_curve_info = train_model(pytorch_mlp,
                                 train_data=train_data,
                                 train_label=train_labels,
                                 validation_data=valid_data,
                                 validation_label=valid_labels,
                                 batch_size=100,
                                 learning_rate=0.001,
                                 weight_decay=0,
                                 max_iters=1000,
                                 checkpoint_path=None)


plot_learning_curve(*learning_curve_info)

In [None]:
# Include the training curves for the two models.
model_cnn_ch = CNNChannel()
print('The model we used here is the CNN model')
learning_curve_info = train_model(model_cnn_ch, train_data, train_labels, valid_data, valid_labels,
                                  batch_size=25, learning_rate=0.001, weight_decay=0, max_iters=50, checkpoint_path=None)
plot_learning_curve(*learning_curve_info)

## Question 4. Testing (21%)


### Part (a) -- 7%

Report the test accuracy of your **single best** model
on the test set.
Do this by choosing the model
architecture that produces the best validation accuracy. For instance,
if your model attained its
best validation accuracy in epoch 10, then the weights at epoch 10 are what you should use
to report the test accuracy.
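
One way to pick the checkpoint to evaluate is to look up the iteration with the highest recorded validation accuracy. The sketch below assumes lists shaped like the `iters_sub` and `val_accs` returned by `train_model`; the numbers and the file pattern `ckpt-{}.pk` are made up for illustration.

```python
import numpy as np

# Hypothetical values in the format returned by `train_model`.
iters_sub = [0, 100, 200, 300, 400]
val_accs = [0.25, 0.61, 0.78, 0.83, 0.80]

best = int(np.argmax(val_accs))      # position of the best validation accuracy
best_iter = iters_sub[best]          # iteration at which it was recorded

checkpoint_path = "ckpt-{}.pk"       # hypothetical pattern, as passed to `train_model`
best_checkpoint = checkpoint_path.format(best_iter)

print(best_iter, best_checkpoint)
```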

In [None]:
# Make sure to include the test accuracy in your report!!
# Write your code here:





### Part (b) -- 7%

For each model, display one of the signal spectrograms that your model correctly classified, and one of the signal spectrograms that your model classified incorrectly.
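
A possible way to locate such examples is to compare the predicted classes with the true ones and keep the matching and non-matching indices. The sketch below uses made-up logits and labels; with a real model, `logits` would presumably come from a forward pass on the test data.

```python
import numpy as np

# Hypothetical model outputs (one row per spectrogram) and true class indices.
logits = np.array([[2.0, 0.1, -1.0, 0.3],
                   [0.2, 0.1, 3.0, -0.5],
                   [0.0, 1.5, 0.2, 0.1],
                   [1.0, 0.9, 0.8, 2.5]])
true = np.array([0, 2, 3, 3])

pred = np.argmax(logits, axis=1)           # predicted class per row
correct_idx = np.flatnonzero(pred == true) # rows classified correctly
wrong_idx = np.flatnonzero(pred != true)   # rows classified incorrectly
accuracy = float(np.mean(pred == true))

print(correct_idx, wrong_idx, accuracy)
```

An index taken from `correct_idx` or `wrong_idx` can then be used to select the spectrogram to display, for example with `plt.imshow`.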

In [None]:
# Make sure to include the spectrogram images in your report!!
# Write your code here:





### Part (c) -- 7%

Compare the capacity, the number of layers, and the performance of the two architectures, and discuss the advantages and disadvantages of each architecture.

Will one of these models perform better? Explain why.

Are architecture choices important in machine learning?

**Write your explanation here:**

\

\



# PDF export
To export a PDF of the completed notebook, you might find the following commands helpful. Here are some resources for additional learning.

- https://nbconvert.readthedocs.io/en/latest/

- https://nbconvert.readthedocs.io/en/latest/install.html#installing-tex

In [None]:
!sudo apt-get install texlive-xetex texlive-fonts-recommended texlive-plain-generic

In [None]:
!jupyter nbconvert --to pdf /content/drive/MyDrive/Colab_Notebooks/Assignment2/Assignment2.ipynb
# TODO - UPDATE ME WITH THE TRUE PATH! and UPDATE THE FILE NAME.