# Lab 3: Gesture Recognition using Convolutional Neural Networks

In this lab you will train a convolutional neural network to classify hand gestures. By the end of the lab, you should be able to:

1. Load and split data for training, validation and testing
2. Train a Convolutional Neural Network
3. Apply transfer learning to improve your model

Note that for this lab we will not be providing you with any starter code. You should be able to take the code used in previous labs, tutorials and lectures and modify it accordingly to complete the tasks outlined below.

### What to submit

Submit a PDF file containing all your code, outputs, and write-up
from parts 1-5. You can produce a PDF of your Google Colab file by
going to **File > Print** and then saving as PDF. The Colab instructions
have more information. Make sure to review the PDF submission to ensure that your answers are easy to read and that your text is not cut off at the margins.

**Do not submit any other files produced by your code.**

Include a link to your colab file in your submission.

Please use Google Colab to complete this assignment. If you want to use Jupyter Notebook, please complete the assignment and upload your Jupyter Notebook file to Google Colab for submission.

## Colab Link

Include a link to your colab file here

Colab Link: https://colab.research.google.com/drive/1WtDzrUv3uSd5FbHvej-iGRkgYgP3yOhm?usp=sharing

## Dataset

American Sign Language (ASL) is a complete, complex language that employs signs made by moving the
hands combined with facial expressions and postures of the body. It is the primary language of many
North Americans who are deaf and is one of several communication options used by people who are deaf or
hard-of-hearing. The hand gestures representing the English alphabet are shown below. This lab focuses on classifying a subset
of these hand gesture images using convolutional neural networks. Specifically, given an image of a hand
showing one of the letters A-I, we want to detect which letter is being represented.

![alt text](https://www.disabled-world.com/pics/1/asl-alphabet.jpg)

## Part B. Building a CNN [50 pt]

For this lab, we are not going to give you any starter code. You will be writing a convolutional neural network
from scratch. You are welcome to use any code from previous labs, lectures and tutorials. You should also
write your own code.

You may use the PyTorch documentation freely. You might also find online tutorials helpful. However, all
code that you submit must be your own.

Make sure that your code is vectorized, and does not contain obvious inefficiencies (for example, unnecessary
for loops, or unnecessary calls to unsqueeze()). Ensure enough comments are included in the code so that
your TA can understand what you are doing. It is your responsibility to show that you understand what you
write.

**This is much more challenging and time-consuming than the previous labs.** Make sure that you
give yourself plenty of time by starting early.

### 1. Data Loading and Splitting [5 pt]

Download the anonymized data provided on Quercus. To allow you to get a head start on this project, we will provide you with sample data from previous years. Split the data into training, validation, and test sets.

Note: Data splitting is not as trivial in this lab. We want our test set to closely resemble the setting in which
our model will be used. In particular, our test set should contain hands that are never seen in training!

Explain how you split the data, either by describing what you did, or by showing the code that you used.
Justify your choice of splitting strategy. How many training, validation, and test images do you have?
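
One way to keep the test hands unseen during training is to split by person rather than by image. The sketch below is only an illustration: it assumes file names of the form `<person id>_<letter>_<index>.jpg` (as in `568_A_2.jpg`), and the helper `split_by_person` and the example file names are hypothetical.

```python
import random
from collections import defaultdict

def split_by_person(filenames, fractions=(0.7, 0.15, 0.15), seed=0):
    """Split image file names into train/validation/test lists so that no
    person ID appears in more than one split.

    Assumes names look like '<person id>_<letter>_<index>.jpg'.
    """
    # Group the files by the person ID at the start of the file name
    by_person = defaultdict(list)
    for name in filenames:
        person_id = name.split("_")[0]
        by_person[person_id].append(name)

    # Shuffle the person IDs (not the images) reproducibly
    person_ids = sorted(by_person)
    random.Random(seed).shuffle(person_ids)

    n_train = int(fractions[0] * len(person_ids))
    n_val = int(fractions[1] * len(person_ids))
    groups = {
        "train": person_ids[:n_train],
        "validation": person_ids[n_train:n_train + n_val],
        "test": person_ids[n_train + n_val:],
    }
    return {split: [f for pid in ids for f in by_person[pid]]
            for split, ids in groups.items()}

# Hypothetical file names: 10 people, letters A-C, one image each
names = [f"{pid}_{letter}_1.jpg" for pid in range(10) for letter in "ABC"]
splits = split_by_person(names)
print({split: len(files) for split, files in splits.items()})
```

Because whole people are assigned to a split, the image counts only approximately follow the requested fractions.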

For loading the data, you can use plt.imread as in Lab 1, or any other method that you choose. You may find
torchvision.datasets.ImageFolder helpful (see https://pytorch.org/docs/stable/torchvision/datasets.html?highlight=image%20folder#torchvision.datasets.ImageFolder).

In [None]:
# mount our Google Drive
from google.colab import drive
drive.mount('/content/drive')
!unzip '/content/drive/My Drive/Colab Notebooks/Lab3_Data.zip' -d '/root/datasets'

import time
import os
import numpy as np
import torch

import torchvision
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torchvision import datasets, models, transforms
import matplotlib.pyplot as plt
from torch.utils.data.sampler import SubsetRandomSampler

# I chose to split the data manually by creating 3 folders labelled
# train, validation and test with around 70%, 15% and 15% of the data respectively.
# The 3 folders are in the main directory and each of them contains the class
# folders. I chose to do it manually as that is what made the most sense
# to me. For the rest of the code, I followed tutorial 3b.

# define training, validation and test data directories
data_dir = '/root/datasets/Lab3_Gestures_Summer'
train_dir = os.path.join(data_dir, 'train')
val_dir = os.path.join(data_dir, 'validation')
test_dir = os.path.join(data_dir, 'test')

# classes are folders in each directory with these names
classes = ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I']

# load and transform data using ImageFolder

# resize all images to 224 x 224 and map each channel from [0, 1] to [-1, 1]
data_transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),  # per-channel mean and standard deviation
])

train_data = datasets.ImageFolder(train_dir, transform=data_transform)
val_data = datasets.ImageFolder(val_dir, transform=data_transform)
test_data = datasets.ImageFolder(test_dir, transform=data_transform)

# print out some data stats
print('Num training images: ', len(train_data))
print('Num validation images: ', len(val_data))
print('Num testing images: ', len(test_data))

# define dataloader parameters
batch_size  = 20
num_workers = 0

# prepare data loaders
train_loader = torch.utils.data.DataLoader(train_data, batch_size=batch_size,
                                           num_workers=num_workers, shuffle=True)
val_loader = torch.utils.data.DataLoader(val_data, batch_size=batch_size,
                                          num_workers=num_workers, shuffle=True)
test_loader = torch.utils.data.DataLoader(test_data, batch_size=batch_size,
                                          num_workers=num_workers, shuffle=True)

# Visualize some sample data

# obtain one batch of training images
dataiter = iter(train_loader)
images, labels = next(dataiter)
images = images.numpy() # convert images to numpy for display

# plot the images in the batch, along with the corresponding labels
fig = plt.figure(figsize=(25, 4))
for idx in np.arange(20):
    ax = fig.add_subplot(2, int(20/2), idx+1, xticks=[], yticks=[])
    # undo the normalization so that pixel values are back in [0, 1] for display
    plt.imshow(np.transpose(images[idx], (1, 2, 0)) / 2 + 0.5)
    ax.set_title(classes[labels[idx]])


Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).
Archive:  /content/drive/My Drive/Colab Notebooks/Lab3_Data.zip
replace /root/datasets/Lab3_Gestures_Summer/small_dataset/test/A/568_A_2.jpg? [y]es, [n]o, [A]ll, [N]one, [r]ename: 

### 2. Model Building and Sanity Checking [15 pt]

### Part (a) Convolutional Network - 5 pt

Build a convolutional neural network model that takes the (224x224 RGB) image as input, and predicts the gesture
letter. Your model should be a subclass of nn.Module. Explain your choice of neural network architecture: how
many layers did you choose? What types of layers did you use? Were they fully-connected or convolutional?
What about other decisions like pooling layers, activation functions, number of channels / hidden units?

In [None]:
class CNN(nn.Module):
    def __init__(self):
        super(CNN, self).__init__()
        self.name = "CNN"
        self.conv1 = nn.Conv2d(3, 16, 4) #3 input channels (RGB)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(16, 32, 4) #output of conv1 is the input of conv2
        self.conv3 = nn.Conv2d(32, 64, 4) #same as above
        self.fc1 = nn.Linear(64 * 25 * 25, 512) #pooled output of conv3, flattened
        self.fc2 = nn.Linear(512, 9) #9 outputs, one per class
        #There are 3 convolutional layers that extract spatial
        #features from the input images and 2 fully connected layers
        #that perform classification based on the features learned
        #by the convolutional layers. A max pooling layer is
        #applied after each convolutional layer to downsample the
        #feature maps and reduce the spatial dimensions while
        #maintaining important features. The numbers of hidden units
        #and channels were chosen through research
        #to be simple and effective for the task.
        #The input size of the first linear layer was found using the
        #equation from tutorial 3a.

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x))) #the reLU activation function is
        x = self.pool(F.relu(self.conv2(x))) #simple and effective
        x = self.pool(F.relu(self.conv3(x)))
        x = x.view(-1, 64 * 25 * 25)
        x = F.relu(self.fc1(x))
        x = self.fc2(x) # raw class scores (logits) of shape [batch_size, 9]
        return x
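
The `64 * 25 * 25` input size of `fc1` can be checked by hand. The sketch below applies the standard output-size formula `floor((size + 2 * padding - kernel_size) / stride) + 1` to the three conv + pool stages of the `CNN` class; the helper name `conv_output_size` is only illustrative.

```python
def conv_output_size(size, kernel_size, stride=1, padding=0):
    """Spatial output size of a convolution or pooling layer."""
    return (size + 2 * padding - kernel_size) // stride + 1

size = 224  # input images are 224 x 224
for _ in range(3):
    size = conv_output_size(size, kernel_size=4)            # nn.Conv2d(..., 4)
    size = conv_output_size(size, kernel_size=2, stride=2)  # nn.MaxPool2d(2, 2)

print(size)              # 25
print(64 * size * size)  # 40000 inputs to fc1
```

The sizes go 224 -> 221 -> 110 -> 107 -> 53 -> 50 -> 25, so the last feature map is 64 channels of 25 x 25.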

### Part (b) Training Code - 5 pt

Write code that trains your neural network given some training data. Your training code should make it easy
to tweak the usual hyperparameters, like batch size, learning rate, and the model object itself. Make sure
that you are checkpointing your models from time to time (the frequency is up to you). Explain your choice
of loss function and optimizer.

In [None]:
def evaluate(net, loader, criterion):
    """ Evaluate the network on the validation set.

     Args:
         net: PyTorch neural network object
         loader: PyTorch data loader for the validation set
         criterion: The loss function
     Returns:
         err: A scalar for the avg classification error over the validation set
         loss: A scalar for the average loss function over the validation set
     """
    total_loss = 0.0
    total_correct = 0.0
    total_samples = 0
    with torch.no_grad(): # gradients are not needed for evaluation
        for i, data in enumerate(loader, 0):
            inputs, labels = data
            outputs = net(inputs)
            loss = criterion(outputs, labels)
            _, predicted = torch.max(outputs, 1)
            total_correct += (predicted == labels).sum().item()
            total_loss += loss.item() * inputs.size(0)
            total_samples += inputs.size(0)
    err = 1.0 - (total_correct / total_samples)
    loss = total_loss / total_samples
    return err, loss

In [None]:
def get_model_name(name, batch_size, learning_rate, epoch):
    """ Generate a name for the model consisting of all the hyperparameter values

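    Args:
        name: Name of the model (e.g. net.name)
        batch_size: Batch size used for training
        learning_rate: Learning rate used for training
        epoch: Zero-based index of the epoch that was checkpointed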
    Returns:
        path: A string with the hyperparameter name and value concatenated
    """
    path = "model_{0}_bs{1}_lr{2}_epoch{3}".format(name,
                                                   batch_size,
                                                   learning_rate,
                                                   epoch)
    return path

In [None]:
def train_net(train_loader, val_loader, test_loader, net, batch_size=32, learning_rate=0.001, num_epochs=10):
    ########################################################################
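    # Train a classifier
    # Note: batch_size is only used to name the checkpoint files; the actual
    # batch size is the one the data loaders were created with.
    target_classes = ["A", "B", "C", "D", "E", "F", "G", "H", "I"]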
    ########################################################################
    # Fixed PyTorch random seed for reproducible result
    torch.manual_seed(1000)
    ########################################################################
    # Define the Loss function and optimizer
    #I decided to use cross entropy as it was used in the tutorial for multi-class
    #classification and was suggested in the lecture as a good non-regression loss
    #function for multiple classes. For the optimizer, I chose Adam since its adaptive
    #per-parameter learning rates usually converge faster than plain SGD with little tuning.
    train_loss = np.zeros(num_epochs)
    val_loss = np.zeros(num_epochs)
    train_err = np.zeros(num_epochs)
    val_err = np.zeros(num_epochs)
    criterion = nn.CrossEntropyLoss()
    optimizer = optim.Adam(net.parameters(), lr=learning_rate)
    ########################################################################

    ########################################################################
    # Train the network
    # Loop over the data iterator and sample a new batch of training data
    # Get the output from the network, and optimize our loss function.
    start_time = time.time()
    for epoch in range(num_epochs):  # loop over the dataset multiple times
        total_train_loss = 0.0
        total_train_err = 0.0
        total_epoch = 0
        for i, data in enumerate(train_loader, 0):
            # Get the inputs
            inputs, labels = data
            # Zero the parameter gradients
            optimizer.zero_grad()
            # Forward pass, backward pass, and optimize
            outputs = net(inputs)
            loss = criterion(outputs, labels)
            loss.backward()
            optimizer.step()
            # Calculate the statistics
            total_train_err += torch.sum(outputs.argmax(dim=1) != labels).item()
            total_train_loss += loss.item()
            total_epoch += len(labels)
        train_err[epoch] = total_train_err / total_epoch
        train_loss[epoch] = total_train_loss / (i+1)
        val_err[epoch], val_loss[epoch] = evaluate(net, val_loader, criterion)
        print(("Epoch {}: Train err: {}, Train loss: {} |"+
               "Validation err: {}, Validation loss: {}").format(
                   epoch + 1,
                   train_err[epoch],
                   train_loss[epoch],
                   val_err[epoch],
                   val_loss[epoch]))
        # Save the current model (checkpoint) to a file
        model_path = get_model_name(net.name, batch_size, learning_rate, epoch)
        torch.save(net.state_dict(), model_path)
    print('Finished Training')
    end_time = time.time()
    elapsed_time = end_time - start_time
    print("Total time elapsed: {:.2f} seconds".format(elapsed_time))
    # Write the train/validation loss/err into CSV files for plotting later
    np.savetxt("{}_train_err.csv".format(model_path), train_err)
    np.savetxt("{}_train_loss.csv".format(model_path), train_loss)
    np.savetxt("{}_val_err.csv".format(model_path), val_err)
    np.savetxt("{}_val_loss.csv".format(model_path), val_loss)
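
A checkpoint saved with `torch.save(net.state_dict(), model_path)` can be restored later with `load_state_dict`. The following is a minimal sketch using a small `nn.Linear` as a stand-in for the CNN and a temporary file as a hypothetical checkpoint path.

```python
import os
import tempfile

import torch
import torch.nn as nn

model = nn.Linear(4, 2)  # stand-in for the CNN
checkpoint_path = os.path.join(tempfile.mkdtemp(), "model_demo_epoch0")
torch.save(model.state_dict(), checkpoint_path)

restored = nn.Linear(4, 2)  # must have the same architecture as the saved model
restored.load_state_dict(torch.load(checkpoint_path))
restored.eval()  # switch to evaluation mode before measuring accuracy

# Every parameter tensor should match the original exactly
same = all(torch.equal(a, b) for a, b in
           zip(model.state_dict().values(), restored.state_dict().values()))
print(same)
```

For the models in this lab, the path would be the string returned by `get_model_name` for the chosen epoch.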

### Part (c) “Overfit” to a Small Dataset - 5 pt

One way to sanity check our neural network model and training code is to check whether the model is capable
of “overfitting” or “memorizing” a small dataset. A properly constructed CNN with correct training code
should be able to memorize the answers to a small number of images quickly.

Construct a small dataset (e.g. just the images that you have collected). Then show that your model and
training code are capable of memorizing the labels of this small data set.

With a large batch size (e.g. the entire small dataset) and learning rate that is not too high, you should be
able to obtain a 100% training accuracy on that small dataset relatively quickly (within 200 iterations).
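
The idea of this sanity check can be sketched on synthetic data. The example below is an illustration only: the random inputs, the small fully-connected model, and the learning rate are assumptions, not the lab's data or architecture. It trains on the whole tiny dataset as a single batch for 200 iterations.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)
inputs = torch.randn(8, 16)         # 8 random "images", already flattened
labels = torch.randint(0, 9, (8,))  # 9 classes, as in this lab

model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 9))
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.01)

initial_loss = criterion(model(inputs), labels).item()
for iteration in range(200):        # full batch: the whole dataset in every step
    optimizer.zero_grad()
    loss = criterion(model(inputs), labels)
    loss.backward()
    optimizer.step()

final_loss = loss.item()
train_acc = (model(inputs).argmax(dim=1) == labels).float().mean().item()
print(initial_loss, final_loss, train_acc)
```

If the model and training loop are correct, the loss should drop sharply and the training accuracy should approach 1.0. A loss that does not move usually points to a bug.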

In [None]:
# define training, validation and test data directories
data_tiny_dir = '/root/datasets/Lab3_Gestures_Summer/small_dataset'
train_tiny_dir = os.path.join(data_tiny_dir, 'train')
val_tiny_dir = os.path.join(data_tiny_dir, 'validation')
test_tiny_dir = os.path.join(data_tiny_dir, 'test')

# load and transform data using ImageFolder

train_tiny_data = datasets.ImageFolder(train_tiny_dir, transform=data_transform)
val_tiny_data = datasets.ImageFolder(val_tiny_dir, transform=data_transform)
test_tiny_data = datasets.ImageFolder(test_tiny_dir, transform=data_transform)

# print out some data stats
print('Num training images: ', len(train_tiny_data))
print('Num validation images: ', len(val_tiny_data))
print('Num testing images: ', len(test_tiny_data))


# prepare data loaders
train_tiny_loader = torch.utils.data.DataLoader(train_tiny_data, batch_size=batch_size,
                                           num_workers=num_workers, shuffle=True)
val_tiny_loader = torch.utils.data.DataLoader(val_tiny_data, batch_size=batch_size,
                                          num_workers=num_workers, shuffle=True)
test_tiny_loader = torch.utils.data.DataLoader(test_tiny_data, batch_size=batch_size,
                                          num_workers=num_workers, shuffle=True)


In [None]:
def plot_training_curve(path):
    """ Plots the training curve for a model run, given the csv files
    containing the train/validation error/loss.

    Args:
        path: The base path of the csv files produced during training
    """
    import matplotlib.pyplot as plt
    train_err = np.loadtxt("{}_train_err.csv".format(path))
    val_err = np.loadtxt("{}_val_err.csv".format(path))
    train_loss = np.loadtxt("{}_train_loss.csv".format(path))
    val_loss = np.loadtxt("{}_val_loss.csv".format(path))
    plt.title("Train vs Validation Error")
    n = len(train_err) # number of epochs
    plt.plot(range(1,n+1), train_err, label="Train")
    plt.plot(range(1,n+1), val_err, label="Validation")
    plt.xlabel("Epoch")
    plt.ylabel("Error")
    plt.legend(loc='best')
    plt.show()
    plt.title("Train vs Validation Loss")
    plt.plot(range(1,n+1), train_loss, label="Train")
    plt.plot(range(1,n+1), val_loss, label="Validation")
    plt.xlabel("Epoch")
    plt.ylabel("Loss")
    plt.legend(loc='best')
    plt.show()

In [None]:
net = CNN()
train_net(train_tiny_loader, val_tiny_loader, test_tiny_loader, net, num_epochs=15)
model_path = get_model_name("CNN", batch_size=32, learning_rate=0.001, epoch=14)
plot_training_curve(model_path)

### 3. Hyperparameter Search [15 pt]

### Part (a) - 3 pt

List 3 hyperparameters that you think are most worth tuning. Choose at least one hyperparameter related to
the model architecture.

In [None]:
#batch size, learning rate and number of hidden layers. The number of hidden layers is related
#to the model architecture.

### Part (b) - 5 pt

Tune the hyperparameters you listed in Part (a), trying as many values as you need to until you feel satisfied
that you are getting a good model. Plot the training curve of at least 4 different hyperparameter settings.
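
One way to keep track of the settings to try is to enumerate them up front. The values below are illustrative assumptions, not the settings used in this notebook.

```python
from itertools import product

# Illustrative values only
search_space = {
    "batch_size": [32, 64],
    "learning_rate": [0.001, 0.01],
    "model_name": ["CNN", "CNN_W"],
}

# One dictionary per combination of hyperparameter values
settings = [dict(zip(search_space, values))
            for values in product(*search_space.values())]

for setting in settings:
    print(setting)
```

A full grid grows quickly, so it is also reasonable to change one hyperparameter at a time from a baseline, as is done in the cells that follow.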

In [None]:
net1 = CNN()
train_net(train_loader, val_loader, test_loader, net1, num_epochs=15)
model_path = get_model_name("CNN", batch_size=32, learning_rate=0.001, epoch=14)
plot_training_curve(model_path)

In [None]:
#I reduced the number of epochs from 15 to 9 since the model was overfitting
#and the training error was already low within 9 epochs
net2 = CNN()
#increase the learning rate from 0.001 to 0.01
train_net(train_loader, val_loader, test_loader, net2, learning_rate=0.01, num_epochs=9)
model_path = get_model_name("CNN", batch_size=32, learning_rate=0.01, epoch=8)
plot_training_curve(model_path)

In [None]:
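net3 = CNN()
#increase the batch size to 64; the batch size is set by the data loaders,
#so build loaders with batch_size=64 for this run
train_loader_64 = torch.utils.data.DataLoader(train_data, batch_size=64,
                                              num_workers=num_workers, shuffle=True)
val_loader_64 = torch.utils.data.DataLoader(val_data, batch_size=64,
                                            num_workers=num_workers, shuffle=True)
train_net(train_loader_64, val_loader_64, test_loader, net3, batch_size=64, num_epochs=9)
model_path = get_model_name("CNN", batch_size=64, learning_rate=0.001, epoch=8)
plot_training_curve(model_path)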

In [None]:
class CNN_W(nn.Module):
    def __init__(self):
        super(CNN_W, self).__init__()
        self.name = "CNN_W"
        self.conv1 = nn.Conv2d(3, 16, 4) #there are 3 inputs, RGB
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(16, 32, 4) #output of conv1 is input of conv2
        self.conv3 = nn.Conv2d(32, 64, 4) #same as above
        self.fc1 = nn.Linear(64 * 25 * 25, 512) #output from conv3 flattened
        self.fc2 = nn.Linear(512, 256) #new hidden layer
        self.fc3 = nn.Linear(256, 9) #output is 9 cause 9 classes

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x))) #the reLU activation function is
        x = self.pool(F.relu(self.conv2(x))) #simple and effective
        x = self.pool(F.relu(self.conv3(x)))
        x = x.view(-1, 64 * 25 * 25)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x) # raw class scores (logits) of shape [batch_size, 9]
        return x

In [None]:
net4 = CNN_W()
train_net(train_loader, val_loader, test_loader, net4, num_epochs=9)
model_path = get_model_name("CNN_W", batch_size=32, learning_rate=0.001, epoch=8)
plot_training_curve(model_path)

### Part (c) - 3 pt
Choose the best model out of all the ones that you have trained. Justify your choice.
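
A common way to compare runs is by their lowest validation error. The sketch below uses made-up validation errors (not results from this lab); in practice each array could be loaded with `np.loadtxt` from the `_val_err.csv` files written during training.

```python
import numpy as np

# Made-up validation errors per epoch for two runs (not results from this lab)
val_err_by_run = {
    "run_a": np.array([0.60, 0.41, 0.35, 0.38]),
    "run_b": np.array([0.55, 0.33, 0.30, 0.36]),
}

best_run, best_epoch, best_err = None, None, float("inf")
for run, val_err in val_err_by_run.items():
    epoch = int(np.argmin(val_err))  # 0-based epoch index, as in the checkpoint names
    if val_err[epoch] < best_err:
        best_run, best_epoch, best_err = run, epoch, float(val_err[epoch])

print(best_run, best_epoch, best_err)  # run_b 2 0.3
```

The test set plays no part in this selection; only the validation error is used.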

In [None]:
#I would choose the last model with an extra hidden layer since it overfits the data the least.
#But to increase accuracy I would also increase the batch size to 64 since that resulted in
#the highest accuracy among all the models.

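net5 = CNN_W()
#build loaders that actually use a batch size of 64
train_loader_64 = torch.utils.data.DataLoader(train_data, batch_size=64, num_workers=num_workers, shuffle=True)
val_loader_64 = torch.utils.data.DataLoader(val_data, batch_size=64, num_workers=num_workers, shuffle=True)
train_net(train_loader_64, val_loader_64, test_loader, net5, num_epochs=9, batch_size=64)
model_path = get_model_name("CNN_W", batch_size=64, learning_rate=0.001, epoch=8)
plot_training_curve(model_path)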

### Part (d) - 4 pt
Report the test accuracy of your best model. You should only do this step once and prior to this step you should have only used the training and validation data.

In [None]:
criterion = nn.CrossEntropyLoss()
error, loss = evaluate(net5, test_loader, criterion)
print("Test Error:", error, "Test Loss:", loss)

### 4. Transfer Learning [15 pt]
For many image classification tasks, it is generally not a good idea to train a very large deep neural network
model from scratch due to the enormous compute requirements and lack of sufficient amounts of training
data.

One of the better options is to try using an existing model that performs a similar task to the one you need
to solve. This method of utilizing a pre-trained network for other similar tasks is broadly termed **Transfer
Learning**. In this assignment, we will use Transfer Learning to extract features from the hand gesture
images. Then, train a smaller network to use these features as input and classify the hand gestures.

As you have learned from the CNN lecture, convolution layers extract various features from the images which
get utilized by the fully connected layers for correct classification. AlexNet architecture played a pivotal
role in establishing Deep Neural Nets as a go-to tool for image classification problems and we will use an
ImageNet pre-trained AlexNet model to extract features in this assignment.

### Part (a) - 5 pt
Here is the code to load the AlexNet network, with pretrained weights. When you first run the code, PyTorch
will download the pretrained weights from the internet.

In [None]:
import torchvision.models
alexnet = torchvision.models.alexnet(pretrained=True)

The alexnet model is split up into two components: *alexnet.features* and *alexnet.classifier*. The
first neural network component, *alexnet.features*, is used to compute convolutional features, which are
taken as input in *alexnet.classifier*.

The neural network alexnet.features expects an image tensor of shape Nx3x224x224 as input and it will
output a tensor of shape Nx256x6x6 (N = batch size).

Compute the AlexNet features for each of your training, validation, and test data. Here is an example code
snippet showing how you can compute the AlexNet features for some images (your actual code might be
different):

In [None]:
# img = ... a PyTorch tensor with shape [N,3,224,224] containing hand images ...
#features = alexnet.features(img)

**Save the computed features**. You will be using these features as input to your neural network in Part
(b), and you do not want to re-compute the features every time. Instead, run *alexnet.features* once for
each image, and save the result.

In [None]:
features_path = '/content/drive/My Drive/features'

# define dataloader parameters
batch_size  = 1
num_workers = 1

# prepare data loaders
train_loader = torch.utils.data.DataLoader(train_data, batch_size=batch_size,
                                           num_workers=num_workers, shuffle=True)
val_loader = torch.utils.data.DataLoader(val_data, batch_size=batch_size,
                                          num_workers=num_workers, shuffle=True)
test_loader = torch.utils.data.DataLoader(test_data, batch_size=batch_size,
                                          num_workers=num_workers, shuffle=True)

# save features to folders, one sub-folder per class
def save_features(loader, split):
  """Compute alexnet.features for every image in loader and save each result
  as features_path/<split>/<class>/<n>.tensor"""
  n = 0
  for img, label in loader:
    features = alexnet.features(img)
    features_tensor = torch.from_numpy(features.detach().numpy())

    folder_name = os.path.join(features_path, split, classes[label.item()])
    os.makedirs(folder_name, exist_ok=True) # also creates missing parent folders
    torch.save(features_tensor.squeeze(0), os.path.join(folder_name, str(n) + '.tensor'))
    n += 1

for loader, split in [(train_loader, 'train'), (val_loader, 'validation'), (test_loader, 'test')]:
  save_features(loader, split)


In [None]:
# extensions is a tuple of allowed file extensions (note the trailing comma)
train_features_data = torchvision.datasets.DatasetFolder(features_path + '/train', loader=torch.load, extensions=('.tensor',))
val_features_data = torchvision.datasets.DatasetFolder(features_path + '/validation', loader=torch.load, extensions=('.tensor',))
test_features_data = torchvision.datasets.DatasetFolder(features_path + '/test', loader=torch.load, extensions=('.tensor',))

# define dataloader parameters
batch_size  = 32
num_workers = 1

# prepare data loaders
train_features_loader = torch.utils.data.DataLoader(train_features_data, batch_size=batch_size,
                                           num_workers=num_workers, shuffle=True)
val_features_loader = torch.utils.data.DataLoader(val_features_data, batch_size=batch_size,
                                          num_workers=num_workers, shuffle=True)
test_features_loader = torch.utils.data.DataLoader(test_features_data, batch_size=batch_size,
                                          num_workers=num_workers, shuffle=True)

### Part (b) - 3 pt
Build a convolutional neural network model that takes as input these AlexNet features, and makes a
prediction. Your model should be a subclass of nn.Module.

Explain your choice of neural network architecture: how many layers did you choose? What types of layers
did you use: fully-connected or convolutional? What about other decisions like pooling layers, activation
functions, number of channels / hidden units in each layer?

Here is an example of how your model may be called:

In [None]:
# features = ... load precomputed alexnet.features(img) ...
#output = model(features)
#prob = F.softmax(output, dim=1)

In [None]:
class AlexNet(nn.Module):
    def __init__(self):
        super(AlexNet, self).__init__()
        self.name = "AlexNet"
        self.conv1 = nn.Conv2d(256, 64, 2) #the output from alexnet.features is 256x6x6,
        #so its 256 channels are the input channels of the conv1 layer.
        self.pool = nn.MaxPool2d(1, 1) #1x1 window with stride 1, so the feature map size is unchanged
        self.conv2 = nn.Conv2d(64, 32, 2) #output of conv1 is the input of conv2
        self.conv3 = nn.Conv2d(32, 16, 2) #same as above
        self.fc1 = nn.Linear(16 * 3 * 3, 128) #output from conv3 flattened
        self.fc2 = nn.Linear(128, 9) #9 outputs, one per class
        #I used the same structure as the CNN network above with some
        #numbers changed to account for the different dimensions.
        #The input size of the linear fc1 layer was again calculated
        #using the equation in tutorial 3a. The model worked pretty well earlier,
        #which made me believe that it would work well on the AlexNet features as well.

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x))) #the reLU activation function is
        x = self.pool(F.relu(self.conv2(x))) #simple and effective
        x = self.pool(F.relu(self.conv3(x)))
        x = x.view(-1, 16 * 3 * 3)
        x = F.relu(self.fc1(x))
        x = self.fc2(x) # raw class scores (logits) of shape [batch_size, 9]
        return x

### Part (c) - 5 pt
Train your new network, including any hyperparameter tuning. Plot and submit the training curve of your
best model only.

Note: Depending on how you are caching (saving) your AlexNet features, PyTorch might still be tracking
updates to the **AlexNet weights**, which we are not tuning. One workaround is to convert your AlexNet
feature tensor into a numpy array, and then back into a PyTorch tensor.
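
The effect described in this note can be seen on a small stand-in layer. In the sketch below, the `nn.Conv2d` layer is only a placeholder for `alexnet.features`, and the input is random. It shows that the raw output still tracks gradients, while `detach()` or a `torch.no_grad()` block produces a tensor that does not.

```python
import torch
import torch.nn as nn

extractor = nn.Conv2d(3, 8, 3)  # stand-in for alexnet.features
img = torch.randn(1, 3, 16, 16)

features = extractor(img)
print(features.requires_grad)          # True: still attached to the extractor's graph

detached = features.detach()
print(detached.requires_grad)          # False

with torch.no_grad():
    no_grad_features = extractor(img)
print(no_grad_features.requires_grad)  # False
```

Either approach yields features that are safe to save and reuse without keeping the computation graph of the feature extractor.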

In [None]:
#tensor = torch.from_numpy(tensor.detach().numpy())

net6 = AlexNet()
# batch_size matches the feature loaders (32); it only affects the checkpoint name
train_net(train_features_loader, val_features_loader, test_features_loader, net6, num_epochs=9, batch_size=32)
model_path = get_model_name("AlexNet", batch_size=32, learning_rate=0.001, epoch=8)
plot_training_curve(model_path)

### Part (d) - 2 pt
Report the test accuracy of your best model. How does the test accuracy compare to Part 3(d) without transfer learning?

In [None]:
criterion = nn.CrossEntropyLoss()
error, loss = evaluate(net6, test_features_loader, criterion)
print("Test Error:", error, "Test Loss:", loss)

The AlexNet-based model had much better accuracy than the CNN trained from scratch. This is because the AlexNet convolutional layers provide better features, which were learned from a much larger dataset (ImageNet). The AlexNet weights themselves were not tuned; I only trained a small classifier on top of the saved features to fit our specific task. Overall, I'm really happy with how well the new model worked, since it had a test error rate of only 7.8% (a test accuracy of 92.2%).








In [None]:
%%shell
jupyter nbconvert --to html "/content/lab3.ipynb"