# Introduction

Hello, I'm **Wesley**, nice to meet you! 👋

In this notebook, we will train **six** different neural network architectures to show how **convolutions** help improve image classification performance considerably despite using fewer parameters.

This is inspired by a [convolutional neural network (CNN)](https://en.wikipedia.org/wiki/Convolutional_neural_network) called [LeNet-5](https://paperswithcode.com/method/lenett) 👇, created by one of the AI Godfathers [Yann LeCun](https://en.wikipedia.org/wiki/Yann_LeCun), that popularized the use of [convolutions](https://en.wikipedia.org/wiki/Convolution) in deep learning. 

![Screenshot 2023-07-02 at 11.32.03 PM.png](attachment:95022719-cd55-412c-abf0-add6403ecc12.png)

For fun, I made a *Cat vs Dog Dataset* from [CIFAR10](https://paperswithcode.com/dataset/cifar-10) to compare these neural networks.

**TLDR**: Convolutions work!

![convolutions-results.png](attachment:982b92b6-58fb-496e-852b-e2ae9ea3b452.png)

If you like this notebook, please **upvote** 👍 so that I know to make more of these fun projects to share with you. **Enjoy!** 😄

# Setup Python Libraries

In [None]:
%%bash
pip install numpy torch torchvision torchsummary datasets tqdm matplotlib --quiet
pip freeze | grep -E '^numpy|^torch|^torchvision|^torchsummary|^datasets|^tqdm|^matplotlib'

# Data Engineer Custom Dataset

## Clone Raw Dataset

In [None]:
# import required libraries
from datasets import load_dataset

In [None]:
# let's clone the cifar10 dataset with 10 classes that has both cats and dogs

raw_datasets = load_dataset('cifar10')
raw_datasets

## Exploratory Data Analysis

In [None]:
# import required libraries

import matplotlib.pyplot as plt

%matplotlib inline

In [None]:
# let's see an example from the dataset

example = raw_datasets['train'][0]
example

In [None]:
# let's see the PIL image with matplotlib

plt.imshow(example['img'])
plt.show()

In [None]:
# let's see what the label is: 'airplane'

ID2LABEL = {}
LABEL2ID = {}

for i,name in enumerate(raw_datasets['train'].features['label'].names):
    print(f"ID: {i} | LABEL: {name}")
    
    ID2LABEL[i] = name
    LABEL2ID[name] = i

In [None]:
# let's check out more examples

fig, axs = plt.subplots(nrows=3, ncols=3)

fig.suptitle("Train Example Images")

for idx,ax in enumerate(axs.flat):
    img, label_id = raw_datasets['train'][idx].values()
    label = ID2LABEL[label_id]
    ax.axis('off')
    ax.set_title(f"{idx} {label}")
    ax.imshow(img)

## Create Cat vs Dog Dataset

In [None]:
# import required libraries

from collections import Counter
import matplotlib.pyplot as plt

%matplotlib inline

In [None]:
# for our dataset, we will only keep the longstanding comparison of cat vs. dog

catvsdog_datasets = raw_datasets.filter(lambda x: x['label'] in [LABEL2ID['cat'], LABEL2ID['dog']])
catvsdog_datasets

In [None]:
# let's make sure only cats and dogs are left in the dataset

fig, axs = plt.subplots(nrows=3, ncols=3)

fig.suptitle("Train Example Images")

for idx,ax in enumerate(axs.flat):
    img, label_id = catvsdog_datasets['train'][idx].values()
    label = ID2LABEL[label_id]
    ax.axis('off')
    ax.set_title(f"{idx} {label}")
    ax.imshow(img)

In [None]:
# let's double check the class balance to get a baseline accuracy

label_counts = Counter(catvsdog_datasets['train']['label'])
total_label_count = sum(label_counts.values())
label_percentages = {k:v/total_label_count for k,v in label_counts.items()}

print(f"LABEL_COUNTS: {label_counts}")
print(f"LABEL_PERCENTAGES: {label_percentages}")
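
The baseline mentioned above is the accuracy you would get by always predicting the most common label. Here is a minimal sketch of that idea; `majority_baseline` and `toy_labels` are hypothetical names used only for illustration, and the toy labels are made up rather than taken from the real dataset.

```python
from collections import Counter


def majority_baseline(labels):
    """Return the most common label and the accuracy of always predicting it."""
    counts = Counter(labels)
    label, count = counts.most_common(1)[0]
    return label, count / len(labels)


# hypothetical toy labels (3 = cat, 5 = dog), not the real dataset
toy_labels = [3, 5, 3, 5, 5, 3, 3, 5]
baseline_label, baseline_accuracy = majority_baseline(toy_labels)
print(f"always predicting {baseline_label} gives accuracy {baseline_accuracy:.2f}")
```

With a balanced dataset this baseline is 0.5, so any model worth keeping should score clearly above that.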

In [None]:
# let's make the labels clearer by using their names

label_counts = {ID2LABEL[k]:v for k,v in label_counts.items()}
label_percentages = {ID2LABEL[k]:v for k,v in label_percentages.items()}

print(f"LABEL_COUNTS: {label_counts}")
print(f"LABEL_PERCENTAGES: {label_percentages}")

In [None]:
# let's plot the class balance: it's 50-50

fig, axs = plt.subplots(2,figsize=(5,10))

bar0 = axs[0].bar(label_counts.keys(), label_counts.values())
axs[0].bar_label(bar0, label_type='edge')
axs[0].set_title('Count by Image Label')
axs[0].set_xlabel('Image Label')
axs[0].set_ylabel('Count')

bar1 = axs[1].bar(label_percentages.keys(), label_percentages.values())
axs[1].bar_label(bar1, label_type='edge')
axs[1].set_title('Percentages by Image Label')
axs[1].set_xlabel('Image Label')
axs[1].set_ylabel('Percentage')

plt.show()

## Preprocess Datasets for Training

In [None]:
# import required libraries

import numpy as np
import torch
from datasets import DatasetDict

In [None]:
# let's set the seed for reproducibility

SEED = 42
torch.manual_seed(SEED)
np.random.seed(SEED)

In [None]:
# let's change the label for cat from 3 to 1 and for dog from 5 to 0

def preprocess_labels(example):
    example['label'] =  int(example['label'] == LABEL2ID['cat'])
    return example
    
preprocessed_datasets = catvsdog_datasets.map(preprocess_labels)
preprocessed_datasets

In [None]:
# let's double check a preprocessed example with the label change

print(f"OLD: {catvsdog_datasets['train'][0]}")
print(f"NEW: {preprocessed_datasets['train'][0]}")
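
The relabeling above boils down to a boolean comparison cast to an integer. Here is a small stand-alone sketch, assuming the CIFAR10 label ids of 3 for cat and 5 for dog; `to_binary_label` and `toy_label_ids` are hypothetical names for illustration.

```python
CAT_ID, DOG_ID = 3, 5  # assumed CIFAR10 label ids for cat and dog


def to_binary_label(label_id, positive_id=CAT_ID):
    """Map the positive class (cat) to 1 and everything else (dog) to 0."""
    return int(label_id == positive_id)


# hypothetical toy label ids, not the real dataset
toy_label_ids = [CAT_ID, DOG_ID, CAT_ID, DOG_ID]
binary_labels = [to_binary_label(label_id) for label_id in toy_label_ids]
print(binary_labels)
```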

In [None]:
# let's split the dataset into train-validation-test at roughly 67%-17%-17% (8000/2000/2000 images)

_train_validation_datasets = preprocessed_datasets['train'].train_test_split(test_size=0.2, seed=SEED, shuffle=True)

preprocessed_datasets = DatasetDict({
    'train': _train_validation_datasets['train'],
    'validation': _train_validation_datasets['test'],
    'test': preprocessed_datasets['test']
})

preprocessed_datasets
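
To see where the split proportions come from, here is a quick arithmetic sketch. It assumes the filtered dataset has 10,000 training images and 2,000 test images (CIFAR10 has 5,000 training and 1,000 test images per class); `split_sizes` and `split_fractions` are hypothetical helpers, not part of the `datasets` library.

```python
def split_sizes(n_train_full, n_test, validation_fraction=0.2):
    """Carve a validation set out of the original train split."""
    n_validation = round(n_train_full * validation_fraction)
    return {
        'train': n_train_full - n_validation,
        'validation': n_validation,
        'test': n_test,
    }


def split_fractions(sizes):
    """Express each split as a fraction of all images."""
    total = sum(sizes.values())
    return {name: size / total for name, size in sizes.items()}


# assumed image counts for the cat vs dog subset
sizes = split_sizes(n_train_full=10_000, n_test=2_000)
fractions = split_fractions(sizes)
for name in sizes:
    print(f"{name}: {sizes[name]} images ({fractions[name]:.1%})")
```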

# Setup Training

## Setup Dataloaders

In [None]:
# import required libraries

from torchvision.transforms.functional import to_tensor
from torch.utils.data import DataLoader

In [None]:
# we create this collate function to stack images and labels properly for the dataloaders

def collate_fn(batch):
    # to_tensor converts PIL images into torch tensors
    imgs = torch.stack(list(map(lambda x: to_tensor(x['img']), batch)))
    # we convert labels to be float type for training
    labels = torch.stack(list(map(lambda x: torch.tensor([x['label']], dtype=torch.float32), batch)))
    return imgs, labels

In [None]:
# let's set up the dataloaders, shuffling only the train split during each epoch

dataloaders = {}
for dataset_type in preprocessed_datasets.keys():
    dataloaders[dataset_type] = DataLoader(
        dataset=preprocessed_datasets[dataset_type],
        batch_size=32,
        shuffle=(dataset_type == 'train'),
        collate_fn=collate_fn,
    )

dataloaders
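
The length of a dataloader is its number of batches, not its number of examples. Here is a rough sketch of that count, assuming 8,000 training and 2,000 validation images and the default behavior of keeping the last, smaller batch; `num_batches` is a hypothetical helper.

```python
import math


def num_batches(num_examples, batch_size, drop_last=False):
    """Number of batches a dataloader yields per epoch."""
    if drop_last:
        return num_examples // batch_size
    return math.ceil(num_examples / batch_size)


# assumed split sizes with the batch size of 32 used above
train_batches = num_batches(8_000, 32)
validation_batches = num_batches(2_000, 32)
print(train_batches, validation_batches)
```

The last validation batch holds only 16 images because 2,000 is not a multiple of 32.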

In [None]:
# let's get the current device for training the model
# for this notebook, a gpu is not needed but it will make training faster

device = 'cuda' if torch.cuda.is_available() else 'cpu'
device

## Setup Training Functions

In [None]:
# import required libraries

from tqdm.notebook import tqdm

In [None]:
# let's make the training function

def train(dataloader, model, loss_fn, optimizer):
    # setup storage variable
    train_loss = 0.
    
    # set to train mode (enables training behavior for layers such as dropout and batch norm)
    model.train()
    for X, y in tqdm(dataloader, leave=False):
        # reset gradients
        optimizer.zero_grad()
        
        # move to current device 
        X, y = X.to(device), y.to(device)
        
        # get prediction and loss
        pred = model(X)
        loss = loss_fn(pred, y)
        
        # backprop and update the model weights
        loss.backward()
        optimizer.step()
        
        # accumulate metrics
        train_loss += loss.item()
          
    # compute metrics
    train_loss /= len(dataloader)
    return {'loss': train_loss}

In [None]:
# let's make the evaluation function

def evaluate(dataloader, model, loss_fn):
    # setup storage variables
    evaluate_loss = 0.
    evaluate_accuracy = 0.
    
    # set to eval mode
    model.eval()
    with torch.no_grad():
        for X, y in tqdm(dataloader, leave=False):
            # move data to current device
            X, y = X.to(device), y.to(device)
            
            # get prediction
            pred = model(X)
            
            # accumulate metrics
            evaluate_loss += loss_fn(pred, y).item()
            evaluate_accuracy += ((pred>0.50) == y).type(torch.float).sum().item()
            
    # compute metrics
    evaluate_loss /= len(dataloader)
    evaluate_accuracy /= len(dataloader.dataset)
    return {'loss': evaluate_loss, 'accuracy': evaluate_accuracy}
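
The accuracy bookkeeping above can be sketched without PyTorch: count correct predictions per batch, then divide by the total number of examples rather than the number of batches. The batches below are made-up numbers and `count_correct` is a hypothetical helper.

```python
def count_correct(probabilities, labels, threshold=0.5):
    """Count predictions whose thresholded value matches the 0/1 label."""
    return sum(
        int((p > threshold) == bool(y))
        for p, y in zip(probabilities, labels)
    )


# hypothetical batches of (predicted probabilities, true labels)
toy_batches = [
    ([0.9, 0.2, 0.6], [1, 0, 0]),
    ([0.4, 0.7], [1, 1]),
]

total_correct = sum(count_correct(probs, labels) for probs, labels in toy_batches)
total_examples = sum(len(labels) for _, labels in toy_batches)
accuracy = total_correct / total_examples
print(f"{total_correct}/{total_examples} correct, accuracy {accuracy:.2f}")
```

Dividing by the dataset size keeps the accuracy correct even when the last batch is smaller than the others.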

In [None]:
def print_model_metrics(epoch, train_metrics, validation_metrics):
    print(f"EPOCH {epoch}", end=" | ")
    print(f"TRAIN LOSS: {train_metrics['loss']:.5f}", end=" | ")
    print(f"VALIDATION LOSS: {validation_metrics['loss']:.5f}", end=" | ")
    print(f"VALIDATION ACCURACY: {validation_metrics['accuracy']:.5f}")

In [None]:
# this function gets the total number of trainable parameters

def get_total_parameters(model):
    return sum(param.numel() for param in model.parameters() if param.requires_grad)

In [None]:
# this function plots the model metrics of one neural network

def plot_model_metrics(metrics):
    fig, axs = plt.subplots()

    axs.set_title(f"{metrics['name']}\n({metrics['total_parameters']} parameters)")
    for metric_key in ['train_loss', 'validation_loss', 'validation_accuracy']:
        axs.plot(range(NUM_EPOCHS), metrics[metric_key], label=metric_key)
    axs.legend()
    
    plt.show()

In [None]:
# this function consolidates all model metrics into a nice comparison multiplot

def plot_all_model_metrics(all_model_metrics):
    fig, axs = plt.subplots(nrows=2, ncols=3, sharex=True, sharey=True, figsize=(10,10))

    for i, metrics in enumerate(all_model_metrics):
        ax = axs.flat[i]
        ax.set_title(f"{metrics['name']}\n({metrics['total_parameters']} parameters)")
        for metric_key in ['train_loss', 'validation_loss', 'validation_accuracy']:
            ax.plot(range(NUM_EPOCHS), metrics[metric_key], label=metric_key)
        ax.legend(loc='lower left')
        ax.set(xlabel='Epochs', ylabel='Loss/Accuracy')
        ax.label_outer()

    fig.tight_layout()
    plt.show()

In [None]:
def train_evaluate_plot(model):
    current_model_metrics = {
        'name': model.__class__.__name__,
        'total_parameters': get_total_parameters(model),
        'train_loss': [],
        'validation_loss': [],
        'validation_accuracy': [],
        'test_accuracy': None,
    }

    # train and evaluate model
    for epoch in tqdm(range(NUM_EPOCHS)):
        train_metrics = train(dataloaders['train'], model, loss_fn, optimizer)
        validation_metrics = evaluate(dataloaders['validation'], model, loss_fn)

        # accumulate metrics
        current_model_metrics['train_loss'].append(train_metrics['loss'])
        current_model_metrics['validation_loss'].append(validation_metrics['loss'])
        current_model_metrics['validation_accuracy'].append(validation_metrics['accuracy'])

        # print to notebook
        print_model_metrics(epoch+1, train_metrics, validation_metrics)

    # evaluate on the test dataset and print the test accuracy
    test_metrics = evaluate(dataloaders['test'], model, loss_fn)
    print(f"TEST ACCURACY: {test_metrics['accuracy']:.5f}")
    current_model_metrics['test_accuracy'] = test_metrics['accuracy']
        
    # plot current model metrics
    plot_model_metrics(current_model_metrics)

    return current_model_metrics

# Train Models

**Goal:**
* We will be training **6** different neural network models with linear and convolutional units on the **Cat vs Dog custom dataset**.

**Model Types:**
1. **LogisticRegressionNetwork**: 1-layer neural network / traditional logistic regression classifier
2. **ShallowLinearNeuralNetwork**: 2-layer linear neural network
3. **DeepLinearNeuralNetwork**: 3-layer linear neural network
4. **ShallowConvolutionalNeuralNetwork**: 2-layer convolutional neural network
5. **DeepConvolutionalNeuralNetwork**: 3-layer convolutional neural network
6. **VeryDeepConvolutionalNeuralNetwork**: 5-layer convolutional neural network with both hidden convolutional and linear units

**Model Architecture Specifications:**
* All activations between layers are non-linear ReLUs
* All convolutional units use same padding and are paired with max pooling
* Convolutional units have kernel size 3x3 and stride 1; max pooling units have kernel size 3x3 and stride 3 (PyTorch's default stride equals the kernel size)
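
The flattened sizes used in the models below (`16*10*10` and `32*3*3`) follow from the standard output-size formula. Note that `nn.MaxPool2d(kernel_size=3)` uses a stride equal to its kernel size when no stride is given. Here is a minimal sketch of that arithmetic; the helper names are hypothetical.

```python
def conv2d_out(size, kernel_size, stride=1, padding=0):
    """Output height/width of a convolution along one dimension."""
    return (size + 2 * padding - kernel_size) // stride + 1


def maxpool2d_out(size, kernel_size, stride=None):
    """Output height/width of max pooling; stride defaults to kernel_size."""
    stride = kernel_size if stride is None else stride
    return (size - kernel_size) // stride + 1


def conv_unit_out(size, kernel_size=3):
    """One conv (same padding, stride 1) followed by one max pool."""
    same_padding = (kernel_size - 1) // 2  # valid for odd kernel sizes
    size = conv2d_out(size, kernel_size, stride=1, padding=same_padding)
    return maxpool2d_out(size, kernel_size)


after_one_unit = conv_unit_out(32)
after_two_units = conv_unit_out(after_one_unit)
print(after_one_unit, after_two_units)
print(16 * after_one_unit ** 2, 32 * after_two_units ** 2)
```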

**Model Training Specifications:**
* Loss function will be Binary Cross Entropy
* Optimizer will be Adam with default learning rate of 0.001 and no weight decay
* No learning rate schedule used
* We will train for 10 epochs with train shuffling
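
For intuition, binary cross entropy averages the negative log-probability that the model assigns to the correct label. Here is a plain-Python sketch of the formula; the `eps` clamp is an assumption added here to avoid `log(0)` and is not how `nn.BCELoss` guards against it internally.

```python
import math


def binary_cross_entropy(probabilities, labels, eps=1e-12):
    """Mean of -[y*log(p) + (1-y)*log(1-p)] over all examples."""
    total = 0.0
    for p, y in zip(probabilities, labels):
        p = min(max(p, eps), 1 - eps)  # assumed clamp to keep log finite
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(labels)


# made-up predictions: an unsure model versus a confident, correct one
unsure_loss = binary_cross_entropy([0.5, 0.5], [1, 0])
confident_loss = binary_cross_entropy([0.9, 0.1], [1, 0])
print(f"unsure: {unsure_loss:.4f}, confident: {confident_loss:.4f}")
```

A model that always outputs 0.5 gets a loss of ln 2 (about 0.693), a useful reference point when reading the loss curves.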

**Note**
* Although I called them *linear* neural networks, they can learn *non-linear* representations through the *non-linear* ReLU activations
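
To illustrate the note above, here is a toy one-dimensional sketch with made-up weights. Two stacked linear layers without an activation collapse into a single linear function, while putting a ReLU between them does not.

```python
def linear(x, weight, bias):
    return weight * x + bias


def relu(x):
    return max(0.0, x)


def two_linear_layers(x):
    # 3 * (2x - 1) + 0.5 simplifies to 6x - 2.5
    return linear(linear(x, 2.0, -1.0), 3.0, 0.5)


def collapsed_layer(x):
    # the single linear layer equivalent to the two above
    return linear(x, 6.0, -2.5)


def two_layers_with_relu(x):
    return linear(relu(linear(x, 2.0, -1.0)), 3.0, 0.5)


for x in [-1.0, 0.0, 2.0]:
    print(x, two_linear_layers(x), collapsed_layer(x), two_layers_with_relu(x))
```

For negative pre-activations the ReLU version flattens out, which no single linear layer can reproduce.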

In [None]:
# import required libraries

from torch import nn
from torchsummary import summary

In [None]:
# set constants and variables

NUM_EPOCHS = 10
INPUT_SHAPE = (3,32,32)

# list to keep all model metrics below
all_model_metrics = []

In [None]:
# we will first train a logistic regression to get a baseline performance

class LogisticRegressionNetwork(nn.Module):
    def __init__(self):
        super().__init__()
        self.stack = nn.Sequential(
            nn.Flatten(),
            nn.Linear(in_features=3*32*32, out_features=1),
            nn.Sigmoid(),
        )
        
    def forward(self, x):
        return self.stack(x)
    
# setup model
model = LogisticRegressionNetwork().to(device)
optimizer = torch.optim.Adam(model.parameters())
loss_fn = nn.BCELoss()

# print model summary
summary(model, INPUT_SHAPE)

In [None]:
# let's train the model, evaluate on validation and test datasets, and plot the results
# we store these results in the all_model_metrics list for later use

model_metric = train_evaluate_plot(model)
all_model_metrics.append(model_metric)

In [None]:
# we add a hidden linear layer with a non-linear ReLU activation

class ShallowLinearNeuralNetwork(nn.Module):
    def __init__(self):
        super().__init__()
        self.stack = nn.Sequential(
            nn.Flatten(),
            nn.Linear(in_features=3*32*32, out_features=256),
            nn.ReLU(),
            nn.Linear(in_features=256, out_features=1),
            nn.Sigmoid(),
        )
        
    def forward(self, x):
        return self.stack(x)
    
# setup model training
model = ShallowLinearNeuralNetwork().to(device)
optimizer = torch.optim.Adam(model.parameters())
loss_fn = nn.BCELoss()

# print model summary
summary(model, INPUT_SHAPE)

In [None]:
# let's train the model, evaluate on validation and test datasets, and plot the results
# we store these results in the all_model_metrics list for later use

model_metric = train_evaluate_plot(model)
all_model_metrics.append(model_metric)

In [None]:
# we add another linear layer with half as many neurons

class DeepLinearNeuralNetwork(nn.Module):
    def __init__(self):
        super().__init__()
        self.stack = nn.Sequential(
            nn.Flatten(),
            nn.Linear(in_features=3*32*32, out_features=256),
            nn.ReLU(),
            nn.Linear(in_features=256, out_features=128),
            nn.ReLU(),
            nn.Linear(in_features=128, out_features=1),
            nn.Sigmoid(),
        )
        
    def forward(self, x):
        return self.stack(x)
    
# setup model training
model = DeepLinearNeuralNetwork().to(device)
optimizer = torch.optim.Adam(model.parameters())
loss_fn = nn.BCELoss()

# print model summary
summary(model, INPUT_SHAPE)

In [None]:
# let's train the model, evaluate on validation and test datasets, and plot the results
# we store these results in the all_model_metrics list for later use

model_metric = train_evaluate_plot(model)
all_model_metrics.append(model_metric)

In [None]:
# we replace the previous hidden linear layers with one convolutional unit with max pooling and a relu activation

class ShallowConvolutionalNeuralNetwork(nn.Module):
    def __init__(self):
        super().__init__()
        self.stack = nn.Sequential(
            nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3, padding='same'),
            nn.MaxPool2d(kernel_size=3),
            nn.ReLU(),
            nn.Flatten(),
            nn.Linear(in_features=16*10*10, out_features=1),
            nn.Sigmoid(),
        )
        
    def forward(self, x):
        return self.stack(x)
    
# setup model training
model = ShallowConvolutionalNeuralNetwork().to(device)
optimizer = torch.optim.Adam(model.parameters())
loss_fn = nn.BCELoss()

# print model summary
summary(model, INPUT_SHAPE)

In [None]:
# let's train the model, evaluate on validation and test datasets, and plot the results

model_metric = train_evaluate_plot(model)
all_model_metrics.append(model_metric)

In [None]:
# we add another convolutional unit with a relu activation and increase the channel size

class DeepConvolutionalNeuralNetwork(nn.Module):
    def __init__(self):
        super().__init__()
        self.stack = nn.Sequential(
            nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3, padding='same'),
            nn.MaxPool2d(kernel_size=3),
            nn.ReLU(),
            nn.Conv2d(in_channels=16, out_channels=32, kernel_size=3, padding='same'),
            nn.MaxPool2d(kernel_size=3),
            nn.ReLU(),
            nn.Flatten(),
            nn.Linear(in_features=32*3*3, out_features=1),
            nn.Sigmoid(),
        )
        
    def forward(self, x):
        return self.stack(x)
    
# setup model
model = DeepConvolutionalNeuralNetwork().to(device)
optimizer = torch.optim.Adam(model.parameters())
loss_fn = nn.BCELoss()

# print model summary
summary(model, INPUT_SHAPE)

In [None]:
# let's train the model, evaluate on validation and test datasets, and plot the results
# we store these results in the all_model_metrics list for later use

model_metric = train_evaluate_plot(model)
all_model_metrics.append(model_metric)

In [None]:
# we add linear units with non-linear relu activations after the convolutional units

class VeryDeepConvolutionalNeuralNetwork(nn.Module):
    def __init__(self):
        super().__init__()
        self.stack = nn.Sequential(
            nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3, padding='same'),
            nn.MaxPool2d(kernel_size=3),
            nn.ReLU(),
            nn.Conv2d(in_channels=16, out_channels=32, kernel_size=3, padding='same'),
            nn.MaxPool2d(kernel_size=3),
            nn.ReLU(),
            nn.Flatten(),
            nn.Linear(in_features=32*3*3, out_features=256),
            nn.ReLU(),
            nn.Linear(in_features=256, out_features=128),
            nn.ReLU(),
            nn.Linear(in_features=128, out_features=1),
            nn.Sigmoid(),
        )
        
    def forward(self, x):
        return self.stack(x)
    
# setup model
model = VeryDeepConvolutionalNeuralNetwork().to(device)
optimizer = torch.optim.Adam(model.parameters())
loss_fn = nn.BCELoss()

# print model summary
summary(model, INPUT_SHAPE)

In [None]:
# let's train the model, evaluate on validation and test datasets, and plot the results
# we store these results in the all_model_metrics list for later use

model_metric = train_evaluate_plot(model)
all_model_metrics.append(model_metric)

In [None]:
plot_all_model_metrics(all_model_metrics)

# Conclusion

Based on the grid of plots above, we can conclude a few things:
1. Neural networks with only linear layers, such as **LogisticRegressionNetwork**, **ShallowLinearNeuralNetwork**, and **DeepLinearNeuralNetwork**, had a hard time learning image representations, as shown by the marginal improvements in their training loss.
2. Merely using 1 convolutional unit with max pooling improved the training loss considerably despite using fewer parameters than the linear neural networks (**2049 vs 3073, 786945, 819713**), showing that convolutions learn image representations well.
3. Adding more convolutional layers helped improve performance, based on the training loss and validation accuracy improvements from **ShallowConvolutionalNeuralNetwork** to **DeepConvolutionalNeuralNetwork**.
4. Adding linear layers after the convolutional layers helped improve performance even more, reducing the training loss faster while increasing validation accuracy.
5. Despite using fewer parameters than the shallow and deep linear neural networks, the convolutional neural networks reduced training loss faster and reached higher validation accuracy.
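
The parameter counts quoted above can be checked by hand: a linear layer has `in*out` weights plus `out` biases, and a convolution has `in_channels*out_channels*k*k` weights plus `out_channels` biases. Here is a short sketch using the layer sizes from this notebook; the helper names are hypothetical.

```python
def linear_params(in_features, out_features):
    """Weights plus biases of a fully connected layer."""
    return in_features * out_features + out_features


def conv2d_params(in_channels, out_channels, kernel_size):
    """Weights plus biases of a 2D convolution with a square kernel."""
    return in_channels * out_channels * kernel_size ** 2 + out_channels


conv_block_1 = conv2d_params(3, 16, 3)
conv_block_2 = conv2d_params(16, 32, 3)

parameter_counts = {
    'LogisticRegressionNetwork': linear_params(3072, 1),
    'ShallowLinearNeuralNetwork': linear_params(3072, 256) + linear_params(256, 1),
    'DeepLinearNeuralNetwork': (
        linear_params(3072, 256) + linear_params(256, 128) + linear_params(128, 1)
    ),
    'ShallowConvolutionalNeuralNetwork': conv_block_1 + linear_params(16 * 10 * 10, 1),
    'DeepConvolutionalNeuralNetwork': (
        conv_block_1 + conv_block_2 + linear_params(32 * 3 * 3, 1)
    ),
    'VeryDeepConvolutionalNeuralNetwork': (
        conv_block_1 + conv_block_2 + linear_params(32 * 3 * 3, 256)
        + linear_params(256, 128) + linear_params(128, 1)
    ),
}

for name, count in parameter_counts.items():
    print(f"{name}: {count}")
```

Most of the linear networks' parameters sit in the first layer, which connects all 3,072 input values to every hidden neuron; the convolutional networks avoid this by sharing small 3x3 kernels across the image.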


**Take Away**
* Use **convolutions** in your neural network for learning image classification! 😆