# Convolutional Neural Networks

Convolutional neural networks are neural network architectures that use *convolutional layers*.

Convolutional layers are essentially parametric sliding windows that filter the input to extract local features.

While in linear layers a different weight is applied to each element of the input, in convolutional layers the same set of weights is applied to different parts of the input.
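To make the contrast concrete, here is a back-of-the-envelope weight count. It assumes a single-channel $28\times 28$ input mapped to an output of the same size and ignores biases. A fully connected layer needs one weight per input/output pixel pair, while a single $3\times 3$ convolutional filter reuses the same 9 weights at every position. The helper names are hypothetical, written only for this illustration.

```python
def linear_weights(h: int, w: int) -> int:
    """Weights of a fully connected layer mapping an h*w input to an h*w output (no bias)."""
    n = h * w
    return n * n  # one weight per (input pixel, output pixel) pair


def conv_weights(kernel_size: int) -> int:
    """Weights of one single-channel convolutional filter (shared across all positions)."""
    return kernel_size * kernel_size


print(linear_weights(28, 28))  # 614656
print(conv_weights(3))         # 9
```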

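Sharing the same weights across positions lets the network extract features that are *position-invariant*: a filter that responds to a pattern responds to it wherever the pattern appears. Strictly speaking, convolution is translation-*equivariant*, and pooling makes the resulting features approximately invariant.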

For instance, when processing an image, convolutional layers can detect details such as eyes or hands regardless of where they are located in the picture.
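A small NumPy experiment illustrates the idea. It assumes a toy $8\times 8$ image containing a "plus" shape and a hand-written $3\times 3$ filter that matches that shape. The helper `cross_correlate2d` is hypothetical and only mimics what `nn.Conv2d` computes (a "valid" cross-correlation with stride 1 and no padding). Moving the pattern in the input moves the peak of the filter response by the same amount, while the peak value, which is what max pooling keeps, stays the same.

```python
import numpy as np


def cross_correlate2d(image, kernel):
    """'Valid' 2D cross-correlation (no padding, stride 1), as computed by nn.Conv2d."""
    kh, kw = kernel.shape
    h_out = image.shape[0] - kh + 1
    w_out = image.shape[1] - kw + 1
    out = np.zeros((h_out, w_out))
    for i in range(h_out):
        for j in range(w_out):
            # the SAME kernel weights are applied at every position (i, j)
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out


pattern = np.array([[0, 1, 0],
                    [1, 1, 1],
                    [0, 1, 0]], dtype=float)  # a small "plus" shape
kernel = pattern.copy()  # a filter that responds most strongly to that shape

img_a = np.zeros((8, 8))
img_a[1:4, 1:4] = pattern  # pattern near the top-left corner
img_b = np.zeros((8, 8))
img_b[4:7, 3:6] = pattern  # same pattern, shifted down and to the right

resp_a = cross_correlate2d(img_a, kernel)
resp_b = cross_correlate2d(img_b, kernel)

# the location of the strongest response follows the pattern...
print(np.unravel_index(resp_a.argmax(), resp_a.shape))
print(np.unravel_index(resp_b.argmax(), resp_b.shape))
# ...but the strongest response itself does not depend on where the pattern is
print(resp_a.max(), resp_b.max())
```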

Here is an example of how a filter of a 2D convolutional layer works.

![Animation of a 2D convolutional filter sliding over its input](https://miro.medium.com/max/790/1*1okwhewf5KCtIPaFib4XaA.gif)

In [None]:
import numpy as np
import matplotlib.pyplot as plt

import torch

import torch.nn as nn
import torch.nn.functional as F

from torchvision.datasets import MNIST
from torchvision.transforms import ToTensor

from torch.utils.data import DataLoader

In [None]:
np.random.seed(42)
torch.manual_seed(42)

In [None]:
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

In [None]:
# hyperparameters
batch_size = 16
lr = 1e-3
n_epochs = 5

In [None]:
datapath = 'data'

# load the dataset (downloaded into `datapath` on first use)
data_train = MNIST(
    root=datapath,
    train=True,
    transform=ToTensor(),
    download=True,
)
data_test = MNIST(
    root=datapath,
    train=False,
    transform=ToTensor(),
    download=True,
)
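Each MNIST sample is a $28\times 28$ grayscale image. For such an image, `ToTensor()` turns `uint8` pixel values in $[0, 255]$ into a float tensor of shape `(C, H, W) = (1, 28, 28)` with values in $[0, 1]$; the single channel is why the first convolution below has `in_channels=1`. The NumPy sketch below mimics that conversion on a random stand-in image; `to_tensor_like` is a hypothetical helper, not torchvision's implementation.

```python
import numpy as np


def to_tensor_like(img_uint8: np.ndarray) -> np.ndarray:
    """Mimic ToTensor for a grayscale image: HxW uint8 -> 1xHxW float32 in [0, 1]."""
    img = img_uint8.astype(np.float32) / 255.0  # rescale pixel values to [0, 1]
    return img[np.newaxis, :, :]  # add the channel dimension first: (C, H, W)


rng = np.random.default_rng(0)
fake_digit = rng.integers(0, 256, size=(28, 28)).astype(np.uint8)  # stand-in for a real MNIST image
x = to_tensor_like(fake_digit)
print(x.shape, x.dtype)  # (1, 28, 28) float32
```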

In [None]:
train_loader = DataLoader(data_train, batch_size=batch_size, shuffle=True,
                          pin_memory=True, num_workers=2)

test_loader = DataLoader(data_test, batch_size=32, shuffle=False,
                         pin_memory=True, num_workers=2)
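With these settings, the number of mini-batches (and therefore of optimizer steps) per epoch follows from the dataset sizes: the standard MNIST split has 60,000 training and 10,000 test images. A quick check, assuming the `DataLoader` default `drop_last=False`, so the last incomplete batch is kept; `n_batches` is a hypothetical helper:

```python
import math

n_train, n_test = 60_000, 10_000  # standard MNIST split sizes


def n_batches(n_samples: int, batch_size: int, drop_last: bool = False) -> int:
    """Number of batches a DataLoader yields for a map-style dataset of n_samples."""
    if drop_last:
        return n_samples // batch_size  # the incomplete last batch is discarded
    return math.ceil(n_samples / batch_size)  # the incomplete last batch is kept


print(n_batches(n_train, 16))  # 3750 optimizer steps per epoch
print(n_batches(n_test, 32))   # 313 (the last batch holds only 16 images)
```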

### How to design your convolutional architecture

Unlike linear layers, whose output size you can choose freely, convolutional layers need to be designed with care: their output size is not set directly but depends on the input size, the kernel size, the padding, and the stride.

The general formulae for the output height and width ($H_{out}$, $W_{out}$) of each channel are given in the PyTorch documentation of `nn.Conv2d`: https://pytorch.org/docs/stable/generated/torch.nn.Conv2d.html

However, for the sake of our sanity, in this tutorial we use simplified formulae that assume the kernel size, padding, and stride are the same along all dimensions (the typical case) and that there is no dilation (`dilation=1`, the default):

\begin{equation*}
H_{out} = \left\lfloor \dfrac{H_{in} - \text{kernel\_size} +2\times \text{padding}}{\text{stride}} + 1 \right\rfloor
\end{equation*}

\begin{equation*}
W_{out} = \left\lfloor \dfrac{W_{in} - \text{kernel\_size} +2\times \text{padding}}{\text{stride}} + 1 \right\rfloor
\end{equation*}
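The simplified formula translates directly into a small helper. `conv_output_size` is hypothetical and not part of PyTorch. Integer floor division implements $\lfloor\cdot\rfloor$, because for any valid configuration the numerator is non-negative and $\lfloor a/s + 1\rfloor = \lfloor a/s\rfloor + 1$. The same formula also gives the output size of `nn.MaxPool2d` with its default settings.

```python
def conv_output_size(size_in: int, kernel_size: int, padding: int = 0, stride: int = 1) -> int:
    """Simplified output size along one dimension (dilation = 1, same settings on every axis)."""
    return (size_in - kernel_size + 2 * padding) // stride + 1


print(conv_output_size(28, kernel_size=3, padding=1, stride=1))  # 28
print(conv_output_size(28, kernel_size=5))                       # 24 (no padding shrinks the map)
print(conv_output_size(32, kernel_size=3, padding=1, stride=2))  # 16 (stride 2 roughly halves it)
```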

Let's try it on the first layer (`kernel_size=3`, `padding=1`, `stride=1`). The input is a $28\times 28$ image, so the same calculation holds for both height and width:

\begin{equation*}
\dfrac{28 - 3 + 2\times 1}{1} + 1 = 28
\end{equation*}

So the spatial size is preserved: the output of the first convolution is still $28\times 28$.
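The same reasoning generalizes. With stride 1 and an odd kernel size $k$ (writing $k$ for $\text{kernel\_size}$), a padding of $(k-1)/2$ always preserves the spatial size; for $k = 3$ this is exactly the padding of 1 used in the first layer:

\begin{equation*}
H_{out} = \left\lfloor \dfrac{H_{in} - k + 2\times \frac{k-1}{2}}{1} + 1 \right\rfloor = \left\lfloor H_{in} - k + (k - 1) + 1 \right\rfloor = H_{in}
\end{equation*}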


In [None]:
class SimpleCNN(nn.Module):
    def __init__(self, n_classes=10):
        super(SimpleCNN, self).__init__()
        self.n_classes = n_classes
        self.conv1 = nn.Conv2d(1, 16, kernel_size=3, stride=1, padding=1)
        self.conv2 = nn.Conv2d(16, 32, kernel_size=3, stride=1, padding=1)
        self.pool = nn.MaxPool2d(kernel_size=2, stride=2)  # non-parametric (no learnable weights)
        self.fc1 = nn.Linear(32 * 7 * 7, 32)
        self.fc2 = nn.Linear(32, self.n_classes)

    def forward(self, x):
        x = F.relu(self.conv1(x))  # 28x28 --> 28x28
        x = self.pool(x)  # 28x28 --> ?x?
        x = F.relu(self.conv2(x)) # ?x? --> ?x?
        x = self.pool(x)  # ?x? --> 7x7
        x = x.view(-1, 32 * 7 * 7)
        x = F.relu(self.fc1(x))
        x = self.fc2(x)
        return x
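As a sanity check on the architecture, the number of learnable parameters can be computed by hand. A `Conv2d` layer has `in_channels * out_channels * k * k` weights plus `out_channels` biases, and a `Linear` layer has `in_features * out_features` weights plus `out_features` biases. The plain-Python sketch below does this arithmetic for `SimpleCNN` (the helper names are hypothetical); the total should match `sum(p.numel() for p in model.parameters())`.

```python
def conv2d_params(c_in: int, c_out: int, k: int) -> int:
    return c_in * c_out * k * k + c_out  # shared kernel weights + one bias per output channel


def linear_params(n_in: int, n_out: int) -> int:
    return n_in * n_out + n_out  # full weight matrix + one bias per output unit


layers = {
    "conv1": conv2d_params(1, 16, 3),
    "conv2": conv2d_params(16, 32, 3),
    "pool": 0,  # max pooling has no learnable parameters
    "fc1": linear_params(32 * 7 * 7, 32),
    "fc2": linear_params(32, 10),
}
for name, n in layers.items():
    print(f"{name}: {n}")
print("total:", sum(layers.values()))  # 55338
```

Most of the parameters sit in `fc1`, even though the convolutions do the feature extraction: a direct consequence of weight sharing.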

In [None]:
model = SimpleCNN(n_classes=10).to(device)
print(model)

In [None]:
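@torch.no_grad()  # no gradients are needed for evaluation: saves memory and time
def model_accuracy(data_loader):
    """Fraction of correctly classified samples in `data_loader` (uses the global `model`)."""
    n_total = 0
    n_correct = 0

    for x_batch, y_batch in data_loader:
        x_batch = x_batch.to(device)
        y_batch = y_batch.to(device)
        logits_batch = model(x_batch)  # model's output scores
        n_total += len(y_batch)
        n_correct += (logits_batch.argmax(dim=-1) == y_batch).sum().item()
    return n_correct / n_total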

print(f"Train accuracy before training: {model_accuracy(train_loader):.4f}")
print(f"Test accuracy before training: {model_accuracy(test_loader):.4f}")
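The core of `model_accuracy` compares, for each sample, the index of the highest score with the true label. On a toy batch of made-up logits for three classes (NumPy, for illustration only):

```python
import numpy as np

logits = np.array([[2.0, 0.1, -1.0],   # highest score: class 0
                   [0.0, 3.0,  0.5],   # highest score: class 1
                   [1.0, 0.2,  0.9]])  # highest score: class 0
labels = np.array([0, 1, 2])

preds = logits.argmax(axis=-1)  # index of the largest score per sample
n_correct = int((preds == labels).sum())
accuracy = n_correct / len(labels)
print(preds, accuracy)  # 2 of the 3 predictions are correct
```

Before training, with randomly initialized weights, the accuracy is typically close to chance level, roughly 0.1 for ten roughly balanced classes.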

In [None]:
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=lr)
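`nn.CrossEntropyLoss` expects raw logits (which is why `forward` does not apply a softmax) and integer class labels; it applies a log-softmax followed by the negative log-likelihood and, by default, averages over the batch. A NumPy sketch of that computation, for illustration only (`cross_entropy` is a hypothetical helper):

```python
import numpy as np


def cross_entropy(logits, labels):
    """Mean cross-entropy from raw logits: log-softmax + negative log-likelihood, mean reduction."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # subtract the row max for numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()


logits = np.array([[0.0, 0.0],    # uniform scores: loss = log(2)
                   [10.0, 0.0]])  # confident and correct: loss close to 0
labels = np.array([0, 0])
print(cross_entropy(logits, labels))
```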

In [None]:
accuracies_train = []

for epoch in range(n_epochs):

    for i, (x_batch, y_batch) in enumerate(train_loader):
        x_batch = x_batch.to(device)
        y_batch = y_batch.to(device)

        optimizer.zero_grad()

        logits_batch = model(x_batch)
        loss_batch = loss_fn(logits_batch, y_batch)
        loss_batch.backward()

        optimizer.step()

    # evaluate the model at the end of each epoch
    with torch.no_grad():
        acc_train = model_accuracy(train_loader)

        print(f"[Epoch {epoch+1:03d}] train_acc: {acc_train:.3f}")

        accuracies_train.append(acc_train)


In [None]:
plt.figure()
plt.plot(accuracies_train, '^-', label="Training")
plt.grid(linestyle=':')
plt.ylim([0.8, 1.05])
plt.legend()
plt.xlabel("Epochs")
plt.ylabel("Accuracy")
plt.show()

In [None]:
print(f"Train accuracy after training: {model_accuracy(train_loader):.4f}")
print(f"Test accuracy after training: {model_accuracy(test_loader):.4f}")

In [None]:
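import os

os.makedirs("saved_models", exist_ok=True)  # torch.save does not create missing directories
torch.save(model.state_dict(), "saved_models/CNN.pt")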
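To reuse the weights later, recreate the same architecture and load the state dictionary into it. The round trip below uses a tiny hypothetical `nn.Linear` model and a temporary directory so that it runs on its own; with this notebook's model you would instantiate `SimpleCNN(n_classes=10)` and load `saved_models/CNN.pt` the same way.

```python
import os
import tempfile

import torch
import torch.nn as nn

tiny_model = nn.Linear(4, 2)  # hypothetical stand-in for SimpleCNN

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "model.pt")
    torch.save(tiny_model.state_dict(), path)

    restored = nn.Linear(4, 2)  # must have exactly the same architecture
    # map_location="cpu" lets weights saved from a GPU be loaded on a CPU-only machine
    restored.load_state_dict(torch.load(path, map_location="cpu"))
    restored.eval()  # inference mode (matters for layers such as dropout or batch norm)

with torch.no_grad():
    x = torch.randn(3, 4)
    same_output = torch.equal(tiny_model(x), restored(x))
print(same_output)  # True
```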