# Neural Networks

In this notebook, you will classify handwritten digits from the [MNIST dataset](http://yann.lecun.com/exdb/mnist/) using (i) a softmax regression model (as a baseline) and (ii) a convolutional neural network.

## Setup

In [1]:
from tqdm import tqdm

import torch
import torch.nn as nn
from torchvision.datasets import MNIST
from torchvision import transforms as tr
from torchdata.datapipes.map import SequenceWrapper
from torchdata.datapipes.iter import IterDataPipe


class SoftmaxRegression(nn.Module):
    def __init__(self) -> None:
        super().__init__()
        self.layers = nn.Sequential(
            nn.Flatten(),
            nn.Linear(in_features=28 * 28, out_features=10),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.layers(x)


class NeuralNetwork(nn.Module):
    def __init__(self) -> None:
        super().__init__()
        self.layers = nn.Sequential(
            nn.Conv2d(in_channels=1, out_channels=32, kernel_size=5),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2),
            nn.Conv2d(in_channels=32, out_channels=64, kernel_size=5),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2),
            nn.Flatten(),
            nn.Dropout(p=0.5),
            nn.Linear(in_features=1024, out_features=1024),
            nn.ReLU(inplace=True),
            nn.Linear(in_features=1024, out_features=10),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.layers(x)


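def create_datapipe(train: bool, batch_size: int, buffer_size: int | None = 10000) -> tuple[IterDataPipe, IterDataPipe]:
    """Return batched image and label datapipes for MNIST.

    Pass buffer_size=None to disable cycling and shuffling (e.g. for evaluation).
    """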
    datapipe = (
        SequenceWrapper(MNIST(root="mnist", train=train, download=True))
        .to_iter_datapipe()
    )
    if buffer_size is not None:
        datapipe = datapipe.cycle().shuffle(buffer_size=buffer_size)

    images, labels = datapipe.unzip(sequence_length=2)
    images = (
        images
        .map(tr.ToTensor())
        .batch(batch_size=batch_size)
    )
    labels = labels.batch(batch_size=batch_size)

    return images, labels
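
The value `in_features=1024` in the first linear layer of `NeuralNetwork` follows from the shapes produced by the convolution and pooling layers. The sketch below is only an illustration: it rebuilds the same convolutional stack under the hypothetical name `features` and pushes a dummy 28x28 image through it, recording the output shape after every layer.

```python
import torch
import torch.nn as nn

# Same convolutional stack as NeuralNetwork, without the linear layers.
# `features` is a name introduced here for illustration only.
features = nn.Sequential(
    nn.Conv2d(in_channels=1, out_channels=32, kernel_size=5),
    nn.ReLU(),
    nn.MaxPool2d(kernel_size=2),
    nn.Conv2d(in_channels=32, out_channels=64, kernel_size=5),
    nn.ReLU(),
    nn.MaxPool2d(kernel_size=2),
    nn.Flatten(),
)

# A dummy batch with one 28x28 grayscale image (the values are irrelevant).
x = torch.zeros(1, 1, 28, 28)
shapes = []
with torch.no_grad():
    for layer in features:
        x = layer(x)
        shapes.append((type(layer).__name__, tuple(x.shape)))

for name, shape in shapes:
    print(f"{name:<10} {shape}")

# Number of features that reach the first linear layer.
flattened_features = shapes[-1][1][1]
```

Each 5x5 convolution without padding shrinks the spatial size by 4, and each 2x2 max pooling halves it, so the image goes 28 → 24 → 12 → 8 → 4. The flattened output therefore has 64 × 4 × 4 = 1024 features.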



## Training and evaluating the model

In [None]:
device = torch.device("cuda") if torch.cuda.is_available() else torch.device("cpu")
model = NeuralNetwork().to(device)
# model = SoftmaxRegression().to(device)
optimizer = torch.optim.Adam(model.parameters())
loss_function = nn.CrossEntropyLoss(reduction="mean")

n_batches = 1000
batch_size = 100
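
As a quick check of what `nn.CrossEntropyLoss` expects, the sketch below calls it on a made-up batch. The tensors `logits` and `labels` are hypothetical stand-ins for the model output and the MNIST targets. The point is only that the loss takes raw, unnormalized scores of shape `(batch, classes)` and integer class indices of shape `(batch,)`, and that `reduction="mean"` returns a single scalar.

```python
import torch
import torch.nn as nn

loss_function = nn.CrossEntropyLoss(reduction="mean")

# Hypothetical batch: 4 examples, 10 classes (stand-ins for real model output).
logits = torch.randn(4, 10)          # raw scores; no softmax applied beforehand
labels = torch.tensor([3, 0, 9, 1])  # integer class indices, dtype torch.long

loss = loss_function(logits, labels)
print(loss.shape, loss.item())
```

This is why neither model above ends with a softmax layer: the loss function applies the normalization internally.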

### Train model

In [None]:
datapipe_train = create_datapipe(train=True, batch_size=batch_size)

model.train()

batch = 0
for images, labels in tqdm(zip(*datapipe_train), desc="batch", leave=False):
    images = torch.stack(images).to(device=device)
    labels = torch.tensor(labels, dtype=torch.long, device=device)

    optimizer.zero_grad()
    predicted_labels = model(images)
    loss = loss_function(predicted_labels, labels)
    loss.backward()
    optimizer.step()

    batch += 1
    if batch == n_batches:
        break

### Evaluate model

In [None]:
datapipe_test = create_datapipe(train=False, batch_size=batch_size, buffer_size=None)

model.eval()
correct_predictions = 0
total = 0
with torch.no_grad():  # gradients are not needed for evaluation
    for images, labels in tqdm(zip(*datapipe_test), desc="batch", leave=False):
        images = torch.stack(images).to(device=device)
        labels = torch.tensor(labels, dtype=torch.long, device=device)

        predicted_labels = model(images)
        correct_predictions += (predicted_labels.argmax(dim=-1) == labels).sum().item()
        total += len(predicted_labels)

accuracy = correct_predictions / total
print(f"accuracy = {accuracy:.03}")

## Questions

1. What do you expect the accuracy of an untrained model to be on the MNIST dataset? Why? Try evaluating an untrained model. What is the accuracy you obtain? (3 points)
2. What is the loss function used here? Describe it. Why is it appropriate for a classification task? (4 points)
3. Try using a neural network with only one convolutional layer (the `NeuralNetwork` class provided has two; remove the second one). You will have to adjust the number of input features of the first linear layer. Do you find an accuracy difference between the one-layer and the two-layer convolutional network? Explain why or why not. (6 points)
4. Try using the simple softmax regression model. Compare the results to those from the convolutional neural network. (2 points)
5. Try changing the number of training batches for the softmax regression model (try 10, 100, 1000, and 10,000). How does the accuracy change? (5 points)
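
Several of these questions require evaluating different models repeatedly. One way to avoid copying the evaluation cell is to wrap it in a function. The sketch below is a suggestion, not part of the provided code: `evaluate` and `fake_batches` are hypothetical names, and the random tensors only stand in for real MNIST test batches so that the example runs on its own.

```python
import torch
import torch.nn as nn


def evaluate(model: nn.Module, batches, device: torch.device) -> float:
    """Return the fraction of correctly classified examples over all batches."""
    model.eval()
    correct, total = 0, 0
    with torch.no_grad():
        for images, labels in batches:
            images, labels = images.to(device), labels.to(device)
            predictions = model(images).argmax(dim=-1)
            correct += (predictions == labels).sum().item()
            total += labels.numel()
    return correct / total


# Random stand-in data; replace with real MNIST test batches.
torch.manual_seed(0)
fake_batches = [
    (torch.rand(100, 1, 28, 28), torch.randint(0, 10, (100,)))
    for _ in range(5)
]
untrained = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
accuracy = evaluate(untrained, fake_batches, torch.device("cpu"))
print(f"accuracy = {accuracy:.3f}")
```

With real data, the same function could be called on an untrained model, on the convolutional network, and on the softmax regression model after each training run.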