# Deep Learning Exercise: Introduction to Basic Deep Learning Architecture in PyTorch

In this exercise, we will explore the fundamentals of deep learning by building a basic neural network using PyTorch. PyTorch is a popular deep learning framework that provides a flexible and efficient platform for building and training neural networks.

## Objectives

- Understand the basic concepts of deep learning and neural networks.
- Learn how to set up and use PyTorch for building neural networks.
- Implement a simple feedforward neural network.
- Train the neural network on a sample dataset.
- Evaluate the performance of the trained model.

## Prerequisites

Before starting this exercise, you should have a basic understanding of:

- Python programming
- Linear algebra and calculus
- Machine learning concepts

## Steps

1. **Data Preparation**: Load and preprocess the dataset.
2. **Model Definition**: Define the architecture of the neural network.
3. **Loss Function and Optimizer**: Specify the loss function and the optimizer.
4. **Training the Model**: Train the neural network on the training data.
5. **Evaluation**: Evaluate the model's performance on the test data.

By the end of this exercise, you will have a solid understanding of how to build and train a basic neural network using PyTorch. Let's get started!

In [None]:
# Prerequisites
!pip install -q torch torchvision matplotlib


### 1. Data Preparation

In this small exercise, we use [Fashion-MNIST](https://arxiv.org/abs/1708.07747) as the example dataset.


In [None]:
# Import needed modules
import torch
import torch.nn as nn
import torchvision
import torchvision.transforms as transforms
import matplotlib.pyplot as plt
import time
import sys

# Download and load the Fashion-MNIST dataset through torchvision.datasets
mnist_train = torchvision.datasets.FashionMNIST(root='./data/FashionMNIST', train=True, download=True, transform=transforms.ToTensor()) # Training dataset
mnist_test = torchvision.datasets.FashionMNIST(root='./data/FashionMNIST', train=False, download=True, transform=transforms.ToTensor()) # Testing dataset

# Take a look at the dataset
print(type(mnist_train))
print(len(mnist_train), len(mnist_test))

feature, label = mnist_train[0]
print(feature.shape, label) # Channel, Height, Width


In [None]:
# There are 10 categories in the Fashion-MNIST dataset, which are:
# 0: T-shirt/top 1: Trouser 2: Pullover 3: Dress 4: Coat 5: Sandal 6: Shirt 7: Sneaker 8: Bag 9: Ankle boot
# Define a function to transform the label index into the corresponding text label
def get_fashion_mnist_labels(labels):
    text_labels = ['t-shirt', 'trouser', 'pullover', 'dress', 'coat',
                   'sandal', 'shirt', 'sneaker', 'bag', 'ankle boot']
    return [text_labels[int(i)] for i in labels]

# Define a function to visualize the Fashion-MNIST dataset
def show_fashion_mnist(images, labels):

    # Configure the display properties
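    # (IPython.display.set_matplotlib_formats is deprecated in recent IPython releases)
    from matplotlib_inline import backend_inline
    backend_inline.set_matplotlib_formats('svg')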

    _, figs = plt.subplots(1, len(images), figsize=(12, 12))
    for f, img, lbl in zip(figs, images, labels):
        f.imshow(img.view((28, 28)).numpy())
        f.set_title(lbl)
        f.axes.get_xaxis().set_visible(False)
        f.axes.get_yaxis().set_visible(False)
    plt.show()

# Display first 10 samples in training dataset
X, y = [], []
for i in range(10):
    X.append(mnist_train[i][0])
    y.append(mnist_train[i][1])
show_fashion_mnist(X, get_fashion_mnist_labels(y))


In [None]:
##################################################################
# TODO: Finish the function to return the dataloader for training
#       and testing dataset according to the example above
###################### YOUR CODE HERE ############################
def load_data_fashion_mnist(batch_size, resize = None, root = "./data/FashionMNIST"):
    trans = []
    # Add any needed transformations
    ### START CODE HERE ###
    trans.append("?")

    ### END CODE HERE ###
    transform = torchvision.transforms.Compose(trans)
    mnist_train = torchvision.datasets.FashionMNIST(root=root, train=True, download=True, transform=transform)
    mnist_test = torchvision.datasets.FashionMNIST(root=root, train=False, download=True, transform=transform)

    train_iter = "?"
    test_iter = "?"
    return train_iter, test_iter

##################################################################
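
The data-loading step above relies on `torch.utils.data.DataLoader` to batch and shuffle a dataset. The following is a minimal sketch of that mechanism, not the exercise solution: it uses a synthetic `TensorDataset` with Fashion-MNIST-like shapes as a stand-in (an assumption, so nothing is downloaded), and the batch size of 64 is an arbitrary illustrative choice.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Synthetic stand-in for Fashion-MNIST (assumption): 256 grayscale 28x28 images, 10 classes
images = torch.rand(256, 1, 28, 28)
labels = torch.randint(0, 10, (256,))
dataset = TensorDataset(images, labels)

# Shuffle the training data every epoch; keep the evaluation order fixed
train_loader = DataLoader(dataset, batch_size=64, shuffle=True)
test_loader = DataLoader(dataset, batch_size=64, shuffle=False)

# Each iteration yields one mini-batch of (features, labels)
X, y = next(iter(train_loader))
print(X.shape, y.shape)   # torch.Size([64, 1, 28, 28]) torch.Size([64])
print(len(train_loader))  # 4 batches per epoch (256 / 64)
```

The same `DataLoader` call should work with the `torchvision.datasets.FashionMNIST` objects created in the function, since they are also map-style datasets.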


### 2. Model Definition

In this section, we build two classification models to be trained on Fashion-MNIST:

- Multilayer Perceptron (28x28, 256, 10)
- CNN ([LeNet](https://ieeexplore.ieee.org/document/726791))

In [None]:
# 1. Multilayer perceptron (MLP)

# Define the model parameters size
num_inputs, num_outputs, num_hiddens = 28*28, 10, 256

# Firstly, define the FlattenLayer, which is used to flatten the input tensor
class FlattenLayer(nn.Module):
    def __init__(self):
        super(FlattenLayer, self).__init__()

    def forward(self, x): # x shape: (batch, *, *, ...)
        ##################################################################
        # TODO: Finish the forward function to flatten the input tensor
        ###################### YOUR CODE HERE ############################ 
        pass

        ##################################################################


# Define the multilayer perceptron model
class MLP(nn.Module):
    def __init__(self, num_inputs, num_outputs, num_hiddens):
        super(MLP, self).__init__()
        ##################################################################
        # TODO: Finish the __init__ function to define the model structure
        ###################### YOUR CODE HERE ############################
        flatten_layer = FlattenLayer()
        self.net = nn.Sequential(
            flatten_layer,
            "?",
            nn.ReLU(),
            "?",
        )
        ##################################################################

    def forward(self, x):
        ##################################################################
        # TODO: Finish the forward function to define the forward pass
        ###################### YOUR CODE HERE ############################
        pass

        ##################################################################
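
As a point of comparison for the MLP skeleton above, here is a hedged sketch of a (28x28, 256, 10) network built with PyTorch's built-in `nn.Flatten` instead of a hand-written `FlattenLayer`. The name `mlp_sketch` is hypothetical, and this is only one possible arrangement of the layers, not the required answer.

```python
import torch
import torch.nn as nn

# Layer sizes taken from the exercise: 28*28 inputs, 256 hidden units, 10 classes
num_inputs, num_hiddens, num_outputs = 28 * 28, 256, 10

mlp_sketch = nn.Sequential(
    nn.Flatten(),                         # (batch, 1, 28, 28) -> (batch, 784)
    nn.Linear(num_inputs, num_hiddens),   # input layer -> hidden layer
    nn.ReLU(),
    nn.Linear(num_hiddens, num_outputs),  # hidden layer -> raw class scores (logits)
)

x = torch.rand(4, 1, 28, 28)
flat = x.view(x.shape[0], -1)  # what a hand-written flatten layer could do
logits = mlp_sketch(x)
print(flat.shape, logits.shape)  # torch.Size([4, 784]) torch.Size([4, 10])

# Weights and biases of both linear layers
num_params = sum(p.numel() for p in mlp_sketch.parameters())
print(num_params)  # 203530
```

No softmax is applied at the end because `nn.CrossEntropyLoss` expects raw logits.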


In [None]:
# 2. CNN (LeNet)

# Define the LeNet model
class LeNet(nn.Module):
    def __init__(self):
        super(LeNet, self).__init__()
        ##################################################################
        # TODO: Finish the __init__ function to define the model structure
        ###################### YOUR CODE HERE ############################
        # Convolutional layer
        self.conv = nn.Sequential(
            nn.Conv2d(1, "?", "?"), # in_channels, out_channels, kernel_size
            "?",                    # activation function
            nn.MaxPool2d("?", "?"), # kernel_size, stride
            "?",                    # another Conv2d layer
            "?",                    # activation function
            "?",                    # another MaxPool2d layer
        )
        # Fully connected layer
        self.fc = nn.Sequential(
            "?",
            "?",
            "?",
        )
        ##################################################################

    def forward(self, img):
        ##################################################################
        # TODO: Finish the forward function to define the forward pass
        ###################### YOUR CODE HERE ############################
        pass

        ##################################################################
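
When filling in the LeNet skeleton, the main pitfall is the input size of the first fully connected layer, which depends on how the convolution and pooling layers shrink the 28x28 image. The sketch below traces the shapes for one LeNet-style configuration. The channel counts (6, 16), the 5x5 kernels, the ReLU activations, and the 120-unit hidden layer are assumptions chosen for illustration; the exercise does not fix them.

```python
import torch
import torch.nn as nn

# Hypothetical LeNet-style feature extractor (layer sizes are assumptions)
conv = nn.Sequential(
    nn.Conv2d(1, 6, 5),    # 1x28x28 -> 6x24x24 (no padding: 28 - 5 + 1 = 24)
    nn.ReLU(),
    nn.MaxPool2d(2, 2),    # 6x24x24 -> 6x12x12
    nn.Conv2d(6, 16, 5),   # 6x12x12 -> 16x8x8
    nn.ReLU(),
    nn.MaxPool2d(2, 2),    # 16x8x8 -> 16x4x4
)
fc = nn.Sequential(
    nn.Linear(16 * 4 * 4, 120),  # flattened feature map -> hidden layer
    nn.ReLU(),
    nn.Linear(120, 10),          # hidden layer -> 10 class logits
)

img = torch.rand(2, 1, 28, 28)
feature = conv(img)
output = fc(feature.view(img.shape[0], -1))  # flatten before the linear layers
print(feature.shape, output.shape)  # torch.Size([2, 16, 4, 4]) torch.Size([2, 10])
```

Passing a dummy batch through `conv` like this is a quick way to find the right `in_features` value for whatever layer sizes you choose.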


### 3. Loss Function and Optimizer

After defining the model, we need to choose a loss function and an optimizer for the training step.

#### Loss function

  - Cross Entropy Loss

#### Optimizer

  - SGD
  - AdaGrad
  - RMSProp
  - Adam
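
To make these choices concrete, the sketch below checks that `nn.CrossEntropyLoss` applied to raw logits matches a manual log-softmax followed by the mean negative log-likelihood, and then constructs each of the listed optimizers. The toy logits, the small `nn.Linear` model, and all learning rates are placeholder assumptions, not tuned recommendations.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy batch: 2 samples, 3 classes (illustrative values)
logits = torch.tensor([[2.0, 0.5, -1.0], [0.1, 0.2, 3.0]])
targets = torch.tensor([0, 2])

criterion = nn.CrossEntropyLoss()
loss = criterion(logits, targets)

# Same quantity by hand: log-softmax, pick the target class, average over the batch
log_probs = F.log_softmax(logits, dim=1)
manual = -log_probs[torch.arange(2), targets].mean()
print(torch.allclose(loss, manual))  # True

# The four optimizers from the list, all built from the same parameters
model = nn.Linear(3, 3)
optimizers = {
    "SGD": torch.optim.SGD(model.parameters(), lr=0.1),
    "AdaGrad": torch.optim.Adagrad(model.parameters(), lr=0.01),
    "RMSProp": torch.optim.RMSprop(model.parameters(), lr=0.001),
    "Adam": torch.optim.Adam(model.parameters(), lr=0.001),
}
print(list(optimizers))
```

In practice only one optimizer is attached to a model at a time; the dictionary is just a compact way to show the constructors side by side.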

In [None]:
# 3. Loss function and optimizer

# PyTorch provides built-in loss functions and optimizers,
# so we do not need to implement them ourselves.
# We simply call the API.

net = LeNet() # or net = MLP(num_inputs, num_outputs, num_hiddens)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(net.parameters(), lr=0.5) # or optimizer = torch.optim.Adam(net.parameters(), lr=0.001)


### 4. Training

Every component needed to train a classification model is now ready. The core of the project is the training script.

Put the pieces together and kick off the training process!
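
Before running the full script, it may help to see the core update loop in isolation. The following sketch repeats the forward pass, loss computation, gradient reset, backward pass, and parameter update on a single synthetic batch. The random data, the linear model, the learning rate, and the step count are all assumptions made to keep the example small.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Synthetic batch (assumption): 64 samples, 20 features, 10 classes
X = torch.randn(64, 20)
y = torch.randint(0, 10, (64,))

model = nn.Linear(20, 10)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

losses = []
for step in range(50):
    y_hat = model(X)            # forward pass
    loss = criterion(y_hat, y)  # mean loss over the batch
    optimizer.zero_grad()       # clear gradients from the previous step
    loss.backward()             # compute new gradients
    optimizer.step()            # update the parameters
    losses.append(loss.item())

print(f"first loss {losses[0]:.4f}, last loss {losses[-1]:.4f}")
```

The loss on this fixed batch should fall over the 50 steps, which is a useful sanity check that the loop is wired correctly before training on real data.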

In [None]:
# 4. Training script

def train(net, train_iter, test_iter, criterion, device, num_epochs, optimizer):
    # Send model to device
    net = net.to(device)
    print("training on ", device)

    # Start training
    for epoch in range(num_epochs):
        # Initialize variables
        train_l_sum, train_acc_sum, n, start = 0.0, 0.0, 0, time.time()
        # Iterate through the training dataset
        for X, y in train_iter:
            # Send data to device
            X, y = X.to(device), y.to(device)

            # Forward pass
            y_hat = net(X)

            # Compute loss
            loss = criterion(y_hat, y)

            # Clear gradients left over from the previous step
            optimizer.zero_grad()

            # Compute gradient
            loss.backward()

            # Update parameters
            optimizer.step()

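            # Accumulate the summed loss (criterion returns the batch mean,
            # so scale by the batch size before dividing by n later)
            train_l_sum += loss.item() * y.shape[0]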

            # Compute accuracy
            train_acc_sum += (y_hat.argmax(dim=1) == y).sum().item()

            # Update number of samples
            n += y.shape[0]

        # Compute test accuracy
        test_acc = evaluate_accuracy(test_iter, device, net)
        print(f'epoch {epoch + 1}, loss {train_l_sum / n:.4f}, train acc {train_acc_sum / n:.3f}, '
              f'test acc {test_acc:.3f}, time {time.time() - start:.1f} sec')

def evaluate_accuracy(data_iter, device, net):
    acc_sum, n = 0.0, 0
    with torch.no_grad():
        for X, y in data_iter:
            # Send data to device
            X, y = X.to(device), y.to(device)
            y_hat = net(X)
            acc_sum += (y_hat.argmax(dim=1) == y).sum().cpu().item()
            n += y.shape[0]
    return acc_sum / n

##################################################################
# TODO: Set the parameters and call the train function
###################### YOUR CODE HERE ############################
num_epochs = "?"
batch_size = "?"
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
train_iter, test_iter = load_data_fashion_mnist(batch_size)
train(net, train_iter, test_iter, criterion, device, num_epochs, optimizer)
##################################################################


### 5. Evaluation on Test Set

After training the model, the next step is to evaluate its performance on the test set. This will help us understand how well the model generalizes to unseen data. 

In this section, we will:

- Use the trained model to make predictions on the test set.
- Calculate the accuracy of the model on the test set.
- Visualize some of the test set predictions to get a qualitative sense of the model's performance.

By the end of this section, we will have a clear understanding of the model's performance on the test data.
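
The accuracy computation used here reduces to taking the `argmax` of the logits and comparing it with the labels. A tiny worked example with hand-picked values (illustrative only, not model output) shows the arithmetic:

```python
import torch

# Three samples, two classes; values chosen by hand for illustration
logits = torch.tensor([[2.0, 1.0], [0.0, 3.0], [1.0, 0.0]])
labels = torch.tensor([0, 1, 1])

preds = logits.argmax(dim=1)                  # predicted class per sample
num_correct = (preds == labels).sum().item()  # count of matching predictions
accuracy = num_correct / labels.shape[0]

print(preds.tolist(), num_correct, f"{accuracy:.3f}")  # [0, 1, 0] 2 0.667
```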



In [None]:
# 5. Evaluation on Test Set

def evaluate_model(net, test_iter, device):
    net = net.to(device)
    net.eval()  # Set the model to evaluation mode
    test_acc = evaluate_accuracy(test_iter, device, net)  # argument order: data_iter, device, net
    print(f'Test accuracy: {test_acc:.3f}')

    # Visualize some predictions
    X, y = next(iter(test_iter))
    X, y = X.to(device), y.to(device)
    y_hat = net(X).argmax(dim=1)
    X, y, y_hat = X.cpu(), y.cpu(), y_hat.cpu()

    # Display the first 10 images and their predicted labels
    show_fashion_mnist(X[:10], get_fashion_mnist_labels(y_hat[:10]))
    print('True labels:', get_fashion_mnist_labels(y[:10]))
    print('Predicted labels:', get_fashion_mnist_labels(y_hat[:10]))

##################################################################
# TODO: Set the parameters and call the evaluate_model function
###################### YOUR CODE HERE ############################
device = "?"
evaluate_model(net, test_iter, device)
##################################################################
