# **DLIP Tutorial - PyTorch**

## MNIST Classification using PyTorch
Y.-K. Kim
(updated 2024. 4. 29)



## For Colab Usage:

1. Download this notebook
2. Then, open in Colab

The purpose of this tutorial is to learn how to build a simple Multi-Layer Perceptron (MLP or ANN) for classifying handwritten digits (MNIST).

===================

## Set Up PyTorch, NumPy, and GPU

In [None]:
import torch 
import torch.nn as nn
import torch.nn.functional as F
import torchvision
import torchvision.transforms as transforms
from torch.utils.data import DataLoader
from torchvision import datasets
from torchvision.transforms import ToTensor

import numpy as np 
import matplotlib.pyplot as plt

print(torch.__version__)

In [None]:
# Select GPU or CPU for training.
device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"Using {device} device")

## Prepare Datasets
### OpenDataset from TorchVision


1. Load an open dataset (MNIST) from TorchVision
* ``Dataset`` stores the samples and their corresponding labels.
* ``DataLoader`` wraps an iterable around the ``Dataset``.


In [None]:
# Download the training data from TorchVision MNIST
# Once downloaded locally, the dataset is not downloaded again.
training_data = datasets.MNIST(
    root="data",
    train=True,
    download=True,
    transform=ToTensor(),   # converts pixel values from 0~255 to 0~1
)

# Download test data from open datasets.
test_data = datasets.MNIST(
    root="data",
    train=False,
    download=True,
    transform=ToTensor(),
)

2. Use ``DataLoader`` to make the dataset iterable.
* Supports automatic batching, sampling, shuffling, and multiprocess data loading.
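
To make automatic batching concrete, here is a small sketch of the arithmetic behind the number of batches a loader yields. It assumes the standard MNIST split sizes (60,000 training and 10,000 test images), a batch size of 64, and that an incomplete final batch is kept, which is the ``DataLoader`` default (``drop_last=False``). The helper name ``batch_layout`` is hypothetical.

```python
import math

def batch_layout(num_samples, batch_size):
    """Return (number of batches, size of the last batch) when the
    final, possibly smaller, batch is kept."""
    num_batches = math.ceil(num_samples / batch_size)
    last_batch = num_samples - (num_batches - 1) * batch_size
    return num_batches, last_batch

train_layout = batch_layout(60_000, 64)
test_layout = batch_layout(10_000, 64)
print(train_layout)  # (938, 32)
print(test_layout)   # (157, 16)
```

This is why ``len(dataloader)`` (the number of batches) and ``len(dataloader.dataset)`` (the number of samples) are different quantities.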



In [None]:
# Create DataLoader with Batch size N
batch_size = 64
train_dataloader = DataLoader(training_data, batch_size=batch_size, shuffle=True)
test_dataloader = DataLoader(test_data, batch_size=batch_size, shuffle=True)

for X, y in test_dataloader:
    print(f"Shape of X [N, C, H, W]: {X.shape}")
    print(f"Shape of y: {y.shape} {y.dtype}")
    break

3. Plot some training data


In [None]:
# Visualize some Datasets
dataiter = iter(train_dataloader)
images, labels = next(dataiter)

figure = plt.figure()
num_of_images = 9
for index in range(num_of_images):
    plt.subplot(3, 3, index+1)
    plt.axis('off')
    plt.title("Ground Truth: {}".format(labels[index]))
    plt.imshow(images[index].numpy().squeeze(), cmap='gray_r')

# Define model

Create a class that inherits from ``nn.Module``.


* Define the layers of the network in the ``__init__`` function.
* Specify the forward pass in the ``forward`` function.

![](https://github.com/bentrevett/pytorch-image-classification/blob/master/assets/mlp-mnist.png?raw=1)


* Image Input: 1x28x28  image
* Flatten into a 1x784 element vector
* 1st Layer: linear to 250 dimensions / ReLU
* 2nd Layer: linear to 100 dim / ReLU
* 3rd Layer: linear to 10 dim / log SoftMax
* Output:  1x10

Activation function: ReLU

**NOTE**


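1) ``nn.Linear(InputDim, OutputDim)``: a fully connected layer that maps InputDim features to OutputDim features


2) ``x.view()``
* Similar to NumPy ``reshape()``. It can flatten an image batch to [batch size, height * width]. The model below uses ``nn.Flatten()`` for the same purpose.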
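
As a rough illustration of the two notes above, the sketch below checks that ``x.view()`` and ``nn.Flatten()`` produce the same [batch size, 784] shape, and counts the parameters of three ``nn.Linear`` layers with the sizes used in this tutorial. The dummy batch of zeros and the name ``layers`` are assumptions made for this example only.

```python
import torch
import torch.nn as nn

# Dummy batch shaped like MNIST input: [batch size, channels, height, width]
x = torch.zeros(64, 1, 28, 28)

flat_view = x.view(x.size(0), -1)   # reshape, keeping the batch dimension
flat_module = nn.Flatten()(x)       # flattens all dimensions after the batch
print(flat_view.shape, flat_module.shape)

# nn.Linear(in, out) holds an [out, in] weight matrix plus an [out] bias
layers = [nn.Linear(28 * 28, 250), nn.Linear(250, 100), nn.Linear(100, 10)]
num_params = sum(p.numel() for layer in layers for p in layer.parameters())
print(num_params)  # 784*250+250 + 250*100+100 + 100*10+10 = 222360
```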

In [None]:
# Model Architecture
class MLP(nn.Module):
    def __init__(self):
        super(MLP, self).__init__()
        self.flatten = nn.Flatten()
        self.linear1 = nn.Linear(28*28, 250)
        self.linear2 = nn.Linear(250, 100)
        self.linear3 = nn.Linear(100, 10)


    def forward(self, x):
        x = self.flatten(x)
        x = F.relu(self.linear1(x))
        x = F.relu(self.linear2(x))
        # dim=1 normalizes over the class dimension (omitting dim is deprecated)
        y_pred = F.log_softmax(self.linear3(x), dim=1)
        return y_pred


model = MLP().to(device)
print(model)

## Weight Initialization
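In Keras, dense layers use the “glorot_uniform” random initializer by default, which is also called the Xavier uniform initializer. In PyTorch, ``nn.Linear`` defaults to a Kaiming-uniform scheme instead, so Xavier initialization has to be applied explicitly if it is wanted.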
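
If Glorot/Xavier initialization is wanted in PyTorch, one possible approach is to apply ``nn.init.xavier_uniform_`` to each linear layer. This is a sketch and not part of the original tutorial: the helper ``init_xavier`` and the stand-in ``nn.Sequential`` network are hypothetical, and zeroing the biases is an assumption that mirrors the Keras default.

```python
import math
import torch
import torch.nn as nn

def init_xavier(module):
    # Hypothetical helper: re-initialize only the linear layers
    if isinstance(module, nn.Linear):
        nn.init.xavier_uniform_(module.weight)  # U(-a, a), a = sqrt(6 / (fan_in + fan_out))
        nn.init.zeros_(module.bias)

# Stand-in network with the same layer sizes as the MLP in this tutorial
net = nn.Sequential(
    nn.Flatten(),
    nn.Linear(28 * 28, 250), nn.ReLU(),
    nn.Linear(250, 100), nn.ReLU(),
    nn.Linear(100, 10),
)
net.apply(init_xavier)  # apply() visits every submodule recursively

bound = math.sqrt(6.0 / (28 * 28 + 250))
max_weight = net[1].weight.abs().max().item()
print(f"expected bound: {bound:.4f}, max |w|: {max_weight:.4f}")
```

The same ``apply()`` call could be used on the ``MLP`` class, since it also stores its layers as ``nn.Linear`` submodules.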

# Optimization Setup  

### Optimizer function
Gradient descent is the most common optimisation strategy used in neural networks. Many variants and more advanced optimisers are available, for example:

- Stochastic Gradient Descent (SGD), Adagrad, Adam, etc.
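
The core update behind all of these optimizers is the plain gradient-descent step, w ← w − lr · ∂L/∂w. The toy sketch below applies it to a one-parameter loss, L(w) = (w − 3)^2. The loss, learning rate, and step count are arbitrary choices for illustration and are unrelated to the MNIST settings used later.

```python
def grad(w):
    # Derivative of L(w) = (w - 3)^2
    return 2.0 * (w - 3.0)

w = 0.0      # initial parameter value
lr = 0.1     # learning rate
for step in range(100):
    w = w - lr * grad(w)   # gradient-descent update

print(w)  # close to the minimizer w = 3
```

In PyTorch, ``loss.backward()`` computes the gradients and ``optimizer.step()`` performs an update of this kind for every model parameter.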

### Loss function

1. Linear regression -> Mean Squared Error
2. Classification -> Cross entropy
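
For a single sample, cross entropy is the negative log-probability assigned to the true class. The sketch below computes it from raw scores in plain Python using a numerically stable log-softmax; the function names are hypothetical. It also illustrates that applying log-softmax to values that are already log-probabilities leaves them unchanged, which is why a model ending in log-softmax can still be paired with ``nn.CrossEntropyLoss`` (which applies log-softmax internally).

```python
import math

def log_softmax(scores):
    # Subtract the maximum first so that exp() cannot overflow
    m = max(scores)
    log_sum_exp = m + math.log(sum(math.exp(s - m) for s in scores))
    return [s - log_sum_exp for s in scores]

def cross_entropy(scores, target):
    # Negative log-probability of the true class
    return -log_softmax(scores)[target]

scores = [2.0, 0.5, -1.0]
log_probs = log_softmax(scores)
loss = cross_entropy(scores, target=0)
loss_twice = cross_entropy(log_probs, target=0)      # log-softmax applied twice
uniform_loss = cross_entropy([0.0] * 10, target=3)   # equals log(10)
print(loss, loss_twice, uniform_loss)
```

The last value shows a useful reference point: a 10-class model that guesses uniformly has a loss of log(10), roughly 2.30.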

In [None]:
# Loss Function
loss_fn = nn.CrossEntropyLoss()

# Optimizer
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)

# Train the model
### Define train() function
This function can be reused in other tutorials.


In [None]:
# Train Module
def train(dataloader, model, loss_fn, optimizer):
    # Dataset Size
    size = len(dataloader.dataset)
    
    # Model in Training Mode
    model.train()

    running_loss=0.0

    for batch, (X, y) in enumerate(dataloader):
        X, y = X.to(device), y.to(device)

        # zero gradients for every batch
        optimizer.zero_grad()   

        # Compute prediction loss 
        pred = model(X)
        loss = loss_fn(pred, y)

        # Backpropagation and Update        
        loss.backward()
        optimizer.step()        

        # Print the average loss over every 100 batches in an epoch
        running_loss += loss.item()
        if (batch + 1) % 100 == 0:
            avg_loss = running_loss / 100
            current = (batch + 1) * len(X)
            print(f"loss: {avg_loss:>7f}  [{current:>5d}/{size:>5d}]")
            running_loss = 0.0

### Train
Print training process

In [None]:
epochs = 2
for t in range(epochs):
    print(f"Epoch {t+1}\n-------------------------------")
    train(train_dataloader, model, loss_fn, optimizer)
print("Done!")

# Test the model  ``eval()``

### Define **test()** function
Use ``eval()`` to switch the model to evaluation mode for testing.
This function can be reused in other tutorials.
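
The accuracy computation used for testing can be sketched on a hand-made batch: take the ``argmax`` over the class dimension and compare it with the labels. The scores and labels below are made up for illustration, and the tiny ``nn.Linear`` layer is only a stand-in to show that outputs computed under ``torch.no_grad()`` do not track gradients.

```python
import torch
import torch.nn as nn

# Made-up class scores for 4 samples and 3 classes, with made-up labels
pred = torch.tensor([[0.1, 2.0, 0.3],
                     [1.5, 0.2, 0.1],
                     [0.0, 0.1, 3.0],
                     [0.9, 0.8, 0.7]])
y = torch.tensor([1, 0, 2, 1])

y_pred = pred.argmax(1)                    # index of the largest score per row
num_correct = (y_pred == y).sum().item()   # 3 of the 4 predictions match
accuracy = num_correct / len(y)
print(f"Accuracy: {100 * accuracy:.1f}%")  # Accuracy: 75.0%

# Inside no_grad(), outputs are not attached to the autograd graph
layer = nn.Linear(3, 2)
with torch.no_grad():
    out = layer(pred)
print(out.requires_grad)  # False
```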

In [None]:
def test(dataloader, model, loss_fn):
    # Dataset Size
    size = len(dataloader.dataset)

    # Number of batches
    num_batches = len(dataloader)
    
    # Model in Evaluation Mode
    model.eval()

    test_loss, correctN = 0, 0
    
    # Disable gradient computation to reduce memory consumption.
    with torch.no_grad():
        for X, y in dataloader:
            X, y = X.to(device), y.to(device)

            # Accumulate the prediction loss of each batch
            pred = model(X)
            test_loss += loss_fn(pred, y).item()

            # Predict the label and count correct predictions
            y_pred = pred.argmax(1)
            correctN += (y_pred == y).type(torch.float).sum().item()
            
    test_loss /= num_batches
    correctN /= size
    print(f"Test Error: \n Accuracy: {(100*correctN):>0.1f}%, Avg loss: {test_loss:>8f} \n")



### Test
Print test data accuracy

In [None]:
test(test_dataloader, model, loss_fn)

### Visualize Evaluation Results

Select random test images and evaluate

In [None]:
# Get a random batch of test images (batch_size images at a time)
dataiter = iter(test_dataloader)
images, labels = next(dataiter)
print(images.size())

# Predict the labels of the sample images without tracking gradients
images, labels = images.to(device), labels.to(device)
with torch.no_grad():
    pred = model(images)
    _, predicted = torch.max(pred, 1)


Plot some test image results

In [None]:
# Plot 
figure = plt.figure()
num_of_images = 9
for index in range(num_of_images):
    plt.subplot(3, 3, index+1)
    plt.axis('off')
    plt.title("Predicted: {}".format(predicted[index].item()))
    plt.imshow(images[index].cpu().numpy().squeeze(), cmap='gray_r')

### Saving Models
(Option 1) Save the entire model (structure and weights)
* Saves the structure of this class together with the model weights

In [None]:
torch.save(model,"MNIST_model.pth")

(Option 2) Save only the model weights as a state dictionary

In [None]:
torch.save(model.state_dict(), "MNIST_model2.pth")
print("Saved PyTorch Model State")

### Load the pretrained model

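(Option 1) Load an entire model, including its structure

In [None]:
# Loading a full pickled model needs weights_only=False
# (the default is True from PyTorch 2.6). Only load files you trust.
model = torch.load("MNIST_model.pth", weights_only=False)
model.eval()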

(Option 2) Load a state dictionary: re-create the model structure, then load the state dictionary into it.

* Need to `import` or define the Network Structure
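
A minimal round trip through a state dictionary might look like the following. To avoid touching the disk, this sketch saves to an in-memory ``io.BytesIO`` buffer instead of a ``.pth`` file, and it uses a hypothetical ``TinyNet`` class in place of the MLP. The pattern of re-creating the structure and then calling ``load_state_dict()`` is the same.

```python
import io
import torch
import torch.nn as nn

class TinyNet(nn.Module):
    # Hypothetical stand-in for the MLP class
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(4, 2)

    def forward(self, x):
        return self.linear(x)

source = TinyNet()
buffer = io.BytesIO()
torch.save(source.state_dict(), buffer)   # weights only, no class definition
buffer.seek(0)

restored = TinyNet()                      # structure must be re-created first
restored.load_state_dict(torch.load(buffer))
restored.eval()

x = torch.ones(1, 4)
with torch.no_grad():
    same = torch.equal(source(x), restored(x))
print(same)  # True
```

Because only tensors are stored, the file does not depend on how the class was pickled, but the class definition must be available when loading.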

In [None]:
class MLP(nn.Module):
    def __init__(self):
        super(MLP, self).__init__()
        self.flatten = nn.Flatten()
        self.linear1 = nn.Linear(28*28, 250)
        self.linear2 = nn.Linear(250, 100)
        self.linear3 = nn.Linear(100, 10)

        
    def forward(self, x):
        x = self.flatten(x)
        x = F.relu(self.linear1(x))
        x = F.relu(self.linear2(x))
        # dim=1 normalizes over the class dimension (omitting dim is deprecated)
        y_pred = F.log_softmax(self.linear3(x), dim=1)
        return y_pred

In [None]:
model2 = MLP().to(device)
print(model2)
# map_location lets weights saved on a GPU load on a CPU-only machine
model2.load_state_dict(torch.load('MNIST_model2.pth', map_location=device))
model2.eval()

### Test 
Print test data accuracy 

In [None]:
test(test_dataloader, model, loss_fn)

### Visualize test results

Select random test images and evaluate

In [None]:
# Get a random batch of test images (batch_size images at a time)
dataiter = iter(test_dataloader)
images, labels = next(dataiter)
print(images.size())

# Predict the labels of the sample images without tracking gradients
images, labels = images.to(device), labels.to(device)
with torch.no_grad():
    pred = model(images)
    _, predicted = torch.max(pred, 1)

Plot some test image results

In [None]:
figure = plt.figure()
num_of_images = 9
for index in range(num_of_images):
    plt.subplot(3, 3, index+1)
    plt.axis('off')    
    plt.title("Predicted: {}".format(predicted[index].item()))
    plt.imshow(images[index].cpu().numpy().squeeze(), cmap='gray_r')



---



# Exercise


## Exercise 1
Change the activation functions and optimizer types to improve the results.

## Exercise 2
Rewrite the above MLP model to match the following figure.

![](https://raw.githubusercontent.com/dmlc/web-data/master/mxnet/image/mlp_mnist.png)

