# Training an Image Classifier on the MNIST Dataset

Two sentence description of the MNIST dataset.

Run the cell below to import the necessary modules and libraries.

In [1]:
import torch
import torch.nn as nn
import torch.optim as optim
import torchvision.datasets as datasets
import torchvision.transforms as transforms
from torch.utils.data import random_split

  from .autonotebook import tqdm as notebook_tqdm


First, let's create our classifier. 
- Create a class called `ImageClassifier` that inherits from `torch.nn.Module`.
- Make a simple two-layer network inside the class constructor. The input linear layer should have input size appropriate to a 28x28 pixel image, and an output size of 128.
- The output linear layer should have an output size of 10, reflecting the number of classes in the `MNIST` dataset.
- The two linear layers should be connected by an activation layer.
- Don't forget to add inheritance from `nn.Module` by calling the `super` constructor.
- Create the `forward` method.

In [2]:
# Define the model
class SimpleClassifier(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(28 * 28, 128)
        self.fc2 = nn.Linear(128, 10)

    def forward(self, x):
        x = x.view(-1, 28 * 28)
        x = torch.relu(self.fc1(x))
        x = self.fc2(x)
        return x

Next we create our image transform, and load the dataset. We can quickly load the dataset from the `torchvision.datasets` module as follows:

In [3]:
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.5,), (0.5,))
])

mnist_dataset = datasets.MNIST(root='mnist', train=True, download=True, transform=transform)


Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz to mnist/MNIST/raw/train-images-idx3-ubyte.gz


100%|██████████| 9912422/9912422 [00:00<00:00, 13358014.44it/s]


Extracting mnist/MNIST/raw/train-images-idx3-ubyte.gz to mnist/MNIST/raw

Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz to mnist/MNIST/raw/train-labels-idx1-ubyte.gz


100%|██████████| 28881/28881 [00:00<00:00, 50663192.73it/s]


Extracting mnist/MNIST/raw/train-labels-idx1-ubyte.gz to mnist/MNIST/raw

Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz to mnist/MNIST/raw/t10k-images-idx3-ubyte.gz


100%|██████████| 1648877/1648877 [00:00<00:00, 11738642.05it/s]


Extracting mnist/MNIST/raw/t10k-images-idx3-ubyte.gz to mnist/MNIST/raw

Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz to mnist/MNIST/raw/t10k-labels-idx1-ubyte.gz


100%|██████████| 4542/4542 [00:00<00:00, 9937678.02it/s]

Extracting mnist/MNIST/raw/t10k-labels-idx1-ubyte.gz to mnist/MNIST/raw






Now we need to perform a split on the data so that we can train our model and evaluate it. 

- Split the dataset into a training set comprising 80% of the data, and a test set comprising 20% of the data. Call these subsets `train_set` and `test_set`.
- Assign each split to its own dataloader, called `train_loader` and `test_loader` respectively. Set `shuffle=True` for the train loader.

In [7]:
train_set_len = round(0.8*len(mnist_dataset))
test_set_len = len(mnist_dataset) - train_set_len
split_lengths = [train_set_len, test_set_len]

train_set, test_set = random_split(mnist_dataset, split_lengths)

train_loader = torch.utils.data.DataLoader(mnist_dataset, batch_size=4, shuffle=True)
test_loader= torch.utils.data.DataLoader(mnist_dataset, batch_size=4)

To get everything ready for training, we need to initialise the model, an optimiser and a criterion. In the code block below, initialise an instance of your model class, as well as an optimiser for Stochastic Gradient Descent (SGD), and an appropriate loss criterion.

In [8]:
model = SimpleClassifier()
optimizer = optim.SGD(model.parameters(), lr=0.01)
criterion = nn.CrossEntropyLoss()


Create the training loop inside a function called `train`.

In [9]:
# Train the model

for epoch in range(10):
    running_loss = 0.0
    for images, labels in test_loader:
        optimizer.zero_grad()
        outputs = model(images)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
        running_loss += loss.item()
    print(running_loss)
    print(f'Epoch [{epoch + 1}/10], Loss: {running_loss / len(test_loader)}')

print('Finished Training')

4699.049871561532
Epoch [1/10], Loss: 0.3132699914374355
2282.977047286142
Epoch [2/10], Loss: 0.15219846981907612
1677.8816373680793
Epoch [3/10], Loss: 0.11185877582453863
1343.5166016862522
Epoch [4/10], Loss: 0.08956777344575015
1126.6610436426304
Epoch [5/10], Loss: 0.07511073624284202
968.6108782132914
Epoch [6/10], Loss: 0.06457405854755276
834.7787065694606
Epoch [7/10], Loss: 0.05565191377129738
720.5452429215834
Epoch [8/10], Loss: 0.04803634952810556
635.5010533209435
Epoch [9/10], Loss: 0.042366736888062896
556.8305372810122
Epoch [10/10], Loss: 0.03712203581873415
Finished Training


Now let's see how the model performs on an example from the testing set. 

In [19]:
import numpy as np

features,label=test_set[1]
model.eval()
logits=model(features)

softmax=torch.nn.Softmax()
prediction=np.argmax(softmax(logits).detach().numpy())

print('predicted label:')
print(prediction)
print('real label')
print(label)



predicted label:
5
real label
5


  prediction=np.argmax(softmax(logits).detach().numpy())
