# MNIST with SciKit-Learn and skorch



**Source**  https://colab.research.google.com/github/skorch-dev/skorch/blob/master/notebooks/MNIST.ipynb

**Note** In Google Colab, enable GPU using

> **Runtime**   →   **Change runtime type**   →   **Hardware Accelerator: GPU**

In [None]:
! [ ! -z "$COLAB_GPU" ] && pip install torch scikit-learn==0.20.* skorch

In [None]:
from sklearn.datasets import fetch_openml
from sklearn.model_selection import train_test_split
import numpy as np
import matplotlib.pyplot as plt

## Loading Data
We use scikit-learn's `fetch_openml` to load the MNIST data. The `784` in the dataset name `mnist_784` is the number of pixels per image (28 x 28 = 784).

In [None]:
mnist = fetch_openml('mnist_784', cache=False)

In [None]:
mnist.data.shape

## Preprocessing Data

Each image of the MNIST dataset is encoded in a 784-dimensional vector, representing a 28 x 28 pixel image. Each pixel has a value between 0 (white) and 255 (black), corresponding to the grey value of the pixel.

The `fetch_openml` method used above returns `data` as floating-point values and `target` as strings, which we convert to `float32` and `int64`, respectively.
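
To make the conversion concrete without downloading MNIST, here is a small sketch on made-up stand-in arrays. It assumes, as described above, floating-point pixel data and string labels; `fake_data` and `fake_target` are hypothetical names, not part of the notebook.

```python
import numpy as np

# Hypothetical stand-ins for mnist.data and mnist.target:
# two flattened 28 x 28 images and their labels as strings.
fake_data = np.zeros((2, 784), dtype=np.float64)
fake_data[0, 100] = 255.0  # one "black" pixel in the first image
fake_target = np.array(['5', '0'], dtype=object)

X_demo = fake_data.astype('float32')  # PyTorch layers use float32 weights by default
y_demo = fake_target.astype('int64')  # integer class indices for the loss

print(X_demo.dtype, y_demo.dtype)  # float32 int64
print(y_demo)                      # [5 0]

# Each row is one image; reshaping a row recovers the 28 x 28 grid.
# Flat index 100 is row 3, column 16, because 100 = 3 * 28 + 16.
print(X_demo[0].reshape(28, 28)[3, 16])  # 255.0
```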

In [None]:
XMlp = mnist.data.astype('float32')
y = mnist.target.astype('int64')

Raw pixel values in [0, 255] tend to produce large activations and gradients, so we scale `XMlp` down to the commonly used range [0, 1].

In [None]:
XMlp /= 255.0

In [None]:
XMlp.min(), XMlp.max()

Note: the data is only scaled to [0, 1]; it is not normalized (standardized to zero mean and unit variance).
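
The difference between the scaling done here and standardization can be seen on random stand-in data. This is a minimal NumPy sketch; `X_raw` is a hypothetical array of fake pixel values, and the standardization branch is shown only for contrast, since the notebook does not apply it.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical raw pixel values in [0, 255], standing in for XMlp before scaling.
X_raw = rng.integers(0, 256, size=(100, 784)).astype(np.float64)

# Scaling (what the notebook does): divide by the largest possible pixel value.
X_scaled = X_raw / 255.0

# Standardization (not done in this notebook): zero mean, unit variance per pixel.
mean = X_raw.mean(axis=0)
std = X_raw.std(axis=0) + 1e-8  # small epsilon guards against constant pixels
X_standardized = (X_raw - mean) / std

print(X_scaled.min() >= 0.0, X_scaled.max() <= 1.0)             # True True
print(np.allclose(X_standardized.mean(axis=0), 0.0, atol=1e-8))  # True
```

Scaling keeps every value in a fixed, known range, whereas standardization centres each pixel on its mean over the dataset.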

In [None]:
XMlp_train, XMlp_test, y_train, y_test = train_test_split(XMlp, y, test_size=0.25, random_state=42)

In [None]:
assert XMlp_train.shape[0] + XMlp_test.shape[0] == mnist.data.shape[0]

In [None]:
XMlp_train.shape, y_train.shape

### Print a selection of training images and their labels

In [None]:
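def plot_example(X, y):
    """Plot the first 5 images and their labels in a row."""
    # reshape(-1, 28, 28) accepts both flat (n, 784) and CNN-shaped (n, 1, 28, 28) input
    images = X[:5].reshape(-1, 28, 28)
    # use `label` so the loop variable does not shadow the argument `y`
    for i, (img, label) in enumerate(zip(images, y[:5])):
        plt.subplot(1, 5, i + 1)
        plt.imshow(img)
        plt.xticks([])
        plt.yticks([])
        plt.title(label)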

In [None]:
plot_example(XMlp_train, y_train)

## Build Neural Network with PyTorch
A simple, fully connected neural network with one hidden layer. The input layer has 784 dimensions (28 x 28), the hidden layer has 98 neurons (784 / 8, see `hidden_dim` below), and the output layer has 10 neurons, representing the digits 0-9.
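
As a sanity check on the size of this network, its trainable parameters can be counted by hand. This sketch assumes each layer is a fully connected layer with a bias, as `nn.Linear` is by default; `linear_params` is a hypothetical helper. Dropout and ReLU add no parameters.

```python
# Layer sizes from the text: 784 inputs, 98 hidden units, 10 outputs.
n_input, n_hidden, n_output = 784, 98, 10

def linear_params(n_in, n_out):
    # A fully connected layer holds an (n_out x n_in) weight matrix plus n_out biases.
    return n_in * n_out + n_out

hidden_params = linear_params(n_input, n_hidden)   # 76930
output_params = linear_params(n_hidden, n_output)  # 990
print(hidden_params + output_params)               # 77920
```

With PyTorch available, `sum(p.numel() for p in ClassifierModule().parameters())` should give the same total.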

In [None]:
import torch
from torch import nn
import torch.nn.functional as F

In [None]:
device = 'cuda' if torch.cuda.is_available() else 'cpu'

In [None]:
mnist_dim = XMlp.shape[1]
hidden_dim = int(mnist_dim/8)
output_dim = len(np.unique(mnist.target))

In [None]:
mnist_dim, hidden_dim, output_dim

The network, defined as a PyTorch `nn.Module`:

In [None]:
class ClassifierModule(nn.Module):
    def __init__(
            self,
            input_dim=mnist_dim,
            hidden_dim=hidden_dim,
            output_dim=output_dim,
            dropout=0.5,
    ):
        super(ClassifierModule, self).__init__()
        self.dropout = nn.Dropout(dropout)

        self.hidden = nn.Linear(input_dim, hidden_dim)
        self.output = nn.Linear(hidden_dim, output_dim)

    def forward(self, X, **kwargs):
        X = F.relu(self.hidden(X))
        X = self.dropout(X)
        X = F.softmax(self.output(X), dim=-1)
        return X
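
To show what `forward` computes, here is a NumPy re-implementation of the same steps for a small batch. The weights are hypothetical random values, not trained ones, and dropout is omitted, which matches evaluation mode, where `nn.Dropout` passes its input through unchanged. The final softmax turns each output row into a probability distribution over the 10 digits.

```python
import numpy as np

rng = np.random.default_rng(0)
n_input, n_hidden, n_output = 784, 98, 10

# Hypothetical random parameters; like nn.Linear, weights are stored as (out, in).
W1, b1 = rng.normal(0, 0.01, (n_hidden, n_input)), np.zeros(n_hidden)
W2, b2 = rng.normal(0, 0.01, (n_output, n_hidden)), np.zeros(n_output)

def softmax(z):
    # Subtracting the row maximum avoids overflow and leaves the result unchanged.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def forward(X):
    h = np.maximum(X @ W1.T + b1, 0.0)  # hidden layer followed by ReLU
    # (during training, dropout would be applied to h at this point)
    return softmax(h @ W2.T + b2)       # output layer followed by softmax

X_batch = rng.random((3, n_input))  # three fake images with values in [0, 1)
probs = forward(X_batch)
print(probs.shape)           # (3, 10)
print(probs.sum(axis=1))     # each row sums to 1, up to rounding
print(probs.argmax(axis=1))  # predicted digit per image
```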

skorch lets us use PyTorch networks in the scikit-learn setting:

In [None]:
from skorch import NeuralNetClassifier

In [None]:
torch.manual_seed(0)

mlp = NeuralNetClassifier(
    ClassifierModule,
    max_epochs=20,
    lr=0.1,
    device=device,
)

In [None]:
mlp.fit(XMlp_train, y_train);

## Prediction

In [None]:
from sklearn.metrics import accuracy_score

In [None]:
y_pred_mlp = mlp.predict(XMlp_test)

In [None]:
accuracy_score(y_test, y_pred_mlp)

An accuracy of about 96% for a network with only one hidden layer is not too bad.

Let's take a look at some predictions that went wrong:

In [None]:
error_mask = y_pred_mlp != y_test

In [None]:
plot_example(XMlp_test[error_mask], y_pred_mlp[error_mask])
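
`accuracy_score` is the fraction of matching labels, and `error_mask` is a boolean array that selects the misclassified rows. A minimal NumPy sketch with made-up labels and images (`y_true_demo`, `y_pred_demo`, and `X_demo` are hypothetical):

```python
import numpy as np

y_true_demo = np.array([7, 2, 1, 0, 4, 1, 4, 9])
y_pred_demo = np.array([7, 2, 1, 0, 9, 1, 4, 8])
X_demo = np.arange(8 * 784, dtype='float32').reshape(8, 784)  # fake images

# Accuracy: the mean of element-wise equality.
accuracy = np.mean(y_true_demo == y_pred_demo)
print(accuracy)  # 0.75

# Boolean mask that is True exactly where the prediction is wrong.
error_mask_demo = y_pred_demo != y_true_demo
print(np.flatnonzero(error_mask_demo))  # [4 7]
print(X_demo[error_mask_demo].shape)    # (2, 784)
print(y_pred_demo[error_mask_demo])     # [9 8]
```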

## Convolutional Network
PyTorch expects a 4-dimensional tensor as input for its 2D convolution layer (`nn.Conv2d`). The dimensions represent:
* Batch size
* Number of channels
* Height
* Width

The first (batch) dimension is the number of examples; passing `-1` to `reshape` lets NumPy infer it. MNIST data has only one channel, and, as stated above, each MNIST vector represents a 28 x 28 pixel image. Hence, the input for PyTorch needs the shape (n, 1, 28, 28), where n is the number of examples.
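
A small NumPy sketch of this reshape on fake data (`n` and `X_flat` are hypothetical) shows the inferred batch dimension and the row-major order in which the 784 values fill the 28 x 28 grid:

```python
import numpy as np

n = 3  # hypothetical number of examples
X_flat = np.arange(n * 784, dtype='float32').reshape(n, 784)

# -1 lets NumPy infer the batch dimension from the total number of values.
X_img = X_flat.reshape(-1, 1, 28, 28)
print(X_img.shape)  # (3, 1, 28, 28)

# Reshaping is row-major: pixel (row, col) of image i comes from flat index row * 28 + col.
i, row, col = 1, 5, 10
assert X_img[i, 0, row, col] == X_flat[i, row * 28 + col]
```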

In [None]:
XCnn = XMlp.reshape(-1, 1, 28, 28)

In [None]:
XCnn.shape

In [None]:
XCnn_train, XCnn_test, y_train, y_test = train_test_split(XCnn, y, test_size=0.25, random_state=42)

In [None]:
XCnn_train.shape, y_train.shape

In [None]:
class Cnn(nn.Module):
    def __init__(self, dropout=0.5):
        super(Cnn, self).__init__()
        self.conv1 = nn.Conv2d(1, 16, kernel_size=3)
        self.conv2 = nn.Conv2d(16, 64, kernel_size=3)
        self.conv2_drop = nn.Dropout2d(p=dropout)
        self.fc1 = nn.Linear(1600, 100) # 1600 = number channels * width * height
        self.fc2 = nn.Linear(100, 10)
        self.fc1_drop = nn.Dropout(p=dropout)

    def forward(self, x):
        x = torch.relu(F.max_pool2d(self.conv1(x), 2))
        x = torch.relu(F.max_pool2d(self.conv2_drop(self.conv2(x)), 2))
        
        # flatten over channel, height and width = 1600
        x = x.view(-1, x.size(1) * x.size(2) * x.size(3))
        
        x = torch.relu(self.fc1_drop(self.fc1(x)))
        x = torch.softmax(self.fc2(x), dim=-1)
        return x
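
The 1600 in `fc1` follows from how each layer shrinks the 28 x 28 input. Assuming stride 1 and no padding for the convolutions (the `nn.Conv2d` defaults) and 2 x 2 max pooling with stride 2 (`F.max_pool2d` uses the kernel size as stride by default), the sizes can be traced in plain Python; `conv_out` and `pool_out` are hypothetical helpers:

```python
def conv_out(size, kernel, stride=1, padding=0):
    # Output size of a convolution along one spatial dimension.
    return (size + 2 * padding - kernel) // stride + 1

def pool_out(size, kernel=2):
    # Max pooling with stride equal to the kernel size; odd sizes are floored.
    return (size - kernel) // kernel + 1

size = 28
size = pool_out(conv_out(size, kernel=3))  # conv1: 28 -> 26, pool: 26 -> 13
size = pool_out(conv_out(size, kernel=3))  # conv2: 13 -> 11, pool: 11 -> 5
flat = 64 * size * size                    # conv2 outputs 64 channels
print(size, flat)  # 5 1600
```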

In [None]:
torch.manual_seed(0)

cnn = NeuralNetClassifier(
    Cnn,
    max_epochs=10,
    lr=0.002,
    optimizer=torch.optim.Adam,
    device=device,
)

In [None]:
cnn.fit(XCnn_train, y_train);

In [None]:
y_pred_cnn = cnn.predict(XCnn_test)

In [None]:
accuracy_score(y_test, y_pred_cnn)

An accuracy of >98% should suffice for this example!

Let's see how we fare on the examples that went wrong before:

In [None]:
accuracy_score(y_test[error_mask], y_pred_cnn[error_mask])

Over 70% of the previously misclassified images are now correctly identified.
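
The number above is the CNN's accuracy restricted to the MLP's mistakes. A small sketch with made-up predictions (all names hypothetical) shows the computation:

```python
import numpy as np

y_true = np.array([3, 5, 8, 1, 9, 6, 2, 7])
y_pred_a = np.array([3, 6, 8, 7, 9, 5, 2, 1])  # first model: wrong at 1, 3, 5, 7
y_pred_b = np.array([3, 5, 8, 1, 4, 5, 2, 7])  # second model

mask_a = y_pred_a != y_true
# Accuracy of model B only on the examples model A got wrong.
fixed_share = np.mean(y_pred_b[mask_a] == y_true[mask_a])
print(mask_a.sum(), fixed_share)  # 4 0.75
```

This measure ignores new mistakes made by the second model (index 4 here), so it complements, rather than replaces, the overall accuracy.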

In [None]:
error_mask = y_pred_cnn != y_test

In [None]:
plot_example(XCnn_test[error_mask], y_pred_cnn[error_mask])

A variant with three convolutional layers and smaller 2 x 2 kernels:

In [None]:
class Cnn2(nn.Module):
    def __init__(self, dropout=0.5):
        super(Cnn2, self).__init__()
        self.conv1 = nn.Conv2d(1, 16, kernel_size=2)
        self.conv2 = nn.Conv2d(16, 32, kernel_size=2)
        self.conv_drop = nn.Dropout2d(p=dropout)
        self.conv3 = nn.Conv2d(32, 64, kernel_size=2)
        self.fc1 = nn.Linear(256, 100) # 256 = number channels * width * height
        self.fc2 = nn.Linear(100, 10)
        self.fc1_drop = nn.Dropout(p=dropout)

    def forward(self, x):
        x = torch.relu(F.max_pool2d(self.conv1(x), 2))
        x = torch.relu(F.max_pool2d(self.conv_drop(self.conv2(x)), 2))
        x = torch.relu(F.max_pool2d(self.conv_drop(self.conv3(x)), 2))
        
        # flatten over channel, height and width = 256
        x = x.view(-1, x.size(1) * x.size(2) * x.size(3))
        
        x = torch.relu(self.fc1_drop(self.fc1(x)))
        x = torch.softmax(self.fc2(x), dim=-1)
        return x
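
For `Cnn2`, the 256 in `fc1` comes from three conv/pool stages with 2 x 2 kernels. Assuming stride-1, unpadded convolutions and 2 x 2 max pooling with stride 2 (the defaults for the calls used in the class), a layer-by-layer trace in plain Python; `layers` is a hypothetical summary of the class:

```python
# (out_channels, kernel_size) for conv1, conv2, conv3 in Cnn2.
layers = [(16, 2), (32, 2), (64, 2)]

size, channels = 28, 1
for out_channels, kernel in layers:
    conv = size - kernel + 1  # stride-1 convolution without padding
    size = conv // 2          # 2 x 2 max pooling with stride 2 floors odd sizes
    channels = out_channels
    print(f"conv -> {channels} x {conv} x {conv}, pool -> {channels} x {size} x {size}")

print(channels * size * size)  # 256
```

The three stages print 16 x 27 x 27 then 16 x 13 x 13, 32 x 12 x 12 then 32 x 6 x 6, and 64 x 5 x 5 then 64 x 2 x 2, so `fc1` sees a much smaller input than in `Cnn`.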

In [None]:
torch.manual_seed(0)

cnn2 = NeuralNetClassifier(
    Cnn2,
    max_epochs=10,
    lr=0.002,
    optimizer=torch.optim.Adam,
    device=device,
)

In [None]:
cnn2.fit(XCnn_train, y_train);

In [None]:
y_pred_cnn2 = cnn2.predict(XCnn_test)

In [None]:
error_mask = y_pred_cnn != y_pred_cnn2

In [None]:
accuracy_score(y_test, y_pred_cnn2)

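Finally, restricted to the test images on which the two CNNs disagree, how often is `Cnn2` the one that is right?

In [None]:
accuracy_score(y_test[error_mask], y_pred_cnn2[error_mask])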