In [1]:
import tensorflow as tf
import numpy as np

Exercise: *Build your own CNN from scratch and try to achieve the highest possible accuracy on MNIST.*

In [2]:
mnist = tf.keras.datasets.mnist.load_data()
(X_train_full, y_train_full), (X_test, y_test) = mnist
X_train_full = X_train_full / 255.0
X_test = X_test / 255.0

X_train, X_valid = X_train_full[:-5000], X_train_full[-5000:]
y_train, y_valid = y_train_full[:-5000], y_train_full[-5000:]

X_train = X_train[..., np.newaxis]
X_valid = X_valid[..., np.newaxis]
X_test = X_test[..., np.newaxis]

Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz


- The author's model has two convolutional layers, one max pooling layer, 25% dropout, a dense layer, 50% dropout, and the output layer.
- This model reaches about 99.1% accuracy on the test set in the run below.
- This places the model roughly in the top 20% of the [MNIST Kaggle competition](https://www.kaggle.com/c/digit-recognizer/).
- You should ignore the models with an accuracy greater than 99.79%, which were most likely trained on the test set, as explained by Chris Deotte in [this post](https://www.kaggle.com/c/digit-recognizer/discussion/61480).

In [3]:
# This is the author's model
tf.keras.backend.clear_session()
tf.random.set_seed(42)
np.random.seed(42)

model = tf.keras.Sequential(
    [
        tf.keras.layers.Conv2D(
            32,
            kernel_size=3,
            padding="same",
            activation="relu",
            kernel_initializer="he_normal",
        ),
        tf.keras.layers.Conv2D(
            64,
            kernel_size=3,
            padding="same",
            activation="relu",
            kernel_initializer="he_normal",
        ),
        tf.keras.layers.MaxPool2D(),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dropout(0.25),
        tf.keras.layers.Dense(128, activation="relu", kernel_initializer="he_normal"),
        tf.keras.layers.Dropout(0.5),
        tf.keras.layers.Dense(10, activation="softmax"),
    ]
)
model.compile(
    loss="sparse_categorical_crossentropy", optimizer="nadam", metrics=["accuracy"]
)
model.fit(X_train, y_train, validation_data=(X_valid, y_valid), epochs=10)
model.evaluate(X_test, y_test)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


[0.03281183913350105, 0.9914000034332275]
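
As a sanity check on the architecture above, the trainable parameter counts can be worked out by hand. This is a minimal sketch, assuming 28×28×1 inputs, that `padding="same"` preserves the spatial size, and that `MaxPool2D()` uses its default 2×2 pool; the helper names `conv2d_params` and `dense_params` are made up for illustration and are not part of Keras.

```python
def conv2d_params(kernel_size, in_channels, filters):
    # One kernel_size x kernel_size x in_channels kernel per filter, plus one bias per filter
    return kernel_size * kernel_size * in_channels * filters + filters


def dense_params(n_inputs, units):
    # One weight per (input, unit) pair, plus one bias per unit
    return n_inputs * units + units


height = width = 28                        # MNIST image size
conv1 = conv2d_params(3, 1, 32)            # first Conv2D: 1 input channel, 32 filters
conv2 = conv2d_params(3, 32, 64)           # second Conv2D: 32 input channels, 64 filters
height, width = height // 2, width // 2    # 2x2 max pooling halves each spatial dimension
flat = height * width * 64                 # Flatten output size
dense1 = dense_params(flat, 128)           # hidden dense layer
dense2 = dense_params(128, 10)             # output layer
total = conv1 + conv2 + dense1 + dense2    # Dropout and Flatten add no parameters

print(conv1, conv2, flat, dense1, dense2, total)
```

If these assumptions hold, the totals should agree with `model.summary()`. Nearly all of the parameters sit in the first dense layer, because it is fed the flattened 14×14×64 feature map.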

According to the same post (linked above), reaching a higher accuracy (99.5% to 99.7% on the test set) requires image augmentation, batch normalization, a learning rate schedule such as 1-cycle, and possibly an ensemble.
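
To make the 1-cycle idea concrete, here is a minimal sketch of one common piecewise-linear variant: the learning rate ramps up to a peak, ramps back down, then decays further over the last 10% of training. The function name `one_cycle_lr`, the default rates, and the 10% tail are assumptions chosen for illustration; they are not taken from the post or from this notebook.

```python
def one_cycle_lr(step, total_steps, max_lr=1e-3, start_lr=None, final_lr=None):
    """Learning rate for a 0-based training step under a simple 1-cycle schedule."""
    # Assumed defaults: start at max_lr / 10, end at start_lr / 1000
    start_lr = max_lr / 10 if start_lr is None else start_lr
    final_lr = start_lr / 1000 if final_lr is None else final_lr

    tail_len = total_steps // 10           # last 10% of steps: final decay
    tail_start = total_steps - tail_len
    peak = tail_start // 2                 # step at which the rate reaches max_lr

    if step < peak:
        # Phase 1: linear warm-up from start_lr to max_lr
        return start_lr + (max_lr - start_lr) * step / peak
    if step < tail_start:
        # Phase 2: linear decay from max_lr back to start_lr
        return max_lr - (max_lr - start_lr) * (step - peak) / (tail_start - peak)
    # Phase 3: linear decay from start_lr down to final_lr
    denom = max(total_steps - 1 - tail_start, 1)
    return start_lr + (final_lr - start_lr) * (step - tail_start) / denom


# Learning rate at a few points of a hypothetical 1,000-step run
for s in (0, 225, 450, 675, 900, 999):
    print(s, one_cycle_lr(s, total_steps=1000))
```

The schedule is usually applied per batch rather than per epoch, which in Keras would need a custom callback that sets the optimizer's learning rate at the start of each batch; that part is not shown here.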

In [7]:
# This is my model: the author's model plus batch normalization after the hidden dense layer
tf.keras.backend.clear_session()
tf.random.set_seed(42)
np.random.seed(42)

model = tf.keras.Sequential(
    [
        tf.keras.layers.Conv2D(
            32,
            kernel_size=3,
            padding="same",
            activation="relu",
            kernel_initializer="he_normal",
        ),
        tf.keras.layers.Conv2D(
            64,
            kernel_size=3,
            padding="same",
            activation="relu",
            kernel_initializer="he_normal",
        ),
        tf.keras.layers.MaxPool2D(),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dropout(0.25),
        tf.keras.layers.Dense(128, activation="relu", kernel_initializer="he_normal"),
        tf.keras.layers.BatchNormalization(),
        tf.keras.layers.Dropout(0.5),
        tf.keras.layers.Dense(10, activation="softmax"),
    ]
)
model.compile(
    loss="sparse_categorical_crossentropy", optimizer="nadam", metrics=["accuracy"]
)
model.fit(X_train, y_train, validation_data=(X_valid, y_valid), epochs=10)
model.evaluate(X_test, y_test)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


[0.03181219473481178, 0.9915000200271606]
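
The ensemble suggested in the post linked above could be built by averaging the class probabilities of several independently trained models (soft voting). The sketch below uses small hand-written arrays in place of real `model.predict(X_test)` outputs, and `ensemble_predict` is a hypothetical helper, not a Keras function.

```python
import numpy as np


def ensemble_predict(prob_list):
    """Average per-model class probabilities and return the predicted class ids."""
    probs = np.stack(prob_list, axis=0)    # shape: (n_models, n_samples, n_classes)
    mean_probs = probs.mean(axis=0)        # shape: (n_samples, n_classes)
    return mean_probs.argmax(axis=1)


# Toy probabilities for 2 samples and 2 classes, standing in for three models' outputs
p1 = np.array([[0.6, 0.4], [0.3, 0.7]])
p2 = np.array([[0.2, 0.8], [0.4, 0.6]])
p3 = np.array([[0.3, 0.7], [0.1, 0.9]])

preds = ensemble_predict([p1, p2, p3])
print(preds)
```

With real models, each entry of the list would be the softmax output of one model on the same inputs, and the ensemble accuracy could be estimated with `(preds == y_test).mean()`.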

This notebook was run on Colab, since training takes minutes on a CPU.