<a href="https://colab.research.google.com/github/voyager2005/deep-learning-for-cv-beginner-articles/blob/main/notebooks/MNIST_Digit_Recognition_Project.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## **Introduction**

Welcome to our tutorial on digit classification using the MNIST dataset. Our article already explains why we chose MNIST for today's project: its images are small and grayscale, which makes it easy to experiment and even to draw your own digits and see how well the model recognizes them **(we do this in the last section of the Colab)**.


## **Setup and Installation [don’t skip]**

In this section we verify that all of the necessary libraries are installed. Google Colab usually comes with these libraries preinstalled, but we'll double-check everything to make sure your journey is error-free. We'll also check whether a GPU is available, since it can significantly speed up our model's training.

**IMPORTANT:** *You can enable a GPU by going to **Runtime > Change runtime type** and selecting **GPU**.* We recommend the T4 GPU: it is available for free to Google Colab users and works well for this project.

If you are running this for the first time, it can take anywhere from 30 seconds to 1 minute.

In [None]:
# Importing necessary libraries
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt

# Checking the TensorFlow version
print(f"TensorFlow version: {tf.__version__}")

# Checking if a GPU is available
if tf.config.list_physical_devices('GPU'):
    print("GPU is available and ready to use!")
else:
    print("No GPU available. Using CPU instead.")


# **Loading and Exploring the MNIST Dataset**

In this section, we will load the dataset using TensorFlow and visualize a few samples to understand its structure.
The images are stored as a 3D array of shape (x, y, z), where
- x: the number of images available
- y: the height of each image (28 pixels)
- z: the width of each image (28 pixels)

Inside the code you will also see a variable named `images_count`. You can change its value to see more or fewer example images from the dataset along with their labels (labels are a fancy way of saying "correct answers").

**IMPORTANT:** keep the value of `images_count` between 1 and 9. The images are drawn in a 3x3 grid, so a larger value has no slot to go into and the cell will raise an error.

The explanation of this code is given below it, so you can get an in-depth understanding of what is happening!

In [None]:
# Loading the MNIST dataset from TensorFlow/Keras
mnist = tf.keras.datasets.mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()

# Displaying the shape of the training and test sets
print(f"Training set shape: {x_train.shape}, Labels shape: {y_train.shape}")
print(f"Test set shape: {x_test.shape}, Labels shape: {y_test.shape}")

# Visualizing a few images from the training set
plt.figure(figsize=(10, 10))

# please change this value depending upon the number of samples you want to see
images_count = 9

for i in range(images_count):
    plt.subplot(3, 3, i + 1)
    plt.imshow(x_train[i], cmap='gray')
    plt.title(f"Label: {y_train[i]}")
    plt.axis('off')
plt.show()


### **Code Explanation**

1. **Loading the Dataset**: We use `tf.keras.datasets.mnist.load_data()` to load the dataset, which returns the training and test images along with their labels.

2. **Displaying the Shape**: The code prints out the dimensions of the images and labels, showing how many examples are available in each set.

3. **Visualizing Images**: We use `matplotlib` to display a 3x3 grid of images from the training set with their corresponding labels, giving users a visual sense of the dataset.
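If you want to check these facts as numbers rather than pictures, the helper below gathers them in one place. It is a minimal NumPy sketch: `summarize_images` is a hypothetical helper (not part of TensorFlow), and the random arrays only stand in for MNIST so the example runs without downloading anything.

```python
import numpy as np

def summarize_images(images: np.ndarray, labels: np.ndarray) -> dict:
    """Collect basic facts about an MNIST-style image array and its labels."""
    num_images, height, width = images.shape  # the (x, y, z) shape described above
    return {
        "num_images": num_images,
        "height": height,
        "width": width,
        "dtype": str(images.dtype),
        "min_pixel": int(images.min()),
        "max_pixel": int(images.max()),
        # How many examples of each digit 0-9 there are
        "label_counts": np.bincount(labels, minlength=10).tolist(),
    }

# Stand-in data so the sketch runs without MNIST:
# 5 random 28x28 uint8 "images" with made-up labels
rng = np.random.default_rng(0)
fake_images = rng.integers(0, 256, size=(5, 28, 28), dtype=np.uint8)
fake_labels = np.array([3, 1, 4, 1, 5])
print(summarize_images(fake_images, fake_labels))
```

Calling `summarize_images(x_train, y_train)` right after loading (before the preprocessing step below changes the arrays) should report 60,000 training images of 28x28 pixels stored as `uint8`, with pixel values between 0 and 255.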


# **Preprocessing the Data**

Before training our model, we need to preprocess the MNIST dataset. This involves normalizing the pixel values, reshaping the images, and one-hot encoding the labels so that the data is compatible with our neural network.

- **Normalization**: The images currently have pixel values ranging from 0 to 255. We will scale these values to the range 0-1, which typically helps training converge faster.
- **Reshaping**: The images are 2D (28x28), but neural networks typically expect an additional channel dimension (1 for grayscale images). We'll reshape the images to add this channel.
- **One-hot encoding**: Each label (a digit from 0 to 9) is turned into a list of 10 numbers that is all zeros except for a 1 at the position of the correct digit.

Don't worry if this all seems complex; over time you will get a feel for these steps and they will seem easier than ever!

In [None]:
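# Normalizing the pixel values from 0-255 to 0-1 (as float32, the type Keras uses by default)
x_train = x_train.astype('float32') / 255.0
x_test = x_test.astype('float32') / 255.0

# Reshaping the images to add a single channel (28, 28) -> (28, 28, 1)
x_train = x_train.reshape(x_train.shape[0], 28, 28, 1)
x_test = x_test.reshape(x_test.shape[0], 28, 28, 1)

# One-hot encoding the labels, e.g. 3 -> [0, 0, 0, 1, 0, 0, 0, 0, 0, 0]
# Required by the 'categorical_crossentropy' loss used below; with
# 'sparse_categorical_crossentropy' you could keep the integer labels instead
y_train = tf.keras.utils.to_categorical(y_train, 10)
y_test = tf.keras.utils.to_categorical(y_test, 10)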

# Displaying the updated shapes
print(f"Training set shape after preprocessing: {x_train.shape}, Labels shape: {y_train.shape}")
print(f"Test set shape after preprocessing: {x_test.shape}, Labels shape: {y_test.shape}")
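To see exactly what the three preprocessing steps do to the numbers, here is a minimal NumPy sketch of the same operations applied to a tiny 2x2 "image" (a size chosen only so the output stays short). The `preprocess` function is hypothetical; indexing the identity matrix with `np.eye(10)[labels]` builds the same kind of one-hot rows that `tf.keras.utils.to_categorical` returns.

```python
import numpy as np

def preprocess(images: np.ndarray, labels: np.ndarray, num_classes: int = 10):
    """NumPy version of the preprocessing cell above."""
    # Scale 0-255 integers to 0.0-1.0 floats
    x = images.astype("float32") / 255.0
    # (N, H, W) -> (N, H, W, 1): add a trailing channel axis
    x = x[..., np.newaxis]
    # One-hot: row i of the identity matrix is the one-hot vector for class i
    y = np.eye(num_classes, dtype="float32")[labels]
    return x, y

images = np.array([[[0, 255], [51, 102]]], dtype=np.uint8)  # one tiny 2x2 "image"
labels = np.array([3])
x, y = preprocess(images, labels)
print(x.shape)  # (1, 2, 2, 1)
print(y)        # [[0. 0. 0. 1. 0. 0. 0. 0. 0. 0.]]
```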


# **Building the Neural Network Model**

Now that our data is preprocessed, it’s time to build our neural network model. We will use TensorFlow's Keras API to define a simple feedforward neural network for digit classification.

The architecture will consist of:
- An input layer to receive the image data.
- One or more hidden layers to learn the features.
- An output layer that will classify the images into one of the 10 digit classes (0-9).

Let’s define our model and add the necessary layers.

**Below the code there is an explanation of all the terms and words that are new to us.**

In [None]:
# Building the neural network model
model = tf.keras.Sequential()

# Input: declares the shape of each image (28, 28, 1); recent Keras versions
# prefer this over passing input_shape to the first layer
model.add(tf.keras.Input(shape=(28, 28, 1)))

# Input layer (flattening the input data): (28, 28, 1) -> 784 values
model.add(tf.keras.layers.Flatten())

# Hidden layer (first hidden layer with 128 neurons and ReLU activation)
model.add(tf.keras.layers.Dense(128, activation='relu'))

# Output layer (10 neurons for 10 classes with softmax activation)
model.add(tf.keras.layers.Dense(10, activation='softmax'))

# Displaying the model summary
model.summary()

### Code Explanation

1. **Input Layer**: The `Flatten` layer transforms the 28x28 input images into a 1D array of 784 pixels, making it suitable for the dense layers.

2. **Hidden Layer**: The first `Dense` layer has 128 neurons and uses the ReLU (Rectified Linear Unit) activation function, which outputs max(0, x) for each value. This non-linearity lets the model learn more complex patterns than a purely linear model could.

3. **Output Layer**: The final `Dense` layer has 10 neurons (one for each digit) and uses the softmax activation function, which outputs probabilities for each class.

4. **Model Summary**: The `model.summary()` method prints an overview of the model architecture, including each layer's output shape and number of parameters (101,770 trainable parameters in total for this model).
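Under the hood, the layers above boil down to a couple of matrix multiplications. The NumPy sketch below runs one forward pass through Flatten, Dense(128, ReLU) and Dense(10, softmax) using random, untrained weights (an assumption for illustration; Keras initializes and learns its own weights), and counts the parameters that `model.summary()` reports. All function names here are hypothetical.

```python
import numpy as np

def count_dense_params(inputs: int, units: int) -> int:
    # One weight per (input, unit) pair plus one bias per unit
    return inputs * units + units

def relu(z):
    return np.maximum(0.0, z)

def softmax(z):
    # Subtracting the row max avoids overflow and does not change the result
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def forward(images, w1, b1, w2, b2):
    x = images.reshape(len(images), -1)  # Flatten: (N, 28, 28, 1) -> (N, 784)
    h = relu(x @ w1 + b1)                # Dense(128, activation='relu')
    return softmax(h @ w2 + b2)          # Dense(10, activation='softmax')

rng = np.random.default_rng(0)
# Random, untrained weights (illustration only, not the trained model's weights)
w1, b1 = rng.normal(0.0, 0.05, (784, 128)), np.zeros(128)
w2, b2 = rng.normal(0.0, 0.05, (128, 10)), np.zeros(10)

batch = rng.random((4, 28, 28, 1))  # 4 fake images
probs = forward(batch, w1, b1, w2, b2)
print(probs.shape)        # (4, 10): one probability per digit, per image
print(probs.sum(axis=1))  # each row sums to 1 (up to rounding)

total = count_dense_params(784, 128) + count_dense_params(128, 10)
print(total)  # 101770
```

Flatten has no parameters, which is why only the two `Dense` layers contribute to the total.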


### Understanding Model Compilation

In compiled programming languages, a program has to be compiled before it can run. Similarly, before we can train our model we need to prepare it for training; this step is called model compilation. We have already covered what model compilation involves, such as loss functions and optimizers, in our article.

Without compilation, our model would not know how to learn from the data. It needs a loss function, an optimizer, and metrics to improve and track its progress. Here we use the Adam optimizer, the categorical cross-entropy loss (which matches our one-hot encoded labels), and accuracy as the metric.


In [None]:
# Compiling the model
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
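For some intuition about the loss and metric that `compile` sets up, here is a hedged NumPy sketch of both. It uses 3 classes instead of 10 to keep the numbers readable, and the clipping constant `eps` is an assumption added to avoid taking `log(0)`; Keras computes these with its own internal implementation.

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    # Clip so log(0) never happens, then average -sum(y * log(p)) over the batch
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    return float(np.mean(-np.sum(y_true * np.log(y_pred), axis=-1)))

def accuracy(y_true, y_pred):
    # Fraction of rows where the most probable class matches the true class
    return float(np.mean(np.argmax(y_true, axis=-1) == np.argmax(y_pred, axis=-1)))

y_true = np.array([[0, 1, 0], [1, 0, 0]], dtype=float)  # one-hot labels
y_pred = np.array([[0.1, 0.8, 0.1], [0.3, 0.6, 0.1]])   # predicted probabilities
print(round(categorical_crossentropy(y_true, y_pred), 4))  # 0.7136
print(accuracy(y_true, y_pred))                            # 0.5
```

The first prediction is confident and correct, so it contributes a small loss (-ln 0.8); the second puts only 0.3 on the right class, which both raises the loss (-ln 0.3) and counts as a miss for accuracy.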


### Training the Model

Now that our model is compiled, it's time to train it. By definition, "training is the process where the model learns from the data". Let's break down this definition and understand what happens during model training.

1. **Training:**

    The process of teaching our model to recognize patterns is called training; here we are teaching our model to recognize handwritten digits.

2. **How does training work?**

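    Our image data is divided into two parts, the training set and the testing set, as you can see in the image below. MNIST already comes split into 60,000 training images and 10,000 test images. In addition, `validation_split=0.2` tells Keras to hold out the last 20% of the training images as a validation set, which it uses to check the model's progress after every epoch.
    ![Train Test Split](https://builtin.com/sites/www.builtin.com/files/styles/ckeditor_optimize/public/inline-images/4_train-test-split.jpg)
    We use the training set along with the loss function and optimizer to train the model to get better at digit recognition. The test set is only used at the end, to measure how well the model does on images it has never seen.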

3. **EPOCH**

    Instead of learning from all images at once, the model works through the training set in small batches; here each batch holds 32 images (`batch_size=32`). This makes the learning process efficient. After 20% of the 60,000 training images are held out for validation, 48,000 remain, so one epoch (one full pass through the training data) takes 48,000 / 32 = 1,500 batches. The model might not learn everything in a single epoch, so we train for 10 epochs to give it more chances to learn.

4. **fit**

    We use the `fit` method to train the model. It takes the training data, the labels (correct answers), the number of epochs, the batch size (how many images to process at a time) and, in our case, the fraction of the training data to hold out for validation.
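The batch and epoch arithmetic can be checked with a short sketch. `epoch_batches` is a hypothetical helper that mimics how `fit` is documented to behave with `validation_split` and its default `shuffle=True`: hold out the last 20% of the samples, shuffle the rest, and cut them into batches of 32. It is an illustration, not Keras' actual internal code.

```python
import math
import numpy as np

def epoch_batches(num_samples, batch_size, validation_split, rng):
    """Sketch of how one epoch's training batches and validation indices could be formed."""
    # The last `validation_split` fraction of samples is held out for validation
    num_train = int(num_samples * (1 - validation_split))
    val_idx = np.arange(num_train, num_samples)
    # Shuffle only the training part (Keras reshuffles it every epoch)
    train_idx = rng.permutation(num_train)
    batches = [train_idx[i:i + batch_size] for i in range(0, num_train, batch_size)]
    return batches, val_idx

rng = np.random.default_rng(0)
batches, val_idx = epoch_batches(60_000, batch_size=32, validation_split=0.2, rng=rng)
print(len(batches))            # 1500 batches per epoch
print(len(val_idx))            # 12000 validation images
print(math.ceil(48_000 / 32))  # 1500, the same number by plain arithmetic
```

This is also why the Keras progress bar during training counts up to 1500 steps per epoch for this setup.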






In [None]:
# Training the model
history = model.fit(x_train, y_train, epochs=10, batch_size=32, validation_split=0.2)

# Evaluating the model on the test set
test_loss, test_accuracy = model.evaluate(x_test, y_test)
print(f"Test accuracy: {test_accuracy:.4f}")


### Visualizing Training History

In [None]:
# Visualizing training history
plt.figure(figsize=(12, 5))

# Plotting training and validation accuracy
plt.subplot(1, 2, 1)
plt.plot(history.history['accuracy'], label='Training Accuracy')
plt.plot(history.history['val_accuracy'], label='Validation Accuracy')
plt.title('Training and Validation Accuracy')
plt.xlabel('Epochs')
plt.ylabel('Accuracy')
plt.legend()

# Plotting training and validation loss
plt.subplot(1, 2, 2)
plt.plot(history.history['loss'], label='Training Loss')
plt.plot(history.history['val_loss'], label='Validation Loss')
plt.title('Training and Validation Loss')
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.legend()

plt.show()
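`history.history` is a plain Python dictionary with one list per metric (`loss`, `accuracy`, `val_loss`, `val_accuracy`), holding one value per epoch. As a minimal sketch of reading the plots numerically, the hypothetical helper below finds the epoch with the lowest validation loss and checks whether the validation loss went back up afterwards, a common sign that the model is starting to overfit. The example numbers are made up and are not results from this notebook.

```python
def summarize_history(history_dict: dict) -> dict:
    """Report the best epoch by validation loss and whether it rose afterwards."""
    val_loss = history_dict["val_loss"]
    # Index (0-based) of the smallest validation loss
    best = min(range(len(val_loss)), key=lambda i: val_loss[i])
    return {
        "best_epoch": best + 1,  # 1-based, matching Keras' "Epoch 1/10" log
        "best_val_loss": val_loss[best],
        "train_loss_at_best": history_dict["loss"][best],
        # True if the final validation loss is worse than the best one
        "val_loss_rising": val_loss[-1] > val_loss[best],
    }

# Made-up numbers in the same format as history.history (NOT real results)
example = {
    "loss": [0.30, 0.15, 0.10, 0.07, 0.05],
    "val_loss": [0.16, 0.12, 0.10, 0.11, 0.13],
}
print(summarize_history(example))  # best epoch is 3; validation loss rose afterwards
```

In the notebook you would call `summarize_history(history.history)` after the training cell has run.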


# **OUR OWN HANDWRITING!!!**

Now that our model is trained, let's see how well it works! Welcome to the final part of this Colab: recognizing your own handwritten digits. In this section you will draw a digit on your computer, upload it, and see whether our neural network recognizes it.

## How It Works

### Generate and Download a Blank Canvas
I have provided a 28x28 pixel black canvas for you to draw on in the code below. This is the size of every image in MNIST, which makes it the ideal canvas for our tests.

To get started, follow these steps:

1. **Download the black canvas**: A button will be provided to download the blank canvas image (a `.png` file).
2. **Draw your digit**: Open the downloaded image in any image editor (such as Paint, GIMP, Photoshop, etc.), and use a white brush to draw a digit (0-9) on the black background.
    - Make sure your drawing stays within the 28x28 pixel canvas.
    - White color represents the digit, and the black background should be left untouched.
3. **Save the image** after drawing your digit.

## Tips for Drawing Digits

- Make sure the digit is clearly visible and takes up most of the canvas.
- Keep the background black and the digit white to match the MNIST dataset format.
- If the digit is too small or unclear, the model might have difficulty recognizing it.

I hope you enjoy experimenting with your custom canvas! If you have any trouble understanding how this works, I have also recorded a video of me running this on my laptop, so you can follow along and keep having fun.

In [None]:
import numpy as np
import cv2
import ipywidgets as widgets
from IPython.display import display
from google.colab import files
from google.colab.patches import cv2_imshow

# Create a black 28x28 image (same size and background as MNIST)
black_canvas = np.zeros((28, 28), dtype=np.uint8)

# Save the black canvas image to disk
cv2.imwrite('/content/black_digit_image.png', black_canvas)

# Display the black canvas in the notebook
cv2_imshow(black_canvas)

# Download the saved canvas to your computer
def download_canvas():
    files.download('/content/black_digit_image.png')

# Create a button that starts the download when clicked
download_button = widgets.Button(description="Download Black Canvas")
download_button.on_click(lambda _: download_canvas())
display(download_button)


## Upload Your Custom Digit Image
Once you've drawn your digit on the canvas:

1. Run the code cell below.
2. Click the **Choose Files** button that appears in the notebook.
3. Select the image with your custom handwritten digit that you just saved.

In [None]:
import numpy as np
import cv2
from google.colab import files

# Upload custom handwritten digit image(s); they are saved to the current directory
uploaded = files.upload()

# Load, preprocess, and classify each uploaded image
for fn in uploaded.keys():
    # Load the image as a single-channel grayscale array
    img = cv2.imread(fn, cv2.IMREAD_GRAYSCALE)
    if img is None:
        print(f"Could not read {fn}; please upload a PNG or JPG image.")
        continue
    img = cv2.resize(img, (28, 28), interpolation=cv2.INTER_AREA)  # Resize to 28x28
    img = img.astype('float32') / 255.0  # Normalize to 0-1, like the training data
    img = img.reshape(1, 28, 28, 1)  # Batch of one image: (1, 28, 28, 1)

    # Predict the class (digit) with the highest probability
    prediction = np.argmax(model.predict(img), axis=-1)
    print(f'{fn}: predicted digit {prediction[0]}')
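Hand-drawn digits often differ from MNIST in two ways: they may be dark strokes on a light background, and they may not be centered. The sketch below is a hypothetical NumPy helper, `prepare_custom_digit`, that flips light-background images and moves the digit's bounding box to the middle of the canvas. The mean-brightness check and the 0.1 stroke threshold are assumptions, and bounding-box centering is a simplification, not the exact procedure used to build MNIST.

```python
import numpy as np

def prepare_custom_digit(gray: np.ndarray) -> np.ndarray:
    """Sketch: make a grayscale uint8 drawing look more like an MNIST digit.

    Assumes `gray` is already 28x28 (for example, the output of cv2.resize).
    """
    img = gray.astype("float32") / 255.0
    # MNIST digits are light strokes on a dark background. A bright average
    # (assumed heuristic) suggests dark-on-light drawing, so flip it.
    if img.mean() > 0.5:
        img = 1.0 - img
    # Rows and columns that contain part of the stroke (assumed threshold 0.1)
    rows = np.where(img.max(axis=1) > 0.1)[0]
    cols = np.where(img.max(axis=0) > 0.1)[0]
    if rows.size and cols.size:
        digit = img[rows[0]:rows[-1] + 1, cols[0]:cols[-1] + 1]
        # Paste the cropped digit into the middle of an empty canvas
        centered = np.zeros_like(img)
        r0 = (img.shape[0] - digit.shape[0]) // 2
        c0 = (img.shape[1] - digit.shape[1]) // 2
        centered[r0:r0 + digit.shape[0], c0:c0 + digit.shape[1]] = digit
        img = centered
    # Add batch and channel axes: (28, 28) -> (1, 28, 28, 1), ready for model.predict
    return img[np.newaxis, :, :, np.newaxis]
```

In the upload cell, you could pass the 28x28 grayscale image returned by `cv2.resize` (before dividing by 255) to this helper and feed its result to `model.predict`, then compare the prediction with the one from the plain version.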

# Model Limitations

Even though our model does a good job recognizing handwritten digits, there are a few things it struggles with:

1. **It Only Knows What It’s Seen**: Our model has been trained only with MNIST images, which are simple, clean, and black-and-white. If you show it something different, like colorful or messy digits, it might get confused. It’s like learning to read with only one type of handwriting—seeing a different style can throw you off!

2. **Too Focused on the Practice Test**: Sometimes, our model does really well on the training data but doesn’t do as well when faced with new, real-world examples. This is called overfitting. It's like studying hard for one test but struggling when the questions change even a little.

3. **Limited Skillset**: This model is a "digit expert"—it's really only trained for numbers. So, if you ask it to recognize anything other than digits, it won’t know what to do. Imagine trying to use a calculator for writing an essay; it just doesn't fit the job!

4. **Takes a Lot of Brainpower**: Running and training models like this can be demanding for computers, making it slower on devices with less power. Think of it like trying to play a video game on an old phone—it’s not as smooth or fast.

5. **It Can Be Unfair**: The dataset we used is based on specific writing styles. So, if someone writes differently or in a script the model isn’t familiar with, it might not work well. It’s like not understanding someone because they have a different accent.

