# MNIST Handwritten Digits Classification Tutorial

In this tutorial, we will work on the famous MNIST dataset, which contains images of handwritten digits from 0 to 9. The goal is to build a machine learning model that can recognize and classify the digits accurately.

## Plan
1. **Introduction**
    * Brief description of the MNIST dataset
    * Objective of the tutorial

2. **Loading and Exploring the Data**
    * Loading the MNIST dataset
    * Visualizing the data
    * Analyzing the distribution of digits in the dataset
3. **Data Preprocessing**
    * Normalizing the pixel values
    * One-hot encoding the labels
    * Splitting the dataset into training and testing sets
4. **Building the Model**
    * Selecting a suitable machine learning algorithm
    * Implementing the model using scikit-learn
5. **Training the Model**
    * Training the model on the training dataset
    * Evaluating the performance using cross-validation
6. **Model Evaluation**
    * Assessing the performance of the model on the test dataset
    * Analyzing the confusion matrix
    * Visualizing the misclassified digits
7. **Improving the Model**
    * Feature engineering techniques
    * Hyperparameter tuning
    * Implementing a more complex model (e.g., Convolutional Neural Network)
8. **Deploying the Model**
    * Saving the trained model
    * Loading the model for inference
    * Creating a simple application for digit recognition
9. **Conclusion**
    * Recap of the tutorial
    * Potential applications of digit recognition
    * Further resources and recommendations





## 1. Introduction
In this tutorial, we will be working with the famous MNIST (Modified National Institute of Standards and Technology) dataset. The MNIST dataset is a large database of handwritten digits, containing 60,000 training samples and 10,000 testing samples. Each sample is a grayscale image of size 28x28 pixels, representing a digit from 0 to 9.

![MNIST Sample](https://upload.wikimedia.org/wikipedia/commons/2/27/MnistExamples.png)

*Sample images from the MNIST dataset.*

The objective of this tutorial is to guide you through the process of building a machine learning model that can recognize and classify these handwritten digits accurately. We will explore various techniques for data preprocessing, model building, training, and evaluation. By the end of this tutorial, you will have a solid understanding of how to work with image data and build a classifier for the MNIST dataset.

## 2. Loading and Exploring the Data
### 2.1. Loading the MNIST dataset
The MNIST dataset is widely used in machine learning and can be easily loaded using popular libraries like *TensorFlow* or *scikit-learn*. In this tutorial, we will use *scikit-learn* to load the dataset.

In [None]:
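from sklearn.datasets import fetch_openml

# as_frame=False returns NumPy arrays, so individual images can be indexed as X[0]
mnist = fetch_openml('mnist_784', version=1, as_frame=False)
X, y = mnist["data"], mnist["target"]

# The labels are loaded as strings ('0'-'9'); convert them to integers
y = y.astype(int)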

#### Exercise 1
Try loading the MNIST dataset using TensorFlow. Look up the relevant function in the *TensorFlow* documentation.

In [None]:
# Hint: Import TensorFlow, then use the `load_data()` function from the `tensorflow.keras.datasets.mnist` module to load the MNIST dataset.
import tensorflow as tf

# Code sketch:
# (X_train, y_train), (X_test, y_test) = tf.keras.datasets.mnist.load_data()


### 2.2. Visualizing the data
Let's visualize some of the handwritten digits in the dataset to get a better understanding of the data.

In [None]:
import matplotlib.pyplot as plt

def plot_digit(data):
    image = data.reshape(28, 28)
    plt.imshow(image, cmap='binary')
    plt.axis('off')

# Plot a single digit
plot_digit(X[0])
plt.show()

#### Exercise 2
Visualize 5 random digits from the dataset in a single plot.

In [None]:
import numpy as np

# Hint: Generate 5 random indices using `np.random.randint()`. Create a 1x5 grid of subplots using `plt.subplots()`, and use a for loop to draw each digit on its own subplot with `ax.imshow()`.
# Code sketch:
# rand_indices = np.random.randint(0, len(X), 5)
# fig, axes = plt.subplots(1, 5, figsize=(15, 3))
# for i, ax in zip(rand_indices, axes):
#     # plot_digit() draws on the current figure and takes no axis argument, so draw on `ax` directly
#     ax.imshow(X[i].reshape(28, 28), cmap='binary')
#     ax.axis('off')
# plt.show()


### 2.3. Analyzing the distribution of digits in the dataset
It's important to analyze the distribution of the target variable (in this case, the digits) to ensure that the dataset is balanced and representative of all classes.

In [None]:
import seaborn as sns

sns.countplot(x=y)
plt.xlabel('Digits')
plt.ylabel('Frequency')
plt.title('Distribution of Digits in the MNIST Dataset')
plt.show()

This will display a bar plot showing the distribution of the digits in the dataset.

#### Exercise 3
Calculate the percentage of each digit in the dataset.

In [None]:
import pandas as pd

# Hint: Convert the target variable `y` to a pandas Series, use the `value_counts()` method with the `normalize` parameter set to True, and multiply the result by 100.
# Code sketch:
# y_series = pd.Series(y)
# percentages = y_series.value_counts(normalize=True) * 100
# print(percentages)

## 3. Data Preprocessing
### 3.1. Normalizing the pixel values
Many learning algorithms, including SVMs and neural networks, are sensitive to the scale of their input features. MNIST pixel values are integers from 0 to 255, so dividing by 255 rescales them to the range [0, 1].

In [None]:
# Normalize the pixel values
X_normalized = X / 255.0

#### Exercise 4
Try normalizing the pixel values using different methods, such as min-max scaling or standard scaling.



In [None]:
# Hint: For min-max scaling, use (X - X.min()) / (X.max() - X.min()). For standard scaling, use (X - X.mean()) / X.std().
# Code sketch:
# Min-max scaling:
# X_min_max_scaled = (X - X.min()) / (X.max() - X.min())

# Standard scaling:
# X_standard_scaled = (X - X.mean()) / X.std()
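
One caveat applies when standard scaling is done per pixel rather than over the whole array: a pixel that has the same value in every image (for example, an always-black border pixel) has a standard deviation of zero, and dividing by it yields NaN. The following is a minimal sketch on synthetic data; the array `images` and the variable names are made up for illustration and are not part of the tutorial's dataset.

```python
import numpy as np

# Synthetic stand-in for flattened images: 6 samples x 5 "pixels" in [0, 255].
rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(6, 5)).astype(np.float64)
images[:, 0] = 0.0  # a pixel that is constant across all samples

# Per-pixel mean and standard deviation (axis=0 runs over samples).
mean = images.mean(axis=0)
std = images.std(axis=0)

# Replace zero standard deviations with 1 so constant pixels map to 0, not NaN.
safe_std = np.where(std == 0, 1.0, std)
standardized = (images - mean) / safe_std
```

scikit-learn's `StandardScaler` treats zero-variance features in the same way, so it is a reasonable alternative to writing this by hand.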

### 3.2. One-hot encoding the labels
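Scikit-learn classifiers such as '**SVC**' accept the integer class labels directly, so this step is not needed for them. Neural networks trained with a categorical cross-entropy loss, like the CNN in section 7.3, expect one-hot encoded vectors, so we prepare them here.

In [None]:
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# OneHotEncoder expects a 2-D input and returns a sparse matrix by default; convert it to a dense array
encoder = OneHotEncoder()
y_one_hot = encoder.fit_transform(np.asarray(y).reshape(-1, 1)).toarray()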
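
To make the encoding concrete, here is a small sketch of one-hot encoding done by hand with NumPy. The labels are made-up examples, and the identity-matrix trick is one common way to do this, not necessarily what the encoder does internally.

```python
import numpy as np

labels = np.array([5, 0, 4, 1, 9])  # made-up example labels
num_classes = 10

# Row i of the identity matrix is the one-hot vector for class i.
one_hot = np.eye(num_classes)[labels]

# argmax along each row recovers the original integer labels.
recovered = one_hot.argmax(axis=1)
print(one_hot[0], recovered)
```

Each row contains a single 1 at the position of its class, and `argmax` inverts the encoding, which is also how class predictions are read off a softmax output.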

#### Exercise 5
Try using TensorFlow's '**to_categorical()**' function to convert the labels to one-hot encoded vectors.

In [None]:
# Hint: Import the necessary function from `tensorflow.keras.utils`, then call the function with the `y` variable as its argument.
# Code sketch:
# from tensorflow.keras.utils import to_categorical
# y_one_hot_tf = to_categorical(y)

### 3.3. Splitting the dataset into training and testing sets
To evaluate our model's performance, we will split the dataset into training and testing sets.

In [None]:
from sklearn.model_selection import train_test_split

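# Split the integer labels `y`: scikit-learn classifiers such as SVC expect 1-D class labels, not one-hot vectors
X_train, X_test, y_train, y_test = train_test_split(X_normalized, y, test_size=0.2, random_state=42)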
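
A plain random split usually keeps class proportions close to those of the full dataset, but `train_test_split` can enforce this through its `stratify` parameter. The sketch below uses synthetic, deliberately imbalanced labels (an assumption for illustration; MNIST itself is roughly balanced) to show the effect.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic data: 1000 samples, 3 classes with a 70/20/10 imbalance.
rng = np.random.default_rng(42)
features = rng.random((1000, 4))
labels = np.repeat([0, 1, 2], [700, 200, 100])

# stratify=labels keeps the class ratios the same in both subsets.
f_train, f_test, l_train, l_test = train_test_split(
    features, labels, test_size=0.2, random_state=42, stratify=labels
)

train_share = np.bincount(l_train) / len(l_train)
test_share = np.bincount(l_test) / len(l_test)
print(train_share, test_share)
```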

#### Exercise 6
Split the dataset into training, validation, and testing sets, with a 60-20-20 distribution.

In [None]:
# Hint: Call the `train_test_split()` function twice: first hold out 20% of the data for testing, then split the remaining 80% with `test_size=0.25`, so that the validation set is also 20% of the full dataset (0.25 * 0.8 = 0.2).
# Code sketch:
# X_train_val, X_test, y_train_val, y_test = train_test_split(X_normalized, y, test_size=0.2, random_state=42)
# X_train, X_val, y_train, y_val = train_test_split(X_train_val, y_train_val, test_size=0.25, random_state=42)

## 4. Building the Model
### 4.1. Selecting a suitable machine learning algorithm
For the MNIST handwritten digits classification, we will use a simple yet effective machine learning algorithm: the Support Vector Machine (SVM). SVMs are suitable for this task because they can handle multi-class classification problems and work well with high-dimensional data.

### 4.2. Implementing the model using scikit-learn
To implement the SVM model, we will use the '**SVC**' class from the '**sklearn.svm**' module. We will train the model using the '**fit**' method, providing the training data as input.

In [None]:
from sklearn.svm import SVC

# Create the SVM model
svm_model = SVC(random_state=42)

# Train the model
svm_model.fit(X_train, y_train)
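
Training an SVM on tens of thousands of images can take a while, because the fit time of '**SVC**' grows faster than linearly with the number of samples. For quick experiments it can help to prototype on a smaller dataset first. The sketch below uses scikit-learn's bundled 8x8 digits dataset (`load_digits`) as a stand-in for MNIST, an assumption made so that the example runs offline in seconds; the variable names are illustrative only.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Small bundled dataset: 1,797 images of 8x8 pixels with values 0-16.
digits = load_digits()
X_small = digits.data / 16.0   # scale to [0, 1], analogous to X / 255.0
y_small = digits.target        # 1-D integer labels, as SVC expects

Xs_train, Xs_test, ys_train, ys_test = train_test_split(
    X_small, y_small, test_size=0.2, random_state=42
)

# Default RBF kernel; SVC handles the ten classes internally (one-vs-one).
small_svm = SVC(random_state=42)
small_svm.fit(Xs_train, ys_train)
small_accuracy = small_svm.score(Xs_test, ys_test)
print(f"Accuracy on the small digits dataset: {small_accuracy:.4f}")
```

The same code applies to MNIST by swapping in `X_train` and `y_train`, optionally on a random subset while exploring settings.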

#### Exercise 7
Try different kernel functions, such as linear, polynomial, and sigmoid, and compare their performance.

In [None]:
# Hint: Create new instances of the `SVC` class, providing the `kernel` parameter with the desired kernel function, and train the model.
# Code sketch:
# Linear kernel:
# svm_linear = SVC(kernel="linear", random_state=42)
# svm_linear.fit(X_train, y_train)

# Polynomial kernel:
# svm_poly = SVC(kernel="poly", random_state=42)
# svm_poly.fit(X_train, y_train)

# Sigmoid kernel:
# svm_sigmoid = SVC(kernel="sigmoid", random_state=42)
# svm_sigmoid.fit(X_train, y_train)

#### Exercise 8
Train a Random Forest Classifier and compare its performance with the SVM model.

In [None]:
# Hint: Import the `RandomForestClassifier` class from `sklearn.ensemble`, create an instance of the classifier, and train the model.
# Code sketch:
# from sklearn.ensemble import RandomForestClassifier
# rf_model = RandomForestClassifier(random_state=42)
# rf_model.fit(X_train, y_train)

## 5. Training the Model
### 5.1. Training the model on the training dataset
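Now that we have our model defined, it's time to train it on the training dataset. We will use the '**fit**' method of the '**SVC**' class to train the model. The model was already fitted in section 4.2; calling '**fit**' again simply retrains it from scratch, so this cell can be skipped if that one has already run.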

In [None]:
# Train the model
svm_model.fit(X_train, y_train)

### 5.2. Evaluating the performance using cross-validation
To estimate how well our model generalizes, we can use cross-validation. Cross-validation splits the training data into several folds; the model is trained on all folds but one and evaluated on the held-out fold, and this is repeated so that each fold serves as the validation set once. We will use the '**cross_val_score**' function from the '**sklearn.model_selection**' module to perform cross-validation.

In [None]:
from sklearn.model_selection import cross_val_score

# Perform 5-fold cross-validation on the SVM model
cv_scores = cross_val_score(svm_model, X_train, y_train, cv=5)

# Print the mean and standard deviation of the cross-validation scores
print(f"Mean cross-validation score: {cv_scores.mean():.4f}")
print(f"Standard deviation of cross-validation scores: {cv_scores.std():.4f}")
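
To show what happens inside '**cross_val_score**', the following sketch runs the same 5-fold procedure by hand and compares the result with the library call. It uses the small bundled `load_digits` dataset as a stand-in for MNIST so that it runs quickly; that choice and the variable names are assumptions for illustration.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import load_digits
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.svm import SVC

X_cv, y_cv = load_digits(return_X_y=True)  # small stand-in dataset
model = SVC(random_state=42)

# For classifiers, an integer cv is treated as a stratified K-fold split without shuffling.
folds = StratifiedKFold(n_splits=5)
manual_scores = []
for train_idx, val_idx in folds.split(X_cv, y_cv):
    fold_model = clone(model)  # fresh, unfitted copy for each fold
    fold_model.fit(X_cv[train_idx], y_cv[train_idx])
    manual_scores.append(fold_model.score(X_cv[val_idx], y_cv[val_idx]))

manual_scores = np.array(manual_scores)
library_scores = cross_val_score(model, X_cv, y_cv, cv=5)
print(manual_scores, library_scores)
```

Because every fold requires a full training run, 5-fold cross-validation costs roughly five times as much as a single fit.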

#### Exercise 9
Evaluate the performance of the models with different kernel functions and the Random Forest Classifier using cross-validation.

In [None]:
# Hint: Use the `cross_val_score` function to compute the cross-validation scores for each model.
# Code sketch:
# Linear kernel:
# cv_scores_linear = cross_val_score(svm_linear, X_train, y_train, cv=5)

# Polynomial kernel:
# cv_scores_poly = cross_val_score(svm_poly, X_train, y_train, cv=5)

# Sigmoid kernel:
# cv_scores_sigmoid = cross_val_score(svm_sigmoid, X_train, y_train, cv=5)

# Random Forest Classifier:
# cv_scores_rf = cross_val_score(rf_model, X_train, y_train, cv=5)

#### Exercise 10
Experiment with different values of the '**C**' parameter for the SVM model and observe the effect on the cross-validation scores.

In [None]:
# Hint: Create new instances of the `SVC` class, providing different values for the `C` parameter, and compute the cross-validation scores.
# Code sketch:
# svm_c1 = SVC(C=1, random_state=42)
# cv_scores_c1 = cross_val_score(svm_c1, X_train, y_train, cv=5)

# svm_c10 = SVC(C=10, random_state=42)
# cv_scores_c10 = cross_val_score(svm_c10, X_train, y_train, cv=5)

# svm_c100 = SVC(C=100, random_state=42)
# cv_scores_c100 = cross_val_score(svm_c100, X_train, y_train, cv=5)

## 6. Model Evaluation
### 6.1. Assessing the performance of the model on the test dataset
After training the model and evaluating it using cross-validation, it's time to assess its performance on the test dataset. We will use the '**predict**' method of the '**SVC**' class to make predictions on the test dataset and then compute the accuracy of these predictions.

In [None]:
from sklearn.metrics import accuracy_score

# Make predictions on the test dataset
y_pred = svm_model.predict(X_test)

# Compute the accuracy of the predictions
test_accuracy = accuracy_score(y_test, y_pred)
print(f"Test accuracy: {test_accuracy:.4f}")

### 6.2. Analyzing the confusion matrix
A confusion matrix can help us understand the performance of the classifier in more detail. We will use the '**confusion_matrix**' function from the '**sklearn.metrics**' module to compute the confusion matrix.

In [None]:
from sklearn.metrics import confusion_matrix

# Compute the confusion matrix
cm = confusion_matrix(y_test, y_pred)

# Print the confusion matrix
print(cm)
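
In scikit-learn's confusion matrix, rows correspond to the true class and columns to the predicted class. The sketch below shows two quantities that are often read off such a matrix: per-class recall and the most frequent confusion. The 3-class matrix is hypothetical, chosen only to keep the arithmetic easy to follow.

```python
import numpy as np

# Hypothetical 3-class confusion matrix (rows: true class, columns: predicted class).
cm_example = np.array([
    [50,  2,  0],
    [ 3, 45,  2],
    [ 0,  5, 43],
])

# Per-class recall: correct predictions divided by the number of true samples per class.
recall = np.diag(cm_example) / cm_example.sum(axis=1)

# Zero the diagonal so only errors remain, then locate the largest entry.
errors = cm_example.copy()
np.fill_diagonal(errors, 0)
true_label, predicted_label = np.unravel_index(errors.argmax(), errors.shape)
print(f"Most common error: true {true_label} predicted as {predicted_label}")
```

Applied to the MNIST matrix `cm`, the same steps point to the digit pairs the classifier confuses most often.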

### 6.3. Visualizing the misclassified digits
To better understand the misclassifications, we can visualize some of the misclassified digits.

In [None]:
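import numpy as np
import matplotlib.pyplot as plt

# Convert to NumPy arrays so that positional indexing works even if the data are pandas objects
X_test_arr = np.asarray(X_test)
y_test_arr = np.asarray(y_test)

# Find the positions of the misclassified digits
misclassified_indices = np.where(y_test_arr != y_pred)

# Plot up to the first 9 misclassified digits (fewer if the model makes fewer errors)
fig, axes = plt.subplots(3, 3, figsize=(8, 8))
for ax in axes.flat:
    ax.axis('off')
for ax, index in zip(axes.flat, misclassified_indices[0]):
    ax.imshow(X_test_arr[index].reshape(28, 28), cmap='gray')
    ax.set_title(f"True: {y_test_arr[index]}, Predicted: {y_pred[index]}")
plt.tight_layout()
plt.show()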


#### Exercise 11
Compute the precision, recall, and F1-score for each class using the '**classification_report**' function from the '**sklearn.metrics**' module.

In [None]:
# Hint: Use the `classification_report` function to compute the precision, recall, and F1-score for each class.
# Code sketch:
# from sklearn.metrics import classification_report
# report = classification_report(y_test, y_pred)
# print(report)

#### Exercise 12
Visualize the misclassified digits for a different classifier, such as the Random Forest Classifier, and compare the types of errors made by the two classifiers.

In [None]:
# Hint: Train a Random Forest Classifier, make predictions on the test dataset, find the misclassified digits, and plot them.
# Code sketch:
# Train a Random Forest Classifier
# rf_model.fit(X_train, y_train)

# Make predictions on the test dataset
# y_pred_rf = rf_model.predict(X_test)

# Find the misclassified digits
# misclassified_indices_rf = np.where(y_test != y_pred_rf)

# Plot the first 9 misclassified digits for the Random Forest Classifier
# (use the code from the "Visualizing the misclassified digits" section, replacing the misclassified_indices variable with misclassified_indices_rf)

## 7. Improving the Model
### 7.1. Feature engineering techniques
Feature engineering is the process of creating new features or transforming existing features to improve the performance of a machine learning model. Some common techniques for feature engineering with image data include:
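* Image augmentation (small rotations, shifts, etc.; flips are usually avoided for digits because a mirrored digit is often no longer a valid example of its class)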
* Dimensionality reduction (e.g., PCA)

In [None]:
# Example of image augmentation using rotation
from scipy.ndimage import rotate

# Rotate an image by 15 degrees clockwise
rotated_image = rotate(X_train[0].reshape(28, 28), -15, reshape=False)

# Plot the original and rotated image
plt.subplot(121)
plt.imshow(X_train[0].reshape(28, 28), cmap='gray')
plt.title("Original Image")
plt.axis('off')

plt.subplot(122)
plt.imshow(rotated_image, cmap='gray')
plt.title("Rotated Image")
plt.axis('off')

plt.tight_layout()
plt.show()
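
Dimensionality reduction, the second technique listed above, can be sketched with scikit-learn's `PCA`. The example uses the bundled 8x8 `load_digits` dataset as a stand-in for MNIST so that it runs offline; the 95% variance threshold is an arbitrary illustrative choice, not a recommendation from this tutorial.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

X_digits, _ = load_digits(return_X_y=True)  # 1,797 samples x 64 pixel features

# A float n_components in (0, 1) keeps the fewest components reaching that variance share.
pca = PCA(n_components=0.95)
X_reduced = pca.fit_transform(X_digits)

print(f"Features before: {X_digits.shape[1]}, after: {X_reduced.shape[1]}")
print(f"Variance retained: {pca.explained_variance_ratio_.sum():.3f}")
```

When PCA is used in a real pipeline, it should be fitted on the training set only and then applied to the test set with `pca.transform`.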


### 7.2. Hyperparameter tuning
Hyperparameter tuning involves finding the optimal set of hyperparameters for a machine learning model. For the case of our SVM classifier, we can use a technique like Grid Search or Randomized Search to find the best combination of hyperparameters.

In [None]:
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Define the hyperparameters and their possible values
param_grid = {
    'C': [0.1, 1, 10],
    'kernel': ['linear', 'rbf'],
    'gamma': ['scale', 'auto']
}

# Create a GridSearchCV object
grid_search = GridSearchCV(SVC(), param_grid, cv=3, verbose=2, n_jobs=-1)

# Perform the grid search on the training dataset
grid_search.fit(X_train, y_train)

# Print the best hyperparameters
print(f"Best hyperparameters: {grid_search.best_params_}")
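
Grid search cost grows multiplicatively with the number of values per hyperparameter, which matters for a model as slow to train as an SVM. The sketch below counts the fits implied by a grid like the one above, assuming the default `refit=True` behaviour of `GridSearchCV`; the variable names are illustrative.

```python
from sklearn.model_selection import ParameterGrid

param_grid_example = {
    'C': [0.1, 1, 10],
    'kernel': ['linear', 'rbf'],
    'gamma': ['scale', 'auto'],
}

n_candidates = len(ParameterGrid(param_grid_example))  # 3 * 2 * 2 combinations
n_folds = 3
# One fit per candidate per fold, plus a final refit of the best candidate on all training data.
total_fits = n_candidates * n_folds + 1
print(f"{n_candidates} candidates x {n_folds} folds + 1 refit = {total_fits} fits")
```

Note that `gamma` has no effect on the linear kernel, so some of these candidates are effectively duplicates; passing a list of separate grids, one per kernel, avoids that wasted work.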


### 7.3. Implementing a more complex model (e.g., Convolutional Neural Network)
Convolutional Neural Networks (CNNs) are a class of deep learning models that have shown great success in image classification tasks. To implement a CNN, we can use a deep learning library like TensorFlow or PyTorch.

In [None]:
# Example of a simple CNN using TensorFlow
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

# Create a CNN model
cnn_model = Sequential([
    Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)),
    MaxPooling2D((2, 2)),
    Conv2D(64, (3, 3), activation='relu'),
    MaxPooling2D((2, 2)),
    Flatten(),
    Dense(64, activation='relu'),
    Dense(10, activation='softmax')
])

# Compile the model
cnn_model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

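# categorical_crossentropy expects one-hot labels, so encode y_train unless it is already one-hot
y_train_arr = np.asarray(y_train)
y_train_one_hot = y_train_arr if y_train_arr.ndim == 2 else tf.keras.utils.to_categorical(y_train_arr.astype(int), num_classes=10)

# Train the model on images reshaped to (samples, height, width, channels)
cnn_model.fit(np.asarray(X_train).reshape(-1, 28, 28, 1), y_train_one_hot, epochs=5, batch_size=32)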
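
It can help to trace how the image size changes through the network above. The sketch below reproduces the arithmetic in plain Python, assuming the Keras defaults used in that cell (no padding and stride 1 for `Conv2D`, non-overlapping windows for `MaxPooling2D`); the helper functions are hypothetical and not part of Keras.

```python
def conv_out(size, kernel):
    """Output width of a 'valid' convolution with stride 1."""
    return size - kernel + 1

def pool_out(size, pool):
    """Output width of non-overlapping max pooling (partial windows are dropped)."""
    return size // pool

size = 28
size = conv_out(size, 3)   # Conv2D(32, (3, 3)) -> 26
size = pool_out(size, 2)   # MaxPooling2D((2, 2)) -> 13
size = conv_out(size, 3)   # Conv2D(64, (3, 3)) -> 11
size = pool_out(size, 2)   # MaxPooling2D((2, 2)) -> 5
flat_features = size * size * 64

# Parameters = weights + biases for each layer.
params = {
    "conv1": (3 * 3 * 1 + 1) * 32,
    "conv2": (3 * 3 * 32 + 1) * 64,
    "dense1": (flat_features + 1) * 64,
    "dense2": (64 + 1) * 10,
}
total_params = sum(params.values())
print(flat_features, total_params)
```

These numbers can be checked against the output of `cnn_model.summary()`; most of the parameters sit in the first dense layer rather than in the convolutions.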


#### Exercise 13
Try applying different feature engineering techniques to the dataset, such as image augmentation or dimensionality reduction, and evaluate the performance of the model after applying these techniques.

In [None]:
# Hint: Apply the chosen feature engineering technique to the training dataset and retrain the model.
# Code sketch:
# Apply feature engineering technique (e.g., image augmentation, PCA, etc.)
# Retrain the model using the transformed dataset


#### Exercise 14
Perform hyperparameter tuning for a different classifier, such as Random Forest or K-Nearest Neighbors, and compare the performance with the tuned SVM model.

In [None]:
# Hint: Define the hyperparameters and their possible values for the chosen classifier and perform a GridSearchCV or RandomizedSearchCV.
# Code sketch:
# Import the classifier (e.g., RandomForestClassifier, KNeighborsClassifier, etc.)
# Define the hyperparameters and their possible values
# Create a GridSearchCV or RandomizedSearchCV object
# Perform the search on the training dataset
# Print the best hyperparameters and compare the performance with the tuned SVM model


#### Exercise 15
Implement a more complex CNN architecture and compare its performance with the simpler CNN model provided above.

In [None]:
# Hint: Add more layers or increase the number of filters in the Conv2D layers.
# Code sketch:
# Create a more complex CNN model (e.g., add more layers, increase the number of filters, etc.)
# Compile the model
# Train the model
# Evaluate the model's performance and compare it with the simpler CNN model

## 8. Deploying the Model
### 8.1. Saving the trained model
Once we have trained and fine-tuned our model, we need to save it so that it can be used for inference later on. In this section, we'll learn how to save the trained model and load it for use in an application.

In [None]:
import joblib

# Save the trained SVM model
joblib.dump(svm_model, 'mnist_svm_model.pkl')


#### Exercise 16
Save the improved CNN model.

In [None]:
# Hint: Save the improved CNN model using its 'save' method.
# Code sketch (replace `cnn_model` with the variable that holds your improved model):
# cnn_model.save('mnist_cnn_model.h5')

### 8.2. Loading the model for inference
After saving the model, we need to load it to use for predictions. Here's how we can load the saved model and use it for inference.

In [None]:
# Load the saved model
loaded_model = joblib.load('mnist_svm_model.pkl')

# Use the loaded model for inference
sample_digit = X_test[0]
prediction = loaded_model.predict(sample_digit.reshape(1, -1))
print("Predicted digit:", prediction[0])
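
A quick way to gain confidence in a saved model is to check that it reproduces the original predictions after reloading. The following sketch does this with a small SVM trained on the bundled `load_digits` dataset and a temporary file; the dataset, file name, and variable names are assumptions for illustration.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.datasets import load_digits
from sklearn.svm import SVC

X_demo, y_demo = load_digits(return_X_y=True)  # small stand-in dataset
demo_model = SVC(random_state=42).fit(X_demo[:500], y_demo[:500])

# Save to a temporary directory so the example leaves no files behind.
with tempfile.TemporaryDirectory() as tmp_dir:
    model_path = os.path.join(tmp_dir, "demo_svm_model.pkl")
    joblib.dump(demo_model, model_path)
    restored_model = joblib.load(model_path)

# The restored model should reproduce the original predictions exactly.
same_predictions = np.array_equal(
    demo_model.predict(X_demo[500:600]), restored_model.predict(X_demo[500:600])
)
print("Predictions identical after reload:", same_predictions)
```

Files written by joblib are pickle-based, so they should only be loaded from trusted sources, ideally with the same scikit-learn version that created them.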

#### Exercise 17
Load the saved CNN model and use it for inference.

In [None]:
# Hint: Load the saved CNN model using the 'load_model' method from tensorflow.keras.models
# Code sketch:
# from tensorflow.keras.models import load_model
# loaded_cnn_model = load_model('mnist_cnn_model.h5')
# Use the loaded CNN model for inference on a sample digit

### 8.3. Creating a simple application for digit recognition
Now that we have saved and loaded our model, we can create a simple application that utilizes the model for digit recognition. This can be a web application, a desktop application, or even a mobile application.

In [None]:
# Hint: For this exercise, we will provide a high-level overview and some code snippets for creating a simple Flask web application.
# You'll need to adapt the code and integrate the necessary components for digit recognition.
# Note that this exercise is more open-ended and may require you to explore additional documentation.

# Code sketch:

# Step 1: Install Flask (if not already installed)
# !pip install Flask

# Step 2: Create a new Flask application file (e.g., app.py) with the following content:

# from flask import Flask, render_template, request
# import numpy as np
# from tensorflow.keras.models import load_model

# app = Flask(__name__)

# # Load the saved CNN model once at startup instead of on every request
# loaded_cnn_model = load_model('mnist_cnn_model.h5')

# def preprocess_input(image_data):
#     # Preprocess the input image_data here (e.g., normalize the pixel values, reshape the input, etc.)
#     # and return an array of shape (1, 28, 28, 1); see Step 4
#     raise NotImplementedError

# @app.route('/', methods=['GET', 'POST'])
# def index():
#     if request.method == 'POST':
#         # Get the image data from the POST request
#         image_data = request.form['image_data']
#
#         # Preprocess the input image data
#         preprocessed_image = preprocess_input(image_data)
#
#         # Use the loaded CNN model for inference on the preprocessed image
#         prediction = loaded_cnn_model.predict(preprocessed_image)
        
#         # Get the predicted digit and return it as a response
#         predicted_digit = np.argmax(prediction)
#         return str(predicted_digit)
#     return render_template('index.html')

# if __name__ == '__main__':
#     app.run(debug=True)

# Step 3: Create an 'index.html' file in a 'templates' folder with a simple user interface that captures the user's input (e.g., drawn digit).

# Step 4: Implement the 'preprocess_input' function in 'app.py' to handle the preprocessing of the input image data.

# Step 5: Run the Flask application and test the digit recognition functionality.

# Note: This code sketch provides a high-level overview of creating a Flask web application for digit recognition. You'll need to adapt the code, create the necessary files, and integrate the components for digit recognition. This exercise requires additional exploration and learning beyond the provided hints.
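
As a starting point for Step 4, here is one possible shape for `preprocess_input`. It assumes, purely for illustration, that the page submits the drawn digit as a comma-separated string of 784 grayscale values between 0 and 255; the payload format is hypothetical, and a real front end may send something different (for example, a base64-encoded image).

```python
import numpy as np

def preprocess_input(image_data: str) -> np.ndarray:
    """Convert a comma-separated string of 784 pixel values (0-255) into CNN input."""
    pixels = np.array([float(v) for v in image_data.split(",")], dtype=np.float32)
    if pixels.size != 28 * 28:
        raise ValueError(f"Expected 784 pixel values, got {pixels.size}")
    pixels = np.clip(pixels, 0, 255) / 255.0   # same [0, 1] scaling as in training
    return pixels.reshape(1, 28, 28, 1)        # (batch, height, width, channels)

# Usage with a made-up image: one white pixel followed by 783 black pixels.
example_payload = ",".join(["255"] + ["0"] * 783)
example_input = preprocess_input(example_payload)
print(example_input.shape)
```

Whatever format is chosen, the preprocessing applied at inference time has to match what the model saw during training, including the scaling and the input shape.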


## 9. Conclusion
### 9.1. Recap of the tutorial
In this tutorial, we went through the process of creating a handwritten digit recognition model using the MNIST dataset. We started by loading and exploring the data, followed by preprocessing it for optimal performance. We then built, trained, and evaluated a machine learning model, and even improved it using feature engineering, hyperparameter tuning, and implementing a more complex model such as a Convolutional Neural Network (CNN). Finally, we demonstrated how to deploy the model using a simple Flask web application.

### 9.2. Potential applications of digit recognition
Handwritten digit recognition has a variety of real-world applications, including:
1. Postal mail sorting: recognizing handwritten zip codes on envelopes to facilitate mail sorting.
2. Bank check processing: identifying the amount written on checks in order to process transactions.
3. Form digitization: converting handwritten numbers on various forms (e.g., surveys, exams) into digital data for further processing and analysis.
4. Assistive technology: helping people with disabilities to communicate more effectively by recognizing handwritten numbers.

### 9.3. Further resources and recommendations
To deepen your understanding and explore other aspects of digit recognition and machine learning, we recommend the following resources:
1. [Deep Learning Specialization on Coursera](https://www.coursera.org/specializations/deep-learning) by Andrew Ng: A comprehensive course on deep learning, covering various aspects including Convolutional Neural Networks.
2. [Python Data Science Handbook](https://jakevdp.github.io/PythonDataScienceHandbook/) by Jake VanderPlas: A useful guide for learning more about data science and machine learning using Python.
3. [TensorFlow Tutorials](https://www.tensorflow.org/tutorials): Official TensorFlow tutorials for various machine learning tasks, including digit recognition.

By continuing to practice and learn from these resources, you'll be well-equipped to tackle more complex machine learning problems and create even more powerful models. Happy learning!