
In [None]:
import warnings
warnings.filterwarnings('ignore')

# MNIST digit classification: Classical ML

## 1. The dataset

The MNIST dataset<sup>1</sup> (Modified National Institute of Standards and Technology dataset) is a large dataset containing pre-processed **28x28 pixel** images of handwritten digits. The dataset is widely used for training and testing in the field of machine learning.

<sub>[1] The MNIST Database of Handwritten Digits. Yann LeCun (Courant Institute, NYU), Corinna Cortes (Google Labs, New York), Christopher J.C. Burges (Microsoft Research, Redmond).</sub>

Let's load this dataset. As this is a widely used dataset in Machine Learning, it can be loaded straight from the [openml.org](https://www.openml.org/) public repository with the following Scikit-learn function: 

In [None]:
from sklearn.datasets import fetch_openml

mnist = fetch_openml('mnist_784', as_frame=False, cache=False)

The `fetch_openml()` function returns a Python [dictionary-like object](https://scikit-learn.org/stable/modules/generated/sklearn.datasets.fetch_openml.html). The actual data can be obtained with the following keys:

- data: np.array, scipy.sparse.csr_matrix of floats, or pandas DataFrame
- target: np.array, pandas Series or DataFrame

Let's explore the feature vectors:

In [None]:
print("Length of feature vector: {}\n".format(len(mnist.data[0])))
print("Example of feature vector:\n")
print(mnist.data[0])

Each image in the loaded dataset is represented by a 784-dimensional vector with one gray-scale value (0 means black, 255 means white) for each of the 28x28 pixels.

We can reshape this feature vector into a 28x28 gray-scale image as follows:

In [None]:
mnist.data[0].reshape((28,28))
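
The reshape only changes how the 784 values are indexed: they are read row by row into a 28x28 grid, and flattening the grid gives back the original vector. A minimal sketch with a synthetic vector (the names `flat`, `image` and `restored` are illustrative and not part of the notebook):

```python
import numpy as np

# Synthetic stand-in for one MNIST feature vector: the values 0..783
flat = np.arange(784)

# Row-major reshape: the first 28 values become the first image row
image = flat.reshape((28, 28))

# Pixel (row r, column c) of the image is element r * 28 + c of the vector
r, c = 3, 5
print(image[r, c], flat[r * 28 + c])

# Flattening the image recovers the original vector
restored = image.reshape(-1)
print(np.array_equal(flat, restored))
```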

We will denote the feature vectors as `X` and the corresponding labels as `y`:

In [None]:
X = mnist.data
y = mnist.target

print(X.dtype)
print(y.dtype)

Notice that the labels are Python objects (strings):

In [None]:
print(y)

We convert these to numbers (integers):

In [None]:
y = mnist.target.astype('int64')

We can use the Python [matplotlib](https://matplotlib.org/) library to plot the digit images in `X` (the label for each image is shown in the title):

In [None]:
import matplotlib.pyplot as plt
from random import randint

# Display 9 randomly selected images
for c in range(1, 10):
    plt.subplot(3, 3,c)
    i = randint(0, X.shape[0] - 1)  # randint includes both endpoints
    im = X[i].reshape((28,28))
    plt.axis("off")
    plt.title("Label = {}".format(y[i]))
    plt.imshow(im, cmap='gray')

It is 'best practice' in Machine Learning to normalize the feature values so that all features share the same, not too large, scale. This helps training converge faster.

Scikit-learn has functions to normalize features in `sklearn.preprocessing`. The most common ones are [`MinMaxScaler()`](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.MinMaxScaler.html) and [`StandardScaler()`](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.StandardScaler.html). 

We will use the `MinMaxScaler()`:

In [None]:
from sklearn.preprocessing import MinMaxScaler

X = MinMaxScaler().fit_transform(X)

print(X[0])
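
To make the transformation concrete, the sketch below compares `MinMaxScaler()` with the per-column formula `(x - min) / (max - min)` on a small made-up matrix. The array `X_demo` and its values are assumptions for illustration only; the third column is constant, similar to a pixel that has the same value in every image.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Small synthetic "pixel" matrix: 4 samples, 3 features (values are made up)
X_demo = np.array([[0.0, 10.0, 5.0],
                   [128.0, 20.0, 5.0],
                   [255.0, 30.0, 5.0],
                   [64.0, 40.0, 5.0]])

X_scaled = MinMaxScaler().fit_transform(X_demo)

# Manual version: (x - min) / (max - min), computed per column
col_min = X_demo.min(axis=0)
col_range = X_demo.max(axis=0) - col_min
col_range[col_range == 0] = 1.0  # avoid division by zero for constant columns
X_manual = (X_demo - col_min) / col_range

print(X_scaled)
print(np.allclose(X_scaled, X_manual))
```

Each non-constant column ends up in the range [0, 1], and the constant column is mapped to 0.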

To evaluate our trained model we need to first create an independent test set with images that are not used during training:

In [None]:
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.05, shuffle=True, random_state=42)
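
The following sketch checks the two properties that matter for such a split: the training and test rows do not overlap, and a fixed `random_state` reproduces the same split. It uses a synthetic array (`X_demo`, `y_demo`) instead of MNIST, so the sizes are only illustrative.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: 1000 samples, 4 features, 10 balanced classes
X_demo = np.arange(4000).reshape(1000, 4)
y_demo = np.arange(1000) % 10

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.05, shuffle=True, random_state=42
)
print(X_tr.shape, X_te.shape)

# The first column is unique per row, so it can serve as a row identifier.
# Train and test rows are disjoint and together cover the full dataset.
train_ids = set(X_tr[:, 0].tolist())
test_ids = set(X_te[:, 0].tolist())
print(train_ids.isdisjoint(test_ids), len(train_ids | test_ids))

# The same random_state reproduces exactly the same split
X_tr2, X_te2, _, _ = train_test_split(
    X_demo, y_demo, test_size=0.05, shuffle=True, random_state=42
)
print(np.array_equal(X_te, X_te2))
```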

## 2. The model

Now we are ready to define our model. We will fit a [Logistic Regression](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html) model:

In [None]:
from sklearn.linear_model import LogisticRegression

clf = LogisticRegression()

Each Scikit-learn model has a method `fit()` that optimizes the model parameters to minimize the cost function:

In [None]:
clf.fit(X_train,y_train)

We can access the model parameters of the fitted logistic regression model as follows:

In [None]:
print(clf.coef_)
print(clf.intercept_)

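Because there are 10 classes, we get 10 lists of model parameters and 10 intercepts, one for each class. Older Scikit-learn versions fit these as separate one-vs-all classifiers by default, while recent versions fit a single multinomial (softmax) model with the default solver; the shapes of `coef_` and `intercept_` are the same in both cases.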
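
The shapes can be checked directly. The sketch below is an assumption-laden stand-in: it uses the small 8x8 digits dataset bundled with Scikit-learn (`load_digits`) instead of MNIST so that it runs without a download, and `demo_clf` is an illustrative name.

```python
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import MinMaxScaler

# Assumption: the bundled 8x8 digits dataset stands in for MNIST (no download)
X_digits, y_digits = load_digits(return_X_y=True)
X_digits = MinMaxScaler().fit_transform(X_digits)

demo_clf = LogisticRegression(max_iter=1000)
demo_clf.fit(X_digits, y_digits)

# One row of weights per class, one weight per pixel (64 here, 784 for MNIST)
print(demo_clf.coef_.shape)
# One intercept per class
print(demo_clf.intercept_.shape)
# Class labels in the order used by the rows of coef_
print(demo_clf.classes_)
```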

Let's plot the model parameters for each class as a 28x28 image:

In [None]:
import numpy as np

coef = clf.coef_
scale = np.abs(coef).max()
plt.figure(figsize=(16,6))

for i in range(10): # 0-9
    coef_plot = plt.subplot(2, 5, i + 1) # 2x5 plot

    coef_plot.imshow(coef[i].reshape(28,28), 
                     cmap=plt.cm.RdBu,
                     vmin=-scale, vmax=scale,
                    interpolation='bilinear')
    
    coef_plot.set_xticks(()); coef_plot.set_yticks(()) # remove ticks
    coef_plot.set_xlabel(f'Class {i}')

Each Scikit-learn model also has a method `predict()` that applies the fitted model to compute class labels for feature vectors:

In [None]:
# Perform the predictions
y_predicted = clf.predict(X_test)

print(y_predicted)
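
For a linear classifier such as logistic regression, the predicted label is the class with the highest linear score. The sketch below rebuilds that by hand from `coef_` and `intercept_`. As an assumption, it again uses the bundled 8x8 digits dataset rather than MNIST; `scores` and `manual_pred` are illustrative names.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import MinMaxScaler

# Assumption: the bundled 8x8 digits dataset stands in for MNIST (no download)
X_digits, y_digits = load_digits(return_X_y=True)
X_digits = MinMaxScaler().fit_transform(X_digits)
demo_clf = LogisticRegression(max_iter=1000).fit(X_digits, y_digits)

# Linear score per class: X @ coef_.T + intercept_
scores = X_digits @ demo_clf.coef_.T + demo_clf.intercept_

# The predicted label is the class with the highest score
manual_pred = demo_clf.classes_[np.argmax(scores, axis=1)]

# Fraction of samples where the manual rule agrees with predict()
agreement = np.mean(manual_pred == demo_clf.predict(X_digits))
print(agreement)
```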

## 3. Evaluation

Scikit-learn offers many [metrics](https://scikit-learn.org/stable/modules/model_evaluation.html) for evaluating the prediction performance. The most common metric is `accuracy`:

In [None]:
from sklearn.metrics import accuracy_score

print("Accuracy = {}%".format(accuracy_score(y_test, y_predicted)*100))
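
Accuracy is simply the fraction of predictions that equal the true label. A minimal check on made-up labels (`y_true_demo` and `y_pred_demo` are illustrative, not notebook variables):

```python
import numpy as np
from sklearn.metrics import accuracy_score

# Made-up labels for illustration
y_true_demo = np.array([0, 0, 1, 1, 2, 2])
y_pred_demo = np.array([0, 1, 1, 1, 2, 0])

# Accuracy is the fraction of predictions that match the true labels
manual_accuracy = np.mean(y_true_demo == y_pred_demo)

print(manual_accuracy, accuracy_score(y_true_demo, y_pred_demo))
```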

To get more insight into the prediction errors for each class we can compute a confusion matrix:

In [None]:
from sklearn.metrics import ConfusionMatrixDisplay

ConfusionMatrixDisplay.from_predictions(y_test, y_predicted)
plt.show()
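
The plot is built from a matrix of counts that can also be inspected numerically with `confusion_matrix()`. The sketch below uses made-up labels for three classes to show how rows (true class) and columns (predicted class) should be read.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Made-up labels for illustration
y_true_demo = np.array([0, 0, 1, 1, 2, 2])
y_pred_demo = np.array([0, 1, 1, 1, 2, 0])

# Entry [i, j] counts samples whose true class is i and predicted class is j
cm = confusion_matrix(y_true_demo, y_pred_demo)
print(cm)

# Correct predictions sit on the diagonal, so this ratio is the accuracy
print(np.trace(cm) / cm.sum())

# Per-class recall: diagonal divided by the row sums (true class counts)
print(np.diag(cm) / cm.sum(axis=1))
```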

Scikit-learn also offers a `classification_report()` function that computes metrics that are more suitable for imbalanced multi-class classification tasks:

In [None]:
from sklearn.metrics import classification_report

print(classification_report(y_test, y_predicted, labels=range(0,10)))
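
The report lists precision, recall and F1 score per class, plus averages over the classes. The sketch below recomputes these on made-up labels to show how the numbers relate: F1 is the harmonic mean of precision and recall, and the macro average is the plain mean over classes.

```python
import numpy as np
from sklearn.metrics import f1_score, precision_recall_fscore_support

# Made-up labels for illustration
y_true_demo = np.array([0, 0, 1, 1, 2, 2])
y_pred_demo = np.array([0, 1, 1, 1, 2, 0])

# Per-class precision, recall, F1 and support (number of true samples)
precision, recall, f1, support = precision_recall_fscore_support(
    y_true_demo, y_pred_demo
)

# F1 is the harmonic mean of precision and recall
manual_f1 = 2 * precision * recall / (precision + recall)

# The macro average weights every class equally, regardless of its support
macro_f1 = f1_score(y_true_demo, y_pred_demo, average="macro")

print(f1, manual_f1)
print(macro_f1, f1.mean())
```

Because every class counts equally in the macro average, a poorly predicted rare class lowers the score more than it lowers accuracy.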

Now, let's take a look at the misclassified images in the test set:

In [None]:
index = 0
misclassified_images = []
for label, predict in zip(y_test, y_predicted):
    if label != predict: 
        misclassified_images.append(index)
    index +=1
    
print("Number of misclassified test set images: {}".format(len(misclassified_images)))
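
The same indices can be obtained without an explicit counter. A possible vectorized alternative, shown on made-up label arrays, is `np.flatnonzero()` applied to the boolean mask of mismatches:

```python
import numpy as np

# Made-up labels for illustration
y_true_demo = np.array([3, 7, 1, 0, 9, 4])
y_pred_demo = np.array([3, 1, 1, 0, 4, 4])

# Loop version: collect positions where label and prediction differ
loop_indices = [
    i for i, (t, p) in enumerate(zip(y_true_demo, y_pred_demo)) if t != p
]

# Vectorized version: positions where the boolean mask is True
vector_indices = np.flatnonzero(y_true_demo != y_pred_demo)

print(loop_indices, vector_indices.tolist())
```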

Let's plot some of these:

In [None]:
plt.figure(figsize=(10,10))
for plot_index, bad_index in enumerate(misclassified_images[0:20]):
    p = plt.subplot(4,5, plot_index+1) # 4x5 plot
    
    p.imshow(X_test[bad_index].reshape(28,28), cmap=plt.cm.gray,
            interpolation='bilinear')
    p.set_xticks(()); p.set_yticks(()) # remove ticks
    
    p.set_title(f'Pred: {y_predicted[bad_index]}, Actual: {y_test[bad_index]}');

## 4. Hyperparameter optimization

Scikit-learn offers [many functions](https://scikit-learn.org/stable/modules/grid_search.html) for hyperparameter optimization. We will use `GridSearchCV()`, which evaluates different hyperparameter value combinations using cross-validation.

In `GridSearchCV()` you define the hyperparameter values to consider as a Python dictionary:  

In [None]:
from sklearn.model_selection import GridSearchCV

grid = {
    "C":np.logspace(-3,3,7)
}

print(grid)
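
`np.logspace(-3,3,7)` produces seven values spaced evenly on a logarithmic scale. With more than one hyperparameter the grid contains every combination of values, which can be enumerated with `ParameterGrid`. In the sketch below the second parameter, `fit_intercept`, is added only as a hypothetical example of how the number of candidates multiplies; it is not part of the notebook's grid.

```python
import numpy as np
from sklearn.model_selection import ParameterGrid

# Seven values, evenly spaced on a log scale from 10^-3 to 10^3
C_values = np.logspace(-3, 3, 7)
print(C_values)

# Hypothetical two-parameter grid, only to show how combinations multiply
demo_grid = {"C": C_values, "fit_intercept": [True, False]}
combinations = list(ParameterGrid(demo_grid))

print(len(combinations))
print(combinations[0])
```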

Next, we can initialize `GridSearchCV()` just like any other model in Scikit-learn:

In [None]:
clf_cv = GridSearchCV(clf, grid, cv=5, verbose=2, scoring='f1_macro')

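Now we can just call the `fit()` function again to fit all the models and evaluate their prediction performance with the cross-validation procedure. To keep the running time manageable, we first draw a random subsample of 1000 training images and run the search on that: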

In [None]:
idx = np.random.choice(len(X_train),1000,replace=False)
X_train_small = X_train[idx]
y_train_small = y_train[idx]

In [None]:
clf_cv.fit(X_train_small,y_train_small)
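
Conceptually, the grid search repeats an ordinary 5-fold cross-validation for every candidate value of `C`. The sketch below does this by hand with `cross_val_score()` for three candidate values. As an assumption, it uses the bundled 8x8 digits dataset instead of MNIST, and `max_iter=2000` is chosen only to give the solver room to converge.

```python
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.preprocessing import MinMaxScaler

# Assumption: the bundled 8x8 digits dataset stands in for MNIST (no download)
X_digits, y_digits = load_digits(return_X_y=True)
X_digits = MinMaxScaler().fit_transform(X_digits)

# For each candidate C: train on 4 folds, score on the held-out fold, 5 times
for C in [0.01, 1.0, 100.0]:
    fold_scores = cross_val_score(
        LogisticRegression(C=C, max_iter=2000),
        X_digits, y_digits, cv=5, scoring="f1_macro",
    )
    print(C, fold_scores.round(3), fold_scores.mean().round(3))
```

`GridSearchCV()` additionally records these scores and refits the best candidate on all the data it was given.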

A fitted `GridSearchCV()` has an attribute `cv_results_` that contains the cross-validation scores for each of the hyperparameter value combinations considered. The following code creates a pandas DataFrame from `cv_results_` for easy visualization:

In [None]:
import pandas as pd

result_cv = pd.DataFrame()
result_cv["param_C"] = clf_cv.cv_results_["param_C"].data
result_cv["score"] = clf_cv.cv_results_["mean_test_score"]

result_cv
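
Since `cv_results_` is a dictionary of equal-length arrays, it can also be passed to `pd.DataFrame()` as a whole, which keeps extra columns such as the standard deviation and the rank of each candidate. A sketch under the same assumption as before (bundled 8x8 digits dataset, a shortened grid, illustrative names `search` and `results`):

```python
import pandas as pd
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.preprocessing import MinMaxScaler

# Assumption: the bundled 8x8 digits dataset stands in for MNIST (no download)
X_digits, y_digits = load_digits(return_X_y=True)
X_digits = MinMaxScaler().fit_transform(X_digits)

search = GridSearchCV(
    LogisticRegression(max_iter=2000),
    {"C": [0.01, 1.0, 100.0]},
    cv=5,
    scoring="f1_macro",
)
search.fit(X_digits, y_digits)

# cv_results_ is a dict of equal-length arrays, so it converts directly
results = pd.DataFrame(search.cv_results_)
columns = ["param_C", "mean_test_score", "std_test_score", "rank_test_score"]
results = results[columns].sort_values("rank_test_score", kind="stable")
print(results)

# The top-ranked row corresponds to best_params_ and best_score_
print(search.best_params_, search.best_score_)
```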

A fitted `GridSearchCV()` also has attributes `best_estimator_` and `best_score_` that contain the best performing model and its corresponding cross-validation score respectively: 

In [None]:
print(clf_cv.best_estimator_)
print(clf_cv.best_score_)

A fitted `GridSearchCV()` also has the function `predict()` that applies `best_estimator_` to predict the classes:

In [None]:
y_predicted = clf_cv.predict(X_test)

print(classification_report(y_test, y_predicted, labels=range(0,10)))

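We can also refit a logistic regression with `C=1` (the Scikit-learn default) on the full training set and evaluate it in the same way:

In [None]: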
clf = LogisticRegression(C=1)

clf.fit(X_train,y_train)

y_predicted = clf.predict(X_test)

print(classification_report(y_test, y_predicted, labels=range(0,10)))

The `predict()` function returns the classes only. The logistic regression algorithm 'predicts' probabilities for each class. The `predict_proba()` function returns these probabilities:

In [None]:
y_predicted = clf.predict_proba(X_test)

These are the class probability predictions for the first instance in the test set `X_test`:

In [None]:
print(y_predicted[0])

This is the true class of this instance:

In [None]:
print(y_test[0])
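
Two properties of the probability output are worth checking: each row sums to 1, and the most probable class is the one `predict()` returns. The sketch below verifies both, again assuming the bundled 8x8 digits dataset as a stand-in for MNIST (`proba` and `most_probable` are illustrative names).

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import MinMaxScaler

# Assumption: the bundled 8x8 digits dataset stands in for MNIST (no download)
X_digits, y_digits = load_digits(return_X_y=True)
X_digits = MinMaxScaler().fit_transform(X_digits)
demo_clf = LogisticRegression(max_iter=1000).fit(X_digits, y_digits)

proba = demo_clf.predict_proba(X_digits)

# One row per sample, one column per class; each row sums to 1
print(proba.shape)
print(np.allclose(proba.sum(axis=1), 1.0))

# Taking the most probable class reproduces predict()
most_probable = demo_clf.classes_[np.argmax(proba, axis=1)]
print(np.mean(most_probable == demo_clf.predict(X_digits)))

# Probability assigned to the predicted class, a rough confidence measure
print(proba.max(axis=1)[:5].round(3))
```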