### MNIST Classification Project

In this project, I'm assigned to classify the MNIST handwritten digits, a classic dataset for any machine learning beginner.

So far, I've learnt KNN, Naive Bayes, Decision Trees and SVM (converted to the dual problem and solved by quadratic programming). My first attempt used SVM with quadratic programming and the following objective function:

$$ g(\lambda, \mu) = \min_{\mathbf{w}, b, \xi} \mathcal{L}(\mathbf{w}, b, \xi, \lambda, \mu) $$
$$ g(\lambda, \mu) = \sum_{n=1}^N \lambda_n - \frac{1}{2} \sum_{n=1}^N\sum_{m=1}^N \lambda_n \lambda_m y_n y_m k(\mathbf{x}_n, \mathbf{x}_m) $$

\begin{eqnarray}
     \lambda &=& \arg \max_{\lambda} g(\lambda)   &&\\
     \text{subject to:}~ && \sum_{n=1}^N \lambda_ny_n = 0 && \\
     && 0 \leq \lambda_n \leq C, ~\forall n= 1, 2, \dots, N && 
\end{eqnarray}
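To make the dual objective concrete, here is a minimal NumPy sketch that evaluates $g$ for a given $\lambda$ on four toy points and checks the two constraints. The RBF kernel, the value of `gamma`, the toy data and the helper names are all assumptions chosen for illustration; the project does not say which kernel was used.

```python
import numpy as np

def rbf_kernel_matrix(X, gamma=0.5):
    """Gram matrix K[n, m] = exp(-gamma * ||x_n - x_m||^2) (assumed kernel)."""
    sq_norms = np.sum(X ** 2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T
    return np.exp(-gamma * np.maximum(sq_dists, 0.0))

def dual_objective(lam, y, K):
    """g = sum_n lam_n - 1/2 * sum_{n,m} lam_n lam_m y_n y_m K[n, m]."""
    v = lam * y
    return lam.sum() - 0.5 * v @ K @ v

def is_feasible(lam, y, C, tol=1e-9):
    """Check sum_n lam_n y_n = 0 and 0 <= lam_n <= C."""
    return bool(abs(lam @ y) <= tol and np.all(lam >= -tol) and np.all(lam <= C + tol))

# Toy data (hypothetical): four points with XOR-style labels.
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
y = np.array([1.0, -1.0, -1.0, 1.0])
lam = np.full(4, 0.5)

K = rbf_kernel_matrix(X)
g_value = dual_objective(lam, y, K)
feasible = is_feasible(lam, y, C=1.0)
print("g(lambda) =", g_value, "feasible:", feasible)
```

A quadratic programming solver would search over all feasible $\lambda$ for the maximum of this value; the sketch only evaluates a single candidate.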

I realized that it is impractical to compute $ k(\mathbf{x}_n, \mathbf{x}_m) $ for every pair of the 60000 training samples: the kernel matrix would have $60000 \times 60000 = 3.6 \times 10^9$ entries, roughly 28.8 GB in 64-bit floats. One way around this is to approach the SVM differently: reformulate the loss function and minimize it with gradient descent or another iterative method. I will try this approach later.
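As a rough idea of what the iterative alternative could look like, the sketch below trains a linear SVM in the primal with mini-batch subgradient descent on the hinge loss, so no kernel matrix is ever built. This is an assumption about the approach, not something done in this project: the function name, hyperparameters and toy data are hypothetical, the hinge term is averaged over each batch rather than summed, and it handles only two classes (MNIST would need something like one-vs-rest on top).

```python
import numpy as np

def train_linear_svm(X, y, C=1.0, lr=0.01, epochs=100, batch_size=32, seed=0):
    """Mini-batch subgradient descent on
    0.5 * ||w||^2 + C * mean(max(0, 1 - y * (X @ w + b))), with y in {-1, +1}."""
    rng = np.random.default_rng(seed)
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    b = 0.0
    for _ in range(epochs):
        order = rng.permutation(n_samples)
        for start in range(0, n_samples, batch_size):
            idx = order[start:start + batch_size]
            Xb, yb = X[idx], y[idx]
            margins = yb * (Xb @ w + b)
            active = margins < 1.0  # samples that violate the margin
            grad_w = w - C * (yb[active, None] * Xb[active]).sum(axis=0) / len(idx)
            grad_b = -C * yb[active].sum() / len(idx)
            w -= lr * grad_w
            b -= lr * grad_b
    return w, b

# Toy data (hypothetical): two well-separated Gaussian blobs.
rng = np.random.default_rng(0)
X_pos = rng.normal(loc=2.0, scale=0.5, size=(100, 2))
X_neg = rng.normal(loc=-2.0, scale=0.5, size=(100, 2))
X_toy = np.vstack([X_pos, X_neg])
y_toy = np.concatenate([np.ones(100), -np.ones(100)])

w, b = train_linear_svm(X_toy, y_toy)
train_accuracy = np.mean(np.sign(X_toy @ w + b) == y_toy)
print("Training accuracy on toy data:", train_accuracy)
```

Each update touches only one batch, so memory use grows with the batch size and the number of features rather than with the square of the dataset size.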

In this project, I implement 2 algorithms the deep learning way:
- Softmax Regression (no hidden layer)
- LeNet (5 hidden layers)

--------------
 

#### 1. Load MNIST dataset

In [1]:
from utils import load_dataset_mnist, preprocess_data
from mnist import MNIST
from sklearn.preprocessing import StandardScaler

load_dataset_mnist()

mndata = MNIST('data_mnist')

images, labels = mndata.load_training()

-------> Downloading MNIST dataset
-------> Finish


#### 2. Training or Loading the LeNet Model

In [2]:
import os
from lenet import Lenet
import tensorflow as tf

training_phase = "saved_model" not in os.listdir()

lenet = Lenet(20, 64, tf.train.AdamOptimizer(learning_rate=0.001), tf.losses.softmax_cross_entropy)

if training_phase:
    images, labels = mndata.load_training()
    images, labels = preprocess_data(images, labels, True)
    lenet.train(images, labels)
else:
    images_test, labels_test = mndata.load_testing()
    images_test, labels_test = preprocess_data(images_test, labels_test, True, True)

    pred = lenet.predict(images_test)
    print("Accuracy:", len(labels_test[pred == labels_test]) / len(labels_test))  # 98%

    from sklearn.metrics import confusion_matrix
    print("Confusion matrix: ")
    print(confusion_matrix(labels_test, pred))


[7 2 1 ... 4 5 6]
Accuracy: 0.9799
Confusion matrix: 
[[ 972    0    2    0    0    0    2    1    3    0]
 [   0 1128    1    2    0    1    2    1    0    0]
 [   5    2 1014    2    0    0    0    5    4    0]
 [   0    0    3  994    0    7    0    3    3    0]
 [   1    1    3    0  942    0    8    2    3   22]
 [   1    1    0    4    0  883    2    0    0    1]
 [   7    4    1    1    2    8  932    0    3    0]
 [   1    1   12    3    0    0    0 1008    2    1]
 [   3    0    2    5    1    3    0    2  956    2]
 [   3    6    0    3    6    7    1    7    6  970]]


------------------
Accuracy is 98% after 20 epochs. In the confusion matrix (rows are true digits, columns are predicted digits), the most frequent error is digit 4 predicted as 9, which happens 22 times. The next most frequent is digit 7 predicted as 2, with 12 cases.
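Reading the largest errors off a 10 × 10 matrix by eye is easy to get wrong, so here is a small sketch that finds them programmatically. The matrix is copied from the output printed above; the variable names are only illustrative.

```python
import numpy as np

# Confusion matrix copied from the LeNet output (rows: true digit, columns: predicted digit).
cm = np.array([
    [972,    0,    2,   0,   0,   0,   2,    1,   3,   0],
    [  0, 1128,    1,   2,   0,   1,   2,    1,   0,   0],
    [  5,    2, 1014,   2,   0,   0,   0,    5,   4,   0],
    [  0,    0,    3, 994,   0,   7,   0,    3,   3,   0],
    [  1,    1,    3,   0, 942,   0,   8,    2,   3,  22],
    [  1,    1,    0,   4,   0, 883,   2,    0,   0,   1],
    [  7,    4,    1,   1,   2,   8, 932,    0,   3,   0],
    [  1,    1,   12,   3,   0,   0,   0, 1008,   2,   1],
    [  3,    0,    2,   5,   1,   3,   0,    2, 956,   2],
    [  3,    6,    0,   3,   6,   7,   1,    7,   6, 970],
])

# Overall accuracy is the diagonal (correct predictions) over the total count.
accuracy = np.trace(cm) / cm.sum()

# Zero the diagonal so that only misclassifications remain, then take the largest entry.
off_diag = cm.copy()
np.fill_diagonal(off_diag, 0)
true_digit, pred_digit = np.unravel_index(np.argmax(off_diag), off_diag.shape)

# Per-digit recall: the fraction of each true digit that was predicted correctly.
per_class_recall = np.diag(cm) / cm.sum(axis=1)

print("Accuracy:", accuracy)
print("Most frequent error: true", true_digit, "predicted", pred_digit,
      "count", off_diag[true_digit, pred_digit])
print("Digit with lowest recall:", np.argmax(-per_class_recall))
```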

#### 3. Training with Softmax Regression

In [10]:
from softmax_regression import SoftmaxRegression

images, labels = mndata.load_training()
images, labels = preprocess_data(images, labels)
softmax = SoftmaxRegression(epochs=20)
softmax.train(images, labels)

images_test, labels_test = mndata.load_testing()
images_test, labels_test = preprocess_data(images_test, labels_test, test=True)

pred = softmax.predict(images_test)

print("Accuracy:", len(pred[labels_test == pred]) / len(pred))
from sklearn.metrics import confusion_matrix
print("Confusion matrix: ")
print(confusion_matrix(labels_test, pred))



Loss at epoch 1 14.09
Loss at epoch 2 6.05
Loss at epoch 3 4.60
Loss at epoch 4 3.91
Loss at epoch 5 3.47
Loss at epoch 6 3.15
Loss at epoch 7 2.92
Loss at epoch 8 2.74
Loss at epoch 9 2.59
Loss at epoch 10 2.45
Loss at epoch 11 2.34
Loss at epoch 12 2.23
Loss at epoch 13 2.15
Loss at epoch 14 2.07
Loss at epoch 15 2.01
Loss at epoch 16 1.94
Loss at epoch 17 1.88
Loss at epoch 18 1.83
Loss at epoch 19 1.78
Loss at epoch 20 1.73
Accuracy: 0.8675
Confusion matrix: 
[[ 928    0    4    5    4   17   12    5    4    1]
 [   2 1068   13    6    2    4    4    2   33    1]
 [  11   29  884   28   10    5   19   14   19   13]
 [   6   12   33  869    4   41    5   18   13    9]
 [   3    6    9    6  860    3   10   16   16   53]
 [  15   10    7   48   14  716   20   13   37   12]
 [  23    2    9    1   16   25  867    3    8    4]
 [   4   14   26   13   15    1    3  891    5   56]
 [   8   20   17   33   20   44   12   18  776   26]
 [   7    5    3   12   75   13    6   58   14  816]]


With softmax regression the accuracy is 87%! Not so bad.
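For readers who want to see what a model with no hidden layer amounts to, below is a minimal NumPy sketch of softmax regression trained with full-batch gradient descent on the cross-entropy loss. It is not the `SoftmaxRegression` class from `softmax_regression.py`, whose internals are not shown here; the function names, learning rate, epoch count and toy data are all assumptions.

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def train_softmax(X, y, n_classes, lr=0.1, epochs=200):
    """Fit weights W and bias b by gradient descent on the mean cross-entropy."""
    n_samples, n_features = X.shape
    W = np.zeros((n_features, n_classes))
    b = np.zeros(n_classes)
    Y = np.eye(n_classes)[y]  # one-hot encoded labels
    for _ in range(epochs):
        P = softmax(X @ W + b)
        grad_logits = (P - Y) / n_samples  # gradient of the loss w.r.t. the logits
        W -= lr * X.T @ grad_logits
        b -= lr * grad_logits.sum(axis=0)
    return W, b

def predict(X, W, b):
    """Predicted class is the one with the largest logit."""
    return np.argmax(X @ W + b, axis=1)

# Toy data (hypothetical): three well-separated 2-D clusters.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 3.0], [3.0, -3.0], [-3.0, -3.0]])
y_toy = np.repeat(np.arange(3), 50)
X_toy = centers[y_toy] + rng.normal(scale=0.5, size=(150, 2))

W, b = train_softmax(X_toy, y_toy, n_classes=3)
toy_accuracy = np.mean(predict(X_toy, W, b) == y_toy)
print("Training accuracy on toy data:", toy_accuracy)
```

For MNIST the same code would take 784 input features and 10 classes, so the whole model is a single 784 × 10 weight matrix plus a bias vector.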