### MNIST Classification Project

In this project, I'm assigned to classify the MNIST handwritten digits, a classic dataset for any machine learning beginner.

So far I've learnt KNN, Naive Bayes, Decision Tree and SVM (converted to the dual problem and solved by quadratic programming). My first attempt used SVM with quadratic programming and the following objective function:

$$ g(\lambda, \mu) = \min_{\mathbf{w}, b, \xi} \mathcal{L}(\mathbf{w}, b, \xi, \lambda, \mu) $$
$$ g(\lambda, \mu) = \sum_{n=1}^N \lambda_n - \frac{1}{2} \sum_{n=1}^N\sum_{m=1}^N \lambda_n \lambda_m y_n y_m k(\mathbf{x}_n, \mathbf{x}_m) $$

\begin{eqnarray}
     \lambda &=& \arg \max_{\lambda} g(\lambda)   &&\\
     \text{subject to:}~ && \sum_{n=1}^N \lambda_ny_n = 0 && \\
     && 0 \leq \lambda_n \leq C, ~\forall n= 1, 2, \dots, N && 
\end{eqnarray}

I realized that it's impractical to compute $ k(\mathbf{x}_n, \mathbf{x}_m) $ for every pair of the 60000 training samples: the kernel matrix would have $60000^2 = 3.6 \times 10^9$ entries, which is about 28.8 GB in 64-bit floats. One way around this is to approach SVM differently: change the loss function (for example to the hinge loss) and minimize it with gradient descent or another iterative method. I will try this approach later.
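A minimal sketch of that iterative alternative, assuming a linear (non-kernel) SVM with labels in {-1, +1}, the hinge loss averaged over the samples, and full-batch subgradient descent. The function name `train_linear_svm` and the toy data are made up for illustration; this is not code from the project, and for MNIST it would still need a one-vs-rest wrapper and mini-batches.

```python
import numpy as np


def train_linear_svm(X, y, C=1.0, lr=0.1, epochs=200):
    """Minimize 0.5*||w||^2 + C * mean(max(0, 1 - y*(Xw + b))).

    X: (N, D) float array, y: (N,) array with labels in {-1, +1}.
    Uses plain full-batch subgradient descent; no kernel matrix is built.
    """
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    b = 0.0
    for _ in range(epochs):
        margins = y * (X @ w + b)
        active = margins < 1  # samples inside the margin or misclassified
        # Only the active samples contribute to the hinge subgradient.
        grad_w = w - C * (y[active, None] * X[active]).sum(axis=0) / n_samples
        grad_b = -C * y[active].sum() / n_samples
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Tiny hand-made, linearly separable example (not MNIST).
X_toy = np.array([[2.0, 2.0], [3.0, 1.0], [1.0, 3.0],
                  [-2.0, -2.0], [-3.0, -1.0], [-1.0, -3.0]])
y_toy = np.array([1, 1, 1, -1, -1, -1])

w, b = train_linear_svm(X_toy, y_toy)
pred = np.sign(X_toy @ w + b)
print("predictions:", pred)
```

Each step only touches the data matrix itself, so memory grows with N times D instead of N squared. The price is losing the kernel trick unless explicit feature maps are added.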

In this project, I implement 2 algorithms the deep learning way:
- Softmax Regression (no hidden layer)
- LeNet (5 hidden layers)
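To give a feeling for the second model, here is a small sketch that walks through the layer shapes of the classic LeNet-5 layout on a 28x28 MNIST image. The layer sizes (6 and 16 feature maps, 5x5 kernels, dense layers of 120 and 84 units, padding of 2 on the first convolution) are the textbook ones and are an assumption here; the `Lenet` class used in this project may be configured differently.

```python
def conv_output_size(size, kernel, stride=1, padding=0):
    """Spatial size after a square convolution or pooling window."""
    return (size + 2 * padding - kernel) // stride + 1


def lenet5_shapes(input_size=28):
    """Return (layer name, output shape) pairs for a textbook LeNet-5."""
    layers = []
    size = conv_output_size(input_size, kernel=5, padding=2)  # 28 -> 28
    layers.append(("conv 5x5, 6 maps", (size, size, 6)))
    size = conv_output_size(size, kernel=2, stride=2)         # 28 -> 14
    layers.append(("pool 2x2", (size, size, 6)))
    size = conv_output_size(size, kernel=5)                   # 14 -> 10
    layers.append(("conv 5x5, 16 maps", (size, size, 16)))
    size = conv_output_size(size, kernel=2, stride=2)         # 10 -> 5
    layers.append(("pool 2x2", (size, size, 16)))
    layers.append(("flatten", (size * size * 16,)))           # 5*5*16 = 400
    layers.append(("dense", (120,)))
    layers.append(("dense", (84,)))
    layers.append(("dense output", (10,)))                    # one score per digit
    return layers


for name, shape in lenet5_shapes():
    print(f"{name:20s} -> {shape}")
```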

--------------
 

#### 1. Load MNIST dataset

In [1]:
from utils import load_dataset_mnist, preprocess_data
from mnist import MNIST
from sklearn.preprocessing import StandardScaler

load_dataset_mnist()

mndata = MNIST('data_mnist')

images, labels = mndata.load_training()

-------> Downloading MNIST dataset
-------> Finish


#### 2. Training or Loading Lenet Model

In [2]:
import os
from lenet import Lenet
import tensorflow as tf

training_phase = "saved_model" not in os.listdir()

lenet = Lenet(20, 64, tf.train.AdamOptimizer(learning_rate=0.001), tf.losses.softmax_cross_entropy)

if training_phase:
    images, labels = mndata.load_training()
    images, labels = preprocess_data(images, labels, True)
    lenet.train(images, labels)
else:
    images_test, labels_test = mndata.load_testing()
    images_test, labels_test = preprocess_data(images_test, labels_test, True, True)
    lenet.load_model()
    pred = lenet.predict(images_test)
    print("Accuracy:", len(labels_test[pred == labels_test]) / len(labels_test))  # 98%

    from sklearn.metrics import confusion_matrix
    print("Confusion matrix: ")
    print(confusion_matrix(labels_test, pred))


----> LOADED WEIGHTS
Accuracy: 0.9863
Confusion matrix: 
[[ 970    0    1    0    1    0    5    1    2    0]
 [   0 1129    2    0    0    2    2    0    0    0]
 [   2    2 1020    1    1    0    0    4    2    0]
 [   0    0    1  996    0    4    0    5    2    2]
 [   0    0    1    0  966    0    4    1    1    9]
 [   3    0    0    5    0  881    2    1    0    0]
 [   0    2    1    0    3    2  948    0    1    1]
 [   0    6    4    0    1    0    0 1010    1    6]
 [   1    0    2    3    1    5    0    2  956    4]
 [   2    2    0    1    7    4    0    3    3  987]]


------------------
Accuracy is 98.6% after 20 epochs. Very good!
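The confusion matrix printed above holds more than the overall accuracy. As a small sketch, the numbers can be copied into NumPy to get per-digit recall and the most frequent mistake. This assumes the scikit-learn convention that rows are true labels and columns are predictions, which matches the call `confusion_matrix(labels_test, pred)`.

```python
import numpy as np

# Confusion matrix copied from the LeNet output above
# (rows: true digit, columns: predicted digit).
cm = np.array([
    [970,    0,    1,   0,   1,   0,   5,    1,   2,   0],
    [  0, 1129,    2,   0,   0,   2,   2,    0,   0,   0],
    [  2,    2, 1020,   1,   1,   0,   0,    4,   2,   0],
    [  0,    0,    1, 996,   0,   4,   0,    5,   2,   2],
    [  0,    0,    1,   0, 966,   0,   4,    1,   1,   9],
    [  3,    0,    0,   5,   0, 881,   2,    1,   0,   0],
    [  0,    2,    1,   0,   3,   2, 948,    0,   1,   1],
    [  0,    6,    4,   0,   1,   0,   0, 1010,   1,   6],
    [  1,    0,    2,   3,   1,   5,   0,    2, 956,   4],
    [  2,    2,    0,   1,   7,   4,   0,    3,   3, 987],
])

# Overall accuracy: correct predictions sit on the diagonal.
accuracy = np.trace(cm) / cm.sum()

# Recall per digit: correct predictions divided by the row total.
recall = np.diag(cm) / cm.sum(axis=1)

# Most frequent mistake: largest off-diagonal entry.
errors = cm.copy()
np.fill_diagonal(errors, 0)
true_digit, predicted_digit = np.unravel_index(errors.argmax(), errors.shape)

print("Accuracy:", accuracy)
print("Recall per digit:", np.round(recall, 4))
print("Most common error: true", true_digit, "predicted as", predicted_digit)
```

For this matrix the most common error is a true 4 predicted as 9 (9 test images), which is a plausible mix-up for handwritten digits.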

#### 3. Training with Softmax Regression

In [3]:
from softmax_regression import SoftmaxRegression

images, labels = mndata.load_training()
images, labels = preprocess_data(images, labels)
softmax = SoftmaxRegression(epochs=20)
softmax.train(images, labels)

images_test, labels_test = mndata.load_testing()
images_test, labels_test = preprocess_data(images_test, labels_test, test=True)

pred = softmax.predict(images_test)

print("Accuracy:", len(pred[labels_test == pred]) / len(pred))
from sklearn.metrics import confusion_matrix
print("Confusion matrix: ")
print(confusion_matrix(labels_test, pred))



Loss at epoch 1 5.61
Loss at epoch 2 2.89
Loss at epoch 3 2.09
Loss at epoch 4 1.72
Loss at epoch 5 1.50
Loss at epoch 6 1.35
Loss at epoch 7 1.24
Loss at epoch 8 1.16
Loss at epoch 9 1.09
Loss at epoch 10 1.04
Loss at epoch 11 1.00
Loss at epoch 12 0.96
Loss at epoch 13 0.92
Loss at epoch 14 0.89
Loss at epoch 15 0.87
Loss at epoch 16 0.84
Loss at epoch 17 0.82
Loss at epoch 18 0.80
Loss at epoch 19 0.79
Loss at epoch 20 0.77
Accuracy: 0.8472
Confusion matrix: 
[[ 910    1    8    6    0   22   13    4   13    3]
 [   0 1070    4   10    0    5    3    2   40    1]
 [  14    9  865   25   14    6   20   18   50   11]
 [   9    7   28  841    0   61    3   11   34   16]
 [   1    5   11    5  836    9   26    9   20   60]
 [  17    7   11   71   23  643   23   10   77   10]
 [  30    3   16    4   24   14  854    1   10    2]
 [   9   14   28   10   10    2    3  878   13   61]
 [  13   14   20   41   19   51   15   17  764   20]
 [  11    5    4    9   74   15    2   53   25  811]]


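With softmax regression the accuracy is about 85% after 20 epochs. Not so bad! The loss is still decreasing at epoch 20, so training longer might improve it a little.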
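For reference, here is a minimal NumPy sketch of what softmax regression does: one linear layer, a softmax, and gradient descent on the cross-entropy loss. It is not the `SoftmaxRegression` class imported above, whose internals are not shown here. The function names, the learning rate, and the full-batch update are assumptions made for illustration.

```python
import numpy as np


def softmax(logits):
    """Row-wise softmax, shifted by the row max for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


def train_softmax_regression(X, y, n_classes, lr=0.1, epochs=20):
    """Full-batch gradient descent on the mean cross-entropy loss.

    X: (N, D) float array, y: (N,) integer labels in [0, n_classes).
    Returns the weights, the bias, and the loss recorded at each epoch.
    """
    n_samples, n_features = X.shape
    W = np.zeros((n_features, n_classes))
    b = np.zeros(n_classes)
    Y = np.eye(n_classes)[y]  # one-hot labels, shape (N, n_classes)
    losses = []
    for _ in range(epochs):
        P = softmax(X @ W + b)
        # Cross-entropy: negative log-probability of the true class.
        losses.append(-np.log(P[np.arange(n_samples), y] + 1e-12).mean())
        # Gradient of the mean cross-entropy with respect to the logits.
        grad_logits = (P - Y) / n_samples
        W -= lr * (X.T @ grad_logits)
        b -= lr * grad_logits.sum(axis=0)
    return W, b, losses


def predict(X, W, b):
    """Pick the class with the highest score for each row of X."""
    return np.argmax(X @ W + b, axis=1)
```

On MNIST, `X` would be the flattened 784-pixel images and `n_classes` would be 10. Since there is no hidden layer, the model can only draw linear boundaries between digits, which is one reason it lands well below LeNet.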