## Comparing K-Nearest Neighbor Classification with Neural Network Classification
---
Created by Terron Ishihara, Modified by University of Washington AI4All, 2020

### K-Nearest Neighbor Classification with a Larger Dataset

Let's first explore building a K-Nearest Neighbors classifier using the same MNIST dataset that we used to build our neural network classifier. In the earlier K-nearest neighbor lesson, we used a much smaller, very low-resolution (8x8 pixels) digits dataset and trained with about 1600 images. The larger MNIST dataset used to train the neural network classifier contains 70,000 images of 28x28 pixels each.

#### Data and Library Import

In [6]:
import matplotlib.pyplot as plt    # charting library
import numpy as np                 # Python array library
import pandas as pd                # Pandas dataframe library
import joblib                      # for saving and loading trained classifier models
import datetime

# Logic to run this notebook on Google Colab. Prior to running this notebook,
# create a "neural" folder under "Colab Notebooks" to write classifier model files to.
# If run from a local Jupyter install, comment out the drive commands and make FILEROOT an empty string.
 
from google.colab import drive
drive.mount("/drive", force_remount=True)
FILEROOT = "/drive/My Drive/Colab Notebooks/neural/"

from sklearn.model_selection import train_test_split
from sklearn.datasets import fetch_openml
from sklearn import datasets, neighbors

print (datetime.datetime.now())
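# as_frame=False returns NumPy arrays, which the scaling below expects
X, y = fetch_openml('mnist_784', version=1, return_X_y=True, as_frame=False)
# Scale the pixel values into the range 0.0 to 1.0
X = X / X.max()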

# Partition the dataset into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y)

print (datetime.datetime.now())
print (len(X))

Mounted at /drive
2020-07-28 04:55:15.764468
2020-07-28 04:55:36.642545
70000
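
`train_test_split` is called above without a `test_size`, so scikit-learn falls back to its default of holding out 25% of the samples. For 70,000 images that means 52,500 for training and 17,500 for testing. The split is also reshuffled on every run unless `random_state` is set. A minimal sketch on a small stand-in array (the array contents and the seed value 0 are assumptions chosen for illustration, not part of the original notebook):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Stand-in data: 1000 samples with 4 features each (illustrative only)
X_demo = np.arange(4000).reshape(1000, 4)
y_demo = np.arange(1000) % 10

# No test_size given, so 25% of the samples are held out for testing.
# random_state=0 is an arbitrary seed that makes the split repeatable.
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, random_state=0)

print(len(X_tr), len(X_te))
```

With 1000 samples this prints `750 250`.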


#### Training and Testing the K-Nearest Neighbor with a Larger Dataset

> All that's left is to create the K-Nearest Neighbors classifier, train it on the training set, and test the resulting model on the test set.

> If you run this, you will find that although building/training the classifier with the larger dataset is very quick, scoring it on the test set is very slow.
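
The imbalance comes from how nearest-neighbor methods work: fitting mostly stores (or indexes) the training data, while each prediction has to compare the query against the training samples. With the default 75/25 split of 70,000 images, scoring 17,500 test images against 52,500 training images means roughly 919 million distance computations over 784 features each if done by brute force. The NumPy sketch below illustrates the idea. It is a simplified stand-in, not scikit-learn's actual implementation, and the names `knn_fit` and `knn_predict` are hypothetical.

```python
import numpy as np

def knn_fit(X_train, y_train):
    # "Training" only stores the data, which is why it is fast
    return np.asarray(X_train, dtype=float), np.asarray(y_train)

def knn_predict(model, X_query, k=5):
    X_train, y_train = model
    predictions = []
    for x in np.asarray(X_query, dtype=float):
        # Squared Euclidean distance from this query to every training sample
        dists = np.sum((X_train - x) ** 2, axis=1)
        # Indices of the k closest training samples
        nearest = np.argsort(dists)[:k]
        # Majority vote among the labels of those neighbors
        labels, counts = np.unique(y_train[nearest], return_counts=True)
        predictions.append(labels[np.argmax(counts)])
    return np.array(predictions)

# Tiny illustrative dataset: two well-separated clusters
X_small = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
                    [1.0, 1.0], [0.9, 1.1], [1.1, 0.9]])
y_small = np.array([0, 0, 0, 1, 1, 1])

model = knn_fit(X_small, y_small)
print(knn_predict(model, [[0.05, 0.05], [1.0, 0.95]], k=3))
```

Every call to `knn_predict` touches the whole training set once per query, so the cost grows with both the number of test images and the number of training images.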

#### Warning: Computing accuracy with this model is slow

In [7]:
# Create a K-Nearest Neighbors classifier (n_neighbors=5 is also the scikit-learn default)

print (datetime.datetime.now())
knn = neighbors.KNeighborsClassifier(n_neighbors=5)

# Train the classifier
knn.fit(X_train, y_train)
print (datetime.datetime.now())

# Save the trained model under FILEROOT (despite "kmeans" in the file name, this is a KNN model)
mnist_file = FILEROOT + "mnist_kmeans_model_KNKN.pkl"
joblib.dump(knn, mnist_file)

# Compute the score (mean accuracy) on test set
score = knn.score(X_test, y_test)
print('KNN score: %f' % score)
print (datetime.datetime.now())

2020-07-28 04:55:36.656042
2020-07-28 04:55:49.377482
KNN score: 0.968229
2020-07-28 05:21:44.020134
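
The three timestamps printed above make the imbalance concrete. The sketch below simply subtracts them using the standard `datetime` module. Note that the second interval also includes the `joblib.dump` call, so it slightly overstates the pure scoring time.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S.%f"

# Timestamps copied from the cell output above
start = datetime.strptime("2020-07-28 04:55:36.656042", FMT)
fitted = datetime.strptime("2020-07-28 04:55:49.377482", FMT)
scored = datetime.strptime("2020-07-28 05:21:44.020134", FMT)

# Time spent creating and fitting the classifier
fit_seconds = (fitted - start).total_seconds()
# Time spent saving the model and scoring it on the test set
score_seconds = (scored - fitted).total_seconds()

print(f"fit:   {fit_seconds:.1f} s")
print(f"score: {score_seconds / 60:.1f} min")
```

On this run, fitting took about 13 seconds, while saving and scoring took about 26 minutes.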


### Neural Network Classification with a Smaller Dataset

Next, let's explore building a neural network classifier using the same smaller digits dataset that we used to build the K-nearest neighbor classifier a few lessons ago. That dataset is a very low-resolution (8x8 pixels) set of handwritten digits, and we trained with about 1600 images. By comparison, the larger MNIST dataset used above contains 70,000 images of 28x28 pixels each.

#### Data Import


In [8]:
# Load the digits data set
digits = datasets.load_digits() 

# Extract the input data, force values to be between 0.0 and 1.0
X_digits = digits.data / digits.data.max()

# Extract the true values for each sample (each a digit between 0-9)
y_digits = digits.target

# Partition the dataset into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X_digits, y_digits)
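
As a quick sanity check on the loading and scaling steps, the sketch below inspects the bundled digits dataset. It relies only on `load_digits`, which ships with scikit-learn and needs no download. The variable names are chosen for illustration.

```python
from sklearn import datasets

digits_demo = datasets.load_digits()

# 1797 samples, each an 8x8 image flattened into 64 features
print(digits_demo.data.shape)

# Raw pixel intensities run from 0 to 16, so dividing by the maximum
# rescales every feature into the range 0.0 to 1.0
X_scaled = digits_demo.data / digits_demo.data.max()
print(digits_demo.data.max(), X_scaled.min(), X_scaled.max())
```

Each sample here has 64 features, compared with 784 features per image in the larger MNIST dataset.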

#### Training and Testing the Neural Network

In [9]:
from sklearn.neural_network import MLPClassifier

# Initialize the classifier. You will need to experiment with the parameters to get the best results.

print (datetime.datetime.now())
mlp_clf = MLPClassifier(
    hidden_layer_sizes=(98,32), 
    solver='sgd', 
    activation='relu',
    max_iter=1000
)

# Train the classifier
mlp_clf.fit(X_train, y_train)

print (datetime.datetime.now())

# Save the model file under FILEROOT (the current working directory if FILEROOT is empty). Change the file name for each iteration.
mnist_file = FILEROOT + "mnist_model_SNSN.pkl"
joblib.dump(mlp_clf, mnist_file)

# Get the mean accuracy on the test data and print it
score = mlp_clf.score(X_test, y_test)
print (score)

print (datetime.datetime.now())

2020-07-28 05:21:44.148166
2020-07-28 05:21:58.451959
0.9644444444444444
2020-07-28 05:21:58.507248
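
Since the comment in the cell above suggests trying different parameters, one simple approach is to loop over a few candidate settings and compare test accuracy. This is only a sketch: the candidate layer sizes, `max_iter=300`, and the fixed `random_state=0` are assumptions chosen for illustration, and scikit-learn may emit a convergence warning for some settings.

```python
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Same loading and scaling steps as in the data import cell
digits = datasets.load_digits()
X_digits = digits.data / digits.data.max()
X_train, X_test, y_train, y_test = train_test_split(
    X_digits, digits.target, random_state=0
)

# Hypothetical candidates; (98, 32) is the setting used in the cell above
candidates = [(32,), (98, 32)]
results = {}
for sizes in candidates:
    clf = MLPClassifier(
        hidden_layer_sizes=sizes,
        solver='sgd',
        activation='relu',
        max_iter=300,
        random_state=0,
    )
    clf.fit(X_train, y_train)
    # Record the mean accuracy on the held-out test data
    results[sizes] = clf.score(X_test, y_test)

for sizes, score in results.items():
    print(sizes, round(score, 4))
```

Choosing parameters by test accuracy tends to give an optimistic estimate. A separate validation set or cross-validation (for example with `GridSearchCV`) is the more careful option.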
