# **Classification**

#### Book used: Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow (third edition)

#### Chapter 3 exercises

## 1. An MNIST Classifier With Over 97% Accuracy

#### Exercise: _Try to build a classifier for the MNIST dataset that achieves over 97% accuracy on the test set. Hint: the `KNeighborsClassifier` works quite well for this task; you just need to find good hyperparameter values (try a grid search on the `weights` and `n_neighbors` hyperparameters)._

Importing the dataset and creating training and testing sets:

In [1]:
from sklearn.datasets import fetch_openml

mnist = fetch_openml('mnist_784', as_frame=False)

In [2]:
X, y = mnist.data, mnist.target

In [3]:
X_train, X_test, y_train, y_test = X[:60000], X[60000:], y[:60000], y[60000:]

Trying a `KNeighborsClassifier`:

In [4]:
from sklearn.neighbors import KNeighborsClassifier

knn_classifier = KNeighborsClassifier()
knn_classifier.fit(X_train, y_train)

In [5]:
initial_accuracy = knn_classifier.score(X_test, y_test)
initial_accuracy

0.9688

Checking the hyperparameters:

In [6]:
hyperparams = knn_classifier.get_params()
print(hyperparams)

{'algorithm': 'auto', 'leaf_size': 30, 'metric': 'minkowski', 'metric_params': None, 'n_jobs': None, 'n_neighbors': 5, 'p': 2, 'weights': 'uniform'}
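
The defaults above include `weights='uniform'`, meaning each of the `n_neighbors` nearest neighbors gets one equal vote. With `weights='distance'`, each vote is instead weighted by the inverse of the neighbor's distance. The sketch below illustrates the difference on a tiny made-up 1-D dataset (the arrays `X_toy`, `y_toy`, and `query` are hypothetical and have nothing to do with MNIST), and reproduces the distance-weighted vote by hand as a sanity check:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical toy data: one class-0 point close to the query,
# two class-1 points farther away.
X_toy = np.array([[0.0], [2.0], [2.1]])
y_toy = np.array([0, 1, 1])
query = np.array([[0.5]])

uniform_knn = KNeighborsClassifier(n_neighbors=3, weights="uniform").fit(X_toy, y_toy)
distance_knn = KNeighborsClassifier(n_neighbors=3, weights="distance").fit(X_toy, y_toy)

uniform_pred = uniform_knn.predict(query)[0]    # majority vote: two 1s beat one 0
distance_pred = distance_knn.predict(query)[0]  # the close class-0 point outweighs them

# Reproduce the distance-weighted vote manually: sum 1/distance per class.
distances, indices = distance_knn.kneighbors(query)
votes = np.bincount(y_toy[indices[0]], weights=1.0 / distances[0])
manual_pred = votes.argmax()

print(uniform_pred, distance_pred, manual_pred)  # prints: 1 0 0
```

On this toy example the two settings disagree, which is why `weights` is worth including in a hyperparameter search alongside `n_neighbors`.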


Running a grid search over different values of `n_neighbors` and `weights`, using only the first 10,000 training instances to speed things up:

In [7]:
from sklearn.model_selection import GridSearchCV

param_grid = [
    {"weights": ["uniform", "distance"],
     "n_neighbors": [3, 4, 5, 6, 7, 8, 9, 10]}
]

grid_search = GridSearchCV(knn_classifier, param_grid, cv=3, scoring="accuracy")
grid_search.fit(X_train[:10000], y_train[:10000])

In [8]:
grid_search.best_params_

{'n_neighbors': 4, 'weights': 'distance'}

In [9]:
grid_search.best_score_

0.9397994088551026
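
`best_score_` is the mean cross-validated accuracy of the winning combination only. To see how the other combinations compare, the `cv_results_` attribute can be ranked. The following is a minimal sketch, not the notebook's own code: it assumes scikit-learn's small built-in `load_digits` dataset as a quick stand-in for MNIST, uses a reduced grid, and the names `search` and `ranked` are illustrative.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

# Stand-in data (assumption): 8x8 digits bundled with scikit-learn, no download needed.
X_small, y_small = load_digits(return_X_y=True)

small_grid = {"weights": ["uniform", "distance"], "n_neighbors": [3, 4, 5]}
search = GridSearchCV(KNeighborsClassifier(), small_grid, cv=3, scoring="accuracy")
search.fit(X_small, y_small)

# Pair each combination with its rank and mean validation accuracy, best first.
results = search.cv_results_
ranked = sorted(
    zip(results["rank_test_score"], results["mean_test_score"], results["params"]),
    key=lambda row: row[0],
)

for rank, score, params in ranked:
    print(f"{int(rank):>2}  {float(score):.4f}  {params}")
```

If the top few rows have nearly identical scores, the choice between them is within cross-validation noise, and the simpler or faster setting is a reasonable pick.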

Retraining the best estimator on the full training set and evaluating it on the test set:

In [11]:
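# The search only saw 10,000 instances, so refit the best model on all 60,000
grid_search.best_estimator_.fit(X_train, y_train)
# grid_search.score() delegates to best_estimator_, which is now the refitted model
test_accuracy = grid_search.score(X_test, y_test)
test_accuracy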

0.9714
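
The explicit refit matters because `GridSearchCV` (with its default `refit=True`) retrains the best estimator only on the data passed to `fit`, which here was the 10,000-instance subset. An alternative that keeps the search object untouched is to clone the best estimator and fit the clone on the full training set. The sketch below shows that pattern under stated assumptions: it uses the built-in `load_digits` dataset and a `train_test_split` as stand-ins for MNIST and its predefined split, and the names `final_model` and `final_accuracy` are illustrative.

```python
from sklearn.base import clone
from sklearn.datasets import load_digits
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Stand-in data (assumption): small digits dataset instead of MNIST.
X_d, y_d = load_digits(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_d, y_d, test_size=0.2, random_state=42, stratify=y_d
)

# Search on a subset only, mirroring the speed-up used with MNIST.
search = GridSearchCV(
    KNeighborsClassifier(),
    {"weights": ["uniform", "distance"], "n_neighbors": [3, 4, 5]},
    cv=3,
    scoring="accuracy",
)
search.fit(X_tr[:500], y_tr[:500])

# clone() copies the hyperparameters but not the fitted state,
# so the final model is trained from scratch on the full training set.
final_model = clone(search.best_estimator_).fit(X_tr, y_tr)
final_accuracy = final_model.score(X_te, y_te)

print(search.best_params_, final_accuracy)
```

This leaves `search.best_estimator_` fitted on the subset, so scores from the search object and from the final model are never confused with each other.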