# k-Nearest Neighbors

In this lab you will explore tuning a k-Nearest Neighbors model for image classification.

We will use the [Imagenette dataset](https://github.com/fastai/imagenette), a small subset of the larger ImageNet dataset. Download the 160 px version and extract it into the same folder as this notebook, so that the `imagenette2-160` directory sits next to it.

In [1]:
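import glob
import imageio.v2 as imageio  # v2 API: keeps the old imread behavior without the deprecation warning
import skimage.color          # rgb2gray
import skimage.feature        # hog
from matplotlib import pyplot as plt
import numpy as np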

Here is some code to load the dataset and remap the labels to nicer names.

In [2]:
label_map = {
    'n01440764':'fish',
    'n02102040':'dog',
    'n02979186':'cassette_tape',
    'n03000684':'chain_saw',
    'n03028079':'church',
    'n03394916':'french_horn',
    'n03417042':'garbage_truck',
    'n03425413':'gas_station',
    'n03445777':'golf_ball',
    'n03888257':'parachute'
}

In [None]:
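def load_split(split):
    """Load one split ('train' or 'val') as 160x160 grayscale images plus labels."""
    images = []
    labels = []
    for key, value in label_map.items():
        image_paths = sorted(glob.glob(f'imagenette2-160/{split}/{key}/*.JPEG'))
        for path in image_paths:
            image = imageio.imread(path)
            if image.ndim > 2:
                # Color image: convert to grayscale floats in [0, 1].
                image = skimage.color.rgb2gray(image)
            else:
                # Already grayscale: rescale to floats in [0, 1] to match rgb2gray.
                image = skimage.img_as_float(image)
            # Center-crop any side longer than 160 px down to 160 px.
            if image.shape[0] > 160:
                image = image[image.shape[0]//2-80:image.shape[0]//2+80, :]
            if image.shape[1] > 160:
                image = image[:, image.shape[1]//2-80:image.shape[1]//2+80]
            images.append(image)
            labels.append(value)
    return images, labels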

train_images, train_labels = load_split('train')
val_images, val_labels = load_split('val')

We now have the lists `train_images` and `train_labels`, containing the images and labels for the training set, and `val_images` and `val_labels`, containing the same for the validation set.

1. Compute a HOG descriptor for each image (```skimage.feature.hog```) and collect the results in two lists, `train_descriptors` and `val_descriptors`.

Output (shape of `train_descriptors` when viewed as an array): `(9469, 26244)`
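As a starting point for step 1, here is a minimal sketch that assumes the default parameters of `skimage.feature.hog`. The helper name `compute_descriptors` is hypothetical, and the random images are stand-in data so the snippet runs without the dataset.

```python
import numpy as np
from skimage.feature import hog

def compute_descriptors(images):
    """Stack one HOG descriptor per image into a 2-D array (hypothetical helper)."""
    # hog() defaults: 9 orientations, 8x8-pixel cells, 3x3-cell blocks.
    return np.array([hog(image) for image in images])

# Stand-in data (an assumption): four random 160x160 grayscale images.
rng = np.random.default_rng(0)
demo_images = [rng.random((160, 160)) for _ in range(4)]
demo_descriptors = compute_descriptors(demo_images)
print(demo_descriptors.shape)
```

With these defaults, a 160x160 image yields 18 x 18 blocks of 3 x 3 x 9 values each, i.e. 26244 values per descriptor, which agrees with the second dimension shown above. On the real data the call would look like `train_descriptors = compute_descriptors(train_images)`.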

2. Build a k-nearest neighbors classifier on the training set (```sklearn.neighbors.KNeighborsClassifier```).

This model will find the $k$ nearest neighbors to the query point and output the most common label.  Use the default value of $k$.

Run the model on the validation set and print out the accuracy (```sklearn.metrics.accuracy_score```).
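The fit, predict, and score sequence for step 2 might look like the sketch below. It uses small synthetic blobs as stand-in data (an assumption, not the Imagenette descriptors); the variable names `X_train`, `X_val`, `y_train`, and `y_val` are placeholders for the HOG descriptors and labels.

```python
import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.neighbors import KNeighborsClassifier

# Stand-in data (an assumption): two well-separated Gaussian blobs in 5 dimensions.
rng = np.random.default_rng(0)
X_train = np.vstack([rng.normal(0.0, 1.0, (50, 5)), rng.normal(6.0, 1.0, (50, 5))])
y_train = ['fish'] * 50 + ['dog'] * 50
X_val = np.vstack([rng.normal(0.0, 1.0, (20, 5)), rng.normal(6.0, 1.0, (20, 5))])
y_val = ['fish'] * 20 + ['dog'] * 20

# Default settings: n_neighbors=5, majority vote among the neighbors.
model = KNeighborsClassifier()
model.fit(X_train, y_train)

# Predict a label for every validation point and compare with the true labels.
val_predictions = model.predict(X_val)
val_accuracy = accuracy_score(y_val, val_predictions)
print(f'validation accuracy: {val_accuracy:.3f}')
```

For the lab itself, `train_descriptors`, `train_labels`, `val_descriptors`, and `val_labels` would take the place of the stand-in arrays.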

3. Test $k$ from 1 to 20 and make a plot of the training and validation accuracy.  Explain how bias and variance change as $k$ increases.  Which is the best setting of $k$?
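One possible shape for the sweep in step 3 is sketched below. The helper `sweep_k` is hypothetical, and the overlapping synthetic blobs are stand-in data (an assumption) chosen only so that the snippet runs on its own.

```python
import numpy as np
from matplotlib import pyplot as plt
from sklearn.metrics import accuracy_score
from sklearn.neighbors import KNeighborsClassifier

def sweep_k(X_train, y_train, X_val, y_val, k_values):
    """Return (train_accuracies, val_accuracies), one entry per k (hypothetical helper)."""
    train_acc, val_acc = [], []
    for k in k_values:
        model = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
        train_acc.append(accuracy_score(y_train, model.predict(X_train)))
        val_acc.append(accuracy_score(y_val, model.predict(X_val)))
    return train_acc, val_acc

# Stand-in data (an assumption): two overlapping Gaussian blobs in 2 dimensions.
rng = np.random.default_rng(0)
X_train = np.vstack([rng.normal(0.0, 1.0, (100, 2)), rng.normal(1.5, 1.0, (100, 2))])
y_train = np.array([0] * 100 + [1] * 100)
X_val = np.vstack([rng.normal(0.0, 1.0, (50, 2)), rng.normal(1.5, 1.0, (50, 2))])
y_val = np.array([0] * 50 + [1] * 50)

k_values = list(range(1, 21))
train_acc, val_acc = sweep_k(X_train, y_train, X_val, y_val, k_values)

# Plot both curves against k on the same axes.
plt.plot(k_values, train_acc, label='train')
plt.plot(k_values, val_acc, label='validation')
plt.xlabel('k')
plt.ylabel('accuracy')
plt.legend()
```

Note that at $k = 1$ every training point is its own nearest neighbor, so the training accuracy is 1.0 whenever the training points are distinct; the validation curve is the one to use when choosing $k$.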