k-Nearest-Neighbor Classifier

In k-NN classification, the output is a class membership. An object is classified by a majority vote of its neighbors: it is assigned to the class most common among its k nearest neighbors, where k is a small positive integer. If k = 1, the object is simply assigned to the class of its single nearest neighbor.
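The voting rule above fits in a few lines of plain Python. This is a minimal illustration, not a reference implementation. `knn_classify` is a hypothetical helper name, and the sample points reuse the height/weight data introduced later in this notebook. On this data, k = 1 and k = 3 give different answers for the same query, which is why the choice of k matters.

```python
from collections import Counter
import math


def knn_classify(train_points, train_labels, query, k):
    """Return the majority label among the k training points closest to query."""
    # Euclidean distance from the query to every training point
    distances = [math.dist(p, query) for p in train_points]
    # Indices of the k smallest distances, nearest first
    nearest = sorted(range(len(train_points)), key=lambda i: distances[i])[:k]
    # Majority vote; on a tie, Counter keeps first-seen order, so the label
    # of the closer neighbor wins
    votes = Counter(train_labels[i] for i in nearest)
    return votes.most_common(1)[0][0]


# [height in cm, weight in kg]
points = [(158, 64), (170, 86), (183, 84), (191, 80), (155, 49),
          (163, 59), (180, 67), (158, 54), (170, 67)]
labels = ['male'] * 4 + ['female'] * 5

print(knn_classify(points, labels, (155, 70), k=1))  # male
print(knn_classify(points, labels, (155, 70), k=3))  # female
```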

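Notes:
1. k-NN is a type of instance-based learning, also called lazy learning: nothing is computed during training, and all the work is deferred to prediction time.
2. k-NN is a non-parametric model: it assumes no fixed functional form for the decision boundary, and the stored training set itself serves as the model.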
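To make "lazy" and "non-parametric" concrete, here is a minimal sketch of a classifier whose `fit` step only stores the training set. All distance computations happen at prediction time. `LazyKNN` and `predict_one` are hypothetical names used for illustration, not a library API. Ties in the vote are broken alphabetically because `np.unique` returns sorted labels.

```python
import numpy as np


class LazyKNN:
    """Hypothetical minimal k-NN: fit() only memorizes the training data."""

    def __init__(self, k=3):
        self.k = k

    def fit(self, X, y):
        # No parameters are estimated; the training set itself is the model
        self.X_ = np.asarray(X, dtype=float)
        self.y_ = np.asarray(y)
        return self

    def predict_one(self, x):
        # All the work happens here: distances to every stored instance
        d = np.linalg.norm(self.X_ - np.asarray(x, dtype=float), axis=1)
        nearest_labels = self.y_[np.argsort(d)[:self.k]]
        # Majority vote; np.unique sorts labels, so ties go to the
        # alphabetically first label
        labels, counts = np.unique(nearest_labels, return_counts=True)
        return labels[np.argmax(counts)]


X = [[158, 64], [170, 86], [183, 84], [191, 80], [155, 49],
     [163, 59], [180, 67], [158, 54], [170, 67]]
y = ['male'] * 4 + ['female'] * 5

model = LazyKNN(k=3).fit(X, y)
prediction = model.predict_one([155, 70])
print(prediction)  # female
```

Because the stored arrays are the whole model, memory use and prediction cost both grow with the size of the training set.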


Important topics:

1. Dimensionality reduction
2. Curse of dimensionality
3. Parameter selection (choice of k)
4. 1-nearest-neighbor classifier
5. Kernel
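One way to see the curse of dimensionality is distance concentration. As the number of dimensions grows, the nearest and farthest points from a query end up at almost the same distance, so "nearest" carries less information. The sketch below measures the relative contrast (farthest - nearest) / nearest for uniformly random points. The point count, the dimensions tried, and the seed are arbitrary assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)


def relative_contrast(n_dims, n_points=500):
    """(farthest - nearest) / nearest distance from a random query to random points."""
    X = rng.random((n_points, n_dims))  # points uniform in the unit hypercube
    q = rng.random(n_dims)              # a random query point
    d = np.linalg.norm(X - q, axis=1)
    return (d.max() - d.min()) / d.min()


for n_dims in (2, 10, 100, 1000):
    print(n_dims, round(relative_contrast(n_dims), 3))
```

The printed contrast should shrink sharply as `n_dims` increases. The exact values depend on the seed.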


A simple example from scratch (NumPy only, no scikit-learn)


Import the required libraries

In [1]:
import numpy as np

import matplotlib.pyplot as plt

Prepare the data: the height (cm) and weight (kg) of nine people, each labeled by sex


In [2]:
# Each row is [height in cm, weight in kg]
X_train = np.array([
    [158, 64],
    [170, 86],
    [183, 84],
    [191, 80],
    [155, 49],
    [163, 59],
    [180, 67],
    [158, 54],
    [170, 67]
])
y_train = ['male', 'male', 'male', 'male', 'female', 'female', 'female',
           'female', 'female']

In [3]:
# Build the whole figure in one cell; with the inline backend, a figure
# started in one cell is closed before the next cell runs
plt.figure()
plt.title('Human Heights and Weights by Sex')
plt.xlabel('Height in cm')
plt.ylabel('Weight in kg')
for (height, weight), label in zip(X_train, y_train):
    # Use 'x' markers for male instances and diamond markers for female instances
    plt.scatter(height, weight, c='k', marker='x' if label == 'male' else 'D')
plt.grid(True)
plt.show()

In [5]:
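# Query point: height 155 cm, weight 70 kg
x = np.array([[155, 70]])
# Euclidean distance from the query to every training instance
distances = np.sqrt(np.sum((X_train - x)**2, axis=1))
distances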

array([  6.70820393,  21.9317122 ,  31.30495168,  37.36308338,
        21.        ,  13.60147051,  25.17935662,  16.2788206 ,  15.29705854])
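The cell above computes the Euclidean distance between the query $\mathbf{x} = (x_1, x_2)$ and each training instance $\mathbf{x}_i = (x_{i1}, x_{i2})$. Worked out for the first training instance, it reproduces the first entry of the array:

```latex
d(\mathbf{x}, \mathbf{x}_i) = \sqrt{\sum_{j=1}^{2} \left(x_j - x_{ij}\right)^2}

d\big((155, 70), (158, 64)\big) = \sqrt{(155 - 158)^2 + (70 - 64)^2} = \sqrt{9 + 36} = \sqrt{45} \approx 6.708
```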

In [6]:
# Indices of the 3 smallest distances, then the labels at those indices
nearest_neighbor_indices = distances.argsort()[:3]
nearest_neighbor_genders = np.take(y_train, nearest_neighbor_indices)
nearest_neighbor_genders


array(['male', 'female', 'female'],
      dtype='<U6')
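`argsort` sorts all n distances only to keep the first k. For larger training sets, `np.argpartition` finds the k smallest in linear time on average, after which only those k candidates need sorting. A minimal sketch on the same data:

```python
import numpy as np

X_train = np.array([[158, 64], [170, 86], [183, 84], [191, 80], [155, 49],
                    [163, 59], [180, 67], [158, 54], [170, 67]])
x = np.array([155, 70])
distances = np.linalg.norm(X_train - x, axis=1)

k = 3
# After partitioning around position k - 1, the first k slots hold the
# k smallest distances, in no particular order
candidates = np.argpartition(distances, k - 1)[:k]
# Sort only those k candidates if the neighbor order matters
nearest = candidates[np.argsort(distances[candidates])]
print(nearest)  # [0 5 8]
```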

In [7]:
from collections import Counter
# Majority vote among the 3 nearest neighbors
votes = Counter(nearest_neighbor_genders)
votes.most_common(1)[0][0]

'female'
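A common refinement, related to the "Kernel" item in the list of important topics, is to weight each neighbor's vote by how close it is. The sketch below uses inverse-distance weights (1 / distance), one simple choice among many possible weighting functions. On this data the weighted vote flips the prediction to 'male', because the single male neighbor is roughly twice as close as either female neighbor. A query that coincides exactly with a training point would produce an infinite weight, which this sketch does not handle.

```python
from collections import defaultdict

import numpy as np

X_train = np.array([[158, 64], [170, 86], [183, 84], [191, 80], [155, 49],
                    [163, 59], [180, 67], [158, 54], [170, 67]])
y_train = ['male', 'male', 'male', 'male', 'female', 'female', 'female',
           'female', 'female']
x = np.array([155, 70])
k = 3

distances = np.linalg.norm(X_train - x, axis=1)
nearest = np.argsort(distances)[:k]

# Each neighbor votes with weight 1 / distance instead of 1
scores = defaultdict(float)
for i in nearest:
    scores[y_train[i]] += 1.0 / distances[i]

print(dict(scores))
print(max(scores, key=scores.get))  # male
```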

Now let's implement the same classifier with scikit-learn

In [15]:
from sklearn.preprocessing import LabelBinarizer
from sklearn.neighbors import KNeighborsClassifier


In [16]:
lb = LabelBinarizer()
y_train_binarized = lb.fit_transform(y_train)
y_train_binarized

array([[1],
       [1],
       [1],
       [1],
       [0],
       [0],
       [0],
       [0],
       [0]])

Set the number of neighbors, k

In [17]:
K = 3
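Rather than fixing k by hand, k is often chosen by cross-validation. The sketch below scores a few odd values of k with leave-one-out cross-validation, using scikit-learn's `cross_val_score` and `LeaveOneOut`. The candidate values are an arbitrary choice, and with only nine training instances the accuracies will be very noisy, so treat this as a demonstration of the procedure rather than a recommendation for this dataset.

```python
import numpy as np
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X_train = np.array([[158, 64], [170, 86], [183, 84], [191, 80], [155, 49],
                    [163, 59], [180, 67], [158, 54], [170, 67]])
y_train = np.array(['male'] * 4 + ['female'] * 5)

# Leave-one-out accuracy for a few odd k (odd k avoids vote ties with two classes)
loo_accuracy = {}
for k in (1, 3, 5, 7):
    clf = KNeighborsClassifier(n_neighbors=k)
    scores = cross_val_score(clf, X_train, y_train, cv=LeaveOneOut())
    loo_accuracy[k] = scores.mean()

print(loo_accuracy)
best_k = max(loo_accuracy, key=loo_accuracy.get)
print(best_k)
```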

In [18]:
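clf = KNeighborsClassifier(n_neighbors=K)
clf.fit(X_train, y_train_binarized.ravel())
# predict() returns a 1-D array of 0/1 labels, one per query row
prediction_binarized = clf.predict(np.array([[155, 70]]))
# inverse_transform expects one column per sample, so reshape before decoding
predicted_label = lb.inverse_transform(prediction_binarized.reshape(-1, 1))
predicted_label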

array(['female'],
      dtype='<U6')
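The `LabelBinarizer` round trip is optional: `KNeighborsClassifier` accepts string labels directly and stores them in `classes_`. The sketch below also shows `predict_proba`, which with the default uniform weights returns the fraction of the k neighbors in each class, and `weights='distance'`, which weights each neighbor's vote by the inverse of its distance and, on this data, changes the prediction.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

X_train = np.array([[158, 64], [170, 86], [183, 84], [191, 80], [155, 49],
                    [163, 59], [180, 67], [158, 54], [170, 67]])
y_train = ['male', 'male', 'male', 'male', 'female', 'female', 'female',
           'female', 'female']
x = np.array([[155, 70]])

# String labels are accepted directly; no binarization needed
clf = KNeighborsClassifier(n_neighbors=3).fit(X_train, y_train)
print(clf.classes_)          # ['female' 'male']
print(clf.predict(x))        # ['female']
print(clf.predict_proba(x))  # female 2/3, male 1/3 (columns follow clf.classes_)

# Inverse-distance weighting of the 3 neighbors' votes
clf_weighted = KNeighborsClassifier(n_neighbors=3, weights='distance').fit(X_train, y_train)
print(clf_weighted.predict(x))  # ['male']
```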