# K-Nearest Neighbour 

<img src="https://cdn-images-1.medium.com/max/587/1*hncgU7vWLBsRvc8WJhxlkQ.png">

 * For classification the majority vote of the k closest training points with respect to the input feature vector is taken. 
 * For regression: the value for the test example becomes the weighted average of the values of the k neighours 
 * knn performs well if all the data has same scale 

### Making K-NN more powerful 
 * A good value for K can be determined by considering a range of K values. 
 * We can use a distance based voting scheme, where closer neighbour have more influence. 
 * can use other distance measures (manhattan, minkowski,hamming ...) 
 
### Pros and Cons 
 * simple and powerful. No need for tuning complex parameters to build a model 
 * no training involved. 
 * The quality of the predictions depends on the distance measure. Therefore, the KNN algorithm is suitable for applications for which sufficient domain knowledge is available
 * expensive and slow. To determine the nearest neibhour of a new point x, must compute the distance to all m trianing samples.
 
 
### Applications 
* Hand writen character classification 
* intrusion detection 
* fault detection 

## Algorithm pseudocode 
<img src = "https://www.researchgate.net/profile/Jung-Keun-Hyun/publication/260397165/figure/fig7/AS:214259620421658@1428094882662/Pseudocode-for-KNN-classification.png">

## Implimentation 

In [31]:
import numpy as np 
from collections import Counter 

# global function 
def euclidean_distance(x1,x2):
    return np.sqrt(np.sum(x1-x2)**2)

class KNN:
    def __init__(self,k = 3):
        self.k = k 
        
    def fit(self,X,y):
        self.X_train = X 
        self.y_train = y 
    
    def predict(self,X):
        # X is a multiple input 
        predicted_labels = [self._predict(x) for x in X]
        return np.array(predicted_labels) 
    
    
    def _predict(self,x):
        # compute the distances 
        distances = [euclidean_distance(x,x_train) for x_train in self.X_train]
        
        # sorting and get the k nearest neighbours 
        k_indices = np.argsort(distances)[0:self.k]
        k_nearest_labels = [self.y_train[i] for i in k_indices] 
        print('indices wrt smallest k distances = ',k_indices,' ---> labels = ',k_nearest_labels)
            
        # majority vote
        most_common = Counter(k_nearest_labels).most_common(1) 
        return most_common[0][0]    

In [32]:
# example 
import numpy as np 
from sklearn import datasets 
from sklearn.model_selection import train_test_split 
import matplotlib.pyplot as plt 

iris = datasets.load_iris() 
X,y = iris.data, iris.target
X_train, X_test, y_train,y_test = train_test_split(X,y, test_size = 0.2, random_state = 2) 


In [33]:
knn_model = KNN()
knn_model.fit(X_train,y_train)

In [34]:
knn_model.predict(X_test)

indices wrt smallest k distances =  [30 55 53]  ---> labels =  [0, 0, 0]
indices wrt smallest k distances =  [117 110  48]  ---> labels =  [0, 0, 0]
indices wrt smallest k distances =  [118  74 109]  ---> labels =  [1, 2, 1]
indices wrt smallest k distances =  [117 110  48]  ---> labels =  [0, 0, 0]
indices wrt smallest k distances =  [ 69 103  16]  ---> labels =  [0, 0, 0]
indices wrt smallest k distances =  [ 26  78 114]  ---> labels =  [2, 2, 2]
indices wrt smallest k distances =  [76 55 30]  ---> labels =  [0, 0, 0]
indices wrt smallest k distances =  [106  83  96]  ---> labels =  [2, 2, 2]
indices wrt smallest k distances =  [ 96 106  83]  ---> labels =  [2, 2, 2]
indices wrt smallest k distances =  [ 48 117 110]  ---> labels =  [0, 0, 0]
indices wrt smallest k distances =  [116  37  24]  ---> labels =  [0, 0, 0]
indices wrt smallest k distances =  [ 99  73 117]  ---> labels =  [0, 0, 0]
indices wrt smallest k distances =  [53 48 30]  ---> labels =  [0, 0, 0]
indices wrt smallest 

array([0, 0, 1, 0, 0, 2, 0, 2, 2, 0, 0, 0, 0, 0, 2, 1, 0, 1, 2, 1, 1, 1,
       2, 1, 1, 0, 0, 1, 0, 2])

In [38]:
y_test

array([0, 0, 2, 0, 0, 2, 0, 2, 2, 0, 0, 0, 0, 0, 1, 1, 0, 1, 2, 1, 1, 1,
       2, 1, 1, 0, 0, 2, 0, 2])