# Exercise 6 - Statistical Reasoning - ‘k’ Nearest Neighbour

### AIM:
To write a Python program to implement the 'k' Nearest Neighbour algorithm.

### ALGORITHM :
```
Algorithm euclidian_dist(p1,p2)
    Input : p1,p2 - points as Tuple()s of equal length
    Output : Euclidean distance between the two points

    return sqrt(
        sum(
            List([(p1[i]-p2[i])^2 for i <- 0 to p1.length-1])
        )
    )
end Algorithm

Algorithm KNN_classify(dataset,k,p)
    Input : dataset - Dict() with class labels as keys
                      and data_points for the class as values.
            p - test point p(x,y),
            k - number of nearest neighbours.
    Output : predicted class of the test point

    dist = List([
        Tuple(euclidian_dist(p,data_point),class)
        for class in dataset
        for data_point in dataset[class]
    ])
    dist = first k elements of sorted(dist,ascending)
    freqs = Dict(class:(frequency of class in dist) for class in dataset)
    return (class with max value in freqs)
end Algorithm
```
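
As a quick check of the distance step, the sketch below evaluates the formula from `euclidian_dist` for one pair of points and compares it with the standard library's `math.dist` (Python 3.8+). The two points are chosen only for illustration.

```python
from math import dist, sqrt

# Illustrative points; any two tuples of equal length would do.
p1, p2 = (2.5, 7), (3, 6)

# Term-by-term form used in the pseudocode: sqrt of the sum of squared differences.
manual = sqrt(sum((a - b) ** 2 for a, b in zip(p1, p2)))

# math.dist computes the same Euclidean distance.
builtin = dist(p1, p2)

print(round(manual, 3), round(builtin, 3))  # 1.118 1.118
```

Here the squared differences are 0.25 and 1, so the distance is sqrt(1.25), roughly 1.118.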

### SOURCE CODE :

In [1]:
from math import sqrt
def euclidian_dist(p1,p2):
    return sqrt(
        sum([(x1-x2)**2 for (x1,x2) in zip(p1,p2)])
    )

class KNNClassifier:
    def __init__(self,data_set,k=3,dist=euclidian_dist):
        self.data_set = data_set
        self.k = k
        self.dist = dist
    
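    def classify(self,test_point):
        # (distance, class) pairs for every training point, keeping the k closest
        distances = sorted([
            (self.dist(data_point,test_point),data_class)
            for data_class in self.data_set
            for data_point in self.data_set[data_class]
        ])[:self.k]
        # count how many of the k nearest neighbours belong to each class
        freqs={data_class:0 for data_class in self.data_set}
        for (_,data_class) in distances:
            freqs[data_class]+=1
        # majority vote: the class with the highest count wins
        return max(freqs,key = freqs.get)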

if __name__ == "__main__":
    data_set = {
        "Class 1":{(1,12),(2,5),(3,6),(3,10),(3.5,8),(2,11),(2,9),(1,7)},
        "Class 2":{(5,3),(3,2),(1.5,9),(7,2),(6,1),(3.8,1),(5.6,4),(4,2),(2,5)}
    }
    test_points= [(2.5,7),(7,2.5)]
    classifier = KNNClassifier(data_set,3)
    for test_point in test_points:
        print(
            f"The given test point {test_point} is classified to:",
            classifier.classify(test_point)
        )

The given test point (2.5, 7) is classified to: Class 1
The given test point (7, 2.5) is classified to: Class 2
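
To see why the first test point is assigned to Class 1, the following sketch lists its three nearest neighbours from the same `data_set`. It uses `math.dist` (Python 3.8+) in place of `euclidian_dist` purely to stay self-contained; the variable `neighbours` is introduced here for illustration.

```python
from math import dist

data_set = {
    "Class 1": {(1, 12), (2, 5), (3, 6), (3, 10), (3.5, 8), (2, 11), (2, 9), (1, 7)},
    "Class 2": {(5, 3), (3, 2), (1.5, 9), (7, 2), (6, 1), (3.8, 1), (5.6, 4), (4, 2), (2, 5)},
}
test_point = (2.5, 7)
k = 3

# Sort every (distance, label, point) triple and keep the k closest.
neighbours = sorted(
    (dist(point, test_point), label, point)
    for label, points in data_set.items()
    for point in points
)[:k]

for distance, label, point in neighbours:
    print(f"{point} {label} distance={distance:.3f}")
# (3, 6) Class 1 distance=1.118
# (3.5, 8) Class 1 distance=1.414
# (1, 7) Class 1 distance=1.500
```

All three neighbours belong to Class 1, so the majority vote is unanimous for this point.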


### Alternative method using NumPy:

In [2]:
import numpy as np

def euclidian_dist_np(p1,p2):
    return np.sqrt(np.sum((p1-p2)**2,axis=-1))

class KNNClassifier:
    def __init__(self,train_x,train_y,k=3,dist=euclidian_dist_np):
        self.train_x = train_x
        # np.int was removed in NumPy 1.24; np.issubdtype accepts any integer dtype
        assert np.issubdtype(train_y.dtype, np.integer), "Class labels should be integers"
        self.train_y = train_y
        self.k = k
        self.dist = dist
    
    def classify(self,test_point):
        k_nearest_classes = self.train_y[
            # indexes of the k nearest neighbours
            np.argsort(self.dist(self.train_x,test_point))[:self.k]
        ]
        # most frequently occurring class among the k neighbours
        return np.bincount(k_nearest_classes).argmax()
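
The `classify` method above relies on two NumPy behaviours: broadcasting a single test point against every training row, and `np.bincount` for counting votes. The sketch below walks through both on a four-row toy array; the array values are illustrative and not taken from `knn_dataset.csv`.

```python
import numpy as np

# Toy training data (illustrative values only).
train_x = np.array([[3.0, 6.0], [1.0, 7.0], [7.0, 2.0], [3.5, 8.0]])
train_y = np.array([1, 1, 2, 1])
test_point = np.array([2.5, 7.0])

# (4, 2) - (2,) broadcasts row-wise; summing over the last axis
# leaves one distance per training row.
distances = np.sqrt(np.sum((train_x - test_point) ** 2, axis=-1))

# Row indexes of the 3 smallest distances.
nearest = np.argsort(distances)[:3]

# bincount returns how often each non-negative integer label occurs;
# index i of the result holds the count for label i.
votes = np.bincount(train_y[nearest])

print(distances.shape, nearest, votes, votes.argmax())
# (4,) [0 3 1] [0 3] 1
```

Because `np.bincount` indexes counts by label value, it only works with non-negative integer labels, which is what the assertion in `__init__` guards against.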

In [3]:
if __name__ == "__main__":
    # np.float and np.int were removed in NumPy 1.24; the built-in types work everywhere
    dataset = np.loadtxt("knn_dataset.csv",dtype=float,delimiter=",")
    train_x,train_y = dataset[:,:-1], dataset[:,-1].astype(int)
    test_x= np.array([[2.5,7],[7,2.5]])
    k = 3
    classifier = KNNClassifier(train_x,train_y,k=k)
    for test_vector in test_x:
        print(
            f"The given test point {test_vector} is classified to Class :",
            classifier.classify(test_vector)
        )

The given test point [2.5 7. ] is classified to Class : 1
The given test point [7.  2.5] is classified to Class : 2
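
The contents of `knn_dataset.csv` are not shown here. Judging from how it is loaded, each row presumably holds the feature values followed by an integer class label in the last column. The sketch below writes a small file in that assumed layout and reads it back the same way; the file name `knn_dataset_sample.csv` and the rows (borrowed from the dictionary-based example) are hypothetical.

```python
import os
import tempfile

import numpy as np

# Hypothetical rows: x, y, integer class label (last column).
# The real knn_dataset.csv may contain different points.
rows = [
    (3, 6, 1), (3.5, 8, 1), (1, 7, 1),
    (7, 2, 2), (6, 1, 2), (5.6, 4, 2),
]

with tempfile.TemporaryDirectory() as tmp_dir:
    csv_path = os.path.join(tmp_dir, "knn_dataset_sample.csv")
    np.savetxt(csv_path, rows, delimiter=",", fmt="%g")
    dataset = np.loadtxt(csv_path, dtype=float, delimiter=",")

# Same split as in the cell above: all but the last column are features.
train_x, train_y = dataset[:, :-1], dataset[:, -1].astype(int)
print(train_x.shape, train_y)  # (6, 2) [1 1 1 2 2 2]
```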


---