# K-nearest neighbors (KNN)

## Theory

K-nearest neighbors is a supervised machine learning algorithm used for classification 
and regression. It assumes that data points belonging to the same class lie close 
to each other in feature space. To classify a new data point, the algorithm computes the 
distance between it and every training point, selects the K nearest neighbors, and counts 
which class is most common among them. That majority class is the predicted class of the 
new example. For regression, the prediction is instead the average of the neighbors' target values.
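
The steps above can be sketched on a tiny example. The data below is hypothetical and chosen only so the result is easy to check by hand; the names `x_new`, `nearest`, and `predicted` are illustrative, and Euclidean distance is assumed as the distance measure.

```python
import numpy as np

# Hypothetical toy data: two features per point, two classes (0 and 1)
X_train = np.array([[1.0, 1.0], [1.5, 2.0], [2.0, 1.0],
                    [6.0, 6.0], [6.5, 7.0], [7.0, 6.0]])
y_train = np.array([0, 0, 0, 1, 1, 1])
x_new = np.array([2.0, 2.0])
k = 3

# Euclidean distance from the new point to every training point
distances = np.linalg.norm(X_train - x_new, axis=1)

# Indices of the k smallest distances
nearest = np.argsort(distances)[:k]

# Majority vote among the labels of the k nearest neighbors
values, counts = np.unique(y_train[nearest], return_counts=True)
predicted = values[np.argmax(counts)]
print(predicted)  # prints 0: the three closest points all belong to class 0
```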

## Implementation

In [45]:
import numpy as np
import pandas as pd 
from sklearn.model_selection import train_test_split
from scipy.stats import mode

In [58]:
def k_nearest_neighbors(X_train, y_train, X_test, k=3):
	"""
	Runs the K nearest neighbors algorithm to determine 
	which class each row of X_test belongs to.

	:param X_train: feature matrix, where every row is a training example
	:param y_train: vector of class labels, one per row of X_train
	:param X_test: feature matrix of new examples
	:param k: number of nearest neighbors to consider (default is 3)
	:return: array with the predicted class label for each row of X_test
	"""
	predictions = []

	# Compute the distance between each test point and every training point
	for x_test in X_test:

		distances = []

		for x_train in X_train:
			
			distance = np.linalg.norm(x_train - x_test)
			distances.append(distance)

		# Get the labels of the k closest training points
		sort_indices = np.argsort(distances)[:k]
		labels = y_train[sort_indices]
		
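		# Assign the most common label to the new data point.
		# np.unique also works for string labels such as "M"/"B";
		# scipy.stats.mode requires numeric input since SciPy 1.11.
		values, counts = np.unique(labels, return_counts=True)
		predictions.append(values[np.argmax(counts)])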

	return np.asarray(predictions).flatten()
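
The inner loop over training points can also be replaced by NumPy broadcasting. The following is a minimal sketch of that idea, not a replacement for the function above: `knn_predict_vectorized` is a hypothetical name, the toy data is invented for illustration, and the approach assumes the intermediate array of shape `(n_test, n_train, n_features)` fits in memory.

```python
import numpy as np

def knn_predict_vectorized(X_train, y_train, X_test, k=3):
    """Hypothetical vectorized variant: same vote, no loop over training points."""
    # Pairwise differences via broadcasting: shape (n_test, n_train, n_features)
    diffs = X_test[:, np.newaxis, :] - X_train[np.newaxis, :, :]
    distances = np.linalg.norm(diffs, axis=2)  # shape (n_test, n_train)

    # Indices of the k nearest training points for each test point
    nearest = np.argsort(distances, axis=1)[:, :k]

    predictions = []
    for neighbor_labels in y_train[nearest]:
        # Majority vote; ties go to the label that sorts first
        values, counts = np.unique(neighbor_labels, return_counts=True)
        predictions.append(values[np.argmax(counts)])
    return np.asarray(predictions)

# Invented toy data with string labels, mirroring a two-class problem
X_train = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
                    [5.0, 5.0], [5.0, 6.0], [6.0, 5.0]])
y_train = np.array(["B", "B", "B", "M", "M", "M"])
X_test = np.array([[0.5, 0.5], [5.5, 5.5]])

predictions = knn_predict_vectorized(X_train, y_train, X_test, k=3)
print(predictions)  # prints ['B' 'M']
```

For large training sets the broadcasted array can become very big, in which case the loop-based version or processing the test points in batches may be preferable.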

In [88]:
df = pd.read_csv("../data/BreastCancer.csv")
df.drop(labels=["id", "Unnamed: 32"], axis=1, inplace=True)

X = df.drop(labels=["diagnosis"], axis=1).to_numpy()
y = df["diagnosis"].to_numpy()

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

# Compute the accuracy of the algorithm: the fraction of correctly predicted labels
predictions = k_nearest_neighbors(X_train, y_train, X_test, k=2)
score = np.mean(predictions == y_test)
print(score)

0.9210526315789473
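
One way to sanity-check a hand-written implementation is to compare it against scikit-learn's `KNeighborsClassifier`. The sketch below is self-contained and therefore uses synthetic data as a stand-in for the real dataset; the cluster positions, sample sizes, and seeds are assumptions made for illustration. Passing `random_state` to `train_test_split` makes the split, and hence the score, reproducible between runs.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for the real dataset: two Gaussian clusters
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, size=(100, 2)),
               rng.normal(4.0, 1.0, size=(100, 2))])
y = np.array(["B"] * 100 + ["M"] * 100)

# random_state fixes the split so the score is reproducible
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

model = KNeighborsClassifier(n_neighbors=3)
model.fit(X_train, y_train)

# Accuracy: fraction of test labels predicted correctly
accuracy = np.mean(model.predict(X_test) == y_test)
print(accuracy)
```

Running the hand-written function on the same split and comparing its predictions with `model.predict(X_test)` should give matching results, apart from possible differences in how ties are broken.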
