K-nearest neighbors (KNN) is one of the simplest and most widely applicable classifiers. Its variations (such as TIFU-KNN: https://dl.acm.org/doi/abs/10.1145/3397271.3401066) are widely used in recommender systems (RecSys).

#### The algorithm behind KNN
- We start from a training sample: the features $X_{train}$ and the corresponding labels $Y_{train}$. Ideally it is as large as possible, so that it captures as much information as possible;
- For a new, unobserved sample $x_i$, we compute $L_2(x_i, X_{train})$, the Euclidean ($\ell_2$) distance from $x_i$ to every training point, sort these distances in ascending order, and keep the $K$ closest items;
- The predicted label for the sample is the mode (the most frequent value) of $Y_{train}$ restricted to those $K$ nearest items.
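
As a compact restatement of the steps above, the distance and the voting rule can be written out explicitly. The symbols $p$ (number of features), $x_j$ and $y_j$ (the $j$-th training point and its label), $N_K(x_i)$ (the indices of the $K$ training points closest to $x_i$), and $c$ (a candidate class) are notation assumed here for illustration:

$$
L_2(x_i, x_j) = \lVert x_i - x_j \rVert_2 = \sqrt{\sum_{m=1}^{p} \left(x_{i,m} - x_{j,m}\right)^2},
\qquad
\hat{y}(x_i) = \operatorname*{arg\,max}_{c} \sum_{j \in N_K(x_i)} \mathbb{1}\left[y_j = c\right]
$$

The second expression is just the mode of the neighbours' labels: each of the $K$ neighbours casts one vote for its own class, and the class with the most votes wins.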

In [7]:
import numpy as np

class customKNN():

    def __init__(self, k=5) -> None:
        self.k = k
        self.X = []
        self.y = []

    def _get_k_nearest(self, x_i):
        # Euclidean distance from x_i to every training row (one norm per row, hence axis=1)
        distances = np.linalg.norm(self.X - np.asarray(x_i), axis=1)
        return np.argsort(distances)[:self.k]

    def fit(self, X_train, y_train):
        self.X = np.array(X_train)
        self.y = np.array(y_train)

    def _get_mode(self, y_sample):
        # np.unique returns the sorted distinct labels; counts line up with them, not with y_sample
        values, counts = np.unique(y_sample, return_counts=True)
        return values[np.argmax(counts)]

    def predict(self, X):
        return [self._get_mode(
            self.y[self._get_k_nearest(x_i)]) for x_i in X]

Once again, we can compare our own implementation with scikit-learn's, using a small example modeled on the one in its KNN docs:

In [10]:
# custom implementation
X = [[0, 2], [1, 2], [2, 3], [3, 4]]
y = [0, 0, 1, 1]
knn = customKNN(3)
knn.fit(X, y)
knn.predict([[1, 1]])

[0]
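
To see where this prediction comes from, the three steps can be traced by hand on the same toy data. The snippet below is a minimal standard-library sketch written only for this check (the variable names `query`, `distances`, `nearest` and `votes` are illustrative and not part of the class above):

```python
import math
from collections import Counter

# Toy data and query copied from the cell above.
X = [[0, 2], [1, 2], [2, 3], [3, 4]]
y = [0, 0, 1, 1]
query = [1, 1]
k = 3

# Step 1: Euclidean distance from the query to every training point.
distances = [math.dist(query, row) for row in X]

# Step 2: indices of the k smallest distances.
nearest = sorted(range(len(X)), key=lambda i: distances[i])[:k]

# Step 3: majority vote among the labels of those neighbours.
votes = Counter(y[i] for i in nearest)
prediction = votes.most_common(1)[0][0]

print([round(d, 3) for d in distances])  # [1.414, 1.0, 2.236, 3.606]
print(nearest)                           # [1, 0, 2]
print(prediction)                        # 0
```

The three closest points carry the labels 0, 0 and 1, so the vote is 2 to 1 in favour of class 0.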

In [11]:
# sklearn implementation
from sklearn.neighbors import KNeighborsClassifier
neigh = KNeighborsClassifier(n_neighbors=3)
neigh.fit(X, y)
neigh.predict([[1, 1]])

array([0])