# K-Nearest Neighbors (KNN) Algorithm

* Non-parametric, instance-based, lazy learning algorithm.
* Lazy: there is no training phase; all the work happens at prediction time.
* Instance-based: predictions use the stored training data directly.
* Non-parametric: there is no fixed, finite parameter vector. In KNN the model is the stored training data together with the distance metric and the value of $k$. There are no learned parameters as in linear regression.
* Local averaging estimator.
* **Given a query point $x$, compute the distances $d(x, x_i)$ to every training point, select the $k$ nearest points $\mathcal{N}_k(x)$, and predict using only those points.**


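For regression, the prediction is the mean of the neighbors' targets:

$$
\Large \hat f(x)=\frac{1}{k}\sum_{i \in \mathcal{N}_k(x)} y_i
$$

For classification, it is the majority class among the neighbors:

$$
\Large \hat y(x)=\arg\max_{c}\sum_{i \in \mathcal{N}_k(x)} \mathbb{I}(y_i = c)
$$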

- Small $k$: very local and sensitive to noise.
- Large $k$: heavy averaging and over-smoothing.
- Feature scaling is essential for scale-sensitive metrics such as Euclidean distance; otherwise the feature with the largest numeric range dominates the distance.
- KNN works because of local smoothness: points that are close in input space tend to have similar outputs. As the dimension increases, all distances become similar, so the notion of a "nearest" neighbor weakens.
- Mahalanobis distance handles correlated features.
- Practical issues to watch for: feature scaling, correlated features, mixed data types, and the curse of dimensionality.
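
The claim that distances become similar in high dimensions can be checked numerically. The following is a minimal sketch, assuming points drawn uniformly from the unit hypercube; the helper `relative_contrast`, the seed, and the sample sizes are illustrative choices, not part of the original notes. It measures how much farther the farthest point is than the nearest one, relative to the nearest distance.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, only for reproducibility

def relative_contrast(n_points, dim, rng):
    """(d_max - d_min) / d_min for distances from one random query
    to n_points random points in the unit hypercube [0, 1]^dim."""
    X = rng.random((n_points, dim))
    q = rng.random(dim)
    d = np.sqrt(np.sum((X - q) ** 2, axis=1))
    return (d.max() - d.min()) / d.min()

dims = [2, 10, 100, 1000]
contrasts = {dim: relative_contrast(1000, dim, rng) for dim in dims}

for dim, c in contrasts.items():
    print(f"dim={dim:5d}  relative contrast={c:.3f}")
```

A contrast near zero means the nearest and farthest points are almost equally far from the query, which is the situation in which a $k$-nearest neighborhood stops being meaningfully local.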

$$
\Large d_M(x, z)=\sqrt{(x - z)^T \Sigma^{-1} (x - z)}
$$

* The raw difference $x - z$ does not tell us how surprising that difference is. Surprise depends on how variable the data is and on the directions in which it naturally varies, so the distance must be normalized by the covariance $\Sigma$.
* Mahalanobis distance answers the question: how many standard deviations apart are two points, given the data's natural variation?
* If we remove the correlations and rescale every variance to 1 (whitening), Mahalanobis distance becomes Euclidean distance in the whitened space.
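
The whitening statement can be verified numerically. This is a sketch under stated assumptions: the data are synthetic draws from a hypothetical correlated Gaussian (`true_cov` is made up for the illustration), and the whitening matrix is taken from a Cholesky factor, which is one of several valid choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical correlated 2-D data, used only for this check
true_cov = np.array([[4.0, 1.5],
                     [1.5, 1.0]])
X = rng.multivariate_normal(mean=[0.0, 0.0], cov=true_cov, size=500)

Sigma = np.cov(X, rowvar=False)      # sample covariance (rows are samples)
Sigma_inv = np.linalg.inv(Sigma)

# Whitening: Sigma = L @ L.T, so W = L^{-1} gives W @ Sigma @ W.T = I
L = np.linalg.cholesky(Sigma)
W = np.linalg.inv(L)
X_white = X @ W.T                    # each row becomes W @ x

x, z = X[0], X[1]
diff = x - z
d_maha = np.sqrt(diff @ Sigma_inv @ diff)               # Mahalanobis in original space
d_eucl_white = np.linalg.norm(X_white[0] - X_white[1])  # Euclidean in whitened space

print(d_maha, d_eucl_white)
print(np.round(np.cov(X_white, rowvar=False), 6))       # approximately the identity
```

The two distances agree up to floating-point error because $(x-z)^T \Sigma^{-1} (x-z) = \lVert L^{-1}(x-z) \rVert^2$ when $\Sigma = L L^T$.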

# Implementation

In [1]:
import numpy as np

# Toy dataset
X_train = np.array([
    [1.0, 2.0],
    [2.0, 3.0],
    [3.0, 3.0],
    [6.0, 5.0],
    [7.0, 7.0],
    [8.0, 6.0]
])

y_train = np.array([0, 0, 0, 1, 1, 1])  # Binary labels

# Query point
x_query = np.array([5.0, 5.0])


In [3]:
def euclidean_distance(x1, x2):
    return np.sqrt(np.sum((x1 - x2) ** 2))

In [5]:
# Compute distances to all training points
distances = np.array([euclidean_distance(x_query, x) for x in X_train])
distances

array([5.        , 3.60555128, 2.82842712, 1.        , 2.82842712,
       3.16227766])

In [6]:
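def knn_predict(X_train, y_train, x_query, k=3):
    # Euclidean distance from the query to every training point (vectorized)
    distances = np.sqrt(np.sum((X_train - x_query) ** 2, axis=1))

    # Indices of the k nearest neighbors; a stable sort breaks distance ties
    # by original index, so the result is deterministic
    nearest_idx = np.argsort(distances, kind="stable")[:k]

    # Majority vote; np.bincount assumes non-negative integer labels, and
    # np.argmax returns the smallest label when vote counts are tied
    nearest_labels = y_train[nearest_idx]
    counts = np.bincount(nearest_labels)
    predicted_class = np.argmax(counts)

    return predicted_class, nearest_idx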

pred_class, nearest_idx = knn_predict(X_train, y_train, x_query, k=3)
pred_class, nearest_idx


(np.int64(1), array([3, 2, 4]))
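
To see how the choice of $k$ changes the vote on this toy dataset, the sketch below sweeps $k$ from 1 to the full training set size. It repeats the toy data so that it runs on its own; the `votes_by_k` dictionary and the stable sort are illustrative choices rather than part of the cells above.

```python
import numpy as np

X_train = np.array([[1.0, 2.0], [2.0, 3.0], [3.0, 3.0],
                    [6.0, 5.0], [7.0, 7.0], [8.0, 6.0]])
y_train = np.array([0, 0, 0, 1, 1, 1])
x_query = np.array([5.0, 5.0])

distances = np.sqrt(np.sum((X_train - x_query) ** 2, axis=1))
order = np.argsort(distances, kind="stable")  # ties keep original index order

votes_by_k = {}
for k in range(1, len(X_train) + 1):
    # Vote counts per class among the k nearest neighbors
    counts = np.bincount(y_train[order[:k]], minlength=2)
    votes_by_k[k] = counts
    tied = counts[0] == counts[1]
    note = "  (vote tie; argmax falls back to class 0)" if tied else ""
    print(f"k={k}: votes={counts.tolist()}, predicted={np.argmax(counts)}{note}")
```

With two classes, even values of $k$ can produce tied votes, and at $k$ equal to the training set size the prediction no longer depends on the query at all, which is the extreme case of over-smoothing.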

In [7]:
y_train_reg = np.array([1.2, 1.9, 2.1, 5.0, 6.5, 5.5])  # Continuous outputs

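def knn_regression(X_train, y_train, x_query, k=3):
    # Euclidean distance from the query to every training point
    distances = np.sqrt(np.sum((X_train - x_query) ** 2, axis=1))
    # Stable sort: ties in distance are broken by original index
    nearest_idx = np.argsort(distances, kind="stable")[:k]
    # Prediction is the plain (unweighted) mean of the neighbors' targets
    return np.mean(y_train[nearest_idx]), nearest_idx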

pred_value, nearest_idx = knn_regression(X_train, y_train_reg, x_query, k=3)
pred_value, nearest_idx


(np.float64(4.533333333333333), array([3, 2, 4]))
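
As a check on the value above: the three nearest neighbors are at (zero-based) indices 3, 2, and 4, whose targets in `y_train_reg` are 5.0, 2.1, and 6.5, so the local average is

$$
\hat f(x)=\frac{1}{3}\,(5.0 + 2.1 + 6.5)=\frac{13.6}{3}\approx 4.533
$$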

In [8]:
# Suppose the second feature is on a much larger scale.
# Note: despite its name, X_train_scaled holds the raw (not yet normalized) values.
X_train_scaled = np.array([
    [1.0, 200.0],
    [2.0, 300.0],
    [3.0, 250.0],
    [6.0, 500.0],
    [7.0, 600.0],
    [8.0, 550.0]
])

# Query point
x_query_scaled = np.array([5.0, 500.0])

# Euclidean distance without scaling
distances_unscaled = np.sqrt(np.sum((X_train_scaled - x_query_scaled) ** 2, axis=1))

# Min-max scaling to [0, 1]; the query is scaled with the training-set min and range
feat_min = X_train_scaled.min(axis=0)
feat_range = X_train_scaled.max(axis=0) - feat_min
X_train_norm = (X_train_scaled - feat_min) / feat_range
x_query_norm = (x_query_scaled - feat_min) / feat_range

# Euclidean distance after scaling
distances_scaled = np.sqrt(np.sum((X_train_norm - x_query_norm) ** 2, axis=1))

distances_unscaled, distances_scaled


(array([300.02666548, 200.02249873, 250.00799987,   1.        ,
        100.019998  ,  50.08991915]),
 array([0.9428842 , 0.65853889, 0.68721005, 0.14285714, 0.37964806,
        0.44642857]))
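
One way to read these numbers is to ask how much of each squared distance comes from each feature. The sketch below is illustrative only: the helper `feature_share` and the variable names are assumptions introduced here, and the data are repeated so the block runs on its own.

```python
import numpy as np

X_raw = np.array([[1.0, 200.0], [2.0, 300.0], [3.0, 250.0],
                  [6.0, 500.0], [7.0, 600.0], [8.0, 550.0]])
x_query_raw = np.array([5.0, 500.0])

def feature_share(X, q):
    """Fraction of each squared Euclidean distance contributed by each feature."""
    sq = (X - q) ** 2
    return sq / sq.sum(axis=1, keepdims=True)

# Min-max scaling using training-set statistics only
feat_min = X_raw.min(axis=0)
feat_range = X_raw.max(axis=0) - feat_min
X_norm = (X_raw - feat_min) / feat_range
q_norm = (x_query_raw - feat_min) / feat_range

share_raw = feature_share(X_raw, x_query_raw)
share_norm = feature_share(X_norm, q_norm)

print(np.round(share_raw, 4))   # columns: share of feature 1, share of feature 2
print(np.round(share_norm, 4))
```

Before scaling, the second feature accounts for almost all of the distance to every point that differs from the query in that feature; after min-max scaling, both features contribute on a comparable footing.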

In [9]:
from numpy.linalg import inv

# Covariance matrix of training data
cov_matrix = np.cov(X_train.T)
cov_inv = inv(cov_matrix)

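def mahalanobis_distance(x, data, cov_inv):
    # Row-wise quadratic form (x_i - x)^T cov_inv (x_i - x);
    # only the n per-point terms are computed, not an n x n matrix
    diffs = data - x
    return np.sqrt(np.sum((diffs @ cov_inv) * diffs, axis=1))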

maha_distances = mahalanobis_distance(x_query, X_train, cov_inv)
maha_distances


array([1.5411035 , 1.04446594, 1.36515067, 1.14812099, 1.36515067,
       1.90990242])

In [10]:
def knn_mahalanobis(X_train, y_train, x_query, k=3):
    # Covariance is estimated from the training data (np.cov expects variables in rows)
    cov_matrix = np.cov(X_train.T)
    cov_inv = inv(cov_matrix)
    distances = mahalanobis_distance(x_query, X_train, cov_inv)
    # Stable sort: ties in distance are broken by original index
    nearest_idx = np.argsort(distances, kind="stable")[:k]
    nearest_labels = y_train[nearest_idx]
    counts = np.bincount(nearest_labels)
    return np.argmax(counts), nearest_idx

pred_class_maha, nearest_idx_maha = knn_mahalanobis(X_train, y_train, x_query, k=3)
pred_class_maha, nearest_idx_maha


(np.int64(0), array([1, 3, 2]))
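
The change in prediction relative to the Euclidean case deserves a closer look, because the Mahalanobis distances printed earlier show training points 2 and 4 at the same distance (1.36515067) from the query. The sketch below, which repeats the toy data and uses illustrative variable names, checks whether the third and fourth nearest neighbors are tied and what the vote would be with either one.

```python
import numpy as np

X_train = np.array([[1.0, 2.0], [2.0, 3.0], [3.0, 3.0],
                    [6.0, 5.0], [7.0, 7.0], [8.0, 6.0]])
y_train = np.array([0, 0, 0, 1, 1, 1])
x_query = np.array([5.0, 5.0])

cov_inv = np.linalg.inv(np.cov(X_train, rowvar=False))
diffs = X_train - x_query
d = np.sqrt(np.sum((diffs @ cov_inv) * diffs, axis=1))

order = np.argsort(d, kind="stable")
third, fourth = order[2], order[3]
tie = np.isclose(d[third], d[fourth])

# Majority vote if the third neighbor is taken to be `third` vs. `fourth`
vote_a = np.argmax(np.bincount(y_train[[order[0], order[1], third]]))
vote_b = np.argmax(np.bincount(y_train[[order[0], order[1], fourth]]))

print("third/fourth candidates:", third, fourth, "tied:", tie)
print("vote with each candidate:", vote_a, vote_b)
```

On this dataset the two candidates carry different labels, so the $k=3$ Mahalanobis prediction is decided by how the tie is broken rather than by the metric alone; a different $k$ or an explicit tie-breaking rule would be needed for a more robust comparison.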