# I. Algorithm

## 1. Math
### Norms:
$$||\mathbf{x}||_p = (|x_1|^p + |x_2|^p + \dots + |x_n|^p)^{\frac{1}{p}}$$
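
As a quick illustration of the definition, the norm can be evaluated directly with NumPy. This is a minimal sketch; the helper name `p_norm` is hypothetical and is not used elsewhere in this notebook.

```python
import numpy as np

def p_norm(x, p):
    """Return the p-norm of a 1-D vector x (illustrative helper)."""
    x = np.asarray(x, dtype=float)
    # Sum |x_j|^p over all coordinates, then take the p-th root
    return np.sum(np.abs(x) ** p) ** (1.0 / p)

x = np.array([3.0, -4.0])
print(p_norm(x, 1))  # 7.0 (|3| + |-4|)
print(p_norm(x, 2))  # 5.0 (sqrt(9 + 16))
```

For $p = 1$ and $p = 2$ the result should agree with `np.linalg.norm(x, 1)` and `np.linalg.norm(x, 2)`.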

Note:

__Manhattan distance__ ($p = 1$):
$$||\mathbf{z} - \mathbf{x_i}||_1 = \sum_{j=1}^{n} |z_j - x_{ij}| \qquad (1)$$
__Euclidean distance__ ($p = 2$):
$$||\mathbf{z} - \mathbf{x_i}||_2^2 = (\mathbf{z} - \mathbf{x_i})^T(\mathbf{z} - \mathbf{x_i}) = ||\mathbf{z}||_2^2 + ||\mathbf{x_i}||_2^2 - 2\mathbf{x_i}^T\mathbf{z} \qquad (2)$$
- The distance is kept squared to avoid computing the square root. Because the square root is monotonic, the ordering of the neighbors does not change.
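
The expansion of the squared Euclidean distance can be checked numerically against the direct definition. The sketch below uses small random matrices as stand-ins for real data; the shapes and the seed are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
X_train = rng.normal(size=(5, 4))  # 5 training points x_i with 4 features
X_test = rng.normal(size=(3, 4))   # 3 query points z with 4 features

# Direct definition: sum of squared coordinate differences for every (z, x_i) pair
diff = X_test[:, None, :] - X_train[None, :, :]
direct = np.sum(diff ** 2, axis=2)

# Expanded form: ||z||^2 + ||x_i||^2 - 2 x_i^T z, computed for all pairs at once
expanded = (
    np.sum(X_test ** 2, axis=1)[:, None]
    + np.sum(X_train ** 2, axis=1)[None, :]
    - 2.0 * X_test @ X_train.T
)

print(direct.shape)                   # (3, 5): one row per query point
print(np.allclose(direct, expanded))  # True
```

The expanded form replaces the three-dimensional intermediate array with one matrix product, which is why it is attractive for larger data sets. Rounding can make it slightly negative for near-identical points, so a guard may be needed before taking a square root or a reciprocal.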

### Weight:
- "uniform" : each neighbor is treated equally.
- "distance" : nearer neighbors get more weight. Each neighbor's vote for its label is weighted by $\frac{1}{d}$, where $d$ is its distance to the query point.
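
The two weighting schemes can give different answers for the same neighbors. The toy example below (one query point, three neighbors, made-up labels and distances) is only a sketch of the idea, using `np.bincount` to accumulate the votes.

```python
import numpy as np

# Hypothetical neighbors of one query point: labels and distances to the query
neighbor_labels = np.array([0, 1, 1])
neighbor_dists = np.array([0.1, 1.0, 2.0])
n_classes = 2

# "uniform": every neighbor contributes one vote to its class
uniform_votes = np.bincount(neighbor_labels, minlength=n_classes)

# "distance": every neighbor contributes 1/d to its class
distance_votes = np.bincount(
    neighbor_labels, weights=1.0 / neighbor_dists, minlength=n_classes
)

print(int(uniform_votes.argmax()))   # 1: class 1 has two of the three votes
print(int(distance_votes.argmax()))  # 0: the very close class-0 neighbor dominates
```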

## 2. Code:
- Step 1: Use $(1)$ and $(2)$ to calculate distances.
- Step 2: Assign each test point a label based on the weights of its $n$ nearest neighbors.

In [85]:
import numpy as np
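# Calculate distances; both functions return an (n_test, n_train) matrix
def calculate_manhattan_distance(X_train, X_test):
    # Formula (1): sum of absolute coordinate differences, via broadcasting
    return np.abs(X_test[:, None, :] - X_train[None, :, :]).sum(axis=2)
def calculate_euclidean_distance(X_train, X_test):
    # Formula (2): squared Euclidean distance in expanded form
    return np.sum(X_train*X_train, 1).reshape(1, -1) + np.sum(X_test*X_test, 1).reshape(-1, 1) - 2*X_test.dot(X_train.T)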
# Rewrite np.choose: pick c[a[I]] for every index I of a (result is flattened)
def choose(a,c):
    return np.array([c[a[I]] for I in np.ndindex(a.shape)])
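# Get the labels of the n_neighbors nearest training points for each test point
def get_labels(distance,y_train,n_neighbors):
    # Column indices of the n_neighbors smallest distances in each row
    nearest = distance.argsort()[:, :n_neighbors]
    label = choose(nearest, y_train).reshape(-1,n_neighbors)
    return label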
# Calculate weight
def calculate_weight_uniform(label):
    # Majority vote: count how often each class appears among the neighbors
    unique = np.unique(label)
    result = np.zeros((label.shape[0],unique.shape[0]))
    for i in range(unique.shape[0]):
        result[:,i] = (label == unique[i]).sum(axis = 1)
    # Map the winning column back to its class label
    y_pre = unique[result.argmax(axis = 1)]
    return y_pre
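def calculate_weight_distance(label,distance):
    # Weighted vote: each neighbor contributes 1/d to its class
    unique = np.unique(label)
    result = np.zeros((label.shape[0],unique.shape[0]))
    # Distances of the nearest neighbors, in the same order as the columns of label
    # (with calculate_euclidean_distance these are squared distances)
    nearest = np.sort(distance, axis = 1)[:, :label.shape[1]]
    # Assumed guard: avoid dividing by zero when a test point equals a training point
    nearest = np.maximum(nearest, 1e-12)
    for i in range(unique.shape[0]):
        result[:,i] = ((label == unique[i])/nearest).sum(axis = 1)
    y_pre = unique[result.argmax(axis = 1)]
    return y_pre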

# II. Practice

Apply K-nearest neighbors to the Iris data set.
![](iris.png)

In [83]:
# Load data
from sklearn import neighbors, datasets
from sklearn.model_selection import train_test_split
np.random.seed(7)
iris = datasets.load_iris()
X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target, test_size=130)

In [89]:
# Apply model
n_neighbors = 3
distances = calculate_euclidean_distance(X_train, X_test)
labels = get_labels(distances,y_train,n_neighbors)
y_pre = calculate_weight_distance(labels,distances)
from sklearn.metrics import accuracy_score
print(100*accuracy_score(y_test, y_pre))

93.84615384615384
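
One way to sanity-check the predictions is to run scikit-learn's own `KNeighborsClassifier` on the same split. The sketch below repeats the data loading so that it runs on its own; the name `y_ref` is hypothetical. The two accuracies need not match exactly, since the library weights by the true Euclidean distance while formula (2) is squared, and tie-breaking may differ.

```python
import numpy as np
from sklearn import datasets, neighbors
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Same seed and split as in the cells above
np.random.seed(7)
iris = datasets.load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=130
)

# Reference model: 3 neighbors, Euclidean metric (p=2), 1/d weighting
clf = neighbors.KNeighborsClassifier(n_neighbors=3, p=2, weights="distance")
clf.fit(X_train, y_train)
y_ref = clf.predict(X_test)

print(100 * accuracy_score(y_test, y_ref))
```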
