# BIG DATA ANALYSIS: Distance and Similarity
- Let's compute several kinds of distance and similarity measures ourselves.
---

## 1. Distances

```python
import numpy as np
```

<img src="data/distance.jpg"/>

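```python
def ManhattanDistance(x, y):
    # L1 distance: sum of absolute coordinate differences
    return np.sum(np.abs(x - y))


def EuclideanDistance(x, y):
    # L2 distance: square root of the sum of squared differences
    return np.sqrt(np.sum(np.power(x - y, 2)))


def Chebyshevdistance(x, y):
    # L-infinity distance: largest absolute coordinate difference
    return np.max(np.abs(x - y))
```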
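In formula form, the three functions above compute the following for $x, y \in \mathbb{R}^n$. As a worked check, take the first and last sample points used below, $(0,2)$ and $(5,1)$, whose coordinate differences are $|0-5| = 5$ and $|2-1| = 1$:

$$
\begin{aligned}
D_{\text{Manhattan}}(x,y) &= \sum_{i=1}^{n}|x_i-y_i|, & 5+1 &= 6\\
D_{\text{Euclidean}}(x,y) &= \sqrt{\sum_{i=1}^{n}(x_i-y_i)^2}, & \sqrt{5^2+1^2} &= \sqrt{26}\approx 5.099\\
D_{\text{Chebyshev}}(x,y) &= \max_{i}|x_i-y_i|, & \max(5,1) &= 5
\end{aligned}
$$

These values match the top-right entries of the three tables printed further down.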




<img src="data/distance_table.png"/>

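```python
# Four 2-D sample points, one per row.
# A plain np.array is used instead of np.matrix, which NumPy no longer recommends.
points = np.array([
    [0, 2],
    [2, 0],
    [3, 1],
    [5, 1],
])
```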

```python
n = points.shape[0]
tables = np.zeros((n, n))
distances = [EuclideanDistance, ManhattanDistance, Chebyshevdistance]
for Distance in distances:
    # Fill the pairwise distance table for the current metric
    for i in range(n):
        for j in range(n):
            tables[i, j] = Distance(points[i], points[j])
    print(Distance.__name__)
    print(tables)
```

```
EuclideanDistance
[[0.         2.82842712 3.16227766 5.09901951]
 [2.82842712 0.         1.41421356 3.16227766]
 [3.16227766 1.41421356 0.         2.        ]
 [5.09901951 3.16227766 2.         0.        ]]
ManhattanDistance
[[0. 4. 4. 6.]
 [4. 0. 2. 4.]
 [4. 2. 0. 2.]
 [6. 4. 2. 0.]]
Chebyshevdistance
[[0. 2. 3. 5.]
 [2. 0. 1. 3.]
 [3. 1. 0. 2.]
 [5. 3. 2. 0.]]
```
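The double loop calls a distance function once per pair of points. A vectorized alternative is sketched below, assuming the points are stored as a plain 2-D `np.array` of shape `(4, 2)`: NumPy broadcasting builds all pairwise coordinate differences in one `(4, 4, 2)` array, and each metric is then a single reduction over the last (coordinate) axis. It should reproduce the three tables above.

```python
import numpy as np

# Same four sample points as above, one per row.
points = np.array([[0, 2], [2, 0], [3, 1], [5, 1]], dtype=float)

# diff[i, j, k] = |points[i, k] - points[j, k]|
# Broadcasting: (4, 1, 2) - (1, 4, 2) -> (4, 4, 2)
diff = np.abs(points[:, np.newaxis, :] - points[np.newaxis, :, :])

euclidean = np.sqrt((diff ** 2).sum(axis=-1))  # L2: root of summed squares
manhattan = diff.sum(axis=-1)                  # L1: sum of absolute differences
chebyshev = diff.max(axis=-1)                  # L-infinity: largest difference

print(np.round(euclidean, 4))
print(manhattan)
print(chebyshev)
```

The broadcast array needs memory proportional to n²·d, so for large point sets a library routine is usually preferable. If SciPy is available, `scipy.spatial.distance.cdist(points, points, metric=...)` with `'euclidean'`, `'cityblock'`, or `'chebyshev'` computes the same tables.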


## 2. Minkowski Distance

${\displaystyle D\left(X,Y\right)=\left(\sum _{i=1}^{n}|x_{i}-y_{i}|^{p}\right)^{1/p}}$

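```python
def MinkowskiDistance(x, y, p):
    # (sum_i |x_i - y_i|^p)^(1/p); p should be >= 1 for this to be a metric.
    # p = 1 gives the Manhattan distance, p = 2 the Euclidean distance.
    return np.power(np.sum(np.power(np.abs(x - y), p)), 1 / p)
```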
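The Minkowski distance generalizes the three metrics of section 1: $p = 1$ gives the Manhattan distance, $p = 2$ the Euclidean distance, and as $p \to \infty$ the largest coordinate difference dominates the sum, so the result approaches the Chebyshev distance. The sketch below checks this on the points $(0,2)$ and $(5,1)$; it repeats the Minkowski definition so it runs on its own, and $p = 50$ is an arbitrary choice standing in for "large $p$".

```python
import numpy as np

def MinkowskiDistance(x, y, p):
    # (sum_i |x_i - y_i|^p)^(1/p), same formula as above
    return np.power(np.sum(np.power(np.abs(x - y), p)), 1 / p)

x = np.array([0.0, 2.0])
y = np.array([5.0, 1.0])

d1 = MinkowskiDistance(x, y, 1)    # p = 1: Manhattan distance
d2 = MinkowskiDistance(x, y, 2)    # p = 2: Euclidean distance
d50 = MinkowskiDistance(x, y, 50)  # large p: close to the Chebyshev distance

print(d1, d2, d50)
```

For a fixed pair of points the distance never increases as $p$ grows, so the three values come out in decreasing order. Very large $p$ can overflow `float64` when coordinate differences are large, which is why the Chebyshev distance is normally computed directly with a `max`.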

## 3. Cosine Similarity / Distance

${\displaystyle \text{similarity}=\cos(\theta)={\mathbf{A}\cdot\mathbf{B} \over \|\mathbf{A}\|\,\|\mathbf{B}\|}={\frac{\sum_{i=1}^{n}A_{i}B_{i}}{\sqrt{\sum_{i=1}^{n}A_{i}^{2}}\,\sqrt{\sum_{i=1}^{n}B_{i}^{2}}}}}$

The cosine distance is one minus the similarity, ${\displaystyle 1-\cos(\theta)}$, which ranges from 0 (same direction) to 2 (opposite directions).

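```python
def CosinDistance(x, y):
    # Cosine distance = 1 - cosine similarity
    dot = np.sum(np.multiply(x, y))           # A . B
    norm_x = np.sqrt(np.sum(np.power(x, 2)))  # ||A||
    norm_y = np.sqrt(np.sum(np.power(y, 2)))  # ||B||
    return 1 - dot / (norm_x * norm_y)
```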
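Two properties of cosine distance are worth seeing concretely: it depends only on the angle between the vectors, so rescaling a vector leaves it unchanged, and it is undefined when either vector is all zeros (the formula above would divide by zero and typically yield `nan` with a runtime warning). The sketch below illustrates both. `cosine_distance` is a hypothetical helper, not part of the original notebook; it uses `np.dot` and `np.linalg.norm` and adds an explicit zero-vector check.

```python
import numpy as np

def cosine_distance(x, y):
    """Hypothetical helper: 1 - cos(theta), rejecting zero vectors explicitly."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    norm_x = np.linalg.norm(x)
    norm_y = np.linalg.norm(y)
    if norm_x == 0 or norm_y == 0:
        # cos(theta) is undefined when either vector has zero length
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - np.dot(x, y) / (norm_x * norm_y)

a = np.array([3.0, 1.0])
same_direction = cosine_distance(a, 10 * a)   # rescaling does not change the angle
orthogonal = cosine_distance([0, 2], [2, 0])  # vectors 90 degrees apart
opposite = cosine_distance(a, -a)             # vectors 180 degrees apart

print(same_direction, orthogonal, opposite)
```

Note that the Euclidean distance between `a` and `10 * a` is large, while their cosine distance is essentially zero; this is why cosine distance is often preferred when only the direction of a vector (for example, relative term frequencies) matters.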
