# Exercise 11: Proximity

Objectives
+ Demonstrate the ability to code algorithms from text descriptions
+ Practice implementing different types of (dis)similarity measures

In this exercise I'd like you to implement <ins>three functions</ins> to measure proximity or similarity:

+ `euclidean_dist`, which receives two 1-D numpy arrays and returns the Euclidean distance between them
+ `minkowski_dist`, which receives two 1-D numpy arrays and a norm order `p` and returns the Minkowski distance of order `p` between them
+ `jaccard_sim`, which receives two 1-D binary arrays (0 and 1 valued only, typically sparse) and returns their Jaccard similarity
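For reference, these are the standard definitions of the three measures for $x, y \in \mathbb{R}^n$. For the Jaccard similarity of binary vectors, $M_{ab}$ (notation introduced here for convenience) is the number of positions $i$ with $x_i = a$ and $y_i = b$:

$$
d_{\text{Euclidean}}(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}
$$

$$
d_p(x, y) = \left( \sum_{i=1}^{n} |x_i - y_i|^p \right)^{1/p}
$$

$$
J(x, y) = \frac{M_{11}}{M_{01} + M_{10} + M_{11}} = \frac{M_{11}}{n - M_{00}}
$$

Setting $p = 2$ in the Minkowski formula recovers the Euclidean distance, and $p = 1$ gives the Manhattan distance. The Jaccard *distance* is $1 - J(x, y)$.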
 

Test your work as you go along. You may use the [Euclidean](https://docs.scipy.org/doc/scipy/reference/generated/scipy.spatial.distance.euclidean.html#scipy.spatial.distance.euclidean), [Minkowski](https://docs.scipy.org/doc/scipy/reference/generated/scipy.spatial.distance.minkowski.html#scipy.spatial.distance.minkowski) and [Jaccard](https://docs.scipy.org/doc/scipy/reference/generated/scipy.spatial.distance.jaccard.html#scipy.spatial.distance.jaccard) functions in the `scipy.spatial.distance` [module](https://docs.scipy.org/doc/scipy/reference/spatial.distance.html) to check your development (do not use them in your solution, though). I recommend creating a few test cases for each function and checking that your results match those of the corresponding `scipy.spatial.distance` function. Note that `scipy.spatial.distance.jaccard` returns the Jaccard *dissimilarity* (one minus the similarity), not the similarity itself.
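A minimal sketch of such a test loop, assuming only NumPy. The hypothetical helper `check_close` runs a candidate and a reference function on the same random inputs and compares the results with `np.isclose`, since exact equality is too strict for floating-point results. In the notebook you would pass your own function as `candidate` and the matching `scipy.spatial.distance` function as `reference`; here a plain-Python Minkowski function and `np.linalg.norm` stand in so the block runs without SciPy.

```python
import numpy as np


def check_close(candidate, reference, n_trials=5, size=100, seed=0, **kwargs):
    """Return True if `candidate` and `reference` agree on random inputs.

    check_close is a hypothetical helper, not part of the exercise or of SciPy.
    Extra keyword arguments (e.g. p=3) are passed to both functions.
    """
    rng = np.random.default_rng(seed)
    for _ in range(n_trials):
        x = rng.random(size)
        y = rng.random(size)
        # Tolerance-based comparison: exact equality is too strict for floats
        if not np.isclose(candidate(x, y, **kwargs), reference(x, y, **kwargs)):
            return False
    return True


def loop_minkowski(x, y, p):
    # Plain-Python stand-in for your own minkowski_dist
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1 / p)


def numpy_minkowski(x, y, p):
    # NumPy reference: the vector p-norm of the difference is the Minkowski distance
    return np.linalg.norm(x - y, ord=p)


print(check_close(loop_minkowski, numpy_minkowski, p=3))  # True
# A hand-checkable case: the 3-4-5 right triangle (p = 2)
print(loop_minkowski([0, 0], [3, 4], p=2))  # 5.0
```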

```python
import numpy as np
import scipy.spatial.distance as distance
```

## Euclidean Distance

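```python
def euclidean_dist(x, y):
    """Euclidean distance between two equal-length 1-D arrays."""
    if len(x) != len(y):
        raise ValueError("The two arrays need to be the same size!")
    # Square root of the sum of squared element-wise differences
    return sum((x[i] - y[i]) ** 2 for i in range(len(x))) ** 0.5
```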

```python
a1 = np.random.rand(10000)
a2 = np.random.rand(10000)
print(f"The euclidean distance calculated using function created is {euclidean_dist(a1,a2)}")
print(f"The euclidean distance calculated using scipy function is {distance.euclidean(a1,a2)}")
```

```
The euclidean distance calculated using function created is 40.9920473345346
The euclidean distance calculated using scipy function is 40.99204733453462
```
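The two printed values differ only in the last digit. That is expected rather than a bug: floating-point addition is not associative, and the loop adds the squared differences one at a time, while SciPy most likely accumulates them in a different order. This is why test cases should compare results with a tolerance instead of `==`. A sketch of a vectorized alternative (the name `euclidean_dist_vec` is hypothetical) together with such a tolerance check:

```python
import numpy as np


def euclidean_dist_vec(x, y):
    """Vectorized Euclidean distance (hypothetical name, not part of the exercise)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.shape != y.shape:
        raise ValueError("The two arrays need to be the same size!")
    # Element-wise differences, squared, summed, then square-rooted
    return float(np.sqrt(np.sum((x - y) ** 2)))


rng = np.random.default_rng(42)
a1 = rng.random(10000)
a2 = rng.random(10000)

# Loop-based sum, the same approach as euclidean_dist above
loop_result = sum((a1[i] - a2[i]) ** 2 for i in range(len(a1))) ** 0.5
vec_result = euclidean_dist_vec(a1, a2)

# Compare with a tolerance instead of ==, since the last bits may differ
print(np.isclose(loop_result, vec_result))  # True
print(euclidean_dist_vec([0, 0], [3, 4]))   # 5.0
```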


## Minkowski Distance

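```python
def minkowski_dist(x, y, p):
    """Minkowski distance of order p between two equal-length 1-D arrays."""
    if len(x) != len(y):
        raise ValueError("The two arrays need to be the same size!")
    if p <= 0:
        raise ValueError("The norm order p must be positive!")
    # p-th root of the sum of absolute differences raised to the power p
    return sum(np.abs(x[i] - y[i]) ** p for i in range(len(x))) ** (1 / p)
```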

```python
a1 = np.random.rand(10000)
a2 = np.random.rand(10000)
print(f"The minkowski distance calculated using function created is {minkowski_dist(a1,a2,5)}")
print(f"The minkowski distance calculated using scipy function is {distance.minkowski(a1,a2,5)}")
```

```
The minkowski distance calculated using function created is 3.421279789150257
The minkowski distance calculated using scipy function is 3.4212797891502573
```
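The order `p` controls how strongly large coordinate differences dominate. With `p = 1` the Minkowski distance is the Manhattan (city-block) distance, with `p = 2` it is the Euclidean distance, and as `p` grows it approaches the largest absolute coordinate difference (the Chebyshev distance). A small vectorized sketch, assuming NumPy only (the name `minkowski_dist_vec` is hypothetical), that checks these cases on a hand-computable pair of points:

```python
import numpy as np


def minkowski_dist_vec(x, y, p):
    """Vectorized Minkowski distance of order p (hypothetical helper name)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.shape != y.shape:
        raise ValueError("The two arrays need to be the same size!")
    if p <= 0:
        raise ValueError("The norm order p must be positive!")
    # Sum of |x_i - y_i|^p, then take the p-th root
    return float(np.sum(np.abs(x - y) ** p) ** (1.0 / p))


x = np.array([0.0, 0.0])
y = np.array([3.0, 4.0])

print(minkowski_dist_vec(x, y, 1))   # 7.0  (Manhattan: 3 + 4)
print(minkowski_dist_vec(x, y, 2))   # 5.0  (Euclidean: sqrt(9 + 16))
print(minkowski_dist_vec(x, y, 50))  # very close to 4.0 (Chebyshev: max(3, 4))
```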


## Jaccard Similarity

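```python
def jaccard_sim(x, y):
    """Jaccard similarity of two equal-length binary (0/1) 1-D arrays."""
    if len(x) != len(y):
        raise ValueError("The two arrays need to be the same size!")
    if not (all(k in (0, 1) for k in x) and all(k in (0, 1) for k in y)):
        raise ValueError("Jaccard similarity requires the arrays to be binary")
    # M11: positions where both arrays are 1; M00: positions where both are 0
    m11 = sum(x[i] == 1 and y[i] == 1 for i in range(len(x)))
    m00 = sum(x[i] == 0 and y[i] == 0 for i in range(len(x)))
    # M01 + M10 + M11 = len(x) - M00: positions where at least one array is 1
    denom = len(x) - m00
    if denom == 0:
        # Both arrays are all zeros: treat them as identical (similarity 1),
        # matching SciPy, which defines their Jaccard distance as 0
        return 1.0
    # Return the similarity M11 / (M01 + M10 + M11), not the distance
    return m11 / denom
```

```python
a1 = np.random.choice([0, 1], size=10000)
a2 = np.random.choice([0, 1], size=10000)
# scipy's jaccard returns the Jaccard distance (1 - similarity), so compare it with 1 - jaccard_sim
print(f"The jaccard distance (1 - similarity) calculated using function created is {1 - jaccard_sim(a1,a2)}")
print(f"The jaccard distance calculated using scipy function is {distance.jaccard(a1,a2)}")
```

```
The jaccard distance (1 - similarity) calculated using function created is 0.6672471533824514
The jaccard distance calculated using scipy function is 0.6672471533824514
```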
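Because the inputs are binary, the counts in the Jaccard formula can also be computed with NumPy boolean operations instead of a Python loop: `M11` is the number of positions where both arrays are 1, and the denominator `M01 + M10 + M11` is the number of positions where at least one of them is 1. A sketch under those assumptions (the name `jaccard_sim_vec` is hypothetical), showing the similarity and the distance side by side on a small hand-checkable example:

```python
import numpy as np


def jaccard_sim_vec(x, y):
    """Vectorized Jaccard similarity of two equal-length 0/1 arrays (hypothetical name)."""
    x = np.asarray(x)
    y = np.asarray(y)
    if x.shape != y.shape:
        raise ValueError("The two arrays need to be the same size!")
    if not (np.isin(x, (0, 1)).all() and np.isin(y, (0, 1)).all()):
        raise ValueError("Jaccard similarity requires the arrays to be binary")
    xb = x.astype(bool)
    yb = y.astype(bool)
    m11 = np.count_nonzero(xb & yb)    # both arrays are 1
    union = np.count_nonzero(xb | yb)  # M01 + M10 + M11: at least one is 1
    if union == 0:
        # Both arrays are all zeros: treat them as identical
        return 1.0
    return m11 / union


x = [1, 1, 0, 0]
y = [1, 0, 1, 0]
sim = jaccard_sim_vec(x, y)  # M11 = 1, union = 3, so 1/3
print(sim)                   # 0.3333333333333333
print(1 - sim)               # Jaccard distance, approximately 2/3
```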
