<a href="https://colab.research.google.com/github/yasntrk/metric_distances/blob/main/Metrics.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# <center> **Distance Metrics and Similarity** </center>
<hr><center>
Yasin TÜRK</center>
<hr>
In mathematics, a metric (or distance function) is a function that assigns a distance to each pair of elements of a set. A set equipped with a metric is called a metric space. Every metric induces a topology on its set, but not every topology can be generated by a metric; a topological space whose topology can be described by a metric is called metrizable.
<br>
<p><center><img src="https://i0.wp.com/dataaspirant.com/wp-content/uploads/2015/04/cover_post_final.png" width=800></center>

Ref: <a href="https://bigdata-madesimple.com/implementing-the-five-most-popular-similarity-measures-in-python/">Big Data</a><br>
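
To make the definition concrete, the sketch below spot-checks the usual metric axioms (non-negativity, zero distance only between identical points, symmetry, and the triangle inequality) on a handful of sample points. The axioms are not spelled out in the text above; they are the standard textbook conditions. The helper name `satisfies_metric_axioms`, the tolerance, and the sample points are illustrative assumptions. A numerical check like this can only find counterexamples; it cannot prove that a function is a metric.

```python
from itertools import product
from math import dist, isclose  # math.dist requires Python 3.8+

def satisfies_metric_axioms(d, points, tol=1e-9):
    """Spot-check the metric axioms for d on a finite sample of points."""
    for x, y, z in product(points, repeat=3):
        if d(x, y) < 0:                                  # non-negativity
            return False
        if (d(x, y) <= tol) != (x == y):                 # d(x, y) = 0 only when x = y
            return False
        if not isclose(d(x, y), d(y, x), abs_tol=tol):   # symmetry
            return False
        if d(x, z) > d(x, y) + d(y, z) + tol:            # triangle inequality
            return False
    return True

points = [(0, 0), (2, 2), (2, 4), (-1, 3)]
print(satisfies_metric_axioms(dist, points))

# Squared Euclidean distance is used here only as a counterexample:
# it breaks the triangle inequality on collinear points.
def squared(x, y):
    return dist(x, y) ** 2

print(satisfies_metric_axioms(squared, [(0, 0), (1, 0), (2, 0)]))
```

The first call should report that no axiom is violated on the sample, while the second fails because going from (0, 0) to (2, 0) directly costs 4, but via (1, 0) it costs only 1 + 1.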


# <center> **Euclidean Distance** </center>

<center>In mathematics, the Euclidean distance between two points in Euclidean space is the length of the line segment between them. It can be calculated from the Cartesian coordinates of the points using the Pythagorean theorem: take the square root of the sum of the squared coordinate differences. For this reason it is occasionally called the Pythagorean distance.</center>


<p><center><img src="https://bigdata-madesimple.com/wp-content/uploads/2015/06/Five-most-popular-similarity-measures-implementation-in-python-1.png" width=600></center>



In [None]:
from math import sqrt

# Euclidean distance: square root of the sum of squared coordinate differences
uV = [2, 2]
uX = [2, 4]
squared_sum = 0
for x, y in zip(uV, uX):
  squared_sum += (x - y) ** 2
Euclidean_distance = sqrt(squared_sum)
print(Euclidean_distance)


2.0
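
The two sample points in the cell above differ in only one coordinate, so they cannot distinguish a correct Euclidean distance from a sum of absolute differences. The sketch below wraps the calculation in a reusable function and tries it on a 3-4-5 right triangle, where the two quantities differ. The function name `euclidean_distance` is an illustrative choice, and the cross-check uses `math.dist`, which is available in Python 3.8 and later.

```python
from math import dist, isclose, sqrt  # math.dist requires Python 3.8+

def euclidean_distance(p, q):
    """Square root of the summed squared coordinate differences."""
    if len(p) != len(q):
        raise ValueError("points must have the same number of coordinates")
    return sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

# A 3-4-5 right triangle: the distance is the hypotenuse.
print(euclidean_distance([0, 0], [3, 4]))   # 5.0
# The same points used in the cell above.
print(euclidean_distance([2, 2], [2, 4]))   # 2.0
# Cross-check against the standard library on a 3-D example.
print(isclose(euclidean_distance([1, 2, 3], [4, 6, 8]), dist([1, 2, 3], [4, 6, 8])))
```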


# <center> **Manhattan Distance** </center>

<center>The Manhattan distance is the distance between two points measured along axes at right angles. In a plane with p1 at (x1, y1) and p2 at (x2, y2), it is |x1 - x2| + |y1 - y2|.</center>

<p><center><img src="https://bigdata-madesimple.com/wp-content/uploads/2015/06/Five-most-popular-similarity-measures-implementation-in-python-2.png" width=600></center>

In [None]:
# Manhattan distance: sum of absolute coordinate differences
# (abs is a built-in, so no import is needed)
P1 = [2 , 2]
P2 = [2 , 4]
zipped = zip(P1,P2)
manhattan_sum = 0
for x,y in zipped:
  manhattan_sum += abs(x-y)
print(manhattan_sum)

2
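
As a small illustration of the formula |x1 - x2| + |y1 - y2|, the sketch below generalizes it to any number of coordinates. The function name `manhattan_distance` and the extra sample points are assumptions made for this example.

```python
def manhattan_distance(p, q):
    """Sum of absolute coordinate differences between two points."""
    if len(p) != len(q):
        raise ValueError("points must have the same number of coordinates")
    return sum(abs(a - b) for a, b in zip(p, q))

# Moving 3 units along one axis and 4 along the other costs 3 + 4 = 7,
# whereas the straight-line (Euclidean) distance between these points is 5.
print(manhattan_distance([0, 0], [3, 4]))        # 7
# The same points used in the cell above.
print(manhattan_distance([2, 2], [2, 4]))        # 2
# Works unchanged in higher dimensions.
print(manhattan_distance([1, 2, 3], [4, 6, 8]))  # 12
```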


# <center> **Minkowski Distance** </center>

<center>The Minkowski distance or Minkowski metric is a metric in a normed vector space which can be considered as a generalization of both the Euclidean distance and the Manhattan distance. It is named after the German mathematician Hermann Minkowski.</center>

<p><center><img src="https://bigdata-madesimple.com/wp-content/uploads/2015/06/Five-most-popular-similarity-measures-implementation-in-python-3.png" width=600></center>

Ref: <a href="https://github.com/jaimezorno/Distance-Metrics/blob/main/Distance%20Metrics.ipynb">Jaime Zornoza</a><br>

In [None]:
# Import the libraries used in the Minkowski examples
import math
import numpy as np
from decimal import Decimal

In [None]:
# Calculate the p-th root of a numeric value, rounded to 3 decimal places
def p_root(value, root):
  root_value = 1 / float(root)
  return round(Decimal(value) ** Decimal(root_value), 3)

In [None]:
square_root_3 = p_root(3, 2)
print(square_root_3)
square_root_4 = p_root(4, 2)
print(square_root_4)

1.732
2.000


In [None]:
# Function implementing the Minkowski distance
def minkowski_distance(x, y, p):
  # Sum |a - b|^p over all coordinate pairs, then take the p-th root
  return p_root(sum(pow(abs(a - b), p) for a, b in zip(x, y)), p)

In [None]:
# Our observations
a = np.array((1.1, 2.2, 3.3))
b = np.array((4.4, 5.5, 6.6))

In [None]:
from scipy.spatial import distance
# Calculate Manhattan Distance
p = 1
print("Manhattan Distance (p = 1)")
print(distance.minkowski(a, b, p))
# Calculate Euclidean Distance
p = 2
print("Euclidean Distance (p = 2)")
print(distance.minkowski(a, b, p))
# Calculate intermediate norm distance
p = 1.5
print("Intermediate norm Distance (p = 1.5)")
print(distance.minkowski(a, b, p))

Manhattan Distance (p = 1)
9.899999999999999
Euclidean Distance (p = 2)
5.715767664977295
Intermediate norm Distance (p = 1.5)
6.864276616071283
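
The claim that the Minkowski distance generalizes both the Manhattan and the Euclidean distance can be checked without any third-party library. The sketch below is a plain-float variant (it does not round through `Decimal` the way `p_root` does); the function name `minkowski` and the `p >= 1` guard are assumptions made for this example. It reuses the observations `a` and `b` from the cells above.

```python
from math import isclose

def minkowski(x, y, p):
    """Minkowski distance of order p, computed with plain floats."""
    if p < 1:
        raise ValueError("p must be >= 1 for the result to be a metric")
    return sum(abs(u - v) ** p for u, v in zip(x, y)) ** (1 / p)

a = (1.1, 2.2, 3.3)
b = (4.4, 5.5, 6.6)

# Reference values computed directly from the two special-case formulas.
manhattan = sum(abs(u - v) for u, v in zip(a, b))
euclidean = sum((u - v) ** 2 for u, v in zip(a, b)) ** 0.5

print(isclose(minkowski(a, b, 1), manhattan))   # p = 1 matches Manhattan
print(isclose(minkowski(a, b, 2), euclidean))   # p = 2 matches Euclidean
print(round(minkowski(a, b, 1.5), 3))           # intermediate order
```

Both comparisons should hold up to floating-point tolerance, and the p = 1.5 value should agree with the SciPy result printed above once rounded.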


# <center> **Cosine Similarity** </center>

<center>Cosine similarity is the normalized dot product of two vectors, which equals the cosine of the angle between them. The cosine of 0° is 1, and it is less than 1 for any other angle. It is thus a measure of orientation and not magnitude: two vectors with the same orientation have a cosine similarity of 1, two vectors at 90° have a similarity of 0, and two diametrically opposed vectors have a similarity of -1, independent of their magnitude. Cosine similarity is often used in positive space, where the outcome is neatly bounded in [0,1]. One reason for its popularity is that it is very efficient to evaluate, especially for sparse vectors.</center>

<p><center><img src="https://bigdata-madesimple.com/wp-content/uploads/2015/06/cosine-similarity.png" width=600></center>

In [None]:
from math import sqrt

def square_rooting(x):
  # Euclidean norm of x; left unrounded so the final result keeps full precision
  return sqrt(sum(a * a for a in x))

def cosine_similarity(x, y):
  numerator = sum(a * b for a, b in zip(x, y))
  denominator = square_rooting(x) * square_rooting(y)
  return round(numerator / denominator, 3)

print(cosine_similarity([3, 45, 7, 2], [2, 54, 13, 15]))

0.972
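
The three reference cases described above (same orientation, right angle, opposite orientation) are easy to reproduce. The sketch below is a self-contained variant with a guard for zero vectors; the name `cosine_sim` and the sample vectors are assumptions made for this example.

```python
from math import sqrt

def cosine_sim(x, y):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = sqrt(sum(a * a for a in x))
    norm_y = sqrt(sum(b * b for b in y))
    if norm_x == 0 or norm_y == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_x * norm_y)

# Same orientation, different magnitude.
print(round(cosine_sim([1, 2, 3], [2, 4, 6]), 3))   # 1.0
# Perpendicular vectors.
print(round(cosine_sim([1, 0], [0, 5]), 3))         # 0.0
# Diametrically opposed vectors.
print(round(cosine_sim([1, 2], [-3, -6]), 3))       # -1.0
```

Scaling either vector by a positive constant leaves every one of these results unchanged, which is what "orientation and not magnitude" means in practice.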

# <center> **Hamming Distance** </center>

<center>Hamming distance is a metric for comparing two binary data strings. When comparing two binary strings of equal length, the Hamming distance is the number of bit positions in which the two strings differ. The Hamming distance between two strings a and b is denoted d(a, b).</center>

<p><center><img src="https://rahimtech.com/wp-content/uploads/2021/03/Hamming-Distance.png" width=600></center>

In [None]:
import numpy as np

a = np.array((1, 0, 1, 0, 0, 1, 0))
b = np.array((0, 0, 1, 1, 0, 0, 1))
# XOR marks the positions where the bits differ; their count is the Hamming distance
differing_bits = np.bitwise_xor(a, b)
print(differing_bits)
print(differing_bits.sum())

[1 0 0 1 0 1 1]
4
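
The same count can be obtained without NumPy. The sketch below shows two standard-library variants: one for equal-length sequences such as strings, and one for non-negative integers treated as bit patterns. The function names and the sample string pair are illustrative assumptions; the bit strings are the arrays `a` and `b` from the cell above written out as text.

```python
def hamming_distance(s, t):
    """Count the positions at which two equal-length sequences differ."""
    if len(s) != len(t):
        raise ValueError("Hamming distance needs sequences of equal length")
    return sum(c1 != c2 for c1, c2 in zip(s, t))

def hamming_distance_int(m, n):
    """Count differing bits of two non-negative integers via XOR."""
    return bin(m ^ n).count("1")

# The bit patterns from the cell above, written as strings.
print(hamming_distance("1010010", "0011001"))        # 4
# The definition also works for ordinary text of equal length.
print(hamming_distance("karolin", "kathrin"))        # 3
# The same bit patterns as integer literals.
print(hamming_distance_int(0b1010010, 0b0011001))    # 4
```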
