<img src="Images/atom.png" alt="Atom" style="width:60px" align="left" vertical-align="middle">

## 1. Representing Points
*in Machine Learning*

----
In this lesson, you will learn three different ways to define the distance between two points:
1. Euclidean Distance
2. Manhattan Distance
3. Hamming Distance

<br/>Before diving into the distance formulas, it is first important to consider how to represent points in your code.

<br/>In this exercise, we will use a list, where each item in the list represents a dimension of the point. For example, the point (5, 8) could be represented in Python like this:

In [1]:
pt1 = [5, 8]

Points aren’t limited to just two dimensions. For example, a five-dimensional point could be represented as `[4, 8, 15, 16, 23]`.

<br/>Ultimately, we want to find the distance between two points. We’ll be writing functions that look like this:

In [None]:
distance([1, 2, 3], [5, 8, 9])

*Exercise:*
<br/>Create a new point named `four_d` that has four dimensions.

In [2]:
two_d = [10, 2]
four_d = [1, 2, 5, 90]
five_d = [30, -1, 50, 0, 2]

<img src="Images/atom.png" alt="Atom" style="width:60px" align="left" vertical-align="middle">

## 2. Euclidean Distance
*in Machine Learning*

----
Euclidean distance is the most commonly used distance formula. To find the Euclidean distance between two points, we first calculate the squared difference between the points in each dimension. Adding up all of these squared differences and taking the square root of the sum gives the Euclidean distance.

<br/>Let’s take a look at the equation that represents what we just learned:
<center>$ \sqrt{(a_1-b_1)^2+(a_2-b_2)^2+...+(a_n-b_n)^2} $</center>

<br/>The image below shows a visual of Euclidean distance being calculated:
<img src="Images/Euclidean_Distance_Formula_1.png" style="width:500px">

<br/>*Exercise:*
1. Create a function named `euclidean_distance()` that takes two lists as parameters named `pt1` and `pt2`. In the function, create a variable named `distance`, set it equal to `0`, and return `distance`.

In [3]:
def euclidean_distance(pt1, pt2):
    distance = 0
    return distance

2. After defining `distance`, create a `for` loop to loop through the dimensions of each point. Add the squared difference in each dimension to `distance`. Remember, in Python, you can square the variable `num` by using `num ** 2`.

In [4]:
def euclidean_distance(pt1, pt2):
    distance = 0
    for a, b in zip(pt1, pt2):
        distance += (a - b) ** 2
    return distance

3. Outside of the `for` loop, take the square root of `distance` and return that value.

In [5]:
def euclidean_distance(pt1, pt2):
    distance = 0
    for a, b in zip(pt1, pt2):
        distance += (a - b) ** 2
    return distance ** 0.5

4. Print the Euclidean distance between `[1, 2]` and `[4, 0]`. Add another print statement which shows the Euclidean distance between `[5, 4, 3]` and `[1, 7, 9]`.

In [6]:
print(euclidean_distance([1, 2], [4, 0]))
print(euclidean_distance([5, 4, 3], [1, 7, 9]))

3.605551275463989
7.810249675906654
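
As a cross-check, the standard library's `math.dist` (available from Python 3.8) computes the same Euclidean distance. The sketch below also adds a length check, since `zip` silently stops at the shorter list when the two points have different numbers of dimensions. The name `checked_euclidean_distance` is hypothetical and not part of the lesson.

```python
import math

def checked_euclidean_distance(pt1, pt2):
    # Hypothetical helper: refuse points with different numbers of dimensions,
    # because zip() would otherwise ignore the extra values.
    if len(pt1) != len(pt2):
        raise ValueError("points must have the same number of dimensions")
    # Sum the squared differences, then take the square root.
    return sum((a - b) ** 2 for a, b in zip(pt1, pt2)) ** 0.5

# Compare against math.dist on the first pair of points from the exercise.
ours = checked_euclidean_distance([1, 2], [4, 0])
reference = math.dist([1, 2], [4, 0])
print(math.isclose(ours, reference))
```

Using `math.isclose` rather than `==` avoids depending on the last digit of a floating-point result.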


<img src="Images/atom.png" alt="Atom" style="width:60px" align="left" vertical-align="middle">

## 3. Manhattan Distance
*in Machine Learning*

----
Manhattan distance is closely related to Euclidean distance. Rather than summing the squared differences in each dimension and taking a square root, we sum the absolute values of the differences in each dimension. It’s called Manhattan distance because it’s similar to how you might navigate when walking city blocks. If you’ve ever wondered “how many blocks will it take me to get from point A to point B”, you’ve computed the Manhattan distance.

<br/>The equation is shown below:
<center>$ |a_1-b_1|+|a_2-b_2|+...+|a_n-b_n| $</center>

<br/>Note that, for any pair of points, Manhattan distance is always greater than or equal to Euclidean distance; the two are equal only when the points differ in at most one dimension. Take a look at the image below visualizing Manhattan distance:
<img src="Images/Manhattan_Distance_Formula_1.png" style="width:500px">

<br/>*Exercise:*
1. Create a function called `manhattan_distance()` that takes two lists named `pt1` and `pt2` as parameters. In the function, create a variable named `distance`, set it equal to `0`, and return it.

In [7]:
def manhattan_distance(pt1, pt2):
    distance = 0
    return distance

2. After defining `distance`, create a `for` loop to loop through the dimensions of each point. Add the absolute value of the difference in each dimension to `distance`. Remember, in Python, you can take the absolute value of `num` by using `abs(num)`.

In [8]:
def manhattan_distance(pt1, pt2):
    distance = 0
    for a, b in zip(pt1, pt2):
        distance += abs(a - b)
    return distance

3. You’re done with `manhattan_distance()`! Go ahead and find the Manhattan distance between the same points as last time. Below the print statements for Euclidean distance, print the Manhattan distance between `[1, 2]` and `[4, 0]`. Also print the Manhattan distance between `[5, 4, 3]` and `[1, 7, 9]`.

In [9]:
print(manhattan_distance([1, 2], [4, 0]))
print(manhattan_distance([5, 4, 3], [1, 7, 9]))

5
13
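
The earlier note that Manhattan distance is never smaller than Euclidean distance can be checked on the two pairs of points used in the exercises. The sketch below redefines both functions in compact form so that it runs on its own; it is an illustration on two examples, not a proof.

```python
def euclidean_distance(pt1, pt2):
    # Square root of the sum of squared differences.
    return sum((a - b) ** 2 for a, b in zip(pt1, pt2)) ** 0.5

def manhattan_distance(pt1, pt2):
    # Sum of absolute differences.
    return sum(abs(a - b) for a, b in zip(pt1, pt2))

pairs = [([1, 2], [4, 0]), ([5, 4, 3], [1, 7, 9])]
for pt1, pt2 in pairs:
    m = manhattan_distance(pt1, pt2)
    e = euclidean_distance(pt1, pt2)
    # The last value printed is True when Manhattan >= Euclidean.
    print(m, e, m >= e)
```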


<img src="Images/atom.png" alt="Atom" style="width:60px" align="left" vertical-align="middle">

## 4. Hamming Distance
*in Machine Learning*

----
Hamming distance is another variation on the distance formula. Instead of measuring how far apart the values are in each dimension, Hamming distance only cares about whether the values are exactly equal. When finding the Hamming distance between two points, add one for every dimension in which the two points have different values.

<br/>Hamming distance can be used in spell checking algorithms. For example, the Hamming distance between the word “there” and the typo “thete” is one. Each letter is a dimension, and every dimension has the same value except one. Note that this comparison only makes sense for words of the same length.

<br/>*Exercise:*
1. Define a function named `hamming_distance()` and have two parameters named `pt1` and `pt2`. Create a variable named `distance`, have it start at `0`, and return it.

In [10]:
def hamming_distance(pt1, pt2):
    distance = 0
    return distance

2. After defining `distance`, create a `for` loop to loop through the dimensions of each point. If the values at a given dimension are different, add `1` to `distance`.

In [11]:
def hamming_distance(pt1, pt2):
    distance = 0
    for a, b in zip(pt1, pt2):
        if a != b:
            distance += 1
    return distance

3. `hamming_distance()` is done as well! Print the Hamming distance between `[1, 2]` and `[1, 100]`. Print the Hamming distance between `[5, 4, 9]` and `[1, 7, 9]`.

In [12]:
print(hamming_distance([1, 2], [1, 100]))
print(hamming_distance([5, 4, 9], [1, 7, 9]))

1
2
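
Because `zip` iterates over any sequence, the same function also works on strings, treating each letter as a dimension. The sketch below repeats the function from the exercise and applies it to the spell-checking example mentioned earlier, assuming both words have the same length.

```python
def hamming_distance(pt1, pt2):
    distance = 0
    # Count the positions at which the two sequences differ.
    for a, b in zip(pt1, pt2):
        if a != b:
            distance += 1
    return distance

# "there" and "thete" differ only in the fourth letter.
print(hamming_distance("there", "thete"))  # prints 1
```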


*Hamming Distance: Uses*
- If an application only needs the number of differences, the implementation presented earlier in this lesson is preferable: `HammingDistance = # of differences`.
- Sometimes, however, we want the proportion of dimensions in which one data item differs from another. In this case, SciPy's implementation, `HammingDistance = (# of differences)/(# of dimensions)`, is more helpful.
- It’s simple to switch between the two implementations by multiplying or dividing by the number of dimensions, so we don’t need to pick one over the other. We simply need to be aware of which version is implemented in the software that we’re using.
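
The conversion between the two versions can be sketched in plain Python. The names `hamming_count` and `hamming_proportion` are hypothetical, and the sketch assumes both points are non-empty and have the same number of dimensions.

```python
def hamming_count(pt1, pt2):
    # Number of dimensions in which the two points differ.
    return sum(1 for a, b in zip(pt1, pt2) if a != b)

def hamming_proportion(pt1, pt2):
    # Count divided by the number of dimensions (the normalized version).
    return hamming_count(pt1, pt2) / len(pt1)

pt1, pt2 = [5, 4, 9], [1, 7, 9]
count = hamming_count(pt1, pt2)
proportion = hamming_proportion(pt1, pt2)
print(count, proportion)

# Multiplying by the number of dimensions recovers the count;
# round() guards against floating-point error.
print(round(proportion * len(pt1)) == count)
```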

<img src="Images/atom.png" alt="Atom" style="width:60px" align="left" vertical-align="middle">

## 5. SciPy Distances
*in Python*

----
Now that you’ve written these three distance formulas yourself, let’s look at how to compute them with Python’s SciPy library:
- Euclidean Distance `.euclidean()`
- Manhattan Distance `.cityblock()`
- Hamming Distance `.hamming()`

<br/>There are a few noteworthy details to talk about:

<br/>First, the `scipy` implementation of Manhattan distance is called `cityblock()`. Remember, computing Manhattan distance is like asking how many blocks away you are from a point.

<br/>Second, the `scipy` implementation of Hamming distance always returns a number between `0` and `1`. Rather than returning the count of dimensions that differ, this implementation divides that count by the total number of dimensions. For example, in your implementation, the Hamming distance between `[1, 2, 3]` and `[7, 2, -10]` would be `2`. In `scipy`’s version, it would be `2/3`.

<br/>*Exercise:*
1. Call `distance.euclidean()` using the points `[1, 2]` and `[4, 0]` as parameters. Print the result.

In [17]:
from scipy.spatial import distance
print(distance.euclidean([1, 2], [4, 0]))

3.605551275463989


2. Call `distance.cityblock()` using the points `[1, 2]` and `[4, 0]` as parameters. Print the result.

In [18]:
print(distance.cityblock([1, 2], [4, 0]))

5


3. Call `distance.hamming()` using `[5, 4, 9]` and `[1, 7, 9]` as parameters and print the results. Your answer *shouldn’t* match your function’s results. Remember, `scipy` divides by the number of dimensions.

In [19]:
print(distance.hamming([5, 4, 9], [1, 7, 9]))

0.6666666666666666
