
# Distance Calculations



## Euclidean Distance (L2 Norm)

We start with the most common distance measure, Euclidean distance. It is best explained as the length of the straight line segment connecting two points.

The formula is straightforward: the distance is calculated from the Cartesian coordinates of the points using the Pythagorean theorem.

$$D(x,y) = \sqrt{ \sum_{i=1}^{n}(x_i-y_i)^2}$$
<br>
<br>
Example:
$$d = \sqrt{ (x_2-x_1)^2 + (y_2-y_1)^2}$$
<br>
$$d = \textrm{distance}$$
$$(x_1, y_1) = \textrm{coordinates of the first point}$$
$$(x_2, y_2) = \textrm{coordinates of the second point}$$
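
As a minimal sketch of the formula above, the function below computes the Euclidean distance for points of any dimension using only the standard library. The function name `euclidean_distance` and the sample points are illustrative choices, not something defined elsewhere in this notebook.

```python
import math


def euclidean_distance(x, y):
    """Euclidean (L2) distance between two equal-length sequences of numbers."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same number of coordinates")
    # Sum the squared coordinate differences, then take the square root.
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))


# Illustrative points: the differences are 3 and 4, a 3-4-5 right triangle.
p1 = (1.0, 2.0)
p2 = (4.0, 6.0)
print(euclidean_distance(p1, p2))  # 5.0
```

On Python 3.8 or later, `math.dist(p1, p2)` should give the same result without a hand-written function.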

## Manhattan Distance

The Manhattan distance, often called Taxicab distance or City Block distance, calculates the distance between real-valued vectors. Imagine vectors that describe objects on a uniform grid such as a chessboard. Manhattan distance is then the distance between two vectors if you could only move along the grid lines, turning at right angles. There is no diagonal movement involved in calculating the distance.

**Disadvantages**<br>
Although Manhattan distance seems to work reasonably well for high-dimensional data, it is somewhat less intuitive than Euclidean distance, especially in high dimensions.

Moreover, it is never smaller than the Euclidean distance between the same two points, since it does not take the shortest path possible. This does not necessarily cause issues, but it is something you should take into account.

**Use Cases**<br>
When your dataset has discrete and/or binary attributes, Manhattan distance seems to work quite well, since it takes into account the paths that could realistically be taken within the values of those attributes. Euclidean distance, by contrast, would draw a straight line between two vectors even when such a path is not actually possible.

$$D(x,y) = \sum_{i=1}^{k}|x_i-y_i|$$

Example:
$$d = {|x_2-x_1| + |y_2-y_1|}$$
$$d = \textrm{distance}$$
$$(x_1, y_1) = \textrm{coordinates of the first point}$$
$$(x_2, y_2) = \textrm{coordinates of the second point}$$
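
The two-dimensional example above generalizes to any number of coordinates by summing the absolute differences. The sketch below assumes plain Python sequences as input; `manhattan_distance` and the sample points are hypothetical names chosen for illustration.

```python
import math


def manhattan_distance(x, y):
    """Manhattan (L1) distance: the sum of absolute coordinate differences."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same number of coordinates")
    return sum(abs(xi - yi) for xi, yi in zip(x, y))


p1 = (1, 2)
p2 = (4, 6)
print(manhattan_distance(p1, p2))  # 7

# The grid path is never shorter than the straight line between the points
# (math.dist requires Python 3.8+).
print(manhattan_distance(p1, p2) >= math.dist(p1, p2))  # True
```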

## Minkowski Distance
Minkowski distance is a somewhat more intricate measure than most. It is a metric on a normed vector space (here, n-dimensional real space), that is, a space in which every vector has a well-defined length.

This measure has three requirements:

Zero Vector: The zero vector has a length of zero, whereas every other vector has a positive length. For example, if we travel from one place to another, then that distance is always positive. However, if we travel from one place to itself, then that distance is zero. <br>
Scalar Factor: When you multiply a vector by a positive number, its length is scaled by that number while its direction stays the same. For example, if we go a certain distance in one direction and then go the same distance again, the total distance doubles but the direction does not change. <br>
Triangle Inequality: The shortest distance between two points is a straight line, so going from one point to another via a third point is never shorter than going directly. <br>
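
For reference, the three requirements can be written in symbols. The notation is introduced here for illustration and is not used elsewhere in this notebook: $\|x\|$ is the length (norm) of a vector $x$, $y$ is a second vector, and $\alpha$ is a real scalar. Note that the standard scalar rule covers negative $\alpha$ as well, through the absolute value.

$$\|x\| \ge 0, \quad \|x\| = 0 \iff x = 0$$
$$\|\alpha x\| = |\alpha| \, \|x\|$$
$$\|x + y\| \le \|x\| + \|y\|$$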

The formula for the Minkowski distance is shown below:

$$D(x,y) = \left({\sum_{i=1}^{k}|x_i-y_i|^{p}}\right)^{\dfrac{1}{p}}$$

The most interesting aspect of this distance measure is the parameter $p$. By changing this parameter, we can make the Minkowski distance coincide with other distance metrics.

Common values of $p$ are:

$p = 1$: Manhattan distance <br>
$p = 2$: Euclidean distance <br>
$p = \infty$: Chebyshev distance <br>
**Disadvantages**<br>
Minkowski distance has the same disadvantages as the distance measures it represents, so a good understanding of metrics like Manhattan, Euclidean, and Chebyshev distance is extremely important.

Moreover, the parameter $p$ can be troublesome to work with, as searching for the right value can be computationally expensive depending on your use case.

**Use Cases**<br>
The upside of $p$ is that you can iterate over it and find the distance measure that works best for your use case. It gives you a great deal of flexibility over your distance metric, which can be a real benefit if you are familiar with $p$ and with the distance measures it covers.
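
The effect of $p$ can be sketched by evaluating the Minkowski formula on the same pair of points for several values. This is a minimal illustration under stated assumptions: the function name `minkowski_distance` is hypothetical, $p = \infty$ is handled as the largest absolute difference (the Chebyshev distance), and values of $p$ below 1 are rejected because the triangle inequality no longer holds for them.

```python
import math


def minkowski_distance(x, y, p):
    """Minkowski distance of order p (p >= 1, or math.inf for Chebyshev)."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same number of coordinates")
    if p < 1:
        raise ValueError("p must be at least 1 for the result to be a metric")
    diffs = [abs(xi - yi) for xi, yi in zip(x, y)]
    if math.isinf(p):
        # Limit of the formula as p grows: the largest coordinate difference.
        return max(diffs, default=0.0)
    return sum(d ** p for d in diffs) ** (1.0 / p)


x = (1.0, 2.0)
y = (4.0, 6.0)

# Iterate over p to compare the resulting distances for one pair of points.
for p in (1, 2, 3, math.inf):
    print(p, minkowski_distance(x, y, p))
```

For these points the coordinate differences are 3 and 4, so $p = 1$ gives the Manhattan distance 7, $p = 2$ gives the Euclidean distance 5, and $p = \infty$ gives the Chebyshev distance 4. The distance shrinks as $p$ increases.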

## Cosine Similarity
Cosine similarity has often been used as a way to counteract Euclidean distance’s problem with high dimensionality. The cosine similarity is simply the cosine of the angle between two vectors. Equivalently, it is the inner product of the two vectors after both have been normalized to length one.

Two vectors with exactly the same orientation have a cosine similarity of 1, whereas two vectors diametrically opposed to each other have a similarity of -1. Note that their magnitude is not of importance as this is a measure of orientation.

$$D(x,y) = \cos(\theta) = \dfrac{x \cdot y}{\|x\| \|y\|}$$
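
The formula can be sketched in a few lines of standard-library Python. The function name `cosine_similarity` and the sample vectors are illustrative assumptions. The zero-vector check is also an added choice, since the formula is undefined when either vector has length zero.

```python
import math


def cosine_similarity(x, y):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same number of coordinates")
    dot = sum(xi * yi for xi, yi in zip(x, y))
    norm_x = math.sqrt(sum(xi * xi for xi in x))
    norm_y = math.sqrt(sum(yi * yi for yi in y))
    if norm_x == 0 or norm_y == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_x * norm_y)


a = (1.0, 2.0, 3.0)
same_direction = (2.0, 4.0, 6.0)   # a scaled by 2: magnitude differs, direction does not
opposite = (-1.0, -2.0, -3.0)      # a reversed

print(cosine_similarity(a, same_direction))       # approximately 1
print(cosine_similarity(a, opposite))             # approximately -1
print(cosine_similarity((1.0, 0.0), (0.0, 1.0)))  # 0.0 (perpendicular vectors)
```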

**Disadvantages**

One main disadvantage of cosine similarity is that the magnitude of vectors is not taken into account, merely their direction. In practice, this means that differences in values are not fully taken into account. In a recommender system, for example, cosine similarity does not account for the difference in rating scale between different users.

**Use Cases**

We often use cosine similarity when we have high-dimensional data and when the magnitude of the vectors is not important. In text analysis, this measure is used quite frequently when the data is represented by word counts. For example, when a word occurs more frequently in one document than in another, this does not necessarily mean that the first document is more related to that word. The documents may simply have uneven lengths, in which case the magnitude of the count matters less. In such cases cosine similarity is a good choice, because it disregards magnitude.
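
To illustrate the word-count case, the sketch below compares a short document with a much longer one that uses the same words in the same proportions. The toy documents, whitespace tokenization, and the function name `cosine_similarity_counts` are all assumptions made for this example.

```python
import math
from collections import Counter


def cosine_similarity_counts(a, b):
    """Cosine similarity between two word-count Counters."""
    # A Counter returns 0 for words it has not seen, so the union is safe.
    vocabulary = set(a) | set(b)
    dot = sum(a[word] * b[word] for word in vocabulary)
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for an empty document")
    return dot / (norm_a * norm_b)


short_doc = Counter("cat sat mat cat".split())
long_doc = Counter(("cat sat mat cat " * 10).split())  # same text repeated 10 times
other_doc = Counter("stock market price stock".split())

print(cosine_similarity_counts(short_doc, long_doc))   # approximately 1
print(cosine_similarity_counts(short_doc, other_doc))  # 0.0 (no shared words)
```

The raw counts in `long_doc` are ten times larger than in `short_doc`, so a magnitude-sensitive measure such as Euclidean distance would place the two far apart, while cosine similarity treats them as pointing in the same direction.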