SVM (Support Vector Machines) is a class of supervised models that all share one idea: minimize a cost function by adjusting a hyperplane, which gives a way to quantify how well the classes are separated. SVM includes:
* SVC (Classifier)
* SVR (Regressor)
* [Ranking SVM](https://en.wikipedia.org/wiki/Ranking_SVM)

All of them work **almost the same way**, apart from slight differences in the loss.

Before diving into the logic behind SVM, it's important to introduce the prerequisite math concepts:

#### Norm of a vector [$||X||$]
> The norm is the formal name for the **length of a vector**. To calculate it, the Euclidean norm is used: $||X||=\sqrt{\sum_{i=1}^N x_i^2}$
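
As a quick illustration of the formula above, here is a minimal sketch in plain Python. The function name `euclidean_norm` and the sample vector are made up for this example.

```python
import math

def euclidean_norm(x):
    """Return the Euclidean norm (length) of a vector given as a sequence of numbers."""
    # Square every component, sum the squares, take the square root.
    return math.sqrt(sum(x_i ** 2 for x_i in x))

v = [3.0, 4.0]            # hypothetical 2D vector
print(euclidean_norm(v))  # 5.0
```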

#### Unit vector [$\hat{X}$]
> A unit vector describes the **direction of a vector**. In the SVM problem the unit vector that describes the direction will be denoted as **W**. To calculate it, we divide each component (feature) by the vector's norm: $\hat{X}=\frac{X}{||X||}$. The **norm** of $\hat{X}$ is always **equal to $1$**.
>
> We can also calculate $\hat{X}$ using **$cos$**. If we have a vector with only two features $(x_1, x_2)$, then $cos(\beta)=\frac{x_1}{||x||}$ and $cos(\alpha)=\frac{x_2}{||x||}$, where $\beta$ is the angle between the vector and the $x_1$ axis and $\alpha$ is the angle between the vector and the $x_2$ axis. Therefore the unit vector equals $\hat{X}=(cos(\beta), cos(\alpha))$, as seen in the picture:
> <p align="center"> <img src="./media/unit_vector_as_cos.png" alt="unit vector as cos"/> </p>
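
The two ways of getting the unit vector can be compared with a small sketch in plain Python. The helper name `unit_vector` and the sample vector are assumptions made for this example, and the angles are recovered with `math.atan2` only to check the cosine form.

```python
import math

def unit_vector(x):
    """Divide every component of x by the Euclidean norm of x."""
    norm = math.sqrt(sum(x_i ** 2 for x_i in x))
    if norm == 0:
        raise ValueError("the zero vector has no direction")
    return [x_i / norm for x_i in x]

x = [3.0, 4.0]                 # hypothetical 2D vector
x_hat = unit_vector(x)
print(x_hat)                   # [0.6, 0.8]

# Same result through the angles: beta is measured from the x_1 axis,
# alpha from the x_2 axis, so alpha = pi/2 - beta in 2D.
beta = math.atan2(x[1], x[0])
alpha = math.pi / 2 - beta
print(math.isclose(math.cos(beta), x_hat[0]))   # True
print(math.isclose(math.cos(alpha), x_hat[1]))  # True
```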

#### Dot product [$\cdot$]
> The dot product is a **scalar** (a single real number $x\in\mathbb{R}$, as opposed to a vector or a matrix), defined by the following formula: $x\cdot y = ||x||\times||y||\times cos(\theta)$, where $\theta$ is the angle between the two vectors. It describes the **relation between vectors**: for fixed norms, the smaller the angle between them ($\theta\rightarrow 0$), that is, the closer their directions, the higher the dot product. It is also the way vectors are **multiplied** in the SVM problem:
> <p align="center"> <img src="./media/dot_product.png" alt="dot product"/> </p>
>
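> The **algebraic formula** for the dot product, which is **more commonly used**, is: $x\cdot y = \sum_{i=1}^N x_iy_i$. Here's how it is derived for the two-dimensional case:
> * From the definition of the unit vector we remember that the coordinates of a vector can be expressed through its angle. Let $\alpha$ be the angle between $x$ and the horizontal axis and $\beta$ the angle between $y$ and the horizontal axis, so that $cos(\alpha)=\frac{x_1}{||x||}$, $sin(\alpha)=\frac{x_2}{||x||}$, $cos(\beta)=\frac{y_1}{||y||}$ and $sin(\beta)=\frac{y_2}{||y||}$. The angle between the two vectors is then the difference $\theta=\alpha-\beta$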
> * $$
>   cos(\theta)=cos(\alpha-\beta)=cos(\alpha)cos(\beta)+sin(\alpha)sin(\beta) \\
>   =\frac{x_1}{||x||}\frac{y_1}{||y||} + \frac{x_2}{||x||}\frac{y_2}{||y||} \\
>   = \frac{x_1y_1+x_2y_2}{||x||\times||y||}
>   $$
> * Therefore, we derive the **algebraic formula**: $x\cdot y=\frac{x_1y_1+x_2y_2}{||x||\times||y||}\times ||x||\times||y|| = x_1y_1+x_2y_2$ 
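
To see that the geometric and the algebraic formulas agree, here is a minimal numeric check in plain Python. The helper names `dot` and `norm` and the two sample vectors are made up for this sketch, and the angle $\theta$ is computed as the difference of the two vectors' angles, as in the derivation.

```python
import math

def dot(x, y):
    """Algebraic dot product: sum of the pairwise products of the components."""
    return sum(x_i * y_i for x_i, y_i in zip(x, y))

def norm(x):
    """Euclidean norm, expressed through the dot product of x with itself."""
    return math.sqrt(dot(x, x))

x = [3.0, 4.0]  # hypothetical vectors
y = [4.0, 3.0]

# theta = alpha - beta, where alpha and beta are the angles of x and y
# measured from the horizontal axis.
theta = math.atan2(x[1], x[0]) - math.atan2(y[1], y[0])

geometric = norm(x) * norm(y) * math.cos(theta)
algebraic = dot(x, y)

print(algebraic)                           # 24.0
print(math.isclose(geometric, algebraic))  # True
```

The geometric value is compared with `math.isclose` rather than `==` because the trigonometric route accumulates floating-point rounding.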


#### Hyperplane
> A hyperplane is a flat surface of dimension $N-1$ in an $N$-dimensional space (a line in 2D, a plane in 3D). The **equation for the hyperplane** in the SVM problem is defined as: $W\cdot X + b = 0$. Here's how we get it:
> * For the simpler 2D case it is defined by the function $y=ax+b$
> * For clarity, we denote $x$ as $x_1$ and $y$ as $x_2$ and move everything to one side, which gives: $ax_1-x_2+b=0$.
> * In the SVM problem we define $X=(x_1, x_2)$ and $W=(a,-1)$ to write the previous equation in vector form: since $W\cdot X = ax_1-x_2$, we get the hyperplane equation defined above
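
The steps above can be sketched in plain Python. The line $y=2x+1$, the helper name `hyperplane_value`, and the sample points are all assumptions chosen for this illustration.

```python
def hyperplane_value(w, x, b):
    """Evaluate W . X + b; the result is 0 exactly when X lies on the hyperplane."""
    return sum(w_i * x_i for w_i, x_i in zip(w, x)) + b

# Hypothetical line y = 2x + 1, rewritten as a*x_1 - x_2 + b = 0.
a, b = 2.0, 1.0
w = [a, -1.0]           # W = (a, -1)

on_line = [1.0, 3.0]    # satisfies x_2 = 2*x_1 + 1
off_line = [1.0, 0.0]   # does not satisfy it

print(hyperplane_value(w, on_line, b))   # 0.0
print(hyperplane_value(w, off_line, b))  # 3.0
```

A non-zero result means the point is off the hyperplane, and its sign tells on which side the point lies.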

#### Classifier
> In the SVM problem we use the **hyperplane** to make predictions. The hypothesis function $h$, which in practice is the classifier, is defined as:
> $$
> h\left(x_i\right)= \begin{cases}+1 & \text { if } w \cdot x_i+b \geq 0 \\ -1 & \text { if } w \cdot x_i+b<0\end{cases}
> $$
> It says that every point **on or above** the hyperplane (where $w\cdot x+b\geq 0$) is labeled $+1$, and every point **below** it (where $w\cdot x+b<0$) is labeled $-1$. This is exactly the **idea behind SVM** models: to find a hyperplane that separates the data accurately.
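
The hypothesis function can be written as a minimal sketch in plain Python. The function name `predict` and the values of `w`, `b`, and the sample points are made up for this example. No training happens here; the hyperplane is simply assumed to be given.

```python
def predict(w, b, x):
    """Hypothesis function h: +1 if w . x + b >= 0, otherwise -1."""
    score = sum(w_i * x_i for w_i, x_i in zip(w, x)) + b
    return 1 if score >= 0 else -1

# Hypothetical hyperplane x_1 + x_2 - 3 = 0.
w = [1.0, 1.0]
b = -3.0

print(predict(w, b, [2.0, 2.0]))  # 1   (score is  1.0)
print(predict(w, b, [1.0, 1.0]))  # -1  (score is -1.0)
print(predict(w, b, [1.0, 2.0]))  # 1   (score is  0.0, the point is on the hyperplane)
```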