# Support Vector Machine

What's a good decision boundary for classification?

When two classes are linearly separable, there is an infinite number of potential decision boundaries ("hyperplanes") that separate them.

## Maximum Margin Classifier


<div>
<img src="files/SVM_1.webp" width="55%" source='https://medium.com/@senihberkay/svm-support-vector-machines-77c0e11f75f2' align='center'/>
</div>

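- The hyperplane (a line in 2D) expected to generalize best to unseen data is the one furthest from the closest points of each class, i.e. the one that maximizes the margin.
- The points lying on the margin boundary are called support vectors.
- Finding this hyperplane is a convex optimization problem, so there is one single best solution.
- This is the **Maximum Margin Classifier**: it tries to find the largest possible margin.
- A fitted SVM is very light to store in comparison to KNN: it only keeps the support vectors, whereas KNN keeps the whole training set.
- Features (X1, X2, X3, etc.) need to be scaled, because the margin is a distance and would otherwise be dominated by the features with the largest range.
- Only the support vectors determine where the margin is drawn; the other datapoints do not affect it.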
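The points above can be illustrated with a minimal sketch. The toy data, the random seed and the very large `C` (used here to approximate a hard margin) are assumptions made for the example, not part of the original notes.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Hypothetical toy data: two well-separated clusters in 2D
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(-3, 0.5, size=(20, 2)),  # class 0
    rng.normal(3, 0.5, size=(20, 2)),   # class 1
])
y = np.array([0] * 20 + [1] * 20)

# Scale the features, then fit a linear SVM.
# A very large C approximates a maximum (hard) margin classifier.
model = make_pipeline(StandardScaler(), SVC(kernel="linear", C=1e6))
model.fit(X, y)

fitted_svc = model.named_steps["svc"]
print("support vectors per class:", fitted_svc.n_support_)
print("support vectors (in scaled feature space):")
print(fitted_svc.support_vectors_)
```

Only a handful of the 40 training points should come back as support vectors, which is why the fitted model is cheap to store.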

## Soft Margin Classifier

A Maximum Margin Classifier can very easily overfit and is very sensitive to outliers, which is why we use the **"Soft Margin Classifier"** instead. The Maximum Margin Classifier is almost never used in practice.

With the Soft Margin Classifier, you tolerate a few points inside the margin, or even on the wrong side of the decision boundary.

<div>
<img src="files/SVM_outliers.jpg" width="55%" source='https://roboticsbiz.com/pros-and-cons-of-support-vector-machine-svm/' align='center'/>
</div>

### Hinge Loss

The **Hinge Loss** is the penalty applied to each point on the wrong side of its class's margin.

- The further a point lies past its margin, the higher the loss.
- The penalty grows linearly with that distance, like MAE.

<div>
<img src="files/hinge_loss.png" width="45%" source='https://iq.opengenus.org/hinge-loss-for-svm/' align='center'/>
</div>

When you're on the "good side" of the margin, the penalty is 0. Otherwise, when you cross the margin of your class, the penalty starts to increase.
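As a small worked example, the hinge loss is commonly written as $\max(0, 1 - y \cdot f(x))$, with labels $y \in \{-1, +1\}$ and $f(x)$ the signed score of the classifier. The function name `hinge_loss` and the sample scores below are illustrative assumptions.

```python
import numpy as np

def hinge_loss(y_true, scores):
    """Per-sample hinge loss max(0, 1 - y * f(x)), with labels in {-1, +1}."""
    return np.maximum(0.0, 1.0 - y_true * scores)

# Four points of the positive class with decreasing scores:
# well beyond the margin, exactly on the margin,
# inside the margin, and on the wrong side of the boundary.
y_true = np.array([1, 1, 1, 1])
scores = np.array([2.0, 1.0, 0.5, -1.0])

losses = hinge_loss(y_true, scores)
print(losses.tolist())  # [0.0, 0.0, 0.5, 2.0]
```

The first two points are on the "good side" and cost nothing; the last two are penalized in proportion to how far past the margin they are.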

- How strong should the penalty be for wrongly classified datapoints?
- How steep should the hinge loss be?
- How narrow should the margin be?

Once again, it's a tradeoff between classifying training data well and generalizing to new data.

## Regularization with hyperparameter C

$C$ sets the strength of the penalty applied to points located on the wrong side of the margin.

- The higher $C$, the stricter the margin. A "Maximum Margin Classifier" has $C = +\infty$.
- The smaller $C$, the softer the margin, the more it is regularized. $C$ is similar to $\frac{1}{\alpha}$ in Ridge.
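One way to see the effect of $C$ is to count the support vectors for several values. This is only a sketch: the overlapping toy data, the seed and the chosen values of `C` are assumptions made for the example.

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical toy data: two overlapping clusters in 2D
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(-1, 1, size=(100, 2)),  # class 0
    rng.normal(1, 1, size=(100, 2)),   # class 1
])
y = np.array([0] * 100 + [1] * 100)

# Fit the same linear SVM with a soft, a medium and a strict margin
n_support = {}
for C in [0.001, 1, 100]:
    clf = SVC(kernel="linear", C=C).fit(X, y)
    n_support[C] = len(clf.support_vectors_)
    print(f"C={C}: {n_support[C]} support vectors")
```

With a small $C$ the margin is wide, so many points fall inside it and become support vectors; with a large $C$ the margin narrows and fewer points are involved.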

<div>
<img src="files/SVM_C.webp" width="75%" source='https://stackabuse.com/understanding-svm-hyperparameters/' align='center'/>
</div>


In [None]:
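from sklearn.svm import SVC

# Linear soft-margin SVM; a larger C means a stricter margin
svc = SVC(kernel='linear', C=10)

# Comparable linear SVM trained with an SGD solver.
# Not strictly equivalent: SGDClassifier averages the loss over the samples,
# so alpha corresponds to 1 / (C * n_samples) rather than 1 / C.
from sklearn.linear_model import SGDClassifier
svc_bis = SGDClassifier(loss='hinge', penalty='l2', alpha=1/10)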
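The two estimators above can be compared on data. In the sketch below, the toy dataset, the seeds and the variable names are assumptions. It sets `alpha = 1 / (C * n_samples)`, because `SVC` weights the summed hinge loss by $C$ while `SGDClassifier` averages the loss over the samples. Since SGD is a stochastic solver, the two models should be close but not identical.

```python
import numpy as np
from sklearn.linear_model import SGDClassifier
from sklearn.svm import SVC

# Hypothetical toy data: two mostly separated clusters in 2D
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(-2, 1, size=(100, 2)),  # class 0
    rng.normal(2, 1, size=(100, 2)),   # class 1
])
y = np.array([0] * 100 + [1] * 100)

C = 10
n_samples = len(X)

# Linear SVM solved by libsvm
svc_model = SVC(kernel="linear", C=C).fit(X, y)

# Linear SVM solved by SGD, with alpha rescaled to match C
sgd_model = SGDClassifier(
    loss="hinge", penalty="l2", alpha=1 / (C * n_samples), random_state=0
).fit(X, y)

svc_acc = svc_model.score(X, y)
sgd_acc = sgd_model.score(X, y)
print(f"SVC accuracy: {svc_acc:.3f}")
print(f"SGDClassifier accuracy: {sgd_acc:.3f}")
```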