A Support Vector Machine (SVM) is a very powerful and versatile model, capable of performing linear or nonlinear classification, regression, and even outlier detection. SVMs are particularly well suited for classifying complex but small- or medium-sized datasets.

# Linear SVM Classification
You can think of an SVM classifier as fitting the widest possible street between the classes. This is called **large margin classification**.
![fig 5-1](images/5-1.png)
Notice that adding more training instances "off the street" will not affect the decision boundary at all: it is fully determined (or "supported") by the instances located on the edge of the street. These instances are called the **support vectors**.
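A minimal sketch of this property, assuming Scikit-Learn's `SVC` with a linear kernel and a very large `C` to approximate a hard margin. The toy points and the two extra instances are hypothetical, chosen so the widest street can be worked out by hand (w ≈ [0.4, 0.4], b ≈ -1.4). The fitted `support_vectors_` attribute should list the instances on the edge of the street, and refitting after adding far-away instances should leave the boundary essentially unchanged.

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical toy data: two linearly separable clusters.
X = np.array([[0, 0], [1, 0], [0, 1], [3, 3], [4, 3], [3, 4]], dtype=float)
y = np.array([0, 0, 0, 1, 1, 1])

# A very large C approximates hard margin classification.
clf = SVC(kernel="linear", C=1e6).fit(X, y)
print("support vectors:\n", clf.support_vectors_)
w_before, b_before = clf.coef_.ravel().copy(), clf.intercept_[0]

# Add two instances far "off the street", each on its correct side, and refit.
X_more = np.vstack([X, [[-2, -2], [6, 6]]])
y_more = np.append(y, [0, 1])
clf_more = SVC(kernel="linear", C=1e6).fit(X_more, y_more)
w_after, b_after = clf_more.coef_.ravel(), clf_more.intercept_[0]

print("before:", w_before, b_before)  # approximately [0.4 0.4] -1.4
print("after: ", w_after, b_after)    # essentially the same boundary
```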

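SVMs are sensitive to feature scales, as you can see in Fig 5-2: if one feature has a much larger range than the other, it dominates the distances that define the street. Scaling the features (for example with Scikit-Learn's `StandardScaler`) gives a much better decision boundary.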
![fig 5-2](images/5-2.png)
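A minimal sketch of feature scaling with `StandardScaler`, which rescales each feature to zero mean and unit variance so that neither feature dominates. The toy data below is hypothetical, chosen so the second feature spans a much larger range than the first.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data: feature 2 has a far larger spread than feature 1.
Xs = np.array([[1, 50], [5, 20], [3, 80], [5, 60]], dtype=np.float64)

scaler = StandardScaler()
Xs_scaled = scaler.fit_transform(Xs)  # (x - column mean) / column std

print("std before scaling: ", Xs.std(axis=0))
print("std after scaling:  ", Xs_scaled.std(axis=0))   # both close to 1
print("mean after scaling: ", Xs_scaled.mean(axis=0))  # both close to 0
```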

# Soft Margin Classification
If we strictly impose that all instances be off the street and on the correct side, this is called **hard margin classification**. There are two main issues with hard margin classification:
- It only works if the data is linearly separable.
- It is quite sensitive to outliers.

![fig 5-3](images/5-3.png)
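A minimal sketch of the outlier sensitivity, assuming a linear `SVC` with a huge `C` as a stand-in for a hard margin. The data and the outlier position are hypothetical. The street width is computed as 2/‖w‖, the distance between the two lines where the decision function equals -1 and +1. A single class-0 outlier placed near class 1 keeps the data separable but shrinks the street from about 3.54 to about 0.71.

```python
import numpy as np
from sklearn.svm import SVC

def street_width(X, y, C=1e6):
    """Street width 2 / ||w|| of a linear SVM; a huge C approximates a hard margin."""
    clf = SVC(kernel="linear", C=C).fit(X, y)
    return 2 / np.linalg.norm(clf.coef_)

# Hypothetical toy data: two well separated clusters.
X = np.array([[0, 0], [1, 0], [0, 1], [3, 3], [4, 3], [3, 4]], dtype=float)
y = np.array([0, 0, 0, 1, 1, 1])
width_clean = street_width(X, y)

# One class-0 outlier close to class 1 (still linearly separable).
X_out = np.vstack([X, [[2.5, 2.5]]])
y_out = np.append(y, 0)
width_outlier = street_width(X_out, y_out)

print(f"without outlier: {width_clean:.2f}")   # about 3.54
print(f"with outlier:    {width_outlier:.2f}")  # about 0.71
```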

To avoid these issues, it is preferable to use a more flexible model. The objective is to find a good balance between keeping the street as wide as possible and limiting the margin violations (instances that end up in the middle of the street or even on the wrong side). This is called **soft margin classification**.
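One common way to write this trade-off is the soft margin cost ½‖w‖² + C·Σ max(0, 1 − tᵢ(w·xᵢ + b)), with labels tᵢ ∈ {−1, +1}: the first term rewards a wide street, the hinge term penalizes each margin violation, and C weights the two. The sketch below simply evaluates that cost for a given w and b; the function name and the toy values are hypothetical, for illustration only.

```python
import numpy as np

def soft_margin_objective(w, b, X, t, C):
    """0.5 * ||w||^2 + C * (sum of hinge losses); t holds labels encoded as -1/+1."""
    margins = t * (X @ w + b)                # >= 1 means off the street on the correct side
    hinge = np.maximum(0.0, 1.0 - margins)   # size of each margin violation (0 if none)
    return 0.5 * w @ w + C * hinge.sum()

# Hypothetical example: the second instance sits on the street (margin 0.5).
X = np.array([[2.0, 0.0], [0.5, 0.0], [-1.0, 0.0]])
t = np.array([1.0, 1.0, -1.0])
w, b = np.array([1.0, 0.0]), 0.0

print(soft_margin_objective(w, b, X, t, C=1))   # 0.5 + 1 * 0.5 = 1.0
print(soft_margin_objective(w, b, X, t, C=10))  # 0.5 + 10 * 0.5 = 5.5
```

A larger C makes the same violation cost more, which pushes the optimizer toward a narrower street with fewer violations.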

**In Scikit-Learn's SVM classes, you can control this balance using the C hyperparameter: a smaller C value leads to a wider street but more margin violations.** Fig 5-4 shows the decision boundaries and margins of two soft margin SVM classifiers on a nonlinearly separable dataset. On the left, with a high C value, the classifier makes fewer margin violations but ends up with a smaller margin. On the right, with a low C value, the margin is much larger, but many instances end up on the street. However, the second classifier seems likely to generalize better: in fact, even on this training set it makes fewer prediction errors, since most of the margin violations are actually on the correct side of the decision boundary.
![fig 5-4](images/5-4.png)
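A minimal sketch of the effect of C, assuming the same iris setup as the code further down (petal length and width, Iris virginica vs. the rest). It fits `LinearSVC` with C = 1 and C = 100 and reports the street width 2/‖w‖ (in the scaled feature space) and the number of margin violations. The `max_iter` and `random_state` values are our own choices to help liblinear converge reproducibly for the large C.

```python
import numpy as np
from sklearn import datasets
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

iris = datasets.load_iris()
X = iris["data"][:, (2, 3)]  # petal length, petal width
y = (iris["target"] == 2).astype(np.float64)  # Iris virginica
t = 2 * y - 1  # the same labels encoded as -1/+1

def street_stats(C):
    """Return (street width, number of margin violations) for a scaled LinearSVC."""
    clf = Pipeline([
        ("scaler", StandardScaler()),
        ("linear_svc", LinearSVC(C=C, loss="hinge", dual=True,
                                 max_iter=100_000, random_state=42)),
    ])
    clf.fit(X, y)
    w = clf.named_steps["linear_svc"].coef_.ravel()
    width = 2 / np.linalg.norm(w)  # measured in the scaled feature space
    # Instances inside the street or on the wrong side of the boundary.
    violations = int(np.sum(t * clf.decision_function(X) < 1))
    return width, violations

for C in (1, 100):
    width, violations = street_stats(C)
    print(f"C={C:>3}: street width={width:.2f}, margin violations={violations}")
```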

**If your SVM model is overfitting, you can try regularizing it by reducing C**.

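The following code trains the model on the right of Fig 5-4 (C = 1): it loads the iris dataset, scales the features, and trains a linear SVM to detect Iris virginica flowers.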

```python
import numpy as np
from sklearn import datasets
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

iris = datasets.load_iris()
X = iris["data"][:, (2, 3)]  # petal length, petal width
y = (iris["target"] == 2).astype(np.float64)  # 1.0 if Iris virginica, else 0.0

# Scale the features first (SVMs are sensitive to feature scales),
# then fit a linear SVM with the hinge loss and a low C (wide street).
svm_clf = Pipeline([
    ("scaler", StandardScaler()),
    ("linear_svc", LinearSVC(C=1, loss="hinge")),
])
svm_clf.fit(X, y)

# Classify a flower with 5.5 cm long, 1.7 cm wide petals.
svm_clf.predict([[5.5, 1.7]])  # → array([1.])
```

**Alternatively, you could use the `SVC` class with `SVC(kernel='linear', C=1)`, but it is much slower, especially with large training sets, so it is not recommended. Another option is the `SGDClassifier` class with `SGDClassifier(loss='hinge', alpha=1/(m*C))`, where m is the number of training instances. This applies regular Stochastic Gradient Descent to train a linear SVM classifier. It does not converge as fast as the `LinearSVC` class, but it can be useful for handling huge datasets that do not fit in memory (out-of-core training) or for online classification tasks.**
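A minimal sketch of the `SGDClassifier` option, using the `alpha=1/(m*C)` setting above and `partial_fit` to feed the data in mini-batches, as you would for out-of-core or online training. The batch size and number of passes are hypothetical choices. For brevity the scaler is fit on all the data at once; in a genuine out-of-core setting you would also scale incrementally (`StandardScaler` has its own `partial_fit`).

```python
import numpy as np
from sklearn import datasets
from sklearn.linear_model import SGDClassifier
from sklearn.preprocessing import StandardScaler

iris = datasets.load_iris()
X = iris["data"][:, (2, 3)]  # petal length, petal width
y = (iris["target"] == 2).astype(np.float64)  # Iris virginica

# The iris samples are sorted by class; shuffle so each mini-batch mixes classes.
rng = np.random.default_rng(42)
idx = rng.permutation(len(X))
X, y = X[idx], y[idx]
X_scaled = StandardScaler().fit_transform(X)

m = len(X_scaled)  # number of training instances
C = 1
sgd_clf = SGDClassifier(loss="hinge", alpha=1 / (m * C), random_state=42)

batch_size = 30  # hypothetical mini-batch size
classes = np.array([0.0, 1.0])  # all labels must be declared for partial_fit
for epoch in range(20):
    for start in range(0, m, batch_size):
        stop = start + batch_size
        sgd_clf.partial_fit(X_scaled[start:stop], y[start:stop], classes=classes)

accuracy = sgd_clf.score(X_scaled, y)
print(f"training accuracy: {accuracy:.2f}")
```

For data that fits in memory, a single `sgd_clf.fit(X_scaled, y)` call does the same job.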

**The `LinearSVC` class regularizes the bias term, so you should center the training set first by subtracting its mean. This is automatic if you scale the data using the `StandardScaler`. Moreover, make sure you set the `loss` hyperparameter to `'hinge'`, as it is not the default value (the default is `'squared_hinge'`). Finally, for better performance you should set the `dual` hyperparameter to `False`, unless there are more features than training instances. Note, however, that Scikit-Learn only supports `dual=False` with the `'squared_hinge'` loss: combining `loss='hinge'` with `dual=False` raises a `ValueError`.**
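A minimal sketch checking the `dual` / `loss` constraint on the same iris data: the hinge loss with `dual=False` is rejected, while the primal solver works with the default squared hinge loss. The pipeline also shows `StandardScaler` taking care of centering, which matters because `LinearSVC` regularizes the bias term.

```python
import numpy as np
from sklearn import datasets
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

iris = datasets.load_iris()
X = iris["data"][:, (2, 3)]  # petal length, petal width
y = (iris["target"] == 2).astype(np.float64)  # Iris virginica

# The hinge loss is only implemented by liblinear's dual solver.
try:
    LinearSVC(C=1, loss="hinge", dual=False).fit(X, y)
    hinge_primal_ok = True
except ValueError as err:
    hinge_primal_ok = False
    print("ValueError:", err)

# Many more instances than features: the primal solver with the default
# squared hinge loss is allowed. StandardScaler centers (and scales) the data.
primal_clf = Pipeline([
    ("scaler", StandardScaler()),
    ("linear_svc", LinearSVC(C=1, dual=False)),  # loss="squared_hinge" by default
])
primal_clf.fit(X, y)
print(primal_clf.predict([[5.5, 1.7]]))
```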

# Nonlinear SVM Classification