# Support Vector Machines
This algorithm aims to place a line between the two groups (classes) of your data while keeping the line as far as possible from both groups, thus splitting them down the middle.  

An SVM works by trying to maximise the distance between the separating line and the nearest points of the data. We draw two lines parallel to the central line, one on each side; the band between them is known as the margin, and the algorithm is designed to make this margin as wide as possible. 

It all revolves around minimising the error function, but this time we have an additional error that is associated with our margin, and not just with the accuracy of the model's classifications. That way we incorporate the margin, and thus the placement of the line, into the creation of the final model; we don't want points within our margin, and we would also like the margin to be as wide as possible.  

Normally you would penalise only the points that fall on the wrong side of the main line, but with an SVM you also punish correctly classified points that sit inside the margin, treating them as misclassified, so that they too have an impact on moving the line through the gradient descent iterations.  

Remember that we are adding an error metric that is associated with the size of the margin; we want a model with a large margin, so we give a large error value to models that have small margins. This stops the model from shrinking the margin to nothing just to avoid additional classification errors. 
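The two error terms can be sketched in a few lines of `numpy`. This is a minimal illustration rather than what `sklearn` does internally: it assumes labels of -1 and +1, uses the hinge loss as the classification error and the squared length of the weight vector `w` as the margin error (the margin width is `2 / ||w||`). The function name `svm_errors` and the toy data are made up for this example.

```python
import numpy as np

def svm_errors(w, b, X, y):
    """Return (classification_error, margin_error) for the line w.x + b = 0.

    Assumes the labels in y are -1 or +1 (an assumption of this sketch).
    """
    scores = X @ w + b
    # Hinge loss: zero only for points on the correct side AND outside the margin.
    classification_error = np.maximum(0.0, 1.0 - y * scores).sum()
    # The margin width is 2 / ||w||, so a small margin means a large ||w||^2.
    margin_error = float(w @ w)
    return float(classification_error), margin_error

# Hypothetical toy data: two points per class, separable by the line x + y = 0.
X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -1.0]])
y = np.array([1, 1, -1, -1])

# Same separating line each time, but three different margin widths.
for scale in (1.0, 0.25, 0.1):
    w = np.array([scale, scale])
    cls_err, margin_err = svm_errors(w, 0.0, X, y)
    print(f"w={w}, classification error={cls_err:.2f}, margin error={margin_err:.3f}")
```

Shrinking `w` widens the margin and lowers the margin error, but once the margin is wide enough to swallow some points, the classification error starts to rise even though every point is still on the correct side of the line.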

Further to this, we add a constant C, a weight on the classification error that allows us to dictate which is more important to us, classification error or margin error; the right choice depends on the scenario and on whether we can afford to get things wrong in the problem we are looking at.  

> Large C means that we classify points very well but have a small margin 

## Polynomial method

When the points cannot be separated by a straight line in 2 dimensions, we can think about expanding to 5 dimensions or higher so that we have more combinations of polynomial terms to work with in order to find the best classifying solution: for two features `x` and `y`, this brings in `x^2`, `xy` and `y^2`, which allows for the drawing of hyperbolas and circles to try and find a solution that can be applied at the original level of dimensions. This is known as the _kernel trick_. 

> Using a higher degree polynomial we add more dimensions to the data, find a higher dimensional surface that separates the points, project it back down and we get our curves in the original dimensions. 
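To make the extra dimensions visible, here is a small sketch that builds the five features by hand and then fits an ordinary linear SVM on them. A polynomial kernel gets the same effect without ever constructing the features explicitly, so this is an illustration of the idea rather than how `SVC(kernel='poly')` works internally. The `make_circles` toy data and the `expand` helper are choices made for this example.

```python
import numpy as np
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Two concentric rings: no straight line in (x, y) can separate them.
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

def expand(X):
    """Map each point (x, y) to the 5 features (x, y, x^2, xy, y^2)."""
    x1, x2 = X[:, 0], X[:, 1]
    return np.column_stack([x1, x2, x1**2, x1 * x2, x2**2])

# A straight line in the original 2 dimensions.
linear_2d = SVC(kernel="linear").fit(X, y)
acc_2d = linear_2d.score(X, y)

# A flat separating surface in 5 dimensions, which is a curve back in 2-D.
linear_5d = SVC(kernel="linear").fit(expand(X), y)
acc_5d = linear_5d.score(expand(X), y)

print("2-D linear accuracy:", acc_2d)
print("5-D linear accuracy:", acc_5d)
```

The flat surface found in 5 dimensions leans mostly on `x^2` and `y^2`, which is a circle once it is projected back onto the original two features.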

The degree of the polynomial is a hyperparameter that we can tune to find the best possible model. 
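One way to tune the degree is a cross-validated grid search. The sketch below assumes the same kind of ring-shaped toy data as a stand-in for a real problem, and the grid of degrees 1 to 5 is an arbitrary choice for illustration.

```python
from sklearn.datasets import make_circles
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Toy data (an assumption of this sketch): two concentric rings.
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

# Try each degree with 5-fold cross-validation and keep the best one.
search = GridSearchCV(
    SVC(kernel="poly"),
    param_grid={"degree": [1, 2, 3, 4, 5]},
    cv=5,
)
search.fit(X, y)

print("Best degree:", search.best_params_["degree"])
print("Cross-validated accuracy:", search.best_score_)
```

After fitting, `search.best_estimator_` holds a model refitted on all of the data with the winning degree.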

## RBF algorithm 
This is similar to the polynomial kernel, whereby you push the data into higher dimensions and then find a surface in that higher dimensional space that can separate the points. With the radial basis function (RBF) kernel this is done through the generation of 'mountain ranges' using distribution curves centred on the different classified points; pushing one class down and the other up produces a surface with peaks and valleys that can be separated by a cutting line.  
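The 'mountain range' picture can be sketched in one dimension. Each training point gets a Gaussian bump `exp(-gamma * distance^2)`, which is the RBF kernel; positive points push the landscape up and negative points push it down. In a real SVM the height of each bump and the level of the cut are learned during training; this sketch simply assumes every bump has weight 1 and cuts at height 0. The names `rbf_bump` and `landscape` and the three data points are made up for the illustration.

```python
import numpy as np

def rbf_bump(x, centre, gamma):
    """Height at x of the 'mountain' centred on one training point."""
    return np.exp(-gamma * (x - centre) ** 2)

# 1-D toy data: one positive point sitting between two negative points,
# so no single threshold on x can separate the classes.
points = np.array([-2.0, 0.0, 2.0])
labels = np.array([-1, 1, -1])

def landscape(x, gamma=1.0):
    # Push positive points up and negative points down, then add the bumps.
    return sum(label * rbf_bump(x, p, gamma) for p, label in zip(points, labels))

# Cut the landscape at height 0: above the cut is class +1, below is class -1.
for x in (-2.0, -1.0, 0.0, 1.0, 2.0):
    height = landscape(x)
    print(f"x={x:+.1f}: height={height:+.3f} -> class {1 if height > 0 else -1}")
```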

Gamma is the main hyperparameter here and decides the steepness of the 'mountains', as it dictates the spread of the distribution curves over the points. A large gamma value means that the spread is small and the model tends to overfit the data by drawing a ring around each point (the mountains are steep and sometimes include only a single point), while a lower value for gamma may underfit the data, leading to some misclassifications where a valley for one class isn't deep enough because it is near a cluster of points from the other class.  
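A quick way to see this is to fit the same data with a very small, a moderate and a very large gamma and compare training accuracy with held-out accuracy. The `make_moons` toy data and the three gamma values are arbitrary choices for this sketch, and the exact numbers will vary with the data.

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Noisy toy data (an assumption of this sketch), split into train and test.
X, y = make_moons(n_samples=400, noise=0.3, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

results = {}
for gamma in (0.01, 1, 1000):
    model = SVC(kernel="rbf", gamma=gamma).fit(X_tr, y_tr)
    # Store (train accuracy, test accuracy) for each gamma.
    results[gamma] = (model.score(X_tr, y_tr), model.score(X_te, y_te))
    print(f"gamma={gamma}: train accuracy={results[gamma][0]:.2f}, "
          f"test accuracy={results[gamma][1]:.2f}")
```

A large gap between training and test accuracy at the largest gamma is the signature of the 'ring around each point' overfitting described above.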

The RBF kernel is probably the preferred choice in general (it is the default kernel for `SVC` in `sklearn`), and it is particularly effective when points from one class are surrounded by points from another. 

# `sklearn` implementation
## Prepare our data for the model 

In [2]:
# Import the SVC class from the svm module
from sklearn.svm import SVC

# Generic use example
# model = SVC()
# model.fit(x_values, y_values)

For the sake of using some actual data, we can pull in one of the toy datasets that are included with the `sklearn` library; let's have a look at the Breast Cancer Wisconsin dataset. 

In [3]:
import numpy as np
from sklearn.datasets import load_breast_cancer
data = load_breast_cancer()

In [4]:
X = np.array(data.data)
y = np.array(data.target)

In [5]:
len(X) == len(y)

True

Alright, now that we have loaded the data and made sure that the features array and the outcomes array have the same number of elements, we can move on to making our support vector machine model. 

In [6]:
# Import the accuracy metric to assess model performance
from sklearn.metrics import accuracy_score

# Import the train test split element to test out of sample performance
from sklearn.model_selection import train_test_split

In [7]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 1000)

In [8]:
len(X_train)

455

In [9]:
len(X_test)

114

Two things are good about using `sklearn.model_selection`: the `random_state` can be set to make the split reproducible, and `train_test_split` shuffles the data before splitting. Shuffling on its own does not guarantee that your classes are evenly spread between the test and train sets; for that you can pass `stratify=y`.  

If you were to just take the first 80% for training and the last 20% for testing, then you couldn't be sure that there wasn't an ordering bias present in the data. 

In [10]:
np.unique(y_test, return_counts=True)

(array([0, 1]), array([44, 70]))

In [11]:
np.unique(y_train, return_counts=True)

(array([0, 1]), array([168, 287]))
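The split above is shuffled but not stratified, so the class proportions in the two sets are only roughly equal. If you want them matched as closely as possible, `train_test_split` accepts a `stratify` argument. The sketch below reloads the data so that it runs on its own, and uses separate variable names (`X_tr`, `X_te`, and so on) to avoid overwriting the split used in the rest of this section.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

X_all, y_all = load_breast_cancer(return_X_y=True)

# stratify=y_all keeps the share of each class the same in both sets.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_all, y_all, test_size=0.2, random_state=1000, stratify=y_all
)

# The labels are 0/1, so the mean is the fraction of class 1 in each set.
print("overall:", y_all.mean())
print("train:  ", y_tr.mean())
print("test:   ", y_te.mean())
```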

## Training our model
Let's train the main types of models that we have talked about previously, and then compare their predictions using the default hyperparameters for each kernel type. 

In [None]:
# Specify the polynomial model
poly_model = SVC(kernel = 'poly').fit(X_train, y_train)

In [None]:
# Default selection
rbf_model = SVC(kernel = 'rbf').fit(X_train, y_train)

In [None]:
poly_pred = poly_model.predict(X_test)
print("Polynomial kernel accuracy: {}".format(accuracy_score(y_test, poly_pred)))

In [None]:
rbf_pred = rbf_model.predict(X_test)
print("RBF kernel accuracy: {}".format(accuracy_score(y_test, rbf_pred)))
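The models above rely on the defaults (`C=1.0`, `degree=3` and `gamma='scale'` for `SVC`). The hyperparameters discussed earlier are all arguments to the `SVC` constructor, so they can be set explicitly. The values below are untuned and purely illustrative; the sketch repeats the data loading and the split so that it runs on its own.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=1000
)

# Illustrative, untuned settings (assumptions of this sketch, not recommendations).
candidates = {
    "poly, degree=2, C=10": SVC(kernel="poly", degree=2, C=10),
    "rbf, gamma='scale', C=10": SVC(kernel="rbf", gamma="scale", C=10),
}

accuracies = {}
for name, model in candidates.items():
    model.fit(X_train, y_train)
    accuracies[name] = accuracy_score(y_test, model.predict(X_test))
    print(f"{name}: accuracy = {accuracies[name]:.3f}")
```

In practice these values would be chosen with a cross-validated search on the training set rather than by comparing scores on the test set.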