A Support Vector Machine (SVM) is a supervised learning model that can be used for both classification and regression.

<img src="svm1.png" width="300" height="200">

References: https://commons.wikimedia.org/wiki/File:SVM_Example_of_Hyperplanes.png

Let's look at another example:

<img src="svm2.png" width="300" height="200">

Reference: https://commons.wikimedia.org/wiki/File:SVM_margin.png

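Support vectors are the training points that lie closest to the decision boundary: on the margin or, with a soft margin, inside it or on the wrong side. They alone determine the position of the hyperplane; removing any other training point does not change the solution.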
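A minimal sketch with scikit-learn on a small, made-up dataset: with a very large `C` (an approximation of a hard margin, an assumption of this example), only the two points nearest the boundary, (0, 0) and (2, 0), should end up in `support_vectors_`; the points farther away play no role in the fitted hyperplane.

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical toy data: two linearly separable classes labelled -1 and 1.
X = np.array([[0.0, 0.0], [-3.0, 1.0], [-2.0, -2.0],   # class -1
              [2.0, 0.0], [5.0, 2.0], [4.0, -1.0]])    # class 1
y = np.array([-1, -1, -1, 1, 1, 1])

# A very large C approximates a hard-margin SVM.
clf = SVC(kernel="linear", C=1e6).fit(X, y)

# Only the points closest to the boundary are kept as support vectors.
print(clf.support_vectors_)
print(clf.n_support_)  # number of support vectors per class
```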

The dataset consists of pairs $(\vec{x}_i, y_i)$, where $y_i$ is either $1$ or $-1$ and indicates the class to which $\vec{x}_i$ belongs.

Our goal is to find the hyperplane that separates the two classes with the maximum margin. This hyperplane can be written as
$\vec{w}\cdot\vec{x} - b = 0,$
where $\vec{w}$ is the normal vector to the hyperplane and $b$ is a scalar offset.

Using the training dataset, we compute $\vec{w}$ and $b$.

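Any point that lies on or above the hyperplane

$\vec{w}\cdot\vec{x} - b = 1$ is classified as class $1$,
and any point that lies on or below the hyperplane

$\vec{w}\cdot\vec{x} - b = -1$ is classified as class $-1$.

More generally, a point $\vec{x}$ is classified by the sign of $\vec{w}\cdot\vec{x} - b$. During training, a hard-margin SVM requires every training point to lie on the correct side of its margin hyperplane, i.e. $y_i(\vec{w}\cdot\vec{x}_i - b) \ge 1$ for all $i$.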

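The distance between these two hyperplanes (the margin) is $\frac{2}{\|\vec{w}\|}$. We want to maximize it, which is the same as minimizing $\|\vec{w}\|$; in practice the equivalent, smoother objective $\frac{1}{2}\|\vec{w}\|^2$ is minimized, subject to every training point lying on the correct side of its margin hyperplane.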
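For a linear kernel, a fitted scikit-learn `SVC` exposes $\vec{w}$ as `coef_` and an intercept as `intercept_`. scikit-learn writes the decision function as $\vec{w}\cdot\vec{x} + \text{intercept}$, so the $b$ used here corresponds to `-intercept_`. The sketch below (made-up data, large `C` as a stand-in for a hard margin) recovers $\vec{w}$ and $b$, classifies by the sign of $\vec{w}\cdot\vec{x} - b$, and computes the margin width $2/\|\vec{w}\|$.

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical separable toy data with labels -1 and 1.
X = np.array([[0.0, 0.0], [-3.0, 1.0], [-2.0, -2.0],
              [2.0, 0.0], [5.0, 2.0], [4.0, -1.0]])
y = np.array([-1, -1, -1, 1, 1, 1])

clf = SVC(kernel="linear", C=1e6).fit(X, y)  # large C ~ hard margin

w = clf.coef_[0]          # normal vector of the separating hyperplane
b = -clf.intercept_[0]    # sklearn uses w.x + intercept, so b = -intercept

# Decision value w.x - b; its sign gives the predicted class.
scores = X @ w - b
assert np.allclose(scores, clf.decision_function(X))
pred = np.where(scores >= 0, 1, -1)
print(pred, clf.predict(X))

# Training points satisfy y_i (w.x_i - b) >= 1, up to solver tolerance.
print(y * scores)

# Width of the margin between w.x - b = 1 and w.x - b = -1.
margin = 2 / np.linalg.norm(w)
print(w, b, margin)  # for this data, approximately [1, 0], 1 and 2
```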

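In a hard-margin SVM, no training point may lie inside the margin. This only works when the classes are linearly separable, and the resulting margin tends to be narrow.

In a soft-margin SVM, some points may lie inside the margin, or even on the wrong side of the hyperplane, at a cost that is traded off against the margin width. This allows a wider margin and makes the model more tolerant of noise and outliers.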
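In scikit-learn's `SVC`, this trade-off is set by the parameter `C`: a small `C` tolerates more margin violations and gives a wider margin, while a very large `C` approaches hard-margin behaviour. A sketch on hypothetical overlapping Gaussian data, fitting a linear SVC with two values of `C` and reporting $2/\|\vec{w}\|$ for each:

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical overlapping data: two Gaussian clouds labelled -1 and 1.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, size=(50, 2)),
               rng.normal(1.5, 1.0, size=(50, 2))])
y = np.array([-1] * 50 + [1] * 50)

margins = {}
for C in (0.01, 10.0):
    clf = SVC(kernel="linear", C=C).fit(X, y)
    margins[C] = 2 / np.linalg.norm(clf.coef_[0])  # margin width 2/||w||
    print(f"C={C}: margin width={margins[C]:.2f}, "
          f"support vectors={clf.n_support_.sum()}")
```

The smaller `C` should give the wider margin, typically with more support vectors, since more points are allowed inside it.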

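The soft-margin SVM uses the hinge loss, defined for a training point $(\vec{x}_i, y_i)$ as
$\max\left(0,\ 1 - y_i(\vec{w}\cdot\vec{x}_i - b)\right).$
It is zero for points on the correct side of their margin hyperplane and grows linearly with the size of the violation otherwise.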
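A direct NumPy transcription of the hinge loss, using the same $\vec{w}\cdot\vec{x} - b$ convention; the hyperplane and points are made up for illustration, and `hinge_loss` here is a hypothetical helper, not a library function. Up to how the regularization constant is written, a soft-margin SVM minimizes $\frac{1}{2}\|\vec{w}\|^2$ plus $C$ times the sum of these per-sample losses.

```python
import numpy as np

def hinge_loss(w, b, X, y):
    """Per-sample hinge loss max(0, 1 - y_i (w . x_i - b))."""
    return np.maximum(0.0, 1.0 - y * (X @ w - b))

# Hypothetical hyperplane x1 = 1 (w = [1, 0], b = 1) and three points.
w = np.array([1.0, 0.0])
b = 1.0
X = np.array([[2.0, 0.0],   # class 1, exactly on its margin hyperplane -> loss 0
              [1.0, 0.0],   # class 1, on the decision boundary -> loss 1
              [3.0, 0.0]])  # class -1, on the wrong side -> loss 3
y = np.array([1, 1, -1])

losses = hinge_loss(w, b, X, y)
print(losses)  # [0. 1. 3.]
```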



In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

In [None]:
df = pd.read_csv("bill_authentication.csv")

In [None]:
df.shape

In [None]:
df.head()

In [None]:
# Features: every column except the label; target: the 'Class' column
X = df.drop('Class', axis=1)
y = df['Class']

In [None]:
from sklearn.model_selection import train_test_split

# Hold out 20% of the rows for testing; random_state makes the split reproducible
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=0)

In [None]:
from sklearn.svm import SVC

# Linear kernel: the decision boundary is a hyperplane in the original feature space
svclassifier = SVC(kernel='linear')
svclassifier.fit(X_train, y_train)

In [None]:
y_pred = svclassifier.predict(X_test)

In [None]:
from sklearn.metrics import classification_report, confusion_matrix

# Rows of the confusion matrix are true classes, columns are predicted classes
print(confusion_matrix(y_test, y_pred))
print(classification_report(y_test, y_pred))

### Kernel Trick


References: http://www.eric-kim.net/eric-kim-net/posts/1/kernel_trick.html

https://prateekvjoshi.com/2012/09/01/kernel-functions-for-machine-learning/

<img src="kernel1.png" width="400" height="300">


<img src="kernel3.jpg" width="400" height="300">
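To make the kernel trick concrete, here is a standard small check: for 2-D inputs, the kernel $K(\vec{x}, \vec{z}) = (\vec{x}\cdot\vec{z})^2$ equals the ordinary dot product after the explicit map $\phi(\vec{x}) = (x_1^2, \sqrt{2}x_1x_2, x_2^2)$, yet it is computed entirely in the original 2-D space. The function names `phi` and `poly2_kernel` are illustrative only.

```python
import numpy as np

def phi(v):
    """Explicit degree-2 feature map for a 2-D vector (illustrative)."""
    x1, x2 = v
    return np.array([x1 ** 2, np.sqrt(2) * x1 * x2, x2 ** 2])

def poly2_kernel(x, z):
    """Homogeneous polynomial kernel of degree 2: (x . z)^2."""
    return np.dot(x, z) ** 2

x = np.array([1.0, 2.0])
z = np.array([3.0, -1.0])

explicit = np.dot(phi(x), phi(z))  # inner product in the 3-D feature space
implicit = poly2_kernel(x, z)      # same value, computed in the original 2-D space
print(explicit, implicit)          # equal up to floating-point rounding
```

For higher degrees or the RBF kernel the feature space grows very large (infinite-dimensional for RBF), which is why evaluating $K$ directly is the practical route.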

#### Different kernel functions

References: https://www.slideshare.net/okamoto-laboratory/families-of-triangular-norm-based-kernel-function-and-its-application-to-kernel-kmeans-conference

<img src="kernel2.png" width="400" height="300">
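The three kernels used with `SVC` in this notebook can be written out directly. The formulas below follow scikit-learn's conventions: linear $\vec{x}\cdot\vec{z}$, polynomial $(\gamma\,\vec{x}\cdot\vec{z} + r)^d$ (`gamma`, `coef0`, `degree`) and RBF $\exp(-\gamma\|\vec{x}-\vec{z}\|^2)$. The sketch checks hand-written versions against `sklearn.metrics.pairwise`; the parameter values are arbitrary choices for the example. Note that `SVC` itself defaults to `coef0=0` and `gamma='scale'`, so pass these explicitly if you want to match a particular formula.

```python
import numpy as np
from sklearn.metrics.pairwise import linear_kernel, polynomial_kernel, rbf_kernel

def linear(x, z):
    """Linear kernel: x . z"""
    return np.dot(x, z)

def polynomial(x, z, gamma=1.0, coef0=1.0, degree=3):
    """Polynomial kernel: (gamma * x . z + coef0) ** degree"""
    return (gamma * np.dot(x, z) + coef0) ** degree

def rbf(x, z, gamma=0.5):
    """RBF (Gaussian) kernel: exp(-gamma * ||x - z||^2)"""
    return np.exp(-gamma * np.sum((x - z) ** 2))

x = np.array([1.0, 2.0])
z = np.array([0.5, -1.0])
X, Z = x.reshape(1, -1), z.reshape(1, -1)  # scikit-learn expects 2-D arrays

results = {
    "linear": (linear(x, z), linear_kernel(X, Z)[0, 0]),
    "poly": (polynomial(x, z),
             polynomial_kernel(X, Z, degree=3, gamma=1.0, coef0=1.0)[0, 0]),
    "rbf": (rbf(x, z), rbf_kernel(X, Z, gamma=0.5)[0, 0]),
}
for name, (ours, sklearn_value) in results.items():
    print(f"{name}: {ours:.6f} vs {sklearn_value:.6f}")
```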

When to use which kernel?

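A linear kernel is usually a good choice when the classes are (approximately) linearly separable, which is often the case when there are many features relative to the number of samples. Non-linear kernels such as the RBF (Gaussian) kernel can model curved decision boundaries, at the cost of extra hyperparameters (such as `gamma`) to tune.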
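A quick illustration of the difference, using scikit-learn's synthetic `make_circles` data (two concentric rings, chosen here as an example of a non-linear problem): a linear SVM cannot separate the rings, while an RBF SVM with default settings can.

```python
from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Two concentric circles: not linearly separable in the original 2-D space.
X, y = make_circles(n_samples=300, noise=0.05, factor=0.5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

scores = {}
for kernel in ("linear", "rbf"):
    clf = SVC(kernel=kernel).fit(X_train, y_train)
    scores[kernel] = clf.score(X_test, y_test)  # test-set accuracy
print(scores)
```

Expect the linear kernel to do little better than chance here, while the RBF kernel should classify almost every test point correctly.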

Let us now apply SVMs with non-linear kernels (the kernel trick) to the Iris dataset.

In [None]:
url = "https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data"

# Assign column names to the dataset
colnames = ['sepal-length', 'sepal-width', 'petal-length', 'petal-width', 'Class']

# Read dataset to pandas dataframe
irisdata = pd.read_csv(url, names=colnames)
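If the UCI URL is unreachable (an assumption about your environment), one offline alternative is scikit-learn's bundled copy of Iris via `load_iris`, reshaped here to the same column names and UCI-style labels such as `Iris-setosa`. scikit-learn's documentation notes that its copy fixes two data points to match Fisher's original paper, so it is not identical to the UCI file and results may differ slightly.

```python
import pandas as pd
from sklearn.datasets import load_iris

colnames = ['sepal-length', 'sepal-width', 'petal-length', 'petal-width', 'Class']

iris = load_iris()
irisdata = pd.DataFrame(iris.data, columns=colnames[:4])
# Map integer targets (0, 1, 2) to UCI-style labels, e.g. "Iris-setosa".
irisdata['Class'] = ['Iris-' + str(iris.target_names[t]) for t in iris.target]

print(irisdata.shape)
print(irisdata['Class'].value_counts())
```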

In [None]:
X = irisdata.drop('Class', axis=1)
y = irisdata['Class']

In [None]:
from sklearn.model_selection import train_test_split

# Hold out 20% of the rows for testing; random_state makes the split reproducible
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=0)

In [None]:
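# Polynomial kernel of degree 4: K(x, x') = (gamma * <x, x'> + coef0) ** 4,
# with SVC's defaults gamma='scale' and coef0=0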

from sklearn.svm import SVC
svclassifier = SVC(kernel='poly', degree=4)
svclassifier.fit(X_train, y_train)

In [None]:
y_pred = svclassifier.predict(X_test)

In [None]:
from sklearn.metrics import classification_report, confusion_matrix
print(confusion_matrix(y_test, y_pred))
print(classification_report(y_test, y_pred))

In [None]:
# Gaussian (RBF) kernel: K(x, x') = exp(-gamma * ||x - x'||**2)

from sklearn.svm import SVC
svclassifier = SVC(kernel='rbf')
svclassifier.fit(X_train, y_train)

In [None]:
y_pred = svclassifier.predict(X_test)

In [None]:
from sklearn.metrics import classification_report, confusion_matrix
print(confusion_matrix(y_test, y_pred))
print(classification_report(y_test, y_pred))
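A single 80/20 split on a dataset as small as Iris gives a noisy comparison between kernels. One way to get a steadier estimate is k-fold cross-validation with `cross_val_score`. The sketch below uses scikit-learn's bundled Iris data as an offline stand-in for the UCI download (an assumption of this example) and compares the three kernels used in this notebook.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

candidates = {
    "linear": SVC(kernel="linear"),
    "poly (degree 4)": SVC(kernel="poly", degree=4),
    "rbf": SVC(kernel="rbf"),
}

cv_means = {}
for name, model in candidates.items():
    scores = cross_val_score(model, X, y, cv=5)  # accuracy on 5 stratified folds
    cv_means[name] = scores.mean()
    print(f"{name}: mean accuracy {scores.mean():.3f} (+/- {scores.std():.3f})")
```

The same pattern extends to tuning `C`, `gamma` or `degree`, for example with `GridSearchCV`, instead of picking a kernel from one train/test split.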