A Support Vector Machine (SVM) is a model that can perform both classification and 
regression. 

<img src="svm1.png" width="300" height="200">

References: https://commons.wikimedia.org/wiki/File:SVM_Example_of_Hyperplanes.png

Let's look at another example.

<img src="svm2.png" width="300" height="200">

References: https://commons.wikimedia.org/wiki/File:SVM_margin.png

Support vectors are the training points that lie closest to the decision boundary; they determine where the boundary is placed. 

The dataset consists of pairs $(x_i, y_i)$, where $y_i$ is either $1$ or $-1$ 
and indicates the class to which $x_i$ belongs.

Our goal is to find the hyperplane that separates the two classes with the maximum margin. This hyperplane can be written as 
$\vec{w} \cdot \vec{x} - b = 0,$
where $\vec{w}$ is the normal vector to the hyperplane and $b$ is a scalar offset.

Using the training dataset, we compute $\vec{w}$ and $b.$ 

Any point that lies on or above 

$\vec{w} \cdot \vec{x} - b = 1$ is classified as class $1$, 
and any point that lies on or below

$\vec{w} \cdot \vec{x} - b = -1$ is classified as 
class $-1$.
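
The decision rule above can be sketched in a few lines of NumPy, using the labels $1$ and $-1$ from the dataset definition. The weights `w`, the offset `b`, and the sample points are made-up values for illustration, not parameters learned from any dataset. The helper name `svm_predict` is also hypothetical. Points that fall strictly between the two margin hyperplanes are assigned a class by the sign of the score in this sketch.

```python
import numpy as np

def svm_predict(w, b, X):
    """Return +1 or -1 for each row of X using the sign of w.x - b."""
    scores = X @ w - b
    return np.where(scores >= 0, 1, -1)

# Hypothetical parameters, chosen only for illustration.
w = np.array([1.0, 1.0])
b = 3.0

X = np.array([
    [3.0, 2.0],   # w.x - b =  2.0: on or above the +1 hyperplane
    [1.0, 0.5],   # w.x - b = -1.5: on or below the -1 hyperplane
    [2.0, 1.5],   # w.x - b =  0.5: inside the margin, positive side
])

labels = svm_predict(w, b, X)
print(labels)
```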

The distance between these two hyperplanes is $\frac{2}{||\vec{w}||}.$ We want to maximize this distance, which is the same as minimizing $||\vec{w}||.$ 
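
As a quick numerical check of the margin formula, the sketch below picks one point on each margin hyperplane along the direction of the normal vector and measures the distance between them. The values of `w` and `b` are arbitrary assumptions chosen so that the norm of `w` is a round number.

```python
import numpy as np

# Hypothetical hyperplane parameters, chosen only for illustration.
w = np.array([3.0, 4.0])   # ||w|| = 5
b = 1.0

# Points on each margin hyperplane that lie along the direction of w.
x_plus = (b + 1) * w / np.dot(w, w)    # satisfies w.x - b = +1
x_minus = (b - 1) * w / np.dot(w, w)   # satisfies w.x - b = -1

# Both points lie on the same normal line, so their separation
# equals the distance between the two hyperplanes.
distance = np.linalg.norm(x_plus - x_minus)
margin = 2 / np.linalg.norm(w)
print(distance, margin)
```

Both quantities agree, which is why maximizing the margin amounts to minimizing the norm of the weight vector.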

In a hard-margin SVM, no data points are allowed to lie within the margin, so the classes must be linearly separable. Hard margins therefore tend to be narrow.

In a soft-margin SVM, some data points are allowed to lie within the margin, at a cost. Soft margins tend to be wide. 
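
One way to see the difference in practice is through the `C` parameter of scikit-learn's `SVC`, which sets the penalty for margin violations: a small `C` gives a softer margin, while a large `C` approaches hard-margin behaviour. The sketch below is only an illustration on synthetic data from `make_blobs`. The helper `margin_width` and the specific `C` values are assumptions made for this example.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Synthetic, partly overlapping two-class data (illustrative only).
X, y = make_blobs(n_samples=100, centers=2, cluster_std=2.0, random_state=0)

def margin_width(C):
    """Fit a linear SVC with penalty C and return the margin 2 / ||w||."""
    clf = SVC(kernel="linear", C=C).fit(X, y)
    return 2 / np.linalg.norm(clf.coef_[0])

soft_margin = margin_width(C=0.01)    # violations are cheap: wider margin
hard_margin = margin_width(C=100.0)   # violations are costly: narrower margin
print(soft_margin, hard_margin)
```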

The loss function for the SVM is the hinge loss, defined for each training point by
$\max(0, 1 - y_i(\vec{w} \cdot \vec{x}_i - b))$ 
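
The loss above can be evaluated directly with NumPy. In this minimal sketch the function name `hinge_loss`, the parameters, and the three sample points are all made up for illustration: the first point is well outside the margin (zero loss), the second is inside the margin on the correct side, and the third is misclassified.

```python
import numpy as np

def hinge_loss(w, b, X, y):
    """Per-sample hinge loss max(0, 1 - y_i * (w.x_i - b))."""
    margins = y * (X @ w - b)
    return np.maximum(0.0, 1.0 - margins)

# Hypothetical parameters and data, chosen only for illustration.
w = np.array([2.0, 1.0])
b = 1.0
X = np.array([
    [2.0, 1.0],   # score  4.0, label +1: correct and outside the margin
    [0.5, 0.5],   # score  0.5, label +1: correct but inside the margin
    [1.0, 1.0],   # score  2.0, label -1: misclassified
])
y = np.array([1, 1, -1])

losses = hinge_loss(w, b, X, y)
print(losses)   # [0.  0.5 3. ]
```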



In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

In [2]:
df = pd.read_csv("bill_authentication.csv")

In [3]:
df.shape

(1372, 5)

In [4]:
df.head()

   Variance  Skewness  Curtosis   Entropy  Class
0    3.6216    8.6661   -2.8073  -0.44699      0
1    4.5459    8.1674   -2.4586   -1.4621      0
2     3.866   -2.6383    1.9242   0.10645      0
3    3.4566    9.5228   -4.0112   -3.5944      0
4   0.32924   -4.4552    4.5718   -0.9888      0


In [5]:
X = df.drop('Class', axis=1)
y = df['Class']

In [6]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.20)

In [7]:
from sklearn.svm import SVC
svclassifier = SVC(kernel='linear')
svclassifier.fit(X_train, y_train)

SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
  decision_function_shape='ovr', degree=3, gamma='auto', kernel='linear',
  max_iter=-1, probability=False, random_state=None, shrinking=True,
  tol=0.001, verbose=False)

In [8]:
y_pred = svclassifier.predict(X_test)

In [9]:
from sklearn.metrics import classification_report, confusion_matrix
print(confusion_matrix(y_test,y_pred))
print(classification_report(y_test,y_pred))

[[148   2]
 [  1 124]]
             precision    recall  f1-score   support

          0       0.99      0.99      0.99       150
          1       0.98      0.99      0.99       125

avg / total       0.99      0.99      0.99       275



### Kernel Trick


References: http://www.eric-kim.net/eric-kim-net/posts/1/kernel_trick.html

https://prateekvjoshi.com/2012/09/01/kernel-functions-for-machine-learning/

<img src="kernel1.png" width="400" height="300">


<img src="kernel3.jpg" width="400" height="300">
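
As a small illustration of the idea behind the kernel trick, the sketch below compares a degree-2 polynomial kernel computed directly in the input space with the inner product of an explicit feature map into three dimensions. The function names `phi` and `poly2_kernel` and the two sample vectors are assumptions made for this example. It relies on the standard identity $(\vec{x} \cdot \vec{z})^2 = \phi(\vec{x}) \cdot \phi(\vec{z})$ for $\phi(\vec{x}) = (x_1^2, \sqrt{2}x_1x_2, x_2^2)$.

```python
import numpy as np

def phi(x):
    """Explicit feature map for the degree-2 polynomial kernel in 2-D."""
    x1, x2 = x
    return np.array([x1**2, np.sqrt(2) * x1 * x2, x2**2])

def poly2_kernel(x, z):
    """Degree-2 polynomial kernel (x.z)^2, computed in the input space."""
    return np.dot(x, z) ** 2

# Hypothetical sample vectors, chosen only for illustration.
x = np.array([1.0, 2.0])
z = np.array([3.0, 4.0])

explicit = np.dot(phi(x), phi(z))   # inner product in the mapped space
implicit = poly2_kernel(x, z)       # same value without building phi
print(explicit, implicit)
```

The kernel gives the same number without ever constructing the higher-dimensional vectors, which is what makes non-linear SVMs practical.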

#### Different kernel functions

References: https://www.slideshare.net/okamoto-laboratory/families-of-triangular-norm-based-kernel-function-and-its-application-to-kernel-kmeans-conference

<img src="kernel2.png" width="400" height="300">

When to use which kernel?

Use a linear SVM when the classes are (approximately) linearly separable, and a non-linear kernel such as RBF when they are not. 
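
To make this guideline concrete, the sketch below compares a linear and an RBF kernel on data that is not linearly separable. The dataset is synthetic (two concentric circles from scikit-learn's `make_circles`), and the sample size, noise level, and random seeds are arbitrary assumptions for illustration.

```python
from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Two concentric circles: no straight line can separate the classes.
X, y = make_circles(n_samples=300, factor=0.3, noise=0.05, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, random_state=0
)

# Fit both kernels on the same split and compare test accuracy.
linear_acc = SVC(kernel="linear").fit(X_train, y_train).score(X_test, y_test)
rbf_acc = SVC(kernel="rbf").fit(X_train, y_train).score(X_test, y_test)
print(linear_acc, rbf_acc)
```

On data like this the RBF kernel should score clearly higher than the linear one. On data that is already close to linearly separable, the simpler linear kernel is usually a reasonable first choice.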

Let us now apply an SVM with the kernel trick to the Iris dataset.

In [10]:
url = "https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data"

# Assign column names to the dataset
colnames = ['sepal-length', 'sepal-width', 'petal-length', 'petal-width', 'Class']

# Read the dataset into a pandas DataFrame
irisdata = pd.read_csv(url, names=colnames)

In [11]:
X = irisdata.drop('Class', axis=1)
y = irisdata['Class']

In [19]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.20)

In [20]:
# polynomial kernel

from sklearn.svm import SVC
svclassifier = SVC(kernel='poly', degree=4)
svclassifier.fit(X_train, y_train)

SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
  decision_function_shape='ovr', degree=4, gamma='auto', kernel='poly',
  max_iter=-1, probability=False, random_state=None, shrinking=True,
  tol=0.001, verbose=False)

In [21]:
y_pred = svclassifier.predict(X_test)

In [22]:
from sklearn.metrics import classification_report, confusion_matrix
print(confusion_matrix(y_test, y_pred))
print(classification_report(y_test, y_pred))

[[13  0  0]
 [ 0  8  1]
 [ 0  0  8]]
                 precision    recall  f1-score   support

    Iris-setosa       1.00      1.00      1.00        13
Iris-versicolor       1.00      0.89      0.94         9
 Iris-virginica       0.89      1.00      0.94         8

    avg / total       0.97      0.97      0.97        30



In [16]:
# Gaussian (RBF) kernel

from sklearn.svm import SVC
svclassifier = SVC(kernel='rbf')
svclassifier.fit(X_train, y_train)

SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
  decision_function_shape='ovr', degree=3, gamma='auto', kernel='rbf',
  max_iter=-1, probability=False, random_state=None, shrinking=True,
  tol=0.001, verbose=False)

In [17]:
y_pred = svclassifier.predict(X_test)

In [18]:
from sklearn.metrics import classification_report, confusion_matrix
print(confusion_matrix(y_test, y_pred))
print(classification_report(y_test, y_pred))

[[ 9  0  0]
 [ 0 14  0]
 [ 0  0  7]]
                 precision    recall  f1-score   support

    Iris-setosa       1.00      1.00      1.00         9
Iris-versicolor       1.00      1.00      1.00        14
 Iris-virginica       1.00      1.00      1.00         7

    avg / total       1.00      1.00      1.00        30

