# Classification using SVM

# Introduction

Suppose you are given a plot of two labelled classes, as shown in image A. Can you draw a line that separates the classes?

[![image](https://www.linkpicture.com/q/SVM1.png)](https://www.linkpicture.com/view.php?img=LPic62fa32c51e305162112667)

You might have come up with something similar to image B, which separates the two classes fairly well. Any point to the left of the line falls into the black-circle class, and any point to the right falls into the blue-square class. This separation of classes is what an SVM (support vector machine) does: it finds a line, or in a multidimensional space a hyperplane, that separates the classes.

[![image](https://www.linkpicture.com/q/SVM2.png)](https://www.linkpicture.com/view.php?img=LPic62fa32c51e305162112667)
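To make the idea concrete, here is a minimal sketch of how a separating line assigns a class to a point. The line x = 3 and the two sample points are made-up values for illustration; they are not read off the images above. A real SVM learns the weights `w` and offset `b` from the training data.

```python
def classify(point, w, b):
    """Return -1 or +1 depending on which side of the line w.x + b = 0 the point lies."""
    score = sum(wi * xi for wi, xi in zip(w, point)) + b
    return 1 if score >= 0 else -1

# Hypothetical vertical line x = 3, written as 1*x + 0*y - 3 = 0
w, b = (1.0, 0.0), -3.0

left_point = (1.0, 2.0)   # lies to the left of the line
right_point = (5.0, 1.0)  # lies to the right of the line

print(classify(left_point, w, b))   # -1 (e.g. the black-circle class)
print(classify(right_point, w, b))  # 1  (e.g. the blue-square class)
```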

Making it a bit complex…
-------------------------
So far so good. Now, what if we had data like that shown in the image below? Clearly, no straight line can separate the two classes in the x-y plane. So what do we do? We apply a transformation and add one more dimension, which we call the z-axis. Let the value of each point on the z-axis be z = x² + y². We can interpret z as the squared distance of the point from the origin.

Now if we plot the points along the z-axis, a clear separation is visible and a line can be drawn.

[![image](https://www.linkpicture.com/q/SVM3.png)](https://www.linkpicture.com/view.php?img=LPic62fa32c51e305162112667)


[![image](https://www.linkpicture.com/q/SVM4.png)](https://www.linkpicture.com/view.php?img=LPic62fa32c51e305162112667)

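When we transform this line back to the original plane, it maps to a circular boundary, as shown in image E. These transformations are what kernels provide: a kernel function lets the SVM work in the transformed space without computing the new coordinates explicitly.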

[![image](https://www.linkpicture.com/q/SVM5.png)](https://www.linkpicture.com/view.php?img=LPic62fa32c51e305162112667)
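The transformation described above can be sketched in a few lines. The two rings of points below are hypothetical data chosen for illustration (radius 0.5 for one class, radius 2 for the other); only the mapping z = x² + y² comes from the text.

```python
import math

def lift(x, y):
    """Map a 2-D point to the extra coordinate z = x^2 + y^2."""
    return x * x + y * y

# Hypothetical data: one class on a circle of radius 0.5, the other on radius 2.
angles = [2 * math.pi * k / 8 for k in range(8)]
inner = [(0.5 * math.cos(a), 0.5 * math.sin(a)) for a in angles]
outer = [(2.0 * math.cos(a), 2.0 * math.sin(a)) for a in angles]

z_inner = [lift(x, y) for x, y in inner]  # all close to 0.25
z_outer = [lift(x, y) for x, y in outer]  # all close to 4.0

# Any threshold between the two groups separates them along the z-axis.
threshold = 1.0
print(max(z_inner) < threshold < min(z_outer))  # True

# Back in the x-y plane, z = threshold is the circle x^2 + y^2 = 1.
boundary_radius = math.sqrt(threshold)
print(boundary_radius)
```

No straight line separates the two rings in the x-y plane, but a single threshold on z does, and that threshold corresponds to a circular boundary in the original plane.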

Making it a little more complex…
-----------------------------------------------

What if the data overlaps? Or what if some of the black points lie inside the blue ones? Which of the two lines, 1 or 2, should we draw?

[![image](https://www.linkpicture.com/q/SVM6.png)](https://www.linkpicture.com/view.php?img=LPic62fa348765b821502125874)

The answer depends on how much misclassification we are willing to tolerate, which is controlled by the regularization parameter.

* In the next section, we define two terms: regularization parameter and gamma.

* These are tuning parameters of the SVM classifier. By varying them we can achieve a considerably non-linear classification boundary with better accuracy in a reasonable amount of time.

Tuning parameters: Regularization, Gamma and Margin.
--------------------------------------------------------------------------------

Regularization :
---------------------
The regularization parameter (the C parameter in Python's sklearn library) tells the SVM optimization how much you want to avoid misclassifying each training example.

For large values of C (right diagram), the optimization will choose a smaller-margin hyperplane if that hyperplane does a better job of classifying all the training points correctly. Conversely, a very small value of C (left diagram) will cause the optimizer to look for a larger-margin separating hyperplane, even if that hyperplane misclassifies more points.

The images below show two different values of the regularization parameter.

* The left one has some misclassification because of the lower regularization value.
* A higher value leads to results like the right one.

As a rule of thumb, when the data is critical and a misclassification is costly (the commonly cited example is data concerning life and well-being), go for the right diagram (high C). When an occasional misclassification is tolerable, the left diagram (low C) is acceptable. Keep in mind that a very high C can also overfit noisy training data.



[![image](https://www.linkpicture.com/q/SVM7.png)](https://www.linkpicture.com/view.php?img=LPic62fa364fe1fa1389015983)
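The effect of C on the margin can be checked with a small experiment. This is a sketch under stated assumptions: the data is synthetic (two overlapping blobs generated with `make_blobs`), the values 0.01 and 100 are arbitrary choices for "small" and "large" C, and `margin_width` is a helper written for this example. For a linear SVC the margin width is 2 / ||w||, where w is stored in `coef_`.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Synthetic, overlapping two-class data (illustrative only).
X, y = make_blobs(n_samples=200, centers=[[0, 0], [3, 3]],
                  cluster_std=1.5, random_state=0)

def margin_width(c_value):
    """Fit a linear SVC with the given C and return the margin width 2 / ||w||."""
    clf = SVC(kernel='linear', C=c_value).fit(X, y)
    return 2.0 / np.linalg.norm(clf.coef_)

small_c_margin = margin_width(0.01)
large_c_margin = margin_width(100.0)

print(f"C=0.01 -> margin width {small_c_margin:.2f}")
print(f"C=100  -> margin width {large_c_margin:.2f}")
```

The small-C model should report the wider margin, because it is allowed to leave more training points inside or on the wrong side of the margin.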

Gamma
-----------
The gamma parameter defines how far the influence of a single training example reaches, with low values meaning 'far' and high values meaning 'close'. In other words, with low gamma, points far away from the plausible separation line are also considered when calculating the line, whereas with high gamma only the points close to the plausible line are considered.

As a rough guideline, if the data is heavily concentrated in a small region, use a high gamma. Example: average marks out of 100 that cluster mostly in the 50-75 range.

If the data is well spread out, use a low gamma. Example: average marks out of 100 that are spread across the whole range from 0 to 100.

Treat both guidelines as starting points and confirm the choice by validating the model on held-out data.

[![image](https://www.linkpicture.com/q/SVM8.png)](https://www.linkpicture.com/view.php?img=LPic62fa364fe1fa1389015983)
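The "reach" of a training example can be made concrete with the RBF (Gaussian) kernel, whose value is exp(-gamma * ||a - b||²). The sketch below evaluates it by hand; the three points and the gamma values 0.1 and 10 are arbitrary illustrative choices.

```python
import math

def rbf_kernel(a, b, gamma):
    """RBF kernel value exp(-gamma * squared Euclidean distance)."""
    squared_distance = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-gamma * squared_distance)

center = (0.0, 0.0)
near = (0.5, 0.0)  # distance 0.5 from center
far = (3.0, 0.0)   # distance 3.0 from center

for gamma in (0.1, 10.0):
    k_near = rbf_kernel(center, near, gamma)
    k_far = rbf_kernel(center, far, gamma)
    print(f"gamma={gamma}: near={k_near:.3g}, far={k_far:.3g}")
```

With gamma = 0.1 the far point still has a kernel value of about 0.41, so it influences the decision boundary. With gamma = 10 the value for the far point is practically zero, so only close neighbours matter.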

Margin
----------

The margin is the distance from the separating line to the closest points of each class.

A good margin is one where this separation is large for both classes. The image below gives a visual example of a good and a bad margin. A good margin allows the points to stay in their respective classes without crossing over to the other class.

In the case of a bad margin, points that lie very close to the line are likely to be misclassified when new data comes in.

[![image](https://www.linkpicture.com/q/SVM9.png)](https://www.linkpicture.com/view.php?img=LPic62fa379877673194483145)
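As a small worked example of the definition above, the distance from a point to the line w.x + b = 0 is |w.x + b| / ||w||, and the margin is the smallest such distance over the training points. The four points and the two candidate lines below are hypothetical values, not taken from the image.

```python
import math

def min_distance(points, w, b):
    """Smallest distance from any point to the line w.x + b = 0."""
    norm = math.hypot(*w)
    return min(abs(w[0] * x + w[1] * y + b) / norm for x, y in points)

# Two points per class: one class at x = 1, the other at x = 4.
points = [(1, 1), (1, 2), (4, 1), (4, 2)]

# Candidate 1: the line x = 2.5, midway between the classes.
centered = min_distance(points, (1.0, 0.0), -2.5)
# Candidate 2: the line x = 1.2, hugging the first class.
skewed = min_distance(points, (1.0, 0.0), -1.2)

print(centered)  # 1.5
print(skewed)    # about 0.2
```

Both lines separate the training points perfectly, but the second leaves almost no room for a new point of the first class that falls slightly to the right of x = 1.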

# SVM on Linear Classification Problem

In [10]:
#Doing the minimum necessary imports

import pandas as pd  
import numpy as np  
import matplotlib.pyplot as plt  
%matplotlib inline

# Read the bank currency note data from a CSV file into a pandas DataFrame.
bankdata = pd.read_csv("https://raw.githubusercontent.com/venkatshan707/Data_Science_with_Python/main/Support%20Vector%20Machine/bill_authentication.csv")  

# Exploratory Data Analysis
print(bankdata.shape)  
print("------------")
print(bankdata.head()) 
#0=Fake Note, 1=Genuine Note

(1372, 5)
------------
   Variance  Skewness  Curtosis  Entropy  Class
0   3.62160    8.6661   -2.8073 -0.44699      0
1   4.54590    8.1674   -2.4586 -1.46210      0
2   3.86600   -2.6383    1.9242  0.10645      0
3   3.45660    9.5228   -4.0112 -3.59440      0
4   0.32924   -4.4552    4.5718 -0.98880      0


In [2]:
bankdata.Class.value_counts()

0    762
1    610
Name: Class, dtype: int64

In [11]:
# Data Preprocessing


# Dividing data into independent and target variables
X = bankdata.drop('Class', axis=1)  
y = bankdata['Class']  

# Dividing data into training and test sets
from sklearn.model_selection import train_test_split  
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.20)


# Training the Algorithm. Here we use the simple SVM, i.e. a linear SVM.
# Reference: https://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html
from sklearn.svm import SVC

# classifying linear data
svclassifier = SVC(kernel='linear')
# svclassifier = SVC()  # by default, the kernel is 'rbf'
# kernel can be 'linear', 'poly', 'rbf' (Gaussian), 'sigmoid',
# 'precomputed', or a callable

# fit the model over data
svclassifier.fit(X_train,y_train)


# Making Predictions
y_pred = svclassifier.predict(X_test)


# Evaluating the Algorithm
from sklearn.metrics import classification_report, confusion_matrix

print("Confusion Matrix: \n\n", confusion_matrix(y_test,y_pred), "")
print("Classification Report: ",classification_report(y_test,y_pred))


# Remember : for evaluating classification-based ML algo use  
# confusion_matrix, classification_report and accuracy_score.
# And for evaluating regression-based ML Algo use Mean Squared Error(MSE), ...

Confusion Matrix: 

 [[149   1]
 [  3 122]] 
Classification Report:                precision    recall  f1-score   support

           0       0.98      0.99      0.99       150
           1       0.99      0.98      0.98       125

    accuracy                           0.99       275
   macro avg       0.99      0.98      0.99       275
weighted avg       0.99      0.99      0.99       275



Note: to understand precision, recall, f1-score and support, see this post:
https://towardsdatascience.com/accuracy-precision-recall-or-f1-331fb37c5cb9

For example, in the above output (refer to the confusion matrix), 149 of the 150 class-0 bank notes were correctly predicted as class 0. Also, 122 of the 125 class-1 bank notes were correctly predicted as class 1.

The number of test observations in each class is shown as support: it is 150 for class 0 and 125 for class 1. Because train_test_split is not given a fixed random_state here, these counts change from run to run.

Precision tells us, out of the observations predicted as a given class, how many actually belong to that class. Our SVM model's precision is 0.98 (98%) for class 0 and 0.99 (99%) for class 1.
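The figures in the classification report can be recomputed by hand from the confusion matrix printed above, where rows are the true classes and columns are the predicted classes. The helper functions `precision` and `recall` below are written for this sketch; they are not part of scikit-learn.

```python
# Confusion matrix from the output above: rows = true class, columns = predicted class.
cm = [[149, 1],
      [3, 122]]

def precision(cm, k):
    """Of everything predicted as class k, the fraction that truly is class k."""
    predicted_k = sum(row[k] for row in cm)
    return cm[k][k] / predicted_k

def recall(cm, k):
    """Of everything that truly is class k, the fraction predicted as class k."""
    return cm[k][k] / sum(cm[k])

for k in (0, 1):
    support = sum(cm[k])
    print(k, round(precision(cm, k), 2), round(recall(cm, k), 2), support)
# 0 0.98 0.99 150
# 1 0.99 0.98 125
```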

# Applying SVM over non-linear data
 
In the case of non-linearly separable data, the simple (linear) SVM algorithm cannot be used. Instead, a modified version of SVM, called kernel SVM, is used.

Basically, the kernel SVM maps data that is not linearly separable in a lower-dimensional space into a higher-dimensional space where it becomes linearly separable, so that a hyperplane can separate the data points belonging to different classes. There is complex mathematics involved in this, but you do not have to worry about it in order to use SVM. We can simply use Python's Scikit-Learn library to implement and use the kernel SVM.
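For readers who do want a glimpse of the mathematics, here is a minimal sketch of the idea for one specific kernel. The degree-2 polynomial kernel (a . b)² on 2-D points gives exactly the same number as an ordinary dot product after mapping each point into 3-D with phi(x, y) = (x², √2·xy, y²). The function names and the two sample points are chosen for this illustration only.

```python
import math

def poly2_kernel(a, b):
    """Degree-2 polynomial kernel (a . b)^2, computed in the original 2-D space."""
    return (a[0] * b[0] + a[1] * b[1]) ** 2

def phi(p):
    """Explicit 3-D feature map whose dot product equals poly2_kernel."""
    x, y = p
    return (x * x, math.sqrt(2) * x * y, y * y)

a, b = (1.0, 2.0), (3.0, 0.5)

implicit = poly2_kernel(a, b)                         # never leaves 2-D
explicit = sum(u * v for u, v in zip(phi(a), phi(b)))  # dot product in 3-D

print(implicit)  # 16.0
print(math.isclose(implicit, explicit))  # True
```

The kernel computes the higher-dimensional dot product without ever building the higher-dimensional vectors, which is why kernel SVMs remain practical even when the implied feature space is very large.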

Implementing Kernel SVM with Scikit-Learn is similar to the simple SVM. In this section, we will use the famous iris dataset to predict the category to which a plant belongs based on four attributes: sepal-width, sepal-length, petal-width and petal-length.

We will try three kernels: polynomial, Gaussian, and sigmoid.

In [5]:
import seaborn as sns
from sklearn import svm, datasets
# import some data to play with
irisdata = sns.load_dataset('iris')
irisdata.head()  # have a look at the attributes (=> X) and labels (=> y)

   sepal_length  sepal_width  petal_length  petal_width species
0           5.1          3.5           1.4          0.2  setosa
1           4.9          3.0           1.4          0.2  setosa
2           4.7          3.2           1.3          0.2  setosa
3           4.6          3.1           1.5          0.2  setosa
4           5.0          3.6           1.4          0.2  setosa


In [12]:
# Preprocessing data
X = irisdata.drop('species', axis=1)  
y = irisdata['species']

# Train Test Split
from sklearn.model_selection import train_test_split  
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.20)  

# Training the Algorithm
To train the kernel SVM, we use the same SVC class from Scikit-Learn's svm module.

We will implement polynomial, Gaussian, and sigmoid kernels to see which one works better for our problem.

# 1. Polynomial Kernel
In the case of the polynomial kernel, you can also pass a value for the degree parameter of the SVC class (the default is 3). This is the degree of the polynomial. Take a look at how we can use a polynomial kernel to implement kernel SVM:

In [22]:
from sklearn.svm import SVC  
svclassifier = SVC(kernel='poly', degree=8, gamma='auto')
# The higher the degree, the longer training takes. You can choose any
# non-negative integer as the degree.
# gamma is optional. Older scikit-learn versions raised a FutureWarning when
# it was omitted; to avoid it, specify gamma as 'auto' or 'scale'.

svclassifier.fit(X_train, y_train)

# Making Predictions
# Now once we have trained the algorithm, 
# the next step is to make predictions on the test data.
y_pred = svclassifier.predict(X_test)  


# Evaluating the Algorithm
from sklearn.metrics import classification_report, confusion_matrix  
print(confusion_matrix(y_test, y_pred))  
print(classification_report(y_test, y_pred))

# Note: one 'virginica' sample is misclassified as 'versicolor'

[[10  0  0]
 [ 0  8  0]
 [ 0  1 11]]
              precision    recall  f1-score   support

      setosa       1.00      1.00      1.00        10
  versicolor       0.89      1.00      0.94         8
   virginica       1.00      0.92      0.96        12

    accuracy                           0.97        30
   macro avg       0.96      0.97      0.97        30
weighted avg       0.97      0.97      0.97        30



# 2. Gaussian Kernel

To use the Gaussian kernel, specify 'rbf' as the value of the kernel parameter of the SVC class. This is also the default kernel when none is given.

In [20]:
from sklearn.svm import SVC  
svclassifier = SVC(kernel='rbf', gamma='auto')  
svclassifier.fit(X_train, y_train) 

# Prediction and Evaluation
y_pred = svclassifier.predict(X_test)  

from sklearn.metrics import classification_report, confusion_matrix  
print(confusion_matrix(y_test, y_pred))  
print(classification_report(y_test, y_pred))  

# Note: in the output shown below, the result is identical to the polynomial
# kernel's: one 'virginica' sample is misclassified, giving 97% accuracy

[[10  0  0]
 [ 0  8  0]
 [ 0  1 11]]
              precision    recall  f1-score   support

      setosa       1.00      1.00      1.00        10
  versicolor       0.89      1.00      0.94         8
   virginica       1.00      0.92      0.96        12

    accuracy                           0.97        30
   macro avg       0.96      0.97      0.97        30
weighted avg       0.97      0.97      0.97        30



# 3. Sigmoid Kernel
Finally, let's use a sigmoid kernel for implementing kernel SVM.
To use the sigmoid kernel, specify 'sigmoid' as the value of the kernel parameter of the SVC class. Take a look at the following script:

In [9]:
from sklearn.svm import SVC  
svclassifier = SVC(kernel='sigmoid', gamma='auto')  
svclassifier.fit(X_train, y_train)

# Prediction and Evaluation
y_pred = svclassifier.predict(X_test)  

from sklearn.metrics import classification_report, confusion_matrix  
print(confusion_matrix(y_test, y_pred))  
print(classification_report(y_test, y_pred))

# Note: the very poor performance of the sigmoid kernel
# (every test sample is predicted as 'virginica')

[[ 0  0 13]
 [ 0  0  9]
 [ 0  0  8]]
              precision    recall  f1-score   support

      setosa       0.00      0.00      0.00        13
  versicolor       0.00      0.00      0.00         9
   virginica       0.27      1.00      0.42         8

    accuracy                           0.27        30
   macro avg       0.09      0.33      0.14        30
weighted avg       0.07      0.27      0.11        30





# Comparison of Kernel Performance

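If we compare the performance of the different kernels, we can clearly see that the sigmoid kernel performs the worst: it predicts 'virginica' for every test sample. This is not because the sigmoid kernel is limited to binary problems; SVC handles multi-class data with a one-vs-one scheme regardless of the kernel. A more likely cause is that the sigmoid kernel, tanh(gamma * <x, x'> + coef0), saturates on the unscaled iris features when gamma='auto', so nearly all pairs of samples look alike to the classifier. Note also that the sigmoid run used a different random train/test split (its support counts are 13, 9 and 8 rather than 10, 8 and 12), so the three reports are not directly comparable.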
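One factor worth checking when the sigmoid kernel performs this badly is feature scale. The sketch below evaluates the sigmoid kernel, tanh(gamma * <a, b> + coef0), by hand with gamma = 1 / n_features, which is what gamma='auto' means in scikit-learn. The first two rows are the first two rows of the iris table shown earlier; `big_flower` is a made-up row with larger petals, added only for contrast.

```python
import math

def sigmoid_kernel(a, b, gamma, coef0=0.0):
    """Sigmoid kernel tanh(gamma * <a, b> + coef0)."""
    dot = sum(ai * bi for ai, bi in zip(a, b))
    return math.tanh(gamma * dot + coef0)

row0 = (5.1, 3.5, 1.4, 0.2)        # from the iris table above
row1 = (4.9, 3.0, 1.4, 0.2)        # from the iris table above
big_flower = (6.3, 3.3, 6.0, 2.5)  # hypothetical row for contrast

gamma_auto = 1.0 / len(row0)  # gamma='auto' -> 1 / n_features = 0.25

k_similar = sigmoid_kernel(row0, row1, gamma_auto)
k_different = sigmoid_kernel(row0, big_flower, gamma_auto)

print(k_similar, k_different)  # both are extremely close to 1.0
```

Because every dot product of raw measurements is large and positive, tanh is saturated near 1 for similar and dissimilar flowers alike, leaving the classifier almost nothing to distinguish the classes. Standardizing the features before fitting is a common remedy.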

Between the Gaussian and polynomial kernels, the outputs shown above are identical: each misclassified a single 'virginica' sample, giving 97% accuracy. On this split, neither kernel is better than the other. There is no hard and fast rule as to which kernel performs best in every scenario. It is about testing the candidate kernels and selecting the one with the best results, ideally using cross-validation or a held-out validation set rather than the final test set.
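A fairer way to compare kernels is to evaluate all of them on the same folds with cross-validation. The following is a sketch, not the procedure used above, and it makes several assumptions: it loads iris through scikit-learn's `load_iris` instead of seaborn, standardizes the features with `StandardScaler`, uses gamma='scale', keeps the default polynomial degree of 3, and uses 5 folds.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

scores = {}
for kernel in ('poly', 'rbf', 'sigmoid'):
    # Scaling happens inside the pipeline, so it is refit on each training fold.
    model = make_pipeline(StandardScaler(), SVC(kernel=kernel, gamma='scale'))
    scores[kernel] = cross_val_score(model, X, y, cv=5).mean()

for kernel, score in scores.items():
    print(f"{kernel}: mean accuracy {score:.3f}")
```

Because every kernel sees the same data splits, the resulting mean accuracies can be compared directly, unlike single runs on different random splits.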