## Lecture 20: Support Vector Machines

In this notebook, we are again going to use the Dogs/Cats dataset to learn how to use the `sklearn.svm.SVC` class to classify data.

In [6]:
import numpy as np
import os

import matplotlib.pyplot as plt
from matplotlib import rc

plt.rcParams['xtick.labelsize']=16      # change the tick label size for x axis
plt.rcParams['ytick.labelsize']=16      # change the tick label size for y axis
plt.rcParams['axes.linewidth']=1        # change the line width of the axis
plt.rcParams['xtick.major.width'] = 3   # change the tick line width of x axis
plt.rcParams['ytick.major.width'] = 3   # change the tick line width of y axis
rc('text', usetex=False)                # disable LaTeX rendering in plots
rc('font',**{'family':'DejaVu Sans'})   # set the font of the plot to be DejaVu Sans

In [3]:
from scipy import io
from sklearn import svm
from sklearn.model_selection import cross_val_score

In [None]:
from google.colab import drive
drive.mount('/content/drive')

### 1. Prepare dataset for training

In [27]:
path = "/content/drive/MyDrive/ME491"
dog_path = os.path.join(path, "data/dogData_w.mat")
cat_path = os.path.join(path, "data/catData_w.mat")
dogdata_mat = io.loadmat(dog_path)
catdata_mat = io.loadmat(cat_path)
dog = dogdata_mat['dog_wave']
cat = catdata_mat['cat_wave']

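# Stack the images column-wise: columns 0-79 are dogs, columns 80-159 are cats
DC = np.concatenate((dog, cat), axis = 1)

# PCA: subtract the average image, then take the economy SVD
avgAnimal = np.mean(DC, axis = 1)
X = DC - np.tile(avgAnimal, (DC.shape[1],1)).T
U, S, VT = np.linalg.svd(X, full_matrices = False)

V = VT.T   # row j holds image j's (normalized) coordinates along the principal components

# Use principal components 1-20 as features (component 0 is skipped)
features = np.arange(1,21)
# Train on the first 60 dogs and first 60 cats; test on the remaining 20 of each
xtrain = np.concatenate((V[:60,features], V[80:140,features]))
test = np.concatenate((V[60:80,features], V[140:160,features]))

label = np.repeat(np.array([1,-1]), 60)   # training labels: +1 = dog, -1 = cat
truth = np.repeat(np.array([1,-1]), 20)   # true labels of the test images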
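The PCA step above can be checked on a small scale. The sketch below uses a hypothetical random matrix `DC_demo` (64 "pixels" by 160 images) in place of the real dog/cat data, and all `_demo` names are illustrative only. It assumes nothing beyond NumPy and shows two things: broadcasting with `[:, None]` gives the same centred matrix as the `np.tile(...).T` construction, and projecting the centred images onto the columns of `U` reproduces `diag(S) @ VT`, which is why the rows of `V` can serve as (scaled) PCA features.

```python
import numpy as np

# Hypothetical stand-in for DC: 64 "pixels" x 160 images,
# columns 0-79 play the dogs and columns 80-159 the cats.
rng = np.random.default_rng(0)
DC_demo = np.concatenate((rng.normal(0.0, 1.0, (64, 80)),
                          rng.normal(0.5, 1.0, (64, 80))), axis=1)

# Mean-centre each pixel (row). Broadcasting with [:, None] gives the
# same matrix as the np.tile(...).T construction used in the notebook.
avg_demo = DC_demo.mean(axis=1)
X_tile = DC_demo - np.tile(avg_demo, (DC_demo.shape[1], 1)).T
X_demo = DC_demo - avg_demo[:, None]

# Economy SVD: X = U @ diag(S) @ VT
U_demo, S_demo, VT_demo = np.linalg.svd(X_demo, full_matrices=False)
V_demo = VT_demo.T

# Projecting the centred images onto the principal directions (columns
# of U) gives diag(S) @ VT, so row j of V is image j's projection
# divided element-wise by S.
proj = U_demo.T @ X_demo

# Same split as the notebook: components 1-20, 60+60 training and 20+20 test images
features_demo = np.arange(1, 21)
xtrain_demo = np.concatenate((V_demo[:60, features_demo], V_demo[80:140, features_demo]))
test_demo = np.concatenate((V_demo[60:80, features_demo], V_demo[140:160, features_demo]))
print(xtrain_demo.shape, test_demo.shape)   # (120, 20) (40, 20)
```

Broadcasting avoids materializing the tiled copy of the mean, which matters little here but scales better for large image matrices.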

### 2. Perform SVM
- `sklearn.svm.SVC` documentation: https://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html
- `sklearn.model_selection.cross_val_score` documentation: https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.cross_val_score.html
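Because the model here uses `kernel='linear'`, it may help to see what the fitted classifier actually computes. The sketch below uses hypothetical 2-D toy data (`X_toy`, `y_toy`), not the dog/cat features. For a linear kernel, `SVC` exposes the hyperplane weights as `coef_` and the offset as `intercept_`, so `decision_function` should equal w·x + b, and `predict` returns `classes_[1]` wherever that value is positive.

```python
import numpy as np
from sklearn import svm

# Hypothetical toy data: two well-separated 2-D Gaussian clouds,
# labelled +1 / -1 like the dog/cat labels in the notebook.
rng = np.random.default_rng(1)
X_toy = np.vstack((rng.normal([2.0, 2.0], 0.5, (30, 2)),
                   rng.normal([-2.0, -2.0], 0.5, (30, 2))))
y_toy = np.repeat(np.array([1, -1]), 30)

clf = svm.SVC(kernel='linear', C=1.0).fit(X_toy, y_toy)

# For a linear kernel, the decision function is f(x) = w . x + b
w = clf.coef_[0]
b = clf.intercept_[0]
f_manual = X_toy @ w + b

# predict() picks classes_[1] (here +1) where f(x) > 0, else classes_[0] (-1)
pred = clf.classes_[(f_manual > 0).astype(int)]

# Width of the margin between the supporting hyperplanes f(x) = +1 and f(x) = -1
margin = 2.0 / np.linalg.norm(w)
print("support vectors per class:", clf.n_support_)
print("margin width:", margin)
```

The same comparison should hold for `Mdl` in the next cell, since it is also a binary linear `SVC`; `coef_` is only available for the linear kernel.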

In [None]:
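Mdl = svm.SVC(kernel='linear', gamma='auto').fit(xtrain, label)  # gamma is ignored by the linear kernel
test_labels = Mdl.predict(test)
testAcc = np.mean(test_labels == truth)   # fraction of test images classified correctly
print(testAcc)

CMdl = cross_val_score(Mdl, xtrain, label, cv = 6) # 6-fold cross-validation accuracy on the training set
print(CMdl)
classLoss = 1 - np.mean(CMdl) # average error over all cross-validation iterations
print(classLoss)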
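What `cross_val_score(..., cv = 6)` does can be spelled out by hand. In the sketch below, `X_cv` and `y_cv` are hypothetical random stand-ins for `xtrain` and `label` (120 samples, 20 features). For a classifier, an integer `cv` selects `StratifiedKFold` without shuffling, and the default score is accuracy, so a manual loop over the folds should reproduce the same six numbers.

```python
import numpy as np
from sklearn import svm
from sklearn.base import clone
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Hypothetical stand-in for xtrain/label: 120 samples x 20 features,
# first 60 labelled +1 and last 60 labelled -1
rng = np.random.default_rng(2)
X_cv = np.vstack((rng.normal(0.3, 1.0, (60, 20)),
                  rng.normal(-0.3, 1.0, (60, 20))))
y_cv = np.repeat(np.array([1, -1]), 60)

model = svm.SVC(kernel='linear')
scores = cross_val_score(model, X_cv, y_cv, cv=6)

# Manual equivalent: StratifiedKFold (no shuffling) with 6 folds, so each
# fold holds out 20 samples, 10 per class; a fresh clone is fit per fold.
manual = []
for train_idx, val_idx in StratifiedKFold(n_splits=6).split(X_cv, y_cv):
    fold_model = clone(model).fit(X_cv[train_idx], y_cv[train_idx])
    manual.append(fold_model.score(X_cv[val_idx], y_cv[val_idx]))  # accuracy
manual = np.array(manual)

loss = 1 - np.mean(scores)   # mean misclassification rate, like classLoss
print(scores)
print(manual)
print(loss)
```

Note that each fold refits a clone, so the already-fitted `Mdl` is not reused by `cross_val_score`; only its hyperparameters are.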

### 3. SVM Performance Analysis (let's code together)
Here, we want to see how SVM performs with three different kernels:
1. Linear
2. Polynomial
3. Radial Basis Function

We also want to see how its performance changes with the number of features (0, 19).

At the end, we want to make a bar plot of the accuracy of all three kernels as a function of the number of features.
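For reference, the three kernels as parameterized by `sklearn.svm.SVC` (following the scikit-learn user guide) are listed below, together with the resulting decision function. Here d is the `degree` parameter (default 3), r is `coef0` (default 0), and γ is `gamma` (default `'scale'`, i.e. 1 / (n_features · X.var())); in the fitted model, `dual_coef_` stores the products yᵢαᵢ for the support vectors and `intercept_` stores b.

```latex
\begin{aligned}
\text{linear:}\quad & K(\mathbf{x}, \mathbf{x}') = \langle \mathbf{x}, \mathbf{x}' \rangle \\
\text{polynomial:}\quad & K(\mathbf{x}, \mathbf{x}') = \left(\gamma \langle \mathbf{x}, \mathbf{x}' \rangle + r\right)^{d} \\
\text{RBF:}\quad & K(\mathbf{x}, \mathbf{x}') = \exp\!\left(-\gamma \lVert \mathbf{x} - \mathbf{x}' \rVert^{2}\right) \\
\text{decision function:}\quad & f(\mathbf{x}) = \sum_{i \in \mathrm{SV}} y_i \alpha_i \, K(\mathbf{x}_i, \mathbf{x}) + b
\end{aligned}
```

With the linear kernel, the sum collapses to w·x + b with w = Σ yᵢαᵢ xᵢ, which is why only that kernel exposes `coef_`.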

Let's code this together.
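One possible shape for this analysis is sketched below, under several assumptions that are not from the lecture: `V_demo` is a hypothetical random 160 × 40 matrix standing in for `V` (rows 0-79 "dogs", 80-159 "cats"), the feature counts run from 1 to 20 and always start at component 1 (matching `features = np.arange(1,21)`), and performance is measured as test-set accuracy against the held-out 20 + 20 images. Cross-validated accuracy from `cross_val_score` would be an equally reasonable choice.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn import svm

# Hypothetical stand-in for V: 160 samples x 40 PCA coordinates,
# rows 0-79 "dogs" (shifted in the first 10 columns), rows 80-159 "cats"
rng = np.random.default_rng(3)
V_demo = rng.normal(size=(160, 40))
V_demo[:80, :10] += 0.5

y_train = np.repeat(np.array([1, -1]), 60)
y_test = np.repeat(np.array([1, -1]), 20)

kernels = ['linear', 'poly', 'rbf']
n_features = np.arange(1, 21)              # use components 1..n for n = 1..20
acc = np.zeros((len(kernels), len(n_features)))

for i, kern in enumerate(kernels):
    for j, n in enumerate(n_features):
        feats = np.arange(1, n + 1)
        X_tr = np.concatenate((V_demo[:60, feats], V_demo[80:140, feats]))
        X_te = np.concatenate((V_demo[60:80, feats], V_demo[140:160, feats]))
        clf = svm.SVC(kernel=kern).fit(X_tr, y_train)
        acc[i, j] = np.mean(clf.predict(X_te) == y_test)   # test accuracy

# Grouped bar plot: one group per feature count, one bar per kernel
width = 0.25
fig, ax = plt.subplots(figsize=(12, 4))
for i, kern in enumerate(kernels):
    ax.bar(n_features + (i - 1) * width, acc[i], width, label=kern)
ax.set_xlabel('number of features')
ax.set_ylabel('test accuracy')
ax.set_xticks(n_features)
ax.legend()
fig.tight_layout()
plt.show()
```

To apply this to the real data, `V_demo` would be replaced by `V` from Section 1; the polynomial and RBF kernels use scikit-learn's default `degree`, `coef0`, and `gamma` here, and tuning them could change the ranking.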
