<a href="https://colab.research.google.com/github/visiont3lab/project-covid-mask-classifier/blob/main/utils/colab/SVM.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Support Vector Machine


It can be used for:
1. Classification (linear and nonlinear)
2. Regression

It is used for complex classification problems where the dataset is not huge.

* > SVM requires **feature scaling** (Standard Scaler)
* > An SVM classifier does not output class probabilities
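
The two notes above can be sketched as follows. This is a minimal example, assuming scikit-learn's `SVC` and the iris petal features (the dataset and variable names are chosen here only for illustration). The pipeline standardizes the features before the SVM, and the classifier exposes signed scores through `decision_function` rather than probabilities. scikit-learn can estimate probabilities when `probability=True` is set, but those come from an extra calibration step and not from the SVM itself.

```python
import numpy as np
from sklearn import datasets
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

iris = datasets.load_iris()
X = iris["data"][:, (2, 3)]             # petal length, petal width
y = (iris["target"] == 2).astype(int)   # 1 = Iris-Virginica, 0 = other

clf = Pipeline([
    ("scaler", StandardScaler()),        # SVMs are sensitive to feature scales
    ("svc", SVC(kernel="linear", C=1)),
])
clf.fit(X, y)

# Signed scores: negative means class 0, positive means class 1.
# They are not bounded to [0, 1], so they are not probabilities.
scores = clf.decision_function(X[:5])
labels = clf.predict(X[:5])
print("scores:", scores)
print("labels:", labels)
```

The sign of each score decides the predicted class; the magnitude grows with the distance from the decision boundary.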


### SVM Objective

> Classification: Fit the largest possible street between two or more classes while limiting margin violations.

**How the Predictor Works** <br>
$$w^Tx+b=w_1x_1 + \dots + w_nx_n + b$$
$$\hat{y} = \begin{cases} 0 & \text{if } w^Tx+b<0 \\ 1 & \text{if } w^Tx+b\ge 0 \end{cases}$$
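
The prediction rule can be sketched in a few lines of NumPy. The weights, bias and sample points below are hypothetical values picked for illustration, not parameters learned from data.

```python
import numpy as np

def predict_linear_svm(X, w, b):
    """Return 1 where w^T x + b >= 0, else 0 (one prediction per row of X)."""
    scores = X @ w + b
    return (scores >= 0).astype(int)

w = np.array([1.0, -2.0])      # hypothetical weight vector
b = 0.5                        # hypothetical bias term
X = np.array([[2.0, 1.0],      # 2 - 2   + 0.5 =  0.5
              [0.0, 1.0],      # 0 - 2   + 0.5 = -1.5
              [1.0, 0.75]])    # 1 - 1.5 + 0.5 =  0.0 (on the boundary)

y_hat = predict_linear_svm(X, w, b)
print(y_hat)  # [1 0 1]
```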

**Training Objective** <br>
The slope of the decision function (the steepness of the hyperplane) is equal to the norm of the weight vector $||w||$. The smaller the weight vector, the larger the margin.

Hard Margin Linear SVM Classifier Objective
$$\min_{w,b} \frac{1}{2}w^Tw = \frac{1}{2}||w||^2$$
$$\text{subject to} \quad t^{(i)}(w^Tx^{(i)}+b)\ge 1 \quad \text{for} \quad i=1,2,\dots,m$$
where $t^{(i)}=-1$ for negative samples and $t^{(i)}=1$ for positive ones.

Soft Margin Linear SVM Classifier Objective
$$\min_{w,b,\zeta} \frac{1}{2}w^Tw + C\sum_{i=1}^m\zeta^{(i)}$$
$$\text{subject to} \quad t^{(i)}(w^Tx^{(i)}+b)\ge 1-\zeta^{(i)} \quad \text{and} \quad \zeta^{(i)}\ge0 \quad \text{for} \quad i=1,2,\dots,m$$
where $t^{(i)}=-1$ for negative samples and $t^{(i)}=1$ for positive ones. The slack variable $\zeta^{(i)}\ge0$, added for each instance, measures how much instance $i$ is allowed to violate the margin.
Now we have two conflicting objectives: making the slack variables as small as possible to reduce margin violations, and making $\frac{1}{2}w^Tw$ as small as possible to increase the margin. The hyperparameter $C$ sets the trade-off between the two.
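
As a small worked example of the soft-margin objective, the sketch below evaluates it for a fixed, hypothetical $w$ and $b$. For given $w$ and $b$, the smallest slack that satisfies the constraints is $\max(0, 1 - t(w^Tx+b))$, which is what the code uses. The data points are made up for illustration.

```python
import numpy as np

def soft_margin_objective(w, b, X, t, C):
    """Return (0.5 * w^T w + C * sum(slack), slack) for fixed w and b.

    slack_i = max(0, 1 - t_i * (w^T x_i + b)) is the smallest slack
    that satisfies the soft-margin constraints for instance i.
    """
    margins = t * (X @ w + b)
    slack = np.maximum(0.0, 1.0 - margins)
    return 0.5 * (w @ w) + C * slack.sum(), slack

# Hypothetical 2-D data: two positive and two negative instances
X = np.array([[2.0, 0.0], [0.5, 0.0], [-2.0, 0.0], [0.2, 0.0]])
t = np.array([1.0, 1.0, -1.0, -1.0])
w = np.array([1.0, 0.0])
b = 0.0

obj_c1, slack = soft_margin_objective(w, b, X, t, C=1.0)
obj_c10, _ = soft_margin_objective(w, b, X, t, C=10.0)
print("slack:", slack)            # instances 2 and 4 violate the margin
print("objective, C=1 :", obj_c1)
print("objective, C=10:", obj_c10)
```

With a larger `C` the same margin violations cost more, so an optimizer would be pushed toward a narrower margin with fewer violations.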

We minimize $\frac{1}{2}||w||^2$ rather than $||w||$ because it has a simple derivative (just $w$), whereas $||w||$ is not differentiable at $w=0$.

Both the hard-margin and the soft-margin problems are convex quadratic optimization problems with linear constraints, so both can be solved with Quadratic Programming (QP).
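
To make the constrained-optimization view concrete, here is a minimal sketch that solves the hard-margin problem on a tiny, linearly separable toy set. It assumes SciPy is available and uses the general-purpose `SLSQP` method of `scipy.optimize.minimize` in place of a dedicated QP solver. The six data points are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical, linearly separable toy data
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0],
              [3.0, 3.0], [4.0, 3.0], [3.0, 4.0]])
t = np.array([-1.0, -1.0, -1.0, 1.0, 1.0, 1.0])

def objective(params):
    # params = [w1, w2, b]; only w enters the objective
    w = params[:2]
    return 0.5 * (w @ w)

def margin_constraints(params):
    # Hard-margin constraints t_i (w^T x_i + b) - 1 >= 0, one per instance
    w, b = params[:2], params[2]
    return t * (X @ w + b) - 1.0

result = minimize(objective, x0=np.zeros(3), method="SLSQP",
                  constraints=[{"type": "ineq", "fun": margin_constraints}])
w_opt, b_opt = result.x[:2], result.x[2]
print("w:", w_opt, "b:", b_opt)
print("margin width:", 2.0 / np.linalg.norm(w_opt))
```

For this toy set the optimum can be checked by hand: the constraints are active at $(1,0)$, $(0,1)$ and $(3,3)$, which gives $w=(0.4, 0.4)$ and $b=-1.4$, so the solver should return values close to these.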

> Regression: fit as many instances as possible on the street while limiting margin violations.

Adding more training instances within the margin does not affect the model's predictions; the model is said to be $\epsilon$-insensitive.

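
The $\epsilon$-insensitive idea can be sketched as a loss function: errors smaller than $\epsilon$ cost nothing, which is why instances inside the street do not influence the fit. The function name and the numbers below are illustrative assumptions.

```python
import numpy as np

def epsilon_insensitive_loss(y_true, y_pred, epsilon):
    """Loss is zero inside the epsilon tube and grows linearly outside it."""
    return np.maximum(0.0, np.abs(y_true - y_pred) - epsilon)

y_true = np.array([1.0, 2.0, 3.0, 4.0])
y_pred = np.array([1.25, 2.0, 2.0, 4.5])   # hypothetical model outputs

losses = epsilon_insensitive_loss(y_true, y_pred, epsilon=0.5)
print(losses)  # only the third instance lies outside the tube
```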
### SVM Error

* [Intuitive explanation of SVM Error](https://www.youtube.com/watch?v=0WD7XXKlYS8&list=PLC0PzjY99Q_Xc5IK-UE4FX7Loz1auXylY&index=4)
* [Classification Error](https://www.youtube.com/watch?v=WzmadjVt_P4&list=PLC0PzjY99Q_Xc5IK-UE4FX7Loz1auXylY&index=6)
* [Margin Error](https://www.youtube.com/watch?v=Z1Qy9TfCIUg&list=PLC0PzjY99Q_Xc5IK-UE4FX7Loz1auXylY&index=7)

**Error = C* Classification Error + Margin Error**

* C: as C increases, the focus shifts toward the Classification Error rather than the Margin Error. C multiplies the Classification Error, so a larger C makes the decision boundary more irregular.
If C is close to 0, the focus is on obtaining the widest possible margin, at the cost of some misclassified instances.
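
The effect of C can be observed with a short experiment. This is a sketch that assumes scikit-learn's `SVC` with an RBF kernel on the `make_moons` toy dataset; the three C values are arbitrary choices for illustration.

```python
from sklearn import datasets
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = datasets.make_moons(n_samples=200, noise=0.15, random_state=42)

results = {}
for C in (0.01, 1, 100):
    clf = Pipeline([
        ("scaler", StandardScaler()),
        ("svc", SVC(kernel="rbf", gamma=1, C=C)),
    ])
    clf.fit(X, y)
    n_sv = int(clf.named_steps["svc"].n_support_.sum())  # total support vectors
    train_acc = clf.score(X, y)                          # accuracy on the training set
    results[C] = (train_acc, n_sv)
    print(f"C={C}: training accuracy={train_acc:.3f}, support vectors={n_sv}")
```

A small C tolerates many margin violations, so most instances end up as support vectors; a large C penalizes violations heavily and fits the training set more tightly.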

### Gaussian RBF Kernel
* $\gamma$: a high $\gamma$ (e.g. 5) makes the Gaussian curve narrower, so the decision boundary becomes more irregular. A small $\gamma$ (e.g. 0.1) makes the Gaussian curve wider, which gives a smoother, less irregular decision boundary. So $\gamma$ acts as a regularization parameter: if the model overfits, decrease it (similar behaviour to C).

* Gaussian Normal Distribution
$$y=\frac{1}{\sigma\sqrt{2\pi}}e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$
where $\sigma$ (the standard deviation; $\sigma^2$ is the variance) sets the width of the bell, while $\mu$ (the mean) is the centre of the curve.
As $\sigma$ increases, the bell widens.
We define $\gamma=\frac{1}{2\sigma^2}$, so $\gamma$ is inversely proportional to $\sigma^2$.

* General formula of the radial basis function
$$\phi(x,l)=e^{-\gamma||x-l||^2}$$
where $l$ is the chosen point (landmark).
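
The formulas above can be checked numerically. The sketch below implements $\phi(x,l)$ in NumPy, compares it with scikit-learn's `rbf_kernel`, and verifies that $\gamma=\frac{1}{2\sigma^2}$ reproduces the exponential part of the Gaussian. The points and the landmark are hypothetical.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

def rbf(X, landmark, gamma):
    """phi(x, l) = exp(-gamma * ||x - l||^2), evaluated for each row of X."""
    return np.exp(-gamma * np.sum((X - landmark) ** 2, axis=-1))

sigma = 0.5
gamma = 1.0 / (2.0 * sigma ** 2)          # sigma = 0.5 gives gamma = 2.0

X = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 2.0]])
landmark = np.array([1.0, 0.0])           # hypothetical landmark

phi = rbf(X, landmark, gamma)
phi_sklearn = rbf_kernel(X, landmark.reshape(1, -1), gamma=gamma).ravel()

# Exponential part of the Gaussian written in terms of sigma
gaussian_shape = np.exp(-np.sum((X - landmark) ** 2, axis=1) / (2.0 * sigma ** 2))

print(phi)  # equals 1 at the landmark and decays with distance
```

The similarity is exactly 1 at the landmark itself and falls off faster for larger $\gamma$ (smaller $\sigma$).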

[More in depth](https://en.wikipedia.org/wiki/Radial_basis_function_kernel)

More detail: [video Rbf Kernel part3 $\gamma$ parameters](https://www.youtube.com/watch?v=wuKlhMDxtN0&list=PLC0PzjY99Q_Xc5IK-UE4FX7Loz1auXylY&index=15)

### Tips
* It is always worth trying a linear kernel first, using `LinearSVC`.
With sklearn, prefer `LinearSVC` over `SVC(kernel="linear")`, which is slower on large datasets.
* As a second test, try the Gaussian RBF kernel. It handles nonlinearity and has few parameters to tune.
* The polynomial kernel is the most computationally expensive, and it is also easy to overfit with it.
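
The workflow in these tips can be sketched as a quick comparison. This example assumes the `make_moons` toy dataset and 5-fold cross-validation; the hyperparameter values are illustrative and not tuned.

```python
from sklearn import datasets
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC, SVC

X, y = datasets.make_moons(n_samples=200, noise=0.15, random_state=42)

candidates = {
    "linear": LinearSVC(C=1, max_iter=10000, random_state=42),  # first attempt
    "rbf": SVC(kernel="rbf", gamma=1, C=100),                   # second attempt
}

scores = {}
for name, model in candidates.items():
    pipe = Pipeline([("scaler", StandardScaler()), ("svm", model)])
    # Mean accuracy over 5 cross-validation folds
    scores[name] = cross_val_score(pipe, X, y, cv=5).mean()
    print(f"{name}: mean CV accuracy = {scores[name]:.3f}")
```

On this dataset the classes are not linearly separable, so the RBF kernel is expected to score higher than the linear model; on data that is close to linearly separable the simpler model may be enough.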

# Reference
* [Support Vector Machine Playlist](https://www.youtube.com/playlist?list=PLC0PzjY99Q_Xc5IK-UE4FX7Loz1auXylY)
* [Polynomial Kernel](https://www.youtube.com/watch?v=8nBzCfbra8s&list=PLC0PzjY99Q_Xc5IK-UE4FX7Loz1auXylY&index=10)
* [Rbf Kernel part1 Gaussian RBF Kernel](https://www.youtube.com/watch?v=Z2_yh2sice8&list=PLC0PzjY99Q_Xc5IK-UE4FX7Loz1auXylY&index=13)
* [Rbf Kernel part2 High dimensional Explanation](https://www.youtube.com/watch?v=_3kyycL-Ehs&list=PLC0PzjY99Q_Xc5IK-UE4FX7Loz1auXylY&index=14)
* [Rbf Kernel part3 $\gamma$ parameters](https://www.youtube.com/watch?v=wuKlhMDxtN0&list=PLC0PzjY99Q_Xc5IK-UE4FX7Loz1auXylY&index=15)

## Linear SVM Classification

In [None]:
import numpy as np
from sklearn import datasets
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC,SVC

iris = datasets.load_iris()
X = iris["data"][:,(2,3)] # petal length, petal width
y = (iris["target"]==2).astype(np.float64) # Iris-Virginica

clf = Pipeline([
  ("scaler", StandardScaler()),
  ("linear_svc", LinearSVC(C=1, loss="hinge")) # Fast, but very large datasets might not fit in memory
  #("linear_svc", SVC(kernel="linear", C=1)) # Slower with large datasets; SVC always uses hinge loss and has no loss parameter
])
clf.fit(X,y)

clf.predict([[5.5,1.7]])


array([1.])
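
To connect this cell with the predictor formula, the sketch below refits the same kind of model and reproduces its prediction by hand from the learned weights. It keeps the scaler and the classifier as separate objects for readability. `dual=True`, `max_iter` and `random_state` are set explicitly here as assumptions to keep the run stable across scikit-learn versions; they are not part of the original cell.

```python
import numpy as np
from sklearn import datasets
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

iris = datasets.load_iris()
X = iris["data"][:, (2, 3)]             # petal length, petal width
y = (iris["target"] == 2).astype(int)   # 1 = Iris-Virginica

scaler = StandardScaler().fit(X)
svm = LinearSVC(C=1, loss="hinge", dual=True, max_iter=10000, random_state=42)
svm.fit(scaler.transform(X), y)

w = svm.coef_[0]          # learned weight vector (in scaled feature space)
b = svm.intercept_[0]     # learned bias term

x_new = scaler.transform([[5.5, 1.7]])
score = x_new @ w + b                    # w^T x + b
manual_pred = (score >= 0).astype(int)   # threshold at zero

print("score:", score, "prediction:", manual_pred)
```

The manual score matches `svm.decision_function(x_new)`, which shows that the fitted model is just the linear rule $w^Tx+b$ applied to standardized features.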

## Non Linear SVM Classification

In [None]:
from sklearn import datasets
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC,SVC
import plotly.graph_objects as go

#--- Load Data
X, y = datasets.make_moons(n_samples=200, noise=0.15, random_state=42)
#print(X)
#print(y)

#--- Display Input data
# Figure reference https://plotly.com/python/reference/
fig = go.Figure()
symbol_vec = []
color_vec = []
name_vec = []
for yval in y:
  if yval==0:
    symbol_vec.append("triangle-left-dot")
    color_vec.append("red")
    name_vec.append("dog")
  else:
    symbol_vec.append("diamond-open")
    color_vec.append("blue")
    name_vec.append("cat")
fig.add_trace(go.Scatter(x=X[:,0],y=X[:,1],mode='markers',marker_symbol=symbol_vec, marker_color=color_vec,marker_size=10,
                          hovertemplate = 'X1: %{x:.2f}<br>'+ 'X2: %{y:.2f}<br>'+ 'Label: %{text}',name = '',text = name_vec))
fig.update_layout(hovermode="closest",title_text="Inputs",xaxis_title_text="x1",yaxis_title_text="x2")
fig.show()

# ---- Create a Classifier
clf = Pipeline([
  ("scaler", StandardScaler()),
  #("linear_svc", LinearSVC(C=1, loss="hinge"))
  #("svc", SVC(kernel="poly",degree=3,C=0.50, coef0=0.5)) # Kernel Trick
  ("svc", SVC(kernel="rbf",gamma=1,C=100)) # Kernel trick; also try gamma=5, gamma=0.1, C=1000, C=0.001
])
clf.fit(X,y)
y_pred = clf.predict(X)
print("Y_pred: ", y_pred)
print("Y     : ", y)
# Plot decision boundaries
#fig = go.Figure()
#fig.add_trace(go.Heatmap(x=X[:,0], y=X[:,1],z=y_pred,colorscale='Viridis',showscale=False))
#fig.show()
# --- Mesh plot: evaluate the classifier on a 150x150 grid covering the data
xx1 = np.linspace(X[:, 0].min()-0.3,X[:, 0].max()+0.3,num=150)
xx2 = np.linspace(X[:, 1].min()-0.3,X[:, 1].max()+0.3,num=150)
xv1, xv2 = np.meshgrid(xx1, xx2)
# Flatten the grid into an (n_points, 2) array of (x1, x2) pairs
X_fake = np.column_stack((xv1.ravel(), xv2.ravel()))
y_pred = clf.predict(X_fake)
fig = go.Figure()
fig.add_trace(go.Scatter(x=X[:,0],y=X[:,1],mode='markers',marker_symbol=symbol_vec, marker_color=color_vec,marker_size=10,
                          hovertemplate = 'X1: %{x:.2f}<br>'+ 'X2: %{y:.2f}<br>'+ 'Label: %{text}',text = name_vec))
fig.add_trace(go.Heatmap(x=X_fake[:,0], y=X_fake[:,1],z=y_pred,colorscale='Viridis',showscale=False))
fig.update_layout(hovermode="closest",title_text="Results",xaxis_title_text="x1",yaxis_title_text="x2")
fig.show()

Y_pred:  [0 0 0 1 1 1 0 1 1 0 0 1 1 0 1 1 0 1 0 0 1 0 1 0 0 0 0 0 1 1 0 0 0 0 1 1 1
 1 1 0 0 1 0 0 1 1 0 0 1 0 1 1 0 1 1 0 1 0 1 0 1 0 0 0 0 1 0 1 0 1 0 1 1 0
 1 1 0 1 0 1 0 0 0 1 1 0 1 0 1 0 0 1 1 0 0 1 1 0 0 1 0 0 1 1 1 1 0 0 1 1 1
 0 1 1 0 1 0 0 1 0 0 1 1 1 0 0 1 1 0 0 1 0 0 1 0 1 0 1 0 1 1 1 1 0 0 0 0 1
 0 0 1 1 0 0 0 0 1 1 0 0 1 1 1 1 0 0 1 0 1 0 0 0 0 0 1 1 1 0 1 1 0 0 1 1 1
 1 0 1 0 0 1 1 0 1 0 1 0 0 1 1]
Y     :  [0 0 0 1 1 1 0 1 1 0 0 1 1 0 1 1 0 1 0 0 1 0 1 0 0 0 0 0 1 1 0 0 0 0 1 1 1
 1 1 0 0 1 0 0 1 1 0 0 1 0 1 1 0 1 1 0 1 0 1 0 1 0 0 0 0 1 0 1 0 1 0 1 1 0
 1 1 0 1 1 1 0 0 0 1 1 0 1 0 1 0 0 1 1 0 0 1 1 0 0 1 0 0 1 1 1 1 0 0 1 1 1
 0 1 1 0 1 0 0 1 0 0 1 1 1 0 0 1 1 0 0 1 0 0 1 0 1 0 1 0 1 1 1 1 0 0 0 0 1
 0 0 1 1 0 0 0 0 1 1 0 0 1 1 1 1 0 0 1 0 1 0 0 0 0 0 1 1 1 0 1 1 0 0 1 1 1
 1 0 1 0 0 1 1 0 1 0 1 0 0 1 1]
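
Comparing the two printed arrays by eye is error-prone, so a short check with scikit-learn's metrics can summarize them. The sketch below refits the same RBF pipeline so that it is self-contained; note that this is accuracy on the training set, which is an optimistic estimate.

```python
import numpy as np
from sklearn import datasets
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = datasets.make_moons(n_samples=200, noise=0.15, random_state=42)

clf = Pipeline([
    ("scaler", StandardScaler()),
    ("svc", SVC(kernel="rbf", gamma=1, C=100)),
])
clf.fit(X, y)
y_pred = clf.predict(X)

acc = accuracy_score(y, y_pred)
cm = confusion_matrix(y, y_pred)        # rows: true class, columns: predicted class
n_wrong = int(np.sum(y_pred != y))      # number of misclassified training points

print("training accuracy:", acc)
print("confusion matrix:\n", cm)
print("misclassified:", n_wrong)
```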


## Radial Basis Function Visualization (Kernel Trick)

In [None]:
import plotly.graph_objects as go
import numpy as np

l = [-3,-2.3,-2, 3,3.6,3.9, 6,6.2] # landmarks (inputs)
x = np.linspace(-8,8,num=100)
sigma=0.5
# Gaussian density: 1/(sigma*sqrt(2*pi)) * exp(-(x-mu)^2 / (2*sigma^2))
norm = 1/(sigma*np.sqrt(2*np.pi))
mu1 = -2.3
y1 = norm * np.exp( -( (x - mu1)**2 ) / (2*sigma**2) )
mu2 = 3
y2 = norm * np.exp( -( (x - mu2)**2 ) / (2*sigma**2) )
mu3 = 6
y3 = norm * np.exp( -( (x - mu3)**2 ) / (2*sigma**2) )

fig = go.Figure()
fig.add_trace(go.Scatter(x=x,y=y1,name="gaussian kernel 1"))
fig.add_trace(go.Scatter(x=x,y=y2,name="gaussian kernel 2"))
fig.add_trace(go.Scatter(x=x,y=y3,name="gaussian kernel 3"))

fig.add_trace(go.Scatter(x=x,y=0.1*np.ones(len(x)),name="Threshold line"))
fig.add_trace(go.Scatter(x=l, y=np.zeros(len(l)),mode="markers",name="landmark"))

fig.show()
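
The plot shows Gaussians centred on three landmarks. As a sketch of how such curves turn into features, the code below maps 1-D points to their RBF similarity with each landmark. It uses the unnormalized form $e^{-\gamma(x-l)^2}$, which peaks at 1, instead of the normalized density plotted in the cell; the helper name `rbf_features` and the sample points are assumptions for illustration.

```python
import numpy as np

landmarks = np.array([-2.3, 3.0, 6.0])   # the three centres used in the plot
sigma = 0.5
gamma = 1.0 / (2.0 * sigma ** 2)

def rbf_features(x, landmarks, gamma):
    """Map each 1-D point to its similarity with every landmark.

    Returns an array of shape (n_points, n_landmarks).
    """
    x = np.asarray(x, dtype=float).reshape(-1, 1)
    return np.exp(-gamma * (x - landmarks.reshape(1, -1)) ** 2)

points = np.array([-3.0, -2.3, 3.6, 6.2])
features = rbf_features(points, landmarks, gamma)

# The landmark with the highest similarity is the nearest one
closest = landmarks[features.argmax(axis=1)]
print(features.round(3))
print(closest)
```

Each original point becomes a vector of similarities, and a linear classifier in that new space corresponds to a nonlinear one in the original space.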