[Support vector machine (SVM)](https://en.wikipedia.org/wiki/Support_vector_machine)
---
- a supervised max-margin model
- supports linear and nonlinear classification, regression and outlier detection
  - nonlinearity is achieved by kernel functions
- suitable for small to medium-sized nonlinear datasets
  - does not scale well to very large datasets
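
As a minimal sketch of how a kernel adds nonlinearity, the snippet below fits the same `SVC` model with a linear and an RBF kernel on scikit-learn's synthetic `make_circles` data. The dataset, its parameters, and the variable names are illustrative assumptions and are not used elsewhere in this notebook.

```python
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Assumed toy data: two concentric rings, which no straight line can separate
X_demo, y_demo = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

# Same model class, only the kernel differs
linear_acc = SVC(kernel='linear').fit(X_demo, y_demo).score(X_demo, y_demo)
rbf_acc = SVC(kernel='rbf').fit(X_demo, y_demo).score(X_demo, y_demo)

print(f'linear kernel training accuracy: {linear_acc:.2f}')
print(f'RBF kernel training accuracy:    {rbf_acc:.2f}')
```

The linear kernel should stay well below perfect accuracy on this data, while the RBF kernel should separate the rings almost completely.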

In [None]:
import numpy as np, pandas as pd, matplotlib.pyplot as plt, matplotlib as mpl
import sklearn as skl, sklearn.datasets as skds
from sklearn.svm import SVC

Linear SVM classification of [the iris flower dataset](https://en.wikipedia.org/wiki/Iris_flower_data_set)
---
- consists of 3 different types of irises 
  - Setosa, Versicolour, and Virginica
  - 150 samples with 4 features
    - Sepal Length, Sepal Width, Petal Length and Petal Width

In [None]:
iris = skds.load_iris(as_frame=True)
print(iris.DESCR)

In [None]:
iris.data.head(2)

In [None]:
iris.data.min(axis=0), iris.data.max(axis=0)

In [None]:
np.unique(iris.target.values), iris.target_names

In [None]:

X = iris.data[['sepal length (cm)', 'sepal width (cm)']].values
y = iris.target
setosa_or_versicolor = (y==0) | (y==1)
X = X[setosa_or_versicolor]
y = y[setosa_or_versicolor]

# SVM classifier with a linear kernel; the very large C approximates a hard margin
lsvm_clf = SVC(kernel='linear', C=1e100)
lsvm_clf.fit(X,y)

In [None]:
lsvm_clf.support_vectors_

In [None]:
# the plot below uses the first two features
_,ax1 = plt.subplots()
# iris.data.plot(ax=ax1, kind='scatter', x='sepal length (cm)', y='sepal width (cm)', c=iris.target)
iris_scatter = ax1.scatter(iris.data['sepal length (cm)'], iris.data['sepal width (cm)'], c=iris.target)

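def plot_svc_street(ax, svm_clf, xmin, xmax):
  # decision line: w0*x0 + w1*x1 + b = 0  =>  x1 = -w0/w1*x0 - b/w1
  # (assumes a linear kernel, so coef_ exists, and w1 != 0)
  w, b = svm_clf.coef_[0], svm_clf.intercept_[0]
  x = np.linspace(xmin, xmax, 200)
  decision_line = -w[0]/w[1]*x - b/w[1]
  # street edges: w0*x0 + w1*x1 + b = +/-1, i.e. the decision line shifted
  # vertically by 1/w1 (the perpendicular street width is 2/||w||)
  margin = 1/w[1]
  street_up, street_down = decision_line + margin, decision_line - margin
  svs = svm_clf.support_vectors_
  ax.plot(x, decision_line, 'b-', linewidth=2)   # decision boundary
  ax.plot(x, street_up, 'r:', linewidth=2)       # street edges
  ax.plot(x, street_down, 'r:', linewidth=2)
  # highlight the support vectors behind the data points
  ax.scatter(svs[:,0], svs[:,1], s=100, facecolor='#ff0000', zorder=-1)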

plot_svc_street(ax1, lsvm_clf, 4.3, 6.2)

x0 = np.linspace(4.3, 6.2, 200)
bad0 = 7/9*(x0-4.5)+2.3
ax1.plot(x0, bad0, 'c--', linewidth=2)

ax1.set(xlabel=iris.feature_names[0], ylabel=iris.feature_names[1])
_ = ax1.legend(iris_scatter.legend_elements()[0],
               iris.target_names,
               loc='lower right',
               title='Classes')

- with the first two features,
  - setosa is linearly separable from versicolor
  - however, versicolor and virginica are not separable
- both the SVM decision boundary (the solid blue line) and the dashed cyan line (the decision boundary of another, poorer classifier) separate setosa from versicolor
  - the SVM blue line has the widest possible margin (the "street")
    - marked with the two dotted red lines
      - fully determined by the samples located on the edge of the street
      - these samples are called *support vectors*
    - so SVM classification is also called *large margin classification*
    - instances off the street do not affect the decision boundary
  - the cyan line has no margin: it runs right up against training samples of both classes
    - so it may not perform well on new instances
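
The claims above can be checked numerically. The following is a minimal sketch, assuming the same two sepal features and a large `C` to approximate a hard margin (`C=1e6` is an assumed value; the notebook uses `1e100`). The names `X_demo`, `clf`, and `clf_sv` are introduced here for illustration only.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.svm import SVC

iris_demo = load_iris()
mask = iris_demo.target < 2              # keep setosa (0) and versicolor (1)
X_demo = iris_demo.data[mask][:, :2]     # sepal length, sepal width
y_demo = iris_demo.target[mask]

# Large C approximates a hard margin (assumed value)
clf = SVC(kernel='linear', C=1e6).fit(X_demo, y_demo)
w, b = clf.coef_[0], clf.intercept_[0]

# Width of the street: distance between the lines w.x + b = +1 and w.x + b = -1
street_width = 2 / np.linalg.norm(w)

# Support vectors sit on the street edges, where |w.x + b| is close to 1
sv_scores = clf.decision_function(clf.support_vectors_)

# Refit on the support vectors alone: the boundary should barely move
sv_idx = clf.support_
clf_sv = SVC(kernel='linear', C=1e6).fit(X_demo[sv_idx], y_demo[sv_idx])

print('street width:', street_width)
print('|scores| at support vectors:', np.abs(sv_scores))
print('w (all samples):', w, ' w (support vectors only):', clf_sv.coef_[0])
```

If the two weight vectors agree up to solver tolerance, the samples off the street indeed had no influence on the fitted boundary.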

SVMs are sensitive to the feature scales
---
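- when one feature has a much larger scale than the other, distances along it dominate, so the widest possible street runs nearly parallel to the axis of the smaller-scale feature
  - standardizing the features (e.g. with `StandardScaler`) gives a more balanced decision boundary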

In [None]:
from sklearn.preprocessing import StandardScaler

Xs = np.array([[1, 52], [2,30], [5, 21], [3, 83], [4,75], [5, 60]]).astype(np.float64)
ys = np.array([0, 0, 0, 1, 1, 1])
svm_clf = SVC(kernel="linear", C=100).fit(Xs, ys)

scaler = StandardScaler()
X_scaled = scaler.fit_transform(Xs)
svm_clf_scaled = SVC(kernel='linear', C=100).fit(X_scaled, ys)

fig, axes = plt.subplots(1, 2, figsize=(8,3))
axes[0].plot(Xs[:, 0][ys==1], Xs[:, 1][ys==1], "bo")
axes[0].plot(Xs[:, 0][ys==0], Xs[:, 1][ys==0], "ms")
plot_svc_street(axes[0], svm_clf, 0.8, 5.4)

axes[1].plot(X_scaled[:, 0][ys==1], X_scaled[:, 1][ys==1], "bo")
axes[1].plot(X_scaled[:, 0][ys==0], X_scaled[:, 1][ys==0], "ms")
axes[1].axis([-2,2,-2,2])
plot_svc_street(axes[1], svm_clf_scaled, -1.8, 1.8)
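
In practice the scaler and the classifier are often chained so that new data is scaled the same way as the training data. The following is a minimal sketch using scikit-learn's `make_pipeline` on the same six toy points; the names `scaled_svm` and `X_new`, and the values in `X_new`, are made up for illustration.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Same toy data as in the cell above
Xs_demo = np.array([[1, 52], [2, 30], [5, 21], [3, 83], [4, 75], [5, 60]], dtype=np.float64)
ys_demo = np.array([0, 0, 0, 1, 1, 1])

# The scaler is fitted on the training data and re-applied automatically at predict time
scaled_svm = make_pipeline(StandardScaler(), SVC(kernel='linear', C=100))
scaled_svm.fit(Xs_demo, ys_demo)

# New points are given in the original (unscaled) units; the values are hypothetical
X_new = np.array([[2, 40], [4, 80]], dtype=np.float64)
print(scaled_svm.predict(X_new))
```

Keeping the scaler inside the pipeline avoids accidentally passing unscaled inputs to a model that was trained on scaled features.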