<h2>Support Vector Machines</h2>
<img src="../images/svm.png">
<p>The Support Vector Machine (SVM) algorithm uses a linear decision boundary, a hyperplane, to separate the data into two classes. The best hyperplane is the one with the largest margin.</p>
<p>The idea is to choose the hyperplane (red line) that maximizes the distance to the nearest data point of either class.</p>
<p>The hyperplane follows the linear model:</p>
<h3>Linear Model</h3>

\begin{align}
w \cdot x - b = 0
\end{align}
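
As a minimal sketch of the linear model above, the snippet below evaluates $w \cdot x - b$ for a few points. The sign of the result tells us which side of the hyperplane a point lies on, and a value of zero means the point is on the hyperplane itself. The weights, bias, and points are made-up values for illustration, and `decision_function` is a hypothetical helper name.

```python
import numpy as np

# Hypothetical weights and bias, chosen only for illustration.
w = np.array([1.0, -1.0])
b = 0.5


def decision_function(X, w, b):
    """Return w . x - b for every row x of X."""
    return np.dot(X, w) - b


X_demo = np.array([
    [2.0, 0.0],  # positive side of the hyperplane
    [0.0, 2.0],  # negative side of the hyperplane
    [1.0, 0.5],  # exactly on the hyperplane
])

scores = decision_function(X_demo, w, b)
print(scores)           # values of w . x - b
print(np.sign(scores))  # which side each point falls on
```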

<p>In addition, each sample must satisfy the constraint for its class:</p>


\begin{align}
w \cdot x_{i} - b \geq \enspace 1 \quad \text{if } y_{i} = 1
\end{align}
\begin{align}
w \cdot x_{i} - b \leq -1 \quad \text{if } y_{i} = -1
\end{align}


<p>Multiplying each inequality by its label $y_{i}$ combines both cases into a single constraint:</p>

\begin{align}
y_{i} (w \cdot x_{i} - b) \geq 1
\end{align}
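
The combined constraint can be checked directly. The sketch below uses made-up points, labels, and parameters (none of them come from the text) and tests $y_{i} (w \cdot x_{i} - b) \geq 1$ for each sample. It also computes $2 / \lVert w \rVert$, the distance between the two margin lines $w \cdot x - b = \pm 1$, which is the quantity the SVM tries to make as large as possible.

```python
import numpy as np

# Hypothetical parameters and data, for illustration only.
w = np.array([1.0, 1.0])
b = 0.0
X_demo = np.array([[1.0, 1.0], [2.0, 0.5], [-1.0, -1.0], [0.2, 0.3]])
y_demo = np.array([1, 1, -1, 1])

# y_i * (w . x_i - b) for every sample
margins = y_demo * (np.dot(X_demo, w) - b)

# True where the sample is on the correct side and outside the margin
satisfied = margins >= 1

# Distance between the lines w . x - b = 1 and w . x - b = -1
margin_width = 2 / np.linalg.norm(w)

print(margins)
print(satisfied)
print(margin_width)
```

The last sample is classified correctly (its margin value is positive) but lies inside the margin, so it violates the constraint.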

<p>We find the optimal weights $w$ and bias $b$ by minimizing a cost function.</p>

<h3>Cost Function</h3>

<p>The loss function we'll use is the hinge loss:</p>
<h4>Hinge Loss</h4>
\begin{align}
l = \max(0, 1 - y_{i}(w \cdot x_{i} - b))
\end{align}

<p>It can be illustrated as:</p>
<img src="../images/hinge.png">

<p>where, writing $f(x) = w \cdot x - b$:</p>
\begin{align}
l =\begin{cases}
    0 & \text{if }y \cdot f(x) \geq 1\\
    1 - y \cdot f(x) & \text{otherwise}.
  \end{cases}
\end{align}
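
A minimal sketch of the hinge loss in NumPy follows. The function name `hinge_loss` and the sample values are assumptions made for illustration; `fx` stands for the model output $w \cdot x - b$.

```python
import numpy as np


def hinge_loss(y, fx):
    """Element-wise hinge loss max(0, 1 - y * f(x))."""
    return np.maximum(0.0, 1.0 - y * fx)


# Hypothetical labels and model outputs.
y_demo = np.array([1, 1, -1, -1])
fx_demo = np.array([2.0, 0.5, -3.0, 0.25])

losses = hinge_loss(y_demo, fx_demo)
print(losses)
```

Samples with $y \cdot f(x) \geq 1$ (the first and third) contribute zero loss. The second is on the correct side but inside the margin, and the fourth is on the wrong side, so both are penalized linearly.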

<h3>Add Regularization</h3>
\begin{align}
J = \lambda {\lVert w \rVert}^{2} + \frac{1}{n} \sum_{i=1}^{n} \max(0, 1 - y_{i}(w \cdot x_{i} - b))
\end{align}

For a single sample, if $y_{i} \cdot f(x_{i}) \geq 1$:
\begin{align}
J_{i} = \lambda {\lVert w \rVert}^{2}
\end{align}

else:
\begin{align}
J_{i} = \lambda {\lVert w \rVert}^{2} + 1 - y_{i}(w \cdot x_{i} - b)
\end{align}
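
The regularized cost can be sketched as below. The helper names `per_sample_cost` and `total_cost`, along with the data and parameter values, are assumptions made for illustration. The example also shows that averaging the per-sample costs $J_{i}$ recovers the total cost $J$.

```python
import numpy as np


def per_sample_cost(w, b, X, y, lambda_param):
    """J_i for every sample: lambda * ||w||^2 plus that sample's hinge loss."""
    margins = y * (np.dot(X, w) - b)
    hinge = np.maximum(0.0, 1.0 - margins)
    return lambda_param * np.dot(w, w) + hinge


def total_cost(w, b, X, y, lambda_param):
    """J: lambda * ||w||^2 plus the mean hinge loss over all samples."""
    margins = y * (np.dot(X, w) - b)
    hinge = np.maximum(0.0, 1.0 - margins)
    return lambda_param * np.dot(w, w) + np.mean(hinge)


# Hypothetical parameters and data.
w = np.array([1.0, 1.0])
b = 0.0
X_demo = np.array([[1.0, 1.0], [2.0, 0.5], [-1.0, -1.0], [0.2, 0.3]])
y_demo = np.array([1, 1, -1, 1])

J_i = per_sample_cost(w, b, X_demo, y_demo, lambda_param=0.01)
J = total_cost(w, b, X_demo, y_demo, lambda_param=0.01)
print(J_i)  # only the last sample pays a hinge penalty
print(J)    # equals the mean of J_i
```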

<h3>Gradients</h3>
if $y_{i} \cdot f(x_{i}) \geq 1$:
\begin{align}
\frac{dJ_{i}}{dw_{k}} = 2 \lambda w_{k}
\end{align}
\begin{align}
\frac{dJ_{i}}{db} = 0
\end{align}

else:
\begin{align}
\frac{dJ_{i}}{dw_{k}} = 2 \lambda w_{k} - y_{i} \cdot x_{ik}
\end{align}
\begin{align}
\frac{dJ_{i}}{db} = y_{i}
\end{align}
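
One way to sanity-check these gradients is to compare them against a finite-difference approximation of $J_{i}$. The sketch below does this for a single made-up sample that violates the margin. The function names and all numeric values are assumptions for illustration, and the check is only meaningful away from the kink of the hinge loss at $y_{i} \cdot f(x_{i}) = 1$.

```python
import numpy as np


def sample_cost(w, b, x_i, y_i, lambda_param):
    """J_i = lambda * ||w||^2 + max(0, 1 - y_i * (w . x_i - b))."""
    return lambda_param * np.dot(w, w) + max(0.0, 1.0 - y_i * (np.dot(x_i, w) - b))


def sample_gradients(w, b, x_i, y_i, lambda_param):
    """Analytic gradients of J_i with respect to w and b."""
    if y_i * (np.dot(x_i, w) - b) >= 1:
        return 2 * lambda_param * w, 0.0
    return 2 * lambda_param * w - y_i * x_i, float(y_i)


# Hypothetical parameters and one sample with y_i * f(x_i) = -0.1 < 1.
w = np.array([0.5, -0.25])
b = 0.1
x_i = np.array([1.0, 2.0])
y_i = 1
lambda_param = 0.01

dw, db = sample_gradients(w, b, x_i, y_i, lambda_param)

# Central finite differences for comparison.
eps = 1e-6
num_dw = np.zeros_like(w)
for k in range(len(w)):
    step = np.zeros_like(w)
    step[k] = eps
    num_dw[k] = (sample_cost(w + step, b, x_i, y_i, lambda_param)
                 - sample_cost(w - step, b, x_i, y_i, lambda_param)) / (2 * eps)
num_db = (sample_cost(w, b + eps, x_i, y_i, lambda_param)
          - sample_cost(w, b - eps, x_i, y_i, lambda_param)) / (2 * eps)

print(dw, num_dw)
print(db, num_db)
```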

<h3>Update Rule</h3>
For each training sample $x_{i}$, with learning rate $\alpha$ and the gradients $dw = \frac{dJ_{i}}{dw}$ and $db = \frac{dJ_{i}}{db}$:
\begin{align}
w = w - \alpha \cdot dw
\end{align}
\begin{align}
b = b - \alpha \cdot db
\end{align}
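
To make the update rule concrete, here is a sketch of a single update step on one made-up sample, starting from zero weights. The learning rate, regularization strength, and sample are hypothetical values, and the learning rate is deliberately large so the effect is visible after one step.

```python
import numpy as np

# Hypothetical starting point and hyperparameters.
w = np.zeros(2)
b = 0.0
alpha = 0.1          # learning rate
lambda_param = 0.01  # regularization strength

# One training sample with label +1.
x_i = np.array([1.0, 2.0])
y_i = 1

margin_before = y_i * (np.dot(x_i, w) - b)

# Pick the gradient branch according to the margin condition.
if margin_before >= 1:
    dw = 2 * lambda_param * w
    db = 0.0
else:
    dw = 2 * lambda_param * w - y_i * x_i
    db = float(y_i)

# Gradient descent step.
w = w - alpha * dw
b = b - alpha * db

margin_after = y_i * (np.dot(x_i, w) - b)
print(w, b)
print(margin_before, margin_after)
```

The sample starts inside the margin, so the update moves $w$ toward $y_{i} x_{i}$ and shifts $b$, which increases $y_{i} (w \cdot x_{i} - b)$ for that sample.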

In [None]:
import numpy as np 


class SVM:

    def __init__(self, learning_rate=0.001, lambda_param=0.01, n_iters=1000):
        self.lr = learning_rate
        self.lambda_param = lambda_param
        self.n_iters = n_iters
        self.w = None
        self.b = None


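    def fit(self, X, y):
        n_samples, n_features = X.shape

        # Map labels to {-1, 1}: anything <= 0 becomes -1
        y_ = np.where(y <= 0, -1, 1)

        # Start from zero weights and zero bias
        self.w = np.zeros(n_features)
        self.b = 0

        for _ in range(self.n_iters):
            for idx, x_i in enumerate(X):
                # Margin condition: y_i * (w . x_i - b) >= 1
                condition = y_[idx] * (np.dot(x_i, self.w) - self.b) >= 1
                if condition:
                    # Only the regularization term contributes to the gradient
                    self.w -= self.lr * (2 * self.lambda_param * self.w)
                else:
                    # Regularization term plus the hinge-loss gradient
                    self.w -= self.lr * (2 * self.lambda_param * self.w - y_[idx] * x_i)
                    self.b -= self.lr * y_[idx]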


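    def predict(self, X):
        # The sign of w . x - b gives the predicted class (-1 or 1);
        # a point exactly on the hyperplane yields 0
        approx = np.dot(X, self.w) - self.b
        return np.sign(approx)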

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets

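# Assumes the SVM class above is saved as svm.py;
# skip this import if the class is defined in the same notebook
from svm import SVM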

X, y =  datasets.make_blobs(n_samples=50, n_features=2, centers=2, cluster_std=1.05, random_state=40)
y = np.where(y == 0, -1, 1)

clf = SVM()
clf.fit(X, y)
#predictions = clf.predict(X)
 
print(clf.w, clf.b)

def visualize_svm():
    def get_hyperplane_value(x, w, b, offset):
        return (-w[0] * x + b + offset) / w[1]

    fig = plt.figure()
    ax = fig.add_subplot(1,1,1)
    plt.scatter(X[:,0], X[:,1], marker='o',c=y)

    x0_1 = np.amin(X[:,0])
    x0_2 = np.amax(X[:,0])

    x1_1 = get_hyperplane_value(x0_1, clf.w, clf.b, 0)
    x1_2 = get_hyperplane_value(x0_2, clf.w, clf.b, 0)

    x1_1_m = get_hyperplane_value(x0_1, clf.w, clf.b, -1)
    x1_2_m = get_hyperplane_value(x0_2, clf.w, clf.b, -1)

    x1_1_p = get_hyperplane_value(x0_1, clf.w, clf.b, 1)
    x1_2_p = get_hyperplane_value(x0_2, clf.w, clf.b, 1)

    ax.plot([x0_1, x0_2],[x1_1, x1_2], 'y--')
    ax.plot([x0_1, x0_2],[x1_1_m, x1_2_m], 'k')
    ax.plot([x0_1, x0_2],[x1_1_p, x1_2_p], 'k')

    x1_min = np.amin(X[:,1])
    x1_max = np.amax(X[:,1])
    ax.set_ylim([x1_min-3,x1_max+3])

    plt.show()

visualize_svm()