SVM

Please check the .ipynb files instead of the README files.

SVM code from scratch: Classification

Prerequisite: linearly separable data.

The basic idea is the max-margin classifier: we look for the widest "road" (margin) separating class 1 from class 2.

\begin{align}
\max_{w,b}\;& \text{margin}(w,b) \\
s.t.\;& \left\{\begin{matrix} w^Tx_i + b > 0, & y_i = +1 \\ w^Tx_i + b < 0, & y_i = -1 \end{matrix}\right. \\
\Rightarrow\;& y_i(w^{T}x_{i}+b) > 0,\quad \forall i = 1,2,...,N
\end{align}
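As a quick illustration of this condition, the sketch below (toy data and a hypothetical $w$, $b$; none of these values come from the repository) checks that $y_i(w^Tx_i+b)>0$ holds for every point:

```python
import numpy as np

# Toy, hypothetical data and hyperplane just to illustrate the condition.
X = np.array([[2.0, 3.0], [1.0, 1.5], [-1.0, -2.0], [-2.0, -1.0]])  # features
y = np.array([1, 1, -1, -1])                                        # labels in {+1, -1}
w = np.array([1.0, 1.0])                                            # candidate normal vector
b = 0.0                                                              # candidate intercept

margins = y * (X @ w + b)    # functional margins y_i (w^T x_i + b)
print(np.all(margins > 0))   # True -> (w, b) separates the two classes
```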

  1. For the hard margin:

    • We set the margin equal to 1, so the constraint becomes $y_i(w^Tx_i+b)\geqslant1$.
  2. For the soft margin:

    • We allow some noise, which increases the robustness of the model.

    • We introduce a slack variable $\xi_i$, which represents the loss of point $i$.

    • The margin constraint then becomes $y_i(w^Tx_i+b)\geqslant1-\xi_i,\ s.t.\ \xi_i\geqslant0$ (see the sketch right after this list).
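A minimal sketch of checking this relaxed constraint, assuming `X`, `y`, `w`, `b` and the slacks `xi` are given as NumPy arrays/scalars (the names are placeholders, not from the notebooks):

```python
import numpy as np

def soft_margin_feasible(X, y, w, b, xi):
    """Check y_i (w^T x_i + b) >= 1 - xi_i and xi_i >= 0 for every point."""
    return bool(np.all(y * (X @ w + b) >= 1.0 - xi) and np.all(xi >= 0.0))
```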

Suppose we have two classes X and O. For class X (see the sketch after this list):

  1. When X is correctly classified and lies outside the margin, then $\xi_i=0$.
  2. When X violates the margin but still lies on the correct side of the decision boundary, i.e. $0\leqslant y_i(w^Tx_i+b)<1$, then $0<\xi_i\leqslant1$.
  3. When X lies on the wrong side of the decision boundary, i.e. $y_i(w^Tx_i+b)<0$, then $\xi_i>1$.
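The three regimes can be read off directly from the functional margin $y_i(w^Tx_i+b)$; a small illustrative sketch (hypothetical helper, not part of the notebooks):

```python
import numpy as np

def slack_regime(x, y, w, b):
    """Classify a single point into one of the three slack regimes."""
    m = y * (np.dot(w, x) + b)   # functional margin of the point
    if m >= 1:
        return "outside the margin: xi = 0"
    elif m >= 0:
        return "inside the margin, correct side: 0 < xi <= 1"
    else:
        return "wrong side of the boundary: xi > 1"
```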

Based on these 3 conditions, we get the new target function:

\begin{align}
\min_{w,b,\xi}\;& \frac{1}{2}\left\| w \right\|^{2} + C\sum_{i=1}^{N}\xi_i \\
s.t.\;& \left\{\begin{matrix} y_i(w^Tx_i + b) \geqslant 1 - \xi_i \\ \xi_i \geqslant 0 \end{matrix}\right.
\end{align}

The loss function here is called hinge loss; it measures loss by distance: $\xi_i$ is the distance from a point to its corresponding margin boundary ($w^Tx+b=\pm1$) when the point violates the margin.

  1. If $y_i(w^Tx_i+b)\geqslant1$, then $\xi_i=0$: no loss, the point is correctly classified with enough margin.
  2. If $y_i(w^Tx_i+b)<1$, then $\xi_i=1-y_i(w^Tx_i+b)$.

So now we have: \begin{align} \xi_i = \max\left\{ 0,\ 1-y_i(w^Tx_i + b) \right\} \end{align}
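Putting the pieces together, here is a short sketch of the hinge slacks and the resulting soft-margin objective (function names and inputs are placeholders, not the repository's API):

```python
import numpy as np

def hinge_slacks(X, y, w, b):
    """xi_i = max(0, 1 - y_i (w^T x_i + b)) for every point."""
    return np.maximum(0.0, 1.0 - y * (X @ w + b))

def soft_margin_objective(X, y, w, b, C):
    """1/2 ||w||^2 + C * sum_i xi_i, with the slacks at their hinge values."""
    return 0.5 * np.dot(w, w) + C * np.sum(hinge_slacks(X, y, w, b))
```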

Based on Lagrange duality and the KKT conditions, we get the new target (the dual problem):

\begin{align}
\max_{\alpha}\;& \sum_{i=1}^{N}\alpha_i - \frac{1}{2}\sum_{i=1}^{N}\sum_{j=1}^{N}\alpha_i\alpha_j y_i y_j x_i^T x_j \\
s.t.\;& \left\{\begin{matrix} 0 \leqslant \alpha_i \leqslant C \\ \sum_{i=1}^{N}\alpha_i y_i = 0 \end{matrix}\right.
\end{align}
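For reference, a sketch that evaluates this dual objective and checks its constraints for a given $\alpha$ (linear kernel; the helper names are hypothetical):

```python
import numpy as np

def dual_objective(alpha, X, y):
    """sum_i alpha_i - 1/2 sum_ij alpha_i alpha_j y_i y_j x_i^T x_j."""
    K = X @ X.T            # Gram matrix K_ij = x_i^T x_j
    ay = alpha * y
    return alpha.sum() - 0.5 * ay @ K @ ay

def dual_feasible(alpha, y, C, tol=1e-8):
    """Box constraint 0 <= alpha_i <= C and equality sum_i alpha_i y_i = 0."""
    return bool(np.all(alpha >= -tol) and np.all(alpha <= C + tol)
                and abs(np.dot(alpha, y)) < tol)
```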

Optimal Solutions:

$w^* = \sum_{i=1}^{N}\alpha_iy_ix_i$

$b^* = y_k - w^{*T}x_k$, for any support vector $x_k$ with $0<\alpha_k<C$
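Given the optimal $\alpha$, the primal solution can be recovered as sketched below (linear kernel; $b^*$ is read off from a "free" support vector with $0<\alpha_k<C$; names are placeholders):

```python
import numpy as np

def recover_w_b(alpha, X, y, C, tol=1e-8):
    w = (alpha * y) @ X                                    # w* = sum_i alpha_i y_i x_i
    free = np.where((alpha > tol) & (alpha < C - tol))[0]  # free support vectors
    k = free[0]                                            # any free support vector works
    b = y[k] - X[k] @ w                                    # b* = y_k - w*^T x_k
    return w, b
```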

SMO: Sequential Minimal Optimization

Basic idea: iterative, SGD-like updates applied to one pair of $\alpha$s at a time.

At each step, we update only two $\alpha$s; the rest are treated as constants.

Now set:

  1. $K_{ij} = x_i^Tx_j$
  2. $f(x_k)=\sum_{i=1}^{N}\alpha_iy_ix_i^Tx_k+b$
  3. Error: $E_i = f(x_i)-y_i$
  4. $\eta=K_{mm}+K_{nn}-2K_{mn}$ (written as $\eta$ here to avoid clashing with the slack variable $\xi$)

Then we get:

  1. $\alpha_2^{new} = \alpha_2^{old} + \frac{y_2(E_1-E_2)}{\eta}$
  2. $\alpha_1^{new}=\alpha_1^{old}+y_1y_2(\alpha_2^{old}-\alpha_2^{new})$
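A minimal sketch of one SMO update for a chosen index pair $(m, n)$, following the quantities above (linear kernel; the bias update and the pair-selection heuristics of full SMO are omitted, and the function name is hypothetical):

```python
import numpy as np

def smo_step(alpha, b, X, y, C, m, n):
    K = X @ X.T
    f = (alpha * y) @ K + b                    # f(x_k) for every k
    E = f - y                                  # errors E_k = f(x_k) - y_k
    eta = K[m, m] + K[n, n] - 2.0 * K[m, n]    # eta = K_mm + K_nn - 2 K_mn
    if eta <= 0:
        return alpha                           # skip degenerate pairs
    a_n = alpha[n] + y[n] * (E[m] - E[n]) / eta
    # Clip alpha_n into its feasible box [L, H]; the bounds depend on y_m, y_n.
    if y[m] == y[n]:
        L, H = max(0.0, alpha[m] + alpha[n] - C), min(C, alpha[m] + alpha[n])
    else:
        L, H = max(0.0, alpha[n] - alpha[m]), min(C, C + alpha[n] - alpha[m])
    a_n = float(np.clip(a_n, L, H))
    a_m = alpha[m] + y[m] * y[n] * (alpha[n] - a_n)
    new_alpha = alpha.copy()
    new_alpha[m], new_alpha[n] = a_m, a_n
    return new_alpha
```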
