# SVM (support vector machine)

$h(\textbf{x}) = \textbf{w}^\top \textbf{x} - b$
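The sketch below computes the linear decision function $h(\mathbf{x}) = \mathbf{w}^\top \mathbf{x} - b$ and checks it against scikit-learn's `SVC` with a linear kernel. The toy data is illustrative, and the large `C` is chosen to approximate a hard margin. scikit-learn stores the offset with a plus sign in `intercept_`, so $b$ corresponds to `-intercept_`.

```python
import numpy as np
from sklearn.svm import SVC


def h(x, w, b):
    """Linear decision function h(x) = w^T x - b (applied row-wise to a 2-D x)."""
    return x @ w - b


# Toy, linearly separable data (illustrative only)
X = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [3.0, 3.0]])
y = np.array([-1, -1, 1, 1])

# A large C approximates the hard-margin SVM
svm = SVC(kernel="linear", C=1e3).fit(X, y)
w = svm.coef_.ravel()
b = -svm.intercept_[0]  # sklearn computes w^T x + intercept_, so b = -intercept_

scores = h(X, w, b)
pred = np.sign(scores)  # the sign of h(x) is the predicted class
```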

<img src='imgs/svm1.png'>

<img src='imgs/svm2.png'>

<img src='imgs/svm5.png'>

<img src='imgs/svm4.png'>

<img src='imgs/svm6.png'>

<img src='imgs/svm7.png'>

<img src='imgs/svm8.png'>

<img src='imgs/svm9.png'>

<img src='imgs/svm12.png'>

<img src='imgs/svm15.png'>

<img src='imgs/svm16.png'>

<img src='imgs/svm11.png'>

<img src='imgs/svm13.png'>

## Other SVMs

* SGD + SVM
* One-class SVM
* Neural kernel SVM
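Two of these variants have scikit-learn implementations:

* `SGDClassifier` with the hinge loss trains a linear SVM by stochastic gradient descent.
* `OneClassSVM` fits a boundary around a single class, for novelty or outlier detection.

The sketch below runs both on the breast-cancer data used later in these notes. The values of `alpha` and `nu` are illustrative assumptions, not tuned settings. Neural kernel SVMs have no scikit-learn counterpart and are not shown.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import SGDClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.svm import OneClassSVM

x, y = load_breast_cancer(return_X_y=True)
x = StandardScaler().fit_transform(x)

# Linear SVM trained with SGD: hinge loss + L2 penalty (alpha = regularization strength)
sgd_svm = SGDClassifier(loss="hinge", alpha=1e-4, max_iter=1000, random_state=0)
sgd_svm.fit(x, y)
train_acc = sgd_svm.score(x, y)

# One-class SVM: fit on one class only; predict() returns +1 (inlier) or -1 (outlier)
ocsvm = OneClassSVM(kernel="rbf", nu=0.05, gamma="scale")
ocsvm.fit(x[y == 1])
labels = ocsvm.predict(x)
```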

```python
from sklearn.svm import SVC

help(SVC)  # in IPython/Jupyter, `SVC?` displays the same docstring
```

# Naive Bayes

<img src='imgs/nb7.png'>

<img src='imgs/nb1.png'>

<img src='imgs/nb5.png'>

<img src='imgs/nb2.png'>

<img src='imgs/nb3.png'>

```python
from sklearn.naive_bayes import BernoulliNB, CategoricalNB, GaussianNB
```

# Hyperparameter optimization

**Examples of hyperparameters**
* regularization constant
* learning rate
* number of epochs
* number of models (e.g. in an ensemble)

**Goals**
* improve performance
* reduce manual tuning work

<img src='imgs/im1.png'>

## Grid search

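1. For each hyperparameter domain $\Lambda_i$, choose a finite set of values $L_i \subseteq \Lambda_i$.
2. Form the set of trial points $S = \prod_i L_i = L_1 \times \dots \times L_N$ (the Cartesian product).
3. Evaluate $V$ at every point of $S$ and keep the best one.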
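The three steps can be sketched in plain Python as follows. The objective `V` is a hypothetical toy function standing in for the validation score, with higher meaning better. The value lists `L_i` are also illustrative.

```python
from itertools import product


def grid_search(V, grids):
    """Evaluate V on every point of the Cartesian product of the value lists L_i."""
    names = list(grids)
    trials = [dict(zip(names, combo)) for combo in product(*(grids[n] for n in names))]
    scores = [V(**trial) for trial in trials]
    best = max(range(len(trials)), key=scores.__getitem__)
    return trials[best], scores[best], len(trials)


# Hypothetical toy objective (higher is better), peaking at lr=0.1, reg=0.01
def V(lr, reg):
    return -((lr - 0.1) ** 2) - (reg - 0.01) ** 2


best, best_score, n_trials = grid_search(
    V, {"lr": [0.001, 0.01, 0.1, 1.0], "reg": [0.0, 0.01, 0.1]}
)
```

The number of trials is the product of the list sizes, here $4 \times 3 = 12$. This is why grid search scales poorly with the number of hyperparameters.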


## Random Search

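1. Draw the set of trial points $S = \{\lambda^{(1)}, \dots, \lambda^{(K)}\}$ i.i.d. from a distribution over $\boldsymbol{\Lambda}$.
2. Evaluate $V$ at every point of $S$ and keep the best one.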
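Below is a minimal plain-Python sketch of random search. Each hyperparameter gets its own sampling function, and $K$ joint trial points are drawn independently. The toy objective `V` is hypothetical. Sampling the learning rate log-uniformly is an illustrative choice, not part of the method.

```python
import random


def random_search(V, samplers, K, seed=0):
    """Draw K trial points i.i.d. (one sampler per hyperparameter) and keep the best."""
    rng = random.Random(seed)
    trials = [{name: draw(rng) for name, draw in samplers.items()} for _ in range(K)]
    scores = [V(**trial) for trial in trials]
    best = max(range(K), key=scores.__getitem__)
    return trials[best], scores[best], trials


# Hypothetical toy objective (higher is better)
def V(lr, reg):
    return -((lr - 0.1) ** 2) - (reg - 0.01) ** 2


samplers = {
    "lr": lambda rng: 10 ** rng.uniform(-3, 0),  # log-uniform on [1e-3, 1]
    "reg": lambda rng: rng.uniform(0.0, 0.1),
}
best, best_score, trials = random_search(V, samplers, K=20)
```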

### Random vs Grid
* For a fixed budget of $K$ trials over $N$ hyperparameters, random search tries $K$ different values of each hyperparameter, whereas grid search explores only $K^{1/N}$ values per hyperparameter.

<img src='imgs/im3.png'>

Grid search is usually worse than random search because:
* often only a small fraction of the hyperparameters matter
* which hyperparameters matter differs between datasets
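Here is a small numeric check of the $K^{1/N}$ claim. It uses an illustrative budget of $K = 16$ trials over $N = 2$ hyperparameters in $[0, 1]$. The grid spends its budget on only 4 distinct values per hyperparameter, while random search almost surely tries 16. If only the first hyperparameter matters, the grid effectively tests just 4 settings.

```python
import itertools
import random

K, N = 16, 2  # trial budget and number of hyperparameters (illustrative)

# Grid: K^(1/N) evenly spaced values per axis
per_axis = round(K ** (1 / N))
values = [i / (per_axis - 1) for i in range(per_axis)]
grid = list(itertools.product(values, repeat=N))

# Random: K points drawn i.i.d. from the uniform distribution on [0, 1]^N
rng = random.Random(0)
rand = [tuple(rng.random() for _ in range(N)) for _ in range(K)]

# Number of distinct values tried for the first hyperparameter
grid_distinct = len({p[0] for p in grid})
rand_distinct = len({p[0] for p in rand})
```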

**Problems**

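* function evaluation can be extremely expensive (each evaluation means training and validating a model)
* the hyperparameter space is often high-dimensional and complex
* the hyperparameter space can be conditional (some hyperparameters only matter for certain values of others)
* the gradient of $V$ with respect to the hyperparameters is usually unavailable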
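In scikit-learn, you can express conditionality by passing a list of dicts as the grid. Each dict is a separate sub-grid, so a hyperparameter can appear only in the branch where it is meaningful. The sketch uses `SVC`-style parameters, since `gamma` is ignored by the linear kernel. The specific values are illustrative. `ParameterGrid` enumerates the trial points without fitting anything, and the same list can be passed as `param_grid` to `GridSearchCV`.

```python
from sklearn.model_selection import ParameterGrid

# gamma is only listed in the RBF branch, so no trials are wasted varying it for the linear kernel
param_grid = [
    {"kernel": ["linear"], "C": [0.1, 1, 10]},
    {"kernel": ["rbf"], "C": [0.1, 1, 10], "gamma": [0.01, 0.1]},
]
points = list(ParameterGrid(param_grid))  # 3 linear + 3 * 2 rbf trial points
```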

```python
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV

from sklearn.linear_model import LogisticRegression

from sklearn.datasets import load_breast_cancer
from sklearn.metrics import make_scorer, roc_auc_score
from sklearn.preprocessing import StandardScaler

import numpy as np
```

```python
data = load_breast_cancer()
x, y = data.data, data.target
x = StandardScaler().fit_transform(x)
```

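```python
# Note: make_scorer(roc_auc_score) scores the hard 0/1 labels returned by predict();
# scoring='roc_auc' would instead use the continuous decision_function/predict_proba scores.
auc_scorer = make_scorer(roc_auc_score)
```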

```python
params = {'penalty': ['l2', 'l1'],
          'C': np.linspace(0.1, 1, 10),
          'solver': ['saga'],  # 'saga' supports both the l1 and l2 penalties
          'max_iter': [10000]}
model = LogisticRegression()
```

```python
clf = GridSearchCV(model, params, scoring=auc_scorer)
clf.fit(x, y)
```

```
GridSearchCV(estimator=LogisticRegression(),
             param_grid={'C': array([0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1. ]),
                         'max_iter': [10000], 'penalty': ['l2', 'l1'],
                         'solver': ['saga']},
             scoring=make_scorer(roc_auc_score))
```

```python
clf.best_params_, clf.best_score_
```

```
({'C': 0.30000000000000004,
  'max_iter': 10000,
  'penalty': 'l2',
  'solver': 'saga'},
 0.9783765253016808)
```

```python
from scipy.stats import uniform
```

```python
params = {'penalty': ['l2', 'l1'],
          'C': uniform(0, 1),  # scipy's uniform(loc, scale) samples from [loc, loc + scale]
          'solver': ['saga'],
          'max_iter': [10000]}
model = LogisticRegression()
```

```python
clf = RandomizedSearchCV(model, params, scoring=auc_scorer, n_iter=20)
clf.fit(x, y)
```

```
RandomizedSearchCV(estimator=LogisticRegression(), n_iter=20,
                   param_distributions={'C': <scipy.stats._distn_infrastructure.rv_frozen object at 0x7fba9cad6d90>,
                                        'max_iter': [10000],
                                        'penalty': ['l2', 'l1'],
                                        'solver': ['saga']},
                   scoring=make_scorer(roc_auc_score))
```

```python
clf.best_params_, clf.best_score_
```

```
({'C': 0.31472625862091097,
  'max_iter': 10000,
  'penalty': 'l2',
  'solver': 'saga'},
 0.9783765253016808)
```

## Multi-fidelity optimization

* use cheap, low-fidelity approximations of $V$
* trade off optimization performance against runtime

How to approximate $V$:
* use a subset of the data
* train for a few iterations
* use a subset of the features
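The sketch below illustrates the "subset of data" approximation on the same breast-cancer data as the grid-search example.

* Every candidate `C` is first scored cheaply on a random 20% of the samples.
* Only the two best are then re-scored on all the data.

The candidate list, the 20% fraction and the shortlist size are illustrative assumptions. The cheap ranking can be wrong, which is exactly the performance/runtime trade-off listed above.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.preprocessing import StandardScaler

x, y = load_breast_cancer(return_X_y=True)
x = StandardScaler().fit_transform(x)


def score(C, frac=1.0, seed=0):
    """3-fold CV accuracy of LogisticRegression(C); frac < 1 gives a low-fidelity estimate."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(x), size=int(frac * len(x)), replace=False)
    model = LogisticRegression(C=C, max_iter=1000)
    return cross_val_score(model, x[idx], y[idx], cv=3).mean()


candidates = [0.001, 0.01, 0.1, 1.0, 10.0]
# Cheap pass: rank all candidates on 20% of the data
cheap = {C: score(C, frac=0.2) for C in candidates}
shortlist = sorted(candidates, key=cheap.get, reverse=True)[:2]
# Expensive pass: full data only for the shortlisted candidates
full = {C: score(C) for C in shortlist}
best_C = max(full, key=full.get)
```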

## Predictive termination

* predict the model's performance as a function of the number of training iterations (the learning curve)
* train for some iterations, extrapolate the value at the end of training, and stop runs that are unlikely to beat the best result so far
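Below is a minimal sketch of predictive termination under a simple learning-curve model, $V(t) \approx a + b/t$. This model is a hypothetical choice; practical systems use richer parametric or Bayesian curve models. The sketch fits the model to the scores observed so far by least squares. It stops the run if the extrapolated final score falls below the best result seen so far, assuming higher $V$ is better.

```python
import numpy as np


def predict_final(steps, values, final_step):
    """Fit V(t) = a + b / t to the observed prefix by least squares and extrapolate."""
    b, a = np.polyfit(1.0 / np.asarray(steps, float), np.asarray(values, float), 1)
    return a + b / final_step


def should_stop(steps, values, final_step, best_so_far):
    """Stop if the predicted final score cannot beat the best run so far."""
    return bool(predict_final(steps, values, final_step) < best_so_far)


# Hypothetical partial learning curve: 10 observed steps out of 100
steps = list(range(1, 11))
values = [0.9 - 0.5 / t for t in steps]
predicted = predict_final(steps, values, final_step=100)
```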

## Hyperband

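* initialize a set of models and train them for some time $t$
* remove the bottom half of the models
* continue training the remaining models for $t$
* repeat until one model is left

This inner loop is successive halving. Hyperband runs it several times with different numbers of initial models and per-model training budgets.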

<img src='imgs/im7.png'>
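Below is a minimal sketch of the halving loop described above. `evaluate(config, budget)` is a hypothetical callback that returns the validation score after training `config` for a total of `budget` units. The toy version simply scales a fixed per-config quality by the budget spent.

```python
import math


def successive_halving(configs, evaluate, t):
    """Train all configs for t units, drop the bottom half, repeat until one is left."""
    survivors = list(configs)
    budget = 0
    while len(survivors) > 1:
        budget += t  # every survivor continues training for t more units
        scores = {c: evaluate(c, budget) for c in survivors}
        survivors.sort(key=scores.get, reverse=True)
        survivors = survivors[: max(1, len(survivors) // 2)]
    return survivors[0]


# Hypothetical toy model: config c converges towards quality c as its budget grows
def toy_evaluate(c, budget):
    return c * (1 - math.exp(-budget / 10))


winner = successive_halving([0.2, 0.9, 0.5, 0.7, 0.4], toy_evaluate, t=1)
```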

# Bayesian optimization

<img src='imgs/im4.png'>

<img src='imgs/im5.png' width=900 style="float: left">


* https://github.com/hyperopt/hyperopt

* https://github.com/facebook/Ax

* https://github.com/fmfn/BayesianOptimization

* https://github.com/optuna/optuna

* https://github.com/zygmuntz/hyperband

* https://github.com/ray-project/ray
* more
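Below is a minimal sketch of the loop behind these libraries, under several assumptions:

* a 1-D search space;
* a Gaussian-process surrogate (`GaussianProcessRegressor` with a Matérn kernel);
* the expected-improvement acquisition function, maximized over a fixed grid of candidates.

The objective `f` is a hypothetical stand-in for the validation score $V$, with higher meaning better. The libraries above add many refinements, such as other surrogates (e.g. TPE in hyperopt and optuna), parallel evaluation and pruning.

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern


def expected_improvement(mu, sigma, best):
    """EI for maximization, given the GP posterior mean/std at candidate points."""
    sigma = np.maximum(sigma, 1e-12)  # avoid division by zero at already-observed points
    z = (mu - best) / sigma
    return (mu - best) * norm.cdf(z) + sigma * norm.pdf(z)


def bayes_opt(f, low, high, n_init=3, n_iter=10, seed=0):
    rng = np.random.default_rng(seed)
    X = rng.uniform(low, high, size=(n_init, 1))  # random initial design
    y = np.array([f(x[0]) for x in X])
    candidates = np.linspace(low, high, 500).reshape(-1, 1)
    gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), alpha=1e-6,
                                  normalize_y=True, random_state=seed)
    for _ in range(n_iter):
        gp.fit(X, y)  # refit the surrogate on all observations so far
        mu, sigma = gp.predict(candidates, return_std=True)
        x_next = candidates[np.argmax(expected_improvement(mu, sigma, y.max()))]
        X = np.vstack([X, x_next])
        y = np.append(y, f(x_next[0]))  # the expensive evaluation of the objective
    i = np.argmax(y)
    return X[i, 0], y[i]


# Hypothetical objective with its maximum at x = 0.3
x_best, y_best = bayes_opt(lambda x: -(x - 0.3) ** 2, 0.0, 1.0)
```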