# Hyperparameter Tuning

Some model parameters are not learned from the data during training. We call these parameters "hyperparameters". Examples include the L1 and L2 regularization strengths, `C` and the kernel in Support Vector Classification, and most parameters of tree models. Though sklearn's default parameters are often good enough, tuning can typically raise accuracy by a few percentage points (roughly 1-5%, depending on the model and the dataset).

## The General Idea

You have:
- A model.
- A loss function to optimize.
- A set of parameters to optimize.

You need:
- A parameter space.

You can:
- Run an exhaustive search on all possible combinations of parameter values and test against the loss function.
- Run a random search on some combinations of parameter values and test against the loss function.
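
The two strategies can be compared without fitting any model. The sketch below is illustrative only: the search space is a hypothetical one for an SVC, and it uses sklearn's `ParameterGrid` and `ParameterSampler` helpers to list the candidates each strategy would evaluate.

```python
from sklearn.model_selection import ParameterGrid, ParameterSampler

# Hypothetical search space, used only for illustration.
space = {
    "C": [0.1, 1, 10, 100],
    "gamma": [1, 0.1, 0.01, 0.001],
    "kernel": ["rbf", "poly", "sigmoid"],
}

# Exhaustive search: every combination becomes a candidate.
grid_candidates = list(ParameterGrid(space))
print(len(grid_candidates))  # 4 * 4 * 3 = 48

# Random search: draw a fixed number of candidates from the same space.
random_candidates = list(ParameterSampler(space, n_iter=10, random_state=0))
print(len(random_candidates))  # 10
```

The cost of the exhaustive search grows multiplicatively with every parameter added, while the cost of the random search is fixed by `n_iter`.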


It is common to cross validate each candidate model produced by the search. As you can imagine, a set of parameters chosen strictly on one part of the dataset can cause your model to overfit.

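## A Primer on Cross Validation

In k-fold cross validation the data is split into k folds. Each fold serves once as the validation set while the model is trained on the remaining k-1 folds, and the k scores are averaged.

![k-fold cross validation diagram](https://garthtarr.github.io/avfs/lectures/imgs/k_fold_cv.jpg)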
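
As a rough sketch of the splitting step, the snippet below uses sklearn's `KFold` on ten toy samples (the data and the choice of five folds are assumptions made for illustration) and prints which indices land in the training and test part of each fold.

```python
import numpy as np
from sklearn.model_selection import KFold

X = np.arange(10).reshape(-1, 1)  # ten toy samples, illustration only
kfold = KFold(n_splits=5)

# Each split yields the row indices for training and for testing.
folds = list(kfold.split(X))
for i, (train_idx, test_idx) in enumerate(folds):
    print(f"fold {i}: train={train_idx} test={test_idx}")
```

Every sample appears in exactly one test fold, so each candidate is scored on all of the data without ever being tested on rows it was trained on.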

# Sklearn GridSearchCV

```python
from sklearn.model_selection import GridSearchCV

GridSearchCV(estimator, param_grid, scoring=None, n_jobs=None, cv=None)
```
__estimator:__ A sklearn estimator/model that has the methods `fit` and `predict`.

__param_grid:__ A dictionary mapping parameter names to lists of values to try.

__scoring:__ Either a string such as 'f1' or 'roc_auc', or a scorer function.

__cv:__ Either an int specifying the number of folds for cross validation or a cross validation splitter such as `StratifiedKFold`.

__n_jobs:__ How many CPU cores you are willing to use. -1 uses all available CPUs.

More on scoring strings: https://scikit-learn.org/stable/modules/model_evaluation.html#scoring-parameter

## Parameter Grid for GridSearchCV

For an exhaustive search we need to explicitly define which values the model should evaluate.

The keys must match the names of the estimator's constructor arguments.

In [None]:
{
    'C': [0.1, 1, 10, 100],
    'gamma': [1, 0.1, 0.01, 0.001],
    'kernel': ['rbf', 'poly', 'sigmoid']
}
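
One way to check the key names before launching a search is to compare them against the estimator's `get_params()`. This is a minimal sketch that assumes the grid above is meant for an `SVC`; the variable names are illustrative.

```python
from sklearn.svm import SVC

param_grid = {
    "C": [0.1, 1, 10, 100],
    "gamma": [1, 0.1, 0.01, 0.001],
    "kernel": ["rbf", "poly", "sigmoid"],
}

# get_params() returns every constructor argument of the estimator.
valid_names = set(SVC().get_params())
unknown = sorted(set(param_grid) - valid_names)
print(unknown)  # an empty list means every key is a valid SVC argument
```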

## Making a Scorer

```python
from sklearn.metrics import make_scorer

make_scorer(score_func, greater_is_better=True, needs_proba=False)
```

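The arguments:
- score_func: a function with the signature `score_func(y_true, y_pred, **kwargs)` that returns a single number.
- greater_is_better: whether a larger value is better. Set it to False for loss functions; the scorer then negates the result so that the search can always maximize.
- needs_proba: set it to True when score_func needs predicted probabilities instead of class labels, as logarithmic loss does. Recent scikit-learn releases replace this flag with `response_method`.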
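
The sketch below builds two scorers and passes them to `cross_val_score`. The synthetic dataset, the logistic regression model and the choice of metrics are assumptions made for illustration; only `score_func` and `greater_is_better` are used, since those behave the same across sklearn versions.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import fbeta_score, make_scorer, mean_absolute_error
from sklearn.model_selection import cross_val_score

# Synthetic data, illustration only.
X, y = make_classification(n_samples=200, random_state=0)

# A score to maximize: extra keyword arguments are forwarded to score_func.
f2_scorer = make_scorer(fbeta_score, beta=2)

# A loss to minimize: the scorer returns the negated loss.
mae_scorer = make_scorer(mean_absolute_error, greater_is_better=False)

model = LogisticRegression(max_iter=1000)
f2_scores = cross_val_score(model, X, y, cv=3, scoring=f2_scorer)
mae_scores = cross_val_score(model, X, y, cv=3, scoring=mae_scorer)
print(f2_scores)
print(mae_scores)  # values are <= 0 because the loss is negated
```

Either scorer can be passed as the `scoring` argument of `GridSearchCV` in the same way.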


# GridSearchCV in Action

In [1]:
from seaborn import load_dataset
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.svm import SVC
import pandas as pd

In [34]:
params = {
    'C': [0.1, 100],
    'gamma': [1, 0.1, 0.001],
    'kernel': ['rbf', 'poly', 'linear'],
    'degree': [3, 5]
}
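
A single dictionary crosses every value with every other value, even though `degree` only affects the `poly` kernel and `gamma` is ignored by the `linear` kernel. `param_grid` also accepts a list of dictionaries, which lets each kernel carry only the parameters it uses. The split below is one possible arrangement, shown as a sketch rather than the grid used in this notebook.

```python
from sklearn.model_selection import ParameterGrid

single_grid = {
    "C": [0.1, 100],
    "gamma": [1, 0.1, 0.001],
    "kernel": ["rbf", "poly", "linear"],
    "degree": [3, 5],
}

# One dictionary per kernel, each listing only the parameters that kernel uses.
split_grid = [
    {"kernel": ["linear"], "C": [0.1, 100]},
    {"kernel": ["rbf"], "C": [0.1, 100], "gamma": [1, 0.1, 0.001]},
    {"kernel": ["poly"], "C": [0.1, 100], "gamma": [1, 0.1, 0.001], "degree": [3, 5]},
]

print(len(ParameterGrid(single_grid)))  # 2 * 3 * 3 * 2 = 36
print(len(ParameterGrid(split_grid)))   # 2 + 6 + 12 = 20
```

Fewer candidates means fewer fits, without dropping any combination that would behave differently.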

In [43]:
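# Drop columns that duplicate information held elsewhere (e.g. "alive" mirrors "survived").
data = load_dataset('titanic').drop(
    ["alive", "adult_male", "who", "class", "embarked"], axis=1
)
# One-hot encode the categorical columns.
hot_cols = ['pclass', 'sex', 'sibsp', 'parch', 'deck', 'embark_town', 'alone']
df = pd.get_dummies(data, columns=hot_cols)
# Fill the remaining missing values (such as age) with the column median.
df.fillna(df.median(), inplace=True)
df.head()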

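| | survived | age | fare | pclass_1 | pclass_2 | pclass_3 | sex_female | sex_male | sibsp_0 | sibsp_1 | ... | deck_C | deck_D | deck_E | deck_F | deck_G | embark_town_Cherbourg | embark_town_Queenstown | embark_town_Southampton | alone_False | alone_True |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 22.0 | 7.25 | 0 | 0 | 1 | 0 | 1 | 0 | 1 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 |
| 1 | 1 | 38.0 | 71.2833 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | ... | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 |
| 2 | 1 | 26.0 | 7.925 | 0 | 0 | 1 | 1 | 0 | 1 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 |
| 3 | 1 | 35.0 | 53.1 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | ... | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 |
| 4 | 0 | 35.0 | 8.05 | 0 | 0 | 1 | 0 | 1 | 1 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 |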


In [44]:
# No scoring is passed, so candidates are ranked by the estimator's default score (accuracy).
grid = GridSearchCV(
    estimator=SVC(max_iter=100000), param_grid=params,
    cv=StratifiedKFold(3), verbose=2, n_jobs=-1,
)
grid.fit(df.drop("survived", axis=1), df.survived)
print(f"Best params are {grid.best_params_}")
print(f"Best score is {grid.best_score_}")

Fitting 3 folds for each of 36 candidates, totalling 108 fits


[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel(n_jobs=-1)]: Done  25 tasks      | elapsed:    3.2s


Best params are {'C': 100, 'degree': 3, 'gamma': 0.001, 'kernel': 'rbf'}
Best score is 0.7912457912457912


[Parallel(n_jobs=-1)]: Done 108 out of 108 | elapsed:    8.0s finished
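
Beyond `best_params_` and `best_score_`, a fitted search keeps the score of every candidate in `cv_results_`. The sketch below uses a small synthetic dataset and a reduced grid (both assumptions, chosen so the example is self-contained) to show how the results can be loaded into a DataFrame and ranked.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.svm import SVC

# Synthetic data and a reduced grid, illustration only.
X, y = make_classification(n_samples=200, n_features=8, random_state=0)

search = GridSearchCV(
    estimator=SVC(),
    param_grid={"C": [0.1, 1, 10], "kernel": ["rbf", "linear"]},
    cv=StratifiedKFold(3),
)
search.fit(X, y)

# cv_results_ is a dict of arrays with one entry per candidate.
results = pd.DataFrame(search.cv_results_)
columns = ["params", "mean_test_score", "std_test_score", "rank_test_score"]
print(results[columns].sort_values("rank_test_score"))
```

Looking at the full table shows whether the best candidate wins by a clear margin or whether several settings perform about equally well.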


# Sklearn RandomizedSearchCV

```python
from sklearn.model_selection import RandomizedSearchCV
```

The notable difference is that the parameter space to explore can be much larger, including continuous distributions, and the `n_iter` parameter decides how many parameter combinations are sampled.

In [27]:
from scipy import stats
[stats.uniform(0, scale=100).rvs() for _ in range(10)]
# Instead of testing C = [0.1, 1, 10, 100] explicitly, we draw random variates (rvs) from a distribution

[37.93193497544323,
 9.129222920351044,
 23.916529446974366,
 5.981025513511562,
 88.00369265235241,
 26.539294123447775,
 34.6307561613006,
 45.86174064470276,
 80.76971016143008,
 4.362722553194532]
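
Distributions like the one sampled above can be passed directly as values in `param_distributions`; any object with an `rvs` method is accepted, and plain lists are sampled uniformly. The following is a minimal sketch on synthetic data. The dataset, the ranges and the choice of `loguniform` for `gamma` are assumptions made for illustration.

```python
from scipy import stats
from sklearn.datasets import make_classification
from sklearn.model_selection import RandomizedSearchCV
from sklearn.svm import SVC

# Synthetic data, illustration only.
X, y = make_classification(n_samples=200, n_features=8, random_state=0)

param_distributions = {
    "C": stats.uniform(0.1, scale=100),   # uniform on [0.1, 100.1]
    "gamma": stats.loguniform(1e-3, 1),   # spreads samples across orders of magnitude
    "kernel": ["rbf", "sigmoid"],         # lists are sampled uniformly
}

random_search = RandomizedSearchCV(
    estimator=SVC(),
    param_distributions=param_distributions,
    n_iter=10,        # number of sampled candidates
    cv=3,
    random_state=0,   # makes the sampled candidates reproducible
)
random_search.fit(X, y)
print(random_search.best_params_)
```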

# RandomizedSearchCV in Action


In [26]:
from sklearn.model_selection import RandomizedSearchCV
