# Hyperparameter tuning with scikit-learn

This notebook contains a few examples of how hyperparameter tuning works with scikit-learn.

Author: Umberto Michelucci (umberto.michelucci@toelt.ai).

In [9]:
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.experimental import enable_halving_search_cv  # noqa
from sklearn.model_selection import HalvingGridSearchCV
import pandas as pd

First, we define which hyperparameters, and which values for each of them, we want to test.

In [2]:
param_grid = {'max_depth': [3, 5, 10], 'min_samples_split': [2, 5, 10]}
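To see which candidates this grid produces, one option is scikit-learn's ```ParameterGrid``` helper, which expands a grid into the list of parameter combinations. The following is a minimal sketch; the variable name ```candidates``` is only used for illustration.

```python
from sklearn.model_selection import ParameterGrid

param_grid = {'max_depth': [3, 5, 10], 'min_samples_split': [2, 5, 10]}

# Expand the grid into every combination of the listed values.
candidates = list(ParameterGrid(param_grid))

print(len(candidates))  # 3 values x 3 values = 9 candidates
for params in candidates[:3]:
    print(params)
```

Every one of these combinations is a candidate that the search will evaluate in its first iteration.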

Then we define the model we want to tune.

In [3]:
base_estimator = RandomForestClassifier(random_state=0)

The following cell will generate some *fake* data to use for our tuning.

In [4]:
X, y = make_classification(n_samples=1000, random_state=0)

Finally, we can run the actual search. By default, the resource is the number of samples: each iteration trains on an increasing number of samples. You can, however, choose a different resource with the ```resource``` parameter. Here is an example where the resource is the number of estimators of a random forest:

In [5]:
sh = HalvingGridSearchCV(base_estimator, param_grid, cv=5,
                         factor=2, resource='n_estimators',
                         max_resources=30).fit(X, y)
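To check how the budget was spent, a fitted search exposes the attributes ```n_resources_``` and ```n_candidates_```. The sketch below repeats the search above so that it runs on its own, and prints how many trees each candidate received and how many candidates were still in the race at each iteration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.experimental import enable_halving_search_cv  # noqa
from sklearn.model_selection import HalvingGridSearchCV

X, y = make_classification(n_samples=1000, random_state=0)
param_grid = {'max_depth': [3, 5, 10], 'min_samples_split': [2, 5, 10]}
sh = HalvingGridSearchCV(RandomForestClassifier(random_state=0), param_grid,
                         cv=5, factor=2, resource='n_estimators',
                         max_resources=30).fit(X, y)

# Number of trees given to each candidate, one entry per iteration.
print(sh.n_resources_)
# Number of candidates evaluated, one entry per iteration.
print(sh.n_candidates_)
# The winning candidate carries the tree count of the last iteration.
print(sh.best_estimator_.n_estimators)
```

With ```factor=2```, the number of trees doubles from one iteration to the next while roughly half of the candidates are discarded.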

And now we can inspect the best estimator, which has been refit with the best parameters found.

In [6]:
sh.best_estimator_

The number of resources used at each iteration depends on the ```min_resources``` parameter: the first iteration uses ```min_resources```, and each following iteration multiplies that amount by ```factor```. If you have a lot of resources available but start with a low number, some of them might be wasted (i.e. never used). Let us try a different example.

In [10]:
param_grid = {'kernel': ('linear', 'rbf'),
              'C': [1, 10, 100]}
base_estimator = SVC(gamma='scale')

In [11]:
X, y = make_classification(n_samples=1000)

In [12]:
sh = HalvingGridSearchCV(base_estimator, param_grid, cv=5,
                         factor=2, min_resources=20).fit(X, y)

In [13]:
sh.n_resources_

[20, 40, 80]
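The schedule above can be reproduced by hand. The sketch below is a hypothetical helper (```halving_schedule``` and ```int_log``` are not part of scikit-learn) that mirrors the rule as we understand it from the documentation, assuming aggressive elimination is switched off: the resources are multiplied by ```factor``` at each iteration, the candidates are divided by ```factor``` (rounded up), and the search stops once either the candidates or the resources run out.

```python
import math


def int_log(value, base):
    """Largest integer k such that base**k <= value (for value >= 1)."""
    k = 0
    while base ** (k + 1) <= value:
        k += 1
    return k


def halving_schedule(n_candidates, min_resources, max_resources, factor):
    """Return one (n_candidates, n_resources) pair per iteration.

    Hypothetical helper for illustration only; not a scikit-learn function.
    """
    # Iterations needed to shrink the candidate pool down to a few survivors.
    n_required = 1 + int_log(n_candidates, factor)
    # Iterations that fit before the resources exceed max_resources.
    n_possible = 1 + int_log(max_resources // min_resources, factor)

    schedule = []
    for i in range(min(n_required, n_possible)):
        schedule.append((n_candidates, min_resources * factor ** i))
        n_candidates = math.ceil(n_candidates / factor)
    return schedule


# 6 candidates (2 kernels x 3 values of C), starting from 20 samples.
print(halving_schedule(6, 20, 1000, 2))  # [(6, 20), (3, 40), (2, 80)]
```

Three iterations are enough to narrow down six candidates, so the resources never grow beyond 80 samples.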

The search process uses at most 80 resources, while the maximum amount of available resources is ```n_samples=1000```. Here, we have ```min_resources = r_0 = 20```. For ```HalvingGridSearchCV```, the ```min_resources``` parameter is set to ```'exhaust'``` by default. This means that ```min_resources``` is chosen automatically so that the last iteration can use as many resources as possible, within the ```max_resources``` limit.
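For comparison, the following sketch runs the same kind of search with ```min_resources='exhaust'```. The data is regenerated with a fixed ```random_state``` (an assumption made here only for reproducibility), and the name ```sh_exhaust``` is used for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.svm import SVC
from sklearn.experimental import enable_halving_search_cv  # noqa
from sklearn.model_selection import HalvingGridSearchCV

X, y = make_classification(n_samples=1000, random_state=0)
param_grid = {'kernel': ('linear', 'rbf'), 'C': [1, 10, 100]}

sh_exhaust = HalvingGridSearchCV(SVC(gamma='scale'), param_grid, cv=5,
                                 factor=2, min_resources='exhaust').fit(X, y)

# The first iteration now starts with many more than 20 samples, so the
# last iteration gets as close as possible to the 1000 available samples.
print(sh_exhaust.n_resources_)
```

The price of using more of the data is that every iteration, including the first one with all candidates, is more expensive to run.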

There are many more options, and the official scikit-learn documentation is the best place to explore them.

## Analysis of the results

The ```cv_results_``` attribute contains useful information for analyzing the results of a search. It can be converted to a pandas dataframe with ```df = pd.DataFrame(sh.cv_results_)```.

In [17]:
df = pd.DataFrame(sh.cv_results_)
df

| | iter | n_resources | mean_fit_time | std_fit_time | mean_score_time | std_score_time | param_C | param_kernel | params | split0_test_score | ... | mean_test_score | std_test_score | rank_test_score | split0_train_score | split1_train_score | split2_train_score | split3_train_score | split4_train_score | mean_train_score | std_train_score |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 20 | 0.000358 | 0.000119 | 0.000158 | 2.5e-05 | 1 | linear | {'C': 1, 'kernel': 'linear'} | 0.75 | ... | 0.65 | 0.374166 | 6 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 0.0 |
| 1 | 0 | 20 | 0.00026 | 5e-06 | 0.00014 | 5e-06 | 1 | rbf | {'C': 1, 'kernel': 'rbf'} | 0.75 | ... | 0.6 | 0.3 | 9 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 0.0 |
| 2 | 0 | 20 | 0.000361 | 0.00013 | 0.000155 | 2.3e-05 | 10 | linear | {'C': 10, 'kernel': 'linear'} | 0.75 | ... | 0.65 | 0.374166 | 6 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 0.0 |
| 3 | 0 | 20 | 0.000521 | 0.000169 | 0.000246 | 5.8e-05 | 10 | rbf | {'C': 10, 'kernel': 'rbf'} | 0.75 | ... | 0.6 | 0.339116 | 9 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 0.0 |
| 4 | 0 | 20 | 0.000355 | 6.3e-05 | 0.000205 | 9.2e-05 | 100 | linear | {'C': 100, 'kernel': 'linear'} | 0.75 | ... | 0.65 | 0.374166 | 6 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 0.0 |
| 5 | 0 | 20 | 0.000278 | 1.4e-05 | 0.000145 | 5e-06 | 100 | rbf | {'C': 100, 'kernel': 'rbf'} | 0.75 | ... | 0.6 | 0.339116 | 9 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 0.0 |
| 6 | 1 | 40 | 0.000345 | 2.4e-05 | 0.000176 | 1.9e-05 | 1 | linear | {'C': 1, 'kernel': 'linear'} | 0.875 | ... | 0.825 | 0.1 | 2 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 0.0 |
| 7 | 1 | 40 | 0.000604 | 0.000265 | 0.000232 | 9.2e-05 | 10 | linear | {'C': 10, 'kernel': 'linear'} | 0.875 | ... | 0.825 | 0.1 | 2 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 0.0 |
| 8 | 1 | 40 | 0.000308 | 1.8e-05 | 0.000146 | 5e-06 | 100 | linear | {'C': 100, 'kernel': 'linear'} | 0.875 | ... | 0.825 | 0.1 | 2 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 0.0 |
| 9 | 2 | 80 | 0.000452 | 4.4e-05 | 0.000262 | 0.000108 | 10 | linear | {'C': 10, 'kernel': 'linear'} | 1.0 | ... | 0.8375 | 0.10155 | 1 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 0.0 |


Each row corresponds to a given parameter combination (a candidate) at a given iteration. The iteration is given by the ```iter``` column, and the ```n_resources``` column tells you how many resources were used in that iteration.
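Because every row carries its iteration, the dataframe can be condensed to one line per iteration. The sketch below reruns the SVC search so that it is self-contained (the fixed ```random_state``` and the name ```summary``` are assumptions made for illustration) and groups the results by the ```iter``` column.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.svm import SVC
from sklearn.experimental import enable_halving_search_cv  # noqa
from sklearn.model_selection import HalvingGridSearchCV

X, y = make_classification(n_samples=1000, random_state=0)
param_grid = {'kernel': ('linear', 'rbf'), 'C': [1, 10, 100]}
sh = HalvingGridSearchCV(SVC(gamma='scale'), param_grid, cv=5,
                         factor=2, min_resources=20).fit(X, y)

df = pd.DataFrame(sh.cv_results_)

# One row per iteration: resources used, candidates evaluated,
# and the best mean cross-validation score reached in that iteration.
summary = df.groupby('iter').agg(
    n_resources=('n_resources', 'first'),
    n_candidates=('params', 'count'),
    best_mean_test_score=('mean_test_score', 'max'),
)
print(summary)
```

Scores from different iterations are computed on different amounts of data, so they are best compared within an iteration rather than across iterations.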

The search used in this notebook is based on successive halving, the procedure that Hyperband builds on. If you are interested in how Hyperband works, you can refer to L. Li, K. Jamieson, G. DeSalvo, A. Rostamizadeh, A. Talwalkar, Hyperband: A Novel Bandit-Based Approach to Hyperparameter Optimization, Journal of Machine Learning Research 18, 2018.