
# Parameter optimization with hyperopt

[Hyperopt](https://github.com/hyperopt/hyperopt) is a Python library for serial and parallel optimization over awkward search spaces, which may include real-valued, discrete, and conditional dimensions.

It is designed for large-scale optimization of models with hundreds of parameters, and it lets the optimization procedure scale across multiple cores and multiple machines.

The library has been used to optimize entire machine learning pipelines, including data preparation, model selection, and model hyperparameters.

Let's see how to find the best regression model and hyperparameters for the [Housing dataset](https://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_boston.html) using this library.

## Prepare the dataset

In [1]:
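# Note: load_boston was deprecated in scikit-learn 1.0 and removed in 1.2,
# so this cell needs an older scikit-learn release.
from sklearn.datasets import load_boston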
import pandas as pd
X, y = load_boston(return_X_y=True)
X = pd.DataFrame(X)
print(X.shape)
print(y.shape)
X.head()

(506, 13)
(506,)


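|   | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.00632 | 18.0 | 2.31 | 0.0 | 0.538 | 6.575 | 65.2 | 4.09 | 1.0 | 296.0 | 15.3 | 396.9 | 4.98 |
| 1 | 0.02731 | 0.0 | 7.07 | 0.0 | 0.469 | 6.421 | 78.9 | 4.9671 | 2.0 | 242.0 | 17.8 | 396.9 | 9.14 |
| 2 | 0.02729 | 0.0 | 7.07 | 0.0 | 0.469 | 7.185 | 61.1 | 4.9671 | 2.0 | 242.0 | 17.8 | 392.83 | 4.03 |
| 3 | 0.03237 | 0.0 | 2.18 | 0.0 | 0.458 | 6.998 | 45.8 | 6.0622 | 3.0 | 222.0 | 18.7 | 394.63 | 2.94 |
| 4 | 0.06905 | 0.0 | 2.18 | 0.0 | 0.458 | 7.147 | 54.2 | 6.0622 | 3.0 | 222.0 | 18.7 | 396.9 | 5.33 |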


We split the data into training and test sets.

In [2]:
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=1)

## Define a function to minimize

In this example, we search over linear regression models. The entry `params['type']` holds the model name, and the objective function trains that model and returns its mean squared error on the test set.

In [3]:
from sklearn.metrics import mean_squared_error
from sklearn.linear_model import ElasticNet
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

from hyperopt import fmin, tpe, hp, STATUS_OK, Trials

In [4]:
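def objective(params):
    params = dict(params)  # copy so the sampled point is not mutated
    regressor_type = params.pop('type')
    if regressor_type != 'ElasticNet':
        # Fail loudly: returning 0 would look like a perfect loss to fmin
        raise ValueError(f'Unknown regressor type: {regressor_type}')
    # Scale the features, then fit the model with the sampled hyperparameters
    pipe = Pipeline([('scaler', StandardScaler()),
                     ('ElasticNet', ElasticNet(**params))])
    pipe.fit(X_train, y_train)
    y_pred_test = pipe.predict(X_test)
    loss = mean_squared_error(y_test, y_pred_test)

    return {'loss': loss, 'status': STATUS_OK}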
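
The objective above scores each candidate on the test set, so the test set also steers the search. A minimal sketch of an alternative, assuming cross-validation on the training data is acceptable: the synthetic data from `make_regression` and the name `cv_objective` are illustrative assumptions, not part of the original notebook.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNet
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the training split (assumption, not the Housing data).
X_demo, y_demo = make_regression(n_samples=200, n_features=13,
                                 noise=10.0, random_state=0)


def cv_objective(params):
    """Return the 5-fold cross-validated MSE for one hyperparameter setting."""
    params = dict(params)  # copy so the caller's dict is not mutated
    regressor_type = params.pop('type')
    if regressor_type != 'ElasticNet':
        raise ValueError(f'Unknown regressor type: {regressor_type}')
    pipe = Pipeline([('scaler', StandardScaler()),
                     ('ElasticNet', ElasticNet(**params))])
    # scikit-learn reports negated MSE so that higher is better; flip the sign.
    scores = cross_val_score(pipe, X_demo, y_demo, cv=5,
                             scoring='neg_mean_squared_error')
    # 'ok' is the value of hyperopt's STATUS_OK constant.
    return {'loss': float(-scores.mean()), 'status': 'ok'}


result = cv_objective({'type': 'ElasticNet', 'alpha': 0.1, 'l1_ratio': 0.5})
print(result)
```

With this design the test set is only touched once, after the search has finished, to report the final performance.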

## Define the search space over hyperparameters

See the [Hyperopt docs](https://github.com/hyperopt/hyperopt/wiki/FMin#21-parameter-expressions) for details on defining a search space and parameter expressions.

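Use `hp.choice` to select among different models. Here the list contains a single ElasticNet configuration, but more models can be added as further entries.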

In [5]:
search_space = hp.choice('regressor_type', [
    {
        'type': 'ElasticNet',
        'alpha': hp.lognormal('alpha', 0, 1.0),
        'l1_ratio': hp.lognormal('l1_ratio', 0, 1.0)
    },
])
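
scikit-learn documents `l1_ratio` for `ElasticNet` as a value between 0 and 1, while `hp.lognormal(label, mu, sigma)` draws `exp(normal(mu, sigma))`, which is unbounded above. The sketch below simulates both priors with NumPy (a stand-in for Hyperopt's sampler, used here only for illustration) to show how often each one proposes a value above 1.

```python
import numpy as np

rng = np.random.default_rng(0)
n_draws = 10_000

# hp.lognormal(label, 0, 1.0) draws exp(normal(0, 1)); simulate it directly.
lognormal_draws = rng.lognormal(mean=0.0, sigma=1.0, size=n_draws)
# A bounded alternative: uniform draws on [0, 1), as hp.uniform(label, 0, 1) would give.
uniform_draws = rng.uniform(0.0, 1.0, size=n_draws)

frac_lognormal_above_one = float(np.mean(lognormal_draws > 1.0))
frac_uniform_above_one = float(np.mean(uniform_draws > 1.0))

print(f'lognormal draws above 1: {frac_lognormal_above_one:.1%}')
print(f'uniform draws above 1:   {frac_uniform_above_one:.1%}')
```

Because the median of this lognormal distribution is 1, roughly half of its draws fall outside the documented range. In the search space, `hp.uniform('l1_ratio', 0, 1)` would keep every proposal valid.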

## Select a search algorithm

The two main choices are:

- `hyperopt.tpe.suggest`: Tree of Parzen Estimators, a Bayesian approach which iteratively and adaptively selects new hyperparameter settings to explore based on past results.

- `hyperopt.rand.suggest`: Random search, a non-adaptive approach which samples over the search space.

See [Algorithms for Hyper-Parameter Optimization](https://papers.nips.cc/paper/4443-algorithms-for-hyper-parameter-optimization.pdf).

In [6]:
algorithm = tpe.suggest

## Run the tuning algorithm with hyperopt `fmin()`

Set `max_evals` to the maximum number of points in hyperparameter space to test, that is, the maximum number of models to fit and evaluate.

In [59]:
best_result = fmin(
    fn=objective, 
    space=search_space,
    algo=algorithm,
    max_evals=16)

100%|██████████| 16/16 [00:00<00:00, 140.01it/s, best loss: 24.374209595650548]





In [60]:
print(f'best parameters: {best_result}')

best parameters: {'alpha': 0.2807037574860731, 'l1_ratio': 1.510527308817968, 'regressor_type': 0}


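**Note**: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times. Also note that the best `l1_ratio` found here is greater than 1, outside the 0 to 1 range that scikit-learn documents for `ElasticNet`. This happens because `hp.lognormal` is unbounded above; a bounded prior such as `hp.uniform('l1_ratio', 0, 1)` avoids it.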

## Hyperparameter Optimization with hyperopt-sklearn

[Hyperopt-sklearn](https://github.com/hyperopt/hyperopt-sklearn) provides Hyperopt-based model selection among the machine learning algorithms in scikit-learn.

In [7]:
!pip install git+https://github.com/hyperopt/hyperopt-sklearn.git

Collecting git+https://github.com/hyperopt/hyperopt-sklearn.git
  Cloning https://github.com/hyperopt/hyperopt-sklearn.git to /tmp/pip-req-build-ak4msr39
  Running command git clone -q https://github.com/hyperopt/hyperopt-sklearn.git /tmp/pip-req-build-ak4msr39
Collecting nose
  Downloading https://files.pythonhosted.org/packages/15/d8/dd071918c040f50fa1cf80da16423af51ff8ce4a0f2399b7bf8de45ac3d9/nose-1.3.7-py3-none-any.whl (154kB)
Building wheels for collected packages: hpsklearn
  Building wheel for hpsklearn (setup.py) ... done
  Created wheel for hpsklearn: filename=hpsklearn-0.0.3-cp36-none-any.whl size=26799 sha256=3e44c7132a8731abe8a43fb7c0b0992cba9cfaf829684dd566ba5e9bd0e4bb9c
  Stored in directory: /tmp/pip-ephem-wheel-cache-6hl448ha/wheels/28/93/20/67dca95c2aaa13466b4900ba79a7bab66022e50ce44f8a438d
Successfully built hpsklearn
Installing collected packages: nose, hpsklearn
Successfully installed hpskle

In [8]:
from hpsklearn import HyperoptEstimator
from hpsklearn import any_regressor
from hpsklearn import any_preprocessing
from hyperopt import tpe

from sklearn.metrics import mean_squared_error

WARN: OMP_NUM_THREADS=None =>
... If you are using openblas if you are using openblas set OMP_NUM_THREADS=1 or risk subprocess calls hanging indefinitely


We define the search procedure. We will explore all regressor algorithms and all data transforms available to the library and use the Tree of Parzen Estimators search algorithm.

The search will evaluate 50 pipelines and limit each evaluation to 30 seconds.

In [38]:
# define search
hyperopt_estimator = HyperoptEstimator(regressor=any_regressor('reg'),
                          preprocessing=any_preprocessing('pre'),
                          loss_fn=mean_squared_error,
                          algo=tpe.suggest, max_evals=50, trial_timeout=30)

We then start the search.

In [39]:
hyperopt_estimator.fit(X_train, y_train)

100%|██████████| 1/1 [00:00<00:00,  1.48it/s, best loss: 55.61112402065533]
100%|██████████| 1/1 [00:00<00:00,  7.09it/s, best loss: 19.292188821582922]
100%|██████████| 1/1 [00:00<00:00,  4.96it/s, best loss: 19.292188821582922]
100%|██████████| 1/1 [00:00<00:00,  8.83it/s, best loss: 19.292188821582922]
100%|██████████| 1/1 [00:01<00:00,  1.18s/it, best loss: 19.292188821582922]
100%|██████████| 1/1 [00:00<00:00,  5.05it/s, best loss: 19.292188821582922]
100%|██████████| 1/1 [00:00<00:00,  5.08it/s, best loss: 19.292188821582922]
100%|██████████| 1/1 [00:00<00:00,  2.19it/s, best loss: 19.292188821582922]
100%|██████████| 1/1 [00:00<00:00,  2.40it/s, best loss: 19.18266657314485]
100%|██████████| 1/1 [00:00<00:00,  7.54it/s, best loss: 19.18266657314485]
100%|██████████| 1/1 [00:02<00:00,  2.47s/it, best loss: 12.616274214690932]
100%|██████████| 1/1 [00:01<00:00,  1.98s/it, best loss: 12.616274214690932]
100%|██████████| 1/1 [00:01<00:00,  1.38s/it, best loss: 12.616274214690932]
100%|██████████| 1/1 [00:01<00:00,  1.51s/it, best loss: 12.616274214690932]
100%|██████████| 1/1 [00:01<00:00,  1.63s/it, best loss: 12.616274214690932]
100%|██████████| 1/1 [00:03<00:00,  3.41s/it, best loss: 12.616274214690932]
100%|██████████| 1/1 [00:00<00:00,  2.68it/s, best loss: 12.616274214690932]
100%|██████████| 1/1 [00:00<00:00,  5.06it/s, best loss: 12.616274214690932]
100%|██████████| 1/1 [00:00<00:00,  1.32it/s, best loss: 12.616274214690932]
100%|██████████| 1/1 [00:00<00:00,  4.79it/s, best loss: 12.616274214690932]
100%|██████████| 1/1 [00:00<00:00,  6.36it/s, best loss: 12.616274214690932]
100%|██████████| 1/1 [00:01<00:00,  1.74s/it, best loss: 12.616274214690932]
100%|██████████| 1/1 [00:00<00:00,  5.07it/s, best loss: 12.616274214690932]
100%|██████████| 1/1 [00:01<00:00,  1.43s/it, best loss: 11.032993002355472]
100%|██████████| 1/1 [00:01<00:00,  1.43s/it, best loss: 10.589484419204354]
100%|██████████| 1/1 [00:01<00:00,  1.06s/it, best loss: 10.440331595617312]
100%|██████████| 1/1 [00:00<00:00,  1.22it/s, best loss: 10.440331595617312]
100%|██████████| 1/1 [00:00<00:00,  1.06it/s, best loss: 9.510321254226078]
100%|██████████| 1/1 [00:02<00:00,  2.01s/it, best loss: 9.510321254226078]
100%|██████████| 1/1 [00:00<00:00,  5.45it/s, best loss: 9.510321254226078]
100%|██████████| 1/1 [00:00<00:00,  1.68it/s, best loss: 9.510321254226078]
100%|██████████| 1/1 [00:00<00:00,  3.67it/s, best loss: 9.510321254226078]
100%|██████████| 1/1 [00:00<00:00,  1.70it/s, best loss: 9.510321254226078]
100%|██████████| 1/1 [00:00<00:00,  3.25it/s, best loss: 9.510321254226078]
100%|██████████| 1/1 [00:00<00:00,  5.75it/s, best loss: 9.510321254226078]
100%|██████████| 1/1 [00:00<00:00,  1.05it/s, best loss: 9.510321254226078]
100%|██████████| 1/1 [00:00<00:00,  5.69it/s, best loss: 9.510321254226078]
100%|██████████| 1/1 [00:00<00:00,  1.51it/s, best loss: 9.510321254226078]
100%|██████████| 1/1 [00:01<00:00,  1.84s/it, best loss: 9.510321254226078]
100%|██████████| 1/1 [00:00<00:00,  5.98it/s, best loss: 9.510321254226078]


We can report the performance of the model on the holdout dataset and summarize the best performing pipeline.

In [40]:
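# summarize performance
# score() returns the best learner's default scikit-learn score
# (R^2 for regressors), not the MSE used as the search loss
r2 = hyperopt_estimator.score(X_test, y_test)
print("R^2: %.3f" % r2)
# summarize the best model
print(f'Best model: {hyperopt_estimator.best_model()}')

R^2: 0.886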
Best model: {'learner': XGBRegressor(base_score=0.5, booster='gbtree',
             colsample_bylevel=0.9978248847072497, colsample_bynode=1,
             colsample_bytree=0.6129378971302373, gamma=3.0369489070285863e-05,
             importance_type='gain', learning_rate=0.0059639879541257025,
             max_delta_step=0, max_depth=8, min_child_weight=4, missing=nan,
             n_estimators=1800, n_jobs=1, nthread=None, objective='reg:linear',
             random_state=0, reg_alpha=0.0004566228110865921,
             reg_lambda=2.3939670304952703, scale_pos_weight=1, seed=4,
             silent=None, subsample=0.6482508600159675, verbosity=1), 'preprocs': (StandardScaler(copy=True, with_mean=True, with_std=False),), 'ex_preprocs': ()}
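
MSE and R² are easy to mix up when reporting results: MSE is expressed in squared target units, while R², the default `score` of scikit-learn regressors, is unitless and at most 1. A minimal sketch of how the two relate, using synthetic data and a plain `LinearRegression` as stand-ins (both are assumptions made for illustration):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic regression data (assumption, not the Housing data).
X_demo, y_demo = make_regression(n_samples=300, n_features=13,
                                 noise=15.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo,
                                          test_size=0.33, random_state=1)

model = LinearRegression().fit(X_tr, y_tr)
y_pred = model.predict(X_te)

mse = mean_squared_error(y_te, y_pred)  # in squared target units
r2 = r2_score(y_te, y_pred)             # same value as model.score(X_te, y_te)
# R^2 = 1 - MSE / Var(y), with the variance taken over the evaluated targets.
r2_from_mse = 1.0 - mse / np.var(y_te)

print(f'MSE: {mse:.3f}')
print(f'R^2: {r2:.3f} (from MSE: {r2_from_mse:.3f})')
```

To report a holdout MSE for a tuned estimator, one option is to call `mean_squared_error` on its predictions instead of relying on `score`.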


## References

- [Hyperopt](https://github.com/hyperopt/hyperopt)

- [Hyperopt basic tutorial](https://github.com/hyperopt/hyperopt/wiki/FMin)

- [Algorithms for Hyper-Parameter Optimization](https://papers.nips.cc/paper/4443-algorithms-for-hyper-parameter-optimization.pdf)

- [Databricks notebook: hyperopt-sklearn model selection](https://docs.databricks.com/_static/notebooks/hyperopt-sklearn-model-selection.html)

- [Tutorial on hyperopt](https://www.kaggle.com/fanvacoolt/tutorial-on-hyperopt)

- [Hyperopt-sklearn](https://github.com/hyperopt/hyperopt-sklearn)

- [Hyperopt for Automated Machine Learning With Scikit-Learn](https://machinelearningmastery.com/hyperopt-for-automated-machine-learning-with-scikit-learn/)

- [Hyperopt-sklearn documentation](https://hyperopt.github.io/hyperopt-sklearn/)

- [Hyperopt-Sklearn](https://www.ml4aad.org/wp-content/uploads/2018/07/automl_book_draft_hyperopt-sklearn.pdf)