# Optuna

Optuna is an automatic hyperparameter optimization framework designed particularly for machine learning. For more information, see https://github.com/pfnet/optuna



This example uses Optuna to tune a LightGBM classifier on the breast cancer dataset from scikit-learn.
The quantity being optimized is the validation accuracy of cancer detection. Optuna minimizes by default, so the objective function returns the error rate, 1 - accuracy.
Both the choice of booster and its hyperparameters are optimized.



In [2]:

import lightgbm as lgb
import numpy as np
import sklearn.datasets
import sklearn.metrics
from sklearn.model_selection import train_test_split

import optuna


In [3]:
# FYI: Objective functions can take additional arguments
# (https://optuna.readthedocs.io/en/stable/faq.html#objective-func-additional-args).
def objective(trial):
    # Load the data and hold out 25% for validation. The split is not seeded,
    # so each trial is evaluated on a different random split.
    data, target = sklearn.datasets.load_breast_cancer(return_X_y=True)
    train_x, test_x, train_y, test_y = train_test_split(data, target, test_size=0.25)
    dtrain = lgb.Dataset(train_x, label=train_y)

    # Search space shared by all boosters.
    param = {
        'objective': 'binary',
        'metric': 'binary_logloss',
        'verbosity': -1,
        'boosting_type': trial.suggest_categorical('boosting', ['gbdt', 'dart', 'goss']),
        'num_leaves': trial.suggest_int('num_leaves', 10, 1000),
        'learning_rate': trial.suggest_loguniform('learning_rate', 1e-8, 1.0)
    }

    # Booster-specific hyperparameters are suggested only for the chosen booster.
    if param['boosting_type'] == 'dart':
        param['drop_rate'] = trial.suggest_loguniform('drop_rate', 1e-8, 1.0)
        param['skip_drop'] = trial.suggest_loguniform('skip_drop', 1e-8, 1.0)
    if param['boosting_type'] == 'goss':
        # Bound other_rate so that top_rate + other_rate never exceeds 1.0.
        param['top_rate'] = trial.suggest_uniform('top_rate', 0.0, 1.0)
        param['other_rate'] = trial.suggest_uniform('other_rate', 0.0, 1.0 - param['top_rate'])

    gbm = lgb.train(param, dtrain)
    # predict() returns positive-class probabilities; round them to 0/1 labels.
    preds = gbm.predict(test_x)
    pred_labels = np.rint(preds)
    accuracy = sklearn.metrics.accuracy_score(test_y, pred_labels)
    # Return the error rate, so that lower values are better.
    return 1.0 - accuracy
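
The FYI comment at the top of this cell notes that objective functions can take additional arguments. One common way to do that is to bind the extra arguments with `functools.partial` (a lambda works as well), so that Optuna still receives a callable that takes only `trial`. The sketch below is an illustration, not code from this notebook: `objective_with_data` is a hypothetical toy objective, and `StubTrial` is a hypothetical stand-in for a real trial so that the snippet runs without Optuna installed.

```python
import functools


def objective_with_data(trial, data, target):
    """Toy objective: squared distance between a suggested value and `target`.

    `data` and `target` are the extra arguments. In this notebook they could be
    the feature matrix and labels, loaded once outside the objective.
    """
    x = trial.suggest_uniform('x', min(data), max(data))
    return (x - target) ** 2


class StubTrial:
    """Hypothetical stand-in for a trial; always suggests the midpoint."""

    def suggest_uniform(self, name, low, high):
        return (low + high) / 2.0


# Bind the extra arguments so the resulting callable takes only `trial`.
objective = functools.partial(objective_with_data, data=[0.0, 10.0], target=5.0)

# With Optuna installed, the bound callable is passed the same way as before:
#     study = optuna.create_study()
#     study.optimize(objective, n_trials=100)
value = objective(StubTrial())
print(value)
```

Binding the data this way would also avoid reloading the dataset on every trial, at the cost of one more name to keep track of.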




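`suggest_loguniform('learning_rate', 1e-8, 1.0)` draws values whose logarithm is uniformly distributed, so every decade between 1e-8 and 1.0 is roughly equally likely. The sketch below reproduces that idea with the standard library only. `sample_loguniform` is a hypothetical helper written for illustration; it is not how Optuna chooses values internally.

```python
import math
import random


def sample_loguniform(low, high, rng):
    """Draw a value whose logarithm is uniform on [log(low), log(high)]."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))


rng = random.Random(0)  # fixed seed so the sketch is repeatable
samples = [sample_loguniform(1e-8, 1.0, rng) for _ in range(10000)]

# Count how many samples land in each decade [10^k, 10^(k+1)).
decades = {}
for s in samples:
    k = math.floor(math.log10(s))
    decades[k] = decades.get(k, 0) + 1

for k in sorted(decades):
    print('1e{:+d}: {}'.format(k, decades[k]))
```

Because five of the eight decades lie below 1e-3, many sampled learning rates are very small, which likely explains why a trial such as the first logged one (learning rate about 1.4e-08) scores poorly.
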
In [4]:
if __name__ == '__main__':
    study = optuna.create_study()
    study.optimize(objective, n_trials=100)

    print('Number of finished trials: {}'.format(len(study.trials)))

    print('Best trial:')
    trial = study.best_trial

    print('  Value: {}'.format(trial.value))

    print('  Params: ')
    for key, value in trial.params.items():
        print('    {}: {}'.format(key, value))

[I 2019-02-23 22:08:51,671] Finished a trial resulted in value: 0.39160839160839156. Current best value is 0.39160839160839156 with parameters: {'boosting': 'goss', 'num_leaves': 535, 'learning_rate': 1.4084980652636507e-08, 'top_rate': 0.2481010860600652, 'other_rate': 0.19955824131854402}.
[I 2019-02-23 22:08:51,873] Finished a trial resulted in value: 0.020979020979020935. Current best value is 0.020979020979020935 with parameters: {'boosting': 'goss', 'num_leaves': 672, 'learning_rate': 0.3628098045632216, 'top_rate': 0.017690666579892267, 'other_rate': 0.7978419678646624}.
[I 2019-02-23 22:08:52,047] Finished a trial resulted in value: 0.027972027972028024. Current best value is 0.020979020979020935 with parameters: {'boosting': 'goss', 'num_leaves': 672, 'learning_rate': 0.3628098045632216, 'top_rate': 0.017690666579892267, 'other_rate': 0.7978419678646624}.
[I 2019-02-23 22:08:52,274] Finished a trial resulted in value: 0.3706293706293706. Current best value is 0.020979020979020

Number of finished trials: 100
Best trial:
  Value: 0.013986013986013957
  Params: 
    boosting: gbdt
    num_leaves: 932
    learning_rate: 0.7999157126743421
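
The reported values can be read back as misclassification counts. Assuming the breast cancer dataset has 569 samples and that `train_test_split` with `test_size=0.25` rounds the held-out size up to 143, every objective value is a multiple of 1/143. The sketch below recovers the counts from values copied out of the log above; `N_VALIDATION` and `misclassified` are names introduced here for illustration.

```python
import math

N_SAMPLES = 569  # assumed size of the breast cancer dataset
N_VALIDATION = math.ceil(0.25 * N_SAMPLES)  # assumed rounding for test_size=0.25


def misclassified(error_rate, n=N_VALIDATION):
    """Convert an error rate (1 - accuracy) back to a whole number of errors."""
    return round(error_rate * n)


# Objective values copied from the output above.
logged = {
    'first trial': 0.39160839160839156,
    'second trial': 0.020979020979020935,
    'best trial': 0.013986013986013957,
}

for name, value in logged.items():
    errors = misclassified(value)
    print('{}: {} of {} wrong, accuracy {:.4f}'.format(
        name, errors, N_VALIDATION, 1.0 - value))
```

Under these assumptions, the best trial misclassifies only a couple of validation samples, so differences between the top trials come down to one or two predictions on a split that changes from trial to trial.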
