# Machine Learning Mastery: Scikit-Optimize for Hyperparameter Tuning in Machine Learning

## Credit and Attribution

Jason Brownlee PhD, 

Scikit-Optimize for Hyperparameter Tuning in Machine Learning, 

Machine Learning Mastery, 

https://machinelearningmastery.com/scikit-optimize-for-hyperparameter-tuning-in-machine-learning/



In [1]:
!pip install scikit-optimize

Collecting scikit-optimize
  Downloading scikit_optimize-0.8.1-py2.py3-none-any.whl (101 kB)
     |████████████████████████████████| 101 kB 2.5 MB/s eta 0:00:01
Collecting pyaml>=16.9
  Downloading pyaml-20.4.0-py2.py3-none-any.whl (17 kB)
Installing collected packages: pyaml, scikit-optimize
Successfully installed pyaml-20.4.0 scikit-optimize-0.8.1


In [2]:
import skopt
skopt.__version__

'0.8.1'

In [3]:
# evaluate an svm for the ionosphere dataset
from numpy import mean
from numpy import std
from pandas import read_csv
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import RepeatedStratifiedKFold
from sklearn.svm import SVC


In [4]:
# load dataset
url = 'https://raw.githubusercontent.com/jbrownlee/Datasets/master/ionosphere.csv'
dataframe = read_csv(url, header=None)


In [5]:
dataframe.head()

|   | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | ... | 25 | 26 | 27 | 28 | 29 | 30 | 31 | 32 | 33 | 34 |
|---|---|---|---|---|---|---|---|---|---|---|-----|----|----|----|----|----|----|----|----|----|----|
| 0 | 1 | 0 | 0.99539 | -0.05889 | 0.85243 | 0.02306 | 0.83398 | -0.37708 | 1.0 | 0.0376 | ... | -0.51171 | 0.41078 | -0.46168 | 0.21266 | -0.3409 | 0.42267 | -0.54487 | 0.18641 | -0.453 | g |
| 1 | 1 | 0 | 1.0 | -0.18829 | 0.93035 | -0.36156 | -0.10868 | -0.93597 | 1.0 | -0.04549 | ... | -0.26569 | -0.20468 | -0.18401 | -0.1904 | -0.11593 | -0.16626 | -0.06288 | -0.13738 | -0.02447 | b |
| 2 | 1 | 0 | 1.0 | -0.03365 | 1.0 | 0.00485 | 1.0 | -0.12062 | 0.88965 | 0.01198 | ... | -0.4022 | 0.58984 | -0.22145 | 0.431 | -0.17365 | 0.60436 | -0.2418 | 0.56045 | -0.38238 | g |
| 3 | 1 | 0 | 1.0 | -0.45161 | 1.0 | 1.0 | 0.71216 | -1.0 | 0.0 | 0.0 | ... | 0.90695 | 0.51613 | 1.0 | 1.0 | -0.20099 | 0.25682 | 1.0 | -0.32382 | 1.0 | b |
| 4 | 1 | 0 | 1.0 | -0.02401 | 0.9414 | 0.06531 | 0.92106 | -0.23255 | 0.77152 | -0.16399 | ... | -0.65158 | 0.1329 | -0.53206 | 0.02431 | -0.62197 | -0.05707 | -0.59573 | -0.04608 | -0.65697 | g |


In [6]:
# split into input and output elements
data = dataframe.values
X, y = data[:, :-1], data[:, -1]
print(X.shape, y.shape)


(351, 34) (351,)


In [7]:
# define model
model = SVC()
# define test harness
cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=1)
# evaluate model
m_scores = cross_val_score(model, X, y, scoring='accuracy', cv=cv, n_jobs=-1, error_score='raise')
print('Accuracy: %.3f (%.3f)' % (mean(m_scores), std(m_scores)))

Accuracy: 0.937 (0.038)


## Manually Tune Parameters with Scikit-Optimize

In [9]:
from skopt.space import Real, Integer, Categorical
from skopt.utils import use_named_args

In [10]:
...
# define the space of hyperparameters to search
search_space = list()
search_space.append(Real(1e-6, 100.0, 'log-uniform', name='C'))
search_space.append(Categorical(['linear', 'poly', 'rbf', 'sigmoid'], name='kernel'))
search_space.append(Integer(1, 5, name='degree'))
search_space.append(Real(1e-6, 100.0, 'log-uniform', name='gamma'))
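The 'log-uniform' prior used for C and gamma means candidate values are spread evenly across orders of magnitude rather than evenly across the raw range. The sketch below illustrates the idea with the standard library only; `sample_log_uniform` is a hypothetical helper written for this note and is not how Scikit-Optimize implements its sampling.

```python
import math
import random

def sample_log_uniform(low, high, rng):
    """Hypothetical helper: draw a value whose log10 is uniform on [log10(low), log10(high)]."""
    exponent = rng.uniform(math.log10(low), math.log10(high))
    return 10.0 ** exponent

rng = random.Random(1)  # fixed seed so the sketch is repeatable
samples = [sample_log_uniform(1e-6, 100.0, rng) for _ in range(10000)]

# The geometric midpoint of [1e-6, 100] is 1e-2, so roughly half of the
# samples should fall below it, even though 1e-2 is tiny on a linear scale.
below_midpoint = sum(s < 1e-2 for s in samples) / len(samples)
print('fraction below 1e-2: %.2f' % below_midpoint)
```

A plain uniform prior over the same range would almost never propose values below 1, which is why a log scale is the usual choice for parameters such as C and gamma.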

Define the objective function: it builds the model, applies a candidate set of hyperparameters, and scores the model with cross_val_score. The optimizer calls this function repeatedly to find the best-scoring configuration.

In [11]:
# define the function used to evaluate a given configuration
@use_named_args(search_space)
def evaluate_model(**params):
	# configure the model with specific hyperparameters
	model = SVC()
	model.set_params(**params)
	# define test harness
	cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=1)
	# run repeated 10-fold cross validation (3 repeats)
	result = cross_val_score(model, X, y, cv=cv, n_jobs=-1, scoring='accuracy')
	# calculate the mean of the scores
	estimate = mean(result)
	# convert from a maximizing score to a minimizing score
	return 1.0 - estimate
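The optimizer passes each candidate to the objective as a plain list of values, and the decorator turns that list into keyword arguments by pairing it with the dimension names. A minimal sketch of that pattern, assuming nothing beyond the standard library: `named_args` and `toy_objective` are hypothetical stand-ins for illustration, not the Scikit-Optimize implementation.

```python
import functools

def named_args(names):
    """Hypothetical stand-in: map a list of values onto keyword arguments."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(values):
            # pair each positional value with its dimension name
            return func(**dict(zip(names, values)))
        return wrapper
    return decorator

@named_args(['C', 'kernel'])
def toy_objective(**params):
    # made-up accuracy that peaks when C == 1.0; real code would run
    # cross-validation here instead
    accuracy = 1.0 - min(abs(params['C'] - 1.0), 1.0)
    # minimizers look for the smallest value, so return the error rate
    return 1.0 - accuracy

print(toy_objective([1.0, 'rbf']))
print(toy_objective([0.5, 'rbf']))
```

Returning `1.0 - accuracy` is what lets a minimizer search for the most accurate model: the lowest error rate corresponds to the highest accuracy.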

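The optimization runs for a fixed budget of objective evaluations (the n_calls argument of gp_minimize, 100 by default) rather than until a convergence test is met, and then returns a result.

The result object holds the best configuration found in result.x and its objective value in result.fun.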

In [13]:
from skopt import gp_minimize

# perform optimization
result = gp_minimize(evaluate_model, search_space)
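The overall shape of a budgeted search like this one can be sketched without any library: evaluate a fixed number of candidates and keep the best. The sketch deliberately swaps the Gaussian-process model for plain random search, so it shows only the loop and the bookkeeping, not Bayesian optimization itself. `minimize_with_budget`, `toy_objective`, and `sample_point` are hypothetical names.

```python
import random

def minimize_with_budget(objective, sample_point, n_calls, seed=1):
    """Hypothetical helper: evaluate n_calls random points and keep the best one."""
    rng = random.Random(seed)
    best_x, best_fun = None, float('inf')
    for _ in range(n_calls):
        x = sample_point(rng)      # propose a candidate (random here, model-guided in Bayesian optimization)
        fun = objective(x)         # one evaluation consumes one unit of budget
        if fun < best_fun:
            best_x, best_fun = x, fun
    return best_x, best_fun

def toy_objective(x):
    # simple bowl with its minimum at x[0] == 0.3
    return (x[0] - 0.3) ** 2

def sample_point(rng):
    return [rng.uniform(0.0, 1.0)]

best_x, best_fun = minimize_with_budget(toy_objective, sample_point, n_calls=100)
print(best_x, best_fun)
```

A Bayesian optimizer differs in how it proposes the next candidate: it fits a surrogate model to the evaluations seen so far and uses it to choose promising points, which usually finds good configurations in fewer evaluations.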

In [14]:

# summarize the findings
print('Best Accuracy: %.3f' % (1.0 - result.fun))
print('Best Parameters: %s' % (result.x))

Best Accuracy: 0.952
Best Parameters: [5.090818593037081, 'rbf', 2, 0.09044157203488279]
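The best parameters are reported as a bare list in the same order in which the dimensions were appended to the search space (C, kernel, degree, gamma). A small sketch of pairing them back with their names, using the values printed above; the variable names are illustrative only.

```python
# order matches the order the dimensions were appended to search_space
names = ['C', 'kernel', 'degree', 'gamma']
# values copied from the "Best Parameters" output above; in the notebook
# these would come from result.x
best_values = [5.090818593037081, 'rbf', 2, 0.09044157203488279]

best_params = dict(zip(names, best_values))
for name, value in best_params.items():
    print('%s = %s' % (name, value))
```

A dictionary in this form can be passed to set_params on a fresh model in the same way the objective function does.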


## Automatically Tune Algorithm Hyperparameters

Instead of scikit-learn's GridSearchCV or RandomizedSearchCV, use the BayesSearchCV class from Scikit-Optimize.

In [17]:
from skopt import BayesSearchCV


In [15]:

# define search space
params = dict()
params['C'] = (1e-6, 100.0, 'log-uniform')
params['gamma'] = (1e-6, 100.0, 'log-uniform')
params['degree'] = (1,5)
params['kernel'] = ['linear', 'poly', 'rbf', 'sigmoid']

In [21]:
# define evaluation
cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=1)
# define the search
search = BayesSearchCV(estimator=SVC(), search_spaces=params, n_jobs=-1, cv=cv)
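It helps to estimate the cost of the search before running it, since every candidate configuration is scored on every cross-validation fold. The arithmetic below is a rough sketch: it assumes n_iter is left at what I understand to be the BayesSearchCV default of 50, and it ignores the final refit on the full dataset.

```python
n_splits, n_repeats = 10, 3   # from the RepeatedStratifiedKFold defined above
n_iter = 50                   # assumption: BayesSearchCV default when n_iter is not passed

fits_per_candidate = n_splits * n_repeats    # one model fit per fold
total_fits = n_iter * fits_per_candidate     # cross-validation fits for the whole search
print('model fits during the search:', total_fits)
```

Reducing n_repeats or passing a smaller n_iter shortens the run at the cost of a noisier or less thorough search.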

Call fit on the BayesSearchCV object to run the search and determine the best parameters.

In [22]:

# perform the search
search.fit(X, y)
# report the best result
print(search.best_score_)
print(search.best_params_)

0.9544159544159544
OrderedDict([('C', 15.705727074900938), ('degree', 2), ('gamma', 0.039313450342822895), ('kernel', 'rbf')])
