# Hyperopt Bayesian Hyperparameter Optimisation  - Table2

This Jupyter Notebook takes the data presented in Table2, and seeks an alternative form of hyperparameter optimisation. It was noted in the report that Bayesian optimisation can be an improvement over vanilla grid search and random search forms of hyperparameter optimisation, found within Scikit-learn. Bayesian optimisation is not found within Scikit-learn's API, however, so Hyperopt, an open-source Python library, was chosen. Hyperopt-sklearn is a python package that runs on top of this, although with hours of experimentation the API was not working, with unfixable bugs. Consequently, Hyperopt was used instead, with optimisation algorithms created from scratch.

Because the packages and data will be imported into this Jupyter notebook in the same way as in Table2, this process will be conducted without explaination.

In [1]:
# importing packages
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
import warnings
warnings.filterwarnings(action = 'ignore')
# ignoring warnings, to make the results simpler to read

In [2]:
# importing and modifying tables
new = pd.read_excel('Data2.xlsx', sheet_name = 'Table 2 - Games')
new.set_index('Patient No. (ID)', inplace=True)
new.drop(['# Possible Targets', 'Targets hit', 'Time per target / overall'], axis=1, inplace=True)
new.rename(columns={'Average hit time (s) (for successful hits)': 'Average hit time', 'Total time taken (s)': 'Time taken'}, inplace=True)
new = new.round({'Time taken' : 0, '% targets hit': 2, 'Average hit time' : 1})
easy_filter = new['Difficulty'] == 'Easy'
easy = new[easy_filter]
medium_filter = new['Difficulty'] == 'Medium'
medium = new[medium_filter]
hard_filter = new['Difficulty'] == 'Hard'
hard = new[hard_filter]
easy.drop(['Difficulty'], axis=1, inplace=True)
medium.drop(['Difficulty'], axis=1, inplace=True)
hard.drop(['Difficulty'], axis=1, inplace=True)

In [3]:
# splitting data
from sklearn.model_selection import train_test_split
X_easy = easy.drop(columns=['Output'])
y_easy = easy.Output
X_train_easy, X_test_easy, y_train_easy, y_test_easy = train_test_split(X_easy, y_easy, test_size=0.2, random_state=13)

X_medium = medium.drop(columns=['Output'])
y_medium = medium.Output
X_train_medium, X_test_medium, y_train_medium, y_test_medium = train_test_split(X_medium, y_medium, test_size=0.2, random_state=13)

X_hard = hard.drop(columns=['Output'])
y_hard = hard.Output
X_train_hard, X_test_hard, y_train_hard, y_test_hard = train_test_split(X_hard, y_hard, test_size=0.2, random_state=13)

Next, we import Hyperopt, and the classification algorithms we are going to run.

In [4]:
from hyperopt import fmin, tpe, hp, STATUS_OK, Trials

In [5]:
from sklearn.model_selection import cross_val_score
from sklearn.ensemble import RandomForestClassifier
from xgboost import XGBClassifier
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.linear_model import LogisticRegression

We modify the hyperparameter space. This looks similar to the way it was specified in the GridSearchCV space. This is deliberate, because we want a fair comparisation between the two. Note, however, that Hyperopt, rather than iterate through every combination of values, will select a random set of values initially, and then change the hyperparameter values, within the range provided, with the objective of maximising model accuracy.

In [6]:
parameters_RFC = {'max_depth': hp.choice('max_depth', range(1,20)),
                 'max_features': hp.choice('max_features', ['log2', 'sqrt','auto']),
                 'n_estimators': hp.choice('n_estimators', range(100,500)),
                 'criterion': hp.choice('criterion', ['gini', 'entropy']), 
                 'min_samples_split': hp.choice('min_samples_split', range(2,10)),
                  'min_samples_leaf': hp.choice('min_samples_leaf', range(1,10))}
parameters_XGBC = {'n_estimators': hp.choice('n_estimators', range(200, 800)), 
                  'early_stopping_rounds': hp.choice('early_stopping_rounds', range(2, 8)), 
                  'learning_rate': hp.uniform('learning_rate', 0.05, 0.5)
                  } 
parameters_SVM = {'C' : hp.uniform('C', 0.01, 100),
                 'kernel': hp.choice('kernel', ['linear', 'rbf', 'sigmoid']),
                  'gamma': hp.uniform('gamma', 0.001, 1)
                 }
parameters_decision_tree = {'max_depth': hp.choice('max_depth', range(1,20)),
                             'criterion': hp.choice('criterion', ['gini', 'entropy']), 
                             'min_samples_split': hp.choice('min_samples_split', range(2,10)),
                              'min_samples_leaf': hp.choice('min_samples_leaf', range(1,10))}
parameters_NB = {'priors' : hp.choice('priors', [None])}
parameters_kNN = {'n_neighbors': hp.choice('n_neighbors', range(2, 30)),
                 'p': hp.choice('p', range(1, 6))}
parameters_LDA = {'solver': hp.choice('solver', ['svd', 'lsqr', 'eigen'])}
parameters_logistic = {'C': hp.uniform('C', 0.01, 10000),
                      'penalty': hp.choice('penalty', ['l1', 'l2'])}

models = {'SVM': [SVC(), parameters_SVM],
          'Logistic': [LogisticRegression(), parameters_logistic],
          'LDA': [LinearDiscriminantAnalysis(), parameters_LDA],
          'kNN': [KNeighborsClassifier(), parameters_kNN],
          'Decision Tree': [DecisionTreeClassifier(), parameters_decision_tree],
          'Naive Bayes' : [GaussianNB(), parameters_NB],
        'Random Forest': [RandomForestClassifier(), parameters_RFC],
         'XGBoost': [XGBClassifier(), parameters_XGBC]}

The relevant functions necessary for Hyperopt have been created, modified from the functions in the GridSearchCV case.

In [7]:
def f(params):
    global best
    acc=cross_val_score(a, b, c, cv=4).mean()
    if acc > best:
            best = acc
    return {'loss': -acc, 'status': STATUS_OK}

    
def best_classifier(diff, models):    
    global results, a, b, c, best
    results = {}
    results['Easy'] = {}
    results['Medium'] = {}
    results['Hard'] = {}
    for j in diff.keys():
        for i in models.keys():
            best = 0
            trials=Trials()
            a = models[i][0]
            b = diff[j][0]
            c = diff[j][1]
            best_result = fmin(f, models[i][1], algo=tpe.suggest, max_evals=250, trials=trials)
            print(f'{i} on table {j} scored {best}')
            results[j][i] = [best, best_result]
    print(results)
    return results

diff = {'Easy': [X_train_easy, y_train_easy], 'Medium': [X_train_medium, y_train_easy], 'Hard': [X_train_hard, y_train_hard]}

Then the models are run and trained, each one 250 times.

In [8]:
best_classifier(diff, models)

100%|████████████████████████████████████████████████| 250/250 [00:06<00:00, 36.24it/s, best loss: -0.8999999999999999]
SVM on table Easy scored 0.8999999999999999
100%|███████████████████████████████████████████████████████████████| 250/250 [00:04<00:00, 50.88it/s, best loss: -0.5]
Logistic on table Easy scored 0.5
100%|███████████████████████████████████████████████████████████████| 250/250 [00:04<00:00, 53.91it/s, best loss: -1.0]
LDA on table Easy scored 1.0
100%|██████████████████████████████████████████████████████████████| 250/250 [00:05<00:00, 48.15it/s, best loss: -0.95]
kNN on table Easy scored 0.95
100%|███████████████████████████████████████████████████████████████| 250/250 [00:04<00:00, 53.46it/s, best loss: -1.0]
Decision Tree on table Easy scored 1.0
100%|████████████████████████████████████████████████| 250/250 [00:03<00:00, 63.38it/s, best loss: -0.8999999999999999]
Naive Bayes on table Easy scored 0.8999999999999999
100%|███████████████████████████████████████████████

{'Easy': {'SVM': [0.8999999999999999, {'C': 28, 'gamma': 36, 'kernel': 2}],
  'Logistic': [0.5, {'C': 4, 'penalty': 0}],
  'LDA': [1.0, {'solver': 1}],
  'kNN': [0.95, {'n_neighbors': 9, 'p': 0}],
  'Decision Tree': [1.0,
   {'criterion': 1,
    'max_depth': 5,
    'min_samples_leaf': 6,
    'min_samples_split': 2}],
  'Naive Bayes': [0.8999999999999999, {'priors': 0}],
  'Random Forest': [1.0,
   {'criterion': 1,
    'max_depth': 15,
    'max_features': 2,
    'min_samples_leaf': 0,
    'min_samples_split': 5,
    'n_estimators': 239}],
  'XGBoost': [1.0,
   {'early_stopping_rounds': 0, 'learning_rate': 48, 'n_estimators': 419}]},
 'Medium': {'SVM': [0.7000000000000001, {'C': 5, 'gamma': 23, 'kernel': 1}],
  'Logistic': [0.65, {'C': 1, 'penalty': 1}],
  'LDA': [0.95, {'solver': 1}],
  'kNN': [0.9, {'n_neighbors': 25, 'p': 3}],
  'Decision Tree': [0.95,
   {'criterion': 0,
    'max_depth': 3,
    'min_samples_leaf': 6,
    'min_samples_split': 3}],
  'Naive Bayes': [0.8500000000000001,

The mean performance of each model is then calculated.

In [9]:
def find_average():
    averages = {}
    for i in results.keys():
        for j in results[i].keys():
            averages[j] = 0
    for i in results.keys():
        for j in results[i].keys():
            averages[j] += results[i][j][0]
    for x in averages.keys():
        averages[x] /= 3
    print(averages)

In [10]:
find_average()

{'SVM': 0.7666666666666667, 'Logistic': 0.5333333333333333, 'LDA': 0.9, 'kNN': 0.9333333333333332, 'Decision Tree': 0.9333333333333332, 'Naive Bayes': 0.8333333333333334, 'Random Forest': 0.9833333333333334, 'XGBoost': 0.9166666666666666}


As we can see, Random Forest appears to perform best. It has an identical score to hyperparameter optimisation using Grid Search, moreover, so this does not change our optimal model.