# Hyperopt Bayesian Hyperparameter Optimisation  - Table2

This Jupyter Notebook takes the data presented in Table2, and seeks an alternative form of hyperparameter optimisation. It was noted in the report that Bayesian optimisation can be an improvement over vanilla grid search and random search forms of hyperparameter optimisation, found within Scikit-learn. Bayesian optimisation is not found within Scikit-learn's API, however, so Hyperopt, an open-source Python library, was chosen. Hyperopt-sklearn is a python package that runs on top of this, although with hours of experimentation the API was not working, with unfixable bugs. Consequently, Hyperopt was used instead, with optimisation algorithms created from scratch.

Because the packages and data will be imported into this Jupyter notebook in the same way as in Table2, this process will be conducted without explaination.

In [1]:
# importing packages
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
import warnings
import time
warnings.filterwarnings(action = 'ignore')
# ignoring warnings, to make the results simpler to read

In [2]:
# importing and modifying tables
new = pd.read_excel('Data2.xlsx', sheet_name = 'Table 2 - Games')
new.set_index('Patient No. (ID)', inplace=True)
new.drop(['# Possible Targets', 'Targets hit', 'Time per target / overall'], axis=1, inplace=True)
new.rename(columns={'Average hit time (s) (for successful hits)': 'Average hit time', 'Total time taken (s)': 'Time taken'}, inplace=True)
new = new.round({'Time taken' : 0, '% targets hit': 2, 'Average hit time' : 1})
easy_filter = new['Difficulty'] == 'Easy'
easy = new[easy_filter]
medium_filter = new['Difficulty'] == 'Medium'
medium = new[medium_filter]
hard_filter = new['Difficulty'] == 'Hard'
hard = new[hard_filter]
easy.drop(['Difficulty'], axis=1, inplace=True)
medium.drop(['Difficulty'], axis=1, inplace=True)
hard.drop(['Difficulty'], axis=1, inplace=True)

In [3]:
# splitting data
from sklearn.model_selection import train_test_split
X_easy = easy.drop(columns=['Output'])
y_easy = easy.Output
X_train_easy, X_test_easy, y_train_easy, y_test_easy = train_test_split(X_easy, y_easy, test_size=0.2, random_state=13)

X_medium = medium.drop(columns=['Output'])
y_medium = medium.Output
X_train_medium, X_test_medium, y_train_medium, y_test_medium = train_test_split(X_medium, y_medium, test_size=0.2, random_state=13)

X_hard = hard.drop(columns=['Output'])
y_hard = hard.Output
X_train_hard, X_test_hard, y_train_hard, y_test_hard = train_test_split(X_hard, y_hard, test_size=0.2, random_state=13)

We start by re-running the original gridsearchCV, and output the total time taken to train each algorithm, as well as the total time taken to run the entire grid search. Note that we do not care about the results, as these are already generated in Table2. Instead, we simply print the total time taken.

In [4]:
from sklearn.model_selection import cross_val_score
from sklearn.ensemble import RandomForestClassifier
from xgboost import XGBClassifier
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.linear_model import LogisticRegression

In [5]:
parameters_RFC = {'n_estimators': [4, 6, 9], 
                  'max_features': ['log2', 'sqrt','auto'], 
                  'criterion': ['entropy', 'gini'],
                  'max_depth': [2, 3, 5, 10], 
                  'min_samples_split': [2, 3, 5],
                  'min_samples_leaf': [1,5,8]
                 }
parameters_XGBC = {'n_estimators': [400, 600, 800], 
                  'early_stopping_rounds': [3, 5, 7], 
                  'learning_rate': [0.05, 0.1, 0.3, 0.5]
                  } 
parameters_SVM = {'C' : [0.01, 0.1, 1, 10, 100],
                 'kernel': ['linear', 'rbf', 'sigmoid'],
                  'gamma': [0.001, 0.01, 0.1, 1]
                 }
parameters_decision_tree = {'criterion': ['entropy', 'gini'],
                             'max_depth': [2, 3, 5, 10],
                            'min_samples_split': [2, 3, 5],
                              'min_samples_leaf': [1,5,8]
                           }
parameters_NB = {'priors' : [None]}
parameters_kNN = {'n_neighbors': [3, 5, 10, 15],
                 'p': [1, 2, 3, 4]}
parameters_LDA = {'solver': ['svd', 'lsqr', 'eigen']}
parameters_logistic = {'C': np.logspace(-3, 3, 7),
                      'penalty': ['l1', 'l2']}

models = {'SVM': [SVC(), parameters_SVM],
          'Logistic': [LogisticRegression(), parameters_logistic],
          'LDA': [LinearDiscriminantAnalysis(), parameters_LDA],
          'kNN': [KNeighborsClassifier(), parameters_kNN],
          'Decision Tree': [DecisionTreeClassifier(), parameters_decision_tree],
          'Naive Bayes' : [GaussianNB(), parameters_NB],
        'Random Forest': [RandomForestClassifier(), parameters_RFC],
         'XGBoost': [XGBClassifier(), parameters_XGBC]}

In [6]:
def best_classifier(diff, models):
    global results
    results = {}
    results['Easy'] = {}
    results['Medium'] = {}
    results['Hard'] = {}
    for j in diff.keys():
        for i in models:
            start = time.time()
            CV = GridSearchCV(models[i][0], models[i][1], cv=4, iid=False, scoring='accuracy', n_jobs=1)
            CV = CV.fit(diff[j][0], diff[j][1])
            end = time.time()
            time_taken = end - start
            print(f'{i} on table {j} took {time_taken:.2f}s')
            results[j][i] = [CV.best_score_, CV.best_estimator_, time_taken]
    return results
diff = {'Easy': [X_train_easy, y_train_easy], 'Medium': [X_train_medium, y_train_easy], 'Hard': [X_train_hard, y_train_hard]}

In [7]:
best_classifier(diff, models)

SVM on table Easy took 0.92s
Logistic on table Easy took 0.29s
LDA on table Easy took 0.06s
kNN on table Easy took 0.32s
Decision Tree on table Easy took 1.07s
Naive Bayes on table Easy took 0.02s
Random Forest on table Easy took 27.25s
XGBoost on table Easy took 11.17s
SVM on table Medium took 1.14s
Logistic on table Medium took 0.43s
LDA on table Medium took 0.06s
kNN on table Medium took 0.34s
Decision Tree on table Medium took 1.12s
Naive Bayes on table Medium took 0.02s
Random Forest on table Medium took 26.30s
XGBoost on table Medium took 12.73s
SVM on table Hard took 1.29s
Logistic on table Hard took 0.43s
LDA on table Hard took 0.07s
kNN on table Hard took 0.45s
Decision Tree on table Hard took 1.40s
Naive Bayes on table Hard took 0.02s
Random Forest on table Hard took 25.18s
XGBoost on table Hard took 10.07s


{'Easy': {'SVM': [1.0,
   SVC(C=0.01, cache_size=200, class_weight=None, coef0=0.0,
     decision_function_shape='ovr', degree=3, gamma=0.001, kernel='linear',
     max_iter=-1, probability=False, random_state=None, shrinking=True,
     tol=0.001, verbose=False),
   0.9165854454040527],
  'Logistic': [0.95,
   LogisticRegression(C=10.0, class_weight=None, dual=False, fit_intercept=True,
             intercept_scaling=1, max_iter=100, multi_class='warn',
             n_jobs=None, penalty='l1', random_state=None, solver='warn',
             tol=0.0001, verbose=0, warm_start=False),
   0.29024410247802734],
  'LDA': [1.0,
   LinearDiscriminantAnalysis(n_components=None, priors=None, shrinkage=None,
                 solver='svd', store_covariance=False, tol=0.0001),
   0.05580735206604004],
  'kNN': [1.0,
   KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',
              metric_params=None, n_jobs=None, n_neighbors=3, p=2,
              weights='uniform'),
   0.32413

In [8]:
def find_total_time(results):
    averages = {}
    total = 0
    for i in results.keys():
        for j in results[i].keys():
            averages[j] = 0
    for i in results.keys():
        for j in results[i].keys():
            averages[j] += results[i][j][2]
            total += results[i][j][2]
    print(averages)
    print('\n' + f'Total time taken is {total:.2f}s')
    return total

In [9]:
total = find_total_time(results)

{'SVM': 3.341367721557617, 'Logistic': 1.146984338760376, 'LDA': 0.181473970413208, 'kNN': 1.1170148849487305, 'Decision Tree': 3.5927746295928955, 'Naive Bayes': 0.061865806579589844, 'Random Forest': 78.7322130203247, 'XGBoost': 33.9632089138031}

Total time taken is 122.14s


Next, we import Hyperopt.

In [10]:
from hyperopt import fmin, tpe, hp, STATUS_OK, Trials

We modify the hyperparameter space. This looks similar to the way it was specified in the GridSearchCV cace above. This is deliberate, because we want a fair comparisation between the two. Note, however, that Hyperopt, rather than iterate through every combination of values, will select a random set of values initially, and then change the hyperparameter values, within the range provided, with the objective of maximising model accuracy.

In [11]:
parameters_RFC = {'max_depth': hp.choice('max_depth', range(1,20)),
                 'max_features': hp.choice('max_features', ['log2', 'sqrt','auto']),
                 'n_estimators': hp.choice('n_estimators', range(100,500)),
                 'criterion': hp.choice('criterion', ['gini', 'entropy']), 
                 'min_samples_split': hp.choice('min_samples_split', range(2,10)),
                  'min_samples_leaf': hp.choice('min_samples_leaf', range(1,10))}
parameters_XGBC = {'n_estimators': hp.choice('n_estimators', range(200, 800)), 
                  'early_stopping_rounds': hp.choice('early_stopping_rounds', range(2, 8)), 
                  'learning_rate': hp.uniform('learning_rate', 0.05, 0.5)
                  } 
parameters_SVM = {'C' : hp.uniform('C', 0.01, 100),
                 'kernel': hp.choice('kernel', ['linear', 'rbf', 'sigmoid']),
                  'gamma': hp.uniform('gamma', 0.001, 1)
                 }
parameters_decision_tree = {'max_depth': hp.choice('max_depth', range(1,20)),
                             'criterion': hp.choice('criterion', ['gini', 'entropy']), 
                             'min_samples_split': hp.choice('min_samples_split', range(2,10)),
                              'min_samples_leaf': hp.choice('min_samples_leaf', range(1,10))}
parameters_NB = {'priors' : hp.choice('priors', [None])}
parameters_kNN = {'n_neighbors': hp.choice('n_neighbors', range(2, 30)),
                 'p': hp.choice('p', range(1, 6))}
parameters_LDA = {'solver': hp.choice('solver', ['svd', 'lsqr', 'eigen'])}
parameters_logistic = {'C': hp.uniform('C', 0.01, 10000),
                      'penalty': hp.choice('penalty', ['l1', 'l2'])}

models = {'SVM': [SVC(), parameters_SVM],
          'Logistic': [LogisticRegression(), parameters_logistic],
          'LDA': [LinearDiscriminantAnalysis(), parameters_LDA],
          'kNN': [KNeighborsClassifier(), parameters_kNN],
          'Decision Tree': [DecisionTreeClassifier(), parameters_decision_tree],
          'Naive Bayes' : [GaussianNB(), parameters_NB],
        'Random Forest': [RandomForestClassifier(), parameters_RFC],
         'XGBoost': [XGBClassifier(), parameters_XGBC]}

The relevant functions necessary for Hyperopt have been created, modified from the functions in the GridSearchCV case.

In [12]:
def f(params):
    global best
    acc=cross_val_score(a, b, c, cv=4).mean()
    if acc > best:
            best = acc
    return {'loss': -acc, 'status': STATUS_OK}

    
def best_classifier(diff, models, max_evals):    
    global results2, a, b, c, best
    results2 = {}
    results2['Easy'] = {}
    results2['Medium'] = {}
    results2['Hard'] = {}
    for j in diff.keys():
        for i in models.keys():
            best = 0
            trials=Trials()
            a = models[i][0]
            b = diff[j][0]
            c = diff[j][1]
            start = time.time()
            best_result = fmin(f, models[i][1], algo=tpe.suggest, max_evals=max_evals, trials=trials)
            end = time.time()
            time_taken = end - start
            print(f'{i} on table {j} scored {best:.3f} and took {(time_taken):.2f}s')
            results2[j][i] = [best, best_result, time_taken]
    print(results2)
    return results2

diff = {'Easy': [X_train_easy, y_train_easy], 'Medium': [X_train_medium, y_train_easy], 'Hard': [X_train_hard, y_train_hard]}

Then the models are run and trained, each one 50 times.

In [13]:
best_classifier(diff, models, 50)

100%|██████████████████████████████████████████████████| 50/50 [00:00<00:00, 53.55it/s, best loss: -0.8999999999999999]
SVM on table Easy scored 0.900 and took 0.90s
100%|█████████████████████████████████████████████████████████████████| 50/50 [00:00<00:00, 61.74it/s, best loss: -0.5]
Logistic on table Easy scored 0.500 and took 0.82s
100%|█████████████████████████████████████████████████████████████████| 50/50 [00:00<00:00, 63.95it/s, best loss: -1.0]
LDA on table Easy scored 1.000 and took 0.79s
100%|████████████████████████████████████████████████████████████████| 50/50 [00:00<00:00, 53.00it/s, best loss: -0.95]
kNN on table Easy scored 0.950 and took 0.96s
100%|█████████████████████████████████████████████████████████████████| 50/50 [00:00<00:00, 56.73it/s, best loss: -1.0]
Decision Tree on table Easy scored 1.000 and took 0.88s
100%|██████████████████████████████████████████████████| 50/50 [00:00<00:00, 65.71it/s, best loss: -0.8999999999999999]
Naive Bayes on table Easy scored 0.

{'Easy': {'SVM': [0.8999999999999999,
   {'C': 79.58108840989637, 'gamma': 0.9682645046571987, 'kernel': 1},
   0.8976011276245117],
  'Logistic': [0.5, {'C': 8920.560047175679, 'penalty': 0}, 0.817847490310669],
  'LDA': [1.0, {'solver': 0}, 0.7878937721252441],
  'kNN': [0.95, {'n_neighbors': 21, 'p': 4}, 0.9564435482025146],
  'Decision Tree': [1.0,
   {'criterion': 1,
    'max_depth': 3,
    'min_samples_leaf': 5,
    'min_samples_split': 5},
   0.8776247501373291],
  'Naive Bayes': [0.8999999999999999, {'priors': 0}, 0.7669501304626465],
  'Random Forest': [1.0,
   {'criterion': 1,
    'max_depth': 18,
    'max_features': 1,
    'min_samples_leaf': 3,
    'min_samples_split': 3,
    'n_estimators': 357},
   3.8210830688476562],
  'XGBoost': [1.0,
   {'early_stopping_rounds': 5,
    'learning_rate': 0.37354141745490926,
    'n_estimators': 436},
   5.425570964813232]},
 'Medium': {'SVM': [0.7000000000000001,
   {'C': 80.07012820627709, 'gamma': 0.12034864392429889, 'kernel': 1},
  

The mean performance of each model is then calculated.

In [14]:
def find_average_score():
    averages = {}
    for i in results2.keys():
        for j in results2[i].keys():
            averages[j] = 0
    for i in results2.keys():
        for j in results2[i].keys():
            averages[j] += results2[i][j][0]
    for x in averages.keys():
        averages[x] /= 3
    print(averages)

In [15]:
find_average_score()

{'SVM': 0.7666666666666667, 'Logistic': 0.5333333333333333, 'LDA': 0.9, 'kNN': 0.9333333333333332, 'Decision Tree': 0.9333333333333332, 'Naive Bayes': 0.8333333333333334, 'Random Forest': 0.9666666666666667, 'XGBoost': 0.9166666666666666}


As we can see, Random Forest appears to perform best. The accuracy scores for most models are very similar to the grid search case.
What is more important here, however, is total time taken, for the efficiency of training is the major benefit of Bayesian Optimisation over grid search. Consequently, we output the total time taken, and difference and percentage difference compared to the grid search case.

In [16]:
total2 = find_total_time(results2)

{'SVM': 2.641939640045166, 'Logistic': 2.45457124710083, 'LDA': 2.9139018058776855, 'kNN': 3.0638532638549805, 'Decision Tree': 2.6558897495269775, 'Naive Bayes': 2.3068313598632812, 'Random Forest': 11.991602897644043, 'XGBoost': 17.45606756210327}

Total time taken is 45.48s


In [17]:
percentage_change = ((total2-total) / total) * 100
print(f'Bayesian Optimisation reduced training time by {total-total2:.2f}s, and by {-percentage_change:.2f}%')

Bayesian Optimisation reduced training time by 76.65s, and by 62.76%


As we can see, Bayesian Optimisation improves on training time significaintly, and produces similar results.