# Further Ensemble Methods - Table1

This Jupyter Notebook takes the first table and applies further ensemble methods to see whether model performance can be enhanced. The results will be used in the analyses of the report. Note that we already have 100% accuracy for Table1, so that figure cannot be improved upon (random forests, which were chosen as the optimal model for Table1, are themselves an ensemble method). To keep the results comparable, we do not use GridSearchCV, and instead test whether different ensemble methods improve upon the default results of several classifiers.

First, we import the necessary packages, and view a summary of the data.

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

In [2]:
import warnings
warnings.filterwarnings(action = 'ignore')
# ignoring warnings, to make the results simpler to read

In [3]:
new = pd.read_excel('Data1.xlsx', sheet_name = 'Table1 - Assessments')

In [4]:
new.set_index('Patient No. (ID)', inplace=True)

In [5]:
from sklearn.model_selection import train_test_split
X = new.drop(columns=['Output'])
y = new.Output
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=12)

In [6]:
def label_encoder(y):
    if y == 'Easy':
        return 1
    if y == 'Medium':
        return 2
    if y == 'Hard':
        return 3
y_train = y_train.apply(label_encoder)
y_test = y_test.apply(label_encoder)

In [7]:
X_train.sample(5)

Patient No. (ID),Average Shoulder abduction,Average Shoulder flexion,Average Elbow flexion,Total time taken
40,97,74,83,101
12,56,55,57,121
53,122,91,92,94
56,125,94,94,93
65,134,101,97,89


In [8]:
y_train.sample(5)

Patient No. (ID)
22    1
54    3
21    1
7     1
28    2
Name: Output, dtype: int64

## Bagging

Bagging consists of drawing random samples of the dataset with replacement and training a weak learner on each. The weak learners are then combined into a single strong learner. We first make a simple attempt at bagging by applying Scikit-learn's BaggingClassifier() to the eight classification models seen in Table1.
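To make the idea concrete, here is a minimal standard-library sketch of bagging: bootstrap samples are drawn with replacement, a toy weak learner is fitted on each, and predictions are combined by majority vote. The data, the one-feature threshold rule and the helper names (`bootstrap_sample`, `fit_stump`, `bagged_predict`) are all assumptions made for illustration; this is not how BaggingClassifier() is implemented internally.

```python
import random
from collections import Counter

def bootstrap_sample(rows, rng):
    """Sample len(rows) rows uniformly at random, with replacement."""
    return [rng.choice(rows) for _ in range(len(rows))]

def fit_stump(rows):
    """Toy weak learner: the threshold on x that makes the fewest training errors.

    The stump predicts label 1 when x < threshold and label 3 otherwise.
    """
    def errors(threshold):
        return sum((1 if x < threshold else 3) != label for x, label in rows)
    candidates = sorted({x for x, _ in rows})
    return min(candidates, key=errors)

def bagged_predict(thresholds, x):
    """Majority vote over the stumps fitted on the bootstrap samples."""
    votes = Counter(1 if x < t else 3 for t in thresholds)
    return votes.most_common(1)[0][0]

rng = random.Random(12)
# Made-up (x, label) pairs for illustration, not the notebook's data
data = [(89, 1), (93, 1), (94, 1), (101, 3), (110, 3), (121, 3)]
# One stump per bootstrap sample; 25 samples is an arbitrary choice
thresholds = [fit_stump(bootstrap_sample(data, rng)) for _ in range(25)]
print(bagged_predict(thresholds, 90), bagged_predict(thresholds, 120))
```

Individual stumps can differ because each sees a different resample, and the vote smooths out those differences.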

In [9]:
from sklearn.model_selection import cross_val_score
from sklearn.ensemble import RandomForestClassifier, BaggingClassifier
from xgboost import XGBClassifier
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.linear_model import LogisticRegression

In [10]:
classifiers = [SVC(), LogisticRegression(), LinearDiscriminantAnalysis(), KNeighborsClassifier(), 
               DecisionTreeClassifier(), GaussianNB(), RandomForestClassifier(), XGBClassifier()]
names = ['SVC', 'Logistic Regression', 'LDA', 'KNN', 'Decision Tree', 'GaussianNB', 'Random Forest', 'XGBoost']
# compare each default classifier with a bagged version of itself, using 5-fold cross-validation
for name, classifier in zip(names, classifiers):
    first_score = cross_val_score(classifier, X_train, y_train, cv=5)
    bagging_classifier = BaggingClassifier(classifier)
    bagging_score = cross_val_score(bagging_classifier, X_train, y_train, cv=5)
    print(f'First accuracy for {name}: mean: {first_score.mean()}, std: {first_score.std()}')
    print(f'Bagging accuracy for {name}: mean: {bagging_score.mean()}, std: {bagging_score.std()}')

First accuracy for SVC: mean: 0.861888111888112, std: 0.12449603909768377
Bagging accuracy for SVC: mean: 0.8452214452214453, std: 0.10957483861906214
First accuracy for Logistic Regression: mean: 0.9636363636363636, std: 0.07272727272727271
Bagging accuracy for Logistic Regression: mean: 0.9636363636363636, std: 0.07272727272727271
First accuracy for LDA: mean: 0.9469696969696969, std: 0.0720294807515437
Bagging accuracy for LDA: mean: 0.9469696969696969, std: 0.0720294807515437
First accuracy for KNN: mean: 0.9469696969696969, std: 0.0720294807515437
Bagging accuracy for KNN: mean: 0.9469696969696969, std: 0.0720294807515437
First accuracy for Decision Tree: mean: 0.9818181818181818, std: 0.036363636363636376
Bagging accuracy for Decision Tree: mean: 0.9818181818181818, std: 0.036363636363636376
First accuracy for GaussianNB: mean: 0.9636363636363636, std: 0.07272727272727271
Bagging accuracy for GaussianNB: mean: 0.9636363636363636, std: 0.07272727272727271
First accuracy for Random

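As we can see, the accuracy scores remained largely unchanged. In the output shown, bagging left the mean accuracy identical for most classifiers and slightly lowered it for SVC (from 0.862 to 0.845); the bagging accuracy only increased for XGBoost. The standard deviations also remained relatively consistent.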

We now employ Scikit-learn's VotingClassifier(), which produces an ensemble of the eight classifiers (weak learners) by taking a 'vote' among them and selecting the class that receives the most votes. That is, given an input, if five of the classifiers output Easy, the Ensemble classifier will choose the classification Easy, because it is selected by a majority of weak learners. We create this strong learner, called Ensemble, and compare it to the performance of the individual weak learners.
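The voting rule itself can be sketched in a few lines. The helper `hard_vote` and the votes below are hypothetical. The tie-break (smallest label wins) is an assumption intended to mirror the ascending-order rule that Scikit-learn documents for hard voting.

```python
from collections import Counter

def hard_vote(predictions):
    """Return the label with the most votes; ties go to the smallest label."""
    counts = Counter(predictions)
    top = max(counts.values())
    return min(label for label, n in counts.items() if n == top)

# Hypothetical votes from eight classifiers for one patient
# (1 = Easy, 2 = Medium, 3 = Hard, as in label_encoder)
votes = [1, 1, 1, 1, 1, 2, 2, 3]
print(hard_vote(votes))  # → 1 (five of eight votes)

# With an even number of voters a tie is possible
print(hard_vote([1, 1, 1, 1, 2, 2, 2, 2]))  # → 1 (tie broken towards the smaller label)
```

Note that with three classes the winning label needs only the most votes, not necessarily more than half of them.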

Note that these weak learners use default hyperparameters, not the optimised hyperparameters seen in the Table1 notebook. This is why the accuracy scores for the individual classification models are not 100%. This suggests that the hyperparameter optimisation used in the Table1 and Table2 notebooks was effective, and hence was a crucial step in optimally training each of the models.

In [11]:
from sklearn.ensemble import VotingClassifier
# hard voting: each classifier casts one vote, and the class with the most votes wins
voting = VotingClassifier(estimators=list(zip(names, classifiers)), voting='hard')
classifiers2 = [SVC(), LogisticRegression(), LinearDiscriminantAnalysis(), KNeighborsClassifier(),
               DecisionTreeClassifier(), GaussianNB(), RandomForestClassifier(), XGBClassifier(), voting]
names2 = names + ['Ensemble']
for name, classifier in zip(names2, classifiers2):
    voting_score = cross_val_score(classifier, X_train, y_train, cv=5)
    print(f'Accuracy for {name}: mean: {voting_score.mean()}, std: {voting_score.std()}')

Accuracy for SVC: mean: 0.861888111888112, std: 0.12449603909768377
Accuracy for Logistic Regression: mean: 0.9636363636363636, std: 0.07272727272727271
Accuracy for LDA: mean: 0.9469696969696969, std: 0.0720294807515437
Accuracy for KNN: mean: 0.9469696969696969, std: 0.0720294807515437
Accuracy for Decision Tree: mean: 0.9484848484848485, std: 0.04215281134316231
Accuracy for GaussianNB: mean: 0.9636363636363636, std: 0.07272727272727271
Accuracy for Random Forest: mean: 0.9818181818181818, std: 0.036363636363636376
Accuracy for XGBoost: mean: 0.9484848484848485, std: 0.04215281134316231
Accuracy for Ensemble: mean: 0.9636363636363636, std: 0.07272727272727271


In the run shown, the Ensemble achieved a mean accuracy of 0.964, level with Logistic Regression and GaussianNB and above SVC, LDA, KNN, Decision Tree and XGBoost, although Random Forest scored higher (0.982) with a lower standard deviation. This suggests that ensemble methods that take a majority vote among multiple weak learners are beneficial in training the models. Random forests use a very similar form of bagging to fit the data, which is one of the reasons why they performed best in both workbooks: bagging is a widely successful ensemble method in machine learning, and is especially useful in the case of smaller datasets.

## Boosting

Boosting, in contrast to bagging, works by iteratively creating multiple models using a weak learner. Each new model learns from the errors of the previous one, so a stronger model is built up at each step. In Table1 and Table2, XGBoost was used as the most popular boosting technique, and we already have its results, so no further boosting analysis is carried out here.
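As a rough illustration of how a later model can learn from an earlier one, the sketch below performs one round of AdaBoost-style sample reweighting. This is a different boosting scheme from XGBoost, which is a gradient boosting method and fits each new tree to the gradient of a loss function rather than to reweighted samples. The function name `reweight` and the five-sample example are assumptions for illustration.

```python
import math

def reweight(weights, correct):
    """One AdaBoost-style round: up-weight the samples the weak learner got wrong.

    Assumes the weighted error is strictly between 0 and 1.
    """
    err = sum(w for w, ok in zip(weights, correct) if not ok)
    alpha = 0.5 * math.log((1 - err) / err)  # the weak learner's say in the final vote
    updated = [w * math.exp(-alpha if ok else alpha) for w, ok in zip(weights, correct)]
    total = sum(updated)
    return [w / total for w in updated], alpha

# Hypothetical round: five equally weighted samples, the learner misclassifies the last one
weights = [0.2] * 5
correct = [True, True, True, True, False]
new_weights, alpha = reweight(weights, correct)
print([round(w, 3) for w in new_weights])  # → [0.125, 0.125, 0.125, 0.125, 0.5]
```

After the update the misclassified sample carries half of the total weight, so the next weak learner is pushed to get it right.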

## Stacking

Stacking seeks to combine multiple classification methods via a meta-classifier, which is trained on the outputs, or meta-features, of the individual classification models in the ensemble. The report sought to compare stacking with the other two ensemble methods, so we do that here. The code uses the itertools library to create every possible combination of three classifiers from the eight used above, and each combination forms the first layer of the stacking process. The outputs of these three classifiers are passed to a meta-classifier (in this case a random forest classifier, as that achieved the best results in Table1 and Table2), which makes the final prediction. The accuracy of each combination is taken as the mean of its cross-validation scores on the training data, and is displayed in the output of the stacking cells.
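Before the notebook's own implementation, the meta-feature idea can be shown with a toy sketch. Everything in it is assumed for illustration: the three first-layer "classifiers" are fixed threshold rules on a single made-up feature, and a lookup table stands in for the random forest meta-classifier.

```python
from collections import Counter, defaultdict

# Three hypothetical first-layer classifiers: fixed rules on a single feature x.
# None of them can output all three labels on its own.
base_learners = [
    lambda x: 1 if x < 100 else 3,
    lambda x: 1 if x < 95 else 2,
    lambda x: 2 if x < 110 else 3,
]

def meta_features(x):
    """The first layer's predicted labels, used as input to the meta-classifier."""
    return tuple(learner(x) for learner in base_learners)

def fit_meta(xs, ys):
    """Toy meta-classifier: remember the most common true label per meta-feature tuple."""
    seen = defaultdict(Counter)
    for x, y in zip(xs, ys):
        seen[meta_features(x)][y] += 1
    return {key: counts.most_common(1)[0][0] for key, counts in seen.items()}

# Made-up training data, not the notebook's
xs = [90, 92, 97, 105, 112, 120]
ys = [1, 1, 2, 2, 3, 3]
meta = fit_meta(xs, ys)

# A new input is classified from the first layer's outputs, not from x directly.
# (A tuple never seen in training would raise KeyError in this toy version.)
print(meta_features(98), meta[meta_features(98)])  # → (1, 2, 2) 2
```

The meta-classifier recovers all three labels here because it learns which patterns of first-layer outputs go with which class.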

In [12]:
from itertools import combinations # importing modules
from mlxtend.classifier import StackingClassifier
def stacking(no_stacked):
    rf = RandomForestClassifier() # the meta-classifier to be applied to the outputs of the first layer
    classifiers = [SVC(), LogisticRegression(), LinearDiscriminantAnalysis(), KNeighborsClassifier(), 
                   DecisionTreeClassifier(), GaussianNB(), RandomForestClassifier(), XGBClassifier()] # list of eight classifiers
    names = ['SVC', 'Logistic Regression', 'LDA', 'KNN', 'Decision Tree', 'GaussianNB', 'Random Forest', 'XGBoost']
    combined = {} # creates a blank dictionary
    for i, p in enumerate(names):
        combined[p] = classifiers[i] # creating a dictionary of name: algorithm pairs
    comb = list(combinations(combined.keys(), no_stacked)) # produces a list containing every possible combination of the eight
    # classification algorithms, with number no_stacked algorithms in the first layer
    for combo in comb: # produces a stacking classifier on each combination of algorithms created in comb
        stacked_model = StackingClassifier(classifiers=[combined[name] for name in combo], meta_classifier=rf)
        # the above line uses a list comprehension to look up the classifier for each name in the combination
        stacking_score = cross_val_score(stacked_model, X_train, y_train, cv=5) # uses cross-validation to produce an accuracy score
        printing_names = list(combo) # lists the names of algorithms used in each combination
        print(f'{printing_names} stacked: mean: {stacking_score.mean()}, std: {stacking_score.std()}') # prints the scores
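
To give a sense of the size of the search, this standalone sketch counts the first-layer combinations that `combinations` produces for two, three and four classifiers out of eight, the sizes tried in this notebook. It only enumerates names and does not train any models.

```python
from itertools import combinations
from math import comb

names = ['SVC', 'Logistic Regression', 'LDA', 'KNN',
         'Decision Tree', 'GaussianNB', 'Random Forest', 'XGBoost']

for size in (2, 3, 4):
    first_layers = list(combinations(names, size))
    # math.comb gives the same count without building the list
    assert len(first_layers) == comb(len(names), size)
    print(size, len(first_layers), first_layers[0])
# → 2 28 ('SVC', 'Logistic Regression')
# → 3 56 ('SVC', 'Logistic Regression', 'LDA')
# → 4 70 ('SVC', 'Logistic Regression', 'LDA', 'KNN')
```

Since each combination is scored with 5-fold cross-validation, a call with three stacked classifiers fits 56 × 5 = 280 stacked models.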

In [13]:
stacking(3)

['SVC', 'Logistic Regression', 'LDA'] stacked: mean: 0.9331002331002332, std: 0.06258035841187393
['SVC', 'Logistic Regression', 'KNN'] stacked: mean: 0.914918414918415, std: 0.07817724500234807
['SVC', 'Logistic Regression', 'Decision Tree'] stacked: mean: 0.9484848484848485, std: 0.06748805288278813
['SVC', 'Logistic Regression', 'GaussianNB'] stacked: mean: 0.914918414918415, std: 0.07817724500234807
['SVC', 'Logistic Regression', 'Random Forest'] stacked: mean: 0.9484848484848485, std: 0.06748805288278813
['SVC', 'Logistic Regression', 'XGBoost'] stacked: mean: 0.9818181818181818, std: 0.036363636363636376
['SVC', 'LDA', 'KNN'] stacked: mean: 0.914918414918415, std: 0.07817724500234807
['SVC', 'LDA', 'Decision Tree'] stacked: mean: 0.9164335664335665, std: 0.052890643218602304
['SVC', 'LDA', 'GaussianNB'] stacked: mean: 0.914918414918415, std: 0.07817724500234807
['SVC', 'LDA', 'Random Forest'] stacked: mean: 0.9469696969696969, std: 0.0720294807515437
['SVC', 'LDA', 'XGBoost'] sta

Stacking, here, increases the model performance even further than bagging. Indeed, the combination of KNN, Random Forest and XGBoost (which itself is a boosting algorithm) achieves 100% accuracy. We also try different numbers of classifiers to be stacked simultaneously:

In [14]:
stacking(2)

['SVC', 'Logistic Regression'] stacked: mean: 0.9331002331002332, std: 0.06258035841187393
['SVC', 'LDA'] stacked: mean: 0.8785547785547786, std: 0.1357804355865562
['SVC', 'KNN'] stacked: mean: 0.8785547785547786, std: 0.1357804355865562
['SVC', 'Decision Tree'] stacked: mean: 0.95, std: 0.06666666666666667
['SVC', 'GaussianNB'] stacked: mean: 0.9164335664335665, std: 0.052890643218602304
['SVC', 'Random Forest'] stacked: mean: 0.9315850815850816, std: 0.06710655726354578
['SVC', 'XGBoost'] stacked: mean: 0.9164335664335663, std: 0.052890643218602304
['Logistic Regression', 'LDA'] stacked: mean: 0.9636363636363636, std: 0.07272727272727271
['Logistic Regression', 'KNN'] stacked: mean: 0.9636363636363636, std: 0.07272727272727271
['Logistic Regression', 'Decision Tree'] stacked: mean: 1.0, std: 0.0
['Logistic Regression', 'GaussianNB'] stacked: mean: 0.9636363636363636, std: 0.07272727272727271
['Logistic Regression', 'Random Forest'] stacked: mean: 0.9469696969696969, std: 0.072029480

In [15]:
stacking(4)

['SVC', 'Logistic Regression', 'LDA', 'KNN'] stacked: mean: 0.914918414918415, std: 0.07817724500234807
['SVC', 'Logistic Regression', 'LDA', 'Decision Tree'] stacked: mean: 0.9484848484848485, std: 0.06748805288278813
['SVC', 'Logistic Regression', 'LDA', 'GaussianNB'] stacked: mean: 0.914918414918415, std: 0.07817724500234807
['SVC', 'Logistic Regression', 'LDA', 'Random Forest'] stacked: mean: 0.9315850815850816, std: 0.06710655726354578
['SVC', 'Logistic Regression', 'LDA', 'XGBoost'] stacked: mean: 0.9303030303030303, std: 0.08549536957373262
['SVC', 'Logistic Regression', 'KNN', 'Decision Tree'] stacked: mean: 0.9469696969696969, std: 0.0720294807515437
['SVC', 'Logistic Regression', 'KNN', 'GaussianNB'] stacked: mean: 0.914918414918415, std: 0.07817724500234807
['SVC', 'Logistic Regression', 'KNN', 'Random Forest'] stacked: mean: 0.9636363636363636, std: 0.07272727272727271
['SVC', 'Logistic Regression', 'KNN', 'XGBoost'] stacked: mean: 0.9636363636363636, std: 0.072727272727272

Stacking two models at a time performs slightly worse overall than stacking three, although the pair of Logistic Regression and Decision Tree reaches 100% accuracy in the output shown. Stacking four exhibits similar performance to stacking three, reaching 100% accuracy with the combination of KNN, Decision Tree, Random Forest, and XGBoost.

As we can see, the best stacking combinations achieve higher mean scores than both the default classifiers and the voting ensemble. We therefore accept stacking as the best ensemble method examined here. Because Table1 already has 100% accuracy, these results will not be integrated into the main models. Nevertheless, this notebook provides a detailed comparison of ensemble methods for improving performance on small datasets.