# Exercise 7: AutoML

AutoML provides methods and processes that make machine learning accessible to people who are not machine learning experts. The University of Freiburg has compiled a website with further information and a selection of AutoML packages, available at the following link: https://www.ml4aad.org/automl/.

###### (a) Choose an AutoML package. Justify your choice.

We chose _H2O AutoML_ because, with a view to the reproducibility of the results, it is the easiest package to install across different platforms.

With _auto-sklearn_, for example, we lacked the permissions needed to install the required dependencies on SDIL.
Other AutoML libraries are provided only as a git repository, so they cannot simply be installed via a `requirements.txt`.

In addition, the documentation for _H2O AutoML_ struck us as very clear at first glance.

###### (b) Carry out the classification task from Exercise 3 with AutoML. Compare the results with those from Exercise 3.

http://docs.h2o.ai/h2o/latest-stable/h2o-docs/automl.html

In [1]:
# Import Libraries
import time
import multiprocessing

import pandas as pd
import numpy as np

from sklearn import svm
from sklearn import model_selection
from sklearn.model_selection import GridSearchCV

import h2o
from h2o.automl import H2OAutoML
from h2o.frame import H2OFrame

from sklearn.metrics import classification_report
from sklearn.metrics import confusion_matrix
from sklearn.metrics import accuracy_score

# Fix for display error of tables
from IPython.display import HTML, display
HTML("<style>.rendered_html table {width:auto !important;}</style>")

In [2]:
# initialize h2o
h2o.init()

Checking whether there is an H2O instance running at http://localhost:54321..... not found.
Attempting to start a local H2O server...
  Java Version: openjdk version "1.8.0_152-release"; OpenJDK Runtime Environment (build 1.8.0_152-release-1056-b12); OpenJDK 64-Bit Server VM (build 25.152-b12, mixed mode)
  Starting server from /Users/d062356/miniconda3/envs/smart-data-analytics/h2o_jar/h2o.jar
  Ice root: /var/folders/g3/8_fmjvcd5t3fqv5m5c3zz20m0000gn/T/tmprqtgkwgs
  JVM stdout: /var/folders/g3/8_fmjvcd5t3fqv5m5c3zz20m0000gn/T/tmprqtgkwgs/h2o_d062356_started_from_python.out
  JVM stderr: /var/folders/g3/8_fmjvcd5t3fqv5m5c3zz20m0000gn/T/tmprqtgkwgs/h2o_d062356_started_from_python.err
  Server is running at http://127.0.0.1:54321
Connecting to H2O server at http://127.0.0.1:54321... successful.


H2O cluster uptime:,02 secs
H2O cluster timezone:,Europe/Berlin
H2O data parsing timezone:,UTC
H2O cluster version:,3.18.0.2
H2O cluster version age:,"1 year, 2 months and 19 days !!!"
H2O cluster name:,H2O_from_python_d062356_5ql9sa
H2O cluster total nodes:,1
H2O cluster free memory:,3.556 Gb
H2O cluster total cores:,8
H2O cluster allowed cores:,8


In [3]:
# Load dataset iris
url = "https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data"
names = ['sepal-length', 'sepal-width', 'petal-length', 'petal-width', 'class']
iris = pd.read_csv(url, names=names)

In [4]:
# Split-out validation dataset
array = iris.values
X = array[:,0:4]
Y = array[:,4]
test_size = 0.20
seed = 7
X_train, X_test, y_train, y_test = model_selection.train_test_split(X, Y, test_size=test_size, random_state=seed)

In [5]:
# prepare data by creating a train and test array and transforming it into an h2o frame
train_array = np.column_stack([X_train, y_train])
test_array = np.column_stack([X_test, y_test])

train = H2OFrame(python_obj=train_array, column_names=names)
test = H2OFrame(python_obj=test_array, column_names=names)

Parse progress: |█████████████████████████████████████████████████████████| 100%
Parse progress: |█████████████████████████████████████████████████████████| 100%


In [6]:
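# Identify predictors and response
y = "class"
# build a new list instead of calling names.remove(y), so that `names` keeps all five columns
x = [name for name in names if name != y]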

In [7]:
# set number of models to 20 and start training
aml = H2OAutoML(max_models=20, seed=1, nfolds=10)
aml.train(x=x, y=y, training_frame=train)                                 

AutoML progress: |████████████████████████████████████████████████████████| 100%
Parse progress: |█████████████████████████████████████████████████████████| 100%


In [8]:
# View the AutoML Leaderboard
lb = aml.leaderboard
display(lb.head(rows=lb.nrows)) # Print all rows instead of default (10 rows)

model_id,mean_per_class_error
GBM_grid_0_AutoML_20190525_162151_model_15,0.011111
DRF_0_AutoML_20190525_162151,0.011111
XRT_0_AutoML_20190525_162151,0.011111
DeepLearning_grid_0_AutoML_20190525_162151_model_0,0.011111
GBM_grid_0_AutoML_20190525_162151_model_8,0.022222
GBM_grid_0_AutoML_20190525_162151_model_11,0.022222
GLM_grid_0_AutoML_20190525_162151_model_0,0.022222
GBM_grid_0_AutoML_20190525_162151_model_9,0.022222
StackedEnsemble_AllModels_0_AutoML_20190525_162151,0.022222
GBM_grid_0_AutoML_20190525_162151_model_2,0.033333




In [9]:
# The leader model is stored here
display(aml.leader)

Model Details
H2OGradientBoostingEstimator :  Gradient Boosting Machine
Model Key:  GBM_grid_0_AutoML_20190525_162151_model_15


ModelMetricsMultinomial: gbm
** Reported on train data. **

MSE: 8.349901175356046e-06
RMSE: 0.002889619555470243
LogLoss: 0.0007618680879659699
Mean Per-Class Error: 0.0
Confusion Matrix: Row labels: Actual class; Column labels: Predicted class



Iris-setosa,Iris-versicolor,Iris-virginica,Error,Rate
29.0,0.0,0.0,0.0,0 / 29
0.0,30.0,0.0,0.0,0 / 30
0.0,0.0,30.0,0.0,0 / 30
29.0,30.0,30.0,0.0,0 / 89


Top-3 Hit Ratios: 


k,hit_ratio
1,1.0
2,1.0
3,1.0



ModelMetricsMultinomial: gbm
** Reported on validation data. **

MSE: 0.029858859578529552
RMSE: 0.17279716310903243
LogLoss: 0.10589490037197341
Mean Per-Class Error: 0.037037037037037035
Confusion Matrix: Row labels: Actual class; Column labels: Predicted class



Iris-setosa,Iris-versicolor,Iris-virginica,Error,Rate
14.0,0.0,0.0,0.0,0 / 14
0.0,8.0,0.0,0.0,0 / 8
0.0,1.0,8.0,0.1111111,1 / 9
14.0,9.0,8.0,0.0322581,1 / 31


Top-3 Hit Ratios: 


k,hit_ratio
1,0.9677419
2,1.0
3,1.0



ModelMetricsMultinomial: gbm
** Reported on cross-validation data. **

MSE: 0.011864144793886748
RMSE: 0.1089226550993261
LogLoss: 0.03798343286985158
Mean Per-Class Error: 0.011111111111111112
Confusion Matrix: Row labels: Actual class; Column labels: Predicted class



Iris-setosa,Iris-versicolor,Iris-virginica,Error,Rate
29.0,0.0,0.0,0.0,0 / 29
0.0,30.0,0.0,0.0,0 / 30
0.0,1.0,29.0,0.0333333,1 / 30
29.0,31.0,29.0,0.0112360,1 / 89


Top-3 Hit Ratios: 


k,hit_ratio
1,0.9887641
2,1.0
3,1.0


Cross-Validation Metrics Summary: 


,mean,sd,cv_1_valid,cv_2_valid,cv_3_valid,cv_4_valid,cv_5_valid,cv_6_valid,cv_7_valid,cv_8_valid,cv_9_valid,cv_10_valid
accuracy,0.9888889,0.0235702,1.0,1.0,1.0,1.0,1.0,1.0,1.0,0.8888889,1.0,1.0
err,0.0111111,0.0235702,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1111111,0.0,0.0
err_count,0.1,0.2121320,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0
logloss,0.0375999,0.0299332,0.0370365,0.0000455,0.0000001,0.0738255,0.0766432,0.0125085,0.0000000,0.1327098,0.0397612,0.0034691
max_per_class_error,0.0166667,0.0353553,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1666667,0.0,0.0
mean_per_class_accuracy,0.9944444,0.0117851,1.0,1.0,1.0,1.0,1.0,1.0,1.0,0.9444444,1.0,1.0
mean_per_class_error,0.0055556,0.0117851,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0555556,0.0,0.0
mse,0.0117332,0.0110742,0.0087176,0.0000000,0.0000000,0.0252963,0.0251847,0.0011720,0.0000000,0.0491780,0.0077064,0.0000768
r2,0.9755917,0.0253430,0.9886109,1.0,1.0,0.9620555,0.9215399,0.9978425,1.0,0.8951733,0.9908204,0.999874


Scoring History: 


,timestamp,duration,number_of_trees,training_rmse,training_logloss,training_classification_error,validation_rmse,validation_logloss,validation_classification_error
,2019-05-25 16:22:49,37.129 sec,0.0,0.6666667,1.0986123,0.6292135,0.6666667,1.0986123,0.7419355
,2019-05-25 16:22:49,37.132 sec,5.0,0.5506816,0.8005047,0.0337079,0.5457821,0.7897308,0.0322581
,2019-05-25 16:22:49,37.135 sec,10.0,0.4380941,0.5758320,0.0,0.4346034,0.5698682,0.0322581
,2019-05-25 16:22:49,37.138 sec,15.0,0.3411544,0.4145133,0.0112360,0.3351579,0.4052222,0.0322581
,2019-05-25 16:22:49,37.141 sec,20.0,0.2762060,0.3176195,0.0,0.2755377,0.3145307,0.0322581
---,---,---,---,---,---,---,---,---,---
,2019-05-25 16:22:49,37.311 sec,240.0,0.0039662,0.0010404,0.0,0.1721981,0.1032757,0.0322581
,2019-05-25 16:22:49,37.315 sec,245.0,0.0037415,0.0009760,0.0,0.1718068,0.1015712,0.0322581
,2019-05-25 16:22:49,37.319 sec,250.0,0.0033777,0.0008805,0.0,0.1725126,0.1046043,0.0322581



See the whole table with table.as_data_frame()
Variable Importances: 


variable,relative_importance,scaled_importance,percentage
petal-length,249.9416351,1.0,0.4871671
petal-width,176.8065033,0.7073912,0.3446177
sepal-length,58.3064194,0.2332801,0.1136464
sepal-width,27.9966202,0.1120126,0.0545689
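
The `mean_per_class_error` reported in the leaderboard and in the leader output can be recomputed by hand from the confusion matrices printed above. The following is a minimal sketch in plain Python, not H2O's implementation: the helper name `mean_per_class_error` is our own, and the matrices are copied from the validation and cross-validation output (rows are actual classes, columns are predicted classes).

```python
def mean_per_class_error(matrix):
    """Average, over all classes, of the share of misclassified rows per actual class."""
    errors = []
    for i, row in enumerate(matrix):
        total = sum(row)
        errors.append((total - row[i]) / total)
    return sum(errors) / len(errors)

# Confusion matrices copied from the leader output above (totals row omitted)
cv_matrix = [[29, 0, 0], [0, 30, 0], [0, 1, 29]]
valid_matrix = [[14, 0, 0], [0, 8, 0], [0, 1, 8]]

cv_error = mean_per_class_error(cv_matrix)
valid_error = mean_per_class_error(valid_matrix)
print(round(cv_error, 6))     # 0.011111
print(round(valid_error, 6))  # 0.037037
```

Both values match the printed metrics. A single misclassified Iris-virginica sample accounts for the whole error in each case, which shows how coarse this metric is on such a small dataset.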




In [10]:
# predict on test data
predictions = aml.leader.predict(test).as_data_frame()['predict'].tolist()

gbm prediction progress: |████████████████████████████████████████████████| 100%


In [11]:
# get target for test data
y_test = test.as_data_frame()['class'].tolist()

In [12]:
# print results
print(accuracy_score(y_test, predictions))
print(confusion_matrix(y_test, predictions))
print(classification_report(y_test, predictions))

0.8666666666666667
[[ 7  0  0]
 [ 0 10  2]
 [ 0  2  9]]
                 precision    recall  f1-score   support

    Iris-setosa       1.00      1.00      1.00         7
Iris-versicolor       0.83      0.83      0.83        12
 Iris-virginica       0.82      0.82      0.82        11

       accuracy                           0.87        30
      macro avg       0.88      0.88      0.88        30
   weighted avg       0.87      0.87      0.87        30



In the run shown above, AutoML ranks the GBM model `GBM_grid_0_AutoML_20190525_162151_model_15` first. It is tied with a DRF, an XRT and a Deep Learning model at a mean per-class error of 0.011. The leader can differ between runs, since Deep Learning models are not reproducible here ("H2O Deep Learning models are not reproducible by default for performance reasons", http://docs.h2o.ai/h2o/latest-stable/h2o-docs/automl.html#automl-interface). On the test data, the leader reaches an accuracy of 0.8667 and is therefore below the most accurate model from Exercise 3 (0.967).
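
The accuracy of 0.8667 and the per-class values in the classification report follow directly from the printed confusion matrix. As a worked check, here is a small sketch in plain Python (the variable names are our own; scikit-learn computes the same quantities internally):

```python
# Confusion matrix copied from the printed test results
# (rows: actual class, columns: predicted class)
labels = ["Iris-setosa", "Iris-versicolor", "Iris-virginica"]
matrix = [[7, 0, 0], [0, 10, 2], [0, 2, 9]]

n_total = sum(sum(row) for row in matrix)
n_correct = sum(matrix[i][i] for i in range(len(labels)))
accuracy = n_correct / n_total

precision = {}
recall = {}
for i, label in enumerate(labels):
    predicted_as_label = sum(row[i] for row in matrix)  # column sum
    actually_label = sum(matrix[i])                     # row sum
    precision[label] = matrix[i][i] / predicted_as_label
    recall[label] = matrix[i][i] / actually_label

print(f"accuracy: {accuracy:.4f}")
for label in labels:
    print(f"{label}: precision={precision[label]:.2f}, recall={recall[label]:.2f}")
```

All four errors are confusions between Iris-versicolor and Iris-virginica, while Iris-setosa is classified perfectly.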

# Second dataset

In [13]:
# Load dataset
url_heart = "./data/heart.csv"
names_heart =  ['age', 'sex', 'chest_pain_type', 'resting_blood_pressure', 'cholesterol', 
                 'fasting_blood_sugar', 'rest_ecg', 'max_heart_rate_achieved', 'exercise_induced_angina',
                 'st_depression', 'st_slope', 'num_major_vessels', 'thalassemia', 'target']
heart = pd.read_csv(url_heart, names=names_heart)
heart.head()

Unnamed: 0,age,sex,chest_pain_type,resting_blood_pressure,cholesterol,fasting_blood_sugar,rest_ecg,max_heart_rate_achieved,exercise_induced_angina,st_depression,st_slope,num_major_vessels,thalassemia,target
0,age,sex,cp,trestbps,chol,fbs,restecg,thalach,exang,oldpeak,slope,ca,thal,target
1,63,1,3,145,233,1,0,150,0,2.3,0,0,1,1
2,37,1,2,130,250,0,1,187,0,3.5,0,0,2,1
3,41,0,1,130,204,0,0,172,0,1.4,2,0,2,1
4,56,1,1,120,236,0,1,178,0,0.8,2,0,2,1


In [14]:
# drop first row: it holds the original CSV header, which was read as data because `names` was passed
heart = heart.drop(heart.index[0])
heart.head()

Unnamed: 0,age,sex,chest_pain_type,resting_blood_pressure,cholesterol,fasting_blood_sugar,rest_ecg,max_heart_rate_achieved,exercise_induced_angina,st_depression,st_slope,num_major_vessels,thalassemia,target
1,63,1,3,145,233,1,0,150,0,2.3,0,0,1,1
2,37,1,2,130,250,0,1,187,0,3.5,0,0,2,1
3,41,0,1,130,204,0,0,172,0,1.4,2,0,2,1
4,56,1,1,120,236,0,1,178,0,0.8,2,0,2,1
5,57,0,0,120,354,0,1,163,1,0.6,2,0,2,1


In [15]:
# Split-out validation dataset
array_heart = heart.values
X_heart = array_heart[:,0:13]
Y_heart = array_heart[:,13]
test_size_heart = 0.20
seed_heart = 10
X_heart_train, X_heart_test, y_heart_train, y_heart_test = model_selection.train_test_split(X_heart, Y_heart, test_size=test_size_heart, random_state=seed_heart)

In [16]:
# prepare data by creating a train and test array and transforming it into an h2o frame
train_heart_array = np.column_stack([X_heart_train, y_heart_train])
test_heart_array = np.column_stack([X_heart_test, y_heart_test])

train_heart = H2OFrame(python_obj=train_heart_array, column_names=names_heart)
test_heart = H2OFrame(python_obj=test_heart_array, column_names=names_heart)

Parse progress: |█████████████████████████████████████████████████████████| 100%
Parse progress: |█████████████████████████████████████████████████████████| 100%


In [17]:
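# Identify predictors and response
y_heart = "target"
# build a new list instead of calling names_heart.remove(y_heart), so that `names_heart` stays complete
x_heart = [name for name in names_heart if name != y_heart]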

In [18]:
# set number of models to 20 and start training
aml_heart = H2OAutoML(max_models=20, seed=1, nfolds=10)
aml_heart.train(x=x_heart, y=y_heart, training_frame=train_heart)                                 

AutoML progress: |████████████████████████████████████████████████████████| 100%
Parse progress: |█████████████████████████████████████████████████████████| 100%


In [19]:
# View the AutoML Leaderboard
lb_heart = aml_heart.leaderboard
display(lb_heart.head(rows=lb_heart.nrows)) # Print all rows instead of default (10 rows)

model_id,mean_residual_deviance,rmse,mae,rmsle
StackedEnsemble_AllModels_0_AutoML_20190525_162343,0.111374,0.333728,0.246481,0.234103
StackedEnsemble_BestOfFamily_0_AutoML_20190525_162343,0.112304,0.335118,0.248627,0.234248
DeepLearning_grid_0_AutoML_20190525_162343_model_0,0.121459,0.34851,0.261562,0.251839
XRT_0_AutoML_20190525_162343,0.123401,0.351284,0.253637,0.245572
GLM_grid_0_AutoML_20190525_162343_model_0,0.12545,0.35419,0.284097,0.24988
DeepLearning_grid_0_AutoML_20190525_162343_model_5,0.130154,0.360768,0.274528,0.257425
GBM_grid_0_AutoML_20190525_162343_model_5,0.130494,0.36124,0.29209,0.253636
DeepLearning_grid_0_AutoML_20190525_162343_model_2,0.131651,0.362837,0.286262,0.256
DRF_0_AutoML_20190525_162343,0.131691,0.362893,0.249952,0.251953
GBM_grid_0_AutoML_20190525_162343_model_2,0.133861,0.365871,0.270722,0.254411




In [20]:
# The leader model is stored here
display(aml_heart.leader)

Model Details
H2OStackedEnsembleEstimator :  Stacked Ensemble
Model Key:  StackedEnsemble_AllModels_0_AutoML_20190525_162343
No model summary for this model


ModelMetricsRegressionGLM: stackedensemble
** Reported on train data. **

MSE: 0.003931911832168075
RMSE: 0.06270495859314536
MAE: 0.04907276895578059
RMSLE: 0.0446905019968733
R^2: 0.9840227074756345
Mean Residual Deviance: 0.003931911832168075
Null degrees of freedom: 191
Residual degrees of freedom: 185
Null deviance: 47.25
Residual deviance: 0.7549270717762703
AIC: -502.54446726863966

ModelMetricsRegressionGLM: stackedensemble
** Reported on validation data. **

MSE: 0.12618346556015075
RMSE: 0.35522312081303314
MAE: 0.2586696685420625
RMSLE: 0.23477350712671397
R^2: 0.46441652988051463
Mean Residual Deviance: 0.12618346556015075
Null degrees of freedom: 49
Residual degrees of freedom: 43
Null deviance: 11.9453125
Residual deviance: 6.309173278007538
AIC: 54.392935560116854

ModelMetricsRegressionGLM: stackedensemble
** Repo



In [21]:
# predict on test data
predictions_heart = aml_heart.leader.predict(test_heart).as_data_frame()['predict'].tolist()

stackedensemble prediction progress: |████████████████████████████████████| 100%


In [22]:
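# get target for test data
y_heart_test = test_heart.as_data_frame()['target'].tolist()

# 'target' was parsed as a numeric column, so AutoML trained regression models
# (see the rmse/mae leaderboard above) and the predictions are continuous.
# Converting the response with asfactor() before training would request classification instead.
# predictions are continuous --> set threshold for final prediction to 0.5
predictions_heart = pd.DataFrame(predictions_heart)
mask = predictions_heart > 0.5
predictions_heart_binary = mask.astype(int)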

In [23]:
# for comparability with ex 3
def sensitivity_score(y_true, y_pred):
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    sensitivity = tp / (tp + fn)
    return sensitivity

def specificity_score(y_true, y_pred):
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    specificity = tn / (tn + fp)
    return specificity

In [24]:
# print results
print('accuracy: '+ str(accuracy_score(y_heart_test, predictions_heart_binary)))
print('sensitivity:' + str(sensitivity_score(y_heart_test, predictions_heart_binary)))
print('specificity:' + str(specificity_score(y_heart_test, predictions_heart_binary)))

#print(confusion_matrix(y_heart_test, predictions_heart_binary))
#print(classification_report(y_heart_test, predictions_heart_binary))

accuracy: 0.8032786885245902
sensitivity:0.8461538461538461
specificity:0.7714285714285715


Here the accuracy of approximately 0.8033 is identical to the one achieved in Exercise 3. The sensitivity of 0.8462 differs only slightly from the one achieved in Exercise 3 (0.8077). The specificity of 0.7714 likewise differs only slightly from the one achieved in Exercise 3 (0.8). Without backing this up with a statistical test, we consider the results comparable to those of Exercise 3.
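
The three figures can be cross-checked against each other. The notebook does not print the confusion matrix (the call is commented out), so the counts below are an assumption: they are inferred from the printed ratios and reproduce all three of them.

```python
# Counts inferred from the printed ratios; the notebook does not print them (assumption)
tp, fn = 22, 4   # actual positives: 26
tn, fp = 27, 8   # actual negatives: 35

accuracy = (tp + tn) / (tp + tn + fp + fn)
sensitivity = tp / (tp + fn)   # true positive rate
specificity = tn / (tn + fp)   # true negative rate

print(accuracy)
print(sensitivity)
print(specificity)
```

If Exercise 3 used a test split with the same class counts (again an assumption), its values of 0.8077 and 0.8 correspond to 21/26 and 28/35. The two models would then differ by one true positive and one true negative, which supports treating the results as comparable.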

###### (c) What is your opinion of AutoML?

- AutoML takes a lot of work off your hands, which is helpful for beginners in particular. However, you also lose some control as a result. The results appear to be worse than in Exercise 3, at least for the Iris dataset, yet it is hard to understand why, precisely because everything happens automatically. It is also unclear why particular models were tested.

- "AutoML can only guarantee reproducibility under certain conditions. H2O Deep Learning models are not reproducible by default for performance reasons" (http://docs.h2o.ai/h2o/latest-stable/h2o-docs/automl.html#automl-interface). This limited reproducibility is a problem.

- Training takes a relatively long time, which makes it somewhat disappointing that no results comparable to Exercise 3 could be achieved for the Iris dataset.

- "goal of finding the “best” model without any prior knowledge" (https://medium.com/analytics-vidhya/gentle-introduction-to-automl-from-h2o-ai-a42b393b4ba2). We cannot share this experience, at least not for the Iris dataset. With little expertise you can reach an acceptable result, but our results were not in the very good range.

- "The current version of AutoML trains and cross-validates the following algorithms (in the following order): three pre-specified XGBoost GBM (Gradient Boosting Machine) models, a fixed grid of GLMs, a default Random Forest (DRF), five pre-specified H2O GBMs, a near-default Deep Neural Net, an Extremely Randomized Forest (XRT), a random grid of XGBoost GBMs, a random grid of H2O GBMs, and a random grid of Deep Neural Nets. In some cases, there will not be enough time to complete all the algorithms, so some may be missing from the leaderboard. AutoML then trains two Stacked Ensemble models." (http://docs.h2o.ai/h2o/latest-stable/h2o-docs/automl.html). In our opinion, an even wider variety of models should be tried.