# Machine Learning Lab - Hackathon

**Summer Term 2021**

- Julian Stier <julian.stier@uni-passau.de>
- Sahib Julka <sahib.julka@uni-passau.de>
- [StudIP Machine Learning Lab](https://studip.uni-passau.de/studip/dispatch.php/course/scm?cid=42befdd6822ee2029b26fa475cd02f60)
- [FimGIT repositories](https://fimgit.fim.uni-passau.de/groups/padas/21ss-mllab/)

**General Remarks**
- You have time from 09:00 AM until 03:00 PM to work on the hackathon task.
- Go through the notebook, answer questions, solve described tasks and fill out empty spaces or add cells based on your creativity.
- Re-use previous implementations (of your own!) by either importing according python modules or copying it into the notebook.
- Your overall git repository acts as the official submission. Put the hackathon notebook also into the git repository, alongside with any previous notebooks or python implementations you already uploaded.
- If one of your implementation required for this notebook has not been working previously, you can now work on that specifically and try to solve it within the given time frame.

In [405]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

In [406]:
import os
import sys
nb_dir = os.path.split(os.getcwd())[0]
if nb_dir not in sys.path:
    sys.path.append(nb_dir)

# Step I: Prepare Your Data

- Download the two datasets.
- Read it into memory.
- Understand the feature shape and number of targets.
- Split both datasets into three fixed train-validation-test sets with own chosen proportions. You can e.g. use 80% of the data for training, 10% of the data for the validation set and 10% for the test set. Make sure you shuffle the data in before once.

### UCI Dataset: Abalone
> https://archive.ics.uci.edu/ml/machine-learning-databases/abalone/

In [407]:
!wget -P ./data/abalone/ https://archive.ics.uci.edu/ml/machine-learning-databases/abalone/abalone.data
!wget -P ./data/abalone/ https://archive.ics.uci.edu/ml/machine-learning-databases/abalone/abalone.names

--2021-07-20 14:44:05--  https://archive.ics.uci.edu/ml/machine-learning-databases/abalone/abalone.data
Resolving archive.ics.uci.edu (archive.ics.uci.edu)... 128.195.10.252
Connecting to archive.ics.uci.edu (archive.ics.uci.edu)|128.195.10.252|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 191873 (187K) [application/x-httpd-php]
Saving to: ‘./data/abalone/abalone.data.4’


2021-07-20 14:44:06 (354 KB/s) - ‘./data/abalone/abalone.data.4’ saved [191873/191873]

--2021-07-20 14:44:06--  https://archive.ics.uci.edu/ml/machine-learning-databases/abalone/abalone.names
Resolving archive.ics.uci.edu (archive.ics.uci.edu)... 128.195.10.252
Connecting to archive.ics.uci.edu (archive.ics.uci.edu)|128.195.10.252|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 4319 (4.2K) [application/x-httpd-php]
Saving to: ‘./data/abalone/abalone.names.4’


2021-07-20 14:44:07 (65.4 MB/s) - ‘./data/abalone/abalone.names.4’ saved [4319/4319]



In [408]:
!cat ./data/abalone/abalone.names

1. Title of Database: Abalone data

2. Sources:

   (a) Original owners of database:
	Marine Resources Division
	Marine Research Laboratories - Taroona
	Department of Primary Industry and Fisheries, Tasmania
	GPO Box 619F, Hobart, Tasmania 7001, Australia
	(contact: Warwick Nash +61 02 277277, wnash@dpi.tas.gov.au)

   (b) Donor of database:
	Sam Waugh (Sam.Waugh@cs.utas.edu.au)
	Department of Computer Science, University of Tasmania
	GPO Box 252C, Hobart, Tasmania 7001, Australia

   (c) Date received: December 1995


3. Past Usage:

   Sam Waugh (1995) "Extending and benchmarking Cascade-Correlation", PhD
   thesis, Computer Science Department, University of Tasmania.

   -- Test set performance (final 1044 examples, first 3133 used for training):
	24.86% Cascade-Correlation (no hidden nodes)
	26.25% Cascade-Correlation (5 hidden nodes)
	21.5%  C4.5
	 0.0%  Linear Discriminate Analysis
	 3.57% k=5 Nearest Neighbour
      (Problem encoded as a classificat

In [409]:
col_names = ["Sex", "Length", "Diameter", "Height", "Whole weight", "Shucked weight", "Viscera weight", "Shell weight", "Rings"]
df = pd.read_csv("./data/abalone/abalone.data", header=None, names=col_names)
df.head()

Unnamed: 0,Sex,Length,Diameter,Height,Whole weight,Shucked weight,Viscera weight,Shell weight,Rings
0,M,0.455,0.365,0.095,0.514,0.2245,0.101,0.15,15
1,M,0.35,0.265,0.09,0.2255,0.0995,0.0485,0.07,7
2,F,0.53,0.42,0.135,0.677,0.2565,0.1415,0.21,9
3,M,0.44,0.365,0.125,0.516,0.2155,0.114,0.155,10
4,I,0.33,0.255,0.08,0.205,0.0895,0.0395,0.055,7


In [410]:
df.describe()

Unnamed: 0,Length,Diameter,Height,Whole weight,Shucked weight,Viscera weight,Shell weight,Rings
count,4177.0,4177.0,4177.0,4177.0,4177.0,4177.0,4177.0,4177.0
mean,0.523992,0.407881,0.139516,0.828742,0.359367,0.180594,0.238831,9.933684
std,0.120093,0.09924,0.041827,0.490389,0.221963,0.109614,0.139203,3.224169
min,0.075,0.055,0.0,0.002,0.001,0.0005,0.0015,1.0
25%,0.45,0.35,0.115,0.4415,0.186,0.0935,0.13,8.0
50%,0.545,0.425,0.14,0.7995,0.336,0.171,0.234,9.0
75%,0.615,0.48,0.165,1.153,0.502,0.253,0.329,11.0
max,0.815,0.65,1.13,2.8255,1.488,0.76,1.005,29.0


### Fashion-MNIST

In [411]:
!wget -P ./data/fashion/raw/ https://github.com/zalandoresearch/fashion-mnist/raw/master/data/fashion/train-images-idx3-ubyte.gz
!wget -P ./data/fashion/raw/ https://github.com/zalandoresearch/fashion-mnist/raw/master/data/fashion/train-labels-idx1-ubyte.gz

--2021-07-20 14:44:07--  https://github.com/zalandoresearch/fashion-mnist/raw/master/data/fashion/train-images-idx3-ubyte.gz
Resolving github.com (github.com)... 140.82.121.3
Connecting to github.com (github.com)|140.82.121.3|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://raw.githubusercontent.com/zalandoresearch/fashion-mnist/master/data/fashion/train-images-idx3-ubyte.gz [following]
--2021-07-20 14:44:08--  https://raw.githubusercontent.com/zalandoresearch/fashion-mnist/master/data/fashion/train-images-idx3-ubyte.gz
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.111.133, 185.199.109.133, 185.199.108.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.111.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 26421880 (25M) [application/octet-stream]
Saving to: ‘./data/fashion/raw/train-images-idx3-ubyte.gz.4’


2021-07-20 14:44:11 (8.72 MB/s) - ‘./data/fashion/r

In [412]:
uci_test_size = 800
label_col = 8

uci_data = df.values.copy()
np.random.shuffle(uci_data)
uci_features = uci_data[:, :label_col]
uci_labels = uci_data[:, label_col]

uci_features_train = uci_features[2*uci_test_size:]
uci_labels_train = uci_labels[2*uci_test_size:]

uci_features_valid = uci_features[:uci_test_size]
uci_labels_valid = uci_labels[:uci_test_size]

uci_features_test = uci_features[uci_test_size:2*uci_test_size]
uci_labels_test = uci_labels[uci_test_size:2*uci_test_size]

In [413]:
import urllib

reader_lib_url = 'https://raw.githubusercontent.com/zalandoresearch/fashion-mnist/master/utils/mnist_reader.py'
exec(urllib.request.urlopen(reader_lib_url).read(), globals())

In [414]:
fmnist_test_size = 2000

fmnist_features, fmnist_labels = load_mnist('data/fashion/raw', kind='train')
    
fmnist_features_train = fmnist_features[2*fmnist_test_size:]
fmnist_labels_train = fmnist_labels[2*fmnist_test_size:]

fmnist_features_valid = fmnist_features[:fmnist_test_size]
fmnist_labels_valid = fmnist_labels[:fmnist_test_size]

fmnist_features_test = fmnist_features[fmnist_test_size:2*fmnist_test_size]
fmnist_labels_test = fmnist_labels[fmnist_test_size:2*fmnist_test_size]

# Step II: Choose a Baseline Classifier

* Choose a baseline classifier - except the neural network classifier - you have been working with over the semester and let it learn based on the **small** dataset
* Provide some error measure or indicator whether your classifier learned, e.g. loss over multiple steps or the number of correctly classified samples on the training set or similar

```python
model_baseline = YourAlgorithm()
model_baseline.learn(uci_features_train, uci_labels_train)
```

Here, I choose Logistic Regression as my Baseline Classifier 
the loss decrease, but does not get a good result of f1 score

In [415]:
import numpy as np

from interfaces.base_model import BaseModel

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    exps = np.exp(z)
    return exps / np.sum(exps, axis=1, keepdims=True)


def logistic_regression_eval(X, w):
    return softmax(X@w)


def logistic_regression_loss(y, y_hat):
    m = y.shape[0]
    loss = np.sum(-y*np.log(y_hat), axis=1)
    return 1./m * np.sum(loss)

def logistic_regression_train(X, Y, learning_rate=0.1, iteration_count=1000, batch_size=None):
    m, n = X.shape
    _, c = Y.shape
    w = np.zeros((n, c))

    for i in range(iteration_count):

        X_chosen, Y_chosen = X, Y
        if batch_size != None:
            choices = np.random.choice(m, size=batch_size, replace=False)
            X_chosen, Y_chosen = X[choices, :], Y[choices, :]

        Y_hat = logistic_regression_eval(X_chosen, w)
        gradient = X_chosen.T @ (Y_hat - Y_chosen)
        w -= 1.0/(batch_size or m) * learning_rate * gradient
        # print(logistic_regression_loss(Y_chosen, Y_hat))
    return w


def logistic_regression_predict(X, w):
    Y_raw = logistic_regression_eval(X, w)
    Y_label = np.argmax(Y_raw, axis=1)
    Y_hat = (np.arange(Y_raw.shape[1]).reshape(1, -1) == Y_label.reshape(-1, 1)) * 1
    return Y_hat


class LogisticRegression (BaseModel):

    def learn(self, X, Y, learning_rate=0.1, iteration_count=1000, batch_size=None):
        self.w = logistic_regression_train(X, Y, learning_rate=learning_rate, iteration_count=iteration_count, batch_size=batch_size)

    def infer(self, X):
        return logistic_regression_predict(X, self.w)


In [416]:
baseline_model = LogisticRegression()

In [417]:
def features_to_vec(raw_X):
    X = raw_X.copy()
    column0_classes = np.unique(uci_features[:, 0])
    X0_indicator = column0_classes.reshape(1, -1) == X[:, 0].reshape(-1, 1)
    X[:, 0] = X0_indicator @ np.arange(column0_classes.shape[0])
    return X.astype(float)

def label_to_vec(raw_Y):
    Y = raw_Y.copy()
    Y_classes = np.unique(uci_labels)
    Y_indicator = (Y_classes.reshape(1, -1) == Y.reshape(-1, 1)) * 1
    return Y_indicator

def z_score_normalizer(of_X): # m*n
    mean = np.mean(of_X, axis=0)
    scale_range = np.std(of_X, axis=0)
    return lambda X: (X-mean) / scale_range

In [418]:
X_uci_train_raw = features_to_vec(uci_features_train)
Y_uci_train = label_to_vec(uci_labels_train)

normalizer =  z_score_normalizer(X_uci_train_raw)
X_uci_train_normalized = normalizer(X_uci_train_raw)
X_uci_train = np.insert(X_uci_train_normalized, 0, 1, axis=1)

In [419]:
baseline_model.learn(X_uci_train, Y_uci_train, learning_rate=3, iteration_count=1000)

In [420]:
from sklearn.metrics import f1_score

Y_uci_train_hat = baseline_model.infer(X_uci_train)
f1_score(Y_uci_train, Y_uci_train_hat, average=None, zero_division=1)

array([0.        , 1.        , 0.        , 0.51282051, 0.1038961 ,
       0.43361345, 0.01568627, 0.33948339, 0.29757785, 0.34173669,
       0.        , 0.        , 0.09210526, 0.06122449, 0.04      ,
       0.25531915, 0.09302326, 0.        , 0.        , 0.1       ,
       0.23529412, 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 1.        ])

# Step III: Provide Evaluation Metrics for the Classifier Interface

* Given the class interface for machine learning models, use the predicted target from the result of an model.infer()-invocation to calculate precision, recall and f1-score given the actual test-set targets.
* Do not use scikit-learn or similar libraries; but you can orientate on such interfaces or implementations.
* Note, that a model can return two or multiple classes based on the problem it learned.

```python
baseline_predicted = model_baseline.infer(features_test)
```

> https://en.wikipedia.org/wiki/Precision_and_recall

$precision = \frac{\text{true positives}}{\text{true positives} + \text{false positives}}$

$recall = \frac{\text{true positives}}{\text{true positives} + \text{false negatives}}$

### Testing your implementation
You can use below vectors as a reference for testing the output of infer() and the target vector of a 10-class-classifier. The *f1_score* method of scikit learn gives you a reference on how the values need to look like. Using the function is of course not a valid solution.

In [421]:
X_uci_valid_raw = features_to_vec(uci_features_valid)
X_uci_valid_normalized = normalizer(X_uci_valid_raw)
X_uci_valid = np.insert(X_uci_valid_normalized, 0, 1, axis=1)

Y_uci_valid = label_to_vec(uci_labels_valid)

In [422]:
Y_uci_valid_predicted = baseline_model.infer(X_uci_valid)

In [436]:
from sklearn.metrics import f1_score

# from sklearn.metrics import f1_score
f1_score(Y_uci_valid, Y_uci_valid_predicted, average=None, zero_division=1)

array([1.        , 1.        , 0.        , 0.27586207, 0.12903226,
       0.34146341, 0.02816901, 0.35471698, 0.31386861, 0.31944444,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 1.        , 0.        , 1.        , 1.        ,
       1.        , 0.        , 1.        ])

# Step IV: Experiment (1) Hyperparameter Choice of Baseline Classifier

* Use one fixed train-validation-test split.
* Choose a hyperparameter of your baseline classifier.
* Conduct a grid search to find the best suitable value for it. Let the classifier learn on the training set and use an evaluation metric on the validation set (not the test set!) to find out which hyperparameter value works best for your classifier on the data.

```python
possible_hp_values = np.arange(1, 10, 0.1)
best_hp_value = None
best_f1_score = -np.infty
for hp_value in possible_hp_values:
    # 1. create a baseline classifier object with hp_value specified
    current_model = YourAlgorithm(hyperparam=hp_value)
    
    # 2. learn the classifier on the training set
    current_model.learn(uci_features_train, uci_labels_train)
    
    # 3. evaluate the model on the validation set
    prediction = current_model.infer(uci_features_valid)
    
    f1_score = compute_f1(uci_labels_valid, prediction)
    if f1_score > best_f1_score:
        best_f1_score = f1_score
        best_hp_value = hp_value

print("Found hyperparameter value", best_hp_value)
print("Best f1-score on validation set", best_f1_score)

test_model = YourAlgorithm(hyperparam=best_hp_value)
test_model.learn(uci_features_train, uci_labels_train)
prediction = test_model.infer(uci_features_test)
test_f1_score = compute_f1(uci_labels_test, prediction)
print("F1-Score on test set", test_f1_score)
```

# Step V: Use a Neural Network Classifier

- Let a neural network learn on the training set and report its evaluation metric on the **validation** set.

In [449]:
import numpy as np

def ReLU(z):
    return max(0, z)

def ReLU_gradient(z):
    return 1 if z > 0 else 0

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def sigmoid_gradient(z):
    return sigmoid(z) * (1-sigmoid(z))

def softmax(z):
    exps = np.exp(z)
    return exps / np.sum(exps, axis=1, keepdims=True)

def softmax_gradient(z):
    return softmax(z)*(1-softmax(z))

def tanh_gradient(z):
    return 1 - np.tanh(z)**2


def neural_network_loss(Y, Y_hat):
    m = Y.shape[0]
    err = -np.sum(Y*np.log(Y_hat))
    J = 1./m * err
    return J


def neural_network_forward_propagation(X, Ws, bs, activation_functions):
    no_layers = len(Ws)
    Zs, As = [None]*no_layers, [None]*no_layers
    Zs[0] = X@Ws[0] + bs[0]
    As[0] = activation_functions[0](Zs[0])

    for i in range(1, no_layers):
        Zs[i] = As[i-1]@Ws[i] + bs[i]
        As[i] = activation_functions[i](Zs[i])
    return Zs, As


def neural_network_backward_propagation(X, Y, Ws, bs, Zs, As, activation_gradient_functions):
    no_layers = len(activation_gradient_functions)
    n, m = X.shape
    dZs, dWs, dbs = [None]*no_layers, [None]*no_layers, [None]*no_layers

    current_layer = no_layers - 1
    dZs[current_layer] = As[current_layer] - Y
    dWs[current_layer] = 1./m * As[current_layer - 1].T @ dZs[current_layer]
    dbs[current_layer] = 1./m * np.sum(dZs[current_layer], axis=0)

    for i in range(1, no_layers - 1):
        current_layer = no_layers - 1 - i
        dZs[current_layer] = dZs[current_layer + 1] @ Ws[current_layer + 1].T \
                                * activation_gradient_functions[current_layer](Zs[current_layer])
        dWs[current_layer] = 1./m * As[current_layer - 1].T @ dZs[current_layer]
        dbs[current_layer] = 1./m * np.sum(dZs[current_layer], axis=0)

    dZs[0] = dZs[1] @ Ws[1].T * activation_gradient_functions[0](Zs[0])
    dWs[0] = 1./m * X.T @ dZs[0]
    dbs[0] = 1./m * np.sum(dZs[0], axis=0)

    return dZs, dWs, dbs


def neural_network_train(X, Y, layers, batch_size=None, iteration_count=1000, learning_rate=0.1):
    m, n = X.shape
    no_layers = len(layers)
    activation_functions = [l[1] for l in layers]
    activation_gradient_functions = [l[2] for l in layers]

    # initialize the parameter
    Ws, bs = [None]*no_layers, [None]*no_layers
    no_hidden_units, _, _ = layers[0]
    Ws[0] = np.random.randn(n, no_hidden_units)
    bs[0] = np.zeros((1, no_hidden_units))
    for i in range(1, no_layers):
        no_hidden_units, _, _ = layers[i]
        no_prev_hidden_units, _, _ = layers[i-1]
        Ws[i] = np.random.randn(no_prev_hidden_units, no_hidden_units)
        bs[i] = np.zeros((1, no_hidden_units))

    # gradient descent
    for i in range(iteration_count):

        X_chosen, Y_chosen = X, Y
        if batch_size != None:
            choices = np.random.choice(m, size=batch_size, replace=False)
            X_chosen, Y_chosen = X[choices, :], Y[choices, :]

        Zs, As = neural_network_forward_propagation(X_chosen, Ws, bs, activation_functions)
        dZs, dWs, dbs = neural_network_backward_propagation(X_chosen, Y_chosen, Ws, bs, Zs, As, activation_gradient_functions)

        for i in range(len(Ws)):
            Ws[i] -= learning_rate * dWs[i]
            bs[i] -= learning_rate * dbs[i]
        # print('loss', neural_network_loss(Y_chosen, As[no_layers - 1]))
    return Ws, bs


def neural_network_predict(X, Ws, bs, layers):
    activation_functions = [l[1] for l in layers]
    _, As = neural_network_forward_propagation(X, Ws, bs, activation_functions)
    return (As[-1] > 0.5) * 1


class NeuralNetwork (BaseModel):
    def learn(self, X, Y, layers, learning_rate=0.1, iteration_count=1000, batch_size=None):
        self.layers = layers
        Ws, bs = neural_network_train(X, Y, layers, learning_rate=learning_rate, iteration_count=iteration_count, batch_size=batch_size)
        self.Ws = Ws
        self.bs = bs

    def infer(self, X):
        return neural_network_predict(X, self.Ws, self.bs, self.layers)


### NN for uci dataset

the loss decrease, but does not get a good result in f1 score

In [450]:
nn_model = NeuralNetwork()

layers_uci = [(4, np.tanh, tanh_gradient), (28, softmax, softmax_gradient)]
np.random.seed(0)

nn_model.learn(X_uci_train, Y_uci_train, layers_uci, learning_rate=0.01, iteration_count=1000)
Y_uci_valid_hat = nn_model.infer(X_uci_valid)

In [451]:
f1_score(Y_uci_valid, Y_uci_valid_hat, average=None, zero_division=1)

array([1., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 1., 0., 1., 1., 1., 1., 1.])

### NN for fmnist dataset

In [442]:
def lable_to_vec_fmnist(raw_Y):
    Y = raw_Y.copy()
    Y_classes = np.unique(fmnist_labels)
    Y_indicator = (Y_classes.reshape(1, -1) == Y.reshape(-1, 1)) * 1
    return Y_indicator

normalizer =  z_score_normalizer(fmnist_features_train)
X_fmnist_train = normalizer(fmnist_features_train)
X_fmnist_test = normalizer(fmnist_features_test)
X_fmnist_valid = normalizer(fmnist_features_valid)

Y_fmnist_train = lable_to_vec_fmnist(fmnist_labels_train)
Y_fmnist_test = lable_to_vec_fmnist(fmnist_labels_test)
Y_fmnist_valid = lable_to_vec_fmnist(fmnist_labels_valid)

In [443]:
layers = [(10, np.tanh, tanh_gradient), (10, softmax, softmax_gradient)]
np.random.seed(0)

Ws_fmnist, bs_fmnist = neural_network_train(X_fmnist_train, Y_fmnist_train, layers, learning_rate=0.03, iteration_count=1000)

KeyboardInterrupt: 

In [None]:
Y_fmnist_valid_hat = neural_network_predict(X_fmnist_valid, Ws_fmnist, bs_fmnist, layers)
f1_score(Y_fmnist_valid, Y_fmnist_valid_hat, average=None, zero_division=1)

# Step VI: Experiment (2) Hyperparameter Choice of Neural Net

- choose a hyperparameter of your neural net, e.g. the number of neurons in the first hidden layer or the learning rate for SGD
- (iteratively) create models for each hyperparameter setting, e.g. the number of neurons h=10,20,30,40,50,60,70,80,90,100
- train the neural net on the train set and evaluate it over your validation set
- keep all models with each hyperparameter setting and determine which is the best performing model on the validation set
- evaluate them also on the test set. is the best model on the validation set with its hyperparameter also the best model on the test set?

# Step VII: Experiment (3) Neural Net Stability on Shuffled Data

- over multiple runs $r \geq 5, r\in\mathbb{N}$ randomly shuffle your training data
- split it into a train-test, e.g. 90% of the data is for training, 10% for testing
- what are the mean and standard deviation of your models over multiple runs?
- plot a boxplot with matplotlib/seaborn of the stability of your model

In [441]:
import numpy as np


def get_k_folds(n, k, random=True):
    fold_size = n//k
    m = fold_size*k

    indices = np.random.permutation(m) if random else np.arange(m)
    indices_splits = indices.reshape(k, -1)

    folds = []
    fold_indices = np.arange(k)
    for i in range(k):
        train_indices = indices_splits[fold_indices != i].flatten()
        test_indices = indices_splits[fold_indices == i].flatten()
        folds.append((train_indices, test_indices))

    return folds



In [None]:
example_results = np.minimum(np.random.normal(0.8, 0.1, (100,)), 1)
sns.boxplot(data=example_results)
plt.title("Stability of My Model over 100 runs on test set")
plt.ylabel("Accuracy on test set")
plt.xlabel("My Model")
plt.show()

# Step VIII: Bonus: implement Momentum SGD / ADAM / ..

- this task is optional if you have time at the end
- inspect your stochastic gradient descent implementation
- have a look at online examples such as [wiseodd.github.com](https://wiseodd.github.io/techblog/2016/06/22/nn-optimization/) for implementations of variants on stochastic gradient such as with Nesterov Momentum or ADAM
- change your implementation of SGD to one or multiple of these variants and try a simple run of your neural net and compare it with previous results
- sketch a first design of an optimizer-class which is fed with parameters of your model and performs the update step 