
# Exercise - optimizers and activation functions

1. Use the `fetch_california_housing` data (remember to split it into a training set and a test set). Use the five optimizers presented in class to train five neural networks (identical aside from the optimizer used). How well do the networks perform on the test set, as measured by MSE and MAE? Rank the optimizers.
1. Select the best optimizer and use it for this exercise. Experiment with different activation functions, including at least sigmoid, tanh, and relu. Rank the activation functions you try. 
1. Using your findings, as well as experimenting with more layers, try to minimize the test MSE.

**Note**: You may want to use https://www.tensorflow.org/api_docs/python/tf/keras/activations and https://www.tensorflow.org/api_docs/python/tf/keras/optimizers.

**See slides for more details!**

# Exercise 1

Use the `fetch_california_housing` data (remember to split it into a training set and a test set). Use the five optimizers presented in class to train five neural networks (identical aside from the optimizer used). How well do the networks perform on the test set, as measured by MSE and MAE? Rank the optimizers.
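For reference, both test metrics are simple averages over the per-observation errors. The sketch below uses made-up toy vectors (not the housing data) and checks the NumPy formulas against scikit-learn's `mean_squared_error` and `mean_absolute_error`.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

# Toy targets and predictions (illustrative values only, not model output).
y_true = np.array([1.0, 2.0, 3.0])
y_pred = np.array([1.0, 3.0, 5.0])

errors = y_true - y_pred          # [0, -1, -2]
mse = np.mean(errors ** 2)        # (0 + 1 + 4) / 3
mae = np.mean(np.abs(errors))     # (0 + 1 + 2) / 3

print(mse, mae)
print(mean_squared_error(y_true, y_pred), mean_absolute_error(y_true, y_pred))
```

Because MSE squares the errors, the single error of 2 contributes 4 of the 5 units of squared error, whereas MAE weights every error linearly. MSE is therefore more sensitive to a few large misses.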

In [None]:
import tensorflow as tf
from keras.models import Sequential
from keras.layers import Dense
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

import pandas as pd

seed = 42
tf.keras.utils.set_random_seed(seed) # seeds Python, NumPy and TensorFlow at once (TF >= 2.7)

In [None]:
X, y = fetch_california_housing(return_X_y = True)

# Use `train_test_split` to split your data into a train and a test set.
X_train, X_test, y_train, y_test = train_test_split(X,
                                                    y,
                                                    test_size = 0.2,
                                                    random_state = seed
                                                    )

# Scale
scaler = StandardScaler()
Z_train = scaler.fit_transform(X_train)
Z_test = scaler.transform(X_test)

print(Z_train.shape, Z_test.shape, y_train.shape, y_test.shape)

(16512, 8) (4128, 8) (16512,) (4128,)
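`StandardScaler` rescales each feature to zero mean and unit variance, `z = (x - mean) / std`, using the mean and population standard deviation of the training rows only. The test rows are transformed with those same training statistics, which is why `fit_transform` is called on `X_train` but only `transform` on `X_test`. A small check on toy arrays (not the housing data):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Toy train/test matrices with 2 features (illustrative values only).
X_tr = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
X_te = np.array([[4.0, 40.0], [0.0, 0.0]])

toy_scaler = StandardScaler()
Z_tr = toy_scaler.fit_transform(X_tr)   # learns mean/std from the training rows
Z_te = toy_scaler.transform(X_te)       # reuses the training mean/std

# The same result by hand (np.std defaults to ddof=0, like StandardScaler).
mu = X_tr.mean(axis=0)
sd = X_tr.std(axis=0)
print(np.allclose(Z_tr, (X_tr - mu) / sd), np.allclose(Z_te, (X_te - mu) / sd))
```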


In [None]:
def build_nn(activation = 'sigmoid'):
    your_regression_nn = Sequential([
        Dense(64, activation = activation, input_shape = (8,)), # input_shape = (8,) since there are 8 features
        Dense(1, activation = 'linear') # linear output for regression; 1 unit since there is 1 target per observation
        ])

    return your_regression_nn

In [None]:
# SGD
nn_sgd = build_nn(activation = 'sigmoid')

nn_sgd.compile(
    optimizer = 'sgd',
    loss = 'mse',
    metrics = ['mae'], # to also track MAE. MSE is "automatically" measured since it is the loss
    )

nn_sgd.fit(Z_train,
           y_train,
           epochs = 5
           )

mse, mae = nn_sgd.evaluate(Z_test,
                           y_test
                           )

print(f'Test MSE = {round(mse, 3)}, test MAE = {round(mae, 3)}.')

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
Test MSE = 0.514, test MAE = 0.522.


**Important note**: Remember to use "mse" as your loss function! It is fine to experiment with other regression losses, but do not use cross-entropy (remember that it is for classification).
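One way to see why cross-entropy does not fit this task: the California housing target (median house value, in units of $100,000) often exceeds 1, while binary cross-entropy assumes targets in [0, 1]. The NumPy sketch below uses the textbook formula (framework implementations add details such as numerical clipping that it omits) and shows that for a target of 2.5 the loss keeps decreasing as the prediction approaches 1 and even turns negative, so it never rewards predicting the actual value.

```python
import numpy as np

def binary_cross_entropy(y, p):
    # Textbook per-sample BCE; only a sensible loss when the target y lies in [0, 1].
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))

p_grid = np.linspace(0.01, 0.99, 99)  # candidate predictions in (0, 1)

# Target inside [0, 1]: the loss is minimised at p = y, as intended.
best_p_small = p_grid[np.argmin(binary_cross_entropy(0.3, p_grid))]

# Target outside [0, 1] (e.g. a house value of 2.5): the loss falls monotonically
# towards p = 1 and becomes negative.
losses_big = binary_cross_entropy(2.5, p_grid)

print(best_p_small)
print(losses_big[0], losses_big[-1])
```

MSE, by contrast, is minimised exactly at the target value for any real-valued target.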

Go through each of the five optimizers covered in class and rank their performance on this dataset.
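The five optimizers covered in class are not named here (the loop below tries seven optimizers that Keras accepts by name). As rough intuition for how the main families differ, this sketch implements the textbook update rules of SGD, Adagrad, RMSprop and Adam in plain Python on the one-dimensional toy loss `f(w) = (w - 3)**2`. The learning rates are hand-picked for this toy problem, and Keras' implementations differ in details such as default values, epsilon placement and initial accumulator values.

```python
import math

def grad(w):
    # Gradient of the toy loss f(w) = (w - 3)**2; the minimum is at w = 3.
    return 2.0 * (w - 3.0)

def sgd(w, g, s, lr=0.1):
    # Vanilla gradient descent: a fixed-size multiple of the negative gradient.
    return w - lr * g

def adagrad(w, g, s, lr=0.5, eps=1e-8):
    # Running SUM of squared gradients, so the effective step can only shrink.
    s["G"] = s.get("G", 0.0) + g * g
    return w - lr * g / (math.sqrt(s["G"]) + eps)

def rmsprop(w, g, s, lr=0.01, rho=0.9, eps=1e-8):
    # Exponential moving AVERAGE of squared gradients, so old gradients are forgotten.
    s["v"] = rho * s.get("v", 0.0) + (1 - rho) * g * g
    return w - lr * g / (math.sqrt(s["v"]) + eps)

def adam(w, g, s, lr=0.01, b1=0.9, b2=0.999, eps=1e-8):
    # Momentum-like first moment plus RMSprop-like second moment, both bias-corrected.
    s["t"] = s.get("t", 0) + 1
    s["m"] = b1 * s.get("m", 0.0) + (1 - b1) * g
    s["v"] = b2 * s.get("v", 0.0) + (1 - b2) * g * g
    m_hat = s["m"] / (1 - b1 ** s["t"])
    v_hat = s["v"] / (1 - b2 ** s["t"])
    return w - lr * m_hat / (math.sqrt(v_hat) + eps)

def run(update, steps=2000, w0=0.0):
    # Apply one optimizer's update rule repeatedly, keeping its state in a dict.
    w, state = w0, {}
    for _ in range(steps):
        w = update(w, grad(w), state)
    return w

final = {f.__name__: run(f) for f in (sgd, adagrad, rmsprop, adam)}
print(final)
```

All four end up close to the minimum at 3 on this easy problem; on a real network the differences in how each one scales its steps are what the ranking below measures.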

In [None]:
activations = ['sigmoid', 'tanh', 'relu']
optimizers = ['sgd', 'adam', 'adagrad', 'adadelta', 'adamax', 'nadam', 'rmsprop']

results = []

for activation in activations:
    for optimizer in optimizers:
        nn = build_nn(activation)
        nn.compile(
            optimizer = optimizer,
            loss = 'mse',
            metrics = ['mae']
            )

        nn.fit(Z_train,
               y_train,
               epochs = 5
               )

        mse, mae = nn.evaluate(Z_test,
                               y_test
                               )

        results.append([activation, optimizer, mse, mae])

results = pd.DataFrame(results, columns = ['Activation', 'Optimizer', 'MSE', 'MAE'])
print(results)



In [None]:
# Row with the lowest test MSE (best activation/optimizer combination)
results[results['MSE'] == results['MSE'].min()]

|    | Activation | Optimizer | MSE      | MAE      |
|----|------------|-----------|----------|----------|
| 15 | relu       | adam      | 0.384966 | 0.437916 |
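The grid above varies activation and optimizer together, so the rankings the exercises ask for require holding one factor fixed. A hypothetical helper (not part of the original notebook) that filters `results` on one column and ranks the remaining rows by test MSE:

```python
import pandas as pd

def rank_within(results: pd.DataFrame, fixed_col: str, fixed_val: str,
                metric: str = "MSE") -> pd.DataFrame:
    """Keep rows where `fixed_col == fixed_val` and rank them by `metric` (lower is better)."""
    subset = results[results[fixed_col] == fixed_val]
    ranked = subset.sort_values(metric).reset_index(drop=True)
    ranked.index = ranked.index + 1   # 1-based rank
    ranked.index.name = "Rank"
    return ranked
```

For example, `rank_within(results, 'Activation', 'sigmoid')` ranks the optimizers for the sigmoid network that `build_nn` uses by default (Exercise 1), and `rank_within(results, 'Optimizer', 'adam')` ranks the activation functions for Adam (Exercise 2).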


# Exercise 2

Select the best optimizer and use it for this exercise. Experiment with different activation functions, including at least sigmoid, tanh, and relu. Rank the activation functions you try. 
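For comparison, the three required activations and their derivatives take only a few lines of NumPy (a sketch of the standard definitions, not Keras' own code). The derivatives hint at why they train differently: the sigmoid's slope is at most 0.25 and tanh's at most 1, while ReLU's slope is exactly 1 for every positive input, so sigmoid networks tend to pass smaller gradients back through their layers.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))        # range (0, 1)

def d_sigmoid(x):
    s = sigmoid(x)
    return s * (1.0 - s)                    # peaks at 0.25 when x = 0

def d_tanh(x):
    return 1.0 - np.tanh(x) ** 2            # peaks at 1 when x = 0; tanh has range (-1, 1)

def relu(x):
    return np.maximum(0.0, x)               # 0 for negative inputs, identity otherwise

def d_relu(x):
    return (x > 0).astype(float)            # 0 or 1 (the value at x = 0 is a convention)

x = np.linspace(-5.0, 5.0, 11)              # -5, -4, ..., 5 (includes 0)
print(d_sigmoid(x).max(), d_tanh(x).max(), d_relu(x).max())

# tanh is a rescaled, shifted sigmoid: tanh(x) = 2 * sigmoid(2x) - 1
print(np.allclose(np.tanh(x), 2 * sigmoid(2 * x) - 1))
```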

In [None]:
nn_relu = build_nn('relu')

nn_relu.compile(
    optimizer = 'adam',
    loss = 'mse',
    metrics = ['mae']
    )

nn_relu.fit(Z_train,
            y_train,
            epochs = 10 # twice the 5 epochs used in the comparison loop, so the MSEs are not directly comparable
            )

mse, mae = nn_relu.evaluate(Z_test,
                            y_test
                            )

print(f'Test MSE = {round(mse, 3)}, test MAE = {round(mae, 3)}.')

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
Test MSE = 0.354, test MAE = 0.413.


# Exercise 3

Using your findings, as well as experimenting with more layers, try to minimize the test MSE.
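One caveat when tuning: repeatedly choosing architectures by their test MSE makes that test score optimistic. A common alternative is to carve a validation set out of the training data, compare candidate models on it, and evaluate only the final model on the test set. A sketch with `train_test_split` on placeholder arrays that have the same number of rows as the housing data (20,640); the 20% validation share is an arbitrary choice:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder arrays with the shape of the California housing data (20,640 x 8).
X_demo = np.zeros((20640, 8))
y_demo = np.zeros(20640)

# First split off the test set as in the notebook...
X_rest, X_te, y_rest, y_te = train_test_split(X_demo, y_demo, test_size=0.2, random_state=42)
# ...then split the remainder into training and validation sets.
X_tr, X_val, y_tr, y_val = train_test_split(X_rest, y_rest, test_size=0.2, random_state=42)

print(X_tr.shape, X_val.shape, X_te.shape)
```

The scaler would then be fitted on the training part only. Keras' `fit` also accepts `validation_data=(...)` or `validation_split` to report the validation loss after each epoch.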

In [None]:
def build_better_nn(activation):
    your_regression_nn = Sequential([
        Dense(32, activation = activation, input_shape = (8,)), # input_shape = 8 since 8 features
        Dense(64, activation = activation),
        Dense(128, activation = activation),
        Dense(1, activation = 'linear') # linear is used for regression. 1 node since 1 output
        ])

    return your_regression_nn
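To see how much larger the deeper model is, its trainable parameters can be counted by hand: a `Dense` layer with `n_in` inputs and `n_out` units has `n_in * n_out` weights plus `n_out` biases. A small plain-Python check (Keras' `model.count_params()` should report the same totals):

```python
def dense_params(layer_sizes):
    """Trainable parameters of a stack of Dense layers: weights + biases per layer."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))

baseline = dense_params([8, 64, 1])          # build_nn: 8 inputs -> 64 -> 1
deeper = dense_params([8, 32, 64, 128, 1])   # build_better_nn: 8 -> 32 -> 64 -> 128 -> 1
print(baseline, deeper)
```

The deeper network therefore has roughly 17 times as many parameters, which gives it more capacity but also more room to overfit.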

In [None]:
nn_final = build_better_nn('relu')

nn_final.compile(
    optimizer = 'adam',
    loss = 'mse',
    metrics = ['mae']
    )

nn_final.fit(Z_train,
             y_train,
             epochs = 10
             )

mse, mae = nn_final.evaluate(Z_test,
                             y_test
                             )

print(f'Final model test MSE = {round(mse, 3)}, test MAE = {round(mae, 3)}.')

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
Final model test MSE = 0.328, test MAE = 0.412.
