<a href="https://colab.research.google.com/github/tallerzalan/Applied-Machine-Learning/blob/main/NNs/Exercise_4_nn_lr.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Exercise - learning rate and batch size

1. Consider last exercise (i.e. the MNIST data). Suppose you are restricted to **training for only 2 epochs** but still want a good model. You recognize that finding the right learning rate is going to be very important. For this reason, you split your training data into a train and a validation set and use the validation set to find the optimal learning rate. Train a model with you optimized learning rate and evaluate it on your test data.
1. Recognizing that the batch size is also important for training speed, you decide to extend your above analysis to also find the optimal batch size. Once again, train a model with you optimized learning rate *and* batch size and evaluate it on your test data.
1. You have heard that momentum is important. You know that many optimizers already incorporate momentum by default, but you are now forced by your evil teacher to use SGD and otherwise repeat (1) and (2). You decide to extend your above analysis to also find the optimal momentum for SGD (see https://www.tensorflow.org/api_docs/python/tf/keras/optimizers/SGD for how to set it). Once again, train a model with you optimized learning rate, batch size, *and* momentum and evaluate it on your test data.

**See slides for more details!**

# Exercise 1

Consider last exercise (i.e. the MNIST data). Suppose you are restricted to **training for only 2 epochs** but still want a good model. You recognize that finding the right learning rate is going to be very important. For this reason, you split your training data into a train and a validation set and use the validation set to find the optimal learning rate. Train a model with you optimized learning rate and evaluate it on your test data.

In [None]:
import tensorflow as tf
from keras.models import Sequential
from keras.layers import Activation, Dense, Flatten, Conv2D, MaxPooling2D, Dropout
from tensorflow.keras.optimizers import SGD, Adam
from keras import Input
from keras.datasets import mnist
from tensorflow.keras.utils import to_categorical

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import mean_squared_error

import pandas as pd
import numpy as np
from matplotlib import pyplot as plt

random_seed = 42
from tensorflow.random import set_seed
set_seed(random_seed)

from numpy.random import seed
seed(random_seed)

In [None]:
test_size = 0.2

# Parameters
num_classes = 10
input_shape = (28, 28, 1)
epochs = 2

In [None]:
# Load the data and split it between train and test sets
(X_train, y_train), (X_test, y_test) = mnist.load_data()

# Scale images to the [0, 1] range
X_train = X_train.astype('float32') / 255
X_test = X_test.astype('float32') / 255

# Make sure images have shape (28, 28, 1)
X_train = np.expand_dims(X_train, -1)
X_test = np.expand_dims(X_test, -1)

# convert class vectors to binary class matrices
y_train = to_categorical(y_train, num_classes)
y_test = to_categorical(y_test, num_classes)

# Split train data to also get val data
X_train, X_val, y_train, y_val = train_test_split(X_train,
                                                  y_train,
                                                  test_size = test_size,
                                                  random_state = random_seed
                                                  )

print(X_train.shape, y_train.shape, X_val.shape, y_val.shape, X_test.shape, y_test.shape)

Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz
(48000, 28, 28, 1) (48000, 10) (12000, 28, 28, 1) (12000, 10) (10000, 28, 28, 1) (10000, 10)


Here is (parts of) a model to get you started.

It is very helpful to wrap it inside a function since you want to call it multiple times in a loop.

Take note of the "Flatten" layer. This is important to reshape your data from (28, 28) to (784,).

Alternatively, you could reshape your data (the x's). This can be done using:

$\texttt{x = x.reshape(n, 784)}$ 

where $n$ is the number of samples (60k for training, 10k for test).

Then you don't need the Flatten layer, but remember to still specify an input shape of your first layer (i.e. 784 if you have done this reshaping).

**Note**: Do feel free to experiment with the number of layers, nodes per layer, and optimizer.

In [None]:
def build_model(learning_rate):
  model = Sequential([
      Flatten(input_shape = input_shape),
      Dense(128, activation = 'relu'),
      Dense(64, activation = 'relu'),
      Dense(32, activation = 'relu'),
      Dense(num_classes)
      ])
  
  optimizer = Adam(learning_rate = learning_rate)
  
  model.compile(
      loss = 'categorical_crossentropy',
      optimizer = optimizer,
      metrics = ['accuracy']
      )
  
  return model

Let us look at single run.

In [None]:
model = build_model(0.001) # insert desired learning rate

model.fit(X_train,
          y_train,
          validation_data = (X_val, y_val),
          epochs = epochs
          )

model.evaluate(X_test, y_test)

Epoch 1/2
Epoch 2/2


[10.389719009399414, 0.23070000112056732]

Now for the optimization.

In [None]:
learning_rates = [rate for rate in np.arange(0.0001, 1, 0.1)]

results = []

for learning_rate in learning_rates:
    model = build_model(learning_rate)
    model.fit(X_train,
              y_train,
              validation_data = (X_val, y_val),
              epochs = epochs,
              verbose = 0)
    
    loss, acc = model.evaluate(X_test, y_test)
    results.append((acc, learning_rate))
    
results = pd.DataFrame(results, columns = ['Accuracy', 'Learning rate'])
results



Unnamed: 0,Accuracy,Learning rate
0,0.1364,0.0001
1,0.0982,0.1001
2,0.1028,0.2001
3,0.0958,0.3001
4,0.0958,0.4001
5,0.0851,0.5001
6,0.1059,0.6001
7,0.0974,0.7001
8,0.1032,0.8001
9,0.101,0.9001


In [None]:
results[results['Accuracy'] == results['Accuracy'].max()]

Unnamed: 0,Accuracy,Learning rate
0,0.1364,0.0001


In [None]:
model = build_model(0.0001) # insert desired learning rate

model.fit(np.concatenate([X_train, X_val]),
          np.concatenate([y_train, y_val]),
          epochs = epochs
          )

model.evaluate(X_test, y_test)

Epoch 1/2
Epoch 2/2


[8.270132064819336, 0.1688999980688095]

# Exercise 2

Recognizing that the batch size is also important for training speed, you decide to extend your above analysis to also find the optimal batch size. Once again, train a model with you optimized learning rate *and* batch size and evaluate it on your test data.

In [None]:
learning_rates = [rate for rate in np.arange(0.0001, 1, 0.1)]
batch_sizes = [size for size in range(16, 65, 16)] # # must be positive ints. Default is 32

results = []

for learning_rate in learning_rates:
    for batch_size in batch_sizes:
        model = build_model(learning_rate)
        model.fit(X_train,
                  y_train,
                  validation_data = (X_val, y_val),
                  epochs = epochs,
                  batch_size = batch_size,
                  verbose = 0)
        loss, acc = model.evaluate(X_test, y_test)
        results.append((acc, learning_rate, batch_size))
    
results = pd.DataFrame(results, columns = ['Accuracy', 'Learning rate', 'Batch size'])
results



Unnamed: 0,Accuracy,Learning rate,Batch size
0,0.0884,0.0001,16
1,0.1265,0.0001,32
2,0.0982,0.0001,48
3,0.1042,0.0001,64
4,0.098,0.1001,16
5,0.1028,0.1001,32
6,0.0958,0.1001,48
7,0.0982,0.1001,64
8,0.1028,0.2001,16
9,0.0612,0.2001,32


In [None]:
results[results['Accuracy'] == results['Accuracy'].max()]

Unnamed: 0,Accuracy,Learning rate,Batch size
34,0.1766,0.8001,48


In [None]:
model = build_model(0.8001) # insert desired learning rate

model.fit(np.concatenate([X_train, X_val]),
          np.concatenate([y_train, y_val]),
          batch_size = 48, 
          epochs = epochs
          )

model.evaluate(X_test, y_test)

Epoch 1/2
Epoch 2/2


[9.508055686950684, 0.10090000182390213]

# Exericse 3

You have heard that momentum is important. You know that many optimizers already incorporate momentum by default, but you are now forced by your evil teacher to use SGD and otherwise repeat (1) and (2). You decide to extend your above analysis to also find the optimal momentum for SGD (see https://www.tensorflow.org/api_docs/python/tf/keras/optimizers/SGD for how to set it). Once again, train a model with you optimized learning rate, batch size, *and* momentum and evaluate it on your test data.

In [None]:
def build_model_with_momentum(learning_rate, momentum):
    model = Sequential([
        Flatten(input_shape = input_shape),
        Dense(128, activation = 'relu'),
        Dense(64, activation = 'relu'),
        Dense(32, activation = 'relu'),
        Dense(num_classes)
        ])
    
    optimizer = SGD(learning_rate = learning_rate,
                    momentum = momentum
                    )
    
    model.compile(
        loss = 'categorical_crossentropy',
        optimizer = optimizer,
        metrics = ['accuracy']
        )
    
    return model

In [None]:
learning_rates = [rate for rate in np.arange(0.0001, 1, 0.1)]
batch_sizes = [size for size in range(16, 65, 16)] # # must be positive ints. Default is 32
momentums = [moment for moment in np.arange(0, 1, 0.2)] # must be in [0, 1). Default (for SGD) is 0.0

results = []

for learning_rate in learning_rates:
    for batch_size in batch_sizes:
        for momentum in momentums:
            model = build_model_with_momentum(learning_rate, momentum)
            model.fit(X_train,
                      y_train,
                      validation_data = (X_val, y_val),
                      epochs = epochs,
                      batch_size = batch_size,
                      verbose = 0)
            loss, acc = model.evaluate(X_test, y_test)
            results.append((acc, learning_rate, batch_size, momentum))
    
results = pd.DataFrame(results, columns=['Accuracy', 'Learning rate', 'Batch size', 'Momentum'])
results



Unnamed: 0,Accuracy,Learning rate,Batch size,Momentum
0,0.0892,0.0001,16,0.0
1,0.1347,0.0001,16,0.2
2,0.1010,0.0001,16,0.4
3,0.0982,0.0001,16,0.6
4,0.0958,0.0001,16,0.8
...,...,...,...,...
195,0.0892,0.9001,64,0.0
196,0.0892,0.9001,64,0.2
197,0.0980,0.9001,64,0.4
198,0.0958,0.9001,64,0.6


In [None]:
results[results['Accuracy'] == results['Accuracy'].max()]

Unnamed: 0,Accuracy,Learning rate,Batch size,Momentum
10,0.192,0.0001,48,0.0


In [None]:
model = build_model_with_momentum(0.0001, 0) # insert desired learning rate

model.fit(np.concatenate([X_train, X_val]),
          np.concatenate([y_train, y_val]),
          batch_size = 48, 
          epochs = epochs
          )

model.evaluate(X_test, y_test)

Epoch 1/2
Epoch 2/2


[4.243669033050537, 0.09740000218153]