# Exercise 2 - Custom training loop

## Objective

In this exercise, you have to implement your first training and validation loops from scratch to train
the logistic model you implemented. To do so, you will also have to create an optimizer.

## Details

A training loop goes through each element of the training dataset and uses it to update the model's weights.
A validation loop goes through each element of the validation dataset and uses it to calculate
the metrics (e.g., accuracy). An **epoch** is one pass of the training loop followed by one pass of the validation loop.
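
The epoch structure can be sketched in plain Python. This is only an illustrative outline: `run_epochs`, `train_one_epoch`, and `validate` are hypothetical names, and the two stand-in functions just record that they were called instead of touching a real model.

```python
def run_epochs(num_epochs, train_one_epoch, validate):
    """Run one training pass followed by one validation pass per epoch."""
    history = []
    for epoch in range(num_epochs):
        train_result = train_one_epoch()  # this pass updates the weights
        val_result = validate()           # this pass only measures metrics
        history.append((epoch, train_result, val_result))
    return history


# Hypothetical stand-ins that only log the order of the calls.
calls = []


def fake_train():
    calls.append("train")
    return "train done"


def fake_validate():
    calls.append("val")
    return "val done"


history = run_epochs(3, fake_train, fake_validate)
print(calls)
```

In the real exercise, the two callables would wrap the training and validation loops and return the mean loss and accuracy.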

The input to your model should be normalized. You can do this by dividing the pixel values by 255: `X /= 255`.
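
As a minimal sketch of the normalization step, assuming the images arrive as a float tensor with pixel values in the range 0 to 255, the division maps every value into the range 0 to 1. The tiny tensor below is a made-up stand-in for an image batch.

```python
import tensorflow as tf

# Stand-in for an image batch: float pixel values between 0 and 255.
X = tf.constant([[0.0, 127.5, 255.0]])

# Tensors are immutable, so this creates a new, rescaled tensor.
X_norm = X / 255.0

print(X_norm.numpy())
```

Writing `X /= 255` inside the loop has the same effect, since it rebinds the name `X` to the rescaled tensor.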

You can run `python training.py` to train your first machine learning model!

You will need to specify the `--imdir`, e.g. `--imdir GTSRB/Final_Training/Images/`, using the provided GTSRB dataset.

## Tips

You don't need `tf.GradientTape` for the validation loop, as you will not be computing gradients or updating the weights.

The `assign_sub` method of `tf.Variable` will be useful for performing the weight update in the SGD optimizer.
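
To illustrate how `assign_sub` fits into a gradient step, here is a small sketch on a toy problem. The variable `w`, the quadratic loss, and the learning rate are assumptions chosen for illustration and are not part of the exercise.

```python
import tensorflow as tf

# Toy parameter and toy loss: loss = sum(w^2), so d(loss)/dw = 2 * w.
w = tf.Variable([1.0, 2.0])

with tf.GradientTape() as tape:
    loss = tf.reduce_sum(w ** 2)

grad = tape.gradient(loss, w)

# In-place update: w <- w - lr * grad
lr = 0.1
w.assign_sub(lr * grad)

print(w.numpy())
```

Because `assign_sub` modifies the variable in place, the same `tf.Variable` object can be reused across batches and epochs.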

Use the `tf.one_hot` function to get the one-hot vector from the ground truth label.
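
A small sketch of `tf.one_hot`, assuming three classes for readability (the exercise itself uses 43). The labels and probabilities are made-up values. Multiplying by the one-hot matrix and summing along the class axis picks out the probability assigned to the true class, which is the quantity the cross-entropy loss takes the logarithm of.

```python
import tensorflow as tf

# Made-up integer labels for a batch of three examples.
labels = tf.constant([0, 2, 1])
one_hot = tf.one_hot(labels, depth=3)

# Made-up predicted probabilities, one row per example.
probs = tf.constant([
    [0.7, 0.2, 0.1],
    [0.1, 0.3, 0.6],
    [0.2, 0.5, 0.3],
])

# Probability the model assigned to the true class of each example.
true_class_prob = tf.reduce_sum(probs * one_hot, axis=1)

print(one_hot.numpy())
print(true_class_prob.numpy())
```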

In [7]:
from tensorflow.keras.preprocessing import image_dataset_from_directory

imdir = 'GTSRB/Final_Training/Images'
train_dataset = image_dataset_from_directory(
    imdir,
    image_size=(32, 32),
    batch_size=256,
    validation_split=0.1,
    subset='training',
    seed=123
)

Found 4300 files belonging to 43 classes.
Using 3870 files for training.


In [8]:
train_dataset

<BatchDataset shapes: ((None, 32, 32, 3), (None,)), types: (tf.float32, tf.int32)>

In [23]:
i = 0
for X, Y in train_dataset:
    i += 1
    print('\n\nround {}'.format(i))
    print(X)
    print('\n\n')
    print(Y)



round 1
tf.Tensor(
[[[[ 47.664062   40.226562   44.382812 ]
   [ 31.646484   31.34375    21.612305 ]
   [ 42.216797   41.404297   41.20117  ]
   ...
   [ 43.308594   54.043945   62.960938 ]
   [ 46.23047    64.118164   76.06445  ]
   [ 15.2734375  27.476562   35.039062 ]]

  [[ 44.64453    37.566406   39.36328  ]
   [ 33.229492   32.995117   21.805664 ]
   [ 38.597656   38.2041     36.765625 ]
   ...
   [ 70.97754    83.24121    88.850586 ]
   [ 58.848633   78.038086   87.98047  ]
   [ 15.87793    27.146484   34.515625 ]]

  [[ 44.478516   38.041016   36.092773 ]
   [ 53.617188   54.958984   42.20996  ]
   [ 36.84961    36.646484   34.316406 ]
   ...
   [ 78.73047    93.69238    93.44238  ]
   [ 26.042969   45.05957    52.137695 ]
   [ 11.983398   23.11914    29.524414 ]]

  ...

  [[ 22.182617   19.426758   24.71875  ]
   [ 17.141602   17.014648   19.205078 ]
   [ 17.78125    18.1875     17.654297 ]
   ...
   [ 23.038086   27.038086   28.038086 ]
   [ 33.527344   37.527344   38.1523

In [31]:
(32**2) * 3

3072

In [33]:
i = 1
print(f"is {i}"), print('is {}'.format(i))

is 1
is 1


(None, None)

In [38]:
a = []
a.append([3, 4])
a

[[3, 4]]

In [75]:
import argparse
import logging

import tensorflow as tf

from dataset import get_datasets
# from logistic import softmax, cross_entropy, accuracy


def get_module_logger(mod_name):
    logger = logging.getLogger(mod_name)
    handler = logging.StreamHandler()
    formatter = logging.Formatter('%(asctime)s %(levelname)-8s %(message)s')
    handler.setFormatter(formatter)
    logger.addHandler(handler)
    logger.setLevel(logging.DEBUG)
    return logger

def softmax(logits):
    """
    softmax implementation
    args:
    - logits [tensor]: NxC logits tensor (N batch size / C number of classes)
    returns:
    - soft_logits [tensor]: softmax of logits
    """
    # IMPLEMENT THIS FUNCTION
    logits_exp = tf.math.exp(logits)

    # sum over the class axis (axis=1); keepdims=True keeps the shape Nx1
    # so that the division below broadcasts row by row
    logits_sum = tf.math.reduce_sum(logits_exp, axis=1, keepdims=True)

    soft_logits = logits_exp / logits_sum
    return soft_logits

def model(X, W, b):
    """
    logistic regression model
    args:
    - X [tensor]: input batch NxHxWx3
    - W [tensor]: weights
    - b [tensor]: bias
    returns:
    - output [tensor]
    """
    # IMPLEMENT THIS FUNCTION
    X_flatten = tf.reshape(X, [-1,  W.shape[0]])
    logits = tf.matmul(X_flatten, W) + b
    output_probs = softmax(logits)
    return output_probs

def cross_entropy(scaled_logits, one_hot):
    """
    Cross entropy loss implementation
    args:
    - scaled_logits [tensor]: NxC tensor where N batch size / C number of classes
    - one_hot [tensor]: NxC one hot tensor
    returns:
    - loss [tensor]: cross entropy 
    """
    # IMPLEMENT THIS FUNCTION
    masked_prob = tf.boolean_mask(scaled_logits, one_hot)
    cross_entropy_val = -1 * tf.math.log(masked_prob)
    return cross_entropy_val

def accuracy(y_hat, Y):
    """
    calculate accuracy
    args:
    - y_hat [tensor]: NxC tensor of model predictions
    - Y [tensor]: N tensor of ground truth classes
    returns:
    - acc [tensor]: accuracy
    """
    # IMPLEMENT THIS FUNCTION
    pred_class = tf.cast(tf.argmax(y_hat, axis=1), Y.dtype)
    compare_result = tf.cast(pred_class == Y, tf.int32)
    acc = tf.math.reduce_sum(compare_result) / Y.shape[0]
    return acc

def sgd(params, grads, lr, bs):
    """
    stochastic gradient descent implementation
    args:
    - params [list[tensor]]: model params
    - grads [list[tensor]]: param gradients such that params[i].shape == grads[i].shape
    - lr [float]: learning rate
    - bs [int]: batch_size
    """
    # IMPLEMENT THIS FUNCTION
    # params = [W, b], grads = [W_grad, b_grad]
    for param, grad in zip(params, grads):
        # the gradients are expected to be summed over the batch,
        # so dividing by bs turns them into a per-example mean
        param.assign_sub(lr * grad / bs)
    
def training_loop(train_dataset, model, W, b, optimizer):
    """
    training loop
    args:
    - train_dataset: batched dataset yielding (X, Y) pairs
    - model [func]: model function
    - W [tf.Variable]: weights
    - b [tf.Variable]: bias
    - optimizer [func]: optimizer func
    returns:
    - mean_loss [tensor]: mean training loss
    - mean_acc [tensor]: mean training accuracy
    """
    lr = 0.1
    losses = []
    accuracies = []
    for X, Y in train_dataset:  # iterate over the batches
        # only the forward pass and the loss need to be recorded on the tape
        with tf.GradientTape() as tape:
            # IMPLEMENT THIS FUNCTION
            X = X / 255.0
            Y_hat = model(X, W, b)

            Y_one_hot = tf.one_hot(Y, 43)
            loss = cross_entropy(Y_hat, Y_one_hot)

        losses.append(tf.math.reduce_mean(loss))
        accuracies.append(accuracy(Y_hat, Y))

        # loss holds one value per example; tape.gradient sums over them
        # and the optimizer divides by the batch size
        grads = tape.gradient(loss, [W, b])  # grads = [W_grad, b_grad]
        optimizer([W, b], grads, lr, X.shape[0])

    mean_loss = tf.math.reduce_mean(losses)
    mean_acc = tf.math.reduce_mean(tf.stack(accuracies))

    return mean_loss, mean_acc


def validation_loop(val_dataset, model, W, b):
    """
    validation loop
    args:
    - val_dataset: batched dataset yielding (X, Y) pairs
    - model [func]: model function
    - W [tf.Variable]: weights
    - b [tf.Variable]: bias
    returns:
    - mean_acc [tensor]: mean validation accuracy
    """
    # IMPLEMENT THIS FUNCTION
    accuracies = []
    for X, Y in val_dataset:  # iterate over the batches
        # no GradientTape here: the weights are not updated during validation
        X = X / 255.0
        Y_hat = model(X, W, b)

        accuracies.append(accuracy(Y_hat, Y))

    # unweighted mean of the per-batch accuracies
    mean_acc = tf.math.reduce_mean(tf.stack(accuracies))
    return mean_acc


if __name__  == '__main__':
#     logger = get_module_logger(__name__)
#     parser = argparse.ArgumentParser(description='Download and process tf files')
#     parser.add_argument('--imdir', required=True, type=str,
#                         help='data directory')
#     parser.add_argument('--epochs', default=10, type=int,
#                         help='Number of epochs')
#     args = parser.parse_args()    
# 
#     logger.info(f'Training for {args.epochs} epochs using {args.imdir} data')
    
    epochs = 100
    imdir = 'GTSRB/Final_Training/Images'
              
    print(f'Training for {epochs} epochs using {imdir} data')
        
              
    # get the datasets
    train_dataset, val_dataset = get_datasets(imdir)

    # set the variables
    num_inputs = (32**2) *3  # image is resized in get_datasets() above as 32 * 32
    num_outputs = 43 # 43 classes
    
    W = tf.Variable(
        tf.random.normal(
            shape=(num_inputs, num_outputs),
            mean=0,
            stddev=0.01
        )
    )
    b = tf.Variable(
        tf.zeros(num_outputs)
    )

    # training! 
    for epoch in range(epochs):
        print(f'\nEpoch {epoch}')
        
        loss, acc = training_loop(train_dataset, model, W, b, sgd)        
        print(f'Mean training loss: {loss:1f}, mean training accuracy {acc:1f}')
        
        acc = validation_loop(val_dataset, model, W, b)
        print(f'Mean validation accuracy {acc:1f}')        

Training for 100 epochs using GTSRB/Final_Training/Images data
Found 4300 files belonging to 43 classes.
Using 3870 files for training.
Found 4300 files belonging to 43 classes.
Using 430 files for validation.

Epoch 0
Mean training loss: 3.634902, mean training accuracy 0.072428
Mean validation accuracy 0.033787

Epoch 1
Mean training loss: 3.495463, mean training accuracy 0.163021
Mean validation accuracy 0.031026

Epoch 2
Mean training loss: 3.556461, mean training accuracy 0.256950
Mean validation accuracy 0.040454

Epoch 3
Mean training loss: 3.189094, mean training accuracy 0.287907
Mean validation accuracy 0.057000

Epoch 4
Mean training loss: 2.927907, mean training accuracy 0.367057
Mean validation accuracy 0.161683

Epoch 5
Mean training loss: 2.541489, mean training accuracy 0.440820
Mean validation accuracy 0.231681

Epoch 6
Mean training loss: 2.465732, mean training accuracy 0.476074
Mean validation accuracy 0.180765

Epoch 7
Mean training loss: 2.415739, mean training ac

Mean training loss: 0.926785, mean training accuracy 0.874333
Mean validation accuracy 0.751549

Epoch 77
Mean training loss: 0.895532, mean training accuracy 0.891325
Mean validation accuracy 0.793148

Epoch 78
Mean training loss: 0.872416, mean training accuracy 0.895492
Mean validation accuracy 0.714664

Epoch 79
Mean training loss: 0.878034, mean training accuracy 0.890592
Mean validation accuracy 0.758217

Epoch 80
Mean training loss: 0.876826, mean training accuracy 0.892432
Mean validation accuracy 0.742008

Epoch 81
Mean training loss: 0.859812, mean training accuracy 0.893294
Mean validation accuracy 0.748675

Epoch 82
Mean training loss: 0.858156, mean training accuracy 0.895117
Mean validation accuracy 0.784640

Epoch 83
Mean training loss: 0.847535, mean training accuracy 0.900374
Mean validation accuracy 0.794877

Epoch 84
Mean training loss: 0.859733, mean training accuracy 0.893766
Mean validation accuracy 0.748451

Epoch 85
Mean training loss: 0.859687, mean training ac