In [None]:
import tensorflow as tf
import numpy as np

Regularization is a set of techniques for reducing overfitting, often by adding a constraint or penalty term to the optimization objective.

Types:

    1. Weight Decay (L1, L2)
    2. Ensemble Methods
    3. Dropout
    4. Early Stopping
    5. Dataset Augmentation
    6. Adding Noise


#### Weight Decay (L1, L2)

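The L2 regularization penalty is computed as: loss = l2 * reduce_sum(square(x)), where l2 is the regularization coefficient and x is the tensor being penalized (for example, a layer's kernel). The L1 penalty is analogous: loss = l1 * reduce_sum(abs(x)).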
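
The penalties can also be computed by hand to see what the coefficient does. Below is a minimal NumPy sketch, assuming the L1 penalty is the coefficient times the sum of absolute values and the L2 penalty is the coefficient times the sum of squares. The helper names `l1_penalty` and `l2_penalty` and the example weights are made up for illustration and are not part of any library.

```python
import numpy as np


def l1_penalty(weights, coeff):
    """Hypothetical helper: coeff * sum of absolute values."""
    return coeff * np.sum(np.abs(weights))


def l2_penalty(weights, coeff):
    """Hypothetical helper: coeff * sum of squared values."""
    return coeff * np.sum(np.square(weights))


# Illustrative weight matrix (values chosen arbitrarily).
weights = np.array([[1.0, -2.0, 0.5],
                    [0.0, 3.0, -1.0]])

l1 = l1_penalty(weights, coeff=0.01)  # 0.01 * 7.5
l2 = l2_penalty(weights, coeff=0.01)  # 0.01 * 15.25

print("L1 penalty:", l1)
print("L2 penalty:", l2)
```

During training, a penalty like this is added to the data loss, so larger weights make the total loss larger. Because L2 squares each value, it penalizes large weights much more heavily than small ones.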

In [None]:
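# x has a shape of (2, 3) (two rows and three columns).
# Float values are used because the regularizer multiplies by a float coefficient.
x = tf.constant([[1., 1., 1.], [1., 1., 1.]])
print(x.numpy())

# Sum of all the elements:
# 1 + 1 + 1 + 1 + 1 + 1 = 6
print(tf.math.reduce_sum(x))

# L2 penalty = l2 * sum of the squared elements
#            = 2. * (1 + 1 + 1 + 1 + 1 + 1) = 12
regularizer = tf.keras.regularizers.L2(2.)
regularizer(x)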

In [None]:
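# kernel_regularizer penalizes the layer's weights (here with L1);
# activity_regularizer penalizes the layer's output (here with L2).
layer = tf.keras.layers.Dense(
    5, input_dim=5,
    kernel_initializer='ones',
    kernel_regularizer=tf.keras.regularizers.L1(0.01),
    activity_regularizer=tf.keras.regularizers.L2(0.01))

tensor = tf.ones(shape=(5, 5)) * 2.0
out = layer(tensor)

# The penalties are collected in layer.losses once the layer has been called.
# The kernel is a 5x5 matrix of ones, so the L1 kernel term is 0.01 * 25 = 0.25.
print(layer.losses)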

#### Dropout

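Intuition: a unit cannot rely on any single input feature, because any of them may be dropped, so the network has to spread its weights across many features. In practice, dropout randomly removes units (and with them their connections) from the network during training.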

The Dropout layer randomly sets input units to 0 with a frequency of rate at each step during training time, which helps prevent overfitting. Inputs not set to 0 are scaled up by 1/(1 - rate) so that the expected sum over all inputs is unchanged.

rate => Float between 0 and 1. Fraction of the input units to drop.

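From a theoretical perspective, dropout is related to L2 regularization but not identical to it. Both tend to shrink the weights, and for some simple models (for example, linear models) dropout can be analyzed as an adaptive form of L2 penalty.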


Note that the Dropout layer only applies when training is set to True such that no values are dropped during inference. When using model.fit, training will be appropriately set to True automatically, and in other contexts, you can set the kwarg explicitly to True when calling the layer.

(This is in contrast to setting trainable=False for a Dropout layer. trainable does not affect the layer's behavior, as Dropout does not have any variables/weights that can be frozen during training.)
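
The masking and rescaling described above can be sketched in a few lines of NumPy. This is an illustrative approximation under stated assumptions, not the Keras implementation: the helper name `inverted_dropout`, its arguments, and the all-ones test data are hypothetical.

```python
import numpy as np


def inverted_dropout(inputs, rate, training, rng):
    """Hypothetical helper: zero out units with probability `rate`
    and scale the survivors by 1 / (1 - rate). A no-op at inference."""
    if not training or rate == 0.0:
        return inputs
    # True where a unit is kept, False where it is dropped.
    keep_mask = rng.random(inputs.shape) >= rate
    return np.where(keep_mask, inputs / (1.0 - rate), 0.0)


rng = np.random.default_rng(0)
data = np.ones((1000, 10), dtype=np.float32)

train_out = inverted_dropout(data, rate=0.2, training=True, rng=rng)
infer_out = inverted_dropout(data, rate=0.2, training=False, rng=rng)

# Roughly 20% of the units are zeroed; the rest become 1 / 0.8 = 1.25.
dropped_fraction = float(np.mean(train_out == 0.0))
print("fraction dropped:", dropped_fraction)
print("mean after dropout:", float(train_out.mean()))
print("inference unchanged:", np.array_equal(infer_out, data))
```

The mean of the training-time output stays close to the mean of the input, which is the reason for the 1/(1 - rate) scaling: downstream layers see activations of about the same magnitude during training and inference.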

In [None]:
# Fix the global random seed (a Python integer) so the dropout mask is reproducible.
tf.random.set_seed(0)

layer = tf.keras.layers.Dropout(.2, input_shape=(2,))
data = np.arange(10).reshape(5, 2).astype(np.float32)
print(data)

outputs = layer(data, training=True)
print(outputs)

#### Early Stopping

Stop training when a monitored metric has stopped improving.

Assume the goal of training is to minimize the loss. The metric to be monitored would then be 'loss', and mode would be 'min'.

A model.fit() training loop checks at the end of every epoch whether the loss is still decreasing, taking min_delta and patience into account if applicable. Once the loss is found to be no longer decreasing, model.stop_training is set to True and training terminates.

The quantity to be monitored needs to be available in the logs dict. To make it so, pass the loss or metrics at model.compile().


    min_delta => Minimum change in the monitored quantity to qualify as an improvement, i.e. an absolute change
                 of less than min_delta will count as no improvement.

    patience  => Number of epochs with no improvement after which training will be stopped.
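
To make the interaction of min_delta and patience concrete, here is a simplified pure-Python sketch of the stopping rule for a metric that should decrease. It is an illustration only and does not reproduce every detail of the Keras callback (for example baseline or restore_best_weights). The function name `stopping_epoch` and the loss values are hypothetical.

```python
def stopping_epoch(losses, min_delta=0.0, patience=0):
    """Hypothetical helper: return the 0-based epoch at which training
    would stop, or None if the stopping rule never triggers."""
    best = float("inf")
    wait = 0  # consecutive epochs without a sufficient improvement
    for epoch, loss in enumerate(losses):
        if loss < best - min_delta:
            # Improved by more than min_delta: record it and reset the counter.
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None


# Made-up per-epoch losses, for illustration only.
losses = [1.0, 0.8, 0.79, 0.795, 0.78, 0.785, 0.79, 0.6]

# Small improvements count: training stops after two flat epochs in a row.
print(stopping_epoch(losses, min_delta=0.0, patience=2))

# Improvements smaller than 0.05 are ignored, so training stops earlier.
print(stopping_epoch(losses, min_delta=0.05, patience=2))
```

A larger min_delta makes the rule stricter about what counts as progress, while a larger patience tolerates more flat epochs before giving up. Note that with these made-up values the last epoch would have improved considerably, which is the usual trade-off of stopping early.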

In [None]:
tf.keras.callbacks.EarlyStopping(
    monitor='val_loss', min_delta=0, patience=0, verbose=0,
    mode='auto', baseline=None, restore_best_weights=False
)

In [None]:
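# Usage sketch only. DO NOT RUN THIS:
# model, train_ds, val_ds and EPOCHS are not defined in this notebook.
history = model.fit(
    train_ds,
    validation_data=val_ds,
    epochs=EPOCHS,
    # monitor defaults to 'val_loss'; stop after 2 epochs without improvement.
    callbacks=[tf.keras.callbacks.EarlyStopping(verbose=1, patience=2)],
)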