# Custom Models and Training with TensorFlow

In [1]:
import numpy as np
import tensorflow as tf

In [2]:
tf.range(10)

<tf.Tensor: shape=(10,), dtype=int32, numpy=array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], dtype=int32)>

In [3]:
tf.constant(np.arange(10))

<tf.Tensor: shape=(10,), dtype=int64, numpy=array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])>

# Exercises

## 1.
TensorFlow is a powerful library for numerical computations, particularly well suited for large-scale Machine Learning. It is similar to NumPy at its core, but with GPU support. Its main features include support for distributed computing, computation graph analysis and optimization, an optimization API based on reverse-mode autodiff, and several powerful APIs. Other popular Deep Learning libraries include Theano, Caffe and PyTorch.
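As a small illustration of the reverse-mode autodiff mentioned above, the sketch below uses `tf.GradientTape` to differentiate a toy function. The function `f` and the starting values are arbitrary choices made for this example only.

```python
import tensorflow as tf

def f(w1, w2):
    # Toy function: f(w1, w2) = 3*w1^2 + 2*w1*w2
    return 3 * w1 ** 2 + 2 * w1 * w2

w1, w2 = tf.Variable(5.0), tf.Variable(3.0)

# The tape records the operations applied to the variables
with tf.GradientTape() as tape:
    z = f(w1, w2)

# Analytically: df/dw1 = 6*w1 + 2*w2 and df/dw2 = 2*w1
gradients = tape.gradient(z, [w1, w2])
print([g.numpy() for g in gradients])
```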

## 2.
TensorFlow offers most of the functionalities provided by NumPy, but it is not a drop-in replacement: the functions are not always named the same, some functions do not behave in the same way, and NumPy arrays are mutable, while TensorFlow tensors are not (although mutable objects can be created with `tf.Variable`).
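The mutability difference can be seen in a short sketch (the values are arbitrary): item assignment works on a NumPy array, fails on a tensor, and has to go through `assign()` on a `tf.Variable`.

```python
import numpy as np
import tensorflow as tf

a = np.array([1.0, 2.0, 3.0])
a[0] = 10.0  # NumPy arrays can be modified in place

t = tf.constant([1.0, 2.0, 3.0])
try:
    t[0] = 10.0  # tensors do not support item assignment
except TypeError as err:
    print("TypeError:", err)

v = tf.Variable([1.0, 2.0, 3.0])
v[0].assign(10.0)  # variables are modified through assign()
print(v.numpy())
```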

## 3.
The tensor created by `tf.range(10)` contains 32-bit integers (TensorFlow defaults to 32 bits), while `tf.constant(np.arange(10))` creates a tensor of 64-bit integers (NumPy's default).
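Because TensorFlow does not convert types automatically, the two tensors need an explicit `tf.cast` before they can be combined. The following is a minimal sketch; the integer width NumPy picks can depend on the platform, so the printed dtypes are not asserted here.

```python
import numpy as np
import tensorflow as tf

t32 = tf.range(10)                 # TensorFlow's default integer type
t64 = tf.constant(np.arange(10))   # inherits NumPy's integer type
print(t32.dtype, t64.dtype)

# Cast explicitly so that both operands share the same dtype
total = t32 + tf.cast(t64, tf.int32)
print(total.numpy())
```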

## 4.
1. Ragged tensors
2. Sparse tensors
3. Sets
4. Queues
5. String tensors
6. Tensor arrays
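The six structures listed above can each be created in a line or two. This is only a quick tour with made-up values, not a full treatment of any of these APIs.

```python
import tensorflow as tf

# 1. Ragged tensor: rows of different lengths
ragged = tf.ragged.constant([[1, 2, 3], [4], [5, 6]])
row_lengths = ragged.row_lengths()

# 2. Sparse tensor: only the non-zero entries are stored
sparse = tf.sparse.SparseTensor(indices=[[0, 1], [1, 2]],
                                values=[1.0, 2.0],
                                dense_shape=[2, 3])
dense = tf.sparse.to_dense(sparse)

# 3. Sets: stored as regular or sparse tensors, handled with tf.sets
union = tf.sparse.to_dense(
    tf.sets.union(tf.constant([[1, 2, 3]]), tf.constant([[3, 4, 5]])))

# 4. Queue: first-in, first-out
queue = tf.queue.FIFOQueue(capacity=3, dtypes=[tf.int32], shapes=[()])
queue.enqueue(10)
queue.enqueue(20)
first = queue.dequeue()

# 5. String tensor: byte strings, with helpers in tf.strings
strings = tf.constant(["café", "tensor"])
lengths = tf.strings.length(strings, unit="UTF8_CHAR")

# 6. Tensor array: a list of tensors, written item by item
array = tf.TensorArray(dtype=tf.float32, size=2)
array = array.write(0, tf.constant([1.0, 2.0]))
array = array.write(1, tf.constant([3.0, 4.0]))
stacked = array.stack()
```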

## 5.
Subclassing the `keras.losses.Loss` class allows us to use custom loss function hyperparameters, and save them along with the model by implementing the `get_config()` method. When the loss has no hyperparameters, a regular Python function will suffice.
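A minimal sketch of such a subclass is a Huber loss with a `threshold` hyperparameter. The class name `HuberLoss` and the parameter name are illustrative choices for this example, not part of Keras.

```python
import tensorflow as tf
from tensorflow import keras

class HuberLoss(keras.losses.Loss):
    def __init__(self, threshold=1.0, **kwargs):
        super().__init__(**kwargs)
        self.threshold = threshold  # hyperparameter to be saved with the model

    def call(self, y_true, y_pred):
        error = y_true - y_pred
        is_small_error = tf.abs(error) < self.threshold
        squared_loss = tf.square(error) / 2
        linear_loss = self.threshold * tf.abs(error) - self.threshold ** 2 / 2
        return tf.where(is_small_error, squared_loss, linear_loss)

    def get_config(self):
        # Add the hyperparameter to the base configuration
        base_config = super().get_config()
        return {**base_config, "threshold": self.threshold}

loss = HuberLoss(threshold=2.0)
y_true = tf.constant([[0.0], [0.0]])
y_pred = tf.constant([[0.5], [3.0]])
print(float(loss(y_true, y_pred)))

# The configuration round-trips, so the threshold is preserved
restored = HuberLoss.from_config(loss.get_config())
```

A loss without hyperparameters would not need any of this: a plain function taking `y_true` and `y_pred` can be passed to `compile()` directly.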

## 6.
Custom metrics can also be defined as regular Python functions. However, for a custom metric to support hyperparameters or to keep state across batches (a streaming metric), it should subclass the `keras.metrics.Metric` class, which gives more flexibility and control.
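The sketch below shows what such a subclass might look like. `WithinTolerance` is a hypothetical metric invented for this example: it tracks the fraction of predictions that fall within `tolerance` of the target, accumulating its counts over successive batches.

```python
import tensorflow as tf
from tensorflow import keras

class WithinTolerance(keras.metrics.Metric):
    def __init__(self, tolerance=0.5, name="within_tolerance", **kwargs):
        super().__init__(name=name, **kwargs)
        self.tolerance = tolerance  # hyperparameter
        # State variables accumulated across batches
        self.hits = self.add_weight(name="hits", shape=(), initializer="zeros")
        self.count = self.add_weight(name="count", shape=(), initializer="zeros")

    def update_state(self, y_true, y_pred, sample_weight=None):
        y_true = tf.cast(y_true, tf.float32)
        y_pred = tf.cast(y_pred, tf.float32)
        within = tf.cast(tf.abs(y_true - y_pred) <= self.tolerance, tf.float32)
        self.hits.assign_add(tf.reduce_sum(within))
        self.count.assign_add(tf.cast(tf.size(within), tf.float32))

    def result(self):
        return self.hits / self.count

    def get_config(self):
        base_config = super().get_config()
        return {**base_config, "tolerance": self.tolerance}

metric = WithinTolerance(tolerance=0.5)
metric.update_state(tf.constant([1.0, 2.0]), tf.constant([1.2, 3.0]))  # 1 of 2 within tolerance
metric.update_state(tf.constant([0.0, 0.0]), tf.constant([0.1, 0.4]))  # 2 of 2 within tolerance
print(float(metric.result()))  # running fraction over both batches
```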

## 7.
The internal components of the model should be distinguished from the model itself. To achieve this, the former should subclass the `keras.layers.Layer` class, while the latter should subclass the `keras.models.Model` class.
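To make the distinction concrete, here is a sketch with a reusable block defined as a `Layer` and the overall network defined as a `Model`. The names `ResidualBlock` and `ResidualRegressor` and the layer sizes are assumptions chosen for illustration.

```python
import tensorflow as tf
from tensorflow import keras

class ResidualBlock(keras.layers.Layer):
    """Internal component: a stack of dense layers with a skip connection."""
    def __init__(self, n_layers, n_neurons, **kwargs):
        super().__init__(**kwargs)
        self.hidden = [keras.layers.Dense(n_neurons, activation="relu")
                       for _ in range(n_layers)]

    def call(self, inputs):
        Z = inputs
        for layer in self.hidden:
            Z = layer(Z)
        return inputs + Z

class ResidualRegressor(keras.models.Model):
    """The model itself: it gains compile(), fit(), evaluate(), and so on."""
    def __init__(self, output_dim, **kwargs):
        super().__init__(**kwargs)
        self.hidden1 = keras.layers.Dense(30, activation="relu")
        self.block1 = ResidualBlock(2, 30)
        self.out = keras.layers.Dense(output_dim)

    def call(self, inputs):
        Z = self.hidden1(inputs)
        Z = self.block1(Z)
        return self.out(Z)

model = ResidualRegressor(output_dim=1)
outputs = model(tf.zeros([4, 8]))
print(outputs.shape)
```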

## 8.
A custom training loop should be written only when it is strictly necessary to have full control over training, since such loops are more error-prone. In most cases, Keras provides more than enough tools to customize training without one.

## 9.
Custom Keras components should be convertible to TF Function, meaning that they should stick to TF operations as much as possible, and respect the rules discussed in the book. If there is no way around it, arbitrary Python can be included, but this will reduce performance and limit the model's portability.

## 10.
Refer to the "TF Function Rules" section of the book for the complete list ;)
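One of these rules can be illustrated with a small sketch: Python side effects inside a TF Function run only while the function is being traced, not on every call. The names `trace_log` and `cube` are made up for this example.

```python
import tensorflow as tf

trace_log = []

@tf.function
def cube(x):
    # Python side effect: executed only when the function is traced
    trace_log.append(x.dtype.name)
    return x ** 3

a = cube(tf.constant(2.0))  # first call with a float32 scalar: traces a graph
b = cube(tf.constant(3.0))  # same dtype and shape: reuses the existing graph
c = cube(tf.constant(4))    # int32 scalar: triggers a new trace
print(trace_log)
```

Three calls produce only two log entries, which is why code that relies on Python side effects (logging, counters, NumPy random numbers) can behave unexpectedly once wrapped in a TF Function.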

## 11.
A dynamic Keras model can be created by setting `dynamic=True` when creating it, or by setting `run_eagerly=True` when calling the model's `compile()` method. A dynamic model can be useful for debugging, since it will not compile any custom component to a TF Function. However, making a model dynamic prevents Keras from using TensorFlow's graph features, so it will slow down training and inference, and limit the model's portability.
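The effect of `run_eagerly=True` can be observed with a loss function that records whether it is executing eagerly. This is a sketch on random data; `logging_mse` and `eager_flags` are hypothetical names introduced for the example.

```python
import numpy as np
import tensorflow as tf
from tensorflow import keras

eager_flags = []

def logging_mse(y_true, y_pred):
    # Record whether this call runs eagerly or inside a graph
    eager_flags.append(tf.executing_eagerly())
    return tf.reduce_mean(tf.square(y_true - y_pred), axis=-1)

X = np.random.rand(8, 3).astype(np.float32)
y = np.random.rand(8, 1).astype(np.float32)

model = keras.models.Sequential([keras.layers.Dense(1)])
model.compile(loss=logging_mse, optimizer="sgd", run_eagerly=True)
model.fit(X, y, epochs=1, batch_size=8, verbose=0)

# Whether the loss was executing eagerly during the training step
print(eager_flags[-1])
```

Since the training step is not compiled to a graph, ordinary Python debugging tools such as `print()` or breakpoints can be used inside custom components.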

## 12.

In [4]:
from tensorflow import keras

class LayerNormalization(keras.layers.Layer):
    def __init__(self, eps=0.001, **kwargs):
        super().__init__(**kwargs)  # initialize the base Layer before setting attributes
        self.eps = eps
        
    def build(self, input_shape):
        self.alpha = self.add_weight(name='alpha', shape=input_shape[-1:], initializer='ones', dtype='float32')
        self.beta = self.add_weight(name='beta', shape=input_shape[-1:], initializer='zeros', dtype='float32')
        super().build(input_shape)
        
    def call(self, X):
        mean, variance = tf.nn.moments(X, axes=-1, keepdims=True)
        return self.alpha * (X - mean) / (tf.sqrt(variance + self.eps)) + self.beta
    
    def get_config(self):
        config = super().get_config()
        return {**config, 'eps': self.eps}
    
# Generate random data
X = np.random.rand(1000, 2).astype(np.float32)

custom_layer = LayerNormalization()
keras_layer = keras.layers.LayerNormalization()

# Compute the MAE for the output of the two layers.
# We can verify that the outputs are similar since the error is very low
tf.reduce_mean(keras.losses.mean_absolute_error(keras_layer(X), custom_layer(X)))

<tf.Tensor: shape=(), dtype=float32, numpy=6.700959e-08>

## 13.

### a.

In [5]:
keras.backend.clear_session()

(X_train_full, y_train_full), (X_test, y_test) = keras.datasets.fashion_mnist.load_data()

X_train_full = X_train_full.astype(np.float32) / 255
X_test = X_test.astype(np.float32) / 255

X_val, X_train = X_train_full[:5000], X_train_full[5000:]
y_val, y_train = y_train_full[:5000], y_train_full[5000:]

In [6]:
model = keras.models.Sequential([
    keras.layers.Flatten(input_shape=[28, 28]),
    keras.layers.Dense(100, activation='relu'),
    keras.layers.Dense(10, activation='softmax')
])

In [7]:
# Training parameters
n_epochs = 5
batch_size = 32
n_steps = len(X_train) // batch_size
optimizer = keras.optimizers.Nadam(learning_rate=0.01)  # `lr` is a deprecated alias
loss_fn = keras.losses.sparse_categorical_crossentropy
mean_loss = keras.metrics.Mean()
metrics = [keras.metrics.SparseCategoricalAccuracy()]

In [8]:
from collections import OrderedDict

for epoch in range(1, n_epochs + 1):
    
    print(f"Epoch {epoch}/{n_epochs}")
    
    for step in range(1, n_steps + 1):
        batch_idx = np.random.randint(len(X_train), size=batch_size)
        X_batch, y_batch = X_train[batch_idx], y_train[batch_idx]
        
        # A non-persistent tape suffices: gradient() is called only once per step
        with tf.GradientTape() as tape:
            y_pred = model(X_batch)
            main_loss = tf.reduce_mean(loss_fn(y_batch, y_pred))
            loss = tf.add_n([main_loss] + model.losses)
            
        gradients = tape.gradient(loss, model.trainable_variables)
        optimizer.apply_gradients(zip(gradients, model.trainable_variables))
        
        for variable in model.variables:
            if variable.constraint is not None:
                variable.assign(variable.constraint(variable))
            
        status = OrderedDict()
        mean_loss(loss)
        status['loss'] = mean_loss.result().numpy()
        
        for metric in metrics:
            metric(y_batch, y_pred)
            status[metric.name] = metric.result().numpy()
            
        print(f"\rIteration {step}/{n_steps} - loss: {status['loss']:.4f} - {metric.name}: {status[metric.name]:.4f}", end='')
            
    y_pred = model(X_val)
    status['val_loss'] = np.mean(loss_fn(y_val, y_pred))
    status['val_accuracy'] = np.mean(keras.metrics.sparse_categorical_accuracy(
        tf.constant(y_val, dtype=np.float32), y_pred))
    
    print(f" - val_loss: {status['val_loss']:.4f} - val_accuracy: {status['val_accuracy']:.4f}")
    
    for metric in [mean_loss] + metrics:
        metric.reset_states()

Epoch 1/5
Iteration 1718/1718 - loss: 0.5052 - sparse_categorical_accuracy: 0.8195 - val_loss: 0.3969 - val_accuracy: 0.8608
Epoch 2/5
Iteration 1718/1718 - loss: 0.4131 - sparse_categorical_accuracy: 0.8530 - val_loss: 0.4044 - val_accuracy: 0.8562
Epoch 3/5
Iteration 1718/1718 - loss: 0.3808 - sparse_categorical_accuracy: 0.8627 - val_loss: 0.4585 - val_accuracy: 0.8536
Epoch 4/5
Iteration 1718/1718 - loss: 0.3702 - sparse_categorical_accuracy: 0.8639 - val_loss: 0.3827 - val_accuracy: 0.8658
Epoch 5/5
Iteration 1718/1718 - loss: 0.3599 - sparse_categorical_accuracy: 0.8702 - val_loss: 0.3920 - val_accuracy: 0.8642


### b.

In [9]:
keras.backend.clear_session()

lower_layers = keras.models.Sequential([
    keras.layers.Flatten(input_shape=[28, 28]),
    keras.layers.Dense(100, activation="relu"),
])
upper_layers = keras.models.Sequential([
    keras.layers.Dense(10, activation="softmax"),
])
model = keras.models.Sequential([
    lower_layers, upper_layers
])

In [10]:
# `learning_rate` replaces the deprecated `lr` alias
lower_optimizer = keras.optimizers.SGD(learning_rate=1e-4)
upper_optimizer = keras.optimizers.Nadam(learning_rate=1e-3)

In [11]:
for epoch in range(1, n_epochs + 1):
    
    print(f"Epoch {epoch}/{n_epochs}")
    
    for step in range(1, n_steps + 1):
        batch_idx = np.random.randint(len(X_train), size=batch_size)
        X_batch, y_batch = X_train[batch_idx], y_train[batch_idx]
        
        with tf.GradientTape(persistent=True) as tape:
            y_pred = model(X_batch)
            main_loss = tf.reduce_mean(loss_fn(y_batch, y_pred))
            loss = tf.add_n([main_loss] + model.losses)
            
        for layers, optimizer in ((lower_layers, lower_optimizer), (upper_layers, upper_optimizer)):
            gradients = tape.gradient(loss, layers.trainable_variables)
            optimizer.apply_gradients(zip(gradients, layers.trainable_variables))
        del tape
        
        for variable in model.variables:
            if variable.constraint is not None:
                variable.assign(variable.constraint(variable))
            
        status = OrderedDict()
        mean_loss(loss)
        status['loss'] = mean_loss.result().numpy()
        
        for metric in metrics:
            metric(y_batch, y_pred)
            status[metric.name] = metric.result().numpy()
            
        print(f"\rIteration {step}/{n_steps} - loss: {status['loss']:.4f} - {metric.name}: {status[metric.name]:.4f}", end='')
            
    y_pred = model(X_val)
    status['val_loss'] = np.mean(loss_fn(y_val, y_pred))
    status['val_accuracy'] = np.mean(keras.metrics.sparse_categorical_accuracy(
        tf.constant(y_val, dtype=np.float32), y_pred))
    
    print(f" - val_loss: {status['val_loss']:.4f} - val_accuracy: {status['val_accuracy']:.4f}")
    
    for metric in [mean_loss] + metrics:
        metric.reset_states()

Epoch 1/5
Iteration 1718/1718 - loss: 1.0429 - sparse_categorical_accuracy: 0.6892 - val_loss: 0.6890 - val_accuracy: 0.7810
Epoch 2/5
Iteration 1718/1718 - loss: 0.6449 - sparse_categorical_accuracy: 0.7821 - val_loss: 0.5885 - val_accuracy: 0.8020
Epoch 3/5
Iteration 1718/1718 - loss: 0.5771 - sparse_categorical_accuracy: 0.8008 - val_loss: 0.5477 - val_accuracy: 0.8124
Epoch 4/5
Iteration 1718/1718 - loss: 0.5441 - sparse_categorical_accuracy: 0.8104 - val_loss: 0.5253 - val_accuracy: 0.8128
Epoch 5/5
Iteration 1718/1718 - loss: 0.5285 - sparse_categorical_accuracy: 0.8143 - val_loss: 0.5140 - val_accuracy: 0.8218
