In [1]:
# suppress tensorflow logging, usually not useful unless you are having problems with tensorflow or accessing gpu
# it seems necessary to have this environment variable set before tensorflow is imported, or else it doesn't take effect
import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3' 

# imports generally useful throughout the notebook
# usually all imports should happen at the top of a notebook, but in
# these notebooks where the purpose is to show how to use the Keras API
# the relevant imports will happen in the cells where the API is discussed
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

# global settings for notebook output and images
plt.rcParams['figure.figsize'] = (8, 8) # set default figure size, 8in by 8in
np.set_printoptions(precision=4, suppress=True)


In [2]:
# import project defined modules / functions used in this notebook
# ensure that the src directory where project modules are found is on
# the PYTHONPATH
import sys
sys.path.append("../src")

# assignment function imports for doctests and github autograding
# these are required for assignment autograding
from nndl import vectorize_samples, plot_history

In [3]:
# to restrict the notebook to the cpu or gpu, configure the visible devices here
dev = tf.config.list_physical_devices()
print('Physical Devices : ', dev)

#tf.config.set_visible_devices(dev[0])
#tf.config.set_visible_devices(dev[1])
#dev = tf.config.list_logical_devices()
print('Available Devices : ', dev)

Physical Devices :  [PhysicalDevice(name='/physical_device:CPU:0', device_type='CPU'), PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
Available Devices :  [PhysicalDevice(name='/physical_device:CPU:0', device_type='CPU'), PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]


# Chapter 7: Working with Keras: A deep dive

Supporting materials for:

Chollet (2021). *Deep Learning with Python*. 2nd ed. Manning Publications Co.
[Amazon](https://www.amazon.com/Learning-Python-Second-Fran%C3%A7ois-Chollet/dp/1617296864/ref=sr_1_1?crid=32NFM2SBCJVQQ)

After the three practical examples from the previous unit, you should be starting to feel
familiar with how to approach classification and regression problems using neural networks.

You've also been introduced to the central problem of machine learning: **overfitting**.

In this unit, we are going to take a more detailed look at the Keras API.  A firmer grasp of
the different `keras` workflows will make the examples and models we develop in upcoming
chapters easier to follow.


In [4]:
# need to reuse function from section 7.3 below
from tensorflow.keras.datasets import mnist

# create a model, separate function for reuse, example
# here again of using Functional API to create the model, though
# it is a simple sequential multi-class classification with a dense
# layer and some dropout regularization
def get_mnist_model():
    inputs = keras.Input(shape=(28 * 28,))
    features = layers.Dense(512, activation="relu")(inputs)
    features = layers.Dropout(0.5)(features)
    outputs = layers.Dense(10, activation="softmax")(features)
    model = keras.Model(inputs, outputs)
    return model

# load and normalize the mnist data, reserving some of the 60000
# in train set for validation
(images, labels), (test_images, test_labels) = mnist.load_data()
images = images.reshape((60000, 28 * 28)).astype("float32") / 255
test_images = test_images.reshape((10000, 28 * 28)).astype("float32") / 255
train_images, val_images = images[10000:], images[:10000]
train_labels, val_labels = labels[10000:], labels[:10000]

## 7.4 Writing your own Training and Evaluation loops

As a reminder, the contents of a typical training loop for supervised learning look like this:

1. Run the forward pass (compute the model’s output) inside a gradient tape to
   obtain a loss value for the current batch of data.
2. Retrieve the gradients of the loss with regard to the model’s weights.
3. Update the model’s weights so as to lower the loss value on the current batch
   of data.

These steps are repeated for as many batches as necessary.  This is essentially what the `fit()`
method does under the hood.
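
To make the three steps concrete without any framework, here is a minimal sketch that fits a one-parameter model `y = w * x` by gradient descent. The function name `toy_train_step`, the data, and the learning rate are all made up for illustration, and the gradient is derived by hand rather than recorded by a gradient tape, so this only mirrors the shape of the loop, not how Keras implements it.

```python
# Toy model y = w * x, trained with plain gradient descent.
# Everything here (names, data, learning rate) is illustrative only.
def toy_train_step(w, xs, ys, learning_rate=0.1):
    n = len(xs)
    # 1. forward pass: predictions and mean squared error for this batch
    preds = [w * x for x in xs]
    loss = sum((p - y) ** 2 for p, y in zip(preds, ys)) / n
    # 2. gradient of the loss with respect to the weight w
    grad = sum(2 * (p - y) * x for p, y, x in zip(preds, ys, xs)) / n
    # 3. update the weight so as to lower the loss on this batch
    return w - learning_rate * grad, loss


# two small batches whose targets follow y = 3x
toy_batches = [([1.0, 2.0], [3.0, 6.0]), ([3.0, 0.5], [9.0, 1.5])]
w = 0.0
toy_losses = []
for epoch in range(20):
    for xs, ys in toy_batches:
        w, loss = toy_train_step(w, xs, ys)
        toy_losses.append(loss)
```

After the loop, `w` should sit very close to 3 and the recorded losses should have shrunk, which is the same behavior we expect from repeated batches in a real training loop.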

In this section we look at reimplementing `fit()` from scratch, so that if needed you can add
bells and whistles or implement any training algorithm you may need to.

### 7.4.1 Training versus inference

Some Keras layers, such as the `Dropout` layer that you saw, behave differently during *training* and during
*inference*.  For such layers you need to pass `training=True` when calling the model in the training
forward pass, and `training=False` during inference.  Written by hand, the training forward pass therefore
usually looks like this:

```python
predictions = model(inputs, training=True)
```
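
As a rough illustration of why the flag matters, the sketch below implements a toy version of inverted dropout on a plain Python list. The function `toy_dropout` is hypothetical and is not the Keras implementation; it only assumes the usual behavior that, during training, each value is zeroed with probability `rate` and the survivors are scaled by `1 / (1 - rate)`, while at inference the input passes through unchanged.

```python
import random


def toy_dropout(values, rate, training, rng=random):
    """Toy inverted dropout over a list of floats (illustrative only)."""
    if not training:
        # inference: return the inputs unchanged
        return list(values)
    keep_scale = 1.0 / (1.0 - rate)
    # training: drop each value with probability `rate`, scale the rest
    return [0.0 if rng.random() < rate else v * keep_scale for v in values]


rng = random.Random(0)  # seeded generator so runs are repeatable
activations = [1.0, 2.0, 3.0, 4.0]
train_out = toy_dropout(activations, rate=0.5, training=True, rng=rng)
infer_out = toy_dropout(activations, rate=0.5, training=False, rng=rng)
```

With `training=False` the output equals the input, whereas with `training=True` every entry is either zero or twice its original value, so forgetting the flag silently changes what the model computes.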

In addition, when retrieving the gradients for the backward pass, you should ask only for the gradients of the trainable weights:

```python
tape.gradient(loss, model.trainable_weights)
```

because layers and models own two kinds of weights:

- **Trainable weights**: These are meant to be updated via backpropagation to minimize
  the loss of the model, such as the weights and biases of a `Dense` layer.
- **Non-trainable weights**: These are meant to be updated during the forward pass
  by the layers that own them.
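
The distinction can be sketched with a hypothetical `ToyLayer` written in plain Python. It is not a Keras layer: the dictionaries and the `batches_seen` counter are assumptions chosen only to show one kind of state a layer might update by itself during the forward pass, while the optimizer-style update touches the trainable weights alone.

```python
class ToyLayer:
    """Hypothetical layer computing w * x + b (illustrative, not Keras)."""

    def __init__(self):
        # updated from gradients by the optimizer
        self.trainable_weights = {"w": 0.5, "b": 0.0}
        # updated by the layer itself during the forward pass
        self.non_trainable_weights = {"batches_seen": 0}

    def __call__(self, xs, training=False):
        if training:
            self.non_trainable_weights["batches_seen"] += 1
        w, b = self.trainable_weights["w"], self.trainable_weights["b"]
        return [w * x + b for x in xs]


def toy_sgd_update(layer, gradients, learning_rate=0.1):
    # only the trainable weights receive gradient updates
    for name, grad in gradients.items():
        layer.trainable_weights[name] -= learning_rate * grad


toy_layer = ToyLayer()
toy_layer([1.0, 2.0], training=True)    # forward pass bumps the counter
toy_layer([1.0, 2.0], training=False)   # inference leaves it alone
toy_sgd_update(toy_layer, {"w": 1.0, "b": -2.0})  # made-up gradients
```

The counter changed because of the forward pass, and `w` and `b` changed because of the update step; neither mechanism touched the other kind of weight.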

So given these ideas, a supervised-learning step in Keras ends up looking like the following:

```python
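def train_step(inputs, targets):
    # record the forward pass on the tape so gradients can be computed
    with tf.GradientTape() as tape:
        predictions = model(inputs, training=True)
        loss = loss_fn(targets, predictions)
    # backward pass: gradients only for the trainable weights
    gradients = tape.gradient(loss, model.trainable_weights)
    # apply_gradients expects (gradient, weight) pairs
    optimizer.apply_gradients(zip(gradients, model.trainable_weights))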
```

### 7.4.2 Low-level usage of metrics

In a low-level training loop, you will probably want to leverage Keras metrics. 
To use a metric by hand, simply call `update_state(y_true, y_pred)` on the metric object
for each batch of targets and predictions, and then use `result()` to query the current metric value:

In [5]:
metric = keras.metrics.SparseCategoricalAccuracy()
targets = [0, 1, 2]
predictions = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
metric.update_state(targets, predictions)
current_result = metric.result()
print(f"result: {current_result:.2f}")


result: 1.00
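
For intuition about what this metric measures, here is a small framework-free sketch. The helper `toy_sparse_categorical_accuracy` is a hypothetical name, and the code is a plain-Python approximation of the idea (integer targets compared against the index of the highest score in each prediction row), not the Keras implementation.

```python
def toy_sparse_categorical_accuracy(targets, predictions):
    """Fraction of rows whose highest-scoring class equals the integer target."""
    hits = 0
    for target, scores in zip(targets, predictions):
        # index of the largest score in this row
        predicted_class = max(range(len(scores)), key=scores.__getitem__)
        hits += int(predicted_class == target)
    return hits / len(targets)


toy_targets = [0, 1, 2]
perfect = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
one_miss = [[0.9, 0.1, 0.0], [0.2, 0.7, 0.1], [0.6, 0.3, 0.1]]
acc_perfect = toy_sparse_categorical_accuracy(toy_targets, perfect)
acc_one_miss = toy_sparse_categorical_accuracy(toy_targets, one_miss)
```

The first set of predictions matches every target, as in the cell above, while the second gets the last sample wrong and so scores two out of three.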




You may also need to track the average of a scalar value, such as the model's loss.  You
can do this using the `keras.metrics.Mean` metric:

In [6]:
values = [0, 1, 2, 3, 4]
mean_tracker = keras.metrics.Mean()
for value in values:
    mean_tracker.update_state(value)
print(f"Mean of values: {mean_tracker.result():.2f}")

Mean of values: 2.00


Remember to use `metric.reset_state()` when you want to reset the current results at the start of a training
epoch or at the start of evaluation.
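
The reason resetting matters is that metrics are stateful: they accumulate across calls to `update_state()`. The class below is a toy stand-in written for this note (the name `StreamingMean` and its attributes are assumptions, not Keras internals) that mimics the `update_state()` / `result()` / `reset_state()` protocol.

```python
class StreamingMean:
    """Toy stateful mean tracker (illustrative, not the Keras implementation)."""

    def __init__(self):
        self.reset_state()

    def reset_state(self):
        # forget everything accumulated so far
        self.total = 0.0
        self.count = 0

    def update_state(self, value):
        self.total += value
        self.count += 1

    def result(self):
        return self.total / self.count if self.count else 0.0


tracker = StreamingMean()
for value in [0, 1, 2, 3, 4]:
    tracker.update_state(value)
epoch_one = tracker.result()

tracker.reset_state()        # start of a new "epoch"
tracker.update_state(10)
epoch_two = tracker.result()

# without the reset, the old values would still count toward the average
stale = StreamingMean()
for value in [0, 1, 2, 3, 4, 10]:
    stale.update_state(value)
without_reset = stale.result()
```

Comparing `epoch_two` with `without_reset` shows how a forgotten reset blends results from a previous epoch into the current one.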

### 7.4.3 A complete training and evaluation loop

As an example, let's combine the forward pass, backward pass and metrics tracking into a `fit()`-like
training step function that takes a batch of data and targets and returns the logs that
would get displayed by the `fit()` progress bar.

In [7]:
model = get_mnist_model()

# prepare the loss function, optimizer and list of metrics to monitor
loss_fn = keras.losses.SparseCategoricalCrossentropy()
optimizer = keras.optimizers.RMSprop()
metrics = [keras.metrics.SparseCategoricalAccuracy()]

# use a keras.metrics.Mean to track the loss average
loss_tracking_metric = keras.metrics.Mean()


def train_step(inputs, targets):
    with tf.GradientTape() as tape:
        predictions = model(inputs, training=True)
        loss = loss_fn(targets, predictions)
    gradients = tape.gradient(loss, model.trainable_weights)
    optimizer.apply_gradients(zip(gradients, model.trainable_weights))

    # keep track of metrics
    logs = {}
    for metric in metrics:
        metric.update_state(targets, predictions)
        logs[metric.name] = metric.result()

    # keep track of the loss average
    loss_tracking_metric.update_state(loss)
    logs["loss"] = loss_tracking_metric.result()

    # return the current values of the metrics and loss
    return logs

We need to reset the state of our metrics at the start of each epoch (the above function is called to update
for each batch created during an epoch of training).

In [8]:
def reset_metrics():
    for metric in metrics:
        metric.reset_state()
    # reset the loss tracker once, outside the loop over metrics
    loss_tracking_metric.reset_state()

We can now lay out our complete training loop.  Note that we use a `tf.data.Dataset` object to turn
our NumPy data into an iterator that iterates over the data in batches of size 32.

In [9]:
training_dataset = tf.data.Dataset.from_tensor_slices(
    (train_images, train_labels))
training_dataset = training_dataset.batch(32)
epochs = 3

for epoch in range(epochs):
    reset_metrics()
    for inputs_batch, targets_batch in training_dataset:
        logs = train_step(inputs_batch, targets_batch)
    print(f"Results at the end of epoch {epoch}")
    for key, value in logs.items():
        print(f"...{key}: {value:.4f}")

Results at the end of epoch 0
...sparse_categorical_accuracy: 0.9136
...loss: 0.2886
Results at the end of epoch 1
...sparse_categorical_accuracy: 0.9540
...loss: 0.1623
Results at the end of epoch 2
...sparse_categorical_accuracy: 0.9639
...loss: 0.1291
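
To get a feel for how many times `train_step()` runs per epoch, the following sketch batches 50,000 sample indices in plain Python. The generator `iterate_batches` is a hypothetical helper, not part of `tf.data`; it only assumes the default behavior that the final, shorter batch is kept rather than dropped.

```python
import math


def iterate_batches(samples, batch_size):
    """Yield consecutive slices of `samples`; the last one may be shorter."""
    for start in range(0, len(samples), batch_size):
        yield samples[start:start + batch_size]


num_samples, batch_size = 50_000, 32   # training set size used above
batch_lengths = [len(b) for b in iterate_batches(range(num_samples), batch_size)]
steps_per_epoch = math.ceil(num_samples / batch_size)
```

Here `steps_per_epoch` comes out to 1563, with a final batch of only 16 samples, which matches the 1563 steps per epoch that the `fit()` progress bar reports later in this notebook.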


To complete the by-hand example, we also need an evaluation loop: a simple loop that repeatedly
calls a `test_step()` function, which processes a single batch of data.

In [10]:
# this is really a subset of the train_step, just omitting the
# calculation of gradients and the weight update.  Notice that
# training=False
def test_step(inputs, targets):
    predictions = model(inputs, training=False)
    loss = loss_fn(targets, predictions)
    logs = {}
    for metric in metrics:
        metric.update_state(targets, predictions)
        logs["val_" + metric.name] = metric.result()
    loss_tracking_metric.update_state(loss)
    logs["val_loss"] = loss_tracking_metric.result()
    return logs


val_dataset = tf.data.Dataset.from_tensor_slices((val_images, val_labels))
val_dataset = val_dataset.batch(32)
reset_metrics()

for inputs_batch, targets_batch in val_dataset:
    logs = test_step(inputs_batch, targets_batch)
print("Evaluation results:")
for key, value in logs.items():
    print(f"...{key}: {value:.4f}")

Evaluation results:
...val_sparse_categorical_accuracy: 0.9666
...val_loss: 0.1241


With that, we have reimplemented the core of `fit()` and `evaluate()`.  The built-in methods support many more features,
including large-scale distributed computation, which would take a bit more work to replicate.  They are also faster than the
hand-written loops above.

### 7.4.4 Make it fast with tf.function

You may have noticed that your custom loops are running significantly slower than the
built-in `fit()` and `evaluate()`, despite implementing essentially the same logic.
That’s because, by default, TensorFlow code is executed line by line, *eagerly*,
i.e. it is interpreted.

It is more performant to **compile** your TensorFlow code into a
**computation graph** that can be globally optimized in a way that code 
interpreted line by line cannot.  Luckily we can simply add the `@tf.function`
decorator to any function to indicate this should be done.

In [13]:
@tf.function
def train_step(inputs, targets):
    with tf.GradientTape() as tape:
        predictions = model(inputs, training=True)
        loss = loss_fn(targets, predictions)
    gradients = tape.gradient(loss, model.trainable_weights)
    optimizer.apply_gradients(zip(gradients, model.trainable_weights))

    # keep track of metrics
    logs = {}
    for metric in metrics:
        metric.update_state(targets, predictions)
        logs[metric.name] = metric.result()

    # keep track of the loss average
    loss_tracking_metric.update_state(loss)
    logs["loss"] = loss_tracking_metric.result()

    # return the current values of the metrics and loss
    return logs
    
# rerun the same training loop, now calling the compiled train_step

for epoch in range(epochs):
    reset_metrics()
    for inputs_batch, targets_batch in training_dataset:
        logs = train_step(inputs_batch, targets_batch)
    print(f"Results at the end of epoch {epoch}")
    for key, value in logs.items():
        print(f"...{key}: {value:.4f}")

Results at the end of epoch 0
...sparse_categorical_accuracy: 0.9757
...loss: 0.0910
Results at the end of epoch 1
...sparse_categorical_accuracy: 0.9780
...loss: 0.0824
Results at the end of epoch 2
...sparse_categorical_accuracy: 0.9794
...loss: 0.0781


On my system, this goes from taking many seconds to run to returning almost immediately
once the `@tf.function` decorator is added.

Remember, while you are debugging your code, prefer running it eagerly, without
any `@tf.function` decorator. It’s easier to track bugs this way. Once your code is working
and you want to make it fast, add a `@tf.function` decorator to your training step
and your evaluation step, or to any other performance-critical function.

### 7.4.5 Leveraging `fit()` with a custom training loop

What if you need a custom training algorithm, but you still want to leverage the
power of the built-in Keras training logic? There’s actually a middle ground between
`fit()` and a training loop written from scratch: you can provide a custom training
step function and let the framework do the rest.

Here is a simple example:

- We create a new class that subclasses `keras.Model`.
- We override the method `train_step(self, data)`.
- We implement a `metrics` property that tracks the model’s `Metric` instances.

In [14]:
# the loss_tracker will be used to track the average of per-batch losses during training
loss_fn = keras.losses.SparseCategoricalCrossentropy()
loss_tracker = keras.metrics.Mean(name="loss")

# our own Model so can override train_step
class CustomModel(keras.Model):

    # override the train_step method, this is basically the same as was developed
    # in previous example
    def train_step(self, data):
        inputs, targets = data
        with tf.GradientTape() as tape:
            # but using self(inputs, training=True) instead of model() since our
            # model is this class instance itself
            predictions = self(inputs, training=True)
            loss = loss_fn(targets, predictions)

        # NOTE: typo in textbook, these need to be self.trainable_weights
        # and self.optimizer.apply_gradients();
        # model.trainable_weights would refer to an external variable
        gradients = tape.gradient(loss, self.trainable_weights)
        self.optimizer.apply_gradients(zip(gradients, self.trainable_weights))

        # update the loss tracker metric that tracks the average of the loss
        loss_tracker.update_state(loss)
        # return the average loss so far by querying the loss tracker metric
        return {"loss": loss_tracker.result()}

    # any metric you would like to reset across epochs should be listed here
    @property
    def metrics(self):
        return [loss_tracker]

We can now instantiate our custom model, compile it, and train it using `fit()` as usual:

In [15]:
# todo getting "numpy() is only available when eager execution is enabled.", though using TensorFlow 2.x and it appears enabled
# need to debug this example further.
#import tensorflow as tf
#tf.compat.v1.enable_eager_execution()

inputs = keras.Input(shape=(28 * 28,))
features = layers.Dense(512, activation="relu")(inputs)
features = layers.Dropout(0.5)(features)
outputs = layers.Dense(10, activation="softmax")(features)
model = CustomModel(inputs, outputs)

model.compile(optimizer=keras.optimizers.RMSprop())
model.fit(train_images, train_labels, epochs=3)

Epoch 1/3
1563/1563 ━━━━━━━━━━━━━━━━━━━━ 7s 3ms/step - loss: 0.4455
Epoch 2/3
1563/1563 ━━━━━━━━━━━━━━━━━━━━ 4s 2ms/step - loss: 0.1641
Epoch 3/3
1563/1563 ━━━━━━━━━━━━━━━━━━━━ 4s 3ms/step - loss: 0.1260


<keras.src.callbacks.history.History at 0x7cde39541160>

A couple of points to note:

- This pattern does not prevent you from building models with the Functional
  API. You can do this whether you’re building Sequential models, Functional
  API models, or subclassed models.
- You don’t need to use a `@tf.function` decorator when you override `train_step()`;
  the framework does it for you.
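
The division of labor can be sketched without Keras at all. In the toy code below, `ToyModel` plays the role of the framework base class and `MeanEstimator` is the user's subclass; both names, the data, and the 0.25 step size are assumptions made up for this illustration. The base class owns the epoch loop and the metric resets, and the subclass supplies only `train_step()`.

```python
class ToyModel:
    """Hypothetical stand-in for a framework base class; not the Keras API."""

    def fit(self, batches, epochs):
        history = []
        for _ in range(epochs):
            self.reset_metrics()
            for batch in batches:
                logs = self.train_step(batch)  # the only user-supplied piece
            history.append(logs)
        return history

    def reset_metrics(self):
        pass

    def train_step(self, batch):
        raise NotImplementedError


class MeanEstimator(ToyModel):
    """Learns one number `mu` that minimizes squared error to the data."""

    def __init__(self):
        self.mu = 0.0
        self.reset_metrics()

    def reset_metrics(self):
        self.loss_sum, self.steps = 0.0, 0

    def train_step(self, batch):
        # forward pass and hand-derived gradient for this batch
        loss = sum((self.mu - x) ** 2 for x in batch) / len(batch)
        grad = sum(2 * (self.mu - x) for x in batch) / len(batch)
        self.mu -= 0.25 * grad
        # running average of the per-batch loss within the epoch
        self.loss_sum += loss
        self.steps += 1
        return {"loss": self.loss_sum / self.steps}


toy_model = MeanEstimator()
toy_history = toy_model.fit(batches=[[1.0, 3.0], [2.0, 2.0]], epochs=10)
```

The subclass never writes an epoch loop or a reset call, yet it fully controls what happens for each batch, which is the same trade-off that overriding `train_step()` gives you with `fit()`.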

Now what about metrics, and what about configuring the loss via `compile()`?

After calling `compile()` you get access to the following:

- **self.compiled_loss** The loss function you passed to `compile()`.
- **self.compiled_metrics** A wrapper for the list of metrics you passed, which
  allows you to call `self.compiled_metrics.update_state()` to update all of
  your metrics at once.
- **self.metrics** The actual list of metrics you passed to `compile()`. Note that it
  also includes a metric that tracks the loss, similar to what we did manually with
  our `loss_tracking_metric` earlier.

We can thus modify our `CustomModel` to integrate more tightly with the Keras workflow:

In [None]:
class CustomModel(keras.Model):
    def train_step(self, data):
        inputs, targets = data
        with tf.GradientTape() as tape:
            predictions = self(inputs, training=True)
            # compute the loss via the one given to the compile() method
            loss = self.compiled_loss(targets, predictions)
            
        # same note as before, need to be self.trainable_weights and self.compiled_metrics,
        # looks like bug in book
        gradients = tape.gradient(loss, self.trainable_weights)
        self.optimizer.apply_gradients(zip(gradients, self.trainable_weights))
        
        # likewise use the metrics from compile() and update
        self.compiled_metrics.update_state(targets, predictions)
            
        # return a dict mapping metric names to their current values
        return {m.name: m.result() for m in self.metrics}

Let's try it

In [16]:
# todo getting "numpy() is only available when eager execution is enabled.", though using TensorFlow 2.x and it appears enabled
# need to debug this example further.
inputs = keras.Input(shape=(28 * 28,))
features = layers.Dense(512, activation="relu")(inputs)
features = layers.Dropout(0.5)(features)
outputs = layers.Dense(10, activation="softmax")(features)
model = CustomModel(inputs, outputs)

# a more normal looking invocation of compile, give the optimizer, loss
# and metrics 
model.compile(optimizer=keras.optimizers.RMSprop(),
              loss=keras.losses.SparseCategoricalCrossentropy(),
              metrics=[keras.metrics.SparseCategoricalAccuracy()])

model.fit(train_images, train_labels, epochs=3)

Epoch 1/3
1563/1563 ━━━━━━━━━━━━━━━━━━━━ 5s 3ms/step - loss: 0.4491
Epoch 2/3
1563/1563 ━━━━━━━━━━━━━━━━━━━━ 5s 3ms/step - loss: 0.1629
Epoch 3/3
1563/1563 ━━━━━━━━━━━━━━━━━━━━ 4s 3ms/step - loss: 0.1333


<keras.src.callbacks.history.History at 0x7cde38997aa0>

## Summary

Some of the important things to keep in mind about writing your own training and evaluation loops:

<font color='blue'>

- A typical training loop runs the forward pass on a batch inside a gradient tape to obtain the loss, retrieves the gradients of
  the loss with respect to the model's weights, and updates the weights to lower the loss on the current batch of data.
- Layers and models have trainable and non-trainable weights.
- To get the expected performance, *compile* your TensorFlow code into a **computation graph** after debugging it eagerly.  Use the `@tf.function` decorator.
- A good middle ground if you need something special in your training loop is to provide a custom training step function and let the framework do the rest.

# Chapter Summary

Things to remember about advanced features of working with the Keras API

<font color='blue'>

- Keras offers a spectrum of different workflows, and they all interoperate smoothly.
- You can build models via the `Sequential` class, via the Functional API, or by
  subclassing the `Model` class.  Most of the time you'll be using the Functional API as it gives a
  good balance between convenience and ability to modify for specific needs.
- The simplest way to train and evaluate a model is via the default `fit()` and
  `evaluate()` methods.
- Keras callbacks provide a simple way to monitor models during your call to
  `fit()` and automatically take action based on the state of the model.
- You can also fully take control of what `fit()` does by overriding the
  `train_step()` method.
