# Lab 8
## Implementing Softmax Regression
In this lab we will apply softmax regression to the MNIST dataset of handwritten digits.

First, let's import all needed packages and load the MNIST dataset.


In [None]:
import numpy as np
import tensorflow as tf

# Load MNIST data
mnist = tf.keras.datasets.mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()


### Prepare images
The images come as 28x28 matrices of uint8 values, and the dataset has no validation split. Therefore, we first convert each image into a float32 array of shape (784,) whose values lie between 0.0 and 1.0. Additionally, we split off a validation dataset of 5,000 examples from the training dataset.

In [None]:
# Scale images
x_train = x_train / 255.0
x_test = x_test / 255.0

# Flatten images
x_train = x_train.reshape([len(x_train), -1]).astype("float32")
x_test = x_test.reshape([len(x_test), -1]).astype("float32")

# Split off validation dataset from training dataset
indices = np.random.choice(len(y_train), 5000, replace=False)
x_valid = x_train[indices, :]
y_valid = y_train[indices]
x_train = np.delete(x_train, indices, axis=0)
y_train = np.delete(y_train, indices, axis=0)
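To check the preprocessing without downloading MNIST, the following NumPy-only sketch applies the same scaling, flattening and validation split to a small synthetic uint8 array. The array sizes, the variable names and the seeded `np.random.default_rng` generator are assumptions for illustration; the cell above draws its indices with `np.random.choice` and no seed, so its split changes on every run.

```python
import numpy as np

# Hypothetical stand-in for MNIST: 100 random 28x28 uint8 "images" with labels 0-9
rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(100, 28, 28), dtype=np.uint8)
labels = rng.integers(0, 10, size=100)

# Scale to [0, 1] and flatten each 28x28 image into a float32 vector of length 784
x = (images / 255.0).reshape(len(images), -1).astype("float32")

# Split off 20 validation examples, drawn without replacement so none is duplicated
val_idx = rng.choice(len(labels), 20, replace=False)
x_val, y_val = x[val_idx], labels[val_idx]
x_tr = np.delete(x, val_idx, axis=0)
y_tr = np.delete(labels, val_idx, axis=0)

print(x.shape, x.dtype)         # (100, 784) float32
print(x_tr.shape, x_val.shape)  # (80, 784) (20, 784)
```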

### Create Datasets
Next we convert the image labels (y_train, y_valid and y_test) into one-hot vectors as described in subsection 2.1. All three splits are then converted to a tf.data.Dataset, which allows us to apply preprocessing operations (batching for all splits, and shuffling for the training split) and lets us efficiently loop over each dataset split. For more information about TensorFlow input pipelines see [here](https://www.tensorflow.org/guide/data).

In [None]:
# Convert labels to one-hot tensor
y_train = tf.one_hot(y_train, 10)
y_test = tf.one_hot(y_test, 10)
y_valid = tf.one_hot(y_valid, 10)

# Create datasets
BATCH_SIZE = 32
train_ds = (
    tf.data.Dataset.from_tensor_slices((x_train, y_train))
    .shuffle(len(y_train), reshuffle_each_iteration=True)
    .batch(BATCH_SIZE)
)
valid_ds = tf.data.Dataset.from_tensor_slices((x_valid, y_valid)).batch(BATCH_SIZE)
test_ds = tf.data.Dataset.from_tensor_slices((x_test, y_test)).batch(BATCH_SIZE)
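`tf.one_hot(y, 10)` turns each integer label into a length-10 vector with a 1 at the label's index and 0 elsewhere. The NumPy sketch below reproduces that encoding with an identity-matrix lookup (a swapped-in technique for illustration, not what the lab uses) and works out how many mini-batches `batch(BATCH_SIZE)` yields per epoch, assuming the 55,000 training examples that remain after 5,000 of MNIST's 60,000 training images are moved to the validation split.

```python
import math
import numpy as np

labels = np.array([3, 0, 9], dtype=np.uint8)

# Row i of the 10x10 identity matrix is the one-hot vector for class i
one_hot = np.eye(10, dtype=np.float32)[labels]
print(one_hot)

# batch() keeps the final, smaller batch by default (drop_remainder=False)
n_train = 60000 - 5000
BATCH_SIZE = 32
n_batches = math.ceil(n_train / BATCH_SIZE)
last_batch = n_train - (n_batches - 1) * BATCH_SIZE
print(n_batches, last_batch)  # 1719 24
```

So each training epoch consists of 1,718 full batches plus one batch of 24 examples.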


### Define Model
As you can see in the above code fragment, we use a mini-batch size of 32.
Now we finally get to build our softmax regression model. Compared to TensorFlow 1, defining a model in TensorFlow 2 is very simple. In fact, there are several ways to define a model in TensorFlow 2 (see [here](https://www.tensorflow.org/guide/keras/sequential_model) for more information). Here we will use the subclassing method, whereby we inherit from the class tf.keras.Model.
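For comparison, the same two-layer model can also be expressed with the Sequential API, sketched below. This variant is not used in the rest of the lab, and the name `model_seq` is hypothetical. Declaring the input shape up front lets Keras create the weights immediately, so the parameter count can be checked right away: 784 × 10 weights plus 10 biases.

```python
import tensorflow as tf

# Sketch: softmax regression built with the Sequential API instead of subclassing
model_seq = tf.keras.Sequential([
    tf.keras.Input(shape=(784,)),              # flattened 28x28 image
    tf.keras.layers.Dense(10, use_bias=True),  # logits for the 10 digit classes
    tf.keras.layers.Softmax(),                 # logits -> class probabilities
])

print(model_seq.count_params())  # 7850
```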

In [None]:
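class MyModel(tf.keras.Model):
    def __init__(self):
        super().__init__()
        # Fully connected layer: computes the logits x @ W + b for the 10 classes
        self.d1 = tf.keras.layers.Dense(10, use_bias=True)
        # Converts the logits into class probabilities that sum to 1
        self.s1 = tf.keras.layers.Softmax()

    def call(self, x):
        # Forward pass: linear layer followed by softmax
        x = self.d1(x)
        return self.s1(x)

# Create an instance of the model
model = MyModel()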

Thanks to this inheritance we can use the methods of the parent class tf.keras.Model (you might know this concept from object-oriented programming in Python). In the class constructor __init__() we define all the layers we need to build the model. For softmax regression we need a Dense layer (for the matrix multiplication and bias addition) and a Softmax layer (to convert the outputs into probabilities). These layers are then applied in the correct order in the call() method of the same class. Finally, we instantiate an object of this child class. Note that call() overrides a method of the parent class and is invoked whenever the instance model is called, e.g. model(images).
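In plain NumPy, the forward pass of this model is `softmax(x @ W + b)`: the Dense layer produces one logit per class and the Softmax layer exponentiates and normalises each row. The sketch below mirrors that computation on a random mini-batch. The small normal-distributed weights are an illustrative choice (Keras initialises Dense kernels differently by default), and subtracting the row maximum is a standard numerical-stability trick that does not change the result.

```python
import numpy as np

def softmax(z):
    # Subtracting the row max leaves the probabilities unchanged but avoids overflow in exp
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
x = rng.random((32, 784), dtype=np.float32)                   # one mini-batch of flattened images
W = rng.normal(0.0, 0.01, size=(784, 10)).astype(np.float32)  # Dense weight matrix
b = np.zeros(10, dtype=np.float32)                            # Dense bias vector (use_bias=True)

probs = softmax(x @ W + b)         # shape (32, 10); each row sums to 1
pred_class = probs.argmax(axis=1)  # predicted digit for each image
```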

### Define Loss, Optimizer and Metrics
Next we need to define a loss function (cross-entropy in our case) and a way to minimize it (stochastic gradient descent with a learning rate of 0.01 in our case). Furthermore, we instantiate two metric objects for each dataset split, one for the loss and one for the accuracy, which lets us easily accumulate the results over all iterations.
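Written out for a mini-batch of $N$ examples $x_n$ with one-hot labels $y_n$ and predictions $\hat{y}_n = \mathrm{softmax}(x_n W + b)$, the mean categorical cross-entropy, its gradients and the plain SGD update with learning rate $\eta = 0.01$ are given below. This assumes Keras' defaults of averaging the loss over the batch and using no momentum in `SGD`; $X$, $Y$ and $\hat{Y}$ stack the $x_n$, $y_n$ and $\hat{y}_n$ as rows.

$$
L(W, b) = -\frac{1}{N} \sum_{n=1}^{N} \sum_{k=1}^{10} y_{nk} \log \hat{y}_{nk},
\qquad
\frac{\partial L}{\partial W} = \frac{1}{N} X^\top (\hat{Y} - Y),
\qquad
\frac{\partial L}{\partial b} = \frac{1}{N} \sum_{n=1}^{N} (\hat{y}_n - y_n)
$$

$$
W \leftarrow W - \eta \, \frac{\partial L}{\partial W},
\qquad
b \leftarrow b - \eta \, \frac{\partial L}{\partial b}
$$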

In [None]:
# Choose an optimizer and loss function for training:
loss_object = tf.keras.losses.CategoricalCrossentropy()
optimizer = tf.keras.optimizers.SGD(learning_rate=0.01)

# Select metrics to measure the loss and the accuracy of the model.
# These metrics accumulate values over all batches until reset_state() is called;
# result() then returns the aggregated value.
train_loss = tf.keras.metrics.Mean(name='train_loss')
train_accuracy = tf.keras.metrics.CategoricalAccuracy(name='train_accuracy')

valid_loss = tf.keras.metrics.Mean(name='valid_loss')
valid_accuracy = tf.keras.metrics.CategoricalAccuracy(name='valid_accuracy')

test_loss = tf.keras.metrics.Mean(name='test_loss')
test_accuracy = tf.keras.metrics.CategoricalAccuracy(name='test_accuracy')
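The metric objects are stateful: every call adds to running totals until `reset_state()` is called, and `result()` returns the aggregate. The NumPy sketch below imitates that behaviour with two hypothetical helper classes, `RunningMean` and `RunningAccuracy` (they are not Keras classes). It also highlights a subtlety of how the lab uses them: `train_loss(loss)` receives one scalar per batch, so `Mean` averages per-batch losses and gives a smaller last batch the same weight as a full one, whereas `CategoricalAccuracy` counts individual examples.

```python
import numpy as np

class RunningMean:
    """Hypothetical stand-in for tf.keras.metrics.Mean: averages every value passed in."""
    def __init__(self):
        self.reset_state()
    def reset_state(self):
        self.total, self.count = 0.0, 0
    def __call__(self, value):
        values = np.atleast_1d(value)
        self.total += float(values.sum())
        self.count += values.size
    def result(self):
        return self.total / self.count

class RunningAccuracy:
    """Hypothetical stand-in for CategoricalAccuracy: fraction of examples whose argmax matches."""
    def __init__(self):
        self.reset_state()
    def reset_state(self):
        self.correct, self.count = 0, 0
    def __call__(self, labels, predictions):
        hits = labels.argmax(axis=1) == predictions.argmax(axis=1)
        self.correct += int(hits.sum())
        self.count += hits.size
    def result(self):
        return self.correct / self.count

loss_metric = RunningMean()
loss_metric(0.5)   # mean loss of a full batch
loss_metric(0.1)   # mean loss of a smaller final batch counts just as much
print(loss_metric.result())  # 0.3

acc_metric = RunningAccuracy()
eye = np.eye(3)
acc_metric(eye[[0, 1]], eye[[0, 2]])  # batch 1: 1 of 2 examples correct
acc_metric(eye[[2]], eye[[2]])        # batch 2: 1 of 1 example correct
print(acc_metric.result())            # 2 of 3 examples correct overall
```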

### Training, Validation and Testing step
The training, validation and testing process is performed with mini-batches of 32 data points. Next we have to specify the operations that need to be carried out on such a mini-batch.
A training step looks similar to what we've seen in last week's lab. We compute the gradients of the loss with respect to all trainable variables of our model, and the optimizer then subtracts these gradients, scaled by the learning rate, from the variables. Additionally, we accumulate the loss and accuracy of each training step by calling the train_loss and train_accuracy objects, respectively.
In a validation or testing step we simply calculate the loss and accuracy and accumulate the values.

In [None]:
# Use tf.GradientTape to train the model
@tf.function
def train_step(images, labels):
    with tf.GradientTape() as tape:
        # training=True is only needed if there are layers with different
        # behavior during training versus inference (e.g. Dropout).
        predictions = model(images, training=True)
        loss = loss_object(labels, predictions)

    gradients = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(gradients, model.trainable_variables))

    train_loss(loss)
    train_accuracy(labels, predictions)

# Validate the model
@tf.function
def valid_step(images, labels):
    # training=False is only needed if there are layers with different
    # behavior during training versus inference (e.g. Dropout).
    predictions = model(images, training=False)
    t_loss = loss_object(labels, predictions)

    valid_loss(t_loss)
    valid_accuracy(labels, predictions)

# Test the model
@tf.function
def test_step(images, labels):
    # training=False is only needed if there are layers with different
    # behavior during training versus inference (e.g. Dropout).
    predictions = model(images, training=False)
    t_loss = loss_object(labels, predictions)

    test_loss(t_loss)
    test_accuracy(labels, predictions)
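`tape.gradient` computes the derivatives of the loss automatically. For softmax regression they also have a closed form, dL/dW = Xᵀ(Ŷ − Y) / N and dL/db = the batch mean of (Ŷ − Y), which makes a useful sanity check. The following NumPy sketch (an illustration, not part of the lab) computes these gradients, compares one entry against a central finite-difference estimate, and performs one manual SGD step. The tiny dimensions, random data and learning rate of 0.1 are assumptions chosen so the check runs instantly.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def cross_entropy(W, b, x, y):
    # Batch mean of -sum_k y_k log p_k, matching CategoricalCrossentropy's default reduction
    p = softmax(x @ W + b)
    return -np.mean(np.sum(y * np.log(p), axis=1))

rng = np.random.default_rng(0)
N, D, K = 8, 5, 3                          # batch size, input features, classes
x = rng.normal(size=(N, D))
y = np.eye(K)[rng.integers(0, K, size=N)]  # random one-hot labels
W = rng.normal(scale=0.1, size=(D, K))
b = np.zeros(K)

# Closed-form gradients of the mean cross-entropy
p = softmax(x @ W + b)
grad_W = x.T @ (p - y) / N
grad_b = (p - y).mean(axis=0)

# Central finite-difference estimate for the single weight W[0, 0]
eps = 1e-6
W_plus, W_minus = W.copy(), W.copy()
W_plus[0, 0] += eps
W_minus[0, 0] -= eps
numeric = (cross_entropy(W_plus, b, x, y) - cross_entropy(W_minus, b, x, y)) / (2 * eps)

# One SGD step: subtract the learning rate times the gradient
lr = 0.1
loss_before = cross_entropy(W, b, x, y)
loss_after = cross_entropy(W - lr * grad_W, b - lr * grad_b, x, y)
print(grad_W[0, 0], numeric, loss_before, loss_after)
```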

### Training
Now we finally get to train, validate and test our model. Specifically, we train and validate our model for 25 epochs (in each epoch we run over the whole training dataset and validation dataset). At the end we test whether our model also generalizes to previously unseen data points by evaluating the loss and accuracy on the test dataset.

In [None]:
EPOCHS = 25
for epoch in range(EPOCHS):
    # Reset the metrics at the start of the next epoch
    train_loss.reset_state()
    train_accuracy.reset_state()
    valid_loss.reset_state()
    valid_accuracy.reset_state()

    for images, labels in train_ds:
        train_step(images, labels)

    for valid_images, valid_labels in valid_ds:
        valid_step(valid_images, valid_labels)

    print(
        "Epoch {:2d}: ".format(epoch + 1),
        "Train Loss: {:3.3f}, ".format(train_loss.result()),
        "Train Accuracy: {:3.3f}%, ".format(train_accuracy.result() * 100),
        "Validation Loss: {:3.3f}, ".format(valid_loss.result()),
        "Validation Accuracy: {:3.3f}%".format(valid_accuracy.result() * 100),
    )

# Test resulting classifier
test_loss.reset_state()
test_accuracy.reset_state()
for test_images, test_labels in test_ds:
    test_step(test_images, test_labels)

print(
    "\nTesting result: ",
    "Test Loss: {:3.3f}, ".format(test_loss.result()),
    "Test Accuracy: {:3.3f}%".format(test_accuracy.result() * 100),
)
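The complete procedure above (reshuffle every epoch, loop over mini-batches including a smaller final one, take one SGD step per batch) can be sketched without TensorFlow as well. The NumPy version below trains softmax regression on a hypothetical toy problem of three well-separated 2-D Gaussian clusters instead of MNIST; the cluster centres, the learning rate of 0.1 and the zero initialisation are assumptions for illustration only.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

# Hypothetical toy data: 100 points around each of three cluster centres
rng = np.random.default_rng(0)
K, N_PER_CLASS = 3, 100
centers = np.array([[0.0, 4.0], [4.0, 0.0], [-4.0, -4.0]])
x = np.concatenate([c + rng.normal(size=(N_PER_CLASS, 2)) for c in centers])
labels = np.repeat(np.arange(K), N_PER_CLASS)
y = np.eye(K)[labels]  # one-hot labels

W = np.zeros((2, K))
b = np.zeros(K)
BATCH_SIZE, EPOCHS, LR = 32, 25, 0.1

for epoch in range(EPOCHS):
    order = rng.permutation(len(x))  # new order each epoch, like reshuffle_each_iteration=True
    for start in range(0, len(x), BATCH_SIZE):  # the last batch may be smaller, as with batch()
        idx = order[start:start + BATCH_SIZE]
        p = softmax(x[idx] @ W + b)
        err = (p - y[idx]) / len(idx)  # gradient of the mean cross-entropy w.r.t. the logits
        W -= LR * x[idx].T @ err
        b -= LR * err.sum(axis=0)

train_acc = np.mean(softmax(x @ W + b).argmax(axis=1) == labels)
print(f"Training accuracy: {train_acc * 100:.1f}%")
```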