# SIT744 Practical 4: Second look at TensorFlow and Keras


*Prof. Antonio Robles-Kelly*




<div class="alert alert-info">
We suggest that you run this notebook using Google Colab.
</div>

## Pre-practical readings

- [TensorFlow tensors](https://www.tensorflow.org/guide/tensor)
- [TensorFlow Variables](https://www.tensorflow.org/guide/variable)



## Task 1. Low-level tensor manipulation in TensorFlow

TensorFlow APIs cover three deep-learning components:

- Tensors, including variables for keeping layer weights
- Tensor operations, including tensor multiplication, addition, and the activation functions
- Backpropagation, implemented through the GradientTape object

In this task, we will learn how to use these TensorFlow components. 

### Task 1.1 Tensors and tensor operations



#### tf.Tensor vs tensors

We mentioned that tensors (with a lower-case t) are like NumPy arrays. However, [`tf.Tensor`](https://www.tensorflow.org/api_docs/python/tf/Tensor) (with an upper-case T) is different. In a computation graph, a `tf.Tensor` stands for the output of a computation that, when run, produces a tensor. Such a computation is called an **operation** (or op) in the TensorFlow nomenclature.

Therefore, a TensorFlow program forms a computation graph that chains a collection of `tf.Tensor` objects together. You can think of the program itself as containing no data. When we invoke the program, data (tensors) are generated or passed in and flow through the operations in the program. But how can we store the model parameters and train them?
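A minimal sketch of this graph view, assuming TensorFlow 2, where code wrapped in `tf.function` is traced into a graph while other code runs eagerly. The function name `affine` and the input values are made up for illustration.

```python
import tensorflow as tf

@tf.function
def affine(x):
    # Inside a traced function, x is a symbolic tf.Tensor: a handle to an
    # operation's output, not a concrete array.
    return 2.0 * x + 1.0

# Calling the function feeds data through the traced graph.
out = affine(tf.constant([1.0, 2.0]))
print(out)

# Inspect the operations recorded in the graph for this input signature.
concrete = affine.get_concrete_function(
    tf.TensorSpec(shape=(2,), dtype=tf.float32))
op_types = [op.type for op in concrete.graph.get_operations()]
print(op_types)
```

The list of operation types shows that the graph holds the computation (a multiplication and an addition, among others), while the numbers only appear when the function is called.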



#### Constant tensors and Variables

Suppose we separate data from computation. Then `tf.constant` and `tf.Variable` are the two main constructs that deal with data directly. Data in a `tf.constant` is immutable, whereas data in a `tf.Variable` can be changed.

Constant tensors return the same data at every invocation of the computation graph.


In [43]:

import tensorflow as tf

a = tf.constant([[1, 2],
                 [3, 4]])
print(type(a))


<class 'tensorflow.python.framework.ops.EagerTensor'>


As you can see, `tf.constant` returns a tensor. Some other functions also return constant tensors.

In [None]:
x = tf.ones(shape=(2, 2))
print(x)
x = tf.zeros(shape=(2, 2))
print(x)

tf.Tensor(
[[1. 1.]
 [1. 1.]], shape=(2, 2), dtype=float32)
tf.Tensor(
[[0. 0.]
 [0. 0.]], shape=(2, 2), dtype=float32)


A particularly important class of functions is the initialisers used to generate random initial weights in a network. They produce constant tensors that serve as the initial values for variables.

In [41]:
## Normal initialiser
w_init = tf.random_normal_initializer(
    mean=10.0, stddev=0.05, seed=None
)

initial_value=w_init(shape=(2, 3),
                     dtype='float32')
print(type(initial_value))
print(initial_value)


<class 'tensorflow.python.framework.ops.EagerTensor'>
tf.Tensor(
[[ 9.93228  10.035228 10.001833]
 [10.043459 10.021921  9.97328 ]], shape=(2, 3), dtype=float32)


In [45]:
initial_value

<tf.Tensor: shape=(2, 2), dtype=float32, numpy=
array([[ 0.04221221,  0.01397283],
       [ 0.02129208, -0.02218554]], dtype=float32)>

In [65]:
## Uniform initialiser
w_init = tf.random_uniform_initializer(
    minval = -0.05, maxval = 0.05, seed=None
)

initial_value=w_init(shape=(2, 2),
                     dtype='float32')

print(initial_value)




tf.Tensor(
[[-0.0188207   0.03263413]
 [ 0.01849456 -0.04932909]], shape=(2, 2), dtype=float32)


**exercise** Try to assign a new value to a constant tensor. Can you do that?

Unlike constants, variables can be assigned a different value. They are needed to represent trainable network/layer weights.

In [66]:
w = tf.Variable(initial_value=initial_value,
                trainable=False)
print(w)

<tf.Variable 'Variable:0' shape=(2, 2) dtype=float32, numpy=
array([[-0.0188207 ,  0.03263413],
       [ 0.01849456, -0.04932909]], dtype=float32)>


In [67]:
# w = w.assign_sub(tf.constant(0.1, shape=(2, 2)))

In [68]:
with tf.GradientTape() as paper:
   z = tf.sqrt(w)

dz_dx = paper.gradient(z, [w])
print(dz_dx)

[None]
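The `[None]` result above is consistent with `w` having been created with `trainable=False`: a `tf.GradientTape` automatically watches only trainable variables. The sketch below, using a hypothetical variable `v_demo` filled with 4.0 so that the square root is well defined, contrasts the unwatched case with an explicit `tape.watch` call.

```python
import tensorflow as tf

# Hypothetical non-trainable variable, for illustration only.
v_demo = tf.Variable(tf.fill((2, 2), 4.0), trainable=False)

# Without watching, the tape records nothing for v_demo.
with tf.GradientTape() as tape:
    z = tf.sqrt(v_demo)
unwatched = tape.gradient(z, v_demo)
print(unwatched)  # None

# Asking the tape to watch the variable makes the gradient available.
with tf.GradientTape() as tape:
    tape.watch(v_demo)
    z = tf.sqrt(v_demo)
watched = tape.gradient(z, v_demo)
print(watched)  # d sqrt(v)/dv = 1 / (2 * sqrt(4)) = 0.25 for every entry
```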


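You can use the `assign` method to change the value stored in a variable. In the run below, assigning to a single element failed because no GPU kernel was available for the sliced assignment on that runtime.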

In [26]:
print(w[0,0])


w[0,0].assign(0)
print(w[0,0])

tf.Tensor(-0.018820703, shape=(), dtype=float32)


InvalidArgumentError: Cannot assign a device for operation ResourceStridedSliceAssign: Could not satisfy explicit device specification '/job:localhost/replica:0/task:0/device:GPU:0' because no supported kernel for GPU devices is available.
Colocation Debug Info:
Colocation group had the following types and supported devices: 
Root Member(assigned_device_name_index_=1 requested_device_name_='/job:localhost/replica:0/task:0/device:GPU:0' assigned_device_name_='/job:localhost/replica:0/task:0/device:GPU:0' resource_device_name_='/job:localhost/replica:0/task:0/device:GPU:0' supported_device_types_=[CPU] possible_devices_=[]
ResourceStridedSliceAssign: CPU 
_Arg: GPU CPU 

Colocation members, user-requested devices, and framework assigned devices, if any:
  ref (_Arg)  framework assigned device=/job:localhost/replica:0/task:0/device:GPU:0
  ResourceStridedSliceAssign (ResourceStridedSliceAssign) /job:localhost/replica:0/task:0/device:GPU:0

Op: ResourceStridedSliceAssign
Node attrs: Index=DT_INT32, shrink_axis_mask=3, new_axis_mask=0, begin_mask=0, ellipsis_mask=0, end_mask=0, T=DT_FLOAT
Registered kernels:
  device='XLA_CPU_JIT'; Index in [DT_INT32, DT_INT64]; T in [DT_FLOAT, DT_DOUBLE, DT_INT32, DT_UINT8, DT_INT16, DT_INT8, DT_COMPLEX64, DT_INT64, DT_BOOL, DT_QINT8, DT_QUINT8, DT_QINT32, DT_BFLOAT16, DT_UINT16, DT_COMPLEX128, DT_HALF, DT_UINT32, DT_UINT64, DT_FLOAT8_E5M2, DT_FLOAT8_E4M3FN, DT_INT4, DT_UINT4]
  device='DEFAULT'; T in [DT_INT32]
  device='CPU'; T in [DT_UINT64]
  device='CPU'; T in [DT_INT64]
  device='CPU'; T in [DT_UINT32]
  device='CPU'; T in [DT_UINT16]
  device='CPU'; T in [DT_INT16]
  device='CPU'; T in [DT_UINT8]
  device='CPU'; T in [DT_INT8]
  device='CPU'; T in [DT_INT32]
  device='CPU'; T in [DT_HALF]
  device='CPU'; T in [DT_BFLOAT16]
  device='CPU'; T in [DT_FLOAT]
  device='CPU'; T in [DT_DOUBLE]
  device='CPU'; T in [DT_COMPLEX64]
  device='CPU'; T in [DT_COMPLEX128]
  device='CPU'; T in [DT_BOOL]
  device='CPU'; T in [DT_STRING]
  device='CPU'; T in [DT_RESOURCE]
  device='CPU'; T in [DT_VARIANT]
  device='CPU'; T in [DT_QINT8]
  device='CPU'; T in [DT_QUINT8]
  device='CPU'; T in [DT_QINT32]
  device='CPU'; T in [DT_FLOAT8_E5M2]
  device='CPU'; T in [DT_FLOAT8_E4M3FN]

	 [[{{node ResourceStridedSliceAssign}}]] [Op:ResourceStridedSliceAssign] name: strided_slice/_assign
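The failure above concerns the sliced form `w[0,0].assign(...)`. A sketch of alternatives that do not rely on sliced assignment, and so may sidestep the missing kernel, is given below. The variable `w_demo` and the values written into it are hypothetical.

```python
import tensorflow as tf

# Hypothetical variable, for illustration only.
w_demo = tf.Variable(tf.ones((2, 2)))

# Replace the whole value at once.
w_demo.assign(tf.zeros((2, 2)))

# In-place increment of every entry.
w_demo.assign_add(tf.ones((2, 2)))

# Update a single element by building a new tensor and assigning it back.
updated = tf.tensor_scatter_nd_update(
    tf.convert_to_tensor(w_demo), indices=[[0, 0]], updates=[5.0])
w_demo.assign(updated)
print(w_demo.numpy())
```

Whether the sliced form works depends on the device and TensorFlow build, so treat this as one possible workaround rather than the required approach.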

### Task 1.2 Math operations in TensorFlow

Tensors are transformed through matrix multiplications, additions, reshaping, and activation functions.

In [None]:
a = tf.ones((2, 2))
print(f'a: {a}')
b = tf.square(a)
print(f'b: {b}')
c = tf.sqrt(a)
print(f'c: {c}')
d = b + c
print(f'd = b + c: {d}')
e = tf.matmul(a, b)
print(f'e = tf.matmul(a, b): {e}')
e *= d
print(f'e *= d: {e}')

a: [[1. 1.]
 [1. 1.]]
b: [[1. 1.]
 [1. 1.]]
c: [[1. 1.]
 [1. 1.]]
d = b + c: [[2. 2.]
 [2. 2.]]
e = tf.matmul(a, b): [[2. 2.]
 [2. 2.]]
e *= d: [[4. 4.]
 [4. 4.]]
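The cell above covers element-wise operations and matrix multiplication. Reshaping and activation functions follow the same pattern; a small sketch with made-up values (`m_demo` is a hypothetical name):

```python
import tensorflow as tf

# Hypothetical 2x3 input with mixed signs.
m_demo = tf.constant([[-1.0, 2.0, -3.0],
                      [4.0, -5.0, 6.0]])

# Reshape keeps the elements in row-major order.
reshaped = tf.reshape(m_demo, (3, 2))
print(reshaped)

# ReLU replaces negative entries with zero.
activated = tf.nn.relu(m_demo)
print(activated)
```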


### Task 1.3 Performing differentiation with tensor operations

Tensor operations come with the ability to perform automatic differentiation.

In [70]:
x = tf.Variable(initial_value=tf.ones((2, 2)))

print('x:\n', x)
with tf.GradientTape() as tape:
    y = tf.square(x)
print('y:\n', y)
dy_dx = tape.gradient(y, x)
print('dy_dx:\n', dy_dx)



x:
 <tf.Variable 'Variable:0' shape=(2, 2) dtype=float32, numpy=
array([[1., 1.],
       [1., 1.]], dtype=float32)>
y:
 tf.Tensor(
[[1. 1.]
 [1. 1.]], shape=(2, 2), dtype=float32)
dy_dx:
 tf.Tensor(
[[2. 2.]
 [2. 2.]], shape=(2, 2), dtype=float32)


In [71]:
tape.gradient(y, x)

RuntimeError: A non-persistent GradientTape can only be used to compute one set of gradients (or jacobians)
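The `RuntimeError` above arises because a tape releases its resources after the first `gradient` call. One option, sketched here with hypothetical names (`x_demo`, `demo_tape`), is to create the tape with `persistent=True`.

```python
import tensorflow as tf

x_demo = tf.Variable(tf.ones((2, 2)))

# A persistent tape can be queried more than once.
with tf.GradientTape(persistent=True) as demo_tape:
    y_square = tf.square(x_demo)
    y_sqrt = tf.sqrt(x_demo)

d_square = demo_tape.gradient(y_square, x_demo)  # 2 * x
d_sqrt = demo_tape.gradient(y_sqrt, x_demo)      # 1 / (2 * sqrt(x))
print(d_square)
print(d_sqrt)

# Drop the reference once finished so the recorded operations can be freed.
del demo_tape
```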

In [33]:
with tf.GradientTape() as paper:
   z = tf.sqrt(x)

dz_dx = paper.gradient(z, [x])


print(dz_dx)

[<tf.Tensor: shape=(2, 2), dtype=float32, numpy=
array([[0.5, 0.5],
       [0.5, 0.5]], dtype=float32)>]


**exercise** Modify the code above to compute the gradient of the exponential function $y=e^{x}$.

**question** Can you call `tape.gradient` twice?


## Task 2  Reimplementing a Keras model in TensorFlow

Last week, we used a Keras model to run through the MNIST example. In this practical, we learn how to reimplement the model without using Keras. This will deepen your understanding of some key concepts.



### Task 2.1 A simple Dense class

We know that a dense layer essentially performs an affine transformation followed by an activation function.

> output = activation(dot(input, W) + b)
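To make the shapes concrete, here is a small worked example of the formula with hand-picked numbers. All names and values (`demo_inputs`, `demo_W`, `demo_b`) are assumptions for illustration: one input row with 2 features is mapped to 3 units.

```python
import tensorflow as tf

demo_inputs = tf.constant([[1.0, 2.0]])        # shape (1, 2): batch of one
demo_W = tf.constant([[1.0, -1.0, 0.5],
                      [0.0, 1.0, -2.0]])       # shape (2, 3)
demo_b = tf.constant([0.0, 0.5, 1.0])          # shape (3,)

# Affine part: [1*1 + 2*0, 1*(-1) + 2*1, 1*0.5 + 2*(-2)] + b
pre_activation = tf.matmul(demo_inputs, demo_W) + demo_b
print(pre_activation)  # [[ 1.   1.5 -2.5]]

# ReLU zeroes the negative entry.
demo_output = tf.nn.relu(pre_activation)
print(demo_output)     # [[1.  1.5 0. ]]
```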



We can define a class for network layers.


In [79]:
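class NaiveDense:
    """A minimal dense layer: activation(inputs @ W + b)."""

    def __init__(self, units, input_dim, activation):
        self.activation = activation

        # Weight matrix of shape (input_dim, units), drawn from a normal distribution.
        W_init = tf.random_normal_initializer()
        self.W = tf.Variable(
            initial_value=W_init(shape=(input_dim, units), dtype='float32'),
            trainable=True)

        # Bias vector of shape (units,), initialised to zeros.
        b_init = tf.zeros_initializer()
        self.b = tf.Variable(
            initial_value=b_init(shape=(units,), dtype='float32'),
            trainable=True)

    def __call__(self, inputs):
        # inputs: (batch_size, input_dim) -> output: (batch_size, units)
        return self.activation(tf.matmul(inputs, self.W) + self.b)

    @property
    def weights(self):
        return [self.W, self.b]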

Let's try to pass a tensor through the layer.

In [80]:
relu_layer = NaiveDense(units=10, input_dim=2, activation=tf.nn.relu)

x = tf.ones((2, 2))
y = relu_layer(x)
print(y)



tf.Tensor(
[[0.00330251 0.0982748  0.         0.         0.0186949  0.
  0.04624053 0.11196592 0.         0.        ]
 [0.00330251 0.0982748  0.         0.         0.0186949  0.
  0.04624053 0.11196592 0.         0.        ]], shape=(2, 10), dtype=float32)


You can compare this with the original implementation from Keras. The difference in values is due to random initialisation.

In [81]:
keras_layer = tf.keras.layers.Dense(units=10, activation=tf.nn.relu)
y = keras_layer(x)
print(y)

tf.Tensor(
[[0.         0.         0.32536697 0.         0.         0.
  0.         0.01500052 0.         0.        ]
 [0.         0.         0.32536697 0.         0.         0.
  0.         0.01500052 0.         0.        ]], shape=(2, 10), dtype=float32)


**question** Do you see negative values in the output? Why? Are we using the 10 output units effectively?

### Task 2.2 A simple Sequential class

Once we have defined some layers, we can chain them together. Let's define a class similar to the Sequential Model in Keras.







In [82]:
class NaiveSequential:

    def __init__(self, layers):
        self.layers = layers

    def __call__(self, inputs):
        x = inputs
        for layer in self.layers:
            x = layer(x)
        return x

    @property
    def weights(self):
        weights = []
        for layer in self.layers:
            weights += layer.weights
        return weights

It takes a list of layers and returns a model.

In [83]:
model = NaiveSequential([
    NaiveDense(units=512, input_dim=28 * 28, activation=tf.nn.relu),
    NaiveDense(units=10, input_dim=512, activation=tf.nn.softmax)
])


Let's try to feed the model two identical "images".


In [84]:
model(tf.ones((2, 28 * 28)))

<tf.Tensor: shape=(2, 10), dtype=float32, numpy=
array([[0.03235335, 0.19014378, 0.07470462, 0.14178684, 0.03212496,
        0.05677064, 0.2146402 , 0.16345944, 0.03473785, 0.05927834],
       [0.03235335, 0.19014378, 0.07470462, 0.14178684, 0.03212496,
        0.05677064, 0.2146402 , 0.16345944, 0.03473785, 0.05927834]],
      dtype=float32)>

### Task 2.3 A batch generator

To run stochastic gradient descent, we need to feed the model mini-batches of the input data. Later on, we will learn how to build TensorFlow input pipelines with `tf.data`. Here we will create a simple iterator for retrieving training batches.

In [85]:
class BatchGenerator:

    def __init__(self, images, labels, batch_size=128):
        self.index = 0
        self.images = images
        self.labels = labels
        self.batch_size = batch_size

    def next(self):
        images = self.images[self.index : self.index + self.batch_size]
        labels = self.labels[self.index : self.index + self.batch_size]
        self.index += self.batch_size
        return images, labels

Let's try it on the MNIST data.

In [86]:
from tensorflow.keras.datasets import mnist

(train_images, train_labels), (test_images, test_labels) = mnist.load_data()

train_images = train_images.reshape((60000, 28 * 28))
train_images = train_images.astype('float32') / 255
test_images = test_images.reshape((10000, 28 * 28))
test_images = test_images.astype('float32') / 255

batch_generator = BatchGenerator(train_images, train_labels)
x, y = batch_generator.next()
print(f'x: {x.shape}')
print(f'y: {y.shape}')

x: (128, 784)
y: (128,)


## Task 3 Training the model

As mentioned in the lecture, training a neural network involves a loop with the following steps:

1. Compute the predictions of the examples in the batch
2. Compute the loss value for these predictions given the actual labels
3. Compute the gradient of the loss with regard to the model’s weights
4. Move the weights by a small amount in the direction opposite to the gradient

These four steps comprise one **training step**.
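The four steps can be seen in miniature on a single scalar weight. This is only a sketch: the "model" is a hypothetical one-parameter function, the loss is its square, and `demo_lr` is an assumed learning rate chosen to make the arithmetic easy to follow.

```python
import tensorflow as tf

demo_lr = 0.1                     # assumed learning rate, illustration only
w_scalar = tf.Variable(3.0)       # hypothetical scalar weight

with tf.GradientTape() as demo_tape:
    prediction = w_scalar * 1.0   # 1. compute the prediction
    loss = tf.square(prediction)  # 2. compute the loss (here: w^2 = 9)

grad = demo_tape.gradient(loss, w_scalar)  # 3. dL/dw = 2w = 6
w_scalar.assign_sub(demo_lr * grad)        # 4. w <- w - lr * grad = 3 - 0.6

print(float(grad), float(w_scalar.numpy()))
```

After one step the weight has moved from 3.0 towards the minimum at 0, which is the behaviour the full training step reproduces for every weight in the model.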

In [87]:
learning_rate = 1e-3

def update_weights(gradients, weights):
    # Plain gradient descent on the weights passed in: w <- w - learning_rate * g
    for g, w in zip(gradients, weights):
        w.assign_sub(g * learning_rate)

def one_training_step(model, images_batch, labels_batch):
    with tf.GradientTape() as tape:
        predictions = model(images_batch) ## 1. Compute the predictions of the examples in the batch
        per_sample_losses = tf.keras.losses.sparse_categorical_crossentropy(
            labels_batch, predictions)
        average_loss = tf.reduce_mean(per_sample_losses) ## 2. Compute the loss value for these predictions given the actual labels
    gradients = tape.gradient(average_loss, model.weights) ## 3. Compute the gradient of the loss with regard to the model’s weights
    update_weights(gradients, model.weights) ## 4. Move the weights by a small amount in the direction opposite to the gradient
    return average_loss

Let's test-run it with a training batch.

In [88]:
one_training_step(model, x, y)

<tf.Tensor: shape=(), dtype=float32, numpy=2.3479156>
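The loss of about 2.35 is close to $\ln(10) \approx 2.30$, which is the sparse categorical cross-entropy of a 10-class prediction that spreads its probability evenly. A quick check, using made-up labels and a hypothetical uniform prediction:

```python
import math
import tensorflow as tf

# Two hypothetical samples with arbitrary true classes.
demo_labels = tf.constant([3, 7])

# A prediction that assigns probability 0.1 to each of the 10 classes.
uniform_predictions = tf.fill((2, 10), 0.1)

demo_losses = tf.keras.losses.sparse_categorical_crossentropy(
    demo_labels, uniform_predictions)
demo_mean_loss = tf.reduce_mean(demo_losses)

print(float(demo_mean_loss), math.log(10))
```

A freshly initialised model is expected to start near this value, so a first loss far from it would be a hint that something is off in the data or the model.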

Knowing that it is working, we can add a for-loop. Actually, we will use two nested for-loops: the outer for-loop to keep track of the number of epochs and the inner for-loop to iterate through the whole training data (multiple mini-batches). 



In [89]:
def fit(model, images, labels, epochs, batch_size=64):
    for epoch_counter in range(epochs):
        print('Epoch %d' % epoch_counter)
        # Pass batch_size through so the generator and the loop below agree.
        batch_generator = BatchGenerator(images, labels, batch_size=batch_size)
        for batch_counter in range(len(images) // batch_size):
            images_batch, labels_batch = batch_generator.next()
            loss = one_training_step(model, images_batch, labels_batch)
            if batch_counter % 100 == 0:
                print('loss at batch %d: %.2f' % (batch_counter, loss))
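The inner loop runs `len(images) // batch_size` steps, so a final partial batch is skipped in each epoch. A small arithmetic sketch for the MNIST training set with a batch size of 128 (variable names are illustrative):

```python
num_samples = 60000   # size of the MNIST training set
demo_batch_size = 128

full_batches = num_samples // demo_batch_size   # steps per epoch
leftover = num_samples % demo_batch_size        # samples not visited per epoch

print(full_batches, leftover)  # 468 96
```

With 468 steps per epoch and a message every 100 steps, the last progress line in an epoch is for batch 400.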

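Now we are ready to train the model. A handful of epochs (for example, 5) is enough to see the loss fall; the call below requests 50 epochs and was interrupted manually part-way through.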

In [90]:
fit(model, train_images, train_labels, epochs=50, batch_size=128)

Epoch 0
loss at batch 0: 2.35
loss at batch 100: 2.24
loss at batch 200: 2.15
loss at batch 300: 2.03
loss at batch 400: 1.96
Epoch 1
loss at batch 0: 1.86
loss at batch 100: 1.83
loss at batch 200: 1.71
loss at batch 300: 1.64
loss at batch 400: 1.61
Epoch 2
loss at batch 0: 1.50
loss at batch 100: 1.51
loss at batch 200: 1.38
loss at batch 300: 1.35
loss at batch 400: 1.35
Epoch 3
loss at batch 0: 1.23
loss at batch 100: 1.27
loss at batch 200: 1.13
loss at batch 300: 1.14
loss at batch 400: 1.16
Epoch 4
loss at batch 0: 1.04
loss at batch 100: 1.08
loss at batch 200: 0.95
loss at batch 300: 0.99
loss at batch 400: 1.02
Epoch 5
loss at batch 0: 0.90
loss at batch 100: 0.94
loss at batch 200: 0.82
loss at batch 300: 0.87
loss at batch 400: 0.92
Epoch 6
loss at batch 0: 0.80
loss at batch 100: 0.84
loss at batch 200: 0.72
loss at batch 300: 0.79
loss at batch 400: 0.85
Epoch 7
loss at batch 0: 0.73
loss at batch 100: 0.76
loss at batch 200: 0.65
loss at batch 300: 0.72
loss at batch 40

KeyboardInterrupt: 

**exercise** 

1. Write a program that evaluates the accuracy of the model on both the training and test datasets.

2. Modify the code above so that you collect the gradients at each layer and each epoch.


## Additional resources

- [Tensorflow, The Confusing Parts (1)](https://jacobbuckman.com/2018-06-25-tensorflow-the-confusing-parts-1/). Don't worry if some parts are still "confusing" after reading this.