## Performing a single backpropagation step to update the parameter values once

In this notebook you will see how to use TensorFlow to do a single update step of stochastic gradient descent with one data point. You will do one forward pass and one backward pass and extract the local gradients of the intermediate terms in the computational graph. You then use them to compute the gradients of the loss w.r.t. the parameters (slope and intercept), which are needed for one update step.

**Dataset:** You work with a single data point from the systolic blood pressure and age data of 33 American women, which is defined in the upper part of the notebook.

**Content:**
* use the TensorFlow library to set up the model
    * define a computational graph containing all intermediate terms and local gradients
    * do a single forward pass and compute all intermediate terms
    * do a single backward pass, compute all local gradients, and use them to compute the gradients of the loss w.r.t. the parameters via the chain rule
    * do a single update step of the parameter values
    * verify that the computed gradients and the updated parameter values match the values you get when you do it by hand


In [1]:
%tensorflow_version 2.x

TensorFlow 2.x selected.


In [2]:
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
plt.style.use('default')
import tensorflow as tf
print('TF Version:', tf.__version__)

TF Version: 2.1.0


#### Blood Pressure data

Here we read in the systolic blood pressure and the age of the 33 American women in our dataset.



In [0]:
# Blood Pressure data
x = [22, 41, 52, 23, 41, 54, 24, 46, 56, 27, 47, 57, 28, 48, 58,  9, 
     49, 59, 30, 49, 63, 32, 50, 67, 33, 51, 71, 35, 51, 77, 40, 51, 81]
y = [131, 139, 128, 128, 171, 105, 116, 137, 145, 106, 111, 141, 114, 
     115, 153, 123, 133, 157, 117, 128, 155, 122, 183,
     176,  99, 130, 172, 121, 133, 178, 147, 144, 217] 
x = np.asarray(x, np.float32) 
y = np.asarray(y, np.float32)

### Doing the backpropagation by hand for the example

In the next cell we pick a single woman from the dataset, because we want to calculate the gradients with only one data point. The woman is 58 years old and has a systolic blood pressure (SBP) of 153.

In [4]:
x = x[14]
y = y[14]
print(x)
print(y)

58.0
153.0


Here we define the computational graph with all intermediate values, because we need them and their local gradients to apply the chain rule and do the backpropagation.

In [0]:
# Defining the graph (construction phase)

a_  = tf.Variable(0.0, name='a_var')                       # Variables, with starting values, will be optimized later
b_  = tf.Variable(139.0, name='b_var')                     # we name them so that they look nicer in the graph
x_  = tf.constant(x, name='x_const')                       # Constants, these are fixed tensors holding the data values and cannot be changed by the optimization
y_  = tf.constant(y, name='y_const')  


# We now do it step by step so that we can calculate the intermediate values and gradients
def my_func():
  ax_ = a_* x_
  abx_ = ax_ + b_
  r_ = abx_ - y_
  s_ = tf.square(r_)
  mse_ = tf.reduce_mean(s_)                                 
  return([a_,b_,x_,y_,ax_,abx_,r_,s_,mse_])

#### Simple forward pass

Now, let's do a simple forward pass and print the resulting values for ax, abx, r, s, and the mse.

In [6]:
a_,b_,x_,y_,ax_,abx_,r_,s_,mse_=my_func()
vals = (ax_,abx_,r_,s_,mse_)
vals

(<tf.Tensor: shape=(), dtype=float32, numpy=0.0>,
 <tf.Tensor: shape=(), dtype=float32, numpy=139.0>,
 <tf.Tensor: shape=(), dtype=float32, numpy=-14.0>,
 <tf.Tensor: shape=(), dtype=float32, numpy=196.0>,
 <tf.Tensor: shape=(), dtype=float32, numpy=196.0>)
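
The forward pass can also be reproduced without TensorFlow. The following is a minimal sketch in plain Python; the names `age`, `sbp`, `slope` and `intercept` are illustrative and do not appear in the notebook. It uses the same data point and starting values as above.

```python
# Plain-Python forward pass for the single data point (illustrative names)
age, sbp = 58.0, 153.0          # x and y of the selected woman
slope, intercept = 0.0, 139.0   # starting values of a and b

ax = slope * age                # a * x
abx = ax + intercept            # model prediction a * x + b
r = abx - sbp                   # residual
s = r ** 2                      # squared residual
mse = s / 1                     # mean over a single data point

print(ax, abx, r, s, mse)
```

The five printed numbers should match the `numpy=` entries of the tensors shown above.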

#### Extracting the gradients and the updated values

In the next two cells we extract all local gradients of the graph in a backward pass and store each of them in its own variable. We also calculate the gradients of the loss (mean squared error) w.r.t. our TensorFlow variables a and b and do one update (`apply_gradients`) of the slope and the intercept (we set the learning rate to 0.00002).

In [7]:
optimizer = tf.keras.optimizers.SGD(0.00002)
with tf.GradientTape(persistent=True) as tape:
  ### forward pass: all intermediate terms are recorded on the tape
  a_,b_,x_,y_,ax_,abx_,r_,s_,mse_=my_func()
### get all single (local) gradients of the backward pass
### (persistent=True allows several tape.gradient calls; calling them after the with block avoids recording the gradient computations themselves)
grad_mse_s = tape.gradient(mse_, s_)
grad_s_r = tape.gradient(s_, r_)
grad_r_abx_= tape.gradient(r_, abx_)
grad_abx_b = tape.gradient(abx_, b_)
grad_abx_ax = tape.gradient(abx_, ax_)
grad_ax_a = tape.gradient(ax_, a_)
### get the gradients of the loss (here the mean squared error) w.r.t. a and b
gradients = tape.gradient(mse_, [a_,b_])
### update the values of the slope a and the intercept b with the learning rate
optimizer.apply_gradients(zip(gradients,[a_,b_]))




In [8]:
print(grad_mse_s.numpy(),grad_s_r.numpy(),grad_r_abx_.numpy(),grad_abx_b.numpy(),grad_abx_ax.numpy(),grad_ax_a.numpy())
print(a_.numpy(),b_.numpy())

1.0 -28.0 1.0 1.0 1.0 58.0
0.032479998 139.00056
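
Each local gradient can also be written down analytically from the operation that produced the term. The sketch below does this in plain Python; the variable names (`d_s_d_r` and so on) are illustrative and are not part of the notebook.

```python
# Local gradients of the computational graph, derived by hand (illustrative names)
age, sbp = 58.0, 153.0             # x and y of the selected woman
slope, intercept = 0.0, 139.0      # starting values of a and b
r = slope * age + intercept - sbp  # residual from the forward pass

d_mse_d_s = 1.0      # mse is the mean of one value, so the derivative is 1/1
d_s_d_r = 2.0 * r    # s = r**2
d_r_d_abx = 1.0      # r = abx - y
d_abx_d_b = 1.0      # abx = ax + b
d_abx_d_ax = 1.0     # abx = ax + b
d_ax_d_a = age       # ax = a * x

print(d_mse_d_s, d_s_d_r, d_r_d_abx, d_abx_d_b, d_abx_d_ax, d_ax_d_a)
```

The printed values should agree with the first output line above, in the same order.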


<img src="https://raw.githubusercontent.com/tensorchiefs/dl_book/master/imgs/ch03_12.pdf.png" width="800" align="left" />  
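Compare the results of TensorFlow with the results from the lecture, where we did the forward and the backward pass by hand. The forward pass is shown in blue and the backward pass in red. In the next cell the parameters are reset to their starting values, and the gradients of the loss w.r.t. a and b are computed directly.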

In [9]:
a_  = tf.Variable(0.0, name='a_var')                       # reset the slope to its starting value (the update above changed it)
b_  = tf.Variable(139.0, name='b_var')                     # reset the intercept to its starting value

with tf.GradientTape() as tape:
  a_,b_,x_,y_,ax_,abx_,r_,s_,mse_=my_func()
grads_mse_a_b = tape.gradient(mse_, [a_,b_])               # gradients of the loss w.r.t. a and b
grads_mse_a_b



[<tf.Tensor: shape=(), dtype=float32, numpy=-1624.0>,
 <tf.Tensor: shape=(), dtype=float32, numpy=-28.0>]
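
As an independent sanity check, the two gradients can be approximated numerically with central finite differences. This is a sketch in plain Python and not part of the original notebook; the helper names `loss` and `central_difference` and the step size `h` are assumptions made for illustration.

```python
def loss(slope, intercept, age=58.0, sbp=153.0):
    """Squared error for the single data point (equals the MSE here)."""
    return (slope * age + intercept - sbp) ** 2


def central_difference(f, value, h=1e-4):
    """Approximate df/dvalue with a symmetric difference quotient."""
    return (f(value + h) - f(value - h)) / (2.0 * h)


slope0, intercept0 = 0.0, 139.0  # starting values of a and b

# Vary one parameter at a time and keep the other one fixed
num_grad_a = central_difference(lambda a: loss(a, intercept0), slope0)
num_grad_b = central_difference(lambda b: loss(slope0, b), intercept0)

print(num_grad_a, num_grad_b)
```

Because the loss is quadratic in both parameters, the central difference is exact up to floating-point rounding, so the two numbers should be very close to -1624 and -28.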

#### Compute the gradient of the mse w.r.t. a via the chain rule

In [10]:
#grad_mse_a 
print(grad_mse_s.numpy()*grad_s_r.numpy()*grad_r_abx_.numpy()*grad_abx_ax.numpy()*grad_ax_a.numpy())

-1624.0


#### Compute the gradient of the mse w.r.t. b via the chain rule

In [11]:
#grad_mse_b 
print(grad_mse_s.numpy()*grad_s_r.numpy()*grad_r_abx_.numpy()*grad_abx_b.numpy())

-28.0
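
The two products above are the chain rule applied along the path from the loss back to each parameter. Written out with the local gradients extracted earlier (the symbols follow the variable names in `my_func`):

$$
\begin{aligned}
\frac{\partial\,\mathrm{mse}}{\partial a}
  &= \frac{\partial\,\mathrm{mse}}{\partial s}\cdot\frac{\partial s}{\partial r}\cdot\frac{\partial r}{\partial\,\mathrm{abx}}\cdot\frac{\partial\,\mathrm{abx}}{\partial\,\mathrm{ax}}\cdot\frac{\partial\,\mathrm{ax}}{\partial a}
   = 1\cdot(-28)\cdot 1\cdot 1\cdot 58 = -1624 \\
\frac{\partial\,\mathrm{mse}}{\partial b}
  &= \frac{\partial\,\mathrm{mse}}{\partial s}\cdot\frac{\partial s}{\partial r}\cdot\frac{\partial r}{\partial\,\mathrm{abx}}\cdot\frac{\partial\,\mathrm{abx}}{\partial b}
   = 1\cdot(-28)\cdot 1\cdot 1 = -28
\end{aligned}
$$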


#### Update Formula
Verify that we get the same values if we do the update "by hand".


a_new=a_old - learning_rate * grad_mse_a  
b_new=b_old - learning_rate * grad_mse_b   

In [12]:
a0=0
b0=139
eta=0.00002
print(a0-eta*grads_mse_a_b[0].numpy())
print(b0-eta*grads_mse_a_b[1].numpy())

0.03248
139.00056
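
It can also be useful to check that the update actually lowers the loss for this data point. The sketch below is not part of the original notebook: the helper `loss` and the variable names are illustrative, and the gradient values are copied from the outputs above.

```python
def loss(slope, intercept, age=58.0, sbp=153.0):
    """Squared error for the single data point (equals the MSE here)."""
    return (slope * age + intercept - sbp) ** 2


eta = 0.00002                          # learning rate used in the notebook
slope_old, intercept_old = 0.0, 139.0  # starting values of a and b
grad_a, grad_b = -1624.0, -28.0        # gradients of the loss w.r.t. a and b

# One gradient descent step
slope_new = slope_old - eta * grad_a
intercept_new = intercept_old - eta * grad_b

loss_old = loss(slope_old, intercept_old)
loss_new = loss(slope_new, intercept_new)
print(loss_old, loss_new)
```

With these values the loss drops from 196 to roughly 146.8 after a single update step.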
