## Performing a single backpropagation step to update the parameter values once

**Goal:** In this notebook you will see how to use TensorFlow to do a single update step based on stochastic gradient descent with one data point. You will do one forward pass and one backward pass and extract the gradients of the intermediate terms in the computational graph. You then use them to compute the gradients of the loss w.r.t. the parameters (slope and intercept), which are needed to do one update step.

**Usage:** The idea of the notebook is that you try to understand the provided code by running it, checking the output and playing with it by slightly changing the code and rerunning it. 

**Dataset:** You work with a single data point from the systolic blood pressure and age data of 33 American women, which is defined in the upper part of the notebook.

* read chapter 3.4.1 of the book and check how the provided code corresponds to the step-by-step computations in this chapter

* use the TensorFlow library to set up the model
    * define a computational graph containing all intermediate terms and local gradients 
    * do a single forward pass and compute all intermediate terms
    * do a single backward pass, compute all local gradients, and use them to compute the gradients of the loss w.r.t. the parameters via the chain rule
    * do a single update step of the parameter values
    * verify that the computed values for the gradients and the updated parameter values correspond to the values in chapter 3.4.1.
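
The steps listed above can also be written out in plain Python, without TensorFlow, to make the arithmetic explicit. This is a minimal sketch, assuming the start values (a = 0, b = 139), the data point (58, 153), and the learning rate (0.00002) that are used further down in this notebook. The variable names are illustrative and are not part of the notebook's graph.

```python
# Plain-Python sketch of one SGD step for the model y_hat = a * x + b.
# Assumption: start values, data point, and learning rate are the ones
# used later in this notebook.
x, y = 58.0, 153.0
a, b = 0.0, 139.0
eta = 0.00002

# Forward pass: intermediate terms of the computational graph
ax = a * x
abx = ax + b
r = abx - y
s = r ** 2
mse = s  # the mean over a single data point is the value itself

# Backward pass: local gradients of each node w.r.t. its input
d_mse_d_s = 1.0
d_s_d_r = 2.0 * r
d_r_d_abx = 1.0
d_abx_d_ax = 1.0
d_abx_d_b = 1.0
d_ax_d_a = x

# Chain rule: multiply the local gradients along each path
grad_a = d_mse_d_s * d_s_d_r * d_r_d_abx * d_abx_d_ax * d_ax_d_a
grad_b = d_mse_d_s * d_s_d_r * d_r_d_abx * d_abx_d_b

# One gradient descent update
a_new = a - eta * grad_a
b_new = b - eta * grad_b
print(grad_a, grad_b)
print(a_new, b_new)
```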



[open in colab](https://colab.research.google.com/github/tensorchiefs/dl_book/blob/master/chapter_03/nb_ch03_04.ipynb)

In [1]:
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
plt.style.use('default')
import tensorflow as tf
print('TF Version:', tf.__version__)

TF Version: 1.13.1


#### Blood Pressure data

Here we read in the systolic blood pressure and the age of the 33 American women in our dataset.



In [0]:
# Blood Pressure data
x = [22, 41, 52, 23, 41, 54, 24, 46, 56, 27, 47, 57, 28, 48, 58,  9, 
     49, 59, 30, 49, 63, 32, 50, 67, 33, 51, 71, 35, 51, 77, 40, 51, 81]
y = [131, 139, 128, 128, 171, 105, 116, 137, 145, 106, 111, 141, 114, 
     115, 153, 123, 133, 157, 117, 128, 155, 122, 183,
     176,  99, 130, 172, 121, 133, 178, 147, 144, 217] 
x = np.asarray(x, np.float32) 
y = np.asarray(y, np.float32)

### Doing the backpropagation by hand for the example

In the next cell we take only one woman from the dataset, because we want to calculate the gradients with a single data point. The woman is 58 years old and has a systolic blood pressure (SBP) of 153.

In [3]:
x = x[14]
y = y[14]
print(x)
print(y)

58.0
153.0


Here we define the computational graph with all the intermediate values and the gradients between them, because we need them to apply the chain rule and do the backpropagation (see figure 3.12 in the book).

In [4]:
# Defining the graph (construction phase)

tf.reset_default_graph()                                   # “Wipe the blackboard”, construct a new graph
a_  = tf.Variable(0.0, name='a_var')                       # Variables, with starting values, will be optimized later
b_  = tf.Variable(139.0, name='b_var')                     # we name them so that they look nicer in the graph
x_  = tf.constant(x, name='x_const')                       # Constants, these are fixed tensors holding the data values and cannot be changed by the optimization
y_  = tf.constant(y, name='y_const')  


# We now do it step by step so that we can calculate the intermediate values and gradients
ax_ = a_* x_
abx_ = ax_ + b_
r_ = abx_ - y_
s_ = tf.square(r_)
mse_ = tf.reduce_mean(s_)                                 

grad_mse_s_ = tf.gradients(mse_, [s_])                      # gradient of mse_ w.r.t s_
grad_s_r_ = tf.gradients(s_, [r_])                          # gradient of s_ w.r.t r_
grad_r_abx_ = tf.gradients(r_, [abx_])                      # gradient of r_ w.r.t abx_
grad_abx_b_ = tf.gradients(abx_, [b_])                      # gradient of abx_ w.r.t b_
grad_abx_ax_ = tf.gradients(abx_, [ax_])                    # gradient of abx_ w.r.t ax_
grad_ax_a_ = tf.gradients(ax_, [a_])                        # gradient of ax_ w.r.t a_

grads_mse_a_b_ = tf.gradients(mse_, [a_,b_])                # gradient of mse_ w.r.t a_ and b_ (what we actually want)


writer = tf.summary.FileWriter("linreg/", tf.get_default_graph())
writer.close()

Instructions for updating:
Colocations handled automatically by placer.
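
Written out, the local gradients defined in the graph combine via the chain rule as follows. This is a sketch in the notation of the code, for a single data point, so that the mean equals the squared residual $s = r^2$:

$$
\frac{\partial\,\mathit{mse}}{\partial a}
= \frac{\partial\,\mathit{mse}}{\partial s}\cdot
  \frac{\partial s}{\partial r}\cdot
  \frac{\partial r}{\partial\,\mathit{abx}}\cdot
  \frac{\partial\,\mathit{abx}}{\partial\,\mathit{ax}}\cdot
  \frac{\partial\,\mathit{ax}}{\partial a}
= 1\cdot 2r\cdot 1\cdot 1\cdot x
$$

$$
\frac{\partial\,\mathit{mse}}{\partial b}
= \frac{\partial\,\mathit{mse}}{\partial s}\cdot
  \frac{\partial s}{\partial r}\cdot
  \frac{\partial r}{\partial\,\mathit{abx}}\cdot
  \frac{\partial\,\mathit{abx}}{\partial b}
= 1\cdot 2r\cdot 1\cdot 1
$$

Each factor corresponds to one of the `tf.gradients` calls in the cell above.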


#### Simple forward pass

Now, let's do a simple forward pass and print the resulting values for ax, abx, r, s, and the mse.

In [5]:
with tf.Session() as sess: 
    vals = sess.run([ax_,abx_,r_,s_,mse_], {a_:0,b_:139}) # Feeding a=0 and b=139 and letting them flow through the graph
    for p in vals:
      print(p)

0.0
139.0
-14.0
196.0
196.0


#### Extracting the gradients and the updated values

In [0]:
# We add an additional operation to the graph that minimizes the mse_
train_op_ = tf.train.GradientDescentOptimizer(learning_rate=0.00002).minimize(mse_) 
with tf.Session() as sess: 
    sess.run(tf.global_variables_initializer()) #Doing the initialization on the concrete realization of the graph
    for i in range(1):
      _, grad_mse_s,grad_s_r, grad_r_abx_,grad_abx_b, grad_abx_ax, grad_ax_a,a,b = sess.run([train_op_, grad_mse_s_, grad_s_r_, grad_r_abx_, grad_abx_b_, grad_abx_ax_,grad_ax_a_,a_,b_])   #fetch all the gradients here 


In [7]:
print(grad_mse_s,grad_s_r,grad_r_abx_,grad_abx_b,grad_abx_ax,grad_ax_a)
print(a,b)

[1.0] [-28.0] [1.0] [1.0] [1.0] [58.0]
0.032479998 139.00056


<img src="https://raw.githubusercontent.com/tensorchiefs/dl_book/master/imgs/ch03_12.pdf.png" width="800" align="left" />  
Compare the TensorFlow results with the results from the book, where we did the forward and the backward pass by hand. The forward pass is shown in blue and the backward pass in red.

In [8]:
with tf.Session() as sess: 
    sess.run(tf.global_variables_initializer()) #Doing the initialization on the concrete realization of the graph
    for i in range(1):
      grads_mse_a_b = sess.run(grads_mse_a_b_)   #fetch the gradients of mse w.r.t a and b  
print(grads_mse_a_b)

[-1624.0, -28.0]


#### Compute the gradients of the mse w.r.t. a and b via the chain rule

In [9]:
#grad_mse_a 
print(grad_mse_s[0]*grad_s_r[0]*grad_r_abx_[0]*grad_abx_ax[0]*grad_ax_a[0])

-1624.0


In [10]:
#grad_mse_b 
print(grad_mse_s[0]*grad_s_r[0]*grad_r_abx_[0]*grad_abx_b[0])

-28.0
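
One way to sanity-check gradients like these independently of TensorFlow is a central finite-difference approximation. The sketch below is not part of the book's procedure. The helper `loss` and the step size `h` are illustrative choices, and the data point and start values are assumed to be the ones used above.

```python
def loss(a, b, x=58.0, y=153.0):
    """Squared error of the linear model a * x + b for a single data point."""
    return (a * x + b - y) ** 2

a0, b0 = 0.0, 139.0
h = 1e-4  # illustrative step size for the finite differences

# Central differences: (f(p + h) - f(p - h)) / (2 * h)
num_grad_a = (loss(a0 + h, b0) - loss(a0 - h, b0)) / (2 * h)
num_grad_b = (loss(a0, b0 + h) - loss(a0, b0 - h)) / (2 * h)
print(num_grad_a, num_grad_b)  # close to -1624 and -28
```

Because the loss is quadratic in both parameters, the central difference agrees with the analytic gradient up to floating-point error.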


#### Update Formula
Verify that we get the same values when we do the update "by hand".

In [11]:
a0=0
b0=139
eta=0.00002
print(a0-eta*grads_mse_a_b[0])
print(b0-eta*grads_mse_a_b[1])

0.03248
139.00056
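
To see what this single step achieves, the loss can be re-evaluated with the updated parameters. This is a minimal sketch in plain Python, assuming the rounded parameter values printed above. The function name `squared_error` is illustrative.

```python
x, y = 58.0, 153.0

def squared_error(a, b):
    """Squared error of the linear model a * x + b on the single data point."""
    return (a * x + b - y) ** 2

loss_before = squared_error(0.0, 139.0)         # start values
loss_after = squared_error(0.03248, 139.00056)  # values after one update step
print(loss_before, loss_after)
```

The loss drops from 196 to roughly 146.8, so the update moves the parameters in a direction that reduces the error for this data point.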
