In [1]:
# relevant imports 
import numpy as np
import tensorflow as tf

- `tf.GradientTape` records the operations of a computation so that TF can compute the gradients we need to optimize a function with respect to one or more parameters
    - the name comes from old cassette tapes: events are recorded as they happen, and the tape can then be played forward or backward
    - so we only write the forward propagation; the tape records the operations in order and works out by itself how to perform backprop (my note: by walking the recorded computation graph backward) and compute the gradients along the way
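To make the "record forward, replay backward" idea concrete, here is a toy sketch in plain Python. It is not how TensorFlow is implemented: the `ToyTape` and `Value` classes are hypothetical names made up for this illustration, and they only support `+` and `*` on scalars.

```python
class ToyTape:
    """Hypothetical toy tape: stores one backward rule per recorded operation."""

    def __init__(self):
        self.backward_rules = []

    def gradient(self, target):
        target.grad = 1.0  # d(target)/d(target)
        # Replay the recorded operations in reverse order (the backward pass).
        for rule in reversed(self.backward_rules):
            rule()


class Value:
    """A scalar that records every + and * it takes part in on a tape."""

    def __init__(self, data, tape):
        self.data, self.grad, self.tape = float(data), 0.0, tape

    def __add__(self, other):
        out = Value(self.data + other.data, self.tape)

        def rule():  # d(a + b)/da = 1, d(a + b)/db = 1
            self.grad += out.grad
            other.grad += out.grad

        self.tape.backward_rules.append(rule)
        return out

    def __mul__(self, other):
        out = Value(self.data * other.data, self.tape)

        def rule():  # d(a * b)/da = b, d(a * b)/db = a
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad

        self.tape.backward_rules.append(rule)
        return out


toy_tape = ToyTape()
w_toy = Value(0.0, toy_tape)
minus_ten, twenty_five = Value(-10.0, toy_tape), Value(25.0, toy_tape)

# We write only the forward pass: cost = w^2 - 10w + 25
toy_cost = w_toy * w_toy + minus_ten * w_toy + twenty_five
toy_tape.gradient(toy_cost)
print(toy_cost.data, w_toy.grad)  # the gradient matches 2w - 10 evaluated at w = 0
```

The point of the sketch is that the gradient rule for each operation is stored while the forward pass runs, so nothing about the derivative has to be written by hand.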

In [2]:
# define the parameters we have 
w = tf.Variable(0, dtype = tf.float32) # we defined a variable with type float and initialized it with a value of 0
# define the optimization algorithm that we will use 
optimizer = tf.keras.optimizers.Adam(learning_rate=0.1)
print(w)

<tf.Variable 'Variable:0' shape=() dtype=float32, numpy=0.0>


In [3]:
# define a single training step
def single_train_loop():
    # open the tape to record the sequence of operations needed to compute the cost (forward prop)
    with tf.GradientTape() as tape:
        # write how the cost is computed from its parameters (the forward prop step) so the tape can record it
        cost = w ** 2 - 10 * w + 25
    # list the trainable variables (the ones whose gradients we want)
    trainable_variables = [w] # this is why w had to be defined as a tf.Variable above
    # compute the gradients from the recorded tape, passing it the cost and the trainable variables
    grads = tape.gradient(cost, trainable_variables)
    # use the optimizer to apply the update to the variables
    optimizer.apply_gradients(zip(grads, trainable_variables)) # zip pairs each gradient with its corresponding variable

In [4]:
a = [1,2,3]
b = ['a','b','c']
for item in zip(a,b):
    print(item)

(1, 'a')
(2, 'b')
(3, 'c')


In [5]:
# let's run one step of the loop and print the w
single_train_loop()
print(w)

<tf.Variable 'Variable:0' shape=() dtype=float32, numpy=0.09999931>


- w moved from 0 to about 0.1 (my note: the gradient tape carried out the backward prop to compute the gradient of the cost with respect to the trainable parameters, and the optimizer used that gradient to update them)
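Why is the first step almost exactly the learning rate? A minimal sketch of the textbook Adam update in plain Python suggests the answer: on the very first step the bias-corrected ratio is roughly the sign of the gradient, so w moves by about `learning_rate`. The function name `adam_first_step` is hypothetical, and the `beta_1`, `beta_2`, and `epsilon` values are assumed to match the commonly documented Keras defaults. TensorFlow's float32 implementation differs in small details, which is presumably why the notebook shows 0.09999931 rather than exactly 0.1.

```python
import math


def adam_first_step(w, grad, learning_rate=0.1, beta_1=0.9, beta_2=0.999, epsilon=1e-7):
    """One textbook Adam update starting from zero moment estimates (step t = 1)."""
    m = (1 - beta_1) * grad        # first-moment estimate
    v = (1 - beta_2) * grad ** 2   # second-moment estimate
    m_hat = m / (1 - beta_1)       # bias correction for t = 1
    v_hat = v / (1 - beta_2)
    return w - learning_rate * m_hat / (math.sqrt(v_hat) + epsilon)


grad_at_zero = 2 * 0.0 - 10        # d/dw (w^2 - 10w + 25) evaluated at w = 0
w_after_one_step = adam_first_step(0.0, grad_at_zero)
print(w_after_one_step)            # close to 0.1, i.e. one learning-rate-sized step
```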

In [6]:
# let's run it for another 1000 steps
num_iterations = 1000
for i in range(num_iterations):
    single_train_loop()
    
print(w)

<tf.Variable 'Variable:0' shape=() dtype=float32, numpy=5.000001>


- w is now very close to 5, which is the minimum of the cost function, since w**2 - 10*w + 25 = (w - 5)**2
    - all we had to do was specify how the cost is computed (the forward prop) inside the gradient tape; TF then carried out the backward prop to compute the gradients, and the optimization algorithm we chose updated the parameters
- that is why in TF we only have to implement the forward prop step
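For contrast, here is a sketch of what we would write without the tape: the derivative has to be worked out by hand. Note that this uses plain gradient descent rather than Adam, purely to keep the example short; the names `cost_gradient` and `w_manual` are made up for this illustration.

```python
def cost(w):
    return w ** 2 - 10 * w + 25


def cost_gradient(w):
    # Derived by hand: d/dw (w^2 - 10w + 25) = 2w - 10
    return 2 * w - 10


w_manual = 0.0
learning_rate = 0.1
for _ in range(1000):
    # Plain gradient descent update (not Adam)
    w_manual -= learning_rate * cost_gradient(w_manual)

print(w_manual, cost(w_manual))  # w_manual ends up very close to 5
```

For this one-line cost the hand-derived gradient is easy, but for a deep network it is not, which is the work the tape takes off our hands.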

### another method 
- here is another syntax that does exactly the same thing as above
- this time the cost function depends not only on the learnable parameters but also on other values (as in practice, where the cost is a function of the learnable parameters, the hyperparameters, and the data)
    - if those values change, the cost function changes too, and a different value of w may end up minimizing it

In [7]:
# define the parameters we have 
w = tf.Variable(0, dtype = tf.float32) # we defined a variable with type float and initialized it with a value of 0
# define the optimization algorithm that we will use 
optimizer = tf.keras.optimizers.Adam(learning_rate=0.1)
# define the fixed coefficients (the "data") that appear in the cost function 
x = np.array([1.0, -10.0, 25.0], dtype = np.float32) # a numpy array, not a tf.Variable, because it is not something we train
print(w)

<tf.Variable 'Variable:0' shape=() dtype=float32, numpy=0.0>


In [8]:
def cost_fun():
    return x[0] * w**2 + x[1] * w + x[2]
    
optimizer.minimize(cost_fun, [w]) # performs one optimization step, equivalent to one call of single_train_loop above
print(w)

<tf.Variable 'Variable:0' shape=() dtype=float32, numpy=0.09999931>


In [11]:
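# let's encapsulate the above in a function 
def training(x,w,optimizer,num_iterations = 1000):
    # the cost is a closure over x (fixed data) and w (the trainable variable)
    def cost_fun():
        return x[0] * w**2 + x[1] * w + x[2]
    # each call to minimize performs one optimization step on w
    for _ in range(num_iterations):
        optimizer.minimize(cost_fun,[w])

    return w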

In [12]:
w = training(x,w,optimizer)
w

<tf.Variable 'Variable:0' shape=() dtype=float32, numpy=5.000001>

it worked; now let's try different values of x and see how they move the minimum of the cost function

In [14]:
w = tf.Variable(0, dtype = tf.float32) # we defined a variable with type float and initialized it with a value of 0
# define the optimization algorithm that we will use 
optimizer = tf.keras.optimizers.Adam(learning_rate=0.1)
# define another variable that exists in the cost function 
x = np.array([5.0, -7.0, 13.0], dtype = np.float32) # we defined it as a numpy array and not a tf variable

In [15]:
w = training(x,w,optimizer)
w

<tf.Variable 'Variable:0' shape=() dtype=float32, numpy=0.7>
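Both results can be cross-checked by hand. For a quadratic x[0]*w**2 + x[1]*w + x[2] with x[0] > 0, setting the derivative 2*x[0]*w + x[1] to zero gives w = -x[1] / (2*x[0]). A small sketch of that check, where `quadratic_minimizer` is a hypothetical helper name:

```python
def quadratic_minimizer(x):
    """Return the w that minimizes x[0]*w**2 + x[1]*w + x[2], assuming x[0] > 0."""
    a, b, _ = x
    if a <= 0:
        raise ValueError("the quadratic has no minimum unless x[0] > 0")
    return -b / (2 * a)


w_first = quadratic_minimizer([1.0, -10.0, 25.0])   # coefficients from the first run
w_second = quadratic_minimizer([5.0, -7.0, 13.0])   # coefficients from the second run
print(w_first, w_second)
```

These agree with the values of 5 and 0.7 that the optimizer converged to.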

- hopefully this gives a sense of what TF can do: all we need to specify is the cost function (or, more generally, the forward prop steps) and the variables we want to optimize it with respect to, and TF figures out the backward prop
- under the hood, TF builds a computation graph from the sequence of forward prop operations we specify