# TensorFlow Automatic Differentiation 

TensorFlow provides functions that compute derivatives for a given TensorFlow computation graph by adding operations to the graph. The optimizer classes compute derivatives on your graph automatically, but authors of new optimizers and expert users can call the lower-level functions below.

In [1]:
from datetime import datetime
import tensorflow as tf

### Gradient Computation

To compute the gradient of a function with respect to a variable, you have to define both. You also have to specify the value at which the gradient is evaluated.

<code>GradientTape</code> records operations for automatic differentiation.

In [2]:
x = tf.Variable(5.0)

#compute gradient of y=x**2
with tf.GradientTape() as tape:
    y = x * x

grad = tape.gradient(y, x)
print(grad)

tf.Tensor(10.0, shape=(), dtype=float32)


In [3]:
x = tf.Variable([[5.0, 10, 15.23232]])

#compute gradient of y=x**3
with tf.GradientTape() as tape:
    y = x * x * x

grad = tape.gradient(y, x)
print(grad)

tf.Tensor([[ 75.      300.      696.07074]], shape=(1, 3), dtype=float32)


In [4]:
x = tf.Variable(3.0)

#compute gradient of y=x**2+x-1 with respect to x at 3
with tf.GradientTape() as tape:
    y = x**2 + x - 1

grad = tape.gradient(y, x)
print(grad) 

tf.Tensor(7.0, shape=(), dtype=float32)
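
The cells above each differentiate with respect to a single `tf.Variable`. As a minimal sketch of two common variations, the example below asks one tape for gradients with respect to several sources at once, and uses `tape.watch` so that a constant tensor is tracked as well. The names `w`, `b`, and `c` and the function being differentiated are illustrative assumptions, not part of the original notebook.

```python
import tensorflow as tf

w = tf.Variable(2.0)
b = tf.Variable(1.0)
c = tf.constant(3.0)  # constants are not watched automatically

with tf.GradientTape() as tape:
    tape.watch(c)          # explicitly ask the tape to track the constant
    y = w * c + b * b      # y = w*c + b**2

# Passing a list of sources returns a list of gradients in the same order.
dy_dw, dy_db, dy_dc = tape.gradient(y, [w, b, c])
print(dy_dw.numpy(), dy_db.numpy(), dy_dc.numpy())
```

Analytically, dy/dw = c, dy/db = 2b, and dy/dc = w, so the three values can be checked by hand against the chosen inputs.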


### Optimization

TensorFlow uses reverse-mode automatic differentiation for its gradient operations, and the finite difference method in tests that check the validity of those gradients. [Reverse mode automatic differentiation](https://rufflewind.com/2016-12-30/reverse-mode-automatic-differentiation) uses an extension of the forward mode computational graph to enable the computation of a gradient by a reverse traversal of the graph.
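
To make the two ideas in the paragraph above concrete, here is a toy pure-Python sketch. It is not TensorFlow's implementation: `Var` and `backward` are hypothetical helpers that record a small scalar graph during the forward pass and then accumulate gradients by traversing it in reverse. The result is compared against a central finite difference, in the spirit of a gradient check.

```python
class Var:
    """Scalar graph node (hypothetical helper, not a TensorFlow class)."""

    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents  # pairs of (parent_node, local_derivative)
        self.grad = 0.0

    def __add__(self, other):
        other = other if isinstance(other, Var) else Var(other)
        return Var(self.value + other.value, ((self, 1.0), (other, 1.0)))

    def __sub__(self, other):
        other = other if isinstance(other, Var) else Var(other)
        return Var(self.value - other.value, ((self, 1.0), (other, -1.0)))

    def __mul__(self, other):
        other = other if isinstance(other, Var) else Var(other)
        return Var(self.value * other.value,
                   ((self, other.value), (other, self.value)))


def backward(output):
    """Accumulate d(output)/d(node) into node.grad by a reverse traversal."""
    order, seen = [], set()

    def visit(node):
        # Depth-first post-order: every node is appended after its parents.
        if id(node) in seen:
            return
        seen.add(id(node))
        for parent, _ in node.parents:
            visit(parent)
        order.append(node)

    visit(output)
    output.grad = 1.0
    # Reversed post-order visits each node before any of its parents,
    # so node.grad is complete by the time it is propagated further.
    for node in reversed(order):
        for parent, local in node.parents:
            parent.grad += node.grad * local  # chain rule


def f(x):
    return x * x + x - 1  # works for both Var and plain floats


x = Var(3.0)
y = f(x)
backward(y)

# Central finite difference at the same point, used as an independent check.
h = 1e-4
numeric = (f(3.0 + h) - f(3.0 - h)) / (2 * h)
print(x.grad, numeric)
```

The reverse pass touches each edge of the graph once, which is why this mode is attractive when a scalar output depends on many inputs.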

Minimize the following: $\min_x \; x^2 + x - 1$

$\frac{d}{dx} (x^2 + x - 1) = 2x + 1$
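
For the objective that the code cell in this section actually minimizes, $f(x) = x^2 + x - 1$, gradient descent with learning rate $\eta$ repeats the update $x \leftarrow x - \eta\,(2x + 1)$. The pure-Python sketch below applies that update by hand, assuming the same starting point (3.0) and learning rate (0.01) as the TensorFlow cell. The names `f`, `df`, and `history` are illustrative.

```python
def f(x):
    return x**2 + x - 1


def df(x):
    # Analytic derivative of f
    return 2 * x + 1


learning_rate = 0.01
x = 3.0
history = []

for step in range(5):
    history.append((x, f(x)))        # record the state before the update
    x = x - learning_rate * df(x)    # one gradient descent step

for step, (x_val, f_val) in enumerate(history):
    print("Step :", step, [x_val, f_val])
```

The first step moves x from 3.0 to 3.0 - 0.01 * 7 = 2.93, which agrees with the logged TensorFlow run up to float32 rounding. Since the derivative vanishes at x = -0.5, the iterates approach that point, where f equals -1.25.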

In [5]:
x

<tf.Variable 'Variable:0' shape=() dtype=float32, numpy=3.0>

In [3]:
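x = tf.Variable(3.0, trainable=True)

epochs = 100

# Objective to minimize: f(x) = x**2 + x - 1
@tf.function
def f_x():
    return x**2 + x - 1

# Create the optimizer once (learning rate 0.01) and reuse it on every step.
# tf.compat.v1.train is the public alias for the TF1-style optimizers.
opt = tf.compat.v1.train.GradientDescentOptimizer(0.01)

for epoch in range(epochs):
    print("Epoch :", epoch, [x.numpy(), f_x().numpy()])
    opt.minimize(f_x)  # one gradient descent step on x
    #tf.summary.scalar('loss', f_x().numpy(), step=epoch)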


Epoch : 0 [3.0, 11.0]
Epoch : 1 [2.93, 10.514901]
Epoch : 2 [2.8614001, 10.049011]
Epoch : 3 [2.794172, 9.601569]
Epoch : 4 [2.7282887, 9.171848]
Epoch : 5 [2.663723, 8.759144]
Epoch : 6 [2.6004486, 8.362782]
Epoch : 7 [2.5384398, 7.982116]
Epoch : 8 [2.477671, 7.6165237]
Epoch : 9 [2.4181175, 7.26541]
Epoch : 10 [2.3597553, 6.9282007]
Epoch : 11 [2.30256, 6.6043434]
Epoch : 12 [2.2465088, 6.293311]
Epoch : 13 [2.1915786, 5.9945955]
Epoch : 14 [2.137747, 5.7077093]
Epoch : 15 [2.0849922, 5.432184]
Epoch : 16 [2.0332923, 5.1675696]
Epoch : 17 [1.9826264, 4.913434]
Epoch : 18 [1.9329739, 4.669362]
Epoch : 19 [1.8843144, 4.434955]
Epoch : 20 [1.8366281, 4.2098308]
Epoch : 21 [1.7898955, 3.9936216]
Epoch : 22 [1.7440976, 3.785974]
Epoch : 23 [1.6992157, 3.5865495]
Epoch : 24 [1.6552314, 3.3950224]
Epoch : 25 [1.6121267, 3.2110791]
Epoch : 26 [1.5698842, 3.0344205]
Epoch : 27 [1.5284865, 2.8647575]
Epoch : 28 [1.4879167, 2.7018127]
Epoch : 29 [1.4481584, 2.545321]
Epoch : 30 [1.4091952, 2.3