# TensorFlow: Static Graphs

PyTorch autograd looks a lot like TensorFlow: in both frameworks we define a computational graph and use automatic differentiation to compute gradients. The biggest difference between the two is that TensorFlow's computational graphs are *static*, whereas PyTorch uses *dynamic* computational graphs.

In TensorFlow, we define the computational graph once and then execute the same graph over and over again, possibly feeding different input data to the graph. In PyTorch, each forward pass defines a new computational graph.
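The difference between "define once, run many times" and "define on every forward pass" can be sketched with a toy graph in plain Python. This is only an illustration under simplifying assumptions: the `Node` class and `build_graph` function are hypothetical names invented here, and neither framework builds its graphs this way internally.

```python
class Node:
    """One operation in a toy computational graph (illustrative only)."""
    created = 0  # counts how many nodes have been constructed so far

    def __init__(self, op, inputs=()):
        self.op = op
        self.inputs = inputs
        Node.created += 1

    def run(self, feed):
        # An "input" node reads its value from the feed dictionary.
        if self.op == "input":
            return feed["x"]
        a, b = (node.run(feed) for node in self.inputs)
        return a * b if self.op == "mul" else a + b


def build_graph():
    """Build the three-node graph for x * x + x."""
    x = Node("input")
    return Node("add", (Node("mul", (x, x)), x))


data = [1.0, 2.0, 3.0]

# Static style: build the graph once, then feed it different inputs.
Node.created = 0
graph = build_graph()
static_out = [graph.run({"x": v}) for v in data]
static_nodes = Node.created

# Dynamic style: every forward pass builds a fresh graph.
Node.created = 0
dynamic_out = [build_graph().run({"x": v}) for v in data]
dynamic_nodes = Node.created

print(static_out, static_nodes)    # same results, 3 nodes built in total
print(dynamic_out, dynamic_nodes)  # same results, 9 nodes built in total
```

Both styles compute the same values; what changes is how often the graph structure is constructed.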

Static graphs are nice because you can optimize the graph up front; for example, a framework might decide to fuse some graph operations for efficiency, or to come up with a strategy for distributing the graph across many GPUs or many machines. If you are reusing the same graph over and over, then this potentially costly up-front optimization can be amortized as the same graph is rerun over and over.
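To make the idea of fusing operations concrete, here is a minimal sketch in plain Python. It assumes a chain of elementwise operations and is not how TensorFlow performs fusion; the helper names `run_unfused`, `fuse`, and `run_fused` are hypothetical. The unfused version makes one pass over the data per operation and materializes an intermediate list each time, while the fused version composes the operations once and then makes a single pass.

```python
def run_unfused(ops, values):
    """Apply each op in its own pass, creating an intermediate list per op."""
    passes = 0
    for op in ops:
        values = [op(v) for v in values]
        passes += 1
    return values, passes


def fuse(ops):
    """Compose a chain of elementwise ops into one function (done once, up front)."""
    def fused(v):
        for op in ops:
            v = op(v)
        return v
    return fused


def run_fused(fused_op, values):
    """Apply the pre-fused op in a single pass over the data."""
    return [fused_op(v) for v in values], 1


# A small chain: scale, shift, then ReLU.
ops = [lambda v: v * 2.0, lambda v: v + 1.0, lambda v: max(v, 0.0)]
values = [-3.0, 0.5, 2.0]

unfused_out, unfused_passes = run_unfused(ops, values)

fused_op = fuse(ops)  # the up-front cost, paid once and reused on every run
fused_out, fused_passes = run_fused(fused_op, values)

print(unfused_out, unfused_passes)
print(fused_out, fused_passes)
```

The call to `fuse` plays the role of the up-front optimization: its cost is paid once, and every later run of the same chain benefits from it.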

One aspect where static and dynamic graphs differ is control flow. For some models we may wish to perform different computation for each data point; for example, a recurrent network might be unrolled for a different number of time steps for each data point, and this unrolling can be implemented as a loop. With a static graph the loop construct needs to be a part of the graph; for this reason TensorFlow provides operations such as tf.scan for embedding loops into the graph. With dynamic graphs the situation is simpler: since we build graphs on the fly for each example, we can use normal imperative flow control to perform computation that differs for each input.
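The dynamic case can be sketched in plain Python, without either framework. This is a toy illustration, not PyTorch code: the function `recurrent_forward`, the scalar weights `w_in` and `w_rec`, and the recorded list of steps are all assumptions made for the example. The point is that an ordinary `for` loop decides how many steps are recorded, so each input produces a "graph" of a different size.

```python
import math


def recurrent_forward(sequence, w_in=0.5, w_rec=0.9):
    """Unroll a scalar recurrent cell for exactly len(sequence) time steps.

    The weights w_in and w_rec are arbitrary illustrative values.
    Returns the final hidden state and the list of recorded steps.
    """
    h = 0.0
    recorded_steps = []
    for t, x_t in enumerate(sequence):  # plain Python loop; length set by the data
        h = math.tanh(w_in * x_t + w_rec * h)
        recorded_steps.append((t, x_t, h))
    return h, recorded_steps


# Two data points with different numbers of time steps.
short_h, short_steps = recurrent_forward([1.0, -0.5])
long_h, long_steps = recurrent_forward([0.2, 0.4, 0.6, 0.8, 1.0])

print(len(short_steps), len(long_steps))
```

In a static-graph framework the same behavior has to be expressed with a loop operation that lives inside the graph, because the graph is fixed before any data is seen.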

To contrast with the PyTorch autograd example committed previously, here we use TensorFlow to fit a simple two-layer net:

In [9]:
# Note: this example uses the TensorFlow 1.x graph API (tf.placeholder, tf.Session).
import tensorflow as tf
import numpy as np

# First we set up the computational graph

batch_size = 64
input_dimension = 1000
hidden_dimension = 100
output_dimension = 10

# Create placeholders for the input and target data; 
# these will be filled with real data when we execute the graph
x = tf.placeholder(tf.float32, shape=(None, input_dimension))
y = tf.placeholder(tf.float32, shape=(None, output_dimension))

# Create variables for the weights and initialize them with random data.
# A tf.Variable persists its value across executions of the graph
weight1 = tf.Variable(tf.random_normal((input_dimension, hidden_dimension)))
weight2 = tf.Variable(tf.random_normal((hidden_dimension, output_dimension)))

# Forward pass: set up the computational graph that will execute numeric operations
h = tf.matmul(x, weight1)
h_relu = tf.maximum(h, tf.zeros(1))
y_pred = tf.matmul(h_relu, weight2)

# Loss computation using operations on TensorFlow Tensors
loss = tf.reduce_sum((y - y_pred) ** 2.0)

# Compute the gradient of the loss with respect to weight1 and weight2
grad_weight1, grad_weight2 = tf.gradients(loss, [weight1, weight2])

# Update the weights using gradient descent. To actually update the weights we need to
# evaluate new_weight1 and new_weight2 when executing the graph. Note that in TensorFlow
# the act of updating the value of the weights is part of the computational graph;
# in PyTorch this happens outside of the computational graph.
learning_rate = 1e-6
new_weight1 = weight1.assign(weight1 - learning_rate * grad_weight1)
new_weight2 = weight2.assign(weight2 - learning_rate * grad_weight2)

# Now we have built our computational graph, so we enter a TensorFlow session 
# to actually execute the graph
with tf.Session() as sess:
    # Run the graph once to initialize the Variables weight1 and weight2
    sess.run(tf.global_variables_initializer())
    # Create numpy arrays holding the actual data for the inputs x and targets y
    x_value = np.random.randn(batch_size, input_dimension)
    y_value = np.random.randn(batch_size, output_dimension)
    for _ in range(500):
        # Execute the graph many times. Each time it executes we want to bind
        # x_value to x and y_value to y, specified with the feed_dict argument.
        # Each time we execute the graph we want to compute the values for loss,
        # new_weight1, and new_weight2; the values of these Tensors are returned as numpy arrays.
        loss_value, _, _ = sess.run([loss, new_weight1, new_weight2],
                                    feed_dict={x: x_value, y: y_value})
        print(loss_value)


38111696.0
36732130.0
37021190.0
32755602.0
25652180.0
21555562.0
15462438.0
9580216.0
5501326.0
3127626.8
1877278.2
1225871.5
874920.9
671038.8
541408.7
451354.72
384103.06
331183.03
288067.66
252198.92
221976.14
196160.86
174002.75
154844.19
138198.34
123665.11
110922.22
99707.19
89810.64
81056.945
73318.49
66431.35
60287.72
54796.273
49876.383
45459.406
41489.555
37912.203
34683.47
31767.062
29125.812
26731.418
24558.738
22587.277
20792.37
19156.37
17663.379
16299.621
15052.058
13909.186
12861.594
11900.797
11018.814
10209.779
9465.892
8780.506
8149.1284
7566.9746
7029.796
6533.708
6075.248
5651.412
5259.451
4896.5684
4560.5186
4249.5205
3961.3206
3693.97
3445.8787
3215.5454
3001.5747
2802.7188
2618.0635
2446.0996
2286.1157
2137.1987
1998.5311
1869.2654
1748.8064
1636.5515
1531.8401
1434.1288
1342.969
1257.8574
1178.3678
1104.1436
1034.8031
970.09283
909.5482
852.91125
799.932
750.41516
704.0683
660.6957
620.11163
582.10815
546.51276
513.1854
481.95038
452.69458
425.287
399.71954
37