# Up and Running with TensorFlow

TensorFlow is a powerful open source software library for numerical computation, particularry well suited and fine-tuned for large-scale Machine Learning. Its basic principle is simple: you first define in Python a graph of computations to perform, and then TensorFlow takes that graph and runs it efficiently using optimized C++ code.

Most importantly, it is possible to break up the grapth into several chunks and run them in parallel across multiple CPUs or GPUs. TensorFlow also supports distributed computing. 

In [36]:
import tensorflow as tf

def reset_graph(seed=42):
    tf.reset_default_graph()
    tf.set_random_seed(seed)
    np.random.seed(seed)

In [5]:
x = tf.Variable(3, name='x')
y = tf.Variable(4, name='y')
f = x * x * y + y + 2

This code does not actually perform any computation. It just creates a computation graph. To evaluate the graph, we need to open TensorFlow _session_ and use it to initialize the variables and evaluate f. 

In [7]:
print(x)
print(y)
print(f)

<tf.Variable 'x:0' shape=() dtype=int32_ref>
<tf.Variable 'y:0' shape=() dtype=int32_ref>
Tensor("add_1:0", shape=(), dtype=int32)


In [8]:
sess = tf.Session()

sess.run(x.initializer)
sess.run(y.initializer)

result = sess.run(f)

In [9]:
result

42

In [10]:
sess.close()

Better way:

In [11]:
with tf.Session() as sess:
    x.initializer.run()
    y.initializer.run()
    result = f.eval()

In [12]:
result

42

`x.initializer.run()` is equivalent to calling `tf.get_default_session().run(x.initializer)`, and similarly `f.eval()` is equivalent to calling `tf.get_default_session().run(f)`. 

Insteat of manually running the initializer for every single variable, we can use the `global_variables_initializer()` function. 

In [13]:
init = tf.global_variables_initializer()

with tf.Session() as sess:
    init.run() # actually initialize all the variables
    result = f.eval()

In [14]:
result

42

Inside Jupyter or within a Python shell we may prefer to create an `InteractiveSession`. The only difference from a regular `Session` is that when an `InteractiveSession` is created it automatically sets itself as the default session, so we don't need a `with` block (but we need to close the session manually when you are done with it).

In [15]:
sess= tf.InteractiveSession()
init.run()
result = f.eval()
print(result)

42


A TensorFlow program is typically split into two parts: the first part (_construction phase_) builds a computation graph, and the second (_execution phase_) part runs it. The construnction phase typically builds a computation grapth representing the ML model and computations required to train it. The execution phase generally runs a loop that evaluates a training step repeatedly. 

## Managing Graphs

Any node we create is automatically added to the default graph.

In [16]:
x1 = tf.Variable(1)
x1.graph is tf.get_default_graph()

True

We can manage multiple independent graphs by creating a new `Graph` and temporarily making in the default graph inside a `with` block. 

In [20]:
graph = tf.Graph()
with graph.as_default():
    x2 = tf.Variable(2)
    
print(x2.graph is graph)
print(x2.graph is tf.get_default_graph())

True
False


## Lifecycle of a Node Value

When we evaluate a node, TensorFlow automatically determines the set of nodes that it depends on and it evaluates these nodes first.

In [21]:
w = tf.constant(3)
x = w + 2
y = x + 5
z = x * 3

with tf.Session() as sess:
    print(y.eval())
    print(z.eval())

10
15


Fitst, the code defines a very simple graph. Then it starts a session and runs the graph to evaluate y: TensorFlow automatically detects that y depends on w, which depends on x, so it first evaluates w, then x, the y, and returns the value of y. Finally, the code runs the graph to evaluate z. Once again, TensorFlow detects that it must first evaluate w and x. Note that it will not reuse the result of previous evaluation of w and x. The precending code evaluates w and x twice. 

If we want to evaluate y and z efficiently, without evaluationg w and x twice as in the previous code, we must ask TensorFlow to evaluate both y and z in just one graph run.

In [22]:
with tf.Session() as sess:
    y_val, z_val = sess.run([y, z])
    print(y_val)
    print(z_val)

10
15


## Linear Regression with TensorFlow

TensorFlow operations can take any number of inputs and produce any number of outputs. Constants and variables take no input. The inputs and outputs are muultidimentional array, called _tensors_.

In [24]:
import numpy as np
from sklearn.datasets import fetch_california_housing

housing = fetch_california_housing()
m, n = housing.data.shape
housing_data_plus_bias = np.c_[np.ones((m, 1)), housing.data]

X = tf.constant(housing_data_plus_bias, dtype=tf.float32, name='X')
y = tf.constant(housing.target.reshape(-1, 1), dtype=tf.float32, name='y')
XT = tf.transpose(X)
theta = tf.matmul(tf.matmul(tf.matrix_inverse(tf.matmul(XT, X)), XT), y)

with tf.Session() as sess:
    theta_value = theta.eval()

In [25]:
theta_value

array([[-3.7185181e+01],
       [ 4.3633747e-01],
       [ 9.3952334e-03],
       [-1.0711310e-01],
       [ 6.4479220e-01],
       [-4.0338000e-06],
       [-3.7813708e-03],
       [-4.2348403e-01],
       [-4.3721911e-01]], dtype=float32)

The main benefit of this code versus computing the Normal Equation directly using NumPy is that TensorFlow will automatically run this on your GPU card if you have one. 

## Implementing Gradient Descent

We will use autodff feature feature to let TensorFlow compute the gradient automatically, and we will use a couple of TensorFlow's out-of-the-box optimizer. 

### Manually Computing the Gradients 

* The `random_uniform()` function creates a node in the graph that will generate a tensor containing random values, given its shape and value range, much like NumPy's `rand()` function
* The `assign()` function creates a node that will assign a new value to a variable. 
* The main loop executes the training step over and over again (`n_epochs` times) and every 100 iterations it prints out the current Mean Squared Error. 

In [31]:
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
scaled_housing_data = scaler.fit_transform(housing.data)
scaled_housing_data_plus_bias = np.c_[np.ones((m, 1)), scaled_housing_data]

In [35]:
n_epochs = 1000
learning_rate = 0.01

X = tf.constant(scaled_housing_data_plus_bias, dtype=tf.float32, name='X')
y = tf.constant(housing.target.reshape(-1, 1), dtype=tf.float32, name='y')
theta = tf.Variable(tf.random_uniform([n + 1, 1], -1.0, 1.0), name='theta')
y_pred = tf.matmul(X, theta, name='prediction')
error = y_pred - y
mse = tf.reduce_mean(tf.square(error), name='mse')
gradients = 2/m * tf.matmul(tf.transpose(X), error)
training_op = tf.assign(theta, theta - learning_rate * gradients)

init = tf.global_variables_initializer()

with tf.Session() as sess:
    sess.run(init)
    
    for epoch in range(n_epochs):
        if epoch % 100 == 0:
            print('Epoch ', epoch, 'MSE = ', mse.eval())
        sess.run(training_op)
        
    best_theta = theta.eval()

Epoch  0 MSE =  7.1134706
Epoch  100 MSE =  0.73504937
Epoch  200 MSE =  0.62731713
Epoch  300 MSE =  0.59963655
Epoch  400 MSE =  0.5802501
Epoch  500 MSE =  0.56601477
Epoch  600 MSE =  0.5555218
Epoch  700 MSE =  0.54776496
Epoch  800 MSE =  0.5420134
Epoch  900 MSE =  0.53773445


### Using autodiff

It can automatically and efficiently compute the gradients. The `gradients()` function takes an op and a list of variables, and it creates a list of ops to compute the gradients of the op with regards to each variable. 

### Using an Optimizer

In [46]:
reset_graph()

n_epochs = 1000
learning_rate = 0.01

X = tf.constant(scaled_housing_data_plus_bias, dtype=tf.float32, name='X')
y = tf.constant(housing.target.reshape(-1, 1), dtype=tf.float32, name='y')
theta = tf.Variable(tf.random_uniform([n + 1, 1], -1.0, 1.0, seed=42), dtype=tf.float32, name='theta')
y_pred = tf.matmul(X, theta, name='prediction')
error = y - y_pred
mse = tf.reduce_mean(tf.square(error), name='mse')
gradients = tf.gradients(mse, [theta])[0]

optimizer = tf.train.MomentumOptimizer(learning_rate=learning_rate, momentum=0.9)
training_op = optimizer.minimize(mse)

init = tf.global_variables_initializer()

with tf.Session() as sess:
    sess.run(init)
        
    for epoch in range(n_epochs):
        sess.run(training_op)
            
    best_theta = theta.eval()
    print(best_theta)

[[ 2.068558  ]
 [ 0.8296286 ]
 [ 0.11875337]
 [-0.26554456]
 [ 0.3057109 ]
 [-0.00450251]
 [-0.03932662]
 [-0.89986444]
 [-0.87052065]]


## Feeding Data to the Training Algorithm

Let's try to modify the previous code to implement Mini-batch Gradient Descent. For this, we need a way to replace X and y at every iteration with the next mini-batch. The simplest way to do this is to use placeholder nodes. These nodes are special because they don't actually perform any computation, they just output the data you tell them to output at runtime. They are typically used to pass the training data to TensorFlow during training. If we don't specify a value at runtime for a placeholder, we get an exception. 

In [48]:
A = tf.placeholder(tf.float32, shape=(None, 3)) # if we specify None for a dimension, it means "any size"
B = A + 5

with tf.Session() as sess:
    B_val_1 = B.eval(feed_dict={A: [[1, 2, 3]]})
    B_val_2 = B.eval(feed_dict={A: [[4, 5, 6], [7, 8, 9]]})

In [50]:
print(B_val_1)

[[6. 7. 8.]]


In [51]:
print(B_val_2)

[[ 9. 10. 11.]
 [12. 13. 14.]]


In [52]:
X = tf.placeholder(tf.float32, shape=(None, n + 1), name='X')
y = tf.placeholder(tf.float32, shape=(None, 1), name='y')

batch_size = 100
n_batches = int(np.ceil(m / batch_size))

def fetch_batch(epoch, batch_index, batch_size):
    np.random.seed(epoch * n_batches + batch_index)
    indices = np.random.randint(m, size=batch_size)
    X_batch = housing_data_plus_bias[indices]
    y_batch = housing.target.reshape(-1, 1)[indices]
    return X_batch, y_batch

In [63]:
%%time
with tf.Session() as sess:
    sess.run(init)
    
    for epoch in range(n_epochs):
        for batch_index in range(n_batches):
            X_batch, y_batch = fetch_batch(epoch, batch_index, batch_size)
            sess.run(training_op, feed_dict={X: X_batch, y: y_batch})
    
    best_theta = theta.eval()

CPU times: user 2min 3s, sys: 8.99 s, total: 2min 12s
Wall time: 1min 50s


In [64]:
print(best_theta)

[[ 2.068558  ]
 [ 0.82962054]
 [ 0.11875187]
 [-0.26552895]
 [ 0.30569792]
 [-0.00450293]
 [-0.03932633]
 [-0.8998828 ]
 [-0.8705383 ]]


## Saving and Restoring Models

TensorFlow makes saving and restoring a model very easy. Just create a `Saver` node at the end of the construction phase; then, in the execution phase, just call its `save()` method whenever you want to save the model. 

In [67]:
import os

os.mkdir('models')

In [72]:
reset_graph()

n_epochs = 1000                                                                       # not shown in the book
learning_rate = 0.01                                                                  # not shown

X = tf.constant(scaled_housing_data_plus_bias, dtype=tf.float32, name="X")            # not shown
y = tf.constant(housing.target.reshape(-1, 1), dtype=tf.float32, name="y")            # not shown
theta = tf.Variable(tf.random_uniform([n + 1, 1], -1.0, 1.0, seed=42), name="theta")
y_pred = tf.matmul(X, theta, name="predictions")                                      # not shown
error = y_pred - y                                                                    # not shown
mse = tf.reduce_mean(tf.square(error), name="mse")                                    # not shown
optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate)            # not shown
training_op = optimizer.minimize(mse)                                                 # not shown

init = tf.global_variables_initializer()
saver = tf.train.Saver()

with tf.Session() as sess:
    sess.run(init)

    for epoch in range(n_epochs):
        if epoch % 100 == 0:
            print("Epoch", epoch, "MSE =", mse.eval())                                # not shown
            save_path = saver.save(sess, "models/tmp/my_model.ckpt")
        sess.run(training_op)
    
    best_theta = theta.eval()
    save_path = saver.save(sess, "models/tmp/my_model_final.ckpt")

Epoch 0 MSE = 9.161543
Epoch 100 MSE = 0.7145006
Epoch 200 MSE = 0.566705
Epoch 300 MSE = 0.5555719
Epoch 400 MSE = 0.5488112
Epoch 500 MSE = 0.5436362
Epoch 600 MSE = 0.5396294
Epoch 700 MSE = 0.5365092
Epoch 800 MSE = 0.5340678
Epoch 900 MSE = 0.5321474


In [69]:
with tf.Session() as sess:
    saver.restore(sess, 'models/tmp/my_model_final.ckpt')

INFO:tensorflow:Restoring parameters from models/tmp/my_model_final.ckpt


## Visualizing the Graph and Training Curves Using TensorBoard

In [70]:
from datetime import datetime

now = datetime.utcnow().strftime('%Y%m%d%H%M%S')
root_logdir = 'tf_logs'
logdir = '{}/run-{}/'.format(root_logdir, now)

In [71]:
mse_summary = tf.summary.scalar('MSE', mse)
file_writer = tf.summary.FileWriter(logdir, tf.get_default_graph())

The first line creates a node in the graph that will evaluate the MSE value and write it to a TensorBoard-compatible binary log string called a summary. The second line creates a `FileWriter` that we will use to write summaries to logfiles in the log directory.

In [73]:
n_epochs = 1000
learning_rate = 0.01

X = tf.placeholder(tf.float32, shape=(None, n + 1), name="X")
y = tf.placeholder(tf.float32, shape=(None, 1), name="y")
theta = tf.Variable(tf.random_uniform([n + 1, 1], -1.0, 1.0, seed=42), name="theta")
y_pred = tf.matmul(X, theta, name="predictions")
error = y_pred - y
mse = tf.reduce_mean(tf.square(error), name="mse")
optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate)
training_op = optimizer.minimize(mse)

init = tf.global_variables_initializer()

In [74]:
mse_summary = tf.summary.scalar('MSE', mse)
file_writer = tf.summary.FileWriter(logdir, tf.get_default_graph())

In [75]:
n_epochs = 10
batch_size = 100
n_batches = int(np.ceil(m / batch_size))

In [76]:
with tf.Session() as sess:
    sess.run(init)
    
    for epoch in range(n_epochs):
        for batch in range(n_batches):
            X_batch, y_batch = fetch_batch(epoch, batch_index, batch_size)
            if batch_index % 10 == 0:
                summary_str = mse_summary.eval(feed_dict={X: X_batch, y: y_batch})
                step = epoch * n_batches + batch_index
                file_writer.add_summary(summary_str, step)
            sess.run(training_op, feed_dict={X: X_batch, y: y_batch})
    best_theta = theta.eval()

In [77]:
file_writer.close()