# Chapter 9 - Up and Running with TensorFlow

## Basics

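TensorFlow computations are defined in terms of graphs. The following code does not actually evaluate the expression assigned to `f`; rather, it defines a graph that can perform the computation later. (This notebook uses the TensorFlow 1.x API. In TensorFlow 2 these calls are available under `tf.compat.v1` once eager execution is disabled with `tf.compat.v1.disable_eager_execution()`.)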

```python
import tensorflow as tf

x = tf.Variable(3, name="x")
y = tf.Variable(4, name="y")
f = x*x*y + y + 2
```

To actually evaluate that expression, we need to create a session:

```python
sess = tf.Session()
```

The variables `x` and `y` do not have values yet either. So far we have only created graph nodes that specify their initial values. Before we can run the graph, we need to initialize them:

```python
sess.run(x.initializer)
sess.run(y.initializer)
```

Finally, we can evaluate the graph:

```python
result = sess.run(f)
sess.close()
result
```

```
42
```
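To make the split between *defining* and *running* a graph concrete, here is a pure-Python toy that mimics it. The classes `Node`, `Const`, `Var`, `Add`, `Mul` and the helper `as_node` are hypothetical and are not TensorFlow's implementation. Operator overloading builds nodes instead of computing numbers. Variable values live in a separate "session" dict, so reading a variable before initializing it fails, as it does in TensorFlow.

```python
# Toy model of deferred evaluation. Node, Const, Var, Add, Mul and as_node are
# hypothetical names for this sketch, not TensorFlow classes.

class Node:
    # Operator overloading builds new graph nodes instead of computing values.
    def __add__(self, other):
        return Add(self, as_node(other))

    def __radd__(self, other):
        return Add(as_node(other), self)

    def __mul__(self, other):
        return Mul(self, as_node(other))

    def __rmul__(self, other):
        return Mul(as_node(other), self)


class Const(Node):
    def __init__(self, value):
        self.value = value

    def evaluate(self, session_state):
        return self.value


class Var(Node):
    # The node only stores the initial value; the current value lives in the
    # session state, so it must be initialized before it can be read.
    def __init__(self, initial_value, name):
        self.initial_value = initial_value
        self.name = name

    def initialize(self, session_state):
        session_state[self.name] = self.initial_value

    def evaluate(self, session_state):
        if self.name not in session_state:
            raise RuntimeError(f"variable {self.name!r} is not initialized")
        return session_state[self.name]


class Add(Node):
    def __init__(self, a, b):
        self.a, self.b = a, b

    def evaluate(self, session_state):
        return self.a.evaluate(session_state) + self.b.evaluate(session_state)


class Mul(Node):
    def __init__(self, a, b):
        self.a, self.b = a, b

    def evaluate(self, session_state):
        return self.a.evaluate(session_state) * self.b.evaluate(session_state)


def as_node(value):
    return value if isinstance(value, Node) else Const(value)


x_node = Var(3, "x")
y_node = Var(4, "y")
f_node = x_node * x_node * y_node + y_node + 2   # builds a graph; computes nothing

state = {}                      # stands in for a session
x_node.initialize(state)
y_node.initialize(state)
print(f_node.evaluate(state))   # 42
```

The toy uses distinct names (`x_node`, `f_node`, ...) so that running it in this notebook does not overwrite the TensorFlow `x`, `y` and `f` used by the next cells.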

### Shortcuts

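Opening a session in a `with` block makes it the *default session* until the block ends, so we no longer need to pass `sess` to every call. We can call `run()` directly on an operation such as an initializer, and `eval()` on a tensor such as `f`. The session is also closed automatically when the block exits: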

```python
with tf.Session() as sess:
    x.initializer.run()
    y.initializer.run()
    result = f.eval()

result
```

```
42
```

Finally, we can save ourselves some effort by creating a single node that initializes all global variables, rather than running each variable's initializer separately:

```python
init = tf.global_variables_initializer()

with tf.Session() as sess:
    init.run()
    result = f.eval()

result
```

```
42
```

### Graph management

Newly created nodes are automatically added to the default graph:

```python
x1 = tf.Variable(1)
x1.graph is tf.get_default_graph()
```

```
True
```

TensorFlow also supports multiple graphs, and we can temporarily make a different graph the default inside a `with` block:

```python
graph = tf.Graph()
with graph.as_default():
    x2 = tf.Variable(2)

x2.graph is graph
```

```
True
```

```python
x2.graph is tf.get_default_graph()
```

```
False
```

### Node evaluation

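Each time a node is evaluated, TensorFlow evaluates all of the nodes it depends on, and it does not reuse node values from previous graph runs. Variable values are the exception: they persist across runs within a session. In the following example, evaluating `y` and `z` in separate calls causes `w` and `x` to be evaluated twice: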

```python
w = tf.constant(3)
x = w + 2
y = x + 5
z = x * 3

with tf.Session() as sess:
    print(y.eval())
    print(z.eval())
```

```
10
15
```


We can avoid the redundant computation by passing a list of nodes to `sess.run()`, which evaluates all of them in a single graph run:

```python
with tf.Session() as sess:
    y_val, z_val = sess.run([y, z])
    print(y_val)
    print(z_val)
```

```
10
15
```
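To make the difference concrete, here is a pure-Python toy that counts how often the shared nodes `w` and `x` are computed. The `ToyOp` class and `run_fetches` function are hypothetical, not TensorFlow code. Each call to `run_fetches` starts with a fresh cache, mimicking the fact that node values are not kept between graph runs. Within one run, each node is computed at most once.

```python
# Toy model of graph runs; ToyOp and run_fetches are hypothetical names.
calls = {"w": 0, "x": 0}   # how many times the shared nodes were computed

class ToyOp:
    def __init__(self, name, fn, *inputs):
        self.name, self.fn, self.inputs = name, fn, inputs

    def evaluate(self, cache):
        # Within one run, a node is computed at most once and then reused.
        if self not in cache:
            args = [node.evaluate(cache) for node in self.inputs]
            if self.name in calls:
                calls[self.name] += 1
            cache[self] = self.fn(*args)
        return cache[self]

def run_fetches(fetches):
    cache = {}   # fresh for every run: nothing is kept between runs
    return [node.evaluate(cache) for node in fetches]

w_op = ToyOp("w", lambda: 3)
x_op = ToyOp("x", lambda w: w + 2, w_op)
y_op = ToyOp("y", lambda x: x + 5, x_op)
z_op = ToyOp("z", lambda x: x * 3, x_op)

print(run_fetches([y_op]), run_fetches([z_op]), calls)
# [10] [15] {'w': 2, 'x': 2}

calls.update(w=0, x=0)
print(run_fetches([y_op, z_op]), calls)
# [10, 15] {'w': 1, 'x': 1}
```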


## Linear Regression

For this example, we'll use the California housing dataset, which we can fetch using scikit-learn:

```python
import numpy as np
from sklearn.datasets import fetch_california_housing

housing = fetch_california_housing()
m, n = housing.data.shape
m, n
```

```
(20640, 8)
```

Add a column of ones for the bias term:

```python
housing_data_w_bias = np.c_[np.ones((m, 1)), housing.data]
housing_data_w_bias.shape
```

```
(20640, 9)
```

Reshape the target values into a column vector:

```python
target = housing.target.reshape(-1, 1)
target.shape
```

```
(20640, 1)
```

### Normal equation
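The cell below computes the closed-form least-squares solution, known as the normal equation. Here $\mathbf{X}$ is the $m \times (n + 1)$ input matrix including the bias column, and $\mathbf{y}$ is the $m \times 1$ target vector:

$$
\hat{\boldsymbol{\theta}} = \left(\mathbf{X}^\top \mathbf{X}\right)^{-1} \mathbf{X}^\top \mathbf{y}
$$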

```python
# Prepare inputs
X = tf.constant(housing_data_w_bias, dtype=tf.float32, name="X")
y = tf.constant(target, dtype=tf.float32, name="y")
XT = tf.transpose(X)

# Define normal equation
theta = tf.matmul(tf.matmul(tf.matrix_inverse(tf.matmul(XT, X)), XT), y)

# Evaluate
with tf.Session() as sess:
    theta_value = theta.eval()

theta_value
```

```
array([[-3.7465141e+01],
       [ 4.3573415e-01],
       [ 9.3382923e-03],
       [-1.0662201e-01],
       [ 6.4410698e-01],
       [-4.2513184e-06],
       [-3.7732250e-03],
       [-4.2664889e-01],
       [-4.4051403e-01]], dtype=float32)
```
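As a cross-check outside TensorFlow, here is a minimal NumPy sketch of the same computation on synthetic data. The data-generating coefficients are assumptions chosen for illustration. It also shows `np.linalg.solve`, which solves $(\mathbf{X}^\top\mathbf{X})\boldsymbol{\theta} = \mathbf{X}^\top\mathbf{y}$ directly. That is generally more numerically stable than forming the explicit inverse.

```python
import numpy as np

# Synthetic data (an assumption for this sketch): y = 4 + 3*x1 - 2*x2 + noise
rng = np.random.default_rng(42)
n_samples = 200
X_raw = rng.normal(size=(n_samples, 2))
y_demo = (4 + X_raw @ np.array([3.0, -2.0])
          + rng.normal(scale=0.1, size=n_samples)).reshape(-1, 1)

X_demo = np.c_[np.ones((n_samples, 1)), X_raw]   # prepend the bias column

# Normal equation written literally, as in the TensorFlow graph above
theta_inv = np.linalg.inv(X_demo.T @ X_demo) @ X_demo.T @ y_demo

# Usually preferable numerically: solve (X^T X) theta = X^T y directly
theta_solve = np.linalg.solve(X_demo.T @ X_demo, X_demo.T @ y_demo)

# Reference solution from NumPy's least-squares routine
theta_lstsq, *_ = np.linalg.lstsq(X_demo, y_demo, rcond=None)

print(theta_solve.ravel())   # approximately [4, 3, -2]
```

The sketch deliberately avoids the names `m`, `n`, `X` and `y`, so running it here does not clobber the variables used by the gradient descent cells below.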

### Gradient descent

Recall that gradient descent converges much faster when all features have a similar scale. So we first center the features and scale them to unit variance, and only then add the bias column:

```python
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
scaled_housing_data = scaler.fit_transform(housing.data)

scaled_housing_data_w_bias = np.c_[np.ones((m, 1)), scaled_housing_data]
scaled_housing_data_w_bias.shape
```

```
(20640, 9)
```
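For reference, here is a minimal NumPy sketch of the standardization step on made-up data (not the housing data). With its default settings, `StandardScaler` subtracts each column's mean and divides by its standard deviation computed with `ddof=0`. The bias column is prepended only afterwards, because centering a constant column of ones would turn it into zeros.

```python
import numpy as np

# Hypothetical features on very different scales (not the housing data)
rng = np.random.default_rng(0)
features = np.c_[rng.normal(50.0, 10.0, size=500),
                 rng.normal(0.001, 0.0002, size=500)]

# Center each column and divide by its population standard deviation (ddof=0),
# which is what StandardScaler does with its default settings
col_mean = features.mean(axis=0)
col_std = features.std(axis=0)
scaled = (features - col_mean) / col_std

# The bias column is added after scaling, so it stays equal to 1
scaled_w_bias = np.c_[np.ones((len(scaled), 1)), scaled]

print(scaled.mean(axis=0), scaled.std(axis=0))   # means ~0, stds ~1
```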

Now we can fit the model with batch gradient descent in TensorFlow, computing the gradient of the MSE manually:

```python
def reset_graph(seed=42):
    # Start from a fresh default graph and make results reproducible
    tf.reset_default_graph()
    tf.set_random_seed(seed)
    np.random.seed(seed)

reset_graph()

n_epochs = 1000
learning_rate = 0.01

X = tf.constant(scaled_housing_data_w_bias, dtype=tf.float32, name="X")
y = tf.constant(target, dtype=tf.float32, name="y")
theta = tf.Variable(tf.random_uniform([n + 1, 1], -1.0, 1.0, seed=42), name="theta")
y_pred = tf.matmul(X, theta, name="predictions")
error = y_pred - y
mse = tf.reduce_mean(tf.square(error), name="mse")
# Gradient of the MSE with respect to theta: (2/m) * X^T (X theta - y)
gradients = 2/m * tf.matmul(tf.transpose(X), error)
# One gradient descent step: theta <- theta - learning_rate * gradients
training_op = tf.assign(theta, theta - learning_rate * gradients)

init = tf.global_variables_initializer()

with tf.Session() as sess:
    sess.run(init)

    for epoch in range(n_epochs):
        if epoch % 100 == 0:
            print("Epoch", epoch, "MSE =", mse.eval())
        sess.run(training_op)

    best_theta = theta.eval()
```

```
Epoch 0 MSE = 9.161542
Epoch 100 MSE = 0.7145004
Epoch 200 MSE = 0.56670487
Epoch 300 MSE = 0.5555718
Epoch 400 MSE = 0.5488112
Epoch 500 MSE = 0.5436363
Epoch 600 MSE = 0.5396291
Epoch 700 MSE = 0.5365092
Epoch 800 MSE = 0.53406775
Epoch 900 MSE = 0.5321473
```
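The `gradients` and `training_op` nodes in the cell above implement the gradient of the MSE cost function and one batch gradient descent step with learning rate $\eta$ (`learning_rate`). Here `error` corresponds to $\mathbf{X}\boldsymbol{\theta} - \mathbf{y}$:

$$
\begin{aligned}
\operatorname{MSE}(\boldsymbol{\theta}) &= \frac{1}{m}\left\lVert \mathbf{X}\boldsymbol{\theta} - \mathbf{y} \right\rVert^2, \\
\nabla_{\boldsymbol{\theta}} \operatorname{MSE}(\boldsymbol{\theta}) &= \frac{2}{m}\, \mathbf{X}^\top \left(\mathbf{X}\boldsymbol{\theta} - \mathbf{y}\right), \\
\boldsymbol{\theta} &\leftarrow \boldsymbol{\theta} - \eta\, \nabla_{\boldsymbol{\theta}} \operatorname{MSE}(\boldsymbol{\theta}).
\end{aligned}
$$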


```python
best_theta
```

```
array([[ 2.0685523 ],
       [ 0.8874027 ],
       [ 0.14401656],
       [-0.34770882],
       [ 0.36178368],
       [ 0.00393811],
       [-0.04269556],
       [-0.6614529 ],
       [-0.6375279 ]], dtype=float32)
```
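A quick way to check a hand-derived gradient like the one above is to compare it with central finite differences. The following NumPy sketch does this on a small random problem. The function names (`mse_np`, `mse_gradient_np`, `numerical_gradient`) and the data are assumptions for illustration, not part of TensorFlow.

```python
import numpy as np

def mse_np(theta_vec, X_mat, y_vec):
    error = X_mat @ theta_vec - y_vec
    return np.mean(error ** 2)

def mse_gradient_np(theta_vec, X_mat, y_vec):
    # Same formula as the manual `gradients` node: (2/m) X^T (X theta - y)
    n_rows = len(X_mat)
    return 2 / n_rows * X_mat.T @ (X_mat @ theta_vec - y_vec)

def numerical_gradient(func, theta_vec, eps=1e-6):
    # Central differences, perturbing one parameter at a time
    grad = np.zeros_like(theta_vec)
    for i in range(theta_vec.size):
        step = np.zeros_like(theta_vec)
        step.flat[i] = eps
        grad.flat[i] = (func(theta_vec + step) - func(theta_vec - step)) / (2 * eps)
    return grad

# Small random problem: 50 rows, a bias column plus 3 features
rng = np.random.default_rng(1)
X_demo = np.c_[np.ones((50, 1)), rng.normal(size=(50, 3))]
y_demo = rng.normal(size=(50, 1))
theta_demo = rng.uniform(-1.0, 1.0, size=(4, 1))

analytic = mse_gradient_np(theta_demo, X_demo, y_demo)
numeric = numerical_gradient(lambda t: mse_np(t, X_demo, y_demo), theta_demo)
print(np.max(np.abs(analytic - numeric)))   # should be tiny
```

The names are chosen so that running this sketch in the notebook does not overwrite the TensorFlow `mse` and `theta` nodes used in the next cells.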

### Autodiff

TensorFlow supports automatic differentiation, so we do not need to derive the gradient manually as we did above. `tf.gradients(ys, xs)` returns a list containing the gradient of `ys` with respect to each tensor in `xs`. This lets us redefine the gradient and training nodes as follows:

```python
# tf.gradients() returns one gradient per tensor in [theta], hence the [0]
gradients = tf.gradients(mse, [theta])[0]
training_op = tf.assign(theta, theta - learning_rate * gradients)

init = tf.global_variables_initializer()

with tf.Session() as sess:
    sess.run(init)

    for epoch in range(n_epochs):
        if epoch % 100 == 0:
            print("Epoch", epoch, "MSE =", mse.eval())
        sess.run(training_op)

    best_theta = theta.eval()
```

```
Epoch 0 MSE = 9.161542
Epoch 100 MSE = 0.71450037
Epoch 200 MSE = 0.56670487
Epoch 300 MSE = 0.5555718
Epoch 400 MSE = 0.54881126
Epoch 500 MSE = 0.5436363
Epoch 600 MSE = 0.53962916
Epoch 700 MSE = 0.5365092
Epoch 800 MSE = 0.53406775
Epoch 900 MSE = 0.5321473
```


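Interestingly, when I tried this with `learning_rate = 0.1`, the manual and autodiff versions produced different loss values, even though they compute the same gradient mathematically. Even at `learning_rate = 0.01` the printed losses differ in the last digits (for example 0.7145004 vs. 0.71450037 at epoch 100). Small floating-point differences between the two computations are therefore a plausible cause, and a larger learning rate may amplify them.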
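TensorFlow's autodiff works in reverse mode. It evaluates the graph forward, then propagates gradients from the output back towards the inputs using the chain rule. The following pure-Python sketch applies that idea to $f(x, y) = x^2 y + y + 2$ from the beginning of the chapter. The `Value` class is hypothetical and handles only `+` and `*` on scalars. The gradient of $f$ at $(3, 4)$ is $(2xy,\ x^2 + 1) = (24, 10)$.

```python
# A minimal reverse-mode autodiff sketch on scalars. The Value class is
# hypothetical and only illustrates the idea; it is not TensorFlow's code.

class Value:
    def __init__(self, data, parents=()):
        self.data = data
        self.grad = 0.0
        self.parents = parents   # pairs of (input Value, local derivative)

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        return Value(self.data + other.data, ((self, 1.0), (other, 1.0)))

    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        return Value(self.data * other.data,
                     ((self, other.data), (other, self.data)))

    __rmul__ = __mul__

    def backward(self):
        # Order the nodes so each one comes after all of its inputs, then walk
        # that order backwards, applying the chain rule to accumulate gradients.
        order, seen = [], set()

        def visit(node):
            if id(node) not in seen:
                seen.add(id(node))
                for parent, _ in node.parents:
                    visit(parent)
                order.append(node)

        visit(self)
        self.grad = 1.0
        for node in reversed(order):
            for parent, local in node.parents:
                parent.grad += local * node.grad


xv = Value(3.0)
yv = Value(4.0)
fv = xv * xv * yv + yv + 2
fv.backward()
print(fv.data, xv.grad, yv.grad)   # 42.0 24.0 10.0
```

A single backward pass yields the gradient with respect to every input at once. That is why reverse mode suits cost functions with many parameters and a single scalar output, such as the MSE above.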

### Using an optimizer

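TensorFlow also provides built-in optimizers, which compute the gradients and update the variables for us. The simplest is `GradientDescentOptimizer`: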

```python
optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate)
optimizer
```

```
<tensorflow.python.training.gradient_descent.GradientDescentOptimizer at 0x7f6c8cfb15c0>
```

```python
training_op = optimizer.minimize(mse)

init = tf.global_variables_initializer()

with tf.Session() as sess:
    sess.run(init)

    for epoch in range(n_epochs):
        if epoch % 100 == 0:
            print("Epoch", epoch, "MSE =", mse.eval())
        sess.run(training_op)

    best_theta = theta.eval()
```

```
Epoch 0 MSE = 9.161542
Epoch 100 MSE = 0.7145004
Epoch 200 MSE = 0.56670487
Epoch 300 MSE = 0.5555718
Epoch 400 MSE = 0.54881126
Epoch 500 MSE = 0.5436363
Epoch 600 MSE = 0.53962916
Epoch 700 MSE = 0.5365092
Epoch 800 MSE = 0.53406775
Epoch 900 MSE = 0.5321473
```


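Other optimizers are available, such as `MomentumOptimizer`. With the same learning rate it converges much faster here. The MSE is about 0.531 after 100 epochs and levels off at about 0.5243, lower than plain gradient descent reaches in 1000 epochs: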

```python
optimizer = tf.train.MomentumOptimizer(learning_rate=learning_rate, momentum=0.9)

training_op = optimizer.minimize(mse)

init = tf.global_variables_initializer()

with tf.Session() as sess:
    sess.run(init)

    for epoch in range(n_epochs):
        if epoch % 100 == 0:
            print("Epoch", epoch, "MSE =", mse.eval())
        sess.run(training_op)

    best_theta = theta.eval()
```

```
Epoch 0 MSE = 9.161542
Epoch 100 MSE = 0.53056407
Epoch 200 MSE = 0.52501124
Epoch 300 MSE = 0.5244107
Epoch 400 MSE = 0.5243329
Epoch 500 MSE = 0.52432257
Epoch 600 MSE = 0.5243212
Epoch 700 MSE = 0.524321
Epoch 800 MSE = 0.524321
Epoch 900 MSE = 0.52432096
```
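The update rule documented for `tf.train.MomentumOptimizer` (without Nesterov momentum) keeps an accumulator of past gradients: `accumulation = momentum * accumulation + gradient`, then `variable -= learning_rate * accumulation`. Here is a NumPy sketch of that rule for linear regression on synthetic data. The data and the `fit` helper are assumptions for illustration. With `momentum=0.0` the rule reduces to plain batch gradient descent, so the two can be compared side by side.

```python
import numpy as np

def fit(X_mat, y_vec, learning_rate=0.01, momentum=0.0, n_epochs=1000):
    """Batch gradient descent on the MSE with an optional momentum term.

    Returns the final parameters and the MSE recorded before each update.
    """
    n_rows = len(X_mat)
    theta_vec = np.zeros((X_mat.shape[1], 1))
    accumulation = np.zeros_like(theta_vec)
    history = []
    for _ in range(n_epochs):
        error = X_mat @ theta_vec - y_vec
        history.append(np.mean(error ** 2))
        gradient = 2 / n_rows * X_mat.T @ error
        accumulation = momentum * accumulation + gradient   # momentum=0: plain GD
        theta_vec = theta_vec - learning_rate * accumulation
    return theta_vec, history

# Synthetic data with roughly standardized features (an assumption)
rng = np.random.default_rng(42)
X_demo = np.c_[np.ones((1000, 1)), rng.normal(size=(1000, 3))]
y_demo = (X_demo @ np.array([[2.0], [1.0], [-0.5], [0.3]])
          + rng.normal(scale=0.5, size=(1000, 1)))

theta_gd, mse_gd = fit(X_demo, y_demo, momentum=0.0)
theta_mom, mse_mom = fit(X_demo, y_demo, momentum=0.9)

for epoch in range(0, 1000, 100):
    print("Epoch", epoch, "GD MSE =", round(mse_gd[epoch], 4),
          "momentum MSE =", round(mse_mom[epoch], 4))
```

Intuitively, with `momentum=0.9` the accumulator lets consistent gradient directions build up to roughly ten times the plain step size. This matches the much faster early progress seen in the TensorFlow output above.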


### Placeholder nodes

### Saving and restoring models

### TensorBoard

### Name scope and modularity