[TensorFlow](https://www.tensorflow.org) is an open source library for numerical computation.
TensorFlow implements a symbolic math library that is used to build a computation graph.
This graph is a representation of the computations we wish to perform.
Writing a graph with TensorFlow is like writing an equation on paper: no actual computation occurs.
TensorFlow uses the graph to automatically compute the derivatives required for stochastic gradient descent.
When it is time to actually perform the computations, TensorFlow runs them with routines implemented in C and C++.
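
To make the build-first, compute-later idea concrete, here is a toy sketch in plain Python. The classes `Constant` and `Add` are hypothetical and only mimic the idea; they are not TensorFlow's implementation or API.

```python
# Toy illustration only: hypothetical classes, not TensorFlow's API.

class Constant:
    """A graph node that holds a fixed value."""
    def __init__(self, value):
        self.value = value

    def evaluate(self):
        return self.value


class Add:
    """A graph node that records its two inputs but computes nothing yet."""
    def __init__(self, left, right):
        self.left = left
        self.right = right

    def evaluate(self):
        # The arithmetic happens only here, when the graph is executed.
        return self.left.evaluate() + self.right.evaluate()


# Building the graph: no arithmetic is performed on this line.
total = Add(Constant(3.0), Constant(2.5))

# Running the graph: the sum is computed now.
result = total.evaluate()
print(result)
```

Constructing `total` only records the structure; the sum is produced when `evaluate` is called.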




# Install



TensorFlow can be installed with `pip`.
After activating your environment in the terminal, run:



In [1]:
pip install tensorflow

Now `tensorflow` is available for Python and can be imported with:



In [1]:
import tensorflow

All of the TensorFlow functionality can now be accessed via the module `tensorflow`.

You will often see the following command being used:



In [1]:
import tensorflow as tf

This makes the `tensorflow` functionality available via `tf`.
The alias is popular because it requires less typing.
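
The rest of this document uses the graph-and-session interface of TensorFlow 1.x (`tf.Session`, `tf.placeholder`). If `pip` installed TensorFlow 2.x, those names are not available at the top level of the package. One common workaround, sketched here under the assumption that a recent TensorFlow is installed, is to import the compatibility module instead. The `try`/`except` and the `tensorflow_available` flag are additions for illustration only.

```python
# Assumes TensorFlow is installed; the try/except only keeps this snippet
# runnable on machines where it is missing.
try:
    import tensorflow.compat.v1 as tf
    tf.disable_v2_behavior()   # restore graph mode, sessions and placeholders
    tensorflow_available = True
except ImportError:
    tf = None
    tensorflow_available = False

print(f'TensorFlow available: {tensorflow_available}')
```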




# Building the Graph



First import TensorFlow:



In [1]:
import tensorflow as tf

An empty default graph is automatically created when we first import TensorFlow. We can access this default graph via:



In [1]:
graph = tf.get_default_graph()

We can see what operations are attached to it (none in this case) with:



In [1]:
graph.get_operations()

Remember that when we add operations to this graph (think about the nodes and connections in a neural network graph) nothing is actually being computed; we are just creating the structure of the function.

The simplest operation we can add is a constant:



In [1]:
tf.constant(3.1415)             # creates a constant in the default graph
graph.get_operations()

Notice that by calling `tf.constant`, TensorFlow automatically adds a constant to the default graph, which we can see was added with the `get_operations` method.
We did not need to explicitly tell TensorFlow which graph to alter.
We can specify which graph to alter, but usually the default graph will suffice.

Each operation in TensorFlow takes zero or more inputs and generates zero or more outputs.
Think about the signal $s_k$ in the neural network graph. It takes as input the features $x_1,\dots,x_n$, and outputs the value of the signal $s_k$.
A constant has no inputs, only an output.
We can check the number of inputs and outputs via:



In [1]:
constant = graph.get_operations()[0]
print(f'Constant: number of inputs = {len(constant.inputs)}')
print(f'Constant: number of outputs = {len(constant.outputs)}')

The output has a special type:



In [1]:
print(constant.outputs)

The type is `tf.Tensor`, which can represent data in a scalar, vector, matrix or a multidimensional array.
The `shape` is `()`, which means this tensor has no dimensions: it represents a scalar.
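
As a rough illustration of how shapes distinguish scalars, vectors and matrices, the sketch below computes the shape of plain Python numbers and nested lists. The helper `shape_of` is hypothetical and assumes regular (non-ragged) lists; it is not a TensorFlow function.

```python
def shape_of(value):
    """Return the shape of a number or a regular nested list as a tuple."""
    if not isinstance(value, list):
        return ()                      # a bare number has no dimensions
    if not value:
        return (0,)
    return (len(value),) + shape_of(value[0])

print(shape_of(3.1415))                                  # scalar
print(shape_of([1.0, 2.0, 3.0]))                         # vector
print(shape_of([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]))    # matrix
```

The empty tuple for a bare number corresponds to the `()` shape reported for the constant.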

All tensors are outputs of some operation in the graph.
In this case, the tensor above is the output of a "constant operation".

Let's add in another constant:



In [1]:
tf.constant(2.5)
another_constant = graph.get_operations()[1]
print(graph.get_operations())

Now there are two operations in the graph.
Notice each has a name, "Const" and "Const_1".
Also observe the types are `tf.Operation`.

Let's create a new operation that will sum up the two constants.
First, we need to give names to the outputs of the two constants, so that the addition knows what to add:



In [1]:
first_constant_tensor = constant.outputs[0]
second_constant_tensor = another_constant.outputs[0]
print(first_constant_tensor, second_constant_tensor)

The naming of the tensors follows `operation_name:output_number`.
So "Const:0" means the 0th output (first output) coming out of the operation "Const".
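
The naming rule can be illustrated with a small helper that splits a tensor name into its two parts. The function `split_tensor_name` is a hypothetical name used only for this sketch.

```python
def split_tensor_name(tensor_name):
    """Split 'operation_name:output_number' into its two parts."""
    operation_name, _, output_number = tensor_name.rpartition(':')
    return operation_name, int(output_number)

print(split_tensor_name('Const:0'))
print(split_tensor_name('Const_1:0'))
```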

We can combine these outputs:



In [1]:
addition_tensor = tf.add(first_constant_tensor, second_constant_tensor)
graph.get_operations()

Importantly, when we create a new operation and assign it to a variable, like `addition_tensor` above, the variable gets the output of the operation, not the operation itself. That is, `addition_tensor` is already the output of `tf.add`.

We can verify that the addition operation takes two inputs and generates one output:



In [1]:
addition_operation = graph.get_operations()[2]
print(f'Inputs: {len(addition_operation.inputs)}\n'
      f'Outputs: {len(addition_operation.outputs)}')


# Running the Computations



Let's now run this computation.
To do so, we need to create a `Session` object.



In [1]:
session = tf.Session()          # uses the default graph

The Session object stores a reference to the graph we just constructed, determines how much memory it will need and sets up some other configurations.

We can see that the default graph is indeed being used via:



In [1]:
session.graph == graph          # true, same graph
session.graph is graph          # they are the same object!

There is a method `Session.run` which sends the graph to the computation engine to be executed.
The method takes as input the tensor you want to compute.
If we want to compute the tensor of the addition, we would run:



In [1]:
session.run(addition_tensor)

You can also pass a list of tensors to compute.
This is more efficient than making separate calls, because every call to `Session.run` performs all of the required computations again.



In [1]:
session.run([first_constant_tensor, second_constant_tensor, addition_tensor])
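
The cost of separate calls can be illustrated with a toy model in plain Python that counts how many node evaluations take place. It assumes, as stated above, that nothing is reused between calls, while within a single call each node is computed at most once. `Node`, `run` and `counter` are hypothetical names and not part of TensorFlow.

```python
# Toy model with hypothetical names; this is not how TensorFlow is implemented.
counter = {'evaluations': 0}

class Node:
    """A node that computes `function` from the values of its input nodes."""
    def __init__(self, function, *inputs):
        self.function = function
        self.inputs = inputs

def run(fetches):
    """Evaluate the requested nodes, computing each node at most once per call."""
    cache = {}          # fresh for every call: nothing is reused across calls

    def evaluate(node):
        if node not in cache:
            arguments = [evaluate(parent) for parent in node.inputs]
            cache[node] = node.function(*arguments)
            counter['evaluations'] += 1
        return cache[node]

    return [evaluate(node) for node in fetches]

first = Node(lambda: 3.1415)
second = Node(lambda: 2.5)
addition = Node(lambda a, b: a + b, first, second)

# Three separate calls: the constants are evaluated again for the addition.
run([first])
run([second])
run([addition])
separate_calls = counter['evaluations']

# One call with a list: every node is evaluated exactly once.
counter['evaluations'] = 0
values = run([first, second, addition])
single_call = counter['evaluations']

print(separate_calls, single_call)
```

In this toy model the three separate calls perform five evaluations, while the single call with a list performs three.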

After we are done with the computations, we can close the session to free resources:



In [1]:
session.close()


# Placeholders



The graph we built always uses the same two constant values as inputs to the addition operation.
Let's modify it so that it can take variable inputs.
In TensorFlow, inputs whose values are supplied at run time are created with `tf.placeholder`:



In [1]:
x = tf.placeholder(tf.float32)
print(x)

The command above adds a placeholder to the default graph.
It takes as input the data type of the placeholder, in this case a 32-bit float.
Other possible types are: `tf.int32` and `tf.bool`.
A complete list is available [here](https://www.tensorflow.org/api_docs/python/tf/DType).

Let's add a second placeholder and create a new addition operation:



In [1]:
y = tf.placeholder(tf.float32)
flexible_addition = tf.add(x, y)
print(graph.get_operations())

We can now open a `Session` and `Session.run` the `flexible_addition` tensor to get its value.
However, now we need to pass an additional input, the actual values we want for `x` and `y` at the time of computation:



In [1]:
session = tf.Session()
session.run(flexible_addition,
            feed_dict={x: 10.0, y:25.0})

We can also give multiple inputs to be evaluated:



In [1]:
session.run(flexible_addition, feed_dict={
    x: [1.0, 2.0, -10.0],
    y: [-1.0, 3.0, 20.0]
})
session.close()

We can fetch the value of a tensor multiple times, and each time we can give a new value to the placeholder.
If we give it a list of values, then it will use that list as an input.

Notice that the tensor we created for addition uses the `tf.add` function, not the Python `+` operator applied to lists.
In Python, using `+` to "add" two lists actually concatenates them:



In [1]:
print([0, 1] + [2, 3])

The `tf.add` implements vector addition (actually tensor addition).
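
For comparison, element-wise addition of two Python lists has to be written out explicitly. The helper `elementwise_add` below is a hypothetical plain-Python sketch; it requires lists of equal length and does not reproduce everything `tf.add` does.

```python
def elementwise_add(left, right):
    """Add two equal-length lists position by position."""
    if len(left) != len(right):
        raise ValueError('both lists must have the same length')
    return [a + b for a, b in zip(left, right)]

x_values = [1.0, 2.0, -10.0]
y_values = [-1.0, 3.0, 20.0]

print(x_values + y_values)                  # concatenation: six elements
print(elementwise_add(x_values, y_values))  # element-wise sum: three elements
```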

It is possible to add multiple operations at once:



In [1]:
distance_from_origin = tf.sqrt(tf.add(tf.pow(x, 2), tf.pow(y, 2)))
graph.get_operations()

We can get the value of this tensor for multiple inputs via:



In [1]:
session = tf.Session()
session.run(distance_from_origin, feed_dict={
    x: [0.0, 5.0, 10.0],
    y: [-1.0 , 3.0, 0.0]
})
session.close()

This pattern of opening a `Session`, performing a computation and then closing it is common in programming in general.
Python has a special syntax, the `with` statement, that reduces the number of lines it takes to write the above:



In [1]:
with tf.Session() as session:
    print(session.run(distance_from_origin, feed_dict={
        x: [0.0, 5.0, 10.0],
        y: [-1.0, 3.0, 0.0]
    }))

The `with` statement takes care of opening the `Session` and closing it when the indented block of code finishes running.
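
The mechanism behind `with` can be sketched with a small class that implements Python's context manager protocol (`__enter__` and `__exit__`). `ToySession` is a hypothetical stand-in and not TensorFlow's `Session`.

```python
class ToySession:
    """Hypothetical stand-in for a resource that must be closed after use."""
    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True

    def __enter__(self):
        return self           # this object is bound to the name after `as`

    def __exit__(self, exception_type, exception, traceback):
        self.close()          # runs even if the block raised an exception
        return False          # do not swallow exceptions

with ToySession() as toy_session:
    closed_inside = toy_session.closed

print(closed_inside, toy_session.closed)
```

Inside the block the resource is still open; once the block ends, `close` has been called automatically.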

We can clear the default graph with:



In [1]:
tf.reset_default_graph()
graph = tf.get_default_graph()
graph.get_operations()


# Variables



Variables are like placeholders in the sense that they can hold different values. Unlike placeholders, however, the values of variables persist across calls to `session.run`.
TensorFlow uses `tf.Variable` to hold the parameters over which we optimize some loss function.

The `tf.Variable` constructor takes an initial value as its argument. This value also defines the type of the variable, such as an integer, a float or even a list.
Let's create a new variable:



In [1]:
print(graph.get_operations())
x = tf.Variable(33)             # variable type is integer
print(graph.get_operations())

Notice that 4 new operations were added to the graph. These operations provide ways to assign value to the variable and to read the value from it.

If we try to run a session to get the value of `x` we will get an error:



In [1]:
with tf.Session() as session:
    session.run(x)

The error says we are "Attempting to use uninitialized value Variable_1:0".
The value `33` determines the type of the variable and is stored as its initial value, but it is not assigned to the variable automatically.
Thus, the variable does not have a value when we try `Session.run` on it.
To assign the number `33` as the value of `x` we need to initialize it:



In [1]:
with tf.Session() as session:
    print(session.run(x.initializer), session.run(x))

The first call runs the initializer, which assigns the initial value to the Variable.
Every variable has an `initializer` operation that sets its initial value.
Now, because `x` is a Variable, when we call `session.run` again within the same session its value is still `33`.
This contrasts to placeholders.
A placeholder needs its value to be set every time we call `session.run`.
We will use placeholders as the input of the neural network.
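
The life cycle of a variable (created with an initial value, unusable until initialized, then persistent) can be mimicked in a few lines of plain Python. `ToyVariable` and its methods are hypothetical and only illustrate the behaviour described above.

```python
class ToyVariable:
    """Hypothetical sketch: remembers an initial value but holds nothing until initialized."""
    def __init__(self, initial_value):
        self.initial_value = initial_value
        self.value = None
        self.initialized = False

    def run_initializer(self):
        self.value = self.initial_value
        self.initialized = True

    def read(self):
        if not self.initialized:
            raise RuntimeError('Attempting to use uninitialized value')
        return self.value

x_toy = ToyVariable(33)
try:
    x_toy.read()
    error_raised = False
except RuntimeError:
    error_raised = True       # reading before initialization fails

x_toy.run_initializer()
first_read = x_toy.read()
second_read = x_toy.read()    # the value persists between reads
print(error_raised, first_read, second_read)
```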

There is a helper function that initializes all of the variables to their default values at once:



In [1]:
initialize_all = tf.global_variables_initializer()
with tf.Session() as session:
    session.run(initialize_all)
    print(session.run(x))

The function `tf.global_variables_initializer` returns an operation that, when run, initializes all of the variables in the graph.




# Computing Gradients



One of the main reasons for using TensorFlow is the ability to automatically compute the gradient of a loss function defined on top of a graph.

To do so, we use the method `tf.gradients`.
This method takes two arguments:

1.  `ys`: a tensor or a list of tensors we would like to differentiate
2.  `xs`: a tensor or a list of tensors to differentiate with respect to

The `tf.gradients` method follows the graph starting at `ys` to compute the derivatives.



In [1]:
tf.reset_default_graph()
graph = tf.get_default_graph()
x = tf.Variable(0.0)            # float variable
y = tf.pow(x, 2)
gradient = tf.gradients(y, x)
initializer = tf.global_variables_initializer()

Let's execute the graph to get the actual value:



In [1]:
with tf.Session() as session:
    session.run(initializer)
    print(session.run(gradient, feed_dict={x: 10.0}))

This returns the derivative of $x^2$ with respect to $x$ evaluated at $x=10$, which is $2x = 20$.
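
A quick way to sanity-check a derivative is a numerical approximation. The sketch below uses a central difference in plain Python; it is an independent check, not how TensorFlow computes gradients, and the function names and step size are choices made for this illustration.

```python
def central_difference(function, point, step=1e-4):
    """Approximate the derivative of `function` at `point`."""
    return (function(point + step) - function(point - step)) / (2.0 * step)

def square(x):
    return x ** 2

numerical_gradient = central_difference(square, 10.0)
analytic_gradient = 2.0 * 10.0        # d/dx x^2 = 2x
print(numerical_gradient, analytic_gradient)
```

The two numbers should agree up to floating-point rounding.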




# Optimization



We will now implement a linear regression using stochastic gradient descent with TensorFlow.

Let's generate some fake data:



In [1]:
import numpy as np
noise = np.random.normal(size=(1000, 1))
x = np.random.normal(size=(1000, 10))
beta = np.random.uniform(size=(10, 1))
y = x@beta + noise

Let's divide the data into training, validation and test sets:



In [1]:
import collections
Data = collections.namedtuple('Data', ['x', 'y'])
train = Data(x[:800], y[:800])
validate = Data(x[800:900], y[800:900])
test = Data(x[900:], y[900:])

We will use the training data to estimate the parameters, and the validation data to estimate the out-of-sample loss as we improve the parameter estimates.
Now, let's build the linear model graph:



In [1]:
tf.reset_default_graph()
graph = tf.get_default_graph()
# create a placeholder for the input to the linear function
# linear function: y = x*b + constant*b0
x_input = tf.placeholder(tf.float32, [None, 10])
constant = tf.constant(1.0)     # constant regressor that multiplies b0
# create a variable for the betas, initialized to ones
b = tf.Variable(tf.ones((10, 1)))
b0 = tf.Variable(1.0)           # intercept, initialized to one
# compute the prediction for y
y_pred = tf.add(tf.matmul(x_input, b), tf.multiply(constant, b0))

The definition of the placeholder `x_input` takes two arguments: the data type (float) and the shape.
The shape is `None` rows by 10 columns, and the `None` tells TensorFlow that the number of rows can be anything.

If we pass in a vector for `x_input`, then we can obtain the value of the tensor `y_pred`:



In [1]:
initializer = tf.global_variables_initializer()
with tf.Session() as session:
    session.run(initializer)
    session.run(y_pred, feed_dict={x_input: np.ones((1, 10))})

If multiple rows are passed to `x_input`, then we get an array of predictions for $y$:



In [1]:
initializer = tf.global_variables_initializer()
with tf.Session() as session:
    session.run(initializer)
    session.run(y_pred, feed_dict={x_input: np.ones((20, 10))})
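
Since `b` starts as a column of ones and `b0` as one, every all-ones input row should produce a prediction of $10 \cdot 1 + 1 = 11$, whatever the number of rows. The plain-Python sketch below repeats that arithmetic without TensorFlow; the helper `matmul` and the variable names are hypothetical.

```python
def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    inner = len(b)
    if any(len(row) != inner for row in a):
        raise ValueError('inner dimensions do not match')
    return [[sum(row[k] * b[k][j] for k in range(inner))
             for j in range(len(b[0]))]
            for row in a]

weights = [[1.0] for _ in range(10)]      # shape (10, 1), like `b`
intercept = 1.0                           # like `b0`

for number_of_rows in (1, 20):
    inputs = [[1.0] * 10 for _ in range(number_of_rows)]   # shape (rows, 10)
    predictions = [[value + intercept for value in row]
                   for row in matmul(inputs, weights)]
    print(number_of_rows, len(predictions), predictions[0])
```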

Now, let's add the loss function to the graph:



In [1]:
y_input = tf.placeholder(tf.float32, [None, 1])
loss = tf.reduce_mean(tf.pow(tf.subtract(y_input, y_pred), 2))
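
Written out, the loss added to the graph is the mean squared error of the linear model. Here $n$ is the number of rows fed in, and $x_i$ and $y_i$ denote the $i$-th rows of `x_input` and `y_input`; this notation is introduced only for the formula.

$$
\text{loss}(b, b_0) = \frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2,
\qquad \hat{y}_i = x_i b + b_0 .
$$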

We can now compute the mean squared error on the training set:



In [1]:
with tf.Session() as session:
    session.run(initializer)
    session.run(loss, feed_dict={
        x_input: train.x,
        y_input: train.y
    })

To estimate the parameters (betas) we will use stochastic gradient descent.
TensorFlow has us covered with the class `tf.train.GradientDescentOptimizer`.
We can use it to directly compute one step of gradient descent.
The optimizer takes the learning rate as input and provides a method called `minimize`.
The `minimize` method takes as input a tensor representing the loss function and returns an operation that, each time it is run, takes one gradient descent step using the gradient of the loss function.



In [1]:
train_one_step = tf.train.GradientDescentOptimizer(learning_rate=0.005).minimize(loss)

Each time we fetch the operation above, it takes a step following the gradient descent evaluated on whatever data was passed as input to the model.
Because we do not have a lot of data, we can run the gradient descent using the entire training set.



In [1]:
epochs = 5000
beta_hat, b0_hat, MSE = None, None, None
with tf.Session() as session:
    session.run(initializer)
    for i in range(epochs):
        [_, MSE, b0_hat, beta_hat] = session.run(
            [train_one_step, loss, b0, b], feed_dict={
                x_input: train.x,
                y_input: train.y
            })
        if i % 500 == 0:
            MSE_validate = session.run(loss, feed_dict={
                x_input: validate.x,
                y_input: validate.y
            })
            print(f'MSE at step {i} = {MSE_validate}')
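
To show what each training step does, here is a hand-written gradient descent for a one-variable linear regression in plain Python. The data, starting values, learning rate and number of steps are all chosen for illustration; this is a sketch of the technique, not TensorFlow's optimizer.

```python
# Deterministic toy data generated from y = 2*x + 1 (illustrative values).
xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
ys = [2.0 * x + 1.0 for x in xs]

slope, intercept = 0.0, 0.0       # starting guesses
learning_rate = 0.05

def mean_squared_error(slope, intercept):
    return sum((y - (slope * x + intercept)) ** 2
               for x, y in zip(xs, ys)) / len(xs)

for step in range(500):
    # Gradient of the mean squared error with respect to each parameter.
    residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]
    gradient_slope = -2.0 * sum(r * x for r, x in zip(residuals, xs)) / len(xs)
    gradient_intercept = -2.0 * sum(residuals) / len(xs)
    # One gradient descent step: move against the gradient.
    slope -= learning_rate * gradient_slope
    intercept -= learning_rate * gradient_intercept

final_loss = mean_squared_error(slope, intercept)
print(slope, intercept, final_loss)
```

With noise-free data the estimates approach the values used to generate it (slope 2, intercept 1) and the loss approaches zero.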

We can compare the betas estimated with only part of the sample to the true betas:



In [1]:
print(f'{"True Value":^20}|{"Estimated Value":^20}')
for true_value, estimated_value in zip(beta[:,0], beta_hat[:,0]):
    print(f'{true_value:^20.4f}|{estimated_value:^20.4f}')

We can evaluate how well this model does on the test set.



In [1]:
y_hat = test.x@beta_hat + b0_hat
MSE_test_set = np.mean((test.y - y_hat)**2)
print(f'MSE on test set = {MSE_test_set}')

To implement stochastic gradient descent, we would instead feed a small random batch of data points (a single point in the strictest version) and update the weights with one gradient descent step on the loss computed on that batch:



In [1]:
# stochastic (mini-batch) gradient descent
epochs = 5000
batch_size = 50
beta_hat, b0_hat, MSE = None, None, None
with tf.Session() as session:
    session.run(initializer)
    for i in range(epochs):
        # feed a few data points to compute the gradient
        # of the loss and update the weights
        for j in range(len(train.x) // batch_size):
            # sample rows of the training set (with replacement)
            indices = np.random.randint(0, len(train.x), size=batch_size)
            _ = session.run(train_one_step, feed_dict={
                x_input: train.x[indices, :],
                y_input: train.y[indices, :]
            })
        if i % 500 == 0:
            MSE_validate = session.run(loss, feed_dict={
                x_input: validate.x,
                y_input: validate.y
            })
            print(f'MSE at step {i} = {MSE_validate}', end='\r')
    [beta_hat, b0_hat] = session.run([b, b0])
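
The batch bookkeeping can be checked in isolation. The sketch below uses only the standard library to count the batches in one pass over an 800-row training set and to draw one batch of row indices with replacement, which is what `np.random.randint` does. The seed and variable names are choices made for this illustration.

```python
import random

random.seed(0)                       # fixed seed so the sketch is repeatable

number_of_rows = 800                 # size of the training set used above
batch_size = 50
# Integer division keeps the loop count an int, as range() requires.
batches_per_epoch = number_of_rows // batch_size

for _ in range(batches_per_epoch):
    # Sample row indices with replacement from the training rows only.
    indices = [random.randrange(number_of_rows) for _ in range(batch_size)]

print(batches_per_epoch, len(indices), min(indices) >= 0, max(indices) < number_of_rows)
```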


# Neural Network



Let's now use the same data to estimate a simple neural network.
First, reset the default graph:



In [1]:
tf.reset_default_graph()
graph = tf.get_default_graph()

Then, create the 1st layer of the neural network:



In [1]:
# create a placeholder for the inputs
x_input = tf.placeholder(tf.float32, [None, 10])
# create weights and bias for the first layer
W1 = tf.Variable(tf.random_normal([10, 5]))
B1 = tf.Variable(1.0)
# create first signals and apply the non-linear transformation
s1 = tf.nn.relu(tf.add(tf.matmul(x_input, W1), B1))

Let's add a final layer:



In [1]:
# create weights and bias for the last layer
W2 = tf.Variable(tf.random_normal([5, 1]))
B2 = tf.Variable(1.0)
# generate output of the model
y_prediction = tf.add(tf.matmul(s1, W2), B2)
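
The arithmetic of the two layers can be followed by hand on a tiny example. The plain-Python sketch below computes a forward pass for a network with two inputs, two hidden units and one output; the helper names and all weights are made up for illustration and are unrelated to the random weights in the graph.

```python
def relu(value):
    return max(0.0, value)

def dense(inputs, weights, bias):
    """Compute inputs @ weights + bias for one input row.

    `weights` is a list of rows with shape (inputs, outputs);
    `bias` is a single number added to every output, like B1 and B2.
    """
    return [sum(x * w for x, w in zip(inputs, column)) + bias
            for column in zip(*weights)]

def forward(inputs, w1, b1, w2, b2):
    hidden = [relu(signal) for signal in dense(inputs, w1, b1)]
    return dense(hidden, w2, b2)[0]

w1 = [[1.0, -1.0],
      [0.5, 2.0]]          # shape (2, 2)
w2 = [[3.0],
      [4.0]]               # shape (2, 1)

# Hidden signals are 1.0 and -4.0; the ReLU turns them into 1.0 and 0.0.
prediction = forward([1.0, -2.0], w1, 1.0, w2, 1.0)
print(prediction)
```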

Create the loss function:



In [1]:
y_input = tf.placeholder(tf.float32, [None, 1])
loss = tf.reduce_mean(tf.squared_difference(y_input, y_prediction))

Create the gradient descent optimizer:



In [1]:
train_one_step = tf.train.GradientDescentOptimizer(learning_rate=0.001).minimize(loss)

Minimize the loss function to estimate the parameters:



In [1]:
epochs = 20000
initializer = tf.global_variables_initializer()
with tf.Session() as session:
    session.run(initializer)
    print(f'\n\nSTEP | MSE')
    for i in range(epochs):
        _ = session.run(train_one_step, feed_dict={
            x_input: train.x,
            y_input: train.y
        })
        if i % 500 == 0:
            MSE = session.run(loss, feed_dict={
                x_input: validate.x,
                y_input: validate.y
            })
            print(f'{i:^5}|{MSE:^4}')
    MSE = session.run(loss, feed_dict={
        x_input: test.x,
        y_input: test.y
    })
    print(f'TEST |{MSE:^4}')