# Computation Graphs and TensorFlow

This is an introduction to the declarative programming paradigm, in which a computation is first described as a graph and only executed afterwards. The tutorial uses TensorFlow as the implementation.

## Imperative paradigm, or just pure Python

Before we get to the actual graphs, let's look at a computation that does not use a graph.

Calculations in Python are straightforward, as in the example below. This style is called the imperative paradigm, and it is what most programs do: each line of code is executed immediately and changes a value.

```python
const_5 = 5
const_3 = 3
sum_5_3 = const_5 + const_3

const_7 = 7
const_2 = 2
sum_7_2 = const_7 + const_2

prod_sums = sum_5_3 * sum_7_2
```

Evaluating `prod_sums` gives us the result of the calculation:

```python
prod_sums
```

```
72
```

Unfortunately, this number is all we have: the output carries no information about where it came from, and there was no chance to inspect or optimize the calculation before it ran.

*(This is like solving a simple physics exercise by inserting the numbers into every variable right away instead of first solving it algebraically.)*

## Declarative paradigm and Lazy evaluation

So it would be better to first "build" the calculation and only run it afterwards. This is called "lazy evaluation": an object is composed first and evaluated later. It is a declarative programming style: we don't say exactly what to do, but rather what we want.



## Optional (for coders): Lazy evaluation in Python

Let's build a very simple example of lazy evaluation in Python (strictly speaking, it is not really declarative).

```python
# Two equivalent ways of writing a function in Python
def func():
    return 42

func = lambda: 42
```

```python
# every value is now a function that computes it only when called
const_5 = lambda: 5
const_3 = lambda: 3
sum_5_3 = lambda: const_5() + const_3()

const_7 = lambda: 7
const_2 = lambda: 2
sum_7_2 = lambda: const_7() + const_2()

prod_sums = lambda: sum_5_3() * sum_7_2()
```

So far, nothing has been evaluated: `prod_sums` is just a function.

```python
prod_sums
```

To evaluate this object, we simply call it:

```python
prod_sums()
```

The advantage over the previous approach: we _could_ use the information stored in prod_sums to improve the calculation _before_ we run it.

```python
import inspect

# retrieve the source code that defines the lambda
source = inspect.getsource(prod_sums)
print(source)
```

## Building a Graph

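Let's now use the declarative paradigm and build a graph. The basic idea is that we don't tell the computer what to _execute_ but only what we _want_; in our case, TensorFlow handles the rest. The code uses the TensorFlow 1.x graph API; with TensorFlow 2.x, `import tensorflow.compat.v1 as tf` followed by `tf.disable_v2_behavior()` should give access to the same API.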

```python
import tensorflow as tf
```

```python
const_5 = tf.constant(5.)
const_3 = tf.constant(3.)
sum_5_3 = tf.add(const_5, const_3)

const_7 = tf.constant(7.)
const_2 = tf.constant(2.)
# or even simpler: the + operator on Tensors is equivalent to tf.add
sum_7_2 = const_7 + const_2

prod_sums = tf.multiply(sum_5_3, sum_7_2)
```

```python
prod_sums
```

```
<tf.Tensor 'Mul:0' shape=() dtype=float32>
```

Before, our lazily evaluated object was a function; now it is a `Tensor`. Its name (`Mul:0`) is assigned by TensorFlow and does not matter here. The graph looks like this: ![graph structure visualized](images/graph_structure_addition.png)

So far we have **only** built the graph, i.e. the instructions for what we **want**, but we have not yet asked for the result. The object we have is an instruction, not a number.

```python
sess = tf.InteractiveSession()
```

```python
prod_sums_number = sess.run(prod_sums)  # this command actually executes the instructions
```

```python
prod_sums_number  # this is now a number
```

```
72.0
```

### Optional (for coders): did we really build a graph?
Yes, and nothing else. Let's explore the graph by walking through it.

```python
prod_sums  # the output Tensor of the multiplication...
```

```python
prod_sums.op  # ...which is produced by the multiplication operation...
```

```python
prod_sums.op.inputs[:]  # ...from these two input Tensors
```

```python
prod_sums.op.inputs[0].op  # the first input comes from an addition...
```

```python
prod_sums.op.inputs[0].op.inputs[:]  # ...whose inputs are the constants 5 and 3
```

We have the whole definition at hand! With it, it is a simple matter of implementation to, for example, check whether one value depends on another (by recursively searching its inputs). Since TensorFlow also supports control flow operations, building the graph is much like building an AST (Abstract Syntax Tree), and working with it is similar to writing compiler code.
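The recursive dependency search mentioned above can be sketched in plain Python. This sketch does not use TensorFlow: `Node` and `depends_on` are hypothetical names, and `node.inputs` plays the role that walking `tensor.op.inputs` plays in the TensorFlow graph. The node names are prefixed with `node_` so they don't overwrite the Tensors defined earlier in the notebook.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A minimal, hypothetical stand-in for a graph node (not a TensorFlow class)."""
    name: str
    op: str = "const"
    inputs: list = field(default_factory=list)


def depends_on(node, target):
    """Return True if `target` is `node` itself or is reachable through its inputs."""
    if node is target:
        return True
    # recurse into every input, like following prod_sums.op.inputs by hand
    return any(depends_on(inp, target) for inp in node.inputs)


node_5, node_3 = Node("const_5"), Node("const_3")
node_7, node_2 = Node("const_7"), Node("const_2")
node_sum_5_3 = Node("sum_5_3", "add", [node_5, node_3])
node_sum_7_2 = Node("sum_7_2", "add", [node_7, node_2])
node_prod = Node("prod_sums", "mul", [node_sum_5_3, node_sum_7_2])

print(depends_on(node_prod, node_5))     # True
print(depends_on(node_sum_7_2, node_5))  # False
```

For large graphs where many nodes share inputs, one would additionally remember already visited nodes so that each node is searched only once.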

## Automatic differentiation
A very useful thing we can get out of the graph is the gradient. Using the chain rule (an implementation detail we don't need to worry about), TensorFlow can compute the derivative of **any** node with respect to **any** other node, as long as the operations in between are differentiable.

Let's look at the example from before:
$$
prod\_sums(const\_5) = (const\_5 + const\_3) \cdot (const\_7 + const\_2)
$$

Taking the derivative with respect to $const\_5$ gives

$$
\frac{\partial prod\_sums}{\partial const\_5} = const\_7 + const\_2
$$
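To make the chain-rule remark concrete, here is a minimal sketch of reverse-mode automatic differentiation in plain Python, the family of techniques `tf.gradients` belongs to. The `Value` class is a hypothetical illustration, not TensorFlow's implementation, and it only supports `+` and `*`. Its variables are named `v_5`, `v_3`, and so on, so they don't overwrite the notebook's Tensors.

```python
class Value:
    """A scalar that records how it was computed (hypothetical, for illustration)."""

    def __init__(self, data, parents=()):
        self.data = data
        self.grad = 0.0
        # each parent is stored together with the local derivative d(self)/d(parent)
        self.parents = parents

    def __add__(self, other):
        return Value(self.data + other.data, ((self, 1.0), (other, 1.0)))

    def __mul__(self, other):
        return Value(self.data * other.data, ((self, other.data), (other, self.data)))

    def backward(self):
        """Propagate gradients from this node to every node it depends on."""
        # post-order traversal: every node is listed after all of its inputs
        order, seen = [], set()

        def visit(node):
            if id(node) not in seen:
                seen.add(id(node))
                for parent, _ in node.parents:
                    visit(parent)
                order.append(node)

        visit(self)
        self.grad = 1.0
        # walk from the output back to the inputs, applying the chain rule at each node
        for node in reversed(order):
            for parent, local_grad in node.parents:
                parent.grad += local_grad * node.grad


v_5, v_3 = Value(5.0), Value(3.0)
v_7, v_2 = Value(7.0), Value(2.0)
v_prod = (v_5 + v_3) * (v_7 + v_2)
v_prod.backward()
print(v_prod.data)  # 72.0
print(v_5.grad)     # 9.0, i.e. const_7 + const_2
```

Each multiplication stores the other factor as its local derivative, and `backward` multiplies these local derivatives along every path from the output to an input, which is exactly the chain rule.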



```python
# tf.gradients returns a list with one gradient per node in the second argument
deriv_prod_by_const5 = tf.gradients(prod_sums, const_5)
```

```python
deriv_prod_by_const5  # this is again a list of Tensors: instructions on how to compute
```

```python
sess.run(deriv_prod_by_const5)
```

Side note: this takes the derivative of `prod_sums` with respect to the _node_ `const_5` and _then_ evaluates it by inserting 5 into the node `const_5`.

## Using variables

The above is not yet very useful, since we cannot change the value of a node. What we want is a more general structure that allows us to define, e.g., a model whose parameters can be changed.
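In the plain-Python lazy style from earlier, a rough analogue of such a parameter is a piece of mutable state that the lazy expression reads only when it is evaluated. This is a hypothetical illustration of the idea, not how `tf.Variable` works internally; the `lazy_` names are made up so they don't clash with the TensorFlow objects below.

```python
# a dict holds the mutable state; the lambdas only read it when they are called
lazy_state = {"var_1": 3.0}

lazy_const_2 = lambda: 2.0
lazy_var_1 = lambda: lazy_state["var_1"]
lazy_power = lambda: lazy_var_1() ** lazy_const_2()

print(lazy_power())         # 9.0
lazy_state["var_1"] = 4.0   # "load" a new value into the parameter
print(lazy_power())         # 16.0
```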

```python
const_2 = tf.constant(2.)
var_1 = tf.Variable(name="variable1", initial_value=3., use_resource=True)
sess.run(var_1.initializer)  # variables must be initialized before they can be used
```

```python
# now create an operation as before
power_const_var = tf.pow(var_1, const_2)
```

```python
sess.run(power_const_var)
```

```python
# now let's change the value of the variable
var_1.load(4.)
```

```python
# we can now either rerun the cell above or create a new cell here and enter the same command again.
# For illustrative purposes, the second is done here
sess.run(power_const_var)
```

```python
# and of course we can also take the derivative with respect to the variable
deriv_power_var = tf.gradients(power_const_var, var_1)
```

```python
sess.run(deriv_power_var)
```
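As a check on the numbers `sess.run` should produce in this section: before loading the new value, $var\_1 = 3$ gives $3^2 = 9$; after `var_1.load(4.)`,

$$
power\_const\_var = var\_1^{\,2} = 16, \qquad \frac{\partial\, power\_const\_var}{\partial\, var\_1} = 2 \cdot var\_1 = 8
$$

The derivative comes back wrapped in a list, because `tf.gradients` always returns one entry per node it differentiates with respect to.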

## Fitting a line
Let's try to do a simple straight line fit.

```python
import numpy as np

# generate some data
n_events = 300
true_slope = np.random.uniform(low=0.3, high=3)  # draw a random slope
x_data = np.random.uniform(low=-10, high=10, size=n_events)  # points between -10 and 10
y_clean = true_slope * x_data  # the function y = slope * x
y_data = y_clean + np.random.normal(loc=0, scale=1, size=n_events)  # add Gaussian noise
```

```python
import matplotlib.pyplot as plt

# visualize the data
plt.plot(x_data, y_data, 'x')
```

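As an ansatz, we assume the data follows a straight line through the origin, $y = slope \cdot x$, with an unknown $slope$. How well a given $slope$ describes the data is measured by a loss, here the sum of squared distances $\sum_i (slope \cdot x_i - y_i)^2$, which is smallest for the best-fitting slope.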

```python
# create the variable to be changed
slope = tf.Variable(name="slope", initial_value=1.)
sess.run(tf.global_variables_initializer())  # initialize the new variable

# the model: a straight line through the origin
y = slope * x_data

# create a loss
squared_dist_loss = tf.reduce_sum(tf.square(y - y_data))

# since it can be useful, let's also create the instructions on how to compute the gradient
grad_slope = tf.gradients(squared_dist_loss, slope)[0]  # take element 0, since tf.gradients returns a list
```
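For this particular loss, the minimum can also be found on paper, which gives a handy cross-check. Setting the derivative of $L(slope) = \sum_i (slope \cdot x_i - y_i)^2$ to zero gives $slope = \sum_i x_i y_i / \sum_i x_i^2$. The sketch below computes this with NumPy on its own synthetic data. The seed, the true slope of 1.7 and the function name `closed_form_slope` are illustrative choices, not part of the tutorial. In the notebook, `closed_form_slope(x_data, y_data)` should land close to `true_slope`.

```python
import numpy as np


def closed_form_slope(x, y):
    """Least-squares slope for the ansatz y = slope * x (no intercept)."""
    return np.sum(x * y) / np.sum(x * x)


rng = np.random.default_rng(seed=0)
demo_x = rng.uniform(low=-10, high=10, size=300)
demo_y = 1.7 * demo_x + rng.normal(loc=0, scale=1, size=300)  # hypothetical true slope 1.7

print(closed_form_slope(demo_x, demo_y))  # close to 1.7
```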

## Playground minimization

Now we have everything: the instructions on how to compute the chi-square loss (the sum of squared distances, since the noise has unit width) and the parameter (`slope`) that changes our model. Use the cells below (the gradient cell is optional) to minimize the loss and find the true slope.

***Exercise***: run the loss cell, then load a different value, see how the loss changes, and repeat until you find its minimum. Hint: if you want to play again, restart the notebook and run all cells again; this draws a new true slope.

```python
sess.run(squared_dist_loss)
```

```python
slope.load(1.233)  # try different values here
```

Being smart, we can use the gradient information as well. Since the gradient points in the direction of steepest **ascent**, its negative points in the direction of steepest **descent**. This tells us in which direction to change the parameter.
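Written out for the loss defined above, $L = \sum_i (slope \cdot x_i - y_i)^2$, the gradient and one step in the descent direction look as follows. Here $\eta$ is a small step size of your choosing (a symbol introduced for this formula, not a name from the code):

$$
\frac{\partial L}{\partial\, slope} = 2 \sum_i x_i \,(slope \cdot x_i - y_i), \qquad slope \leftarrow slope - \eta \, \frac{\partial L}{\partial\, slope}
$$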


```python
sess.run(grad_slope)
```

```python
# run this cell to see the true value
true_slope
```

### Optional (advanced): automate the minimization

Why not write a loop and automate this? If you feel comfortable with the above and with Python, implement such a loop. Hint: a criterion to stop the loop (a convergence criterion) could be that the absolute value of the gradient falls below a certain threshold. And don't make the steps too big; rather too small than too big.

```python
# do your minimization here
```
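Once you have tried the exercise yourself, you can compare with this sketch of one possible loop. It deliberately uses plain NumPy and the analytic gradient of $L = \sum_i (slope \cdot x_i - y_i)^2$ instead of `sess.run(grad_slope)`, so that it runs on its own. The function name `fit_slope_gd`, the step size, the tolerance and the demo data are illustrative choices, not values from the tutorial. In the notebook, the gradient line would become `sess.run(grad_slope)` and the update would go through `slope.load(...)`.

```python
import numpy as np


def fit_slope_gd(x, y, start=1.0, step_size=1e-5, tol=1e-6, max_steps=10_000):
    """Gradient descent on L(s) = sum((s * x - y)**2); returns (slope, steps taken)."""
    s = start
    for step in range(max_steps):
        grad = 2.0 * np.sum(x * (s * x - y))  # dL/ds
        if abs(grad) < tol:                    # convergence criterion
            return s, step
        s -= step_size * grad                  # move against the gradient
    return s, max_steps


rng = np.random.default_rng(seed=0)
demo_x = rng.uniform(low=-10, high=10, size=300)
demo_y = 1.7 * demo_x + rng.normal(loc=0, scale=1, size=300)  # hypothetical true slope 1.7

slope_fit, n_steps = fit_slope_gd(demo_x, demo_y)
print(slope_fit, n_steps)
```

Because the loss is a parabola in `s`, a fixed step size converges only if it is smaller than `1 / np.sum(x**2)`; above that, each update overshoots further than the last and the loop diverges. That threshold is the "rather too small than too big" advice in numbers.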

## Changing our goal

We created a simple loss using the sum of the squared distances. However, we can build far more sophisticated losses by adding more terms. For example, if we knew that the slope is close to 1 (or pick another number if your true slope is already close to 1), we could add a term that penalizes `slope` for being far away from it.

*(While Deep Learning uses the general notion of building a loss from components, in physics and fitting this is usually called "adding a constraint to a parameter". There are many names for the same thing.)*

```python
slope_we_assume = tf.constant(1.)  # any number within the boundaries [0.3, 3]
slope_constr = tf.square(500 * tf.subtract(slope, slope_we_assume))  # penalty grows quadratically with the distance
squared_dist_constr = squared_dist_loss + slope_constr
```
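As with the plain loss, the minimum of the constrained loss can be worked out on paper. Setting the derivative of $\sum_i (slope \cdot x_i - y_i)^2 + \left(500\,(slope - s_0)\right)^2$ to zero, with $s_0$ the assumed slope, gives $slope = \left(\sum_i x_i y_i + 500^2 s_0\right) / \left(\sum_i x_i^2 + 500^2\right)$. The sketch below evaluates this with NumPy; `constrained_slope`, the seed and the demo true slope of 1.7 are hypothetical choices for illustration.

```python
import numpy as np


def constrained_slope(x, y, slope_assumed, weight=500.0):
    """Minimizer of sum((s * x - y)**2) + (weight * (s - slope_assumed))**2."""
    w2 = weight ** 2
    return (np.sum(x * y) + w2 * slope_assumed) / (np.sum(x * x) + w2)


rng = np.random.default_rng(seed=0)
demo_x = rng.uniform(low=-10, high=10, size=300)
demo_y = 1.7 * demo_x + rng.normal(loc=0, scale=1, size=300)  # hypothetical true slope 1.7

unconstrained = np.sum(demo_x * demo_y) / np.sum(demo_x * demo_x)
constrained = constrained_slope(demo_x, demo_y, slope_assumed=1.0)
print(unconstrained, constrained)
```

With `weight=500`, the penalty's curvature ($500^2 = 250\,000$) dwarfs that of the data term ($\sum_i x_i^2$, roughly $10^4$ for 300 points in $[-10, 10]$), so the constrained optimum ends up much closer to the assumed slope than to the true one. This makes it a useful reference point for whatever value the optimizer reports.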

We minimized by hand before (and it was cumbersome), but there are pre-built minimizers, and in practice we will **always** use them. So let's try one. There are several, and some converge better than others for certain problems; overall, `Adam` performs very well.
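For reference, this is the Adam update rule in its textbook form, as given in the original Adam paper, for a parameter $\theta$ with gradient $g_t$ at step $t$. `tf.train.AdamOptimizer` exposes $\alpha$, $\beta_1$, $\beta_2$ and $\epsilon$ as `learning_rate`, `beta1`, `beta2` and `epsilon`, with defaults 0.9, 0.999 and 1e-8 for the last three. Its implementation folds the bias correction into the step size, which changes slightly how $\epsilon$ enters.

$$
\begin{aligned}
m_t &= \beta_1 m_{t-1} + (1-\beta_1)\, g_t \\
v_t &= \beta_2 v_{t-1} + (1-\beta_2)\, g_t^2 \\
\hat m_t &= \frac{m_t}{1-\beta_1^t}, \qquad \hat v_t = \frac{v_t}{1-\beta_2^t} \\
\theta_t &= \theta_{t-1} - \alpha\, \frac{\hat m_t}{\sqrt{\hat v_t} + \epsilon}
\end{aligned}
$$

At the first step, $\hat m_1 / \sqrt{\hat v_1} = g_1 / |g_1|$, so the parameter moves by roughly $\alpha$ regardless of how large the gradient is. With `learning_rate=3.`, the first steps in `slope` are therefore of order 3.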

```python
optimizer = tf.train.AdamOptimizer(learning_rate=3.)  # create an instance of the optimizer; this is needed only once
```

```python
minimize_op = optimizer.minimize(squared_dist_constr)  # as before, this is just an operation!
sess.run(tf.variables_initializer(optimizer.variables()))  # the optimizer's internal variables must be initialized too
```

```python
# execute the minimization operation here
for _ in range(1000):
    sess.run(minimize_op)  # each run performs only one minimization step
```

```python
# to check the value of the slope, we also need to run it
sess.run(slope)
```

```python
# run this cell to see the true slope
true_slope
```