# Mid-Level Tensorflow API

This exercise introduces the mid-level concepts and API of tensorflow.
You will learn the following concepts and packages:
  - collections
  - tf.layers
  - tf.summary
  - Tensorboard
  - tf.losses
  - regularizers
  - initializers
  - image recognition with convolutional neural networks (on MNIST)
  
Unfortunately, the tensorflow API is not very stable and (imho) quite confusing.
There are usually several ways to implement the same thing. I will try to keep it as simple as possible by introducing one component at a time.

In [3]:
import tensorflow as tf

### Collections

Before we dive into the mid-level API, we first have to introduce another concept, which will become important: **collections**. 

Tensorflow automatically keeps track of important tensors for you. They are stored in so-called **collections**. Technically, each collection is just a Python `list` that is stored in the graph under a name. The names of the default collections are defined in `tf.GraphKeys`.

In [4]:
tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES)

[]

In [5]:
tf.Variable([1,2,3], name="MyWeightVector")

<tf.Variable 'MyWeightVector:0' shape=(3,) dtype=int32_ref>

In [6]:
tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES)

[<tf.Variable 'MyWeightVector:0' shape=(3,) dtype=int32_ref>]

Many packages we encounter below automatically append the tensors you create to certain collections.
This way, we don't have to keep track of the tensors ourselves.
Typical examples are:
  - trainable variables, which are adjusted by the optimizer, are stored in `tf.GraphKeys.TRAINABLE_VARIABLES` (by default, `minimize()` updates exactly these variables)
  - regularization losses are stored in `tf.GraphKeys.REGULARIZATION_LOSSES`
  - summaries used to evaluate the training are stored in `tf.GraphKeys.SUMMARIES`
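Any string can serve as a collection name, not only the default keys in `tf.GraphKeys`. The sketch below shows this. It makes a few assumptions:

- It assumes TensorFlow >= 1.13, so the 1.x API can be imported as `tensorflow.compat.v1`.
- The collection name `"my_tensors"` is made up for illustration.
- It uses a separate `tf.Graph`, so the notebook's default graph is left untouched.

It also shows that `tf.get_collection` returns a copy of the list.

```python
# Assumes TensorFlow >= 1.13, where the 1.x API is available as
# tensorflow.compat.v1 (this also works on TensorFlow 2.x)
import tensorflow.compat.v1 as tf

graph = tf.Graph()
with graph.as_default():
    a = tf.constant(1.0, name="a")
    b = tf.constant(2.0, name="b")
    # "my_tensors" is an arbitrary, hypothetical collection name
    tf.add_to_collection("my_tensors", a)
    tf.add_to_collection("my_tensors", b)

    tensors = tf.get_collection("my_tensors")
    print([t.name for t in tensors])  # ['a:0', 'b:0']

    # get_collection returns a copy, so clearing it leaves the collection intact
    tensors.clear()
    print(len(tf.get_collection("my_tensors")))  # 2

    # an unknown key simply yields an empty list
    print(tf.get_collection("does_not_exist"))  # []
```

To modify a collection in place, use `tf.get_collection_ref`, which returns the underlying list instead of a copy.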

<div class="alert alert-block alert-info">
<h3>Exercise</h3>
Execute the two cells above multiple times. What behaviour do you observe? Why do you have to be careful if you are using an interactive environment like `Jupyter`?
</div>

# MNIST
We use the same dataset as in the low-level exercise.

In [7]:
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("MNIST_data/")

Extracting MNIST_data/train-images-idx3-ubyte.gz
Extracting MNIST_data/train-labels-idx1-ubyte.gz
Extracting MNIST_data/t10k-images-idx3-ubyte.gz
Extracting MNIST_data/t10k-labels-idx1-ubyte.gz


## `tf.layers`

The `tf.layers` package contains pre-defined layers commonly used in neural networks:
  - convolution layers: `tf.layers.conv2d`
  - pooling layers: `tf.layers.max_pooling2d`
  - dense (or fully-connected) layers: `tf.layers.dense`

https://www.tensorflow.org/api_docs/python/tf/layers

Using those layers we can easily build a more complex neural network without having to define the weight variables ourselves (like we did in the low-level exercise).

In [67]:
tf.reset_default_graph()
session = tf.InteractiveSession()
x = tf.placeholder(tf.float32, [None, 784])

image = tf.reshape(x, [-1, 28, 28, 1])
conv = tf.layers.conv2d(image, filters=10, kernel_size=[5, 5], padding="same",
                         activation=tf.nn.relu)
pool = tf.layers.max_pooling2d(conv, pool_size=[3, 3], strides=3)

flat = tf.layers.flatten(pool)
logits = tf.layers.dense(flat, units=10)
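The shape of a batch as it flows through this network can be worked out by hand. The plain-Python sketch below does this under two assumptions:

- The convolution uses stride 1, which is the default of `tf.layers.conv2d`.
- The pooling layer uses the default `padding="valid"` of `tf.layers.max_pooling2d`.

The helper functions are hypothetical names written for illustration.

```python
def conv2d_same_size(size, stride=1):
    # padding="same": output size = ceil(input size / stride)
    return -(-size // stride)

def pool_valid_size(size, pool, stride):
    # padding="valid": only windows that fit completely inside the input
    return (size - pool) // stride + 1

height, in_channels, filters, kernel = 28, 1, 10, 5

conv_h = conv2d_same_size(height)          # 28 x 28 x 10
pool_h = pool_valid_size(conv_h, 3, 3)     # 9 x 9 x 10
flat_units = pool_h * pool_h * filters     # 810 inputs to the dense layer

# trainable parameters: kernel weights plus one bias per output channel/unit
conv_params = kernel * kernel * in_channels * filters + filters  # 260
dense_params = flat_units * 10 + 10                              # 8110

print("conv :", (conv_h, conv_h, filters), conv_params)
print("pool :", (pool_h, pool_h, filters))
print("flat :", flat_units)
print("dense:", 10, dense_params)
print("total trainable parameters:", conv_params + dense_params)  # 8370
```

In the notebook, `sum(v.shape.num_elements() for v in tf.trainable_variables())` should report the same total.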

In [68]:
y = tf.placeholder(tf.int64, [None])
misclassification_rate = 1.0 - tf.reduce_mean(tf.cast(tf.equal(tf.argmax(logits, 1), y), tf.float32))
loss_function = tf.reduce_mean(tf.nn.sparse_softmax_cross_entropy_with_logits(labels=y, logits=logits))
minimize = tf.train.GradientDescentOptimizer(0.5).minimize(loss_function)
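The NumPy sketch below makes `loss_function` and `misclassification_rate` concrete. It evaluates both on a toy batch of three examples and three classes. The functions are illustrative re-implementations, not TensorFlow's code, and the toy logits are made up.

```python
import numpy as np

def sparse_softmax_cross_entropy(labels, logits):
    """Per-example cross entropy for integer class labels (a NumPy sketch of
    what tf.nn.sparse_softmax_cross_entropy_with_logits computes)."""
    # subtracting the row maximum avoids overflow and does not change the softmax
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # pick the log-probability of the true class for every example
    return -log_probs[np.arange(len(labels)), labels]

def misclassification_rate(labels, logits):
    # fraction of examples whose arg-max class differs from the label
    return 1.0 - np.mean(np.argmax(logits, axis=1) == labels)

logits = np.array([[2.0, 0.5, -1.0],
                   [0.1, 0.2, 3.0],
                   [0.3, 1.0, 1.2]])
labels = np.array([0, 1, 2])

per_example = sparse_softmax_cross_entropy(labels, logits)
print(per_example)
print("loss_function:", per_example.mean())  # tf.reduce_mean over the batch
# only the second example is misclassified
print("misclassification_rate:", misclassification_rate(labels, logits))
```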

In [69]:
tf.global_variables_initializer().run()
for i in range(100):
    batch_xs, batch_ys = mnist.train.next_batch(100)
    session.run(minimize, feed_dict={x: batch_xs, y: batch_ys})
#print("Train", session.run(misclassification_rate, feed_dict={x: mnist.train.images, y: mnist.train.labels}))
print("Test ", session.run(misclassification_rate, feed_dict={x: mnist.test.images, y: mnist.test.labels}))

Test  0.07099998


<div class="alert alert-block alert-info">
<h3>Exercise</h3>

1. Change the number of filters and the kernel size of the convolution layer. <br>
2. Add more convolution+pooling pairs before the dense layer.
</div>

### `tf.summary`

During training, you can record interesting quantities.
Here we define two summary objects:
  - a scalar summary (`tf.summary.scalar`), which records the misclassification rate during the training
  - a histogram summary (`tf.summary.histogram`), which records the distribution of the maximum predicted probability of each sample (a measure of how confident our network is in its decision)
  
Every summary you define is automatically added to the `tf.GraphKeys.SUMMARIES` collection of tensorflow.
The convenience function `tf.summary.merge_all()` merges all summaries from the `tf.GraphKeys.SUMMARIES` collection into a single operation, which computes them all at once.
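The histogram summary receives one value per example: the largest softmax probability. The NumPy sketch below computes that quantity, using random logits as a hypothetical stand-in for the network output. The `np.histogram` call only illustrates the idea; it is not how tensorboard bins the values.

```python
import numpy as np

def softmax(logits):
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    e = np.exp(shifted)
    return e / e.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
logits = rng.normal(scale=3.0, size=(1000, 10))  # hypothetical network output

# same quantity as tf.reduce_max(tf.nn.softmax(logits), axis=1)
max_probability = softmax(logits).max(axis=1)

# with 10 classes the value lies in [0.1, 1]: 0.1 = uniform guess, 1.0 = certain
counts, edges = np.histogram(max_probability, bins=9, range=(0.1, 1.0))
for lo, hi, c in zip(edges[:-1], edges[1:], counts):
    print(f"{lo:.1f}-{hi:.1f}: {c}")
```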

In [70]:
tf.summary.scalar('summary/misclassification_rate', misclassification_rate)
tf.summary.histogram('summary/max_probability', tf.reduce_max(tf.nn.softmax(logits), axis=1))
tf.get_collection(tf.GraphKeys.SUMMARIES)

[<tf.Tensor 'summary/misclassification_rate:0' shape=() dtype=string>,
 <tf.Tensor 'summary/max_probability:0' shape=() dtype=string>]

In [71]:
summary = tf.summary.merge_all()

Summaries have to be written to files; here we use `tf.summary.FileWriter`. Tensorflow provides an external browser-based tool called `tensorboard`, which reads those files and visualizes your summaries. The tool can also be used to monitor a training while it is still running. Keep in mind that in real-world applications, networks are often trained for days or even weeks, so simple print statements in a loop won't do.
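The whole round trip (define a summary, write it with a `FileWriter`, read it back) fits in a few lines. The sketch below makes these assumptions:

- It assumes TensorFlow >= 1.13 (`tensorflow.compat.v1`).
- It writes to a temporary directory instead of `train_log`.
- It uses a placeholder-fed scalar with the made-up tag `demo/value`.

Reading the event file with `tf.train.summary_iterator` is roughly what tensorboard does for you.

```python
import glob
import os
import tempfile

# Assumes TensorFlow >= 1.13 (tensorflow.compat.v1 also exists on 2.x)
import tensorflow.compat.v1 as tf

logdir = tempfile.mkdtemp()  # throw-away directory instead of 'train_log'

graph = tf.Graph()
with graph.as_default():
    value = tf.placeholder(tf.float32, [])
    tf.summary.scalar('demo/value', value)  # hypothetical tag
    summary = tf.summary.merge_all()
    writer = tf.summary.FileWriter(logdir, graph)
    with tf.Session(graph=graph) as sess:
        for step in range(3):
            writer.add_summary(sess.run(summary, feed_dict={value: 0.5 * step}), step)
    writer.close()  # flushes the pending events to disk

# read the recorded scalars back from the event file
recorded = []
for path in glob.glob(os.path.join(logdir, 'events.out.tfevents.*')):
    for event in tf.train.summary_iterator(path):
        for v in event.summary.value:
            if v.tag == 'demo/value':
                recorded.append((event.step, v.simple_value))
print(sorted(recorded))  # [(0, 0.0), (1, 0.5), (2, 1.0)]
```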

In [72]:
train_writer = tf.summary.FileWriter('train_log')
train_writer.add_graph(tf.get_default_graph())
test_writer = tf.summary.FileWriter('test_log')

We run our training like before, but every 10 iterations we also run our summary operation on the first 1000 samples of the training set and of the test set.

In [73]:
tf.global_variables_initializer().run()
for i in range(100):
    batch_xs, batch_ys = mnist.train.next_batch(100)
    session.run(minimize, feed_dict={x: batch_xs, y: batch_ys})

    if i % 10 == 0:
        train_summary = session.run(summary, feed_dict={x: mnist.train.images[:1000], y: mnist.train.labels[:1000]})
        train_writer.add_summary(train_summary, i)
        test_summary = session.run(summary, feed_dict={x: mnist.test.images[:1000], y: mnist.test.labels[:1000]})
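        test_writer.add_summary(test_summary, i)

# FileWriter buffers events and only writes them to disk every 120 s by default (flush_secs),
# so flush explicitly to let tensorboard pick them up right away
train_writer.flush()
test_writer.flush()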

### `Tensorboard`

It is important to be able to efficiently inspect and visualize the learning process of a neural network.
Tensorflow includes a browser-based application called **tensorboard** which allows you to easily visualize the `tf.Summary` objects you sampled during the training.

<div class="alert alert-block alert-info">
<h3>Exercise</h3>
You can now start tensorboard and inspect your training by executing the following command in `bash` <br>
<pre> tensorboard --logdir="." </pre>

Afterwards open your browser at:
http://localhost:6006
</div>

<div class="alert alert-block alert-info">
<h3>Exercise</h3>
In addition to the current summaries: visualize the loss function (scalar) and the distribution of the weights (histogram) in tensorboard.
</div>

## `tf.losses`

The `tf.losses` package contains pre-defined loss functions commonly used in neural networks.
In contrast to the low-level API, the loss is automatically averaged over the current batch.

Every loss you define is automatically added to the `tf.GraphKeys.LOSSES` collection of tensorflow.
Every regularization loss you define is automatically added to the `tf.GraphKeys.REGULARIZATION_LOSSES` collection of tensorflow.

https://www.tensorflow.org/api_docs/python/tf/losses

In the code below I added some `tf.contrib.layers.l2_regularizer` terms.
Those regularization terms will be picked up automatically by `tf.losses.get_total_loss` below, because they are added to the corresponding collection!
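The NumPy sketch below shows the penalty each regularizer adds to `REGULARIZATION_LOSSES` for a given weight tensor. It assumes the definitions used by `tf.contrib.layers`:

- `l2_regularizer(scale)` computes `scale * tf.nn.l2_loss(w)`, i.e. half the sum of squares.
- `l1_regularizer(scale)` computes `scale * sum(|w|)`.

The toy kernel is made up.

```python
import numpy as np

def l2_regularizer(scale):
    # sketch of tf.contrib.layers.l2_regularizer: scale * sum(w**2) / 2
    return lambda w: scale * np.sum(w ** 2) / 2.0

def l1_regularizer(scale):
    # sketch of tf.contrib.layers.l1_regularizer: scale * sum(|w|)
    return lambda w: scale * np.sum(np.abs(w))

kernel = np.array([[0.5, -1.0],
                   [2.0,  0.0]])  # hypothetical weights

print(l2_regularizer(0.1)(kernel))  # 0.1 * (0.25 + 1 + 4) / 2, about 0.2625
print(l1_regularizer(0.1)(kernel))  # 0.1 * (0.5 + 1 + 2), about 0.35
```

Note the factor 1/2 in the L2 penalty: with `scale=0.1`, each squared weight contributes 0.05.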

In [86]:
tf.reset_default_graph()
session = tf.InteractiveSession()
x = tf.placeholder(tf.float32, [None, 784])

image = tf.reshape(x, [-1, 28, 28, 1])
conv = tf.layers.conv2d(image, filters=10, kernel_size=[5, 5], padding="same",
                         activation=tf.nn.relu,
                         kernel_regularizer=tf.contrib.layers.l2_regularizer(0.1),
                        )
pool = tf.layers.max_pooling2d(conv, pool_size=[3, 3], strides=3)

flat = tf.layers.flatten(pool)
logits = tf.layers.dense(flat, units=10,
                         kernel_regularizer=tf.contrib.layers.l2_regularizer(0.1),
                         )
tf.get_collection(tf.GraphKeys.REGULARIZATION_LOSSES)

[<tf.Tensor 'conv2d/kernel/Regularizer/l2_regularizer:0' shape=() dtype=float32>,
 <tf.Tensor 'dense/kernel/Regularizer/l2_regularizer:0' shape=() dtype=float32>]

Calculating the appropriate loss is now pretty easy. `tf.losses.sparse_softmax_cross_entropy` does exactly what we want.

In [87]:
y = tf.placeholder(tf.int64, [None])
tf.losses.sparse_softmax_cross_entropy(labels=y, logits=logits)
tf.get_collection(tf.GraphKeys.LOSSES)

[<tf.Tensor 'sparse_softmax_cross_entropy_loss/value:0' shape=() dtype=float32>]

The convenience function `tf.losses.get_total_loss` returns the total loss in your graph, including potential regularization terms.
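Put differently, `tf.losses.get_total_loss()` sums everything in `tf.GraphKeys.LOSSES`. Unless `add_regularization_losses=False` is passed, it also adds everything in `tf.GraphKeys.REGULARIZATION_LOSSES`. The minimal sketch below demonstrates this in a separate graph, assuming TensorFlow >= 1.13. The constants are hypothetical stand-ins for a classifier loss and two regularization terms.

```python
# Assumes TensorFlow >= 1.13 (tensorflow.compat.v1 also exists on 2.x)
import tensorflow.compat.v1 as tf

graph = tf.Graph()
with graph.as_default():
    # hypothetical classifier loss, registered in tf.GraphKeys.LOSSES
    tf.losses.add_loss(tf.constant(0.7))
    # two hypothetical regularization terms, as kernel_regularizer would add them
    tf.add_to_collection(tf.GraphKeys.REGULARIZATION_LOSSES, tf.constant(0.2))
    tf.add_to_collection(tf.GraphKeys.REGULARIZATION_LOSSES, tf.constant(0.1))

    total = tf.losses.get_total_loss()  # regularization included by default
    data_only = tf.losses.get_total_loss(add_regularization_losses=False)

    with tf.Session(graph=graph) as sess:
        print(sess.run([total, data_only]))  # approximately [1.0, 0.7]
```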

In [88]:
minimize = tf.train.GradientDescentOptimizer(0.5).minimize(tf.losses.get_total_loss())

In [89]:
tf.summary.scalar('summary/regularization_loss', tf.reduce_sum(tf.get_collection(tf.GraphKeys.REGULARIZATION_LOSSES), 0))
tf.summary.scalar('summary/classifier_loss', tf.reduce_sum(tf.get_collection(tf.GraphKeys.LOSSES), 0))
tf.summary.scalar('summary/total_loss', tf.losses.get_total_loss())
summary = tf.summary.merge_all()

train_writer = tf.summary.FileWriter('train_log')
test_writer = tf.summary.FileWriter('test_log')

In [90]:
tf.global_variables_initializer().run()
for i in range(100):
    batch_xs, batch_ys = mnist.train.next_batch(100)
    session.run(minimize, feed_dict={x: batch_xs, y: batch_ys})
    if i % 10 == 0:
        train_writer.add_summary(session.run(summary, feed_dict={x: mnist.train.images[:1000], y: mnist.train.labels[:1000]}), i)
        test_writer.add_summary(session.run(summary, feed_dict={x: mnist.test.images[:1000], y: mnist.test.labels[:1000]}), i) 

<div class="alert alert-block alert-info">
<h3>Exercise</h3>
Add an L1 regularization term to the biases using the `bias_regularizer` argument and the `tf.contrib.layers.l1_regularizer` function.
</div>

## `tf.initializers`

The initial values of the parameters of your model can be important for a successful training.
A careful initialization ensures that the order of magnitude of the activations during the forward pass, and of the gradients during the backward pass, does not change from layer to layer.

Many different initialization schemes have been proposed.
They differ mostly in the **distribution** they use to sample the weights (usually either a normal distribution or a uniform distribution) and in its **variance**.

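Two famous initialization schemes are ($N_\mathrm{in}$ and $N_\mathrm{out}$ denote the number of inputs and outputs of a layer):
  - Glorot/Xavier initialization (the default in `tf.layers`, which uses its uniform variant): $\sigma = \sqrt{\frac{2}{N_\mathrm{in} + N_\mathrm{out}}}$
  - He initialization: $\sigma = \sqrt{\frac{2}{N_\mathrm{in}}}$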
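The claim about preserving magnitudes can be checked numerically. The NumPy sketch below assumes a stack of 20 square dense ReLU layers without biases. This is a hypothetical setup, not the network in this notebook.

- With He initialization, the root-mean-square activation stays of order one.
- With Glorot initialization, the mean squared activation roughly halves at every ReLU layer.

```python
import numpy as np

def glorot_std(fan_in, fan_out):
    # Glorot/Xavier: sigma = sqrt(2 / (N_in + N_out))
    return np.sqrt(2.0 / (fan_in + fan_out))

def he_std(fan_in, fan_out):
    # He: sigma = sqrt(2 / N_in); fan_out is ignored
    return np.sqrt(2.0 / fan_in)

def activation_rms(std_fn, depth=20, width=256, batch=1000, seed=0):
    """Feed standard-normal inputs through `depth` dense ReLU layers
    (width x width, no biases) with weights drawn from
    N(0, std_fn(width, width)**2); return the RMS of the last activations."""
    rng = np.random.default_rng(seed)
    h = rng.standard_normal((batch, width))
    for _ in range(depth):
        w = rng.standard_normal((width, width)) * std_fn(width, width)
        h = np.maximum(h @ w, 0.0)  # ReLU
    return float(np.sqrt(np.mean(h ** 2)))

print("Glorot:", activation_rms(glorot_std))  # roughly 2**-10, i.e. about 1e-3
print("He:    ", activation_rms(he_std))      # stays of order 1
```

He initialization was derived with ReLU activations in mind. A ReLU zeroes out about half of its inputs, and the factor 2 in $\sigma$ compensates for exactly that.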
  
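We can easily change the initialization by passing an appropriate `Initializer` object as the `kernel_initializer` argument of a layer. In the next example we use the so-called **He initialization**. With `scale=2` and `mode='fan_in'`, `tf.initializers.variance_scaling` samples weights with variance $2/N_\mathrm{in}$, which matches the He formula above.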

In [91]:
tf.reset_default_graph()
session = tf.InteractiveSession()
x = tf.placeholder(tf.float32, [None, 784])

image = tf.reshape(x, [-1, 28, 28, 1])
conv = tf.layers.conv2d(image, filters=10, kernel_size=[5, 5], padding="same",
                         activation=tf.nn.relu,
                         kernel_initializer=tf.initializers.variance_scaling(scale=2, mode='fan_in', distribution='normal'),
                        )
pool = tf.layers.max_pooling2d(conv, pool_size=[3, 3], strides=3)

flat = tf.layers.flatten(pool)
logits = tf.layers.dense(flat, units=10,
                         kernel_initializer=tf.initializers.variance_scaling(scale=2, mode='fan_in', distribution='normal'),
                         )
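For this network, the two schemes give quite different scales for the convolution kernel. The plain-Python sketch below computes the fans and the resulting standard deviations. It follows the fan convention of TensorFlow's variance-scaling initializers: for a conv kernel, the receptive field multiplies both channel counts. The kernel shapes come from the model above. The dense kernel is `[810, 10]` because the pooled feature map is 9 × 9 × 10. The helper name `compute_fans` is hypothetical.

```python
import math

def compute_fans(shape):
    """fan_in / fan_out of a kernel: for a conv kernel of shape
    [kh, kw, in_channels, out_channels], kh * kw multiplies both channel counts."""
    if len(shape) == 2:  # dense kernel: [inputs, units]
        return shape[0], shape[1]
    receptive_field = math.prod(shape[:-2])
    return shape[-2] * receptive_field, shape[-1] * receptive_field

for name, shape in [("conv2d/kernel", [5, 5, 1, 10]), ("dense/kernel", [810, 10])]:
    fan_in, fan_out = compute_fans(shape)
    he = math.sqrt(2.0 / fan_in)                  # variance_scaling(scale=2, mode='fan_in')
    glorot = math.sqrt(2.0 / (fan_in + fan_out))  # Glorot/Xavier
    print(f"{name}: fan_in={fan_in}, fan_out={fan_out}, "
          f"He sigma={he:.4f}, Glorot sigma={glorot:.4f}")
```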

In [92]:
y = tf.placeholder(tf.int64, [None])
tf.losses.sparse_softmax_cross_entropy(labels=y, logits=logits)
minimize = tf.train.GradientDescentOptimizer(0.5).minimize(tf.losses.get_total_loss())

In [93]:
tf.summary.scalar('summary/total_loss', tf.losses.get_total_loss())
summary = tf.summary.merge_all()

train_writer = tf.summary.FileWriter('train_log')
test_writer = tf.summary.FileWriter('test_log')

In [94]:
tf.global_variables_initializer().run()
for i in range(100):
    batch_xs, batch_ys = mnist.train.next_batch(100)
    session.run(minimize, feed_dict={x: batch_xs, y: batch_ys})
    if i % 10 == 0:
        train_writer.add_summary(session.run(summary, feed_dict={x: mnist.train.images[:1000], y: mnist.train.labels[:1000]}), i)
        test_writer.add_summary(session.run(summary, feed_dict={x: mnist.test.images[:1000], y: mnist.test.labels[:1000]}), i)