# Tensorflow tutorial: tensorboard and other best practices

Unfortunately, tensorflow is full of footguns (ways of shooting yourself in the foot). By following a few best practices, we can substantially reduce the frustration of working with it. Unlike many other deep learning frameworks, tensorflow also ships with an extremely useful tool called tensorboard, which lets you visualize the computation graph and monitor the progress of training.

We will start from our logistic regression example and add these features one at a time.

```python
import tensorflow as tf
import numpy as np
```

```python
# We start with our existing model code

def compute_logits(x):
    """Compute the logits of the model"""
    W = tf.get_variable('W', shape=[784, 10])
    b = tf.get_variable('b', shape=[10])
    
    logits = tf.add(tf.matmul(x, W), b, name='logits')
    return logits

# Note: tensorflow provides this computation as a built-in, numerically stable op:
# tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=y, logits=logits))

# We implement it by hand here for illustration only, please don't use it.
def compute_cross_entropy(logits, y):
    y_pred = tf.nn.softmax(logits, name='y_pred')  # the predicted probability for each example

    # Compute the average cross-entropy across all the examples.
    cross_entropy = tf.reduce_mean(-tf.reduce_sum(y * tf.log(y_pred), axis=[1]))
    return cross_entropy

def compute_accuracy(logits, y):
    prediction = tf.argmax(logits, 1, name='pred_class')
    true_label = tf.argmax(y, 1, name='true_class')
    accuracy = tf.reduce_mean(tf.cast(tf.equal(prediction, true_label), tf.float32))
    return accuracy
```
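To see why the hand-written `compute_cross_entropy` is discouraged, here is a small sketch in plain NumPy (so it runs without tensorflow; the two function names are hypothetical). It mirrors the softmax-then-log recipe in float32, tensorflow's default dtype, and compares it with computing the log-probabilities directly through the log-sum-exp trick. When the true class has a very negative logit, its softmax probability underflows to exactly 0, and taking the log then makes the naive loss infinite.

```python
import numpy as np

def naive_cross_entropy(logits, y):
    """Softmax followed by log, the same recipe as compute_cross_entropy."""
    p = np.exp(logits) / np.sum(np.exp(logits), axis=1, keepdims=True)
    with np.errstate(divide="ignore"):  # log(0) = -inf; silence the warning
        return np.mean(-np.sum(y * np.log(p), axis=1))

def stable_cross_entropy(logits, y):
    """Log-softmax computed directly with the log-sum-exp trick."""
    shifted = logits - np.max(logits, axis=1, keepdims=True)
    log_p = shifted - np.log(np.sum(np.exp(shifted), axis=1, keepdims=True))
    return np.mean(-np.sum(y * log_p, axis=1))

# One example whose true class (index 1) has a very low logit.
logits = np.array([[0.0, -200.0]], dtype=np.float32)
y = np.array([[0.0, 1.0]], dtype=np.float32)

print(naive_cross_entropy(logits, y))   # inf
print(stable_cross_entropy(logits, y))  # 200.0
```

On well-behaved logits the two functions agree; only the second stays finite here. The built-in `tf.nn.softmax_cross_entropy_with_logits` likewise works from the logits rather than from the softmax output.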

```python
# Of course, we also need to load the data

from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets('../data/mnist', one_hot=True)
```

```
Extracting ../data/mnist\train-images-idx3-ubyte.gz
Extracting ../data/mnist\train-labels-idx1-ubyte.gz
Extracting ../data/mnist\t10k-images-idx3-ubyte.gz
Extracting ../data/mnist\t10k-labels-idx1-ubyte.gz
```


## Tensorboard

To display data in tensorboard, we must ask tensorflow to record it, which is done through summaries. There are two main types of summaries: scalars and histograms. Scalar summaries track a single number over time, such as the loss or the learning rate. Histogram summaries track the distribution of a set of values, usually the entries of a tensor or of its gradient. Tensorflow can also produce summaries of the input data, such as images or audio.

Summaries are ordinary graph operations: they must be evaluated with `sess.run` during the training step, and their output must then be written to a log directory using `tf.summary.FileWriter`.
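A minimal sketch of that round trip, written against the tensorflow 1.x graph API used in this tutorial (the `try`/`except` import is an assumption meant to let it also run on tensorflow 2 through `tf.compat.v1`). It records a scalar summary of a fed-in value at three steps into a temporary log directory, then reads the event file back with `tf.train.summary_iterator` to show what tensorboard will find there. The tag `my_scalar` and the values are arbitrary.

```python
import glob
import os
import tempfile

try:  # tensorflow 2.x: use the 1.x graph API through tf.compat.v1
    import tensorflow.compat.v1 as tf
    tf.disable_v2_behavior()
except (ImportError, AttributeError):  # older tensorflow 1.x
    import tensorflow as tf

logdir = tempfile.mkdtemp()  # throwaway log directory for this sketch

with tf.Graph().as_default():
    value = tf.placeholder(tf.float32, [], name='value')
    summary_op = tf.summary.scalar('my_scalar', value)  # evaluates to a serialized Summary

    with tf.Session() as sess:
        writer = tf.summary.FileWriter(logdir)
        for step in range(3):
            summary = sess.run(summary_op, feed_dict={value: 0.5 * step})
            writer.add_summary(summary, global_step=step)
        writer.close()  # flush buffered events to disk

# Read the event file back: this is the data tensorboard plots.
recorded = []
for path in glob.glob(os.path.join(logdir, 'events.out.tfevents.*')):
    for event in tf.train.summary_iterator(path):
        for v in event.summary.value:
            recorded.append((event.step, v.tag, v.simple_value))

print(recorded)  # [(0, 'my_scalar', 0.0), (1, 'my_scalar', 0.5), (2, 'my_scalar', 1.0)]
```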

To visualize this information, we need to start a separate program called tensorboard, which is installed along with tensorflow. From the command line, first activate your tensorflow environment, then call
```bash
tensorboard --logdir=/path/to/log/directory
```
to start tensorboard. You can then open http://localhost:6006/ in your browser.

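For the examples in this notebook, run the following from the directory containing this file. Each example writes its logs to its own subdirectory of `logs`, which tensorboard displays as a separate run: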
```bash
tensorboard --logdir=logs
```

```python
with tf.Graph().as_default():
    # We build the model here as before
    x = tf.placeholder(tf.float32, [None, 784], name='x')
    y = tf.placeholder(tf.float32, [None, 10], name='y')
    
    logits = compute_logits(x)
    loss = compute_cross_entropy(logits=logits, y=y)
    accuracy = compute_accuracy(logits, y)
    
    opt = tf.train.GradientDescentOptimizer(0.5)
    train_step = opt.minimize(loss)
    
    # Let's put the summaries below
    
    # create summary for loss and accuracy
    tf.summary.scalar('loss', loss)
    tf.summary.scalar('accuracy', accuracy)
    
    # create summary for logits
    tf.summary.histogram('logits', logits)
    
    # create summary for input image
    tf.summary.image('input', tf.reshape(x, [-1, 28, 28, 1]))
    
    summary_op = tf.summary.merge_all()
    
    with tf.Session() as sess:
        summary_writer = tf.summary.FileWriter('logs/example1', sess.graph)
        
        sess.run(tf.global_variables_initializer())
    
        for i in range(100):
            _, ac, summary = sess.run((train_step, accuracy, summary_op),
                                      feed_dict={x: mnist.train.images[:5000,:], y: mnist.train.labels[:5000]})
            
            # write the summary output to file
            summary_writer.add_summary(summary, i)

            if i % 10 == 0:
                print('Step {0}: accuracy is {1}'.format(i + 1, ac))

        # flush any buffered events to disk and close the file
        summary_writer.close()
```

```
Step 1: accuracy is 0.06659999489784241
Step 11: accuracy is 0.8277997970581055
Step 21: accuracy is 0.8613997101783752
Step 31: accuracy is 0.8735997676849365
Step 41: accuracy is 0.8823997378349304
Step 51: accuracy is 0.8887996673583984
Step 61: accuracy is 0.893399715423584
Step 71: accuracy is 0.8961997628211975
Step 81: accuracy is 0.8989997506141663
Step 91: accuracy is 0.9015997648239136
```
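The cell above records a histogram of the logits only. To also record histograms of the gradients, as mentioned when summaries were introduced, one option is to split `opt.minimize(loss)` into its two halves, `compute_gradients` and `apply_gradients`, and add a summary for each gradient in between. This sketch assumes the tensorflow 1.x graph API (with the same assumed `tf.compat.v1` fallback import), feeds a single random batch as a stand-in for MNIST, and uses tensorflow's built-in cross-entropy; the `_gradient` tag suffix is an arbitrary choice.

```python
import numpy as np

try:  # tensorflow 2.x: use the 1.x graph API through tf.compat.v1
    import tensorflow.compat.v1 as tf
    tf.disable_v2_behavior()
except (ImportError, AttributeError):  # older tensorflow 1.x
    import tensorflow as tf

with tf.Graph().as_default():
    x = tf.placeholder(tf.float32, [None, 784], name='x')
    y = tf.placeholder(tf.float32, [None, 10], name='y')
    W = tf.get_variable('W', shape=[784, 10])
    b = tf.get_variable('b', shape=[10])
    logits = tf.add(tf.matmul(x, W), b, name='logits')
    loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=y, logits=logits))

    opt = tf.train.GradientDescentOptimizer(0.5)
    # minimize() is compute_gradients() followed by apply_gradients();
    # calling them separately exposes the gradient tensors.
    grads_and_vars = opt.compute_gradients(loss)
    for grad, var in grads_and_vars:
        tf.summary.histogram(var.op.name + '_gradient', grad)
    train_step = opt.apply_gradients(grads_and_vars)
    summary_op = tf.summary.merge_all()

    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        batch_x = np.random.rand(32, 784).astype(np.float32)  # random stand-in for MNIST images
        batch_y = np.eye(10, dtype=np.float32)[np.random.randint(0, 10, size=32)]  # one-hot labels
        _, summary = sess.run((train_step, summary_op), feed_dict={x: batch_x, y: batch_y})

# summary is a serialized Summary protobuf; list the tags it contains.
tags = sorted(v.tag for v in tf.Summary.FromString(summary).value)
print(tags)  # ['W_gradient', 'b_gradient']
```

In a real training loop, `summary` would be passed to `summary_writer.add_summary` exactly like the other summaries.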


## Name scopes

Although our previous example works, the graph it creates is very messy when viewed in tensorboard. We would like to separate the graph into distinct components, especially since modern neural network architectures are often built by stacking similar components on top of each other. Tensorflow's mechanism for building such reusable components is scopes, which add a name prefix to everything created inside them and group the corresponding operations together in the tensorboard graph. There are two kinds: `tf.variable_scope`, which also prefixes variables created with `tf.get_variable`, and `tf.name_scope`, which only prefixes operations.

```python
# A very simple example of using scopes

with tf.Graph().as_default():
    with tf.variable_scope('scope_1'):
        a1 = tf.get_variable('a', [10])
    
    with tf.variable_scope('scope_2'):
        a2 = tf.get_variable('a', [10])
    
    print(a1.name)
    print(a2.name)
```

```
scope_1/a:0
scope_2/a:0
```
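Because `tf.get_variable` only looks at the variable scope, the two kinds of scope behave differently for variables. The sketch below (tensorflow 1.x graph API, with an assumed `tf.compat.v1` fallback import; the scope and variable names are arbitrary) creates one variable and one constant under each kind of scope. This is why `tf.name_scope` is enough for helpers that create no variables, while code that calls `tf.get_variable` needs `tf.variable_scope`.

```python
try:  # tensorflow 2.x: use the 1.x graph API through tf.compat.v1
    import tensorflow.compat.v1 as tf
    tf.disable_v2_behavior()
except (ImportError, AttributeError):  # older tensorflow 1.x
    import tensorflow as tf

with tf.Graph().as_default():
    with tf.name_scope('ns'):
        v = tf.get_variable('v', [1])   # tf.get_variable ignores name scopes
        c = tf.constant(1.0, name='c')  # ordinary operations are prefixed

    with tf.variable_scope('vs'):
        w = tf.get_variable('w', [1])   # variable scopes prefix variables...
        d = tf.constant(1.0, name='d')  # ...and operations

    print(v.name, c.name)  # v:0 ns/c:0
    print(w.name, d.name)  # vs/w:0 vs/d:0
```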


```python
# We can also use scopes when calling functions
# This is the main way we will use them, to be able to reuse
# different blocks (that we may write as functions) in a single
# model

def create_variable():
    return tf.get_variable('myvar', [10])

with tf.Graph().as_default():
    with tf.variable_scope('scope_1'):
        m1 = create_variable()
    
    with tf.variable_scope('scope_2'):
        m2 = create_variable()
    
    print(m1.name)
    print(m2.name)
```

```
scope_1/myvar:0
scope_2/myvar:0
```
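Calling the same block twice inside the *same* variable scope is different: `tf.get_variable` will not silently hand back an existing variable, and raises a `ValueError` unless reuse is requested. A minimal sketch, assuming the tensorflow 1.x graph API (with the same assumed `tf.compat.v1` fallback import), showing how `reuse=True` shares the variable instead. Newer tensorflow 1.x releases also accept `reuse=tf.AUTO_REUSE`, which creates the variable on first use and reuses it afterwards.

```python
try:  # tensorflow 2.x: use the 1.x graph API through tf.compat.v1
    import tensorflow.compat.v1 as tf
    tf.disable_v2_behavior()
except (ImportError, AttributeError):  # older tensorflow 1.x
    import tensorflow as tf

def create_variable():
    return tf.get_variable('myvar', [10])

with tf.Graph().as_default():
    with tf.variable_scope('shared'):
        m1 = create_variable()  # creates shared/myvar

    raised = False
    try:
        with tf.variable_scope('shared'):
            create_variable()  # same name again, reuse not requested
    except ValueError:
        raised = True

    with tf.variable_scope('shared', reuse=True):
        m2 = create_variable()  # returns the existing variable

    print(raised, m1 is m2, m2.name)  # True True shared/myvar:0
```

Sharing weights this way is how, for example, the same block can be applied to two different inputs.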


```python
# Let's apply this to our example above!

def compute_logits(x):
    """Compute the logits of the model"""
    W = tf.get_variable('W', shape=[784, 10])
    b = tf.get_variable('b', shape=[10])
    
    logits = tf.add(tf.matmul(x, W), b, name='logits')
    return logits

# Note: tensorflow provides this computation as a built-in, numerically stable op:
# tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=y, logits=logits))

# We implement it by hand here for illustration only, please don't use it.
def compute_cross_entropy(logits, y):
    with tf.name_scope('cross_entropy'):
        y_pred = tf.nn.softmax(logits, name='y_pred')  # the predicted probability for each example
        cross_entropy = tf.reduce_mean(-tf.reduce_sum(y * tf.log(y_pred), axis=[1]))
    return cross_entropy

def compute_accuracy(logits, y):
    with tf.name_scope('accuracy'):
        prediction = tf.argmax(logits, 1, name='pred_class')
        true_label = tf.argmax(y, 1, name='true_class')
        accuracy = tf.reduce_mean(tf.cast(tf.equal(prediction, true_label), tf.float32))
    return accuracy

with tf.Graph().as_default():
    # We build the model here as before
    x = tf.placeholder(tf.float32, [None, 784], name='x')
    y = tf.placeholder(tf.float32, [None, 10], name='y')
    
    with tf.variable_scope('model'):
        logits = compute_logits(x)
        loss = compute_cross_entropy(logits=logits, y=y)
    accuracy = compute_accuracy(logits, y)
    
    opt = tf.train.GradientDescentOptimizer(0.5)
    train_step = opt.minimize(loss)
    
    # Let's put the summaries below
    
    with tf.variable_scope('summaries'):
        # create summary for loss and accuracy
        tf.summary.scalar('loss', loss)
        tf.summary.scalar('accuracy', accuracy)

        # create summary for logits
        tf.summary.histogram('logits', logits)

        # create summary for input image
        tf.summary.image('input', tf.reshape(x, [-1, 28, 28, 1]))

        summary_op = tf.summary.merge_all()
    
    with tf.Session() as sess:
        summary_writer = tf.summary.FileWriter('logs/example2', sess.graph)
        
        sess.run(tf.global_variables_initializer())
    
        for i in range(100):
            _, ac, summary = sess.run((train_step, accuracy, summary_op),
                                      feed_dict={x: mnist.train.images[:5000,:], y: mnist.train.labels[:5000]})
            
            # write the summary output to file
            summary_writer.add_summary(summary, i)

            if i % 10 == 0:
                print('Step {0}: accuracy is {1}'.format(i + 1, ac))

        # flush any buffered events to disk and close the file
        summary_writer.close()
```

```
Step 1: accuracy is 0.13579998910427094
Step 11: accuracy is 0.822199821472168
Step 21: accuracy is 0.855999767780304
Step 31: accuracy is 0.870999813079834
Step 41: accuracy is 0.8807997703552246
Step 51: accuracy is 0.8897998332977295
Step 61: accuracy is 0.8949997425079346
Step 71: accuracy is 0.8981997966766357
Step 81: accuracy is 0.9007998108863831
Step 91: accuracy is 0.9027997851371765
```


## Automating the training loop: monitored sessions

So far, we have run the training loop ourselves: feeding in the data, keeping track of the step counter, and so on. We have also not yet implemented any way of saving and restoring the trained model. Fortunately, tensorflow provides a ready-made training loop, `tf.train.MonitoredTrainingSession`, which takes care of variable initialization, checkpointing and writing summaries. Let us use it instead.

```python
# Let's apply this to our example above!

with tf.Graph().as_default():
    # We build the model here as before
    x = tf.placeholder(tf.float32, [None, 784], name='x')
    y = tf.placeholder(tf.float32, [None, 10], name='y')
    
    with tf.variable_scope('model'):
        logits = compute_logits(x)
        loss = compute_cross_entropy(logits=logits, y=y)
    accuracy = compute_accuracy(logits, y)
    
    # We use the global step to keep track of how many
    # steps we have. It integrates with all the other
    # tensorflow logging functionality.
    global_step = tf.train.get_or_create_global_step()
    
    opt = tf.train.GradientDescentOptimizer(0.5)
    
    # This will automatically increment the global step.
    train_step = opt.minimize(loss, global_step=global_step)
    
    # Let's put the summaries below
    with tf.variable_scope('summaries'):
        # create summary for loss and accuracy
        tf.summary.scalar('loss', loss)
        tf.summary.scalar('accuracy', accuracy)

        # create summary for logits
        tf.summary.histogram('logits', logits)

        # create summary for input image
        tf.summary.image('input', tf.reshape(x, [-1, 28, 28, 1]))

        summary_op = tf.summary.merge_all()
    
    # Note: the monitored training session will automatically save summaries and checkpoints for us!
    # It also initializes the variables for us.
    with tf.train.MonitoredTrainingSession(save_summaries_steps=10,
                                           checkpoint_dir='logs/example3',
                                           log_step_count_steps=10) as sess:
        while not sess.should_stop():
            # Could also print more info here as before.
            _, step = sess.run((train_step, global_step),
                               feed_dict={x: mnist.train.images[:5000,:], y: mnist.train.labels[:5000]})
            if step > 100:
                break
```

```
INFO:tensorflow:Create CheckpointSaverHook.
INFO:tensorflow:Saving checkpoints for 1 into logs/example3\model.ckpt.
INFO:tensorflow:global_step/sec: 23.8642
INFO:tensorflow:global_step/sec: 50.5714
INFO:tensorflow:global_step/sec: 50.5815
INFO:tensorflow:global_step/sec: 46.6678
INFO:tensorflow:global_step/sec: 46.6618
INFO:tensorflow:global_step/sec: 46.077
INFO:tensorflow:global_step/sec: 45.4713
INFO:tensorflow:global_step/sec: 48.8125
INFO:tensorflow:global_step/sec: 47.7436
INFO:tensorflow:global_step/sec: 46.5452
INFO:tensorflow:Saving checkpoints for 101 into logs/example3\model.ckpt.
```
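Rather than breaking out of the loop by hand, the stopping condition can be handed to a hook: `tf.train.StopAtStepHook` makes `sess.should_stop()` return `True` once the global step reaches `last_step`. Because `checkpoint_dir` is set, running the same code a second time restores the latest checkpoint and resumes from the saved global step. The sketch below assumes the tensorflow 1.x graph API (with an assumed `tf.compat.v1` fallback import) and, so that it runs quickly, uses a tiny linear model on constant data instead of MNIST; `train` is a hypothetical helper defined only for this illustration.

```python
import tempfile

import numpy as np

try:  # tensorflow 2.x: use the 1.x graph API through tf.compat.v1
    import tensorflow.compat.v1 as tf
    tf.disable_v2_behavior()
except (ImportError, AttributeError):  # older tensorflow 1.x
    import tensorflow as tf

def train(checkpoint_dir, last_step):
    """Hypothetical helper: train a tiny model until the global step reaches last_step."""
    with tf.Graph().as_default():
        x = tf.placeholder(tf.float32, [None, 4], name='x')
        w = tf.get_variable('w', [4, 1])
        loss = tf.reduce_mean(tf.square(tf.matmul(x, w) - 1.0))

        global_step = tf.train.get_or_create_global_step()
        train_step = tf.train.GradientDescentOptimizer(0.1).minimize(loss, global_step=global_step)

        # The hook requests a stop once the global step reaches last_step.
        hooks = [tf.train.StopAtStepHook(last_step=last_step)]
        with tf.train.MonitoredTrainingSession(checkpoint_dir=checkpoint_dir, hooks=hooks) as sess:
            while not sess.should_stop():
                sess.run(train_step, feed_dict={x: np.ones((8, 4), dtype=np.float32)})
    # A final checkpoint is saved when the monitored session closes.
    return tf.train.latest_checkpoint(checkpoint_dir)

ckpt_dir = tempfile.mkdtemp()
first = train(ckpt_dir, last_step=50)   # trains from scratch up to step 50
second = train(ckpt_dir, last_step=80)  # restores step 50, then trains up to step 80
print(first)   # path ending in model.ckpt-50
print(second)  # path ending in model.ckpt-80
```

Note that this resuming behaviour also applies to the cell above: re-running it with the same `logs/example3` directory continues from the saved step rather than starting over.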


## Collections

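How does `tf.train.MonitoredTrainingSession` keep track of all the summaries we have defined? And how does `tf.train.GradientDescentOptimizer` know which variables to compute gradients for? Tensorflow attaches collections to each graph: a collection is simply a named list of objects (variables, tensors or operations) in the graph. When creating some operations, you may add them to collections yourself, and tensorflow adds some automatically. For example, `tf.summary.merge_all()` gathers the `tf.GraphKeys.SUMMARIES` collection, and `minimize` differentiates with respect to the `tf.GraphKeys.TRAINABLE_VARIABLES` collection unless it is given an explicit `var_list`.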

```python
# trainable variables

with tf.Graph().as_default():
    x = tf.get_variable('x', [10])
    y = tf.get_variable('y', [10])
    z = tf.get_variable('z', [10], trainable=False)
    
    print(tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES))
```

```
[<tf.Variable 'x:0' shape=(10,) dtype=float32_ref>, <tf.Variable 'y:0' shape=(10,) dtype=float32_ref>]
```


```python
# Losses

with tf.Graph().as_default():
    x = tf.get_variable('x', [10])
    y = tf.get_variable('y', [10])
    
    tf.losses.mean_squared_error(x, y)
    
    print(tf.get_collection(tf.GraphKeys.LOSSES))
```

```
[<tf.Tensor 'mean_squared_error/value:0' shape=() dtype=float32>]
```


There is a large number of predefined collections (listed in `tf.GraphKeys`), and you can also create your own, but only a few have operations added to them automatically. You can always add objects to a collection yourself to better organize your models.
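A minimal sketch of working with collections by hand, assuming the tensorflow 1.x graph API (with an assumed `tf.compat.v1` fallback import). It adds a hypothetical regularization term `reg` both to the predefined `tf.GraphKeys.LOSSES` collection, through `tf.losses.add_loss`, and to a user-defined collection named `my_tensors` (an arbitrary name). `tf.losses.get_total_loss` then sums everything in `LOSSES`, which is one way for a training script to pick up loss terms defined deep inside the model code.

```python
import numpy as np

try:  # tensorflow 2.x: use the 1.x graph API through tf.compat.v1
    import tensorflow.compat.v1 as tf
    tf.disable_v2_behavior()
except (ImportError, AttributeError):  # older tensorflow 1.x
    import tensorflow as tf

with tf.Graph().as_default():
    x = tf.get_variable('x', [10])
    y = tf.get_variable('y', [10])

    mse = tf.losses.mean_squared_error(x, y)   # added to LOSSES automatically
    reg = tf.reduce_sum(tf.square(x), name='reg')
    tf.losses.add_loss(reg)                    # add a custom term to LOSSES
    tf.add_to_collection('my_tensors', reg)    # a user-defined collection

    # Sum of every tensor in the LOSSES collection.
    total = tf.losses.get_total_loss(add_regularization_losses=False)

    my_tensor_names = [t.name for t in tf.get_collection('my_tensors')]
    n_losses = len(tf.get_collection(tf.GraphKeys.LOSSES))

    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        total_value, mse_value, reg_value = sess.run((total, mse, reg))

print(my_tensor_names, n_losses)                       # ['reg:0'] 2
print(np.isclose(total_value, mse_value + reg_value))  # True
```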

## Last words

Tensorflow is still under rapid development, and its best practices keep changing. The setup in this tutorial is more than adequate for all the problems we will explore in this course, but tensorflow offers numerous other APIs. The currently recommended approach is the [`tf.estimator`](https://www.tensorflow.org/get_started/estimator) API for training, combined with the [`tf.contrib.data`](https://www.tensorflow.org/programmers_guide/datasets) API for loading data (moved into the core library as `tf.data` in tensorflow 1.4). These are designed for larger-scale projects than your homework, but may be useful for some projects.

Even more than usual, coding in tensorflow can be frustrating, and the errors can be opaque. However, it is one of the few tools available today for building machine learning models that achieve such high performance on these tasks. As always, Google is your friend.

Unlike some other machine learning techniques, deep learning can be extremely data- and computation-intensive. The homework examples are simple enough to run on an ordinary laptop, but any serious project will require substantial computational power and a GPU (K80 or equivalent). Fortunately, such resources can be rented affordably on cloud platforms such as [AWS](https://aws.amazon.com/ec2/pricing/), although they require some technical skill to set up. In particular, GitHub offers a [student pack](https://education.github.com/pack) that includes a fair amount of AWS credit.

Other online services manage the IT side for you. They tend to be slightly more expensive than setting up your own machine on AWS, but are much simpler to use. For example, you may consider: [FloydHub](https://www.floydhub.com/), [Paperspace](https://www.paperspace.com/), [Neptune ML](https://neptune.ml/), [Tensorport](https://tensorport.com/), [Crestle](https://www.crestle.com/).

Finally, deep learning is as much art as it is science. The main way to improve is simply practice.