##### Copyright 2018 The TensorFlow Authors.

Licensed under the Apache License, Version 2.0 (the "License");

In [0]:
#@title Licensed under the Apache License, Version 2.0 (the "License"); { display-mode: "form" }
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Sigmoid Belief Network with TFP

<table class="tfo-notebook-buttons" align="left">
  <td>
    <a target="_blank" href="https://colab.research.google.com/github/tensorflow/probability/blob/master/tensorflow_probability/examples/jupyter_notebooks/Sigmoid_Belief_Network_TFP.ipynb"><img height="32px" src="https://colab.research.google.com/img/colab_favicon.ico" />Run in Google Colab</a>
  </td>
  <td>
    <a target="_blank" href="https://github.com/tensorflow/probability/blob/master/tensorflow_probability/examples/jupyter_notebooks/Sigmoid_Belief_Network_TFP.ipynb"><img src="https://www.tensorflow.org/images/GitHub-Mark-32px.png" />View source on GitHub</a>
  </td>
</table>
<br>
<br>
<br>

Original content from [this repository](https://github.com/blei-lab/edward/blob/master/examples/sigmoid_belief_network.py), created by [the Blei Lab](http://www.cs.columbia.edu/~blei/).

Ported to [TensorFlow Probability](https://www.tensorflow.org/probability/) by Matthew McAteer ([`@MatthewMcAteer0`](https://twitter.com/MatthewMcAteer0)), with help from Bryan Seybold, Mike Shwe ([`@mikeshwe`](https://twitter.com/mikeshwe)), Josh Dillon, and the rest of the TFP team at Google ([`tfprobability@tensorflow.org`](mailto:tfprobability@tensorflow.org)).

---

- Dependencies & Prerequisites
- Introduction
  - Data
  - Model
  - Inference
- References

## Dependencies & Prerequisites

<div class="alert alert-success">
    TensorFlow Probability is part of the Colab default runtime, <b>so you don't need to install TensorFlow or TensorFlow Probability if you're running this in Colab</b>.
    <br>
    If you're running this notebook in Jupyter on your own machine (and you have already installed TensorFlow), you can use one of the following:
    <br>
      <ul>
    <li> For the most recent nightly installation: <code>pip3 install -q tfp-nightly</code></li>
    <li> For the most recent stable TFP release: <code>pip3 install -q --upgrade tensorflow-probability</code></li>
    <li> For the most recent stable GPU-connected version of TFP: <code>pip3 install -q --upgrade tensorflow-probability-gpu</code></li>
    <li> For the most recent nightly GPU-connected version of TFP: <code>pip3 install -q tfp-nightly-gpu</code></li>
    </ul>
Again, if you are running this in Colab, TensorFlow and TFP are already installed.
</div>



In [0]:
#@title Imports and Global Variables  { display-mode: "form" }
!pip3 install -q observations
!pip install -q imageio
from __future__ import absolute_import, division, print_function

#@markdown This sets the warning status (default is `ignore`, since this notebook runs correctly)
warning_status = "ignore" #@param ["ignore", "always", "module", "once", "default", "error"]
import warnings
warnings.filterwarnings(warning_status)
with warnings.catch_warnings():
    warnings.filterwarnings(warning_status, category=DeprecationWarning)
    warnings.filterwarnings(warning_status, category=UserWarning)

import six
import sys
import time
import numpy as np
import string
from datetime import datetime
import os
import imageio
# from edward.models import Bernoulli
from observations import caltech101_silhouettes
from imageio import imwrite as imsave
#@markdown This sets the styles of the plotting (default is styled like plots from [FiveThirtyeight.com](https://fivethirtyeight.com/))
matplotlib_style = 'fivethirtyeight' #@param ['fivethirtyeight', 'bmh', 'ggplot', 'seaborn', 'default', 'Solarize_Light2', 'classic', 'dark_background', 'seaborn-colorblind', 'seaborn-notebook']
import matplotlib.pyplot as plt; plt.style.use(matplotlib_style)
import matplotlib.axes as axes;
from matplotlib.patches import Ellipse
%matplotlib inline
import seaborn as sns; sns.set_context('notebook')
from IPython.core.pylabtools import figsize
#@markdown This sets the resolution of the plot outputs (`retina` is the highest resolution)
notebook_screen_res = 'retina' #@param ['retina', 'png', 'jpeg', 'svg', 'pdf']
%config InlineBackend.figure_format = notebook_screen_res

import tensorflow as tf
tfe = tf.contrib.eager

# Eager Execution

#@markdown Check the box below if you want to use [Eager Execution](https://www.tensorflow.org/guide/eager)
#@markdown Eager execution provides an intuitive interface, easier debugging, and control flow comparable to NumPy. You can read more about it on the [Google AI Blog](https://ai.googleblog.com/2017/10/eager-execution-imperative-define-by.html)
use_tf_eager = False #@param {type:"boolean"}

# Use try/except so we can easily re-execute the whole notebook.
if use_tf_eager:
    try:
        tf.enable_eager_execution()
    except:
        pass

import tensorflow_probability as tfp
tfd = tfp.distributions
tfb = tfp.bijectors

  
def evaluate(tensors):
    """Evaluates Tensor or EagerTensor to Numpy `ndarray`s.
    Args:
      tensors: Object of `Tensor` or `EagerTensor`s; can be `list`, `tuple`,
        `namedtuple` or combinations thereof.

    Returns:
      ndarrays: Object with same structure as `tensors` except with `Tensor` or
        `EagerTensor`s replaced by Numpy `ndarray`s.
    """
    if tf.executing_eagerly():
        return tf.contrib.framework.nest.pack_sequence_as(
            tensors,
            [t.numpy() if tf.contrib.framework.is_tensor(t) else t
             for t in tf.contrib.framework.nest.flatten(tensors)])
    return sess.run(tensors)

class _TFColor(object):
    """Enum of colors used in TF docs."""
    red = '#F15854'
    blue = '#5DA5DA'
    orange = '#FAA43A'
    green = '#60BD68'
    pink = '#F17CB0'
    brown = '#B2912F'
    purple = '#B276B2'
    yellow = '#DECF3F'
    gray = '#4D4D4D'
    def __getitem__(self, i):
        return [
            self.red,
            self.orange,
            self.green,
            self.blue,
            self.pink,
            self.brown,
            self.purple,
            self.yellow,
            self.gray,
        ][i % 9]
TFColor = _TFColor()

def session_options(enable_gpu_ram_resizing=True, enable_xla=True):
    """
    Allowing the notebook to make use of GPUs if they're available.
    
    XLA (Accelerated Linear Algebra) is a domain-specific compiler for linear 
    algebra that optimizes TensorFlow computations.
    """
    config = tf.ConfigProto()
    config.log_device_placement = True
    if enable_gpu_ram_resizing:
        # `allow_growth=True` makes it possible to connect multiple colabs to your
        # GPU. Otherwise the colab malloc's all GPU ram.
        config.gpu_options.allow_growth = True
    if enable_xla:
        # Enable XLA. https://www.tensorflow.org/performance/xla/.
        config.graph_options.optimizer_options.global_jit_level = (
            tf.OptimizerOptions.ON_1)
    return config


def reset_sess(config=None):
    """
    Convenience function to create the TF graph & session or reset them.
    """
    if config is None:
        config = session_options()
    global sess
    tf.reset_default_graph()
    try:
        sess.close()
    except:
        pass
    sess = tf.InteractiveSession(config=config)



reset_sess()

class Progbar(object):
    def __init__(self, target, width=30, interval=0.01, verbose=1):
        """(Yet another) progress bar.
        Args:
          target: int.
            Total number of steps expected.
          width: int.
            Width of progress bar.
          interval: float.
            Minimum time (in seconds) for progress bar to be displayed
            during updates.
          verbose: int.
            Level of verbosity. 0 suppresses output; 1 is default.
        """
        self.target = target
        self.width = width
        self.interval = interval
        self.verbose = verbose

        self.stored_values = {}
        self.start = time.time()
        self.last_update = 0
        self.total_width = 0
        self.seen_so_far = 0

    def update(self, current, values=None, force=False):
        """Update progress bar, and print to standard output if `force`
        is True, or the last update was completed longer than `interval`
        amount of time ago, or `current` >= `target`.
        The written output is the progress bar and all unique values.
        Args:
          current: int.
            Index of current step.
          values: dict of str to float.
            Dict of name by value-for-last-step. The progress bar
            will display averages for these values.
          force: bool.
            Whether to force visual progress update.
        """
        if values is None:
            values = {}

        for k, v in six.iteritems(values):
            self.stored_values[k] = v

        self.seen_so_far = current

        now = time.time()
        if (not force and
                (now - self.last_update) < self.interval and
                current < self.target):
            return

        self.last_update = now
        if self.verbose == 0:
            return

        prev_total_width = self.total_width
        sys.stdout.write("\b" * prev_total_width)
        sys.stdout.write("\r")

        # Write progress bar to stdout.
        n_digits = len(str(self.target))
        bar = '%%%dd/%%%dd' % (n_digits, n_digits) % (current, self.target)
        bar += ' [{0}%]'.format(str(int(current / self.target * 100)).rjust(3))
        bar += ' '
        prog_width = int(self.width * float(current) / self.target)
        if prog_width > 0:
            try:
                bar += ('█' * prog_width)
            except UnicodeEncodeError:
                bar += ('*' * prog_width)

        bar += (' ' * (self.width - prog_width))
        sys.stdout.write(bar)

        # Write values to stdout.
        if current:
            time_per_unit = (now - self.start) / current
        else:
            time_per_unit = 0

        eta = time_per_unit * (self.target - current)
        info = ''
        if current < self.target:
            info += ' ETA: %ds' % eta
        else:
            info += ' Elapsed: %ds' % (now - self.start)

        for k, v in six.iteritems(self.stored_values):
            info += ' | {0:s}: {1:0.3f}'.format(k, v)

        self.total_width = len(bar) + len(info)
        if prev_total_width > self.total_width:
            info += ((prev_total_width - self.total_width) * " ")

        sys.stdout.write(info)
        sys.stdout.flush()

        if current >= self.target:
            sys.stdout.write("\n")


## Introduction

A sigmoid belief network [[1]](#scrollTo=2rGFv5Y2RTap) is a type of belief network: a class of neural network composed of an acyclic graph of stochastic binary variables. Each variable has a state of 1 or 0, and its probability of turning on is determined by the weighted input from other units plus a bias. Some of these variables are latent ("hidden"), and some are visible to us.

If we're using a belief network, we'd like to solve two specific problems:
- **Inference problem:** Infer the states of the unobserved variables.
- **Learning problem:** Adjust the interactions between variables to make the network more likely to generate the observed data.

<img src="https://github.com/matthew-mcateer/external_project_images/blob/master/Belief_net.PNG?raw=true" width=400>

## The general rules behind a Sigmoid Belief Network

There are some pros and cons to the Belief network:

**Pros:** It is easy to generate an unbiased example at the leaf nodes, so we can see what kinds of data the network believes in.
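
To make the "easy to generate" point concrete, here is a minimal NumPy sketch of ancestral sampling: sample the top layer, then sample each lower layer given the one above it. The layer sizes, random weights, zero biases, and the function name `ancestral_sample` are all assumptions chosen for illustration; they are not part of the model trained later in this notebook.

```python
import numpy as np

def sigmoid(a):
    """Logistic function mapping a real-valued input to a probability."""
    return 1.0 / (1.0 + np.exp(-a))

def ancestral_sample(weights, biases, top_size, rng):
    """Draw one sample by sampling each layer given the layer above it.

    weights[k] has shape (size of layer above, size of layer below) and
    biases[k] has shape (size of layer below,). Both are hypothetical
    parameters used only for this illustration.
    """
    # Top-layer units have no parents; here each turns on with probability 0.5.
    state = (rng.random_sample(top_size) < 0.5).astype(np.float64)
    layers = [state]
    for w, b in zip(weights, biases):
        p_on = sigmoid(state @ w + b)  # p(s_i = 1 | parents)
        state = (rng.random_sample(p_on.shape) < p_on).astype(np.float64)
        layers.append(state)
    return layers

rng = np.random.RandomState(0)
sizes = [4, 6, 9]  # top hidden layer, middle hidden layer, visible leaves
weights = [rng.normal(size=(m, n)) for m, n in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(n) for n in sizes[1:]]
layers = ancestral_sample(weights, biases, top_size=sizes[0], rng=rng)
visible = layers[-1]  # binary vector of length 9: one sample at the leaves
```

Each unit is sampled only after its parents, so a single top-down pass yields an exact sample from the network's joint distribution.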

**Cons:** It is hard to infer the posterior distribution over all possible configurations of hidden causes. It is hard even to get a sample from the posterior.

Looking at those cons, one might wonder whether it's possible to scale this belief net architecture up to millions of parameters, as with many other neural network architectures.

<img src="https://github.com/matthew-mcateer/external_project_images/blob/master/Belief_Net_learning.PNG?raw=true" width=400>

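With this in mind, our strategy for updating the weights is the following, where $s_j$ is the binary state of parent unit $j$, $w_{ji}$ is the weight on the connection from unit $j$ to unit $i$, and $\epsilon$ is the learning rate:
$$
\begin{aligned}
p_i \equiv p(s_i = 1) &= \frac{1}{1+\exp\left(-\sum_j s_j w_{ji}\right)} \\
\Delta w_{ji} &= \epsilon \, s_j (s_i - p_i)
\end{aligned}
$$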
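
As a rough illustration of the update rule above, the following NumPy sketch applies one step of it to the weights feeding a single layer. The function name `delta_rule_update`, the array shapes, and the toy values are assumptions made for this example; the notebook itself trains with Adam rather than with this hand-written rule.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def delta_rule_update(w, s_parents, s_child, epsilon):
    """Return w + delta_w for the weights into one layer of units.

    w:         array of shape (n_parents, n_children), with w[j, i] = w_ji
    s_parents: binary states s_j, shape (n_parents,)
    s_child:   binary states s_i, shape (n_children,)
    epsilon:   learning rate
    """
    p = sigmoid(s_parents @ w)  # p_i = p(s_i = 1)
    delta_w = epsilon * np.outer(s_parents, s_child - p)  # eps * s_j * (s_i - p_i)
    return w + delta_w

w = np.zeros((3, 2))
s_parents = np.array([1.0, 0.0, 1.0])
s_child = np.array([1.0, 0.0])
w_new = delta_rule_update(w, s_parents, s_child, epsilon=0.1)
# With zero weights every p_i is 0.5, so rows for active parents become
# [0.05, -0.05] and the row for the inactive parent stays at zero.
```

Only weights leaving an active parent ($s_j = 1$) change, and they move so that the child's predicted probability $p_i$ shifts toward its sampled state $s_i$.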


## Our Example

For our example, we're going to train our sigmoid belief network on the [Caltech 101 Silhouettes](http://www.cs.toronto.edu/~kswersky/data/) data set. First, let's set the hyperparameters and define a small batching helper.

In [0]:
#@title Hyperparameters
data_dir = "/tmp/data"
out_dir = "/tmp/out"
#@markdown Batch size during training
batch_size = 24                   #@param
#@markdown Hidden size per layer from bottom-up
hidden_sizes = [300, 100, 50, 10] #@param
#@markdown Number of samples for training
n_train_samples = 10              #@param
#@markdown Number of samples to calculate test log-lik
n_test_samples = 1000             #@param
#@markdown Learning rate step size
step_size = 1e-3                  #@param
n_epoch = 100                     #@param
n_iter_per_epoch = 10000          #@param

if not os.path.exists(out_dir):
    os.makedirs(out_dir)

In [0]:
def generator(array, batch_size):
    """
    Generate batch with respect to array's first axis.
    """
    start = 0  # pointer to where we are in iteration
    while True:
        stop = start + batch_size
        diff = stop - array.shape[0]
        if diff <= 0:
            batch = array[start:stop]
            start += batch_size
        else:
            batch = np.concatenate((array[start:], array[:diff]))
            start = diff
        yield batch
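
To see how the wrap-around works, here is a small check on a toy array. The function is repeated so that the snippet runs by itself; the five-element array and batch size of 2 are assumptions chosen only to make the wrap visible.

```python
import numpy as np

def generator(array, batch_size):
    """Generate batches along the array's first axis, wrapping at the end."""
    start = 0  # pointer to where we are in iteration
    while True:
        stop = start + batch_size
        diff = stop - array.shape[0]
        if diff <= 0:
            batch = array[start:stop]
            start += batch_size
        else:
            # Not enough rows left: take the tail, then wrap to the front.
            batch = np.concatenate((array[start:], array[:diff]))
            start = diff
        yield batch

data = np.arange(5)
batches = generator(data, batch_size=2)
first_four = [next(batches).tolist() for _ in range(4)]
# first_four is [[0, 1], [2, 3], [4, 0], [1, 2]]
```

Every batch has exactly `batch_size` rows, which is what lets the notebook store a batch in a fixed-shape variable.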

### Data

The [Caltech 101 silhouettes](http://www.cs.toronto.edu/~kswersky/data/) dataset is a dataset of silhouettes for training classifiers. It is a series of binarized (black-and-white) images that serve as the silhouettes of objects to be classified. It was originally created from the [Caltech 101](http://www.vision.caltech.edu/Image_Datasets/Caltech101/) dataset (originally used for object recognition), for research on inductive principles for restricted Boltzmann machine learning.

<img src="https://people.cs.umass.edu/~marlin/data/caltech101s.png" align="middle">

*A sample of images from the Caltech 101 silhouettes*

In [0]:
# Set seed. Remove this line to get different random results on each run!
tf.set_random_seed(77)

(x_train, _), (x_test, _), (x_valid, _) = caltech101_silhouettes(data_dir)
x_train_generator = generator(x_train, batch_size)

with tf.variable_scope(tf.get_variable_scope(), reuse=tf.AUTO_REUSE):
    x_ph = tf.get_variable(name='x_ph', 
                           dtype=tf.int32,
                           initializer=tf.to_int32(next(x_train_generator)),
                           trainable=False
                          )

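# When training, assigns the value of x_ph to the next batch. The batch is
# pulled through tf.py_func so that a fresh batch is drawn on every run of the
# op; calling next() directly here would bake a single batch into the graph.
next_batch = tf.py_func(
    lambda: next(x_train_generator).astype(np.int32), [], tf.int32)
next_batch.set_shape(x_ph.shape)
assign_train_op = tf.assign(x_ph, value=next_batch)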



### Model

We set up the directed graph below, using the number and sizes of hidden layers set in the hyperparameters above.

In [0]:
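# Generative model p: build the layers top-down, from the deepest hidden layer
# (last entry of hidden_sizes) to the visible pixels.
zs = [0] * len(hidden_sizes)
for l in reversed(range(len(hidden_sizes))):
    if l == len(hidden_sizes) - 1:
        # The top layer has no parents, so its logits are fixed at zero
        # (each unit turns on with probability 0.5).
        logits = tf.zeros([tf.shape(x_ph)[0], hidden_sizes[l]])
    else:
        logits = tf.layers.dense(tf.cast(zs[l + 1].sample(), tf.float32),
                                 hidden_sizes[l],
                                 activation=None)
    zs[l] = tfd.Bernoulli(logits=logits)

# The visible layer: one Bernoulli per pixel of a 28x28 image.
x = tfd.Bernoulli(logits=tf.layers.dense(tf.cast(zs[0].sample(), tf.float32),
                                         28 * 28, activation=None))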

We define the variational model with the reverse ordering of the probability model.
For example, if the layers of $p$ are 15-100-300 from the top down, then the layers of $q$ are 300-100-15 from the bottom up.
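
The mirrored ordering can be spelled out with the notebook's default `hidden_sizes`. This is only a sketch of the layer sizes each model visits; the names `p_layer_order` and `q_layer_order` are hypothetical and are not used elsewhere in the notebook.

```python
hidden_sizes = [300, 100, 50, 10]  # bottom-up, as in the hyperparameters cell
n_visible = 28 * 28                # one unit per pixel

# Generative model p: start at the top hidden layer and end at the pixels.
p_layer_order = list(reversed(hidden_sizes)) + [n_visible]
# Variational model q: start at the pixels and climb to the top hidden layer.
q_layer_order = [n_visible] + hidden_sizes

# p_layer_order is [10, 50, 100, 300, 784]
# q_layer_order is [784, 300, 100, 50, 10]
```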

In [0]:
qzs = [0] * len(hidden_sizes)
for l in range(len(hidden_sizes)):
    if l == 0:
        logits = tf.layers.dense(tf.cast(x_ph, tf.float32),
                                 hidden_sizes[l], activation=None)
    else:
        logits = tf.layers.dense(tf.cast(qzs[l - 1].sample(), tf.float32),
                                 hidden_sizes[l], activation=None)
    qzs[l] = tfd.Bernoulli(logits=logits)

### Inference

We train the model by maximizing an objective built from log-probabilities: the log-likelihood of the observed batch under the generative model, plus the log-probability of samples from each variational layer under the corresponding layer of the generative model. The Adam optimizer minimizes the negative of this objective. Looking back at our diagram for the sigmoid belief network architecture, the inference architecture is just the inverse of that.
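
For intuition about the log-probability terms, the sketch below computes the log-probability of binary values under a Bernoulli distribution parameterized by logits, using plain NumPy. It uses the standard closed form and is an illustration only; it is not taken from the TFP source, and the name `bernoulli_log_prob` is hypothetical.

```python
import numpy as np

def bernoulli_log_prob(x, logits):
    """Elementwise log p(x) for x in {0, 1} with p(x = 1) = sigmoid(logits)."""
    # log sigmoid(l) = l - softplus(l) and log(1 - sigmoid(l)) = -softplus(l),
    # so both cases collapse to x * l - softplus(l).
    # np.logaddexp(0, l) is a numerically stable softplus.
    return x * logits - np.logaddexp(0.0, logits)

x = np.array([1.0, 0.0, 1.0])
logits = np.array([0.0, 0.0, 2.0])
log_probs = bernoulli_log_prob(x, logits)
total = log_probs.sum()  # summed log-probability over the three units
```

With a logit of zero, both outcomes have probability 0.5, so the first two entries equal `log(0.5)`.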

In [0]:
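# Log-likelihood of the observed batch under the generative model.
loss = tf.reduce_sum(x.log_prob(x_ph))
# Add the log-probability of variational samples under each generative layer.
for z, qz in zip(zs, qzs):
    loss += tf.reduce_mean(z.log_prob(qz.sample(sample_shape=n_train_samples)))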

optimizer = tf.train.AdamOptimizer(learning_rate=step_size)
train_op = optimizer.minimize(-loss)

init_op = tf.global_variables_initializer()

We can then train the model below. With the default settings, this takes ~273s per epoch. By epoch 100, we can get these results:
- Training negative log-likelihood: `209.443`

In [0]:
evaluate(init_op)
for epoch in range(n_epoch):
    print("Epoch {}".format(epoch))
    train_loss = 0.0
     
    # Our Progress bar
    pbar = Progbar(n_iter_per_epoch)
    for t in range(1, n_iter_per_epoch + 1):
        pbar.update(t)
        evaluate(assign_train_op)
        [_, loss_] = evaluate([train_op, loss])
        train_loss += loss_

    # Print per-data point loss, averaged over training epoch.
    train_loss /= n_iter_per_epoch
    train_loss /= batch_size
    print("Training negative log-likelihood: {:0.3f}".format(-train_loss))

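    # Prior predictive check: sample images from the generative model and save
    # them, scaling the binary {0, 1} pixels to {0, 255} so they are visible.
    images = evaluate(x.sample())
    for m in range(batch_size):
        imsave("{}/{}.png".format(out_dir, m),
               (images[m] * 255).astype('uint8').reshape(28, 28))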


Epoch 0
10000/10000 [100%] ██████████████████████████████ Elapsed: 273s
Training negative log-likelihood: 344.790
Epoch 1
10000/10000 [100%] ██████████████████████████████ Elapsed: 271s
Training negative log-likelihood: 342.218
Epoch 2
10000/10000 [100%] ██████████████████████████████ Elapsed: 271s
Training negative log-likelihood: 341.156
Epoch 3
10000/10000 [100%] ██████████████████████████████ Elapsed: 271s
Training negative log-likelihood: 339.965
Epoch 4
10000/10000 [100%] ██████████████████████████████ Elapsed: 272s
Training negative log-likelihood: 337.577
Epoch 5
10000/10000 [100%] ██████████████████████████████ Elapsed: 272s
Training negative log-likelihood: 336.907
Epoch 6
10000/10000 [100%] ██████████████████████████████ Elapsed: 271s
Training negative log-likelihood: 335.074
Epoch 7
10000/10000 [100%] ██████████████████████████████ Elapsed: 271s
Training negative log-likelihood: 334.069
Epoch 8
10000/10000 [100%] ██████████████████████████████ Elapsed: 273s
Training negativ

### Conclusion

We've seen how a sigmoid belief network operates. Sigmoid belief networks are closely related to a different architecture: the Deep Belief Network (DBN).

A DBN is a generative graphical model, or alternatively a class of deep neural network, composed of multiple layers of latent variables ("hidden units"), with connections between the layers but not between units within each layer [[2]](#scrollTo=2rGFv5Y2RTap). Unlike a typical sigmoid belief network, whose connections are all directed, a DBN can have undirected (bidirectional) connections between some of its layers, typically the top two.

<img src="https://upload.wikimedia.org/wikipedia/commons/thumb/f/fa/Deep_belief_net.svg/440px-Deep_belief_net.svg.png" width=200>

When trained on a set of examples without supervision, a sigmoid belief network can learn to probabilistically reconstruct its inputs. The layers then act as feature detectors [[2]](#scrollTo=2rGFv5Y2RTap). After this learning step, a sigmoid belief network can be further trained with supervision to perform classification [[3]](#scrollTo=2rGFv5Y2RTap).


## References

[1] [Sigmoid belief network (Neal, 1990)](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.63.1777&rep=rep1&type=pdf)

[2] Hinton, Geoffrey E. ["Deep belief networks."](http://www.scholarpedia.org/article/Deep_belief_networks) Scholarpedia 4.5 (2009): 5947.

[3] Hinton, Geoffrey E., Simon Osindero, and Yee-Whye Teh. ["A fast learning algorithm for deep belief nets."](http://www.cs.toronto.edu/~hinton/absps/fastnc.pdf) Neural computation 18.7 (2006): 1527-1554.