# **DSC Europe 2020 - Getting started with TensorFlow Probability**

Before diving into the core example of this tutorial session, let's take a brief tour of the library we'll be using to build our model, [TensorFlow Probability](https://www.tensorflow.org/probability) (TFP).

TFP is built on top of TensorFlow's numerical framework and is designed to provide the building blocks you need to create many different kinds of statistical models. 

At its core is an API for defining and transforming *random variables*. To define new random variables, one can use any of the distributions defined in [tfp.distributions](https://www.tensorflow.org/probability/api_docs/python/tfp/distributions). 

Let's take a look at a couple of examples.


## Initial Setup

First, we need to import TensorFlow, TFP, and the libraries we'll use for plotting.

In [None]:
# Imports
import tensorflow as tf
import tensorflow_probability as tfp

# Note: using plotly for plotting
import plotly.express as plx
import plotly.figure_factory as ff

Before we create our examples, we need to configure the seed for our random number generators. This helps to ensure that our random samples will be reproducible.


In [None]:
tf.random.set_seed(42)

## Creating and Sampling Random Variables

### Example 1

As a simple example, let's create a Gaussian (normal) distribution with mean 0 and standard deviation 1. We can then use its `sample()` method to draw random samples from it.


In [None]:
# Create a Gaussian random variable using tfp.distributions
ex1 = tfp.distributions.Normal(loc=0., scale=1.)  # mean = 0, std_dev = 1

# Distributions support drawing samples using the .sample() method
ex1_samples = ex1.sample(2000)

# For plotting it can be helpful to convert the output from a 
# tf.Tensor to a numpy array
ex1_samples = ex1_samples.numpy()

# Plot
fig1 = ff.create_distplot(
    [ex1_samples], 
    ["Ex. 1 - Normal Dist."], 
    bin_size=[0.1], 
    show_rug=False
)
fig1.update_layout({
    "xaxis": {
        "title": {
            "text": "x"}
    },
    "yaxis": {
        "title": {
            "text": "P(x)"}
    },
})
fig1.show()

### Example 2
We can also sample from other distributions. For example, we can describe a weighted coin toss like the one discussed in the slides using a Bernoulli distribution:

In [None]:
# Our coin is weighted so that it lands on heads more often than tails
p_heads = 0.6
ex2 = tfp.distributions.Bernoulli(probs=p_heads) 

# Draw samples
ex2_samples = ex2.sample(5000).numpy()

# Measure the fraction of heads/tails outcomes in our samples
# Fraction of tails (outcome 0)
sample_p0 = (ex2_samples == 0).mean()
# Fraction of heads (outcome 1)
sample_p1 = 1 - sample_p0

# We can use a simple bar chart for visualizing our categorical outcomes
fig2 = plx.bar(x=["tails (0)", "heads (1)"], y=[sample_p0, sample_p1])
fig2.update_layout({
    "xaxis": {
        "title": {
            "text": "x"}
    },
    "yaxis": {
        "title": {
            "text": "P(x)"}
    },
})
fig2.show()
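
How close should the sampled fraction be to `p_heads`? The following is a minimal sketch that uses only Python's standard library rather than TFP; the helper name `simulate_flips` and the variable names are made up for this illustration. For n independent flips, the sample fraction of heads has a standard error of sqrt(p(1 - p) / n), so with 5000 flips and p = 0.6 we would expect to land within roughly 0.02 of 0.6 (about three standard errors).

```python
import math
import random


def simulate_flips(p_heads, n_flips, seed=42):
    """Return a list of 0/1 outcomes from a weighted coin (1 = heads)."""
    rng = random.Random(seed)  # local generator, so the global state is untouched
    return [1 if rng.random() < p_heads else 0 for _ in range(n_flips)]


true_p = 0.6
n_flips = 5000

outcomes = simulate_flips(true_p, n_flips)
frac_heads = sum(outcomes) / n_flips

# Standard error of the sample fraction for n independent Bernoulli draws
std_err = math.sqrt(true_p * (1 - true_p) / n_flips)

print(f"sample fraction of heads: {frac_heads:.4f}")
print(f"standard error:           {std_err:.4f}")
print(f"deviation in std. errors: {(frac_heads - true_p) / std_err:+.2f}")
```

Increasing `n_flips` by a factor of 100 shrinks the standard error by a factor of 10, which is why the bar chart settles ever closer to the true probabilities as the sample size grows.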


## Exercise 1

Switch the notebook to "Playground Mode" (`File / Open in playground mode`) to enable editing. Then, edit the parameters of the normal distribution we defined in Example 1 and re-run the cell. 

You can also try swapping the normal distribution for something else, for example `ex1 = tfp.distributions.LogNormal(0, 1)`.


## Features

The distributions defined in `tfp.distributions` implement a variety of methods that provide probability density functions (PDFs), cumulative distribution functions (CDFs), measures of difference between distributions ([relative entropy / KL divergence](https://en.wikipedia.org/wiki/Kullback%E2%80%93Leibler_divergence)), and more. Here are a few examples using the distributions we defined above:


In [None]:
print("----------------------")
print("10th and 90th percentiles for the normal distribution in example 1")
print(ex1.quantile([0.1, 0.9]))
print("----------------------")

print("CDF of example 1 evaluated at 0")
print(ex1.cdf(0))
print("----------------------")

print("KL divergence between two gaussians")
gauss1 = tfp.distributions.Normal(1, 1)
gauss2 = tfp.distributions.Normal(0, 1)
print(gauss1.kl_divergence(gauss2))
print("----------------------")


----------------------
10th and 90th percentiles for the normal distribution in example 1
tf.Tensor([-1.2815516  1.2815512], shape=(2,), dtype=float32)
----------------------
CDF of example 1 evaluated at 0
tf.Tensor(0.5, shape=(), dtype=float32)
----------------------
KL divergence between two gaussians
tf.Tensor(0.5, shape=(), dtype=float32)
----------------------
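
These values can be cross-checked without TFP. The sketch below uses `statistics.NormalDist` from Python's standard library for the quantiles and the CDF, plus the closed-form KL divergence between two univariate Gaussians, KL(N(mu1, sigma1^2) || N(mu2, sigma2^2)) = log(sigma2 / sigma1) + (sigma1^2 + (mu1 - mu2)^2) / (2 sigma2^2) - 1/2. The helper name `kl_normal` is made up for this illustration.

```python
import math
from statistics import NormalDist

std_normal = NormalDist(mu=0.0, sigma=1.0)

# 10th and 90th percentiles via the inverse CDF
q10, q90 = std_normal.inv_cdf(0.1), std_normal.inv_cdf(0.9)

# CDF at 0: half of the probability mass lies below the mean
cdf_at_zero = std_normal.cdf(0.0)


def kl_normal(mu1, sigma1, mu2, sigma2):
    """KL(N(mu1, sigma1^2) || N(mu2, sigma2^2)) in closed form."""
    return (
        math.log(sigma2 / sigma1)
        + (sigma1**2 + (mu1 - mu2) ** 2) / (2 * sigma2**2)
        - 0.5
    )


# Same pair as gauss1 = Normal(1, 1) and gauss2 = Normal(0, 1)
kl = kl_normal(1.0, 1.0, 0.0, 1.0)

print(f"quantiles: {q10:.4f}, {q90:.4f}")
print(f"cdf(0):    {cdf_at_zero}")
print(f"KL:        {kl}")
```

With equal standard deviations the logarithm vanishes and the divergence reduces to (mu1 - mu2)^2 / 2, which is where the 0.5 comes from.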


TFP also includes methods for transforming random variables in [tfp.bijectors](https://www.tensorflow.org/probability/api_docs/python/tfp/bijectors). We won't go into detail on this module yet, but it is good to be aware of its existence for when you inevitably need to include a transformation in your model.
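
To give a feel for what such a transformation involves, here is a minimal standard-library sketch of the change-of-variables rule for Y = exp(X) with X ~ Normal(0, 1). It deliberately does not use `tfp.bijectors`; the function names `forward`, `inverse`, and `transformed_pdf` are invented for this illustration. The density of the transformed variable is p_Y(y) = p_X(log y) / y, which is exactly the log-normal density.

```python
import math
from statistics import NormalDist

base = NormalDist(mu=0.0, sigma=1.0)


def forward(x):
    """Map a base-distribution value x to the transformed space."""
    return math.exp(x)


def inverse(y):
    """Map a transformed value y back to the base space."""
    return math.log(y)


def transformed_pdf(y):
    """Density of Y = exp(X): p_X(inverse(y)) * |d inverse / dy| = p_X(log y) / y."""
    return base.pdf(inverse(y)) / y


def lognormal_pdf(y, mu=0.0, sigma=1.0):
    """Textbook log-normal density, used here only as a cross-check."""
    z = (math.log(y) - mu) / sigma
    return math.exp(-0.5 * z * z) / (y * sigma * math.sqrt(2 * math.pi))


for y in (0.5, 1.0, 2.0):
    print(f"y={y}: transformed={transformed_pdf(y):.6f}, log-normal={lognormal_pdf(y):.6f}")

# Sampling works the same way: draw from the base distribution, then push forward.
samples = [forward(x) for x in base.samples(5, seed=0)]
```

A bijector packages these three ingredients (forward map, inverse map, and the Jacobian term) so that sampling and density evaluation stay consistent. In TFP the corresponding pieces would be `tfp.bijectors.Exp` combined with `tfp.distributions.TransformedDistribution`.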


## Modeling with TFP

### Teaser: Bayesian Inference

The random variable API already gives us a nice way to start creating models. For example, we can fairly easily re-create the coin-flip example from [Seeing Theory](https://seeing-theory.brown.edu/bayesian-inference/index.html#section3) that we looked at earlier.

Don't worry about the implementation details just yet. First, let's run the cell and take a look at how our belief about the coin's bias evolves as we continue to flip the coin.


In [None]:
import numpy as np
import pandas as pd

##############################################################################
# Create our "coin". theta is the probability of heads: theta=0 means it only
# returns tails, theta=0.5 is a balanced coin, and theta=1 only returns heads
theta = 0.2

# The number of coin flips to simulate
n_flips = 100

# Start with a prior belief that the coin's bias follows a Beta distribution:
# theta ~ Beta(init_alpha, init_beta)
# You can tweak the Beta distribution's parameters just as you would in the
# Seeing Theory example.
init_alpha = 2
init_beta = 2

###############################################################################

coin_dist = tfp.distributions.Bernoulli(probs=theta)
def flip_coin():
  # Draw a single outcome and convert it to a plain Python int (0 or 1)
  return int(coin_dist.sample(1).numpy()[0])
priors = [(init_alpha, init_beta)]
flips = []

for i in range(n_flips):
  # Flip the coin
  flip = flip_coin()
  # Compute the posterior distribution
  # Since Beta distributions form a conjugate prior for Bernoulli
  # distributions, our posterior is also just a Beta distribution. This allows us 
  # to just compute its alpha and beta parameters analytically.
  alpha, beta = priors[-1]
  if flip == 1:
    alpha += 1
  else:
    beta += 1
  posterior_dist_params = (alpha, beta)
  
  # Update our latest prior to the current posterior distribution
  priors.append(posterior_dist_params)
  flips.append(flip)


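# Reformat results for plotting
# Evaluate each density on [0, 1], the support of the Beta distribution
eval_at = np.linspace(0, 1, num=200).astype(np.float32)
dfs = []
for i, (a, b) in enumerate(priors):
  # priors[i] is our belief after i flips; it is shown in animation frame i+1
  density = tfp.distributions.Beta(float(a), float(b)).prob(eval_at).numpy()
  dfs.append(pd.DataFrame({"prior_prob": density, "theta": eval_at, "flip": i+1}))
df = pd.concat(dfs, ignore_index=True)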

# Create line plot animation. Adjust range_y below as needed.
fig = plx.line(df, x="theta", y="prior_prob", animation_frame="flip", range_y=[0, 10], range_x=[0, 1])
fig.update_layout(
    title=f"Evolving prior for coin bias (actual value is {theta})",
    xaxis={"title": "\u03B8"},
    yaxis={"title": "P(\u03B8)"},
)

print(f"Coin flip outcomes (0=tails,1=heads): {flips}")
fig.show()

Coin flip outcomes (0=tails,1=heads): [1, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0]
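
Because each flip only increments one of the two Beta parameters, the loop above is equivalent to a single batch update: after observing h heads and t tails, the posterior is Beta(init_alpha + h, init_beta + t). The following plain-Python sketch checks this on the first ten outcomes printed above; the helper names are made up for this illustration.

```python
def sequential_update(alpha, beta, flips):
    """Apply the Beta-Bernoulli update one flip at a time."""
    for flip in flips:
        if flip == 1:
            alpha += 1
        else:
            beta += 1
    return alpha, beta


def batch_update(alpha, beta, flips):
    """Apply the same update in one step from the head/tail counts."""
    heads = sum(flips)
    tails = len(flips) - heads
    return alpha + heads, beta + tails


def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)


example_flips = [1, 0, 0, 1, 0, 1, 0, 0, 0, 0]  # 3 heads, 7 tails
posterior = batch_update(2, 2, example_flips)

# Both routes give the same posterior parameters
assert posterior == sequential_update(2, 2, example_flips)
print(posterior)               # (5, 9)
print(beta_mean(*posterior))   # 5 / 14, roughly 0.357
```

The order of the flips does not matter, only the counts do. The posterior mean of about 0.357 still sits well above the true bias of 0.2 because ten flips are not yet enough to outweigh the Beta(2, 2) prior.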


### Next Steps

While it's great that TFP provides some basic tools for working with probability distributions within TensorFlow, the examples we've looked at could easily have been implemented with another library such as NumPy. The real power of TFP is that it enables us to build models which harness the efficient numerical and automatic differentiation methods that TensorFlow offers.

In the coin-toss example above, we were able to compute our posterior distribution analytically. Even if this hadn't been possible, we could have computed it numerically, for example by evaluating prior times likelihood on a grid of theta values, since our model has only one parameter. In real-world situations you will often be working with high-dimensional models for which the posterior distribution can no longer be calculated directly. As we will see in the next part of the tutorial, this is where the modeling libraries of TFP come into play.
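
To make the one-parameter case concrete, here is a minimal plain-Python sketch of a brute-force grid approximation to the coin's posterior. It assumes a Beta(2, 2) prior and hypothetical counts of 3 heads and 7 tails; the grid size and the helper name `grid_posterior` are choices made for this illustration.

```python
def grid_posterior(alpha, beta, heads, tails, n_grid=1001):
    """Approximate the posterior over theta on an evenly spaced grid in [0, 1]."""
    thetas = [i / (n_grid - 1) for i in range(n_grid)]
    # Unnormalized prior density times likelihood at each grid point
    weights = [
        t ** (alpha - 1) * (1 - t) ** (beta - 1) * t**heads * (1 - t) ** tails
        for t in thetas
    ]
    total = sum(weights)
    probs = [w / total for w in weights]  # normalize so the grid masses sum to 1
    return thetas, probs


thetas, probs = grid_posterior(alpha=2, beta=2, heads=3, tails=7)
grid_mean = sum(t * p for t, p in zip(thetas, probs))
exact_mean = (2 + 3) / (2 + 2 + 3 + 7)  # mean of the analytic Beta(5, 9) posterior

print(f"grid mean:  {grid_mean:.4f}")
print(f"exact mean: {exact_mean:.4f}")
```

A grid of 1001 points is cheap for a single parameter, but the same resolution in d dimensions needs 1001^d points, which quickly becomes infeasible. That scaling is the reason sampling-based methods take over for larger models.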

TFP comes with several built-in modeling tools, including support for generalized linear models (`tfp.glm`), variational inference (`tfp.vi`), and MCMC (`tfp.mcmc`). It also provides Keras layers for adding random variables to a Keras neural network.

We'll be using `tfp.mcmc` in the next steps of this tutorial, but first we will review the fundamentals of MCMC...