##### Copyright 2018 The TensorFlow Authors.

Licensed under the Apache License, Version 2.0 (the "License");

In [0]:
#@title Licensed under the Apache License, Version 2.0 (the "License"); { display-mode: "form" }
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Cox Process with TFP

<table class="tfo-notebook-buttons" align="left">
  <td>
    <a target="_blank" href="https://colab.research.google.com/github/tensorflow/probability/blob/master/tensorflow_probability/examples/jupyter_notebooks/cox_process_with_tfp.ipynb"><img height="32px" src="https://colab.research.google.com/img/colab_favicon.ico" />Run in Google Colab</a>
  </td>
  <td>
    <a target="_blank" href="https://github.com/tensorflow/probability/blob/master/tensorflow_probability/examples/jupyter_notebooks/cox_process_with_tfp.ipynb"><img src="https://www.tensorflow.org/images/GitHub-Mark-32px.png" />View source on GitHub</a>
  </td>
</table>
<br>
<br>
<br>

Original content from [this repository](https://github.com/blei-lab/edward/blob/master/examples/cox_process.py), created by [the Blei Lab](http://www.cs.columbia.edu/~blei/).

Ported to [TensorFlow Probability](https://www.tensorflow.org/probability/) by Matthew McAteer ([`@MatthewMcAteer0`](https://twitter.com/MatthewMcAteer0)), with help from Bryan Seybold, Mike Shwe ([`@mikeshwe`](https://twitter.com/mikeshwe)), Josh Dillon, and the rest of the TFP team at Google ([`tfprobability@tensorflow.org`](mailto:tfprobability@tensorflow.org)).

---

- [Dependencies & Prerequisites](#scrollTo=J21wYXBIbZq3)
  - [Data](#scrollTo=vGg52VvSRlWm)
  - [Model](#scrollTo=J808jVnDRlWo)
  - [Inference](#scrollTo=q8SIkqvhRlWt)
- [References](#scrollTo=Jq1b4fk6RlWx)

## Dependencies & Prerequisites

In [0]:
#@title Imports and Global Variables  { display-mode: "form" }
# `from __future__` imports must come before any other statement in the cell.
from __future__ import absolute_import, division, print_function
!pip3 install -q observations

#@markdown This sets the warning status (default is `ignore`, since this notebook runs correctly)
warning_status = "ignore" #@param ["ignore", "always", "module", "once", "default", "error"]
import warnings
# Apply the filters globally. Filters set inside a `warnings.catch_warnings()`
# block are reverted when the block exits, so they would have no lasting effect.
warnings.filterwarnings(warning_status)
warnings.filterwarnings(warning_status, category=DeprecationWarning)
warnings.filterwarnings(warning_status, category=UserWarning)

import numpy as np
import pandas as pd
import string
from datetime import datetime
import os
#@markdown This sets the styles of the plotting (default is styled like plots from [FiveThirtyEight.com](https://fivethirtyeight.com/))
matplotlib_style = 'fivethirtyeight' #@param ['fivethirtyeight', 'bmh', 'ggplot', 'seaborn', 'default', 'Solarize_Light2', 'classic', 'dark_background', 'seaborn-colorblind', 'seaborn-notebook']
import matplotlib.pyplot as plt; plt.style.use(matplotlib_style)
import matplotlib.axes as axes;
from matplotlib.patches import Ellipse
%matplotlib inline
import seaborn as sns; sns.set_context('notebook')
from IPython.core.pylabtools import figsize
#@markdown This sets the resolution of the plot outputs (`retina` is the highest resolution)
notebook_screen_res = 'retina' #@param ['retina', 'png', 'jpeg', 'svg', 'pdf']
%config InlineBackend.figure_format = notebook_screen_res

import tensorflow as tf
tfe = tf.contrib.eager

# Eager Execution
#@markdown Check the box below if you want to use [Eager Execution](https://www.tensorflow.org/guide/eager)
#@markdown Eager execution provides an intuitive interface, easier debugging, and control flow comparable to NumPy. You can read more about it on the [Google AI Blog](https://ai.googleblog.com/2017/10/eager-execution-imperative-define-by.html)
use_tf_eager = False #@param {type:"boolean"}

# Use try/except so we can easily re-execute the whole notebook.
if use_tf_eager:
  try:
    tf.enable_eager_execution()
  except:
    pass

import tensorflow_probability as tfp
tfd = tfp.distributions
tfb = tfp.bijectors
from tensorflow_probability import edward2 as ed
from tensorflow.python.ops import control_flow_ops

  
def evaluate(tensors):
  """Evaluates `Tensor`s or `EagerTensor`s to NumPy `ndarray`s.

  Args:
    tensors: Object of `Tensor`s or `EagerTensor`s; can be `list`, `tuple`,
      `namedtuple` or combinations thereof.

  Returns:
    ndarrays: Object with same structure as `tensors` except with `Tensor` or
      `EagerTensor`s replaced by NumPy `ndarray`s.
  """
  if tf.executing_eagerly():
    return tf.contrib.framework.nest.pack_sequence_as(
        tensors,
        [t.numpy() if tf.contrib.framework.is_tensor(t) else t
         for t in tf.contrib.framework.nest.flatten(tensors)])
  return sess.run(tensors)

class _TFColor(object):
    """Enum of colors used in TF docs."""
    red = '#F15854'
    blue = '#5DA5DA'
    orange = '#FAA43A'
    green = '#60BD68'
    pink = '#F17CB0'
    brown = '#B2912F'
    purple = '#B276B2'
    yellow = '#DECF3F'
    gray = '#4D4D4D'
    def __getitem__(self, i):
        return [
            self.red,
            self.orange,
            self.green,
            self.blue,
            self.pink,
            self.brown,
            self.purple,
            self.yellow,
            self.gray,
        ][i % 9]
TFColor = _TFColor()

def session_options(enable_gpu_ram_resizing=True, enable_xla=True):
    """
    Creates a `tf.ConfigProto` that lets the notebook use GPUs if they are available.
    
    XLA (Accelerated Linear Algebra) is a domain-specific compiler for linear 
    algebra that optimizes TensorFlow computations.
    """
    config = tf.ConfigProto()
    config.log_device_placement = True
    if enable_gpu_ram_resizing:
        # `allow_growth=True` makes it possible to connect multiple colabs to your
        # GPU. Otherwise the colab malloc's all GPU ram.
        config.gpu_options.allow_growth = True
    if enable_xla:
        # Turn on XLA JIT compilation. https://www.tensorflow.org/performance/xla/.
        config.graph_options.optimizer_options.global_jit_level = (
            tf.OptimizerOptions.ON_1)
    return config


def reset_sess(config=None):
    """
    Convenience function to create the TF graph & session or reset them.
    """
    if config is None:
        config = session_options()
    global sess
    tf.reset_default_graph()
    try:
        sess.close()
    except:
        pass
    sess = tf.InteractiveSession(config=config)

class MVNCholPrecisionTriL(tfd.TransformedDistribution):
  """MVN from loc and (Cholesky) precision matrix."""

  def __init__(self, loc, chol_precision_tril, name=None):
    super(MVNCholPrecisionTriL, self).__init__(
        distribution=tfd.Independent(tfd.Normal(tf.zeros_like(loc),
                                                scale=tf.ones_like(loc)),
                                     reinterpreted_batch_ndims=1),
        bijector=tfb.Chain([
            tfb.Affine(shift=loc),
            tfb.Invert(tfb.Affine(scale_tril=chol_precision_tril,
                                  adjoint=True)),
        ]),
        name=name)

reset_sess()

# from edward.models import MultivariateNormalTriL, Normal, Poisson
from scipy.stats import multivariate_normal, poisson

## Introduction

### Cox vs. Poisson Point Processes

A Cox Process is a variation on the Poisson Point Process.

Poisson Point Processes are useful for modelling events that happen at random, such as banks going bust, buses arriving at bus stops, or calls coming into a call center. A Poisson point process is described by its intensity, the expected number of events per unit of time (or space). The intensity can be constant (a homogeneous process) or vary over time or space (an inhomogeneous process). With a Poisson point process you can run stochastic simulations of both when these events happen and where they happen.
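To make the homogeneous/inhomogeneous distinction concrete, here is a minimal NumPy sketch (not part of the original notebook) that simulates event times over a time window. The homogeneous case draws exponential waiting times between events; the inhomogeneous case uses thinning, where candidate events from a faster homogeneous process are kept with probability proportional to the intensity at that time. The function names, the rate of 3 events per hour, and the sinusoidal intensity are illustrative assumptions.

```python
import numpy as np


def simulate_homogeneous_poisson(rate, t_max, rng):
    """Event times on [0, t_max) for a constant intensity `rate`."""
    times = []
    t = rng.exponential(1.0 / rate)  # waiting times are Exponential(rate)
    while t < t_max:
        times.append(t)
        t += rng.exponential(1.0 / rate)
    return np.array(times, dtype=float)


def simulate_inhomogeneous_poisson(intensity, intensity_max, t_max, rng):
    """Event times on [0, t_max) for a time-varying intensity, via thinning.

    `intensity` must be vectorized and bounded above by `intensity_max`.
    """
    candidates = simulate_homogeneous_poisson(intensity_max, t_max, rng)
    keep_prob = intensity(candidates) / intensity_max
    return candidates[rng.uniform(size=candidates.size) < keep_prob]


def call_intensity(t):
    """Hypothetical time-varying intensity (e.g. call-center load), at most 5."""
    return 3.0 + 2.0 * np.sin(t)


rng = np.random.default_rng(1)
# Hypothetical example: buses arriving at 3 per hour on average, over 10 hours.
bus_times = simulate_homogeneous_poisson(rate=3.0, t_max=10.0, rng=rng)
call_times = simulate_inhomogeneous_poisson(call_intensity, 5.0, 10.0, rng)

# The expected number of events is the integral of the intensity over [0, 10):
# 30 in the homogeneous case, 30 + 2 * (1 - cos(10)) ≈ 33.7 in the other.
homog_counts = [simulate_homogeneous_poisson(3.0, 10.0, rng).size
                for _ in range(2000)]
inhomog_counts = [simulate_inhomogeneous_poisson(call_intensity, 5.0, 10.0, rng).size
                  for _ in range(2000)]
print(np.mean(homog_counts), np.mean(inhomog_counts))
```

In both cases the intensity is a fixed, known function; only the event times are random.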

However, we can go even further. What if the intensity itself were not fixed, but random? Suppose we wanted to model the time-dependent intensity itself as a stochastic process. This is what a Cox process does (named after the statistician [David Cox](https://en.wikipedia.org/wiki/David_Cox_(statistician)), who first published the model in 1955). Because there are two layers of randomness, one in the intensity and one in the events given that intensity, a Cox process is sometimes known as a **doubly** stochastic Poisson process.
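One way to see the effect of this extra layer of randomness is to compare event counts. The minimal NumPy sketch below (an illustration, not code from the original notebook) draws counts in many fixed-length windows from a Poisson process with a fixed intensity, and from a simple Cox process whose intensity is drawn from a log-normal distribution for each window, with the same mean. This "mixed Poisson" process, whose intensity is random but constant within a window, is a special case of a Cox process; the log-normal choice and all the numbers are assumptions. For Poisson counts the variance equals the mean, while the random intensity makes the Cox counts overdispersed.

```python
import numpy as np

rng = np.random.default_rng(0)
num_windows = 100_000
mean_rate = 5.0   # expected events per window (hypothetical)
sigma = 0.5       # spread of the log-intensity (hypothetical)

# Poisson process: the same intensity in every window.
poisson_counts = rng.poisson(mean_rate, size=num_windows)

# Cox process: first draw a random intensity per window, then counts given it.
# The normal mean is chosen so that E[exp(log_intensity)] = mean_rate.
log_intensity = rng.normal(np.log(mean_rate) - 0.5 * sigma**2, sigma,
                           size=num_windows)
cox_counts = rng.poisson(np.exp(log_intensity))

# Law of total variance: Var[counts] = E[intensity] + Var[intensity],
# i.e. 5 + 25 * (exp(0.25) - 1) ≈ 12.1 for the Cox counts.
print("Poisson: mean %.2f, variance %.2f" % (poisson_counts.mean(), poisson_counts.var()))
print("Cox:     mean %.2f, variance %.2f" % (cox_counts.mean(), cox_counts.var()))
```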


## Our example

In our example, we're creating a Cox process model for spatial analysis ([Cox, 1955](https://www.jstor.org/stable/2983950?seq=1/subjects); Miller et al., 2014). The data set is an $N \times V$ matrix. There are $N$ NBA players, $X = \{x_1, \ldots, x_N\}$, where each $x_n$ is a vector of $V$ counts. $x_{n, v}$ is
the number of attempted basketball shots by the $n$th NBA player at
location $v$.

We model a latent intensity function for each data point. Let $K$ be the
$N \times V \times V$ covariance matrix applied to the data set $X$ with fixed
kernel hyperparameters, where a slice $K_n$ is the $V \times V$ covariance
matrix over counts for a data point $x_n$.

$\text{For } n = 1, \ldots, N$,

$$ \begin{align*} p(f_n) &= \mathcal{N}(f_n \mid 0, K_n) \text{,} \\
p(x_{n,v} \mid f_{n, v}) &= \text{Poisson}(x_{n,v} \mid \exp(f_{n,v})) \text{.}
\end{align*}$$

Because the counts at different locations are conditionally independent given $f_n$, the probability of the shot counts of the $n$th NBA player across all $V$ locations is

$$ \begin{align*}
p(x_n \mid f_n) &= \prod_{v=1}^V p(x_{n,v} \mid f_{n,v}) \\
  &= \prod_{v=1}^V \text{Poisson}(x_{n,v} \mid \exp(f_{n,v})) \text{.}
  \end{align*}$$
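As a quick numerical check of this factorization, the short SciPy sketch below (hypothetical values, not from the notebook's data) evaluates $p(x_n \mid f_n)$ for one player with $V = 2$ in three ways: as a product of Poisson probabilities, as the exponential of a sum of log-probabilities, and from the Poisson formula $\lambda^k e^{-\lambda} / k!$ directly. Summing log-probabilities is the numerically safer form when $V$ is large or the counts are big.

```python
import math

import numpy as np
from scipy.stats import poisson

f_n = np.array([0.5, 2.0])  # hypothetical latent values f_{n,v} at V = 2 locations
x_n = np.array([1, 7])      # hypothetical shot counts at those locations
rates = np.exp(f_n)         # the Poisson rate at each location is exp(f_{n,v})

# Product of per-location Poisson probabilities.
per_location = poisson.pmf(x_n, mu=rates)
p_joint = np.prod(per_location)

# The same quantity computed in log space.
log_p_joint = np.sum(poisson.logpmf(x_n, mu=rates))

# Each factor written out as rate^k * exp(-rate) / k!.
manual = [r**int(k) * math.exp(-r) / math.factorial(int(k))
          for r, k in zip(rates, x_n)]
print(p_joint, math.exp(log_p_joint), np.prod(manual))
```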

In [0]:
#@title Hyperparameters { run: "auto" }
#@markdown Number of NBA players (default is 308)
N = 308   #@param {type:"slider", min:100, max:350, step:1} 
#@markdown Number of shot locations (This notebook is optimized for 2)
V = 2     #@param {type:"slider", min:2, max:4, step:1}

shot_locations = []
for v in range(V):
  shot_locations.append("Location {}".format(v+1))
  
nba_players = []
for n in range(N):
  nba_players.append("Player {}".format(n+1))


### Data

In [0]:
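# Set the TensorFlow graph-level seed. Remove this line to get different TF samples.
# Note: `build_toy_dataset` below draws with NumPy/SciPy, so its output is not
# controlled by this seed (use `np.random.seed` for that).
tf.set_random_seed(77)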

def build_toy_dataset(N, V):
    """
    A simulator mimicking the data set from 2015-2016 NBA season with
    308 NBA players and ~150,000 shots.
    """
    L = np.tril(np.random.normal(2.5, 0.1, size=[V, V]))
    K = np.matmul(L, L.T)
    x = np.zeros([N, V])
    for n in range(N):
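        # Latent log-intensities for player n: f_n ~ N(0, K).
        f_n = multivariate_normal.rvs(cov=K)
        for v in range(V):
            # Scalar draw (no `size=1`), so it can be assigned to one array element.
            x[n, v] = poisson.rvs(mu=np.exp(f_n[v]))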

    return x

x_data = build_toy_dataset(N, V)
x_data = x_data.astype(np.float32)
pd.options.display.float_format = '{:20,.0f}'.format
pd.set_option('display.max_rows', N)
print("Our toy dataset (Rows = Players, Columns = Court Positions) ")
print(pd.DataFrame(x_data))

Our toy dataset (Rows = Players, Columns = Court Positions) 
                       0                    1
0                      6                    2
1                      0                    2
2                      0                    0
3                      4                   35
4                      7                   80
5                      0                    0
6                      5                   19
7                      5                    3
8                      0                    5
9                     40                   68
10                     7                  217
11                    52                  104
12                     2                    0
13                    56                   72
14                     0                    0
15                     0                    0
16                     0                    2
17                     3                    2
18                    58                  214
19                 

### Model

So for our point process, we want to be able to simulate one of these $N \times V$ datasets, which we can do with a latent function $f$ whose exponential is the shot intensity. For each player, $f$ follows a multivariate normal distribution whose covariance is computed across all of the locations that player shot from (not all players are equally strong from every position). Samples of $f$ are then exponentiated and fed into a Poisson distribution as its rate.

$\text{For } n = 1, \ldots, N$ (with $N$ being the number of players),

$$ p(f_n) = \mathcal{N}(f_n \mid 0, K_n) \text{.} $$

First we need to define the radial basis function (RBF) kernel, also known as the squared exponential or exponentiated quadratic kernel. It is defined as

$$ k(x, x') = \sigma^2 \exp\Big(-\frac{1}{2} \sum_{d=1}^D \frac{(x_d - x'_d)^2}{\ell_d^2} \Big) $$

for output variance $\sigma^2$ and lengthscales $\ell_d$ (a single lengthscale $\ell$ can also be shared across all $D$ dimensions).
  
The kernel is evaluated over all pairs of rows, `k(X[i, :], X2[j, :])`. If `X2` is not specified, it is evaluated over all pairs of rows in `X`, `k(X[i, :], X[j, :])`. The output is a matrix whose entry (`i`, `j`) is the kernel evaluated on the `i`th row of `X` and the `j`th row of `X2`.
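To check that the formula and a vectorized computation agree, here is a small NumPy sketch (`rbf_numpy` and `rbf_loop` are hypothetical helpers, not part of the notebook). `rbf_numpy` uses the expansion $\lVert a - b \rVert^2 = \lVert a \rVert^2 + \lVert b \rVert^2 - 2 a^\top b$ on lengthscale-scaled rows, the same trick the TensorFlow `rbf` in the next cell uses, and is compared with a direct double loop over pairs of rows. The last lines reproduce the per-player use in the model, where a player's $V = 2$ counts are treated as $V$ points with one feature each.

```python
import numpy as np


def rbf_numpy(X, X2=None, lengthscale=1.0, variance=1.0):
    """RBF kernel matrix between the rows of X (N x D) and X2 (M x D)."""
    X = np.asarray(X, dtype=float) / lengthscale
    X2 = X if X2 is None else np.asarray(X2, dtype=float) / lengthscale
    # Squared distances via ||a||^2 + ||b||^2 - 2 a.b, clipped at 0 to guard
    # against tiny negative values from floating-point cancellation.
    sq_dist = (np.sum(X**2, axis=1)[:, None]
               + np.sum(X2**2, axis=1)[None, :]
               - 2.0 * X @ X2.T)
    return variance * np.exp(-0.5 * np.maximum(sq_dist, 0.0))


def rbf_loop(X, X2, lengthscale=1.0, variance=1.0):
    """Direct evaluation of the kernel formula, one pair of rows at a time."""
    K = np.empty((len(X), len(X2)))
    for i in range(len(X)):
        for j in range(len(X2)):
            scaled_diff = (np.asarray(X[i]) - np.asarray(X2[j])) / lengthscale
            K[i, j] = variance * np.exp(-0.5 * np.sum(scaled_diff**2))
    return K


rng = np.random.default_rng(0)
X = rng.normal(size=(3, 2))
X2 = rng.normal(size=(4, 2))
lengthscale = np.array([1.0, 2.0])  # one lengthscale per feature
K_vec = rbf_numpy(X, X2, lengthscale=lengthscale, variance=1.5)
K_ref = rbf_loop(X, X2, lengthscale=lengthscale, variance=1.5)

# Per-player use in the model: counts [6, 2] as V = 2 one-feature points.
K_player = rbf_numpy(np.array([[6.0], [2.0]]))
print(K_player)  # off-diagonal entries are exp(-(6 - 2)**2 / 2) = exp(-8)
```

Note how quickly the off-diagonal entries vanish when the counts differ by more than a few lengthscales; this is why the model adds a small jitter to the diagonal before taking a Cholesky factor.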
  

In [0]:
#@title RBF Function Definition  { display-mode: "form" }
from __future__ import absolute_import, division, print_function

import tensorflow as tf
from tensorflow.python.ops import control_flow_ops


def rbf(X, X2=None, lengthscale=1.0, variance=1.0):
    """Radial basis function kernel.

    Args:
        X: tf.Tensor.
            N x D matrix of N data points each with D features.
        X2: tf.Tensor.
            M x D matrix of M data points each with D features. Defaults to `X`.
        lengthscale: tf.Tensor.
            Lengthscale parameter, a positive scalar or D-dimensional vector.
        variance: tf.Tensor.
            Output variance parameter, a positive scalar.

    Returns:
        N x M matrix whose (i, j) entry is k(X[i, :], X2[j, :]).
    """
    lengthscale = tf.convert_to_tensor(lengthscale)
    variance = tf.convert_to_tensor(variance)
    dependencies = [tf.assert_positive(lengthscale),
                    tf.assert_positive(variance)]
    lengthscale = control_flow_ops.with_dependencies(dependencies, lengthscale)
    variance = control_flow_ops.with_dependencies(dependencies, variance)

    X = tf.convert_to_tensor(X)
    X = X / lengthscale
    Xs = tf.reduce_sum(tf.square(X), 1)
    if X2 is None:
        X2 = X
        X2s = Xs
    else:
        X2 = tf.convert_to_tensor(X2)
        X2 = X2 / lengthscale
        X2s = tf.reduce_sum(tf.square(X2), 1)

    square = tf.reshape(Xs, [-1, 1]) + tf.reshape(X2s, [1, -1]) - \
        2 * tf.matmul(X, X2, transpose_b=True)
    output = variance * tf.exp(-square / 2)
    return output


Now we can define our model. The latent function $f$ is a multivariate normal distribution with zero mean and covariance $K_n$ for each player. We then exponentiate random samples of $f$ and use them as the rate of a Poisson distribution.

In [0]:
# Form (N, V, V) covariance, one matrix per data point.
K = tf.stack([rbf(tf.reshape(xn, [V, 1])) + tf.diag(tf.fill([V], 1e-6))
                for xn in tf.unstack(x_data)])

# Latent prior f_n ~ N(0, K_n), parameterized by the Cholesky factor of K_n.
f = tfd.MultivariateNormalTriL(loc=tf.zeros([N, V]),
                               scale_tril=tf.cholesky(K))

# Feeding the latent function into a Poisson Distribution
x_init = tfd.Poisson(rate=tf.exp(f.sample()))
x_model = tfd.Poisson(rate=tf.exp(f.sample()))

# Draw one set of simulated counts; it serves as the input tensor that
# parameterizes the trainable distribution below.
x_ = evaluate(x_init.sample())
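For readers who want to follow the shapes, the same generative steps can be sketched in NumPy. This is an illustration under assumptions: `toy_counts` is a small hypothetical stand-in for `x_data`, and `rbf_1d` is a hypothetical 1-D RBF with unit lengthscale and variance, matching the defaults of `rbf` used above. Each player's counts define a $V \times V$ covariance $K_n$, giving a stack of shape $(N, V, V)$; sampling $f_n = L_n z_n$, with $L_n$ the Cholesky factor of $K_n$ and $z_n$ standard normal, gives $f_n \sim \mathcal{N}(0, K_n)$, and the counts are Poisson with rate $\exp(f_n)$.

```python
import numpy as np

rng = np.random.default_rng(0)
N_toy, V_toy = 5, 2
# Hypothetical stand-in for `x_data` (N_toy players x V_toy locations).
toy_counts = rng.poisson(10.0, size=(N_toy, V_toy)).astype(np.float64)


def rbf_1d(points, lengthscale=1.0, variance=1.0):
    """RBF kernel between scalar points, i.e. the RBF kernel with D = 1."""
    diff = points[:, None] - points[None, :]
    return variance * np.exp(-0.5 * (diff / lengthscale) ** 2)


# (N, V, V) covariance stack with a small diagonal jitter, as in the cell above.
K_toy = np.stack([rbf_1d(xn) + 1e-6 * np.eye(V_toy) for xn in toy_counts])
L_toy = np.linalg.cholesky(K_toy)          # batched Cholesky, shape (N, V, V)

# f_n = L_n z_n with z_n ~ N(0, I), so f_n ~ N(0, K_n).
z = rng.standard_normal((N_toy, V_toy))
f_toy = np.einsum('nij,nj->ni', L_toy, z)  # shape (N, V)

# Counts given the latent function: Poisson with rate exp(f).
x_toy = rng.poisson(np.exp(f_toy))         # shape (N, V)
print(K_toy.shape, f_toy.shape, x_toy.shape)
```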

### Inference

We now take $f$ as our prior and use it to fit an approximating multivariate normal distribution, `qf`.

For this inference, we first need to define a trainable distribution: a version of `tfd.MultivariateNormalTriL` whose location and scale are computed from a single input tensor by a dense layer. We can then fit its parameters using `tf.train` optimizers (or potentially even `tfp.optimizer`).

In [0]:
def softplus_and_shift(x, shift=1e-5, name=None):
  """Converts (batch of) scalars to (batch of) positive valued scalars.

  Args:
    x: (Batch of) `float`-like `Tensor` representing scalars which will be
      transformed into positive elements.
    shift: `Tensor` added to `softplus` transformation of elements.
      Default value: `1e-5`.
    name: A `name_scope` name for operations created by this function.
      Default value: `None` (i.e., "softplus_and_shift").

  Returns:
    scale: (Batch of) scalars with `x.dtype` and `x.shape`.
  """
  with tf.name_scope(name, 'softplus_and_shift', [x, shift]):
    x = tf.convert_to_tensor(x, name='x')
    y = tf.nn.softplus(x)
    if shift is not None:
      y += shift
    return y


def tril_with_diag_softplus_and_shift(x, diag_shift=1e-5, name=None):
  """Converts (batch of) vectors to (batch of) lower-triangular scale matrices.
  Args:
    x: (Batch of) `float`-like `Tensor` representing vectors which will be
      transformed into lower-triangular scale matrices with positive diagonal
      elements. Rightmost shape `n` must be such that
      `n = dims * (dims + 1) / 2` for some positive, integer `dims`.
    diag_shift: `Tensor` added to `softplus` transformation of diagonal
      elements.
      Default value: `1e-5`.
    name: A `name_scope` name for operations created by this function.
      Default value: `None` (i.e., "tril_with_diag_softplus_and_shift").
  Returns:
    scale_tril: (Batch of) lower-triangular `Tensor` with `x.dtype` and
      rightmost shape `[dims, dims]`, where `n = x.shape[-1]` and
      `n = dims * (dims + 1) / 2`.
  """
  with tf.name_scope(name, 'tril_with_diag_softplus_and_shift',
                     [x, diag_shift]):
      x = tf.convert_to_tensor(x, name='x')
      x = tfd.fill_triangular(x)
      diag = softplus_and_shift(tf.matrix_diag_part(x), diag_shift)
      x = tf.matrix_set_diag(x, diag)
      return x


def trainable_multivariate_normal_tril(x, dims, layer_fn=tf.layers.dense,
    loc_fn=lambda x: x, scale_fn=tril_with_diag_softplus_and_shift,
    name=None):
  """Constructs a trainable `tfd.MultivariateNormalTriL` distribution.
  Args:
    x: `Tensor` with floating type. Must have statically defined rank and
      statically known right-most dimension.
    dims: Python `int` indicating the MVN event size, i.e., the
      created MVN will be a distribution over length-`dims` vectors.
    layer_fn: Python `callable` which takes input `x` and `int` scalar `d` and
      returns a transformation of `x` with shape
      `tf.concat([tf.shape(x)[:-1], [d]], axis=0)`.
      Default value: `tf.layers.dense`.
    loc_fn: Python `callable` which transforms the `loc` parameter. Takes a
      (batch of) length-`dims` vectors and returns a `Tensor` of same shape and
      `dtype`.
      Default value: `lambda x: x`.
    scale_fn: Python `callable` which transforms the `scale` parameters. Takes a
      (batch of) length-`dims * (dims + 1) / 2` vectors and returns a
      lower-triangular `Tensor` of same batch shape with rightmost dimensions
      having shape `[dims, dims]`.
      Default value: `tril_with_diag_softplus_and_shift`.
    name: A `name_scope` name for operations created by this function.
      Default value: `None` (i.e., "multivariate_normal_tril").
  Returns:
    mvntril: An instance of `tfd.MultivariateNormalTriL`.
  """
  x = tf.convert_to_tensor(x, name='x')
  x = layer_fn(x, dims + dims * (dims + 1) // 2)
  return tfd.MultivariateNormalTriL(
      loc=loc_fn(x[..., :dims]),
      scale_tril=scale_fn(x[..., dims:]))
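The helper above maps each input row to `dims + dims * (dims + 1) // 2` unconstrained numbers: the first `dims` become the mean, and the rest fill a lower-triangular scale matrix $L$ whose diagonal is passed through softplus (plus a small shift) so that it is strictly positive, which makes $LL^\top$ a valid covariance. The NumPy sketch below illustrates the idea for `dims = 2`. It is an illustration under assumptions: `vector_to_tril` is a hypothetical helper that fills the lower triangle in row-major order, which is not necessarily the element ordering `tfd.fill_triangular` uses, and `raw` stands in for the output of the dense layer.

```python
import numpy as np


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return np.log1p(np.exp(-np.abs(x))) + np.maximum(x, 0.0)


def vector_to_tril(vec, dims, diag_shift=1e-5):
    """Hypothetical helper: fill a lower-triangular matrix row by row, then
    make its diagonal positive with softplus + shift."""
    tril = np.zeros((dims, dims))
    tril[np.tril_indices(dims)] = vec
    idx = np.arange(dims)
    tril[idx, idx] = softplus(tril[idx, idx]) + diag_shift
    return tril


dims = 2
num_params = dims + dims * (dims + 1) // 2    # 2 for the mean + 3 for the scale
raw = np.array([0.3, -1.2, -2.0, 0.7, -0.5])  # hypothetical dense-layer output
loc = raw[:dims]
scale_tril = vector_to_tril(raw[dims:], dims)
covariance = scale_tril @ scale_tril.T        # positive definite by construction
print(num_params, loc, scale_tril, np.linalg.eigvalsh(covariance), sep="\n")
```

Because the diagonal is always positive, any setting of the dense layer's weights yields a valid distribution, so the optimizer can work on unconstrained parameters.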

Now we can run the actual inference and optimize the parameters of `qf`.
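The training objective below includes `qf.kl_divergence(f)`, the KL divergence between two multivariate normal distributions, which has a closed form. For intuition, the NumPy/SciPy sketch below (hypothetical means and covariances; `kl_mvn` is an illustrative helper, not a TFP function) implements

$$ \mathrm{KL}(\mathcal{N}_0 \,\|\, \mathcal{N}_1) = \frac{1}{2}\Big( \operatorname{tr}(\Sigma_1^{-1}\Sigma_0) + (\mu_1 - \mu_0)^\top \Sigma_1^{-1} (\mu_1 - \mu_0) - k + \ln\frac{\det\Sigma_1}{\det\Sigma_0} \Big) $$

and compares it with a Monte Carlo estimate of $\mathbb{E}_{q}[\log q(f) - \log p(f)]$. KL divergence is not symmetric, so the order (approximation `qf` first, prior `f` second) matters.

```python
import numpy as np
from scipy.stats import multivariate_normal


def kl_mvn(mu0, cov0, mu1, cov1):
    """KL(N(mu0, cov0) || N(mu1, cov1)) in nats (hypothetical helper)."""
    k = mu0.shape[0]
    cov1_inv = np.linalg.inv(cov1)
    diff = mu1 - mu0
    _, logdet0 = np.linalg.slogdet(cov0)
    _, logdet1 = np.linalg.slogdet(cov1)
    return 0.5 * (np.trace(cov1_inv @ cov0) + diff @ cov1_inv @ diff
                  - k + logdet1 - logdet0)


# Hypothetical approximation q and prior p over V = 2 locations.
mu_q, cov_q = np.array([1.0, -0.5]), np.array([[1.0, 0.3], [0.3, 0.5]])
mu_p, cov_p = np.zeros(2), np.array([[2.0, 0.5], [0.5, 1.0]])

kl_closed = kl_mvn(mu_q, cov_q, mu_p, cov_p)

# Monte Carlo estimate of E_q[log q(f) - log p(f)].
rng = np.random.default_rng(0)
samples = rng.multivariate_normal(mu_q, cov_q, size=200_000)
kl_mc = np.mean(multivariate_normal.logpdf(samples, mean=mu_q, cov=cov_q)
                - multivariate_normal.logpdf(samples, mean=mu_p, cov=cov_p))
print(kl_closed, kl_mc)
```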

In [0]:
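# Build TF graph for fitting the trainable MVNTriL `qf`.
qf = trainable_multivariate_normal_tril(x_, dims=V)
# KL divergence from the approximation `qf` to the prior `f`, averaged over players.
kl = tf.reduce_mean(qf.kl_divergence(f))
# Average negative log-probability of the observed counts under `qf`.
# (Note: this scores `x_data` directly under `qf`, not under a Poisson likelihood.)
loss = -tf.reduce_mean(qf.log_prob(x_data))
# Objective to minimize. It is named `elbo`, but it plays the role of a
# negative ELBO: lower is better.
elbo = loss + kl
train_op = tf.train.AdamOptimizer(learning_rate=2.**-3).minimize(elbo)
mse = tf.reduce_mean(tf.squared_difference(x_data, qf.mean()))
init_op = tf.global_variables_initializer()

# Run the training op 50,000 times.
num_steps = 50000
# Preallocate NumPy buffers for per-step metrics (`_` suffix marks evaluated results).
elbo_ = np.zeros(num_steps, dtype=np.float32)
mse_ = np.zeros(num_steps, dtype=np.float32)
kl_ = np.zeros(num_steps, dtype=np.float32)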

evaluate(init_op)
for it in range(elbo_.size):
    _, elbo_[it], mse_[it], kl_[it] = evaluate([train_op, elbo, mse, kl])
    if it % 2000 == 0 or it == elbo_.size - 1:
        print("iteration:{}  elbo:{:.6f}  mse:{:.6f}  KL divergence:{:.6f}".format(it, elbo_[it], mse_[it], kl_[it]))

iteration:0  elbo:235717183668224.000000  mse:590671.125000  KL divergence:1626973.750000
iteration:2000  elbo:21641766.000000  mse:590706.312500  KL divergence:21396096.000000
iteration:4000  elbo:14370678.000000  mse:591073.437500  KL divergence:14124876.000000
iteration:6000  elbo:13340787.000000  mse:591261.312500  KL divergence:13094784.000000
iteration:8000  elbo:12783556.000000  mse:591295.812500  KL divergence:12537099.000000
iteration:10000  elbo:11454258.000000  mse:591325.000000  KL divergence:11206598.000000
iteration:12000  elbo:8651038.000000  mse:591316.375000  KL divergence:8400160.000000
iteration:14000  elbo:4329554.000000  mse:591292.187500  KL divergence:4069961.250000
iteration:16000  elbo:1233809.750000  mse:591288.250000  KL divergence:951596.375000
iteration:18000  elbo:646732.500000  mse:591288.187500  KL divergence:322155.781250
iteration:20000  elbo:506378.687500  mse:591288.187500  KL divergence:179520.453125
iteration:22000  elbo:429994.156250  mse:591288.1

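Finally, we can build the Cox process itself: a Poisson distribution whose rate is a sample from the fitted `qf`. Because `qf` was fit directly to the counts in `x_data` rather than to their logarithms, its samples are used as the rate without the `exp` from the model definition.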

In [0]:
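# The model definition uses `rate=tf.exp(f)`, but `qf` was fit to the raw
# counts, so its samples are used as the rate directly. Note that samples of
# `qf` are not guaranteed to be positive.
x_cox = tfd.Poisson(rate=qf.sample())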
print(x_cox)


tfp.distributions.Poisson("Poisson_2/", batch_shape=(308, 2), event_shape=(), dtype=float32)


In [0]:
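x_simulated = evaluate(x_cox.sample()).astype(np.float64)  # `np.float` was removed in NumPy 1.24
pd.options.display.float_format = '{:20,.0f}'.format
pd.set_option('display.max_rows', N)
print("Our Cox-Process-Simulated dataset (Rows = Players, Columns = Court Positions) ")
# Label columns A, B, ... (one per shot location, so this also works for V > 2).
print(pd.DataFrame(x_simulated, columns=list(string.ascii_uppercase[:V])))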

Our Cox-Process-Simulated dataset (Rows = Players, Columns = Court Positions) 
                       A                    B
0                      0                    0
1                      0                    6
2                      0                    0
3                      0                    0
4                      0                    0
5                      2                    0
6                      0                    0
7                    171                  166
8                      0                    0
9                      2                    1
10                     0                    0
11                     3                    3
12                     0                    0
13                    30                   34
14                     0                    1
15                     1                    3
16                     0                    0
17                     0                    0
18                     2                    4
1

## Conclusion

And there we have it. We now have a Cox point process for simulating the number of shots taken by NBA players from two given positions on the court. The simulated counts cover a range broadly comparable to the original data, which is what we would expect, since `qf` was fit with an objective that combines the KL divergence to the prior `f` with the log-probability of `x_data`.

As we can see, the Cox process can model phenomena more complex than those captured by simpler Poisson point processes. For example, Cox processes are used in neuroscience research to generate simulations of spike trains (the sequence of action potentials generated by a neuron) [[2]](#scrollTo=Jq1b4fk6RlWx). Cox processes also come up frequently in financial mathematics, especially in modeling derivatives [[3]](#scrollTo=Jq1b4fk6RlWx) and credit-risky securities [[4]](#scrollTo=Jq1b4fk6RlWx).

## References

[1] Cox, David R. ["Some statistical methods connected with series of events."](https://www.jstor.org/stable/2983950) Journal of the Royal Statistical Society. Series B (Methodological) (1955): 129-164.

[2] Krumin, Michael, and Shy Shoham. ["Generation of spike trains with controlled auto- and cross-correlation functions."](https://www.mitpressjournals.org/doi/abs/10.1162/neco.2009.08-08-847) Neural Computation 21.6 (2009): 1642-1664.

[3] Dassios, Angelos, and Ji-Wook Jang. ["Pricing of catastrophe reinsurance and derivatives using the Cox process with shot noise intensity."](https://link.springer.com/article/10.1007/s007800200079) Finance and Stochastics 7.1 (2003): 73-95.

[4] Lando, David. ["On Cox processes and credit risky securities."](https://link.springer.com/article/10.1007/BF01531332) Review of Derivatives Research 2.2-3 (1998): 99-120.
