## Tutorial 2: Statistics

While TensorFlow itself offers some support for statistical inference, TensorFlow-Probability (TFP) is much stronger in this area and provides MCMC methods, probability distributions and more.

In [1]:
import matplotlib.pyplot as plt
import numpy as np
import tensorflow as tf
import tensorflow_probability as tfp  # needed for tfp.distributions below
import zfit
from zfit import z



## Distributions

TFP contains a large collection of distributions. They share a minimal, well-designed interface that is similar to that of the SciPy distributions.

In [2]:
tfd = tfp.distributions

In [3]:
cauchy = tfd.Cauchy(loc=1., scale=10.)

In [4]:
sample = cauchy.sample(10)

In [5]:
cauchy.prob(sample)

<tf.Tensor: shape=(10,), dtype=float32, numpy=
array([0.00139813, 0.02871551, 0.02431434, 0.01461342, 0.00756948,
       0.00053465, 0.00532917, 0.00023775, 0.00022804, 0.00726295],
      dtype=float32)>

### Mixtures of PDFs

TensorFlow-Probability also supports creating mixtures of different distributions.

In [6]:
mix = 0.3  # weight of the Cauchy component; the Normal gets 1 - mix
mix_gauss_cauchy = tfd.Mixture(
    cat=tfd.Categorical(probs=[mix, 1. - mix]),
    components=[
        cauchy,
        tfd.Normal(loc=+1., scale=0.5),
    ],
)

In [7]:
sample_mixed = mix_gauss_cauchy.sample(10)

In [8]:
mix_gauss_cauchy.prob(sample_mixed)

<tf.Tensor: shape=(10,), dtype=float32, numpy=
array([0.56296194, 0.04526669, 0.00081195, 0.00912498, 0.48882836,
       0.21551062, 0.3615021 , 0.22900048, 0.481904  , 0.5654731 ],
      dtype=float32)>

### Joint distributions

Furthermore, joint distributions of multiple variables are supported.

In [9]:
joint = tfd.JointDistributionNamed(dict(
    c=             tfd.Cauchy(loc=10., scale=1.),
    n=             tfd.Normal(loc=0, scale=2.),
    m=lambda n, c: tfd.Normal(loc=n, scale=c),
))

In [10]:
sample_joint = joint.sample(10)
sample_joint

{'n': <tf.Tensor: shape=(10,), dtype=float32, numpy=
 array([ 1.2023524 , -2.5354168 ,  1.8461963 , -4.2037864 , -3.4064677 ,
        -2.000091  ,  1.4253551 , -0.5112927 ,  0.78236884, -0.97251046],
       dtype=float32)>,
 'c': <tf.Tensor: shape=(10,), dtype=float32, numpy=
 array([  5.2407813,  10.599838 ,   8.586605 ,  11.1632185,   4.2875338,
         11.737058 ,   9.83008  ,   7.2829266, -21.12375  ,  21.575275 ],
       dtype=float32)>,
 'm': <tf.Tensor: shape=(10,), dtype=float32, numpy=
 array([ 2.0575674,  6.33679  , -6.255143 ,  8.886772 ,  2.7386997,
        -6.191878 , -1.7884248, -0.8992961, -0.9361281,  2.1742077],
       dtype=float32)>}

In [11]:
joint.prob(sample_joint)

<tf.Tensor: shape=(10,), dtype=float32, numpy=
array([1.6832477e-04, 5.5431761e-04, 4.1181527e-04, 5.3243475e-05,
       1.4745441e-05, 3.0568580e-04, 1.8417061e-03, 4.0100829e-04,
                 nan, 7.6459173e-06], dtype=float32)>

### How TFP compares to zfit

TensorFlow-Probability offers a large choice of distributions from which to build a model, with more flexibility in vectorization and parametrization than zfit. However, TFP only provides models with analytically known CDFs and lacks numerical normalization and sampling methods. This rules out more sophisticated models, such as convolutions.

Internally, zfit simply wraps TFP distributions for certain implementations, such as `Gauss`. There is also a generic wrapper, `WrapDistribution`, that makes it easy to wrap any TFP distribution and use it in zfit.