# <img style="float: left; padding-right: 10px; width: 45px" src="https://raw.githubusercontent.com/Harvard-IACS/2018-CS109A/master/content/styles/iacs.png"> CS109B Data Science 2: Advanced Topics in Data Science 

## Lab 02 - Review of Probability Distributions 

**Harvard University**<br>
**Spring 2022**<br>
**Instructors:** Mark Glickman and Pavlos Protopapas<br>
**Lab instructor and content:** Eleni Angelaki Kaxiras<br>

---

In [1]:
import warnings

import arviz as az
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import pymc3 as pm
import scipy.stats as stats
import seaborn as sns
import theano.tensor as tt
from matplotlib import gridspec
from pymc3 import summary

# Silence all warnings (FutureWarning included) to keep the notebook output readable
warnings.filterwarnings('ignore')

%matplotlib inline

print('Running on PyMC3 v{}'.format(pm.__version__))

Running on PyMC3 v3.11.4


In [3]:
%%javascript
IPython.OutputArea.auto_scroll_threshold = 20000;

In [4]:
#pandas trick
pd.options.display.max_columns = 50  # None -> No Restrictions
pd.options.display.max_rows = 200    # None -> Be careful with this 
pd.options.display.max_colwidth = 100
pd.options.display.precision = 3

## A review of Common Probability Distributions

### Discrete Distributions

The random variable has a **probability mass function (pmf)** which measures the probability that our random variable will take a specific value $y$, denoted $P(Y=y)$.

- **Bernoulli** (binary outcome, success has probability $\theta$, one trial):
$
P(Y=k) =  \theta^k(1-\theta)^{1-k}, \quad k \in \{0, 1\}
$
<HR>
- **Binomial** (binary outcome, success has probability $\theta$, $k$ successes in $n$ trials):
\begin{equation}
P(Y=k) =  {{n}\choose{k}} \cdot \theta^k(1-\theta)^{n-k}
\end{equation}

*Note*: Binomial($1$, $\theta$) = Bernoulli($\theta$)
<HR>
- **Poisson** (counts independent events occurring at a rate $\lambda$)
\begin{equation}
P\left( Y=y|\lambda \right) = \frac{{e^{ - \lambda } \lambda ^y }}{{y!}}
\end{equation}
for $y = 0, 1, 2, \ldots$
<HR>
- **Categorical, or Multinoulli** (the random variable can take any of $K$ possible categories, each with its own probability; this generalizes the Bernoulli distribution to a discrete variable with more than two possible outcomes, such as the roll of a die)
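
To make the formulas above concrete, here is a minimal sketch that codes the Bernoulli, Binomial, and Poisson pmfs directly from their definitions using only the standard library. The function names and the values of `theta` and `lam` are our own choices for illustration. It checks that a Binomial with one trial matches the Bernoulli, and that each pmf sums to 1 over its support.

```python
import math


def bernoulli_pmf(k, theta):
    """P(Y=k) = theta^k (1 - theta)^(1 - k) for k in {0, 1}."""
    return theta**k * (1 - theta) ** (1 - k)


def binomial_pmf(k, n, theta):
    """P(Y=k) = C(n, k) theta^k (1 - theta)^(n - k) for k = 0..n."""
    return math.comb(n, k) * theta**k * (1 - theta) ** (n - k)


def poisson_pmf(y, lam):
    """P(Y=y | lambda) = exp(-lambda) lambda^y / y! for y = 0, 1, 2, ..."""
    return math.exp(-lam) * lam**y / math.factorial(y)


theta = 0.3  # illustrative success probability (assumption)

# A Binomial with a single trial reduces to a Bernoulli.
single_trial_matches = all(
    math.isclose(binomial_pmf(k, 1, theta), bernoulli_pmf(k, theta))
    for k in (0, 1)
)

# A pmf sums to 1 over its support (the Poisson sum is truncated at y = 59).
binomial_total = sum(binomial_pmf(k, 10, theta) for k in range(11))
poisson_total = sum(poisson_pmf(y, 3.0) for y in range(60))

print(single_trial_matches, binomial_total, poisson_total)
```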

### Continuous Distributions

The random variable has a **probability density function (pdf)**.
- **Uniform** (variable equally likely to be near each value in the interval $(a,b)$), with density
\begin{equation}
p(x) = \frac{1}{b - a}
\end{equation}
anywhere within the interval $(a, b)$, and zero elsewhere.
<HR>
- **Normal** (a.k.a. Gaussian)
\begin{equation}
X \sim  \mathcal{N}(\mu,\,\sigma^{2})
\end{equation} 

    A Normal distribution can be parameterized either in terms of precision $\tau$ or variance $\sigma^{2}$. The link between the two is given by
\begin{equation}
\tau = \frac{1}{\sigma^{2}}
\end{equation}
 - Mean $\mu$
 - Variance $\frac{1}{\tau}$ or $\sigma^{2}$
 - Parameters: `mu: float`, `sigma: float` or `tau: float`
 - Range of values $(-\infty, \infty)$
<HR>
- **Beta** (the variable $\theta$ takes values in the interval $[0,1]$, and the distribution is parametrized by two positive parameters, $\alpha$ and $\beta$, that control its shape). Beta is a good distribution to use for priors (beliefs) because its range, $[0,1]$, is the natural range for a probability, and because changing $\alpha$ and $\beta$ lets us model a wide range of shapes. Its density is:

\begin{equation}
\label{eq:beta} 
P(\theta|\alpha,\beta) = \frac{1}{B(\alpha, \beta)} {\theta}^{\alpha - 1} (1 - \theta)^{\beta - 1} \propto {\theta}^{\alpha - 1} (1 - \theta)^{\beta - 1}
\end{equation}

where the normalisation constant, $B$, is a beta function of $\alpha$ and $\beta$,


\begin{equation}
B(\alpha, \beta) = \int_{t=0}^1 t^{\alpha - 1} (1 - t)^{\beta - 1} dt.
\end{equation}
 - 'Nice' distribution, unimodal when $\alpha > 1$ and $\beta > 1$
 - Range of values $[0, 1]$
    
<HR>
    
- **Exponential**
    
 - Range of values $[0, \infty)$
<HR>
- **Gamma**
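
Two of the relations above are easy to check numerically: the link between precision and variance for the Normal, and the integral that defines the Beta normalisation constant. The sketch below is only an illustration. The helper names and the values $\alpha=2$, $\beta=3$ are our own choices, and the Gamma-function identity used for comparison is a standard result that is not stated above.

```python
import math


def precision_from_sigma(sigma):
    """Precision of a Normal: tau = 1 / sigma^2."""
    return 1.0 / sigma**2


def beta_function_numeric(alpha, beta, n=100_000):
    """Midpoint-rule approximation of the integral
    B(alpha, beta) = int_0^1 t^(alpha-1) (1-t)^(beta-1) dt."""
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        t = (i + 0.5) * h  # midpoint of the i-th sub-interval
        total += t ** (alpha - 1) * (1 - t) ** (beta - 1)
    return h * total


def beta_function_gamma(alpha, beta):
    """Standard identity (assumed, not from the text above):
    B(alpha, beta) = Gamma(alpha) Gamma(beta) / Gamma(alpha + beta)."""
    return math.gamma(alpha) * math.gamma(beta) / math.gamma(alpha + beta)


tau = precision_from_sigma(2.0)          # sigma = 2 gives tau = 1/4
b_numeric = beta_function_numeric(2, 3)  # integral definition
b_closed = beta_function_gamma(2, 3)     # 1! * 2! / 4! = 1/12

print(tau, b_numeric, b_closed)
```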



 #### Code Resources:
 - Statistical Distributions in numpy/scipy: [scipy.stats](https://docs.scipy.org/doc/scipy/reference/stats.html)
 - Statistical Distributions in PyMC3: [distributions in PyMC3](https://docs.pymc.io/api/distributions.html) (we will see these when we look at PyMC3).

<div class="discussion"><b>Exercises: Discrete Probability Plots</b></div>

#### Poisson
Change the value of $\lambda$ in the Poisson PMF and see how the plot changes. Remember that the y-axis of a discrete probability distribution shows the probability that the random variable takes the specific value on the x-axis.

\begin{equation}
P\left( Y=y \mid \lambda \right) = \frac{{e^{ - \lambda } \lambda ^y }}{{y!}}
\end{equation}

for $y = 0, 1, 2, \ldots$

The routine is `stats.poisson.pmf(k, mu)`, where `mu` is the rate $\lambda$. $\lambda$ is our $\theta$ in this case, and it is also the mean of the distribution.

In [5]:
plt.style.use('seaborn-darkgrid')
x = np.arange(0, 60)
for lam in [0.5, 3, 8]:
    pmf = stats.poisson.pmf(x, lam)
    plt.plot(x, pmf, alpha=0.5, label=r'$\lambda$ = {}'.format(lam))
plt.xlabel('random variable', fontsize=12)
plt.ylabel('probability', fontsize=12)
plt.legend(loc=1)
plt.ylim(bottom=-0.1)  # call ylim; assigning to plt.ylim would overwrite the function
plt.show()
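
As a quick check that $\lambda$ is the mean of the Poisson, we can compute the mean directly from the pmf. The sketch below uses an illustrative rate of our own choosing. It also computes the variance, which for a Poisson is also equal to $\lambda$ (a standard fact not stated above).

```python
import numpy as np
from scipy import stats

lam = 3.0                 # illustrative rate (assumption)
k = np.arange(0, 60)      # support truncated at 59; the tail beyond is negligible here
pmf = stats.poisson.pmf(k, lam)

mean_from_pmf = np.sum(k * pmf)                        # E[Y] = sum_k k P(Y=k)
var_from_pmf = np.sum((k - mean_from_pmf) ** 2 * pmf)  # Var[Y]

# SciPy exposes the same moments directly.
mean_scipy = stats.poisson.mean(lam)
var_scipy = stats.poisson.var(lam)

print(mean_from_pmf, var_from_pmf, mean_scipy, var_scipy)
```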

#### Binomial

In [6]:
plt.style.use('seaborn-darkgrid')
x = np.arange(0, 50)
ns = [10, 17]
ps = [0.5, 0.7]
for n, p in zip(ns, ps):
    pmf = stats.binom.pmf(x, n, p)
    plt.plot(x, pmf, alpha=0.5, label='n = {}, p = {}'.format(n, p))
plt.xlabel('x', fontsize=14)
plt.ylabel('f(x)', fontsize=14)
plt.legend(loc=1)
plt.show()

<div class="discussion"><b>Exercise: Continuous Distributions Plot</b></div>

#### Uniform
    
Change the interval endpoints $a$ and $b$ (and hence the mean $\mu$) in the Uniform PDF and see how the plot changes.

Remember that the y-axis in a continuous probability distribution does not show the actual probability of the random variable having a specific value on the x-axis, because that probability is zero! Instead, to find the probability that the variable falls within a small interval, we look at the integral (the area) under the PDF curve over that interval.
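
A minimal sketch of this point, assuming a Uniform on the interval $(0, 0.5)$ chosen for illustration: the density there equals $2$, which cannot be a probability, while the probability of landing in a sub-interval is the area under the PDF, computed below both from the CDF and by numerical integration.

```python
from scipy import integrate, stats

# Uniform on (a, b) = (0, 0.5); SciPy parameterizes it as loc=a, scale=b-a.
a, b = 0.0, 0.5
dist = stats.uniform(loc=a, scale=b - a)

# Height of the PDF: 1 / (b - a) = 2, a density rather than a probability.
density_at_point = dist.pdf(0.25)

# P(0.1 < X < 0.2) as the area under the PDF, computed two ways.
prob_from_cdf = dist.cdf(0.2) - dist.cdf(0.1)
prob_from_integral, _abs_err = integrate.quad(dist.pdf, 0.1, 0.2)

print(density_at_point, prob_from_cdf, prob_from_integral)
```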

The uniform is often used as a noninformative prior.

```
Uniform - numpy.random.uniform(low=0.0, high=1.0, size=None)
```

$a$ (`low`) and $b$ (`high`) are our parameters. `size` is how many samples to draw.
Our $\theta$ is basically the combination of the parameters $a$ and $b$. We can also summarize the distribution by its mean
\begin{equation}
\mu = (a+b)/2
\end{equation}

In [7]:
from scipy.stats import uniform

r = uniform.rvs(size=1000)
plt.plot(r, uniform.pdf(r),'r-', lw=5, alpha=0.6, label='uniform pdf')
plt.hist(r, density=True, histtype='stepfilled', alpha=0.2)
plt.ylabel('probability density')
plt.xlabel('random variable')
plt.legend(loc='best', frameon=False)
plt.show()
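
To tie the sampler to the mean $\mu = (a+b)/2$, here is a small sketch that draws from NumPy's uniform generator and compares the sample mean with $\mu$. The interval endpoints, seed, and sample size are illustrative assumptions. In NumPy the keywords `low` and `high` play the roles of $a$ and $b$.

```python
import numpy as np

a, b = 2.0, 6.0                      # illustrative interval endpoints (assumption)
rng = np.random.default_rng(seed=0)  # seeded so the run is reproducible
samples = rng.uniform(low=a, high=b, size=100_000)

mu = (a + b) / 2                     # theoretical mean of Uniform(a, b)
sample_mean = samples.mean()         # should be close to mu for a large sample

print(mu, sample_mean)
```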

#### Beta

We get an amazing set of shapes by tweaking the two parameters $\alpha$ and $\beta$ (`a` and `b` in the code)! Notice that for $\alpha=\beta=1$ we get a constant density. From then on, as both values increase together, we get a curve that looks more and more Gaussian.

In [8]:
from scipy.stats import beta

fontsize = 15
alphas = [0.5, 0.5, 1., 3., 6.]
betas = [0.5, 1., 1., 3., 6.]
x = np.linspace(0, 1, 1000) 
colors = ['red', 'green', 'blue', 'black', 'pink']

fig, ax = plt.subplots(figsize=(8, 5))

# Use a separate loop variable so the `colors` list is not overwritten
for a, b, color in zip(alphas, betas, colors):
    dist = beta(a, b)
    ax.plot(x, dist.pdf(x), c=color,
            label=f'a={a}, b={b}')

ax.set_ylim(0, 3)

ax.set_xlabel(r'$\theta$', fontsize=fontsize)
ax.set_ylabel(r'P ($\theta|\alpha,\beta)$', fontsize=fontsize)
ax.set_title('Beta Distribution', fontsize=fontsize*1.2)

ax.legend(loc='best')
plt.show()
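
The two observations about the Beta shapes can also be checked numerically. This is a rough sketch under our own choices: we take symmetric Betas with $a=b$, compare each with the Normal that has the same mean and standard deviation, and use the largest absolute gap between the two densities on a grid as the yardstick. The helper name `max_gap_to_normal` is hypothetical.

```python
import numpy as np
from scipy import stats

x = np.linspace(0.01, 0.99, 99)  # grid inside (0, 1); includes 0.5

# Beta(1, 1) is flat: its density equals 1 everywhere on the interval.
flat = stats.beta(1, 1).pdf(x)


def max_gap_to_normal(a):
    """Largest absolute gap on the grid between Beta(a, a) and the
    Normal with the same mean and standard deviation."""
    dist = stats.beta(a, a)
    normal = stats.norm(loc=dist.mean(), scale=dist.std())
    return np.max(np.abs(dist.pdf(x) - normal.pdf(x)))


# The gap shrinks as a = b grows, i.e. the curve looks more Gaussian.
gaps = {a: max_gap_to_normal(a) for a in (2, 6, 50)}
print(np.allclose(flat, 1.0), gaps)
```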

#### Gaussian

In [9]:
y = pm.Exponential.dist(lam=2)  # no trailing comma, which would wrap y in a tuple
#y = pm.Binomial.dist(n=10, p=0.5)
type(y)
#print(y.logp(4).eval())
#plt.plot(y.random(size=30))

In [10]:
plt.style.use('seaborn-darkgrid')
x = np.linspace(-5, 5, 1000)
mus = [0., 0., 0., -2.]
sigmas = [0.4, 1., 2., 0.4]
for mu, sigma in zip(mus, sigmas):
    pdf = stats.norm.pdf(x, mu, sigma)
    plt.plot(x, pdf, label=rf'$\mu$ = {mu}, $\sigma$ = {sigma}')
plt.xlabel('random variable', fontsize=12)
plt.ylabel('probability density', fontsize=12)
plt.legend(loc=1)
plt.show()

<div class="discussion"> <b>At home</b>: Prove the formula mentioned in class, which gives the probability density for a Beta distribution with parameters $2$ and $5$:<BR>
$p(\theta|2,5) = 30 \cdot \theta(1 - \theta)^4$</div>
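
Before attempting the proof, it can help to confirm the formula numerically. The sketch below is a sanity check rather than a proof: it compares the claimed expression with SciPy's Beta density on a grid of our own choosing, and evaluates the normalisation constant $1/B(2,5)$ with `scipy.special.beta`.

```python
import numpy as np
from scipy import special, stats

theta = np.linspace(0, 1, 101)           # grid of theta values (assumption)

normalizer = 1 / special.beta(2, 5)      # 1 / B(2, 5)
claimed = 30 * theta * (1 - theta) ** 4  # formula from the exercise
reference = stats.beta(2, 5).pdf(theta)  # SciPy's Beta(2, 5) density

formula_matches = np.allclose(claimed, reference)
print(normalizer, formula_matches)
```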

**References**:

- [Distributions in PyMC3](https://docs.pymc.io/api/distributions.html)

Information about PyMC3 functions including descriptions of distributions, sampling methods, and other functions, is available via the `help` command.