In [2]:
%matplotlib inline

import numpy as np
import matplotlib.pyplot as plt

plt.style.use(['ggplot', 'assets/class.mplstyle'])

red = '#E24A33'
blue = '#348ABD'
purple = '#988ED5'
gray = '#777777'
yellow = '#FBC15E'
green = '#8EBA42'
pink = '#FFB5B8'

# Bayesian Estimates

Bayes' theorem follows from the definition of conditional probability:

$$\begin{aligned}
p(A|B) &= \frac{p(A\cap B)}{p(B)} \\
p(B|A) &= \frac{p(B\cap A)}{p(A)}
\end{aligned}$$
  
but $p(A\cap B) = p(B\cap A)$ so we can write:

$$ p(A|B) = \frac{p(B|A) p(A)}{p(B)} $$
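
As a quick numerical check of the identity above, here is a minimal sketch using a made-up joint probability table for two binary events. The numbers in `joint` are hypothetical and chosen only so that they sum to 1.

```python
import numpy as np

# Hypothetical joint probabilities p(A=a, B=b) for two binary events.
# Rows index A (0 = false, 1 = true); columns index B in the same way.
joint = np.array([[0.50, 0.10],
                  [0.15, 0.25]])

p_A = joint.sum(axis=1)  # marginal p(A), summing over B
p_B = joint.sum(axis=0)  # marginal p(B), summing over A

# Conditional probabilities straight from the definition.
p_A_given_B = joint[1, 1] / p_B[1]  # p(A|B) = p(A and B) / p(B)
p_B_given_A = joint[1, 1] / p_A[1]  # p(B|A) = p(A and B) / p(A)

# Bayes' theorem: recover p(A|B) from p(B|A), p(A) and p(B).
p_A_given_B_bayes = p_B_given_A * p_A[1] / p_B[1]

print(p_A_given_B, p_A_given_B_bayes)
```

Both printed values should agree, since the second is just the first rewritten through $p(A\cap B) = p(B|A)\,p(A)$.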

In Bayesian speak, these terms are:

- $p(A|B)$: the posterior probability of $A$ given $B$. Obtaining this probability is our ultimate goal.
- $p(A)$: the prior probability of $A$.
- $p(B|A)$: the likelihood of $B$ given $A$.
- $p(B)$: the probability of $B$ (the evidence). In general it acts as a normalization constant that is hard to compute.

In most useful applications for us:

- $A$ is a parameter from a chosen distribution or a statistic
- $B$ is the observed data

Let's make this concrete using the same quantities we have been calculating so far. Suppose we know that the outcomes of our experiment can be modeled as random variables drawn from a gaussian with unknown $\mu$ and $\sigma$. We run the experiment $N$ times and obtain a sample of size $N$ from the distribution. This sample is our data. What we will be calculating is:

$$ p(\mu, \sigma| \{x_i\}_{i=1,...,N}) = \frac{p(\{x_i\}_{i=1,...,N}|\mu, \sigma) p(\mu, \sigma)}{p(\{x_i\}_{i=1,...,N})} $$

Notice that $p(\mu, \sigma| \{x_i\}_{i=1,...,N})$ **is a function**: it is the probability density for the parameters of the parent gaussian taking the values $\mu$ and $\sigma$, given that a sample drawn from that gaussian contains the $N$ values $\{x_i\}_{i=1,...,N}$. In this case it is a bivariate function that we need to know for all values of $(\mu, \sigma)$.

Several problems arise when trying to evaluate the right hand side of the expression:

- $p(\{x_i\}_{i=1,...,N}|\mu, \sigma)$ is the probability (density) of obtaining the sample $\{x_i\}_{i=1,...,N}$ if the parent gaussian distribution has the parameters $\mu$ and $\sigma$. This is called the likelihood, and it is imperative that we know how to calculate it or can get a very good approximation. If our model is a gaussian, it is fairly simple.

- $p(\mu, \sigma)$ is the probability that the parameters of the parent gaussian take the given values $\mu$ and $\sigma$. Notice that this is a probability independent of any data set. It should be set before knowing anything about our samples. This part is controversial and can cause problems that we will explore later.

- $p(\{x_i\}_{i=1,...,N})$ is the probability of getting the given data set from any gaussian, that is, averaged over all possible parameter values weighted by the prior. It is very hard to compute, but we will see that we do not need it to draw samples that follow the posterior distribution. Once we can draw samples from a distribution, we can compute statistics on them.

In our case, the likelihood is not hard to compute. If we know that the parent distribution for each data point in the sample is the same and that $x_i \sim \mathcal{N}(\mu, \sigma)$ for all $i$, then the likelihood of getting a given value is:

$$ p(x_i|\mu, \sigma) = \frac{1}{\sigma \sqrt{2\pi}} \exp\left(\frac{-(x_i-\mu)^2}{2\sigma^2}\right)$$
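
The density above translates almost directly into NumPy. The sketch below is one possible implementation; the function name `gaussian_pdf` and the test values are illustrative choices, and `sigma` is taken to be the standard deviation, as in the formula.

```python
import numpy as np

def gaussian_pdf(x, mu, sigma):
    """Density of a gaussian with mean mu and standard deviation sigma at x."""
    norm = 1.0 / (sigma * np.sqrt(2.0 * np.pi))
    return norm * np.exp(-(x - mu) ** 2 / (2.0 * sigma ** 2))

# At x = mu the exponential is 1, so the density equals the prefactor.
peak = gaussian_pdf(0.0, mu=0.0, sigma=1.0)

# Sanity check: a simple Riemann sum over a wide interval should be close to 1.
x = np.linspace(-10.0, 10.0, 20001)
dx = x[1] - x[0]
area = np.sum(gaussian_pdf(x, mu=0.0, sigma=1.0)) * dx

print(peak, area)
```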

If we also know that the samples were obtained independently, then we can use the definition of independence to write that:

$$ p(x_1, x_2, ..., x_N|\mu, \sigma) = \prod_{i=1}^{N} \frac{1}{\sigma \sqrt{2\pi}} \exp\left(\frac{-(x_i-\mu)^2}{2\sigma^2}\right)$$
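
To see how the pieces fit together, the sketch below evaluates this likelihood on a grid of $(\mu, \sigma)$ values and normalizes it numerically. Several things here are assumptions made only for illustration: the data are simulated with arbitrary true parameters, the prior is taken to be flat over the grid (so the posterior is proportional to the likelihood), and the grid limits are arbitrary. The product is computed as a sum of logarithms because multiplying many small densities underflows quickly.

```python
import numpy as np

def log_likelihood(x, mu, sigma):
    """Log of the product of gaussian densities over the sample x."""
    n = x.size
    return (-n * np.log(sigma * np.sqrt(2.0 * np.pi))
            - np.sum((x - mu) ** 2) / (2.0 * sigma ** 2))

# Hypothetical data: 50 draws from a gaussian with mu = 3 and sigma = 2.
rng = np.random.default_rng(0)
sample = rng.normal(loc=3.0, scale=2.0, size=50)

# Grid of candidate parameter values (limits chosen arbitrarily).
mus = np.linspace(0.0, 6.0, 201)
sigmas = np.linspace(0.5, 5.0, 201)
dmu = mus[1] - mus[0]
dsigma = sigmas[1] - sigmas[0]

# Assumed flat prior: log posterior = log likelihood + constant.
log_post = np.array([[log_likelihood(sample, m, s) for s in sigmas]
                     for m in mus])

# Subtract the maximum before exponentiating to avoid underflow, then
# normalize on the grid; this sum plays the role of p({x_i}).
post = np.exp(log_post - log_post.max())
post /= post.sum() * dmu * dsigma

# Location of the posterior maximum on the grid.
i, j = np.unravel_index(np.argmax(post), post.shape)
mu_max, sigma_max = mus[i], sigmas[j]

print(mu_max, sigma_max)
```

With a flat prior the maximum of the posterior coincides with the maximum of the likelihood, so `mu_max` and `sigma_max` should land within one grid step of the sample mean and the sample standard deviation (`sample.mean()` and `sample.std()`).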