# Gibbs Sampling
Gibbs sampling is an MCMC technique suitable for inference in Bayesian models.

The underlying logic of MCMC sampling is that we can estimate any desired expectation by ergodic averages. That is, we can compute any statistic of a posterior distribution as long as we have $N$ simulated samples from that distribution:
$$
E[f(x)]_\mathcal{P} \approx \frac{1}{N}\sum_{i=1}^N f(s^{(i)}),
$$
where $\mathcal{P}$ is the posterior distribution of interest, $f(x)$ is the function whose expectation we want, and $s^{(i)}$ is the $i^{\mathrm{th}}$ simulated sample from $\mathcal{P}$.
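
As a small illustration of the averaging step, here is a sketch in plain Python. The "posterior" is a stand-in chosen for this example (a normal distribution with mean 2 and standard deviation 1, sampled directly rather than by MCMC), and `monte_carlo_expectation` is a hypothetical helper name, not something from the original article.

```python
import random


def monte_carlo_expectation(f, samples):
    """Approximate E[f(x)] by averaging f over the simulated samples."""
    return sum(f(s) for s in samples) / len(samples)


rng = random.Random(0)
# Stand-in for posterior samples: N(2, 1), an assumption made for illustration.
samples = [rng.gauss(2.0, 1.0) for _ in range(50_000)]

# Different choices of f give different statistics of the same distribution.
mean_estimate = monte_carlo_expectation(lambda s: s, samples)      # true value: 2
second_moment = monte_carlo_expectation(lambda s: s * s, samples)  # true value: 5
```

The same set of samples is reused for every statistic; only the function `f` changes.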

The idea in Gibbs sampling is to generate posterior samples by sweeping through each variable (or block of variables) to sample from its conditional distribution with the remaining variables fixed to their current values.

**Algorithm 1** Generic Gibbs sampler

Initialize $x^{(0)} \sim q(x)$, where $q(x)$ is the prior distribution<br>
**for** iteration $i=1, 2, \ldots$ **do**<br>
<p style="padding-left: 2em;">$x_1^{(i)} \sim p(X_1 = x_1 | X_2 = x_2^{(i-1)}, X_3=x_3^{(i-1)},\ldots,X_D = x_D^{(i-1)})$</p>
<p style="padding-left: 2em;">$x_2^{(i)} \sim p(X_2 = x_2 | X_1 = x_1^{(i)}, X_3=x_3^{(i-1)},\ldots,X_D = x_D^{(i-1)})$</p>
<p style="padding-left: 2em;">$\vdots$</p>
<p style="padding-left: 2em;">$x_D^{(i)} \sim p(X_D = x_D | X_1 = x_1^{(i)}, X_2=x_2^{(i)},\ldots,X_{D-1} = x_{D-1}^{(i)})$</p>

**end for**  

The theory of MCMC guarantees, under mild regularity conditions, that the stationary distribution of the samples generated under Algorithm 1 is the target joint posterior we are interested in.
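
To make Algorithm 1 concrete, here is a minimal sketch for a toy target that is not part of the original example: a zero-mean bivariate normal with unit variances and correlation `rho`. For this target the conditionals are known in closed form, $x_1 \mid x_2 \sim \mathcal{N}(\rho x_2, 1-\rho^2)$ and symmetrically for $x_2$. The function name and the burn-in length are assumptions made for illustration.

```python
import math
import random


def gibbs_bivariate_normal(rho, n_iter, seed=0):
    """Gibbs sampler for a zero-mean, unit-variance bivariate normal."""
    rng = random.Random(seed)
    cond_sd = math.sqrt(1.0 - rho ** 2)  # standard deviation of each conditional
    x1, x2 = 0.0, 0.0                    # arbitrary starting point
    draws = []
    for _ in range(n_iter):
        # Sample each variable from its conditional, using the newest value of the other.
        x1 = rng.gauss(rho * x2, cond_sd)
        x2 = rng.gauss(rho * x1, cond_sd)
        draws.append((x1, x2))
    return draws


draws = gibbs_bivariate_normal(rho=0.8, n_iter=20_000)
kept = draws[1_000:]  # discard an assumed burn-in of 1000 sweeps

mean_x1 = sum(d[0] for d in kept) / len(kept)
# With zero means and unit variances, E[x1 * x2] equals the correlation rho.
corr_estimate = sum(d[0] * d[1] for d in kept) / len(kept)
```

Note that the update for `x2` conditions on the freshly sampled `x1`, which mirrors the superscripts $(i)$ and $(i-1)$ in Algorithm 1.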

## Example: Gibbs sampling for Bayesian linear regression
This example comes from the [blog article by Kieran Campbell](https://kieranrcampbell.github.io/blog/2016/05/15/gibbs-sampling-bayesian-linear-regression.html). It's always good to do it yourself to make sure you fully understand what it's all about.

### Bayesian linear regression
We are interested in Gibbs sampling for normal linear regression with one independent variable. We assume we have paired data $(y_i, x_i), i=1,\ldots,N$. We wish to find the posterior distribution of the coefficients $\beta_0$ (the intercept), $\beta_1$ (the gradient) and of the precision $\tau$, which is the reciprocal of the variance. The model can be written as

$$
y_i\sim \mathcal{N}(\beta_0 + \beta_1x_i, 1/\tau),
$$

or equivalently
$$
y_i = \beta_0 + \beta_1 x_i + \epsilon_i, \quad \epsilon_i\sim\mathcal{N}(0, 1/\tau).
$$
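
A quick way to get a feel for this model is to simulate data from it. The sketch below assumes the $x_i$ are drawn uniformly from $[-1, 1]$, which the model itself does not specify; `simulate_data` and the parameter values are hypothetical choices for illustration.

```python
import math
import random


def simulate_data(beta_0, beta_1, tau, n, seed=0):
    """Draw n pairs (x_i, y_i) with y_i = beta_0 + beta_1 * x_i + noise."""
    rng = random.Random(seed)
    noise_sd = 1.0 / math.sqrt(tau)  # precision tau corresponds to variance 1 / tau
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]  # assumed design for x
    ys = [beta_0 + beta_1 * x + rng.gauss(0.0, noise_sd) for x in xs]
    return xs, ys


xs, ys = simulate_data(beta_0=-1.0, beta_1=2.0, tau=4.0, n=5_000)

# The residuals around the true line should have variance close to 1 / tau = 0.25.
residuals = [y - (-1.0 + 2.0 * x) for x, y in zip(xs, ys)]
residual_variance = sum(r * r for r in residuals) / len(residuals)
```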

The likelihood for this model may be written as the product over $N$ iid observations
$$
L(y_1,\ldots,y_N,x_1,\ldots,x_N|\beta_0, \beta_1, \tau) = \prod_{i=1}^N\mathcal{N}(y_i | \beta_0 + \beta_1 x_i, 1/\tau).
$$
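
In practice it is usually easier to work with the logarithm of this product. Writing out the normal density gives $\log L = \frac{N}{2}\log\tau - \frac{N}{2}\log(2\pi) - \frac{\tau}{2}\sum_{i=1}^N (y_i - \beta_0 - \beta_1 x_i)^2$. A minimal sketch of that formula follows; the function name and the toy data are assumptions for illustration.

```python
import math


def log_likelihood(ys, xs, beta_0, beta_1, tau):
    """Log of the product of N(y_i | beta_0 + beta_1 * x_i, 1 / tau) densities."""
    n = len(ys)
    sum_sq_resid = sum((y - beta_0 - beta_1 * x) ** 2 for x, y in zip(xs, ys))
    return (0.5 * n * math.log(tau)
            - 0.5 * n * math.log(2.0 * math.pi)
            - 0.5 * tau * sum_sq_resid)


# Toy data lying exactly on the line y = 1 + 2x.
toy_xs = [0.0, 1.0, 2.0]
toy_ys = [1.0, 3.0, 5.0]

ll_true_line = log_likelihood(toy_ys, toy_xs, beta_0=1.0, beta_1=2.0, tau=1.0)
ll_wrong_line = log_likelihood(toy_ys, toy_xs, beta_0=0.0, beta_1=0.0, tau=1.0)
```

With zero residuals the squared-error term vanishes, so `ll_true_line` reduces to $-\frac{3}{2}\log(2\pi)$, and any other line at the same precision scores lower.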

We also wish to place **conjugate priors** on $\beta_0$, $\beta_1$ and $\tau$; the reasons are discussed in the article on the Dirichlet distribution. For these we choose:<br>
$$\beta_0 \sim \mathcal{N}(\mu_0, 1/\tau_0)$$<br>
$$\beta_1 \sim \mathcal{N}(\mu_1, 1/\tau_1)$$<br>
$$\tau\sim \mathrm{Gamma}(\alpha, \beta)$$<br>
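
To see what these priors imply, one can draw parameter values from them. The sketch below assumes that $\mathrm{Gamma}(\alpha, \beta)$ uses the shape/rate convention (density proportional to $\tau^{\alpha-1}e^{-\beta\tau}$, mean $\alpha/\beta$); the text above does not state the convention, so treat this as an assumption. The hyperparameter values and the name `sample_from_priors` are likewise chosen only for illustration.

```python
import math
import random


def sample_from_priors(mu_0, tau_0, mu_1, tau_1, alpha, beta, rng):
    """Draw one (beta_0, beta_1, tau) triple from the priors."""
    beta_0 = rng.gauss(mu_0, 1.0 / math.sqrt(tau_0))
    beta_1 = rng.gauss(mu_1, 1.0 / math.sqrt(tau_1))
    # random.gammavariate takes (shape, scale); under the assumed rate
    # convention the scale is 1 / beta.
    tau = rng.gammavariate(alpha, 1.0 / beta)
    return beta_0, beta_1, tau


rng = random.Random(0)
prior_draws = [
    sample_from_priors(mu_0=0.0, tau_0=1.0, mu_1=0.0, tau_1=1.0,
                       alpha=2.0, beta=4.0, rng=rng)
    for _ in range(20_000)
]

# Under the rate convention the prior mean of tau is alpha / beta = 0.5.
mean_tau = sum(d[2] for d in prior_draws) / len(prior_draws)
```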

### Gibbs sampling
Suppose we have two parameters $\theta_1$ and $\theta_2$ and some data $x$. Our goal is to find the posterior distribution of 