University of Helsinki, Master's Programme in Mathematics and Statistics  
MAST32001 Computational Statistics, Autumn 2023  
Luigi Acerbi  
Based on notebook by Antti Honkela 

# Lecture 7: Bayesian inference using MCMC

Background reading: please see Chapter 7 of the "Course notes" available in Moodle.


## 1. Metropolis-Hastings sampling for the posterior distribution of a probabilistic model

This exercise will illustrate a very simple example of Bayesian modelling: estimating the mean $\mu$ of a normal distribution for some observed data $\mathcal{D} = (x_i)_{i=1}^n$ that are assumed to be independent given $\mu$.

We assume $p(x_i | \mu) = \mathcal{N}(x_i;\; \mu, \sigma^2)$, i.e. the mean of $x_i$ is $\mu$ and the variance $\sigma^2 = 2$ is assumed to be fixed.

Our goal is to compute the posterior $p(\mu | \mathcal{D})$. We can obtain this from Bayes' rule by specifying the prior $p(\mu)$. This yields
$$ p(\mu | \mathcal{D}) = \frac{p(\mathcal{D} | \mu) p(\mu)}{p(\mathcal{D})} = \frac{\left[ \prod_{i=1}^n p(x_i | \mu) \right] p(\mu)}{p(\mathcal{D})}. $$

1. Assume $p(\mu) = \mathcal{N}(\mu;\; 0, \sigma_0^2).$ Implement a Metropolis-Hastings sampler to sample from $p(\mu | \mathcal{D})$ using a normal proposal and $\sigma_0^2 = 100$. Summarise your samples by computing their mean and standard deviation. Compare your result with the sample mean of $\mathcal{D}$. *Hint*: the numerator of Bayes' rule gives the target distribution as a function of $\mu$. The Metropolis-Hastings sampler does not require the normalisation constant.
2. Compare the samples you obtained with the exact analytical solution for $p(\mu | \mathcal{D})$ (Sec. 7.1 of the course notes).
3. Repeat the experiment with $\sigma_0^2 = 1$. Can you still match the exact solution?
4. Try the sampler with $p(\mu) = \mathrm{Laplace}(\mu;\; 0, b_0) = \frac{1}{2b_0} \exp(-|\mu|/b_0)$ with $b_0=10$. Summarise your samples by computing their mean and standard deviation and compare with the above cases. Note that in this case there is no easily available exact solution.
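
For the normal prior used in items 1 to 3, the exact posterior is again normal by conjugacy, with variance $\sigma_n^2 = (1/\sigma_0^2 + n/\sigma^2)^{-1}$ and mean $\mu_n = \sigma_n^2 \sum_i x_i / \sigma^2$. The following is a minimal sketch of that standard result, meant as a reference to compare samples against; the helper name `exact_posterior` and its default arguments are illustrative assumptions, not part of the course material.

```python
import numpy as np

def exact_posterior(x, sigma2=2.0, sigma02=100.0):
    """Mean and variance of p(mu | D) for a N(0, sigma02) prior on mu
    and a N(mu, sigma2) likelihood with known variance sigma2."""
    x = np.asarray(x, dtype=float).ravel()
    n = x.size
    # Posterior precision is prior precision plus n times the data precision
    post_var = 1.0 / (1.0 / sigma02 + n / sigma2)
    # The prior mean is zero, so only the data term contributes to the mean
    post_mean = post_var * x.sum() / sigma2
    return post_mean, post_var
```

With a very wide prior the posterior mean approaches the sample mean, while a narrow prior such as $\sigma_0^2 = 1$ pulls it towards zero.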

```python
import numpy as np
import numpy.random as npr
import pandas as pd

# Load the toy data set (tab-separated, no header row)
data = pd.read_csv('https://raw.githubusercontent.com/lacerbi/compstats-files/main/data/toydata.txt', sep='\t', header=None)
data = data.values  # convert the DataFrame to a NumPy array
```
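
One possible shape for the sampler in item 1 is sketched below. It is a sketch under stated assumptions rather than a reference solution: the function names `log_target` and `mh_sample`, the proposal standard deviation of 0.5, the warm-up length and the synthetic stand-in data `x_demo` are all illustrative choices. The sketch works with log-densities and drops additive constants, since only differences of the log target enter the acceptance test.

```python
import numpy as np

def log_target(mu, x, sigma2=2.0, sigma02=100.0):
    """Unnormalised log posterior: log-likelihood plus log-prior, constants dropped."""
    return -0.5 * np.sum((x - mu) ** 2) / sigma2 - 0.5 * mu ** 2 / sigma02

def mh_sample(log_target, theta0, n_iter, prop_std, rng, **kwargs):
    """Random-walk Metropolis-Hastings with a symmetric normal proposal."""
    theta = theta0
    lp = log_target(theta, **kwargs)
    samples = np.empty(n_iter)
    n_accept = 0
    for i in range(n_iter):
        theta_prop = theta + prop_std * rng.standard_normal()
        lp_prop = log_target(theta_prop, **kwargs)
        # Accept with probability min(1, pi*(theta') / pi*(theta))
        if np.log(rng.uniform()) < lp_prop - lp:
            theta, lp = theta_prop, lp_prop
            n_accept += 1
        samples[i] = theta
    return samples, n_accept / n_iter

rng = np.random.default_rng(0)
# Synthetic stand-in for the course data (assumption, for a self-contained example)
x_demo = rng.normal(1.5, np.sqrt(2.0), size=50)
samples, acc_rate = mh_sample(log_target, 0.0, 20000, 0.5, rng, x=x_demo)
samples = samples[2000:]  # discard warm-up iterations
print(samples.mean(), samples.std(), x_demo.mean(), acc_rate)
```

For the course data, passing `x=data` instead of `x_demo` should work in the same way, since the sum in `log_target` runs over all array entries. A different prior, such as the Laplace prior in item 4, only changes the last term of `log_target`.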

## 2. Metropolis-Hastings sampling of constrained parameters using transformations

In this task we will develop a Metropolis-Hastings sampler for Bayesian inference of the variance of normally distributed data. This will demonstrate the use of MH sampling for Bayesian inference as well as using transformations to enforce the positivity of the variance parameter.

The probabilistic model is as follows:
$$ p(x_i | \sigma) = \mathcal{N}(x_i ;\; 0, \sigma^2) $$
$$ p(\sigma) = \mathrm{Exponential}(\sigma ;\; 1) = e^{-\sigma}, \quad \sigma \ge 0. $$

In order to apply standard MH sampling with an unbounded proposal, we will apply a transformation to parametrise the model using $\sigma = g(\phi) = \exp(\phi)$ such that $\phi = g^{-1}(\sigma) = \log (\sigma)$. As $\sigma \in \mathbb{R}^+$, $\phi \in \mathbb{R}$.

We can derive the distribution over $\phi$ using the change of variables formula (see Sec. 7.3.3 of the course notes):
$$ p_\phi(\phi) = p_\sigma(g(\phi)) \left| \frac{\mathrm{d}g}{\mathrm{d}\phi} \right|. $$
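
As a worked instance of this formula for the model above (a sketch that uses only the definitions in this section): with $g(\phi) = \exp(\phi)$ the Jacobian term is $\mathrm{d}g / \mathrm{d}\phi = \exp(\phi) > 0$, so
$$ p_\phi(\phi) = \exp(-e^{\phi}) \, e^{\phi} = \exp(\phi - e^{\phi}). $$
Substituting $u = e^{\phi}$, so that $\mathrm{d}u = e^{\phi} \mathrm{d}\phi$, gives
$$ \int_{-\infty}^{\infty} \exp(\phi - e^{\phi}) \, \mathrm{d}\phi = \int_{0}^{\infty} e^{-u} \, \mathrm{d}u = 1, $$
which confirms that the transformed density is properly normalised.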

1. Express and plot the prior density $p(\phi)$ using $p(\sigma)$, when $\sigma = g(\phi) = \exp(\phi)$.
2. Check that your density is valid by evaluating $\int_{-\infty}^{\infty} p_\phi(\phi) \mathrm{d}\phi$.
3. Generate a data set of 10 points $x_i$ with $x_i \sim \mathcal{N}(0, 2^2)$ (zero-mean normal with variance $2^2 = 4$, i.e. standard deviation 2).
4. Implement a MH sampler to sample $\phi$ when $\log \pi^*(\phi) = \sum_{i=1}^{10} \log p(x_i | \phi) + \log p(\phi)$ using $Q(\phi' ; \phi) = \mathcal{N}(\phi' ;\; \phi, 1)$ as the proposal.
5. Plot a histogram of the obtained samples for $\sigma$. (*Hint*: remember to transform $\phi$ back to $\sigma$.)
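
The steps above could be sketched as follows. This is a minimal illustration under stated assumptions, not a reference solution: the function names `log_prior_phi` and `log_target_phi`, the random seed, the integration grid, the chain length and the warm-up length are all illustrative choices. The log-likelihood drops the constant $-\frac{1}{2}\log(2\pi)$ terms, which cancel in the acceptance ratio.

```python
import numpy as np

def log_prior_phi(phi):
    """log p_phi(phi) = phi - exp(phi): Exponential(1) prior on sigma plus log-Jacobian."""
    return phi - np.exp(phi)

def log_target_phi(phi, x):
    """Unnormalised log posterior of phi, with sigma = exp(phi)."""
    sigma = np.exp(phi)
    # Sum of log N(x_i; 0, sigma^2) over the data, constants dropped (log sigma = phi)
    log_lik = -x.size * phi - 0.5 * np.sum(x ** 2) / sigma ** 2
    return log_lik + log_prior_phi(phi)

rng = np.random.default_rng(1)
x = rng.normal(0.0, 2.0, size=10)  # 10 points with standard deviation 2

# Numerical check that the transformed prior integrates to one (Riemann sum)
grid = np.linspace(-20.0, 5.0, 200001)
prior_mass = np.sum(np.exp(log_prior_phi(grid))) * (grid[1] - grid[0])

n_iter = 20000
phi = 0.0
lp = log_target_phi(phi, x)
phis = np.empty(n_iter)
for i in range(n_iter):
    phi_prop = phi + rng.standard_normal()  # Q(phi'; phi) = N(phi'; phi, 1)
    lp_prop = log_target_phi(phi_prop, x)
    if np.log(rng.uniform()) < lp_prop - lp:
        phi, lp = phi_prop, lp_prop
    phis[i] = phi

sigmas = np.exp(phis[2000:])  # transform back to sigma after warm-up
print(prior_mass, sigmas.mean(), sigmas.std())
```

Because the proposal acts on $\phi$, every transformed sample $\sigma = \exp(\phi)$ is positive by construction. The array `sigmas` can then be passed to a histogram routine, for example Matplotlib's `plt.hist(sigmas, bins=50, density=True)`.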

## 3. MCMC with an asymmetric proposal

In this exercise we will study MCMC sampling in a discrete space, the set of integers $\{1, 2, \dots, 21\}$.

The target distribution will be the uniform distribution over the points
$$p(n) = \begin{cases} 1/21 & \text{when } 1 \le n \le 21 \\ 0 & \text{otherwise}, \end{cases}$$
and the proposal is
$$q(n'; n) = \begin{cases} 1/2 & \text{when } |n' - n| = 1 \\ 0 & \text{otherwise}. \end{cases}$$

1. Implement the sampler and test it by running it for 100000 iterations.
2. As jumps from 1 to 0 and from 21 to 22 are always rejected, we could try optimising the sampler by using an alternative proposal
$$ q'(n'; n) = \begin{cases} 1 & \text{when } (n, n') \in \{(1, 2), (21, 20)\} \\ 1/2 & \text{when } 2 \le n \le 20 \wedge |n' - n| = 1 \\ 0 & \text{otherwise}. \end{cases}$$
Implement this proposal but do not include the $\frac{q'(n;n')}{q'(n';n)}$ term in the acceptance ratio. Which distribution is the sampler now sampling from?
3. Fix the sampler by using the full acceptance rule. Can you get it to sample from the uniform distribution?

*Hint*: Note that it is fine to use $-\infty$ as $\log 0$ (use `-np.inf`). This is going to be useful for code working with log-probabilities.
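
A minimal sketch of how the log-probability approach from the hint could look for the alternative proposal $q'$ is given below. The function names `log_p`, `log_q_alt`, `propose_alt` and `run_chain`, the `hastings` flag, the starting state and the seeds are illustrative assumptions. The flag switches the proposal-ratio term of the acceptance rule on or off, so the same code covers items 2 and 3.

```python
import numpy as np

def log_p(n):
    """Log of the uniform target on {1, ..., 21}; -inf outside the support."""
    return -np.log(21.0) if 1 <= n <= 21 else -np.inf

def log_q_alt(n_new, n):
    """Log of the alternative proposal q'(n_new; n) for a current state n in {1, ..., 21}."""
    if abs(n_new - n) != 1 or not (1 <= n_new <= 21):
        return -np.inf
    return 0.0 if n in (1, 21) else np.log(0.5)

def propose_alt(n, rng):
    """Draw n' from q'(.; n): forced inward move at the ends, +1 or -1 with equal probability otherwise."""
    if n == 1:
        return 2
    if n == 21:
        return 20
    return n + (1 if rng.uniform() < 0.5 else -1)

def run_chain(n_iter, rng, hastings=True, n0=11):
    """Run the sampler and return the visit frequencies of the states 1, ..., 21."""
    n = n0
    counts = np.zeros(22, dtype=int)  # index 0 is unused
    for _ in range(n_iter):
        n_new = propose_alt(n, rng)
        log_a = log_p(n_new) - log_p(n)
        if hastings:
            # Correction term log q'(n; n') - log q'(n'; n) for the asymmetric proposal
            log_a += log_q_alt(n, n_new) - log_q_alt(n_new, n)
        if np.log(rng.uniform()) < log_a:
            n = n_new
        counts[n] += 1
    return counts[1:] / n_iter

freq_full = run_chain(100000, np.random.default_rng(2), hastings=True)
freq_naive = run_chain(100000, np.random.default_rng(2), hastings=False)
print(np.round(freq_full, 3))
print(np.round(freq_naive, 3))
```

Comparing the two frequency vectors, in particular at the end states 1 and 21, shows the effect of leaving out the proposal ratio.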

## 4. Gibbs sampling

Gibbs sampling is a popular special case of multivariate Metropolis-Hastings sampling in which the components of a vector of variables $\theta = (\theta_1, \dots, \theta_n)$ are updated cyclically, one at a time: $\theta_1, \theta_2, \dots, \theta_n, \theta_1, \theta_2, \dots$. For each update, $\theta_i$ is drawn from the conditional distribution $p(\theta_i | \theta_{\setminus i})$ given the other variables $\theta_{\setminus i}$ (i.e., all the $\theta$s except $\theta_i$). Gibbs sampling is popular because for some probabilistic models these conditional distributions are available analytically.

Implement the Gibbs sampler to draw samples from a 2-dimensional multivariate normal with zero mean $\mu = (0, 0)$ and covariance $\Sigma = \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix}$ as described at https://theclevermachine.wordpress.com/2012/11/05/mcmc-the-gibbs-sampler/

Plot the trace plots and normalised histograms of your samples. Test your sampler with different values of $\rho$. What do you observe?
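
For this target the conditionals are themselves normal: $\theta_1 | \theta_2 \sim \mathcal{N}(\rho \theta_2, 1 - \rho^2)$, and symmetrically for $\theta_2 | \theta_1$. A minimal sketch of a sampler built on these conditionals follows; the function name `gibbs_bivariate_normal`, the starting point, the seed and the chain length are illustrative assumptions.

```python
import numpy as np

def gibbs_bivariate_normal(rho, n_iter, rng, theta0=(0.0, 0.0)):
    """Gibbs sampler for N(0, [[1, rho], [rho, 1]]) using the exact conditionals."""
    cond_std = np.sqrt(1.0 - rho ** 2)
    theta = np.array(theta0, dtype=float)
    samples = np.empty((n_iter, 2))
    for i in range(n_iter):
        # theta_1 | theta_2 ~ N(rho * theta_2, 1 - rho^2)
        theta[0] = rho * theta[1] + cond_std * rng.standard_normal()
        # theta_2 | theta_1 ~ N(rho * theta_1, 1 - rho^2), using the updated theta_1
        theta[1] = rho * theta[0] + cond_std * rng.standard_normal()
        samples[i] = theta
    return samples

rng = np.random.default_rng(3)
samples = gibbs_bivariate_normal(rho=0.8, n_iter=20000, rng=rng)
print(samples.mean(axis=0), np.corrcoef(samples.T)[0, 1])
```

Each column of `samples` can be plotted against the iteration index for a trace plot. Rerunning with other values of `rho` makes it possible to compare how quickly the chain moves across the distribution.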