Estimating the mean and variance of a Gaussian distribution $N(\mu, \sigma^2)$

**Problem**: Given data $\mathcal{D} = \{y_1, \dots, y_n\}$, assumed to be generated by an unknown Gaussian distribution $N(\mu, \sigma^2)$, estimate its parameters $\mu$ and $\sigma^2$.

**Solution**:

We use a noninformative prior $$p(\mu, \sigma^2) \propto (\sigma^2)^{-1}$$

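Multiplying the prior by the Gaussian likelihood and using $\sum_{i=1}^{n}(y_i - \mu)^2 = (n-1)s^2 + n(\bar{y} - \mu)^2$, the joint posterior is $$p(\mu,\sigma^2 | y) \propto \sigma^{-n-2}\exp\left(-\frac{1}{2\sigma^2}\left[(n-1)s^2 + n(\bar{y} - \mu)^2\right]\right)$$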

where $$s^2  = \frac{1}{n-1}\sum_{i=1}^{n}(y_i - \bar{y})^2$$

is the sample variance and $\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i$ is the sample mean of the data $\mathcal{D}$.
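The sum-of-squares decomposition that puts the likelihood in this form can be checked numerically. The sketch below (the helper names `log_post_direct` and `log_post_suffstat` are hypothetical) evaluates the unnormalized log joint posterior once from the raw data and once from the sufficient statistics $\bar{y}$ and $s^2$; the two should agree up to floating-point error.

```python
import numpy as np

def log_post_direct(mu, sigma2, y):
    """Unnormalized log p(mu, sigma2 | y) computed from the raw data."""
    n = len(y)
    log_prior = -np.log(sigma2)  # p(mu, sigma2) proportional to 1/sigma2
    # Gaussian log likelihood, dropping the constant -n/2 * log(2*pi)
    log_lik = -0.5 * n * np.log(sigma2) - np.sum((y - mu) ** 2) / (2 * sigma2)
    return log_prior + log_lik

def log_post_suffstat(mu, sigma2, y):
    """The same quantity written with the sufficient statistics ybar and s^2."""
    n = len(y)
    ybar = np.mean(y)
    s2 = np.var(y, ddof=1)
    # log of sigma^(-n-2) * exp(-[(n-1)s^2 + n(ybar-mu)^2] / (2 sigma^2))
    return (-(n + 2) / 2 * np.log(sigma2)
            - ((n - 1) * s2 + n * (ybar - mu) ** 2) / (2 * sigma2))

y = np.array([93, 112, 122, 135, 122, 150, 118, 90, 124, 114])
for mu, sigma2 in [(100.0, 200.0), (118.0, 315.0), (130.0, 50.0)]:
    print(log_post_direct(mu, sigma2, y), log_post_suffstat(mu, sigma2, y))
```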

The joint posterior factorizes into the conditional posterior of the mean $$\mu|\sigma^2,y \sim N(\bar{y}, \sigma^2/n)$$

and the marginal posterior of the variance $$\sigma^2|y \sim \text{Inv-}\chi^2(n-1, s^2)$$

Note that SciPy does not implement the scaled inverse chi-squared distribution directly, so we need to implement it ourselves. Its density is:

$$\text{Inv-}\chi^2(\theta | \nu, s^2) = \frac{(\nu/2)^{\nu/2}}{\Gamma(\nu/2)}s^{\nu}\theta^{-(\nu/2 + 1)}\exp\left(-\frac{\nu s^2}{2\theta}\right)$$

This is the same distribution as $\text{Inv-gamma}(\alpha=\frac{\nu}{2}, \beta = \frac{\nu}{2}s^2)$, which SciPy does provide as `scipy.stats.invgamma` (shape `a` $=\alpha$, `scale` $=\beta$).
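A minimal sketch of the scaled inverse chi-squared distribution built on `scipy.stats.invgamma`, assuming the parameter mapping above. The helper names are hypothetical and this is not the `sinvchi2` utility module imported later; the sampler uses the standard construction $\theta = \nu s^2 / X$ with $X \sim \chi^2_\nu$.

```python
import numpy as np
from scipy import stats
from scipy.special import gammaln

def sinvchi2_logpdf(theta, nu, s2):
    """Log density of Inv-chi2(nu, s2), transcribed from the formula."""
    return (nu / 2 * np.log(nu / 2) - gammaln(nu / 2)
            + nu / 2 * np.log(s2)            # log s^nu = (nu/2) * log s^2
            - (nu / 2 + 1) * np.log(theta)
            - nu * s2 / (2 * theta))

def sinvchi2_frozen(nu, s2):
    """Inv-chi2(nu, s2) expressed as Inv-gamma(alpha=nu/2, beta=nu*s2/2)."""
    return stats.invgamma(a=nu / 2, scale=nu * s2 / 2)

def sinvchi2_rvs(nu, s2, size, rng):
    """Draw theta = nu * s2 / X with X ~ chi2(nu)."""
    return nu * s2 / rng.chisquare(nu, size=size)

nu, s2 = 9, 315.78
theta = np.linspace(50, 1500, 5)
# the hand-written density and SciPy's invgamma should coincide
print(np.allclose(sinvchi2_logpdf(theta, nu, s2), sinvchi2_frozen(nu, s2).logpdf(theta)))
draws = sinvchi2_rvs(nu, s2, size=200_000, rng=np.random.default_rng(0))
print(np.median(draws), sinvchi2_frozen(nu, s2).median())  # Monte Carlo vs exact median
```

The `chisquare` construction avoids evaluating the density at all, which is why it is a common way to sample this distribution.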

Often we are mainly interested in the mean $\mu$, so we want its marginal posterior distribution: $$p(\mu|y) = \int_{0}^{\infty}p(\mu, \sigma^2|y)\,d\sigma^2$$

In general such an integral has no closed form, and we approximate the marginal by sampling instead. In this case, however, the integral can be computed in closed form.

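The result is $$\frac{\mu - \bar{y}}{s/\sqrt{n}}\bigg|y \sim t_{n-1},$$ where $t_{n-1}$ denotes the standard Student-t density (location 0, scale 1) with $n-1$ degrees of freedom. Equivalently, $\mu|y$ follows a Student-t distribution with $n-1$ degrees of freedom, location $\bar{y}$ and scale $s/\sqrt{n}$.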
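The closed form can be checked against direct numerical integration. The sketch below (`joint_unnorm` and `marginal_unnorm` are hypothetical helpers) builds the Student-t marginal with `scipy.stats.t`, integrates the unnormalized joint posterior over $\sigma^2$ with `scipy.integrate.quad`, and compares density ratios, since the numerical integral is only known up to a constant. Dividing $\sigma^2$ by $s^2$ inside the integrand only rescales it by a constant so that `quad` works with values of order one rather than around $10^{-15}$.

```python
import numpy as np
from scipy import stats, integrate

y = np.array([93, 112, 122, 135, 122, 150, 118, 90, 124, 114])
n = len(y)
ybar = np.mean(y)
s2 = np.var(y, ddof=1)

# closed form: mu | y ~ t_{n-1} with location ybar and scale s/sqrt(n)
mu_post = stats.t(df=n - 1, loc=ybar, scale=np.sqrt(s2 / n))

def joint_unnorm(sigma2, mu):
    # sigma^(-n-2) * exp(...), up to the constant factor s2^((n+2)/2)
    return ((sigma2 / s2) ** (-(n + 2) / 2)
            * np.exp(-((n - 1) * s2 + n * (ybar - mu) ** 2) / (2 * sigma2)))

def marginal_unnorm(mu):
    # integrate the joint posterior over sigma2 on (0, inf)
    value, _ = integrate.quad(joint_unnorm, 0, np.inf, args=(mu,))
    return value

mu0, mu1 = ybar, ybar + 10.0
ratio_quad = marginal_unnorm(mu1) / marginal_unnorm(mu0)
ratio_t = mu_post.pdf(mu1) / mu_post.pdf(mu0)
print(ratio_quad, ratio_t)
print(mu_post.interval(0.95))  # central 95% posterior interval for mu
```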

In [1]:
import numpy as np
from scipy import stats

%matplotlib inline
import matplotlib.pyplot as plt

In [4]:
# add utilities directory to path
import os, sys
util_path = os.path.abspath(os.path.join(os.path.curdir, 'utilities_and_data'))
if util_path not in sys.path and os.path.exists(util_path):
    sys.path.insert(0, util_path)

# import from utilities
import sinvchi2
import plot_tools

In [5]:
# edit default plot settings
plt.rc('font', size=12)
# apply custom background plotting style
plt.style.use(plot_tools.custom_styles['gray_background'])

In [6]:
# data
y = np.array([93, 112, 122, 135, 122, 150, 118, 90, 124, 114])
# sufficient statistics
n = len(y)
s2 = np.var(y, ddof=1)  # ddof=1 divides by n-1, giving the sample variance s^2
my = np.mean(y)

In [10]:
# set random number generator with seed
rng = np.random.RandomState(seed=0)

In [11]:
# factorize the joint posterior p(mu,sigma2|y) to p(sigma2|y)p(mu|sigma2,y)
# sample from the joint posterior using this factorization

# sample from p(sigma2|y)
sigma2 = sinvchi2.rvs(n-1, s2, size=1000, random_state=rng)
# sample from p(mu|sigma2,y)
# given sigma2, mu is normal, N(my, sigma2/n); pairing each sigma2 draw with
# one mu draw yields samples from the Student-t marginal p(mu|y)
mu = my + np.sqrt(sigma2/n)*rng.randn(*sigma2.shape)
# display sigma instead of sigma2
sigma = np.sqrt(sigma2)
# sample from the predictive distribution p(ynew|y)
# for each sample of (mu, sigma)
ynew = rng.randn(*mu.shape)*sigma + mu
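As a sanity check on the factorization, the sketch below repeats the same sampling scheme with NumPy only (drawing $\sigma^2$ as $(n-1)s^2/X$ with $X \sim \chi^2_{n-1}$ in place of the `sinvchi2` utility) and compares sample quantiles of $\mu$ with the Student-t marginal. It also compares the predictive draws with a Student-t with $n-1$ degrees of freedom, location $\bar{y}$ and scale $s\sqrt{1+1/n}$, which is the standard posterior predictive under this prior; that reference distribution is an addition for checking purposes.

```python
import numpy as np
from scipy import stats

y = np.array([93, 112, 122, 135, 122, 150, 118, 90, 124, 114])
n, my, s2 = len(y), np.mean(y), np.var(y, ddof=1)
rng = np.random.default_rng(0)
S = 200_000  # number of posterior draws

# sigma2 | y ~ Inv-chi2(n-1, s2), drawn as (n-1)*s2 / chi2(n-1)
sigma2 = (n - 1) * s2 / rng.chisquare(n - 1, size=S)
# mu | sigma2, y ~ N(my, sigma2/n)
mu = my + np.sqrt(sigma2 / n) * rng.standard_normal(S)
# ynew | mu, sigma2 ~ N(mu, sigma2)
ynew = mu + np.sqrt(sigma2) * rng.standard_normal(S)

probs = [0.05, 0.5, 0.95]
mu_ref = stats.t(df=n - 1, loc=my, scale=np.sqrt(s2 / n))
ynew_ref = stats.t(df=n - 1, loc=my, scale=np.sqrt(s2 * (1 + 1 / n)))
print(np.quantile(mu, probs), mu_ref.ppf(probs))
print(np.quantile(ynew, probs), ynew_ref.ppf(probs))
```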


In [17]:
# a bare starred expression such as *sigma2.shape is a SyntaxError;
# * unpacking is valid in contexts like call arguments, e.g. rng.randn(*sigma2.shape)
sigma2.shape

(1000,)

In [19]:
rng.randn(*sigma2.shape).shape

(1000,)