# Project: Stochastic Gradient Hamiltonian Monte Carlo

In this project we are going to implement standard HMC, HMC with MH, Naive SGHMC and SGHMC with Friction.

# Standard HMC

### Basic settings

Suppose we want to sample from the posterior distribution
$$p(\theta|D) \propto \exp(-U(\theta))$$
where $D = \{x_1, \dots, x_n\}$ is a set of independent observations and $U$ is the potential energy function:
$$U(\theta) = -\sum_{x_i \in D}\log p(x_i|\theta)- \log p(\theta)$$
We sample from the joint distribution $$\pi(\theta, r) \propto \exp(-U(\theta)-\frac{1}{2}r^TM^{-1}r)$$
where $r$ is the vector of auxiliary momentum variables and $M$ is the mass matrix; together they define the kinetic energy $\frac{1}{2}r^TM^{-1}r$.
We then discard $r$ and keep $\theta$.
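
To make the definition of $U$ concrete, here is a minimal sketch for a toy model that is not part of the project itself: $x_i \sim N(\theta, 1)$ with prior $\theta \sim N(0, 10^2)$. The function name `potential_energy`, the simulated `data` and the prior scale are assumptions chosen only for illustration.

```python
import numpy as np

def potential_energy(theta, data, prior_std=10.0):
    """U(theta) = -sum_i log p(x_i | theta) - log p(theta) for the assumed toy model."""
    # log-likelihood of x_i ~ N(theta, 1), summed over all observations
    log_lik = np.sum(-0.5 * (data - theta) ** 2 - 0.5 * np.log(2 * np.pi))
    # log-density of the assumed prior theta ~ N(0, prior_std^2)
    log_prior = -0.5 * (theta / prior_std) ** 2 - np.log(prior_std) - 0.5 * np.log(2 * np.pi)
    return -(log_lik + log_prior)

rng = np.random.default_rng(0)
data = rng.normal(2.0, 1.0, size=50)

# U is smallest at the posterior mode, which for this model lies close to the sample mean
grid = np.linspace(-5, 5, 1001)
U_grid = np.array([potential_energy(t, data) for t in grid])
theta_mode = grid[np.argmin(U_grid)]
print(theta_mode, data.mean())
```

Low values of $U$ correspond to high posterior density, which is why $U$ plays the role of a potential energy.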

The Hamiltonian function is:
$$H(\theta, r)=U(\theta)+\frac{1}{2}r^TM^{-1}r$$
The Hamiltonian dynamics are:
$$d\theta = M^{-1}r dt\\ dr=-\triangledown U(\theta) dt$$
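
In practice the dynamics are discretized. The sketch below uses the leapfrog scheme with $M = 1$ and an assumed univariate double-well potential $U(\theta)=-2\theta^2+\theta^4$. The helper name `leapfrog` and the starting values are illustrative assumptions. It shows that $H$ is approximately conserved along a simulated trajectory when $\epsilon$ is small.

```python
def U(theta):
    return -2 * theta**2 + theta**4

def grad_U(theta):
    return -4 * theta + 4 * theta**3

def leapfrog(theta, r, epsilon, n_steps):
    """Leapfrog integration of d(theta) = r dt, dr = -grad_U(theta) dt with M = 1."""
    r = r - 0.5 * epsilon * grad_U(theta)      # initial half step for the momentum
    for step in range(n_steps):
        theta = theta + epsilon * r            # full step for the position
        if step < n_steps - 1:
            r = r - epsilon * grad_U(theta)    # full step for the momentum
    r = r - 0.5 * epsilon * grad_U(theta)      # final half step for the momentum
    return theta, r

theta0, r0 = 1.0, 0.5
H0 = U(theta0) + 0.5 * r0**2
theta1, r1 = leapfrog(theta0, r0, epsilon=0.01, n_steps=100)
H1 = U(theta1) + 0.5 * r1**2
print(H0, H1)  # the two energies should agree closely for small epsilon
```

The small remaining energy error is what the MH correction in HMC accounts for.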

### Implement HMC

We use a univariate $\theta$ for illustration. Suppose $U(\theta)=-2\theta^2+\theta^4$. Based on $U(\theta)$, we need to define functions $\triangledown U, H$.

In [3]:
import numpy as np
import matplotlib.pyplot as plt

In [7]:
# need to be set manually
def U(theta):
    '''compute U'''
    
    return -2*theta**2 + theta**4

In [8]:
# need to be set manually
def grad_U(theta):
    '''compute gradient of U'''
    
    return -4*theta+4*theta**3
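
Since `grad_U` has to be written by hand for each new `U`, a quick finite-difference comparison can catch sign or coefficient mistakes. This is a small sketch; `numerical_grad` and the test grid are hypothetical helpers, not part of the project code.

```python
import numpy as np

def U(theta):
    return -2 * theta**2 + theta**4

def grad_U(theta):
    return -4 * theta + 4 * theta**3

def numerical_grad(f, theta, h=1e-5):
    """Central finite-difference approximation of df/dtheta."""
    return (f(theta + h) - f(theta - h)) / (2 * h)

# compare the analytic and numerical gradients on a few test points
thetas = np.linspace(-2, 2, 9)
max_err = np.max(np.abs(grad_U(thetas) - numerical_grad(U, thetas)))
print(max_err)  # should be tiny if grad_U matches U
```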

In [9]:
# need to be set manually
def set_M(theta):
    '''initialize M as identity with reasonable dimensions based on theta, which is a np.array'''
    
    return np.eye(theta.shape[-1])

In [10]:
# need to be set manually
def H(theta, r, M):
    '''compute Hamiltonian function'''
    assert M.shape[0]==M.shape[1], 'M is not a square matrix'
    
    return U(theta) + 1/2* r.T @ np.linalg.inv(M) @ r

In [11]:
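def std_HMC(theta0, epsilon, nmc, max_iteration):
    '''
    implement standard HMC with leapfrog steps and an MH correction
    theta0: np.array, initial value
    epsilon: float, step size
    nmc: int, number of samples to draw
    max_iteration: int, number of leapfrog steps per proposal
    '''
    
    theta_cur = np.asarray(theta0, dtype=float).reshape(-1,1)
    theta_post = []
    M = set_M(theta0)
    M_inv = np.linalg.inv(M)
    m = M.shape[0] # number of parameters
    
    for i in range(nmc):
        # resample momentum r ~ N(0, M); ri keeps the initial momentum for the MH step
        r = ri = np.random.multivariate_normal(np.zeros(m), M).reshape(-1,1)
        
        # start the trajectory from the current state of the chain
        theta = theta_cur
        
        # simulate discretization of Hamiltonian dynamics (leapfrog)
        r = r - epsilon * grad_U(theta)/2
        for j in range(max_iteration):
            theta = theta + epsilon * M_inv @ r
            if j < max_iteration - 1:
                r = r - epsilon * grad_U(theta)
        r = r - epsilon * grad_U(theta)/2
        
        # MH correction: accept with probability min(1, exp(H_old - H_new))
        u = np.random.rand()
        ro = np.exp(H(theta_cur, ri, M) - H(theta, r, M)).item()
        if u < min(1,ro):
            theta_cur = theta
        
        # on rejection the current state is recorded again
        theta_post.append(theta_cur[0,0])
            
    return theta_post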

In [12]:
x = std_HMC(np.array([1]), 0.01, 1000, 100)
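
To judge whether the samples look right, it helps to have reference values for the target $p(\theta) \propto \exp(-U(\theta))$. The sketch below approximates the normalizing constant and the first two moments with a Riemann sum. The grid range and resolution are assumptions that work for this particular $U$.

```python
import numpy as np

def U(theta):
    return -2 * theta**2 + theta**4

# grid wide enough that exp(-U) is negligible at both ends
grid = np.linspace(-4, 4, 8001)
dx = grid[1] - grid[0]
unnorm = np.exp(-U(grid))
Z = np.sum(unnorm) * dx                        # normalizing constant
density = unnorm / Z                           # target density on the grid
mean_ref = np.sum(grid * density) * dx         # zero by symmetry of U
second_moment_ref = np.sum(grid**2 * density) * dx

print(mean_ref, second_moment_ref)
# With the samples x from std_HMC above, these can be compared against
# np.mean(x) and np.mean(np.square(x)), or `density` can be drawn over
# plt.hist(x, density=True).
```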

# Stochastic Gradient HMC with Naive Approach

The naive approach plugs the minibatch estimator $\triangledown \tilde{U}(\theta) = -\frac{|D|}{|\tilde{D}|}\sum_{x \in \tilde{D}} \triangledown \log p(x|\theta) - \triangledown \log p(\theta)$, computed on a minibatch $\tilde{D} \subset D$, into the HMC updates in place of $\triangledown U(\theta)$. $\triangledown \tilde{U}(\theta)$ is cheaper to compute, but $\pi(\theta, r)$ is then no longer an invariant distribution of the resulting dynamics.
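
The rescaling by $|D|/|\tilde{D}|$ is what makes the minibatch gradient an unbiased estimate of the full gradient. The sketch below checks this on an assumed toy model, $x_i \sim N(\theta, 1)$ with prior $\theta \sim N(0, 10^2)$. All function names and the data are hypothetical and serve only as illustration.

```python
import numpy as np

def grad_log_lik(theta, x):
    """Gradient of sum_i log N(x_i | theta, 1) with respect to theta."""
    return np.sum(x - theta)

def grad_log_prior(theta, prior_std=10.0):
    """Gradient of log N(theta | 0, prior_std^2) with respect to theta."""
    return -theta / prior_std**2

def grad_U_full(theta, data):
    return -grad_log_lik(theta, data) - grad_log_prior(theta)

def grad_U_minibatch(theta, batch, n):
    # rescale the minibatch likelihood gradient by |D| / |D~|
    return -(n / len(batch)) * grad_log_lik(theta, batch) - grad_log_prior(theta)

rng = np.random.default_rng(1)
data = rng.normal(2.0, 1.0, size=1000)
theta = 0.5

# averaging the estimator over a partition into equal-sized minibatches
# recovers the full-data gradient, while single estimates scatter around it
batches = np.array_split(rng.permutation(data), 20)
estimates = np.array([grad_U_minibatch(theta, b, len(data)) for b in batches])
print(grad_U_full(theta, data), estimates.mean(), estimates.std())
```

The spread of the individual estimates is the stochastic gradient noise that the naive sampler injects into the momentum.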

The Hamiltonian dynamics with the noisy gradient are:
$$d\theta = M^{-1}r dt\\ dr=-\triangledown U(\theta) dt + N(0,2B(\theta) dt)$$
where $B(\theta) = \frac{1}{2}\epsilon V(\theta)$ and $V(\theta)$ is the covariance of the stochastic gradient noise.

# need to find covariance of the stochastic gradient noise

Since $\epsilon$ is small, $B = \frac{1}{2}\epsilon V$ is small whatever $V$ is, so we simply take $V$ to be the identity.

### Implement Naive SGHMC
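
A minimal sketch of what a naive SGHMC sampler could look like for a univariate $\theta$ with $M = 1$. It is not a finished implementation: the minibatch gradient is imitated by adding Gaussian noise with an assumed standard deviation `noise_std` to the exact gradient of the double-well potential $U(\theta)=-2\theta^2+\theta^4$, and there is no MH correction. The function name and all parameter values are assumptions.

```python
import numpy as np

def grad_U(theta):
    return -4 * theta + 4 * theta**3

def naive_sghmc(theta0, epsilon, n_samples, n_steps, noise_std, rng):
    """Naive SGHMC for a univariate theta with M = 1 and no MH correction."""
    theta = float(theta0)
    samples = np.empty(n_samples)
    for i in range(n_samples):
        r = rng.normal()                       # resample momentum r ~ N(0, 1)
        for _ in range(n_steps):
            theta = theta + epsilon * r
            # stand-in for a minibatch gradient: exact gradient plus Gaussian noise
            noisy_grad = grad_U(theta) + noise_std * rng.normal()
            r = r - epsilon * noisy_grad
        samples[i] = theta
    return samples

rng = np.random.default_rng(2)
samples = naive_sghmc(theta0=1.0, epsilon=0.05, n_samples=500,
                      n_steps=20, noise_std=2.0, rng=rng)
print(samples.mean(), samples.var())
```

Because nothing removes the energy that the gradient noise adds to the momentum, the samples are not expected to follow the target exactly.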

# SGHMC with Friction

Add a friction term to the momentum update:
$$
d\theta = M^{-1}r dt\\
dr = - \triangledown U(\theta)dt - BM^{-1}rdt+N(0,2Bdt)
$$
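
The following sketch discretizes these dynamics for a univariate $\theta$ with $M = 1$. Since $B$ is unknown in practice, it assumes a user-chosen friction constant `C` and a noise estimate `B_hat`, and injects extra noise with variance $2(C-\hat{B})\epsilon$. As before, the minibatch gradient is imitated by the exact gradient of $U(\theta)=-2\theta^2+\theta^4$ plus Gaussian noise. All names and parameter values are assumptions.

```python
import numpy as np

def grad_U(theta):
    return -4 * theta + 4 * theta**3

def sghmc_friction(theta0, epsilon, n_samples, n_steps, noise_std, C, rng):
    """SGHMC with friction for a univariate theta with M = 1."""
    B_hat = 0.5 * epsilon * noise_std**2       # B = epsilon * V / 2 with V = noise_std^2
    if C < B_hat:
        raise ValueError("friction C must be at least the noise estimate B_hat")
    theta = float(theta0)
    samples = np.empty(n_samples)
    for i in range(n_samples):
        r = rng.normal()                       # resample momentum r ~ N(0, 1)
        for _ in range(n_steps):
            theta = theta + epsilon * r
            # stand-in for a minibatch gradient: exact gradient plus Gaussian noise
            noisy_grad = grad_U(theta) + noise_std * rng.normal()
            # injected noise tops the gradient noise up to total variance 2 * C * epsilon
            injected = np.sqrt(2 * (C - B_hat) * epsilon) * rng.normal()
            r = r - epsilon * noisy_grad - epsilon * C * r + injected
        samples[i] = theta
    return samples

rng = np.random.default_rng(4)
samples = sghmc_friction(theta0=1.0, epsilon=0.05, n_samples=500,
                         n_steps=20, noise_std=2.0, C=1.0, rng=rng)
print(samples.mean(), samples.var())
```

The friction term $-\epsilon C r$ removes the energy that the noise adds, which is the purpose of this variant.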

# new proposal for SGHMC with friction

In [15]:
import scipy.stats as sta

In [21]:
# set up target distribution; normal in this example
def p_x_theta(theta, x):
    '''
    density of target distribution
    theta = [mean,std]
    '''
    
    N = sta.norm(theta[0], theta[1])
    
    return N.pdf(x)

In [None]:
# set up prior for theta
def grad_prior_theta(theta):
    '''prior'''
    
    sta.invgamma
    ...
    

In [None]:
def grad_log_p_x_theta(D_, theta):
    
    ...
    return logp

In [None]:
# set up gradient of U with respect to batch D_
def grad_U(theta, D_, n):
    '''
    gradient of U(theta)
    n is the total number of sample
    D_ is a minibatch
    '''
    
    return -n/D_.shape[0]*grad_log_p_x_theta(D_, theta)-grad_prior_theta(theta)


In [None]:
# minibatch function with m batches
def minib(x, m):
    '''
    create m minibatches of x
    x: data
    m: number of minibatches
    '''
    
    # shuffle a copy so that the caller's data is left untouched
    x = np.random.permutation(x)
        
    return np.array_split(x, m)

In [1]:
def theta_update(theta, r, M, epsilon):
    '''update theta in HMC'''
    
    # matrix product, not elementwise product
    return theta + epsilon * np.linalg.inv(M) @ r

In [2]:
def r_update(theta ,r ,M ,gU, epsilon):
    '''update r with friction C and noise estimate B'''
    m = theta.shape[0]
    C = np.eye(m) # take C as identity
    B = 1/2*epsilon*C # take B from identity V, since small epsilon will discount any effects
    
    return r - epsilon * gU - epsilon * C @ np.linalg.inv(M) @ r + np.random.multivariate_normal(np.zeros(M.shape[0]), 2*epsilon*(C-B)).reshape(-1,1)

In [None]:
def SGHMC_friction(X, theta0, m, M, epsilon = 1e-6, nvm=10000):
    '''
    SGHMC with friction
    X: data, np.array
    theta0: initial value, np.array
    m: number of minibatches
    M: mass matrix
    epsilon: step size
    nvm: number of samples to draw
    '''
    
    d = M.shape[0] # number of parameters (kept separate from m, the number of minibatches)
    theta_post = []
    n = X.shape[0]
    theta = np.asarray(theta0, dtype=float).reshape(-1,1)
    
    for i in range(nvm):
        # resample momentum r ~ N(0, M); theta continues from the previous iteration
        r = np.random.multivariate_normal(np.zeros(d), M).reshape(-1,1)
        
        # create minibatches
        b = minib(X,m)
        for j in range(m):
            gU = grad_U(theta, b[j], n) 
            
            theta = theta_update(theta, r, M, epsilon)
            r = r_update(theta ,r ,M ,gU, epsilon)
            
        theta_post.append(theta)
    
    return theta_post

## take simulated data from normal distribution as example

Simulate 1000 samples from $N(\mu=10, \sigma^2=100)$. We want to sample $\mu$ from the posterior $p(\mu|D)$.

In [9]:
x = np.random.normal(10,10,1000)
x

array([ 20.29704212,  21.90377859,   9.57601283,  30.19524866,
        14.11045512,  -1.79493673,  12.22342556,  -1.11721191,
        12.43547765,   8.81828118,  12.06524645,  20.6070975 ,
         0.3034241 ,  -7.405586  ,  13.61739291,   4.55759025,
         6.63839261,  13.28482266,  -9.91611328,   6.14617503,
        29.77266808,  18.25810978,  19.06046964,  -4.7176121 ,
        14.93352019,  19.95076576,   9.66597537,  14.24875865,
        16.01997658,  25.55712736,   2.00237001,  21.343053  ,
        10.72666793,   3.56950283,   1.01577635,  10.75743257,
         8.04070335,  10.11851344,  -1.29268582,   9.93414019,
         0.76541706,   2.79875012,  25.98384785,   5.67147555,
         5.99845313,  13.08505065,   9.07359208,  15.83085938,
        -2.89556118,  21.15128032,  17.1281213 ,   1.3655421 ,
        10.37169846, -26.241042  ,  13.60151245,  25.40131969,
        13.10149505,  -4.82118948,  13.77092487,  21.44663668,
        13.89909725,  12.32491133,  21.82905323,  19.76

10.096799939104047
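
For this example a closed-form reference is available if we assume that $\sigma = 10$ is known and put a normal prior on $\mu$. Neither assumption is fixed by the project, and the prior $N(0, 100^2)$ and the helper `normal_mean_posterior` below are hypothetical. The posterior of $\mu$ is then normal, and its mean and standard deviation can be compared with the SGHMC samples.

```python
import numpy as np

def normal_mean_posterior(x, sigma, prior_mean, prior_std):
    """Posterior N(mean, std^2) of mu for x_i ~ N(mu, sigma^2), mu ~ N(prior_mean, prior_std^2)."""
    n = x.shape[0]
    precision = n / sigma**2 + 1 / prior_std**2
    mean = (x.sum() / sigma**2 + prior_mean / prior_std**2) / precision
    return mean, np.sqrt(1 / precision)

rng = np.random.default_rng(3)
x_sim = rng.normal(10, 10, 1000)               # same setup as above, with a fixed seed
post_mean, post_std = normal_mean_posterior(x_sim, sigma=10.0,
                                            prior_mean=0.0, prior_std=100.0)
print(post_mean, post_std)
```

With a prior this wide the posterior mean is essentially the sample mean, and the posterior standard deviation is close to $\sigma/\sqrt{n}$.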