# Introduction to Stan, Hamiltonian Monte Carlo, & No-U-Turn Sampling

**AY128/256 UC Berkeley**

(4/1/2019)

## Introduction to
* Stan
* Hamiltonian Monte Carlo 
* No-U-Turn Sampler (NUTS)

### Useful link:
https://chi-feng.github.io/mcmc-demo/

In [None]:
%run ../talktools.py

## What is Stan?
* Named after Stanislaw Ulam (1909 - 1984)
* co-invented Monte Carlo sampling methods
* also co-inventor of the hydrogen bomb
* it's a probabilistic programming language
* performs very efficient statistical inference, among other things (e.g., sampling, optimization)
* developed in the statistics community (i.e., by people who spend lots of time thinking about sampling, etc.)
* written in C++, with front ends in R, Python, MATLAB, etc.

## Why are we using Stan?
* sampling in high dimensions is hard
* most samplers (e.g., M-H, emcee) that are good in low dimensions are really bad in high dimensions, esp. if there are correlations
* really bad means they don't converge in a finite amount of time
* or worse, they appear to converge even though they haven't
* Example: sampling from a highly correlated 250-d normal distribution:
<img src="figures/mh_nuts_250d.png"></img>


* M-H & Gibbs (10$^6$ samples, thinned to 1000)
* NUTS: 1000 samples
* Stan has many samplers under the hood, but the one we're interested in is called NUTS
* NUTS = No-U-Turn Sampler, which is a type of Hamiltonian Monte Carlo (HMC) sampler
* NUTS and HMC are very efficient in high dimensions
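
To make the "bad in high dimensions" point concrete, here is a minimal sketch of random-walk Metropolis-Hastings. It is not the experiment behind the figure above: as simplifying assumptions, the target is an independent standard normal (no correlations), the proposal step size is fixed at 0.5, and the function name `rw_metropolis_acceptance` is made up for this illustration. With the step size held fixed, the fraction of accepted proposals should drop sharply as the dimension grows.

```python
import math
import random


def rw_metropolis_acceptance(dim, n_steps=2000, step=0.5, seed=0):
    """Acceptance rate of random-walk Metropolis on a dim-d standard normal."""
    rng = random.Random(seed)
    # Start from a draw of the target so the chain begins in the typical set.
    x = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    logp = -0.5 * sum(v * v for v in x)  # log density up to a constant
    accepted = 0
    for _ in range(n_steps):
        # Symmetric Gaussian proposal centered on the current point.
        proposal = [v + step * rng.gauss(0.0, 1.0) for v in x]
        logp_new = -0.5 * sum(v * v for v in proposal)
        # Metropolis rule: accept with probability min(1, p_new / p_old).
        # 1.0 - rng.random() lies in (0, 1], so the log is always defined.
        if math.log(1.0 - rng.random()) < logp_new - logp:
            x, logp = proposal, logp_new
            accepted += 1
    return accepted / n_steps


for dim in (1, 10, 100):
    print(dim, rw_metropolis_acceptance(dim))
```

Shrinking the step size would bring the acceptance rate back up, but then each accepted move is tiny, which is the slow-mixing problem described above.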

# What is HMC?

* Suppose we have an N-d Gaussian, where one of the dimensions in log-probability looks like this:
<img src="figures/gauss_bowl.png"></img>
* Imagine the following physics exercise: flick a particle along the surface (frictionless, of course) and watch it move, recording its position
* Sampling the particle's position is like taking samples in log-probability space
* repeat with a different particle
* "Hamiltonian" because we can write down the Hamiltonian (think physics) for each particle's motion
* Advantages: many fewer samples needed to map our target distribution than other samplers
* Disadvantages: many more calculations under the hood; needs to compute gradients (i.e., where to go next); needs to know how many steps to run for: too many and the particle can turn around and re-explore the same space, too few and the sampler doesn't move enough
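
The particle picture can be sketched in a few lines. In the usual formulation (an assumption here, since the slides do not spell it out), the position $q$ is the parameter vector, the "flick" is a random momentum $p$, and the Hamiltonian with a unit mass matrix is

$$H(q, p) = U(q) + \tfrac{1}{2} p^\top p, \qquad U(q) = -\log \pi(q).$$

The code below is a minimal illustration, not Stan's implementation. The names `leapfrog` and `hmc_sample`, the 2-d standard normal target, and the step size and step count are all choices made for this example.

```python
import math
import random


def leapfrog(q, p, grad_u, step, n_steps):
    """Move the particle along the surface with the leapfrog integrator."""
    q, p = list(q), list(p)
    g = grad_u(q)
    for _ in range(n_steps):
        p = [pi - 0.5 * step * gi for pi, gi in zip(p, g)]  # half step in momentum
        q = [qi + step * pi for qi, pi in zip(q, p)]        # full step in position
        g = grad_u(q)
        p = [pi - 0.5 * step * gi for pi, gi in zip(p, g)]  # half step in momentum
    return q, p


def hmc_sample(u, grad_u, q0, n_samples, step=0.2, n_steps=20, seed=0):
    """Basic HMC with a fixed trajectory length (step * n_steps)."""
    rng = random.Random(seed)
    q = list(q0)
    samples = []
    for _ in range(n_samples):
        p = [rng.gauss(0.0, 1.0) for _ in q]  # the "flick": fresh random momentum
        h_old = u(q) + 0.5 * sum(pi * pi for pi in p)
        q_new, p_new = leapfrog(q, p, grad_u, step, n_steps)
        h_new = u(q_new) + 0.5 * sum(pi * pi for pi in p_new)
        # Metropolis correction for the integrator's energy error.
        if math.log(1.0 - rng.random()) < h_old - h_new:
            q = q_new
        samples.append(q)
    return samples


def u(q):
    """Potential energy: negative log density of a standard normal."""
    return 0.5 * sum(qi * qi for qi in q)


def grad_u(q):
    """Gradient of the potential; for a standard normal it is q itself."""
    return list(q)


samples = hmc_sample(u, grad_u, [0.0, 0.0], n_samples=2000)
mean0 = sum(s[0] for s in samples) / len(samples)
var0 = sum((s[0] - mean0) ** 2 for s in samples) / len(samples)
print(mean0, var0)
```

Note that `step` and `n_steps` are set by hand here. Choosing the trajectory length badly is exactly the disadvantage listed above.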

# What is NUTS?

* As the name suggests, NUTS is a type of HMC sampler that prevents the particle from taking a "U-Turn"
* "flick" a particle along the probability surface, and NUTS adaptively tries to figure out where it will turn around, stops it there, and sends it in a new direction
* it does this by simulating multiple paths for the particle
* It's costly, but not as costly as simulating a full U-turn and starting again
* NUTS works natively under the hood in Stan -- so you don't have to understand it to use it
* But if you're thinking of doing research or going into industry, it's good to have a grasp of the theory behind sampling
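
The "U-turn" idea can be illustrated with a much-simplified check. The sketch below is not the full NUTS algorithm: real NUTS grows the trajectory both forward and backward in time by repeated doubling and tests both ends, whereas this version only integrates forward and stops when the displacement from the start points against the current momentum. The function name `first_u_turn` and the 1-d standard normal target are assumptions made for this example.

```python
def first_u_turn(q0, p0, grad_u, step=0.1, max_steps=1000):
    """Leapfrog forward until the particle starts heading back toward q0."""
    q, p = list(q0), list(p0)
    g = grad_u(q)
    for n in range(1, max_steps + 1):
        p = [pi - 0.5 * step * gi for pi, gi in zip(p, g)]  # half step in momentum
        q = [qi + step * pi for qi, pi in zip(q, p)]        # full step in position
        g = grad_u(q)
        p = [pi - 0.5 * step * gi for pi, gi in zip(p, g)]  # half step in momentum
        # U-turn test: (q - q0) . p < 0 means the distance from the
        # starting point has begun to shrink.
        if sum((qi - q0i) * pi for qi, q0i, pi in zip(q, q0, p)) < 0:
            return n, q
    return max_steps, q


def grad_u(q):
    """Gradient of the potential for a standard normal target."""
    return list(q)


# Start at the bottom of the bowl and flick the particle to the right.
# For this target the turn happens near t = pi/2, a quarter of a period.
n_steps, q_turn = first_u_turn([0.0], [1.0], grad_u)
print(n_steps, n_steps * 0.1, q_turn)
```

The point of the check is that the trajectory length no longer has to be set by hand: the sampler keeps going only while the particle is still moving away from where it started.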