# Markov Chain Properties

##### Keywords: markov chain, MCMC, detailed balance, metropolis,  stationarity, ergodicity, transition matrix,  metropolis-hastings, irreducible

```python
%matplotlib inline
import numpy as np
from scipy import stats
from scipy.stats import norm
import matplotlib.pyplot as plt
import seaborn as sns
```



## Contents
{:.no_toc}
* 
{: toc}

## Markov chains

Markov chains are the first example of a **stochastic process** we will see in this class. The values in a Markov chain depend probabilistically on the previous values, with the defining characteristic being that a given value depends only on the immediately preceding value.

This is certainly not IID (independent and identically distributed) data, which is what we have been assuming so far and will generally assume in this course unless specified otherwise.

**Definition**: A sequence of random variables taking values in a state space is called a Markov Chain if the probability of the next step only depends on the current state.

Using the notation of transition probabilities to define the probability of going from state $x$ to state $y$ as $T(y \vert x)$, we can write the "only the most recent state counts" criterion mathematically:

$$T(x_n \vert x_{n-1}, x_{n-2}, \ldots, x_1) = T(x_n \vert x_{n-1})$$

In the rental car example, we specified $T(y \vert x)$ for each of the nine (from_state, to_state) combinations via a transition matrix. Equivalently, we wrote down a probability distribution $P(Y \vert X=i)$ for each state $i$: each row of the transition matrix was a probability distribution.
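
As a minimal sketch of what "each row is a probability distribution" means in practice, the snippet below samples a short path from a 3-state chain. The matrix `T` is a hypothetical stand-in (the actual rental car numbers are not reproduced here), and `simulate_chain` is an illustrative helper name, not something defined earlier in these notes.

```python
import numpy as np

# Hypothetical 3-state transition matrix (rows: from_state, columns: to_state).
# These numbers are illustrative only, not the ones from the rental car example.
T = np.array([
    [0.5, 0.3, 0.2],
    [0.2, 0.6, 0.2],
    [0.1, 0.4, 0.5],
])

# Each row is a probability distribution P(Y | X = i), so it must sum to 1.
assert np.allclose(T.sum(axis=1), 1.0)


def simulate_chain(T, start, n_steps, rng):
    """Sample a path in which the next state depends only on the current state."""
    states = np.empty(n_steps + 1, dtype=int)
    states[0] = start
    for n in range(1, n_steps + 1):
        # Row T[current] is the distribution over the next state.
        states[n] = rng.choice(T.shape[0], p=T[states[n - 1]])
    return states


rng = np.random.default_rng(0)
path = simulate_chain(T, start=0, n_steps=10, rng=rng)
print(path)
```

Note that the sampling step never looks further back than `states[n - 1]`, which is exactly the Markov property written above.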

## Some Jargon
We're going to define a list of properties that together force there to be a unique set of equilibrium probabilities over the states available to the Markov process. That is, these conditions ensure that, as in the rental car example, we get the same long-run probabilities after many transitions, no matter how we set the starting vector.

### Homogeneous

A chain is homogeneous if the transition probabilities are independent of the step $t$. The evolution of the Markov chain then depends only on the previous state, through a fixed transition matrix.

Inhomogeneous Markov chains might, for example, use a cycling set of transition matrices.

In the rental car example we used the same transition matrix all the time, so it was homogeneous, and we'll continue to assume homogeneity unless stated otherwise.

### Irreducible
Irreducibility requires that every state is accessible in a finite number of steps from every other state. In particular, there are no absorbing states that trap the chain. In other words, one eventually gets everywhere in the chain.

The classic counterexample here is two completely disjoint sets of states, for example rental car locations on opposite sides of an ocean. Cars in the Beijing locations may reach equilibrium, and the DC locations may do the same, but no one is driving a car from one to the other. So, depending on initial conditions, there is an equilibrium where 70% of the cars are in Beijing, spread among locations there, and 30% are spread around DC, AND there is an equilibrium where 80% of the cars are spread around DC, and so on.
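
The two-continent counterexample can be sketched numerically. The 4-location matrix below is an assumption made up for illustration (two "Beijing" states and two "DC" states with no transitions between the groups); the point is only that different starting vectors end up at different equilibria.

```python
import numpy as np

# Hypothetical chain: states 0, 1 are "Beijing" locations, states 2, 3 are "DC".
# The zero blocks mean no car ever crosses the ocean, so the chain is reducible.
T = np.array([
    [0.7, 0.3, 0.0, 0.0],
    [0.4, 0.6, 0.0, 0.0],
    [0.0, 0.0, 0.5, 0.5],
    [0.0, 0.0, 0.2, 0.8],
])

# Apply the transition matrix many times.
T_long = np.linalg.matrix_power(T, 200)

start_a = np.array([0.7, 0.0, 0.3, 0.0])  # 70% of the cars start in Beijing
start_b = np.array([0.2, 0.0, 0.8, 0.0])  # 80% of the cars start in DC

long_run_a = start_a @ T_long
long_run_b = start_b @ T_long

print(long_run_a)
print(long_run_b)

# Both long-run vectors are equilibria, yet they differ from each other.
print(np.allclose(long_run_a @ T, long_run_a))
print(np.allclose(long_run_b @ T, long_run_b))
```

The share of cars on each continent is frozen at its initial value, which is why there is a whole family of equilibria rather than a unique one.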

### Recurrent
States that the chain keeps returning to are recurrent: positive recurrent if the expected time to return is finite, and null recurrent otherwise. A chain is Harris recurrent if all states are visited infinitely often as $t \to \infty$.

If there are a finite number of states, recurrence comes for free from irreducibility.

### Aperiodic
There are no deterministic loops: the chain is not forced to return to a state only at multiples of some fixed period greater than 1.

The classic counterexample here is a cycle: all the stuff at state 1 goes to state 2 and so on, and all the stuff in state $N$ goes to state 1. Although splitting the probability evenly over all nodes is an equilibrium, it's impossible to converge there from any other starting point. If we start with states 1 and 3 having probability 1/2 each, those 1/2s will just cycle around and never converge.

![](images/mchain2.png)
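
A small numerical sketch of the cycle counterexample, assuming $N = 4$ states for concreteness. States 1 and 3 in the text correspond to indices 0 and 2 in the zero-indexed code.

```python
import numpy as np

N = 4
# Deterministic cycle: state i always moves to state (i + 1) % N.
T = np.roll(np.eye(N), 1, axis=1)

# The uniform distribution is an equilibrium of the cycle.
uniform = np.full(N, 1.0 / N)
print(np.allclose(uniform @ T, uniform))

# Start with probability 1/2 on each of states 1 and 3 (indices 0 and 2).
p0 = np.array([0.5, 0.0, 0.5, 0.0])
p = p0.copy()
history = [p.copy()]
for step in range(N):
    p = p @ T  # one transition just shifts the probability mass along the cycle
    history.append(p.copy())
history = np.array(history)
print(history)
```

Every row of `history` is a shifted copy of the starting vector, so the chain never gets any closer to the uniform equilibrium.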

### Stationarity
'Stationary distribution' is the technical name for "the equilibrium distribution". It's a distribution in that each of the states of the chain is assigned some probability (e.g. the probability of the Ferrari being at that airport).

A stationary distribution doesn't change when multiplied by the transition matrix.

That is

$$sT = s$$

or, at the level of individual nodes, the total probability (or number of cars) at node $j$ matches the total that flows to $j$ from all nodes (including flow from $j$ to $j$). In the equation below, the left side is the total flow into $j$ and the right side is the amount that $s$ specified was at $j$ originally.

$$\sum_i s_i T_{ij} = s_j$$

[Conceptually, everything that's at $j$ will leave, some will come back if there's a $j \rightarrow j$ transition, and some new things will come in. If the new total matches the old, $j$'s total hasn't changed. If this is true for all $j$, the distribution is stationary.]
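
One way to find a stationary distribution numerically, sketched here under the assumption of a small hypothetical 3-state matrix: since $sT = s$ says that $s$ is a left eigenvector of $T$ with eigenvalue 1, we can take the corresponding eigenvector of $T^\top$ and normalize it to sum to 1.

```python
import numpy as np

# Hypothetical 3-state transition matrix (illustrative numbers only).
T = np.array([
    [0.5, 0.3, 0.2],
    [0.2, 0.6, 0.2],
    [0.1, 0.4, 0.5],
])

# Left eigenvectors of T are right eigenvectors of T transposed.
eigvals, eigvecs = np.linalg.eig(T.T)
idx = np.argmin(np.abs(eigvals - 1.0))  # pick the eigenvalue closest to 1
s = np.real(eigvecs[:, idx])
s = s / s.sum()                         # normalize into a probability vector

print(s)
print(np.allclose(s @ T, s))  # the global condition sT = s

# Node-level condition: total flow into node j equals what was at j.
for j in range(T.shape[0]):
    inflow = sum(s[i] * T[i, j] for i in range(T.shape[0]))
    print(j, np.isclose(inflow, s[j]))

# Repeatedly applying T to an arbitrary starting vector reaches the same s.
start = np.array([1.0, 0.0, 0.0])
print(np.allclose(start @ np.linalg.matrix_power(T, 100), s))
```

The eigenvector route and the "multiply many times" route agree for this matrix because all of its entries are positive, so the chain is irreducible and aperiodic.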

In the case of a continuous state space we'd re-write the above as

$$\int s(x_i) T(x_{i+1} \vert x_i) dx_i = s(x_{i+1})$$


### Ergodicity
Ergodic Markov chains have unique stationary distributions and will converge to them.

Aperiodic, irreducible, positive Harris recurrent Markov chains are ergodic; that is, in the limit of infinitely many steps, the marginal distribution of the chain converges to the same stationary distribution no matter where the chain started. This means that if we take widely spaced samples from a stationary Markov chain, we can treat them as approximately independent samples from the stationary distribution.

Further, we can estimate $E_f[g(x)]$ from a sample of the chain whose stationary distribution is $f$:

$$\int g(x) f(x) dx \approx \frac{1}{N} \sum_{t=B+1}^{B+N} g(x^*_t)$$

Here $x^*_t$ is whatever state the chain was in $t$ steps after whatever initial configuration it had, and $B$ is called the burn-in (we give the chain time to forget its initial configuration). The approximation becomes exact as $N \to \infty$, so we have this "ergodic" law of large numbers.
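
The ergodic average can be sketched on a discrete chain, where the integral becomes a sum over states. Everything specific here is an assumption chosen for illustration: the 3-state matrix `T`, the function `g(x) = x**2`, the burn-in `B = 1000`, and the sample size `N = 20000`.

```python
import numpy as np

# Hypothetical 3-state transition matrix (illustrative numbers only).
T = np.array([
    [0.5, 0.3, 0.2],
    [0.2, 0.6, 0.2],
    [0.1, 0.4, 0.5],
])


def g(x):
    """Illustrative function whose expectation we want."""
    return x ** 2


rng = np.random.default_rng(42)
B, N = 1_000, 20_000             # burn-in length and number of kept samples
cdf = np.cumsum(T, axis=1)       # row-wise CDFs for inverse-CDF sampling
u = rng.random(B + N)

x = 0                            # arbitrary initial configuration
chain = np.empty(B + N, dtype=int)
for t in range(B + N):
    # Draw the next state from row x of T; clamp to guard against round-off.
    x = min(int(np.searchsorted(cdf[x], u[t], side="right")), T.shape[0] - 1)
    chain[t] = x

# Time average of g over the post-burn-in part of the chain.
ergodic_average = g(chain[B:]).mean()

# Exact expectation under the stationary distribution, for comparison.
s = np.linalg.matrix_power(T, 100)[0]
exact = np.sum(s * g(np.arange(T.shape[0])))

print(ergodic_average, exact)
```

The two printed numbers should be close but not identical. The samples are correlated, so the time average typically needs more steps than an IID average would to reach the same accuracy.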

## Detailed Balance

Detailed balance is an overkill condition. The above definition of stationarity boiled down to "at each node, inflow matches outflow". Detailed balance is "across each edge, inflow matches outflow". That is, flow from node 3 to node 7 matches flow from node 7 to node 3.

$$s(x) T(y \vert x) = s(y) T(x \vert y)$$

If detailed balance holds for a particular distribution $s(x)$, then $s$ is stationary.

To prove this, let $x = x_i$ and $y = x_j$, and integrate both sides over $x_i$:

$$\int s(x_i) T(x_j \vert x_i) dx_i = s(x_j) \int T(x_i \vert x_j) dx_i $$

$$\int s(x_i) T(x_{j} \vert x_i) dx_i = s(x_{j})$$

We've used the fact that $T(x_i \vert x_j)$ is a probability distribution over $x_i$ and must integrate to 1 over the $x_i$.
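
For a discrete chain the same argument can be checked numerically. The sketch below uses two hypothetical examples: a small chain that satisfies detailed balance, and a deterministic 3-cycle whose uniform distribution is stationary without satisfying detailed balance. The helper names are illustrative.

```python
import numpy as np


def satisfies_detailed_balance(s, T):
    """Check s[x] * T[x, y] == s[y] * T[y, x] for every pair of states."""
    flow = s[:, None] * T        # flow[x, y] = probability flow from x to y
    return bool(np.allclose(flow, flow.T))


def is_stationary(s, T):
    """Check the node-level condition sT = s."""
    return bool(np.allclose(s @ T, s))


# Hypothetical chain in which only neighbouring states exchange probability.
T_rev = np.array([
    [0.50, 0.50, 0.00],
    [0.25, 0.50, 0.25],
    [0.00, 0.50, 0.50],
])
s_rev = np.array([0.25, 0.50, 0.25])

# Deterministic 3-cycle: 0 -> 1 -> 2 -> 0, with the uniform distribution.
T_cycle = np.roll(np.eye(3), 1, axis=1)
s_cycle = np.full(3, 1.0 / 3.0)

print(satisfies_detailed_balance(s_rev, T_rev), is_stationary(s_rev, T_rev))
print(satisfies_detailed_balance(s_cycle, T_cycle), is_stationary(s_cycle, T_cycle))
```

The first pair satisfies both conditions. The second is stationary only, which illustrates that detailed balance is sufficient for stationarity but not necessary.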

### Use
If we're building a transition matrix and can show that it obeys detailed balance with a distribution $s(x)$, we'll be certain that $s(x)$ is a stationary distribution. In fact, it's often easier to build something that follows detailed balance, even though detailed balance is much more restrictive than what a properly convergent transition matrix requires in general, because the condition can be checked one pair of states at a time.
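
As a sketch of building a transition matrix around detailed balance, the snippet below uses a Metropolis-style construction for a discrete target: propose one of the other states uniformly, accept with probability $\min(1, s(y)/s(x))$, and otherwise stay put. The target `s` and the helper name `metropolis_matrix` are assumptions made for illustration, and the construction assumes every entry of `s` is strictly positive.

```python
import numpy as np


def metropolis_matrix(s):
    """Transition matrix with a uniform symmetric proposal over the other
    states and the Metropolis acceptance rule min(1, s[y] / s[x])."""
    n = len(s)
    T = np.zeros((n, n))
    for x in range(n):
        for y in range(n):
            if x != y:
                propose = 1.0 / (n - 1)          # symmetric proposal
                accept = min(1.0, s[y] / s[x])   # Metropolis acceptance
                T[x, y] = propose * accept
        T[x, x] = 1.0 - T[x].sum()               # rejected proposals stay at x
    return T


# Hypothetical target distribution over four states.
s = np.array([0.1, 0.2, 0.3, 0.4])
T = metropolis_matrix(s)

# Detailed balance holds edge by edge ...
flow = s[:, None] * T
print(np.allclose(flow, flow.T))

# ... so s is stationary ...
print(np.allclose(s @ T, s))

# ... and for this chain, starting from a single state converges to s.
print(np.allclose(np.linalg.matrix_power(T, 200)[0], s))
```

Detailed balance holds here because $s(x)\,T(y \vert x)$ is proportional to $\min(s(x), s(y))$, which is symmetric in $x$ and $y$. Convergence is a separate property: it also relies on this particular chain being irreducible and aperiodic.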