# CS486 - Artificial Intelligence
## Lesson 22 - Markov Models

Today we will discuss how to build a simple Markov model that approximates the probability distribution of a set of random variables. 

## Bayes' Rule

Bayes' rule is often described as the most important equation in AI:  

> The essence of the Bayesian approach is to provide a mathematical rule explaining how you should change your existing beliefs in the light of new evidence. In other words, it allows scientists to combine new data with their existing knowledge or expertise. 

$$P(x|y) = \frac{P(y|x)}{P(y)}P(x)$$



Example: Suppose we have two coins, one of them fair, and pick one at random. Before seeing any evidence, the probability of having picked the fair coin is 1/2. The question, however, is: what is the probability of having picked the fair coin *given that* the coin came up heads? That probability depends on the evidence we have (the coin came up heads).

```python
from IPython.display import YouTubeVideo

# Embed the lecture video; extra keyword arguments become URL parameters.
YouTubeVideo('Zxm4Xxvzohk', rel=0, showinfo=0)
```

A helpful way to read Bayes' rule: 

```
                 likelihood * prior
posterior = ------------------------------
                  marginal likelihood
```                  

The denominator is just a normalizing constant that ensures the posterior sums to 1; it can be computed by summing the numerator over all possible values of $x$. If you skip the normalization, you can write Bayes' rule as:

$$ P(x\mid{y}) \propto P(y\mid{x})P(x) $$
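
To make the normalization step concrete, here is a minimal sketch of the coin example. The notes do not say what the second coin is, so the two-headed coin and its numbers below are an assumption made purely for illustration.

```python
# Assumed setup: one fair coin and one two-headed coin. The second coin is
# hypothetical; the example in the notes does not specify it.
prior = {"fair": 0.5, "two_headed": 0.5}              # P(x)
likelihood_heads = {"fair": 0.5, "two_headed": 1.0}   # P(y = heads | x)

# Unnormalized posterior: P(y | x) * P(x) for each coin x.
unnormalized = {coin: likelihood_heads[coin] * prior[coin] for coin in prior}

# Marginal likelihood P(y): sum the numerator over all values of x.
evidence = sum(unnormalized.values())

# Dividing by the evidence makes the posterior sum to 1.
posterior = {coin: value / evidence for coin, value in unnormalized.items()}
print(posterior)
```

Under these assumed numbers, seeing heads lowers the probability of the fair coin from 1/2 to about 1/3.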

## Independence

$$ X{\perp\!\!\!\perp}Y \iff P(X,Y) = P(X)P(Y) \iff \forall x,y:\ P(x,y) = P(x)P(y) $$

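Example: The joint distribution of two coin flips factors into the product of the distributions of the individual flips. Independence rarely holds exactly in practice, but it is a useful simplifying *modeling assumption*. 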
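
The definition can be checked directly on a small joint table. The sketch below uses illustrative numbers for two coin flips, and the helper names `marginal` and `is_independent` are made up for this example.

```python
from itertools import product

# Joint distribution of two fair, independent coin flips (illustrative values).
joint = {(a, b): 0.25 for a, b in product("HT", repeat=2)}

def marginal(joint, axis):
    """Sum the joint over the other variable to get P(X) (axis 0) or P(Y) (axis 1)."""
    result = {}
    for outcome, p in joint.items():
        result[outcome[axis]] = result.get(outcome[axis], 0.0) + p
    return result

def is_independent(joint, tol=1e-12):
    """Check P(x, y) == P(x) P(y) for every pair (x, y) in the table."""
    p_x, p_y = marginal(joint, 0), marginal(joint, 1)
    return all(abs(p - p_x[x] * p_y[y]) <= tol for (x, y), p in joint.items())

# A hypothetical dependent joint: the second flip always copies the first.
copied = {("H", "H"): 0.5, ("H", "T"): 0.0, ("T", "H"): 0.0, ("T", "T"): 0.5}

print(is_independent(joint))   # True
print(is_independent(copied))  # False
```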

## Conditional Independence 

Conditional independence is our most basic and robust form of knowledge about uncertain environments. 

A variable can be **conditionally independent** of another given some evidence. For example, cavity and toothache are dependent, but catch is conditionally independent of toothache given cavity. So

$$ P(Catch \mid{Toothache,Cavity}) = P(Catch \mid{Cavity}) $$
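
As a rough numerical check, the sketch below builds a joint distribution from the factorization $P(Cavity)\,P(Toothache \mid Cavity)\,P(Catch \mid Cavity)$ and confirms the identity above. All probabilities are assumed values chosen for illustration, and `bernoulli` and `prob` are hypothetical helpers.

```python
from itertools import product

# Illustrative numbers (assumed, not taken from the lecture).
p_cavity = {True: 0.2, False: 0.8}
p_toothache_given_cavity = {True: 0.6, False: 0.1}  # P(toothache=True | cavity)
p_catch_given_cavity = {True: 0.9, False: 0.2}      # P(catch=True | cavity)

def bernoulli(p_true, value):
    """Probability of a boolean value when P(True) = p_true."""
    return p_true if value else 1.0 - p_true

# Joint P(toothache, catch, cavity) built from the factorization
# P(cavity) * P(toothache | cavity) * P(catch | cavity).
joint = {}
for toothache, catch, cavity in product([True, False], repeat=3):
    joint[(toothache, catch, cavity)] = (
        p_cavity[cavity]
        * bernoulli(p_toothache_given_cavity[cavity], toothache)
        * bernoulli(p_catch_given_cavity[cavity], catch)
    )

NAMES = ("toothache", "catch", "cavity")

def prob(joint, **fixed):
    """Sum the joint over all entries consistent with the fixed values."""
    return sum(
        p for key, p in joint.items()
        if all(key[NAMES.index(name)] == value for name, value in fixed.items())
    )

# P(catch | toothache, cavity), computed from the joint.
lhs = prob(joint, catch=True, toothache=True, cavity=True) / prob(joint, toothache=True, cavity=True)
# P(catch | cavity), computed from the joint.
rhs = prob(joint, catch=True, cavity=True) / prob(joint, cavity=True)
print(lhs, rhs)  # the two values agree up to floating-point rounding

# Without conditioning on cavity, catch and toothache are NOT independent.
print(prob(joint, catch=True, toothache=True), prob(joint, catch=True) * prob(joint, toothache=True))
```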



## Markov Models

The value of $X$ at a given time is called a **state**. We take a sequence of variables $X_1, X_2, \dots$ and connect them in a chain, each one linked only to its predecessor. The distribution simplifies considerably if we assume that variables that are not directly connected are conditionally independent given the variable between them. This is not usually true, but we are looking for an approximation of reality. So the assumption is:

$$ X_t {\perp\!\!\!\perp} X_1,\dots,X_{t-2}\mid{X_{t-1}} $$

So we can write our distribution as:

$$ P(X_1,X_2,...,X_T) = P(X_1)\prod_{t=2}^T P(X_t|X_{t-1})$$

Second assumption: The transition model doesn't change over time (it is stationary). In other words, $P(X_t\mid{X_{t-1}})$ is the same for every $t$. 
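
Under these two assumptions, the probability of a whole state sequence is one lookup in the initial distribution followed by repeated lookups in a single transition table. A minimal sketch, using a hypothetical two-state weather chain whose numbers are purely illustrative:

```python
# Hypothetical two-state weather chain (numbers are illustrative).
initial = {"sun": 1.0, "rain": 0.0}        # P(X_1)
transition = {                             # P(X_t | X_{t-1}), same for every t
    "sun": {"sun": 0.9, "rain": 0.1},
    "rain": {"sun": 0.3, "rain": 0.7},
}

def sequence_probability(states, initial, transition):
    """P(x_1, ..., x_T) = P(x_1) * prod_{t=2}^{T} P(x_t | x_{t-1})."""
    p = initial[states[0]]
    for prev, curr in zip(states, states[1:]):
        p *= transition[prev][curr]
    return p

# 1.0 * 0.9 * 0.1 * 0.7, i.e. about 0.063
print(sequence_probability(["sun", "sun", "rain", "rain"], initial, transition))
```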

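Taking a step: To compute the marginal for the next time step, look up the transition probabilities in the table (or read them off the graph) and weight each one by the probability of the state it starts from. 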

## Mini-forward algorithm

$$ P(x_1) = \text{given} $$
$$ P(x_t) = \sum_{x_{t-1}}P(x_t\mid{x_{t-1}})P(x_{t-1})$$
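
The recurrence translates almost line for line into code. The sketch below assumes a hypothetical two-state weather chain with illustrative numbers; `forward_step` and `mini_forward` are names chosen for this example.

```python
# Hypothetical two-state weather chain (numbers are illustrative).
initial = {"sun": 1.0, "rain": 0.0}        # P(x_1), the given starting belief
transition = {                             # transition[prev][nxt] = P(nxt | prev)
    "sun": {"sun": 0.9, "rain": 0.1},
    "rain": {"sun": 0.3, "rain": 0.7},
}

def forward_step(belief, transition):
    """One update: P(x_t) = sum over x_{t-1} of P(x_t | x_{t-1}) * P(x_{t-1})."""
    states = list(transition)
    return {
        nxt: sum(transition[prev][nxt] * belief[prev] for prev in states)
        for nxt in states
    }

def mini_forward(initial, transition, steps):
    """Apply the update `steps` times, starting from P(x_1)."""
    belief = dict(initial)
    for _ in range(steps):
        belief = forward_step(belief, transition)
    return belief

print(mini_forward(initial, transition, steps=1))  # P(x_2)
print(mini_forward(initial, transition, steps=2))  # P(x_3)
```

With these assumed numbers, $P(x_2)$ is 0.9 sun and 0.1 rain, and $P(x_3)$ is roughly 0.84 sun and 0.16 rain.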

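For most chains, repeating this update converges to the **stationary distribution** $P_{\infty}$, which the transition model leaves unchanged: $P_{\infty}(x) = \sum_{x'} P(x\mid{x'})P_{\infty}(x')$. 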
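
One way to see this is to run the update until the belief stops changing and compare the results from different starting points. The sketch below again assumes a hypothetical two-state weather chain; the tolerance and step limit are arbitrary choices.

```python
# Hypothetical two-state weather chain (numbers are illustrative).
transition = {                             # transition[prev][nxt] = P(nxt | prev)
    "sun": {"sun": 0.9, "rain": 0.1},
    "rain": {"sun": 0.3, "rain": 0.7},
}

def forward_step(belief, transition):
    """P(x_t) = sum over x_{t-1} of P(x_t | x_{t-1}) * P(x_{t-1})."""
    states = list(transition)
    return {
        nxt: sum(transition[prev][nxt] * belief[prev] for prev in states)
        for nxt in states
    }

def stationary_distribution(initial, transition, tol=1e-12, max_steps=10_000):
    """Repeat the forward update until no entry changes by more than tol."""
    belief = dict(initial)
    for _ in range(max_steps):
        updated = forward_step(belief, transition)
        if max(abs(updated[s] - belief[s]) for s in belief) < tol:
            return updated
        belief = updated
    raise RuntimeError("did not converge; the chain may be periodic")

from_sun = stationary_distribution({"sun": 1.0, "rain": 0.0}, transition)
from_rain = stationary_distribution({"sun": 0.0, "rain": 1.0}, transition)
print(from_sun)
print(from_rain)
```

For this assumed chain both starting points end up at approximately 0.75 sun and 0.25 rain, so the starting belief is forgotten. A chain that alternates deterministically between states would not settle this way, which is why the function raises an error after `max_steps`.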

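PageRank is the stationary distribution of a Markov chain over web pages, in which each transition follows a link (with an occasional random jump to another page). 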