# CS486 - Artificial Intelligence
## Lesson 21 - Probability

Today we will discuss probabilistic inference. Our goal is to learn something about *random variables* that we care about through the observation of some other set of variables. We'll explore these concepts using some AIMA helpers. 

In [2]:
import helpers
from aima.probability import *

## Distributions

A **random variable** is a variable whose possible values are outcomes of a random phenomenon. In other words, its values aren't given by a deterministic function, but by a **probability distribution**. 

Consider the example of a friend who randomly brings an umbrella with her to work. Over two weeks, she brought her umbrella on 7 days. We cannot write a function that tells us when she'll have the umbrella, but we can write the distribution $P(Umbrella)$:

| Umbrella | P |
|---|---|
| `True` | 7 | 
| `False` | 7 |

A **normalized distribution** is one whose values sum to 1. You can normalize a distribution by dividing each value by the sum of all the values. For the example above, we normalize by dividing each count by 14. AIMA's `ProbDist` class can also normalize values: 

In [26]:
umbrella = ProbDist('Umbrella', {True: 7, False: 7})
umbrella.normalize()
umbrella.show_approx()

'False: 0.5, True: 0.5'

A **joint distribution** is a distribution that describes the probabilities of outcomes involving two or more variables. For example, consider the joint distribution over whether rain was forecast, whether it is raining, and whether our friend has her umbrella:

| Forecast | Rain | Umbrella | P |
|-------|-------|-------|---|
| True  | True  | True  | 0.25 | 
| True  | True  | False | 0.09 |
| True  | False | True  | 0.14 |
| True  | False | False | 0.08 |
| False | True  | True  | 0.07 |
| False | True  | False | 0.16 |
| False | False | True  | 0.04 |
| False | False | False | 0.17 |

So have the odds of our friend carrying her umbrella changed? Well, we can compute the **marginal distribution** for `Umbrella`. It is called *marginal* because if you were to do this by hand, you would find all the entries where the `Umbrella` values were the same and sum them *in the margins*. Mathematically, a marginal distribution is: 

$$P(X=x)=\sum\limits_y P(X=x,Y=y)$$ 

Given the joint distribution above, what is the marginal distribution for `Rain`? 

We can capture the joint distribution of `Forecast`, `Rain`, and `Umbrella` using AIMA's `JointProbDist` class: 

In [13]:
joint = JointProbDist(['Forecast','Rain','Umbrella'])
joint[dict(Forecast=True,  Rain=True,  Umbrella=True)]  = 0.25
joint[dict(Forecast=True,  Rain=True,  Umbrella=False)] = 0.09
joint[dict(Forecast=True,  Rain=False, Umbrella=True)]  = 0.14
joint[dict(Forecast=True,  Rain=False, Umbrella=False)] = 0.08
joint[dict(Forecast=False, Rain=True,  Umbrella=True)]  = 0.07
joint[dict(Forecast=False, Rain=True,  Umbrella=False)] = 0.16
joint[dict(Forecast=False, Rain=False, Umbrella=True)]  = 0.04
joint[dict(Forecast=False, Rain=False, Umbrella=False)] = 0.17
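
As a rough illustration of the marginalization formula, the sketch below computes the marginal for `Umbrella` in plain Python, without the AIMA helpers. The dictionary `JOINT`, the tuple `VARIABLES`, and the function `marginal` are names made up for this example; the probabilities are the ones entered in the `JointProbDist` cell above.

```python
from collections import defaultdict

# Joint distribution keyed by (forecast, rain, umbrella).
# Values copied from the JointProbDist cell; the names here are illustrative.
VARIABLES = ('Forecast', 'Rain', 'Umbrella')
JOINT = {
    (True,  True,  True):  0.25,
    (True,  True,  False): 0.09,
    (True,  False, True):  0.14,
    (True,  False, False): 0.08,
    (False, True,  True):  0.07,
    (False, True,  False): 0.16,
    (False, False, True):  0.04,
    (False, False, False): 0.17,
}

def marginal(joint, variables, target):
    """Sum the joint over every variable except `target`."""
    index = variables.index(target)
    totals = defaultdict(float)
    for assignment, p in joint.items():
        # Entries that share the same target value are added together.
        totals[assignment[index]] += p
    return dict(totals)

umbrella_marginal = marginal(JOINT, VARIABLES, 'Umbrella')
print({value: round(p, 2) for value, p in umbrella_marginal.items()})
```

With these values both entries come out to about 0.5, so the joint distribution does not change the overall odds that our friend carries her umbrella.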

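The joint distribution lets us answer questions like, "What are the chances that our friend has her umbrella when it's raining?" The first cell below sums out the hidden variable `Umbrella` to get $P(Forecast=true, Rain=true)$, which is the sum of two entries in the table. The second cell answers the question itself by computing the conditional distribution $P(Umbrella \mid Rain=true)$: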

In [18]:
# Sum out the hidden variable Umbrella to get P(Forecast=True, Rain=True).
evidence = dict(Forecast=True,Rain=True)
hidden = ['Umbrella']
enumerate_joint(hidden, evidence, joint)

0.33999999999999997

In [25]:
# Compute the normalized conditional distribution P(Umbrella | Rain=True).
query = 'Umbrella'
evidence = dict(Rain=True)
ans = enumerate_joint_ask(query, evidence, joint)
ans[True]

0.5614035087719298

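Mathematically, we would write this as:

$$P(Umbrella=true \mid Rain=true) = \frac{P(Umbrella=true, Rain=true)}{P(Rain=true)} = \frac{0.32}{0.57} \approx 0.56$$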

* Joint and marginal 
* Conditional distribution
* Product rule, chain rule, Bayes' Rule
* Inference
* Independence

| Rain | P |
|---|---|
| `True` | 2 | 
| `False` | 5 |

Noise: the degree to which our outcomes are uncertain. 

Random variables: something about which we are uncertain. Variable names are capitalized, and the values in their domains are lower case. Binary values are generally written +/-.

Unobserved random variables have distributions. The entries of a distribution must be non-negative and sum to 1. 
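
The two requirements can be checked mechanically. The helper below is a minimal sketch; `is_valid_distribution` and its `tol` parameter are hypothetical names for this example, not part of the AIMA code.

```python
import math

def is_valid_distribution(dist, tol=1e-9):
    """Return True if every entry is non-negative and the entries sum to 1."""
    values = list(dist.values())
    non_negative = all(p >= 0 for p in values)
    # Compare with a tolerance because floating-point sums are rarely exact.
    sums_to_one = math.isclose(sum(values), 1.0, abs_tol=tol)
    return non_negative and sums_to_one

print(is_valid_distribution({True: 0.5, False: 0.5}))  # normalized umbrella distribution
print(is_valid_distribution({True: 7, False: 7}))      # raw counts, not yet normalized
```

The first call should report a valid distribution; the second should not, because the raw counts sum to 14.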

Quiz question: Is this a valid distribution?

A joint distribution involves more than 1 variable. Same properties. 

The fundamental problem is that a joint distribution over $n$ variables, each with a domain of size $d$, has $d^n$ entries. This gets big fast, so what do we do?

A model is something that captures the joint distribution. One of the ways we can cheat is to say only certain variables interact, so we don't need to represent non-interacting variables in our model. 
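
To get a feel for how quickly the full joint table grows, here is a small sketch. The function name `joint_table_size` is made up for this example, and the choice of binary variables ($d = 2$) is only an assumption for illustration.

```python
def joint_table_size(d, n):
    """Number of entries in a full joint table: n variables, d values each."""
    return d ** n

# Binary variables (d = 2), as in the Forecast/Rain/Umbrella example where n = 3.
for n in (3, 10, 20, 30):
    print(n, joint_table_size(2, n))
```

Three binary variables need only 8 entries, but thirty binary variables already need more than a billion.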

Events. 

An event is a set of outcomes. $P(E) = \sum_{(x_1 \ldots x_n)\in E} P(x_1 \ldots x_n)$
Typically we only care about partial assignments, where only some of the variables are assigned. 
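
The event formula can be sketched in plain Python by describing an event as a test on outcomes. The names `JOINT` and `event_probability` are illustrative; the probabilities are the ones entered in the `JointProbDist` cell earlier.

```python
# Joint distribution keyed by (forecast, rain, umbrella).
# Values copied from the JointProbDist cell; the names here are illustrative.
JOINT = {
    (True,  True,  True):  0.25,
    (True,  True,  False): 0.09,
    (True,  False, True):  0.14,
    (True,  False, False): 0.08,
    (False, True,  True):  0.07,
    (False, True,  False): 0.16,
    (False, False, True):  0.04,
    (False, False, False): 0.17,
}

def event_probability(joint, in_event):
    """P(E): sum the probabilities of all outcomes that belong to the event."""
    return sum(p for outcome, p in joint.items() if in_event(outcome))

def forecast_and_rain(outcome):
    """Partial assignment: Forecast=True and Rain=True, Umbrella unassigned."""
    forecast, rain, _umbrella = outcome
    return forecast and rain

p = event_probability(JOINT, forecast_and_rain)
print(round(p, 2))
```

This partial assignment covers two outcomes, 0.25 and 0.09, so the result agrees with the 0.34 returned by `enumerate_joint` above.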

Marginal distributions *marginalize* other variables out of consideration by summing over all of their values. So, a marginal distribution for $X$ would be 

$$P(X=x)=\sum\limits_y P(X=x,Y=y)$$ 

For example, to remove a temperature variable from a joint distribution over temperature and weather, we would sum over all of the temperature values for each weather value. 

## Conditional Probabilities

Conditional probability is the most important idea here because it answers the question, "if we're going to measure something, what does that tell us about the things we don't get to measure?" 

$$P(a \mid b) = \frac{P(a,b)}{P(b)}$$
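
As a minimal sketch of this definition, the code below asks what seeing the umbrella tells us about rain. The names `JOINT`, `probability`, `conditional`, `rain`, and `umbrella` are made up for this example; the probabilities are the ones entered in the `JointProbDist` cell earlier.

```python
# Joint distribution keyed by (forecast, rain, umbrella).
# Values copied from the JointProbDist cell; the names here are illustrative.
JOINT = {
    (True,  True,  True):  0.25,
    (True,  True,  False): 0.09,
    (True,  False, True):  0.14,
    (True,  False, False): 0.08,
    (False, True,  True):  0.07,
    (False, True,  False): 0.16,
    (False, False, True):  0.04,
    (False, False, False): 0.17,
}

def probability(joint, holds):
    """Sum the probabilities of all outcomes for which `holds` is true."""
    return sum(p for outcome, p in joint.items() if holds(outcome))

def conditional(joint, a, b):
    """P(a | b) = P(a, b) / P(b), undefined when P(b) is zero."""
    p_b = probability(joint, b)
    if p_b == 0:
        raise ValueError("cannot condition on an event with zero probability")
    return probability(joint, lambda o: a(o) and b(o)) / p_b

def rain(outcome):
    return outcome[1]

def umbrella(outcome):
    return outcome[2]

print(round(probability(JOINT, rain), 2))            # P(Rain=true)
print(round(conditional(JOINT, rain, umbrella), 2))  # P(Rain=true | Umbrella=true)
```

With these values, observing the umbrella raises the probability of rain from about 0.57 to about 0.64.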

## Conditional Distribution

## Normalization

To normalize, in probability, is to make the entries sum to one by multiplying each of them by the same constant. 
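
A minimal sketch of normalization in plain Python, where the constant is one over the sum of the entries. The function name `normalize` and the variable `rain_counts` are illustrative; the counts are the ones from the `Rain` table earlier in these notes.

```python
def normalize(weights):
    """Scale non-negative weights by 1/Z so that they sum to one."""
    z = sum(weights.values())
    if z == 0:
        raise ValueError("cannot normalize weights that sum to zero")
    return {value: w / z for value, w in weights.items()}

rain_counts = {True: 2, False: 5}  # raw counts, Z = 7
rain_dist = normalize(rain_counts)
print({value: round(p, 3) for value, p in rain_dist.items()})
```

Here the constant is $1/7$, giving roughly 0.286 for `True` and 0.714 for `False`.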

## Probabilistic Inference

There are some random variables that we care about, but we don't get to observe them. We get to observe some other set of variables. We want to infer something about the variables we care about from our observations. 

Update probabilities when there are new observations. 

### Inference by enumeration 

Build a joint distribution over all variables. 

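Every variable plays one of three roles: evidence variables (observed), query variables (unobserved, the ones we want to know about), and hidden variables (unobserved and not part of the query). Hidden variables are the ones we marginalize out when performing a query. Strategy: 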

1. Fix the evidence variables (eliminate all entries that don't contain our evidence).
2. Marginalize any hidden variables. 
3. Normalize. 

This is $O(d^n)$ in both time and space. 

In math, this is: 

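$$P(Q \mid e_1 \ldots e_n) = \frac{1}{Z} \sum_{h_1 \ldots h_r} P(Q, h_1 \ldots h_r, e_1 \ldots e_n)$$

where the sum runs over all values of the hidden variables $h_1 \ldots h_r$, and $Z = P(e_1 \ldots e_n)$ is the normalizing constant.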
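
The three-step strategy above can be sketched in plain Python. This is not AIMA's implementation: `enumerate_ask`, `JOINT`, and `VARIABLES` are hypothetical names for this example, and the probabilities are the ones entered in the `JointProbDist` cell earlier.

```python
from collections import defaultdict

# Joint distribution keyed by (forecast, rain, umbrella).
# Values copied from the JointProbDist cell; the names here are illustrative.
VARIABLES = ('Forecast', 'Rain', 'Umbrella')
JOINT = {
    (True,  True,  True):  0.25,
    (True,  True,  False): 0.09,
    (True,  False, True):  0.14,
    (True,  False, False): 0.08,
    (False, True,  True):  0.07,
    (False, True,  False): 0.16,
    (False, False, True):  0.04,
    (False, False, False): 0.17,
}

def enumerate_ask(query, evidence, joint, variables):
    """Return P(query | evidence) by enumerating the full joint table."""
    q = variables.index(query)
    totals = defaultdict(float)
    for assignment, p in joint.items():
        # Step 1: keep only the entries consistent with the evidence.
        consistent = all(assignment[variables.index(var)] == val
                         for var, val in evidence.items())
        if consistent:
            # Step 2: adding into the query value sums out the hidden variables.
            totals[assignment[q]] += p
    # Step 3: normalize so that the remaining entries sum to one.
    z = sum(totals.values())
    if z == 0:
        raise ValueError("the evidence has zero probability")
    return {value: p / z for value, p in totals.items()}

posterior = enumerate_ask('Umbrella', {'Rain': True}, JOINT, VARIABLES)
print(round(posterior[True], 4))
```

For the query `Umbrella` with evidence `Rain=True`, this reproduces the value of about 0.5614 that `enumerate_joint_ask` returned earlier. The loop visits every entry of the table, which is where the $O(d^n)$ cost comes from.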

### Discrete vs Continuous Variables

All of our examples use discrete variables, but similar rules apply to continuous variables. 