In [2]:
import matplotlib


# Bayesian Statistics
- An alternative to the statistics taught in most schools
- Became more useful with recent advances in math and computation
- Works well when data is limited
- Applications range from simple models to state-of-the-art machine learning and rocket science

## Bayes Rule

### Formal Definition
$$P(\theta\mid X) = \frac{P(X\mid\theta)P(\theta)}{P(X)}$$

In [1]:
%%latex
### Formal Definition: Terminology

$$P(\theta\mid X) = \frac{P(X\mid\theta)P(\theta)}{P(X)}$$

|Probability      |   | Meaning                                    |  
|----------------:|---|:-------------------------------------------|  
|$P(\theta\mid X)$| = | Posterior                                  |     
|$P(\theta)$      | = | Prior                                      |  
|$P(X\mid\theta)$ | = | Conditional Probability (Likelihood)       |  
|$P(X)$           | = | Marginal Probability (Normalizing Constant)|  


### Formal Definition: Better Explanation

$$P(\theta\mid X) = \frac{P(X\mid\theta)P(\theta)}{P(X)}$$

$P(\theta\mid X)$ = probability that theory $\theta$ is correct **after** taking into account new data  
$P(X\mid\theta)$ = probability of seeing this data we just saw assuming that theory $\theta$ is correct  
$P(\theta)$ = probability that theory $\theta$ is correct **before** we examine our new data  
$P(X)$ = probability of seeing this data without assuming that theory $\theta$ is correct
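When the candidate theories $\theta_1, \dots, \theta_n$ are mutually exclusive and cover every possibility (an assumption added here for illustration), the normalizing term $P(X)$ can be expanded with the law of total probability:

$$P(X) = \sum_{i=1}^{n} P(X\mid\theta_i)\,P(\theta_i)$$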

## Cookie Problem (from Think Bayes)

There are two bowls of cookies.  
Bowl 1 has 50% Chocolate Chip, 50% Vanilla cookies.  
Bowl 2 has 75% Chocolate Chip, 25% Vanilla cookies.

You take a Chocolate Chip cookie out of one of the bowls. What is the probability that you took it out of Bowl 1?

## Cookie Problem (from Think Bayes) cont.  
$
\begin{aligned}
\text{Conditional Probabilities} && \text{Priors} && \text{Marginal Probability} \\
P(\text{Chocolate} \mid \text{Bowl 1}) = 50\%  && P(\text{Bowl 1}) = 50\% &&  P(\text{Chocolate}) = 62.5\% \\
P(\text{Vanilla} \mid \text{Bowl 1}) = 50\% && P(\text{Bowl 2}) = 50\% &&  P(\text{Vanilla}) = 37.5\% \\
P(\text{Chocolate} \mid \text{Bowl 2}) = 75\% \\
P(\text{Vanilla} \mid \text{Bowl 2}) = 25\% \\
\end{aligned}
$
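The numbers in the table are enough to answer the question with Bayes rule. The following is a minimal sketch; the variable names `prior`, `likelihood`, and `posterior` are illustrative choices, not part of the original problem statement.

```python
# Priors and likelihoods copied from the table above.
prior = {"Bowl1": 0.5, "Bowl2": 0.5}
likelihood = {"Bowl1": 0.5, "Bowl2": 0.75}  # P(Chocolate | bowl)

# Marginal probability P(Chocolate), summed over both bowls.
p_chocolate = sum(prior[b] * likelihood[b] for b in prior)

# Posterior P(bowl | Chocolate) for each bowl.
posterior = {b: prior[b] * likelihood[b] / p_chocolate for b in prior}

print(p_chocolate)         # 0.625
print(posterior["Bowl1"])  # 0.4
print(posterior["Bowl2"])  # 0.6
```

Seeing a Chocolate cookie lowers the probability of Bowl 1 from 50% to 40%, because Chocolate is more common in Bowl 2.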

In [4]:
import random

class CookieProblem:
    """Draw cookies from one bowl and update beliefs about which bowl it is."""

    def __init__(self, bowl):
        self.bowl = bowl  # the bowl the cookies are actually drawn from
        self.prior = {"Bowl1": 0.5, "Bowl2": 0.5}
        self.likelihood = {
            "Bowl1": {"Chocolate": 0.5, "Vanilla": 0.5},
            "Bowl2": {"Chocolate": 0.75, "Vanilla": 0.25},
        }

    def take_cookie(self):
        # Draw Chocolate with probability P(Chocolate | bowl), otherwise Vanilla.
        if random.random() < self.likelihood[self.bowl]['Chocolate']:
            self.cookie = 'Chocolate'
        else:
            self.cookie = 'Vanilla'
        return self.cookie

    def update_probabilities(self):
        # P(X): total probability of the observed cookie over both theories.
        normalizing_constant = 0
        for theory, likelihood in self.likelihood.items():
            normalizing_constant += self.prior[theory] * likelihood[self.cookie]

        # Bayes rule: the posterior becomes the prior for the next draw.
        for theory, likelihood in self.likelihood.items():
            self.prior[theory] = self.prior[theory] * likelihood[self.cookie] / normalizing_constant

        return self.prior

In [10]:
bayes = CookieProblem("Bowl2")
vanillas = 0
cookies = 100
for x in range(cookies):
    cookie = bayes.take_cookie()
    if cookie == 'Vanilla':
        vanillas += 1
    probabilities = bayes.update_probabilities()

print({'cookie' : cookie, 'p': probabilities})



## Bayesian Regression
 - A regression model, just like ordinary linear regression
 - Predictions and coefficients are probability distributions, not point estimates
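To make "coefficients are distributions" concrete, here is a minimal sketch of the simplest case: a single slope with no intercept. It assumes a Normal prior on the slope and Gaussian noise with a known standard deviation, so the posterior has a closed form. The function name `posterior_slope`, its parameters, and the data are all made up for illustration.

```python
import math

def posterior_slope(xs, ys, noise_std=1.0, prior_std=10.0):
    """Posterior mean and std of w in the model y = w * x + noise.

    Assumes a Normal(0, prior_std**2) prior on w and Gaussian noise with
    known standard deviation noise_std (both are illustrative choices).
    """
    # Posterior precision = prior precision + data precision.
    precision = 1.0 / prior_std**2 + sum(x * x for x in xs) / noise_std**2
    mean = (sum(x * y for x, y in zip(xs, ys)) / noise_std**2) / precision
    return mean, math.sqrt(1.0 / precision)

xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]  # made-up data, roughly y = 2x

mean, std = posterior_slope(xs, ys)
print(f"slope ~ Normal(mean={mean:.3f}, std={std:.3f})")
```

The result is a mean and a standard deviation for the slope rather than a single number, and the standard deviation shrinks as more data is added.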

## Notable Models
- Naive Bayes
- Bayesian Network 
- Bayesian Deep Neural Network
- Kalman Filter
- Bayesian Structural Time Series
- Bayesian Model Averaging Ensemble

### Notable Models: Naive Bayes
- Simple model
- Scalable
- Frequently used as a baseline model
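As a rough sketch of why the model is simple and scalable, the classifier below only counts words per label and multiplies the resulting probabilities, treating words as independent. The helper names `train` and `predict`, the add-one smoothing, and the toy data are illustrative assumptions, not taken from any particular library.

```python
import math
from collections import Counter, defaultdict

def train(examples):
    """examples: list of (words, label). Returns the counts needed to predict."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for words, label in examples:
        label_counts[label] += 1
        word_counts[label].update(words)
        vocab.update(words)
    return label_counts, word_counts, vocab

def predict(model, words):
    label_counts, word_counts, vocab = model
    total = sum(label_counts.values())
    scores = {}
    for label, count in label_counts.items():
        # Log prior plus a sum of log likelihoods (the "naive" independence step).
        score = math.log(count / total)
        n_words = sum(word_counts[label].values())
        for w in words:
            # Add-one (Laplace) smoothing avoids zero probabilities.
            score += math.log((word_counts[label][w] + 1) / (n_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)

# Made-up toy data for illustration only.
data = [
    (["cheap", "pills", "now"], "spam"),
    (["win", "cheap", "prize"], "spam"),
    (["meeting", "at", "noon"], "ham"),
    (["lunch", "at", "noon"], "ham"),
]
model = train(data)
print(predict(model, ["cheap", "prize"]))   # spam
print(predict(model, ["meeting", "noon"]))  # ham
```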

### Bayesian Network
- Describes conditional probabilities

### Bayesian Deep Neural Network
- Full power of a neural network
- does not assume 

### Kalman Filter
 - __Powerful__ model for estimation
 - Improves estimates from untrustworthy models
 - Gives less weight to bad (noisy) data
 - Keeps updating estimates even when there is no new data
 - Trusted by NASA in putting a man on the moon (Apollo 11 guidance computer)
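The bullets above can be illustrated with a one-dimensional filter. This is a minimal sketch that assumes the tracked value is roughly constant; the function name `kalman_1d`, its parameters, and the sensor readings are hypothetical.

```python
def kalman_1d(measurements, meas_var, process_var, x0=0.0, p0=1.0):
    """Track a roughly constant value from noisy readings.

    A measurement of None means "no new data": the filter still predicts,
    and its uncertainty grows by process_var.
    """
    x, p = x0, p0
    estimates = []
    for z in measurements:
        # Predict: the state model is "value stays the same", so only
        # the uncertainty grows.
        p = p + process_var
        if z is not None:
            # Update: the gain k is small when the measurement is noisy
            # (large meas_var), so noisy data moves the estimate less.
            k = p / (p + meas_var)
            x = x + k * (z - x)
            p = (1 - k) * p
        estimates.append((x, p))
    return estimates

readings = [5.2, 4.8, None, 5.1, 4.9]  # made-up sensor data
for x, p in kalman_1d(readings, meas_var=0.25, process_var=0.01, x0=0.0, p0=100.0):
    print(f"estimate={x:.3f} variance={p:.3f}")
```

The large initial variance `p0` says the starting guess is not trusted, so the first reading dominates. On the step with no reading, the estimate is carried forward and only its variance increases.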

### Bayesian Structural Time Series
 - __Kalman Filter__ that doesn't just improve estimates, it also updates the model
 - Estimates with multiple versions of the model simultaneously

### Bayesian Model Averaging Ensembles
 - Meta-model that leverages error estimates of multiple Bayesian models
 - Per Tom Mitchell, no other method using the same hypothesis space and prior knowledge can outperform this approach on average
 
 Tom M. Mitchell, Machine Learning, 1997, p. 175
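A minimal sketch of the averaging step, assuming each model's log marginal likelihood (log evidence) on the data has already been computed elsewhere. The function names `bma_weights` and `bma_predict` and all of the numbers are made up for illustration.

```python
import math

def bma_weights(log_evidences, priors=None):
    """Posterior probability of each model given its log evidence."""
    n = len(log_evidences)
    priors = priors or [1.0 / n] * n  # uniform model prior by default
    # Subtract the max before exponentiating for numerical stability.
    m = max(log_evidences)
    unnorm = [p * math.exp(le - m) for p, le in zip(priors, log_evidences)]
    total = sum(unnorm)
    return [u / total for u in unnorm]

def bma_predict(predictions, weights):
    """Average the individual model predictions, weighted by model probability."""
    return sum(w * y for w, y in zip(weights, predictions))

weights = bma_weights([-10.0, -12.0, -15.0])  # hypothetical log evidences
prediction = bma_predict([1.0, 2.0, 3.0], weights)
print(weights)
print(prediction)
```

Models that explain the data better receive larger weights, so the combined prediction leans toward them without discarding the others.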

## Next Steps
Free resources:
 - Think Bayes by Allen Downey [link](http://greenteapress.com/wp/think-bayes/)
 - Bayesian Methods for Hackers [link](https://github.com/CamDavidsonPilon/Probabilistic-Programming-and-Bayesian-Methods-for-Hackers)

Concepts to learn/review:
- Go back to fundamentals (probability theory)
- More types of probability distributions: Beta, Gamma, Poisson, Multivariate Gaussian, etc.