## Bayesian Probabilities 

$$p(w|D) = \frac{p(D|w)p(w)}{p(D)}$$

Bayes' theorem is used to update the probability of a hypothesis as more evidence or information becomes available. The Bayesian approach built on it contrasts with frequentist statistics, the other major framework for statistical inference.

**Prior**, $p(w)$: This is your belief about $w$ before seeing any data. It could be based on previous studies or domain knowledge, or it could be a non-informative prior if you have no prior knowledge.

**Likelihood**, $p(D|w)$: Given a particular value of $w$, this describes how probable the observed data, $D$, is.

**Posterior**, $p(w|D)$: After seeing the data, this is your updated belief about $w$. This is what Bayesian inference seeks to compute.

**Evidence or Marginal Likelihood**, $p(D)$: This is the probability of observing the data averaged over all possible values of $w$, weighted by the prior. It acts as a normalization constant to ensure the posterior probabilities sum (or integrate) to 1.

$$\text{posterior} \propto \text{likelihood} \times \text{prior}$$

$$p(D) = \int p(D|w)\,p(w)\,dw$$

This equation means that to get the overall probability of the data, $D$, you integrate the product of the likelihood and the prior over all possible values of $w$ (for a discrete $w$, the integral becomes a sum).
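The integral can be approximated numerically. The sketch below is only an illustration under assumptions not in the text above: $w$ is taken to be a coin's probability of heads, the prior is uniform on $[0, 1]$, and the data are 9 heads in 10 flips. The function names `likelihood` and `prior` are hypothetical helpers.

```python
from math import comb

def likelihood(w, heads=9, flips=10):
    """Binomial probability of `heads` heads in `flips` flips when P(heads) = w."""
    return comb(flips, heads) * w**heads * (1 - w) ** (flips - heads)

def prior(w):
    """Uniform prior density on [0, 1] (an assumption made for this sketch)."""
    return 1.0

# Midpoint rule: split [0, 1] into equal-width bins and sum
# likelihood * prior * width over the bin centres.
num_bins = 10_000
width = 1.0 / num_bins
centres = [(i + 0.5) * width for i in range(num_bins)]
evidence = sum(likelihood(w) * prior(w) * width for w in centres)

# Dividing by the evidence turns likelihood * prior into a density
# that integrates to 1 over the same grid.
posterior_mass = sum(likelihood(w) * prior(w) / evidence * width for w in centres)

print(evidence)        # close to 1/11 for a uniform prior
print(posterior_mass)  # close to 1.0
```

With a uniform prior this integral has the closed form $1/11$, which gives a simple check on the approximation.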

One advantage of the Bayesian viewpoint is that the inclusion of prior knowledge arises naturally. Suppose, for instance, that a fair-looking coin is tossed three
times and lands heads each time. A classical maximum likelihood estimate of the probability of landing heads would give 1, implying that all future tosses will land
heads! By contrast, a Bayesian approach with any reasonable prior will lead to a
much less extreme conclusion.
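To make the three-heads example concrete, here is a minimal sketch that compares the maximum likelihood estimate with a Bayesian posterior mean. The text does not name a prior, so the Beta(1, 1) (uniform) prior used here is an assumption chosen because it has a closed-form posterior.

```python
# Three tosses, three heads, as in the example above.
heads, tosses = 3, 3
tails = tosses - heads

# Maximum likelihood estimate: the observed fraction of heads.
mle = heads / tosses

# Beta(a, b) prior (an assumed choice); a = b = 1 is uniform on [0, 1].
a, b = 1.0, 1.0

# The Beta prior is conjugate to the binomial likelihood, so the
# posterior is Beta(a + heads, b + tails) and its mean is available directly.
posterior_a = a + heads
posterior_b = b + tails
posterior_mean = posterior_a / (posterior_a + posterior_b)

print(mle)             # 1.0
print(posterior_mean)  # 0.8
```

A prior concentrated more strongly around 0.5, such as larger equal values of `a` and `b`, would pull the posterior mean further from 1.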

Bayesian methods based on poor choices of prior can give poor results with high
confidence. Frequentist evaluation methods offer some protection from such problems, and techniques such as cross-validation remain useful in areas such as model
comparison.

### Example

You have a bag of 100 coins. You suspect that some of these coins might be biased to give more heads than a fair coin would. You randomly pick one coin from the bag and want to determine the probability $p(w|D)$ that this coin is biased, based on the outcome of some flips.

Let's assume that 5% of the coins in the bag are biased, so the prior is $p(w) = 0.05$.

$p(D|w)$ is the probability of observing the data **given** that the coin is biased.

We flip the coin 10 times and get 9 heads. Assuming a biased coin gives heads 90% of the time, the likelihood of observing this is about 0.39.

In [2]:
from scipy.stats import binom
# Likelihood of 9 heads in 10 flips for a biased coin (p = 0.9).
n = 10
k = 9 
p = 0.9 
binom.pmf(k, n, p)

0.387420489

In [4]:
# Likelihood of 9 heads in 10 flips for a fair coin (p = 0.5).
n = 10
k = 9 
p = 0.5
binom.pmf(k, n, p)

0.009765625000000009

The term $p(D)$ represents the probability of the observed data (in our case, 9 heads in 10 flips) across all possible scenarios, here the coin being either biased or fair.

To compute $p(D)$, we calculate the likelihood of observing our data under each scenario and then average these, weighted by our prior beliefs about each scenario.

$$p(D) = p(D|\text{biased})\,p(\text{biased}) + p(D|\text{not biased})\,p(\text{not biased})$$

$$p(D) = 0.39 \times 0.05 + 0.01 \times 0.95 = 0.029$$

In the context of our example, $p(D)$ tells us how probable it is to get 9 heads in 10 flips, considering both the possibility that the coin might be biased and the possibility that the coin might be fair.

In [6]:
(0.39*0.05) + (0.01*0.95)

0.029000000000000005

$$p(w|D) = \frac{0.39 \times 0.05}{0.029} = 0.67$$

Given the evidence (9 heads in 10 flips), the updated probability that this coin is biased is around 67%. This is your posterior belief, updated from the 5% prior belief after observing the data.
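The whole calculation can also be run end to end without rounding the intermediate values. This is a sketch rather than a definitive implementation: it reuses the example's numbers (5% prior, 90% heads for a biased coin, 9 heads in 10 flips), and `binomial_pmf` is a hypothetical standard-library stand-in for `scipy.stats.binom.pmf`.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Prior beliefs about the coin picked from the bag.
prior_biased = 0.05
prior_fair = 1 - prior_biased

# Likelihood of 9 heads in 10 flips under each hypothesis.
lik_biased = binomial_pmf(9, 10, 0.9)
lik_fair = binomial_pmf(9, 10, 0.5)

# Evidence: likelihoods averaged over both hypotheses, weighted by the priors.
evidence = lik_biased * prior_biased + lik_fair * prior_fair

# Bayes' theorem: posterior probability that the coin is biased.
posterior_biased = lik_biased * prior_biased / evidence

print(round(posterior_biased, 3))  # 0.676
```

The unrounded result is slightly higher than 0.67 because the hand calculation rounded the two likelihoods to 0.39 and 0.01.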

