[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/wasim/Data-Science/blob/main/data-analyst-roadmap/05_statistics_for_data_analysis/06_bayesian_statistics_intro.ipynb)

# Bayesian Statistics Intro

Bayesian statistics treats probability as a degree of belief and updates that belief as new data arrive.

## Frequentist vs Bayesian
- **Frequentist:** Probability is long-run frequency
- **Bayesian:** Probability is degree of belief

## Bayes' Theorem
$$ P(A|B) = \frac{P(B|A) \cdot P(A)}{P(B)} $$
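
Here $P(A)$ is the prior, $P(B|A)$ the likelihood, $P(B)$ the evidence, and $P(A|B)$ the posterior. When $P(B)$ is not known directly, it can be expanded with the law of total probability, where $\neg A$ denotes "not $A$":

$$ P(B) = P(B|A) \cdot P(A) + P(B|\neg A) \cdot P(\neg A) $$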

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats

sns.set_style('whitegrid')
np.random.seed(42)

## 1. Bayes' Theorem Example

**Problem:** Diagnostic Test
- 1% of the population has the disease (the prior).
- The test is 99% accurate: sensitivity and specificity are both 99%, so the false-positive rate is 1%.
- You test positive. What is the probability that you have the disease?

In [None]:
# Parameters
p_disease = 0.01
p_healthy = 1 - p_disease
p_pos_given_disease = 0.99
p_pos_given_healthy = 0.01

# Calculate Total Probability of Positive Test
p_pos = (
    p_pos_given_disease * p_disease + 
    p_pos_given_healthy * p_healthy
)

# Calculate Posterior P(Disease | Positive)
p_disease_given_pos = (
    p_pos_given_disease * p_disease
) / p_pos

print(f"Prior Probability: {p_disease:.1%}")
print(f"Posterior Probability: {p_disease_given_pos:.1%}")
print("\nCounterintuitive: even after a positive test, the chance "
      "of disease is only about 50% because the disease is rare.")

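The posterior from one test can serve as the prior for the next, which is what updating beliefs means in practice. The following is a minimal sketch, assuming a second positive result from an independent test with the same 99% sensitivity and 1% false-positive rate. The helper `update_on_positive` is a hypothetical name introduced only for this illustration.

```python
def update_on_positive(prior, sensitivity=0.99, false_positive_rate=0.01):
    """Return P(disease | positive) for a given prior via Bayes' theorem."""
    evidence = sensitivity * prior + false_positive_rate * (1 - prior)
    return sensitivity * prior / evidence

# Assumption: test results are conditionally independent given disease status
belief = 0.01
history = [belief]
for _ in range(2):
    belief = update_on_positive(belief)  # yesterday's posterior is today's prior
    history.append(belief)

for n, b in enumerate(history):
    print(f"After {n} positive test(s): {b:.1%}")
```

Under these assumptions the belief moves from 1% to 50% to 99%, so a second independent positive result is far more convincing than the first.
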
## 2. Beta Distribution (Conjugate Prior)
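The Beta distribution models an unknown probability of success (e.g., a coin flip or a conversion rate). It is conjugate to the binomial likelihood, so a Beta prior yields a Beta posterior.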
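
Conjugacy reduces the update to adding counts. With a $\text{Beta}(\alpha, \beta)$ prior on the success probability $\theta$ and $k$ successes in $n$ trials (symbols introduced here for illustration):

$$ \theta \mid \text{data} \sim \text{Beta}(\alpha + k,\ \beta + n - k), \qquad E[\theta \mid \text{data}] = \frac{\alpha + k}{\alpha + \beta + n} $$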

In [None]:
# Prior: we believe the coin is roughly fair, Beta(10, 10) centered at 0.5
alpha_prior = 10
beta_prior = 10

# Data: 100 flips, 80 heads (biased!)
heads = 80
tails = 20

# Posterior parameters
alpha_post = alpha_prior + heads
beta_post = beta_prior + tails

# Visualize
x = np.linspace(0, 1, 100)
prior = stats.beta.pdf(x, alpha_prior, beta_prior)
likelihood = stats.beta.pdf(x, heads + 1, tails + 1)  # Likelihood, normalized to integrate to 1
posterior = stats.beta.pdf(x, alpha_post, beta_post)

plt.figure(figsize=(10, 6))
plt.plot(x, prior, 'b--', label='Prior (Belief)')
plt.plot(x, likelihood, 'g:', label='Likelihood (Data, normalized)')
plt.plot(x, posterior, 'r-', lw=3, label='Posterior (Updated)')
plt.title('Bayesian Updating with Beta Distribution')
plt.xlabel('Probability of Heads')
plt.ylabel('Density')
plt.legend()
plt.show()
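
A plot shows the shape of the posterior, but a point estimate and an interval are often easier to report. The sketch below summarizes the coin posterior with its mean and a 95% equal-tailed credible interval. The 95% level and the equal-tailed choice are assumptions made for illustration, not something the example above prescribes.

```python
from scipy import stats

# Posterior from the coin example: Beta(10 + 80, 10 + 20)
alpha_post, beta_post = 90, 30

# Posterior mean is alpha / (alpha + beta)
post_mean = stats.beta.mean(alpha_post, beta_post)

# Equal-tailed interval: cut 2.5% of probability mass from each tail
ci_low, ci_high = stats.beta.ppf([0.025, 0.975], alpha_post, beta_post)

print(f"Posterior mean: {post_mean:.3f}")
print(f"95% credible interval: [{ci_low:.3f}, {ci_high:.3f}]")
```

The posterior mean of 0.75 sits between the prior mean (0.5) and the observed frequency (0.8). It lies closer to the data because 100 flips outweigh the prior's 20 pseudo-counts.
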

## 3. Bayesian A/B Testing
Estimate directly the probability that variant B's conversion rate exceeds variant A's.

In [None]:
# Setup
visitors_a, conv_a = 1000, 100
visitors_b, conv_b = 1000, 120

# Sample from each posterior, Beta(conversions + 1, non-conversions + 1),
# which corresponds to a uniform Beta(1, 1) prior
n_samples = 10000
samples_a = stats.beta.rvs(
    conv_a + 1, visitors_a - conv_a + 1, size=n_samples
)
samples_b = stats.beta.rvs(
    conv_b + 1, visitors_b - conv_b + 1, size=n_samples
)

# Calculate probability B > A
prob_b_better = (samples_b > samples_a).mean()

print(f"Probability B is better than A: {prob_b_better:.1%}")

if prob_b_better > 0.95:
    print("Winner: B (High confidence)")
else:
    print("Uncertain: Not enough evidence yet")
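
The probability that B beats A says nothing about how much better B is. A small sketch of one way to quantify that, using the same counts and uniform priors as above, is to look at the posterior of the difference in conversion rates. The local generator `rng`, its seed, and the number of draws are arbitrary choices made for this illustration.

```python
import numpy as np

rng = np.random.default_rng(42)  # local generator; the seed is arbitrary
n_draws = 100_000

# Same counts and uniform Beta(1, 1) priors as the cell above
post_a = rng.beta(100 + 1, 1000 - 100 + 1, size=n_draws)
post_b = rng.beta(120 + 1, 1000 - 120 + 1, size=n_draws)

# Posterior of the absolute difference in conversion rate
diff = post_b - post_a
low, high = np.percentile(diff, [2.5, 97.5])

print(f"Expected absolute uplift: {diff.mean():.2%}")
print(f"95% credible interval: [{low:.2%}, {high:.2%}]")
print(f"P(B > A): {(diff > 0).mean():.1%}")
```

If the interval includes zero, that is consistent with the "not enough evidence" branch of the decision rule: B looks better on average, but a zero or negative uplift is still plausible.
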

## Practice Exercise
Update your belief that an email is spam after seeing that it contains the word "free".

In [None]:
# Prior: 20% of emails are spam
# Word "free" appears in 90% of spam, 10% of ham
# An email contains "free". What is P(spam | "free")?
# Your code here