# Probabilistic Methods in Machine Learning

This hands-on lab demonstrates core concepts in probabilistic machine learning:

1. **Bayesian Inference Basics** - Understanding prior beliefs, likelihood, and posterior distributions
2. **Overview of Probabilistic Graphical Models** - Introduction to Bayesian Networks and their applications

We'll use simulated examples and real-world datasets (Breast Cancer Wisconsin and Iris) to illustrate these concepts and demonstrate practical applications.

## Setup and Imports

First, let's import the necessary libraries for our probabilistic methods demonstrations.

In [None]:
# Core libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats
import warnings
warnings.filterwarnings('ignore')

# Machine learning libraries
from sklearn.datasets import load_breast_cancer, load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

# Set random seed for reproducibility
np.random.seed(42)

# Set plotting style
sns.set_style('whitegrid')
plt.rcParams['figure.figsize'] = (12, 6)

print("✓ Libraries imported successfully")

---
## Part 1: Bayesian Inference Basics

### What is Bayesian Inference?

Bayesian inference is a method of statistical inference where we update our beliefs about parameters based on observed data. The core principle is **Bayes' Theorem**:

$$P(\theta|D) = \frac{P(D|\theta) \times P(\theta)}{P(D)}$$

Where:
- **P(θ|D)** is the **posterior**: our updated belief about θ after seeing data D
- **P(D|θ)** is the **likelihood**: probability of observing data D given parameter θ
- **P(θ)** is the **prior**: our initial belief about θ before seeing data
- **P(D)** is the **evidence**: probability of observing the data (normalizing constant)

### Key Concepts:
1. **Prior Distribution**: Represents our initial beliefs before observing data
2. **Likelihood**: How probable the observed data is for different parameter values
3. **Posterior Distribution**: Our updated beliefs after incorporating the data
4. **Credible Intervals**: Bayesian analog of confidence intervals
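
To make the four terms of Bayes' theorem concrete, here is a minimal sketch that applies it on a discrete grid of candidate θ values. The function name `grid_posterior`, the grid size, and the data (6 heads in 10 flips) are illustrative assumptions, not part of the lab's datasets.

```python
from math import comb


def grid_posterior(n_heads, n_flips, n_grid=101):
    """Approximate the posterior over theta on an evenly spaced grid (uniform prior)."""
    thetas = [i / (n_grid - 1) for i in range(n_grid)]
    prior = [1.0 / n_grid] * n_grid  # P(theta): uniform over the grid points
    # P(D | theta): binomial likelihood of the observed counts
    likelihood = [
        comb(n_flips, n_heads) * t**n_heads * (1 - t) ** (n_flips - n_heads)
        for t in thetas
    ]
    unnormalized = [lik * p for lik, p in zip(likelihood, prior)]
    evidence = sum(unnormalized)  # P(D): the normalizing constant
    posterior = [u / evidence for u in unnormalized]
    return thetas, posterior


thetas, posterior = grid_posterior(n_heads=6, n_flips=10)
best_theta = thetas[posterior.index(max(posterior))]
print(f"Posterior sums to {sum(posterior):.3f}; most probable theta = {best_theta:.2f}")
```

Dividing by the evidence is what turns the product of prior and likelihood into a proper distribution. The grid is only a teaching device: it makes the normalization step visible, which closed-form results hide.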

### Example 1.1: Coin Flip Inference

**Scenario**: We want to determine if a coin is fair. We flip it 100 times and observe 60 heads.

**Question**: What is the probability that this coin shows heads?

We'll use Bayesian inference with a **Beta distribution** as our prior (conjugate prior for Binomial likelihood).
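
The conjugacy claim can be checked in one line. In the following standard derivation, $h$ and $t$ are shorthand (introduced here) for the observed numbers of heads and tails, and $\alpha$, $\beta$ are the prior's Beta parameters:

$$P(\theta \mid D) \propto \underbrace{\theta^{h}(1-\theta)^{t}}_{\text{likelihood}} \times \underbrace{\theta^{\alpha-1}(1-\theta)^{\beta-1}}_{\text{prior}} = \theta^{(\alpha+h)-1}(1-\theta)^{(\beta+t)-1}$$

The right-hand side is the kernel of a Beta(α + h, β + t) distribution, so the posterior is obtained by adding the observed counts to the prior parameters. With a uniform Beta(1, 1) prior and 60 heads in 100 flips, this gives Beta(61, 41).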

In [None]:
def bayesian_coin_flip(n_flips, n_heads, prior_alpha=1, prior_beta=1):
    """
    Perform Bayesian inference for coin flip probability.
    
    Parameters:
    -----------
    n_flips : int - Number of coin flips
    n_heads : int - Number of heads observed
    prior_alpha, prior_beta : float - Beta distribution parameters for prior (default Beta(1, 1) is uniform)
    
    Returns:
    --------
    dict : Posterior statistics plus the prior and posterior Beta parameters
    """
    n_tails = n_flips - n_heads
    
    # Posterior parameters (Beta distribution)
    post_alpha = prior_alpha + n_heads
    post_beta = prior_beta + n_tails
    
    # Posterior statistics
    post_mean = post_alpha / (post_alpha + post_beta)
    post_mode = (post_alpha - 1) / (post_alpha + post_beta - 2) if post_alpha > 1 and post_beta > 1 else None
    post_std = np.sqrt((post_alpha * post_beta) / 
                       ((post_alpha + post_beta)**2 * (post_alpha + post_beta + 1)))
    
    # 95% Credible interval
    credible_interval = stats.beta.interval(0.95, post_alpha, post_beta)
    
    return {
        'posterior_mean': post_mean,
        'posterior_mode': post_mode,
        'posterior_std': post_std,
        'credible_interval': credible_interval,
        'post_alpha': post_alpha,
        'post_beta': post_beta,
        'prior_alpha': prior_alpha,
        'prior_beta': prior_beta
    }

# Perform inference
n_flips = 100
n_heads = 60
result = bayesian_coin_flip(n_flips, n_heads)

print("=" * 60)
print("BAYESIAN COIN FLIP INFERENCE")
print("=" * 60)
print(f"\nObservations: {n_heads} heads out of {n_flips} flips")
print(f"\nPosterior Distribution: Beta({result['post_alpha']}, {result['post_beta']})")
print(f"  Mean: {result['posterior_mean']:.4f}")
print(f"  Std: {result['posterior_std']:.4f}")
print(f"  95% Credible Interval: [{result['credible_interval'][0]:.4f}, {result['credible_interval'][1]:.4f}]")
print(f"\nInterpretation:")
print(f"  Given the prior and the data, there is a 95% posterior probability")
print(f"  that P(heads) lies between {result['credible_interval'][0]:.2%} and {result['credible_interval'][1]:.2%}")

In [None]:
# Visualize Prior, Likelihood, and Posterior
theta_values = np.linspace(0, 1, 1000)

# Prior distribution
prior = stats.beta.pdf(theta_values, result['prior_alpha'], result['prior_beta'])

# Likelihood (proportional to binomial)
likelihood = stats.binom.pmf(n_heads, n_flips, theta_values)
likelihood = likelihood / likelihood.max()  # Normalize for visualization

# Posterior distribution
posterior = stats.beta.pdf(theta_values, result['post_alpha'], result['post_beta'])

# Plot
fig, axes = plt.subplots(1, 3, figsize=(15, 4))

# Prior
axes[0].plot(theta_values, prior, 'b-', linewidth=2)
axes[0].fill_between(theta_values, prior, alpha=0.3)
axes[0].set_title(f"Prior Distribution\nBeta({result['prior_alpha']}, {result['prior_beta']})", fontsize=12, fontweight='bold')
axes[0].set_xlabel('Probability of Heads (θ)')
axes[0].set_ylabel('Density')
axes[0].grid(True, alpha=0.3)

# Likelihood
axes[1].plot(theta_values, likelihood, 'g-', linewidth=2)
axes[1].fill_between(theta_values, likelihood, alpha=0.3, color='green')
axes[1].set_title(f'Likelihood\n{n_heads} heads in {n_flips} flips', fontsize=12, fontweight='bold')
axes[1].set_xlabel('Probability of Heads (θ)')
axes[1].set_ylabel('Normalized Likelihood')
axes[1].grid(True, alpha=0.3)

# Posterior
axes[2].plot(theta_values, posterior, 'r-', linewidth=2)
axes[2].fill_between(theta_values, posterior, alpha=0.3, color='red')
axes[2].axvline(result['posterior_mean'], color='darkred', linestyle='--', 
                linewidth=2, label=f"Mean: {result['posterior_mean']:.3f}")
axes[2].axvline(result['credible_interval'][0], color='orange', linestyle=':', 
                linewidth=1.5, label='95% CI')
axes[2].axvline(result['credible_interval'][1], color='orange', linestyle=':', linewidth=1.5)
axes[2].set_title(f'Posterior Distribution\nBeta({result["post_alpha"]}, {result["post_beta"]})', 
                  fontsize=12, fontweight='bold')
axes[2].set_xlabel('Probability of Heads (θ)')
axes[2].set_ylabel('Density')
axes[2].legend()
axes[2].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

print("\n📊 The posterior combines the prior belief with the observed data via the likelihood.")

### Example 1.2: Bayesian Updating - Sequential Learning

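One powerful feature of Bayesian inference is **sequential updating**: the posterior after one batch of data becomes the prior for the next, so we can keep updating our beliefs as data arrives.

Let's see how our posterior changes as we observe more coin flips. For simplicity, the code below recomputes the posterior from the uniform prior at each sample size; with a conjugate prior this gives the same result as updating flip by flip.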
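
Sequential updating with a conjugate Beta prior can be sketched as follows. The helper name `update_beta` and the batch counts are assumptions chosen for illustration. The final check shows that updating batch by batch yields the same posterior as a single update on the pooled data.

```python
def update_beta(alpha, beta, n_heads, n_tails):
    """Conjugate Beta-Binomial update: add observed counts to the prior parameters."""
    return alpha + n_heads, beta + n_tails


# Hypothetical data: three batches of (heads, tails) counts
batches = [(6, 4), (13, 7), (41, 29)]

# Sequential: the posterior after each batch is the prior for the next
alpha, beta = 1, 1  # uniform Beta(1, 1) prior
for heads, tails in batches:
    alpha, beta = update_beta(alpha, beta, heads, tails)
    print(f"After batch ({heads}H, {tails}T): Beta({alpha}, {beta}), "
          f"mean = {alpha / (alpha + beta):.3f}")

# Batch: one update with all of the data at once
total_heads = sum(h for h, _ in batches)
total_tails = sum(t for _, t in batches)
batch_posterior = update_beta(1, 1, total_heads, total_tails)

print("Sequential and batch posteriors match:", (alpha, beta) == batch_posterior)
```

The order in which the batches arrive does not matter here, because the update only adds counts.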

In [None]:
# Sequential updating with different amounts of data
observations = [10, 20, 50, 100, 200]
heads_ratio = 0.6  # 60% heads

fig, axes = plt.subplots(2, 3, figsize=(15, 8))
axes = axes.flatten()

theta_vals = np.linspace(0, 1, 1000)

for idx, n_obs in enumerate(observations):
    n_h = round(n_obs * heads_ratio)  # round() avoids truncation from floating-point error
    result_seq = bayesian_coin_flip(n_obs, n_h)
    
    # Calculate posterior
    post = stats.beta.pdf(theta_vals, result_seq['post_alpha'], result_seq['post_beta'])
    
    # Plot
    axes[idx].plot(theta_vals, post, 'b-', linewidth=2)
    axes[idx].fill_between(theta_vals, post, alpha=0.3)
    axes[idx].axvline(result_seq['posterior_mean'], color='red', linestyle='--', 
                      linewidth=2, label=f"Mean: {result_seq['posterior_mean']:.3f}")
    axes[idx].axvline(result_seq['credible_interval'][0], color='orange', 
                      linestyle=':', linewidth=1.5)
    axes[idx].axvline(result_seq['credible_interval'][1], color='orange', 
                      linestyle=':', linewidth=1.5)
    axes[idx].set_title(f'After {n_obs} flips ({n_h} heads)\nCI width: {result_seq["credible_interval"][1] - result_seq["credible_interval"][0]:.3f}', 
                       fontsize=11)
    axes[idx].set_xlabel('Probability of Heads (θ)')
    axes[idx].set_ylabel('Density')
    axes[idx].legend(fontsize=9)
    axes[idx].grid(True, alpha=0.3)
    axes[idx].set_ylim([0, None])

# Remove the extra subplot
fig.delaxes(axes[5])

plt.suptitle('Sequential Bayesian Updating - Posterior Becomes More Certain with More Data', 
             fontsize=14, fontweight='bold', y=1.00)
plt.tight_layout()
plt.show()

print("\n🔍 Key Observation: As we collect more data, the posterior distribution becomes")
print("   narrower (more certain) and concentrates around the observed proportion of heads (0.6).")

### Example 1.3: Bayesian Classification with Real Data

Now let's apply Bayesian methods to a real-world dataset: **Breast Cancer Wisconsin Dataset**.

We'll use a **Naive Bayes classifier**, which applies Bayes' theorem with the "naive" assumption that features are conditionally independent given the class.
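
Written out, the conditional independence assumption turns the posterior over classes into a product of per-feature terms. The symbols $y$ (class) and $x_1, \dots, x_n$ (features) are notation introduced here:

$$P(y \mid x_1, \dots, x_n) \propto P(y) \prod_{i=1}^{n} P(x_i \mid y)$$

In the Gaussian variant used below, each $P(x_i \mid y)$ is modeled as a normal density with a class-specific mean and variance estimated from the training data.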

**Dataset**: The breast cancer dataset contains 30 features computed from digitized images of fine needle aspirates of breast masses; the task is to classify each tumor as malignant or benign.

In [None]:
# Load the breast cancer dataset
cancer_data = load_breast_cancer()
X = cancer_data.data
y = cancer_data.target
feature_names = cancer_data.feature_names
target_names = cancer_data.target_names

print("=" * 60)
print("BREAST CANCER DATASET")
print("=" * 60)
print(f"\nDataset shape: {X.shape}")
print(f"Number of features: {X.shape[1]}")
print(f"Number of samples: {X.shape[0]}")
print(f"\nTarget classes: {target_names}")
print(f"Class distribution:")
print(f"  {target_names[0]}: {np.sum(y == 0)} ({np.sum(y == 0)/len(y)*100:.1f}%)")
print(f"  {target_names[1]}: {np.sum(y == 1)} ({np.sum(y == 1)/len(y)*100:.1f}%)")
print(f"\nFirst 5 features: {feature_names[:5].tolist()}")

In [None]:
# Split data
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

print(f"Training set size: {X_train.shape[0]}")
print(f"Test set size: {X_test.shape[0]}")

# Train Naive Bayes classifier
nb_model = GaussianNB()
nb_model.fit(X_train, y_train)

# Make predictions
y_pred = nb_model.predict(X_test)
y_pred_proba = nb_model.predict_proba(X_test)

# Evaluate
accuracy = accuracy_score(y_test, y_pred)
print(f"\nTest Accuracy: {accuracy:.4f} ({accuracy*100:.2f}%)")

print("\n" + "="*60)
print("CLASSIFICATION REPORT")
print("="*60)
print(classification_report(y_test, y_pred, target_names=target_names))

In [None]:
# Visualize predictions with probability
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Confusion Matrix
cm = confusion_matrix(y_test, y_pred)
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues', ax=axes[0],
            xticklabels=target_names, yticklabels=target_names)
axes[0].set_title('Confusion Matrix', fontsize=12, fontweight='bold')
axes[0].set_ylabel('True Label')
axes[0].set_xlabel('Predicted Label')

# Prediction probabilities distribution
benign_proba = y_pred_proba[y_test == 1, 1]  # Probability of benign for benign samples
malignant_proba = y_pred_proba[y_test == 0, 1]  # Probability of benign for malignant samples

axes[1].hist(benign_proba, bins=30, alpha=0.6, label='Benign (True)', color='green', edgecolor='black')
axes[1].hist(malignant_proba, bins=30, alpha=0.6, label='Malignant (True)', color='red', edgecolor='black')
axes[1].axvline(0.5, color='black', linestyle='--', linewidth=2, label='Decision Threshold')
axes[1].set_xlabel('Predicted Probability of Benign')
axes[1].set_ylabel('Frequency')
axes[1].set_title('Prediction Probability Distribution', fontsize=12, fontweight='bold')
axes[1].legend()
axes[1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

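print("\n💡 The Naive Bayes classifier outputs class probabilities, which indicate how")
print("   strongly the model favors each class. Because the independence assumption")
print("   rarely holds exactly, treat them as rough confidence scores rather than")
print("   calibrated probabilities.")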

---
## Part 2: Overview of Probabilistic Graphical Models (PGMs)

### What are Probabilistic Graphical Models?

Probabilistic Graphical Models (PGMs) are a powerful framework for representing and reasoning about complex probability distributions using graphs.

**Key Components:**
- **Nodes**: Represent random variables
- **Edges**: Represent probabilistic relationships (dependencies) between variables

### Types of PGMs:

1. **Bayesian Networks (Directed Graphs)**:
   - Nodes represent random variables
   - Directed edges represent conditional dependencies
   - Each node has a Conditional Probability Distribution (CPD)

2. **Markov Random Fields (Undirected Graphs)**:
   - Edges represent symmetric relationships
   - Used when directionality doesn't matter

### Why Use PGMs?
- **Compact Representation**: Express complex joint distributions efficiently
- **Modular**: Break down complex problems into manageable pieces
- **Inference**: Answer probabilistic queries (e.g., "What is P(Disease | Symptoms)?")
- **Learning**: Discover structure and parameters from data
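
To illustrate the "compact representation" point, the sketch below counts free parameters for binary variables under two assumptions: an unrestricted joint table, and a network in which a single class node is the only parent of every feature. The function names are hypothetical and chosen for this example.

```python
def full_joint_params(n_binary_vars):
    """Free parameters in an unrestricted joint table over binary variables."""
    return 2**n_binary_vars - 1


def naive_bayes_params(n_binary_features):
    """Free parameters when one binary class node is the sole parent of each feature."""
    # 1 for P(class), plus 2 per feature: P(feature | class=0) and P(feature | class=1)
    return 1 + 2 * n_binary_features


for n_features in (2, 10, 30):
    full = full_joint_params(n_features + 1)  # features plus the class variable
    factored = naive_bayes_params(n_features)
    print(f"{n_features:>2} features: full joint = {full:,} parameters, factored = {factored}")
```

The full table grows exponentially with the number of variables, while the factored form grows linearly. The saving comes entirely from the independence assumptions encoded in the graph.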

### Example 2.1: Medical Diagnosis - A Simple Bayesian Network

Let's create a simple Bayesian Network for medical diagnosis:

```
      Disease
       /  \
      /    \
  Symptom1  Symptom2
```

- **Disease**: Binary (has disease or not)
- **Symptom1**: Binary (fever present or not)
- **Symptom2**: Binary (cough present or not)

We'll define the conditional probability tables (CPTs) and perform inference.
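
For this three-node network, the graph structure implies the following factorization. Here $D$ denotes Disease (not the data, as in Part 1) and $S_1$, $S_2$ the two symptoms:

$$P(D, S_1, S_2) = P(D)\,P(S_1 \mid D)\,P(S_2 \mid D)$$

Inference then follows from Bayes' theorem, summing over both disease states $d$ in the denominator:

$$P(D \mid S_1, S_2) = \frac{P(D)\,P(S_1 \mid D)\,P(S_2 \mid D)}{\sum_{d} P(d)\,P(S_1 \mid d)\,P(S_2 \mid d)}$$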

In [None]:
# Define a simple Bayesian Network manually
# Prior probability of disease
P_disease = 0.01  # 1% of population has the disease

# Conditional probabilities of symptoms given disease status
# P(Fever | Disease)
P_fever_given_disease = 0.9
P_fever_given_no_disease = 0.1

# P(Cough | Disease)
P_cough_given_disease = 0.8
P_cough_given_no_disease = 0.2

print("=" * 60)
print("BAYESIAN NETWORK: MEDICAL DIAGNOSIS")
print("=" * 60)
print("\nPrior Probabilities:")
print(f"  P(Disease = Yes) = {P_disease}")
print(f"  P(Disease = No) = {1 - P_disease}")
print("\nConditional Probabilities:")
print(f"  P(Fever = Yes | Disease = Yes) = {P_fever_given_disease}")
print(f"  P(Fever = Yes | Disease = No) = {P_fever_given_no_disease}")
print(f"  P(Cough = Yes | Disease = Yes) = {P_cough_given_disease}")
print(f"  P(Cough = Yes | Disease = No) = {P_cough_given_no_disease}")

In [None]:
def bayesian_network_inference(has_fever, has_cough):
    """
    Compute P(Disease | Symptoms) using Bayes' theorem.
    
    This demonstrates inference in a simple Bayesian Network.
    """
    # Calculate P(Symptoms | Disease)
    if has_fever and has_cough:
        P_symptoms_given_disease = P_fever_given_disease * P_cough_given_disease
        P_symptoms_given_no_disease = P_fever_given_no_disease * P_cough_given_no_disease
    elif has_fever and not has_cough:
        P_symptoms_given_disease = P_fever_given_disease * (1 - P_cough_given_disease)
        P_symptoms_given_no_disease = P_fever_given_no_disease * (1 - P_cough_given_no_disease)
    elif not has_fever and has_cough:
        P_symptoms_given_disease = (1 - P_fever_given_disease) * P_cough_given_disease
        P_symptoms_given_no_disease = (1 - P_fever_given_no_disease) * P_cough_given_no_disease
    else:
        P_symptoms_given_disease = (1 - P_fever_given_disease) * (1 - P_cough_given_disease)
        P_symptoms_given_no_disease = (1 - P_fever_given_no_disease) * (1 - P_cough_given_no_disease)
    
    # Calculate P(Symptoms) - Evidence
    P_symptoms = (P_symptoms_given_disease * P_disease + 
                  P_symptoms_given_no_disease * (1 - P_disease))
    
    # Apply Bayes' Theorem: P(Disease | Symptoms)
    P_disease_given_symptoms = (P_symptoms_given_disease * P_disease) / P_symptoms
    
    return P_disease_given_symptoms

# Test different symptom combinations
scenarios = [
    (False, False, "No symptoms"),
    (True, False, "Fever only"),
    (False, True, "Cough only"),
    (True, True, "Both fever and cough")
]

print("\n" + "=" * 60)
print("INFERENCE: P(Disease | Symptoms)")
print("=" * 60)

results = []
for fever, cough, description in scenarios:
    prob = bayesian_network_inference(fever, cough)
    results.append(prob)
    print(f"\n{description}:")
    print(f"  P(Disease | Symptoms) = {prob:.4f} ({prob*100:.2f}%)")
    print(f"  Interpretation: {prob/P_disease:.2f}x the prior probability")

In [None]:
# Visualize the inference results
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 5))

# Bar plot of probabilities
scenario_names = [s[2] for s in scenarios]
colors = ['green' if r < 0.1 else 'orange' if r < 0.5 else 'red' for r in results]

bars = ax1.bar(scenario_names, results, color=colors, alpha=0.7, edgecolor='black')
ax1.axhline(y=P_disease, color='blue', linestyle='--', linewidth=2, label=f'Prior: {P_disease}')
ax1.set_ylabel('P(Disease | Symptoms)', fontsize=11)
ax1.set_title('Posterior Probability of Disease\nGiven Different Symptom Combinations', 
              fontsize=12, fontweight='bold')
ax1.set_ylim([0, max(results) * 1.2])
ax1.legend()
ax1.grid(True, alpha=0.3, axis='y')
plt.setp(ax1.xaxis.get_majorticklabels(), rotation=45, ha='right')

# Add value labels on bars
for bar, val in zip(bars, results):
    height = bar.get_height()
    ax1.text(bar.get_x() + bar.get_width()/2., height,
             f'{val:.3f}',
             ha='center', va='bottom', fontweight='bold')

# Heatmap showing the network structure effect
fever_vals = [0, 1]
cough_vals = [0, 1]
prob_matrix = np.zeros((2, 2))

for i, fever in enumerate(fever_vals):
    for j, cough in enumerate(cough_vals):
        prob_matrix[i, j] = bayesian_network_inference(bool(fever), bool(cough))

sns.heatmap(prob_matrix, annot=True, fmt='.4f', cmap='YlOrRd', ax=ax2,
            xticklabels=['No Cough', 'Cough'], yticklabels=['No Fever', 'Fever'],
            cbar_kws={'label': 'P(Disease | Symptoms)'})
ax2.set_title('Probability Heatmap:\nJoint Effect of Symptoms', fontsize=12, fontweight='bold')
ax2.set_xlabel('Cough Status')
ax2.set_ylabel('Fever Status')

plt.tight_layout()
plt.show()

print("\n🔬 Key Insight: Having BOTH symptoms dramatically increases the probability")
print("   of disease compared to the prior or having just one symptom.")
print("   This demonstrates how Bayesian Networks combine evidence!")
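
As a hand check of the "both symptoms" case, substituting the CPT values defined above into Bayes' theorem gives:

$$P(\text{Disease} \mid \text{Fever}, \text{Cough}) = \frac{0.9 \times 0.8 \times 0.01}{0.9 \times 0.8 \times 0.01 + 0.1 \times 0.2 \times 0.99} = \frac{0.0072}{0.0270} \approx 0.267$$

That is roughly 27 times the 1% prior, yet still well below 50%, because the disease is rare to begin with.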

### Example 2.2: Bayesian Network Application - Iris Classification

Let's apply PGM concepts to the famous **Iris dataset**. We'll use a Naive Bayes classifier, which is actually a simple Bayesian Network where:

```
                Species (Class)
                /   |   |   \
               /    |   |    \
    Sepal_Length  Sepal_Width  Petal_Length  Petal_Width
```

The "naive" assumption is that all features are conditionally independent given the class.
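
The sketch below shows what this assumption means computationally: the class score is the prior multiplied by one Gaussian density per feature. The toy measurements and class labels are hypothetical. The code uses sample standard deviations and no variance smoothing, so it is an illustration of the idea, not a reimplementation of scikit-learn's `GaussianNB`.

```python
from statistics import NormalDist, mean, stdev

# Hypothetical toy data: two features per sample, two classes
samples = {
    "short": [(4.9, 1.4), (5.1, 1.5), (5.0, 1.3), (5.2, 1.6)],
    "long": [(6.3, 4.7), (6.5, 4.5), (6.1, 4.9), (6.4, 4.6)],
}
total = sum(len(rows) for rows in samples.values())

# "Training": a class prior plus one Gaussian per (class, feature) pair
model = {}
for label, rows in samples.items():
    columns = list(zip(*rows))
    model[label] = {
        "prior": len(rows) / total,
        "features": [NormalDist(mean(col), stdev(col)) for col in columns],
    }


def predict_proba(x):
    """Posterior over classes: prior times the product of per-feature densities."""
    scores = {}
    for label, params in model.items():
        score = params["prior"]
        for value, dist in zip(x, params["features"]):
            score *= dist.pdf(value)  # conditional independence: multiply densities
        scores[label] = score
    evidence = sum(scores.values())
    return {label: score / evidence for label, score in scores.items()}


probs = predict_proba((5.0, 1.5))
print({label: round(p, 4) for label, p in probs.items()})
```

With many features, multiplying densities can underflow, which is why practical implementations typically sum log-densities instead.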

In [None]:
# Load Iris dataset
iris = load_iris()
X_iris = iris.data
y_iris = iris.target
feature_names_iris = iris.feature_names
target_names_iris = iris.target_names

print("=" * 60)
print("IRIS DATASET - BAYESIAN NETWORK APPLICATION")
print("=" * 60)
print(f"\nDataset shape: {X_iris.shape}")
print(f"Features: {feature_names_iris}")
print(f"Classes: {target_names_iris.tolist()}")
print(f"\nClass distribution:")
for i, name in enumerate(target_names_iris):
    count = np.sum(y_iris == i)
    print(f"  {name}: {count} samples ({count/len(y_iris)*100:.1f}%)")

In [None]:
# Split and train
X_train_iris, X_test_iris, y_train_iris, y_test_iris = train_test_split(
    X_iris, y_iris, test_size=0.3, random_state=42, stratify=y_iris
)

# Train Naive Bayes (Gaussian) - This is a Bayesian Network!
nb_iris = GaussianNB()
nb_iris.fit(X_train_iris, y_train_iris)

# Predictions
y_pred_iris = nb_iris.predict(X_test_iris)
y_pred_proba_iris = nb_iris.predict_proba(X_test_iris)

# Evaluate
accuracy_iris = accuracy_score(y_test_iris, y_pred_iris)

print(f"\nTest Accuracy: {accuracy_iris:.4f} ({accuracy_iris*100:.2f}%)")
print("\n" + "="*60)
print("CLASSIFICATION REPORT")
print("="*60)
print(classification_report(y_test_iris, y_pred_iris, target_names=target_names_iris))

In [None]:
# Visualize the learned parameters (means and variances)
fig, axes = plt.subplots(2, 2, figsize=(14, 10))
axes = axes.flatten()

# Plot the learned Gaussian distributions for each feature
for feature_idx in range(4):
    ax = axes[feature_idx]
    
    # Get feature values range
    feature_range = np.linspace(X_iris[:, feature_idx].min(), 
                                X_iris[:, feature_idx].max(), 200)
    
    # Plot learned distributions for each class
    for class_idx in range(3):
        mean = nb_iris.theta_[class_idx, feature_idx]
        var = nb_iris.var_[class_idx, feature_idx]
        
        # Calculate Gaussian PDF
        pdf = stats.norm.pdf(feature_range, mean, np.sqrt(var))
        
        ax.plot(feature_range, pdf, linewidth=2, 
                label=f'{target_names_iris[class_idx]}')
    
    ax.set_xlabel(feature_names_iris[feature_idx], fontsize=10)
    ax.set_ylabel('Probability Density', fontsize=10)
    ax.set_title(f'Learned Distribution: {feature_names_iris[feature_idx]}', 
                 fontsize=11, fontweight='bold')
    ax.legend()
    ax.grid(True, alpha=0.3)

plt.suptitle('Gaussian Naive Bayes: Learned Feature Distributions per Class', 
             fontsize=14, fontweight='bold', y=0.995)
plt.tight_layout()
plt.show()

print("\n📈 These plots show the learned conditional distributions P(Feature | Class)")
print("   The model learns the mean and variance of each feature for each class.")

In [None]:
# Visualize prediction probabilities
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Confusion Matrix
cm_iris = confusion_matrix(y_test_iris, y_pred_iris)
sns.heatmap(cm_iris, annot=True, fmt='d', cmap='Blues', ax=axes[0],
            xticklabels=target_names_iris, yticklabels=target_names_iris)
axes[0].set_title('Confusion Matrix', fontsize=12, fontweight='bold')
axes[0].set_ylabel('True Label')
axes[0].set_xlabel('Predicted Label')

# Probability distribution for predictions
# Show max probability for each prediction
max_probs = y_pred_proba_iris.max(axis=1)
correct_predictions = y_pred_iris == y_test_iris

axes[1].hist(max_probs[correct_predictions], bins=20, alpha=0.6, 
             label='Correct Predictions', color='green', edgecolor='black')
axes[1].hist(max_probs[~correct_predictions], bins=20, alpha=0.6, 
             label='Incorrect Predictions', color='red', edgecolor='black')
axes[1].set_xlabel('Maximum Predicted Probability', fontsize=11)
axes[1].set_ylabel('Frequency', fontsize=11)
axes[1].set_title('Prediction Confidence Distribution', fontsize=12, fontweight='bold')
axes[1].legend()
axes[1].grid(True, alpha=0.3, axis='y')

plt.tight_layout()
plt.show()

print(f"\n✅ Correct predictions tend to have higher probabilities (higher confidence)")
print(f"❌ Incorrect predictions often have lower probabilities (lower confidence)")
print(f"\nThis uncertainty quantification is a key advantage of probabilistic models!")

---
## Summary and Key Takeaways

### Bayesian Inference Basics:

1. **Bayes' Theorem** provides a principled way to update beliefs based on evidence
2. **Prior + Likelihood → Posterior**: We combine prior knowledge with observed data
3. **Sequential Updating**: Posteriors become more confident with more data
4. **Credible Intervals**: Provide probabilistic statements about parameters (95% CI means "95% probability the parameter is in this range")
5. **Practical Applications**: Medical diagnosis, A/B testing, classification, etc.

### Probabilistic Graphical Models:

1. **PGMs** represent complex probability distributions using graphs
2. **Nodes** = Random variables, **Edges** = Dependencies
3. **Bayesian Networks** use directed graphs to encode conditional dependencies
4. **Inference** allows us to answer probabilistic queries given evidence
5. **Naive Bayes** is a simple but powerful Bayesian Network for classification
6. **Real-world Applications**: 
   - Medical diagnosis (combining symptoms)
   - Spam filtering
   - Recommendation systems
   - Risk assessment

### Advantages of Probabilistic Methods:

✅ **Uncertainty Quantification**: Provides confidence in predictions  
✅ **Principled Framework**: Based on probability theory  
✅ **Incorporates Prior Knowledge**: Can use domain expertise  
✅ **Interpretable**: Clear probabilistic interpretation  
✅ **Handles Missing Data**: Natural framework for incomplete information  

### Next Steps:

- Explore more complex PGMs (Hidden Markov Models, Conditional Random Fields)
- Study Markov Chain Monte Carlo (MCMC) for complex inference
- Learn about variational inference for scalable Bayesian methods
- Apply to real-world problems in your domain

### Further Reading:

- "Pattern Recognition and Machine Learning" by Christopher Bishop
- "Probabilistic Graphical Models" by Daphne Koller and Nir Friedman
- "Bayesian Data Analysis" by Andrew Gelman et al.
- PyMC (formerly PyMC3) documentation for practical Bayesian modeling

---
## Exercises (Optional)

Try these exercises to deepen your understanding:

### Exercise 1: Different Priors
Modify the coin flip example to use different priors:
- Informative prior: Beta(5, 5) (belief that coin is fair)
- Biased prior: Beta(10, 2) (belief that coin favors heads)

How does the choice of prior affect the posterior?

### Exercise 2: Expand the Medical Diagnosis Network
Add a third symptom (e.g., headache) to the medical diagnosis Bayesian Network and perform inference.

### Exercise 3: Apply to Your Own Data
Choose a dataset relevant to your field and apply Naive Bayes classification. Analyze:
- Which features are most discriminative?
- How confident is the model in its predictions?
- What are common misclassifications?

### Exercise 4: A/B Testing
Implement Bayesian A/B testing for a scenario where:
- Variant A: 50 conversions out of 500 visitors
- Variant B: 65 conversions out of 500 visitors

What's the probability that B is better than A?