# Summer of Code - Artificial Intelligence
## Week 03: Descriptive Statistics and Probability
### Day 02: Probability

In this notebook, we will explore the fundamental concepts of **probability** that form the backbone of machine learning and data science.

In [None]:
# Import necessary libraries
import warnings
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from scipy.stats import norm, binom, poisson, uniform, expon

warnings.filterwarnings('ignore')

# Set up plotting style
plt.style.use('seaborn-v0_8')
sns.set_palette("husl")
plt.rcParams['figure.figsize'] = (10, 6)
plt.rcParams['font.size'] = 12

# What is Probability?

**Probability** is a measure of the likelihood that an event will occur. It's expressed as a number between 0 and 1, where:
- **0** means the event will never occur (impossible)
- **1** means the event will always occur (certain)
- **0.5** means the event is equally likely to occur or not occur

### Key Concepts

1. **Sample Space (S)**: The set of all possible outcomes of an experiment
2. **Event (E)**: A subset of the sample space
3. **Probability of an Event**: When all outcomes are equally likely, P(E) = Number of favorable outcomes / Total number of possible outcomes

### Probability Axioms (Kolmogorov Axioms)

1. **Non-negativity**: P(A) ≥ 0 for any event A
2. **Normalization**: P(S) = 1 (probability of sample space is 1)
3. **Additivity**: For mutually exclusive events A and B, P(A ∪ B) = P(A) + P(B) (the same holds for any countable collection of mutually exclusive events)
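The three axioms can be checked directly on a small sample space. The sketch below assumes a fair six-sided die (every outcome equally likely) and uses a hypothetical helper `prob` that is not part of the notebook; exact fractions avoid floating-point noise.

```python
from fractions import Fraction

# Assumption for this sketch: a fair die, so each outcome has probability 1/6
sample_space = {1, 2, 3, 4, 5, 6}

def prob(event):
    """Probability of an event (a subset of the sample space) under equally likely outcomes."""
    return Fraction(len(event & sample_space), len(sample_space))

even = {2, 4, 6}
odd_below_4 = {1, 3}  # shares no outcome with `even`

# Axiom 1: non-negativity
assert prob(even) >= 0
# Axiom 2: normalization
assert prob(sample_space) == 1
# Axiom 3: additivity for mutually exclusive events
assert even & odd_below_4 == set()
assert prob(even | odd_below_4) == prob(even) + prob(odd_below_4)

print(prob(even), prob(odd_below_4), prob(even | odd_below_4))  # prints: 1/2 1/3 5/6
```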

Let's explore these concepts with practical examples!


In [None]:
# Example 1: Rolling a Fair Die
print("=== Example 1: Rolling a Fair Die ===")
print()

# Sample space for rolling a die
sample_space = [1, 2, 3, 4, 5, 6]
print(f"Sample Space: {sample_space}")
print(f"Total number of outcomes: {len(sample_space)}")
print()

# Probability of rolling an even number
even_numbers = [2, 4, 6]
prob_even = len(even_numbers) / len(sample_space)
print(f"Even numbers: {even_numbers}")
print(f"P(Even number) = {len(even_numbers)}/{len(sample_space)} = {prob_even:.2f}")
print()

# Probability of rolling a number greater than 4
greater_than_4 = [5, 6]
prob_greater_than_4 = len(greater_than_4) / len(sample_space)
print(f"Numbers > 4: {greater_than_4}")
print(f"P(Number > 4) = {len(greater_than_4)}/{len(sample_space)} = {prob_greater_than_4:.2f}")
print()

# Example 2: Coin Toss
print("=== Example 2: Coin Toss ===")
print()

# Sample space for coin toss
coin_sample_space = ['H', 'T']  # H = Heads, T = Tails
print(f"Sample Space: {coin_sample_space}")
print("P(Heads) = 1/2 = 0.5")
print("P(Tails) = 1/2 = 0.5")
print("P(Heads) + P(Tails) = 0.5 + 0.5 = 1.0 ✓")


# Joint, Marginal, and Conditional Probability

## Joint Probability

**Joint Probability** $P(A \cap B)$ is the probability that both events A and B occur simultaneously.

**Formula**: $P(A \cap B) = P(A \text{ and } B)$

## Marginal Probability

**Marginal Probability** is the probability of a single event occurring, regardless of other events. It's obtained by summing the joint probabilities across all possible values of the other variable.

**Formula**: $P(A) = \sum_{i} P(A \cap B_i)$, where the events $B_i$ are mutually exclusive and together cover the whole sample space


## Conditional Probability

**Conditional Probability** $P(A \mid B)$ is the probability of event A occurring given that event B has occurred.

**Formula**: $P(A \mid B) = \frac{P(A \cap B)}{P(B)}$, defined when $P(B) > 0$

### Independence

Two events A and B are **independent** if:
P(A ∩ B) = P(A) × P(B)

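Equivalently, when P(A) > 0 and P(B) > 0: P(A|B) = P(A) and P(B|A) = P(B). Knowing that one event occurred does not change the probability of the other.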
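One way to test independence numerically is to compare each joint probability with the product of its marginals. The sketch below uses a hypothetical 2×2 joint table whose values were chosen for illustration only; they are not data from this notebook.

```python
import numpy as np

# Hypothetical joint table (illustrative values)
# rows: A, not A; columns: B, not B
joint = np.array([[0.12, 0.28],
                  [0.18, 0.42]])

p_a = joint.sum(axis=1)  # marginal over B: [P(A), P(not A)]
p_b = joint.sum(axis=0)  # marginal over A: [P(B), P(not B)]

# Under independence, every joint cell equals the product of its marginals
product_of_marginals = np.outer(p_a, p_b)
is_independent = np.allclose(joint, product_of_marginals)

# Conditional probability: P(A|B) = P(A ∩ B) / P(B)
p_a_given_b = joint[0, 0] / p_b[0]

print(f"P(A) = {p_a[0]:.2f}, P(B) = {p_b[0]:.2f}, P(A|B) = {p_a_given_b:.2f}")
print(f"Independent: {is_independent}")
```

Here P(A|B) equals P(A), which is what independence predicts. Changing any single cell (and renormalizing) generally breaks the equality.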

Let's explore these concepts with a practical example!


In [None]:
# Example: Student Survey Data
print("=== Example: Student Survey Data ===")
print()

# Let's create a contingency table for student preferences
# Rows: Gender (M/F), Columns: Subject Preference (Math/Science/Arts)

# Joint probabilities (given data)
joint_probs = {
    ("Male", "Math"): 0.15,
    ("Male", "Science"): 0.20,
    ("Male", "Arts"): 0.10,
    ("Female", "Math"): 0.10,
    ("Female", "Science"): 0.25,
    ("Female", "Arts"): 0.20,
}

print("Joint Probability Table:")
print("=" * 50)
print(f"{'Gender':<8} {'Math':<8} {'Science':<10} {'Arts':<8}")
print("-" * 50)
print(
    f"{'Male':<8} {joint_probs[('Male', 'Math')]:<8.2f} {joint_probs[('Male', 'Science')]:<10.2f} {joint_probs[('Male', 'Arts')]:<8.2f}"
)
print(
    f"{'Female':<8} {joint_probs[('Female', 'Math')]:<8.2f} {joint_probs[('Female', 'Science')]:<10.2f} {joint_probs[('Female', 'Arts')]:<8.2f}"
)
print()

# Calculate marginal probabilities
print("Marginal Probabilities:")
print("=" * 30)

# Marginal probabilities for Gender
P_Male = (
    joint_probs[("Male", "Math")]
    + joint_probs[("Male", "Science")]
    + joint_probs[("Male", "Arts")]
)
P_Female = (
    joint_probs[("Female", "Math")]
    + joint_probs[("Female", "Science")]
    + joint_probs[("Female", "Arts")]
)

print(
    f"P(Male) = {joint_probs[('Male', 'Math')]:.2f} + {joint_probs[('Male', 'Science')]:.2f} + {joint_probs[('Male', 'Arts')]:.2f} = {P_Male:.2f}"
)
print(
    f"P(Female) = {joint_probs[('Female', 'Math')]:.2f} + {joint_probs[('Female', 'Science')]:.2f} + {joint_probs[('Female', 'Arts')]:.2f} = {P_Female:.2f}"
)
print()

# Marginal probabilities for Subject
P_Math = joint_probs[("Male", "Math")] + joint_probs[("Female", "Math")]
P_Science = joint_probs[("Male", "Science")] + joint_probs[("Female", "Science")]
P_Arts = joint_probs[("Male", "Arts")] + joint_probs[("Female", "Arts")]

print(
    f"P(Math) = {joint_probs[('Male', 'Math')]:.2f} + {joint_probs[('Female', 'Math')]:.2f} = {P_Math:.2f}"
)
print(
    f"P(Science) = {joint_probs[('Male', 'Science')]:.2f} + {joint_probs[('Female', 'Science')]:.2f} = {P_Science:.2f}"
)
print(
    f"P(Arts) = {joint_probs[('Male', 'Arts')]:.2f} + {joint_probs[('Female', 'Arts')]:.2f} = {P_Arts:.2f}"
)
print()

# Verify that probabilities sum to 1
total_prob = P_Male + P_Female
print(
    f"Verification: P(Male) + P(Female) = {P_Male:.2f} + {P_Female:.2f} = {total_prob:.2f} ✓"
)
print()

# Calculate conditional probabilities
print("Conditional Probabilities:")
print("=" * 30)

# P(Math|Male) = P(Math ∩ Male) / P(Male)
P_Math_given_Male = joint_probs[("Male", "Math")] / P_Male
print(
    f"P(Math|Male) = P(Math ∩ Male) / P(Male) = {joint_probs[('Male', 'Math')]:.2f} / {P_Male:.2f} = {P_Math_given_Male:.2f}"
)

# P(Science|Female) = P(Science ∩ Female) / P(Female)
P_Science_given_Female = joint_probs[("Female", "Science")] / P_Female
print(
    f"P(Science|Female) = P(Science ∩ Female) / P(Female) = {joint_probs[('Female', 'Science')]:.2f} / {P_Female:.2f} = {P_Science_given_Female:.2f}"
)

# P(Female|Arts) = P(Female ∩ Arts) / P(Arts)
P_Female_given_Arts = joint_probs[("Female", "Arts")] / P_Arts
print(
    f"P(Female|Arts) = P(Female ∩ Arts) / P(Arts) = {joint_probs[('Female', 'Arts')]:.2f} / {P_Arts:.2f} = {P_Female_given_Arts:.2f}"
)


# Probability Distributions

A **probability distribution** describes how probabilities are distributed over the values of a random variable. It provides a complete description of the probability structure of a random phenomenon.


### Key Components

1. **Random Variable (X)**: A variable that assigns a numerical value to each outcome of a random phenomenon
2. **Probability Mass Function (PMF)**: For discrete variables, gives P(X = x)
3. **Probability Density Function (PDF)**: For continuous variables, gives the density at point x; probabilities are areas under the density curve, so P(X = x) = 0 for any single point
4. **Cumulative Distribution Function (CDF)**: Gives P(X ≤ x)
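The relationships between these functions can be checked numerically. The following is a minimal sketch with arbitrary illustration parameters (a binomial with n = 4, p = 0.5 and a standard normal): the discrete CDF is the running sum of the PMF, and a continuous probability is the area under the PDF, approximated here with a hand-written trapezoidal rule.

```python
import numpy as np
from scipy.stats import binom, norm

# Discrete case: the CDF is the running sum of the PMF
k = np.arange(0, 5)
pmf = binom.pmf(k, 4, 0.5)
cdf_from_pmf = np.cumsum(pmf)
print(np.allclose(cdf_from_pmf, binom.cdf(k, 4, 0.5)))

# Continuous case: P(-1 <= X <= 1) is the area under the PDF between -1 and 1
x = np.linspace(-1, 1, 2001)
y = norm.pdf(x)
area = np.sum((y[:-1] + y[1:]) / 2 * np.diff(x))  # trapezoidal rule
print(f"Area under PDF: {area:.4f}, CDF difference: {norm.cdf(1) - norm.cdf(-1):.4f}")
```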


## Discrete Probability Distributions

### Binomial Distribution

The **Binomial Distribution** models the number of successes in n independent Bernoulli trials, each with the same success probability p.

**Parameters**:
- n: number of trials
- p: probability of success in each trial

**Example**: Flipping a coin 10 times, counting heads
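The binomial PMF is P(X = k) = C(n, k) × p^k × (1-p)^(n-k), where C(n, k) counts the ways to choose which k trials succeed. As a rough check, the sketch below evaluates that formula by hand and compares it with SciPy; `binomial_pmf` is a hypothetical helper written for this illustration.

```python
from math import comb
from scipy.stats import binom

def binomial_pmf(k, n, p):
    """P(X = k) = C(n, k) * p**k * (1 - p)**(n - k)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 10, 0.5  # 10 flips of a fair coin, matching the example above
manual = binomial_pmf(5, n, p)
print(f"Manual: {manual:.4f}, SciPy: {binom.pmf(5, n, p):.4f}")

# The PMF sums to 1 over k = 0..n
total = sum(binomial_pmf(k, n, p) for k in range(n + 1))
print(f"Sum over all k: {total:.4f}")
```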


In [None]:
# Binomial Distribution Example
print("=== Binomial Distribution ===")
print()

# Parameters
n = 10  # number of trials
p = 0.5  # probability of success (fair coin)

# Generate binomial distribution
k_values = np.arange(0, n + 1)
binomial_probs = binom.pmf(k_values, n, p)

print(f"Parameters: n = {n}, p = {p}")
print(f"Mean: μ = n×p = {n}×{p} = {n*p}")
print(f"Variance: σ² = n×p×(1-p) = {n}×{p}×{1-p} = {n*p*(1-p):.2f}")
print(f"Standard Deviation: σ = √(n×p×(1-p)) = {np.sqrt(n*p*(1-p)):.2f}")
print()

# Display probabilities
print("Probability Mass Function:")
print("=" * 40)
for k, prob in zip(k_values, binomial_probs):
    print(f"P(X = {k:2d}) = {prob:.4f}")

# Visualization
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(15, 6))

# PMF Plot
ax1.bar(k_values, binomial_probs, alpha=0.7, color='skyblue', edgecolor='navy')
ax1.set_xlabel('Number of Successes (k)')
ax1.set_ylabel('Probability P(X = k)')
ax1.set_title('Binomial Distribution PMF\n(n=10, p=0.5)')
ax1.grid(True, alpha=0.3)

# CDF Plot
cdf_values = binom.cdf(k_values, n, p)
ax2.step(k_values, cdf_values, where='post', linewidth=2, color='red')
ax2.set_xlabel('Number of Successes (k)')
ax2.set_ylabel('Cumulative Probability P(X ≤ k)')
ax2.set_title('Binomial Distribution CDF\n(n=10, p=0.5)')
ax2.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

# Practical example
print("\n=== Practical Example ===")
print("What's the probability of getting exactly 5 heads in 10 coin flips?")
prob_5_heads = binom.pmf(5, n, p)
print(f"P(X = 5) = {prob_5_heads:.4f} = {prob_5_heads*100:.2f}%")

print("\nWhat's the probability of getting at most 3 heads?")
prob_at_most_3 = binom.cdf(3, n, p)
print(f"P(X ≤ 3) = {prob_at_most_3:.4f} = {prob_at_most_3*100:.2f}%")


### Poisson Distribution

The **Poisson Distribution** models the number of events occurring in a fixed interval of time or space, assuming events occur independently at a constant average rate.

**Parameters**:
- λ (lambda): average number of events per interval

**PMF**: P(X = k) = (λ^k × e^(-λ)) / k!

**Example**: Number of emails received per hour
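The PMF above can be evaluated directly with the standard library. The sketch below assumes λ = 3 purely for illustration and compares a hypothetical hand-written `poisson_pmf` with `scipy.stats.poisson.pmf`.

```python
from math import exp, factorial
from scipy.stats import poisson

def poisson_pmf(k, lam):
    """P(X = k) = lam**k * exp(-lam) / k!"""
    return lam**k * exp(-lam) / factorial(k)

lam = 3  # illustrative average number of events per interval
for k in range(4):
    print(f"k = {k}: manual = {poisson_pmf(k, lam):.4f}, SciPy = {poisson.pmf(k, lam):.4f}")
```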


In [None]:
# Poisson Distribution Example
print("=== Poisson Distribution ===")
print()

# Parameters
lambda_param = 3  # average rate (e.g., 3 emails per hour)

# Generate Poisson distribution
k_values = np.arange(0, 15)  # 0 to 14 events
poisson_probs = poisson.pmf(k_values, lambda_param)

print(f"Parameter: λ = {lambda_param}")
print(f"Mean: μ = λ = {lambda_param}")
print(f"Variance: σ² = λ = {lambda_param}")
print(f"Standard Deviation: σ = √λ = {np.sqrt(lambda_param):.2f}")
print()

# Display probabilities
print("Probability Mass Function:")
print("=" * 40)
for k, prob in zip(k_values[:8], poisson_probs[:8]):  # Show first 8
    print(f"P(X = {k:2d}) = {prob:.4f}")
print("...")

# Visualization
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(15, 6))

# PMF Plot
ax1.bar(k_values, poisson_probs, alpha=0.7, color='lightcoral', edgecolor='darkred')
ax1.set_xlabel('Number of Events (k)')
ax1.set_ylabel('Probability P(X = k)')
ax1.set_title(f'Poisson Distribution PMF\n(λ = {lambda_param})')
ax1.grid(True, alpha=0.3)

# CDF Plot
cdf_values = poisson.cdf(k_values, lambda_param)
ax2.step(k_values, cdf_values, where='post', linewidth=2, color='purple')
ax2.set_xlabel('Number of Events (k)')
ax2.set_ylabel('Cumulative Probability P(X ≤ k)')
ax2.set_title(f'Poisson Distribution CDF\n(λ = {lambda_param})')
ax2.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

# Practical example
print("\n=== Practical Example ===")
print("What's the probability of receiving exactly 2 emails in an hour?")
prob_2_emails = poisson.pmf(2, lambda_param)
print(f"P(X = 2) = {prob_2_emails:.4f} = {prob_2_emails*100:.2f}%")

print("\nWhat's the probability of receiving at most 1 email?")
prob_at_most_1 = poisson.cdf(1, lambda_param)
print(f"P(X ≤ 1) = {prob_at_most_1:.4f} = {prob_at_most_1*100:.2f}%")

print("\nWhat's the probability of receiving more than 5 emails?")
prob_more_than_5 = 1 - poisson.cdf(5, lambda_param)
print(f"P(X > 5) = 1 - P(X ≤ 5) = 1 - {poisson.cdf(5, lambda_param):.4f} = {prob_more_than_5:.4f}")


## Continuous Probability Distributions

### Normal Distribution (Gaussian Distribution)

The **Normal Distribution** is the most widely used continuous distribution, characterized by its symmetric, bell-shaped curve.

**Parameters**:
- μ (mu): mean
- σ (sigma): standard deviation

**PDF**: f(x) = 1/(σ√(2π)) × e^(-½((x-μ)/σ)²)

**Properties**:
- Symmetric about the mean
- 68-95-99.7 rule: about 68% of values lie within 1σ of the mean, 95% within 2σ, and 99.7% within 3σ
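To make the PDF formula concrete, here is a minimal sketch that implements it by hand and compares it with `scipy.stats.norm.pdf`. The helper name `normal_pdf` and the parameter values are assumptions chosen for illustration.

```python
import numpy as np
from scipy.stats import norm

def normal_pdf(x, mu, sigma):
    """f(x) = exp(-0.5 * ((x - mu) / sigma)**2) / (sigma * sqrt(2 * pi))."""
    z = (x - mu) / sigma
    return np.exp(-0.5 * z**2) / (sigma * np.sqrt(2 * np.pi))

x = np.array([-1.0, 0.0, 2.5])
mu, sigma = 1.0, 2.0  # illustrative parameters
matches = np.allclose(normal_pdf(x, mu, sigma), norm.pdf(x, mu, sigma))
print(f"Hand-written PDF matches SciPy: {matches}")
```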


In [None]:
# Normal Distribution Example
print("=== Normal Distribution ===")
print()

# Parameters
mu = 0      # mean
sigma = 1   # standard deviation

# Generate normal distribution
x = np.linspace(-4, 4, 1000)
pdf_values = norm.pdf(x, mu, sigma)
cdf_values = norm.cdf(x, mu, sigma)

print(f"Parameters: μ = {mu}, σ = {sigma}")
print(f"Mean: μ = {mu}")
print(f"Variance: σ² = {sigma**2}")
print(f"Standard Deviation: σ = {sigma}")
print()

# Visualization
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(15, 6))

# PDF Plot
ax1.plot(x, pdf_values, 'b-', linewidth=2, label='PDF')
ax1.fill_between(x, pdf_values, alpha=0.3, color='blue')
ax1.axvline(mu, color='red', linestyle='--', label=f'Mean (μ = {mu})')
ax1.set_xlabel('x')
ax1.set_ylabel('Probability Density f(x)')
ax1.set_title(f'Normal Distribution PDF\n(μ = {mu}, σ = {sigma})')
ax1.legend()
ax1.grid(True, alpha=0.3)

# CDF Plot
ax2.plot(x, cdf_values, 'r-', linewidth=2, label='CDF')
ax2.axvline(mu, color='red', linestyle='--', label=f'Mean (μ = {mu})')
ax2.set_xlabel('x')
ax2.set_ylabel('Cumulative Probability F(x)')
ax2.set_title(f'Normal Distribution CDF\n(μ = {mu}, σ = {sigma})')
ax2.legend()
ax2.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

# 68-95-99.7 Rule demonstration
print("=== 68-95-99.7 Rule ===")
print()

# Calculate probabilities for different ranges
prob_1_sigma = norm.cdf(mu + sigma, mu, sigma) - norm.cdf(mu - sigma, mu, sigma)
prob_2_sigma = norm.cdf(mu + 2*sigma, mu, sigma) - norm.cdf(mu - 2*sigma, mu, sigma)
prob_3_sigma = norm.cdf(mu + 3*sigma, mu, sigma) - norm.cdf(mu - 3*sigma, mu, sigma)

print(f"P(μ - σ ≤ X ≤ μ + σ) = P({mu-sigma:.1f} ≤ X ≤ {mu+sigma:.1f}) = {prob_1_sigma:.3f} ≈ 68%")
print(f"P(μ - 2σ ≤ X ≤ μ + 2σ) = P({mu-2*sigma:.1f} ≤ X ≤ {mu+2*sigma:.1f}) = {prob_2_sigma:.3f} ≈ 95%")
print(f"P(μ - 3σ ≤ X ≤ μ + 3σ) = P({mu-3*sigma:.1f} ≤ X ≤ {mu+3*sigma:.1f}) = {prob_3_sigma:.3f} ≈ 99.7%")
print()

# Practical examples
print("=== Practical Examples ===")
print("What's the probability that X ≤ 1?")
prob_less_than_1 = norm.cdf(1, mu, sigma)
print(f"P(X ≤ 1) = {prob_less_than_1:.4f} = {prob_less_than_1*100:.2f}%")

print("\nWhat's the probability that X > 2?")
prob_greater_than_2 = 1 - norm.cdf(2, mu, sigma)
print(f"P(X > 2) = 1 - P(X ≤ 2) = 1 - {norm.cdf(2, mu, sigma):.4f} = {prob_greater_than_2:.4f}")

print("\nWhat's the probability that -1 ≤ X ≤ 1?")
prob_between = norm.cdf(1, mu, sigma) - norm.cdf(-1, mu, sigma)
print(f"P(-1 ≤ X ≤ 1) = P(X ≤ 1) - P(X ≤ -1) = {norm.cdf(1, mu, sigma):.4f} - {norm.cdf(-1, mu, sigma):.4f} = {prob_between:.4f}")


### Uniform Distribution

The **Uniform Distribution** has equal probability density over a specified interval.

**Parameters**:
- a: lower bound
- b: upper bound

**PDF**: f(x) = 1/(b-a) for a ≤ x ≤ b, 0 otherwise

**Example**: Random number generation
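SciPy parameterizes the uniform distribution by `loc` and `scale`, with support on [loc, loc + scale], so an interval [a, b] corresponds to `loc=a, scale=b-a`. The sketch below uses illustrative bounds a = 2, b = 5 and also draws samples with NumPy to compare empirical moments against (a + b)/2 and (b - a)²/12.

```python
import numpy as np
from scipy.stats import uniform

a, b = 2.0, 5.0                     # illustrative bounds
dist = uniform(loc=a, scale=b - a)  # support is [loc, loc + scale] = [a, b]

print(f"f(3) = {dist.pdf(3.0):.4f}")  # inside the interval: 1 / (b - a)
print(f"f(6) = {dist.pdf(6.0):.4f}")  # outside the interval: 0
print(f"Mean = {dist.mean():.2f}, Variance = {dist.var():.2f}")

# Empirical check with random samples (fixed seed for reproducibility)
rng = np.random.default_rng(0)
samples = rng.uniform(a, b, size=100_000)
print(f"Sample mean = {samples.mean():.2f}, Sample variance = {samples.var():.2f}")
```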


In [None]:
# Uniform Distribution Example
print("=== Uniform Distribution ===")
print()

# Parameters
a = 0  # lower bound
b = 10  # upper bound

# Generate uniform distribution
x = np.linspace(-2, 12, 1000)
pdf_values = uniform.pdf(x, a, b-a)
cdf_values = uniform.cdf(x, a, b-a)

print(f"Parameters: a = {a}, b = {b}")
print(f"Mean: μ = (a + b)/2 = ({a} + {b})/2 = {(a+b)/2}")
print(f"Variance: σ² = (b-a)²/12 = ({b}-{a})²/12 = {((b-a)**2)/12:.2f}")
print(f"Standard Deviation: σ = (b-a)/√12 = {((b-a)/np.sqrt(12)):.2f}")
print()

# Visualization
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(15, 6))

# PDF Plot
ax1.plot(x, pdf_values, 'g-', linewidth=2, label='PDF')
ax1.fill_between(x, pdf_values, alpha=0.3, color='green')
ax1.axvline(a, color='red', linestyle='--', label=f'Lower bound (a = {a})')
ax1.axvline(b, color='red', linestyle='--', label=f'Upper bound (b = {b})')
ax1.set_xlabel('x')
ax1.set_ylabel('Probability Density f(x)')
ax1.set_title(f'Uniform Distribution PDF\n(a = {a}, b = {b})')
ax1.legend()
ax1.grid(True, alpha=0.3)

# CDF Plot
ax2.plot(x, cdf_values, 'orange', linewidth=2, label='CDF')
ax2.axvline(a, color='red', linestyle='--', label=f'Lower bound (a = {a})')
ax2.axvline(b, color='red', linestyle='--', label=f'Upper bound (b = {b})')
ax2.set_xlabel('x')
ax2.set_ylabel('Cumulative Probability F(x)')
ax2.set_title(f'Uniform Distribution CDF\n(a = {a}, b = {b})')
ax2.legend()
ax2.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

# Practical examples
print("=== Practical Examples ===")
print("What's the probability that X ≤ 3?")
prob_less_than_3 = uniform.cdf(3, a, b-a)
print(f"P(X ≤ 3) = {prob_less_than_3:.4f} = {prob_less_than_3*100:.2f}%")

print("\nWhat's the probability that 2 ≤ X ≤ 8?")
prob_between = uniform.cdf(8, a, b-a) - uniform.cdf(2, a, b-a)
print(f"P(2 ≤ X ≤ 8) = P(X ≤ 8) - P(X ≤ 2) = {uniform.cdf(8, a, b-a):.4f} - {uniform.cdf(2, a, b-a):.4f} = {prob_between:.4f}")

print("\nWhat's the probability density at x = 5?")
pdf_at_5 = uniform.pdf(5, a, b-a)
print(f"f(5) = {pdf_at_5:.4f}")

# Compare different distributions
print("\n=== Distribution Comparison ===")
print("Let's compare Normal, Uniform, and Exponential distributions:")
print()

# Generate data for comparison
x_norm = np.linspace(-3, 8, 1000)
x_unif = np.linspace(-1, 11, 1000)
x_exp = np.linspace(0, 8, 1000)

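pdf_norm = norm.pdf(x_norm, 2, 1)      # loc = μ = 2, scale = σ = 1
pdf_unif = uniform.pdf(x_unif, 0, 10)  # loc = a = 0, scale = b - a = 10
pdf_exp = expon.pdf(x_exp, 0, 2)       # loc = 0, scale = 1/λ = 2, so λ = 0.5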

plt.figure(figsize=(12, 6))
plt.plot(x_norm, pdf_norm, 'b-', linewidth=2, label='Normal (μ=2, σ=1)')
plt.plot(x_unif, pdf_unif, 'g-', linewidth=2, label='Uniform (a=0, b=10)')
plt.plot(x_exp, pdf_exp, 'r-', linewidth=2, label='Exponential (λ=0.5)')
plt.xlabel('x')
plt.ylabel('Probability Density')
plt.title('Comparison of Probability Distributions')
plt.legend()
plt.grid(True, alpha=0.3)
plt.show()


## Bayesian Probability

### What is Bayesian Probability?

**Bayesian Probability** is a framework for updating our beliefs about the probability of an event as we gather more evidence. It's based on Bayes' Theorem, which provides a way to revise prior probabilities with new information.

### Bayes' Theorem

**Formula**: P(A|B) = P(B|A) × P(A) / P(B), for P(B) > 0

Where:
- **P(A|B)**: Posterior probability (updated belief)
- **P(B|A)**: Likelihood (probability of evidence given hypothesis)
- **P(A)**: Prior probability (initial belief)
- **P(B)**: Marginal likelihood (probability of evidence)
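Bayes' theorem follows in two steps from the definition of conditional probability given earlier in this notebook. A short derivation, assuming $P(A) > 0$ and $P(B) > 0$ and writing $\neg A$ for "A does not occur" (notation introduced here for illustration):

$$
P(A \mid B) = \frac{P(A \cap B)}{P(B)}, \qquad P(B \mid A) = \frac{P(A \cap B)}{P(A)} \;\Rightarrow\; P(A \cap B) = P(B \mid A)\,P(A)
$$

$$
P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}, \qquad P(B) = P(B \mid A)\,P(A) + P(B \mid \neg A)\,P(\neg A)
$$

The expression for $P(B)$ is the law of total probability. It is how the denominator is usually computed when only the prior and the two likelihoods are known.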

### Key Concepts

1. **Prior**: Our initial belief about the probability of an event
2. **Likelihood**: How likely the observed evidence is given our hypothesis
3. **Posterior**: Our updated belief after considering the evidence
4. **Evidence**: New information that helps us update our beliefs
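A simulation gives an empirical feel for how prior, likelihood, and posterior fit together. The sketch below draws a synthetic population and compares the simulated posterior with the value from Bayes' theorem; the prior, sensitivity, and false positive rate mirror the medical diagnosis example in this notebook, while the population size and seed are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(42)  # fixed seed for reproducibility
n_people = 1_000_000

prior = 0.01                # P(Disease)
sensitivity = 0.95          # P(Positive | Disease)
false_positive_rate = 0.05  # P(Positive | No Disease)

# Who has the disease
has_disease = rng.random(n_people) < prior
# Each person tests positive with a probability that depends on disease status
p_positive = np.where(has_disease, sensitivity, false_positive_rate)
tests_positive = rng.random(n_people) < p_positive

# Posterior estimate: fraction of positive testers who actually have the disease
simulated = has_disease[tests_positive].mean()
exact = sensitivity * prior / (sensitivity * prior + false_positive_rate * (1 - prior))

print(f"Simulated P(Disease | Positive): {simulated:.3f}")
print(f"Bayes' theorem:                  {exact:.3f}")
```

The two numbers should agree to roughly two decimal places; the gap shrinks as `n_people` grows.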

### Applications in Machine Learning

- **Naive Bayes Classifier**: Email spam detection, text classification
- **Bayesian Networks**: Probabilistic graphical models
- **Bayesian Optimization**: Hyperparameter tuning
- **Bayesian Neural Networks**: Uncertainty quantification

Let's explore Bayesian probability with practical examples!


In [None]:
# Example 1: Medical Diagnosis (Classic Bayesian Example)
print("=== Example 1: Medical Diagnosis ===")
print()

# Given information
P_disease = 0.01  # Prior: 1% of population has the disease
P_positive_given_disease = 0.95  # Likelihood (sensitivity): the test is positive for 95% of people who have the disease
P_positive_given_no_disease = 0.05  # False positive rate: 5%

print("Given Information:")
print(f"P(Disease) = {P_disease:.2f} (Prior)")
print(f"P(Positive Test | Disease) = {P_positive_given_disease:.2f} (Sensitivity)")
print(f"P(Positive Test | No Disease) = {P_positive_given_no_disease:.2f} (False Positive Rate)")
print()

# Calculate marginal probability P(Positive Test)
P_positive = P_positive_given_disease * P_disease + P_positive_given_no_disease * (1 - P_disease)
print(f"P(Positive Test) = P(Positive|Disease)×P(Disease) + P(Positive|No Disease)×P(No Disease)")
print(f"P(Positive Test) = {P_positive_given_disease:.2f}×{P_disease:.2f} + {P_positive_given_no_disease:.2f}×{1-P_disease:.2f}")
print(f"P(Positive Test) = {P_positive_given_disease * P_disease:.4f} + {P_positive_given_no_disease * (1 - P_disease):.4f} = {P_positive:.4f}")
print()

# Apply Bayes' Theorem
P_disease_given_positive = (P_positive_given_disease * P_disease) / P_positive
print("Applying Bayes' Theorem:")
print(f"P(Disease | Positive Test) = P(Positive|Disease) × P(Disease) / P(Positive Test)")
print(f"P(Disease | Positive Test) = {P_positive_given_disease:.2f} × {P_disease:.2f} / {P_positive:.3f}")
print(f"P(Disease | Positive Test) = {P_disease_given_positive:.3f} = {P_disease_given_positive*100:.1f}%")
print()

print("Interpretation:")
print(f"Even with a positive test result, there's only a {P_disease_given_positive*100:.1f}% chance of having the disease!")
print("This is because the disease is rare (1% prevalence) and the false positive rate is significant.")
print()

# Example 2: Email Spam Detection
print("=== Example 2: Email Spam Detection ===")
print()

# Given information for spam detection
P_spam = 0.3  # Prior: 30% of emails are spam
P_word_given_spam = 0.8  # Likelihood: 80% of spam emails contain word "free"
P_word_given_not_spam = 0.1  # Likelihood: 10% of legitimate emails contain word "free"

print("Given Information:")
print(f"P(Spam) = {P_spam:.2f} (Prior)")
print(f"P('free' | Spam) = {P_word_given_spam:.2f}")
print(f"P('free' | Not Spam) = {P_word_given_not_spam:.2f}")
print()

# Calculate marginal probability P('free')
P_word = P_word_given_spam * P_spam + P_word_given_not_spam * (1 - P_spam)
print(f"P('free') = P('free'|Spam)×P(Spam) + P('free'|Not Spam)×P(Not Spam)")
print(f"P('free') = {P_word_given_spam:.2f}×{P_spam:.2f} + {P_word_given_not_spam:.2f}×{1-P_spam:.2f}")
print(f"P('free') = {P_word_given_spam * P_spam:.2f} + {P_word_given_not_spam * (1 - P_spam):.2f} = {P_word:.2f}")
print()

# Apply Bayes' Theorem
P_spam_given_word = (P_word_given_spam * P_spam) / P_word
print("Applying Bayes' Theorem:")
print(f"P(Spam | 'free') = P('free'|Spam) × P(Spam) / P('free')")
print(f"P(Spam | 'free') = {P_word_given_spam:.2f} × {P_spam:.2f} / {P_word:.2f}")
print(f"P(Spam | 'free') = {P_spam_given_word:.3f} = {P_spam_given_word*100:.1f}%")
print()

print("Interpretation:")
print(f"An email containing the word 'free' has a {P_spam_given_word*100:.1f}% probability of being spam.")
print("This is much higher than the prior probability of 30%!")


In [None]:
# Sequential Bayesian Update Visualization
print("=== Sequential Bayesian Update ===")
print()

def bayesian_update(prior, likelihood_pos, likelihood_neg, evidence):
    """
    Update P(H) after one observation about an event E.

    prior:          P(H) before the observation
    likelihood_pos: P(E | H)
    likelihood_neg: P(E | not H)
    evidence:       1 if E was observed, 0 if E was observed NOT to happen
                    (the complements 1 - P(E | ...) are then used)
    """
    if evidence == 1:  # E was observed
        posterior = (likelihood_pos * prior) / (likelihood_pos * prior + likelihood_neg * (1 - prior))
    else:  # E did not happen
        posterior = ((1 - likelihood_pos) * prior) / ((1 - likelihood_pos) * prior + (1 - likelihood_neg) * (1 - prior))
    
    return posterior

# Example: Multiple pieces of evidence
print("Example: Multiple Evidence Updates")
print("=" * 40)

# Initial prior
prior = 0.1  # 10% chance of rain
print(f"Initial Prior: P(Rain) = {prior:.2f}")

# Evidence 1: Dark clouds
likelihood_pos_1 = 0.8  # P(Dark clouds | Rain)
likelihood_neg_1 = 0.2  # P(Dark clouds | No rain)

posterior_1 = bayesian_update(prior, likelihood_pos_1, likelihood_neg_1, 1)
print(f"After seeing dark clouds: P(Rain | Dark clouds) = {posterior_1:.3f}")

# Evidence 2: Barometer dropping
likelihood_pos_2 = 0.9  # P(Barometer drops | Rain)
likelihood_neg_2 = 0.1  # P(Barometer drops | No rain)

posterior_2 = bayesian_update(posterior_1, likelihood_pos_2, likelihood_neg_2, 1)
print(f"After barometer drops: P(Rain | Both evidences) = {posterior_2:.3f}")

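# Evidence 3: Weather forecast says no rain
likelihood_pos_3 = 0.1  # P(Forecast says no rain | Rain)
likelihood_neg_3 = 0.9  # P(Forecast says no rain | No rain)

# The likelihoods above describe the event "forecast says no rain", which WAS observed,
# so evidence = 1. Passing 0 would apply the complements and raise P(Rain) instead.
posterior_3 = bayesian_update(posterior_2, likelihood_pos_3, likelihood_neg_3, 1)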
print(f"After forecast says no rain: P(Rain | All evidences) = {posterior_3:.3f}")

print()
print("Interpretation:")
print(f"• Started with {prior*100:.0f}% chance of rain")
print(f"• Dark clouds increased it to {posterior_1*100:.1f}%")
print(f"• Barometer drop increased it to {posterior_2*100:.1f}%")
print(f"• Weather forecast decreased it to {posterior_3*100:.1f}%")
print()

# Visualization of Bayesian update process
steps = ['Prior', 'Dark Clouds', 'Barometer Drop', 'Forecast Says No Rain']
probabilities = [prior, posterior_1, posterior_2, posterior_3]

plt.figure(figsize=(12, 6))
plt.plot(steps, probabilities, 'bo-', linewidth=2, markersize=8)
plt.fill_between(range(len(steps)), probabilities, alpha=0.3, color='blue')
plt.xlabel('Evidence Step')
plt.ylabel('Probability of Rain')
plt.title('Bayesian Update Process: Probability of Rain')
plt.xticks(rotation=45)
plt.grid(True, alpha=0.3)
plt.ylim(0, 1)

# Add probability values on the plot
for i, prob in enumerate(probabilities):
    plt.annotate(f'{prob:.3f}', (i, prob), textcoords="offset points", xytext=(0,10), ha='center')

plt.tight_layout()
plt.show()

# Naive Bayes Classifier Example
print("=== Naive Bayes Classifier Example ===")
print()

# Simple text classification example
def naive_bayes_classifier():
    """
    Simple Naive Bayes classifier for text classification
    """
    # Training data
    spam_words = ['free', 'money', 'win', 'urgent', 'click']
    ham_words = ['meeting', 'project', 'team', 'work', 'schedule']
    
    # Prior probabilities
    P_spam = 0.3
    P_ham = 0.7
    
    # Per-class word probabilities P(word | class), illustrative values
    spam_word_freq = {'free': 0.8, 'money': 0.6, 'win': 0.4, 'urgent': 0.3, 'click': 0.5}
    ham_word_freq = {'meeting': 0.4, 'project': 0.5, 'team': 0.3, 'work': 0.6, 'schedule': 0.2}
    
    # Test email
    test_email = ['free', 'money', 'meeting']
    
    print(f"Test email words: {test_email}")
    print()
    
    # Calculate likelihood for spam
    spam_likelihood = 1.0
    for word in test_email:
        if word in spam_word_freq:
            spam_likelihood *= spam_word_freq[word]
        else:
            spam_likelihood *= 0.1  # Small probability for unknown words
    
    # Calculate likelihood for ham
    ham_likelihood = 1.0
    for word in test_email:
        if word in ham_word_freq:
            ham_likelihood *= ham_word_freq[word]
        else:
            ham_likelihood *= 0.1  # Small probability for unknown words
    
    # Calculate posterior probabilities
    spam_posterior = (spam_likelihood * P_spam) / (spam_likelihood * P_spam + ham_likelihood * P_ham)
    ham_posterior = (ham_likelihood * P_ham) / (spam_likelihood * P_spam + ham_likelihood * P_ham)
    
    print(f"Spam likelihood: {spam_likelihood:.4f}")
    print(f"Ham likelihood: {ham_likelihood:.4f}")
    print()
    print(f"P(Spam | email) = {spam_posterior:.4f} = {spam_posterior*100:.1f}%")
    print(f"P(Ham | email) = {ham_posterior:.4f} = {ham_posterior*100:.1f}%")
    print()
    
    if spam_posterior > ham_posterior:
        print("Classification: SPAM")
    else:
        print("Classification: HAM")

naive_bayes_classifier()


## Hands-on Exercises and Practice Problems

### Exercise 1: Basic Probability Calculations

**Problem**: A bag contains 5 red balls, 3 blue balls, and 2 green balls. If you randomly select one ball:

1. What is the probability of selecting a red ball?
2. What is the probability of selecting a blue or green ball?
3. What is the probability of not selecting a red ball?

**Solution**:
- Total balls = 5 + 3 + 2 = 10
- P(Red) = 5/10 = 0.5
- P(Blue or Green) = P(Blue) + P(Green) = 3/10 + 2/10 = 0.5
- P(Not Red) = 1 - P(Red) = 1 - 0.5 = 0.5

### Exercise 2: Conditional Probability

**Problem**: In a class of 30 students, 18 are girls and 12 are boys. 8 girls and 6 boys wear glasses. If a student is selected at random:

1. What is the probability that the student wears glasses?
2. What is the probability that the student wears glasses, given that the student is a girl?
3. What is the probability that a student wearing glasses is a girl?

### Exercise 3: Binomial Distribution

**Problem**: A fair coin is flipped 8 times. Calculate:

1. The probability of getting exactly 4 heads
2. The probability of getting at most 2 heads
3. The probability of getting at least 6 heads

### Exercise 4: Normal Distribution

**Problem**: The heights of students in a class follow a normal distribution with mean 170 cm and standard deviation 10 cm. Calculate:

1. The probability that a randomly selected student is taller than 180 cm
2. The probability that a student's height is between 160 cm and 180 cm
3. The height below which 25% of students fall

### Exercise 5: Bayesian Probability

**Problem**: A factory produces widgets. 5% of widgets are defective. A quality control test correctly identifies 90% of defective widgets and 95% of non-defective widgets (so it wrongly flags 5% of good widgets). If a widget tests positive, meaning the test flags it as defective:

1. What is the probability that it is actually defective?
2. What is the probability that it is not defective despite testing positive?

Let's solve these exercises step by step!


In [None]:
# Exercise Solutions
print("=== EXERCISE SOLUTIONS ===")
print()

# Exercise 1: Basic Probability Calculations
print("Exercise 1: Basic Probability Calculations")
print("=" * 50)

# Given data
red_balls = 5
blue_balls = 3
green_balls = 2
total_balls = red_balls + blue_balls + green_balls

print(f"Given: {red_balls} red, {blue_balls} blue, {green_balls} green balls")
print(f"Total balls: {total_balls}")
print()

# Calculations
P_red = red_balls / total_balls
P_blue = blue_balls / total_balls
P_green = green_balls / total_balls
P_blue_or_green = P_blue + P_green
P_not_red = 1 - P_red

print("Solutions:")
print(f"1. P(Red) = {red_balls}/{total_balls} = {P_red:.2f}")
print(f"2. P(Blue or Green) = P(Blue) + P(Green) = {blue_balls}/{total_balls} + {green_balls}/{total_balls} = {P_blue_or_green:.2f}")
print(f"3. P(Not Red) = 1 - P(Red) = 1 - {P_red:.2f} = {P_not_red:.2f}")
print()

# Exercise 2: Conditional Probability
print("Exercise 2: Conditional Probability")
print("=" * 50)

# Given data
total_students = 30
girls = 18
boys = 12
girls_with_glasses = 8
boys_with_glasses = 6
total_with_glasses = girls_with_glasses + boys_with_glasses

print(f"Given: {total_students} students ({girls} girls, {boys} boys)")
print(f"Girls with glasses: {girls_with_glasses}, Boys with glasses: {boys_with_glasses}")
print()

# Calculations
P_glasses = total_with_glasses / total_students
P_glasses_given_girl = girls_with_glasses / girls
P_girl_given_glasses = girls_with_glasses / total_with_glasses

print("Solutions:")
print(f"1. P(Glasses) = {total_with_glasses}/{total_students} = {P_glasses:.3f}")
print(f"2. P(Glasses | Girl) = {girls_with_glasses}/{girls} = {P_glasses_given_girl:.3f}")
print(f"3. P(Girl | Glasses) = {girls_with_glasses}/{total_with_glasses} = {P_girl_given_glasses:.3f}")
print()

# Exercise 3: Binomial Distribution
print("Exercise 3: Binomial Distribution")
print("=" * 50)

# Parameters
n = 8  # number of trials
p = 0.5  # probability of success (heads)

print(f"Given: Fair coin flipped {n} times")
print()

# Calculations
prob_exactly_4 = binom.pmf(4, n, p)
prob_at_most_2 = binom.cdf(2, n, p)
prob_at_least_6 = 1 - binom.cdf(5, n, p)

print("Solutions:")
print(f"1. P(Exactly 4 heads) = {prob_exactly_4:.4f}")
print(f"2. P(At most 2 heads) = P(X ≤ 2) = {prob_at_most_2:.4f}")
print(f"3. P(At least 6 heads) = P(X ≥ 6) = 1 - P(X ≤ 5) = {prob_at_least_6:.4f}")
print()

# Exercise 4: Normal Distribution
print("Exercise 4: Normal Distribution")
print("=" * 50)

# Parameters
mu = 170  # mean height
sigma = 10  # standard deviation

print(f"Given: Heights ~ Normal(μ = {mu} cm, σ = {sigma} cm)")
print()

# Calculations
prob_taller_than_180 = 1 - norm.cdf(180, mu, sigma)
prob_between_160_180 = norm.cdf(180, mu, sigma) - norm.cdf(160, mu, sigma)
height_25th_percentile = norm.ppf(0.25, mu, sigma)

print("Solutions:")
print(f"1. P(Height > 180 cm) = 1 - P(Height ≤ 180) = {prob_taller_than_180:.4f}")
print(f"2. P(160 ≤ Height ≤ 180) = P(Height ≤ 180) - P(Height ≤ 160) = {prob_between_160_180:.4f}")
print(f"3. Height below which 25% fall = {height_25th_percentile:.1f} cm")
print()

# Exercise 5: Bayesian Probability
print("Exercise 5: Bayesian Probability")
print("=" * 50)

# Given data
P_defective = 0.05  # Prior: 5% defective
P_positive_given_defective = 0.90  # Test correctly identifies 90% of defective
P_positive_given_not_defective = 0.05  # False positive rate: test wrongly flags 5% of non-defective

print(f"Given: {P_defective*100:.0f}% widgets defective")
print(f"Test: detects {P_positive_given_defective*100:.0f}% of defective widgets, {P_positive_given_not_defective*100:.0f}% false positive rate")
print()

# Calculations
P_positive = P_positive_given_defective * P_defective + P_positive_given_not_defective * (1 - P_defective)
P_defective_given_positive = (P_positive_given_defective * P_defective) / P_positive
P_not_defective_given_positive = 1 - P_defective_given_positive

print("Solutions:")
print(f"1. P(Defective | Positive Test) = {P_defective_given_positive:.4f} = {P_defective_given_positive*100:.1f}%")
print(f"2. P(Not Defective | Positive Test) = {P_not_defective_given_positive:.4f} = {P_not_defective_given_positive*100:.1f}%")
print()

print("Interpretation:")
print(f"Even with a positive test, there's only a {P_defective_given_positive*100:.1f}% chance the widget is actually defective!")
print(f"This is because the defect rate is low ({P_defective*100:.0f}%), so false positives from the many good widgets rival the true positives.")


## 8. Summary and Key Takeaways

### What We've Learned

In this hands-on lecture on probability, we've covered:

#### 1. **Probability Fundamentals**
- Probability measures likelihood (0 to 1)
- Sample space, events, and probability axioms
- Basic probability calculations with practical examples

#### 2. **Joint, Marginal, and Conditional Probability**
- **Joint Probability**: P(A ∩ B) - both events occur
- **Marginal Probability**: P(A) - single event probability
- **Conditional Probability**: P(A|B) - probability given another event
- Independence and dependence relationships
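
As a quick illustration of how these three quantities relate, the sketch below rebuilds them from a table of counts. It reuses the classroom numbers from Exercise 2 (18 girls, 8 with glasses; 12 boys, 6 with glasses); the variable names are illustrative, not part of the exercises.

```python
import numpy as np

# Rows: girl, boy. Columns: glasses, no glasses (counts from Exercise 2).
counts = np.array([[8, 10],
                   [6, 6]])

joint = counts / counts.sum()   # joint probabilities, e.g. P(Girl ∩ Glasses)
p_gender = joint.sum(axis=1)    # marginals: P(Girl), P(Boy)
p_glasses = joint.sum(axis=0)   # marginals: P(Glasses), P(No glasses)

# Conditional probability: P(Glasses | Girl) = P(Girl ∩ Glasses) / P(Girl)
p_glasses_given_girl = joint[0, 0] / p_gender[0]

# Independence would require P(Girl ∩ Glasses) = P(Girl) × P(Glasses)
independent = np.isclose(joint[0, 0], p_gender[0] * p_glasses[0])

print(f"P(Girl ∩ Glasses) = {joint[0, 0]:.3f}")
print(f"P(Girl) = {p_gender[0]:.3f}, P(Glasses) = {p_glasses[0]:.3f}")
print(f"P(Glasses | Girl) = {p_glasses_given_girl:.3f}")
print(f"Gender and glasses independent? {independent}")
```

Summing the joint table along a row or column gives a marginal, and dividing a joint entry by a marginal gives a conditional.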

#### 3. **Probability Distributions**
- Mathematical models for random phenomena
- PMF for discrete variables, PDF for continuous variables
- CDF for cumulative probabilities
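
To make the PMF/CDF relationship concrete, here is a minimal sketch for a discrete variable, assuming the coin-flip setup from Exercise 3 (n = 8, p = 0.5). It checks that the CDF is the running sum of the PMF.

```python
import numpy as np
from scipy.stats import binom

n, p = 8, 0.5                   # illustrative values, same as Exercise 3
k = np.arange(n + 1)            # every possible number of heads: 0..8

pmf = binom.pmf(k, n, p)        # P(X = k) for each k
cdf_from_pmf = np.cumsum(pmf)   # running total of the PMF: P(X <= k)
cdf = binom.cdf(k, n, p)        # CDF computed directly by scipy

print(f"PMF sums to {pmf.sum():.4f}")
print(f"CDF equals cumulative PMF: {np.allclose(cdf_from_pmf, cdf)}")
```

For a continuous variable the same idea holds with an integral of the PDF in place of the sum.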

#### 4. **Discrete Distributions**
- **Binomial**: Number of successes in n independent trials with the same success probability
- **Poisson**: Number of events in a fixed interval, given a constant average rate
- Parameters, formulas, and practical applications

#### 5. **Continuous Distributions**
- **Normal**: Bell-shaped curve; a good model for many natural measurements (e.g., heights)
- **Uniform**: Equal probability density across an interval
- 68-95-99.7 rule for normal distributions
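
The 68-95-99.7 rule can be checked numerically. The sketch below assumes the height parameters from Exercise 4 (μ = 170, σ = 10), though the result is the same for any normal distribution.

```python
from scipy.stats import norm

mu, sigma = 170, 10  # illustrative parameters, same as Exercise 4

coverage = {}
for k in (1, 2, 3):
    # Probability of falling within k standard deviations of the mean
    lower, upper = mu - k * sigma, mu + k * sigma
    coverage[k] = norm.cdf(upper, mu, sigma) - norm.cdf(lower, mu, sigma)
    print(f"Within {k} standard deviation(s) of the mean: {coverage[k]:.4f}")
```

The printed values are the exact probabilities that the rule rounds to 68%, 95%, and 99.7%.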

#### 6. **Bayesian Probability**
- Updating beliefs with evidence using Bayes' Theorem
- Prior → Likelihood → Posterior
- Applications in machine learning and decision making
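
One way to see "Prior → Likelihood → Posterior" in action is to apply Bayes' Theorem repeatedly, feeding each posterior back in as the next prior. The sketch below uses the widget numbers from Exercise 5; `bayes_update` is a hypothetical helper written for this illustration, and the second update assumes the two tests are independent given the widget's true state.

```python
def bayes_update(prior, p_evidence_given_h, p_evidence_given_not_h):
    """Return P(H | evidence) from a prior and the two likelihoods."""
    # Total probability of seeing the evidence at all
    evidence = p_evidence_given_h * prior + p_evidence_given_not_h * (1 - prior)
    return p_evidence_given_h * prior / evidence

# Prior: 5% defective; test flags 90% of defective and 5% of good widgets
posterior_1 = bayes_update(0.05, 0.90, 0.05)

# Assumption: a second, independent positive test; old posterior is the new prior
posterior_2 = bayes_update(posterior_1, 0.90, 0.05)

print(f"P(Defective | one positive test)  = {posterior_1:.4f}")
print(f"P(Defective | two positive tests) = {posterior_2:.4f}")
```

The first result reproduces the Exercise 5 answer, and the second shows how additional evidence sharpens the belief.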

### Key Formulas to Remember

1. **Basic Probability**: P(A) = Favorable outcomes / Total outcomes (when all outcomes are equally likely)
2. **Conditional Probability**: P(A|B) = P(A ∩ B) / P(B)
3. **Bayes' Theorem**: P(A|B) = P(B|A) × P(A) / P(B)
4. **Independence**: P(A ∩ B) = P(A) × P(B) (holds exactly when A and B are independent)
5. **Binomial PMF**: P(X = k) = C(n,k) × p^k × (1-p)^(n-k)
6. **Poisson PMF**: P(X = k) = (λ^k × e^(-λ)) / k!
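
As a sanity check on formulas 5 and 6, the sketch below implements them directly and compares the results with `scipy.stats`. The helper names `binomial_pmf` and `poisson_pmf` and the example inputs are illustrative choices, not part of the exercises.

```python
from math import comb, exp, factorial
from scipy.stats import binom, poisson

def binomial_pmf(k, n, p):
    """P(X = k) = C(n, k) * p^k * (1 - p)^(n - k)"""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def poisson_pmf(k, lam):
    """P(X = k) = lam^k * e^(-lam) / k!"""
    return lam**k * exp(-lam) / factorial(k)

# Compare the hand-written formulas with scipy's implementations
print(f"Binomial: formula = {binomial_pmf(4, 8, 0.5):.6f}, scipy = {binom.pmf(4, 8, 0.5):.6f}")
print(f"Poisson:  formula = {poisson_pmf(2, 3.0):.6f}, scipy = {poisson.pmf(2, 3.0):.6f}")
```

In practice the `scipy.stats` functions are preferable, since they handle large n and k without overflow.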

### Applications in Machine Learning

- **Classification**: Naive Bayes classifiers
- **Regression**: Bayesian linear regression
- **Optimization**: Bayesian optimization
- **Uncertainty**: Bayesian neural networks
- **Decision Making**: Risk assessment and decision theory

### Next Steps

1. **Practice**: Work through more probability problems
2. **Explore**: Learn about other distributions (exponential, gamma, beta)
3. **Apply**: Use probability in your machine learning projects
4. **Advanced**: Study Bayesian statistics and probabilistic programming

### Resources for Further Learning

- **Books**: "Introduction to Probability" by Joseph K. Blitzstein and Jessica Hwang
- **Online**: Khan Academy Probability course
- **Tools**: Python scipy.stats, R probability packages
- **Applications**: Bayesian inference, Monte Carlo methods

---

**Congratulations!** You've completed a comprehensive journey through probability theory. These concepts form the foundation for understanding machine learning algorithms, statistical inference, and data science. Keep practicing and applying these concepts in your projects!
