### 🎯 Learning Objectives
- Understand what a distribution represents
- Distinguish between descriptive and inferential statistics
- Visualize and internalize the Central Limit Theorem
- Recognize the impact of variance and shape
- Build a reusable thinking framework for statistical reasoning

# Deep Statistics: Foundations for ML & Inference

## 1. What is a Distribution, Really?
A distribution tells us how likely values are to appear in a population or process. It encodes *assumptions* we make about reality.

- Uniform: all outcomes equally likely
- Normal: many small influences combine (CLT basis)
- Skewed: one-sided influences, like income or wait times

In [None]:
import numpy as np
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots

#### 🔍 Analogy: Distributions as City Maps
Imagine each distribution as a **map of population density**:
- **Normal**: Most people live in the city center
- **Exponential**: Most people cluster at one edge of town, but a few live very far out (the long tail)
- **Uniform**: People evenly spread along a beach

In [None]:
# Draw 1,000 values from three distributions and plot them side by side
dists = {
    'Normal': np.random.normal(loc=0, scale=1, size=1000),
    'Exponential (Skewed)': np.random.exponential(scale=2, size=1000),
    'Uniform': np.random.uniform(low=-2, high=2, size=1000)
}
fig = make_subplots(rows=1, cols=3, subplot_titles=list(dists.keys()))
for i, (name, data) in enumerate(dists.items(), 1):
    fig.add_trace(go.Histogram(x=data, name=name, nbinsx=40, opacity=0.75), row=1, col=i)
fig.update_layout(title_text='Different Distributions', showlegend=False)
fig.show()

👉 Pause and predict: for which of these distributions does the mean best represent the 'typical' value? Why?

## 2. Descriptive vs Inferential Thinking
Descriptive stats **summarize** what you see (mean, median, std).
Inferential stats **estimate** what you can't see, such as population properties from a sample, and quantify how uncertain that estimate is.
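
To make the distinction concrete, here is a minimal sketch under made-up assumptions: the `population` of exponential "wait times" and the sample size of 200 are hypothetical choices for illustration. The descriptive numbers summarize the sample itself; the standard error is an inferential quantity, because it says how far the sample mean is likely to sit from the unseen population mean.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the numbers are reproducible

# Hypothetical population: 100,000 exponential "wait times" with true mean 2
population = rng.exponential(scale=2, size=100_000)

# In practice we only get to see a small random sample
sample = rng.choice(population, size=200, replace=False)

# Descriptive: summarize the data we actually have
sample_mean = sample.mean()
sample_median = np.median(sample)
sample_std = sample.std(ddof=1)  # ddof=1 gives the sample standard deviation

# Inferential: estimate how uncertain the sample mean is as a guess
# for the population mean
standard_error = sample_std / np.sqrt(len(sample))

print(f"sample mean   = {sample_mean:.3f}")
print(f"sample median = {sample_median:.3f}")
print(f"sample std    = {sample_std:.3f}")
print(f"standard error of the mean = {standard_error:.3f}")
print(f"population mean (normally unknown) = {population.mean():.3f}")
```

In real work the last line is not available; the standard error is what stands in for it.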

In [None]:
# Simulate skewed data to compare mean vs median
data = np.random.exponential(scale=2, size=1000)
mean_val = np.mean(data)
median_val = np.median(data)
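# Pass the array as x explicitly so Plotly treats it as a single variable
fig = px.histogram(x=data, nbins=40, title='Mean vs Median in Skewed Data')
# Place the two labels on opposite sides so they do not overlap
fig.add_vline(x=mean_val, line_dash='dash', line_color='red',
              annotation_text='Mean', annotation_position='top right')
fig.add_vline(x=median_val, line_dash='dot', line_color='blue',
              annotation_text='Median', annotation_position='top left')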
fig.show()

## 3. Central Limit Theorem — Visualized
Even if a population is skewed, the **means of repeated samples** are approximately normally distributed once each sample is reasonably large, provided the population has finite variance.
This lets us use normal-based tools like confidence intervals.

**Let's unpack the CLT step-by-step:**
1. You take multiple random samples from a skewed population
2. For each sample, you compute its mean
3. You collect all these means and plot their distribution
4. That distribution will be *approximately normal*, even if the original data wasn't

In [None]:
# Simulate 1000 sample means, each from a sample of size 30,
# drawn from an exponential (skewed) distribution
means = [np.mean(np.random.exponential(scale=2, size=30)) for _ in range(1000)]
fig = px.histogram(x=means, nbins=50, title='Central Limit Theorem in Action')
fig.show()
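
The histogram shows the shape; a quick numerical check can complement it. The sketch below repeats the same experiment with NumPy's `default_rng` (a swapped-in, seeded generator, not the one used in the cell above) and compares the observed center and spread of the sample means with what theory predicts. It relies on the fact that an exponential distribution with `scale=2` has mean 2 and standard deviation 2.

```python
import numpy as np

rng = np.random.default_rng(42)  # fixed seed for reproducibility

scale = 2.0       # exponential scale: population mean = std = scale
n = 30            # size of each sample
n_samples = 1000  # number of samples (and therefore sample means)

# Each row is one sample of size n; average across each row
sample_means = rng.exponential(scale=scale, size=(n_samples, n)).mean(axis=1)

# Theory: the sample means center on the population mean,
# with spread = population std / sqrt(n)
predicted_center = scale
predicted_spread = scale / np.sqrt(n)

observed_center = sample_means.mean()
observed_spread = sample_means.std(ddof=1)

print(f"center: predicted {predicted_center:.3f}, observed {observed_center:.3f}")
print(f"spread: predicted {predicted_spread:.3f}, observed {observed_spread:.3f}")
```

The two pairs of numbers should agree closely but not exactly, since the observed values come from a finite simulation.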

## 4. Why Variance Matters
Two datasets can share the same mean yet differ in spread (variance). The wider one produces extreme values far more often, so the mean alone says little about what a typical draw looks like.

In [None]:
# Compare same mean, different std
x1 = np.random.normal(loc=0, scale=1, size=1000)
x2 = np.random.normal(loc=0, scale=3, size=1000)
fig = make_subplots(rows=1, cols=2, subplot_titles=['std=1', 'std=3'])
fig.add_trace(go.Histogram(x=x1, nbinsx=50, name='std=1'), row=1, col=1)
fig.add_trace(go.Histogram(x=x2, nbinsx=50, name='std=3'), row=1, col=2)
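fig.update_layout(title='Same Mean, Different Variance', showlegend=False)
# Use one x-range for both panels; otherwise autoscaling hides the difference in spread
fig.update_xaxes(range=[-12, 12])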
fig.show()
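
One way to put a number on "different spread" is to count how often each dataset lands far from its mean. This is a small sketch; the `threshold` of 2 is an arbitrary cutoff chosen for illustration, and the arrays are regenerated here with a seeded generator so the block runs on its own.

```python
import numpy as np

rng = np.random.default_rng(1)  # fixed seed for reproducibility

narrow = rng.normal(loc=0, scale=1, size=10_000)  # std = 1
wide = rng.normal(loc=0, scale=3, size=10_000)    # std = 3

threshold = 2.0  # arbitrary cutoff for "far from the mean"

# Fraction of values more than `threshold` away from zero
frac_narrow = np.mean(np.abs(narrow) > threshold)
frac_wide = np.mean(np.abs(wide) > threshold)

print(f"means: {narrow.mean():.3f} vs {wide.mean():.3f}")
print(f"share beyond ±{threshold}: {frac_narrow:.1%} (std=1) vs {frac_wide:.1%} (std=3)")
```

Both means sit near zero, yet roughly one value in twenty exceeds the cutoff when std=1 compared with about half of them when std=3.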

### ✅ Concept Check
1. What would happen to the spread of the sampling distribution if we increased the sample size from 30 to 100?
2. Why does the mean of means still reflect the original population mean?
3. When does the CLT fail to apply?
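
Once you have written down your own answers, questions 1 and 3 can be explored empirically. The sketch below is one possible experiment, not the only one: it measures the spread of sample means with the interquartile range (IQR) rather than the standard deviation, because the second population is a standard Cauchy distribution, which has no finite variance. The helper names (`iqr_of_sample_means`, `draw_exponential`, `draw_cauchy`) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(7)  # fixed seed for reproducibility

def iqr_of_sample_means(draw, n, n_samples=2000):
    """Interquartile range of n_samples sample means, each from n draws."""
    means = draw((n_samples, n)).mean(axis=1)
    q25, q75 = np.percentile(means, [25, 75])
    return q75 - q25

def draw_exponential(size):
    # Skewed, but with finite variance: the CLT applies
    return rng.exponential(scale=2, size=size)

def draw_cauchy(size):
    # Heavy-tailed, with no finite mean or variance: the CLT does not apply
    return rng.standard_cauchy(size=size)

exp_30 = iqr_of_sample_means(draw_exponential, 30)
exp_100 = iqr_of_sample_means(draw_exponential, 100)
cauchy_30 = iqr_of_sample_means(draw_cauchy, 30)
cauchy_1000 = iqr_of_sample_means(draw_cauchy, 1000)

print(f"Exponential: IQR of means, n=30   -> {exp_30:.3f}")
print(f"Exponential: IQR of means, n=100  -> {exp_100:.3f}")
print(f"Cauchy:      IQR of means, n=30   -> {cauchy_30:.3f}")
print(f"Cauchy:      IQR of means, n=1000 -> {cauchy_1000:.3f}")
```

Compare how the spread changes with sample size in each case, and check whether that matches what you predicted.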

## 🧠 In ML Terms:
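- Distribution shape matters in **assumption-heavy models**: Gaussian Naive Bayes assumes normally distributed features within each class, and classical inference for Linear Regression assumes roughly normal residuals
- Understanding variance helps you reason about the **bias-variance tradeoff**
- The CLT justifies using **confidence intervals** in model evaluation, because metrics such as accuracy are averages over test examples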
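
As a rough illustration of the last point, here is a sketch of a normal-approximation confidence interval for a classifier's accuracy. Everything about the setup is hypothetical: the `correct` array stands in for per-example results on a 500-example test set, and the 0.85 used to simulate it is a made-up "true" accuracy.

```python
import numpy as np

rng = np.random.default_rng(3)  # fixed seed for reproducibility

# Hypothetical evaluation results: 1 = correct prediction, 0 = wrong.
# The 0.85 success rate is invented purely to generate example data.
correct = rng.binomial(n=1, p=0.85, size=500)

# Accuracy is just the mean of the 0/1 outcomes, so the CLT applies to it
accuracy = correct.mean()

# Standard error of a proportion
standard_error = np.sqrt(accuracy * (1 - accuracy) / len(correct))

z = 1.96  # approximate 97.5th percentile of the standard normal
ci_low = accuracy - z * standard_error
ci_high = accuracy + z * standard_error

print(f"accuracy = {accuracy:.3f}")
print(f"approx. 95% CI = [{ci_low:.3f}, {ci_high:.3f}]")
```

This approximation is reasonable for a test set of this size with accuracy well away from 0 or 1; for small test sets or extreme accuracies, other interval methods are usually preferred.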

## 🎓 Stretch Challenge
- Create your own synthetic population with a weird distribution (e.g., bimodal)
- Run the CLT experiment on it and see what happens
- Try increasing the sample size — what stabilizes first: the mean or the variance?
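
If you want a starting point, the sketch below builds one possible bimodal population and reports how the sample means behave at a few sample sizes. The two-cluster mixture and the helper `sample_means` are hypothetical choices; swap in any population you like, and plot `means` with `px.histogram` as in the earlier cells to see the shape.

```python
import numpy as np

rng = np.random.default_rng(11)  # fixed seed for reproducibility

# Hypothetical bimodal population: an equal mix of two normal clusters
population = np.concatenate([
    rng.normal(loc=-3, scale=1, size=50_000),
    rng.normal(loc=4, scale=1, size=50_000),
])

def sample_means(n, n_samples=1000):
    """Means of n_samples random samples (with replacement), each of size n."""
    samples = rng.choice(population, size=(n_samples, n))
    return samples.mean(axis=1)

print(f"population mean = {population.mean():.3f}, std = {population.std():.3f}")
for n in (2, 10, 50):
    means = sample_means(n)
    print(f"n={n:>2}: mean of means = {means.mean():.3f}, "
          f"std of means = {means.std(ddof=1):.3f}")
```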

### 🧾 Summary Card
**Stat Tools You Used:**
- Mean, median
- Histogram
- Sampling
- Central Limit Theorem
- Variance and standard deviation

**Mental Model:** Shape → Spread → Center → Position

**Next Step:** Confidence intervals, hypothesis testing, inference