# Statistics — Overview

## Purpose
- Model uncertainty and variability in data.
- Estimate parameters and test hypotheses.
- Support decision-making with probabilistic reasoning.
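Here is a minimal sketch of estimating a parameter and stating its uncertainty. It assumes exponentially distributed data, like the simulation later in this notebook. The maximum-likelihood estimate of the exponential scale is the sample mean. Its standard error comes from a nonparametric bootstrap, which is our choice of method here; other methods exist. The helpers `mle_exponential_scale` and `bootstrap_se` are hypothetical names introduced for illustration.

```python
import numpy as np


def mle_exponential_scale(x: np.ndarray) -> float:
    """MLE of the scale (mean) of an Exponential distribution: the sample mean."""
    return float(np.mean(x))


def bootstrap_se(x: np.ndarray, estimator, n_boot: int = 2000, seed: int = 0) -> float:
    """Standard error of `estimator` via the nonparametric bootstrap:
    resample x with replacement, re-estimate, and take the std of the estimates."""
    rng = np.random.default_rng(seed)
    idx = rng.integers(0, len(x), size=(n_boot, len(x)))  # one resample per row
    estimates = np.array([estimator(x[row]) for row in idx])
    return float(estimates.std(ddof=1))


rng = np.random.default_rng(1)
data = rng.exponential(scale=2.0, size=200)  # simulated data; true scale is 2.0

scale_hat = mle_exponential_scale(data)
se = bootstrap_se(data, mle_exponential_scale)
# For comparison: the large-sample SE of a mean is sigma / sqrt(n), and for an
# Exponential distribution sigma equals the scale.
print(f"scale_hat = {scale_hat:.3f}, bootstrap SE = {se:.3f}, "
      f"analytic SE ~ {scale_hat / np.sqrt(len(data)):.3f}")
```

The bootstrap and analytic standard errors should roughly agree here. The bootstrap's advantage is that it also works for estimators without a convenient closed-form standard error.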

## Key questions this section answers
- What distribution fits the data and why?
- How confident are we in an estimate?
- Which test answers the scientific question?
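To make "how confident are we" concrete, this sketch contrasts a frequentist 95% Wald confidence interval with a Bayesian 95% credible interval for a proportion. Several choices are illustrative assumptions: the data (37 successes in 100 trials), the uniform Beta(1, 1) prior, and the Monte Carlo approximation of the posterior quantiles. The normal quantile comes from the standard library's `statistics.NormalDist`.

```python
import math
from statistics import NormalDist

import numpy as np


def wald_interval(k: int, n: int, level: float = 0.95) -> tuple[float, float]:
    """Frequentist normal-approximation (Wald) confidence interval for a proportion."""
    p_hat = k / n
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for level = 0.95
    half = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half, p_hat + half


def beta_credible_interval(k: int, n: int, level: float = 0.95,
                           a: float = 1.0, b: float = 1.0,
                           draws: int = 200_000, seed: int = 0) -> tuple[float, float]:
    """Equal-tailed Bayesian credible interval under a Beta(a, b) prior.
    The posterior is Beta(a + k, b + n - k); its quantiles are estimated by sampling."""
    rng = np.random.default_rng(seed)
    post = rng.beta(a + k, b + n - k, size=draws)  # draws from the posterior
    lo, hi = np.quantile(post, [(1 - level) / 2, (1 + level) / 2])
    return float(lo), float(hi)


k, n = 37, 100  # hypothetical data: 37 successes in 100 trials
wald = wald_interval(k, n)
bayes = beta_credible_interval(k, n)
print(f"Wald 95% CI:               ({wald[0]:.3f}, {wald[1]:.3f})")
print(f"Beta(1, 1) prior, 95% CrI: ({bayes[0]:.3f}, {bayes[1]:.3f})")
```

With this much data and a flat prior the two intervals nearly coincide. They diverge more for small n or an informative prior. They also answer different questions: the confidence interval is about long-run coverage over repeated samples, while the credible interval is a probability statement about the parameter given the observed data.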

## Topics
- Probability distributions and random variables
- Estimation, confidence intervals, and bias
- Hypothesis testing and p-values
- Bayesian vs frequentist inference
- Experimental design and power
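Hypothesis testing, p-values, and power fit together in one small simulation. This sketch assumes a two-sample z-test on normal data with a known common standard deviation of 1. That is a simplification; in practice a t-test or a permutation test is more common. Under the null hypothesis, the rejection rate estimates the test's size (about alpha = 0.05). Under a true difference in means, it estimates power. `two_sample_z_pvalue` and `rejection_rate` are hypothetical helper names.

```python
from statistics import NormalDist

import numpy as np

STD_NORMAL = NormalDist()


def two_sample_z_pvalue(x: np.ndarray, y: np.ndarray, sigma: float = 1.0) -> float:
    """Two-sided p-value for H0: mean(x) == mean(y), assuming a known common sigma."""
    se = sigma * np.sqrt(1 / len(x) + 1 / len(y))
    z = float((x.mean() - y.mean()) / se)
    return 2 * (1 - STD_NORMAL.cdf(abs(z)))


def rejection_rate(effect: float, n: int, alpha: float = 0.05,
                   reps: int = 4000, seed: int = 0) -> float:
    """Fraction of simulated experiments with p < alpha when the true difference
    in means is `effect` (effect = 0 estimates the type I error rate)."""
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(reps):
        x = rng.normal(effect, 1.0, size=n)  # treatment group
        y = rng.normal(0.0, 1.0, size=n)     # control group
        rejections += two_sample_z_pvalue(x, y) < alpha
    return rejections / reps


size = rejection_rate(0.0, 50)
power = rejection_rate(0.5, 50)
print(f"size  (effect = 0.0, n = 50 per group): {size:.3f}")
print(f"power (effect = 0.5, n = 50 per group): {power:.3f}")
```

Increasing `n` or the effect size raises power. Evaluating `rejection_rate` over a grid of `n` values is a simple way to choose a sample size before running an experiment.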

## References
- SciPy and statsmodels (Python libraries)
- *Practical Statistics for Data Scientists* (book)


```python
import numpy as np
import plotly.express as px
import plotly.graph_objects as go

rng = np.random.default_rng(1)

# Central Limit Theorem: draw 2000 samples of size 40 from a skewed
# Exponential(scale=1) distribution and examine the distribution of their means.
n_reps, n = 2000, 40
samples = rng.exponential(scale=1.0, size=(n_reps, n))
means = samples.mean(axis=1)  # one mean per row (i.e. per sample)

fig = px.histogram(
    x=means,
    nbins=40,
    title="Central Limit Theorem: sample means",
    labels={"x": "sample mean"},
)
fig.show()

# Overlay a normal density whose mean and standard deviation are estimated
# from the simulated sample means.
mu, sigma = means.mean(), means.std(ddof=1)
x = np.linspace(means.min(), means.max(), 200)
pdf = (1 / (sigma * np.sqrt(2 * np.pi))) * np.exp(-0.5 * ((x - mu) / sigma) ** 2)

fig = go.Figure()
fig.add_trace(go.Histogram(x=means, nbinsx=40, histnorm="probability density", name="sample means"))
fig.add_trace(go.Scatter(x=x, y=pdf, mode="lines", name="normal approximation"))
fig.update_layout(title="Normal approximation to sample means", xaxis_title="sample mean", yaxis_title="density")
fig.show()
```

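The normal curve in the plotting cell uses a mean and standard deviation estimated from the simulated means. The CLT also predicts those parameters directly. For an Exponential(scale=1) population, the mean and standard deviation are both 1. Means of n = 40 draws should therefore be approximately N(1, 1/√40), a standard deviation of about 0.158. This sketch checks that prediction. It reproduces the same sample means as the plotting cell (same seed, same call), and the 1.96 cutoff is the usual two-sided 95% normal quantile.

```python
import numpy as np

rng = np.random.default_rng(1)
n_reps, n = 2000, 40
scale = 1.0  # Exponential(scale) has mean = scale and std = scale

means = rng.exponential(scale=scale, size=(n_reps, n)).mean(axis=1)

predicted_mean = scale
predicted_sd = scale / np.sqrt(n)  # CLT: sd of the sample mean = sigma / sqrt(n)

print(f"simulated mean {means.mean():.4f} vs predicted {predicted_mean:.4f}")
print(f"simulated sd   {means.std(ddof=1):.4f} vs predicted {predicted_sd:.4f}")

# Fraction of sample means within +/- 1.96 predicted SDs of the predicted mean;
# close to 0.95 if the normal approximation holds.
inside = np.abs(means - predicted_mean) < 1.96 * predicted_sd
print(f"within 1.96 sd: {inside.mean():.3f}")
```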

## Takeaway
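Statistics provides the language for describing uncertainty. Good models begin with good distributional assumptions, and those assumptions should be checked against the data rather than taken on faith. The simulation above is one example: the exponential data are strongly skewed, yet their sample means are approximately normal, which is what justifies normal-based inference about a mean.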
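One way to check a distributional assumption is to compute a shape statistic. This sketch is an illustrative check, not a method taken from this notebook. It compares the sample skewness of raw exponential draws with the skewness of means of 40 draws. Two standard results apply. An Exponential distribution has skewness 2. The mean of n i.i.d. draws has skewness γ/√n, where γ is the population skewness. For n = 40 that is about 0.32, much closer to the symmetric shape a normal model assumes. `sample_skewness` is a hypothetical NumPy-only helper.

```python
import numpy as np


def sample_skewness(x: np.ndarray) -> float:
    """Moment-based sample skewness: mean((x - mean)^3) / mean((x - mean)^2)^1.5."""
    d = x - x.mean()
    return float(np.mean(d**3) / np.mean(d**2) ** 1.5)


rng = np.random.default_rng(1)
n = 40
raw = rng.exponential(scale=1.0, size=100_000)                   # raw skewed draws
means = rng.exponential(scale=1.0, size=(20_000, n)).mean(axis=1)  # means of n draws

skew_raw = sample_skewness(raw)
skew_means = sample_skewness(means)
print(f"skewness of raw draws:    {skew_raw:.2f} (theory 2)")
print(f"skewness of sample means: {skew_means:.2f} (theory {2 / np.sqrt(n):.2f})")
```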

