# One-sample t-test (Student’s t-test)

The **one-sample t-test** answers:

> Is the population mean plausibly equal to a reference value μ0?

Use it when you have **one sample** of numeric measurements, the population standard deviation is **unknown**, and you care about the **mean**.


## Learning goals

- Know when the one-sample t-test is the right tool (and when it is not).
- Write the null and alternative hypotheses (two-sided / one-sided).
- Understand the t-statistic as a signal-to-noise ratio.
- Interpret the p-value correctly (what it is and what it is not).
- Implement the test with **NumPy only** (including an estimated p-value).
- Build intuition with Plotly visuals (p-value area, df effects, sampling behavior, power).


## Prerequisites

- Basic descriptive stats: mean, (sample) standard deviation.
- The idea of a sampling distribution (statistics vary from sample to sample).
- Optional: the z-test (mean test when σ is known).


In [None]:
import math
import os
import platform

import numpy as np

import plotly.graph_objects as go
import plotly.io as pio

pio.templates.default = "plotly_white"
pio.renderers.default = os.environ.get("PLOTLY_RENDERER", "notebook")

np.set_printoptions(precision=4, suppress=True)
rng = np.random.default_rng(7)

print("Python", platform.python_version())
print("NumPy ", np.__version__)
try:
    import plotly

    print("Plotly", plotly.__version__)
except Exception:
    pass
try:
    import scipy

    print("SciPy ", scipy.__version__)
except Exception:
    pass


## Intuition: signal-to-noise

You observe a sample `x1, x2, …, xn` and want to compare its mean `x̄` to a reference mean `μ0`.

The core question is not “is `x̄` different from `μ0`?” (it almost always is, a little), but:

**Is the difference large relative to the uncertainty in the mean estimate?**

The t-statistic is exactly that ratio:

```text
t = (x̄ − μ0) / SE
SE = s / √n
df = n − 1
```

- numerator: **signal** (`x̄ − μ0`)
- denominator: **noise** (estimated standard error of the mean)

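Because we estimate the standard deviation with `s` (instead of knowing σ exactly), the statistic is not standard normal. If the data are normal and the null hypothesis is true, it follows a **Student t distribution** with `df = n − 1` degrees of freedom, whose heavier tails account for the extra uncertainty in `s`.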
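As a worked example of these formulas, here is the arithmetic for a tiny made-up sample (the values and `μ0 = 2.0` are illustrative, not from this notebook's data). Only the standard library is used, so every step is visible:

```python
import math

# Tiny made-up sample and reference mean (illustrative values only)
sample_demo = [2.1, 1.9, 2.4, 2.0, 2.3]
mu0_demo = 2.0

n_demo = len(sample_demo)
x_bar = sum(sample_demo) / n_demo  # sample mean
# Sample standard deviation with the n - 1 denominator
s_demo = math.sqrt(sum((v - x_bar) ** 2 for v in sample_demo) / (n_demo - 1))
se_demo = s_demo / math.sqrt(n_demo)  # estimated standard error of the mean
t_demo = (x_bar - mu0_demo) / se_demo  # signal / noise
df_demo = n_demo - 1

print(f"x̄={x_bar:.3f}, s={s_demo:.4f}, SE={se_demo:.4f}, t={t_demo:.3f}, df={df_demo}")
```

Here the mean exceeds `μ0` by 0.14, which is only about 1.5 estimated standard errors: well short of the two-sided 5% critical value for `df = 4` (about 2.78).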


## When is it used?

Typical use cases:

- **Quality control**: Is the average fill volume equal to the labeled amount?
- **Performance/SLA**: Is the mean latency higher than a target threshold?
- **Science/medicine**: Is the mean change from baseline different from 0?

You are comparing a *single sample mean* to a *fixed reference value*.

If you are comparing **two groups** (two samples), you want a *two-sample t-test* (or Welch’s t-test).
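Paired measurements (for example before/after on the same subjects, as in the "change from baseline" case) are different from two independent groups: the per-subject differences form a single sample, and you test their mean against `μ0 = 0`. A minimal sketch with made-up values; this gives the same statistic as a paired t-test:

```python
import numpy as np

# Hypothetical paired measurements for 6 subjects
before = np.array([140.0, 152.0, 138.0, 145.0, 160.0, 149.0])
after = np.array([135.0, 150.0, 136.0, 141.0, 153.0, 148.0])

d_change = after - before  # one sample of within-subject changes
mu0_change = 0.0           # H0: the mean change is zero

n_pairs = d_change.size
se_change = d_change.std(ddof=1) / np.sqrt(n_pairs)
t_change = (d_change.mean() - mu0_change) / se_change
print(f"mean change={d_change.mean():.2f}, t={t_change:.3f}, df={n_pairs - 1}")
```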


## Assumptions (and what happens if they fail)

The classical one-sample t-test relies on:

1. **Independence**: observations are independent (no time dependence, no clustering).
2. **Approximately normal data** (or large `n`):
   - If the population is normal, the test is exact.
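   - For larger `n`, the Central Limit Theorem makes the sampling distribution of `x̄` approximately normal, so the test is approximately valid; the more skewed the data, the larger `n` needs to be.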
3. **No extreme outliers**: outliers inflate `s` and can dominate the mean.

If normality is questionable and `n` is small, consider robust / nonparametric alternatives (e.g., sign test, Wilcoxon signed-rank test) or bootstrap confidence intervals.
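For the bootstrap option, a percentile bootstrap CI for the mean takes only a few lines of NumPy. This is a sketch assuming i.i.d. data; the helper name `bootstrap_mean_ci` and its defaults are hypothetical, and the simple percentile method can undercover when `n` is very small:

```python
import numpy as np


def bootstrap_mean_ci(sample, alpha=0.05, n_boot=20_000, seed=0):
    """Percentile bootstrap CI for the mean (hypothetical helper; i.i.d. data assumed)."""
    sample = np.asarray(sample, dtype=float)
    rng_boot = np.random.default_rng(seed)
    # Resample with replacement: each row of idx indexes one bootstrap sample of size n
    idx = rng_boot.integers(0, sample.size, size=(n_boot, sample.size))
    boot_means = sample[idx].mean(axis=1)
    lo, hi = np.quantile(boot_means, [alpha / 2, 1 - alpha / 2])
    return float(lo), float(hi)


# Usage on a skewed (exponential) sample, where normality is doubtful
skewed_demo = np.random.default_rng(1).exponential(scale=2.0, size=15)
print(bootstrap_mean_ci(skewed_demo))
```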


## Hypotheses

Choose an alternative that matches the question **before** looking at the data:

- Two-sided: `H0: μ = μ0` vs `H1: μ ≠ μ0`
- Greater (right-tailed): `H0: μ = μ0` vs `H1: μ > μ0`
- Less (left-tailed): `H0: μ = μ0` vs `H1: μ < μ0`


## Test procedure (recipe)

1. Choose `μ0`, a significance level `α` (often 0.05), and the alternative.
2. Compute `x̄`, `s`, and `SE = s/√n`.
3. Compute `t = (x̄ − μ0)/SE` with `df = n − 1`.
4. Compute the p-value as a tail probability under `T ~ t(df)`.
5. Reject H0 if `p ≤ α`.
6. Report the estimate (mean), a CI for `μ`, and an effect size.


## Interpreting the result (what it means)

The p-value is:

> The probability, **assuming the null hypothesis is true**, of seeing a t-statistic at least as extreme as the one you observed, where "extreme" is measured in the direction(s) given by the alternative hypothesis.

So:

- Small p-value → your data would be **rare under H0** → evidence against `μ = μ0`.
- Large p-value → your data is **not unusual under H0** → you *fail to reject* H0.

Important: a large p-value does **not** prove `μ = μ0`. It usually means “not enough evidence with this sample size / noise level”.

A helpful companion is the confidence interval (CI) for `μ`:

- A 95% two-sided CI is the set of means that would not be rejected by a 5% two-sided test.
- If `μ0` is outside the CI, you reject at that α.
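The duality is the two-sided rejection rule rearranged. Writing `t_crit` for the `1 − α/2` quantile of `t(n − 1)`, so that `p ≤ α` exactly when `|t| ≥ t_crit`:

```latex
p \le \alpha
\iff \frac{|\bar{x} - \mu_0|}{s/\sqrt{n}} \ge t_{\text{crit}}
\iff |\bar{x} - \mu_0| \ge t_{\text{crit}} \, \frac{s}{\sqrt{n}}
\iff \mu_0 \notin \left( \bar{x} - t_{\text{crit}} \frac{s}{\sqrt{n}},\ \bar{x} + t_{\text{crit}} \frac{s}{\sqrt{n}} \right)
```

With α = 0.05 the interval in the last line is the 95% CI (up to its endpoints, where `p = α` exactly).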

Also separate **statistical significance** (p-value) from **practical significance** (effect size like Cohen’s d).
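One way to see why the two differ: for the one-sample test, Cohen's d is `(x̄ − μ0)/s`, so `t = d·√n`. Holding a small effect fixed, a larger sample alone pushes `t` up and the p-value down. A small arithmetic sketch (the effect size 0.1 and the sample sizes are arbitrary choices):

```python
import math

d_effect = 0.1  # fixed, small standardized effect (x̄ − μ0) / s
for n_demo in (10, 100, 1_000, 10_000):
    t_from_d = d_effect * math.sqrt(n_demo)  # t = (x̄ − μ0) / (s / √n) = d · √n
    print(f"n={n_demo:>6}: d={d_effect}, t={t_from_d:.2f}")
```

The same practically negligible effect moves from clearly non-significant at `n = 10` to overwhelmingly "significant" at `n = 10 000`.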


In [None]:
# Example data: fill volumes (ml). The label says 250ml.
mu0 = 250.0

# Synthetic sample: true mean slightly above 250, unknown variance
x = rng.normal(loc=252.0, scale=4.0, size=20)

print("n=", x.size)
print("mean=", float(x.mean()))
print("std=", float(x.std(ddof=1)))
x[:10]


In [None]:
# Always visualize the sample (outliers + skewness matter)
fig = go.Figure()
fig.add_trace(
    go.Violin(
        y=x,
        box_visible=True,
        meanline_visible=True,
        points="all",
        jitter=0.2,
        name="sample",
    )
)
fig.add_shape(
    type="line",
    x0=0,
    x1=1,
    xref="paper",
    y0=mu0,
    y1=mu0,
    line=dict(color="rgba(214, 39, 40, 1)", dash="dash", width=2),
)
fig.add_annotation(
    x=0.98,
    y=mu0,
    xref="paper",
    text="μ0",
    showarrow=False,
    yshift=10,
    font=dict(color="rgba(214, 39, 40, 1)"),
)
fig.update_layout(title="Sample vs reference μ0", yaxis_title="measurement (ml)")
fig

## NumPy-only implementation (from scratch)

We can compute the t-statistic exactly with NumPy.

The only tricky part (if you restrict yourself to NumPy) is computing the **t-distribution tail probability** (the p-value) and the **t critical value** for the CI.

To keep everything NumPy-only and still match the definition (“tail area under the null distribution”), we estimate these probabilities via **Monte Carlo**:

- Draw many samples from `T ~ t(df)` using `Generator.standard_t` (from `np.random.default_rng`).
- Approximate probabilities as empirical frequencies.

This is not how you’d do production statistics (you’d use a specialized library for accurate CDF/PPF), but it’s a great way to understand what the p-value *is*.
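How precise is a Monte Carlo p-value? Each null draw lands in the tail with probability `p`, so the tail frequency is a binomial proportion with standard error of roughly `√(p(1 − p)/n_mc)`. A quick check (the values of `p` and `n_mc` are just examples):

```python
import math


def mc_pvalue_se(p: float, n_mc: int) -> float:
    """Approximate standard error of a Monte Carlo tail-frequency estimate of p."""
    # Each draw is a Bernoulli trial with success probability p,
    # so the estimated frequency is a binomial proportion.
    return math.sqrt(p * (1 - p) / n_mc)


for n_draws in (10_000, 300_000, 1_000_000):
    print(f"n_mc={n_draws:>9,}: SE of p-hat at p=0.05 ≈ {mc_pvalue_se(0.05, n_draws):.5f}")
```

With `n_mc = 300_000`, a p-value near 0.05 is accurate to roughly ±0.0008 (two standard errors). Very small p-values are only resolved to about `1/n_mc`, and an estimate of exactly 0 just means that no draw was as extreme as the observed statistic.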


In [None]:
def student_t_pdf(x: np.ndarray, df: int) -> np.ndarray:
    """Student t PDF computed from the definition (NumPy + standard library only)."""
    x = np.asarray(x, dtype=float)
    df = int(df)
    if df <= 0:
        raise ValueError("df must be a positive integer")

    log_norm = math.lgamma((df + 1) / 2) - (
        0.5 * math.log(df * math.pi) + math.lgamma(df / 2)
    )
    return np.exp(log_norm) * (1 + (x**2) / df) ** (-(df + 1) / 2)


def normal_pdf(x: np.ndarray) -> np.ndarray:
    x = np.asarray(x, dtype=float)
    return (1 / np.sqrt(2 * np.pi)) * np.exp(-0.5 * x**2)


def ttest_1samp_numpy(
    x: np.ndarray,
    mu0: float,
    *,
    alternative: str = "two-sided",
    alpha: float = 0.05,
    n_mc: int = 300_000,
    seed: int = 123,
) -> dict:
    """One-sample t-test with a NumPy-only Monte Carlo p-value.

    Parameters
    - x: sample (1D array); NaNs are dropped
    - mu0: null mean
    - alternative: 'two-sided', 'greater', 'less'
    - alpha: significance level for CI and decision
    - n_mc: Monte Carlo sample size for approximating p-value and t critical values
    - seed: seed for the Monte Carlo draws (makes the p-value reproducible)
    """
    if alternative not in ("two-sided", "greater", "less"):
        raise ValueError("alternative must be 'two-sided', 'greater', or 'less'")

    x = np.asarray(x, dtype=float)
    x = x[~np.isnan(x)]

    n = int(x.size)
    if n < 2:
        raise ValueError("Need at least 2 non-NaN observations.")

    df = n - 1
    mean = float(x.mean())
    s = float(x.std(ddof=1))  # sample SD (n - 1 in the denominator)
    se = s / math.sqrt(n)
    diff = mean - mu0

    if se == 0.0:
        # Degenerate case: all observations are identical, so t is ±inf
        # (or 0/0 when mean == mu0, reported here as t = 0, p = 1).
        t_stat = float(np.sign(diff) * np.inf) if diff != 0 else 0.0
        cohen_d = t_stat
        # The p-value must respect the direction of the alternative.
        if alternative == "greater":
            p_value = 0.0 if diff > 0 else 1.0
        elif alternative == "less":
            p_value = 0.0 if diff < 0 else 1.0
        else:
            p_value = 0.0 if diff != 0 else 1.0
        t_crit = float("nan")
        ci = (mean, mean)
    else:
        t_stat = diff / se
        cohen_d = diff / s

        # Monte Carlo draws from the null distribution T ~ t(df)
        rng_local = np.random.default_rng(seed)
        t_null = rng_local.standard_t(df, size=int(n_mc))

        # Tail frequencies approximate the p-value; it can come out as exactly 0
        # when t_stat is more extreme than every draw (resolution ~ 1/n_mc).
        if alternative == "two-sided":
            p_value = float(np.mean(np.abs(t_null) >= abs(t_stat)))
            t_crit = float(np.quantile(t_null, 1 - alpha / 2))
            ci = (mean - t_crit * se, mean + t_crit * se)
        elif alternative == "greater":
            p_value = float(np.mean(t_null >= t_stat))
            t_crit = float(np.quantile(t_null, 1 - alpha))
            ci = (mean - t_crit * se, np.inf)
        else:  # "less"
            p_value = float(np.mean(t_null <= t_stat))
            t_crit = float(np.quantile(t_null, 1 - alpha))
            ci = (-np.inf, mean + t_crit * se)

    decision = "reject H0" if p_value <= alpha else "fail to reject H0"

    return {
        "n": n,
        "df": df,
        "mu0": float(mu0),
        "mean": mean,
        "std": s,
        "se": float(se),
        "t_stat": float(t_stat),
        "p_value": p_value,
        "alpha": float(alpha),
        "alternative": alternative,
        "t_crit": t_crit,
        "ci": (float(ci[0]), float(ci[1])),
        "cohen_d": float(cohen_d),
        "decision": decision,
        "mc_samples": int(n_mc),
        "mc_seed": int(seed),
    }


In [None]:
res = ttest_1samp_numpy(x, mu0, alternative="two-sided", alpha=0.05, n_mc=500_000, seed=42)

print(f"n={res['n']}, mean={res['mean']:.3f}, std={res['std']:.3f}, SE={res['se']:.3f}")
print(f"t={res['t_stat']:.3f} (df={res['df']}), p≈{res['p_value']:.4f}, alpha={res['alpha']}")
print(f"{100 * (1 - res['alpha']):.0f}% CI for μ: [{res['ci'][0]:.3f}, {res['ci'][1]:.3f}]")
print(f"Cohen's d (one-sample): {res['cohen_d']:.3f}")
print("Decision:", res["decision"])


In [None]:
# Optional validation against SciPy (production-grade distribution functions)
try:
    from scipy.stats import ttest_1samp

    scipy_res = ttest_1samp(x, popmean=mu0, alternative="two-sided")
    print("SciPy t:", float(scipy_res.statistic))
    print("SciPy p:", float(scipy_res.pvalue))
except Exception as e:
    print("SciPy check skipped:", e)


## Visual: the p-value is a tail area

For a two-sided test, the p-value is the probability (under `H0`) of seeing `|T| ≥ |t_obs|`.

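That probability corresponds to the **red shaded area** below. The dashed red line marks `t_obs`, and the dotted lines mark `±t_crit`, the two-sided rejection boundaries at α.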


In [None]:
t_obs = res["t_stat"]
df = res["df"]
tcrit = res["t_crit"]

xmax = max(6.0, abs(t_obs) + 1.0, abs(tcrit) + 1.0)
xx = np.linspace(-xmax, xmax, 3001)
yy = student_t_pdf(xx, df)

abs_t = abs(t_obs)
mask_left = xx <= -abs_t
mask_right = xx >= abs_t

fig = go.Figure()
fig.add_trace(go.Scatter(x=xx, y=yy, mode="lines", name=f"t pdf (df={df})"))
fig.add_trace(
    go.Scatter(
        x=xx[mask_left],
        y=yy[mask_left],
        mode="lines",
        line=dict(width=0),
        fill="tozeroy",
        name="p-value tail",
        showlegend=False,
        fillcolor="rgba(214, 39, 40, 0.35)",
    )
)
fig.add_trace(
    go.Scatter(
        x=xx[mask_right],
        y=yy[mask_right],
        mode="lines",
        line=dict(width=0),
        fill="tozeroy",
        showlegend=False,
        fillcolor="rgba(214, 39, 40, 0.35)",
    )
)

ymax = float(yy.max())
for xline, dash, color, width in [
    (t_obs, "dash", "rgba(214, 39, 40, 1)", 2),
    (tcrit, "dot", "rgba(0, 0, 0, 0.6)", 1),
    (-tcrit, "dot", "rgba(0, 0, 0, 0.6)", 1),
]:
    fig.add_shape(
        type="line",
        x0=xline,
        x1=xline,
        y0=0,
        y1=ymax,
        xref="x",
        yref="y",
        line=dict(color=color, dash=dash, width=width),
    )

fig.update_layout(
    title=f"Two-sided p-value as tail area (t={t_obs:.3f}, p≈{res['p_value']:.4f})",
    xaxis_title="t",
    yaxis_title="density",
)
fig
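Because the p-value is an area under the t density, it can also be computed deterministically by numerical integration instead of Monte Carlo. This sketch uses symmetry, `P(|T| ≥ |t|) = 1 − 2·∫₀^|t| f(u) du`, so it only integrates over a finite interval. The helper names and grid size are illustrative choices, and the trapezoidal rule is written out by hand rather than relying on a particular NumPy integration function:

```python
import math

import numpy as np


def t_density(u: np.ndarray, df: int) -> np.ndarray:
    """Student t density (same formula as student_t_pdf above)."""
    log_norm = math.lgamma((df + 1) / 2) - 0.5 * math.log(df * math.pi) - math.lgamma(df / 2)
    return np.exp(log_norm) * (1 + u**2 / df) ** (-(df + 1) / 2)


def two_sided_p_integrated(t_obs: float, df: int, n_grid: int = 200_001) -> float:
    """Two-sided p-value P(|T| >= |t_obs|) via the trapezoidal rule on [0, |t_obs|]."""
    u = np.linspace(0.0, abs(t_obs), n_grid)
    f = t_density(u, df)
    h = u[1] - u[0]
    central = h * (f[0] / 2 + f[1:-1].sum() + f[-1] / 2)  # ≈ P(0 <= T <= |t_obs|)
    return float(1.0 - 2.0 * central)


# Checks against closed forms: t(1) is the Cauchy distribution, so P(|T| >= 1) = 0.5;
# for t(2), P(|T| >= t) = 1 - t / sqrt(2 + t**2).
print(two_sided_p_integrated(1.0, df=1))
print(two_sided_p_integrated(2.0, df=2), 1 - 2 / math.sqrt(6))
```

Applied to `res["t_stat"]` and `res["df"]`, this should agree with the Monte Carlo p-value up to Monte Carlo error.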

## Visual: degrees of freedom control tail heaviness

For small `df`, the t distribution has heavier tails than the standard normal.

As `df → ∞`, the t distribution approaches the standard normal distribution.
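To put numbers on "heavier tails", compare the t density with the standard normal density far out in the tail, say at `x = 4` (an arbitrary choice). The ratio shrinks toward 1 as `df` grows; the small helpers below re-implement both densities so the snippet runs on its own:

```python
import math


def t_pdf_scalar(x: float, df: int) -> float:
    """Student t density at a single point."""
    log_norm = math.lgamma((df + 1) / 2) - 0.5 * math.log(df * math.pi) - math.lgamma(df / 2)
    return math.exp(log_norm) * (1 + x * x / df) ** (-(df + 1) / 2)


def normal_pdf_scalar(x: float) -> float:
    """Standard normal density at a single point."""
    return math.exp(-0.5 * x * x) / math.sqrt(2 * math.pi)


x_tail = 4.0
ratios = {}
for df_tail in (1, 2, 5, 10, 30, 100):
    ratios[df_tail] = t_pdf_scalar(x_tail, df_tail) / normal_pdf_scalar(x_tail)
    print(f"df={df_tail:>3}: t density / normal density at x=4 ≈ {ratios[df_tail]:.1f}")
```

At `df = 1` the t density at `x = 4` is roughly 140 times the normal density; by `df = 100` the ratio is below 2.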


In [None]:
dfs = [1, 2, 5, 10, 30, 100]
xx = np.linspace(-5, 5, 2001)

fig = go.Figure()
fig.add_trace(
    go.Scatter(
        x=xx,
        y=normal_pdf(xx),
        mode="lines",
        name="Normal(0,1)",
        line=dict(color="black", dash="dash"),
    )
)

for i, df_ in enumerate(dfs):
    fig.add_trace(
        go.Scatter(
            x=xx,
            y=student_t_pdf(xx, df_),
            mode="lines",
            name=f"t(df={df_})",
            visible=(i == 0),
        )
    )

steps = []
for i, df_ in enumerate(dfs):
    visible = [True] + [False] * len(dfs)
    visible[1 + i] = True
    steps.append(
        dict(
            method="update",
            args=[{"visible": visible}, {"title": f"Student t vs Normal — df={df_}"}],
            label=str(df_),
        )
    )

fig.update_layout(
    title=f"Student t vs Normal — df={dfs[0]}",
    xaxis_title="x",
    yaxis_title="density",
    sliders=[
        dict(
            active=0,
            currentvalue={"prefix": "df: "},
            pad={"t": 30},
            steps=steps,
        )
    ],
)
fig

## Visual: under H0, the computed t-statistic follows a t distribution

If the data really comes from a normal population with mean `μ0`, then the statistic

```text
t = (x̄ − μ0) / (s / √n)
```

has a `t(df=n−1)` distribution.

We can check that by simulation.


In [None]:
rng_sim = np.random.default_rng(123)

B = 50_000
n = res["n"]
df = n - 1

mu0_sim = 0.0
x0 = rng_sim.normal(loc=mu0_sim, scale=1.0, size=(B, n))
t_stats = (x0.mean(axis=1) - mu0_sim) / (x0.std(axis=1, ddof=1) / np.sqrt(n))

xx = np.linspace(-6, 6, 2001)
yy = student_t_pdf(xx, df)

fig = go.Figure()
fig.add_trace(
    go.Histogram(
        x=t_stats,
        nbinsx=80,
        histnorm="probability density",
        name="Simulated t-stat",
        opacity=0.65,
    )
)
fig.add_trace(
    go.Scatter(x=xx, y=yy, mode="lines", name=f"t pdf (df={df})", line=dict(width=3))
)

fig.update_layout(
    barmode="overlay",
    title=f"t-statistic under H0 matches t(df={df})",
    xaxis_title="t",
    yaxis_title="density",
)
fig
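A direct consequence: if you reject whenever `|t| ≥ t_crit`, the long-run false-positive rate under H0 should be close to α. The self-contained sketch below checks this by simulation. The sample size `n = 20`, the seed, and the number of replications are arbitrary choices, and `t_crit` is itself a Monte Carlo quantile, as in `ttest_1samp_numpy`:

```python
import numpy as np

rng_check = np.random.default_rng(2024)
alpha_check = 0.05
n_check = 20
reps = 100_000

# Monte Carlo critical value: the 1 - alpha/2 quantile of t(n - 1)
t_null_check = rng_check.standard_t(n_check - 1, size=500_000)
t_crit_check = float(np.quantile(t_null_check, 1 - alpha_check / 2))

# Data generated under H0: normal with mean exactly mu0 = 0
x_h0 = rng_check.normal(loc=0.0, scale=1.0, size=(reps, n_check))
t_h0 = x_h0.mean(axis=1) / (x_h0.std(axis=1, ddof=1) / np.sqrt(n_check))

# Fraction of H0 datasets that the test (wrongly) rejects
type1_rate = float(np.mean(np.abs(t_h0) >= t_crit_check))
print(f"t_crit≈{t_crit_check:.3f}, simulated type I error rate≈{type1_rate:.4f} (target {alpha_check})")
```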

## Visual: power depends on effect size and sample size

**Power** is the probability of rejecting H0 when H1 is true.

If the true mean is `μ = μ0 + δ`, power increases when:

- `|δ|` is larger relative to σ (bigger standardized effect)
- `n` is larger (smaller SE)
- α is larger (a more lenient rejection rule, at the cost of more false positives)

Below we estimate power by simulation for a two-sided test at α=0.05, with `δ` measured in units of σ.


In [None]:
alpha = 0.05

# Generic setup for power: mu0=0, sigma=1 so δ is an effect size in "sigma units"
mu0_power = 0.0
sigma_power = 1.0
deltas = [0.2, 0.5, 0.8]

n_grid = np.array([5, 8, 12, 20, 30, 40, 60, 80, 100])

B_power = 12_000
B_crit = 120_000

rng_power = np.random.default_rng(202)

power = {d: [] for d in deltas}

for n in n_grid:
    df = n - 1
    t_null = rng_power.standard_t(df, size=B_crit)
    tcrit = float(np.quantile(t_null, 1 - alpha / 2))

    for d in deltas:
        x_alt = rng_power.normal(
            loc=mu0_power + d * sigma_power,
            scale=sigma_power,
            size=(B_power, n),
        )
        t_alt = (x_alt.mean(axis=1) - mu0_power) / (
            x_alt.std(axis=1, ddof=1) / np.sqrt(n)
        )
        power[d].append(float(np.mean(np.abs(t_alt) >= tcrit)))

fig = go.Figure()
for d in deltas:
    fig.add_trace(
        go.Scatter(
            x=n_grid,
            y=power[d],
            mode="lines+markers",
            name=f"δ={d}σ",
        )
    )

fig.update_layout(
    title="Estimated power vs sample size (two-sided t-test, α=0.05)",
    xaxis_title="n",
    yaxis_title="power",
    yaxis=dict(range=[0, 1]),
)
fig
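For a rough analytic cross-check of the simulated curves, a common large-sample approximation replaces the t distribution by the standard normal: with effect size `δ` in σ units, power ≈ Φ(δ√n − z) + Φ(−δ√n − z), where `z` is the `1 − α/2` normal quantile. This sketch uses only the standard library's `statistics.NormalDist`; because it ignores the heavier t tails, it tends to overstate power somewhat for small `n`:

```python
from statistics import NormalDist


def approx_power_two_sided(delta: float, n: int, alpha: float = 0.05) -> float:
    """Normal-approximation power of a two-sided one-sample test (delta in sigma units)."""
    std_normal = NormalDist()
    z = std_normal.inv_cdf(1 - alpha / 2)
    shift = delta * n ** 0.5  # expected standardized shift of the mean
    return std_normal.cdf(shift - z) + std_normal.cdf(-shift - z)


for delta in (0.2, 0.5, 0.8):
    approx = [round(approx_power_two_sided(delta, n_obs), 3) for n_obs in (5, 20, 60, 100)]
    print(f"δ={delta}σ:", approx)
```

At `δ = 0` the formula returns α, as a power function should.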

## Pitfalls and practical notes

- **p-value is not** “probability that H0 is true”. It’s the probability of a t-statistic at least as extreme as the observed one, computed assuming H0 is true.
- **Failing to reject** does not prove equality; it often means low power.
- **Outliers** can strongly affect the mean and inflate `s`. Always plot the data.
- **Independence** is crucial. If you have time series dependence or repeated measures, use the right model/test.
- If you test many hypotheses, consider **multiple testing** corrections.
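Two standard corrections are easy to write with NumPy: Bonferroni multiplies each p-value by the number of tests `m`, and Holm's step-down procedure applies the multipliers `m, m − 1, …, 1` to the sorted p-values and then enforces monotonicity. Both control the family-wise error rate. A sketch (the function names are hypothetical, and the p-values are made up):

```python
import numpy as np


def bonferroni(pvals):
    """Bonferroni-adjusted p-values, capped at 1."""
    p = np.asarray(pvals, dtype=float)
    return np.minimum(p * p.size, 1.0)


def holm(pvals):
    """Holm step-down adjusted p-values, returned in the original order."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)                   # indices of p-values, smallest first
    scaled = (m - np.arange(m)) * p[order]  # multipliers m, m-1, ..., 1
    adj_sorted = np.minimum(np.maximum.accumulate(scaled), 1.0)  # step-down monotonicity
    adjusted = np.empty(m)
    adjusted[order] = adj_sorted            # back to the original order
    return adjusted


pvals = [0.01, 0.04, 0.03, 0.005]
print("Bonferroni:", bonferroni(pvals))
print("Holm:      ", holm(pvals))
```

Reject the i-th null hypothesis when its adjusted p-value is at most α. Holm is never more conservative than Bonferroni.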


## Exercises

1. Change `alternative` to `'greater'` and `'less'`. How does the CI change?
2. Make `n` smaller (e.g., 5) and rerun. What happens to the t critical value and p-value stability?
3. Add a single extreme outlier to `x` and rerun. What happens to the mean, `s`, and the decision?
4. Compare the Monte Carlo p-value to SciPy while increasing `n_mc`. How fast does it converge?

## References

- Standard introductory statistics texts: Student’s one-sample t-test
- SciPy: `scipy.stats.ttest_1samp`
