# Studentized Range Distribution (`studentized_range`)

The **studentized range** distribution is the null distribution of a key statistic in **multiple comparisons**: the (scaled) range of \(k\) normal means when the variance is **estimated from data**.

It is the mathematical backbone behind **Tukey’s HSD** procedure (and related simultaneous confidence intervals) after one-way ANOVA.

## What you’ll learn
- what the studentized range statistic measures (and why it’s “studentized”)
- the parameter roles: number of groups \(k\) and degrees of freedom \(\nu\)
- an integral-form PDF/CDF (and why there is no simple closed form)
- when moments exist (and how to compute them numerically / by Monte Carlo)
- a NumPy-only simulator from the defining construction
- practical SciPy usage: `scipy.stats.studentized_range` (`pdf`, `cdf`, `rvs`, `fit`)
- how it appears in Tukey-style hypothesis tests and modeling workflows


## Notebook roadmap
1) Title & classification
2) Intuition & motivation
3) Formal definition (PDF/CDF)
4) Moments & properties
5) Parameter interpretation
6) Derivations (expectation, variance, likelihood)
7) Sampling & simulation (NumPy-only)
8) Visualization (PDF, CDF, Monte Carlo)
9) SciPy integration (`scipy.stats.studentized_range`)
10) Statistical use cases
11) Pitfalls
12) Summary


In [1]:
import math
import os
import warnings

import numpy as np
import scipy
from scipy import stats
from scipy.special import gammaln

import plotly
import plotly.express as px
import plotly.graph_objects as go
import plotly.io as pio

pio.templates.default = "plotly_white"
pio.renderers.default = os.environ.get("PLOTLY_RENDERER", "notebook")

SEED = 7
rng = np.random.default_rng(SEED)

np.set_printoptions(precision=4, suppress=True)

print("numpy ", np.__version__)
print("scipy ", scipy.__version__)
print("plotly", plotly.__version__)


numpy  1.26.2
scipy  1.15.0
plotly 6.5.2


## Prerequisites & notation

**Prerequisites**
- comfort with continuous distributions (PDF/CDF)
- basic calculus and change-of-variables
- familiarity with the \(\chi^2\) distribution and the Gamma function \(\Gamma(\cdot)\)
- basic ideas from order statistics (max/min/range)

**Notation**
- \(k\): number of groups / number of normal means being compared (typically an integer \(k\ge 2\)).
- \(\nu\): degrees of freedom for the variance estimate (\(\nu>0\)).
- \(\phi\) and \(\Phi\): standard normal PDF and CDF.
- \(\chi^2_\nu\): chi-square distribution with \(\nu\) degrees of freedom.

In SciPy, the distribution is exposed as `scipy.stats.studentized_range` with shape parameters `(k, df)` (where `df` corresponds to \(\nu\)), plus the usual `loc` and `scale`.


## 1) Title & classification

- **Name**: `studentized_range` (studentized range distribution)
- **Type**: **continuous**
- **Support**: \(q \in [0, \infty)\)
- **Parameter space**:
  - \(k \ge 2\) (typically an integer)
  - \(\nu > 0\) degrees of freedom

A location-scale family also exists: if \(Q\sim \mathrm{studentized\_range}(k,\nu)\), then \(X = \mathrm{loc} + \mathrm{scale}\,Q\) is supported on \([\mathrm{loc}, \infty)\). SciPy provides this via the standard `loc` and `scale` arguments.
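
The location-scale relationship can be checked numerically. The sketch below assumes the usual SciPy convention that `loc` and `scale` act as \(X = \mathrm{loc} + \mathrm{scale}\,Q\); the variable names (`k_ls`, `loc_ls`, and so on) are made up for this illustration.

```python
import numpy as np
from scipy import stats

k_ls, df_ls = 4, 20
loc_ls, scale_ls = 1.5, 2.0
x_ls = np.array([1.0, 4.0, 8.0, 12.0])  # the first point lies below loc

# Evaluate with loc/scale passed to SciPy ...
cdf_direct = stats.studentized_range.cdf(x_ls, k_ls, df_ls, loc=loc_ls, scale=scale_ls)
pdf_direct = stats.studentized_range.pdf(x_ls, k_ls, df_ls, loc=loc_ls, scale=scale_ls)

# ... and by standardizing by hand: F_X(x) = F_Q((x - loc)/scale),
# f_X(x) = f_Q((x - loc)/scale) / scale.
z_ls = (x_ls - loc_ls) / scale_ls
cdf_manual = stats.studentized_range.cdf(z_ls, k_ls, df_ls)
pdf_manual = stats.studentized_range.pdf(z_ls, k_ls, df_ls) / scale_ls

print("cdf (loc/scale):", cdf_direct)
print("cdf (manual)   :", cdf_manual)
```

Points below `loc` fall outside the support, so both the CDF and the PDF are zero there.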


## 2) Intuition & motivation

### What this distribution models
The studentized range distribution describes the random variable

\[
Q \,=\, \frac{\max_i \bar{Y}_i - \min_i \bar{Y}_i}{\hat\sigma\,/\,\sqrt{n}},
\]

where:
- \(\bar{Y}_1,\dots,\bar{Y}_k\) are \(k\) sample means,
- each mean is based on the same number \(n\) of observations (a balanced design), and
- \(\hat\sigma\) is an estimate of the common standard deviation with \(\nu\) degrees of freedom, i.e. \(\nu\hat\sigma^2/\sigma^2 \sim \chi^2_\nu\) independently of the means (in one-way ANOVA, the pooled within-group estimate).

In words: **how far apart are the largest and smallest means, measured in estimated standard error units?**

The “studentized” part means we divide by an **estimated** standard deviation rather than the true \(\sigma\).
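
As a minimal sketch, the statistic can be computed directly from a balanced data array. The helper name `studentized_range_statistic` and the `(k, n)` array layout are assumptions made for this illustration; the pooled within-group variance with \(k(n-1)\) degrees of freedom plays the role of \(\hat\sigma^2\).

```python
import numpy as np


def studentized_range_statistic(y):
    """Return (Q, error df) for a balanced layout y of shape (k groups, n obs)."""
    y = np.asarray(y, dtype=float)
    k_groups, n_per_group = y.shape
    means = y.mean(axis=1)

    # Pooled within-group variance estimate with k(n-1) degrees of freedom.
    df_error = k_groups * (n_per_group - 1)
    ss_within = ((y - means[:, None]) ** 2).sum()
    sigma_hat = np.sqrt(ss_within / df_error)

    # Range of the group means in units of the estimated standard error.
    q = (means.max() - means.min()) / (sigma_hat / np.sqrt(n_per_group))
    return float(q), df_error


rng_stat = np.random.default_rng(0)
y_stat = rng_stat.normal(size=(4, 12))
q_stat_demo, df_stat_demo = studentized_range_statistic(y_stat)
print(f"Q = {q_stat_demo:.4f} with {df_stat_demo} error degrees of freedom")
```

Because both the numerator and the denominator scale with the data, the statistic is unchanged by shifting or rescaling all observations.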

### Typical real-world use cases
- **Tukey’s HSD / multiple comparisons after ANOVA**: control family-wise error across all pairwise mean comparisons.
- **Simultaneous confidence intervals**: for differences between many means in a balanced one-way ANOVA.
- **Quality / A/B testing / experiments**: quantifying the largest observed separation among several treatments while accounting for variance estimation.

### Relations to other distributions
- **\(k=2\)**: the range of two normals is an absolute difference; one can show
  \[
  Q \;\overset{d}{=}\; \sqrt{2}\,|T_\nu|,
  \]
  where \(T_\nu\) is Student’s \(t\) with \(\nu\) degrees of freedom.
- **\(\nu\to\infty\)**: the variance estimate becomes exact and \(Q\) approaches the **range of \(k\) standard normals**.
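- **Large \(k\)**: behavior is driven by the extremes (max/min). The range of \(k\) standard normals grows roughly like \(2\sqrt{2\ln k}\), and after suitable centering and scaling its distribution connects to extreme-value theory.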
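
The \(k=2\) relation gives a convenient numerical check: \(P(Q\le q) = P(|T_\nu| \le q/\sqrt{2}) = 2F_{t_\nu}(q/\sqrt{2}) - 1\). The sketch below compares the two sides with SciPy; the grid and the choice \(\nu=7\) are arbitrary.

```python
import numpy as np
from scipy import stats

df_check = 7
q_grid_t = np.array([0.5, 1.0, 2.0, 3.0, 5.0])

# Left-hand side: studentized range CDF with k = 2.
cdf_range = stats.studentized_range.cdf(q_grid_t, 2, df_check)

# Right-hand side: CDF of sqrt(2) * |T| with T ~ Student t(df_check).
cdf_from_t = 2.0 * stats.t.cdf(q_grid_t / np.sqrt(2.0), df_check) - 1.0

max_abs_diff = float(np.max(np.abs(cdf_range - cdf_from_t)))
print("max |difference| =", max_abs_diff)
```

Any remaining difference reflects the numerical integration behind the studentized range CDF.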


## 3) Formal definition

A clean way to define the studentized range is through a *construction*.

Let
- \(Z_1,\dots,Z_k \overset{iid}{\sim} \mathcal{N}(0,1)\)
- \(V \sim \chi^2_\nu\), independent of the \(Z_i\)

Define the normal range
\[
R = \max_i Z_i - \min_i Z_i \;\;\; (R\ge 0)
\]
and the studentizing scale
\[
S = \sqrt{V/\nu} \;\;\; (S>0).
\]

Then the **studentized range** random variable is
\[
Q = \frac{R}{S} = \frac{\max_i Z_i - \min_i Z_i}{\sqrt{V/\nu}} \;\;\; (Q\ge 0).
\]

### PDF / CDF (integral form)
There is no simple closed-form PDF/CDF. One standard representation mixes the **range distribution of normals** over the distribution of \(S\).

For a continuous parent distribution with PDF \(f\) and CDF \(F\), the range \(R\) of \(k\) i.i.d. samples has:

\[
F_R(r) = k\int_{-\infty}^{\infty} f(x)\,[F(x+r)-F(x)]^{k-1}\,dx, \quad r\ge 0
\]

\[
f_R(r) = k(k-1)\int_{-\infty}^{\infty} f(x)\,f(x+r)\,[F(x+r)-F(x)]^{k-2}\,dx, \quad r\ge 0.
\]

For the standard normal, \(f=\phi\) and \(F=\Phi\).

Because \(Q = R/S\) with \(R\perp S\), we can write:

\[
F_Q(q\,;k,\nu) = \int_0^\infty F_R(qs\,;k)\,f_S(s\,;\nu)\,ds, \quad q\ge 0
\]

\[
f_Q(q\,;k,\nu) = \int_0^\infty s\,f_R(qs\,;k)\,f_S(s\,;\nu)\,ds, \quad q\ge 0,
\]

where \(S=\sqrt{V/\nu}\) and its PDF is

\[
f_S(s\,;\nu) = \frac{2\,\nu^{\nu/2}}{2^{\nu/2}\,\Gamma(\nu/2)}\,s^{\nu-1}\,\exp\left(-\frac{\nu s^2}{2}\right),\quad s>0.
\]

In practice, libraries (including SciPy) evaluate the distribution via specialized numerical integration routines.
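
To make the integral form concrete, here is a slow but direct sketch that evaluates \(F_Q\) with nested calls to `scipy.integrate.quad`. The helper names (`normal_range_cdf`, `scale_pdf`, `studentized_range_cdf_quad`) are hypothetical and this is not how SciPy implements the distribution; it only mirrors the formulas above.

```python
import math

from scipy import integrate, special, stats


def normal_range_cdf(r, k):
    """F_R(r) for the range of k iid standard normals (one quadrature)."""
    if r <= 0:
        return 0.0

    def integrand(x):
        phi = math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)
        return phi * (special.ndtr(x + r) - special.ndtr(x)) ** (k - 1)

    value, _ = integrate.quad(integrand, -math.inf, math.inf)
    return k * value


def scale_pdf(s, nu):
    """Density of S = sqrt(V / nu) with V ~ chi2(nu), evaluated on the log scale."""
    if s <= 0:
        return 0.0
    log_f = (
        math.log(2.0)
        + 0.5 * nu * math.log(0.5 * nu)
        - special.gammaln(0.5 * nu)
        + (nu - 1.0) * math.log(s)
        - 0.5 * nu * s * s
    )
    return math.exp(log_f)


def studentized_range_cdf_quad(q, k, nu):
    """F_Q(q; k, nu) = integral over s of F_R(q s; k) f_S(s; nu)."""

    def integrand(s):
        w = scale_pdf(s, nu)
        # Skip the inner integral where the weight has underflowed to zero.
        return 0.0 if w == 0.0 else normal_range_cdf(q * s, k) * w

    value, _ = integrate.quad(integrand, 0.0, math.inf)
    return value


cdf_quad = studentized_range_cdf_quad(3.5, 5, 10)
cdf_scipy = float(stats.studentized_range.cdf(3.5, 5, 10))
print("nested quad:", cdf_quad)
print("SciPy      :", cdf_scipy)
```

The two values should agree to several decimal places; the nested version is only meant for understanding, not for production use.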


## 4) Moments & properties

### Moment existence
Write
\[
Q = R\,\sqrt{\nu/V},\quad R\perp V.
\]

Since \(R\) (a range of Gaussians) has finite moments of all orders, the existence of \(\mathbb{E}[Q^m]\) is controlled by the inverse-\(\chi^2\) term.

Using \(V\sim\chi^2_\nu\), one can show:

\[
\mathbb{E}[Q^m] < \infty \quad \Longleftrightarrow \quad \nu > m.
\]

In particular:
- **Mean** exists if \(\nu>1\)
- **Variance** exists if \(\nu>2\)
- **Skewness** exists if \(\nu>3\)
- **(Excess) kurtosis** exists if \(\nu>4\)

### General moment formula (factorization)
For \(m<\nu\):

\[
\mathbb{E}[Q^m] = \mathbb{E}[R^m]\,\mathbb{E}\!\left[(\nu/V)^{m/2}\right].
\]

The inverse-\(\chi^2\) factor has a closed form:

\[
\mathbb{E}\!\left[(\nu/V)^{m/2}\right]
= \left(\frac{\nu}{2}\right)^{m/2}\,\frac{\Gamma\left((\nu-m)/2\right)}{\Gamma(\nu/2)},\quad \nu>m.
\]

What remains is \(\mathbb{E}[R^m]\), the \(m\)-th moment of the range of \(k\) standard normals, which typically has to be computed numerically (or by Monte Carlo).

### MGF / characteristic function
- The **MGF** \(M_Q(t)=\mathbb{E}[e^{tQ}]\) does **not** exist for any \(t>0\): the right tail of \(Q\) decays only polynomially, roughly like \(q^{-\nu}\) (similar to the \(t\) distribution), so \(\mathbb{E}[e^{tQ}]\) diverges.
- The **characteristic function** \(\varphi_Q(t)=\mathbb{E}[e^{itQ}]\) exists for all real \(t\) but has no simple closed form; it can be approximated numerically.

### Entropy
The differential entropy

\[
h(Q) = -\int_0^\infty f_Q(q)\,\log f_Q(q)\,dq
\]

generally has no simple closed form and is typically computed numerically.
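
One way to approximate the entropy is a simple trapezoid rule on a grid of PDF values. The sketch below assumes that truncating the integral at \(q=30\) is harmless for \(k=5,\ \nu=10\) (the tail mass beyond that point is negligible); the grid size is an arbitrary trade-off between accuracy and the cost of SciPy's numerically integrated `pdf`.

```python
import numpy as np
from scipy import stats

k_ent, df_ent = 5, 10
q_grid_ent = np.linspace(0.0, 30.0, 301)
f_grid = stats.studentized_range.pdf(q_grid_ent, k_ent, df_ent)

# -f log f, with the log argument clipped so that f = 0 contributes 0.
f_safe = np.clip(f_grid, 1e-300, None)
integrand_ent = -f_grid * np.log(f_safe)


def trapezoid_uniform(values, dx):
    """Trapezoid rule on a uniform grid."""
    return dx * (values.sum() - 0.5 * (values[0] + values[-1]))


step_ent = q_grid_ent[1] - q_grid_ent[0]
mass_ent = trapezoid_uniform(f_grid, step_ent)          # should be close to 1
entropy_nats = trapezoid_uniform(integrand_ent, step_ent)

print("total mass on grid:", mass_ent)
print("entropy (nats)    :", entropy_nats)
```

Checking that the grid mass is close to 1 is a cheap guard against a grid that is too coarse or too short.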


In [2]:
# Moments in SciPy (computed numerically)

k, df = 5, 10

mean, var, skew, ex_kurt = stats.studentized_range.stats(k, df, moments="mvsk")
print(f"k={k}, df={df}")
print("mean              ", float(mean))
print("variance          ", float(var))
print("skewness          ", float(skew))
print("excess kurtosis   ", float(ex_kurt))

# Monte Carlo check via the defining construction: range / sqrt(chi2/df).
# Drawing this many samples with SciPy's `rvs` is very slow for this
# distribution because each draw relies on numerical integration.
n = 80_000
z_mc = rng.standard_normal((n, k))
chi2_mc = rng.chisquare(df, size=n)
samples = (z_mc.max(axis=1) - z_mc.min(axis=1)) / np.sqrt(chi2_mc / df)
print("\nMonte Carlo (range / sqrt(chi2/df) construction)")
print("mean    ", float(samples.mean()))
print("var     ", float(samples.var()))

# Moment existence demo: when df <= m, E[Q^m] diverges.
# Only the mean and variance are printed, so only those are requested.
print("\nMoment existence (SciPy may return inf/nan or an unreliable finite value when moments diverge):")
for df_test in [0.8, 1.0, 1.5, 2.0, 3.0, 5.0]:
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")
        m, v = stats.studentized_range.stats(3, df_test, moments="mv")
    print(f"df={df_test:>4}: mean={m}, var={v}")


k=5, df=10
mean               2.520661086851837
variance           1.3419965211478724
skewness           1.188652638505497
excess kurtosis    2.999458199777277

## 5) Parameter interpretation

The parameters control two intuitive “sources of size” in the statistic:

- **\(k\)** (number of groups / means): increasing \(k\) increases the typical separation between the largest and smallest of \(k\) values, so the distribution shifts to the right. For finite \(\nu\) it typically also becomes more spread out, because the noise in the denominator multiplies a larger range.

- **\(\nu\)** (degrees of freedom): controls how noisy the variance estimate is.
  - small \(\nu\) means a noisy denominator \(\sqrt{V/\nu}\), producing heavier right tails and larger quantiles.
  - as \(\nu\to\infty\), the denominator concentrates near 1 and the distribution approaches the normal-range distribution.

Below we visualize these effects.


In [None]:
x = np.linspace(0, 8, 500)

# Effect of k (fix df)
df_fixed = 10
ks = [2, 3, 5, 10]

fig = go.Figure()
for k_ in ks:
    fig.add_trace(
        go.Scatter(
            x=x,
            y=stats.studentized_range.pdf(x, k_, df_fixed),
            mode="lines",
            name=f"k={k_}",
        )
    )

fig.update_layout(
    title=f"studentized_range PDF: varying k (df={df_fixed})",
    xaxis_title="q",
    yaxis_title="pdf",
)
fig.show()

# Effect of df (fix k)
k_fixed = 5
dfs = [3, 5, 10, 30]

fig = go.Figure()
for df_ in dfs:
    fig.add_trace(
        go.Scatter(
            x=x,
            y=stats.studentized_range.pdf(x, k_fixed, df_),
            mode="lines",
            name=f"df={df_}",
        )
    )

fig.update_layout(
    title=f"studentized_range PDF: varying df (k={k_fixed})",
    xaxis_title="q",
    yaxis_title="pdf",
)
fig.show()
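
The same two effects show up in the upper critical values that Tukey-style procedures use. The small table below is a sketch using `ppf`; the particular \(k\) and `df` values are arbitrary, and each `ppf` call involves root-finding on a numerically integrated CDF, so the grid is kept small.

```python
import numpy as np
from scipy import stats

alpha_tab = 0.05
k_values = [2, 3, 5]
df_values = [10, 20]

# Rows: df, columns: k. Entries: (1 - alpha) quantiles.
crit_table = np.array(
    [
        [stats.studentized_range.ppf(1 - alpha_tab, k_val, df_val) for k_val in k_values]
        for df_val in df_values
    ]
)

print("df \\ k " + "".join(f"{k_val:>8d}" for k_val in k_values))
for df_val, row in zip(df_values, crit_table):
    print(f"{df_val:>6d} " + "".join(f"{value:8.3f}" for value in row))
```

Critical values grow with \(k\) (more means to separate) and shrink as `df` increases (a less noisy variance estimate).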


## 6) Derivations

The studentized range is defined as \(Q = R/S\) with:
- \(R = \max_i Z_i - \min_i Z_i\) (range of \(k\) standard normals)
- \(S = \sqrt{V/\nu}\) with \(V\sim\chi^2_\nu\)
- \(R\perp V\)

This independence is what makes several useful derivations short.

### Expectation
For \(\nu>1\):

\[
\mathbb{E}[Q] = \mathbb{E}[R]\,\mathbb{E}[1/S] = \mathbb{E}[R]\,\mathbb{E}[\sqrt{\nu/V}].
\]

Using the Gamma-function moment of \(\chi^2\):

\[
\mathbb{E}[\sqrt{\nu/V}] = \sqrt{\frac{\nu}{2}}\,\frac{\Gamma\left((\nu-1)/2\right)}{\Gamma(\nu/2)}.
\]

The remaining factor \(\mathbb{E}[R]\) depends on \(k\) and usually must be computed numerically.
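
One deterministic option for \(\mathbb{E}[R]\) uses the standard identity \(\mathbb{E}[R] = \int_{-\infty}^{\infty}\left[1 - \Phi(x)^k - (1-\Phi(x))^k\right]dx\), which follows from writing \(\mathbb{E}[\max_i Z_i]\) and \(\mathbb{E}[\min_i Z_i]\) as integrals of their CDFs. The sketch below combines it with the Gamma-function factor; the helper names `normal_range_mean` and `inv_scale_mean` are made up for this example.

```python
import math

from scipy import integrate, special


def normal_range_mean(k):
    """E[R] for the range of k iid standard normals via one-dimensional quadrature."""

    def integrand(x):
        p = special.ndtr(x)
        return 1.0 - p**k - (1.0 - p) ** k

    value, _ = integrate.quad(integrand, -math.inf, math.inf)
    return value


def inv_scale_mean(nu):
    """E[sqrt(nu / V)] for V ~ chi2(nu); requires nu > 1."""
    return math.sqrt(nu / 2.0) * math.exp(
        special.gammaln((nu - 1.0) / 2.0) - special.gammaln(nu / 2.0)
    )


er_k5 = normal_range_mean(5)
mean_q_k5 = er_k5 * inv_scale_mean(10.0)
print("E[R] (k=5)        =", er_k5)
print("E[Q] (k=5, nu=10) =", mean_q_k5)
```

For \(k=2\) the integral reduces to \(2/\sqrt{\pi}\), which makes a quick sanity check.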

### Variance
For \(\nu>2\), using \(\mathbb{E}[Q^2] = \mathbb{E}[R^2]\,\mathbb{E}[\nu/V]\) and \(\mathbb{E}[\nu/V] = \nu/(\nu-2)\):

\[
\mathrm{Var}(Q) = \mathbb{E}[R^2]\,\frac{\nu}{\nu-2} - \left(\mathbb{E}[R]\,\sqrt{\frac{\nu}{2}}\,\frac{\Gamma\left((\nu-1)/2\right)}{\Gamma(\nu/2)}\right)^2.
\]

### Likelihood
If you model observations \(q_1,\dots,q_n\) as i.i.d. from \(\mathrm{studentized\_range}(k,\nu)\), the likelihood is

\[
L(k,\nu\mid q_{1:n}) = \prod_{i=1}^n f_Q(q_i\,;k,\nu),
\]

and the log-likelihood is \(\ell = \sum_i \log f_Q(q_i\,;k,\nu)\).

In most classical applications \(k\) and \(\nu\) are determined by the experimental design (numbers of groups and error degrees of freedom), so MLE is less common. When fitting is needed, numerical optimization relies on evaluating `logpdf` via numerical integration.
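
As an illustration of the log-likelihood, the sketch below profiles \(\ell\) over a coarse grid of \(\nu\) values with \(k\) held fixed. The data are synthetic (drawn from the defining construction), and the grid, seed, and sample size are arbitrary choices kept small because every `logpdf` evaluation requires numerical integration.

```python
import numpy as np
from scipy import stats

rng_ll = np.random.default_rng(123)
k_ll, df_true_ll, n_obs_ll = 4, 8.0, 12

# Synthetic observations: range of k normals over sqrt(chi2 / df).
z_ll = rng_ll.standard_normal((n_obs_ll, k_ll))
v_ll = rng_ll.chisquare(df_true_ll, size=n_obs_ll)
q_obs = (z_ll.max(axis=1) - z_ll.min(axis=1)) / np.sqrt(v_ll / df_true_ll)

# Log-likelihood on a coarse grid of df values (k fixed by design).
df_grid = np.array([3.0, 5.0, 8.0, 15.0, 40.0])
loglik = np.array(
    [stats.studentized_range.logpdf(q_obs, k_ll, df_val).sum() for df_val in df_grid]
)
df_best = df_grid[np.argmax(loglik)]

for df_val, ll_val in zip(df_grid, loglik):
    print(f"df={df_val:5.1f}  loglik={ll_val:9.4f}")
print("grid maximizer:", df_best)
```

With so few observations the profile is usually flat, so the grid maximizer should not be read as a precise estimate of \(\nu\).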


In [None]:
# Demonstrate the moment factorization idea numerically

def inv_chi2_scaled_moment(df: float, m: float) -> float:
    # E[(df / V)^(m/2)] where V ~ chi2(df). Requires df > m.

    if not (np.isfinite(df) and df > m):
        return math.inf
    # (df/2)^(m/2) * Gamma((df-m)/2) / Gamma(df/2)
    return (df / 2) ** (m / 2) * math.exp(gammaln((df - m) / 2) - gammaln(df / 2))


def normal_range_moments(k: int, n: int = 200_000) -> tuple[float, float]:
    # Monte Carlo E[R] and E[R^2] for the range of k standard normals.

    z = rng.standard_normal((n, k))
    r = z.max(axis=1) - z.min(axis=1)
    return float(r.mean()), float((r**2).mean())


k, df = 5, 10
ER, ER2 = normal_range_moments(k)

a1 = inv_chi2_scaled_moment(df, m=1.0)  # E[(df/V)^(1/2)]
a2 = inv_chi2_scaled_moment(df, m=2.0)  # E[(df/V)^(1)]

mean_pred = ER * a1
var_pred = ER2 * a2 - mean_pred**2

mean_scipy = float(stats.studentized_range.mean(k, df))
var_scipy = float(stats.studentized_range.var(k, df))

print(f"k={k}, df={df}")
print("E[R]   (MC) ", ER)
print("E[R^2] (MC) ", ER2)
print("\nPredicted from factorization")
print("mean   ", mean_pred)
print("var    ", var_pred)
print("\nSciPy (numerical)")
print("mean   ", mean_scipy)
print("var    ", var_scipy)


## 7) Sampling & simulation (NumPy-only)

The defining construction suggests a direct simulator:

1. Draw \(Z_1,\dots,Z_k\overset{iid}{\sim}\mathcal{N}(0,1)\) and compute the range \(R=\max Z_i - \min Z_i\).
2. Draw \(V\sim\chi^2_\nu\) independently (equivalently, \(V\sim\mathrm{Gamma}(\nu/2,\;\text{scale}=2)\)).
3. Return \(Q = R / \sqrt{V/\nu}\).

This is **vectorizable**: we can draw a `(size, k)` array of normals and a `(size,)` array of chi-square variables, then compute everything in NumPy without Python loops.


In [None]:
def studentized_range_rvs_numpy(k: int, df: float, size: int, rng: np.random.Generator) -> np.ndarray:
    # NumPy-only sampling via the (range of normals) / sqrt(chi2/df) construction.

    if int(k) != k or k < 2:
        raise ValueError("k must be an integer >= 2")
    if not (np.isfinite(df) and df > 0):
        raise ValueError("df must be finite and > 0")
    if size < 1:
        raise ValueError("size must be >= 1")

    z = rng.standard_normal((size, k))
    r = z.max(axis=1) - z.min(axis=1)

    # Chi-square via Gamma(df/2, scale=2)
    v = rng.gamma(shape=df / 2, scale=2.0, size=size)
    q = r / np.sqrt(v / df)
    return q


k, df = 5, 10
n = 80_000
# SciPy's rvs relies on numerical integration for every draw, so keep it small.
n_scipy = 500

q_numpy = studentized_range_rvs_numpy(k, df, size=n, rng=rng)
q_scipy = stats.studentized_range.rvs(k, df, size=n_scipy, random_state=rng)

qs = [0.5, 0.9, 0.95, 0.99]

print(f"k={k}, df={df}")
print(f"quantiles (numpy n={n}, scipy n={n_scipy}; small-sample tail quantiles are noisy)")
print("  p     numpy      scipy      theory")
q_theory = stats.studentized_range.ppf(qs, k, df)
for p, a, b, t in zip(qs, np.quantile(q_numpy, qs), np.quantile(q_scipy, qs), q_theory):
    print(f"{p:>4.2f}  {a:>8.4f}  {b:>8.4f}  {t:>8.4f}")


## 8) Visualization (PDF, CDF, Monte Carlo)

We’ll compare:
- SciPy’s numerical `pdf`/`cdf`
- Monte Carlo samples from the NumPy-only sampler

This is a good sanity check because the PDF/CDF are computed via numerical integration and the distribution has heavy tails when \(\nu\) is small.


In [None]:
def ecdf(x: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    x = np.sort(np.asarray(x))
    y = np.arange(1, x.size + 1) / x.size
    return x, y


k, df = 5, 10
n = 80_000

samples = studentized_range_rvs_numpy(k, df, size=n, rng=rng)

x = np.linspace(0, 8, 500)
pdf = stats.studentized_range.pdf(x, k, df)
cdf = stats.studentized_range.cdf(x, k, df)

# PDF vs histogram
fig = px.histogram(
    samples,
    nbins=80,
    histnorm="probability density",
    title=f"studentized_range: Monte Carlo vs PDF (k={k}, df={df})",
)
fig.add_trace(go.Scatter(x=x, y=pdf, mode="lines", name="SciPy pdf"))
fig.update_layout(xaxis_title="q", yaxis_title="density")
fig.show()

# CDF vs ECDF
xs, ys = ecdf(samples)
fig = go.Figure()
fig.add_trace(go.Scatter(x=x, y=cdf, mode="lines", name="SciPy cdf"))
fig.add_trace(go.Scatter(x=xs[::200], y=ys[::200], mode="markers", name="ECDF (subsampled)"))
fig.update_layout(
    title=f"studentized_range: ECDF vs CDF (k={k}, df={df})",
    xaxis_title="q",
    yaxis_title="probability",
)
fig.show()
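
A numerical companion to the visual check is a one-sample Kolmogorov-Smirnov test of construction-based samples against SciPy's CDF. This is a sketch: the sample size is kept small because each CDF evaluation involves numerical integration, and the seed and variable names are arbitrary.

```python
import numpy as np
from scipy import stats

rng_ks = np.random.default_rng(2024)
k_ks, df_ks, n_ks = 5, 10, 300

# Samples from the defining construction: range of normals / sqrt(chi2 / df).
z_ks = rng_ks.standard_normal((n_ks, k_ks))
v_ks = rng_ks.chisquare(df_ks, size=n_ks)
q_ks = (z_ks.max(axis=1) - z_ks.min(axis=1)) / np.sqrt(v_ks / df_ks)

# Compare the empirical CDF with SciPy's numerically integrated CDF.
ks_result = stats.kstest(q_ks, lambda x: stats.studentized_range.cdf(x, k_ks, df_ks))
print("KS statistic:", ks_result.statistic)
print("p-value     :", ks_result.pvalue)
```

A large p-value is consistent with the sampler and the CDF describing the same distribution; it does not prove agreement in the far tails.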


## 9) SciPy integration (`scipy.stats.studentized_range`)

SciPy exposes the distribution as a *continuous* distribution object:

- `stats.studentized_range.pdf(x, k, df)`
- `stats.studentized_range.cdf(x, k, df)`
- `stats.studentized_range.ppf(p, k, df)` (quantiles / critical values)
- `stats.studentized_range.rvs(k, df, size=..., random_state=...)`
- `stats.studentized_range.fit(data, ...)` (MLE; can be slow)

Like most SciPy continuous distributions, it also supports `loc` and `scale`.


In [None]:
k, df = 5, 10
x = np.linspace(0, 8, 5)

print("x         ", x)
print("pdf(x)    ", stats.studentized_range.pdf(x, k, df))
print("cdf(x)    ", stats.studentized_range.cdf(x, k, df))
print("sf(x)     ", stats.studentized_range.sf(x, k, df))

# Sampling
samp = stats.studentized_range.rvs(k, df, size=5, random_state=rng)
print("\nsample   ", samp)

# Critical value / quantile (common in Tukey-style tests)
alpha = 0.05
qcrit = stats.studentized_range.ppf(1 - alpha, k, df)
print(f"\nqcrit (1-alpha={1-alpha:.2f}) = {qcrit:.4f}")

# Fitting (MLE) can be slow because logpdf involves numerical integration.
# We'll fit df on a tiny synthetic sample and keep k fixed.
k_true, df_true = 5, 12
data = stats.studentized_range.rvs(k_true, df_true, size=10, random_state=rng)

with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    k_hat, df_hat, loc_hat, scale_hat = stats.studentized_range.fit(
        data,
        f0=k_true,  # fix k
        floc=0,
        fscale=1,
    )

print("\nFit (k fixed)")
print("k_hat    ", k_hat)
print("df_hat   ", df_hat)
print("loc_hat  ", loc_hat)
print("scale_hat", scale_hat)


## 10) Statistical use cases

### Hypothesis testing (Tukey’s HSD intuition)
In a balanced one-way ANOVA with \(k\) groups of size \(n\), define:

- group means \(\bar{Y}_1,\dots,\bar{Y}_k\)
- pooled error standard deviation \(s_p\)
- error degrees of freedom \(\nu\)

Tukey’s HSD compares pairwise mean differences using a critical value from the studentized range distribution:

\[
\text{HSD} = q_{1-\alpha}(k,\nu)\,\frac{s_p}{\sqrt{n}},
\]

where \(q_{1-\alpha}(k,\nu)\) is the \((1-\alpha)\)-quantile of `studentized_range(k, nu)`.

A pair \((i,j)\) is flagged when

\[
|\bar{Y}_i - \bar{Y}_j| > \text{HSD}.
\]

### Bayesian modeling
In Bayesian ANOVA / hierarchical normal models, practitioners often compute the *posterior* of a studentized range-like statistic by **sampling from the posterior** of group means and \(\sigma\), then computing the range/scale ratio on each draw. The classical studentized range distribution is useful as a **reference null** and for posterior predictive checks.

### Generative modeling
The studentized range is a distribution over a *summary statistic* (a standardized range). It can be used as:
- a synthetic-data generator for stress-testing multiple-comparison pipelines,
- a component in simulation-based calibration (e.g., generate \(Q\) under the null),
- a target distribution in approximate Bayesian computation (ABC) when ranges are part of the chosen summaries.
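
The "generate \(Q\) under the null" idea can be sketched as a calibration check: simulate many balanced one-way layouts with equal means and count how often the observed studentized range exceeds the \((1-\alpha)\) critical value. The simulation sizes and variable names below are arbitrary choices for this illustration.

```python
import numpy as np
from scipy import stats

rng_null = np.random.default_rng(11)
alpha_null, k_null, n_null, n_sim = 0.05, 4, 12, 20_000

# n_sim independent datasets, each with k groups of n observations, all means equal.
y_null = rng_null.standard_normal((n_sim, k_null, n_null))
means_null = y_null.mean(axis=2)                       # shape (n_sim, k)

# Pooled within-group standard deviation for each dataset.
df_err_null = k_null * (n_null - 1)
ss_within_null = ((y_null - means_null[:, :, None]) ** 2).sum(axis=(1, 2))
sp_null = np.sqrt(ss_within_null / df_err_null)

# Studentized range statistic per dataset and the rejection rate at level alpha.
q_null = (means_null.max(axis=1) - means_null.min(axis=1)) / (sp_null / np.sqrt(n_null))
q_crit_null = stats.studentized_range.ppf(1 - alpha_null, k_null, df_err_null)
fwer_hat = float(np.mean(q_null > q_crit_null))

print(f"critical value = {q_crit_null:.4f}")
print(f"estimated family-wise error rate = {fwer_hat:.4f} (target {alpha_null})")
```

Since the largest pairwise difference is exactly the range of the means, this rejection rate is the family-wise error rate of the all-pairs procedure under the null.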


In [None]:
# Mini demo: Tukey-style thresholding in a balanced one-way layout

alpha = 0.05
k = 4
n = 12

# Simulate k groups under the null (all means equal)
y = rng.normal(loc=0.0, scale=1.0, size=(k, n))

group_means = y.mean(axis=1)
df_error = k * (n - 1)
ss_within = ((y - group_means[:, None]) ** 2).sum()
sp = math.sqrt(ss_within / df_error)

qcrit = stats.studentized_range.ppf(1 - alpha, k, df_error)
hsd = qcrit * sp / math.sqrt(n)

print(f"k={k}, n={n}, df_error={df_error}")
print("group means:", group_means)
print(f"sp (pooled sd) = {sp:.4f}")
print(f"qcrit          = {qcrit:.4f}")
print(f"HSD threshold  = {hsd:.4f}\n")

# Pairwise comparisons
pairs = []
for i in range(k):
    for j in range(i + 1, k):
        diff = abs(group_means[i] - group_means[j])
        q = diff / (sp / math.sqrt(n))
        p_adj = stats.studentized_range.sf(q, k, df_error)
        reject = diff > hsd
        pairs.append((i, j, diff, q, p_adj, reject))

print("i  j    |diff|      q     p_adj   reject")
for i, j, diff, q, p_adj, reject in pairs:
    print(f"{i}  {j}  {diff:8.4f}  {q:7.3f}  {p_adj:7.4f}   {reject}")
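
The manual computation can be cross-checked against `scipy.stats.tukey_hsd`, which is available in recent SciPy releases (1.8 and later, to the best of my knowledge). The sketch below uses made-up group means and a balanced layout, and compares SciPy's pairwise p-values with the survival function of the studentized range.

```python
import numpy as np
from scipy import stats

rng_hsd = np.random.default_rng(5)
k_hsd, n_hsd = 3, 8
groups = [rng_hsd.normal(loc=mu, scale=1.0, size=n_hsd) for mu in (0.0, 0.5, 2.0)]

# SciPy's built-in Tukey HSD.
res_hsd = stats.tukey_hsd(*groups)

# Manual version: q_ij = |mean_i - mean_j| / (s_p / sqrt(n)), p_ij = sf(q_ij; k, df_error).
y_hsd = np.vstack(groups)                               # shape (k, n)
means_hsd = y_hsd.mean(axis=1)
df_err_hsd = k_hsd * (n_hsd - 1)
sp_hsd = np.sqrt(((y_hsd - means_hsd[:, None]) ** 2).sum() / df_err_hsd)
q_matrix = np.abs(means_hsd[:, None] - means_hsd[None, :]) / (sp_hsd / np.sqrt(n_hsd))
p_manual = stats.studentized_range.sf(q_matrix, k_hsd, df_err_hsd)

print("tukey_hsd p-values:\n", res_hsd.pvalue)
print("manual p-values:\n", p_manual)
```

The diagonal entries are 1 because a group compared with itself has \(q=0\).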


## 11) Pitfalls

- **Parameter validity**: interpret \(k\) as an integer \(\ge 2\); require \(\nu>0\). (SciPy allows non-integer `k` because it treats it as a continuous shape parameter, but the classical meaning is “number of groups”.)
- **Moment existence**: mean/variance/skew/kurtosis only exist when \(\nu\) exceeds 1/2/3/4 respectively.
- **Numerical integration**: `pdf`, `cdf`, and especially `fit` may be slower than for simpler distributions and can emit integration warnings for some parameter/data combinations.
- **Tail behavior**: small \(\nu\) gives heavy right tails; Monte Carlo estimates of extreme quantiles need large sample sizes.
- **Modeling assumption**: the distribution is rooted in normality + common variance assumptions; violations (heteroskedasticity, non-normal errors, unbalanced designs) can make Tukey-style procedures inaccurate.


## 12) Summary

- `studentized_range(k, df)` is the distribution of the **range of \(k\)** standard normals divided by an independent **estimated scale** \(\sqrt{\chi^2_{df}/df}\).
- It underpins **Tukey’s HSD** and related multiple-comparison procedures.
- The PDF/CDF are typically defined and evaluated via **numerical integration**.
- Moments exist only up to order \(m<df\); the inverse-\(\chi^2\) part yields a clean Gamma-function factor.
- Sampling is straightforward from the defining construction and can be implemented in **NumPy only**.

**References (starting points)**
- SciPy documentation: `scipy.stats.studentized_range`
- Standard treatments in multiple comparisons / ANOVA texts (Tukey HSD and the studentized range statistic)
