# Kendall’s Tau (Rank Correlation) — Measure + Hypothesis Test

Kendall’s tau answers a concrete *ordering* question:

> If I pick **two observations** at random, how often do **x and y agree** on which one is larger?

It’s a **non-parametric** measure of *monotonic* association (excellent for **ordinal data**), and it naturally supports a hypothesis test for **association / independence**.

---

## Learning goals

By the end you can:

- explain *concordant* vs *discordant* pairs (the entire statistic is built from this)
- compute $\tau$ (tau-a and tau-b) from scratch with NumPy
- run and interpret a **permutation test** for `H0: no association`
- interpret $\tau$ as a *probability difference* (not “percent correlation”)


In [None]:
import numpy as np
import plotly.express as px
import plotly.graph_objects as go
import os
import plotly.io as pio

pio.templates.default = "plotly_white"
pio.renderers.default = os.environ.get("PLOTLY_RENDERER", "notebook")
np.set_printoptions(precision=4, suppress=True)

rng = np.random.default_rng(42)


## When to use Kendall’s tau

Use Kendall’s tau when:

- your variables are **ordinal** (ranks, ratings, Likert scales) or you mostly trust the **ordering**
- you expect a **monotonic** relationship (increasing/decreasing, not necessarily linear)
- you want something fairly **robust to outliers** compared to Pearson correlation

Common alternatives:

- **Pearson**: measures *linear* association (sensitive to outliers; its standard test assumes roughly bivariate-normal data)
- **Spearman’s rho**: correlation of ranks (also monotonic; it penalizes large rank displacements more heavily than tau, which counts every pair equally)

Kendall’s tau is often the most interpretable when you want to reason in terms of **pairwise ordering agreement**.
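To make the outlier point concrete, here is a minimal sketch comparing Pearson's r with a bare-bones tau-a on data that is perfectly increasing except for one extreme value. The data and the helper name `tau_a_simple` are made up for illustration and are not part of the notebook's own functions.

```python
import numpy as np


def tau_a_simple(x, y):
    """Tau-a via an explicit sum over all pairs i < j (illustrative helper)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    i, j = np.triu_indices(x.size, k=1)
    s = np.sum(np.sign(x[i] - x[j]) * np.sign(y[i] - y[j]))
    return float(s) / i.size


x_out = np.arange(1, 11)
y_out = np.arange(1, 11).astype(float)
y_out[-1] = -100.0  # a single extreme outlier in y

pearson_out = float(np.corrcoef(x_out, y_out)[0, 1])
tau_out = tau_a_simple(x_out, y_out)

print(f"Pearson r = {pearson_out:.3f}")
print(f"tau-a     = {tau_out:.3f}")
```

The outlier drags Pearson's r below zero, while tau-a stays at 0.6: only the 9 pairs involving the outlier (out of 45) become discordant, no matter how extreme its value is.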


## 1) The core idea: concordant vs discordant pairs

Take any pair of observations $(i, j)$.

Define the pairwise differences:

- $\Delta x = x_i - x_j$
- $\Delta y = y_i - y_j$

Look at the signs:

- if $\Delta x$ and $\Delta y$ have the **same sign**, the pair is **concordant**
- if they have **opposite signs**, the pair is **discordant**
- if either difference is 0, you have a **tie** in $x$, $y$, or both

A convenient encoding is the *pair contribution*:

$$
\operatorname{sign}(x_i - x_j)\;\operatorname{sign}(y_i - y_j)
\in \{-1, 0, +1\}
$$

Summing those contributions over all pairs gives Kendall’s **S** statistic:

$$
S = \sum_{i<j} \operatorname{sign}(x_i - x_j)\;\operatorname{sign}(y_i - y_j)
$$

From $S$ we get **tau-a** (no tie correction):

$$
\tau_a = \frac{S}{\binom{n}{2}}
$$

And **tau-b** (tie-corrected; usually preferred for ordinal/discrete data):

$$
\tau_b =
\frac{S}{\sqrt{\left(\binom{n}{2} - n_1\right)\left(\binom{n}{2} - n_2\right)}}
$$

where:

- $n_1$ is the number of pairs tied in $x$
- $n_2$ is the number of pairs tied in $y$

Under **independence**, $S$ (and therefore $\tau$) is centered around 0.
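As a small worked example (the three-point data set is made up for illustration), take $x = (1, 2, 3)$ and $y = (1, 3, 2)$. Going through the three pairs:

$$
\begin{aligned}
(1,2)&:\quad \operatorname{sign}(1-2)\,\operatorname{sign}(1-3) = (-1)(-1) = +1 \\
(1,3)&:\quad \operatorname{sign}(1-3)\,\operatorname{sign}(1-2) = (-1)(-1) = +1 \\
(2,3)&:\quad \operatorname{sign}(2-3)\,\operatorname{sign}(3-2) = (-1)(+1) = -1
\end{aligned}
$$

$$
S = 1 + 1 - 1 = 1, \qquad \tau_a = \frac{S}{\binom{3}{2}} = \frac{1}{3}
$$

There are no ties here, so $n_1 = n_2 = 0$ and $\tau_b = 1/\sqrt{3 \cdot 3} = 1/3$ as well.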


In [None]:
def _clean_xy(x, y):
    """Return 1D arrays with rows containing NaNs removed."""

    x = np.asarray(x)
    y = np.asarray(y)

    if x.shape != y.shape:
        raise ValueError(f"x and y must have the same shape, got {x.shape} and {y.shape}.")

    x = np.ravel(x)
    y = np.ravel(y)

    # np.isnan works for numeric dtypes; for non-numeric inputs this will raise.
    mask = ~(np.isnan(x) | np.isnan(y))
    return x[mask], y[mask]


def kendall_pair_counts(x, y):
    """Compute concordant/discordant/tie counts for Kendall's tau.

    Returns a dict with:
    - n: number of observations
    - n_pairs: number of pairs (n choose 2)
    - C: #concordant
    - D: #discordant
    - T_x: #ties in x only
    - T_y: #ties in y only
    - T_xy: #ties in both x and y
    - S: C - D

    This is an O(n^2) reference implementation meant for learning.
    """

    x, y = _clean_xy(x, y)
    n = x.size

    if n < 2:
        return dict(n=int(n), n_pairs=0, C=0, D=0, T_x=0, T_y=0, T_xy=0, S=0)

    i, j = np.triu_indices(n, k=1)
    dx = x[i] - x[j]
    dy = y[i] - y[j]

    sx = np.sign(dx)
    sy = np.sign(dy)

    prod = sx * sy

    C = int(np.sum(prod > 0))
    D = int(np.sum(prod < 0))

    T_x = int(np.sum((sx == 0) & (sy != 0)))
    T_y = int(np.sum((sy == 0) & (sx != 0)))
    T_xy = int(np.sum((sx == 0) & (sy == 0)))

    S = C - D

    return dict(
        n=int(n),
        n_pairs=int(i.size),
        C=C,
        D=D,
        T_x=T_x,
        T_y=T_y,
        T_xy=T_xy,
        S=int(S),
    )


def kendall_tau_a(x, y):
    """Kendall's tau-a (no tie correction)."""

    counts = kendall_pair_counts(x, y)
    n_pairs = counts["n_pairs"]
    if n_pairs == 0:
        return np.nan, counts

    tau = counts["S"] / n_pairs
    return float(tau), counts


def kendall_tau_b(x, y):
    """Kendall's tau-b (tie-corrected)."""

    counts = kendall_pair_counts(x, y)

    C = counts["C"]
    D = counts["D"]
    T_x = counts["T_x"]
    T_y = counts["T_y"]

    denom = np.sqrt((C + D + T_x) * (C + D + T_y))
    tau = counts["S"] / denom if denom != 0 else np.nan

    counts = {**counts, "denom": float(denom)}
    return (float(tau) if np.isfinite(tau) else np.nan), counts


## 2) A tiny example you can *see*

We’ll use a small dataset so we can reason about pairs directly.

- the scatter plot shows the data
- the bar chart shows how many pairs are concordant vs discordant vs tied


In [None]:
x = np.array([1, 2, 3, 4, 5])
y = np.array([1, 2, 4, 3, 5])  # one inversion (3 and 4 swap)

tau_b, counts = kendall_tau_b(x, y)

print(f"tau-b = {tau_b:.3f}")
counts


In [None]:
fig = go.Figure()
fig.add_trace(
    go.Scatter(
        x=x,
        y=y,
        mode="markers+text",
        text=[str(i) for i in range(len(x))],
        textposition="top center",
        marker=dict(size=10),
    )
)
fig.update_layout(
    title="Tiny example (point labels are indices)",
    xaxis_title="x",
    yaxis_title="y",
)
fig.show()

labels = ["Concordant (C)", "Discordant (D)", "Tie in x (T_x)", "Tie in y (T_y)", "Tie in both (T_xy)"]
values = [counts["C"], counts["D"], counts["T_x"], counts["T_y"], counts["T_xy"]]

fig = px.bar(
    x=labels,
    y=values,
    title="Pair types that build Kendall’s tau",
    labels={"x": "pair type", "y": "count"},
)
fig.update_layout(xaxis_tickangle=-20)
fig

### Interpreting the sign and magnitude

- **sign**: $\tau > 0$ means *larger x tends to come with larger y* (monotone increasing); $\tau < 0$ means the opposite.
- **magnitude**: in the no-ties (continuous) case, $\tau$ has a clean probability interpretation:

$$
\tau = P(\text{concordant}) - P(\text{discordant})
$$

So if $\tau = 0.30$ (and there are no ties), concordance is about **30 percentage points** more likely than discordance for a randomly chosen pair.

Important nuance:

- **Independence implies** a population $\tau$ of 0, but **$\tau = 0$ does not necessarily imply independence**. It only means the statistic detects no *monotone* tendency; a non-monotone dependence such as a U-shape can still give $\tau = 0$.
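A quick sketch of that nuance, using made-up data where $y$ is completely determined by $x$ but the relationship is U-shaped (variable names are illustrative):

```python
import numpy as np

x_sym = np.arange(-3, 4)  # symmetric around 0: -3, ..., 3
y_sym = x_sym ** 2        # y is a function of x, but not a monotone one

i, j = np.triu_indices(x_sym.size, k=1)
contrib = np.sign(x_sym[i] - x_sym[j]) * np.sign(y_sym[i] - y_sym[j])

n_conc = int(np.sum(contrib > 0))
n_disc = int(np.sum(contrib < 0))
S_sym = n_conc - n_disc

print("concordant:", n_conc, "discordant:", n_disc, "S:", S_sym)
```

Concordant and discordant pairs cancel exactly (9 each), so $S = 0$ and tau is 0 even though the two variables are as dependent as they can be.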


In [None]:
# Among comparable (non-tied) pairs, what fraction are concordant vs discordant?
comparable = counts["C"] + counts["D"]

p_conc = counts["C"] / comparable
p_disc = counts["D"] / comparable

print(f"Comparable pairs: {comparable} / {counts['n_pairs']} total")
print(f"P(concordant | comparable) = {p_conc:.3f}")
print(f"P(discordant | comparable) = {p_disc:.3f}")


## 3) Ties (and why tau-b exists)

With ordinal/discrete data, ties are common. Ties create a practical issue:

- tau-a divides by the total number of pairs $\binom{n}{2}$, even though many pairs might be “uninformative” because of ties
- as a result, even a perfectly monotone relationship with many ties can have $|\tau_a| < 1$

Tau-b fixes this by rescaling based on how many pairs are actually *comparable* in $x$ and in $y$.
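The effect is easiest to see on a perfectly ordered data set with ties. The sketch below uses made-up data and applies the tau-a and tau-b formulas from section 1 directly, with `n1` and `n2` counting pairs tied in $x$ and in $y$ (variable names are illustrative):

```python
import numpy as np

x_mono = np.array([1, 1, 2, 2])
y_mono = np.array([1, 1, 2, 2])  # same ordering as x, including the ties

i, j = np.triu_indices(x_mono.size, k=1)
sx = np.sign(x_mono[i] - x_mono[j])
sy = np.sign(y_mono[i] - y_mono[j])

S_mono = int(np.sum(sx * sy))
n_pairs_mono = i.size
n1 = int(np.sum(sx == 0))  # pairs tied in x
n2 = int(np.sum(sy == 0))  # pairs tied in y

tau_a_mono = S_mono / n_pairs_mono
tau_b_mono = S_mono / np.sqrt((n_pairs_mono - n1) * (n_pairs_mono - n2))

print(f"tau-a = {tau_a_mono:.3f}, tau-b = {tau_b_mono:.3f}")
```

Four of the six pairs are concordant and two are tied in both variables, so tau-a is 4/6 ≈ 0.667 while tau-b is 4/√(4·4) = 1.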


In [None]:
# An ordinal-ish example with ties
x_tie = np.array([1, 1, 2, 2, 3, 3])
y_tie = np.array([1, 1, 2, 3, 3, 3])

tau_a, counts_a = kendall_tau_a(x_tie, y_tie)
tau_b, counts_b = kendall_tau_b(x_tie, y_tie)

print(f"tau-a = {tau_a:.3f}")
print(f"tau-b = {tau_b:.3f}")
counts_b


In [None]:
# Visualize ties with a little jitter so points don't sit exactly on top of each other
jitter = 0.06
xj = x_tie + rng.normal(0, jitter, size=x_tie.size)
yj = y_tie + rng.normal(0, jitter, size=y_tie.size)

fig = px.scatter(
    x=xj,
    y=yj,
    title="Ordinal data with ties (visualized with small jitter)",
    labels={"x": "x (jittered)", "y": "y (jittered)"},
)
fig.add_annotation(
    x=0.02,
    y=0.98,
    xref="paper",
    yref="paper",
    showarrow=False,
    align="left",
    text=f"tau-a = {tau_a:.3f}<br>tau-b = {tau_b:.3f}",
)
fig

## 4) Visual intuition: the concordance matrix

For a small dataset, you can visualize *every pair’s contribution*.

We build a matrix:

$$
M_{ij} = \operatorname{sign}(x_i - x_j)\;\operatorname{sign}(y_i - y_j)
$$

- `+1` (red) means the pair is **concordant**
- `-1` (blue) means the pair is **discordant**
- `0` (white) means a **tie** in x or y (or the diagonal)

This is *literally* what $S$ sums over (for $i < j$).


In [None]:
n_small = 12
x_small = np.arange(n_small)
# Mostly increasing, but with noise so we get some discordant pairs
y_small = x_small + rng.normal(0, 2.0, size=n_small)

# Build the full matrix for visualization (O(n^2) but tiny here)
dx = x_small[:, None] - x_small[None, :]
dy = y_small[:, None] - y_small[None, :]
M = np.sign(dx) * np.sign(dy)
np.fill_diagonal(M, 0)

fig = px.imshow(
    M,
    zmin=-1,
    zmax=1,
    color_continuous_scale="RdBu_r",  # reversed so +1 is red and -1 is blue
    title="Concordance matrix M (red=concordant, blue=discordant)",
    labels=dict(x="j", y="i", color="sign"),
)
fig.update_layout(coloraxis_colorbar=dict(tickvals=[-1, 0, 1]))
fig.show()

# Check that summing the upper triangle matches S
_, counts_small = kendall_tau_a(x_small, y_small)
S_from_matrix = int(np.sum(np.triu(M, k=1)))
print("S (from counts):", counts_small["S"])
print("S (from matrix):", S_from_matrix)


## 5) Tau cares about *order*, not the scale

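Because tau is built from comparisons (`x_i > x_j`?), it is invariant to **strictly increasing transformations** of either variable. (A strictly decreasing transformation of one variable reverses the ordering and flips tau's sign.)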

Example: if you replace $x$ with $\exp(x)$ (strictly increasing), the ordering doesn’t change — and tau doesn’t change.

This is a big reason tau is popular for:

- ordinal scales
- heavy-tailed data
- relationships that are monotonic but not linear
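A minimal check of the order-only property, assuming continuous data without ties: a strictly increasing transform leaves tau unchanged, while a strictly decreasing one flips its sign. The helper `tau_a_pairs` and the simulated data are illustrative only.

```python
import numpy as np


def tau_a_pairs(x, y):
    """Tau-a from the sign products of all pairs i < j (illustrative helper)."""
    i, j = np.triu_indices(x.size, k=1)
    s = np.sum(np.sign(x[i] - x[j]) * np.sign(y[i] - y[j]))
    return float(s) / i.size


rng_demo = np.random.default_rng(0)
x_t = rng_demo.normal(size=30)
y_t = x_t + rng_demo.normal(size=30)

tau_base = tau_a_pairs(x_t, y_t)
tau_inc = tau_a_pairs(x_t ** 3, y_t)  # strictly increasing transform of x
tau_dec = tau_a_pairs(-x_t, y_t)      # strictly decreasing transform of x

print(f"tau(x, y)    = {tau_base:.3f}")
print(f"tau(x**3, y) = {tau_inc:.3f}")
print(f"tau(-x, y)   = {tau_dec:.3f}")
```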


In [None]:
# Nonlinear but monotonic relationship
n = 80
x_nl = rng.normal(size=n)
y_nl = x_nl**3 + rng.normal(0, 1.5, size=n)

tau_raw, _ = kendall_tau_b(x_nl, y_nl)
tau_exp, _ = kendall_tau_b(np.exp(x_nl), y_nl)

pearson = np.corrcoef(x_nl, y_nl)[0, 1]

print(f"Kendall tau-b (x, y)      = {tau_raw:.3f}")
print(f"Kendall tau-b (exp(x), y) = {tau_exp:.3f}")
print(f"Pearson corr (x, y)       = {pearson:.3f}")

fig = px.scatter(
    x=x_nl,
    y=y_nl,
    title="Monotonic but nonlinear relationship (y = x^3 + noise)",
    labels={"x": "x", "y": "y"},
)
fig.add_annotation(
    x=0.02,
    y=0.98,
    xref="paper",
    yref="paper",
    showarrow=False,
    align="left",
    text=f"Kendall tau-b = {tau_raw:.3f}<br>Pearson r = {pearson:.3f}",
)
fig

## 6) Hypothesis testing: is the association *more than chance*?

A common hypothesis test is:

- **H0**: $X$ and $Y$ are independent (no association)  
  (under H0 the expected tau is 0)
- **H1**: there is an association (two-sided), or specifically increasing/decreasing (one-sided)

### Permutation test (recommended for learning and for small samples)

Under H0, the pairing between x and y is arbitrary.

So we:

1. compute the observed $\tau$
2. repeatedly **permute y** (break any real association)
3. recompute $\tau$ for each permutation
4. see how extreme the observed $\tau$ is relative to this null distribution

This uses only the assumption of **exchangeability under H0** (a good match for independent observations).
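For very small samples the steps above can be carried out exactly by enumerating every reordering of y instead of sampling them. The sketch below does this for the five-point example from section 2; it assumes no ties, so ranking permutations by $S$ is equivalent to ranking them by tau. The helper name `s_statistic` is illustrative.

```python
import itertools

import numpy as np


def s_statistic(x, y):
    """Kendall's S: sum of sign products over all pairs i < j."""
    x = np.asarray(x)
    y = np.asarray(y)
    i, j = np.triu_indices(x.size, k=1)
    return int(np.sum(np.sign(x[i] - x[j]) * np.sign(y[i] - y[j])))


x_exact = np.array([1, 2, 3, 4, 5])
y_exact = np.array([1, 2, 4, 3, 5])

s_obs = s_statistic(x_exact, y_exact)

# Enumerate all 5! = 120 orderings of y; each is equally likely under H0.
s_null = [s_statistic(x_exact, np.array(p)) for p in itertools.permutations(y_exact)]

n_extreme = sum(abs(s) >= abs(s_obs) for s in s_null)
p_exact = n_extreme / len(s_null)

print(f"S observed = {s_obs}")
print(f"exact two-sided p = {n_extreme}/{len(s_null)} = {p_exact:.4f}")
```

Here 10 of the 120 orderings are at least as extreme as the observed $S = 8$, giving an exact two-sided p-value of about 0.083. Enumeration grows as $n!$, which is why random permutations are used for anything but tiny samples.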


In [None]:
def kendall_permutation_test(
    x,
    y,
    *,
    n_resamples=2000,
    alternative="two-sided",
    rng=None,
):
    """Permutation test for association using Kendall's tau-b.

    Returns (tau_obs, p_value, tau_perm).

    alternative:
    - "two-sided": |tau_perm| >= |tau_obs|
    - "greater": tau_perm >= tau_obs
    - "less": tau_perm <= tau_obs
    """

    if rng is None:
        rng = np.random.default_rng()

    x, y = _clean_xy(x, y)

    tau_obs, _ = kendall_tau_b(x, y)
    if not np.isfinite(tau_obs):
        raise ValueError("Observed tau is not finite; check for constant inputs or all-ties.")

    tau_perm = np.empty(n_resamples, dtype=float)
    for b in range(n_resamples):
        y_perm = rng.permutation(y)
        tau_perm[b], _ = kendall_tau_b(x, y_perm)

    alternative = alternative.lower()
    if alternative == "two-sided":
        extreme = np.abs(tau_perm) >= abs(tau_obs)
    elif alternative == "greater":
        extreme = tau_perm >= tau_obs
    elif alternative == "less":
        extreme = tau_perm <= tau_obs
    else:
        raise ValueError("alternative must be one of: 'two-sided', 'greater', 'less'.")

    # +1 smoothing avoids p=0
    p_value = (np.sum(extreme) + 1) / (n_resamples + 1)
    return float(tau_obs), float(p_value), tau_perm


# Example: moderately monotone association
n = 60
x_ex = rng.normal(size=n)
y_ex = np.tanh(1.2 * x_ex) + rng.normal(0, 0.35, size=n)

tau_obs, p_value, tau_null = kendall_permutation_test(x_ex, y_ex, n_resamples=3000, rng=rng)

print(f"Observed tau-b = {tau_obs:.3f}")
print(f"Permutation p-value (two-sided) = {p_value:.4f}")

fig = px.histogram(
    tau_null,
    nbins=50,
    title="Permutation null distribution of Kendall tau-b (H0: independence)",
    labels={"value": "tau-b (permuted)"},
)
fig.add_vline(
    x=tau_obs,
    line_width=3,
    line_color="crimson",
    annotation_text=f"observed tau = {tau_obs:.3f}",
    annotation_position="top",
)
fig.add_vline(
    x=-tau_obs,
    line_width=3,
    line_color="crimson",
    line_dash="dot",
    annotation_text="-observed",
    annotation_position="top",
)
fig

### Interpreting the test

- A **small p-value** means a tau at least as extreme as the observed one would be unlikely if H0 were true.
- Always report **tau itself** as the effect size.

Two practical reminders:

1. With large samples, even tiny tau values can be “statistically significant”.
2. With many ties (discrete data), prefer **tau-b** and/or a **permutation test**.
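To illustrate the first reminder, the sketch below holds tau fixed at a small value and varies only the sample size, using the no-ties normal approximation $\operatorname{Var}(S) = n(n-1)(2n+5)/18$. The function name `z_for_tau` and the chosen numbers are illustrative.

```python
import math


def z_for_tau(tau, n):
    """z-score implied by a given tau-a at sample size n (no-ties approximation)."""
    n_pairs = n * (n - 1) / 2
    s = tau * n_pairs                       # tau-a = S / n_pairs
    var_s = n * (n - 1) * (2 * n + 5) / 18  # variance of S under H0
    return s / math.sqrt(var_s)


z_small_n = z_for_tau(0.05, 100)
z_large_n = z_for_tau(0.05, 10_000)

print(f"tau = 0.05, n = 100:    z = {z_small_n:.2f}")
print(f"tau = 0.05, n = 10000:  z = {z_large_n:.2f}")
```

The same tau of 0.05 gives z ≈ 0.74 at n = 100 but z ≈ 7.5 at n = 10,000, so significance alone says little about how strong the association is.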


## 7) (Optional) Large-sample normal approximation (no ties)

For the *continuous* case (no ties), Kendall’s $S$ has an asymptotic normal distribution under H0.

For sample size $n$ (no ties):

$$
\operatorname{Var}(S) = \frac{n(n-1)(2n+5)}{18}
$$

So a z-score is:

$$
Z = \frac{S}{\sqrt{\operatorname{Var}(S)}}
$$

This is fast, but:

- it is an approximation whose accuracy improves with $n$; for small samples, prefer a permutation p-value
- tie corrections make the variance formula more complicated
- permutation tests are usually easier to trust when learning
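As a worked example with made-up numbers, suppose $n = 10$ (no ties) and $S = 25$, so that $\tau_a = 25/45 \approx 0.56$:

$$
\operatorname{Var}(S) = \frac{10 \cdot 9 \cdot 25}{18} = 125,
\qquad
Z = \frac{25}{\sqrt{125}} \approx 2.24
$$

The corresponding two-sided p-value is roughly 0.025.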


In [None]:
import math


def _normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def kendall_tau_a_asymptotic_test(x, y, *, alternative="two-sided"):
    """Asymptotic z-test using S variance formula (no ties)."""

    _, counts = kendall_tau_a(x, y)
    n = counts["n"]
    S = counts["S"]

    if n < 2:
        return np.nan, np.nan

    var_s = n * (n - 1) * (2 * n + 5) / 18
    z = S / math.sqrt(var_s)

    alternative = alternative.lower()
    if alternative == "two-sided":
        p = 2 * (1 - _normal_cdf(abs(z)))
    elif alternative == "greater":
        p = 1 - _normal_cdf(z)
    elif alternative == "less":
        p = _normal_cdf(z)
    else:
        raise ValueError("alternative must be one of: 'two-sided', 'greater', 'less'.")

    return float(z), float(p)


# Compare (roughly) to permutation on the same example when there are effectively no ties
z, p_asym = kendall_tau_a_asymptotic_test(x_ex, y_ex)
print(f"Asymptotic z (tau-a) = {z:.3f}")
print(f"Asymptotic p-value   = {p_asym:.4f}")


## 8) Bootstrap confidence interval (effect size uncertainty)

A p-value answers “how surprising would this tau be if there were no association?”, but you often also want:

- an uncertainty interval for tau itself

A simple approach is a **bootstrap**:

1. resample the dataset with replacement
2. recompute tau for each bootstrap sample
3. take percentiles for a CI

(Like the permutation test, this is straightforward to implement and visualize.)
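One detail worth seeing before bootstrapping: resampling with replacement duplicates rows, and duplicated rows are tied in both $x$ and $y$ even when the original data has no ties. The sketch below (made-up data, illustrative names) draws a single bootstrap sample from perfectly increasing data and applies the tau-a and tau-b formulas from section 1.

```python
import numpy as np

rng_boot = np.random.default_rng(1)
n_obs = 20
x_line = np.arange(n_obs)
y_line = 2 * x_line  # perfectly increasing, no ties in the original data

idx = rng_boot.integers(0, n_obs, size=n_obs)  # resample rows with replacement
xb, yb = x_line[idx], y_line[idx]

i, j = np.triu_indices(n_obs, k=1)
sx = np.sign(xb[i] - xb[j])
sy = np.sign(yb[i] - yb[j])

S_b = int(np.sum(sx * sy))
n_pairs_b = i.size
tied_x = int(np.sum(sx == 0))  # pairs tied in x (from duplicated rows)
tied_y = int(np.sum(sy == 0))  # pairs tied in y (from duplicated rows)

tau_a_boot = S_b / n_pairs_b
tau_b_boot = S_b / np.sqrt((n_pairs_b - tied_x) * (n_pairs_b - tied_y))

print(f"duplicated-row ties: {tied_x} pairs")
print(f"tau-a = {tau_a_boot:.3f}, tau-b = {tau_b_boot:.3f}")
```

Tau-b stays at 1 because it rescales by the comparable pairs, whereas tau-a is pulled below 1 by the artificial ties. This is one reason to bootstrap tau-b.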


In [None]:
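def bootstrap_tau_b(x, y, *, n_boot=2000, ci=0.95, rng=None):
    """Percentile-bootstrap confidence interval for tau-b.

    Returns (tau_samples, lo, hi). Rows are resampled jointly, so each
    (x, y) observation stays together.
    """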
    if rng is None:
        rng = np.random.default_rng()

    x, y = _clean_xy(x, y)
    n = x.size

    tau_samples = np.empty(n_boot, dtype=float)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)
        tau_samples[b], _ = kendall_tau_b(x[idx], y[idx])

    alpha = 1 - ci
    lo = np.quantile(tau_samples, alpha / 2)
    hi = np.quantile(tau_samples, 1 - alpha / 2)
    return tau_samples, float(lo), float(hi)


tau_boot, lo, hi = bootstrap_tau_b(x_ex, y_ex, n_boot=3000, rng=rng)
print(f"Bootstrap 95% CI for tau-b: [{lo:.3f}, {hi:.3f}]")

fig = px.histogram(
    tau_boot,
    nbins=60,
    title="Bootstrap distribution of Kendall tau-b",
    labels={"value": "tau-b (bootstrap)"},
)
fig.add_vline(x=lo, line_color="black", line_dash="dot", annotation_text="CI low")
fig.add_vline(x=hi, line_color="black", line_dash="dot", annotation_text="CI high")
fig.add_vline(x=tau_obs, line_color="crimson", annotation_text="observed")
fig

## 9) Diagnostics and pitfalls

- **Independence of observations** matters. If you have time series or repeated measures, tau’s usual p-values can be misleading.
- **Ties** are common in ordinal data → prefer **tau-b**.
- **Effect size vs significance**: don’t stop at “p < 0.05”. Report tau (and ideally a CI).
- **Complexity**: this reference implementation is $O(n^2)$. For large datasets, use an optimized library implementation.
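The independence pitfall can be illustrated with a small simulation. The sketch below (illustrative names, arbitrary threshold of 0.3) compares tau for pairs of independent noise series with tau for pairs of independent random walks built from the same noise. The two walks are unrelated to each other, but each has strong serial dependence.

```python
import numpy as np


def tau_a_quick(x, y):
    """Tau-a from the sign products of all pairs i < j (illustrative helper)."""
    i, j = np.triu_indices(x.size, k=1)
    return float(np.sum(np.sign(x[i] - x[j]) * np.sign(y[i] - y[j]))) / i.size


rng_ts = np.random.default_rng(7)
n_t, n_rep = 50, 200

tau_iid = np.empty(n_rep)
tau_walk = np.empty(n_rep)
for r in range(n_rep):
    e1 = rng_ts.normal(size=n_t)
    e2 = rng_ts.normal(size=n_t)
    tau_iid[r] = tau_a_quick(e1, e2)                         # independent observations
    tau_walk[r] = tau_a_quick(np.cumsum(e1), np.cumsum(e2))  # independent random walks

frac_iid = float(np.mean(np.abs(tau_iid) > 0.3))
frac_walk = float(np.mean(np.abs(tau_walk) > 0.3))

print(f"share with |tau| > 0.3, iid noise:    {frac_iid:.2f}")
print(f"share with |tau| > 0.3, random walks: {frac_walk:.2f}")
```

Large values of |tau| should turn up far more often for the random walks, even though neither series influences the other. A test that assumes independent observations would flag many of these as significant.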


## Exercises

1. Create a dataset where Pearson correlation is near 0 but Kendall tau is clearly non-zero (hint: monotone but nonlinear).
2. Call `kendall_permutation_test` with `alternative="greater"` and `alternative="less"` on the same data and verify that the one-sided p-values behave as expected.
3. Stress-test the $O(n^2)$ implementation with increasing `n` and plot runtime.

## References

- Kendall, M. G. (1938). *A New Measure of Rank Correlation*. Biometrika, 30(1/2), 81–93.
- SciPy: `scipy.stats.kendalltau` (for a production-ready implementation and additional details).
