# Fisher’s Exact Test (2×2) — Intuition + NumPy Implementation

Fisher’s exact test answers a simple question:

> Given a **2×2 contingency table**, is there evidence that the two categorical variables are **associated** (not independent)?

It is especially useful when sample sizes are small (or expected counts are low), where large-sample approximations (like the chi-square test) can be unreliable.

## What you’ll learn
- when Fisher’s exact test is the right tool
- what “**exact**” means (conditioning on margins → hypergeometric distribution)
- how the **p-value** is constructed for one-sided vs two-sided tests
- a low-level **NumPy-only** implementation you can read end-to-end
- how to interpret the result (and what it does *not* tell you)

## Prerequisites
- basic probability (combinations)
- null/alternative hypotheses + p-values


In [None]:
import os
import sys

import numpy as np
import plotly
import plotly.express as px
import plotly.graph_objects as go
import plotly.io as pio

pio.renderers.default = os.environ.get("PLOTLY_RENDERER", "notebook")
np.random.seed(42)  # reproducible simulation in Section 9

print("python:", sys.version.split()[0])
print("numpy:", np.__version__)
print("plotly:", plotly.__version__)


## 1) When to use Fisher’s exact test

Use Fisher’s exact test when:
- you have **two categorical variables**, each with **two levels** (a 2×2 table)
- you want to test whether they are **independent**
- counts are small (or expected counts are low), and you want an **exact** p-value

Common examples:
- **A/B tests**: variant A vs B, conversion yes/no
- **clinical studies**: treatment vs control, improved yes/no
- **survey analysis**: group membership vs response category

Fisher’s exact test is valid for any sample size, but it’s most often chosen when the chi-square approximation is questionable.
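A common rule of thumb (a heuristic, not a hard requirement) treats the chi-square approximation as questionable when any expected count under independence is below 5. A minimal sketch of that check, where `min_expected_count` and `chi_square_is_questionable` are hypothetical helper names and the threshold of 5 is an adjustable assumption:

```python
import numpy as np


def min_expected_count(table) -> float:
    """Smallest expected cell count under independence for a 2x2 table."""
    table = np.asarray(table, dtype=float)
    n = table.sum()
    # Expected count for cell (i, j) is (row_i total * col_j total) / n
    expected = np.outer(table.sum(axis=1), table.sum(axis=0)) / n
    return float(expected.min())


def chi_square_is_questionable(table, threshold: float = 5.0) -> bool:
    """Heuristic only: flag tables whose smallest expected count is below `threshold`."""
    return min_expected_count(table) < threshold


example = [[8, 2], [1, 5]]
print(min_expected_count(example))          # 2.625
print(chi_square_is_questionable(example))  # True
```

For the example table used in this notebook, the smallest expected count is 2.625, so the rule of thumb would point toward the exact test.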


## 2) The 2×2 table + hypotheses

We’ll write the 2×2 table like this:

|            | Outcome = 1 | Outcome = 0 |
|---|---:|---:|
| Group = 1  | a | b |
| Group = 0  | c | d |

- **Null hypothesis (H₀)**: the variables are **independent** (equivalently, the **odds ratio = 1**)
- **Alternative (H₁)** depends on the question:
  - `greater`: Group=1 has *higher* odds of Outcome=1 (odds ratio > 1)
  - `less`: Group=1 has *lower* odds of Outcome=1 (odds ratio < 1)
  - `two-sided`: any association (odds ratio ≠ 1)


In [None]:
# Example: treatment (1) vs control (0), success (1) vs failure (0)
treatment = np.array([1] * 10 + [0] * 6)
success = np.array([1] * 8 + [0] * 2 + [1] * 1 + [0] * 5)

a = int(np.sum((treatment == 1) & (success == 1)))
b = int(np.sum((treatment == 1) & (success == 0)))
c = int(np.sum((treatment == 0) & (success == 1)))
d = int(np.sum((treatment == 0) & (success == 0)))

table = np.array([[a, b], [c, d]], dtype=int)
table


In [None]:
row_labels = ["Treatment", "Control"]
col_labels = ["Success", "Failure"]

fig = px.imshow(
    table,
    text_auto=True,
    aspect="auto",
    x=col_labels,
    y=row_labels,
    color_continuous_scale="Blues",
    title="Observed 2×2 contingency table",
)
fig.update_layout(coloraxis_showscale=False)
fig.show()

n = table.sum()
expected = np.outer(table.sum(axis=1), table.sum(axis=0)) / n

fig = px.imshow(
    expected,
    text_auto=".2f",
    aspect="auto",
    x=col_labels,
    y=row_labels,
    color_continuous_scale="Greens",
    title="Expected counts under independence (H₀)",
)
fig.update_layout(coloraxis_showscale=False)
fig.show()


## 3) Effect size: the odds ratio

The **odds ratio (OR)** is a common effect size for 2×2 tables:

$$\text{OR} = \frac{a\,d}{b\,c}$$

Interpretation:
- OR = 1: no association (what H₀ asserts)
- OR > 1: Group=1 is *more likely* to have Outcome=1 (positive association)
- OR < 1: Group=1 is *less likely* to have Outcome=1 (negative association)

Fisher’s exact test gives you a **p-value** for the association. You usually report **both** the p-value and an effect size (like OR).
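Since the odds of Outcome=1 are $a/b$ in Group=1 and $c/d$ in Group=0, the odds ratio is literally the ratio of these two odds: $(a/b)/(c/d) = ad/(bc)$. A small check with exact fractions on the example table [[8, 2], [1, 5]]; `odds_ratio_from_row_odds` is a hypothetical helper that assumes $b$, $c$, $d$ are all non-zero:

```python
from fractions import Fraction


def odds_ratio_from_row_odds(a: int, b: int, c: int, d: int) -> Fraction:
    """Odds ratio as (odds of Outcome=1 in Group=1) / (odds of Outcome=1 in Group=0).

    Assumes b, c, d > 0 so that both odds are finite and non-zero.
    """
    odds_group1 = Fraction(a, b)  # a successes per b failures
    odds_group0 = Fraction(c, d)  # c successes per d failures
    return odds_group1 / odds_group0


a, b, c, d = 8, 2, 1, 5
or_ratio = odds_ratio_from_row_odds(a, b, c, d)
print(or_ratio)                             # 20
print(or_ratio == Fraction(a * d, b * c))   # True
```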


In [None]:
a, b, c, d = table.ravel()
num = a * d
den = b * c

odds_ratio = (num / den) if den != 0 else (np.inf if num > 0 else np.nan)
odds_ratio


## 4) What makes it “exact”: conditioning on the margins

The key idea behind Fisher’s exact test is **conditioning on the margins** (row sums and column sums).

For a 2×2 table, if the margins are fixed:
- row sums: $r_1 = a+b$, $r_2 = c+d$
- column sums: $c_1 = a+c$, $c_2 = b+d$ (not to be confused with the cell count $c$)
- total: $n = r_1 + r_2$

…then the whole table is determined by just **one** number: the top-left cell $a$.
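Concretely, $b = r_1 - a$, $c = c_1 - a$ and $d = r_2 - c$. A minimal sketch (the name `table_from_a` is hypothetical) that rebuilds the table from $a$ and the margins, and rejects values of $a$ that would make some cell negative:

```python
import numpy as np


def table_from_a(a: int, *, r1: int, r2: int, c1: int, c2: int) -> np.ndarray:
    """Rebuild the full 2x2 table from the top-left cell and fixed margins."""
    if r1 + r2 != c1 + c2:
        raise ValueError("row and column totals must agree")
    b = r1 - a  # rest of row 1
    c = c1 - a  # rest of column 1
    d = r2 - c  # rest of row 2 (equivalently c2 - b, given the check above)
    table = np.array([[a, b], [c, d]])
    if np.any(table < 0):
        raise ValueError("a is not feasible for these margins")
    return table


# Margins of the example table [[8, 2], [1, 5]]
print(table_from_a(8, r1=10, r2=6, c1=9, c2=7))
```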

Under $H_0$ (independence) and **given the margins**, the distribution of $a$ is **hypergeometric**:

$$\mathbb{P}(A=a \mid r_1, c_1, n)
= \frac{\binom{c_1}{a}\,\binom{c_2}{r_1-a}}{\binom{n}{r_1}}$$

So we can compute probabilities of **all possible** 2×2 tables with these same margins — exactly.
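As a sanity check on the formula, here is the exact probability of the observed example table ($a = 8$, $r_1 = 10$, $c_1 = 9$, $n = 16$), computed with Python's `math.comb` and `fractions.Fraction` instead of floating point. Summed over the feasible range $\max(0, r_1 - c_2) \le a \le \min(r_1, c_1)$, the probabilities equal 1 exactly (Vandermonde's identity). `hypergeom_prob` is a hypothetical helper name:

```python
from fractions import Fraction
from math import comb

# Margins of the example table [[8, 2], [1, 5]]
r1, c1, n = 10, 9, 16
c2 = n - c1


def hypergeom_prob(a: int) -> Fraction:
    """Exact P(A = a | margins), following the hypergeometric formula."""
    return Fraction(comb(c1, a) * comb(c2, r1 - a), comb(n, r1))


a_min, a_max = max(0, r1 - c2), min(r1, c1)
support = range(a_min, a_max + 1)

print(hypergeom_prob(8), float(hypergeom_prob(8)))  # 189/8008 = 27/1144, about 0.0236
print(sum(hypergeom_prob(a) for a in support))      # 1
```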


In [None]:
def log_factorials_upto(n: int) -> np.ndarray:
    """Return log(k!) for k=0..n as a NumPy array."""
    n = int(n)
    log_fact = np.zeros(n + 1, dtype=float)
    if n >= 1:
        log_fact[1:] = np.cumsum(np.log(np.arange(1, n + 1)))
    return log_fact


def hypergeom_pmf_for_a_values(
    a_values: np.ndarray,
    *,
    r1: int,
    c1: int,
    n: int,
    log_fact=None,
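) -> np.ndarray:
    """Hypergeometric PMF for A (top-left cell) given fixed margins.

    `a_values` should be the full feasible support: the result is
    renormalized over exactly the values passed in.
    """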
    a_values = np.asarray(a_values, dtype=int)
    r1 = int(r1)
    c1 = int(c1)
    n = int(n)
    c2 = n - c1

    if log_fact is None:
        log_fact = log_factorials_upto(n)

    def log_choose(n_: int, k_: np.ndarray) -> np.ndarray:
        return log_fact[n_] - log_fact[k_] - log_fact[n_ - k_]

    log_p = (
        log_choose(c1, a_values)
        + log_choose(c2, r1 - a_values)
        - log_choose(n, np.array(r1, dtype=int))
    )

    # Stabilize with log-sum-exp via a shift
    log_p = log_p - np.max(log_p)
    p = np.exp(log_p)
    return p / np.sum(p)


a_obs = int(table[0, 0])
r1 = int(table[0, :].sum())
c1 = int(table[:, 0].sum())
n = int(table.sum())
c2 = n - c1

a_min = max(0, r1 - c2)
a_max = min(r1, c1)
a_values = np.arange(a_min, a_max + 1)

pmf = hypergeom_pmf_for_a_values(a_values, r1=r1, c1=c1, n=n)

np.column_stack([a_values, pmf])


In [None]:
def odds_ratio_for_a_values(a_values: np.ndarray, *, r1: int, c1: int, n: int) -> np.ndarray:
    """Compute odds ratios for all feasible tables with varying a and fixed margins."""
    a_values = np.asarray(a_values, dtype=int)
    r1 = int(r1)
    c1 = int(c1)
    n = int(n)
    r2 = n - r1

    b = r1 - a_values
    c = c1 - a_values
    d = r2 - c

    num = a_values.astype(float) * d
    den = b.astype(float) * c

    or_ = np.full_like(num, np.nan, dtype=float)
    mask = den != 0
    or_[mask] = num[mask] / den[mask]
    or_[~mask & (num > 0)] = np.inf
    return or_


or_values = odds_ratio_for_a_values(a_values, r1=r1, c1=c1, n=n)

colors = np.where(a_values == a_obs, "#111111", "#636EFA")
fig = go.Figure(
    go.Bar(
        x=a_values,
        y=pmf,
        marker_color=colors,
        customdata=np.column_stack([or_values]),
        hovertemplate="a=%{x}<br>P=%{y:.6f}<br>OR=%{customdata[0]:.3g}<extra></extra>",
    )
)
fig.add_vline(x=a_obs, line_color="#111111", line_dash="dash")
fig.update_layout(
    title="All feasible tables (fixed margins) → hypergeometric PMF for a",
    xaxis_title="a = count in the top-left cell",
    yaxis_title="Probability under H₀ (conditional on margins)",
)
fig.show()


## 5) From the PMF to a p-value

Once we have the probability of every feasible table (given the margins), we can define “extreme” outcomes.

### One-sided p-values
- `greater`: sum probabilities of tables with **a ≥ a_obs** (more evidence of positive association)
- `less`: sum probabilities of tables with **a ≤ a_obs** (more evidence of negative association)
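The two tails overlap in exactly one table, the observed one, so $p_{\text{greater}} + p_{\text{less}} = 1 + \mathbb{P}(A = a_{\text{obs}})$ rather than 1. An exact check on the example margins (a sketch; the margins are hard-coded from the table [[8, 2], [1, 5]]):

```python
from fractions import Fraction
from math import comb

r1, c1, n = 10, 9, 16  # margins of the example table [[8, 2], [1, 5]]
c2 = n - c1
a_obs = 8

support = range(max(0, r1 - c2), min(r1, c1) + 1)
pmf = {a: Fraction(comb(c1, a) * comb(c2, r1 - a), comb(n, r1)) for a in support}

p_greater = sum(p for a, p in pmf.items() if a >= a_obs)  # right tail, includes a_obs
p_less = sum(p for a, p in pmf.items() if a <= a_obs)     # left tail, includes a_obs

print(float(p_greater), float(p_less))
# The observed table is counted in both tails:
print(p_greater + p_less == 1 + pmf[a_obs])  # True
```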

### Two-sided p-value (common definition)
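For a two-sided test, “two-sided” needs a precise definition: the null distribution is **discrete** and generally **asymmetric**, so there is no single mirror-image cutoff.

A widely used definition (the one used by SciPy’s `fisher_exact` and R’s `fisher.test`) is:

> Sum the probabilities of all tables whose probability is **≤** the observed table’s probability.

This collects extreme tables from both tails, but the two tails need not contribute equally, so the result generally differs from doubling a one-sided p-value.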
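To see how much the choice of definition matters, this sketch compares the probability-based rule with another convention that is sometimes used: doubling the smaller one-sided p-value, capped at 1. It uses exact fractions, so the “≤” comparison needs no tolerance; floating-point implementations typically compare against the observed probability multiplied by a small relative factor instead. The margins are hard-coded from the example table:

```python
from fractions import Fraction
from math import comb

r1, c1, n = 10, 9, 16  # margins of the example table [[8, 2], [1, 5]]
c2 = n - c1
a_obs = 8

support = range(max(0, r1 - c2), min(r1, c1) + 1)
pmf = {a: Fraction(comb(c1, a) * comb(c2, r1 - a), comb(n, r1)) for a in support}
p_obs = pmf[a_obs]

# Probability-based rule: add up every table no more probable than the observed one
p_two_sided_prob = sum(p for p in pmf.values() if p <= p_obs)

# Alternative convention ("doubling"): twice the smaller one-sided p-value, capped at 1
p_greater = sum(p for a, p in pmf.items() if a >= a_obs)
p_less = sum(p for a, p in pmf.items() if a <= a_obs)
p_two_sided_doubled = min(Fraction(1), 2 * min(p_greater, p_less))

print([a for a, p in pmf.items() if p <= p_obs])  # tables counted by the first rule
print(float(p_two_sided_prob), float(p_two_sided_doubled))
```

For the example table, the probability-based rule counts $a = 3, 8, 9$ and gives about 0.035, while doubling gives about 0.049.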


In [None]:
p_obs = pmf[a_obs - a_min]
p_greater = float(pmf[a_values >= a_obs].sum())
p_less = float(pmf[a_values <= a_obs].sum())
# Relative tolerance so tables tied with the observed one are not lost to rounding
p_two_sided = float(pmf[pmf <= p_obs * (1 + 1e-7)].sum())

p_greater, p_less, p_two_sided


## 6) Fisher’s exact test from scratch (NumPy-only)

Below is a complete implementation of Fisher’s exact test for a 2×2 table.

- It enumerates all feasible tables (via the feasible values of $a$).
- It computes the hypergeometric probabilities in a numerically stable way.
- It supports `greater`, `less`, and `two-sided`.


In [None]:
def fisher_exact_numpy(table: np.ndarray, alternative: str = "two-sided", return_details: bool = False):
    """Fisher's exact test for a 2x2 contingency table (NumPy-only).

    Parameters
    ----------
    table : array-like, shape (2, 2)
        Non-negative counts.
    alternative : {'two-sided', 'greater', 'less'}
        Defines the alternative hypothesis.
    return_details : bool
        If True, also return the enumerated support and PMF.

    Returns
    -------
    odds_ratio : float
    p_value : float
    details : dict (optional)
    """
    table = np.asarray(table, dtype=int)
    if table.shape != (2, 2):
        raise ValueError("table must be shape (2, 2)")
    if np.any(table < 0):
        raise ValueError("counts must be non-negative")

    a, b, c, d = table.ravel()
    r1 = int(a + b)
    r2 = int(c + d)
    c1 = int(a + c)
    c2 = int(b + d)
    n = int(r1 + r2)

    # Sample odds ratio (effect size)
    num = a * d
    den = b * c
    odds_ratio = (num / den) if den != 0 else (np.inf if num > 0 else np.nan)

    # Enumerate feasible values of a given fixed margins
    a_min = max(0, r1 - c2)
    a_max = min(r1, c1)
    a_values = np.arange(a_min, a_max + 1)

    log_fact = log_factorials_upto(n)
    pmf = hypergeom_pmf_for_a_values(a_values, r1=r1, c1=c1, n=n, log_fact=log_fact)
    p_obs = pmf[int(a - a_min)]

    alt = alternative.lower().replace("_", "-").strip()
    if alt in {"greater", "right", "right-sided", "right sided"}:
        p_value = float(pmf[a_values >= a].sum())
    elif alt in {"less", "left", "left-sided", "left sided"}:
        p_value = float(pmf[a_values <= a].sum())
    elif alt in {"two-sided", "two sided"}:
        # Relative tolerance: keeps tables whose probability ties p_obs up to rounding
        p_value = float(pmf[pmf <= p_obs * (1 + 1e-7)].sum())
    else:
        raise ValueError("alternative must be 'two-sided', 'greater', or 'less'")

    p_value = float(min(p_value, 1.0))

    if not return_details:
        return odds_ratio, p_value

    details = {
        "a_values": a_values,
        "pmf": pmf,
        "a_obs": int(a),
        "p_obs": float(p_obs),
        "margins": {"r1": r1, "r2": r2, "c1": c1, "c2": c2, "n": n},
    }
    return odds_ratio, p_value, details


for alt in ["greater", "less", "two-sided"]:
    or_, p_ = fisher_exact_numpy(table, alternative=alt)
    print(f"{alt:>9} | odds ratio = {or_:>6.3g} | p-value = {p_:.6f}")


In [None]:
# Optional: verify against SciPy (if installed)
try:
    from scipy.stats import fisher_exact

    for alt in ["greater", "less", "two-sided"]:
        or_scipy, p_scipy = fisher_exact(table, alternative=alt)
        or_np, p_np = fisher_exact_numpy(table, alternative=alt)
        print(f"{alt:>9} | scipy p={p_scipy:.6f} | numpy p={p_np:.6f} | scipy OR={or_scipy:.3g}")
except ImportError as e:
    print("SciPy check skipped:", e)


## 7) Visualizing “extreme” tables (greater / less / two-sided)

The plots below show which feasible tables are counted in the p-value.

- **Gray** bars: feasible tables that are *not* counted
- **Red** bars: feasible tables that *are* counted for that alternative
- The vertical dashed line marks the observed value $a_{obs}$


In [None]:
def plot_pmf_with_rejection_region(details: dict, *, alternative: str) -> go.Figure:
    a_values = details["a_values"]
    pmf = details["pmf"]
    a_obs = details["a_obs"]
    p_obs = details["p_obs"]
    r1 = details["margins"]["r1"]
    c1 = details["margins"]["c1"]
    n = details["margins"]["n"]

    or_values = odds_ratio_for_a_values(a_values, r1=r1, c1=c1, n=n)

    alt = alternative.lower().replace("_", "-").strip()
    if alt in {"greater", "right", "right-sided", "right sided"}:
        mask = a_values >= a_obs
    elif alt in {"less", "left", "left-sided", "left sided"}:
        mask = a_values <= a_obs
    elif alt in {"two-sided", "two sided"}:
        # Relative tolerance so tables tied with the observed one are not lost to rounding
        mask = pmf <= p_obs * (1 + 1e-7)
    else:
        raise ValueError("alternative must be 'two-sided', 'greater', or 'less'")

    p_value = float(pmf[mask].sum())
    title = f"Fisher exact ({alternative}): p-value = {p_value:.6f}"

    colors = np.where(mask, "#EF553B", "#B0B0B0")

    fig = go.Figure(
        go.Bar(
            x=a_values,
            y=pmf,
            marker_color=colors,
            customdata=np.column_stack([or_values]),
            hovertemplate="a=%{x}<br>P=%{y:.6f}<br>OR=%{customdata[0]:.3g}<extra></extra>",
        )
    )
    fig.add_vline(x=a_obs, line_color="#111111", line_dash="dash")
    fig.update_layout(
        title=title,
        xaxis_title="a (top-left cell)",
        yaxis_title="Probability under H₀ (conditional on margins)",
    )
    return fig


_, _, details = fisher_exact_numpy(table, return_details=True)

for alt in ["greater", "less", "two-sided"]:
    plot_pmf_with_rejection_region(details, alternative=alt).show()


## 8) How to interpret Fisher’s exact test

### What the p-value means here
With Fisher’s exact test, the p-value is:

> The probability (under **H₀: independence**) of observing a table **at least as extreme** as the one you saw,
> **given that the margins are fixed**.

So:
- a **small** p-value suggests the observed association would be rare under independence → evidence against H₀
- a **large** p-value means the data are not surprising under independence → *not enough evidence* to reject H₀

### What it does *not* mean
- It does **not** say “the probability H₀ is true”.
- It does **not** tell you the *size* of the association (use OR / risk ratio / risk difference for that).

A good report typically includes:
- the 2×2 table
- the odds ratio (effect size)
- the p-value (and the chosen alternative)
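A minimal sketch of the effect-size part of such a report, computing the odds ratio alongside the risk ratio and risk difference mentioned above. `effect_sizes` is a hypothetical helper; it assumes the denominators are non-zero, and the risk-based measures assume the row proportions estimate outcome probabilities (they do not in a case–control design, where the odds ratio is the usable measure):

```python
from fractions import Fraction


def effect_sizes(table) -> dict:
    """Odds ratio, risk ratio and risk difference for Outcome=1, Group=1 vs Group=0.

    Assumes b * c != 0 (odds ratio) and c != 0 (risk ratio).
    """
    (a, b), (c, d) = table
    risk1 = Fraction(a, a + b)  # P(Outcome=1 | Group=1)
    risk0 = Fraction(c, c + d)  # P(Outcome=1 | Group=0)
    return {
        "odds_ratio": Fraction(a * d, b * c),
        "risk_ratio": risk1 / risk0,
        "risk_difference": risk1 - risk0,
    }


for name, value in effect_sizes([[8, 2], [1, 5]]).items():
    print(f"{name:>15}: {float(value):.4f}")
```

For the example table this gives an odds ratio of 20, a risk ratio of 4.8 and a risk difference of about 0.63.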


## 9) A helpful sanity check: p-values under the null are discrete

Because Fisher’s exact test works with a **discrete** distribution (the hypergeometric), the set of possible p-values is discrete.

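If we repeatedly sample tables from the null (with the same fixed margins) and compute the two-sided p-values, we see spikes rather than a uniform distribution. The test remains valid, in the sense that under H₀ the probability of getting p ≤ α is at most α, but because of the discreteness it is typically **conservative**: the actual rejection rate can be well below α.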
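The discreteness can also be quantified exactly. For the example margins there are only seven feasible tables, hence at most seven distinct two-sided p-values, and the exact probability under H₀ of getting p ≤ 0.05 can be summed directly. The simulation that follows estimates the same rejection rate by Monte Carlo. A sketch with the margins hard-coded from the example table:

```python
from fractions import Fraction
from math import comb

r1, c1, n = 10, 9, 16  # margins of the example table [[8, 2], [1, 5]]
c2 = n - c1
alpha = Fraction(5, 100)

support = range(max(0, r1 - c2), min(r1, c1) + 1)
pmf = {a: Fraction(comb(c1, a) * comb(c2, r1 - a), comb(n, r1)) for a in support}

# Two-sided p-value each feasible table would receive (probability-based rule)
p_value = {a: sum(q for q in pmf.values() if q <= p) for a, p in pmf.items()}
print(sorted(float(p) for p in set(p_value.values())))  # the only attainable p-values

# Exact type I error rate: probability, under H0, of landing on a table with p <= alpha
rejecting = [a for a in support if p_value[a] <= alpha]
size = sum(pmf[a] for a in rejecting)
print(rejecting, float(size))
```

Here only $a = 3, 8, 9$ are rejected, so the exact rejection rate is 280/8008 ≈ 0.035, below the nominal 0.05.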


In [None]:
r1 = details["margins"]["r1"]
c1 = details["margins"]["c1"]
n = details["margins"]["n"]
c2 = n - c1

a_values = details["a_values"]
pmf = details["pmf"]

# Two-sided p-value each feasible a would receive (relative tolerance for ties)
two_sided_p_for_each_a = np.array([float(pmf[pmf <= p_i * (1 + 1e-7)].sum()) for p_i in pmf])

n_sims = 20_000
a_sim = np.random.hypergeometric(ngood=c1, nbad=c2, nsample=r1, size=n_sims)
p_sim = two_sided_p_for_each_a[a_sim - a_values.min()]

alpha = 0.05
print("Pr(reject at alpha=0.05) under H0 (empirical):", float(np.mean(p_sim <= alpha)))

fig = px.histogram(
    x=p_sim,
    nbins=30,
    title="Two-sided Fisher exact p-values under H₀ (fixed margins)",
    labels={"x": "p-value"},
)
fig.add_vline(x=alpha, line_color="#EF553B", line_dash="dash")
fig.show()


## 10) Pitfalls + practical notes

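- **Conditional on margins**: Fisher’s test conditions on both the row and the column totals. Both are fixed by design only in special experiments (such as Fisher’s original tea-tasting experiment); a case–control study fixes one margin (the numbers of cases and controls), and many studies fix neither. The test is still valid in those settings, since a test that controls the error rate for every set of margins also controls it overall, but conditioning adds discreteness and tends to make it conservative.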
- **Two-sided definition**: multiple “two-sided Fisher” definitions exist. Always specify which one you use (this notebook uses the common “probability ≤ observed probability” rule).
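- **Zeros → infinite OR**: if a cell is 0, the sample odds ratio can be 0 or ∞ (and is undefined when both $ad$ and $bc$ are 0). That’s not “wrong”, but interpret it carefully and consider reporting a confidence interval from a method that handles zeros, such as the exact conditional interval.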
- **p-value vs effect size**: a tiny p-value can correspond to a small effect with large n; a large p-value can occur with a large OR but tiny n. Always look at the table and an effect size.
- **Multiple testing**: if you run many Fisher tests, adjust for multiple comparisons.
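One common adjustment is Holm’s step-down method, which controls the family-wise error rate and is never less powerful than plain Bonferroni. A minimal NumPy sketch (`holm_adjust` is a hypothetical name; statsmodels’ `multipletests` with `method="holm"` is an alternative if you prefer a tested implementation):

```python
import numpy as np


def holm_adjust(p_values) -> np.ndarray:
    """Holm step-down adjusted p-values (controls the family-wise error rate)."""
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)
    # Multiply the k-th smallest p-value (k = 0..m-1) by (m - k)
    scaled = p[order] * (m - np.arange(m))
    # Enforce monotonicity along the sorted order, then cap at 1
    adjusted_sorted = np.minimum(np.maximum.accumulate(scaled), 1.0)
    # Put the adjusted values back in the caller's original order
    adjusted = np.empty_like(adjusted_sorted)
    adjusted[order] = adjusted_sorted
    return adjusted


print(holm_adjust([0.01, 0.04, 0.03]))
```

Compare the adjusted values with α as usual; if you care about the false discovery rate instead, Benjamini–Hochberg is the usual alternative.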


## 11) Exercises

1. Pick a different 2×2 table and compute Fisher’s exact p-values for `greater`, `less`, and `two-sided`.
2. Change the margins (row/column totals) while keeping the odds ratio similar — how does the p-value change?
3. For fixed margins, compute the PMF and plot it; identify which tables contribute to the two-sided p-value.

## References
- Fisher, R. A. (1922). On the interpretation of χ² from contingency tables.
- Hypergeometric distribution: see any standard probability text.
- SciPy: `scipy.stats.fisher_exact` (for a reference implementation).
