# class_likelihood_ratios (LR+ / LR-)

Compute the **positive** and **negative likelihood ratios** for a **binary** classifier.

In scikit-learn this is `sklearn.metrics.class_likelihood_ratios`.

## Learning goals
- Derive \(LR_+\) and \(LR_-\) from the confusion matrix
- Interpret them as **odds multipliers** (pre-test \(\to\) post-test probabilities)
- Implement the metric from scratch in NumPy (weights + label ordering)
- Visualize how likelihood ratios change with the decision threshold
- Use likelihood ratios to pick an operating point (screening vs confirmation)

## Prerequisites
- Confusion matrix, sensitivity/specificity
- Basic Bayes' rule / odds
- Logistic regression + ROC curves (helpful, but not required)

In [None]:
import os
import warnings

import numpy as np

import plotly.express as px
import plotly.graph_objects as go
import plotly.io as pio
from plotly.subplots import make_subplots

from sklearn.datasets import make_classification
from sklearn.metrics import class_likelihood_ratios, roc_curve
from sklearn.model_selection import train_test_split

pio.templates.default = "plotly_white"
pio.renderers.default = os.environ.get("PLOTLY_RENDERER", "notebook")

np.set_printoptions(precision=4, suppress=True)
rng = np.random.default_rng(7)

## 1) Definition: likelihood ratios as conditional probability ratios

Treat a classifier's prediction as a diagnostic test:

- **test positive** \(\iff\) predict the positive class
- **test negative** \(\iff\) predict the negative class

The likelihood ratios compare how often the test is positive/negative under each true class:

\[
LR_+ = \frac{P(\hat{y}=1 \mid y=1)}{P(\hat{y}=1 \mid y=0)}
\qquad
LR_- = \frac{P(\hat{y}=0 \mid y=1)}{P(\hat{y}=0 \mid y=0)}.
\]

**Why this is useful:** in *odds form*, Bayes' rule becomes a multiplication.

Define odds for a probability \(p\):

\[
\operatorname{odds}(p) = \frac{p}{1-p}.
\]

Then the update is:

\[
\operatorname{odds}(y=1 \mid \text{test}+) = \operatorname{odds}(y=1)\cdot LR_+,
\]

\[
\operatorname{odds}(y=1 \mid \text{test}-) = \operatorname{odds}(y=1)\cdot LR_-.
\]

Converting odds back to probability:

\[
p = \frac{\operatorname{odds}}{1 + \operatorname{odds}}.
\]

Equivalently in log-odds:

\[
\operatorname{logit}(p_{post}) = \operatorname{logit}(p_{pre}) + \log(LR).
\]

**Key point:** \(LR_+\) and \(LR_-\) are functions of *sensitivity* and *specificity* (not prevalence),
but turning them into **post-test probabilities** requires a **prior** (pre-test probability).
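
As a quick numeric illustration of the update above, the sketch below uses made-up values (a 10% pre-test probability, \(LR_+ = 8\), \(LR_- = 0.1\)) and checks that the odds form and the log-odds form agree. The helper names `to_odds`, `to_prob` and `logit` are illustrative and not taken from any library.

```python
import math


def to_odds(p):
    """Convert a probability to odds."""
    return p / (1.0 - p)


def to_prob(odds_value):
    """Convert odds back to a probability."""
    return odds_value / (1.0 + odds_value)


def logit(p):
    """Log-odds of a probability."""
    return math.log(p / (1.0 - p))


p_pre = 0.10     # assumed pre-test probability (illustrative)
lr_plus = 8.0    # assumed LR+ (illustrative)
lr_minus = 0.1   # assumed LR- (illustrative)

p_post_pos = to_prob(to_odds(p_pre) * lr_plus)    # after a positive prediction
p_post_neg = to_prob(to_odds(p_pre) * lr_minus)   # after a negative prediction

print(f"after test+: {p_post_pos:.3f}")  # 0.471
print(f"after test-: {p_post_neg:.3f}")  # 0.011

# Log-odds form: logit(p_post) = logit(p_pre) + log(LR)
assert math.isclose(logit(p_post_pos), logit(p_pre) + math.log(lr_plus))
assert math.isclose(logit(p_post_neg), logit(p_pre) + math.log(lr_minus))
```

With the same likelihood ratios but a different prior, the post-test probabilities would change, which is why the prior has to be supplied separately.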

## 2) From confusion matrix to \(LR_+\) and \(LR_-\)

For a binary classifier with positive class \(y=1\) and negative class \(y=0\):

\[
\begin{array}{c|cc}
& \hat{y}=0 & \hat{y}=1\\\hline
y=0 & TN & FP\\
y=1 & FN & TP
\end{array}
\]

Define:

- **Sensitivity / recall / true positive rate (TPR)**

  \[
  \text{TPR} = \frac{TP}{TP+FN}
  \]

- **Specificity / true negative rate (TNR)**

  \[
  \text{TNR} = \frac{TN}{TN+FP}
  \]

- **False positive rate (FPR)**: \(\text{FPR} = 1-\text{TNR} = \frac{FP}{TN+FP}\)
- **False negative rate (FNR)**: \(\text{FNR} = 1-\text{TPR} = \frac{FN}{TP+FN}\)

Then:

\[
LR_+ = \frac{\text{TPR}}{\text{FPR}} = \frac{\text{sensitivity}}{1-\text{specificity}}
\]

\[
LR_- = \frac{\text{FNR}}{\text{TNR}} = \frac{1-\text{sensitivity}}{\text{specificity}}.
\]

A commonly used single-number summary is the **diagnostic odds ratio**:

\[
\text{DOR} = \frac{LR_+}{LR_-} = \frac{TP\cdot TN}{FP\cdot FN},
\]

but note it can be undefined/infinite when \(FP=0\) or \(FN=0\).
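
To make the formulas concrete, here is a small worked example with hypothetical counts (80 TP, 20 FN, 10 FP, 90 TN, chosen only for illustration). It also checks that the two expressions for the DOR agree.

```python
import math

# Hypothetical confusion-matrix counts (illustrative, not from real data)
tp, fn = 80, 20   # 100 actual positives
fp, tn = 10, 90   # 100 actual negatives

tpr = tp / (tp + fn)   # sensitivity
fnr = fn / (tp + fn)
fpr = fp / (fp + tn)
tnr = tn / (fp + tn)   # specificity

lr_pos = tpr / fpr     # LR+
lr_neg = fnr / tnr     # LR-
dor = lr_pos / lr_neg
dor_from_counts = (tp * tn) / (fp * fn)

print(f"LR+={lr_pos:.2f}  LR-={lr_neg:.3f}  DOR={dor:.1f}")
# LR+=8.00  LR-=0.222  DOR=36.0

# Both DOR expressions give the same number
assert math.isclose(dor, dor_from_counts)
```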

In [None]:
def _infer_binary_labels(y_true, y_pred, labels=None):
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)

    if labels is None:
        labels = np.unique(np.concatenate([np.unique(y_true), np.unique(y_pred)]))
        if labels.shape[0] != 2:
            raise ValueError(f"Expected 2 labels for binary classification, got {labels!r}")
        # np.unique already returns sorted labels (sklearn's default order): [negative, positive]
    else:
        labels = np.asarray(labels)
        if labels.shape[0] != 2:
            raise ValueError("labels must be of length 2: [negative_class, positive_class]")

    neg_label, pos_label = labels[0], labels[1]
    return neg_label, pos_label


def confusion_counts_binary(y_true, y_pred, *, labels=None, sample_weight=None):
    '''Return (tp, fp, tn, fn) as floats.'''
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    neg_label, pos_label = _infer_binary_labels(y_true, y_pred, labels=labels)

    if sample_weight is None:
        w = np.ones_like(y_true, dtype=float)
    else:
        w = np.asarray(sample_weight, dtype=float)
        if w.shape != y_true.shape:
            raise ValueError("sample_weight must have shape (n_samples,)")

    is_pos_true = y_true == pos_label
    is_pos_pred = y_pred == pos_label

    tp = np.sum(w * (is_pos_true & is_pos_pred))
    fp = np.sum(w * (~is_pos_true & is_pos_pred))
    tn = np.sum(w * (~is_pos_true & ~is_pos_pred))
    fn = np.sum(w * (is_pos_true & ~is_pos_pred))

    return float(tp), float(fp), float(tn), float(fn)


def class_likelihood_ratios_numpy(
    y_true,
    y_pred,
    *,
    labels=None,
    sample_weight=None,
    raise_warning=True,
):
    '''NumPy re-implementation of sklearn.metrics.class_likelihood_ratios.

    Returns (LR+, LR-); a ratio that is undefined (division by zero) is returned as NaN.
    '''
    tp, fp, tn, fn = confusion_counts_binary(
        y_true, y_pred, labels=labels, sample_weight=sample_weight
    )

    pos_total = tp + fn
    neg_total = tn + fp

    if pos_total == 0 or neg_total == 0:
        if raise_warning:
            warnings.warn(
                "No positive or no negative samples in y_true; likelihood ratios are undefined.",
                UserWarning,
            )
        return (np.nan, np.nan)

    tpr = tp / pos_total
    fnr = fn / pos_total
    fpr = fp / neg_total
    tnr = tn / neg_total

    lr_plus = np.nan
    lr_minus = np.nan

    if fpr == 0:
        if raise_warning:
            warnings.warn("When false positive == 0, the positive likelihood ratio is undefined.")
    else:
        lr_plus = tpr / fpr

    if tnr == 0:
        if raise_warning:
            warnings.warn("When true negative == 0, the negative likelihood ratio is undefined.")
    else:
        lr_minus = fnr / tnr

    return (lr_plus, lr_minus)

In [None]:
# Quick sanity checks vs scikit-learn

y_true = [0, 1, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0]

print("sklearn:", class_likelihood_ratios(y_true, y_pred))
print("numpy :", class_likelihood_ratios_numpy(y_true, y_pred, raise_warning=False))

y_true = np.array(["non-cat", "cat", "non-cat", "cat", "non-cat"])
y_pred = np.array(["cat", "cat", "non-cat", "non-cat", "non-cat"])

print()
print("Default label order (sorted):")
print("sklearn:", class_likelihood_ratios(y_true, y_pred))

print()
print("Explicit labels=[negative, positive]:")
print("sklearn:", class_likelihood_ratios(y_true, y_pred, labels=["non-cat", "cat"]))


## 3) Interpretation and common pitfalls

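**Expected ranges (for a classifier that beats chance):**

- \(LR_+ > 1\), and larger is better. Values close to 1 mean “a positive prediction barely changes the odds”.
- \(0 \le LR_- < 1\), and smaller is better. Values close to 1 mean “a negative prediction barely changes the odds”.

If you ever see \(LR_+ < 1\) or \(LR_- > 1\), the predictions are anti-correlated with the truth: typically the predicted labels are inverted relative to `y_true` (for example, the score of the wrong class was thresholded).
Swapping the `labels=[negative, positive]` order alone does not cause this, because it relabels `y_true` and `y_pred` together and yields \((1/LR_-,\ 1/LR_+)\); it does change which class the ratios describe, so check it as well.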
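
A small check of how two different kinds of "flip" affect the ratios, using hypothetical counts. The helper `lr_from_counts` is an illustrative name, and the sketch assumes no denominator is zero.

```python
import math


def lr_from_counts(tp, fp, tn, fn):
    """Return (LR+, LR-) from confusion counts (assumes no zero denominators)."""
    lr_pos = (tp / (tp + fn)) / (fp / (fp + tn))
    lr_neg = (fn / (tp + fn)) / (tn / (fp + tn))
    return lr_pos, lr_neg


# Hypothetical counts for a decent classifier
tp, fp, tn, fn = 80, 10, 90, 20
lr_pos, lr_neg = lr_from_counts(tp, fp, tn, fn)

# (a) Swap which class is called "positive" in BOTH y_true and y_pred:
#     TP <-> TN and FP <-> FN.
swap_pos, swap_neg = lr_from_counts(tp=tn, fp=fn, tn=tp, fn=fp)

# (b) Invert the predictions only (y_true unchanged): TP <-> FN and FP <-> TN.
inv_pos, inv_neg = lr_from_counts(tp=fn, fp=tn, tn=fp, fn=tp)

print(f"original             LR+={lr_pos:.3f}  LR-={lr_neg:.3f}")
print(f"positive class swap  LR+={swap_pos:.3f}  LR-={swap_neg:.3f}")
print(f"inverted predictions LR+={inv_pos:.3f}  LR-={inv_neg:.3f}")

# (a) gives (1/LR-, 1/LR+); (b) exchanges LR+ and LR-
assert math.isclose(swap_pos, 1 / lr_neg) and math.isclose(swap_neg, 1 / lr_pos)
assert math.isclose(inv_pos, lr_neg) and math.isclose(inv_neg, lr_pos)
```

Only case (b) pushes \(LR_+\) below 1 and \(LR_-\) above 1 for these counts; case (a) changes the values but keeps them on the useful side of 1.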

**Rule-of-thumb strength of evidence (very domain dependent):**

| Evidence | \(LR_+\) | \(LR_-\) |
|---|---:|---:|
| small | 2–5 | 0.5–0.2 |
| moderate | 5–10 | 0.2–0.1 |
| large | > 10 | < 0.1 |

**Pitfalls**
- The metric needs **hard predictions** (class labels). If your model outputs probabilities, you must choose a **threshold** first.
- \(LR_+\) is undefined when \(FP=0\) (\(\text{FPR}=0\)). \(LR_-\) is undefined when \(TN=0\) (\(\text{TNR}=0\)).
  Small datasets can make this happen easily.
- Multi-class problems need a one-vs-rest reduction; scikit-learn's `class_likelihood_ratios` is **binary-only**.
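
The undefined case is easy to hit on small samples. The sketch below uses five made-up samples in which the classifier makes no false positives; `lr_or_nan` is an illustrative helper that returns NaN for an undefined ratio, which is the convention this notebook follows.

```python
import math


def lr_or_nan(y_true, y_pred):
    """Return (LR+, LR-) for 0/1 labels, using NaN where a ratio is undefined."""
    pairs = list(zip(y_true, y_pred))
    tp = pairs.count((1, 1))
    fn = pairs.count((1, 0))
    fp = pairs.count((0, 1))
    tn = pairs.count((0, 0))
    pos, neg = tp + fn, fp + tn
    if pos == 0 or neg == 0:
        return math.nan, math.nan
    lr_pos = (tp / pos) / (fp / neg) if fp > 0 else math.nan
    lr_neg = (fn / pos) / (tn / neg) if tn > 0 else math.nan
    return lr_pos, lr_neg


# Five made-up samples; the classifier happens to make no false positives
small_true = [0, 0, 0, 1, 1]
small_pred = [0, 0, 0, 1, 0]

small_lr_pos, small_lr_neg = lr_or_nan(small_true, small_pred)
print(small_lr_pos, small_lr_neg)  # nan 0.5
```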

In [None]:
def odds(p):
    p = np.asarray(p)
    return p / (1.0 - p)


def prob_from_odds(o):
    o = np.asarray(o)
    return o / (1.0 + o)


def update_probability(p_pre, lr):
    '''Bayes update in odds form.'''
    return prob_from_odds(odds(p_pre) * lr)


p_pre = np.linspace(0.001, 0.999, 400)

lr_plus_values = [2, 5, 10]
lr_minus_values = [0.5, 0.2, 0.1]

fig = make_subplots(
    rows=1,
    cols=2,
    subplot_titles=(
        "Post-test probability after a POSITIVE prediction (use LR+)",
        "Post-test probability after a NEGATIVE prediction (use LR-)",
    ),
)

for lr in lr_plus_values:
    fig.add_trace(
        go.Scatter(x=p_pre, y=update_probability(p_pre, lr), mode="lines", name=f"LR+={lr}"),
        row=1,
        col=1,
    )

for lr in lr_minus_values:
    fig.add_trace(
        go.Scatter(x=p_pre, y=update_probability(p_pre, lr), mode="lines", name=f"LR-={lr}"),
        row=1,
        col=2,
    )

# Reference line: no change
fig.add_trace(
    go.Scatter(x=p_pre, y=p_pre, mode="lines", line=dict(dash="dash"), name="no change"),
    row=1,
    col=1,
)
fig.add_trace(
    go.Scatter(x=p_pre, y=p_pre, mode="lines", line=dict(dash="dash"), showlegend=False),
    row=1,
    col=2,
)

fig.update_xaxes(title_text="pre-test probability", range=[0, 1], row=1, col=1)
fig.update_xaxes(title_text="pre-test probability", range=[0, 1], row=1, col=2)
fig.update_yaxes(title_text="post-test probability", range=[0, 1], row=1, col=1)
fig.update_yaxes(title_text="post-test probability", range=[0, 1], row=1, col=2)

fig.update_layout(width=1000, height=420)
fig

## 4) Threshold dependence and ROC geometry

If your model outputs a **score** or **probability** \(\hat{p}\), you get hard predictions via a threshold \(t\):

\[
\hat{y}(t) = \mathbb{1}[\hat{p} \ge t].
\]

So \(LR_+\) and \(LR_-\) are *functions of the threshold*.
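
A toy illustration of this dependence, with made-up labels and scores: the same scores give different likelihood ratios at two thresholds. The helper `lr_at_threshold` is an illustrative name and assumes \(0 < \text{FPR} < 1\) so that no ratio divides by zero.

```python
def lr_at_threshold(y_true, scores, t):
    """Predict 1 when score >= t, then return (LR+, LR-).

    Assumes both classes are present and 0 < FPR < 1 (illustrative helper).
    """
    y_pred = [1 if s >= t else 0 for s in scores]
    pairs = list(zip(y_true, y_pred))
    tp = pairs.count((1, 1))
    fn = pairs.count((1, 0))
    fp = pairs.count((0, 1))
    tn = pairs.count((0, 0))
    tpr, fnr = tp / (tp + fn), fn / (tp + fn)
    fpr, tnr = fp / (fp + tn), tn / (fp + tn)
    return tpr / fpr, fnr / tnr


# Toy labels and scores (made up for illustration)
toy_y = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]
toy_scores = [0.05, 0.15, 0.25, 0.35, 0.45, 0.55, 0.65, 0.75, 0.85, 0.95]

for t_demo in (0.3, 0.5):
    lr_pos, lr_neg = lr_at_threshold(toy_y, toy_scores, t_demo)
    print(f"t={t_demo:.1f}  LR+={lr_pos:.2f}  LR-={lr_neg:.2f}")
# t=0.3  LR+=2.50  LR-=0.00
# t=0.5  LR+=4.00  LR-=0.25
```

Raising the threshold trades a better \(LR_+\) for a worse \(LR_-\) in this toy example.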

On the ROC plane (x = FPR, y = TPR) for a particular threshold:

- \(LR_+ = \frac{\text{TPR}}{\text{FPR}}\) is the **slope of the line from \((0,0)\)** to the ROC point.
- \(LR_- = \frac{1-\text{TPR}}{1-\text{FPR}}\) is the **slope of the line from \((1,1)\)** to the ROC point.

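This makes the metric visually interpretable: a large \(LR_+\) needs a ROC point on a steep line through the origin (high TPR at very low FPR),
while a small \(LR_-\) needs a point near the top edge (TPR close to 1) and away from the top-right corner, so that the line to \((1,1)\) is nearly flat.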
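
The slope interpretation can be checked on two hypothetical operating points, one suited to ruling in and one to ruling out. The coordinates are made up, and `roc_slopes` is an illustrative helper that assumes \(0 < \text{FPR} < 1\).

```python
def roc_slopes(fpr_pt, tpr_pt):
    """Slopes of the lines from (0, 0) and from (1, 1) to the ROC point.

    Assumes 0 < fpr_pt < 1 so both slopes are finite (illustrative helper).
    """
    slope_from_origin = tpr_pt / fpr_pt                   # equals LR+
    slope_from_top_right = (1 - tpr_pt) / (1 - fpr_pt)    # equals LR-
    return slope_from_origin, slope_from_top_right


# Two hypothetical operating points (FPR, TPR)
rule_in_point = (0.02, 0.50)    # very few false positives
rule_out_point = (0.30, 0.98)   # very few false negatives

for name, (fpr_pt, tpr_pt) in [("rule-in", rule_in_point), ("rule-out", rule_out_point)]:
    lr_pos, lr_neg = roc_slopes(fpr_pt, tpr_pt)
    print(f"{name:8s} FPR={fpr_pt:.2f} TPR={tpr_pt:.2f}  LR+={lr_pos:.2f}  LR-={lr_neg:.3f}")
```

The first point has the steeper line from the origin (larger \(LR_+\)); the second has the flatter line to \((1,1)\) (smaller \(LR_-\)).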

In [None]:
# Synthetic 2D dataset for visualization
X, y = make_classification(
    n_samples=2200,
    n_features=2,
    n_redundant=0,
    n_informative=2,
    n_clusters_per_class=1,
    class_sep=1.2,
    flip_y=0.05,
    random_state=7,
)

X_train_val, X_test, y_train_val, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=7
)
X_train, X_val, y_train, y_val = train_test_split(
    X_train_val, y_train_val, test_size=0.25, stratify=y_train_val, random_state=7
)

# Standardize (helps gradient descent)
mean_ = X_train.mean(axis=0)
std_ = X_train.std(axis=0)
X_train_s = (X_train - mean_) / std_
X_val_s = (X_val - mean_) / std_
X_test_s = (X_test - mean_) / std_


def sigmoid(z):
    # Stable sigmoid
    z = np.asarray(z)
    out = np.empty_like(z, dtype=float)
    pos = z >= 0
    out[pos] = 1.0 / (1.0 + np.exp(-z[pos]))
    ez = np.exp(z[~pos])
    out[~pos] = ez / (1.0 + ez)
    return out


def fit_logreg_gd(X, y, *, lr=0.15, n_steps=2500, l2=0.01, seed=7):
    rng_local = np.random.default_rng(seed)
    n, d = X.shape
    w = rng_local.normal(scale=0.1, size=d)
    b = 0.0

    eps = 1e-12
    losses = []

    for step in range(n_steps):
        z = X @ w + b
        p = sigmoid(z)

        # Binary cross-entropy + L2
        loss = -np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps)) + 0.5 * l2 * np.sum(w * w)

        # Gradients
        grad_w = (X.T @ (p - y)) / n + l2 * w
        grad_b = np.mean(p - y)

        w -= lr * grad_w
        b -= lr * grad_b

        if step % 25 == 0:
            losses.append(loss)

    return w, b, np.array(losses)


w, b, losses = fit_logreg_gd(X_train_s, y_train)

fig = go.Figure()
fig.add_trace(go.Scatter(y=losses, mode="lines", name="train loss"))
fig.update_layout(
    title="Logistic regression from scratch (gradient descent)",
    xaxis_title="checkpoint (every 25 steps)",
    yaxis_title="cross-entropy loss",
    width=900,
    height=380,
)
fig.show()

p_val = sigmoid(X_val_s @ w + b)

In [None]:
# Probability distributions by class (validation set)

df = {
    "p_hat": p_val,
    "y": y_val.astype(int),
}

fig = px.histogram(
    df,
    x="p_hat",
    color="y",
    nbins=50,
    opacity=0.6,
    barmode="overlay",
    histnorm="probability",
    title="Predicted probabilities by true class (validation set)",
    labels={"p_hat": "predicted P(y=1|x)", "y": "true class"},
)
fig.update_layout(width=900, height=420)
fig

In [None]:
def sweep_thresholds(y_true, y_proba, thresholds):
    rows = []
    for t in thresholds:
        y_pred = (y_proba >= t).astype(int)
        tp, fp, tn, fn = confusion_counts_binary(y_true, y_pred, labels=[0, 1])

        pos_total = tp + fn
        neg_total = tn + fp

        tpr = tp / pos_total if pos_total > 0 else np.nan
        fnr = fn / pos_total if pos_total > 0 else np.nan
        fpr = fp / neg_total if neg_total > 0 else np.nan
        tnr = tn / neg_total if neg_total > 0 else np.nan

        lr_plus = tpr / fpr if (np.isfinite(fpr) and fpr > 0) else np.nan
        lr_minus = fnr / tnr if (np.isfinite(tnr) and tnr > 0) else np.nan

        dor = (
            lr_plus / lr_minus
            if (np.isfinite(lr_plus) and np.isfinite(lr_minus) and lr_plus > 0 and lr_minus > 0)
            else np.nan
        )

        rows.append((t, tp, fp, tn, fn, tpr, tnr, lr_plus, lr_minus, dor))

    arr = np.array(rows, dtype=float)
    return {
        "threshold": arr[:, 0],
        "tp": arr[:, 1],
        "fp": arr[:, 2],
        "tn": arr[:, 3],
        "fn": arr[:, 4],
        "tpr": arr[:, 5],
        "tnr": arr[:, 6],
        "lr_plus": arr[:, 7],
        "lr_minus": arr[:, 8],
        "dor": arr[:, 9],
    }


thresholds = np.linspace(0.01, 0.99, 99)
sweep = sweep_thresholds(y_val, p_val, thresholds)


def pick_operating_points(sweep, *, min_sensitivity=0.95, min_specificity=0.95):
    thresholds = sweep["threshold"]

    sens = sweep["tpr"]  # sensitivity
    spec = sweep["tnr"]  # specificity
    lr_plus = sweep["lr_plus"]
    lr_minus = sweep["lr_minus"]

    # A generic way to combine LR+ and LR- into one objective: diagnostic odds ratio (DOR)
    # Fallback: Youden's J = sensitivity + specificity - 1 (always defined as long as rates are defined)
    dor = sweep["dor"]
    youden_j = sens + spec - 1

    if np.any(np.isfinite(dor)):
        t_best = thresholds[np.nanargmax(dor)]
        best_label = "max DOR"
    else:
        t_best = thresholds[np.nanargmax(youden_j)]
        best_label = "max Youden J (fallback)"

    # Screening: prioritize ruling OUT => minimize LR- while keeping sensitivity high
    mask_screen = (sens >= min_sensitivity) & np.isfinite(lr_minus)
    if mask_screen.any():
        t_screen = thresholds[mask_screen][np.nanargmin(lr_minus[mask_screen])]
    else:
        t_screen = thresholds[np.nanargmin(lr_minus)]

    # Confirmation: prioritize ruling IN => maximize LR+ while keeping specificity high
    mask_confirm = (spec >= min_specificity) & np.isfinite(lr_plus)
    if mask_confirm.any():
        t_confirm = thresholds[mask_confirm][np.nanargmax(lr_plus[mask_confirm])]
    else:
        t_confirm = thresholds[np.nanargmax(lr_plus)]

    return t_best, t_screen, t_confirm, best_label


t_best, t_screen, t_confirm, best_label = pick_operating_points(sweep)

(t_best, t_screen, t_confirm, best_label)


In [None]:
def _vline(fig, x, *, label, color):
    fig.add_vline(x=x, line_width=2, line_dash="dash", line_color=color)
    fig.add_annotation(
        x=x,
        y=1.02,
        xref="x",
        yref="paper",
        text=label,
        showarrow=False,
        font=dict(color=color),
    )


fig = make_subplots(rows=1, cols=2, subplot_titles=("LR+ vs threshold", "LR- vs threshold"))

fig.add_trace(
    go.Scatter(x=sweep["threshold"], y=sweep["lr_plus"], mode="lines", name="LR+"),
    row=1,
    col=1,
)
fig.add_trace(
    go.Scatter(x=sweep["threshold"], y=sweep["lr_minus"], mode="lines", name="LR-"),
    row=1,
    col=2,
)

for x, label, color in [
    (t_best, best_label, "#1f77b4"),
    (t_screen, "screening", "#2ca02c"),
    (t_confirm, "confirm", "#d62728"),
]:
    _vline(fig, x, label=label, color=color)

fig.update_yaxes(type="log", row=1, col=1)
fig.update_yaxes(type="log", row=1, col=2)
fig.update_xaxes(title_text="threshold t", row=1, col=1)
fig.update_xaxes(title_text="threshold t", row=1, col=2)
fig.update_yaxes(title_text="LR+ (log scale)", row=1, col=1)
fig.update_yaxes(title_text="LR- (log scale)", row=1, col=2)

fig.update_layout(width=1000, height=420)
fig

In [None]:
# ROC curve (validation set) + geometric interpretation of LR

fpr, tpr, thr = roc_curve(y_val, p_val)

fig = go.Figure()
fig.add_trace(go.Scatter(x=fpr, y=tpr, mode="lines", name="ROC"))
fig.add_trace(
    go.Scatter(x=[0, 1], y=[0, 1], mode="lines", line=dict(dash="dash"), name="random")
)

# Get the ROC point closest to our chosen threshold t_best
# (roc_curve returns thresholds in decreasing order)
idx = np.argmin(np.abs(thr - t_best))
x_pt, y_pt = fpr[idx], tpr[idx]

# LR slopes at that operating point (NaN when the ratio is undefined, as in Section 3)
lr_plus = y_pt / x_pt if x_pt > 0 else np.nan
lr_minus = (1 - y_pt) / (1 - x_pt) if (1 - x_pt) > 0 else np.nan

fig.add_trace(
    go.Scatter(
        x=[x_pt],
        y=[y_pt],
        mode="markers",
        marker=dict(size=10, color="#1f77b4"),
        name=f"t≈{t_best:.2f}",
    )
)

# Lines showing the slopes
fig.add_trace(
    go.Scatter(x=[0, x_pt], y=[0, y_pt], mode="lines", line=dict(color="#1f77b4"), showlegend=False)
)
fig.add_trace(
    go.Scatter(x=[1, x_pt], y=[1, y_pt], mode="lines", line=dict(color="#d62728"), showlegend=False)
)

fig.update_layout(
    title=f"ROC geometry at t≈{t_best:.2f}:  LR+≈{lr_plus:.2f},  LR-≈{lr_minus:.2f}",
    xaxis_title="FPR",
    yaxis_title="TPR",
    width=900,
    height=500,
    xaxis=dict(range=[0, 1]),
    yaxis=dict(range=[0, 1]),
)
fig

In [None]:
def metrics_at_threshold(y_true, y_proba, t):
    y_pred = (y_proba >= t).astype(int)
    lr_p, lr_m = class_likelihood_ratios_numpy(y_true, y_pred, labels=[0, 1], raise_warning=False)
    tp, fp, tn, fn = confusion_counts_binary(y_true, y_pred, labels=[0, 1])
    tpr = tp / (tp + fn)
    tnr = tn / (tn + fp)
    return {
        "t": t,
        "tp": tp,
        "fp": fp,
        "tn": tn,
        "fn": fn,
        "tpr": tpr,
        "tnr": tnr,
        "lr_plus": lr_p,
        "lr_minus": lr_m,
    }


m_best = metrics_at_threshold(y_val, p_val, t_best)
m_screen = metrics_at_threshold(y_val, p_val, t_screen)
m_confirm = metrics_at_threshold(y_val, p_val, t_confirm)

m_best, m_screen, m_confirm

In [None]:
fig = make_subplots(
    rows=1,
    cols=3,
    subplot_titles=(
        f"Screening (t={t_screen:.2f})",
        # best_label is "max DOR" or the Youden J fallback, so do not hard-code it
        f"{best_label} (t={t_best:.2f})",
        f"Confirm (t={t_confirm:.2f})",
    ),
)

for j, m in enumerate([m_screen, m_best, m_confirm], start=1):
    cm = np.array([[m["tn"], m["fp"]], [m["fn"], m["tp"]]], dtype=float)
    fig.add_trace(
        go.Heatmap(
            z=cm,
            x=["pred 0", "pred 1"],
            y=["true 0", "true 1"],
            colorscale="Blues",
            showscale=False,
            text=cm.astype(int),
            texttemplate="%{text}",
            textfont=dict(size=16),
        ),
        row=1,
        col=j,
    )

fig.update_layout(
    width=1050,
    height=420,
    title="Confusion matrices at three operating points (validation set)",
)
fig

In [None]:
# How the chosen operating point changes post-test probability

p_pre = 0.10  # example prior/prevalence

for name, m in [("screening", m_screen), ("max_dor", m_best), ("confirm", m_confirm)]:
    p_pos = update_probability(p_pre, m["lr_plus"])   # after a positive prediction
    p_neg = update_probability(p_pre, m["lr_minus"])  # after a negative prediction
    print(
        f"{name:9s}  t={m['t']:.2f}  LR+={m['lr_plus']:.2f}  LR-={m['lr_minus']:.2f}  "
        f"p(y=1|+)= {p_pos:.3f}  p(y=1|-)= {p_neg:.3f}"
    )

## 5) Using likelihood ratios to optimize a simple algorithm

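\(LR_+\) and \(LR_-\) are defined through *counts* (TP/FP/TN/FN), which are piecewise-constant functions of the model parameters. The ratios are therefore flat almost everywhere and jump whenever a prediction flips, so they provide **no useful gradient** for training.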
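
The lack of a useful gradient can be seen on toy data. In the sketch below the threshold stands in for a model parameter (shifting a logistic model's intercept has the same effect on the hard predictions). The data and the helper name `likelihood_ratios_at` are made up for illustration.

```python
def likelihood_ratios_at(y_true, scores, t):
    """Predict 1 when score >= t and return (LR+, LR-).

    Assumes both classes are present and 0 < FPR < 1 (illustrative helper).
    """
    y_pred = [1 if s >= t else 0 for s in scores]
    pairs = list(zip(y_true, y_pred))
    tp = pairs.count((1, 1))
    fn = pairs.count((1, 0))
    fp = pairs.count((0, 1))
    tn = pairs.count((0, 0))
    return (tp / (tp + fn)) / (fp / (fp + tn)), (fn / (tp + fn)) / (tn / (fp + tn))


# Toy labels and scores (made up for illustration)
toy_y = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]
toy_scores = [0.05, 0.15, 0.25, 0.35, 0.45, 0.55, 0.65, 0.75, 0.85, 0.95]

results = {t_demo: likelihood_ratios_at(toy_y, toy_scores, t_demo)
           for t_demo in (0.40, 0.42, 0.44, 0.46)}
for t_demo, (lr_pos, lr_neg) in results.items():
    print(f"t={t_demo:.2f}  LR+={lr_pos:.2f}  LR-={lr_neg:.2f}")

# No score lies in [0.40, 0.44], so the ratios do not move at all there ...
assert results[0.40] == results[0.42] == results[0.44]
# ... and they jump once the threshold crosses the score 0.45.
assert results[0.46] != results[0.44]
```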

A common workflow is therefore:

1) Train a probabilistic model (e.g. logistic regression) using a differentiable loss (cross-entropy)
2) Use likelihood ratios on a **validation set** to pick an operating point (decision threshold)

Example strategies:
- **Screening test** (rule out): pick a threshold with high sensitivity and minimal \(LR_-\)
- **Confirmatory test** (rule in): pick a threshold with high specificity and maximal \(LR_+\)
- **Single-number optimization**: maximize DOR = \(LR_+/LR_-\) (useful, but can be unstable if FP or FN are small)

In [None]:
# Final check on a held-out test set using t_best, the threshold selected on the
# validation set (max DOR, or the Youden J fallback if DOR was never finite)

p_test = sigmoid(X_test_s @ w + b)
y_pred_test = (p_test >= t_best).astype(int)

print("Test set LR (sklearn):", class_likelihood_ratios(y_test, y_pred_test, labels=[0, 1]))
print("Test set LR (numpy) :", class_likelihood_ratios_numpy(y_test, y_pred_test, labels=[0, 1], raise_warning=False))

## Pros / cons / when to use

**Pros**
- Interpretable: directly tells you how to update odds (prevalence + test result \(\to\) posterior)
- Uses sensitivity/specificity, so it is more stable across different prevalences than precision/NPV
- Naturally supports “rule-in” (large \(LR_+\)) vs “rule-out” (small \(LR_-\)) thinking
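
The prevalence point can be illustrated with expected counts. The sketch assumes a fixed sensitivity of 0.80 and specificity of 0.90 (made-up values) and varies only the prevalence: precision (PPV) moves a lot, while \(LR_+\) stays put.

```python
sens, spec = 0.80, 0.90   # assumed test characteristics (illustrative)
n_pop = 100_000           # hypothetical population size

rows = []
for prevalence in (0.01, 0.10, 0.50):
    pos = prevalence * n_pop
    neg = n_pop - pos
    tp, fn = sens * pos, (1 - sens) * pos     # expected counts among positives
    tn, fp = spec * neg, (1 - spec) * neg     # expected counts among negatives
    ppv = tp / (tp + fp)                      # precision
    lr_pos = (tp / (tp + fn)) / (fp / (fp + tn))
    rows.append((prevalence, ppv, lr_pos))
    print(f"prevalence={prevalence:.2f}  PPV={ppv:.3f}  LR+={lr_pos:.2f}")
# prevalence=0.01  PPV=0.075  LR+=8.00
# prevalence=0.10  PPV=0.471  LR+=8.00
# prevalence=0.50  PPV=0.889  LR+=8.00
```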

**Cons**
- Threshold-dependent and based on hard predictions (not a ranking metric like AUC)
- Can be undefined/infinite when \(FP=0\) or \(TN=0\), especially on small datasets
- Binary-only; multi-class needs one-vs-rest and careful reporting

**Good fits**
- Medical diagnostic tests, screening vs confirmation
- Any binary decision where base rate/prevalence is known or can be estimated and you need a domain-friendly “odds update” explanation

## Exercises
1) On a dataset you care about, sweep thresholds and compare:
   - max \(LR_+\) at specificity \(\ge 0.95\)
   - min \(LR_-\) at sensitivity \(\ge 0.95\)
   - max DOR
   Do these thresholds match what you would pick using accuracy or F1?

2) Implement one-vs-rest likelihood ratios for multi-class classification and report the per-class \(LR_+\) and \(LR_-\).

## References
- scikit-learn: `sklearn.metrics.class_likelihood_ratios`
- Wikipedia: https://en.wikipedia.org/wiki/Likelihood_ratios_in_diagnostic_testing