# AUC / ROC AUC (Area Under the ROC Curve)

AUC-ROC is a **ranking metric** for binary classification. It answers a simple question:

> If we randomly pick one positive and one negative example, what’s the probability the model assigns a **higher score** to the positive one?

This notebook focuses on **ROC AUC** (the most common meaning of “AUC” in classification).

**Goals**
- Build the ROC curve from the confusion matrix by sweeping a threshold
- Compute AUC via the trapezoidal rule (what `sklearn.metrics.auc` does)
- Implement `roc_curve` + `roc_auc_score` from scratch in NumPy
- Visualize why AUC is threshold-free and how it relates to ranking
- Use an AUC-inspired surrogate loss to train a simple linear model

**Quick import (scikit-learn)**
```python
from sklearn.metrics import roc_auc_score, roc_curve, auc
```

**Assumption throughout:** binary labels `y ∈ {0,1}` and a real-valued score `s(x)` where larger means “more positive”.

In [None]:
import os

import numpy as np
import plotly.express as px
import plotly.graph_objects as go
import plotly.io as pio

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import auc, roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

pio.templates.default = "plotly_white"
pio.renderers.default = os.environ.get("PLOTLY_RENDERER", "notebook")

np.set_printoptions(precision=4, suppress=True)
rng = np.random.default_rng(42)

## 1) Intuition: AUC is about *ordering*, not thresholds

Suppose your model outputs a score $s(x)$ (a probability, logit, margin, etc.).

- If we pick a threshold $\tau$, we turn scores into hard predictions.
- If we **sweep** $\tau$ from $+\infty$ down to $-\infty$, we trace out the ROC curve.
- AUC summarizes the entire ROC curve into one number in $[0,1]$.

Two key takeaways:

1. **Only ranking matters:** any strictly increasing transform $g$ preserves AUC.

   $$\mathrm{AUC}(s) = \mathrm{AUC}(g\circ s)$$

2. **Pairwise probability interpretation:**

   $$
   \mathrm{AUC}
   = \mathbb{P}(s(X^+) > s(X^-))
   + \tfrac{1}{2}\,\mathbb{P}(s(X^+) = s(X^-))
   $$

   where $X^+$ is a random positive example and $X^-$ is a random negative example.
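
To make the pairwise interpretation concrete, here is a minimal sketch on a hand-made toy set. The labels, scores, and the helper name `pairwise_auc` are illustrative assumptions, not part of this notebook's data. It counts wins and ties over all positive-negative pairs, then checks that a strictly increasing transform (here `exp`) leaves the result unchanged.

```python
import numpy as np

# Hypothetical toy data: 2 positives, 3 negatives
y_toy = np.array([1, 1, 0, 0, 0])
s_toy = np.array([0.8, 0.4, 0.6, 0.3, 0.4])


def pairwise_auc(y, s):
    """AUC as P(s+ > s-) + 0.5 * P(s+ == s-), by brute force over all pairs."""
    pos = s[y == 1]
    neg = s[y == 0]
    diffs = pos[:, None] - neg[None, :]  # shape (P, N)
    wins = np.sum(diffs > 0)
    ties = np.sum(diffs == 0)
    return (wins + 0.5 * ties) / (pos.size * neg.size)


auc_toy = pairwise_auc(y_toy, s_toy)
# 0.8 beats all three negatives (3 wins);
# 0.4 beats 0.3, ties 0.4, and loses to 0.6 (1 win + half a tie)
print(auc_toy)  # (3 + 1.5) / 6 = 0.75

# A strictly increasing transform preserves every pairwise comparison
auc_exp = pairwise_auc(y_toy, np.exp(s_toy))
print(auc_exp)  # 0.75
```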

In [None]:
# A toy "score" example: overlapping score distributions
n_pos, n_neg = 400, 600

pos_scores = rng.normal(loc=1.0, scale=1.0, size=n_pos)
neg_scores = rng.normal(loc=0.0, scale=1.0, size=n_neg)

y_true = np.r_[np.ones(n_pos, dtype=int), np.zeros(n_neg, dtype=int)]
y_score = np.r_[pos_scores, neg_scores]

toy_auc = roc_auc_score(y_true, y_score)
print(f"Toy ROC AUC = {toy_auc:.3f}")

fig = px.histogram(
    {"score": y_score, "class": np.where(y_true == 1, "positive", "negative")},
    x="score",
    color="class",
    opacity=0.6,
    barmode="overlay",
    marginal="box",
    title="Toy scores: overlap between classes",
    labels={"score": "model score s(x)", "class": "true class"},
)
fig.show()

## 2) From thresholds to the ROC curve

Given a threshold $\tau$, convert scores to predicted labels:

$$\hat{y}_i(\tau) = \mathbb{1}[s_i \ge \tau]$$

Confusion-matrix counts (as functions of $\tau$):

$$
\mathrm{TP}(\tau) = \sum_{i=1}^n \mathbb{1}[y_i=1 \land \hat{y}_i(\tau)=1]
\quad
\mathrm{FP}(\tau) = \sum_{i=1}^n \mathbb{1}[y_i=0 \land \hat{y}_i(\tau)=1]
$$
$$
\mathrm{TN}(\tau) = \sum_{i=1}^n \mathbb{1}[y_i=0 \land \hat{y}_i(\tau)=0]
\quad
\mathrm{FN}(\tau) = \sum_{i=1}^n \mathbb{1}[y_i=1 \land \hat{y}_i(\tau)=0]
$$

Normalize to rates:

$$
\mathrm{TPR}(\tau) = \frac{\mathrm{TP}(\tau)}{P}
\qquad
\mathrm{FPR}(\tau) = \frac{\mathrm{FP}(\tau)}{N}
$$

where $P$ is the number of positives and $N$ the number of negatives.

The ROC curve is the set of points:

$$\{(\mathrm{FPR}(\tau), \mathrm{TPR}(\tau)) : \tau \in \mathbb{R}\}$$
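
As a small worked example of these definitions, the sketch below evaluates $(\mathrm{FPR}, \mathrm{TPR})$ at a handful of thresholds on four hand-picked samples. The data and the threshold grid are assumptions chosen so the arithmetic is easy to follow by eye.

```python
import numpy as np

# Hypothetical toy data (illustrative only)
y_toy = np.array([1, 0, 1, 0])
s_toy = np.array([0.9, 0.7, 0.6, 0.2])
taus = np.array([1.0, 0.8, 0.65, 0.5, 0.0])  # swept from high to low

# pred[k, i] is True when sample i is predicted positive at threshold taus[k]
pred = s_toy[None, :] >= taus[:, None]

tp = np.sum(pred & (y_toy == 1), axis=1)
fp = np.sum(pred & (y_toy == 0), axis=1)
tpr_toy = tp / np.sum(y_toy == 1)
fpr_toy = fp / np.sum(y_toy == 0)

for tau, f, t in zip(taus, fpr_toy, tpr_toy):
    print(f"tau={tau:.2f} -> (FPR, TPR) = ({f:.1f}, {t:.1f})")
```

Lowering the threshold can only add predicted positives, so both rates are non-decreasing along the sweep and the points move from $(0,0)$ to $(1,1)$.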

In [None]:
def confusion_counts_at_threshold(y_true, y_score, threshold):
    y_true = np.asarray(y_true).astype(int)
    y_score = np.asarray(y_score)
    y_pred = (y_score >= threshold).astype(int)

    tp = int(np.sum((y_true == 1) & (y_pred == 1)))
    fp = int(np.sum((y_true == 0) & (y_pred == 1)))
    tn = int(np.sum((y_true == 0) & (y_pred == 0)))
    fn = int(np.sum((y_true == 1) & (y_pred == 0)))
    return tp, fp, tn, fn


for thr in [2.0, 1.0, 0.0, -1.0]:
    tp, fp, tn, fn = confusion_counts_at_threshold(y_true, y_score, threshold=thr)
    tpr = tp / (tp + fn)
    fpr = fp / (fp + tn)
    print(f"thr={thr:>4}: TP={tp:>3} FP={fp:>3} TN={tn:>3} FN={fn:>3} | TPR={tpr:.3f} FPR={fpr:.3f}")

In [None]:
fpr, tpr, thresholds = roc_curve(y_true, y_score)
roc_auc = auc(fpr, tpr)

fig = go.Figure()

fig.add_trace(
    go.Scatter(
        x=fpr,
        y=tpr,
        mode="lines",
        name=f"ROC (AUC = {roc_auc:.3f})",
        line=dict(width=3),
    )
)

fig.add_trace(
    go.Scatter(
        x=[0, 1],
        y=[0, 1],
        mode="lines",
        name="Random (AUC = 0.5)",
        line=dict(dash="dash", color="gray"),
    )
)

# Color ROC points by threshold (skip the first threshold = +inf)
fig.add_trace(
    go.Scatter(
        x=fpr[1:],
        y=tpr[1:],
        mode="markers",
        name="Threshold points",
        marker=dict(
            size=6,
            color=thresholds[1:],
            colorscale="Viridis",
            showscale=True,
            colorbar=dict(title="threshold"),
        ),
    )
)

fig.update_layout(
    title="ROC curve (TPR vs FPR) from a threshold sweep",
    xaxis_title="False Positive Rate (FPR)",
    yaxis_title="True Positive Rate (TPR)",
    width=750,
    height=520,
)

# Make the plot square-ish so the diagonal really looks like 45 degrees
fig.update_yaxes(scaleanchor="x", scaleratio=1)
fig.show()

## 3) AUC = area under the ROC curve (trapezoidal rule)

Once you have $m$ ROC points $(x_k, y_k)$, $k = 0, \dots, m-1$, with $x_k$ = FPR and $y_k$ = TPR sorted by increasing FPR,
the area under the piecewise-linear curve through them is a sum of trapezoids:

$$
\mathrm{AUC} = \sum_{k=0}^{m-2} (x_{k+1} - x_k)\,\frac{y_{k+1}+y_k}{2}
$$

This is exactly what `sklearn.metrics.auc(x, y)` computes (for any curve, not only ROC).
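
The arithmetic is easy to check by hand on a curve with five points. The sketch below uses assumed ROC points from a hypothetical four-sample problem, not values computed elsewhere in this notebook.

```python
import numpy as np

# ROC points of a hypothetical 4-sample example, sorted by increasing FPR
x_pts = np.array([0.0, 0.0, 0.5, 0.5, 1.0])  # FPR
y_pts = np.array([0.0, 0.5, 0.5, 1.0, 1.0])  # TPR

widths = np.diff(x_pts)                      # x_{k+1} - x_k
mean_heights = (y_pts[1:] + y_pts[:-1]) / 2  # (y_{k+1} + y_k) / 2
area = float(np.sum(widths * mean_heights))

# Vertical segments have zero width, so only the two horizontal steps contribute:
# 0.5 * 0.5 + 0.5 * 1.0 = 0.75
print(area)
```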

In [None]:
def auc_trapezoid(x, y):
    '''Compute area under a curve via the trapezoidal rule.

    Parameters
    ----------
    x, y : 1D arrays of same length
        x must be monotonically increasing.
    '''
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.ndim != 1 or y.ndim != 1 or x.shape != y.shape:
        raise ValueError("x and y must be 1D arrays with the same shape")

    dx = np.diff(x)
    if np.any(dx < 0):
        raise ValueError("x must be monotonically increasing")

    return float(np.sum(dx * (y[1:] + y[:-1]) / 2.0))


print(f"sklearn auc(fpr,tpr)          = {auc(fpr, tpr):.10f}")
print(f"NumPy  auc_trapezoid(fpr,tpr) = {auc_trapezoid(fpr, tpr):.10f}")

## 4) From scratch: ROC curve and ROC AUC in NumPy

A convenient way to build the ROC curve:

1. Sort examples by score (descending).
2. Sweep a threshold from high to low (equivalently: move down the sorted list).
3. Track how many positives/negatives we’ve included so far.
4. Record $(\mathrm{FPR},\mathrm{TPR})$ whenever the score changes.

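This traces the same ROC curve as scikit-learn's `roc_curve`. Note that scikit-learn drops some collinear intermediate points by default (`drop_intermediate=True`), so the returned arrays can differ in length even though the curve and its AUC are identical.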
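
Step 4 matters when scores are tied: all samples sharing a score enter together, which produces one diagonal segment instead of separate horizontal and vertical steps. A minimal sketch with assumed toy data, using scikit-learn's `roc_curve` with `drop_intermediate=False` so that every distinct-score point is kept:

```python
import numpy as np
from sklearn.metrics import auc, roc_curve

# Hypothetical toy data: one positive and one negative share the score 0.5
y_tie = np.array([1, 0, 1, 0])
s_tie = np.array([0.9, 0.5, 0.5, 0.1])

fpr_tie, tpr_tie, _ = roc_curve(y_tie, s_tie, drop_intermediate=False)
print(np.c_[fpr_tie, tpr_tie])
# The tied pair moves the curve from (0, 0.5) to (0.5, 1.0) in a single diagonal step

auc_tie = auc(fpr_tie, tpr_tie)
print(auc_tie)  # 0.5 * (0.5 + 1.0) / 2 + 0.5 * 1.0 = 0.875
```

The diagonal segment is what gives a tied positive-negative pair its half credit in the area.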

In [None]:
def roc_curve_numpy(y_true, y_score):
    '''Compute ROC curve (FPR, TPR, thresholds) for binary classification.

    Follows the same construction as scikit-learn's `roc_curve` with
    `drop_intermediate=False`:
    - thresholds are the distinct score values in descending order, preceded by +inf
    - the returned curve starts at (0,0)

    Parameters
    ----------
    y_true : array-like of shape (n,)
        Binary labels in {0,1}
    y_score : array-like of shape (n,)
        Scores where larger means more positive
    '''
    y_true = np.asarray(y_true)
    y_score = np.asarray(y_score)
    if y_true.shape != y_score.shape:
        raise ValueError("y_true and y_score must have the same shape")

    unique = np.unique(y_true)
    if not set(unique.tolist()).issubset({0, 1}):
        raise ValueError("y_true must be binary with values in {0,1}")

    y_true = y_true.astype(int)
    n_pos = int(np.sum(y_true == 1))
    n_neg = int(np.sum(y_true == 0))
    if n_pos == 0 or n_neg == 0:
        raise ValueError("ROC is undefined when only one class is present")

    # Sort by score descending (stable sort so ties are handled consistently)
    order = np.argsort(-y_score, kind="mergesort")
    y_true_sorted = y_true[order]
    y_score_sorted = y_score[order]

    # Indices where score changes (each threshold is a distinct score value)
    distinct = np.where(y_score_sorted[1:] != y_score_sorted[:-1])[0]
    threshold_idxs = np.r_[distinct, y_true_sorted.size - 1]

    tps = np.cumsum(y_true_sorted)[threshold_idxs]
    fps = (threshold_idxs + 1) - tps

    tpr = tps / n_pos
    fpr = fps / n_neg
    thresholds = y_score_sorted[threshold_idxs]

    # Prepend the (0,0) point with threshold = +inf
    tpr = np.r_[0.0, tpr]
    fpr = np.r_[0.0, fpr]
    thresholds = np.r_[np.inf, thresholds]

    return fpr, tpr, thresholds


def roc_auc_score_numpy(y_true, y_score):
    fpr, tpr, _ = roc_curve_numpy(y_true, y_score)
    return auc_trapezoid(fpr, tpr)


fpr_np, tpr_np, thr_np = roc_curve_numpy(y_true, y_score)
auc_np = roc_auc_score_numpy(y_true, y_score)

print(f"sklearn roc_auc_score = {roc_auc_score(y_true, y_score):.10f}")
print(f"NumPy   roc_auc_score = {auc_np:.10f}")

# Sanity check: same AUC
assert np.isclose(auc_np, roc_auc_score(y_true, y_score))

## 5) AUC as a rank statistic (Mann–Whitney / Wilcoxon)

The “pairwise probability” view can be written explicitly as:

$$
\mathrm{AUC}
= \frac{1}{PN}\sum_{i\in\mathcal{P}}\sum_{j\in\mathcal{N}}
\Big(\mathbb{1}[s_i > s_j] + \tfrac{1}{2}\,\mathbb{1}[s_i = s_j]\Big)
$$

There’s an equivalent computation using **ranks**.
Let $r_i$ be the rank of $s_i$ among all scores (rank 1 = smallest score), using **average ranks for ties**.
Then:

$$
\mathrm{AUC} = \frac{\sum_{i\in\mathcal{P}} r_i - \frac{P(P+1)}{2}}{PN}
$$

This is useful because, after a single $O(n\log n)$ sort, it avoids the $O(PN)$ double sum over pairs.
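
A quick hand-checkable instance of the rank formula, sketched with `scipy.stats.rankdata` for the average ranks. SciPy is an assumption here (this notebook does not import it directly, although scikit-learn depends on it), and the toy data is illustrative.

```python
import numpy as np
from scipy.stats import rankdata

# Hypothetical toy data with one tie at 0.5
y_tie = np.array([1, 0, 1, 0])
s_tie = np.array([0.9, 0.5, 0.5, 0.1])

ranks = rankdata(s_tie, method="average")  # [4.0, 2.5, 2.5, 1.0]
P = int(np.sum(y_tie == 1))
N = int(np.sum(y_tie == 0))

# Sum of positive ranks is 4.0 + 2.5 = 6.5, and P(P+1)/2 = 3
auc_from_ranks = (ranks[y_tie == 1].sum() - P * (P + 1) / 2) / (P * N)
print(auc_from_ranks)  # (6.5 - 3) / 4 = 0.875
```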

In [None]:
def rankdata_average_ties(x):
    '''Rank data with average ranks for ties (1 = smallest).'''
    x = np.asarray(x)
    order = np.argsort(x, kind="mergesort")
    x_sorted = x[order]

    ranks_sorted = np.empty(x_sorted.shape[0], dtype=float)
    i = 0
    n = x_sorted.shape[0]
    while i < n:
        j = i
        while j + 1 < n and x_sorted[j + 1] == x_sorted[i]:
            j += 1

        # ranks are 1..n, positions i..j correspond to ranks (i+1)..(j+1)
        avg_rank = 0.5 * ((i + 1) + (j + 1))
        ranks_sorted[i : j + 1] = avg_rank
        i = j + 1

    ranks = np.empty_like(ranks_sorted)
    ranks[order] = ranks_sorted
    return ranks


def roc_auc_score_rank_numpy(y_true, y_score):
    y_true = np.asarray(y_true).astype(int)
    y_score = np.asarray(y_score)
    n_pos = int(np.sum(y_true == 1))
    n_neg = int(np.sum(y_true == 0))
    if n_pos == 0 or n_neg == 0:
        raise ValueError("ROC AUC is undefined when only one class is present")

    ranks = rankdata_average_ties(y_score)
    sum_pos_ranks = float(np.sum(ranks[y_true == 1]))
    return (sum_pos_ranks - n_pos * (n_pos + 1) / 2.0) / (n_pos * n_neg)


auc_rank = roc_auc_score_rank_numpy(y_true, y_score)
print(f"ROC AUC (rank formula) = {auc_rank:.10f}")
print(f"ROC AUC (ROC+trapz)    = {roc_auc_score_numpy(y_true, y_score):.10f}")
assert np.isclose(auc_rank, roc_auc_score_numpy(y_true, y_score))

# Direct pairwise computation: O(P*N), only feasible for small n
pos_idx = np.flatnonzero(y_true == 1)
neg_idx = np.flatnonzero(y_true == 0)

sub_idx = np.r_[
    rng.choice(pos_idx, size=60, replace=False),
    rng.choice(neg_idx, size=60, replace=False),
]
sub_idx = rng.permutation(sub_idx)

y_sub = y_true[sub_idx]
s_sub = y_score[sub_idx]

pos = s_sub[y_sub == 1]
neg = s_sub[y_sub == 0]

diffs = pos[:, None] - neg[None, :]
auc_pairwise = (np.sum(diffs > 0) + 0.5 * np.sum(diffs == 0)) / (pos.size * neg.size)

print(f"ROC AUC (pairwise, small) = {auc_pairwise:.10f}")
assert np.isclose(auc_pairwise, roc_auc_score_rank_numpy(y_sub, s_sub))

### Monotonic transforms don’t change AUC

Because AUC depends only on the ordering of scores, any strictly increasing transform keeps the ROC curve (and AUC) the same.
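
The word *strictly* matters. A transform that is only non-decreasing can merge distinct scores into ties, and ties earn half credit instead of full credit. The sketch below illustrates this with an assumed toy example in which binning the scores lowers the AUC.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# Hypothetical toy data: the two classes are perfectly separated
y_toy = np.array([0, 0, 1, 1])
s_toy = np.array([0.1, 0.4, 0.45, 0.8])

# Binning down to multiples of 0.5 is non-decreasing but NOT strictly increasing
s_binned = np.floor(s_toy * 2) / 2  # [0.0, 0.0, 0.0, 0.5]

auc_raw = roc_auc_score(y_toy, s_toy)
auc_binned = roc_auc_score(y_toy, s_binned)
print(auc_raw)     # 1.0
print(auc_binned)  # 0.75: the positive at 0.45 now ties both negatives
```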

In [None]:
def softplus(z):
    # stable log(1 + exp(z))
    return np.logaddexp(0.0, z)


s = y_score
s_transformed = softplus(3.0 * s)  # strictly increasing

print(f"AUC(s)            = {roc_auc_score_numpy(y_true, s):.10f}")
print(f"AUC(softplus(3s)) = {roc_auc_score_numpy(y_true, s_transformed):.10f}")

## 6) A threshold view: how TPR and FPR change as you move the cutoff

ROC plots *TPR vs FPR*, but sometimes it’s useful to see both rates as explicit functions of the threshold $\tau$.

In [None]:
# Use the NumPy ROC implementation so we have (fpr,tpr,threshold) aligned
fpr_np, tpr_np, thr_np = roc_curve_numpy(y_true, y_score)

# Drop the first threshold (+inf) for plotting
thr_plot = thr_np[1:]
tpr_plot = tpr_np[1:]
fpr_plot = fpr_np[1:]

fig = go.Figure()
fig.add_trace(go.Scatter(x=thr_plot, y=tpr_plot, mode="lines+markers", name="TPR"))
fig.add_trace(go.Scatter(x=thr_plot, y=fpr_plot, mode="lines+markers", name="FPR"))

fig.update_layout(
    title="TPR and FPR as functions of the threshold",
    xaxis_title="threshold τ (higher = stricter)",
    yaxis_title="rate",
    width=800,
    height=450,
)

# thresholds are in descending score order; flip the x-axis so we read left→right as τ decreases
fig.update_xaxes(autorange="reversed")
fig.show()

## 7) Practical usage: ROC AUC for a classifier

In practice, you usually:

1. Train a model with a differentiable loss (log-loss, hinge, etc.).
2. Evaluate **ROC AUC** on validation/test using *scores* (`predict_proba` or `decision_function`).

**Important:** AUC expects *scores*, not hard class labels.
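
To see why this matters, the sketch below compares the AUC computed from continuous scores with the AUC computed from the same scores after thresholding at 0.5. The data is an assumed toy example. Hard labels collapse the ranking to two levels, so most pairs become ties.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# Hypothetical toy data (illustrative only)
y_toy = np.array([0, 0, 0, 1, 1, 1])
p_toy = np.array([0.1, 0.3, 0.6, 0.4, 0.7, 0.9])  # continuous scores

labels_toy = (p_toy >= 0.5).astype(int)  # hard predictions at threshold 0.5

auc_scores = roc_auc_score(y_toy, p_toy)
auc_labels = roc_auc_score(y_toy, labels_toy)
print(f"AUC from scores      = {auc_scores:.4f}")  # 8/9
print(f"AUC from hard labels = {auc_labels:.4f}")  # 6/9
```

With hard labels the ROC curve has a single interior point, so the result reflects one operating threshold rather than the model's full ranking.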

In [None]:
X, y = make_classification(
    n_samples=2500,
    n_features=10,
    n_informative=4,
    n_redundant=0,
    weights=[0.85, 0.15],
    class_sep=1.0,
    random_state=42,
)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42
)

clf = make_pipeline(
    StandardScaler(),
    LogisticRegression(max_iter=2000),
)
clf.fit(X_train, y_train)

# Probability scores for the positive class
y_score_test = clf.predict_proba(X_test)[:, 1]

auc_test = roc_auc_score(y_test, y_score_test)
print(f"Test ROC AUC = {auc_test:.3f}")

fpr_test, tpr_test, _ = roc_curve(y_test, y_score_test)

fig = go.Figure()
fig.add_trace(go.Scatter(x=fpr_test, y=tpr_test, mode="lines", name=f"Test ROC (AUC={auc_test:.3f})"))
fig.add_trace(go.Scatter(x=[0, 1], y=[0, 1], mode="lines", line=dict(dash="dash", color="gray"), name="Random"))
fig.update_layout(
    title="ROC curve on a held-out test set",
    xaxis_title="False Positive Rate (FPR)",
    yaxis_title="True Positive Rate (TPR)",
    width=750,
    height=520,
)
fig.update_yaxes(scaleanchor="x", scaleratio=1)
fig.show()

# Same AUC via our NumPy implementation
print(f"NumPy ROC AUC = {roc_auc_score_numpy(y_test, y_score_test):.10f}")

## 8) Using AUC for optimization (via a smooth surrogate)

The “true” AUC objective is based on an indicator:

$$
\max_w\; \frac{1}{PN}\sum_{i\in\mathcal{P}}\sum_{j\in\mathcal{N}} \mathbb{1}[s_w(x_i) > s_w(x_j)]
$$

This is hard to optimize directly: the indicator is piecewise constant, so its gradient is zero wherever it exists, and the double sum ranges over all $PN$ pairs.

A common trick: replace the indicator with a **pairwise surrogate loss**.
For a linear scoring model $s_w(x)=w^\top x$, define $d_{ij}=w^\top(x_i-x_j)$ and use the pairwise logistic loss:

$$
\ell(d) = \log(1+e^{-d})
$$

Then minimize:

$$
L(w)=\frac{1}{PN}\sum_{i\in\mathcal{P}}\sum_{j\in\mathcal{N}} \ell\big(w^\top(x_i-x_j)\big)
$$

The gradient of one pair is:

$$
\nabla_w\ell\big(w^\top(x_i-x_j)\big)
= -\sigma\big(-w^\top(x_i-x_j)\big)\,(x_i-x_j)
$$

where $\sigma(z)=\frac{1}{1+e^{-z}}$ is the sigmoid.

In code, we approximate the full double sum with **mini-batches of random positive-negative pairs**.
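
Before trusting the gradient formula in a training loop, it can be checked against central finite differences for a single pair. This is a minimal sketch with random assumed inputs; the names `pair_loss`, `pair_grad`, and `rng_check` are hypothetical helpers introduced only for this check.

```python
import numpy as np

rng_check = np.random.default_rng(0)
w_check = rng_check.normal(size=4)
x_pos = rng_check.normal(size=4)  # stands in for a positive example x_i
x_neg = rng_check.normal(size=4)  # stands in for a negative example x_j
delta = x_pos - x_neg


def pair_loss(w_vec):
    # log(1 + exp(-d)) written stably as logaddexp(0, -d), with d = w^T (x_i - x_j)
    return np.logaddexp(0.0, -(w_vec @ delta))


def pair_grad(w_vec):
    d = w_vec @ delta
    sigma_neg_d = 1.0 / (1.0 + np.exp(d))  # sigma(-d)
    return -sigma_neg_d * delta


# Central differences along each coordinate direction
eps = 1e-6
grad_fd = np.array([
    (pair_loss(w_check + eps * e) - pair_loss(w_check - eps * e)) / (2 * eps)
    for e in np.eye(4)
])

max_abs_err = np.max(np.abs(grad_fd - pair_grad(w_check)))
print(max_abs_err)  # should be tiny if the analytic gradient is right
```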

In [None]:
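def sigmoid(z):
    '''Numerically stable logistic function 1 / (1 + exp(-z)).'''
    z = np.asarray(z, dtype=float)
    out = np.empty_like(z)
    pos = z >= 0
    # For z >= 0, exp(-z) <= 1, so the standard form cannot overflow
    out[pos] = 1.0 / (1.0 + np.exp(-z[pos]))
    # For z < 0, use the equivalent form exp(z) / (1 + exp(z)) so exp never sees a large positive input
    exp_z = np.exp(z[~pos])
    out[~pos] = exp_z / (1.0 + exp_z)
    return out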


def standardize_fit(X):
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std = np.where(std == 0, 1.0, std)
    return mean, std


def standardize_transform(X, mean, std):
    return (X - mean) / std


def add_intercept(X):
    return np.concatenate([np.ones((X.shape[0], 1)), X], axis=1)


# Prepare a smaller dataset for fast training curves
X_small, y_small = make_classification(
    n_samples=1400,
    n_features=8,
    n_informative=3,
    n_redundant=0,
    weights=[0.9, 0.1],
    class_sep=0.9,
    flip_y=0.03,
    random_state=7,
)

X_tr, X_va, y_tr, y_va = train_test_split(
    X_small, y_small, test_size=0.35, stratify=y_small, random_state=7
)

mean, std = standardize_fit(X_tr)
X_tr = standardize_transform(X_tr, mean, std)
X_va = standardize_transform(X_va, mean, std)

X_tr = add_intercept(X_tr)
X_va = add_intercept(X_va)


def train_logistic_ce(X, y, X_val, y_val, lr=0.1, reg=1e-3, epochs=30, batch_size=256, seed=0):
    '''Mini-batch SGD on L2-regularized log-loss; returns weights and per-epoch validation AUC.'''
    rng_local = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    auc_hist = []

    for _ in range(epochs):
        idx = rng_local.permutation(n)
        for start in range(0, n, batch_size):
            batch = idx[start : start + batch_size]
            xb = X[batch]
            yb = y[batch]

            logits = xb @ w
            p = sigmoid(logits)

            grad = (xb.T @ (p - yb)) / xb.shape[0] + reg * w
            w -= lr * grad

        auc_hist.append(roc_auc_score_numpy(y_val, X_val @ w))

    return w, np.array(auc_hist)


def train_auc_pairwise(
    X,
    y,
    X_val,
    y_val,
    lr=0.1,
    reg=1e-3,
    epochs=30,
    steps_per_epoch=200,
    batch_pairs=512,
    seed=0,
):
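    '''SGD on the pairwise logistic surrogate over random positive-negative pairs.

    Returns the weights and the validation ROC AUC after each epoch.
    '''
    rng_local = np.random.default_rng(seed)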
    n, d = X.shape
    w = np.zeros(d)

    pos_idx = np.flatnonzero(y == 1)
    neg_idx = np.flatnonzero(y == 0)
    if pos_idx.size == 0 or neg_idx.size == 0:
        raise ValueError("Need both classes for AUC training")

    auc_hist = []

    for _ in range(epochs):
        for _ in range(steps_per_epoch):
            pi = rng_local.choice(pos_idx, size=batch_pairs, replace=True)
            ni = rng_local.choice(neg_idx, size=batch_pairs, replace=True)

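            # The intercept column cancels in the difference (1 - 1 = 0), so the bias
            # weight gets no data gradient; a constant shift does not change rankings.
            diff = X[pi] - X[ni]
            d_scores = diff @ w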

            # loss per pair: log(1 + exp(-d)) = softplus(-d)
            # grad: -sigmoid(-d) * diff
            weight = sigmoid(-d_scores)
            grad = -(weight[:, None] * diff).mean(axis=0) + reg * w
            w -= lr * grad

        auc_hist.append(roc_auc_score_numpy(y_val, X_val @ w))

    return w, np.array(auc_hist)


w_ce, auc_ce = train_logistic_ce(
    X_tr,
    y_tr,
    X_va,
    y_va,
    lr=0.2,
    reg=1e-3,
    epochs=30,
    batch_size=256,
    seed=1,
)
w_auc, auc_auc = train_auc_pairwise(
    X_tr,
    y_tr,
    X_va,
    y_va,
    lr=0.2,
    reg=1e-3,
    epochs=30,
    steps_per_epoch=150,
    batch_pairs=512,
    seed=1,
)

print(f"Final val AUC (cross-entropy training) = {auc_ce[-1]:.3f}")
print(f"Final val AUC (pairwise AUC training)  = {auc_auc[-1]:.3f}")

In [None]:
fig = go.Figure()
fig.add_trace(go.Scatter(y=auc_ce, mode="lines+markers", name="Train w/ log-loss (CE)"))
fig.add_trace(go.Scatter(y=auc_auc, mode="lines+markers", name="Train w/ pairwise AUC surrogate"))

fig.update_layout(
    title="Validation ROC AUC during training",
    xaxis_title="epoch",
    yaxis_title="ROC AUC",
    width=850,
    height=450,
    yaxis=dict(range=[0.45, 1.0]),
)
fig.show()

## 9) Pros, cons, and when to use ROC AUC

**Pros**
- **Threshold-free**: summarizes performance across all possible thresholds.
- **Ranking interpretation**: $\mathrm{AUC} = \mathbb{P}(s(X^+) > s(X^-))$, plus half the probability of a tie, is intuitive.
- **Insensitive to score calibration**: if your ranking is good but probabilities aren’t calibrated, AUC can still be high.
- **Works well for model comparison** when you don’t yet know the operating threshold.
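
The calibration point can be illustrated with a strictly increasing distortion of the predicted probabilities. In the sketch below the toy data is assumed, and the Brier score (`sklearn.metrics.brier_score_loss`) is used only as one possible proxy for probability quality; it is not otherwise part of this notebook.

```python
import numpy as np
from sklearn.metrics import brier_score_loss, roc_auc_score

# Hypothetical toy data (illustrative only)
y_toy = np.array([0, 0, 1, 0, 1, 1])
p_toy = np.array([0.1, 0.3, 0.4, 0.6, 0.7, 0.9])

# p**4 is strictly increasing on [0, 1] but pushes every probability toward 0
p_distorted = p_toy ** 4

auc_orig = roc_auc_score(y_toy, p_toy)
auc_dist = roc_auc_score(y_toy, p_distorted)
brier_orig = brier_score_loss(y_toy, p_toy)
brier_dist = brier_score_loss(y_toy, p_distorted)

print(f"AUC:   original={auc_orig:.4f}  distorted={auc_dist:.4f}")     # identical
print(f"Brier: original={brier_orig:.4f}  distorted={brier_dist:.4f}")  # distorted is worse
```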

**Cons / pitfalls**
- **Not about calibration**: a model can have great AUC and terrible probability estimates.
- **May be misleading under heavy class imbalance** if you care about the *precision* regime (consider **PR AUC**).
- **Averages over regions you may not care about** (e.g., you might only tolerate FPR < 1%).
- **Hard to optimize directly**: the exact AUC objective is non-smooth and pairwise.
- **Depends on label meaning and score direction**: swapping which class counts as positive, or negating the scores, changes AUC to $1-\mathrm{AUC}$.
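
The last pitfall is easy to verify numerically. A minimal sketch on assumed toy data: swapping the label meaning, or negating the scores, maps the AUC to its complement.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# Hypothetical toy data (illustrative only)
y_toy = np.array([1, 1, 0, 0, 0])
s_toy = np.array([0.8, 0.4, 0.6, 0.3, 0.4])

auc_base = roc_auc_score(y_toy, s_toy)         # 0.75
auc_flipped = roc_auc_score(1 - y_toy, s_toy)  # positive class swapped
auc_negated = roc_auc_score(y_toy, -s_toy)     # score direction reversed

print(auc_base, auc_flipped, auc_negated)  # 0.75 0.25 0.25
```

An AUC well below 0.5 on held-out data is therefore often a sign of a swapped label or an inverted score rather than a genuinely bad ranking.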

**Good use cases**
- When you care about **ranking** (who is more likely positive) more than a single fixed decision threshold.
- When you’ll choose the operating threshold later (business constraints, costs, etc.).
- As a model-selection metric for classifiers that output usable scores.

## 10) Exercises

1. Create a highly imbalanced dataset (e.g. 1% positives) and compare ROC AUC vs PR AUC.
2. Show two models with the same ROC AUC but very different TPR when FPR < 0.01.
3. Extend the NumPy ROC code to support **sample weights**.
4. For the pairwise AUC training, try a hinge loss $\ell(d)=\max(0, 1-d)$ and compare curves.

## References

- scikit-learn: `roc_curve`, `roc_auc_score`, `auc`
  - https://scikit-learn.org/stable/modules/model_evaluation.html
  - https://scikit-learn.org/stable/modules/generated/sklearn.metrics.roc_auc_score.html
  - https://scikit-learn.org/stable/modules/generated/sklearn.metrics.roc_curve.html
  - https://scikit-learn.org/stable/modules/generated/sklearn.metrics.auc.html
- Tom Fawcett (2006): *An introduction to ROC analysis*
- Wilcoxon / Mann–Whitney rank-sum statistic interpretation of AUC