# `fbeta_score` (Fβ score)

The **Fβ score** measures the quality of a **binary classifier** by combining **precision** and **recall** into a single number.

It generalizes the F1 score: the parameter **β** sets how much weight **recall** gets relative to **precision** (β > 1 favors recall, β < 1 favors precision, β = 1 recovers F1).

## Learning goals

- Define Fβ from the confusion matrix (math + intuition)
- Implement `fbeta_score` from scratch in NumPy (with edge cases)
- Visualize how **β** and the **decision threshold** change the score (Plotly)
- Use Fβ to *optimize* a simple classifier (threshold tuning + a smooth surrogate)

## Quick import (reference)

```python
from sklearn.metrics import fbeta_score
```


## Prerequisites

- Binary classification with labels in `{0, 1}` (we treat `1` as the **positive** class)
- Confusion matrix terms: TP, FP, FN, TN
- Basic NumPy

This notebook focuses on **binary** Fβ. For multiclass/multilabel, most libraries compute Fβ via one-vs-rest + averaging (micro/macro/weighted).


In [None]:
import os

import numpy as np
import plotly.graph_objects as go
import plotly.io as pio
from plotly.subplots import make_subplots

pio.renderers.default = os.environ.get("PLOTLY_RENDERER", "notebook")
pio.templates.default = "plotly_white"

rng = np.random.default_rng(0)
np.set_printoptions(precision=4, suppress=True)


## Confusion matrix (binary)

Let:

- true labels: \(y \in \{0, 1\}\)
- predicted labels: \(\hat{y} \in \{0, 1\}\)
- `1` is the **positive** class

|              | \(\hat{y}=1\) | \(\hat{y}=0\) |
|--------------|--------------|--------------|
| \(y=1\)      | TP           | FN           |
| \(y=0\)      | FP           | TN           |

Important: **Fβ does not use TN**. That’s a feature (when you care mostly about the positive class), but also a limitation.
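
To make the point about TN concrete, here is a minimal sketch that holds TP, FP, and FN fixed while growing TN. It uses the count-based Fβ formula that this notebook's helpers implement. The function names `fbeta_from_counts` and `accuracy_from_counts` and the counts themselves are illustrative assumptions, not part of any library.

```python
def fbeta_from_counts(tp, fp, fn, beta=1.0):
    """F-beta from confusion counts; TN is deliberately not an argument."""
    beta2 = beta ** 2
    denom = (1.0 + beta2) * tp + beta2 * fn + fp
    return (1.0 + beta2) * tp / denom if denom > 0 else 0.0


def accuracy_from_counts(tp, fp, fn, tn):
    """Plain accuracy, which does depend on TN."""
    return (tp + tn) / (tp + fp + fn + tn)


# Hypothetical counts: only TN changes between iterations.
tp, fp, fn = 8, 4, 2
for tn in (10, 1_000, 100_000):
    f1 = fbeta_from_counts(tp, fp, fn, beta=1.0)
    acc = accuracy_from_counts(tp, fp, fn, tn)
    print(f"TN={tn:>6}: F1={f1:.4f}, accuracy={acc:.4f}")
```

F1 stays the same in every row while accuracy climbs toward 1, which is why accuracy can look excellent on imbalanced data even when the positive class is handled poorly.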


In [None]:
def sigmoid(z):
    """Logistic function; inputs are clipped so np.exp cannot overflow."""
    z = np.asarray(z, dtype=float)
    z = np.clip(z, -50.0, 50.0)
    return 1.0 / (1.0 + np.exp(-z))


def safe_divide(numer, denom, *, zero_division=0.0):
    """Elementwise numer/denom with a configurable value when denom == 0."""
    numer = np.asarray(numer, dtype=float)
    denom = np.asarray(denom, dtype=float)
    out = np.full_like(numer + denom, fill_value=float(zero_division), dtype=float)
    np.divide(numer, denom, out=out, where=denom != 0)
    return out


def confusion_counts_binary(y_true, y_pred, *, pos_label=1):
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    if y_true.shape != y_pred.shape:
        raise ValueError(f"shape mismatch: y_true{y_true.shape} vs y_pred{y_pred.shape}")

    y_true_pos = y_true == pos_label
    y_pred_pos = y_pred == pos_label

    tp = int(np.sum(y_true_pos & y_pred_pos))
    fp = int(np.sum(~y_true_pos & y_pred_pos))
    fn = int(np.sum(y_true_pos & ~y_pred_pos))
    tn = int(np.sum(~y_true_pos & ~y_pred_pos))
    return tp, fp, fn, tn


def precision_recall_fbeta_from_counts(tp, fp, fn, *, beta=1.0, zero_division=0.0):
    if beta <= 0:
        raise ValueError("beta must be > 0")
    beta2 = beta**2

    precision = float(safe_divide(tp, tp + fp, zero_division=zero_division))
    recall = float(safe_divide(tp, tp + fn, zero_division=zero_division))

    fbeta = float(
        safe_divide(
            (1.0 + beta2) * tp,
            (1.0 + beta2) * tp + beta2 * fn + fp,
            zero_division=zero_division,
        )
    )
    return precision, recall, fbeta


def precision_recall_fbeta(y_true, y_pred, *, beta=1.0, pos_label=1, zero_division=0.0):
    tp, fp, fn, tn = confusion_counts_binary(y_true, y_pred, pos_label=pos_label)
    return precision_recall_fbeta_from_counts(tp, fp, fn, beta=beta, zero_division=zero_division)


def fbeta_score_numpy(y_true, y_pred, *, beta=1.0, pos_label=1, zero_division=0.0):
    _, _, fbeta = precision_recall_fbeta(
        y_true, y_pred, beta=beta, pos_label=pos_label, zero_division=zero_division
    )
    return fbeta


## Precision, recall, and Fβ (math)

Precision and recall are:

$$
P = \frac{TP}{TP + FP}, \qquad R = \frac{TP}{TP + FN}
$$

The **Fβ score** is a weighted harmonic mean of \(P\) and \(R\):

$$
F_\beta = \frac{(1+\beta^2)PR}{\beta^2 P + R}
$$

A very useful confusion-matrix form is:

$$
F_\beta = \frac{(1+\beta^2)\,TP}{(1+\beta^2)\,TP + \beta^2\,FN + FP}
$$
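
For readers who want the intermediate algebra, here is one way to get from the precision/recall form to the count form. This is a sketch that assumes \(TP > 0\) so that the common factor can be cancelled. Substitute the definitions of \(P\) and \(R\), then multiply the numerator and denominator by \((TP+FP)(TP+FN)\):

$$
\begin{aligned}
F_\beta
&= \frac{(1+\beta^2)\,\dfrac{TP}{TP+FP}\cdot\dfrac{TP}{TP+FN}}{\beta^2\,\dfrac{TP}{TP+FP} + \dfrac{TP}{TP+FN}} \\
&= \frac{(1+\beta^2)\,TP^2}{\beta^2\,TP\,(TP+FN) + TP\,(TP+FP)} \\
&= \frac{(1+\beta^2)\,TP}{\beta^2\,(TP+FN) + (TP+FP)}
= \frac{(1+\beta^2)\,TP}{(1+\beta^2)\,TP + \beta^2\,FN + FP}
\end{aligned}
$$

When \(TP = 0\) but \(FP + FN > 0\), the precision/recall form evaluates to \(0/0\), whereas the count form gives a well-defined 0. That is one reason implementations tend to work from counts.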

**How β changes the trade-off**

- \(\beta = 1\) gives **F1** (precision and recall weighted equally)
- \(\beta > 1\) favors **recall** (it upweights FN by \(\beta^2\))
- \(\beta < 1\) favors **precision**
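
As a quick numeric illustration of the bullets above, the following sketch evaluates the harmonic-mean form for several values of β. The helper name `fbeta_from_pr` and the counts are hypothetical, chosen so that precision and recall differ.

```python
def fbeta_from_pr(precision, recall, beta):
    """Weighted harmonic-mean form of F-beta (assumes precision and recall are not both 0)."""
    beta2 = beta ** 2
    return (1.0 + beta2) * precision * recall / (beta2 * precision + recall)


# Hypothetical counts chosen so that precision != recall.
tp, fp, fn = 6, 2, 4
precision = tp / (tp + fp)  # 0.75
recall = tp / (tp + fn)     # 0.60

for beta in (0.01, 0.5, 1.0, 2.0, 100.0):
    print(f"beta={beta:>6}: F_beta={fbeta_from_pr(precision, recall, beta):.4f}")
```

With these numbers the score slides from roughly the precision (0.75) at very small β, through F1 = 2/3 at β = 1, toward the recall (0.60) at very large β.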


In [None]:
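# Note: this toy example has TP=2, FP=1, FN=1, so precision == recall == 2/3
# and Fβ equals 2/3 for every β; the loop mainly sanity-checks the implementation.
y_true = np.array([1, 1, 1, 0, 0, 0, 0])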
y_pred = np.array([1, 0, 1, 1, 0, 0, 0])

for beta in [0.5, 1.0, 2.0]:
    p, r, f = precision_recall_fbeta(y_true, y_pred, beta=beta)
    print(f"beta={beta:>3}: precision={p:.3f}, recall={r:.3f}, Fbeta={f:.3f}")

try:
    from sklearn.metrics import fbeta_score as skl_fbeta_score

    print("\nscikit-learn check:")
    for beta in [0.5, 1.0, 2.0]:
        print(f"beta={beta:>3}: sklearn={skl_fbeta_score(y_true, y_pred, beta=beta):.3f}")
except ImportError as e:
    print("\n(scikit-learn not available for comparison)")
    print("Reason:", repr(e))


## Scores vs labels: the role of the decision threshold

Many models output a **score** or **probability** \(s(x) \in [0,1]\), then convert it to a label using a threshold \(t\):

$$
\hat{y}(x) = \mathbb{1}[s(x) \ge t]
$$

- Increasing \(t\) yields fewer predicted positives, so **recall** can only stay the same or decrease, while **precision** usually (but not always) increases.
- Since **Fβ depends on TP/FP/FN**, it depends on the choice of \(t\).

A very common workflow is:

1. Train a model with a differentiable loss (e.g., log loss)
2. Choose \(t\) on a validation set to maximize your target metric (e.g., F2)
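
To see the thresholding rule in action before building the full curve, here is a small sketch on hand-made data. The labels, scores, and the helper `counts_at_threshold` are hypothetical and exist only for this illustration.

```python
import numpy as np

# Hypothetical labels and scores, for illustration only.
y_true = np.array([0, 0, 1, 0, 1, 1, 0, 1])
y_score = np.array([0.10, 0.30, 0.35, 0.40, 0.60, 0.70, 0.80, 0.90])


def counts_at_threshold(y_true, y_score, t):
    """Return (TP, FP, FN) for the rule: predict positive when score >= t."""
    y_pred = y_score >= t
    y_pos = y_true == 1
    tp = int(np.sum(y_pred & y_pos))
    fp = int(np.sum(y_pred & ~y_pos))
    fn = int(np.sum(~y_pred & y_pos))
    return tp, fp, fn


for t in (0.2, 0.5, 0.75):
    tp, fp, fn = counts_at_threshold(y_true, y_score, t)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    print(f"t={t:.2f}: TP={tp}, FP={fp}, FN={fn}, precision={precision:.3f}, recall={recall:.3f}")
```

On this data recall falls at each step (1.00, 0.75, 0.25), while precision first rises and then drops (0.571, 0.750, 0.500). The drop comes from the negative example scored 0.80, a reminder that precision is not guaranteed to be monotone in the threshold.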


In [None]:
def pr_fbeta_curve(y_true, y_score, *, beta=1.0, thresholds=None, zero_division=0.0):
    y_true = np.asarray(y_true).astype(int)
    y_score = np.asarray(y_score, dtype=float)
    if thresholds is None:
        thresholds = np.linspace(0.0, 1.0, 301)
    thresholds = np.asarray(thresholds, dtype=float)

    pred_pos = y_score[:, None] >= thresholds[None, :]
    y_pos = (y_true == 1)[:, None]

    tp = np.sum(pred_pos & y_pos, axis=0)
    fp = np.sum(pred_pos & ~y_pos, axis=0)
    fn = np.sum(~pred_pos & y_pos, axis=0)

    precision = safe_divide(tp, tp + fp, zero_division=zero_division)
    recall = safe_divide(tp, tp + fn, zero_division=zero_division)

    beta2 = beta**2
    fbeta = safe_divide(
        (1.0 + beta2) * tp,
        (1.0 + beta2) * tp + beta2 * fn + fp,
        zero_division=zero_division,
    )

    return thresholds, precision, recall, fbeta, tp, fp, fn


def best_threshold_for_fbeta(y_true, y_score, *, beta=1.0, thresholds=None, zero_division=0.0):
    thresholds, precision, recall, fbeta, tp, fp, fn = pr_fbeta_curve(
        y_true, y_score, beta=beta, thresholds=thresholds, zero_division=zero_division
    )
    i = int(np.nanargmax(fbeta))
    return {
        "threshold": float(thresholds[i]),
        "fbeta": float(fbeta[i]),
        "precision": float(precision[i]),
        "recall": float(recall[i]),
        "tp": int(tp[i]),
        "fp": int(fp[i]),
        "fn": int(fn[i]),
        "index": i,
        "thresholds": thresholds,
        "precision_curve": precision,
        "recall_curve": recall,
        "fbeta_curve": fbeta,
    }


In [None]:
# Toy example: probability-like scores with overlap + class imbalance
n_pos, n_neg = 180, 820
y_true = np.r_[np.ones(n_pos, dtype=int), np.zeros(n_neg, dtype=int)]
y_score = np.r_[rng.beta(5, 2, size=n_pos), rng.beta(2, 5, size=n_neg)]

perm = rng.permutation(len(y_true))
y_true, y_score = y_true[perm], y_score[perm]

thresholds = np.linspace(0.0, 1.0, 301)
_, precision, recall, _, _, _, _ = pr_fbeta_curve(y_true, y_score, beta=1.0, thresholds=thresholds)

best_05 = best_threshold_for_fbeta(y_true, y_score, beta=0.5, thresholds=thresholds)
best_1 = best_threshold_for_fbeta(y_true, y_score, beta=1.0, thresholds=thresholds)
best_2 = best_threshold_for_fbeta(y_true, y_score, beta=2.0, thresholds=thresholds)

best_05["threshold"], best_1["threshold"], best_2["threshold"]


In [None]:
fig = make_subplots(
    rows=2,
    cols=1,
    shared_xaxes=True,
    vertical_spacing=0.12,
    subplot_titles=("Precision & recall vs threshold", "Fβ vs threshold"),
)

fig.add_trace(go.Scatter(x=thresholds, y=precision, mode="lines", name="precision"), row=1, col=1)
fig.add_trace(go.Scatter(x=thresholds, y=recall, mode="lines", name="recall"), row=1, col=1)

for beta, best in [(0.5, best_05), (1.0, best_1), (2.0, best_2)]:
    fig.add_trace(
        go.Scatter(
            x=best["thresholds"],
            y=best["fbeta_curve"],
            mode="lines",
            name=f"F{beta:g}",
        ),
        row=2,
        col=1,
    )
    fig.add_vline(
        x=best["threshold"],
        line_width=1,
        line_dash="dot",
        line_color="gray",
        row="all",
        col=1,
    )

fig.update_xaxes(title_text="threshold t", row=2, col=1)
fig.update_yaxes(title_text="value", row=1, col=1)
fig.update_yaxes(title_text="Fβ", row=2, col=1)
fig.update_layout(height=700, legend_orientation="h")
fig.show()


### Precision–recall curve + iso-Fβ lines

As you sweep the threshold \(t\), you trace out a curve in (recall, precision) space.

For a fixed \(\beta\), you can also draw **iso-Fβ** curves. Points on higher iso-curves have higher Fβ.
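
The shape of an iso-curve follows from solving the Fβ definition for recall at a fixed score value \(F\). A short derivation, using the same symbols as the math section:

$$
F = \frac{(1+\beta^2)PR}{\beta^2 P + R}
\;\Longrightarrow\;
F\beta^2 P + F R = (1+\beta^2) P R
\;\Longrightarrow\;
R = \frac{F\,\beta^2\,P}{(1+\beta^2)\,P - F}
$$

The expression is only meaningful where the denominator is positive, that is for \(P > F/(1+\beta^2)\), and any points with \(R\) outside \([0, 1]\) should be discarded when plotting.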


In [None]:
fig = go.Figure()

fig.add_trace(
    go.Scatter(
        x=recall,
        y=precision,
        mode="markers+lines",
        marker=dict(
            color=thresholds,
            colorscale="Viridis",
            showscale=True,
            colorbar=dict(title="threshold"),
            size=6,
        ),
        name="PR (threshold sweep)",
    )
)

# Iso-Fβ lines for beta=2
beta_iso = 2.0
beta2 = beta_iso**2
p_grid = np.linspace(1e-3, 1.0, 400)
for F in [0.2, 0.4, 0.6, 0.8]:
    denom = (1.0 + beta2) * p_grid - F
    r = (F * beta2 * p_grid) / denom
    mask = (denom > 0) & (r >= 0) & (r <= 1)
    fig.add_trace(
        go.Scatter(
            x=r[mask],
            y=p_grid[mask],
            mode="lines",
            line=dict(width=1, dash="dot"),
            name=f"iso-F{beta_iso:g}={F}",
            opacity=0.8,
        )
    )

fig.update_layout(
    title="Precision–Recall curve with iso-F2 lines",
    xaxis_title="recall",
    yaxis_title="precision",
    height=600,
)
fig.update_xaxes(range=[0, 1])
fig.update_yaxes(range=[0, 1])
fig.show()


## Using Fβ to optimize a simple classifier

Two practical ways to “optimize for Fβ” are:

1. **Train** a model with a standard loss (e.g., log loss), then **tune the threshold** \(t\) to maximize Fβ on a validation set.
2. Optimize a **smooth surrogate** of Fβ (use probabilities instead of hard labels) with gradient-based methods.

We’ll do both with a from-scratch logistic regression.


In [None]:
def find_bias_for_target_rate(logits, target_rate, *, iters=60):
    """Bisection on the bias b so that mean(sigmoid(logits + b)) is close to target_rate."""
    lo, hi = -20.0, 20.0
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        rate = sigmoid(logits + mid).mean()
        if rate > target_rate:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2.0


def make_synthetic_logistic_data(n=4000, *, target_pos_rate=0.15, seed=0):
    rng_local = np.random.default_rng(seed)
    X = rng_local.normal(size=(n, 2))
    true_w = np.array([2.0, -1.2])
    base_logits = X @ true_w
    true_b = find_bias_for_target_rate(base_logits, target_pos_rate)
    probs = sigmoid(base_logits + true_b)
    y = rng_local.binomial(1, probs).astype(int)
    return X, y, probs, (true_w, true_b)


def train_val_test_split(X, y, *, ratios=(0.6, 0.2, 0.2), seed=0):
    if not np.isclose(sum(ratios), 1.0):
        raise ValueError("ratios must sum to 1")
    rng_local = np.random.default_rng(seed)
    n = X.shape[0]
    perm = rng_local.permutation(n)
    X, y = X[perm], y[perm]

    n_train = int(ratios[0] * n)
    n_val = int(ratios[1] * n)
    X_train, y_train = X[:n_train], y[:n_train]
    X_val, y_val = X[n_train : n_train + n_val], y[n_train : n_train + n_val]
    X_test, y_test = X[n_train + n_val :], y[n_train + n_val :]
    return X_train, y_train, X_val, y_val, X_test, y_test


X, y, _, (true_w, true_b) = make_synthetic_logistic_data(n=4000, target_pos_rate=0.12, seed=1)
X_train, y_train, X_val, y_val, X_test, y_test = train_val_test_split(X, y, seed=1)

print("positive rate (train/val/test):", y_train.mean(), y_val.mean(), y_test.mean())


In [None]:
def add_intercept(X):
    X = np.asarray(X, dtype=float)
    return np.c_[np.ones((X.shape[0], 1)), X]


def log_loss_and_grad(w, Xb, y, *, l2=0.0, eps=1e-12):
    z = Xb @ w
    p = sigmoid(z)

    y = y.astype(float)
    loss = -np.mean(y * np.log(p + eps) + (1.0 - y) * np.log(1.0 - p + eps))
    if l2:
        loss += 0.5 * l2 * np.sum(w[1:] ** 2)

    grad = (Xb.T @ (p - y)) / Xb.shape[0]
    if l2:
        grad[1:] += l2 * w[1:]
    return loss, grad


def fit_logistic_regression_ce(X, y, *, lr=0.2, steps=1500, l2=0.0, seed=0):
    rng_local = np.random.default_rng(seed)
    Xb = add_intercept(X)
    w = rng_local.normal(scale=0.01, size=Xb.shape[1])

    history = []
    for step in range(steps):
        loss, grad = log_loss_and_grad(w, Xb, y, l2=l2)
        w -= lr * grad
        if step % 20 == 0 or step == steps - 1:
            history.append((step, loss))
    return w, np.array(history)


w_ce, hist_ce = fit_logistic_regression_ce(X_train, y_train, lr=0.3, steps=1200, l2=1e-3, seed=1)
hist_ce[:5], hist_ce[-5:]


In [None]:
fig = go.Figure()
fig.add_trace(go.Scatter(x=hist_ce[:, 0], y=hist_ce[:, 1], mode="lines", name="log loss"))
fig.update_layout(title="Logistic regression training (cross-entropy)", xaxis_title="step", yaxis_title="log loss")
fig.show()


In [None]:
def predict_proba(w, X):
    Xb = add_intercept(X)
    return sigmoid(Xb @ w)


def evaluate_thresholded(y_true, y_score, *, threshold, beta=1.0):
    y_pred = (y_score >= threshold).astype(int)
    tp, fp, fn, tn = confusion_counts_binary(y_true, y_pred)
    precision, recall, fbeta = precision_recall_fbeta_from_counts(tp, fp, fn, beta=beta)
    return {
        "threshold": float(threshold),
        "beta": float(beta),
        "precision": float(precision),
        "recall": float(recall),
        "fbeta": float(fbeta),
        "tp": int(tp),
        "fp": int(fp),
        "fn": int(fn),
        "tn": int(tn),
    }


val_scores_ce = predict_proba(w_ce, X_val)
test_scores_ce = predict_proba(w_ce, X_test)

betas = [0.5, 1.0, 2.0]
rows = []
for beta in betas:
    best = best_threshold_for_fbeta(y_val, val_scores_ce, beta=beta, thresholds=np.linspace(0, 1, 501))
    test_eval = evaluate_thresholded(y_test, test_scores_ce, threshold=best["threshold"], beta=beta)
    rows.append({
        "beta": beta,
        "best_val_threshold": best["threshold"],
        "val_fbeta": best["fbeta"],
        "test_precision": test_eval["precision"],
        "test_recall": test_eval["recall"],
        "test_fbeta": test_eval["fbeta"],
    })

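# An expression inside try/except is not auto-displayed by the notebook, so print explicitly.
try:
    import pandas as pd

    print(pd.DataFrame(rows).to_string(index=False))
except ImportError:
    for row in rows:
        print(row)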


Notice how the **optimal threshold changes with β**.

- With \(\beta > 1\), you typically pick a **lower** threshold to increase recall.
- With \(\beta < 1\), you often pick a **higher** threshold to increase precision.


In [None]:
thresholds = np.linspace(0.0, 1.0, 501)

fig = go.Figure()
for beta in [0.5, 1.0, 2.0]:
    best = best_threshold_for_fbeta(y_val, val_scores_ce, beta=beta, thresholds=thresholds)
    fig.add_trace(go.Scatter(x=thresholds, y=best["fbeta_curve"], mode="lines", name=f"val F{beta:g}"))
    fig.add_trace(
        go.Scatter(
            x=[best["threshold"]],
            y=[best["fbeta"]],
            mode="markers",
            marker=dict(size=10),
            name=f"best t for F{beta:g}",
        )
    )

fig.update_layout(
    title="Validation: Fβ vs threshold (same model, different β)",
    xaxis_title="threshold",
    yaxis_title="Fβ",
    height=500,
)
fig.show()


## Direct optimization (optional): a differentiable “soft Fβ” surrogate

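Hard Fβ uses **thresholded** predictions \(\hat{y} \in \{0,1\}\), so it is piecewise constant in the model parameters: its gradient is zero almost everywhere and undefined at the jumps, which makes it unusable for gradient-based training.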

A common trick is to replace \(\hat{y}\) with the model probability \(p\in[0,1]\) and define “soft” counts:

$$
\widetilde{TP} = \sum_i y_i p_i, \quad
\widetilde{FP} = \sum_i (1-y_i)p_i, \quad
\widetilde{FN} = \sum_i y_i (1-p_i)
$$

Then plug them into the same formula:

$$
\widetilde{F}_\beta = \frac{(1+\beta^2)\widetilde{TP}}{(1+\beta^2)\widetilde{TP} + \beta^2\widetilde{FN} + \widetilde{FP}}
$$

This surrogate is smooth in \(p\), so we can do gradient ascent on a logistic regression model.

Caveat: optimizing \(\widetilde{F}_\beta\) is **not identical** to optimizing the hard-thresholded Fβ, but it can be a useful demonstration (and sometimes a practical heuristic).
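
The gap between the soft and hard scores is easy to see on a tiny example. The sketch below uses hypothetical labels and probabilities, and the helpers `soft_fbeta` and `hard_fbeta` are illustrative names rather than library functions. It assumes at least one positive label, which keeps the denominator positive.

```python
import numpy as np


def soft_fbeta(y, p, beta=2.0):
    """Soft F-beta: the count-based formula with probabilities in place of hard labels."""
    y = np.asarray(y, dtype=float)
    p = np.asarray(p, dtype=float)
    beta2 = beta ** 2
    tp = np.sum(y * p)
    fp = np.sum((1.0 - y) * p)
    fn = np.sum(y * (1.0 - p))
    # Assumes at least one positive label, so the denominator is > 0.
    return (1.0 + beta2) * tp / ((1.0 + beta2) * tp + beta2 * fn + fp)


def hard_fbeta(y, p, beta=2.0, threshold=0.5):
    """Hard F-beta: the same formula applied to 0/1 predictions after thresholding."""
    y_pred = (np.asarray(p, dtype=float) >= threshold).astype(float)
    return soft_fbeta(y, y_pred, beta=beta)


# Hypothetical labels and predicted probabilities, for illustration only.
y = np.array([1, 1, 0, 0])
p = np.array([0.9, 0.6, 0.4, 0.1])

print("soft F2:", soft_fbeta(y, p))
print("hard F2 at t=0.5:", hard_fbeta(y, p))

# Central finite difference: nudging one probability changes the soft score smoothly.
h = 1e-6
bump = np.array([h, 0.0, 0.0, 0.0])
slope = (soft_fbeta(y, p + bump) - soft_fbeta(y, p - bump)) / (2 * h)
print("d(soft F2)/dp_0 approx:", slope)
```

Here the hard score is perfect (every example lands on the correct side of 0.5) while the soft score is only 0.75, because the soft version also penalizes under-confident probabilities. The nonzero finite-difference slope shows that the soft score responds to small changes in a probability, which is what gradient ascent relies on.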


In [None]:
def soft_fbeta_and_grad(w, Xb, y, *, beta=2.0, l2=0.0, eps=1e-12):
    """Return (soft_fbeta, grad_w) for a logistic regression model p = sigmoid(Xb @ w)."""
    if beta <= 0:
        raise ValueError("beta must be > 0")
    beta2 = beta**2

    z = Xb @ w
    p = sigmoid(z)
    y = y.astype(float)

    tp = np.sum(y * p)
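    sp = np.sum(p)  # soft TP + soft FP (total predicted-positive mass)
    pos = np.sum(y)  # soft TP + soft FN (number of actual positives)
    # (1+β²)TP + β²FN + FP = (TP + FP) + β²(TP + FN) = sp + β² * pos
    denom = sp + beta2 * pos + eps
    f = (1.0 + beta2) * tp / denom

    # dF/dp_i by the quotient rule, using d(tp)/dp_i = y_i and d(denom)/dp_i = 1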
    dF_dp = (1.0 + beta2) * (y * denom - tp) / (denom**2)
    dF_dz = dF_dp * p * (1.0 - p)
    grad = Xb.T @ dF_dz

    if l2:
        f -= 0.5 * l2 * np.sum(w[1:] ** 2)
        grad[1:] -= l2 * w[1:]

    return float(f), grad


def fit_logistic_regression_soft_fbeta(X, y, *, beta=2.0, lr=1e-3, steps=4000, l2=1e-3, seed=0):
    rng_local = np.random.default_rng(seed)
    Xb = add_intercept(X)
    w = rng_local.normal(scale=0.01, size=Xb.shape[1])

    history = []
    for step in range(steps):
        f, grad = soft_fbeta_and_grad(w, Xb, y, beta=beta, l2=l2)
        w += lr * grad
        if step % 50 == 0 or step == steps - 1:
            history.append((step, f))
    return w, np.array(history)


beta_opt = 2.0
w_soft, hist_soft = fit_logistic_regression_soft_fbeta(
    X_train, y_train, beta=beta_opt, lr=2e-3, steps=6000, l2=1e-3, seed=2
)
hist_soft[:5], hist_soft[-5:]


In [None]:
fig = go.Figure()
fig.add_trace(go.Scatter(x=hist_soft[:, 0], y=hist_soft[:, 1], mode="lines", name=f"soft F{beta_opt:g}"))
fig.update_layout(
    title=f"Logistic regression training (maximize soft F{beta_opt:g})",
    xaxis_title="step",
    yaxis_title=f"soft F{beta_opt:g}",
)
fig.show()


In [None]:
val_scores_soft = predict_proba(w_soft, X_val)
test_scores_soft = predict_proba(w_soft, X_test)

best_ce = best_threshold_for_fbeta(y_val, val_scores_ce, beta=beta_opt, thresholds=np.linspace(0, 1, 501))
best_soft = best_threshold_for_fbeta(y_val, val_scores_soft, beta=beta_opt, thresholds=np.linspace(0, 1, 501))

test_ce = evaluate_thresholded(y_test, test_scores_ce, threshold=best_ce["threshold"], beta=beta_opt)
test_soft = evaluate_thresholded(y_test, test_scores_soft, threshold=best_soft["threshold"], beta=beta_opt)

rows = [
    {
        "model": "cross-entropy + threshold tuning",
        "val_best_threshold": best_ce["threshold"],
        "val_Fbeta": best_ce["fbeta"],
        "test_precision": test_ce["precision"],
        "test_recall": test_ce["recall"],
        "test_Fbeta": test_ce["fbeta"],
    },
    {
        "model": f"maximize soft F{beta_opt:g} + threshold tuning",
        "val_best_threshold": best_soft["threshold"],
        "val_Fbeta": best_soft["fbeta"],
        "test_precision": test_soft["precision"],
        "test_recall": test_soft["recall"],
        "test_Fbeta": test_soft["fbeta"],
    },
]

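# An expression inside try/except is not auto-displayed by the notebook, so print explicitly.
try:
    import pandas as pd

    print(pd.DataFrame(rows).to_string(index=False))
except ImportError:
    for row in rows:
        print(row)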


## Pros / cons and when to use Fβ

### Pros

- **Focuses on the positive class** (TP/FP/FN) — useful for imbalanced problems where TN is less informative
- **Adjustable trade-off** via \(\beta\): pick recall-heavy (\(\beta>1\)) or precision-heavy (\(\beta<1\))
- **Single number** summarizing the precision–recall trade-off (easy to compare models)

### Cons

- **Threshold-dependent**: you must choose a threshold (or a policy) to get a meaningful number
- **Not a proper scoring rule**: it does not reward well-calibrated probabilities the way log loss / Brier score do
- **Ignores TN**: can be misleading when TN matters (e.g., overall error rate is critical)
- **Not smooth** in the hard form (can’t be directly optimized with gradient descent without surrogates)

### When it’s a good fit

- Information retrieval / search / recommendation (relevant items are “positive”)
- Medical screening or safety monitoring (often \(\beta>1\) to favor recall)
- Fraud / abuse detection (often \(\beta<1\) if false positives are expensive)


## Pitfalls + diagnostics

- Pick \(\beta\) based on **real costs** (FN vs FP), not after looking at the test set.
- Always tune thresholds on a **validation** set (or use cross-validation).
- For highly imbalanced data, look at the full **precision–recall curve**; a single Fβ can hide failure modes.
- Be explicit about the **positive class** (`pos_label`).
- Decide how to handle **zero-division** cases (no predicted positives, or no actual positives).
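
The zero-division pitfall is worth seeing in isolation. In the count-based form, Fβ is only undefined when TP, FP, and FN are all zero. The sketch below makes that explicit; `fbeta_with_zero_division` is a hypothetical helper. Its `zero_division` argument mirrors the one used by the NumPy helpers earlier in this notebook (scikit-learn's `fbeta_score` exposes a parameter of the same name).

```python
def fbeta_with_zero_division(tp, fp, fn, beta=1.0, zero_division=0.0):
    """F-beta from counts, returning `zero_division` when the score is 0/0."""
    beta2 = beta ** 2
    denom = (1.0 + beta2) * tp + beta2 * fn + fp
    if denom == 0:
        # No actual positives and no predicted positives: the score is undefined.
        return float(zero_division)
    return (1.0 + beta2) * tp / denom


# Case 1: no actual positives and no predicted positives -> 0/0, caller decides.
print(fbeta_with_zero_division(tp=0, fp=0, fn=0, zero_division=0.0))
print(fbeta_with_zero_division(tp=0, fp=0, fn=0, zero_division=1.0))

# Case 2: positives exist but none were predicted -> well defined and equal to 0.
print(fbeta_with_zero_division(tp=0, fp=0, fn=5, zero_division=1.0))
```

Whichever convention you choose for the undefined case, apply it consistently across models and folds. Otherwise averaged scores are not comparable.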


## Exercises

1. Implement macro-averaged Fβ for multiclass classification via one-vs-rest.
2. For a fixed model, show how the best threshold changes when \(\beta\in\{0.25,0.5,1,2,4\}\).
3. Compare optimizing (a) log loss + threshold tuning vs (b) a soft-Fβ surrogate on a more extreme imbalance (e.g., 1% positives).


## References

- scikit-learn `fbeta_score`: https://scikit-learn.org/stable/modules/generated/sklearn.metrics.fbeta_score.html
- scikit-learn metrics overview: https://scikit-learn.org/stable/modules/model_evaluation.html
- C. J. van Rijsbergen, *Information Retrieval*, 2nd ed., Butterworths, 1979 (discussion of the \(F_\beta\) measure)
