devlow-bench: percentile-based comparison and run retries by wbinnssmith · Pull Request #93950 · vercel/next.js

wbinnssmith · 2026-05-19T22:44:54Z

A few changes to devlow-bench so the comparison output is easier to read and so a couple of flaky page-load runs don't quietly poison the stats.

compare

Show p50 / p90 / p99 instead of mean / p50 / p90, with Δ p50 and a single Mann–Whitney p-value (Welch's t-test and Δ mean are gone; bad runs were dragging the mean).
Hoist the modal sample count into a per-metric "n = 7" banner, so only rows whose sample count differs from it carry an explicit (n).
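For readers who want to see what the new columns compute, here is a minimal sketch. It is not the devlow-bench implementation: the function names are hypothetical, the percentile uses the nearest-rank rule (the tool may interpolate differently), and the Mann–Whitney p-value uses the two-sided normal approximation with tie correction and no continuity correction, which is only rough at n = 7.

```typescript
// Nearest-rank percentile on a sorted copy; p is in [0, 100].
// Assumption: devlow-bench may use a different interpolation rule.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error("no samples");
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.min(sorted.length - 1, Math.max(0, rank - 1))];
}

// Standard normal CDF via the Abramowitz-Stegun 7.1.26 erf approximation.
// Valid here for z >= 0, which is all this sketch needs.
function normalCdf(z: number): number {
  const x = z / Math.SQRT2;
  const t = 1 / (1 + 0.3275911 * x);
  const poly =
    t * (0.254829592 +
      t * (-0.284496736 +
        t * (1.421413741 + t * (-1.453152027 + t * 1.061405429))));
  const erf = 1 - poly * Math.exp(-x * x);
  return 0.5 * (1 + erf);
}

// Two-sided Mann-Whitney U test, normal approximation with tie correction.
function mannWhitneyP(a: number[], b: number[]): number {
  const n1 = a.length;
  const n2 = b.length;
  const n = n1 + n2;
  if (n1 === 0 || n2 === 0) throw new Error("both groups need samples");

  const all = [
    ...a.map((value) => ({ value, group: 0 })),
    ...b.map((value) => ({ value, group: 1 })),
  ].sort((x, y) => x.value - y.value);

  let rankSumA = 0;
  let tieTerm = 0;
  for (let i = 0; i < n; ) {
    // Find the run [i, j) of equal values and give each the average rank.
    let j = i;
    while (j < n && all[j].value === all[i].value) j++;
    const t = j - i;
    const avgRank = (i + 1 + j) / 2; // average of ranks i+1 .. j
    for (let k = i; k < j; k++) {
      if (all[k].group === 0) rankSumA += avgRank;
    }
    tieTerm += t * t * t - t;
    i = j;
  }

  const u1 = rankSumA - (n1 * (n1 + 1)) / 2;
  const mean = (n1 * n2) / 2;
  const variance = ((n1 * n2) / 12) * (n + 1 - tieTerm / (n * (n - 1)));
  if (variance === 0) return 1; // every value identical: no evidence of a shift
  const z = Math.abs(u1 - mean) / Math.sqrt(variance);
  return 2 * (1 - normalCdf(z));
}
```

With two variants' samples `base` and `change`, Δ p50 would be `percentile(change, 50) - percentile(base, 50)`, reported next to `mannWhitneyP(base, change)`. Both statistics depend only on the ordering of samples, so a couple of extreme runs move them far less than they move a mean.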

runner

Each attempt's measurements are buffered locally and only merged on success. Failed runs are retried (capped at 2× warmup+n) until we have n clean samples.
Print a per-variant retry line plus an end-of-run summary noting which variants recovered and which fell short.
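The buffer-then-merge behaviour can be sketched roughly as follows. This is an illustration under assumptions, not the runner's code: `collectSamples`, `runOnce`, and `report` are hypothetical names, and the cap is read here as 2 × (warmup + n) total attempts, which is one possible reading of "2× warmup+n".

```typescript
type Measurements = Map<string, number>;

interface CollectResult {
  samples: Measurements[]; // clean, post-warmup samples only
  failures: number; // attempts that threw and were discarded
  attempts: number; // total attempts made, including warmup
}

async function collectSamples(
  runOnce: (report: (metric: string, value: number) => void) => Promise<void>,
  n: number,
  warmup: number,
): Promise<CollectResult> {
  // Assumed reading of the cap: at most twice the nominal number of runs.
  const maxAttempts = 2 * (warmup + n);
  const samples: Measurements[] = [];
  let failures = 0;
  let attempts = 0;
  let successes = 0;

  while (successes < warmup + n && attempts < maxAttempts) {
    attempts++;
    // Each attempt writes into its own buffer, never into shared state.
    const buffer: Measurements = new Map();
    try {
      await runOnce((metric, value) => buffer.set(metric, value));
    } catch {
      failures++;
      continue; // the buffer is dropped, so partial measurements never leak
    }
    successes++;
    // Warmup runs must succeed but are not recorded as samples.
    if (successes > warmup) samples.push(buffer);
  }
  return { samples, failures, attempts };
}
```

A caller can then tell the two outcomes in the end-of-run summary apart: `failures > 0` with `samples.length === n` means the variant recovered, while `samples.length < n` means it fell short of the cap.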

browser

hardNavigation / reload now throw when the navigation response is missing or non-2xx. Previously a 200-committed-but-broken page was being recorded as a real sample.
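A rough sketch of the check, assuming a Playwright-style page whose `goto` and `reload` resolve to a response object or `null`. The interfaces below are minimal stand-ins so the snippet is self-contained, and `assertNavigationOk`, `hardNavigationChecked`, and `reloadChecked` are hypothetical names rather than the helpers changed in this PR.

```typescript
// Minimal stand-ins for the parts of a Playwright-like API used here.
interface NavResponse {
  status(): number;
  ok(): boolean; // true for 2xx statuses
}

interface NavPage {
  goto(url: string): Promise<NavResponse | null>;
  reload(): Promise<NavResponse | null>;
}

// Throws instead of letting a missing or non-2xx response count as a sample.
function assertNavigationOk(
  action: string,
  response: NavResponse | null,
): NavResponse {
  if (response === null) {
    throw new Error(`${action}: no navigation response`);
  }
  if (!response.ok()) {
    throw new Error(`${action}: navigation returned HTTP ${response.status()}`);
  }
  return response;
}

async function hardNavigationChecked(
  page: NavPage,
  url: string,
): Promise<NavResponse> {
  return assertNavigationOk(`hardNavigation(${url})`, await page.goto(url));
}

async function reloadChecked(page: NavPage): Promise<NavResponse> {
  return assertNavigationOk("reload", await page.reload());
}
```

Because the failure surfaces as a thrown error, a runner that retries failed attempts treats an error page as a failed run rather than recording its timings and sizes.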

The trigger was a run where 2 of 7 cold-build attempts hit an error page, dragging the mean response size from 45 MB to 33 MB while the p50 was unchanged.
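To make the arithmetic concrete, here is a small worked example. The per-run values are not in the PR; the figures below are hypothetical, chosen only so that two bad runs out of seven reproduce the reported 45 MB to 33 MB shift in the mean.

```typescript
const mean = (xs: number[]): number =>
  xs.reduce((sum, x) => sum + x, 0) / xs.length;

const median = (xs: number[]): number => {
  const sorted = [...xs].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1
    ? sorted[mid]
    : (sorted[mid - 1] + sorted[mid]) / 2;
};

// Hypothetical response sizes in MB: seven clean runs...
const clean = [45, 45, 45, 45, 45, 45, 45];
// ...versus five clean runs plus two small error pages (illustrative 3 MB each).
const polluted = [45, 45, 45, 45, 45, 3, 3];

console.log(mean(clean), median(clean)); // 45 45
console.log(mean(polluted), median(polluted)); // 33 45
```

The median only moves once more than half of the samples are bad, which is why p50 stayed put while the mean dropped by about a quarter.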

…etection

- compare: drop mean and Welch's p; show p50/p90/p99, Δ p50, and a single Mann–Whitney p-value. Hoist the modal n into a banner so only outlier rows carry (n).
- runner: buffer each attempt's samples and merge only on success. Retry failed runs until n clean samples are collected (capped at 2× warmup+n). Per-variant and end-of-run failure summaries.
- browser: hardNavigation/reload throw when the response is missing or non-2xx, so error pages become failed runs instead of polluted samples.

github-actions · 2026-05-19T23:08:21Z

Tests Passed

Commit: bea3012

wbinnssmith requested review from lukesandberg and sokra May 19, 2026 22:45

wbinnssmith mentioned this pull request May 19, 2026

@vercel/devlow-bench: 0.3.5 → 0.4.0 #93951

Merged

wbinnssmith marked this pull request as ready for review May 19, 2026 22:52

sokra approved these changes May 19, 2026

wbinnssmith merged commit 56825e5 into canary May 20, 2026
286 of 294 checks passed

wbinnssmith deleted the wbinnssmith/devlow-bench-retries-percentiles branch May 20, 2026 21:27

wbinnssmith added a commit that referenced this pull request May 20, 2026

@vercel/devlow-bench: 0.3.5 → 0.4.0 (#93951)

3e97d89

Version bump so we can publish the changes from #93950 (and the previously-unpublished 0.3.5 work) to npm. Stacked on top of #93950.
