[claude] feat(benchmarks-website): historical comparison UX + mobile #7681
Merged
connortsui20 merged 6 commits into ct/benchmarks-v3 from claude/demo-ready-benchmarks-v3-H5ECI on Apr 27, 2026
Phase 1+2 of the historical-comparison UX. Read-side API and slug
plumbing only; the HTML/JS rebuild lands in a follow-up commit.
* `CommitWindow::Last(n) / All` plus `?n=NNN|all` parsing on
`/api/chart/{slug}`. The default window is 100 commits; numeric values clamp to
`[1, 1000]`; malformed values fall back to the default. The SQL filter splices in
a `commit_sha IN (SELECT ... LIMIT n)` subquery so the unbounded
path stays plan-clean.
* `GroupKey` enum mirroring `ChartKey` with distinct prefixes
(`qmg/ctg/csg/rag/vsg`). `Group.slug` populated for each group.
* `/api/group/{slug}` returns every chart in the group with its data
embedded (so the HTML page can render lazily without per-chart
fetches).
* Round-trip + clamp tests for `CommitWindow` and `GroupKey`.
Signed-off-by: Claude <claude@anthropic.com>
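The `?n=` rules above (default 100, clamp to `[1, 1000]`, fall back on malformed input) can be sketched as a small Rust parser. This is an illustrative sketch, not the PR's code: the `from_query` and `to_query` method names are hypothetical, and treating values that overflow `i64` as malformed is an assumption.

```rust
/// Which commits a chart should cover (sketch of the `?n=` semantics).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum CommitWindow {
    /// Only the most recent `n` commits.
    Last(u32),
    /// Every commit, with no limit.
    All,
}

impl CommitWindow {
    /// Window used when `?n` is absent or malformed.
    const DEFAULT: CommitWindow = CommitWindow::Last(100);
    /// Inclusive bounds applied to numeric values.
    const MIN: i64 = 1;
    const MAX: i64 = 1000;

    /// Parses the raw `?n=` value, if present.
    fn from_query(raw: Option<&str>) -> CommitWindow {
        match raw {
            Some("all") => CommitWindow::All,
            // Numeric values (including 0 and negatives) are clamped into range.
            Some(s) => match s.parse::<i64>() {
                Ok(n) => CommitWindow::Last(n.clamp(Self::MIN, Self::MAX) as u32),
                // Non-numeric or out-of-i64-range input falls back to the default.
                Err(_) => Self::DEFAULT,
            },
            None => Self::DEFAULT,
        }
    }

    /// Renders the window back into its `?n=` form so a permalink round-trips.
    fn to_query(self) -> String {
        match self {
            CommitWindow::Last(n) => n.to_string(),
            CommitWindow::All => "all".to_string(),
        }
    }
}
```

Clamping rather than rejecting keeps every numeric `?n` renderable, while the fallback means a mistyped value still produces a page with the default window.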
Wires the historical-comparison UX through the HTML/JS/CSS layer.
URL query string is the source of truth for every UI control so
permalinks reproduce the view exactly.
* Toolbar (single component, both `/chart/{slug}` and `/group/{slug}`):
scope buttons (25/50/100/250/all) + slider, linear/log Y-axis,
absolute/`% of baseline` mode. Active value highlighted; subtitle
reflects active state. Plain `<a>` navigation; URL is canonical.
* `/group/{slug}` HTML page renders each chart in a card, embedding
payloads inline. `IntersectionObserver` defers `Chart` construction
until the canvas scrolls into view (mobile + 22-chart TPC-H groups).
* `chart-init.js` generalised: discovers `<script id="chart-data-N">`
+ `<canvas data-chart-index="N">` pairs and instantiates one chart
per pair; single-chart page reuses the same path.
* Rich custom external tooltip: `<short-sha> · YYYY-MM-DD` title;
per-series row with friendly value (ns/µs/ms/s, B/KiB/MiB/GiB) +
coloured `% delta` vs prior visible commit; footer with truncated
message + GitHub commit link. A document-level click closes the tooltip.
* Legend toggle rewrites `?hidden=engine:format,...` via
`history.replaceState` (no back-button hostility).
* Landing page filter: client-side `?` search box that hides
groups whose name doesn't match.
* Mobile breakpoint at 768px: toolbar wraps with ≥40px touch
targets; chart grid collapses to one column; legend renders above
the chart on narrow viewports so it doesn't push the chart off
screen.
* Snapshot tests for landing/chart/group pages (`?n` pinned for
stability) plus a `?n` cap behaviour test on the API.
Signed-off-by: Claude <claude@anthropic.com>
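The tooltip's friendly units amount to repeatedly dividing by a fixed base until the value drops below it. The PR implements this in `chart-init.js`; the Rust version below is only an illustrative port. The helper names, the "switch unit once the value reaches the base" rule, and the two-decimal rounding are assumptions; only the unit ladders themselves (decimal steps of 1000 for time, binary steps of 1024 for KiB/MiB/GiB) follow from the text.

```rust
/// Repeatedly divides `value` by `base` while it is at least `base` and a
/// larger unit is available, then formats it with two decimal places.
fn scale_to_unit(mut value: f64, base: f64, units: &[&str]) -> String {
    let mut idx = 0;
    while value.abs() >= base && idx + 1 < units.len() {
        value /= base;
        idx += 1;
    }
    format!("{value:.2} {}", units[idx])
}

/// Durations measured in nanoseconds: ns -> µs -> ms -> s (steps of 1000).
fn format_duration_ns(ns: f64) -> String {
    scale_to_unit(ns, 1000.0, &["ns", "µs", "ms", "s"])
}

/// Sizes measured in bytes: B -> KiB -> MiB -> GiB (steps of 1024).
fn format_bytes(bytes: f64) -> String {
    scale_to_unit(bytes, 1024.0, &["B", "KiB", "MiB", "GiB"])
}
```

Keeping a single `scale_to_unit` helper means the time and byte ladders differ only in their base and unit labels; values beyond the largest unit simply stay in that unit.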
Merging this PR will degrade performance by 21.64%
| | Mode | Benchmark | BASE | HEAD | Efficiency |
|---|---|---|---|---|---|
| ❌ | Simulation | varbinview_zip_block_mask | 2.9 ms | 3.7 ms | -21.64% |
| ❌ | Simulation | varbinview_zip_fragmented_mask | 6.5 ms | 7.3 ms | -10.28% |
| ⚡ | Simulation | patched_take_10k_dispersed | 315.4 µs | 284.5 µs | +10.88% |
| ⚡ | Simulation | patched_take_10k_first_chunk_only | 301.7 µs | 270.9 µs | +11.38% |
| ⚡ | Simulation | patched_take_10k_adversarial | 258.3 µs | 227.4 µs | +13.57% |
| ⚡ | Simulation | take_10k_dispersed | 284.3 µs | 238.8 µs | +19.07% |
| ⚡ | Simulation | take_10k_first_chunk_only | 270.2 µs | 224.9 µs | +20.14% |
Comparing claude/demo-ready-benchmarks-v3-H5ECI (da668a4) with ct/benchmarks-v3 (8697731)
Address review feedback: `CommitWindow::sql_filter` was interpolating the LIMIT integer into the SQL string. The value is server-clamped to `[1, 1000]`, so it is safe today, but the rest of the file binds with `params![...]`, and the inconsistency would invite a real injection later.
* `sql_filter` now returns a `&'static str` with a `LIMIT ?` placeholder.
* New `limit_param() -> Option<i64>` returns `Some(n)` for `Last(n)` and `None` for `All`.
* Each `collect_*_chart` builds a `Vec<Box<dyn ToSql>>` and threads it through `params_from_iter`; the limit is appended only when present.
* `commit_window_sql_filter_shape` is updated to assert the placeholder shape (and the absence of the literal integer); a new `commit_window_limit_param` test pins `Last(N).limit_param() == Some(N as i64)` and `All.limit_param() == None`.
Signed-off-by: Claude <claude@anthropic.com>
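A minimal sketch of the placeholder pattern, assuming hypothetical table and column names (`measurements`, `commits`, `dataset`, `committed_at`) and a hypothetical `build_chart_query` helper. Per the commit message, the real `collect_*_chart` functions bind through `params_from_iter` with a `Vec<Box<dyn ToSql>>`; to keep this sketch free of a database dependency, a stand-in `Param` enum takes the place of `Box<dyn ToSql>`. The invariant worth checking is that the number of `?` placeholders always equals the number of bound values.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum CommitWindow {
    Last(u32),
    All,
}

impl CommitWindow {
    /// WHERE-clause fragment for this window. The limit is always the `?`
    /// placeholder, never an interpolated integer. Table and column names
    /// are hypothetical.
    fn sql_filter(self) -> &'static str {
        match self {
            CommitWindow::Last(_) => {
                " AND commit_sha IN (SELECT commit_sha FROM commits \
                 ORDER BY committed_at DESC LIMIT ?)"
            }
            // Unbounded: no extra filter and therefore no extra parameter.
            CommitWindow::All => "",
        }
    }

    /// Value to bind for the `LIMIT ?` placeholder, if the window has one.
    fn limit_param(self) -> Option<i64> {
        match self {
            CommitWindow::Last(n) => Some(i64::from(n)),
            CommitWindow::All => None,
        }
    }
}

/// Stand-in for `Box<dyn ToSql>` so the sketch needs no database crate.
#[derive(Debug, PartialEq)]
enum Param {
    Text(String),
    Int(i64),
}

/// Hypothetical chart query builder: binds the chart's own parameters first,
/// then appends the limit only when the window has one.
fn build_chart_query(dataset: &str, window: CommitWindow) -> (String, Vec<Param>) {
    let sql = format!(
        "SELECT commit_sha, value FROM measurements WHERE dataset = ?{}",
        window.sql_filter()
    );
    let mut params = vec![Param::Text(dataset.to_string())];
    if let Some(limit) = window.limit_param() {
        params.push(Param::Int(limit));
    }
    (sql, params)
}
```

Appending the limit only when `limit_param()` returns `Some` keeps the placeholder count and the parameter list in lockstep for both windows.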
The toolbar only ships absolute and `% of baseline` buttons. The `?mode=delta` branch in both `UiQuery::mode` and `parseUrl` was dead code that would have rendered the page with a non-functional mode. Implementing the third mode is deferred to a follow-up; deleting the branch here keeps the parser surface matched to what the UI actually exposes, and `delta` now falls through to `abs` like any other unknown value.
Signed-off-by: Claude <claude@anthropic.com>
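After this change the mode parser reduces to a two-way match with an `abs` fallback. This is a sketch only: the `Mode` enum and `parse_mode` function are hypothetical stand-ins for `UiQuery::mode`, and `rel` is assumed to be the query value behind the `% of baseline` button (the PR description lists the toolbar surface as `abs / rel`).

```rust
/// Y-axis modes the toolbar exposes (hypothetical mirror of `UiQuery::mode`).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Mode {
    /// Absolute values.
    Abs,
    /// `% of baseline`.
    Rel,
}

/// Parses `?mode=`; anything other than `rel` (including the removed
/// `delta`, a typo, or a missing value) falls back to `Abs`.
fn parse_mode(raw: Option<&str>) -> Mode {
    match raw {
        Some("rel") => Mode::Rel,
        _ => Mode::Abs,
    }
}
```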
Series labels are `engine:format`-shaped today and don't contain `|`, but using a comma as the `?hidden=` delimiter was a fragile assumption: a dataset variant with a comma in its name would silently corrupt the URL state. Switch to `|`, which our internal labels never produce. `|` is not an RFC 3986 unreserved or `sub-delims` character, but it is outside the WHATWG URL query percent-encode set, so browsers keep it verbatim in the query string.
* `chart-init.js`: `parseHiddenParam` / `serializeHidden` use `|` via a `HIDDEN_DELIM` constant.
* `html.rs::urlencode`: the allowlist swaps `,` for `|` so the server round-trips a permalink with multiple hidden series without percent-encoding the delimiter.
* New test `ui_query_with_override_preserves_pipe_delimited_hidden` pins server/client wire agreement (`?hidden=a:b|c:d` survives `with_override`, and the pipe is not encoded as `%7C`).
Signed-off-by: Claude <claude@anthropic.com>
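A server-side sketch of the `|`-delimited `?hidden=` round trip. The functions are hypothetical Rust mirrors of `parseHiddenParam` / `serializeHidden`, and `urlencode_query_value` assumes an allowlist of the RFC 3986 unreserved set plus `:` and `|`; the real `html.rs::urlencode` allowlist may differ. `parse_hidden_param` expects an already percent-decoded value.

```rust
/// Delimiter between hidden series labels in `?hidden=`.
const HIDDEN_DELIM: &str = "|";

/// Splits a decoded `?hidden=` value into series labels, skipping empty
/// segments. Assumes labels never contain `|` themselves.
fn parse_hidden_param(raw: &str) -> Vec<String> {
    raw.split(HIDDEN_DELIM)
        .filter(|label| !label.is_empty())
        .map(str::to_owned)
        .collect()
}

/// Joins series labels back into a `?hidden=` value.
fn serialize_hidden(labels: &[&str]) -> String {
    labels.join(HIDDEN_DELIM)
}

/// Percent-encodes a query value, leaving RFC 3986 unreserved characters
/// plus `:` and `|` as-is (hypothetical allowlist).
fn urlencode_query_value(value: &str) -> String {
    let mut out = String::with_capacity(value.len());
    for byte in value.bytes() {
        let keep = byte.is_ascii_alphanumeric()
            || matches!(byte, b'-' | b'.' | b'_' | b'~' | b':' | b'|');
        if keep {
            out.push(char::from(byte));
        } else {
            // Everything else, including `,` and non-ASCII UTF-8 bytes, is escaped.
            out.push_str(&format!("%{byte:02X}"));
        }
    }
    out
}
```

With this allowlist, `serialize_hidden(&["a:b", "c:d"])` encodes to `a:b|c:d` rather than `a:b%7Cc:d`, which is the property the new test pins on the real code, while a comma inside a label is escaped as `%2C` and no longer collides with the delimiter.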
The toolbar's scope buttons are `25/50/100/250/all`, but the slider used `step=10`, so dragging it could land on 50 and 100 but never on 25 or 250. Lower the step to 5 (and `min` to 5 for symmetry) so the slider can reach every preset value the buttons advertise while staying granular enough to be useful as a custom selector. Snapshots are refreshed for the changed slider attributes.
Signed-off-by: Claude <claude@anthropic.com>
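The reachability argument is plain arithmetic: an HTML range input uses `min` as its step base, so it only yields values of the form `min + k * step` (for whole `k`) that do not exceed `max`. With `min = 5` and `step = 5`, every preset is a multiple of 5, so all of them are reachable. The helper below is a hypothetical check of that rule; the slider's actual `max` is not stated here, so any value you pass for it is an assumption.

```rust
/// Returns true if an `<input type="range">` with the given `min`, `max`,
/// and `step` can produce `value`: the allowed values are `min + k * step`
/// for whole numbers `k`, up to `max`.
fn slider_can_reach(value: u32, min: u32, max: u32, step: u32) -> bool {
    step > 0 && value >= min && value <= max && (value - min) % step == 0
}
```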
connortsui20 added a commit that referenced this pull request on Apr 28, 2026
…7681)

## Summary

Brings the v3 benchmarks website to a demo-ready state focused on the historical-comparison use case (Vortex vs other engines on the same commit, HEAD vs N commits ago, latest vs first as % delta). Single process, single binary; SSR `maud` + inline JSON `<script>` + Chart.js, with no client-side framework, no build step, and no post-load API round-trips.

> Branch note: this PR was developed on the harness-assigned branch
> `claude/demo-ready-benchmarks-v3-H5ECI` rather than the
> `claude/benchmarks-v3-ui-historical-comparison` branch the task
> request mentioned, because the session's harness pins the working
> branch (`Develop on branch …`, `NEVER push to a different branch
> without explicit permission`).

## CI note

The `Rust tests (windows-x64)` job is failing on this PR, but the **same job is also failing on the merge commit at the tip of `ct/benchmarks-v3`** (PR #7671's run, job id `73229326105`, the commit `8697731` we branched from). The base branch shipped with that failure tolerated, and our diff only touches `benchmarks-website/server/` (no Windows-specific paths, no FFI, no new dependencies on Windows-fragile crates), so this failure is pre-existing and not caused by the PR. CodSpeed flagged two `varbinview_zip` regressions in `vortex-array/`, which this PR also does not touch.

## What's new

* **Scoped commit window**: `?n=25|50|100|250|all`, default 100, server-side clamp to `[1, 1000]`. SQL splices in a `LIMIT ?` filter and binds the value as a parameter (consistent with the rest of the file's `params!`-style use); the unbounded path is a separate query so the plan stays clean.
* **Group page**: `GET /group/{slug}` renders every chart in one group on a single screen. Each card embeds its own `<script id="chart-data-N">` payload + sibling `<canvas data-chart-index="N">`. `IntersectionObserver` defers `Chart` construction until the canvas scrolls into view (mobile-friendly + cheap for 22-chart TPC-H groups).
* **Toolbar**: same component on `/chart/{slug}` and `/group/{slug}`. Scope buttons + slider, linear/log Y-axis, absolute / `% of baseline` mode. The URL query string is canonical state; the subtitle mirrors active state. Slider step is `5` so it can land on every preset value (`25`, `50`, `100`, `250`).
* **Rich tooltip**: custom external HTML tooltip with a `<short-sha> · YYYY-MM-DD` title; per-series rows render the value with a friendly unit (ns→µs→ms→s, B→KiB→MiB→GiB) and a coloured `% delta` vs the prior visible commit; the footer carries the truncated commit message + a GitHub link. A document-level click closes it.
* **Legend → URL**: clicking a legend item rewrites `?hidden=engine:format|…` via `history.replaceState` (no back-button hostility). Permalinks reproduce the view. The delimiter is `|`, so series names can contain `:` and `,` without escaping.
* **Mobile**: under `@media (max-width: 768px)`, the chart grid is single-column, the toolbar wraps with ≥ 40 px touch targets, the slider expands to fill the row, and the legend pops to the *top* of the chart so it doesn't push the chart off-screen on a phone.
* **Landing search**: client-side filter input above the group list.
* **`/api/group/{slug}`**: JSON sibling to the HTML route; returns every chart in the group with payloads inlined.
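The two relative views mentioned above reduce to plain arithmetic over one series. The following is a sketch under assumptions not taken from the PR: the `% of baseline` baseline is taken to be the first commit in the visible window, a zero baseline or zero prior value yields no number, and `percent_of_baseline` / `delta_vs_prior` are hypothetical names. The delta is `(current - prior) / prior * 100`, so for a time-based benchmark a positive delta means the newer commit was slower.

```rust
/// `% of baseline`: each point as a percentage of the first visible point.
/// Returns `None` for every point when the baseline is missing or zero.
fn percent_of_baseline(series: &[f64]) -> Vec<Option<f64>> {
    let baseline = series.first().copied();
    series
        .iter()
        .map(|&v| match baseline {
            Some(b) if b != 0.0 => Some(v / b * 100.0),
            _ => None,
        })
        .collect()
}

/// Tooltip delta: percentage change of point `idx` relative to the previous
/// visible point. `None` for the first point, out-of-range indices, or a
/// zero prior value.
fn delta_vs_prior(series: &[f64], idx: usize) -> Option<f64> {
    let prev = *series.get(idx.checked_sub(1)?)?;
    let cur = *series.get(idx)?;
    if prev == 0.0 {
        return None;
    }
    Some((cur - prev) / prev * 100.0)
}
```

For a series `[200.0, 250.0, 150.0]`, the baseline view is `[100, 125, 75]` and the tooltip deltas for the second and third points are `+25%` and `-40%`.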
## What was *not* picked up from `planning/components/web-ui.md`'s deferred list

Done now (moved out of deferred):

- mobile redesign basics (single column, ≥ 40 px tap targets, toolbar wrap)
- engine + series toggling (legend ↔ URL)
- deep-link state (every toolbar control is URL-canonical)
- group landing with the start of "filters" (client-side search)

Still deferred (intentional):

- per-commit drill-down page
- ad-hoc SQL page
- LTTB downsampling
- engine name lookup table + curated colour palettes
- summary cards (geomean ratios, rankings)
- full-screen modal / zoom-pan
- `?mode=delta` (compare-to-main): parser branch dropped pending data-shape work; the toolbar surface today is only `abs / rel`

## Repro

    INGEST_BEARER_TOKEN=$(openssl rand -hex 32) \
    VORTEX_BENCH_DB=./bench.duckdb \
    cargo run --release -p vortex-bench-server

Then open `http://localhost:3000/`, click any group name (now a link to `/group/{slug}`) or any chart inside, and play with the toolbar. Toggle a series in the legend and notice `?hidden=…` appear in the URL. Resize to phone width to confirm single-column layout, sticky toolbar wrapping, and legend-on-top.

## Snapshot diffs

Three `.snap` files refreshed by this PR:

- `landing_page.snap`: group names now link to `/group/{slug}`, search input added, `data-group-name` for client filter.
- `chart_page_query.snap`: toolbar + indexed `<script id="chart-data-0">` + tooltip host element.
- `group_page_query.snap` (new): group page rendered against the fixture DB, `?n=100` pinned for stability.

Run `INSTA_UPDATE=always cargo test -p vortex-bench-server` (or `cargo insta accept`) to refresh.
## Test plan

- [x] `cargo build -p vortex-bench-server`
- [x] `cargo test -p vortex-bench-server`: 41 tests pass (22 unit + 10 ingest + 9 web_ui)
- [x] `cargo clippy -p vortex-bench-server --all-targets -- -D warnings`: clean
- [x] `cargo +nightly fmt`: no diff
- [ ] `./scripts/public-api.sh`: skipped per CLAUDE.md (leaf binary, not in workspace public-api lockfile set)
- [ ] Manual screenshots: couldn't capture from the sandbox; the reviewer or a follow-up should record landing / single chart with toolbar / group desktop / group mobile / tooltip open / log+rel.

## Follow-up review fixes (commits `7042f0d` … `da668a4`)

- `7042f0d`: the `LIMIT` value travels as a bound parameter (`LIMIT ?`) via `params_from_iter` instead of being interpolated into SQL.
- `9c80bce`: drop the unused `?mode=delta` parser branch in both `UiQuery::mode` and `chart-init.js::parseUrl`.
- `d156ab8`: the `?hidden=` delimiter is now `|`; a new test pins the server/client wire agreement.
- `da668a4`: slider `step` lowered to 5 so it can land on every preset (`25/50/100/250`).

## Things explicitly NOT changed

- `/api/ingest`, auth, schema, write paths.
- DB migration (none added).
- Existing routes (no renames).
- v2 site (`benchmarks-website/server.js` etc.): untouched.
- Single-chart page still works; reuses the same `chart-init.js`.

https://claude.ai/code/session_015Nc73ihs9TUdx7QzLUZudK

---------

Signed-off-by: Claude <claude@anthropic.com>
Co-authored-by: Claude <claude@anthropic.com>
Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>