[claude] feat(benchmarks-website): historical comparison UX + mobile #7681

Merged: connortsui20 merged 6 commits into ct/benchmarks-v3 from
claude/demo-ready-benchmarks-v3-H5ECI on Apr 27, 2026

Conversation

connortsui20 (Contributor) commented Apr 27, 2026

Summary

Brings the v3 benchmarks website to a demo-ready state focused on the
historical-comparison use case (Vortex vs other engines on the same
commit, HEAD vs N commits ago, latest vs first as % delta). Single
process, single binary; SSR maud + inline JSON <script> +
Chart.js — no client-side framework, no build step, no post-load API
round-trips.

Branch note: this PR was developed on the harness-assigned branch
claude/demo-ready-benchmarks-v3-H5ECI rather than the
claude/benchmarks-v3-ui-historical-comparison branch the task
request mentioned, because the session's harness pins the working
branch (Develop on branch …, NEVER push to a different branch without explicit permission).

CI note

The Rust tests (windows-x64) job is failing on this PR, but the
same job is also failing on the merge commit at the tip of
ct/benchmarks-v3 (PR #7671's run, job id 73229326105, the commit
8697731 we branched from). The base branch shipped with that failure
tolerated, and our diff only touches benchmarks-website/server/ (no
Windows-specific paths, no FFI, no new dependencies on
Windows-fragile crates), so this failure is pre-existing and not
caused by the PR. CodSpeed flagged two varbinview_zip regressions in
vortex-array/, which this PR also does not touch.

What's new

  • Scoped commit window: ?n=25|50|100|250|all, default 100,
    server-side clamp to [1, 1000]. SQL splices in a LIMIT ? filter
    and binds the value as a parameter (consistent with the rest of
    the file's params!-style use); the unbounded path is a separate
    query so the plan stays clean.
  • Group page: GET /group/{slug} renders every chart in one
    group on a single screen. Each card embeds its own
    <script id="chart-data-N"> payload + sibling
    <canvas data-chart-index="N">. IntersectionObserver defers Chart
    construction until the canvas scrolls into view (mobile-friendly
    + cheap for 22-chart TPC-H groups).
  • Toolbar: same component on /chart/{slug} and /group/{slug}.
    Scope buttons + slider, linear/log Y-axis, absolute / % of
    baseline mode. URL query string is canonical state; subtitle
    mirrors active state. Slider step is 5 so it can land on every
    preset value (25, 50, 100, 250).
  • Rich tooltip: custom external HTML tooltip with
    <short-sha> · YYYY-MM-DD title; per-series rows render value with
    friendly unit (ns→µs→ms→s, B→KiB→MiB→GiB) and a coloured % delta
    vs the prior visible commit; footer carries the truncated commit
    message + a GitHub link. Document-level click closes.
  • Legend → URL: clicking a legend item rewrites
    ?hidden=engine:format|… via history.replaceState (no back-button
    hostility). Permalinks reproduce the view. Delimiter is | so
    series names can contain : and , without escaping.
  • Mobile: under @media (max-width: 768px), single-column chart grid,
    toolbar wraps with ≥ 40 px touch targets, slider expands to fill
    the row, legend moves to the top of the chart so it doesn't push
    the chart off-screen on a phone.
  • Landing search: client-side filter input above the group list.
  • /api/group/{slug}: JSON sibling to the HTML route, returns
    every chart in the group with payloads inlined.
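
For reviewers who want the tooltip's "friendly unit" rule spelled out, here is a minimal sketch of the scaling. The real tooltip lives in chart-init.js; this Rust version only illustrates the ns→µs→ms→s and B→KiB→MiB→GiB ladders. The function names, the two-decimal rounding, and the 1000 / 1024 bases are assumptions, not taken from the PR.

```rust
/// Hypothetical helper: scale a nanosecond value up the ns -> µs -> ms -> s
/// ladder until it drops below 1000 (or the ladder runs out).
pub fn format_duration_ns(ns: f64) -> String {
    const UNITS: [&str; 4] = ["ns", "µs", "ms", "s"];
    scale(ns, 1000.0, &UNITS)
}

/// Hypothetical helper: same idea for byte counts, using binary (1024) steps.
pub fn format_bytes(bytes: f64) -> String {
    const UNITS: [&str; 4] = ["B", "KiB", "MiB", "GiB"];
    scale(bytes, 1024.0, &UNITS)
}

/// Divide by `base` while the magnitude is at least `base` and a larger
/// unit is still available, then print with two decimals (an assumption).
fn scale(mut value: f64, base: f64, units: &[&str]) -> String {
    let mut idx = 0;
    while value.abs() >= base && idx + 1 < units.len() {
        value /= base;
        idx += 1;
    }
    format!("{value:.2} {}", units[idx])
}
```

Values larger than the top unit simply stay in that unit (for example, several thousand seconds), which keeps the ladder short and predictable.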

What was not picked up from planning/components/web-ui.md's deferred list

Done now (moved out of deferred):

  • mobile redesign basics (single column, ≥ 40 px tap targets,
    toolbar wrap)
  • engine + series toggling (legend ↔ URL)
  • deep-link state (every toolbar control is URL-canonical)
  • group landing with the start of "filters" (client-side search)

Still deferred (intentional):

  • per-commit drill-down page
  • ad-hoc SQL page
  • LTTB downsampling
  • engine name lookup table + curated colour palettes
  • summary cards (geomean ratios, rankings)
  • full-screen modal / zoom-pan
  • ?mode=delta (compare-to-main) — parser branch dropped pending
    data shape work; toolbar surface today is only abs / rel

Repro

INGEST_BEARER_TOKEN=$(openssl rand -hex 32) \
VORTEX_BENCH_DB=./bench.duckdb \
cargo run --release -p vortex-bench-server

Then open http://localhost:3000/, click any group name (now a link
to /group/{slug}), or any chart inside, and play with the toolbar.
Toggle a series in the legend and notice ?hidden=… appear in the
URL. Resize to phone width to confirm single-column layout, sticky
toolbar wrapping, and legend-on-top.

Snapshot diffs

Three .snap files refreshed by this PR:

  • landing_page.snap — group names now link to /group/{slug},
    search input added, data-group-name for client filter.
  • chart_page_query.snap — toolbar + indexed
    <script id="chart-data-0"> + tooltip host element.
  • group_page_query.snap (new) — group page rendered against the
    fixture DB, ?n=100 pinned for stability.

Run INSTA_UPDATE=always cargo test -p vortex-bench-server (or
cargo insta accept) to refresh.

Test plan

  • [x] cargo build -p vortex-bench-server
  • [x] cargo test -p vortex-bench-server: 41 tests pass (22 unit +
    10 ingest + 9 web_ui)
  • [x] cargo clippy -p vortex-bench-server --all-targets -- -D warnings: clean
  • [x] cargo +nightly fmt: no diff
  • [ ] ./scripts/public-api.sh: skipped per CLAUDE.md (leaf binary,
    not in workspace public-api lockfile set)
  • [ ] Manual screenshots: couldn't capture from the sandbox; the
    reviewer or follow-up should record landing / single chart with
    toolbar / group desktop / group mobile / tooltip open / log+rel.

Follow-up review fixes (commits 7042f0d … da668a4)

  • 7042f0d: LIMIT value travels as a bound parameter (LIMIT ?)
    via params_from_iter instead of being interpolated into SQL.
  • 9c80bce: drop the unused ?mode=delta parser branch in both
    UiQuery::mode and chart-init.js::parseUrl.
  • d156ab8: ?hidden= delimiter is now |; new test pins the
    server/client wire agreement.
  • da668a4: slider step lowered to 5 so it can land on every
    preset (25/50/100/250).

Things explicitly NOT changed

  • /api/ingest, auth, schema, write paths.
  • DB migration (none added).
  • Existing routes (no renames).
  • v2 site at benchmarks-website/server.js etc — untouched.
  • Single-chart page still works; reuses the same chart-init.js.

https://claude.ai/code/session_015Nc73ihs9TUdx7QzLUZudK

claude added 2 commits April 27, 2026 19:31
Phase 1+2 of the historical-comparison UX. Read-side API and slug
plumbing only; the HTML/JS rebuild lands in a follow-up commit.

* `CommitWindow::Last(n) / All` plus `?n=NNN|all` parsing on
  `/api/chart/{slug}`. Default cap is 100; numeric values clamp to
  `[1, 1000]`; malformed falls back to default. SQL filter splices in
  a `commit_sha IN (SELECT ... LIMIT n)` subquery so the unbounded
  path stays plan-clean.
* `GroupKey` enum mirroring `ChartKey` with distinct prefixes
  (`qmg/ctg/csg/rag/vsg`). `Group.slug` populated for each group.
* `/api/group/{slug}` returns every chart in the group with its data
  embedded (so the HTML page can render lazily without per-chart
  fetches).
* Round-trip + clamp tests for `CommitWindow` and `GroupKey`.
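
The `?n=` rules above (default 100, clamp to `[1, 1000]`, malformed falls back to the default) can be sketched roughly as follows. Only the `CommitWindow::Last(n) / All` shape comes from this commit; the `parse` signature, the constant names, and the treatment of negative or overflowing input as "malformed" are assumptions.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CommitWindow {
    /// Show only the most recent `n` commits.
    Last(u32),
    /// Show every commit (the unbounded path).
    All,
}

impl CommitWindow {
    pub const DEFAULT: CommitWindow = CommitWindow::Last(100);
    const MIN: u64 = 1;
    const MAX: u64 = 1000;

    /// Hypothetical parser for the raw `?n=` value; `None` means the
    /// parameter was absent from the query string.
    pub fn parse(raw: Option<&str>) -> CommitWindow {
        match raw.map(str::trim) {
            None => Self::DEFAULT,
            Some("all") => CommitWindow::All,
            Some(s) => match s.parse::<u64>() {
                // Numeric input is clamped rather than rejected.
                Ok(n) => CommitWindow::Last(n.clamp(Self::MIN, Self::MAX) as u32),
                // Anything else (text, negatives, overflow) uses the default.
                Err(_) => Self::DEFAULT,
            },
        }
    }
}
```

Clamping rather than rejecting means a hand-edited URL such as `?n=5000` still renders a page instead of an error.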

Signed-off-by: Claude <claude@anthropic.com>

Wires the historical-comparison UX through the HTML/JS/CSS layer.
URL query string is the source of truth for every UI control so
permalinks reproduce the view exactly.

* Toolbar (single component, both `/chart/{slug}` and `/group/{slug}`):
  scope buttons (25/50/100/250/all) + slider, linear/log Y-axis,
  absolute/`% of baseline` mode. Active value highlighted; subtitle
  reflects active state. Plain `<a>` navigation; URL is canonical.
* `/group/{slug}` HTML page renders each chart in a card, embedding
  payloads inline. `IntersectionObserver` defers `Chart` construction
  until the canvas scrolls into view (mobile + 22-chart TPC-H groups).
* `chart-init.js` generalised: discovers `<script id="chart-data-N">`
  + `<canvas data-chart-index="N">` pairs and instantiates one chart
  per pair; single-chart page reuses the same path.
* Rich custom external tooltip: `<short-sha> · YYYY-MM-DD` title;
  per-series row with friendly value (ns/µs/ms/s, B/KiB/MiB/GiB) +
  coloured `% delta` vs prior visible commit; footer with truncated
  message + GitHub commit link. Document-level click closes.
* Legend toggle rewrites `?hidden=engine:format,...` via
  `history.replaceState` (no back-button hostility).
* Landing page filter: client-side `?` search box that hides
  groups whose name doesn't match.
* Mobile breakpoint at 768px: toolbar wraps with ≥40px touch
  targets; chart grid collapses to one column; legend renders above
  the chart on narrow viewports so it doesn't push the chart off
  screen.
* Snapshot tests for landing/chart/group pages (?n pinned for
  stability) plus `?n` cap behaviour test on the API.
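
As an illustration of the tooltip's `% delta` vs the prior visible commit, the sketch below computes the change against the nearest earlier point that has a value. The actual logic is in `chart-init.js`; the function name, the `Option<f64>` series shape, and reading "prior visible" as "nearest earlier non-missing point in the window" are assumptions.

```rust
/// Hypothetical helper: percent change of the point at `idx` relative to
/// the nearest earlier point in the window that has a value.
/// Returns `None` when there is no current value, no earlier value,
/// or the earlier value is zero (the ratio would be undefined).
pub fn pct_delta_vs_prior(series: &[Option<f64>], idx: usize) -> Option<f64> {
    let current = (*series.get(idx)?)?;
    // Walk backwards from idx, skipping commits with no measurement.
    let prior = *series[..idx].iter().rev().flatten().next()?;
    if prior == 0.0 {
        return None;
    }
    Some((current - prior) / prior * 100.0)
}
```

Skipping gaps keeps the delta meaningful when a series has no measurement for some commits in the window.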

Signed-off-by: Claude <claude@anthropic.com>

codspeed-hq Bot commented Apr 27, 2026

Merging this PR will degrade performance by 21.64%

⚠️ Different runtime environments detected

Some benchmarks with significant performance changes were compared across different runtime environments,
which may affect the accuracy of the results.

Open the report in CodSpeed to investigate

⚡ 5 improved benchmarks
❌ 2 regressed benchmarks
✅ 1156 untouched benchmarks

⚠️ Please fix the performance issues or acknowledge them on CodSpeed.

Performance Changes

| Mode | Benchmark | BASE | HEAD | Efficiency |
|---|---|---|---|---|
| Simulation | varbinview_zip_block_mask | 2.9 ms | 3.7 ms | -21.64% |
| Simulation | varbinview_zip_fragmented_mask | 6.5 ms | 7.3 ms | -10.28% |
| Simulation | patched_take_10k_dispersed | 315.4 µs | 284.5 µs | +10.88% |
| Simulation | patched_take_10k_first_chunk_only | 301.7 µs | 270.9 µs | +11.38% |
| Simulation | patched_take_10k_adversarial | 258.3 µs | 227.4 µs | +13.57% |
| Simulation | take_10k_dispersed | 284.3 µs | 238.8 µs | +19.07% |
| Simulation | take_10k_first_chunk_only | 270.2 µs | 224.9 µs | +20.14% |

Comparing claude/demo-ready-benchmarks-v3-H5ECI (da668a4) with ct/benchmarks-v3 (8697731)

claude added 4 commits April 27, 2026 20:04
Address review feedback: `CommitWindow::sql_filter` was interpolating
the LIMIT integer into the SQL string. The value is server-clamped to
`[1, 1000]` so it's safe today, but the rest of the file binds with
`params![...]` and the inconsistency would invite a real injection
later.

* `sql_filter` now returns a `&'static str` with a `LIMIT ?`
  placeholder.
* New `limit_param() -> Option<i64>` returns `Some(n)` for `Last(n)`
  and `None` for `All`.
* Each `collect_*_chart` builds a `Vec<Box<dyn ToSql>>` and threads
  it through `params_from_iter`; the limit is appended only when
  present.
* `commit_window_sql_filter_shape` updated to assert the placeholder
  shape (and absence of the literal integer); new
  `commit_window_limit_param` test pins `Last(N).limit_param() ==
  Some(N as i64)` and `All.limit_param() == None`.
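
The shape described in the bullets above can be sketched without a database crate. `sql_filter` and `limit_param` are named in this commit; everything else here is a stand-in: the SQL text (table and column names), the `SqlValue` enum that replaces `Box<dyn ToSql>`, and the `chart_params` helper are all hypothetical.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CommitWindow {
    Last(u32),
    All,
}

impl CommitWindow {
    /// Static SQL fragment: the limit is a `?` placeholder, never the
    /// literal integer. Table and column names are made up for the sketch.
    pub fn sql_filter(&self) -> &'static str {
        match self {
            CommitWindow::Last(_) => {
                " AND commit_sha IN (SELECT commit_sha FROM commits \
                 ORDER BY committed_at DESC LIMIT ?)"
            }
            CommitWindow::All => "",
        }
    }

    /// Value to bind for the placeholder, if the fragment has one.
    pub fn limit_param(&self) -> Option<i64> {
        match *self {
            CommitWindow::Last(n) => Some(i64::from(n)),
            CommitWindow::All => None,
        }
    }
}

/// Stand-in for a boxed `ToSql` value so the sketch needs no database crate.
#[derive(Debug, Clone, PartialEq)]
pub enum SqlValue {
    Int(i64),
    Text(String),
}

/// Hypothetical parameter builder: fixed params first, then the limit,
/// appended only when the window is bounded.
pub fn chart_params(chart_name: &str, window: CommitWindow) -> Vec<SqlValue> {
    let mut params = vec![SqlValue::Text(chart_name.to_owned())];
    if let Some(limit) = window.limit_param() {
        params.push(SqlValue::Int(limit));
    }
    params
}
```

Because the fragment and the parameter list are derived from the same enum value, the number of `?` placeholders and the number of bound values cannot drift apart.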

Signed-off-by: Claude <claude@anthropic.com>

The toolbar only ships absolute and `% of baseline` buttons. The
`?mode=delta` branch in both `UiQuery::mode` and `parseUrl` was dead
code that would have rendered the page with a non-functional mode.

Implementing the third mode is deferred to a follow-up — chose
deletion here so the parser surface matches what the UI actually
exposes, and unknown values fall through to `abs` like everything
else.

Signed-off-by: Claude <claude@anthropic.com>

Series labels are `engine:format`-shaped today and don't contain `|`,
but the comma delimiter for `?hidden=` was a fragile assumption: a
dataset variant with a comma in its name would silently corrupt the
URL state. Switch to `|`, which our internal labels never produce and
which browsers leave unescaped in a query string. (Strictly, `|` is in
neither the `unreserved` nor the `sub-delims` set of RFC 3986, so a
strict encoder would emit `%7C`; the `urlencode` allowlist below keeps
it literal.)

* `chart-init.js`: `parseHiddenParam` / `serializeHidden` use `|` via
  a `HIDDEN_DELIM` constant.
* `html.rs::urlencode` allowlist swaps `,` for `|` so the server
  round-trips a permalink with multiple hidden series without
  percent-encoding the delimiter.
* New test `ui_query_with_override_preserves_pipe_delimited_hidden`
  pins server/client wire agreement (`?hidden=a:b|c:d` survives
  `with_override`, the pipe is not `%7C`-encoded).
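
A rough server-side sketch of the wire agreement these bullets describe. `HIDDEN_DELIM` and `urlencode` are names from this commit, but the bodies below are guesses: the exact allowlist (in particular keeping `:` literal) and the Rust `parse_hidden` / `serialize_hidden` helpers are assumptions that mirror the JS functions by name only.

```rust
/// Delimiter between hidden series labels in `?hidden=`.
const HIDDEN_DELIM: char = '|';

/// Hypothetical Rust mirror of `parseHiddenParam`: split on the delimiter
/// and drop empty segments.
pub fn parse_hidden(raw: &str) -> Vec<String> {
    raw.split(HIDDEN_DELIM)
        .filter(|s| !s.is_empty())
        .map(str::to_owned)
        .collect()
}

/// Hypothetical Rust mirror of `serializeHidden`.
pub fn serialize_hidden(labels: &[&str]) -> String {
    labels.join("|")
}

/// Percent-encode everything except an allowlist. The allowlist here
/// (alphanumerics, `-_.~`, `:` and `|`) is an assumption; the commit only
/// says that `,` was swapped out for `|`.
pub fn urlencode(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    for byte in input.bytes() {
        match byte {
            b'A'..=b'Z' | b'a'..=b'z' | b'0'..=b'9'
            | b'-' | b'_' | b'.' | b'~' | b':' | b'|' => out.push(byte as char),
            _ => out.push_str(&format!("%{byte:02X}")),
        }
    }
    out
}
```

With this allowlist a comma inside a label is percent-encoded as data, while the pipe between labels stays literal, so splitting the raw query value on `|` before decoding recovers the original labels.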

Signed-off-by: Claude <claude@anthropic.com>

The toolbar's scope buttons are `25/50/100/250/all` but the slider
was `step=10`, so dragging it could land on 50 and 100 but never on
25 or 250. Lower the step to 5 (and `min` to 5 for symmetry) so the
slider can reach every preset value the buttons advertise. Also keeps
the slider granular enough to be useful as a custom selector.
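
The reachability argument is simple arithmetic: a range input can only rest on `min + k * step` for whole `k`, up to `max`. A small sketch to check it, where the helper name and the `max` of 1000 are assumptions (the commit states only the new `step` and `min`), and the `min = 10` used for the step-10 comparison is a hypothetical value:

```rust
/// Hypothetical check: can a range input with the given `min`, `step`
/// and `max` rest exactly on `target`?
pub fn slider_reaches(min: u32, step: u32, max: u32, target: u32) -> bool {
    step != 0 && target >= min && target <= max && (target - min) % step == 0
}

/// True when every scope-button preset is reachable by the slider.
pub fn reaches_all_presets(min: u32, step: u32, max: u32) -> bool {
    [25, 50, 100, 250]
        .iter()
        .all(|&preset| slider_reaches(min, step, max, preset))
}
```

With `min = 5` and `step = 5` every preset is a whole number of steps from the minimum, whereas a step of 10 starting from a multiple of 10 can never rest on 25.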

Snapshots refreshed for the changed slider attributes.

Signed-off-by: Claude <claude@anthropic.com>
connortsui20 added the changelog/feature (A new feature) label Apr 27, 2026, with Claude
connortsui20 merged commit 5083e80 into ct/benchmarks-v3 Apr 27, 2026
55 of 59 checks passed
connortsui20 deleted the claude/demo-ready-benchmarks-v3-H5ECI branch April 27, 2026 20:44
connortsui20 added a commit that referenced this pull request Apr 28, 2026
…7681)

Signed-off-by: Claude <claude@anthropic.com>
Co-authored-by: Claude <claude@anthropic.com>
Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>