[claude] chore(benchmarks-migrate): post-alpha cleanup nits#7671
Merged
connortsui20 merged 2 commits intoct/benchmarks-v3from Apr 27, 2026
Merged
[claude] chore(benchmarks-migrate): post-alpha cleanup nits#7671connortsui20 merged 2 commits intoct/benchmarks-v3from
connortsui20 merged 2 commits intoct/benchmarks-v3from
Conversation
Six small fixes left over from the v3 migration alpha: - Scale-factor canonicalization: a single `canonical_scale_factor` helper folds `"1"`, `"1.0"`, `"10"`, `"10.0"` into one form so the data.json.gz and file-sizes paths share `measurement_id`. - Flush-before-count: per-fact summary counters now bump only after the matching `Appender::append_record_batch` flush succeeds, so a partial failure no longer over-reports inserted rows. - Empty author/email/message → NULL: blank human-input strings now round-trip as SQL NULL instead of empty strings the UI rendered as blank cells. Schema columns made nullable; server reads coalesce. - Orphan WAL sweep: covered by a regression test pinning that `open_target_db` removes a `.wal` even when the main DB file was missing. - Random-access dataset: 2-part legacy records (no dataset in name) are now dropped instead of bucketed under a `"random access"` placeholder; 4-part records continue to extract `dataset/pattern` from the raw name. - File-sizes unknown id: matrix ids that don't appear in `KNOWN_FILE_SIZES_SUITES` now surface as `unknown:<id>` rather than masquerading as a real dataset name. Tests: covered each fix with an `rstest`-style or focused test in `tests/classifier.rs`, `tests/end_to_end.rs`, or a new private `migrate::tests` module. Verification: - cargo build -p vortex-bench-migrate - cargo test -p vortex-bench-migrate (all 65 tests pass) - cargo test -p vortex-bench-server (all 22 tests pass; schema / COALESCE change is server-safe) - cargo clippy -p vortex-bench-migrate --all-targets (clean) - cargo fmt on changed files (nightly fmt unavailable, stable run) Signed-off-by: Claude <noreply@anthropic.com>
The post-alpha cleanup made `bin_random_access` reject 2-part legacy shapes by returning None, but `classify_outcome` mapped any None bin to `Outcome::Unknown`, so those records counted against the 5% uncategorized gate. Intentional drops must Skip; only truly unrecognized shapes are Unknown. Reuses the existing `Skip::UnsupportedShape` variant, originally added for "get_group succeeded but classify_v2 didn't" — same semantic shape (recognized but unsupported). Regression test in `tests/classifier.rs` exercises `classify_outcome` (not just the top-level `classify`) so Skip vs Unknown is observable. Signed-off-by: Claude <noreply@anthropic.com>
6 tasks
connortsui20
added a commit
that referenced
this pull request
Apr 27, 2026
…7681) ## Summary Brings the v3 benchmarks website to a demo-ready state focused on the historical-comparison use case (Vortex vs other engines on the same commit, HEAD vs N commits ago, latest vs first as % delta). Single process, single binary; SSR `maud` + inline JSON `<script>` + Chart.js — no client-side framework, no build step, no post-load API round-trips. > Branch note: this PR was developed on the harness-assigned branch > `claude/demo-ready-benchmarks-v3-H5ECI` rather than the > `claude/benchmarks-v3-ui-historical-comparison` branch the task > request mentioned, because the session's harness pins the working > branch (`Develop on branch …`, `NEVER push to a different branch > without explicit permission`). ## CI note The `Rust tests (windows-x64)` job is failing on this PR but the **same job is also failing on the merge commit at the tip of `ct/benchmarks-v3`** (PR #7671's run, job id `73229326105`, the commit `8697731` we branched from). The base branch shipped with that failure tolerated, and our diff only touches `benchmarks-website/server/` (no Windows-specific paths, no FFI, no new dependencies on Windows-fragile crates), so this failure is pre-existing and not caused by the PR. CodSpeed flagged two `varbinview_zip` regressions in `vortex-array/` — also untouched by this PR. ## What's new * **Scoped commit window** — `?n=25|50|100|250|all`, default 100, server-side clamp to `[1, 1000]`. SQL splices in a `LIMIT ?` filter and binds the value as a parameter (consistent with the rest of the file's `params!`-style use); the unbounded path is a separate query so the plan stays clean. * **Group page** — `GET /group/{slug}` renders every chart in one group on a single screen. Each card embeds its own `<script id="chart-data-N">` payload + sibling `<canvas data-chart-index="N">`. `IntersectionObserver` defers `Chart` construction until the canvas scrolls into view (mobile-friendly + cheap for 22-chart TPC-H groups). * **Toolbar** — same component on `/chart/{slug}` and `/group/{slug}`. Scope buttons + slider, linear/log Y-axis, absolute / `% of baseline` mode. URL query string is canonical state; subtitle mirrors active state. Slider step is `5` so it can land on every preset value (`25`, `50`, `100`, `250`). * **Rich tooltip** — custom external HTML tooltip with `<short-sha> · YYYY-MM-DD` title; per-series rows render value with friendly unit (ns→µs→ms→s, B→KiB→MiB→GiB) and a coloured `% delta` vs the prior visible commit; footer carries the truncated commit message + a GitHub link. Document-level click closes. * **Legend → URL** — clicking a legend item rewrites `?hidden=engine:format|…` via `history.replaceState` (no back-button hostility). Permalinks reproduce the view. Delimiter is `|` so series names can contain `:` and `,` without escaping. * **Mobile** — `@media (max-width: 768px)`: single-column chart grid, toolbar wraps with ≥ 40 px touch targets, slider expands to fill the row, legend pops to the *top* of the chart so it doesn't push the chart off-screen on a phone. * **Landing search** — client-side filter input above the group list. * **/api/group/{slug}** — JSON sibling to the HTML route, returns every chart in the group with payloads inlined. ## What was *not* picked up from `planning/components/web-ui.md`'s deferred list Done now (moved out of deferred): - mobile redesign basics (single column, ≥ 40 px tap targets, toolbar wrap) - engine + series toggling (legend ↔ URL) - deep-link state (every toolbar control is URL-canonical) - group landing with the start of "filters" (client-side search) Still deferred (intentional): - per-commit drill-down page - ad-hoc SQL page - LTTB downsampling - engine name lookup table + curated colour palettes - summary cards (geomean ratios, rankings) - full-screen modal / zoom-pan - `?mode=delta` (compare-to-main) — parser branch dropped pending data shape work; toolbar surface today is only `abs / rel` ## Repro INGEST_BEARER_TOKEN=$(openssl rand -hex 32) \ VORTEX_BENCH_DB=./bench.duckdb \ cargo run --release -p vortex-bench-server Then open `http://localhost:3000/`, click any group name (now a link to `/group/{slug}`), or any chart inside, and play with the toolbar. Toggle a series in the legend and notice `?hidden=…` appear in the URL. Resize to phone width to confirm single-column layout, sticky toolbar wrapping, and legend-on-top. ## Snapshot diffs Three `.snap` files refreshed by this PR: - `landing_page.snap` — group names now link to `/group/{slug}`, search input added, `data-group-name` for client filter. - `chart_page_query.snap` — toolbar + indexed `<script id="chart-data-0">` + tooltip host element. - `group_page_query.snap` (new) — group page rendered against the fixture DB, `?n=100` pinned for stability. Run `INSTA_UPDATE=always cargo test -p vortex-bench-server` (or `cargo insta accept`) to refresh. ## Test plan - [x] `cargo build -p vortex-bench-server` - [x] `cargo test -p vortex-bench-server` — 41 tests pass (22 unit + 10 ingest + 9 web_ui) - [x] `cargo clippy -p vortex-bench-server --all-targets -- -D warnings` — clean - [x] `cargo +nightly fmt` — no diff - [ ] `./scripts/public-api.sh` — skipped per CLAUDE.md (leaf binary, not in workspace public-api lockfile set) - [ ] Manual screenshots — couldn't capture from the sandbox; the reviewer or follow-up should record landing / single chart with toolbar / group desktop / group mobile / tooltip open / log+rel. ## Follow-up review fixes (commits `7042f0d` … `da668a4`) - `7042f0d` — `LIMIT` value travels as a bound parameter (`LIMIT ?`) via `params_from_iter` instead of being interpolated into SQL. - `9c80bce` — drop the unused `?mode=delta` parser branch in both `UiQuery::mode` and `chart-init.js::parseUrl`. - `d156ab8` — `?hidden=` delimiter is now `|`; new test pins the server/client wire agreement. - `da668a4` — slider `step` lowered to 5 so it can land on every preset (`25/50/100/250`). ## Things explicitly NOT changed - `/api/ingest`, auth, schema, write paths. - DB migration (none added). - Existing routes (no renames). - v2 site at `benchmarks-website/server.js` etc — untouched. - Single-chart page still works; reuses the same `chart-init.js`. https://claude.ai/code/session_015Nc73ihs9TUdx7QzLUZudK --------- Signed-off-by: Claude <claude@anthropic.com> Co-authored-by: Claude <claude@anthropic.com>
connortsui20
added a commit
that referenced
this pull request
Apr 28, 2026
Six small fixes left over from the v3 migration alpha. All paths relative to `benchmarks-website/migrate/` unless noted. ## Fixes - **Scale-factor canonicalization** (`src/classifier.rs::bin_compression_size`, `src/migrate.rs::migrate_file_sizes`, helper in `src/v2.rs`): both paths now route the v2 SF string through `canonical_scale_factor`, which parses to `f64` and formats with no trailing zeros. Without this, `"1"` vs `"1.0"` and `"10"` vs `"10.0"` would produce different `dataset_variant` strings and prevent the data.json.gz and file-sizes-*.json.gz rows from sharing a `measurement_id`. - **Summary counter timing** (`src/migrate.rs::run`): per-fact counters used to be set from accumulator length *before* the flush, so a flush failure would print a summary that lied. Refactored into a `flush_all` helper that bumps `summary.<fact>_inserted` from the flushed `RecordBatch::num_rows()` only after each `Appender::append_record_batch` succeeds. - **Empty-string normalization in commits** (`src/commits.rs`, `benchmarks-website/server/src/schema.rs`, `benchmarks-website/server/src/api.rs`): `message`, `author_name`/`email`, `committer_name`/`email` now bind as `Option<String>` and store SQL `NULL` when v2 supplied an empty or whitespace-only string. Schema columns made nullable; server reads use `COALESCE(c.message, '')` so the existing `String` decoder still works. - **Orphan WAL cleanup** (`src/migrate.rs::open_target_db`): the existing code already attempts `remove_if_exists` on the `.wal` regardless of whether the main file was present; pinned the behavior with a regression test that stages an orphan `.wal` (no main file) and asserts the orphan bytes don't survive `open_target_db`. - **Random-access dataset extraction** (`src/classifier.rs::bin_random_access`): 4-part records `random-access/<dataset>/<pattern>/<format>-tokio-local-disk` continue to extract `dataset/pattern` from the raw name. 2-part legacy records carry no dataset and used to render under the placeholder `"random access"`; they're now dropped to keep the v3 dataset column meaningful. - **`migrate_file_sizes` dataset fallback** (`src/migrate.rs::migrate_file_sizes`): when the matrix id stripped from `file-sizes-<id>.json.gz` isn't on the `KNOWN_FILE_SIZES_SUITES` allowlist, the fallback now emits `unknown:<id>` so the UI clearly flags it instead of presenting it as a real dataset. ## Tests Each fix has a focused regression test (`rstest` parametrization where useful): - `tests/classifier.rs::compression_size_scale_factor_canonicalizes` covering `"1"`, `"1.0"`, `"10"`, `"10.0"`, `"0.1"`, whitespace, and `""`. - `tests/classifier.rs::unmapped_records_yield_none` extended with `random_access_2_part_legacy` and `random_access_3_part`. - `migrate::tests::flush_all_does_not_overcount_on_failure` (private unit test that drops `compression_times` to force the second flush to fail and asserts only the queries counter is set). - `tests/end_to_end.rs::summary_counts_match_actual_rows_on_success` (sister invariant for the success path). - `tests/end_to_end.rs::empty_author_email_stored_as_null`. - `tests/end_to_end.rs::open_target_db_removes_orphan_wal`. - `tests/end_to_end.rs::file_sizes_unknown_id_falls_back_to_unknown_prefix` and `file_sizes_known_id_uses_id_directly`. - `tests/end_to_end.rs::compression_size_data_and_file_sizes_merge_with_canonical_sf` (cross-path SF canonicalization end to end). ## Verification - `cargo build -p vortex-bench-migrate` — clean. - `cargo test -p vortex-bench-migrate` — 7 unit + 46 classifier + 12 end-to-end tests all pass. - `cargo test -p vortex-bench-server` — 6 unit + 10 ingest + 6 web_ui tests pass; schema and `COALESCE` changes are server-safe. - `cargo clippy -p vortex-bench-migrate --all-targets` — clean. - `cargo fmt` on changed files (nightly fmt unavailable in this sandbox; ran with stable, which is a no-op for the imports-granularity options the repo's `rustfmt.toml` gates on nightly). - Skipped `./scripts/public-api.sh`: migrate is a leaf binary outside the public-api lockfile set, and the only newly `pub` item is the internal `canonical_scale_factor` helper. Signed-off-by: Claude <noreply@anthropic.com> --- _Generated by [Claude Code](https://claude.ai/code/session_012XyYJRpcGFxmJXdTJuW8Ff)_ --------- Signed-off-by: Claude <noreply@anthropic.com> Co-authored-by: Claude <noreply@anthropic.com> Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>
connortsui20
added a commit
that referenced
this pull request
Apr 28, 2026
…7681) ## Summary Brings the v3 benchmarks website to a demo-ready state focused on the historical-comparison use case (Vortex vs other engines on the same commit, HEAD vs N commits ago, latest vs first as % delta). Single process, single binary; SSR `maud` + inline JSON `<script>` + Chart.js — no client-side framework, no build step, no post-load API round-trips. > Branch note: this PR was developed on the harness-assigned branch > `claude/demo-ready-benchmarks-v3-H5ECI` rather than the > `claude/benchmarks-v3-ui-historical-comparison` branch the task > request mentioned, because the session's harness pins the working > branch (`Develop on branch …`, `NEVER push to a different branch > without explicit permission`). ## CI note The `Rust tests (windows-x64)` job is failing on this PR but the **same job is also failing on the merge commit at the tip of `ct/benchmarks-v3`** (PR #7671's run, job id `73229326105`, the commit `8697731` we branched from). The base branch shipped with that failure tolerated, and our diff only touches `benchmarks-website/server/` (no Windows-specific paths, no FFI, no new dependencies on Windows-fragile crates), so this failure is pre-existing and not caused by the PR. CodSpeed flagged two `varbinview_zip` regressions in `vortex-array/` — also untouched by this PR. ## What's new * **Scoped commit window** — `?n=25|50|100|250|all`, default 100, server-side clamp to `[1, 1000]`. SQL splices in a `LIMIT ?` filter and binds the value as a parameter (consistent with the rest of the file's `params!`-style use); the unbounded path is a separate query so the plan stays clean. * **Group page** — `GET /group/{slug}` renders every chart in one group on a single screen. Each card embeds its own `<script id="chart-data-N">` payload + sibling `<canvas data-chart-index="N">`. `IntersectionObserver` defers `Chart` construction until the canvas scrolls into view (mobile-friendly + cheap for 22-chart TPC-H groups). * **Toolbar** — same component on `/chart/{slug}` and `/group/{slug}`. Scope buttons + slider, linear/log Y-axis, absolute / `% of baseline` mode. URL query string is canonical state; subtitle mirrors active state. Slider step is `5` so it can land on every preset value (`25`, `50`, `100`, `250`). * **Rich tooltip** — custom external HTML tooltip with `<short-sha> · YYYY-MM-DD` title; per-series rows render value with friendly unit (ns→µs→ms→s, B→KiB→MiB→GiB) and a coloured `% delta` vs the prior visible commit; footer carries the truncated commit message + a GitHub link. Document-level click closes. * **Legend → URL** — clicking a legend item rewrites `?hidden=engine:format|…` via `history.replaceState` (no back-button hostility). Permalinks reproduce the view. Delimiter is `|` so series names can contain `:` and `,` without escaping. * **Mobile** — `@media (max-width: 768px)`: single-column chart grid, toolbar wraps with ≥ 40 px touch targets, slider expands to fill the row, legend pops to the *top* of the chart so it doesn't push the chart off-screen on a phone. * **Landing search** — client-side filter input above the group list. * **/api/group/{slug}** — JSON sibling to the HTML route, returns every chart in the group with payloads inlined. ## What was *not* picked up from `planning/components/web-ui.md`'s deferred list Done now (moved out of deferred): - mobile redesign basics (single column, ≥ 40 px tap targets, toolbar wrap) - engine + series toggling (legend ↔ URL) - deep-link state (every toolbar control is URL-canonical) - group landing with the start of "filters" (client-side search) Still deferred (intentional): - per-commit drill-down page - ad-hoc SQL page - LTTB downsampling - engine name lookup table + curated colour palettes - summary cards (geomean ratios, rankings) - full-screen modal / zoom-pan - `?mode=delta` (compare-to-main) — parser branch dropped pending data shape work; toolbar surface today is only `abs / rel` ## Repro INGEST_BEARER_TOKEN=$(openssl rand -hex 32) \ VORTEX_BENCH_DB=./bench.duckdb \ cargo run --release -p vortex-bench-server Then open `http://localhost:3000/`, click any group name (now a link to `/group/{slug}`), or any chart inside, and play with the toolbar. Toggle a series in the legend and notice `?hidden=…` appear in the URL. Resize to phone width to confirm single-column layout, sticky toolbar wrapping, and legend-on-top. ## Snapshot diffs Three `.snap` files refreshed by this PR: - `landing_page.snap` — group names now link to `/group/{slug}`, search input added, `data-group-name` for client filter. - `chart_page_query.snap` — toolbar + indexed `<script id="chart-data-0">` + tooltip host element. - `group_page_query.snap` (new) — group page rendered against the fixture DB, `?n=100` pinned for stability. Run `INSTA_UPDATE=always cargo test -p vortex-bench-server` (or `cargo insta accept`) to refresh. ## Test plan - [x] `cargo build -p vortex-bench-server` - [x] `cargo test -p vortex-bench-server` — 41 tests pass (22 unit + 10 ingest + 9 web_ui) - [x] `cargo clippy -p vortex-bench-server --all-targets -- -D warnings` — clean - [x] `cargo +nightly fmt` — no diff - [ ] `./scripts/public-api.sh` — skipped per CLAUDE.md (leaf binary, not in workspace public-api lockfile set) - [ ] Manual screenshots — couldn't capture from the sandbox; the reviewer or follow-up should record landing / single chart with toolbar / group desktop / group mobile / tooltip open / log+rel. ## Follow-up review fixes (commits `7042f0d` … `da668a4`) - `7042f0d` — `LIMIT` value travels as a bound parameter (`LIMIT ?`) via `params_from_iter` instead of being interpolated into SQL. - `9c80bce` — drop the unused `?mode=delta` parser branch in both `UiQuery::mode` and `chart-init.js::parseUrl`. - `d156ab8` — `?hidden=` delimiter is now `|`; new test pins the server/client wire agreement. - `da668a4` — slider `step` lowered to 5 so it can land on every preset (`25/50/100/250`). ## Things explicitly NOT changed - `/api/ingest`, auth, schema, write paths. - DB migration (none added). - Existing routes (no renames). - v2 site at `benchmarks-website/server.js` etc — untouched. - Single-chart page still works; reuses the same `chart-init.js`. https://claude.ai/code/session_015Nc73ihs9TUdx7QzLUZudK --------- Signed-off-by: Claude <claude@anthropic.com> Co-authored-by: Claude <claude@anthropic.com> Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Six small fixes left over from the v3 migration alpha. All paths
relative to
benchmarks-website/migrate/unless noted.Fixes
Scale-factor canonicalization (
src/classifier.rs::bin_compression_size,src/migrate.rs::migrate_file_sizes, helper insrc/v2.rs): both pathsnow route the v2 SF string through
canonical_scale_factor, which parsesto
f64and formats with no trailing zeros. Without this,"1"vs"1.0"and"10"vs"10.0"would produce differentdataset_variantstrings and prevent the data.json.gz and file-sizes-*.json.gz rows from
sharing a
measurement_id.Summary counter timing (
src/migrate.rs::run): per-fact countersused to be set from accumulator length before the flush, so a flush
failure would print a summary that lied. Refactored into a
flush_allhelper that bumps
summary.<fact>_insertedfrom the flushedRecordBatch::num_rows()only after eachAppender::append_record_batchsucceeds.
Empty-string normalization in commits (
src/commits.rs,benchmarks-website/server/src/schema.rs,benchmarks-website/server/src/api.rs):message,author_name/email,committer_name/emailnow bind asOption<String>and store SQLNULLwhen v2 supplied an empty orwhitespace-only string. Schema columns made nullable; server reads
use
COALESCE(c.message, '')so the existingStringdecoder stillworks.
Orphan WAL cleanup (
src/migrate.rs::open_target_db): the existingcode already attempts
remove_if_existson the.walregardless ofwhether the main file was present; pinned the behavior with a
regression test that stages an orphan
.wal(no main file) andasserts the orphan bytes don't survive
open_target_db.Random-access dataset extraction (
src/classifier.rs::bin_random_access):4-part records
random-access/<dataset>/<pattern>/<format>-tokio-local-diskcontinue to extract
dataset/patternfrom the raw name. 2-part legacyrecords carry no dataset and used to render under the placeholder
"random access"; they're now dropped to keep the v3 dataset columnmeaningful.
migrate_file_sizesdataset fallback (src/migrate.rs::migrate_file_sizes):when the matrix id stripped from
file-sizes-<id>.json.gzisn't onthe
KNOWN_FILE_SIZES_SUITESallowlist, the fallback now emitsunknown:<id>so the UI clearly flags it instead of presenting itas a real dataset.
Tests
Each fix has a focused regression test (
rstestparametrization whereuseful):
tests/classifier.rs::compression_size_scale_factor_canonicalizescovering
"1","1.0","10","10.0","0.1", whitespace, and"".tests/classifier.rs::unmapped_records_yield_noneextended withrandom_access_2_part_legacyandrandom_access_3_part.migrate::tests::flush_all_does_not_overcount_on_failure(privateunit test that drops
compression_timesto force the second flushto fail and asserts only the queries counter is set).
tests/end_to_end.rs::summary_counts_match_actual_rows_on_success(sister invariant for the success path).
tests/end_to_end.rs::empty_author_email_stored_as_null.tests/end_to_end.rs::open_target_db_removes_orphan_wal.tests/end_to_end.rs::file_sizes_unknown_id_falls_back_to_unknown_prefixand
file_sizes_known_id_uses_id_directly.tests/end_to_end.rs::compression_size_data_and_file_sizes_merge_with_canonical_sf(cross-path SF canonicalization end to end).
Verification
cargo build -p vortex-bench-migrate— clean.cargo test -p vortex-bench-migrate— 7 unit + 46 classifier + 12end-to-end tests all pass.
cargo test -p vortex-bench-server— 6 unit + 10 ingest + 6 web_uitests pass; schema and
COALESCEchanges are server-safe.cargo clippy -p vortex-bench-migrate --all-targets— clean.cargo fmton changed files (nightly fmt unavailable in thissandbox; ran with stable, which is a no-op for the imports-granularity
options the repo's
rustfmt.tomlgates on nightly)../scripts/public-api.sh: migrate is a leaf binary outsidethe public-api lockfile set, and the only newly
pubitem is theinternal
canonical_scale_factorhelper.Signed-off-by: Claude noreply@anthropic.com
Generated by Claude Code