
[claude] feat(bench): emit v3 JSONL records and dual-write to bench server#7780

Merged
connortsui20 merged 3 commits into develop from claude/benchmarks-v3-emitter-split
May 4, 2026

Conversation

connortsui20 (Contributor) commented May 4, 2026

Summary

Prototype website: http://ec2-18-219-54-101.us-east-2.compute.amazonaws.com:3000/

This is the first step we need before cutting over to the new benchmarks website (#7643).

This PR lets the CI actions additionally POST data to a server (my EC2 instance for now). We want to confirm this actually works before relying on it for all of our CI.

Note that this does NOT change how the current benchmarks website works; it only adds a few extra steps on top of the existing flow.

For reviewers: even though this looks like 1k LoC, the logic is not hard to review; much of it is boilerplate you can skim.

The description below is AI-generated; read at your own discretion.

Details

Brings the v3 emitter and CI dual-write plumbing from ct/benchmarks-v3 onto develop without the v3 server/website code. CI continues to write v2 results to S3 unchanged; v3 ingest is a side channel that no-ops until the deploy track sets vars.V3_INGEST_URL.

This is item 2 ("CI ingestion wiring") of the v3 production-readiness checklist in benchmarks-website/planning/README.md. The v3 website itself ships in a separate PR off ct/benchmarks-v3 once dual-write is verified healthy in production.

What's included

Rust emitter (vortex-bench)

  • New vortex-bench/src/v3.rs: one record per kind (query_measurement, compression_time, compression_size, random_access_time, vector_search_run) plus a serde-tagged V3Record enum, JSONL writer, and insta snapshot tests. Field shapes match 02-contracts.md.
  • Dataset::v3_dataset_dims() (default (name(), None)) lets Public-BI map to (public-bi, <subset>).
  • compress and runner capture per-iteration timings and provide SqlBenchmarkRunner::v3_records().
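The JSONL shape described above can be sketched as follows. Only the `kind` tag values come from this PR; every other field name (`dataset`, `subset`, `time_us`, `bytes`) is an assumption for illustration, not the real schema from `02-contracts.md`:

```python
import io
import json

def write_jsonl(records, fp):
    """Write one tagged record per line, mirroring a serde-tagged enum."""
    for rec in records:
        fp.write(json.dumps(rec, separators=(",", ":")) + "\n")

# Hypothetical records; field names other than "kind" are made up.
records = [
    {"kind": "query_measurement", "dataset": "public-bi", "subset": "Arade", "time_us": 1234},
    {"kind": "compression_size", "dataset": "tpch", "subset": None, "bytes": 987654},
]

buf = io.StringIO()
write_jsonl(records, buf)
lines = buf.getvalue().splitlines()
# Each line is an independent JSON object discriminated by its "kind" tag,
# so a consumer can dispatch per line without an outer envelope.
```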

Benchmark binaries

  • compress-bench, datafusion-bench, duckdb-bench, lance-bench, random-access-bench, vector-search-bench all gain --gh-json-v3 <path>. Bare records, no envelope. The legacy -d gh-json -o ... flow is untouched.

bench-orchestrator

  • vx-bench run --gh-json-v3 <path> plumbs the flag through to the underlying benchmark binary.

scripts/post-ingest.py (Python 3, stdlib only)

  • Reads JSONL, fills the commit envelope from git show, wraps in {run_meta, commit, records}, POSTs to /api/ingest with Authorization: Bearer ${INGEST_BEARER_TOKEN}. Exits non-zero on 4xx/5xx. No retry/spool — deferred.
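A minimal stdlib-only sketch of that flow. The `/api/ingest` path, the bearer header, and the `{run_meta, commit, records}` envelope come from the description above; the exact contents of `run_meta` and `commit` are assumptions, and this is not the real `scripts/post-ingest.py`:

```python
import json
import urllib.request

def build_envelope(run_meta, commit, jsonl_text):
    """Wrap bare JSONL records in the {run_meta, commit, records} envelope."""
    records = [json.loads(line) for line in jsonl_text.splitlines() if line.strip()]
    return {"run_meta": run_meta, "commit": commit, "records": records}

def post_ingest(base_url, token, envelope):
    """POST the envelope with a bearer token; urlopen raises HTTPError on
    4xx/5xx, which the real script turns into a non-zero exit code."""
    req = urllib.request.Request(
        base_url + "/api/ingest",
        data=json.dumps(envelope).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status

# Hypothetical run_meta/commit values for illustration.
jsonl = '{"kind":"compression_time","time_us":42}\n'
env = build_envelope({"ci_run": "123"}, {"sha": "5bd762b"}, jsonl)
```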

Workflows

  • .github/workflows/bench.yml and sql-benchmarks.yml add --gh-json-v3 results.v3.jsonl to the bench runs and a follow-up "Ingest results to v3 server" step.
  • New .github/workflows/v3-commit-metadata.yml POSTs an empty envelope on every push to develop so the v3 commits dim stays populated even when no benchmark ran.
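The empty envelope that the commit-metadata workflow would POST might look like this; the inner `run_meta` and `commit` field names are assumptions, only the `records: []` shape is implied by the description above:

```python
import json

def empty_envelope(commit):
    """An envelope with no benchmark records, so the commits dim still
    gains a row for this push."""
    return {"run_meta": {"source": "commit-metadata"}, "commit": commit, "records": []}

# Hypothetical commit fields for illustration.
payload = json.dumps(empty_envelope({"sha": "e0a0527", "branch": "develop"}))
```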

What's NOT included (intentionally)

  • Anything under benchmarks-website/ — the v2 React/Node app stays in production unchanged.
  • Workspace member additions for benchmarks-website/server and benchmarks-website/migrate — those crates don't exist on develop yet.
  • .github/workflows/ci.yml and publish-bench-server.yml changes — they reference vortex-bench-server, which is also v3-server-only.

Risk

Zero. The v3 ingest step is gated on vars.V3_INGEST_URL != '' and continue-on-error: true. If the V3 server is down, the variable is unset, or the bearer secret is missing, the workflow no-ops and the v2 path keeps writing to S3 unchanged. The Rust emitter writes JSONL to a local file only; no network egress from the binaries themselves.

Verify

A CI run on this branch should show the new "Ingest results to v3 server" step running and POSTing successfully to the EC2 host at vars.V3_INGEST_URL.

Follow-up

The v3 website itself (server, migrator, web UI) ships in a separate PR off ct/benchmarks-v3 once dual-write is verified healthy in production. Outbox-style retry on failed POSTs is also a follow-up — not built until we observe a failure in the wild.

Test plan

  • cargo build -p vortex-bench — clean.
  • cargo nextest run -p vortex-bench — 49/49 pass, including 7 new v3 snapshot tests.
  • cargo build -p compress-bench -p datafusion-bench -p duckdb-bench -p lance-bench -p random-access-bench -p vector-search-bench — clean.
  • All six benchmark binaries print --gh-json-v3 <GH_JSON_V3> in --help.
  • python3 scripts/post-ingest.py --help — clean.
  • pytest bench-orchestrator/tests/test_executor.py — 5/5 pass, including 2 new gh_json_v3 tests.
  • cargo +nightly fmt --all — no diff.
  • cargo clippy --all-targets --all-features -p vortex-bench — clean.
  • cargo clippy --all-targets -p compress-bench -p datafusion-bench -p lance-bench -p random-access-bench -p vector-search-bench — clean. duckdb-bench skipped (transitively triggers a pre-existing cognitive_complexity lint in vortex-duckdb/src/convert/expr.rs:47, present on develop and unrelated to these changes).
  • yamllint --strict -c .yamllint.yaml on the three changed/new workflow files — clean.
  • ./scripts/public-api.sh — N/A. All touched Rust crates have publish = false.
  • Real round-trip against the EC2 host — to be verified once this branch triggers a CI bench run with V3_INGEST_URL set.

Generated by Claude Code

Brings the v3 emitter and CI dual-write plumbing from ct/benchmarks-v3
onto develop without the v3 server/website code. CI continues to write
v2 results to S3 unchanged; v3 ingest is gated on vars.V3_INGEST_URL
and `continue-on-error: true`, so when the variable is unset (or the
server is unreachable) the workflow no-ops.

vortex-bench:
- New `vortex-bench/src/v3.rs` with one record per `kind`
  (`query_measurement`, `compression_time`, `compression_size`,
  `random_access_time`, `vector_search_run`) plus a serde-tagged
  `V3Record` enum, JSONL writer, and snapshot tests.
- `Dataset::v3_dataset_dims()` (default `(name(), None)`) lets
  Public-BI map to `(public-bi, <subset>)`.
- `compress`/`runner` capture per-iteration timings and provide
  `SqlBenchmarkRunner::v3_records()`.

Benchmark binaries (`compress-bench`, `datafusion-bench`,
`duckdb-bench`, `lance-bench`, `random-access-bench`,
`vector-search-bench`) gain `--gh-json-v3 <path>` for JSONL emission
alongside the existing `gh-json` flow.

bench-orchestrator passes `--gh-json-v3` through `vx-bench run`.

`scripts/post-ingest.py` reads JSONL, fills the `commit` envelope
from `git show`, wraps in `{run_meta, commit, records}`, and POSTs
to `/api/ingest`. Stdlib only.

Workflows:
- `.github/workflows/bench.yml` and `sql-benchmarks.yml` add
  `--gh-json-v3 results.v3.jsonl` and a follow-up "Ingest results to
  v3 server" step.
- New `.github/workflows/v3-commit-metadata.yml` POSTs an empty
  envelope on every push to `develop` so the v3 `commits` dim stays
  populated.

Files intentionally NOT brought over: anything under
`benchmarks-website/`, the workspace member additions for the
v3 server, and workflows depending on the v3 server crate. The v3
website ships in a follow-up PR off `ct/benchmarks-v3` once
dual-write is healthy in production.

Signed-off-by: Claude <noreply@anthropic.com>
@connortsui20 connortsui20 added the changelog/feature A new feature label May 4, 2026 — with Claude
@connortsui20 connortsui20 changed the title feat(bench): emit v3 JSONL records and dual-write to bench server [claude] feat(bench): emit v3 JSONL records and dual-write to bench server May 4, 2026
@connortsui20 connortsui20 changed the base branch from develop to ct/benchmarks-v3 May 4, 2026 16:11
@connortsui20 connortsui20 changed the base branch from ct/benchmarks-v3 to develop May 4, 2026 16:11
@connortsui20 connortsui20 enabled auto-merge (squash) May 4, 2026 16:14
codspeed-hq (Bot) commented May 4, 2026

Merging this PR will degrade performance by 17.47%

⚠️ Unknown Walltime execution environment detected

Using the Walltime instrument on standard Hosted Runners will lead to inconsistent data.

For the most accurate results, we recommend using CodSpeed Macro Runners: bare-metal machines fine-tuned for performance measurement consistency.

⚡ 10 improved benchmarks
❌ 4 regressed benchmarks
✅ 1155 untouched benchmarks
⏩ 138 skipped benchmarks [1]

⚠️ Please fix the performance issues or acknowledge them on CodSpeed.

Performance Changes

| Mode | Benchmark | BASE | HEAD | Efficiency |
|---|---|---|---|---|
| Simulation | take_10k_first_chunk_only | 270.8 µs | 226 µs | +19.84% |
| Simulation | take_10k_dispersed | 284.8 µs | 239.8 µs | +18.76% |
| Simulation | patched_take_10k_dispersed | 316 µs | 285.6 µs | +10.64% |
| Simulation | patched_take_10k_first_chunk_only | 302.4 µs | 272.1 µs | +11.14% |
| Simulation | patched_take_10k_adversarial | 259 µs | 228.7 µs | +13.23% |
| Simulation | decompress_rd[f64, (10000, 0.1)] | 138.7 µs | 122.2 µs | +13.48% |
| Simulation | decompress_rd[f32, (100000, 0.01)] | 495 µs | 582.7 µs | -15.06% |
| Simulation | decompress_rd[f64, (100000, 0.01)] | 842.6 µs | 1,020.7 µs | -17.45% |
| Simulation | decompress_rd[f64, (10000, 0.0)] | 138.5 µs | 122.2 µs | +13.33% |
| Simulation | decompress_rd[f32, (100000, 0.0)] | 583.4 µs | 495.8 µs | +17.67% |
| Simulation | decompress_rd[f64, (100000, 0.1)] | 842.5 µs | 1,020.9 µs | -17.47% |
| Simulation | decompress_rd[f64, (10000, 0.01)] | 138.6 µs | 122.1 µs | +13.56% |
| Simulation | decompress_rd[f32, (100000, 0.1)] | 495 µs | 582.7 µs | -15.06% |
| Simulation | decompress_rd[f32, (10000, 0.0)] | 94.5 µs | 85.9 µs | +10.08% |

Comparing claude/benchmarks-v3-emitter-split (5bd762b) with develop (e0a0527)

Open in CodSpeed

Footnotes

  1. 138 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, archive them on CodSpeed to remove them from the performance reports.

@connortsui20 connortsui20 marked this pull request as draft May 4, 2026 19:15
auto-merge was automatically disabled May 4, 2026 19:15

Pull request was converted to draft

Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>
@connortsui20 connortsui20 marked this pull request as ready for review May 4, 2026 19:51
connortsui20 (Contributor, Author) commented:

I'd like to just merge it since it should have 0 effect on the existing code, and if it does break I can revert it quickly

Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>
@connortsui20 connortsui20 enabled auto-merge (squash) May 4, 2026 20:04
@connortsui20 connortsui20 merged commit d3ff1f1 into develop May 4, 2026
61 checks passed
@connortsui20 connortsui20 deleted the claude/benchmarks-v3-emitter-split branch May 4, 2026 20:07

Labels

changelog/feature A new feature
