
[claude] feat(bench): emit v3 JSONL records and dual-write to bench server#7780

Merged
connortsui20 merged 3 commits into develop from claude/benchmarks-v3-emitter-split
May 4, 2026

Conversation

connortsui20 (Contributor) commented May 4, 2026

Summary

Prototype website: http://ec2-18-219-54-101.us-east-2.compute.amazonaws.com:3000/

This is the first step we need before cutting over to the new benchmarks website (#7643).

This PR lets the CI actions additionally POST data to a server (my EC2 instance for now). We want to confirm this actually works before relying on it for all of our CI.

Note that this does NOT change how the current benchmarks website works; it only adds a few extra steps on top of the existing flow.

For reviewers: even though this looks like 1k LoC, the logic is not hard to review; much of it is boilerplate you can skim.

The description below is AI-generated; read at your own discretion.

Details

Brings the v3 emitter and CI dual-write plumbing from ct/benchmarks-v3 onto develop without the v3 server/website code. CI continues to write v2 results to S3 unchanged; v3 ingest is a side channel that no-ops until the deploy track sets vars.V3_INGEST_URL.

This is item 2 ("CI ingestion wiring") of the v3 production-readiness checklist in benchmarks-website/planning/README.md. The v3 website itself ships in a separate PR off ct/benchmarks-v3 once dual-write is verified healthy in production.

What's included

Rust emitter (vortex-bench)

  • New vortex-bench/src/v3.rs: one record per kind (query_measurement, compression_time, compression_size, random_access_time, vector_search_run) plus a serde-tagged V3Record enum, JSONL writer, and insta snapshot tests. Field shapes match 02-contracts.md.
  • Dataset::v3_dataset_dims() (default (name(), None)) lets Public-BI map to (public-bi, <subset>).
  • compress and runner capture per-iteration timings and provide SqlBenchmarkRunner::v3_records().
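The JSONL shape described above can be sketched as follows. Only the `kind` tag values come from this PR; every other field name (`dataset`, `subset`, `time_us`, `bytes`) is an assumption for illustration, not the real schema from `02-contracts.md`:

```python
import io
import json

def write_jsonl(records, fp):
    """Write one tagged record per line, mirroring a serde-tagged enum."""
    for rec in records:
        fp.write(json.dumps(rec, separators=(",", ":")) + "\n")

# Hypothetical records; field names other than "kind" are made up.
records = [
    {"kind": "query_measurement", "dataset": "public-bi", "subset": "Arade", "time_us": 1234},
    {"kind": "compression_size", "dataset": "tpch", "subset": None, "bytes": 987654},
]

buf = io.StringIO()
write_jsonl(records, buf)
lines = buf.getvalue().splitlines()
# Each line is an independent JSON object discriminated by its "kind" tag,
# so a consumer can dispatch per line without an outer envelope.
```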

Benchmark binaries

  • compress-bench, datafusion-bench, duckdb-bench, lance-bench, random-access-bench, vector-search-bench all gain --gh-json-v3 <path>. Bare records, no envelope. The legacy -d gh-json -o ... flow is untouched.

bench-orchestrator

  • vx-bench run --gh-json-v3 <path> plumbs the flag through to the underlying benchmark binary.

scripts/post-ingest.py (Python 3, stdlib only)

  • Reads JSONL, fills the commit envelope from git show, wraps in {run_meta, commit, records}, POSTs to /api/ingest with Authorization: Bearer ${INGEST_BEARER_TOKEN}. Exits non-zero on 4xx/5xx. No retry/spool — deferred.
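A minimal stdlib-only sketch of that flow. The `/api/ingest` path, the bearer header, and the `{run_meta, commit, records}` envelope come from the description above; the exact contents of `run_meta` and `commit` are assumptions, and this is not the real `scripts/post-ingest.py`:

```python
import json
import urllib.request

def build_envelope(run_meta, commit, jsonl_text):
    """Wrap bare JSONL records in the {run_meta, commit, records} envelope."""
    records = [json.loads(line) for line in jsonl_text.splitlines() if line.strip()]
    return {"run_meta": run_meta, "commit": commit, "records": records}

def post_ingest(base_url, token, envelope):
    """POST the envelope with a bearer token; urlopen raises HTTPError on
    4xx/5xx, which the real script turns into a non-zero exit code."""
    req = urllib.request.Request(
        base_url + "/api/ingest",
        data=json.dumps(envelope).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status

# Hypothetical run_meta/commit values for illustration.
jsonl = '{"kind":"compression_time","time_us":42}\n'
env = build_envelope({"ci_run": "123"}, {"sha": "5bd762b"}, jsonl)
```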

Workflows

  • .github/workflows/bench.yml and sql-benchmarks.yml add --gh-json-v3 results.v3.jsonl to the bench runs and a follow-up "Ingest results to v3 server" step.
  • New .github/workflows/v3-commit-metadata.yml POSTs an empty envelope on every push to develop so the v3 commits dim stays populated even when no benchmark ran.
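The empty envelope that the commit-metadata workflow would POST might look like this; the inner `run_meta` and `commit` field names are assumptions, only the `records: []` shape is implied by the description above:

```python
import json

def empty_envelope(commit):
    """An envelope with no benchmark records, so the commits dim still
    gains a row for this push."""
    return {"run_meta": {"source": "commit-metadata"}, "commit": commit, "records": []}

# Hypothetical commit fields for illustration.
payload = json.dumps(empty_envelope({"sha": "e0a0527", "branch": "develop"}))
```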

What's NOT included (intentionally)

  • Anything under benchmarks-website/ — the v2 React/Node app stays in production unchanged.
  • Workspace member additions for benchmarks-website/server and benchmarks-website/migrate — those crates don't exist on develop yet.
  • .github/workflows/ci.yml and publish-bench-server.yml changes — they reference vortex-bench-server, which is also v3-server-only.

Risk

Zero. The v3 ingest step is gated on vars.V3_INGEST_URL != '' and continue-on-error: true. If the V3 server is down, the variable is unset, or the bearer secret is missing, the workflow no-ops and the v2 path keeps writing to S3 unchanged. The Rust emitter writes JSONL to a local file only; no network egress from the binaries themselves.

Verify

A CI run on this branch should show the new "Ingest results to v3 server" step running and POSTing successfully to the EC2 host at vars.V3_INGEST_URL.

Follow-up

The v3 website itself (server, migrator, web UI) ships in a separate PR off ct/benchmarks-v3 once dual-write is verified healthy in production. Outbox-style retry on failed POSTs is also a follow-up — not built until we observe a failure in the wild.

Test plan

  • cargo build -p vortex-bench — clean.
  • cargo nextest run -p vortex-bench — 49/49 pass, including 7 new v3 snapshot tests.
  • cargo build -p compress-bench -p datafusion-bench -p duckdb-bench -p lance-bench -p random-access-bench -p vector-search-bench — clean.
  • All six benchmark binaries print --gh-json-v3 <GH_JSON_V3> in --help.
  • python3 scripts/post-ingest.py --help — clean.
  • pytest bench-orchestrator/tests/test_executor.py — 5/5 pass, including 2 new gh_json_v3 tests.
  • cargo +nightly fmt --all — no diff.
  • cargo clippy --all-targets --all-features -p vortex-bench — clean.
  • cargo clippy --all-targets -p compress-bench -p datafusion-bench -p lance-bench -p random-access-bench -p vector-search-bench — clean. duckdb-bench skipped (transitively triggers a pre-existing cognitive_complexity lint in vortex-duckdb/src/convert/expr.rs:47, present on develop and unrelated to these changes).
  • yamllint --strict -c .yamllint.yaml on the three changed/new workflow files — clean.
  • ./scripts/public-api.sh — N/A. All touched Rust crates have publish = false.
  • Real round-trip against the EC2 host — to be verified once this branch triggers a CI bench run with V3_INGEST_URL set.

Generated by Claude Code

Brings the v3 emitter and CI dual-write plumbing from ct/benchmarks-v3
onto develop without the v3 server/website code. CI continues to write
v2 results to S3 unchanged; v3 ingest is gated on vars.V3_INGEST_URL
and `continue-on-error: true`, so when the variable is unset (or the
server is unreachable) the workflow no-ops.

vortex-bench:
- New `vortex-bench/src/v3.rs` with one record per `kind`
  (`query_measurement`, `compression_time`, `compression_size`,
  `random_access_time`, `vector_search_run`) plus a serde-tagged
  `V3Record` enum, JSONL writer, and snapshot tests.
- `Dataset::v3_dataset_dims()` (default `(name(), None)`) lets
  Public-BI map to `(public-bi, <subset>)`.
- `compress`/`runner` capture per-iteration timings and provide
  `SqlBenchmarkRunner::v3_records()`.

Benchmark binaries (`compress-bench`, `datafusion-bench`,
`duckdb-bench`, `lance-bench`, `random-access-bench`,
`vector-search-bench`) gain `--gh-json-v3 <path>` for JSONL emission
alongside the existing `gh-json` flow.

bench-orchestrator passes `--gh-json-v3` through `vx-bench run`.

`scripts/post-ingest.py` reads JSONL, fills the `commit` envelope
from `git show`, wraps in `{run_meta, commit, records}`, and POSTs
to `/api/ingest`. Stdlib only.

Workflows:
- `.github/workflows/bench.yml` and `sql-benchmarks.yml` add
  `--gh-json-v3 results.v3.jsonl` and a follow-up "Ingest results to
  v3 server" step.
- New `.github/workflows/v3-commit-metadata.yml` POSTs an empty
  envelope on every push to `develop` so the v3 `commits` dim stays
  populated.

Files intentionally NOT brought over: anything under
`benchmarks-website/`, the workspace member additions for the
v3 server, and workflows depending on the v3 server crate. The v3
website ships in a follow-up PR off `ct/benchmarks-v3` once
dual-write is healthy in production.

Signed-off-by: Claude <noreply@anthropic.com>
@connortsui20 connortsui20 added the changelog/feature A new feature label May 4, 2026 — with Claude
@connortsui20 connortsui20 changed the title feat(bench): emit v3 JSONL records and dual-write to bench server [claude] feat(bench): emit v3 JSONL records and dual-write to bench server May 4, 2026
@connortsui20 connortsui20 changed the base branch from develop to ct/benchmarks-v3 May 4, 2026 16:11
@connortsui20 connortsui20 changed the base branch from ct/benchmarks-v3 to develop May 4, 2026 16:11
@connortsui20 connortsui20 enabled auto-merge (squash) May 4, 2026 16:14
codspeed-hq (Bot) commented May 4, 2026

Merging this PR will degrade performance by 17.47%

⚠️ Unknown Walltime execution environment detected

Using the Walltime instrument on standard Hosted Runners will lead to inconsistent data.

For the most accurate results, we recommend using CodSpeed Macro Runners: bare-metal machines fine-tuned for performance measurement consistency.

⚡ 10 improved benchmarks
❌ 4 regressed benchmarks
✅ 1155 untouched benchmarks
⏩ 138 skipped benchmarks [1]

⚠️ Please fix the performance issues or acknowledge them on CodSpeed.

Performance Changes

| Mode | Benchmark | BASE | HEAD | Efficiency |
|---|---|---|---|---|
| Simulation | take_10k_first_chunk_only | 270.8 µs | 226 µs | +19.84% |
| Simulation | take_10k_dispersed | 284.8 µs | 239.8 µs | +18.76% |
| Simulation | patched_take_10k_dispersed | 316 µs | 285.6 µs | +10.64% |
| Simulation | patched_take_10k_first_chunk_only | 302.4 µs | 272.1 µs | +11.14% |
| Simulation | patched_take_10k_adversarial | 259 µs | 228.7 µs | +13.23% |
| Simulation | decompress_rd[f64, (10000, 0.1)] | 138.7 µs | 122.2 µs | +13.48% |
| Simulation | decompress_rd[f32, (100000, 0.01)] | 495 µs | 582.7 µs | -15.06% |
| Simulation | decompress_rd[f64, (100000, 0.01)] | 842.6 µs | 1,020.7 µs | -17.45% |
| Simulation | decompress_rd[f64, (10000, 0.0)] | 138.5 µs | 122.2 µs | +13.33% |
| Simulation | decompress_rd[f32, (100000, 0.0)] | 583.4 µs | 495.8 µs | +17.67% |
| Simulation | decompress_rd[f64, (100000, 0.1)] | 842.5 µs | 1,020.9 µs | -17.47% |
| Simulation | decompress_rd[f64, (10000, 0.01)] | 138.6 µs | 122.1 µs | +13.56% |
| Simulation | decompress_rd[f32, (100000, 0.1)] | 495 µs | 582.7 µs | -15.06% |
| Simulation | decompress_rd[f32, (10000, 0.0)] | 94.5 µs | 85.9 µs | +10.08% |

Comparing claude/benchmarks-v3-emitter-split (5bd762b) with develop (e0a0527)

Open in CodSpeed

Footnotes

  1. 138 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, archive them on CodSpeed to remove them from the performance reports.

@connortsui20 connortsui20 marked this pull request as draft May 4, 2026 19:15
auto-merge was automatically disabled May 4, 2026 19:15

Pull request was converted to draft

Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>
@connortsui20 connortsui20 marked this pull request as ready for review May 4, 2026 19:51
connortsui20 (Contributor, Author) commented:

I'd like to just merge it since it should have 0 effect on the existing code, and if it does break I can revert it quickly

Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>
@connortsui20 connortsui20 enabled auto-merge (squash) May 4, 2026 20:04
@connortsui20 connortsui20 merged commit d3ff1f1 into develop May 4, 2026
61 checks passed
@connortsui20 connortsui20 deleted the claude/benchmarks-v3-emitter-split branch May 4, 2026 20:07

Labels

changelog/feature A new feature
