[claude] feat(bench): emit v3 JSONL records and dual-write to bench server #7780

connortsui20 merged 3 commits into develop

Conversation
Brings the v3 emitter and CI dual-write plumbing from ct/benchmarks-v3
onto develop without the v3 server/website code. CI continues to write
v2 results to S3 unchanged; v3 ingest is gated on `vars.V3_INGEST_URL`
and `continue-on-error: true`, so when the variable is unset (or the
server is unreachable) the workflow no-ops.
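The gate-then-swallow behavior described above can be sketched as a small Python decision function (the function name and return values are illustrative, not taken from the workflow):

```python
def ingest_outcome(v3_ingest_url: str, post) -> str:
    """Mimic the workflow gate: skip when the URL variable is unset or empty,
    and swallow any POST failure (the continue-on-error: true behavior)."""
    if not v3_ingest_url:
        return "skipped"           # vars.V3_INGEST_URL unset: step never runs
    try:
        post(v3_ingest_url)
        return "ingested"
    except Exception:
        return "failed-but-green"  # step fails, workflow stays green

# Example: an unreachable server does not break the build.
def unreachable(url):
    raise ConnectionError(f"cannot reach {url}")

print(ingest_outcome("", unreachable))                          # skipped
print(ingest_outcome("http://bench.example/api", unreachable))  # failed-but-green
```

The point is that neither an unset variable nor a dead server can fail the build; only a reachable server produces an ingest.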
vortex-bench:
- New `vortex-bench/src/v3.rs` with one record per `kind`
(`query_measurement`, `compression_time`, `compression_size`,
`random_access_time`, `vector_search_run`) plus a serde-tagged
`V3Record` enum, JSONL writer, and snapshot tests.
- `Dataset::v3_dataset_dims()` (default `(name(), None)`) lets
Public-BI map to `(public-bi, <subset>)`.
- `compress`/`runner` capture per-iteration timings and provide
`SqlBenchmarkRunner::v3_records()`.
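As a rough illustration of what a kind-tagged JSONL stream looks like, here is a Python sketch; only the `kind` tag mirrors the PR's scheme, the other field names are invented, and the real shapes live in the Rust `V3Record` enum:

```python
import io
import json

# Hypothetical records; one JSON object per line, discriminated by "kind".
records = [
    {"kind": "compression_time", "dataset": "public-bi", "seconds": 1.25},
    {"kind": "compression_size", "dataset": "public-bi", "bytes": 123456},
]

# Write JSONL: serialize each record onto its own line.
buf = io.StringIO()
for rec in records:
    buf.write(json.dumps(rec) + "\n")

# Read it back and dispatch on the tag, as a serde-tagged enum would.
parsed = [json.loads(line) for line in buf.getvalue().splitlines()]
kinds = [rec["kind"] for rec in parsed]
print(kinds)  # ['compression_time', 'compression_size']
```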
Benchmark binaries (`compress-bench`, `datafusion-bench`,
`duckdb-bench`, `lance-bench`, `random-access-bench`,
`vector-search-bench`) gain `--gh-json-v3 <path>` for JSONL emission
alongside the existing `gh-json` flow.
bench-orchestrator passes `--gh-json-v3` through `vx-bench run`.
`scripts/post-ingest.py` reads JSONL, fills the `commit` envelope
from `git show`, wraps in `{run_meta, commit, records}`, and POSTs
to `/api/ingest`. Stdlib only.
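A minimal stdlib-only sketch of that flow (the `git show` format string, the `run_meta` contents, and the helper names are assumptions, not the script's actual code):

```python
import json
import subprocess
import urllib.request

def commit_from_git() -> dict:
    """Read commit metadata from `git show` (format string is an assumption)."""
    sha, author, subject = subprocess.run(
        ["git", "show", "-s", "--format=%H%n%an%n%s"],
        capture_output=True, text=True, check=True,
    ).stdout.splitlines()[:3]
    return {"sha": sha, "author": author, "message": subject}

def build_envelope(jsonl_text: str, commit: dict) -> dict:
    """Wrap bare JSONL records in the {run_meta, commit, records} envelope."""
    records = [json.loads(line) for line in jsonl_text.splitlines() if line.strip()]
    return {"run_meta": {"source": "ci"}, "commit": commit, "records": records}

def build_request(base_url: str, token: str, envelope: dict) -> urllib.request.Request:
    """POST body plus bearer header; pass the result to urllib.request.urlopen."""
    return urllib.request.Request(
        base_url.rstrip("/") + "/api/ingest",
        data=json.dumps(envelope).encode(),
        headers={"Authorization": f"Bearer {token}",
                 "Content-Type": "application/json"},
        method="POST",
    )
```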
Workflows:
- `.github/workflows/bench.yml` and `sql-benchmarks.yml` add
`--gh-json-v3 results.v3.jsonl` and a follow-up "Ingest results to
v3 server" step.
- New `.github/workflows/v3-commit-metadata.yml` POSTs an empty
envelope on every push to `develop` so the v3 `commits` dim stays
populated.
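An "empty envelope" here presumably means the same payload shape with no records; a hypothetical sketch:

```python
import json

def empty_envelope(sha: str, branch: str) -> str:
    """Commit metadata only: keeps the commits dimension populated
    on pushes where no benchmark produced records."""
    return json.dumps({
        "run_meta": {"trigger": "push", "branch": branch},
        "commit": {"sha": sha},
        "records": [],
    })

payload = json.loads(empty_envelope("e0a0527", "develop"))
print(len(payload["records"]))  # 0
```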
Files intentionally NOT brought over: anything under
`benchmarks-website/`, the workspace member additions for the
v3 server, and workflows depending on the v3 server crate. The v3
website ships in a follow-up PR off `ct/benchmarks-v3` once
dual-write is healthy in production.
Signed-off-by: Claude <noreply@anthropic.com>
Merging this PR will degrade performance by 17.47%
| Mode | Benchmark | BASE | HEAD | Efficiency |
|---|---|---|---|---|
| ⚡ Simulation | take_10k_first_chunk_only | 270.8 µs | 226 µs | +19.84% |
| ⚡ Simulation | take_10k_dispersed | 284.8 µs | 239.8 µs | +18.76% |
| ⚡ Simulation | patched_take_10k_dispersed | 316 µs | 285.6 µs | +10.64% |
| ⚡ Simulation | patched_take_10k_first_chunk_only | 302.4 µs | 272.1 µs | +11.14% |
| ⚡ Simulation | patched_take_10k_adversarial | 259 µs | 228.7 µs | +13.23% |
| ⚡ Simulation | decompress_rd[f64, (10000, 0.1)] | 138.7 µs | 122.2 µs | +13.48% |
| ❌ Simulation | decompress_rd[f32, (100000, 0.01)] | 495 µs | 582.7 µs | -15.06% |
| ❌ Simulation | decompress_rd[f64, (100000, 0.01)] | 842.6 µs | 1,020.7 µs | -17.45% |
| ⚡ Simulation | decompress_rd[f64, (10000, 0.0)] | 138.5 µs | 122.2 µs | +13.33% |
| ⚡ Simulation | decompress_rd[f32, (100000, 0.0)] | 583.4 µs | 495.8 µs | +17.67% |
| ❌ Simulation | decompress_rd[f64, (100000, 0.1)] | 842.5 µs | 1,020.9 µs | -17.47% |
| ⚡ Simulation | decompress_rd[f64, (10000, 0.01)] | 138.6 µs | 122.1 µs | +13.56% |
| ❌ Simulation | decompress_rd[f32, (100000, 0.1)] | 495 µs | 582.7 µs | -15.06% |
| ⚡ Simulation | decompress_rd[f32, (10000, 0.0)] | 94.5 µs | 85.9 µs | +10.08% |
Comparing claude/benchmarks-v3-emitter-split (5bd762b) with develop (e0a0527)
Footnotes
- 138 benchmarks were skipped, so the baseline results were used instead.
Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>

I'd like to just merge it since it should have no effect on the existing code, and if it does break I can revert it quickly.
Summary
Prototype website: http://ec2-18-219-54-101.us-east-2.compute.amazonaws.com:3000/
This is the first step we should take before we cut over to the new benchmarks website in #7643.
This PR allows the CI actions to additionally post data to a server (on my EC2 instance for now). We want to check that this actually works before we start using this for all of our CI.
Note that this does NOT change how the current benchmarks website works, as this just does a few extra things on top of that.
Also for reviewers: even though this looks like 1k LoC, the logic here is not that hard to review; a lot of it is boilerplate you can skim over.
Below is a bunch of AI-generated description: read at your own discretion.
Details
Brings the v3 emitter and CI dual-write plumbing from `ct/benchmarks-v3` onto `develop` without the v3 server/website code. CI continues to write v2 results to S3 unchanged; v3 ingest is a side channel that no-ops until the deploy track sets `vars.V3_INGEST_URL`.

This is item 2 ("CI ingestion wiring") of the v3 production-readiness checklist in `benchmarks-website/planning/README.md`. The v3 website itself ships in a separate PR off `ct/benchmarks-v3` once dual-write is verified healthy in production.

What's included
Rust emitter (`vortex-bench`)
- `vortex-bench/src/v3.rs`: one record per `kind` (`query_measurement`, `compression_time`, `compression_size`, `random_access_time`, `vector_search_run`) plus a serde-tagged `V3Record` enum, JSONL writer, and `insta` snapshot tests. Field shapes match `02-contracts.md`.
- `Dataset::v3_dataset_dims()` (default `(name(), None)`) lets Public-BI map to `(public-bi, <subset>)`.
- `compress` and `runner` capture per-iteration timings and provide `SqlBenchmarkRunner::v3_records()`.

Benchmark binaries
`compress-bench`, `datafusion-bench`, `duckdb-bench`, `lance-bench`, `random-access-bench`, and `vector-search-bench` all gain `--gh-json-v3 <path>`. Bare records, no envelope. The legacy `-d gh-json -o ...` flow is untouched.

bench-orchestrator
`vx-bench run --gh-json-v3 <path>` plumbs the flag through to the underlying benchmark binary.

`scripts/post-ingest.py` (Python 3, stdlib only)
Fills the `commit` envelope from `git show`, wraps records in `{run_meta, commit, records}`, and POSTs to `/api/ingest` with `Authorization: Bearer ${INGEST_BEARER_TOKEN}`. Exits non-zero on 4xx/5xx. No retry/spool yet; deferred.

Workflows
- `.github/workflows/bench.yml` and `sql-benchmarks.yml` add `--gh-json-v3 results.v3.jsonl` to the bench runs and a follow-up "Ingest results to v3 server" step.
- `.github/workflows/v3-commit-metadata.yml` POSTs an empty envelope on every push to `develop` so the v3 `commits` dim stays populated even when no benchmark ran.

What's NOT included (intentionally)
- `benchmarks-website/`: the v2 React/Node app stays in production unchanged.
- `benchmarks-website/server` and `benchmarks-website/migrate`: those crates don't exist on `develop` yet.
- `.github/workflows/ci.yml` and `publish-bench-server.yml` changes: they reference `vortex-bench-server`, which is also v3-server-only.

Risk
Zero. The v3 ingest step is gated on `vars.V3_INGEST_URL != ''` and `continue-on-error: true`. If the v3 server is down, the variable is unset, or the bearer secret is missing, the workflow no-ops and the v2 path keeps writing to S3 unchanged. The Rust emitter writes JSONL to a local file only; no network egress from the binaries themselves.

Verify
A CI run on this branch should show the new "Ingest results to v3 server" step running and POSTing successfully to the EC2 host at `vars.V3_INGEST_URL`.

Follow-up
The v3 website itself (server, migrator, web UI) ships in a separate PR off `ct/benchmarks-v3` once dual-write is verified healthy in production. Outbox-style retry on failed POSTs is also a follow-up; it won't be built until we observe a failure in the wild.

Test plan
- `cargo build -p vortex-bench`: clean.
- `cargo nextest run -p vortex-bench`: 49/49 pass, including 7 new v3 snapshot tests.
- `cargo build -p compress-bench -p datafusion-bench -p duckdb-bench -p lance-bench -p random-access-bench -p vector-search-bench`: clean. Each binary shows `--gh-json-v3 <GH_JSON_V3>` in `--help`.
- `python3 scripts/post-ingest.py --help`: clean.
- `pytest bench-orchestrator/tests/test_executor.py`: 5/5 pass, including 2 new `gh_json_v3` tests.
- `cargo +nightly fmt --all`: no diff.
- `cargo clippy --all-targets --all-features -p vortex-bench`: clean.
- `cargo clippy --all-targets -p compress-bench -p datafusion-bench -p lance-bench -p random-access-bench -p vector-search-bench`: clean. `duckdb-bench` skipped (transitively triggers a pre-existing `cognitive_complexity` lint in `vortex-duckdb/src/convert/expr.rs:47`, present on `develop` and unrelated to these changes).
- `yamllint --strict -c .yamllint.yaml` on the three changed/new workflow files: clean.
- `./scripts/public-api.sh`: N/A; all touched Rust crates have `publish = false`.
- End-to-end ingest requires `V3_INGEST_URL` set.

Generated by Claude Code