build(bench): add quick mode + multi-n inputs to benchmark harness #230
Merged
Adds benchmarks/helpers.exs exposing a shared Bench module with two
pre-canned Benchee profiles selected by the LUA_BENCH_MODE env var:
* default ("quick") - 1 s warmup, 3 s measurement, memory_time off.
Each Benchee.run takes ~4 s; the full mix lua.bench suite is ~80 s
instead of ~17 min. For "did my change move the needle?" loops.
* "full" - 2 s warmup, 10 s measurement, memory_time on, plus a
sweep of input sizes (n=10, 100, 1000) for the table workloads.
For any numbers we publish.
Each script loads helpers.exs via Code.require_file/2 and calls Bench.opts() in
place of an inline keyword list. table_ops.exs is restructured to use
Benchee inputs: from Bench.table_inputs/0 so all sizes share warmup
and measurement state per workload.
Quick mode trades measurement precision (higher deviation bands) for
iteration speed. Full mode is the source of truth for published
numbers.
The mix lua.bench task forwards the parent process env, so
LUA_BENCH_MODE set in the user's shell propagates to the child
mix run automatically.
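The propagation described above relies on child processes inheriting the parent's environment by default. A minimal illustration follows; it assumes a Unix-like `sh` on the PATH and is not the actual `mix lua.bench` task, whose code is not shown here. It only demonstrates that `System.cmd/3` passes the parent env through when no `:env` option is given.

```elixir
# Set the variable in this (parent) process, as the user's shell would.
System.put_env("LUA_BENCH_MODE", "full")

# No :env option is passed, so the child sees the parent's environment.
# `sh` stands in for the child `mix run` here (an assumption for illustration).
{output, 0} = System.cmd("sh", ["-c", "printf %s \"$LUA_BENCH_MODE\""])

IO.puts("child saw LUA_BENCH_MODE=#{output}")
```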
This was referenced May 22, 2026
Quick mode + multi-n inputs for the benchmark harness
Drops mix lua.bench runtime from ~17 minutes to ~80 seconds for
iterative use, while keeping the long-form numbers available behind an
env var. Independent of the B-series perf plans; this is harness work
that we'll lean on for every subsequent perf change.
Motivation
While iterating on B7 (table array+hash split), measurement noise made
it hard to tell whether a code change had moved the headline workload.
The 10 s benchee window is the right tradeoff for publishing a number
(low deviation, memory tracking, multiple input sizes) but it's the
wrong tradeoff for "did my last commit help?". Each mix lua.bench
cycle was ~17 minutes, which is slow enough that we'd run the bench
once and trust the result, which turned out to be a measurement
mistake exactly once.
A separate but related question: n=500 is one specific point on the
table workload curve. At that size, exponential-growth tuples are
already at 1024 capacity and most array work is amortized. n=10
exercises pre-doubling. n=100 hits the middle. Useful to see the
curve, not just one point.
Design
benchmarks/helpers.exs exposes a shared Bench module with one knob, the
LUA_BENCH_MODE env var:
* default ("quick") - 1 s warmup, 3 s measurement, memory_time off. ~4 s per Benchee.run. Full mix lua.bench ≈ 80 s.
* "full" - 2 s warmup, 10 s measurement, memory_time on, plus a sweep of n ∈ {10, 100, 1000} for the table workloads. ~1-2 min per Benchee.run. Full mix lua.bench ≈ 15+ min.
Each script does Code.require_file("helpers.exs", __DIR__) and calls Bench.opts() in place of an inline keyword list. table_ops.exs is restructured to use Benchee's inputs: from Bench.table_inputs/0 so all sizes share warmup state per workload.
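As a rough sketch of what such a helper module could look like (not the actual contents of helpers.exs): the `mode/0` function, the optional mode argument, the `memory_time: 2` value for full mode, and the single `n=100` input in quick mode are all assumptions made for illustration. The warmup and measurement times and the full-mode sizes come from the description above.

```elixir
# Hypothetical sketch of benchmarks/helpers.exs; not the real file.
defmodule Bench do
  @moduledoc false

  # Reads LUA_BENCH_MODE; anything other than "full" falls back to quick.
  def mode do
    case System.get_env("LUA_BENCH_MODE", "quick") do
      "full" -> :full
      _other -> :quick
    end
  end

  # Benchee options for the given mode (all times in seconds).
  def opts(mode \\ mode())
  def opts(:quick), do: [warmup: 1, time: 3, memory_time: 0]
  # memory_time: 2 is an assumption; the description only says "on".
  def opts(:full), do: [warmup: 2, time: 10, memory_time: 2]

  # Inputs for the table workloads, as {label, n} pairs.
  def table_inputs(mode \\ mode())
  # Assumption: quick mode measures a single size.
  def table_inputs(:quick), do: [{"n=100", 100}]
  def table_inputs(:full), do: [{"n=10", 10}, {"n=100", 100}, {"n=1000", 1000}]
end

# In a benchmark script (assumes Benchee is available as a dependency;
# the job below is a placeholder workload, not one from this repo):
#
#   Code.require_file("helpers.exs", __DIR__)
#
#   Benchee.run(
#     %{"build list" => fn n -> Enum.to_list(1..n) end},
#     [inputs: Bench.table_inputs()] ++ Bench.opts()
#   )
```

Because Benchee passes each input to the job function as its argument, one `Benchee.run` call covers every size, which is what lets the sizes share a single warmup per workload.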
The mix lua.bench task forwards the parent process env, so LUA_BENCH_MODE set in the user's shell propagates to the child mix run automatically.
Usage
Tradeoffs
Quick mode trades measurement precision for iteration speed. Deviation
bands grow from ±0.5% (10 s window) to ±15-25% (3 s window) on most
workloads. That's fine for "did my change move the needle by 10%+", but
it's not fine for headline numbers. The LUA_BENCH_MODE=full path
exists exactly to bridge that: any number we publish should come from
a full run, ideally with the machine in a known-cold state.
Changes
Verification
Sample output
Quick mode
table_ops, on main:
The deviation bands are wider than in full mode (~±0.5%) but the
ips ordering is still clear and intermediate runs converge to the
full-mode answer at n=100 (luerl ≈ 18 µs, our chunk path ≈ 17 µs).
Out of scope (intentional)
in full mode is a starting point; if specific workloads need more
granular curves we can extend Bench.table_inputs/0.
a separate plan if we want it.