The string.format / table / dispatcher performance pass landing in v1.0.0-rc.1 was a net win across the benchmark suite, but the fibonacci workload (fib(30), pure recursion) regressed ~25% versus v1.0.0-rc.0:
rc.0: 1.64 ips (608.6 ms), ±0.69%
main/rc.1: 1.23 ips (814.2 ms), ±0.42%
Measured with full Benchee runs on a single machine, run sequentially. Luerl and PUC-Lua (identical code in both runs) were used as drift controls and moved within ±3%, so net of machine drift the regression is still ≈ −22%. This is real, not noise.
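The percentages above can be reproduced from the two throughput figures. This is a minimal sketch: it assumes the drift adjustment simply credits the controls' worst-case 3% movement to the regression's side, which may not be exactly how the ≈ −22% figure was derived.

```elixir
# Throughput figures copied from the benchmark summary above.
rc0_ips = 1.64
rc1_ips = 1.23

# Relative change in iterations per second, rc.1 versus rc.0.
raw_change = (rc1_ips - rc0_ips) / rc0_ips

# Assumption: treat the full 3% control movement as machine drift
# working against rc.1, so it is added back.
drift = 0.03
adjusted_change = raw_change + drift

IO.puts("raw change: #{Float.round(raw_change * 100, 1)}%")
IO.puts("drift-adjusted change: #{Float.round(adjusted_change * 100, 1)}%")
```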
Cause
PR #283 ("configurable max call depth", commit 3a4a0c8) added per-call bookkeeping to the executor's call/return paths: a State.check_call_depth!/1 function call plus call_depth + 1 / - 1 state updates on every Lua function call. fibonacci is ~2.7M calls with almost no work per call, so it pays that overhead with nothing to amortize against.
Workloads that do real work per call are unaffected or faster (OOP +41%, closures +4%) because the dispatcher gains (#275, #277) dominate there. The regression is specific to call-dense, work-light code.
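To make "call-dense, work-light" concrete, the call count of a naive recursive fib(30) can be computed directly and divided into the measured slowdown. `FibCalls` is a hypothetical helper written for this illustration, not part of the project, and it assumes the benchmark uses the usual `n < 2` base case. The per-call figure is a net number: it folds in whatever the dispatcher changes gained or lost on this workload, and it is not corrected for machine drift.

```elixir
defmodule FibCalls do
  # Hypothetical helper: number of function invocations made by a naive
  # recursive fib(n) with base case n < 2.
  # Recurrence: calls(n) = 1 + calls(n - 1) + calls(n - 2), calls(0) = calls(1) = 1.
  def count(n) when n < 2, do: 1

  def count(n) do
    # Walk the recurrence iteratively, carrying {calls(i - 1), calls(i)}.
    {_prev, curr} =
      Enum.reduce(2..n, {1, 1}, fn _i, {prev, curr} -> {curr, 1 + prev + curr} end)

    curr
  end
end

calls = FibCalls.count(30)

# Wall-clock figures from the benchmark summary, in milliseconds.
extra_ms = 814.2 - 608.6
extra_ns_per_call = extra_ms * 1.0e6 / calls

IO.puts("calls in fib(30): #{calls}")
IO.puts("net extra time per call: #{Float.round(extra_ns_per_call, 1)} ns")
```

The count comes out at roughly 2.7M, matching the figure quoted above, and the slowdown works out to tens of nanoseconds per call, which is the scale at which one extra function call and two struct updates become visible.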
Options to investigate
Inline the depth check as a plain integer comparison rather than a function call into State.
Check depth every Nth frame instead of on every call.
Derive depth from the existing call_stack length instead of maintaining a separate counter.
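The three options can be sketched side by side. Everything here is illustrative: `DepthSketch`, its function names, the `max_call_depth` field, the default limit of 200, and the sampling interval of 16 are assumptions, and only `call_depth` and `call_stack` are names taken from the description above. The real executor state and call path may look quite different.

```elixir
defmodule DepthSketch do
  # Hypothetical stand-in for the executor state.
  defstruct call_depth: 0, call_stack: [], max_call_depth: 200

  # Assumed sampling interval for option 2.
  @check_every 16

  # Option 1: the limit is a plain integer comparison in a guard on the
  # call path, instead of a separate call into State.
  def enter_inline(%__MODULE__{call_depth: depth, max_call_depth: max} = state)
      when depth < max do
    %{state | call_depth: depth + 1}
  end

  def enter_inline(%__MODULE__{max_call_depth: max}) do
    raise RuntimeError, "call depth limit (#{max}) exceeded"
  end

  # Option 2: always bump the counter, but only compare against the limit
  # on every @check_every-th frame.
  def enter_sampled(%__MODULE__{call_depth: depth, max_call_depth: max} = state) do
    new_depth = depth + 1

    if rem(new_depth, @check_every) == 0 and new_depth > max do
      raise RuntimeError, "call depth limit (#{max}) exceeded"
    end

    %{state | call_depth: new_depth}
  end

  # Option 3: no separate counter; depth is read off the call stack.
  # Assumes call_stack is a list, in which case length/1 is linear in depth.
  def depth_from_stack(%__MODULE__{call_stack: stack}), do: length(stack)
end

state = struct!(DepthSketch, max_call_depth: 2)
state = state |> DepthSketch.enter_inline() |> DepthSketch.enter_inline()
IO.inspect(state.call_depth)
```

Two trade-offs are worth checking against the acceptance criteria. With sampling, the limit can be overshot by up to one interval minus one frame before it is detected, so the effective limit is slightly looser unless the configured value is adjusted. With the stack-derived depth, if `call_stack` is a linked list then `length/1` walks the whole list on every call, which could cost more than the counter it replaces.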
Acceptance criteria
Recover most of the regression — target within ~5% of rc.0 on fibonacci — without weakening the call-depth limit.
Verify with mix lua.bench --workload fibonacci before/after on the same machine.
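The ~5% target can be written down as a simple check on the two throughput numbers from a before/after run. `AcceptanceSketch` is a hypothetical helper, and it assumes "within ~5% of rc.0" means the candidate is at most 5% slower in iterations per second.

```elixir
defmodule AcceptanceSketch do
  # Hypothetical helper: true when the candidate's throughput is no more
  # than `tolerance` below the baseline's.
  def within_target?(baseline_ips, candidate_ips, tolerance \\ 0.05) do
    candidate_ips >= baseline_ips * (1.0 - tolerance)
  end
end

# rc.0 baseline versus the current main/rc.1 figure from the summary.
IO.inspect(AcceptanceSketch.within_target?(1.64, 1.23))
```

Against the rc.0 figure of 1.64 ips, this puts the bar at roughly 1.56 ips on the same machine.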
References
benchmarks/results/2026-06-02-rc0-vs-main.md
v1.0.0-rc.1 CHANGELOG; to be fixed before 1.0.0 final.