perf(vm): dispatcher coverage for closures, varargs, multi-return, loops, self, concat by davydog187 · Pull Request #277 · tv-labs/lua

davydog187 · 2026-05-28T14:25:46Z

Closes #272

Summary

Extends Lua.VM.Dispatcher and Lua.Compiler.Bytecode to cover the remaining opcodes needed for closures-, OOP-, and string-heavy workloads. After this PR, the encoder's catch-all :fallback clause only matches genuinely out-of-scope opcodes (:goto/:label, :set_global, the bitwise family). The closures, oop, and string_ops benchmarks compile end-to-end with no fallback in any sub-prototype.
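One way to picture the "no fallback in any sub-prototype" check is a recursive walk over a prototype tree. This is a minimal sketch under stated assumptions: `Proto`, `falls_back`, and the bitwise opcode names are hypothetical stand-ins, not the encoder's real data structures. Only `:goto`, `:label`, and `:set_global` come from the PR text.

```python
from dataclasses import dataclass, field

# Opcodes the PR names as still out of scope. Strings stand in for Elixir
# atoms (:goto, :label, :set_global).
OUT_OF_SCOPE = {"goto", "label", "set_global"}
# Hypothetical names for "the bitwise family"; the real atoms may differ.
BITWISE = {"band", "bor", "bxor", "bnot", "shl", "shr"}


@dataclass
class Proto:
    """Hypothetical stand-in for a compiled function prototype."""
    instructions: list
    sub_protos: list = field(default_factory=list)


def falls_back(proto: Proto) -> bool:
    """True if this prototype or any nested sub-prototype would hit the
    encoder's catch-all :fallback clause."""
    if any(op in OUT_OF_SCOPE or op in BITWISE for op in proto.instructions):
        return True
    return any(falls_back(sub) for sub in proto.sub_protos)


inner = Proto(["get_open_upvalue", "concatenate", "return_multi"])
outer = Proto(["closure", "call_multi"], [inner])
print(falls_back(outer))                                     # False
print(falls_back(Proto(["call_multi"], [Proto(["goto"])])))  # True
```

The recursion matters because a closure body is its own prototype: a single `:goto` nested three closures deep is enough to force a fallback somewhere in the tree.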

What's covered

New opcodes (tags 37–51, contiguous after B5b-v2):

- Closures + upvalues: :closure, :set_upvalue, :get_open_upvalue, :set_open_upvalue
- Vararg + multi-return: :vararg, :return_proto_varargs, :return_collect, :return_multi, :call_multi (handles every shape of :call outside the existing :call_one/:call_zero fast paths)
- Loops + break: :while_loop, :repeat_loop, :generic_for, :break. The B5b-v2 contains_break? guard against break inside :numeric_for is removed.
- OOP + strings: :self, :concatenate
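As a quick sanity check on the tag range, the 15 opcodes listed above fill tags 37 through 51 exactly. The sketch assumes the tags follow the order of the list. The actual order inside Lua.Compiler.Bytecode may differ.

```python
# The 15 opcodes this PR adds, in the order the description lists them.
# Assumption: real tag order inside Lua.Compiler.Bytecode may differ.
NEW_OPCODES = [
    "closure", "set_upvalue", "get_open_upvalue", "set_open_upvalue",
    "vararg", "return_proto_varargs", "return_collect", "return_multi",
    "call_multi",
    "while_loop", "repeat_loop", "generic_for", "break",
    "self", "concatenate",
]
FIRST_TAG = 37  # first free tag after the B5b-v2 range

TAGS = {op: FIRST_TAG + i for i, op in enumerate(NEW_OPCODES)}

# 15 opcodes starting at 37 end exactly at 51, with no gaps or collisions.
assert len(NEW_OPCODES) == 15
assert sorted(TAGS.values()) == list(range(37, 52))
```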

:tail_call was listed in the plan frontmatter, but it is dead code: codegen never emits it, and Lua.Compiler.Instruction.tail_call/3 exists only as an unused helper. Tail-position calls compile to :call with result_count = -1, which :call_multi covers.
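The routing of a :call instruction between the two fast paths and :call_multi can be sketched as a small selection function. This is hypothetical. `select_call_opcode` is not a real function in the codebase, and the convention that a negative arg_count means "variable number of arguments" is an assumption made for illustration.

```python
def select_call_opcode(arg_count: int, result_count: int) -> str:
    """Pick a dispatcher opcode for a :call (hypothetical rule).

    Assumption: the fast paths need a fixed argument count (>= 0) and
    exactly one or zero results; negative counts mean "variable".
    """
    if arg_count >= 0 and result_count == 1:
        return "call_one"
    if arg_count >= 0 and result_count == 0:
        return "call_zero"
    # Everything else: multi-return arguments, result_count == -1
    # (tail position and other forwarding calls), -2, or n > 1 results.
    return "call_multi"


print(select_call_opcode(2, 1))   # call_one
print(select_call_opcode(1, -1))  # call_multi (tail-position call)
```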

Frame tuple changes

The frame's dest slot gains a {:multi, base, count} variant for multi-return shapes (-1 forward, -2 expand, n > 1 fixed multi). :call_one/:call_zero keep their integer / :discard fast paths — the fib hot path stays on the integer-base branch. return_one/3 and return_multi/3 both pattern-match on all four shapes.
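A sketch of how a return path might deliver results into each dest shape. It is illustrative only. `write_results` and the dict-of-registers model are hypothetical, and the sketch treats -1 (forward) and -2 (expand) the same way, writing every result from base upward. That glosses over whatever distinguishes the two in the real dispatcher. The fixed-count case follows standard Lua semantics: extra results are dropped and missing ones become nil.

```python
def write_results(dest, results, regs):
    """Deliver callee results into the caller's registers (hypothetical).

    dest shapes (strings/tuples stand in for Elixir terms):
      "discard"              -> drop all results
      int                    -> single register: first result or nil
      ("multi", base, n>1)   -> fixed multi: truncate/pad to n values
      ("multi", base, -1|-2) -> variable: write every result from base
    Returns the number of registers written.
    """
    if dest == "discard":
        return 0
    if isinstance(dest, int):
        regs[dest] = results[0] if results else None
        return 1
    tag, base, count = dest
    if tag != "multi":
        raise ValueError(f"unsupported dest shape: {dest!r}")
    if count > 1:
        for i in range(count):
            regs[base + i] = results[i] if i < len(results) else None
        return count
    if count in (-1, -2):
        for i, value in enumerate(results):
            regs[base + i] = value
        return len(results)
    raise ValueError(f"unsupported dest shape: {dest!r}")


regs = {}
write_results(("multi", 5, 2), [7], regs)
print(regs)  # {5: 7, 6: None}
```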

Loop CPS

:while_loop, :repeat_loop, :generic_for, and :numeric_for each push two cont markers: a CPS marker that drives the normal handoff between body and condition, and a :loop_exit anchor below it. find_loop_exit/1 ports the interpreter's marker-scanning shape for :break.
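The marker-scanning idea behind :break can be modeled with a continuation stack whose top is index 0, mirroring the head of an Elixir list. This is a minimal sketch. `push_loop` and the tuple markers are hypothetical stand-ins, and only the marker names (:cps_while_test, :cps_generic_for, :cps_repeat_body, :loop_exit) come from the PR. A break discards everything down to and including the nearest :loop_exit, so nested loops exit only the innermost one.

```python
def push_loop(cont, cps_marker):
    """Enter a loop: :loop_exit anchor first, CPS marker above it."""
    return [cps_marker, ("loop_exit",)] + cont


def find_loop_exit(cont):
    """Handle `break`: drop every frame down to and including the nearest
    :loop_exit anchor and return what runs after the loop."""
    for i, frame in enumerate(cont):
        if frame[0] == "loop_exit":
            return cont[i + 1:]
    raise RuntimeError("break outside a loop")


after = [("after_loop",)]
cont = push_loop(after, ("cps_while_test",))
cont = [("body_frame",)] + cont  # hypothetical frame pushed by the body
print(find_loop_exit(cont))      # [('after_loop',)]
```

Scanning the stack keeps `break` cheap to encode: the opcode needs no jump target, because the anchor pushed on loop entry records where execution resumes.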

Executor bridges

Three new metamethod-coupled bridges plus one error-attribution helper:

- dispatcher_index_method_target/5: :self resolution via index_value/6
- dispatcher_call_value/4: generic-for iterator step via call_value/5
- dispatcher_concat/5: :concatenate slow path with __concat metamethod
- dispatcher_call_function/5: takes a name_hint so the error wording for nil/non-callable values matches the :call opcode
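The name_hint matters because Lua's reference interpreter appends the variable kind and name to call errors, as in "attempt to call a nil value (upvalue 'x')". Here is a sketch of that wording. `call_error_message` and the `(kind, name)` tuple shape are hypothetical, and the project's actual hint representation may differ.

```python
def call_error_message(value_type, name_hint=None):
    """Build a Lua 5.3-style error for calling a non-callable value.

    name_hint is a hypothetical (kind, name) pair, e.g. ("upvalue", "x");
    kinds such as global, local, upvalue, field, and method mirror what
    the reference interpreter reports.
    """
    msg = f"attempt to call a {value_type} value"
    if name_hint is not None:
        kind, name = name_hint
        msg += f" ({kind} '{name}')"
    return msg


print(call_error_message("nil", ("upvalue", "x")))
# attempt to call a nil value (upvalue 'x')
```

Without the hint, the dispatcher path would only produce the bare "attempt to call a nil value".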

call_stack push/pop now happens at every dispatcher call site (both compiled-closure and lua_closure paths) so error-context tests see the same stack the interpreter would.
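The push/pop pairing can be sketched as a wrapper that records a frame for the duration of a call and always removes it. An error raised inside the call snapshots the stack before it unwinds. This mutable Python model is an assumption for illustration: the real dispatcher threads state through Elixir terms, and `call_with_stack` and `LuaRuntimeError` are hypothetical names. The wrapper also shows the per-call work that the Perf section attributes the fib(25) loss to.

```python
class LuaRuntimeError(Exception):
    def __init__(self, message, stack):
        super().__init__(message)
        self.stack = stack  # snapshot of the call stack at raise time


def call_with_stack(call_stack, name, fn, *args):
    """Push a frame for the duration of a call; pop on return or error."""
    call_stack.append(name)
    try:
        return fn(*args)
    finally:
        call_stack.pop()


stack = []


def inner():
    raise LuaRuntimeError("attempt to call a nil value", list(stack))


def outer():
    return call_with_stack(stack, "inner", inner)


try:
    call_with_stack(stack, "outer", outer)
except LuaRuntimeError as err:
    print(err.stack)  # ['outer', 'inner']
print(stack)          # []
```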

Verification

✅ mix test: 2008 passing (1902 + 51 properties + 55 doctests), 24 skipped, 0 failures
✅ mix test --only lua53: 12 passing, 17 skipped
✅ mix test test/lua/vm/leak_regression_test.exs: 3 passing
✅ closures.exs, oop.exs, string_ops.exs all compile end-to-end (no sub-prototype fallback)

Perf

Hard floor (no workload regresses by >10%): met.

| Workload | Dispatcher speedup over interpreter |
|---|---|
| `fib(25)` | 1.15x (B5a-v2 baseline: 1.17x) |
| `closures(100)` | 1.22x |
| `oop(100)` | 1.07x |

The ~2% loss on fib(25) vs B5a-v2 comes from the call_stack push/pop added for error-context parity. The soft target on closures.exs (≥1.5x) is not met: it lands at 1.22x. Profile attribution points to closure-construction allocation and upvalue cells, which are deferred to post-B5 mutable-storage work.
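A quick check of the arithmetic behind these figures, using only the speedups reported in the Perf table:

```python
fib_b5a_v2 = 1.17
fib_now = 1.15
closures_now = 1.22
closures_target = 1.5  # soft target

# Relative loss of the fib(25) speedup versus the B5a-v2 baseline.
fib_loss = 1 - fib_now / fib_b5a_v2
# How far closures(100) lands below its soft target.
closures_shortfall = 1 - closures_now / closures_target

print(f"fib(25) loss: {fib_loss:.1%}")                  # fib(25) loss: 1.7%
print(f"closures shortfall: {closures_shortfall:.1%}")  # closures shortfall: 18.7%
```

So "~2%" rounds a 1.7% drop, and closures(100) sits about 19% short of its soft target.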

Test plan

mix test passes (2008/2008)
mix test --only lua53 passes
Leak regression test passes
closures.exs / oop.exs / string_ops.exs compile end-to-end with no fallbacks
mix format clean
mix compile --warnings-as-errors clean

…ops, self, concat

Closes #272

Extends the bytecode dispatcher (`Lua.VM.Dispatcher`) and encoder (`Lua.Compiler.Bytecode`) to cover the remaining opcode families needed for closures-, OOP-, and string-heavy workloads. The encoder's catch-all `:fallback` now only matches genuinely out-of-scope opcodes (`:goto`/`:label`, `:set_global`, the bitwise family). The closures, oop, and string_ops benchmarks compile end-to-end with no fallback in any sub-prototype.

New opcodes (tags 37-51, contiguous after B5b-v2):

- `:closure` + `:set_upvalue` + `:get_open_upvalue` + `:set_open_upvalue`
- `:vararg` + `:return_proto_varargs` + `:return_collect` + `:return_multi`
- `:call_multi` (handles every shape of `:call` outside the existing fast-path `:call_one`/`:call_zero`, including multi-return arg counts and multi-return result counts)
- `:self`, `:concatenate`, `:break`
- `:while_loop`, `:repeat_loop`, `:generic_for`

The frame tuple's `dest` slot gains a `{:multi, base, count}` variant for multi-return shapes; `:call_one` and `:call_zero` keep their integer / `:discard` fast paths. `return_one/3` and `return_multi/3` both handle the four shapes (`:discard`, integer, `:multi` with -1/-2/n>1).

Loop CPS markers (`:cps_while_test`, `:cps_while_body`, `:cps_repeat_body`, `:cps_repeat_cond`, `:cps_generic_for`) sit above a `:loop_exit` anchor on `cont`; `find_loop_exit/1` ports the interpreter's marker-scanning shape for `:break`. The B5b-v2 `contains_break?` guard against break inside `:numeric_for` is removed.

Executor exposes three new `dispatcher_*` bridges for the metamethod-coupled paths (`dispatcher_index_method_target`, `dispatcher_call_value`, `dispatcher_concat`) and a `dispatcher_call_function/5` that takes a `name_hint` so error wording for nil/non-callable values matches the interpreter's `:call` opcode (otherwise hints such as `(upvalue 'x')` would be dropped on the dispatcher path).

Perf gates:

- Hard floor (no >10% regression): met across all workloads.
- `dispatcher_vs_interpreter` fib(25): 1.15x faster (vs B5a-v2's 1.17x; the ~2% loss is `call_stack` push/pop added for error-context parity).
- `closures.exs` A/B: 1.22x faster than interpreter (soft target was 1.5x; closure-construction and upvalue allocation dominate).
- `oop.exs` A/B: 1.07x faster than interpreter.

Tests: 2008 passing (1902 + 51 properties + 55 doctests). +106 new tests covering per-opcode goldens and end-to-end shape goldens for closures / OOP / multi-return.

…kstep

Review feedback on #277:

- `:concatenate` had two identical cond branches, both bridging to `Executor.dispatcher_concat/4`. Collapsed to one fast path (binary + binary) plus a single fallback.
- `assign_iter_results/4` was using `Enum.at/2` per var_reg, making each `:generic_for` step O(n²) on the result list. It now consumes results head-by-head; out-of-list slots get nil.
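Both review fixes can be sketched in a few lines. `assign_iter_results` and `concatenate` here are hypothetical Python stand-ins, not the Elixir functions. Python list indexing is O(1), so the quadratic cost only appears with linked lists like Elixir's, where `Enum.at/2` walks from the head on every call. The iterator below mirrors the head-by-head walk. In `concatenate`, `slow_path` stands in for the `Executor.dispatcher_concat` bridge, which handles numbers, `__concat` metamethods, and errors.

```python
def assign_iter_results(var_regs, results, regs):
    """Assign one generic-for step's results to the loop variables in a
    single lockstep pass; registers past the end of results get nil."""
    it = iter(results)
    for reg in var_regs:
        regs[reg] = next(it, None)
    return regs


def concatenate(a, b, slow_path):
    """One fast path for two strings; everything else takes one fallback."""
    if isinstance(a, str) and isinstance(b, str):
        return a + b
    return slow_path(a, b)


print(assign_iter_results([4, 5, 6], [1, 2], {}))  # {4: 1, 5: 2, 6: None}
print(concatenate("foo", "bar", None))             # foobar
```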

davydog187 added 2 commits May 28, 2026 10:22

davydog187 merged commit 2b90e04 into main May 28, 2026
5 checks passed

davydog187 deleted the perf/dispatcher-closures branch May 28, 2026 17:04

davydog187 merged 2 commits into main from perf/dispatcher-closures