perf(vm): dispatcher coverage for closures, varargs, multi-return, loops, self, concat #277

Merged: davydog187 merged 2 commits into main from perf/dispatcher-closures on May 28, 2026
@davydog187
Contributor

Closes #272

Summary

Extends Lua.VM.Dispatcher and Lua.Compiler.Bytecode to cover the remaining opcodes needed for closures-, OOP-, and string-heavy workloads. After this PR, the encoder's catch-all :fallback clause only matches genuinely out-of-scope opcodes (:goto/:label, :set_global, the bitwise family). The closures, oop, and string_ops benchmarks compile end-to-end with no fallback in any sub-prototype.
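The "no fallback in any sub-prototype" claim can be pictured as a recursive walk over a compiled prototype tree. The sketch below is only an illustration: the `%{code: ..., protos: ...}` map shape, the tuple layout of instructions, and the `FallbackCheckSketch` module are assumptions made for the example, not the actual `Lua.Compiler.Bytecode` data structures.

```elixir
defmodule FallbackCheckSketch do
  # Hypothetical prototype shape: %{code: [instruction], protos: [prototype]}.
  # Each instruction is assumed to be a tuple whose first element is the opcode atom.
  def fallback_free?(%{code: code, protos: protos}) do
    no_fallback_here? =
      not Enum.any?(code, fn instr -> elem(instr, 0) == :fallback end)

    # A prototype is clean only if it and every nested prototype are clean.
    no_fallback_here? and Enum.all?(protos, &fallback_free?/1)
  end
end
```

A check of this kind would report `false` as soon as any nested function body still contains an instruction the encoder could not translate.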

What's covered

New opcodes (tags 37–51, contiguous after B5b-v2):

  • Closures + upvalues: :closure, :set_upvalue, :get_open_upvalue, :set_open_upvalue
  • Vararg + multi-return: :vararg, :return_proto_varargs, :return_collect, :return_multi, :call_multi (handles every shape of :call outside the existing :call_one/:call_zero fast paths)
  • Loops + break: :while_loop, :repeat_loop, :generic_for, :break. The B5b-v2 contains_break? guard against break inside :numeric_for is removed.
  • OOP: :self, :concatenate

:tail_call was listed in the plan frontmatter but is verified dead code (codegen never emits it; only Lua.Compiler.Instruction.tail_call/3 exists as an unused helper). Tail-position calls compile to :call with result_count = -1, covered by :call_multi.

Frame tuple changes

The frame's dest slot gains a {:multi, base, count} variant for multi-return shapes (-1 forward, -2 expand, n > 1 fixed multi). :call_one/:call_zero keep their integer / :discard fast paths — the fib hot path stays on the integer-base branch. return_one/3 and return_multi/3 both pattern-match on all four shapes.
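To make the four `dest` shapes concrete, here is a minimal sketch of a function that pattern-matches on them. It assumes registers are a plain map and that a fixed multi count pads missing results with nil; the `DestSketch` module and `write_results/3` are hypothetical names, and the sketch does not model the difference between the -1 (forward) and -2 (expand) cases, which both simply keep every returned value here.

```elixir
defmodule DestSketch do
  # regs is modeled as a map of register index => value (an assumption for readability).

  # :discard - the caller does not want any results.
  def write_results(regs, :discard, _values), do: regs

  # Integer base - the single-result fast path: keep only the first value.
  def write_results(regs, base, values) when is_integer(base) do
    Map.put(regs, base, List.first(values))
  end

  # {:multi, base, n} with n > 1 - exactly n slots, padded with nil or truncated.
  def write_results(regs, {:multi, base, count}, values) when count > 1 do
    values
    |> pad(count)
    |> store(regs, base)
  end

  # {:multi, base, -1} and {:multi, base, -2} - keep all values.
  # The real forward/expand distinction is not modeled in this sketch.
  def write_results(regs, {:multi, base, count}, values) when count in [-1, -2] do
    store(values, regs, base)
  end

  defp pad(values, count) do
    Enum.take(values ++ List.duplicate(nil, count), count)
  end

  defp store(values, regs, base) do
    values
    |> Enum.with_index(base)
    |> Enum.reduce(regs, fn {value, index}, acc -> Map.put(acc, index, value) end)
  end
end
```

Keeping the integer and `:discard` clauses separate from the tuple clauses mirrors the point made above: the common single-result call never has to inspect a `{:multi, ...}` tuple.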

Loop CPS

:while_loop/:repeat_loop/:generic_for/:numeric_for each push two cont markers: a CPS marker (driving normal body/condition handoff) and a :loop_exit anchor below it. find_loop_exit/1 ports the interpreter's marker-scanning shape for :break.
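The marker-scanning idea behind `:break` can be sketched as follows, assuming `cont` is a list used as a stack (head on top) and that each marker is a tagged tuple. The tuple payloads and the `LoopExitSketch` module are assumptions for illustration; the real `find_loop_exit/1` may carry different data.

```elixir
defmodule LoopExitSketch do
  # Found the anchor of the innermost loop: return its resume point and
  # the continuation stack below it.
  def find_loop_exit([{:loop_exit, resume} | rest]), do: {:ok, resume, rest}

  # Any other marker (for example a CPS marker of the current loop) is dropped.
  def find_loop_exit([_marker | rest]), do: find_loop_exit(rest)

  # No enclosing loop on the stack.
  def find_loop_exit([]), do: :error
end
```

Because each loop pushes its CPS marker above its own `:loop_exit` anchor, the first anchor found while scanning from the top belongs to the innermost loop, so a `break` in a nested loop leaves the outer loop's markers untouched.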

Executor bridges

Three new metamethod-coupled bridges plus one error-attribution helper:

  • dispatcher_index_method_target/5: :self resolution via index_value/6
  • dispatcher_call_value/4: generic-for iterator step via call_value/5
  • dispatcher_concat/5: :concatenate slow path with __concat metamethod
  • dispatcher_call_function/5: takes name_hint so nil/non-callable error wording matches the :call opcode
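As an illustration of how a fast path can sit in front of a metamethod-coupled bridge, the sketch below handles the binary+binary case inline and hands everything else to a slow-path function. The `ConcatSketch` module and the `slow` callback are assumptions standing in for the real bridge; they are not the `Executor` API.

```elixir
defmodule ConcatSketch do
  # Fast path: both operands are already binaries, so no metamethod lookup is needed.
  def concat(a, b, _slow) when is_binary(a) and is_binary(b), do: a <> b

  # Everything else (numbers, tables with __concat, errors) goes to the slow path,
  # represented here by a caller-supplied two-argument function.
  def concat(a, b, slow) when is_function(slow, 2), do: slow.(a, b)
end
```

For example, `ConcatSketch.concat("a", "b", slow)` never calls `slow`, while a non-binary operand always does.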

call_stack push/pop now happens at every dispatcher call site (both compiled-closure and lua_closure paths) so error-context tests see the same stack the interpreter would.

Verification

  • mix test: 2008 passing (1902 + 51 properties + 55 doctests), 24 skipped, 0 failures
  • mix test --only lua53: 12 passing, 17 skipped
  • mix test test/lua/vm/leak_regression_test.exs: 3 passing
  • closures.exs, oop.exs, string_ops.exs all compile end-to-end (no sub-prototype fallback)

Perf

Hard floor (no workload regresses by >10%): met.

| Workload | Dispatcher vs Interpreter |
| --- | --- |
| fib(25) | 1.15x (vs B5a-v2's 1.17x baseline) |
| closures(100) | 1.22x |
| oop(100) | 1.07x |

The ~2% loss on fib(25) vs B5a-v2 comes from the call_stack push/pop added for error-context parity. The soft target on closures.exs (≥1.5x) is missed at 1.22x; profile attribution points to closure-construction allocation and upvalue cells, which are post-B5 mutable-storage work.

Test plan

  • mix test passes (2008/2008)
  • mix test --only lua53 passes
  • Leak regression test passes
  • closures.exs / oop.exs / string_ops.exs compile end-to-end with no fallbacks
  • mix format clean
  • mix compile --warnings-as-errors clean

…ops, self, concat

Closes #272

Extends the bytecode dispatcher (`Lua.VM.Dispatcher`) and encoder
(`Lua.Compiler.Bytecode`) to cover the remaining opcode families
needed for closures-, OOP-, and string-heavy workloads. The
encoder's catch-all `:fallback` now only matches genuinely
out-of-scope opcodes (`:goto`/`:label`, `:set_global`, the bitwise
family). The closures, oop, and string_ops benchmarks compile
end-to-end with no fallback in any sub-prototype.

New opcodes (tags 37-51, contiguous after B5b-v2):

- `:closure` + `:set_upvalue` + `:get_open_upvalue` + `:set_open_upvalue`
- `:vararg` + `:return_proto_varargs` + `:return_collect` + `:return_multi`
- `:call_multi` (handles every shape of `:call` outside the existing
  fast-path `:call_one`/`:call_zero`, including multi-return arg
  counts and multi-return result counts)
- `:self`, `:concatenate`, `:break`
- `:while_loop`, `:repeat_loop`, `:generic_for`

Frame tuple's `dest` slot gains a `{:multi, base, count}` variant
for multi-return shapes; `:call_one` and `:call_zero` keep their
integer / `:discard` fast paths. `return_one/3` and `return_multi/3`
both handle the four shapes (`:discard`, integer, `:multi` with
-1/-2/n>1).

Loop CPS markers (`:cps_while_test`, `:cps_while_body`,
`:cps_repeat_body`, `:cps_repeat_cond`, `:cps_generic_for`) sit
above a `:loop_exit` anchor on `cont`; `find_loop_exit/1` ports
the interpreter's marker-scanning shape for `:break`. The B5b-v2
`contains_break?` guard against break inside `:numeric_for` is
removed.

Executor exposes three new `dispatcher_*` bridges for the
metamethod-coupled paths (`dispatcher_index_method_target`,
`dispatcher_call_value`, `dispatcher_concat`) and a
`dispatcher_call_function/5` that takes a `name_hint` so error
wording for nil/non-callable values matches the interpreter's
`:call` opcode (otherwise `(upvalue 'x')` etc. would drop on the
dispatcher path).

Perf gates:

- Hard floor (no >10% regression): met across all workloads.
- `dispatcher_vs_interpreter` fib(25): 1.15x faster (vs B5a-v2's
  1.17x; the ~2% loss is `call_stack` push/pop added for
  error-context parity).
- `closures.exs` A/B: 1.22x faster than interpreter (soft target
  was 1.5x; closure-construction and upvalue allocation dominate).
- `oop.exs` A/B: 1.07x faster than interpreter.

Tests: 2008 passing (1902 + 51 properties + 55 doctests).
+106 new tests covering per-opcode goldens and end-to-end
shape goldens for closures / OOP / multi-return.
…kstep

Review feedback on #277:

- `:concatenate` had two identical cond branches both bridging to
  `Executor.dispatcher_concat/4`. Collapsed to one fast path
  (binary+binary) plus a single fallback.
- `assign_iter_results/4` was using `Enum.at/2` per var_reg, making
  each `:generic_for` step O(n²) on the result list. Now consumes
  results head-by-head; out-of-list slots get nil.
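The head-by-head assignment described in the second item can be sketched like this. The real `assign_iter_results/4` has a different arity and register representation; the `IterResultsSketch` module, its `assign/3` function, and the map-based registers are assumptions for the example.

```elixir
defmodule IterResultsSketch do
  # No loop variables left: done (extra results are ignored).
  def assign(regs, [], _results), do: regs

  # A variable and a result are both available: pair them and recurse on the tails.
  def assign(regs, [reg | more_regs], [value | more_values]) do
    assign(Map.put(regs, reg, value), more_regs, more_values)
  end

  # Results ran out first: the remaining variables get nil.
  def assign(regs, [reg | more_regs], []) do
    assign(Map.put(regs, reg, nil), more_regs, [])
  end
end
```

Each step consumes one element from the front of both lists, so the result list is traversed once, whereas calling `Enum.at/2` for every variable re-walks the list from its head each time.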
@davydog187 davydog187 merged commit 2b90e04 into main May 28, 2026
5 checks passed
@davydog187 davydog187 deleted the perf/dispatcher-closures branch May 28, 2026 17:04

Linked issue (may be closed by merging this pull request): Dispatcher: cover :closure, :set_upvalue, and open-upvalue ops (B5c-v2)