perf(vm): dispatcher coverage for closures, varargs, multi-return, loops, self, concat#277
Merged
Conversation
…ops, self, concat Closes #272 Extends the bytecode dispatcher (`Lua.VM.Dispatcher`) and encoder (`Lua.Compiler.Bytecode`) to cover the remaining opcode families needed for closures-, OOP-, and string-heavy workloads. The encoder's catch-all `:fallback` now only matches genuinely out-of-scope opcodes (`:goto`/`:label`, `:set_global`, the bitwise family). The closures, oop, and string_ops benchmarks compile end-to-end with no fallback in any sub-prototype. New opcodes (tags 37-51, contiguous after B5b-v2): - `:closure` + `:set_upvalue` + `:get_open_upvalue` + `:set_open_upvalue` - `:vararg` + `:return_proto_varargs` + `:return_collect` + `:return_multi` - `:call_multi` (handles every shape of `:call` outside the existing fast-path `:call_one`/`:call_zero`, including multi-return arg counts and multi-return result counts) - `:self`, `:concatenate`, `:break` - `:while_loop`, `:repeat_loop`, `:generic_for` Frame tuple's `dest` slot gains a `{:multi, base, count}` variant for multi-return shapes; `:call_one` and `:call_zero` keep their integer / `:discard` fast paths. `return_one/3` and `return_multi/3` both handle the four shapes (`:discard`, integer, `:multi` with -1/-2/n>1). Loop CPS markers (`:cps_while_test`, `:cps_while_body`, `:cps_repeat_body`, `:cps_repeat_cond`, `:cps_generic_for`) sit above a `:loop_exit` anchor on `cont`; `find_loop_exit/1` ports the interpreter's marker-scanning shape for `:break`. The B5b-v2 `contains_break?` guard against break inside `:numeric_for` is removed. Executor exposes three new `dispatcher_*` bridges for the metamethod-coupled paths (`dispatcher_index_method_target`, `dispatcher_call_value`, `dispatcher_concat`) and a `dispatcher_call_function/5` that takes a `name_hint` so error wording for nil/non-callable values matches the interpreter's `:call` opcode (otherwise `(upvalue 'x')` etc. would drop on the dispatcher path). Perf gates: - Hard floor (no >10% regression): met across all workloads. - `dispatcher_vs_interpreter` fib(25): 1.15x faster (vs B5a-v2's 1.17x; the ~2% loss is `call_stack` push/pop added for error-context parity). - `closures.exs` A/B: 1.22x faster than interpreter (soft target was 1.5x; closure-construction and upvalue allocation dominate). - `oop.exs` A/B: 1.07x faster than interpreter. Tests: 2008 passing (1902 + 51 properties + 55 doctests). +106 new tests covering per-opcode goldens and end-to-end shape goldens for closures / OOP / multi-return.
…kstep Review feedback on #277: - `:concatenate` had two identical cond branches both bridging to `Executor.dispatcher_concat/4`. Collapsed to one fast path (binary+binary) plus a single fallback. - `assign_iter_results/4` was using `Enum.at/2` per var_reg, making each `:generic_for` step O(n²) on the result list. Now consumes results head-by-head; out-of-list slots get nil.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Closes #272
Summary
Extends
Lua.VM.DispatcherandLua.Compiler.Bytecodeto cover the remaining opcodes needed for closures-, OOP-, and string-heavy workloads. After this PR, the encoder's catch-all:fallbackclause only matches genuinely out-of-scope opcodes (:goto/:label,:set_global, the bitwise family). Theclosures,oop, andstring_opsbenchmarks compile end-to-end with no fallback in any sub-prototype.What's covered
New opcodes (tags 37–51, contiguous after B5b-v2):
:closure,:set_upvalue,:get_open_upvalue,:set_open_upvalue:vararg,:return_proto_varargs,:return_collect,:return_multi,:call_multi(handles every shape of:calloutside the existing:call_one/:call_zerofast paths):while_loop,:repeat_loop,:generic_for,:break. The B5b-v2contains_break?guard against break inside:numeric_foris removed.:self,:concatenate:tail_callwas listed in the plan frontmatter but is verified dead code (codegen never emits it; onlyLua.Compiler.Instruction.tail_call/3exists as an unused helper). Tail-position calls compile to:callwithresult_count = -1, covered by:call_multi.Frame tuple changes
The frame's
destslot gains a{:multi, base, count}variant for multi-return shapes (-1 forward, -2 expand, n > 1 fixed multi).:call_one/:call_zerokeep their integer /:discardfast paths — the fib hot path stays on the integer-base branch.return_one/3andreturn_multi/3both pattern-match on all four shapes.Loop CPS
:while_loop/:repeat_loop/:generic_for/:numeric_foreach push twocontmarkers: a CPS marker (driving normal body/condition handoff) and a:loop_exitanchor below it.find_loop_exit/1ports the interpreter's marker-scanning shape for:break.Executor bridges
Three new metamethod-coupled bridges plus one error-attribution helper:
dispatcher_index_method_target/5—:selfresolution viaindex_value/6dispatcher_call_value/4— generic-for iterator step viacall_value/5dispatcher_concat/5—:concatenateslow path with__concatmetamethoddispatcher_call_function/5— takesname_hintso nil/non-callable error wording matches:callopcodecall_stackpush/pop now happens at every dispatcher call site (both compiled-closure and lua_closure paths) so error-context tests see the same stack the interpreter would.Verification
mix test: 2008 passing (1902 + 51 properties + 55 doctests), 24 skipped, 0 failuresmix test --only lua53: 12 passing, 17 skippedmix test test/lua/vm/leak_regression_test.exs: 3 passingclosures.exs,oop.exs,string_ops.exsall compile end-to-end (no sub-prototype fallback)Perf
Hard floor (no workload regresses by >10%): met.
fib(25)closures(100)oop(100)The ~2% loss on fib(25) vs B5a-v2 is the
call_stackpush/pop added for error-context parity. The soft target onclosures.exs(≥1.5x) is brushed at 1.22x — profile attribution points to closure-construction allocation and upvalue cells, which are post-B5 mutable-storage work.Test plan
mix testpasses (2008/2008)mix test --only lua53passesclosures.exs/oop.exs/string_ops.exscompile end-to-end with no fallbacksmix formatcleanmix compile --warnings-as-errorsclean