perf(vm): dispatcher coverage for table opcodes and :numeric_for#275
Merged
Conversation
Extend the v2 dispatcher and bytecode encoder to cover the table opcode family (:new_table, :get_table, :set_table, :set_field, :set_list non-multi-return form, :length), :numeric_for with a new :cps_for continuation marker, and :call with result_count == 0 (the statement-call form like table.sort(t)). After this PR all four table_ops benchmarks compile end-to-end and the run_closures orchestrator joins them; only multi-return shapes (return f(...), :return_vararg) still keep workloads on the interpreter. Mini-bench: fib(22) 1.54x interpreter (up from 1.17x in B5a-v2), run_table_build(500) 1.18x, run_table_sum(1000) 1.13x, run_table_map_reduce(500) 1.02x. The soft target (>=1.5x on table_sum) is not met because Table.put/3 allocation churn and setelement/3 register writes dominate the table workloads — both flagged as out of scope by the parent plan. The hard floor (no regression) is met across the board. The dispatcher delegates the slow paths to new Executor.dispatcher_* bridges (dispatcher_get_table, dispatcher_set_table, dispatcher_set_field, dispatcher_length, dispatcher_coerce_numeric_for_controls, dispatcher_close_open_upvalues_at_or_above), each wrapping the existing defp helpers so metamethod fidelity matches the interpreter for free. Plan: B5b-v2 Closes #271
- Tighten `:set_list` encoder guard to `count > 0`. The literal-count
zero shape is the interpreter's multi-return sentinel (consumes
`state.multi_return_count` trailing values); codegen never emits
it from a literal constructor today, but encoding it as a no-op
would silently diverge if that ever changed. Two new tests pin
the fallback contract for both `count == 0` and `{:multi, _}`.
- Fix misleading comment about step == 0 in `:numeric_for` body
completion. Neither the dispatcher nor the interpreter implements
PUC-Lua's runtime check; both infinite-loop on step == 0. Parity
is preserved; the comment now describes it accurately.
- Alias `Lua.VM.Table` at the top of `Lua.VM.Dispatcher` and use
the short `Table.put/3` form in `set_list_into_table/6`,
matching the other sibling-module aliases.
- Tighten `@spec` on `dispatcher_set_table` / `dispatcher_set_field`
to `State.t() | no_return()` (the non-tref clauses always raise).
- Document the unused `_proto` parameter on `dispatcher_length` as
forward-compat for B5d-v2 error attribution. The other table
bridges already thread `proto.source` through their slow paths;
`__len` does not yet, but will when error positions land for
compiled prototypes.
Contributor
Author
|
Addressed the review feedback in 75e616d. Summary:
Verification: |
Contributor
Author
|
Benchmarks |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Dispatcher table opcodes — make table-heavy workloads bypass the interpreter
Plan:
.agents/plans/B5b-v2-dispatcher-tables.mdCloses #271
Goal
Extend
Lua.VM.DispatcherandLua.Compiler.Bytecodeto lower thetable opcode family plus
:numeric_for. After this PR all fourtable_opsbenchmarks compile end-to-end and stay out of theinterpreter fallback path, plus the
closuresorchestrator(
run_closures).Success criteria
Lua.Compiler.Bytecodelearns encoders for:new_table,:get_table,:set_table,:set_field,:set_list(basicnon-multi-return form),
:length, and:numeric_for(bodyencoded recursively; falls back if body contains an uncovered
opcode or
:break).Lua.VM.Dispatchergains onecasebranch per new opcodeplus the
:cps_forcontinuation marker handled byfinish_body/6.:callwithresult_count == 0(statement-call form) iscovered via a new
@op_call_zerotag — required sorun_table_sortcan compile (table.sort(t)is a statementcall).
Lua.VM.Executorexposes six newdispatcher_*publicbridges (
dispatcher_get_table,dispatcher_set_table,dispatcher_set_field,dispatcher_length,dispatcher_coerce_numeric_for_controls,dispatcher_close_open_upvalues_at_or_above) — each wraps anexisting
defphelper so metamethod fidelity matches theinterpreter for free.
mix compile --warnings-as-errorspasses.mix test: 1883 → 1902 tests, 0 failures, 25 skipped.mix test --only lua53: 29 tests, 0 failures, 18 skipped.test/lua/vm/leak_regression_test.exs: 3 tests, 0 failures.table_opsbenchmark functions compile to bytecodeend-to-end (verified by a new dispatcher-test setup that asserts
{:compiled_closure, _, _}for each global after install).run_closuresorchestrator compiles (make_counterstillfalls back because of
:closure— that's B5c-v2).dispatcher_vs_interpreter-style A/B harness.run_table_sum(1000)≥1.5x interpreter — landedat 1.13x. Profile shows
Table.put/3allocation churnand
setelement/3register writes dominate; both areexplicitly out of scope per the parent plan ("table-storage
churn is the post-B5b ceiling").
Performance
Mini-bench (
mix run -e, 100–200 iter median, warmed):run_table_build(500)run_table_sum(500)run_table_sum(1000)run_table_map_reduce(500)fib(22)fib lands at 1.54x — well above B5a-v2's 1.17x median for fib(25),
a sign the table-opcode additions did not push the arithmetic
path off any inlining cliff.
Discoveries
:callwithresult_count == 0was needed.run_table_sortcalls
table.sort(t)at statement position, which lowers to{:call, _, _, 0, _}. Added@op_call_zero(reusing slot 25,freed by the removed
@op_test_true). The dispatcher branchshares the same shape as
:call_one, with a:discardsentinelin the frame's
baseslot signalling "throw the return valueaway."
string_opsorchestrators do not fully compile. Theissue described "orchestrators in
closuresandstring_ops"as compiling.
run_closuresdoes. But both string_ops orchestratorsend with
return table.concat(...)/return string.format(...)— multi-return shapes (
:callwithresult_count = -1plus:return_vararg) which are explicitly out of scope per theparent plan. Documented as a B5c-v2 follow-up.
:breakinside:numeric_forforces fallback. Theinterpreter's
:breakwalksfind_loop_exit/1against thecontinuation stack. Reproducing that in the dispatcher requires
mixing post-test markers with
{:loop_exit, _}markers — thesame machinery generic-for / while-loop / break all want, which
is the domain of B5c-v2. The encoder rejects
:numeric_forbodies containing
:breakupfront (walking recursively throughnested
:testbranches) and the whole enclosing prototype fallsback.
The
:cps_forcontinuation marker integrates cleanly. B5a'scontstack carried only{code, pc}resume points. Adding{:cps_for, base, loop_var, body_bc, code, pc + 1}markersexpands
finish_body/6from two clauses to three. The markerstays on the stack across iterations (re-pushed when the body
restarts), so nested numeric-fors compose naturally on the same
stack.
Soft perf target missed; hard floor met.
setelement/3+Table.put/3allocation dominate the table workloads. Theparent plan explicitly flagged this and ruled mutable storage
out of scope; the next plan for closing the table-workload gap
is mutable register/table storage, not more dispatch work.
Changes
Verification
Out of scope (intentional)
:closureopcode and varargs → B5c-v2.:call(result_count = -1) and:return_vararg→ still B5c-v2. This is why
string_opsorchestratorsdon't fully compile.
:while_loop,:repeat_loop,:generic_for→ defer to a futureplan (not blockers for
table_ops).:breakinside:numeric_for→ fallback for now. Theloop-exit continuation machinery lands with the rest of B5c-v2.
:set_listwith{:multi, _}→ fallback.gap matters.
pass
line: 0.