chore(B4): defer plan; flat-tuple+PC dispatch did not beat list-cons #233
davydog187 wants to merge 1 commit into
Implemented the full B4 spec on a throwaway branch:
- Lua.Compiler.Linearize lifts nested bodies into flat bytecode
- Prototype.instructions becomes a tuple with a labels map
- do_execute/8 rewritten to PC-dispatch via case-on-elem
- CPS continuation stack and find_label/find_loop_exit removed

All 1705 tests + 29 lua53 suite tests passed. Closed unmerged because:
- fib(30) chunk: ~850ms (main) -> ~875ms (B4), +3% regression
- do_execute self-time: 50.83% (main) -> 50.64% (B4), unchanged
- Table workloads: mixed +-5%, within deviation bands

The plan's risks section anticipated this outcome explicitly. On the BEAM, list-cons head-match destructures head+tail in one op, while case-on-elem is two ops (fetch + discriminate). The hoped-for jump-table optimization did not produce a net win, and threading instrs through every tail call added register pressure.

Plan file documents the full implementation findings and the conditions under which the work could be reopened. The linearizer design will be reused by B5 (compile to Erlang) as a compile-time preparation step, without touching the runtime executor.
Closing as superseded by #232.
#232 merged first with a cleaner approach: an isolated dispatch microbenchmark (10k-instruction synthetic stream, three shape variants) that falsified the hypothesis pre-implementation. The plan file on
My branch implemented the full executor rewrite end-to-end (all 1705 tests + 29 lua53 suite tests passed) and measured the regression on real workloads (+3% fib(30)). That's additional confirming evidence (same conclusion, different angle), but it doesn't change the decision, and the plan is already in the right state on main.
The implementation findings live in this conversation; the
Defer B4: flat-tuple + PC dispatch did not beat list-cons head-match
Updates .agents/plans/B4-flat-instruction-stream.md from ready → deferred with the measurement data and design notes from a complete-but-not-shipped implementation.
Why we tried
The plan's hypothesis: replace the current [head | rest] list-cons dispatch in Lua.VM.Executor.do_execute/8 with a case on :erlang.element(pc + 1, instrs), with all nested bodies (:test then/else, while/repeat/for loop bodies) lifted into a single flat instruction tuple by a compile-time pass. Stretch target: 20% reduction in fib(25) median. Floor: no workload regresses by more than 2%.
What I built
Full end-to-end on a throwaway branch:
- Lua.Compiler.Linearize: lifts every nested-body opcode into the flat top-level stream, with :test_pc, :goto_pc, :while_test_pc, :numeric_for_step_pc, etc. as explicit jump opcodes. Labels and :break resolve to PCs at compile time, so find_label/2 and find_loop_exit/1 become dead code.
- Lua.Compiler.Prototype: gained a labels field and changed instructions from list to tuple().
- Lua.VM.Executor.do_execute/8: rewritten from 64 list-cons defp clauses into a single function whose body is one case with one arm per opcode. The continuation stack (cont) was removed entirely; CPS frames for loops became :numeric_for_step_pc / :generic_for_step_pc opcodes emitted by the linearizer.
- :end_of_function was added so the linearizer could safely append a fall-off-end terminator (which yields zero results, distinct from explicit return yielding a single nil).
All 1705 tests + 29 lua53 suite tests passed under the rewrite. The correctness work is sound.
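To make the two dispatch shapes concrete, here is a minimal sketch of a toy interpreter written both ways. It is not the project's executor: the module name DispatchSketch, the opcodes ({:add, n}, :double, {:jump, pc}, :halt), and the single accumulator are all hypothetical, chosen only to show list-cons head-matching next to a case on :erlang.element(pc + 1, instrs).

```elixir
defmodule DispatchSketch do
  # Hypothetical toy VM: opcodes and the integer accumulator are illustrative only.

  # Shape 1: list-cons dispatch. Each clause matches the current instruction
  # and the remaining stream in a single [head | rest] pattern.
  def run_list([{:add, n} | rest], acc), do: run_list(rest, acc + n)
  def run_list([:double | rest], acc), do: run_list(rest, acc * 2)
  def run_list([:halt | _rest], acc), do: acc

  # Shape 2: flat tuple + program counter. The instruction is fetched with
  # element/2 (1-based, hence pc + 1) and then discriminated by the case.
  # Note that instrs has to be passed along on every recursive call.
  def run_tuple(instrs, pc, acc) do
    case :erlang.element(pc + 1, instrs) do
      {:add, n} -> run_tuple(instrs, pc + 1, acc + n)
      :double -> run_tuple(instrs, pc + 1, acc * 2)
      # Jumps are a plain PC assignment in this shape; no label search needed.
      {:jump, target} -> run_tuple(instrs, target, acc)
      :halt -> acc
    end
  end
end
```

Both functions compute the same result for a straight-line program; only the tuple shape can express a jump as a direct PC change, which is what made the label and loop-exit lookups unnecessary in the rewrite.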
Why we closed it
fib(30) regressed 3% — past the plan's 2% floor.
Profile confirms the structural change didn't matter: do_execute/8 self-time was 50.64% under B4 vs 50.83% on main. Essentially unchanged.
The plan's risks section anticipated this:
On the BEAM concretely:
- [head | rest] pattern-match destructures the list head + tail in a single op.
- case :erlang.element(pc + 1, instrs) do is two ops (element fetch + case discrimination).
- Plus threading instrs through every recursive call adds register pressure.
The hoped-for jump-table optimization on the case did not produce a net win against the optimized list-cons path.
What's preserved
The implementation isn't entirely throwaway. If/when B5 (compile prototypes to Erlang functions) starts, the linearizer design can be reintroduced only at compile time — feeding the B5 codegen with flat bytecode, leaving the runtime executor on its proven list-cons path.
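As a rough illustration of what a compile-time linearization pass of this kind does, the sketch below flattens a nested if/else into a single instruction tuple with resolved jump targets. It is an assumption-laden toy, not the Lua.Compiler.Linearize code: the module name LinearizeSketch, the nested {:test, then_body, else_body} shape, and the operand layout of {:test_pc, else_pc} and {:goto_pc, pc} are hypothetical; only the opcode names :test_pc, :goto_pc and :end_of_function come from the description above.

```elixir
defmodule LinearizeSketch do
  # Hypothetical nested IR: {:test, then_body, else_body} carries two nested
  # instruction lists; any other term is treated as an already-flat instruction.

  # Returns a flat tuple of instructions terminated by :end_of_function.
  def linearize(body) do
    {instrs, _end_pc} = flatten(body, 0)
    List.to_tuple(instrs ++ [:end_of_function])
  end

  # flatten(body, pc) returns {flat_instrs, pc_after_body}, where pc is the
  # 0-based position at which the first instruction of body will be placed.
  defp flatten([], pc), do: {[], pc}

  defp flatten([{:test, then_body, else_body} | rest], pc) do
    # Layout: test_pc, then-branch, goto_pc, else-branch, rest.
    {then_flat, after_then} = flatten(then_body, pc + 1)
    # The else-branch starts after the :goto_pc that closes the then-branch.
    else_start = after_then + 1
    {else_flat, after_else} = flatten(else_body, else_start)
    {rest_flat, end_pc} = flatten(rest, after_else)

    flat =
      [{:test_pc, else_start}] ++
        then_flat ++ [{:goto_pc, after_else}] ++ else_flat ++ rest_flat

    {flat, end_pc}
  end

  defp flatten([instr | rest], pc) do
    {rest_flat, end_pc} = flatten(rest, pc + 1)
    {[instr | rest_flat], end_pc}
  end
end
```

Because every target is a resolved PC, a downstream code generator could consume this tuple without any runtime label search, which is the property the B5 reuse relies on.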
Changes
Only the plan file changes; no library code touched.
Verification