chore(B4): defer plan; flat-tuple+PC dispatch did not beat list-cons by davydog187 · Pull Request #233 · tv-labs/lua

davydog187 · 2026-05-22T09:55:41Z

Defer B4: flat-tuple + PC dispatch did not beat list-cons head-match

Updates .agents/plans/B4-flat-instruction-stream.md from ready → deferred with the measurement data and design notes from a complete-but-not-shipped implementation.

Why we tried

The plan's hypothesis: replace the current [head | rest] list-cons dispatch in Lua.VM.Executor.do_execute/8 with a case on :erlang.element(pc + 1, instrs), with all nested bodies (:test then/else, while/repeat/for loop bodies) lifted into a single flat instruction tuple by a compile-time pass. Stretch target: 20% reduction in fib(25) median. Floor: no workload regresses by more than 2%.

What I built

Full end-to-end on a throwaway branch:

- Lua.Compiler.Linearize: lifts every nested-body opcode into the flat top-level stream, with :test_pc, :goto_pc, :while_test_pc, :numeric_for_step_pc, etc. as explicit jump opcodes. Labels and :break resolve to PCs at compile time, so find_label/2 and find_loop_exit/1 become dead code.
- Lua.Compiler.Prototype: gained a labels field and changed instructions from list to tuple().
- Lua.VM.Executor.do_execute/8: rewritten from 64 list-cons defp clauses into a single function whose body is one case with one arm per opcode. The continuation stack (cont) was removed entirely; CPS frames for loops became :numeric_for_step_pc / :generic_for_step_pc opcodes emitted by the linearizer.
- A new synthetic opcode, :end_of_function, was added so the linearizer could safely append a fall-off-end terminator (which yields zero results, distinct from an explicit return yielding a single nil).
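To make the lifting step concrete, here is a minimal sketch of a linearizer for a single nested construct. It is not the Lua.Compiler.Linearize code from the branch: the LinearizeSketch module, the nested {:test, reg, then_body, else_body} shape, the leaf atoms, and the operand layout of :test_pc and :goto_pc (absolute 0-based PCs, with :test_pc assumed to jump to the else arm when the register is falsy) are all assumptions made for illustration. Only the opcode names come from the description above.

```elixir
defmodule LinearizeSketch do
  # Hypothetical nested input: {:test, reg, then_body, else_body}; every other
  # term is treated as a leaf instruction occupying exactly one PC slot.

  # Flattens the nested list and appends the fall-off-end terminator.
  def linearize(instrs), do: List.to_tuple(lift(instrs, 0) ++ [:end_of_function])

  defp lift([], _pc), do: []

  defp lift([{:test, reg, then_body, else_body} | rest], pc) do
    # The then arm starts right after the :test_pc instruction itself.
    then_flat = lift(then_body, pc + 1)
    # Skip :test_pc (1 slot), the then arm, and the :goto_pc (1 slot).
    else_pc = pc + 1 + length(then_flat) + 1
    else_flat = lift(else_body, else_pc)
    end_pc = else_pc + length(else_flat)

    [{:test_pc, reg, else_pc}] ++
      then_flat ++ [{:goto_pc, end_pc}] ++ else_flat ++ lift(rest, end_pc)
  end

  defp lift([leaf | rest], pc), do: [leaf | lift(rest, pc + 1)]
end

nested = [:load, {:test, 0, [:then_op], [:else_op_a, :else_op_b]}, :ret]
flat = LinearizeSketch.linearize(nested)
IO.inspect(flat)
```

Because every jump target is an absolute PC computed during this pass, nothing has to be searched for at run time, which is the property that made find_label/2 and find_loop_exit/1 dead code in the rewrite.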

All 1705 tests + 29 lua53 suite tests passed under the rewrite. The correctness work is sound.

Why we closed it

| workload | main | B4 | delta |
|---|---|---|---|
| fib(30) chunk | ~850 ms | ~875 ms | +3% ⚠️ |
| OOP n=50 | 137 µs | 137 µs | flat |
| Table Build n=100 | 17.33 µs | 16.44 µs | -5% |
| Table Sort n=100 | 34.83 µs | 36.24 µs | +4% |
| Table Iterate | 24.17 µs | 23.01 µs | -5% |
| Table Map+Reduce | ~50 µs | 49.06 µs | -2% |

fib(30) regressed 3% — past the plan's 2% floor.

Profile confirms the structural change didn't matter: do_execute/8 self-time was 50.64% under B4 vs 50.83% on main. Essentially unchanged.

The plan's risks section anticipated this:

"If the post-merge profile shows no improvement (or worse, a regression), the structural change isn't paying for itself and B5 (Erlang functions) is the better lever."

On the BEAM concretely: a [head | rest] pattern match destructures the list head and tail in a single op, whereas a case on :erlang.element(pc + 1, instrs) takes two (element fetch + case discrimination). Threading instrs through every recursive call also adds register pressure. The hoped-for jump-table optimization on the case did not produce a net win against the optimized list-cons path.
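The two dispatch shapes compared above can be sketched side by side on a toy stack machine. This is an illustration only: the DispatchSketch module and its three opcodes ({:push, n}, :add, :halt) are hypothetical and far simpler than do_execute/8, which takes eight arguments and handles 64 opcodes.

```elixir
defmodule DispatchSketch do
  # List-cons dispatch: each clause matches the head instruction and binds the
  # tail in the same pattern, so the remaining list doubles as the program counter.
  def run_list([{:push, n} | rest], stack), do: run_list(rest, [n | stack])
  def run_list([:add | rest], [a, b | stack]), do: run_list(rest, [a + b | stack])
  def run_list([:halt | _rest], [top | _]), do: top

  # Flat-tuple dispatch: fetch the instruction by index, then discriminate on it
  # with a case. Both instrs and pc are carried through every recursive call.
  def run_tuple(instrs, pc, stack) do
    case :erlang.element(pc + 1, instrs) do
      {:push, n} ->
        run_tuple(instrs, pc + 1, [n | stack])

      :add ->
        [a, b | rest] = stack
        run_tuple(instrs, pc + 1, [a + b | rest])

      :halt ->
        hd(stack)
    end
  end
end

program = [{:push, 2}, {:push, 3}, :add, :halt]
IO.inspect(DispatchSketch.run_list(program, []))
IO.inspect(DispatchSketch.run_tuple(List.to_tuple(program), 0, []))
```

Both calls evaluate the same program to 5. The difference under discussion is only in how the next instruction is reached, not in what is computed.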

What's preserved

The implementation isn't entirely throwaway. If/when B5 (compile prototypes to Erlang functions) starts, the linearizer design can be reintroduced only at compile time — feeding the B5 codegen with flat bytecode, leaving the runtime executor on its proven list-cons path.

Changes

 .agents/plans/B4-flat-instruction-stream.md | 94 +++++++++++++++++++++++++++--
 1 file changed, 92 insertions(+), 2 deletions(-)

Only the plan file changes; no library code touched.

Verification

mix format
mix compile --warnings-as-errors
mix test    # 1705 tests pass (unchanged)

Implemented the full B4 spec on a throwaway branch:
- Lua.Compiler.Linearize lifts nested bodies into flat bytecode
- Prototype.instructions becomes a tuple with a labels map
- do_execute/8 rewritten to PC-dispatch via case-on-elem
- CPS continuation stack and find_label/find_loop_exit removed

All 1705 tests + 29 lua53 suite tests passed. Closed unmerged because:
- fib(30) chunk: ~850ms (main) -> ~875ms (B4), +3% regression
- do_execute self-time: 50.83% (main) -> 50.64% (B4), unchanged
- Table workloads: mixed +-5%, within deviation bands

The plan's risks section anticipated this outcome explicitly. On the BEAM, list-cons head-match destructures head+tail in one op, while case-on-elem is two ops (fetch + discriminate). The hoped-for jump-table optimization did not produce a net win, and threading instrs through every tail call added register pressure.

Plan file documents the full implementation findings and the conditions under which the work could be reopened. The linearizer design will be reused by B5 (compile to Erlang) as a compile-time preparation step, without touching the runtime executor.

davydog187 · 2026-05-22T10:59:03Z

Closing as superseded by #232.

#232 merged first with a cleaner approach: an isolated dispatch microbenchmark (10k-instruction synthetic stream, three shape variants) that falsified the hypothesis pre-implementation. The plan file on main already reflects the deferral with that evidence.

My branch implemented the full executor rewrite end-to-end (all 1705 tests + 29 lua53 suite tests passed) and measured the regression on real workloads (+3% fib(30)). That's additional confirming evidence — same conclusion, different angle — but it doesn't change the decision, and the plan is already in the right state on main.

The implementation findings live in this conversation; the Lua.Compiler.Linearize design and the :end_of_function sentinel insight will be relevant if B5 (compile prototypes to Erlang) starts and needs a flat-bytecode compile-time preparation step.

davydog187 mentioned this pull request May 22, 2026

docs(roadmap): consolidate B-series findings (B4, B6, B7, B8 + harness) #234

Merged

davydog187 closed this May 22, 2026

davydog187 wants to merge 1 commit into main from chore/defer-b4
