chore(B4): defer plan; PC+elem dispatch is 9-15% slower than list-cons #232
Merged
…5% slower than list-cons

A synthetic dispatch microbench compared the current list-cons shape against the proposed pc+elem case (and a multi-head variant) on identical work. The current shape wins by 9-15%, with identical memory.

BEAM's [h | t] destructuring is one indirect load; elem(instrs, pc) is a bounds-checked load plus integer arithmetic, and the tagged-tuple jump table is identical in both, so the dispatch read is the only delta and cons wins.

The structural target was correct: do_execute/8 is 51% of fib(22) self-time on main @ bc69a2e (the plan referenced 43.6% from a pre-PR-223 baseline; the proportional cost has grown). But the proposed shape makes it worse, not better.

The plan's Risk #1 anticipated this: 'if the post-merge profile shows no improvement, B5 (Erlang functions) is the better lever.' That exit condition is met pre-merge.

Plan: B4
B4: Flat instruction stream + PC dispatch — deferred
Plan: `.agents/plans/B4-flat-instruction-stream.md`

Mirrors the B7 deferral pattern (#231): the plan's core hypothesis was
falsified by a pre-flight spike, so we ship the writeup as a deferral
rather than a ~2,400-line rewrite that won't pay off.
The spike
Before committing to the full executor + codegen + loop-opcode rewrite,
a synthetic microbench compared the two dispatch shapes on identical
work (10,000 mixed opcodes, same tagged-tuple shape, same register
state — only the dispatch read changes):
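[Results table: current list-cons shape vs. proposed pc+elem `case` shape; the figures did not survive extraction.]

Memory: identical to three decimal places. Stable over multiple runs.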
Why
The tagged-tuple jump table is the same in both shapes — BEAM compiles
both into a jump on the tag of the matched tuple. The only difference
is the dispatch read itself:
- `[h | t]` is a single indirect load. The BEAM is heavily tuned for cons-list iteration; it is the native iteration idiom on the platform.
- `elem(instrs, pc)` is a bounds-checked indirect load plus integer arithmetic.
Cons-list iteration wins by 9-15% on raw dispatch.
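
For readers who have not seen the two shapes side by side, here is a minimal sketch of what "only the dispatch read changes" means. The `DispatchSketch` module and its two opcodes (`:add`, `:mul`) are invented for illustration; they are not the executor's real instruction set, and this is not the microbench itself.

```elixir
defmodule DispatchSketch do
  # Hypothetical two-opcode instruction set, invented for this sketch:
  #   {:add, n}  adds n to the accumulator
  #   {:mul, n}  multiplies the accumulator by n

  # Current shape: the instruction stream is a list, and the dispatch
  # read is the cons-cell match in the function head.
  def run_cons([], acc), do: acc
  def run_cons([{:add, n} | rest], acc), do: run_cons(rest, acc + n)
  def run_cons([{:mul, n} | rest], acc), do: run_cons(rest, acc * n)

  # Proposed shape: the instruction stream is a tuple indexed by a
  # program counter, and the dispatch read is elem/2 followed by a
  # case on the tag.
  def run_pc(instrs, pc, acc) when pc >= tuple_size(instrs), do: acc

  def run_pc(instrs, pc, acc) do
    case elem(instrs, pc) do
      {:add, n} -> run_pc(instrs, pc + 1, acc + n)
      {:mul, n} -> run_pc(instrs, pc + 1, acc * n)
    end
  end
end

instrs = [{:add, 2}, {:mul, 3}, {:add, 1}]
cons_result = DispatchSketch.run_cons(instrs, 0)
pc_result = DispatchSketch.run_pc(List.to_tuple(instrs), 0, 0)
IO.inspect({cons_result, pc_result})
```

Both functions do the same work per opcode and match on the same tagged tuples. The difference is confined to how the next instruction is fetched, which is the read the spike measured.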
The structural target is still correct — the shape is wrong
fib(22) baseline profile (main @ bc69a2e):
```
Lua.VM.Executor.do_execute/8 802388 50.98% self
:erlang.setelement/3 601788 25.49%
Lua.VM.Executor.do_frame_return/6 57313 5.96%
Lua.VM.Executor.copy_args_to_regs/5 114626 4.94%
Lua.VM.Numeric.to_signed_int64/1 85968 3.35%
```
`do_execute/8` is 51% of fib self-time (plan referenced 43.6% from a pre-PR-223 baseline; the proportional cost has grown). Attacking
dispatch is the right target — the proposed shape just doesn't help.
The plan's Risk #1 anticipated this exact outcome:

> if the post-merge profile shows no improvement, B5 (Erlang functions) is the better lever.

That exit condition is met pre-merge.
Recommendation
Defer B4. The 51% `do_execute/8` self-time should be attacked by B5 (compile instruction streams to native Erlang functions), which
collapses dispatch entirely into the BEAM's function-call mechanism —
the BEAM-tuned operation that beat every data-shape alternative we
tried.
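
As a rough illustration of the B5 idea, and not of the actual B5 design, the sketch below folds a toy instruction list into a single function body and compiles it with `Module.create/3`, so no per-instruction dispatch remains at runtime. The `CompileSketch` module, the generated module name, and the `:add`/`:mul` opcodes are all assumptions made up for this example.

```elixir
defmodule CompileSketch do
  # Hypothetical opcodes for a toy accumulator machine:
  #   {:add, n}, {:mul, n}
  # Folds the instruction list into one quoted expression over `acc`.
  def to_body(instrs, acc_var) do
    Enum.reduce(instrs, acc_var, fn
      {:add, n}, expr -> quote(do: unquote(expr) + unquote(n))
      {:mul, n}, expr -> quote(do: unquote(expr) * unquote(n))
    end)
  end

  # Compiles the instruction list into a module exposing run/1.
  # The module name is supplied by the caller; returns that name.
  def compile(name, instrs) do
    acc = Macro.var(:acc, __MODULE__)
    body = to_body(instrs, acc)

    contents =
      quote do
        def run(unquote(acc)), do: unquote(body)
      end

    Module.create(name, contents, Macro.Env.location(__ENV__))
    name
  end
end

mod = CompileSketch.compile(CompileSketch.Generated, [{:add, 2}, {:mul, 3}, {:add, 1}])
IO.inspect(mod.run(0))
```

Here the instruction stream exists only at compile time; the generated `run/1` is ordinary compiled code. A real Lua executor would also need control flow, registers, and calls, none of which this sketch attempts.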
A future plan could revisit B4 as a structural prerequisite for B5
if codegen wants integer entry points in the IR. Success criteria
would change: the bar would be "B5 compiles cleanly from the new
layout," not "dispatch gets faster" (which is now disproven).
Changes
```
.agents/plans/B4-flat-instruction-stream.md | 92 ++++++++++++++++++++-
1 file changed, 90 insertions(+), 2 deletions(-)
```
Verification
```
mix compile --warnings-as-errors # passes (no code changes)
```
Plan-file-only change; no tests affected.
Out of scope (intentional)
before sunk cost begins.
pick it up.