chore(B4): defer plan; PC+elem dispatch is 9-15% slower than list-cons #232

Merged: davydog187 merged 1 commit into main from chore/defer-b4 on May 22, 2026

@davydog187

B4: Flat instruction stream + PC dispatch — deferred

Plan: .agents/plans/B4-flat-instruction-stream.md

Mirrors the B7 deferral pattern (#231): the plan's core hypothesis was
falsified by a pre-flight spike, so we ship the writeup as a deferral
rather than a ~2,400-line rewrite that won't pay off.

The spike

Before committing to the full executor + codegen + loop-opcode rewrite,
a synthetic microbench compared the two dispatch shapes on identical
work (10,000 mixed opcodes, same tagged-tuple shape, same register
state — only the dispatch read changes):

| Dispatch | IPS | Mean | vs current |
| --- | --- | --- | --- |
| list-cons (current) | 13.86 K | 72.13 µs | baseline |
| pc+elem case (proposed) | 12.69 K | 78.83 µs | 1.09x slower |
| pc+elem multi-head | 12.10 K | 82.65 µs | 1.15x slower |

Memory: identical to three decimal places. Stable over multiple runs.
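
The "vs current" figures can be rechecked from the IPS values in the table. The snippet below is a small sanity check, assuming the slowdown is computed as baseline IPS divided by variant IPS; the module and function names are hypothetical and not part of the repository.

```elixir
defmodule SlowdownCheck do
  # Hypothetical helper: ratio of baseline throughput to a variant's
  # throughput, rounded to two decimals like the table above.
  def slowdown(baseline_ips, variant_ips) do
    Float.round(baseline_ips / variant_ips, 2)
  end
end

# IPS values in thousands, copied from the spike results.
baseline = 13.86

IO.inspect(SlowdownCheck.slowdown(baseline, 12.69), label: "pc+elem case")
IO.inspect(SlowdownCheck.slowdown(baseline, 12.10), label: "pc+elem multi-head")
```

The mean times give the same ratios to two decimals (78.83 / 72.13 and 82.65 / 72.13), which is where the 9-15% range comes from.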

Why

The tagged-tuple jump table is the same in both shapes — BEAM compiles
both into a jump on the tag of the matched tuple. The only difference
is the dispatch read itself:

  • [h | t] is a single indirect load. The BEAM is heavily tuned for
    cons-list iteration; it is the native iteration idiom on the platform.
  • elem(instrs, pc) is a bounds-checked indirect load plus integer
    arithmetic.

Cons-list iteration wins by 9-15% on raw dispatch.
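
For readers who have not seen the two shapes side by side, here is a minimal sketch of each. It is not the real executor: the two opcodes, the single accumulator, and the module name are assumptions made for illustration. Only the dispatch read differs between the two functions, which is the property the spike isolated.

```elixir
defmodule DispatchSketch do
  # Illustrative only: a two-opcode VM with one accumulator.
  # Opcode names and state are hypothetical, not the real executor's.

  # Shape 1: list-cons. The dispatch read is the [h | t] destructure
  # in the function head; the tag match picks the clause.
  def run_list([], acc), do: acc
  def run_list([{:add, n} | rest], acc), do: run_list(rest, acc + n)
  def run_list([{:mul, n} | rest], acc), do: run_list(rest, acc * n)

  # Shape 2: pc + elem. The instructions live in a tuple; each step
  # does a bounds test, an elem/2 read, and a pc increment.
  def run_pc(instrs, pc, acc) when pc >= tuple_size(instrs), do: acc

  def run_pc(instrs, pc, acc) do
    case elem(instrs, pc) do
      {:add, n} -> run_pc(instrs, pc + 1, acc + n)
      {:mul, n} -> run_pc(instrs, pc + 1, acc * n)
    end
  end
end

program = [{:add, 2}, {:mul, 3}, {:add, 1}]

# Both shapes compute (0 + 2) * 3 + 1.
IO.inspect(DispatchSketch.run_list(program, 0), label: "list-cons")
IO.inspect(DispatchSketch.run_pc(List.to_tuple(program), 0, 0), label: "pc+elem")
```

The sketch shows the structural difference only; it says nothing about timing on its own. The numbers in the table come from the spike's microbench, not from this code.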

The structural target is still correct — the shape is wrong

fib(22) baseline profile (main @ bc69a2e):

```
Lua.VM.Executor.do_execute/8 802388 50.98% self
:erlang.setelement/3 601788 25.49%
Lua.VM.Executor.do_frame_return/6 57313 5.96%
Lua.VM.Executor.copy_args_to_regs/5 114626 4.94%
Lua.VM.Numeric.to_signed_int64/1 85968 3.35%
```

do_execute/8 is 51% of fib self-time (plan referenced 43.6% from
a pre-PR-223 baseline; the proportional cost has grown). Attacking
dispatch is the right target — the proposed shape just doesn't help.

The plan's Risk #1 anticipated this exact outcome:

> If the post-merge profile shows no improvement (or worse, a
> regression), the structural change isn't paying for itself and
> B5 (Erlang functions) is the better lever.

That exit condition is met pre-merge.

Recommendation

Defer B4. The 51% do_execute/8 self-time should be attacked by B5
(compile instruction streams to native Erlang functions), which
collapses dispatch entirely into the BEAM's function-call mechanism —
the BEAM-tuned operation that beat every data-shape alternative we
tried.

A future plan could revisit B4 as a structural prerequisite for B5
if codegen wants integer entry points in the IR. Success criteria
would change: the bar would be "B5 compiles cleanly from the new
layout," not "dispatch gets faster" (which is now disproven).

Changes

```
.agents/plans/B4-flat-instruction-stream.md | 92 ++++++++++++++++++++-
1 file changed, 90 insertions(+), 2 deletions(-)
```

Verification

```
mix compile --warnings-as-errors # passes (no code changes)
```

Plan-file-only change; no tests affected.

Out of scope (intentional)

  • Implementing any of B4. The point of this PR is to stop the work
    before sunk cost begins.
  • B5. That's its own plan, to be scoped separately once we decide to
    pick it up.

…5% slower than list-cons

A synthetic dispatch microbench compared the current list-cons shape
against the proposed pc+elem case (and a multi-head variant) on
identical work. The current shape wins by 9-15%, with identical memory.
BEAM's [h | t] destructuring is one indirect load; elem(instrs, pc)
is a bounds-checked load plus integer arithmetic, and the tagged-tuple
jump table is identical in both, so the dispatch read is the only
delta and cons wins.

The structural target was correct: do_execute/8 is 51% of fib(22)
self-time on main @ bc69a2e (the plan referenced 43.6% from a pre-PR-223
baseline; the proportional cost has grown). But the proposed shape
makes it worse, not better. The plan's Risk #1 anticipated this:
'if the post-merge profile shows no improvement, B5 (Erlang functions)
is the better lever.' That exit condition is met pre-merge.

Plan: B4
@davydog187 davydog187 merged commit 9c873ed into main May 22, 2026
4 checks passed
@davydog187 davydog187 deleted the chore/defer-b4 branch May 22, 2026 09:54