perf(vm): fast-path Numeric.to_signed_int64 for in-range integers #227
Merged
The Lua 5.3 wrap-around mask runs on every integer arithmetic result, but
the overwhelmingly common case is an input already in [-2^63, 2^63 - 1],
which passes through unchanged. Adding a guarded clause that returns
the input as-is short-circuits the masking on that branch.
`@compile {:inline, ...}` lets the BEAM inline both clauses at intra-module
call sites; cross-module callers still trip a function boundary but the
guarded clause's match cost is lower than the band+compare body.
On fib(22), Numeric.to_signed_int64 self-time drops 3.82% -> 3.38% under
tprof. On fib(30) wall clock, lua (chunk) improves 873.4ms -> 844.8ms
(-3.3%), comfortably outside the run-to-run deviation band. Luerl (the
control) does not move. Overflow tests (max_int + 1, min_int - 1,
0xFFFF...) still wrap correctly.
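
As a rough illustration of the fast path described above, here is a minimal sketch; it is not the code from this PR. The module name `NumericSketch` and the attribute names `@min_int`, `@max_int`, `@mask`, and `@two_64` are assumptions made for the example. Only the function name `to_signed_int64/1` and the use of `@compile {:inline, ...}` come from the description.

```elixir
defmodule NumericSketch do
  # Hypothetical stand-in for Lua.VM.Numeric; names below are illustrative only.
  import Bitwise

  # Ask the compiler to inline this function at call sites inside this module.
  # Callers in other modules still make an ordinary remote call.
  @compile {:inline, to_signed_int64: 1}

  @max_int 0x7FFF_FFFF_FFFF_FFFF
  @min_int -0x8000_0000_0000_0000
  @mask 0xFFFF_FFFF_FFFF_FFFF
  @two_64 0x1_0000_0000_0000_0000

  # Fast path: the value already fits in signed 64 bits, so return it as-is.
  def to_signed_int64(n) when is_integer(n) and n >= @min_int and n <= @max_int, do: n

  # Slow path: keep the low 64 bits, then reinterpret the top bit as the sign.
  def to_signed_int64(n) when is_integer(n) do
    masked = band(n, @mask)
    if masked > @max_int, do: masked - @two_64, else: masked
  end
end

# In-range input takes the first clause and is returned unchanged.
IO.inspect(NumericSketch.to_signed_int64(42))
# Out-of-range inputs take the second clause and wrap around.
IO.inspect(NumericSketch.to_signed_int64(9_223_372_036_854_775_807 + 1))
IO.inspect(NumericSketch.to_signed_int64(0xFFFF_FFFF_FFFF_FFFF))
```

Because clauses are tried in order, putting the range guard first means in-range values never reach the `band` and comparison in the second clause, which is where the description says the saving comes from.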
Plan: .agents/plans/B8-inline-numeric-narrowing.md
davydog187 added a commit that referenced this pull request on May 22, 2026:
Post-PR #223 / #227 profile shows Map.get/2 + Map.get/3 combined at 3.28% on fib(22) (plan claimed 6.4%), 2.81% on OOP, and 0.04% on table_build. The real table-workload bottlenecks live inside Lua.VM.Table (insert/put 18%, normalize_key 3.3%, sequence_length 4%) and in :erlang.setelement (17.5% on table writes, 20.9% on OOP). Those are B7's targets, not B6's. B6's projected wall-clock win is now below 1%, inside benchee's deviation band on every measured workload. Audit cleanup may still be worth doing later as a refactor, but not as a perf plan and not before B7.
Inline `to_signed_int64/1` for the in-range fast path
Plan: `.agents/plans/B8-inline-numeric-narrowing.md`
Goal
`Lua.VM.Numeric.to_signed_int64/1` is called on every integer arithmetic result
to wrap it into signed 64-bit per Lua 5.3 §3.4.1. In the fib(22) tprof profile it
accounts for 3.82% of total time across 85,968 calls. For the overwhelmingly
common case where the result is already in [-2^63, 2^63 - 1], the masking and
conditional subtraction are wasted work. Adding a guarded fast-path clause that
returns the input as-is when it's already in range short-circuits the cost on
that branch, and `@compile {:inline, ...}` lets the BEAM inline both clauses at
intra-module call sites.
Success criteria
- `to_signed_int64/1` has a guard-clause fast path for inputs already in the signed 64-bit range. Verified in `lib/lua/vm/numeric.ex`.
- `signed?/1` is marked `@compile {:inline, signed?: 1}` so the fast-path guard is cheap. Applied alongside `to_signed_int64: 1`.
- `mix test` passes: 1692 tests, 51 properties, 55 doctests, 0 failures.
- `mix test --only lua53` does not regress: 29 tests, 0 failures (matches `main`).
- `Numeric.to_signed_int64` self-time drops on fib(22). Measured: 3.82% → 3.38% (12% relative drop). The plan's stretch target of < 1.5% relied on cross-module inlining, which `@compile {:inline, ...}` does not perform; the realized win comes from the guard short-circuit only.
- fib(30) wall clock: lua (chunk) 873.4ms → 844.8ms (-3.3%), lua (eval) 876.7ms → 852.2ms (-2.8%). Luerl (control) 730.9ms → 731.8ms (unchanged). Both runs are at ±0.5% deviation, so the improvement is well outside noise.
- Overflow cases: `9223372036854775807 + 1 == -9223372036854775808`, `-9223372036854775808 - 1 == 9223372036854775807`, `0xFFFFFFFFFFFFFFFF == -1`. All correct.
Changes
The behavior is bit-for-bit identical; the fast path is purely a guard-tested
return-as-is. The slow path (already-out-of-range integers needing wrap-around)
is unchanged.
Discoveries
- `@compile {:inline, ...}` only inlines within the same module. Cross-module callers in `Lua.VM.Executor` and `Lua.VM.Value` still trip a function boundary on every call. This caps the win below the plan's stretch target: the realized improvement comes entirely from the guard short-circuit, not from inlining at the dispatch sites.
happened at the cross-module callers. The wall-clock improvement is real
(luerl as control did not move), so the per-call win is genuine even if
modest.
Verification
Benchmark (fib(30), 10s benchee runs, 2s warmup):
Profile (fib(22), `mix profile.tprof`):
Out of scope (intentional)
`to_signed_int64/1` calls entirely at the executor level; that is B5 territory (compiling prototypes to Erlang).
not the right place.