fix(vm): keep heap effects across protected-call error unwinding #333
Automated multi-agent review
This is an automated multi-agent review of PR #333.
Scope note: error-value passthrough / stringification (issue #334) was explicitly out of scope and not reviewed.
Confirmed findings (all fixed on this branch)
[HIGH] Native generic-for iterator raise drops current-frame heap mutations
[HIGH] deflua/set! user-function error tuple drops heap mutations under pcall
[MEDIUM] gsub function-replacement raise drops callback heap mutations
[MEDIUM] No coverage for error in a native-stdlib callback under pcall
[LOW] lua_next discards order_tail flush mutation on invalid-key raise
[LOW] Doc comments reference private/nonexistent annotate_state/2
[LOW] Benchmark script fails repo Styler format check
[LOW] pcall_state_bench omits memory measurement and the generic-for path
Refuted / not actioned findings
Notable findings raised during the loop but refuted (for transparency)
Some lower-confidence findings drew a split vote (e.g. the
Status
Note: the review-fix loop reached its maximum round count without converging to full stability, so a residual edge case may remain unsurfaced. All findings confirmed through round 4 have been fixed and the full suite is green.
Red tests for protected-call error unwinding: heap effects (globals, table fields, upvalue cells, metatables) made before an error must survive pcall/xpcall trapping the error, matching reference Lua. Covers both the dispatcher and interpreter engines plus the Lua.call_function Elixir API error path.
VM exceptions gain a :state field carrying the raise-time State. pcall/xpcall/Lua.call_function recover it via State.unwind_to/2 — heap fields (tables, userdata, metatables, upvalue cells, private) come from the raise-time snapshot, control fields (call stack, open upvalues) restore from the protected-call entry, matching Lua 5.3 §2.3 error semantics. Annotates the raise sites that already have state in scope: error(), assert(), stack-overflow, and call-type errors in both engines.
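The heap/control split can be modelled outside Elixir. What follows is a minimal Python sketch, not the project's code: the `State` dataclass, its field names, and the `unwind_to` function are hypothetical stand-ins for `%Lua.VM.State{}` and `State.unwind_to/2`, and the argument order is an assumption.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class State:
    """Hypothetical, simplified stand-in for the immutable VM state."""
    # Heap fields: these should survive a trapped error.
    tables: tuple = ()
    metatables: tuple = ()
    upvalue_cells: tuple = ()
    # Control fields: these are restored from the protected-call entry.
    call_stack: tuple = ()
    call_depth: int = 0
    open_upvalues: tuple = ()


def unwind_to(raise_state: State, entry_state: State) -> State:
    """Keep heap fields from raise time, take control fields from entry."""
    return replace(
        raise_state,
        call_stack=entry_state.call_stack,
        call_depth=entry_state.call_depth,
        open_upvalues=entry_state.open_upvalues,
    )


entry = State(tables=(("x", 1),), call_stack=("main",), call_depth=1)
# Inside the protected function: x was set to 2 and two frames were pushed.
at_raise = State(tables=(("x", 2),), call_stack=("main", "f", "g"), call_depth=3)

recovered = unwind_to(at_raise, entry)
print(recovered.tables)      # heap effect kept
print(recovered.call_stack)  # control state unwound
```

Starting from the raise-time snapshot and overwriting only the control fields means any heap field added later is preserved by default in this model; the real function may be structured differently.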
Both native invocation points (Executor.call_function and the interpreter's inline native path) reraise VM exceptions with the call's entry state when none was attached deeper, covering every stdlib bad-argument raise without threading state through each check. Innermost annotation wins.
Arithmetic, comparison, concat, index, bitwise, length, and numeric-for coercion helpers raise from pure code where the freshest state lived only in the executor loop binding. They now take the caller's threaded state and attach it to the exception, so protected calls keep heap effects made earlier in the same frame (e.g. x = 2; return nil + 1 under pcall). One extra already-bound argument per helper — raise opts are only built on the error path.
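To illustrate why a pure helper needs the threaded state, here is a small Python model. All names are hypothetical; the real helpers are Elixir functions whose signatures are not shown in this PR text. The `add` helper receives the current state only so it can attach it on the error path, and the protected call reads it back.

```python
class VMTypeError(Exception):
    """Hypothetical VM exception carrying the raise-time state."""

    def __init__(self, message, state=None):
        super().__init__(message)
        self.state = state


def add(a, b, state):
    """Pure arithmetic helper; `state` is used only when raising."""
    if a is None or b is None:
        raise VMTypeError("attempt to perform arithmetic on a nil value", state=state)
    return a + b


def body(state):
    """Models the Lua chunk `x = 2; return nil + 1`."""
    state = {**state, "x": 2}  # immutable-style update: a new dict
    return add(None, 1, state), state


def pcall(fn, entry_state):
    """Returns (ok, result_or_message, state)."""
    try:
        result, state = fn(entry_state)
        return True, result, state
    except VMTypeError as err:
        # Fall back to the stale entry snapshot only if nothing was attached.
        recovered = err.state if err.state is not None else entry_state
        return False, str(err), recovered


ok, message, final_state = pcall(body, {"x": 1})
print(ok, final_state["x"])
```

If `add` raised without the `state` argument, `pcall` in this model would fall back to `{"x": 1}`, which is the rollback the commit describes.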
Frame entries in both engines (Executor.execute, the lua_closure paths of call_function/call_value, and the dispatcher's do_execute_top) annotate escaping VM exceptions with the frame's entry state when nothing deeper attached one. Converts any missed raise site from 'loses all mutations under pcall' into 'loses only in-frame mutations of the innermost unannotated frame'. One try per frame entry — the executor loop's tail recursion is untouched.
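The "annotate only when nothing deeper attached a state" rule can be sketched in a few lines of Python. This is a model under assumptions: `VMError`, `run_frame`, and the list-based state are hypothetical, and only the name `annotate_frame_state` is borrowed from the PR.

```python
class VMError(Exception):
    """Hypothetical VM exception with an optional attached state."""

    def __init__(self, message, state=None):
        super().__init__(message)
        self.state = state


def annotate_frame_state(err, state):
    """Attach `state` only if no deeper layer already attached one."""
    if err.state is None:
        err.state = state
    return err


def run_frame(body, entry_state):
    """Frame entry: one try around the whole frame body."""
    try:
        return body(entry_state)
    except VMError as err:
        raise annotate_frame_state(err, entry_state)


def annotated_raise(state):
    state = state + ["inner write"]
    raise VMError("boom", state=state)  # raise site attaches the fresh state


def unannotated_raise(state):
    state = state + ["inner write"]
    raise VMError("boom")  # missed raise site: no state attached


def outer(inner):
    def body(state):
        state = state + ["outer write"]
        return run_frame(inner, state)  # this is the inner frame's entry state
    return body


def trapped_state(inner):
    try:
        run_frame(outer(inner), [])
    except VMError as err:
        return err.state


print(trapped_state(annotated_raise))
print(trapped_state(unannotated_raise))
```

In the second call the inner frame's own write is lost, but the outer frame's write survives, which matches the "loses only in-frame mutations of the innermost unannotated frame" behaviour described above.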
Also derives Inspect excluding :state on VM exceptions so inspecting a trapped error never dumps the whole VM heap into logs.
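As an analogy only, the same idea in Python is to exclude the heavy field from the object's `repr`. The `TrappedError` class below is hypothetical; in Elixir the usual mechanism is deriving `Inspect` with an `except:` list, though the exact form used in this PR is not shown here.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class TrappedError:
    """Hypothetical error record that carries a large state payload."""
    message: str
    # Excluded from repr so logging the error does not print the heap.
    state: Optional[dict] = field(default=None, repr=False)


err = TrappedError("boom", state={"tables": list(range(10_000))})
print(repr(err))
```

The state is still reachable as `err.state`; only the printed form omits it.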
string.gsub with a function replacement threads state through the
callback, but the invalid-return RuntimeError dropped the freshest
post-callback state, so a protected unwind rolled back the callback's
heap effects. Attach the live %State{} to the raise (guarded so the
non-stateful gsub/4 entry, which passes nil, is unaffected).
Also point the frame-boundary rescue comments at the real public
Executor.annotate_frame_state/2 and collapse the redundant private
annotate_state/2 alias so the name is uniform across both engines.
Add matrix cases (both engines) for heap mutations made inside a native stdlib callback that errors before returning: a gsub function replacement that returns an invalid value, a table.sort comparator that errors, and a pairs loop body that errors. These cross the native-call choke point and exercise the innermost-annotation reraise path.
The native-function clause of call_value/5 — the choke point for generic-for iterators reached from both engines — wrapped the native call in try/after for position restore but had no state-annotating rescue. A VM exception raised by a native iterator (utf8.codes, next via pairs) escaped with :state == nil and was only caught by the enclosing frame backstop, which annotated it with the frame's ENTRY state, rolling back heap mutations the protected function made before the loop (violating Lua 5.3 §2.3). Mirror the :call opcode native dispatch: add a rescue that reraises annotate_frame_state(e, state) so the iterator-call's entry state ferries out on the exception. Both engines route through this one clause via Executor.dispatcher_call_value/4, so no parity gap. Also thread the flushed state into lua_next's invalid-key ArgumentError so a next() invalid-key raise keeps the order_tail normalization and any prior heap mutations across the protected boundary.
Consistent with the repo's Styler plugin output; the file failed mix format --check-formatted otherwise.
The set!/3 arity-2 and deflua execute_function wrappers raised
RuntimeError without attaching the returned struct's state when a user
function signaled {:error, reason, %Lua{}}. The only annotation that
fired was the native-call choke point using the pre-call entry state,
which predates the mutations the user function made before erroring, so
pcall/xpcall rolled them back.
Bind the returned struct and pass state: returned_lua.state on the
raise, mirroring the gsub callback fix, so heap effects survive
protected-call unwinding.
Add pcall_state_preservation tests covering both wrapper sites.
Closes #331
…ll bench
Enable memory_time so the per-call allocation/GC delta of the added try frames is observable, and add a generic_for case driven by a Lua closure iterator so executor.ex call_value/5's lua_closure clause (the path that gained a new per-iteration try block) is actually exercised.
42cab9d to
d785722
Closes #331.
Problem
When a function called via pcall/xpcall (or Lua.call_function/3 from Elixir) raised an error, every mutation made before the error (global writes, table field updates, upvalue assignments, metatable changes) was rolled back. Reference Lua keeps those heap effects and only unwinds control state (Lua 5.3 §2.3).
Root cause: all VM state threads through the immutable %Lua.VM.State{} struct, and errors are Elixir exceptions that carried no state, so a protected call's rescue only had its stale entry snapshot.
Fix
VM exceptions (RuntimeError, TypeError, AssertionError, ArgumentError) gain a :state field carrying the raise-time state. Protected boundaries recover it via the new State.unwind_to/2: heap fields (tables, userdata, metatables, upvalue cells, private) come from the raise-time snapshot; control fields (call stack, call depth, open upvalues) restore from the entry state.
State is attached in layers so no raise path is missed, with zero happy-path overhead:
error(), assert(), stack overflow, call-type errors (both engines) annotate directly.
Helpers that raise from pure code attach the caller's threaded state, so x = 2; return nil + 1 under pcall keeps x = 2.
Frame entries add one try per frame entry; the executor loop's tail recursion is untouched.
The xpcall message handler now runs against the recovered state, so it observes the mutations (matching PUC-Lua). Lua.call_function/3's {:error, reason, lua} likewise returns the recovered state. VM exceptions derive Inspect excluding :state so logging a trapped error never dumps the VM heap.
TDD
The first commit is a red 37-case matrix (test/lua/vm/pcall_state_preservation_test.exs) pinning reference semantics: global/table/upvalue/metatable mutations crossed with error(), assert, runtime type errors, stdlib bad-argument errors, stack overflow, nested pcall, xpcall (including an erroring handler), and the Elixir API, each run under both engines (dispatcher and bytecode-stripped interpreter). Expected values were cross-checked against PUC Lua. Each subsequent commit turns a slice green; control tests pin that no-mutation and success paths are unchanged.
Verification
mix test: 2190 passed, 0 failures (includes the official Lua 5.3 suite tests)
[2, false, "boom"] (was [1, false, "boom"])
main (benchmarks/scripts/pcall_state_bench.exs): fib(26) −0.2%, table-write 200k −6.6%, concat 5k +1.2%, pcall loop 50k +0.4%, all within noise
Notes
pcall's rescue list now explicitly includes ArgumentError (previously trapped by the catch-all clause; message behavior unchanged).
error({code = 1}) stringifies the error object instead of passing it through to pcall's second return; now tracked in "pcall returns stringified error objects instead of passing the error value through" (#334).