Motivation
The VM has no internal bound on CPU work. A script that never terminates
(while true do end) or spins in a tight numeric loop runs until
something outside the VM stops it. Today that "something" is only the
host: the playground wraps execution in a wall-clock timeout, and
Website.LuaSandbox.run/1 has a safety timeout. But a library consumer
who calls Lua.eval!/2 directly — without wrapping it in a Task +
timeout — has no protection against a runaway script.
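For reference, the host-side protection described above can be sketched roughly as follows. This is a minimal illustration, not the playground's or Website.LuaSandbox's actual code: the module name HostTimeout and its run/2 function are hypothetical, and the function passed in stands for whatever closure calls Lua.eval!/2.

```elixir
defmodule HostTimeout do
  # Hypothetical helper (not part of the library): runs a zero-arity
  # function in a separate process under a wall-clock limit.
  def run(fun, timeout_ms) when is_function(fun, 0) and is_integer(timeout_ms) do
    task = Task.async(fun)

    # Wait up to timeout_ms; if no reply arrived, kill the task outright.
    case Task.yield(task, timeout_ms) || Task.shutdown(task, :brutal_kill) do
      {:ok, result} -> {:ok, result}
      # Only seen when the caller traps exits and the task crashed.
      {:exit, reason} -> {:error, {:exit, reason}}
      nil -> {:error, :timeout}
    end
  end
end
```

A wrapper like this bounds wall-clock time only. The result depends on machine load, which is why an in-VM, deterministic bound is still worth having.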
The allocation-bomb surface is already covered deterministically inside
the VM (see #305: Lua.VM.Limits + the .. concat guard, and the
existing :max_call_depth). Pure CPU exhaustion is the remaining gap.
Proposal
Add a :max_steps option to Lua.new/1, mirroring the existing
:max_call_depth:
- Default :infinity (no limit), so existing behavior is unchanged.
- A positive integer bounds the number of VM instructions executed; on
exhaustion, raise a catchable runtime error (e.g.
"instruction budget exceeded") so pcall can recover.
Optionally, add a companion :max_alloc_bytes that tallies bytes produced at
the allocating opcodes (concat, table grow), giving a deterministic memory
bound independent of the BEAM's GC-timed max_heap_size. This could land in a
follow-up.
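The value rules proposed above for :max_steps (:infinity by default, otherwise a positive integer) could be checked along these lines. This is a sketch only: the module MaxStepsOption and its error shape are hypothetical, and how :max_call_depth is actually validated in Lua.new/1 is not shown in this issue, so the real code may look different.

```elixir
defmodule MaxStepsOption do
  # Hypothetical validator; the real option handling in Lua.new/1 may differ.
  # Accepts :infinity (the default) or a positive integer.
  def validate(opts) when is_list(opts) do
    case Keyword.get(opts, :max_steps, :infinity) do
      :infinity -> {:ok, :infinity}
      n when is_integer(n) and n > 0 -> {:ok, n}
      other -> {:error, {:invalid_max_steps, other}}
    end
  end
end
```

Rejecting 0 and negative values up front keeps the executor from having to handle a budget that is already exhausted before the first instruction.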
Design notes / risks
- This touches the hot do_execute/8 dispatch loop, so it must be
benchmarked. The benchee harness under benchmarks/ is the tool; gate
the change on no meaningful regression when :max_steps is :infinity
(the default path must stay free).
- To minimize per-instruction overhead, consider incrementing the counter
only at loop back-edges and call boundaries rather than on every opcode
— that still bounds total work without taxing straight-line code. Thread
the counter as a parameter rather than rebuilding %State{} per step
(the executor deliberately keeps line out of State for this reason).
- Both the interpreter and the compiled dispatcher (dispatcher.ex) paths
need the budget.
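The back-edge idea in the notes above can be illustrated with a toy dispatch loop. Everything here is an assumption made for illustration: the module StepBudgetSketch, its four-opcode instruction set, and the tuple return values are hypothetical and unrelated to the real do_execute/8. The sketch returns an error tuple where the proposal calls for a catchable runtime error, and it omits call boundaries because the toy VM has no calls.

```elixir
defmodule StepBudgetSketch do
  # Toy instruction set (hypothetical, not the real VM's opcodes):
  #   {:add, n}              add n to the accumulator
  #   {:jump_if_lt, k, pc}   jump to pc while the accumulator is below k
  #   {:jump, pc}            unconditional jump
  #   :halt                  stop and return the accumulator

  def run(program, max_steps \\ :infinity) when is_tuple(program) do
    execute(program, 0, 0, max_steps)
  end

  # The budget is threaded as a plain argument, not stored in a struct.
  defp execute(program, pc, acc, budget) do
    case elem(program, pc) do
      {:add, n} -> execute(program, pc + 1, acc + n, budget)
      {:jump_if_lt, k, target} when acc < k -> branch(program, pc, target, acc, budget)
      {:jump_if_lt, _k, _target} -> execute(program, pc + 1, acc, budget)
      {:jump, target} -> branch(program, pc, target, acc, budget)
      :halt -> {:ok, acc}
    end
  end

  # Forward jumps are free; only back-edges (target <= pc) are charged.
  defp branch(program, pc, target, acc, budget) when target > pc,
    do: execute(program, target, acc, budget)

  # Default path: no arithmetic on the budget at all.
  defp branch(program, _pc, target, acc, :infinity),
    do: execute(program, target, acc, :infinity)

  # Budget exhausted at a back-edge: stop instead of looping again.
  defp branch(_program, _pc, _target, _acc, 0),
    do: {:error, :instruction_budget_exceeded}

  defp branch(program, _pc, target, acc, budget),
    do: execute(program, target, acc, budget - 1)
end
```

In this sketch the budget counts back-edges taken rather than individual opcodes, so straight-line instructions such as {:add, n} never touch it. A loop like {{:jump, 0}} terminates with an error for any finite budget.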
Acceptance
- :max_steps configurable via Lua.new/1, validated like :max_call_depth
- […] pcall)
- […] :infinity path
- […] guides/sandboxing.md to cover :max_steps
Context
Deferred out of #305 ("Harden the VM against allocation-bomb DoS") as a
separate, benchmarked change. The playground is unaffected — infinite
loops there are already stopped by the wall-clock timeout.