feat(lua): drop bounded_eval for handle_input (37% better tail latency)#42
Merged
handle_input/3 in both the asobi_lua_match and asobi_lua_world bridges no longer wraps the Luerl call in bounded_eval (spawn + monitor + heap_limit). At realistic input rates (200 players × 10 Hz = 2k inputs/sec) the per-call spawn overhead dominated actual Lua work and caused tail-latency stalls on the BEAM scheduler.

Bench delta (asobi-bench, 200 bots, 30s, 10 Hz):
- p99.9: ~2945ms -> ~1860ms (-37%)
- max: ~3750ms -> ~2065ms (-45%)
- inputs throughput: ~26k -> ~41k per 30s window (+56%)

Trade-off documented in ADR 0002 and pinned by tests:
- match_handle_input_no_wall_clock_timeout_test (match bridge)
- world_handle_input_no_wall_clock_timeout_test (world bridge)
- prop_lua_error_containment splits crash modes: tick still tests infinite_loop containment; input_crash_mode excludes it (by design, since it would wedge the property runner).

Trust model updated in guides/security-trust-model.md with a new "Per-callback isolation" table and an explicit "handle_input is not a sandbox boundary" section.

Also includes the project ADR convention (0000) and retroactive ADR 0001 documenting the asobi_lua_match_shared bridge that shipped in #41.
Summary
Why
The local 200-bot bench (asobi-bench/results/2026-05-05-post-fix1.md) revealed that encode-once (asobi#117) didn't move p99 because Luerl-eval CPU dominated encode CPU at 2k inputs/sec. The spawn-and-monitor-and-heap-cap-and-message-pass overhead was ~80 µs per call vs ~50-200 µs of real Lua work.
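As a rough back-of-the-envelope check on those figures, the sketch below (Python, used only for the arithmetic) multiplies the quoted per-call overhead by the quoted input rate. The function names are hypothetical, and it assumes the ~80 µs and ~50-200 µs estimates are representative of every call.

```python
# Figures quoted in this PR description; all of them are approximate.
PLAYERS = 200
INPUT_HZ = 10
SPAWN_OVERHEAD_US = 80        # per-call bounded_eval overhead
LUA_WORK_US = (50, 200)       # range of real Lua work per call


def inputs_per_second(players, hz):
    """Total handle_input calls per second across all players."""
    return players * hz


def overhead_ms_per_second(rate, overhead_us):
    """Milliseconds of pure overhead accumulated per wall-clock second."""
    return rate * overhead_us / 1000.0


def overhead_share(overhead_us, work_us):
    """Fraction of a single call spent on overhead rather than Lua work."""
    return overhead_us / (overhead_us + work_us)


rate = inputs_per_second(PLAYERS, INPUT_HZ)
print(rate)                                             # 2000
print(overhead_ms_per_second(rate, SPAWN_OVERHEAD_US))  # 160.0
for work_us in LUA_WORK_US:
    print(f"{overhead_share(SPAWN_OVERHEAD_US, work_us):.1%}")  # 61.5%, then 28.6%
```

Under these assumptions the wrapper alone costs about 160 ms of CPU time per second, and between roughly 29% and 62% of each call is overhead rather than Lua work.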
After this change (asobi-bench/results/2026-05-05-handle-input-no-spawn.md):
Trade-off
A `while true do end` inside handle_input now hangs the match server until its caller's gen_server timeout (5s default) trips. The match supervisor then restarts the match. Blast radius is one match.
Prior behaviour: bounded_eval killed the runaway in 100ms, the bridge logged and dropped the input, the match continued.
This is documented in ADR 0002 with the explicit framing "handle_input is not a sandbox boundary; tick/1 is the load-bearing isolation point."
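The two call shapes can be illustrated with a small Python analogue. This is not the bridge code: `bounded_call`, `direct_call`, and the handlers are hypothetical names, threads stand in for BEAM processes, and a short sleep stands in for a runaway loop. The 100 ms default mirrors the old kill budget mentioned above.

```python
import threading
import time


def bounded_call(handler, event, timeout_s=0.1):
    """Old shape: run the handler in a helper thread and stop waiting
    once the time budget is spent. Every call pays for the spawn.

    Unlike a BEAM process, a Python thread cannot be killed, so a slow
    worker is abandoned here rather than terminated.
    """
    outcome = []

    def worker():
        try:
            outcome.append(("ok", handler(event)))
        except Exception as exc:  # contain handler errors in the worker
            outcome.append(("error", repr(exc)))

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    thread.join(timeout_s)
    if not outcome:
        return ("error", "timeout")  # caller logs and drops the input
    return outcome[0]


def direct_call(handler, event):
    """New shape: call the handler inline. No spawn cost, but the caller
    is blocked for as long as the handler runs."""
    return ("ok", handler(event))


def well_behaved(event):
    return event + 1


def slow_handler(event):
    time.sleep(0.5)  # stand-in for a handler that overruns its budget
    return event


print(bounded_call(well_behaved, 1))  # ('ok', 2)
print(direct_call(well_behaved, 1))   # ('ok', 2)
```

In this analogue, `bounded_call(slow_handler, 1, timeout_s=0.05)` returns `("error", "timeout")` after about 50 ms, while `direct_call(slow_handler, 1)` blocks the caller for the full half second. That is the shape of the trade-off: lower per-call cost in exchange for no wall-clock bound on the input path.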
Test plan
Companion PR: asobi#118 (ADR convention + retroactive ADR 0001).