feat(lua): drop bounded_eval for handle_input (37% better tail latency) by Taure · Pull Request #42 · widgrensit/asobi_lua

Taure · 2026-05-05T16:33:49Z

Summary

- handle_input/3 in both `asobi_lua_match` and `asobi_lua_world` no longer wraps the Luerl call in `bounded_eval` (spawn + monitor + heap_limit). It now calls `luerl:call_function` directly.
- tick/init/get_state/join/leave/vote_*/phases keep `bounded_eval`; those are the real sandbox boundaries.
- Trust-model guide updated with a per-callback isolation table and an explicit "handle_input is not a sandbox boundary" section.
- ADRs 0000 (process), 0001 (retroactive: asobi_lua_match_shared from #41), 0002 (this change).
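
For readers unfamiliar with the pattern named in the first item, here is a minimal sketch of a spawn + monitor + heap-limit wrapper. It is not the `bounded_eval` implementation in asobi_lua: the module name `bounded_eval_sketch`, the function `run/3`, and the parameters `TimeoutMs` and `MaxHeapWords` are hypothetical names chosen for illustration.

```erlang
-module(bounded_eval_sketch).
-export([run/3]).

%% Hypothetical sketch, NOT the asobi_lua implementation.
%% Runs Fun in a throwaway process with a wall-clock timeout and a heap cap.
%% TimeoutMs and MaxHeapWords are illustrative parameters.
-spec run(fun(() -> term()), non_neg_integer(), pos_integer()) ->
    {ok, term()} | {error, term()}.
run(Fun, TimeoutMs, MaxHeapWords) ->
    Parent = self(),
    Tag = make_ref(),
    Opts = [monitor,
            {max_heap_size, #{size => MaxHeapWords,
                              kill => true,
                              error_logger => false}}],
    %% With the monitor option, spawn_opt/2 returns {Pid, MonitorRef}.
    {Pid, MRef} = spawn_opt(fun() -> Parent ! {Tag, Fun()} end, Opts),
    receive
        {Tag, Result} ->
            %% Normal completion: drop the monitor and any queued 'DOWN'.
            erlang:demonitor(MRef, [flush]),
            {ok, Result};
        {'DOWN', MRef, process, Pid, Reason} ->
            %% The worker crashed or was killed for exceeding the heap cap.
            {error, Reason}
    after TimeoutMs ->
        %% Wall-clock budget exceeded: kill the worker unconditionally.
        exit(Pid, kill),
        erlang:demonitor(MRef, [flush]),
        %% Discard a result that may have raced with the timeout.
        receive {Tag, _} -> ok after 0 -> ok end,
        {error, timeout}
    end.
```

Every call through a wrapper like this pays for a process spawn, a monitor, and a message round trip before any Lua code runs. That per-call cost is what this PR removes from the handle_input path, while the other callbacks keep the wrapper.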

Why

The local 200-bot bench (asobi-bench/results/2026-05-05-post-fix1.md) showed that encode-once (asobi#117) didn't move p99 because Luerl-eval CPU dominated encode CPU at 2k inputs/sec. The per-call overhead of `bounded_eval` (spawn, monitor, heap cap, message pass) was ~80 µs, against ~50-200 µs of real Lua work. At 2k inputs/sec, that overhead alone comes to roughly 160 ms of CPU time per second.

After this change (asobi-bench/results/2026-05-05-handle-input-no-spawn.md):

| metric | baseline | post-fix-#1 | post-this | delta vs fix-#1 |
|---|---|---|---|---|
| p99 (ms) | 1433 | 1700 | 1530 | -10% |
| p99.9 (ms) | 2429 | 2945 | 1860 | -37% |
| max (ms) | 4155 | 3750 | 2065 | -45% |
| inputs/30s | ~26k | ~26k | ~41k | +56% |

Trade-off

A `while true do end` inside handle_input now hangs the match server until its caller's gen_server timeout (5s default) trips. The match supervisor then restarts the match. Blast radius is one match.

Prior behaviour: bounded_eval killed the runaway in 100ms, the bridge logged and dropped the input, the match continued.

This is documented in ADR 0002 with the explicit framing "handle_input is not a sandbox boundary; tick/1 is the load-bearing isolation point."

Test plan

Companion PR: asobi#118 (ADR convention + retroactive ADR 0001).

handle_input/3 in both asobi_lua_match and asobi_lua_world bridges no longer wraps the Luerl call in bounded_eval (spawn + monitor + heap_limit). At realistic input rates (200 players × 10 Hz = 2k inputs/sec) the per-call spawn overhead dominated actual Lua work and caused tail-latency stalls on the BEAM scheduler.

Bench delta (asobi-bench, 200 bots, 30s, 10 Hz):
- p99.9: ~2945ms -> ~1860ms (-37%)
- max: ~3750ms -> ~2065ms (-45%)
- inputs throughput: ~26k -> ~41k per 30s window (+56%)

Trade-off documented in ADR 0002 and pinned by tests:
- match_handle_input_no_wall_clock_timeout_test (match bridge)
- world_handle_input_no_wall_clock_timeout_test (world bridge)
- prop_lua_error_containment splits crash modes: tick still tests infinite_loop containment; input_crash_mode excludes it (it would wedge the property runner, by design).

Trust model updated in guides/security-trust-model.md with a new "Per-callback isolation" table and an explicit "handle_input is not a sandbox boundary" section.

Also includes the project ADR convention (0000) and retroactive ADR 0001 documenting the asobi_lua_match_shared bridge that shipped in #41.

Taure merged commit d46e203 into main May 5, 2026
15 checks passed

Taure deleted the feat/handle-input-no-spawn branch May 5, 2026 16:39

Taure merged 1 commit into main from feat/handle-input-no-spawn