perf(vm): split-storage tables (Erlang :array + map) by davydog187 · Pull Request #328 · tv-labs/lua

davydog187 · 2026-06-03T10:42:30Z

Closes #314.

Summary

Splits %Lua.VM.Table{} storage: dense positive-integer keys (1..n)
route to an Erlang :array (O(1) functional read/write, dense-integer
iteration ordering for free); all other keys (strings, non-positive /
sparse integers, float/bool/table keys) stay in the hash map with the
existing order/order_tail/dead iteration bookkeeping.

String-keyed reads (globals, fields, metatable lookups) are unchanged —
they still hit the hash map directly, so the executor's %{^name => v}
fast paths and __index/__newindex dispatch pay nothing.

Why now

The issue gated this on "the table-build / sort residual gap persists
after the per-element write-back fixes." It does, and split storage
closes it — moving us from behind luerl to ahead of it on three of four
table-heavy workloads.

Benchmarks

Apple M4, LUA_BENCH_MODE=full, lua (chunk) path (isolates the VM),
n=1000. luerl is the machine-drift control and stays flat (<3%) across
the before/after runs, so the deltas are real, not machine noise.

Workload	Before	After	Delta	luerl (before -> after)
Build	214.15 us	137.30 us	-36% (1.56x)	155.16 -> 157.55 (flat)
Sort	350.66 us	251.55 us	-28% (1.39x)	181.41 -> 179.77 (flat)
Iterate/Sum	304.06 us	194.09 us	-36% (1.57x)	246.38 -> 254.00 (flat)
Map + Reduce	620.45 us	388.18 us	-37% (1.59x)	447.55 -> 444.89 (flat)

Standing vs luerl after the change: Build, Iterate, and Map+Reduce now
beat luerl (e.g. Build 137 us vs luerl 158 us, previously 1.38x
slower). Sort improves a lot (350 -> 252 us) but is still 1.40x slower
than luerl — the remaining bottleneck is the O(n^2) state-threading
comparator path in table.sort, not storage. That is a separate lever.

Collateral spot-check (non-table workloads, must not regress):

Fibonacci: 801.40 -> 807.63 ms (+0.8%, noise)
Closures: 450.79 -> 447.16 us (-0.8%, noise)

Correctness

Full mix test: 2153 passed, 0 failures (includes the Lua 5.3 official
suite — nextvar.lua, next/pairs/ipairs, sort, pack/unpack).
delete on an array key leaves a nil hole and keeps arr_n as the
high-water bound rather than shrinking the border. This preserves the
§6.1 dead-key contract: after a mid-iteration t[k] = nil,
next_entry/2 can still tell a former array key (resume in-array) from
a key never present (raise "invalid key to 'next'").
length/1 derives the sequence border by scanning the array to the
first hole, then probing the hash for sparse integers beyond it.

Notes / follow-ups

One test (clearing a dense integer key to nil ... keeps its array slot) was updated: a cleared-then-revived integer key now keeps its
array index (iteration 1,2,3,4,5) instead of moving to the end of
insertion order (1,2,4,5,3). The new order matches reference Lua 5.3,
where these keys all live in the array part. Two value_test assertions
that reached into table.data[1] were repointed at Table.get/2.
length/1 is currently O(border) rather than O(1) because holes can
appear after a delete; a cached border would restore O(1) for the
hole-free common case. Left as a follow-up — it did not show up as a
regression in any benchmark.
benchmarks/array_vs_map_probe.exs is kept as the data-structure
probe that motivated the decision. luaport was dropped from the
benchmark deps (it fails to compile without C Lua headers and the
harness already skips it gracefully).

Opened as a draft for review.

…ay not compile)

Complete the array/hash split-storage rework started in the prior WIP commit. Route dense positive-integer keys (1..n) to an Erlang :array for O(1) functional read/write; keep all other keys in the hash map with the existing order/dead iteration bookkeeping. Salvaged from WIP and finished: - table.sort plain path: read via Table.get/2 and write back via Table.put_many/2 so sorted values land in the array (was reading/writing the hash, making sort a no-op for array-resident tables). - get/2 and has_key?/2: normalize integer-valued float keys (t[1.0]) so they resolve to the same array slot as t[1]. - delete on an array key now leaves a nil hole and keeps arr_n as the high-water bound instead of shrinking the border, so next_entry/2 can still distinguish a former array key (resume in-array) from a key that was never present (raise 'invalid key to next') after a mid-iteration clear, per Lua 5.3 §6.1. - length/1 derives the sequence border by scanning the array to the first hole, then probing the hash for sparse integer keys beyond it. Full mix test (incl. Lua 5.3 official suite) passes.

davydog187 · 2026-06-03T10:49:43Z

Automated review (workflow `respike-314`) — stable in round 1

This PR was produced by a benchmarked go/no-go spike and reviewed by an AI workflow. Posting the reviewer's own findings for the human record.

Verdict: approve (zero blocker/high/medium). The reviewer checked out the branch, compiled it, ran the full suite (2153 passed, incl. the Lua 5.3 official suite), and wrote ~35 targeted probe tests against the edge cases of a storage split. All held:

float/int key aliasing (t[1.0] == t[1]); key 0 and negatives stay in hash and don't affect #
huge sparse indices (t[1]=x; t[1000000]=y) do not allocate a 1M :array (beyond-border writes fall through to hash)
nil-deletion works in both stores; # border with holes is a valid Lua 5.3 border; ipairs stops at the first hole
pairs iterates array-then-hash, visits every key exactly once (incl. after absorb-promotion of out-of-order fills); §6.1 dead-key semantics hold (mid-iteration clears, invalid key to next, deleted-then-revived keys keep slots)
__index/__newindex fire on misses in either store; insert/remove/sort write-back preserves non-sequence keys
absorb promotion is O(n) on descending construction and doesn't resurrect deleted sparse keys
benchmark plausibility confirmed; fibonacci/closures flat as expected (no table code in their paths)

Findings (all low/nit — non-blocking cleanups)

low — mix.exs:85: the PR drops the {:luaport, …, only: :benchmark} dep, but benchmarks/oop.exs still references :luaport (degrades gracefully to "skipping C Lua benchmarks", but the oop bench silently loses its C-Lua column). Either restore the dep or remove the dead block + setup-script references.
low — lib/lua/vm/value.ex:166: Value.sequence_length/1 is now dead code — every runtime caller migrated to Table.length/1; only a stale comment at executor.ex:2391 still mentions it.
low — lib/lua/vm/state.ex:111: State.globals/1 returns only the hash portion post-split — a program doing _G[1] = x would have that entry invisible to globals/1. Either return Table.to_map(table) or tighten the @doc to string-named globals.
nit — lib/lua/vm/table.ex:74: replace_data/2 could be split_from_map(%__MODULE__{metatable: table.metatable}, data) instead of the long explicit-reset one-liner.

Benchmark summary (Apple M4, n=1000, luerl as drift control — stayed <3% flat)

Workload	Before	After	Delta	vs luerl
Table Build	214.15 µs	137.30 µs	−36%	was 1.38× slower → now faster
Table Sort	350.66 µs	251.55 µs	−28%	still 1.40× slower (comparator-bound, not storage)
Table Iterate/Sum	304.06 µs	194.09 µs	−36%	now faster
Table Map+Reduce	620.45 µs	388.18 µs	−37%	now faster

Collateral: fibonacci +0.8%, closures −0.8% (noise).

…ytes The 256 MiB string ceiling guarded string.rep, .., and load readers at a fixed compile-time bound. Embedders running the VM inside a process capped with :max_heap_size could not align the two limits: the VM would legally materialize strings far past a smaller heap cap, leaving the kill to GC-observation timing instead of a deterministic refusal. Thread the ceiling through State (default unchanged) and expose it as Lua.new(max_string_bytes: n), mirroring :max_call_depth. The hot-path concat guard reads a struct field instead of a module attribute; byte_size/1 remains O(1), so the check stays cheap.

…orker The string-bomb playground example only passed by GC-timing luck: the VM's 256 MiB string ceiling sits far above the sandbox's 64 MB heap cap, so the doubling demo legally materialized ~384 MB of live binaries and survived only when no GC observed the peak. Any allocation-pattern shift elsewhere in the VM re-rolled that dice — surfacing as a flaky website test. Set the VM's :max_string_bytes to 16 MB, well under the heap cap, so the refusal is deterministic and the example's before-a-byte-is- allocated claim is actually true. Also unlink the eval worker after it delivers its result: its exit-time GC can trip the heap kill late, and the :killed signal propagated to the caller after trap_exit was restored, crashing it (reproducible by calling run/1 twice in a row with the string-bomb source).

davydog187 · 2026-06-03T11:45:10Z

Website CI failure: diagnosis and fix

The Website tests job failed on playground/string-bomb ("expected to run cleanly, got: Execution exceeded the memory limit"). This PR did not break the example — it exposed a pre-existing fragility:

The VM's string ceiling (Limits.max_string_bytes, 256 MiB) sits far above the website sandbox's worker heap cap (64 MB). The example's doubling loop is therefore legally allowed to materialize 128 MB + 256 MB binaries before the .. guard refuses — ~384 MB of live binaries inside a 64 MB-capped process.
max_heap_size (with include_shared_binaries) only kills when a GC happens to observe the oversized live set. Whether one lands during the two fatal iterations depends on the exact allocation trajectory. A byte-faithful replica of the sandbox's eval gets killed on main too; running the real LuaSandbox.run/1 twice in a row on main crashes the caller on the second run. The example has been passing by GC-timing luck, and this PR's different Lua.new setup allocations merely re-rolled the dice.

Fix (two commits)

feat(vm): Lua.new(max_string_bytes: n) — the string ceiling (.., string.rep, load readers) becomes per-state, mirroring :max_call_depth from feat(vm): configurable max call depth #283. Default unchanged (256 MiB). The hot-path concat guard reads a struct field instead of a module attribute. New coverage in limits_test.exs (both interpreter and dispatcher concat paths, load readers, validation); sandboxing guide updated with a "pair the ceiling with your heap cap" warning.
fix(website): the sandbox sets max_string_bytes: 16 MB — well under its 64 MB cap — so the string-bomb refusal is deterministic and the example's "refused before a byte is allocated" claim is actually true. Also fixes a latent race found while diagnosing: the eval worker stays linked after delivering its result, and its exit-time GC can trip the heap kill after the caller stops trapping exits, crashing the caller. The worker is now unlinked (and any queued exit flushed) on result delivery.

Verification

Full suite: 2159 passed / 19 skipped (incl. new :max_string_bytes tests)
Lua 5.3 official suite: passed
Website tests: 52/52 (previously 51/52)
The two-consecutive-runs caller-crash repro: 5/5 clean

Scope note for the reviewer: commit 1 adds a public API option; it's the root-cause fix rather than a test-tolerance workaround, but it does extend this PR beyond table storage.

davydog187 added 2 commits June 3, 2026 03:06

WIP: split-storage table prototype (interrupted mid-implementation, m…

d45652c

…ay not compile)

davydog187 marked this pull request as ready for review June 3, 2026 11:08

davydog187 added 2 commits June 3, 2026 04:44

davydog187 merged commit 9755265 into main Jun 3, 2026
5 checks passed

davydog187 deleted the perf/split-storage-tables branch June 3, 2026 14:29

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

perf(vm): split-storage tables (Erlang :array + map)#328

perf(vm): split-storage tables (Erlang :array + map)#328
davydog187 merged 4 commits into
mainfrom
perf/split-storage-tables

davydog187 commented Jun 3, 2026

Uh oh!

davydog187 commented Jun 3, 2026

Uh oh!

davydog187 commented Jun 3, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

davydog187 commented Jun 3, 2026

Summary

Why now

Benchmarks

Correctness

Notes / follow-ups

Uh oh!

davydog187 commented Jun 3, 2026

Automated review (workflow respike-314) — stable in round 1

Findings (all low/nit — non-blocking cleanups)

Benchmark summary (Apple M4, n=1000, luerl as drift control — stayed <3% flat)

Uh oh!

davydog187 commented Jun 3, 2026

Website CI failure: diagnosis and fix

Fix (two commits)

Verification

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Automated review (workflow `respike-314`) — stable in round 1