perf(vm): split-storage tables (Erlang :array + map)#328
Conversation
Complete the array/hash split-storage rework started in the prior WIP commit. Route dense positive-integer keys (1..n) to an Erlang :array for O(1) functional read/write; keep all other keys in the hash map with the existing order/dead iteration bookkeeping. Salvaged from WIP and finished: - table.sort plain path: read via Table.get/2 and write back via Table.put_many/2 so sorted values land in the array (was reading/writing the hash, making sort a no-op for array-resident tables). - get/2 and has_key?/2: normalize integer-valued float keys (t[1.0]) so they resolve to the same array slot as t[1]. - delete on an array key now leaves a nil hole and keeps arr_n as the high-water bound instead of shrinking the border, so next_entry/2 can still distinguish a former array key (resume in-array) from a key that was never present (raise 'invalid key to next') after a mid-iteration clear, per Lua 5.3 §6.1. - length/1 derives the sequence border by scanning the array to the first hole, then probing the hash for sparse integer keys beyond it. Full mix test (incl. Lua 5.3 official suite) passes.
Automated review (workflow
|
| Workload | Before | After | Delta | vs luerl |
|---|---|---|---|---|
| Table Build | 214.15 µs | 137.30 µs | −36% | was 1.38× slower → now faster |
| Table Sort | 350.66 µs | 251.55 µs | −28% | still 1.40× slower (comparator-bound, not storage) |
| Table Iterate/Sum | 304.06 µs | 194.09 µs | −36% | now faster |
| Table Map+Reduce | 620.45 µs | 388.18 µs | −37% | now faster |
Collateral: fibonacci +0.8%, closures −0.8% (noise).
…ytes The 256 MiB string ceiling guarded string.rep, .., and load readers at a fixed compile-time bound. Embedders running the VM inside a process capped with :max_heap_size could not align the two limits: the VM would legally materialize strings far past a smaller heap cap, leaving the kill to GC-observation timing instead of a deterministic refusal. Thread the ceiling through State (default unchanged) and expose it as Lua.new(max_string_bytes: n), mirroring :max_call_depth. The hot-path concat guard reads a struct field instead of a module attribute; byte_size/1 remains O(1), so the check stays cheap.
…orker The string-bomb playground example only passed by GC-timing luck: the VM's 256 MiB string ceiling sits far above the sandbox's 64 MB heap cap, so the doubling demo legally materialized ~384 MB of live binaries and survived only when no GC observed the peak. Any allocation-pattern shift elsewhere in the VM re-rolled that dice — surfacing as a flaky website test. Set the VM's :max_string_bytes to 16 MB, well under the heap cap, so the refusal is deterministic and the example's before-a-byte-is- allocated claim is actually true. Also unlink the eval worker after it delivers its result: its exit-time GC can trip the heap kill late, and the :killed signal propagated to the caller after trap_exit was restored, crashing it (reproducible by calling run/1 twice in a row with the string-bomb source).
Website CI failure: diagnosis and fixThe
Fix (two commits)
Verification
Scope note for the reviewer: commit 1 adds a public API option; it's the root-cause fix rather than a test-tolerance workaround, but it does extend this PR beyond table storage. |
Closes #314.
Summary
Splits
%Lua.VM.Table{}storage: dense positive-integer keys (1..n)route to an Erlang
:array(O(1) functional read/write, dense-integeriteration ordering for free); all other keys (strings, non-positive /
sparse integers, float/bool/table keys) stay in the hash map with the
existing
order/order_tail/deaditeration bookkeeping.String-keyed reads (globals, fields, metatable lookups) are unchanged —
they still hit the hash map directly, so the executor's
%{^name => v}fast paths and
__index/__newindexdispatch pay nothing.Why now
The issue gated this on "the table-build / sort residual gap persists
after the per-element write-back fixes." It does, and split storage
closes it — moving us from behind luerl to ahead of it on three of four
table-heavy workloads.
Benchmarks
Apple M4,
LUA_BENCH_MODE=full,lua (chunk)path (isolates the VM),n=1000. luerl is the machine-drift control and stays flat (<3%) across
the before/after runs, so the deltas are real, not machine noise.
Standing vs luerl after the change: Build, Iterate, and Map+Reduce now
beat luerl (e.g. Build 137 us vs luerl 158 us, previously 1.38x
slower). Sort improves a lot (350 -> 252 us) but is still 1.40x slower
than luerl — the remaining bottleneck is the O(n^2) state-threading
comparator path in
table.sort, not storage. That is a separate lever.Collateral spot-check (non-table workloads, must not regress):
Correctness
mix test: 2153 passed, 0 failures (includes the Lua 5.3 officialsuite —
nextvar.lua,next/pairs/ipairs, sort, pack/unpack).deleteon an array key leaves anilhole and keepsarr_nas thehigh-water bound rather than shrinking the border. This preserves the
§6.1 dead-key contract: after a mid-iteration
t[k] = nil,next_entry/2can still tell a former array key (resume in-array) froma key never present (raise "invalid key to 'next'").
length/1derives the sequence border by scanning the array to thefirst hole, then probing the hash for sparse integers beyond it.
Notes / follow-ups
clearing a dense integer key to nil ... keeps its array slot) was updated: a cleared-then-revived integer key now keeps itsarray index (iteration
1,2,3,4,5) instead of moving to the end ofinsertion order (
1,2,4,5,3). The new order matches reference Lua 5.3,where these keys all live in the array part. Two
value_testassertionsthat reached into
table.data[1]were repointed atTable.get/2.length/1is currently O(border) rather than O(1) because holes canappear after a delete; a cached border would restore O(1) for the
hole-free common case. Left as a follow-up — it did not show up as a
regression in any benchmark.
benchmarks/array_vs_map_probe.exsis kept as the data-structureprobe that motivated the decision.
luaportwas dropped from thebenchmark deps (it fails to compile without C Lua headers and the
harness already skips it gracefully).
Opened as a draft for review.