From c81a64491d1b99d2016605af8d75ad4edea3fec6 Mon Sep 17 00:00:00 2001
From: Dave Lucia
Date: Thu, 21 May 2026 18:09:47 -0700
Subject: [PATCH] chore(B7): defer plan; tuple cost crossover makes large tables a regression

PR #229 implemented the full plan and passed correctness, but multi-n
measurement on the bench harness from PR #230 revealed that wins at
n<=100 (-14% to -21%) are accompanied by losses at n=1000
(+30% to +40%), plus 3-5x memory regression at the same scale.

The crossover is structural: setelement/3 on a 1024-cell tuple copies
the whole tuple every write, vs Map.put's amortized log-time tree
allocation. PUC-Lua avoids this with in-place C mutation; we can't.

A future plan could revisit this with threshold-based promotion (stay
in the data map until array_len reaches some boundary, then promote).
That would preserve the small-table wins without the large-table hit.
Until that plan exists, table workloads have to look elsewhere for
durable wins (likely B5: compile prototypes to Erlang).

Plan file updated with full measurement data and the conditions under
which the work could be reopened. PR #229 closed without merging.
---
 .agents/plans/B7-table-array-hash-split.md | 73 +++++++++++++++++++++-
 1 file changed, 70 insertions(+), 3 deletions(-)

diff --git a/.agents/plans/B7-table-array-hash-split.md b/.agents/plans/B7-table-array-hash-split.md
index 3a9c92d..9c9e0c2 100644
--- a/.agents/plans/B7-table-array-hash-split.md
+++ b/.agents/plans/B7-table-array-hash-split.md
@@ -2,10 +2,10 @@
 id: B7
 title: Split table storage into array + hash parts
 issue: null
-pr: null
+pr: 229
 branch: perf/table-array-hash-split
 base: main
-status: ready
+status: deferred
 direction: B
 unlocks:
   - O(1) `t[#t + 1] = x` (supersedes A10b)
@@ -250,4 +250,71 @@ IO.puts("delta=#{after_mem - before_mem}B")
 
 ## Discoveries
 
-(populated during implementation)
+Implemented in PR #229; closed unmerged after multi-n measurement
+(enabled by the bench harness in PR #230) revealed a hard crossover.
+
+### What landed in #229
+
+- `Lua.VM.Table` gained `array :: tuple()`, `array_len :: non_neg_integer()`,
+  and `array_has_holes :: boolean()` fields.
+- Reads go through new `Table.get/2`, `Table.has?/2`, `Table.length/1`,
+  `Table.to_map/1`, `Table.keys/1` helpers that consult both parts.
+- Integer-keyed writes route through `put_integer/3` with exponential
+  capacity growth (doubling, floor 4) so sequential `t[i] = ...` is
+  amortized O(1).
+- Every site that previously read `table.data` for an integer key was
+  migrated to the new helpers (executor, stdlib, lua.ex, display).
+
+### Why we closed it
+
+Full-mode benchmarks on the merged bench harness (#230) showed:
+
+| workload @ n   | main      | B7        | delta    |
+|----------------|-----------|-----------|----------|
+| Build n=100    | 17.09 µs  | 14.03 µs  | -18%     |
+| Build n=1000   | 197.96 µs | 265.82 µs | **+34%** |
+| Sort n=100     | 34.91 µs  | 27.57 µs  | -21%     |
+| Sort n=1000    | 490.49 µs | 655.72 µs | **+34%** |
+| Iterate n=100  | 24.59 µs  | 21.11 µs  | -14%     |
+| Iterate n=1000 | 276.74 µs | 358.64 µs | **+30%** |
+| Map+Red n=100  | 49.79 µs  | 42.78 µs  | -14%     |
+| Map+Red n=1000 | 603.93 µs | 843.57 µs | **+40%** |
+
+Memory regressed 3-5× at n=1000 (e.g. Sort 2.08 MB → 12.40 MB).
+
+The crossover is fundamental: exponential-growth tuples win over
+`Map.put` at small n (where `setelement/3`'s constant-factor advantage
+matters), but lose at large n (where every `setelement/3` copies the
+whole tuple). PUC-Lua avoids this with in-place mutation in C; the
+BEAM cannot.
+
+The single n=500 number that originally motivated B7 was right at
+the crossover, which explains the inconsistent run-to-run results
+before #230 enabled multi-n measurement.
+
+### Conditions for reconsidering
+
+A future plan could revisit this with **threshold-based promotion**:
+keep contiguous integer keys in the hash map until `array_len` reaches
+some threshold (e.g. 256), then promote. That preserves the small-table
+wins (-14% to -21%) without taking the large-table hit. If we open such
+a plan, it should:
+
+- Decide the promotion threshold empirically (n where `setelement/3`
+  cost crosses `Map.put` cost on the target hardware).
+- Account for memory: even at threshold, the tuple still allocates
+  more than the equivalent map at the same size.
+- Keep the helpers (`get/2`, `has?/2`, `length/1`, `to_map/1`,
+  `keys/1`) we'd reuse; they're the call-site contract.
+
+Until that plan is written and shipped, the durable wins on table
+workloads have to come from elsewhere (e.g. attacking `setelement/3`
+register-write cost via Erlang codegen (B5), or reducing the
+number of writes per opcode).
+
+### Suite/test impact
+
+All 1692 tests + 29 lua53 suite tests passed on the B7 branch, so the
+correctness work (helpers, nil-as-hole semantics, dead-key iteration)
+is sound. None of that ships with this deferral, but the patterns
+proved out and would be reusable in a future threshold-based attempt.