perf(stdlib): parse string.format flags once, dispatch on integer specifier#317
Conversation
…cifier Parse format flags a single time at parse time into an integer bitmask instead of re-scanning the flags binary with String.contains?/2 on every specifier application, and store the conversion char as a raw code point so apply_format_spec/2 dispatches on BEAM integer patterns rather than one-byte binaries. Pure internal refactor: formatted output is unchanged. Plan: B12 Closes #309
davydog187
left a comment
There was a problem hiding this comment.
Verdict: clean-with-nits
Automated review (skeptical senior-reviewer pass). Read-only; no self-approve since PR author == reviewer, so this is a comment.
Verified this is a genuine perf-only refactor with no observable output change. Specifics checked:
Correctness (output is byte-for-byte equivalent)
- Old
parse_flags/2guarded on~c(-+ 0#)= exactly[?-, ?+, ?\\s, ?0, ?#]. The new per-char clauses cover precisely those five bits. Equivalent set. ✓ - Order never mattered: the old code only ever consulted the flags binary via
String.contains?(flags, "0")/(flags, "-"), so collapsing to an unordered integer mask is faithful. ✓ - All 14 dispatch branches converted consistently to integer code points (
?d…?q) — no"d"/?dmismatch left behind (lib/lua/vm/stdlib/string.ex:432-447). ✓ - Invalid-option error re-renders the char via
<<specifier>>(string.ex:447), soinvalid option '%z'-style messages are unchanged. ✓ apply_width_flags/3(string.ex:858-859) now readsflags &&& @flag_minus/@flag_zero; all three padding branches (left-justify, right-justify, sign-aware zero-pad at :864) preserved one-to-one. ✓- The only surviving
String.contains?(string.ex:633) is the float.check inexpand_float— unrelated to flags, correctly untouched. ✓
Tests — coverage pins the behavior: example tests for every specifier (d/i/u/f/e/E/g/G/x/X/o/c/s/q) plus property tests for %d/%x/%X/%o/%s/%c, width, left-justify (%-Nd), zero-pad (%0Nd), precision (%.Nf/%.Ns), and the sign-aware zero-pad branch (%06d over -42, string_test.exs:398). The UTF-8 %6s/%-6s cases pin byte-width padding. Output drift would turn these red. All CI green (test matrix x2, Dialyzer, Lua 5.3 suite).
Scope — clean: only .agents/plans/B12-…md + lib/lua/vm/stdlib/string.ex. No bleed into the width (#316) or float-io_lib (#319) sibling regions.
Findings
- [nit] Commits
26ec3bdandaa875d5usechore(B12):— plan id as the commit scope, which the repo conventions (.agents/skills/ship-a-plan + CLAUDE.md) explicitly forbid; scope should name the subsystem. The substantive perf commit (0f5fa90) is correct:perf(stdlib): …subject withPlan: B12in the body. Non-blocking, but worth squashing/renaming the bookkeeping commits on the next pass.
No blockers, no majors. No plan-id leakage into source/comments, no Co-Authored-By AI trailer. Ship it.
perf(stdlib): parse string.format flags once, dispatch on integer specifier
Plan: .agents/plans/B12-string-format-flags-bitmask.md
Closes #309
Goal
Speed up the
string.formatper-specifier hot path inlib/lua/vm/stdlib/string.exwith zero change to observable output:bitmask, so the apply path reads precomputed bits instead of
re-scanning the flags binary with
String.contains?/2on everyspecifier application.
?d/?f/…)so the
apply_format_spec/2casedispatches on BEAMinteger patterns rather than one-byte binaries.
Success criteria
string.formatproduces byte-for-byte identical output for allexisting tests —
mix test test/lua/vm/string_test.exsgreen(152 passed, 41 properties). The plan named
test/lua/vm/stdlib/string_test.exs, but the format coverage livesin
test/lua/vm/string_test.exs; see Discoveries.String.contains?(flags, ...)re-scan remains inapply_width_flags/3(now readsflags &&& @flag_minus/@flag_zero).casedispatches on?d/?f/… and the invalid-option errorre-renders the char (
<<specifier>>) so the message is unchanged.+, space,#) still carried in themask but not consulted downstream — output unchanged.
mix compile --warnings-as-errorsclean.mix testpasses (2114 passed, 19 skipped, 1 excluded).mix run benchmarks/string_format.exsruns to completion; numbers below.Benchmark (Apple M4, mode: quick) —
lua (chunk)The width-flagged workload (which exercises the padding path on every
conversion, where the
String.contains?/2re-scan lived) sees thelargest win, as expected.
Changes
Verification output tail
Out of scope
format_spec_*formatter bodies (siblings perf(stdlib): defer string.format width padding to iolist #310, perf(stdlib): use io_lib.format for string.format float conversion #311 edit those).+, space,#flags.string.formatargument validation, error messages, or arity.lib/lua/vm/stdlib/string.ex.Discoveries
test/lua/vm/stdlib/string_test.exs,which does not exist. The
string.formatcoverage lives intest/lua/vm/string_test.exs; that file plus the fullmix testrunwere used as the correctness gate.
benchmarks/string_format.exslistsluaportas an optional dep thatneeds C Lua headers (absent here); its dep compile aborts
mix runbefore the runtime skip-guard fires. The lua-vs-luerl numbers above
were captured with
luaporttemporarily excluded frommix.exsforthe bench run only;
mix.exsis unmodified in this PR.