perf(vm): pass callback args by slice to skip per-call Vec allocs 🩻#154
Merged
In `dispatch_vec_call` and `dispatch_vec_call_dynamic`, every broadcast element used `std::mem::replace(&mut elem_args, Vec::with_capacity(args))` to hand a fresh `Vec<Value>` to `call_callback`: N allocations per outer call. Likewise every stdlib HOF callsite did `comp.call(vec![…])`, one heap allocation per element of `map`/`filter`/`sort_by`/`reduce`/etc.

`call_callback` and `VmCallable::call` now take `&[Value]`. The native path was already passing `&args`; the closure path becomes `extend(args.iter().cloned())`. The vec dispatch loops reuse a single `elem_args` buffer via `clear()`. Stdlib HOFs build stack arrays. Three `vec![x.clone()]` sites that clippy flagged switch to `std::slice::from_ref(&x)`, eliminating a real Rc bump+drop per element on object-heavy iterables.

`vec_hot_loop` and `hof_pipeline` benches show no meaningful movement (the allocator caches these small Vecs well), but the dispatch loops and HOF callsites read cleaner and the slice-from-ref change has measurable upside for non-trivial element types.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
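As a rough illustration of the buffer-reuse pattern described in the commit message, the sketch below uses a toy `Value` enum and a toy `call_callback`. Both are hypothetical stand-ins, not the VM's real types or signatures, and `broadcast_add` is an invented name. It only shows the shape of the change: the callee borrows `&[Value]`, so the caller can keep one `elem_args` buffer and `clear()` it between elements instead of handing over a fresh `Vec` each time.

```rust
// Hypothetical stand-in for the VM's value type (assumption, not the real enum).
#[derive(Clone, Debug, PartialEq)]
enum Value {
    Int(i64),
    Float(f64),
}

// Hypothetical callback: borrows its arguments as a slice, so the caller
// keeps ownership of whatever buffer backs them.
fn call_callback(args: &[Value]) -> Value {
    match args {
        [Value::Int(a), Value::Int(b)] => Value::Int(a + b),
        [Value::Float(a), Value::Float(b)] => Value::Float(a + b),
        _ => Value::Int(0), // toy fallback for mismatched arguments
    }
}

// Element-wise dispatch over two equal-length sequences.
fn broadcast_add(lhs: &[Value], rhs: &[Value]) -> Vec<Value> {
    let mut out = Vec::with_capacity(lhs.len());
    // One argument buffer for the whole loop; `clear()` drops the elements
    // but keeps the allocated capacity, so no per-element allocation occurs.
    let mut elem_args: Vec<Value> = Vec::with_capacity(2);
    for (a, b) in lhs.iter().zip(rhs) {
        elem_args.clear();
        elem_args.push(a.clone());
        elem_args.push(b.clone());
        out.push(call_callback(&elem_args));
    }
    out
}

fn main() {
    let ints = broadcast_add(
        &[Value::Int(1), Value::Int(2), Value::Int(3)],
        &[Value::Int(10), Value::Int(20), Value::Int(30)],
    );
    let floats = broadcast_add(&[Value::Float(0.5)], &[Value::Float(1.5)]);
    println!("{:?} {:?}", ints, floats);
}
```

With an owning signature (`Vec<Value>` by value) the loop would have to give up its buffer on every call, which is what forced the earlier `std::mem::replace` pattern.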
advent-of-brian comparison (23 programs)

Aggregate, 5 full-suite runs

0.77% delta on the aggregate, which is within noise. Most programs are short enough that process startup dominates.

Per-program (hyperfine, 8 runs each, the 4 longest)

Day-08 wins are real (5-7%, well outside σ) and reproducible. Both programs lean heavily on vec dispatch. Day-04 and day-09 are flat, which is what we'd expect for programs that don't lean on the affected paths.
Summary
- `dispatch_vec_call` / `dispatch_vec_call_dynamic` previously allocated a fresh `Vec<Value>` per broadcast element via `std::mem::replace(&mut elem_args, Vec::with_capacity(args))`. They now reuse a single `elem_args` buffer via `clear()`.
- Stdlib HOFs (`map`, `filter`, `sort_by`, `reduce`, `max_by_key`, …) previously did `comp.call(vec![…])`, one heap alloc per element. They now build a stack array and pass `&[…]`.
- Three sites (`filter`, `find`, `by_key`) where the slice was `&[x.clone()]` switched to `std::slice::from_ref(&x)` per clippy's suggestion, eliminating a real `Rc::clone` + drop per element on object-heavy iterables.
- `Vm::call_callback`, `Vm::call_function`, and `VmCallable::call` all take `&[Value]`. The native dispatch path was already using `&args`; the closure path now does `self.stack.extend(args.iter().cloned())`. For numeric tuples (the main vec-dispatch case), elements are `Int`/`Float` and clone is a bitwise copy, so there is no extra Rc work.

Benchmarks

- `vec_hot_loop` (200k Tuple+Tuple)
- `hof_pipeline` (filter/map/reduce, 100k items)

The bench needle didn't move meaningfully (the allocator caches these small Vecs well), but the dispatch loop and HOF callsites read cleaner and the `from_ref` change has measurable upside for non-trivial element types. The advent-of-brian comparison run will be added as a comment.

Test plan

- `cargo fmt`
- `cargo clippy --all-targets`: no new warnings introduced
- `cargo test`: all 718 tests pass
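To make the `std::slice::from_ref` point from the summary concrete, here is a minimal sketch. The single-variant `Value` enum and the helper `strong_count_seen_by_callee` are hypothetical and exist only for illustration. It compares the reference count a callee observes when the caller passes `&[x.clone()]` with the count it observes when the caller passes `std::slice::from_ref(&x)`.

```rust
use std::rc::Rc;

// Hypothetical stand-in for a reference-counted VM value (assumption).
#[derive(Clone)]
enum Value {
    Obj(Rc<String>),
}

// Reports how many strong references exist while the callee holds the slice.
fn strong_count_seen_by_callee(args: &[Value]) -> usize {
    match &args[0] {
        Value::Obj(rc) => Rc::strong_count(rc),
    }
}

fn main() {
    let x = Value::Obj(Rc::new(String::from("payload")));

    // Cloning into a temporary array bumps the refcount for the duration of
    // the call, then drops it again when the temporary goes away.
    let with_clone = strong_count_seen_by_callee(&[x.clone()]);

    // `from_ref` reinterprets `&x` as a one-element slice: no clone, no drop.
    let with_from_ref = strong_count_seen_by_callee(std::slice::from_ref(&x));

    println!("clone: {with_clone}, from_ref: {with_from_ref}");
}
```

For `Int`/`Float`-style values the clone is a plain copy and the two forms cost about the same; the difference only shows up for `Rc`-backed elements, which matches the "object-heavy iterables" caveat above.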