feat(vectorization): broaden tuple operators and recover precise types 📐 #141
timfennis wants to merge 1 commit into
Status flipped from Draft to Implemented. Patched the sections that drifted from what landed:

- Worked-example commentary no longer claims the compiler emits unrolled element calls; that's RFC step 6, still deferred.
- Type-inference table now reflects the actual mechanism (LUB across all candidates' inferred returns) instead of the separate "all scalar" vs "any vec" branches; documents the uniform LUB-collapse on vec return types and the exact-subtype precedence step.
- Runtime broadening describes the new `find_overload`/`dispatch_vec_call` path instead of the deleted `try_vectorized_call`.
- Implementation checklist marked done per step with a ⏸ on step 6.
- Open questions resolved or annotated with what shipped (LUB collapse over per-position, "no function found" over a dedicated arity error, empty tuple rejects, etc.).
- References updated to point at the current code (and PR #141).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: b56b7f7efd
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
💡 Codex Review
Reviewed commit: c10338c61b
c10338c to a2998a1
💡 Codex Review
Reviewed commit: d9ecac3295
…ion model 📐
Vectorization — applying operators element-wise across tuples like
`(1, 2) + (3, 4) → (4, 6)` — previously only fired for binary numeric
operations and used a uniform "one scalar overload covers every
position" model. This PR broadens the scope and refactors the dispatch
to a single principle:
> A vec call is a tuple of independent scalar dispatches, one per
> element position.
User-visible changes
- Unary, n-ary, and non-numeric operators all vec now. `-(1, 2, 3)`,
`("a", "b") ++ ("c", "d")`, `([1], [2]) ++ ([3], [4])`, `(1, 2) + 5`
all work where they previously errored.
- Heterogeneous tuples dispatch per position:
`([1,2,3], "foo") ++ ([4,5,6], "bar")` evaluates element 0 via
`++(List, List)` and element 1 via `++(String, String)`, yielding
`([1,2,3,4,5,6], "foobar")`.
- Chained operator calls keep precise types — the analyser no longer
widens `Tuple<Int, Int> - Tuple<Int, Int>` to `Any`.
- Compound assignment (`+=` etc.) on tuple lvalues type-checks
correctly.
- Mixed-element tuples with no overload at some position
(`(1, "a") + (2, "b")`) fail at compile time with the failing
position called out, instead of crashing mid-iteration at runtime.
- Element-call failures wrap with `"while vectorising '<name>' at
index N"` to preserve outer-call context.
- Vectorization is gated on operator syntax — `id((1, 2, 3))` still
returns the tuple verbatim and never element-wise calls `id`.
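The per-position behaviour listed above can be modelled in a few lines of Rust. This is a minimal sketch, not code from the PR: `Value`, `int_list`, `concat`, and `vec_concat` are hypothetical names, and the error text only borrows the wording quoted above.

```rust
// Toy model of per-position vectorized dispatch for `++`.
// All names here are illustrative assumptions, not the andy-cpp types.
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Int(i64),
    Str(String),
    List(Vec<Value>),
}

/// Helper to build a list of integers.
fn int_list(items: &[i64]) -> Value {
    Value::List(items.iter().map(|&n| Value::Int(n)).collect())
}

/// Scalar `++` with two overloads: (List, List) and (String, String).
fn concat(lhs: &Value, rhs: &Value) -> Option<Value> {
    match (lhs, rhs) {
        (Value::List(a), Value::List(b)) => {
            Some(Value::List(a.iter().chain(b.iter()).cloned().collect()))
        }
        (Value::Str(a), Value::Str(b)) => Some(Value::Str(format!("{a}{b}"))),
        _ => None, // no scalar overload applies to this pair
    }
}

/// Vectorized `++`: one independent scalar dispatch per element position.
fn vec_concat(lhs: &[Value], rhs: &[Value]) -> Result<Vec<Value>, String> {
    if lhs.len() != rhs.len() {
        return Err("tuple lengths differ".to_string());
    }
    lhs.iter()
        .zip(rhs)
        .enumerate()
        .map(|(index, (a, b))| {
            // Wrap the failure with the outer call name and the index.
            concat(a, b).ok_or_else(|| format!("while vectorising '++' at index {index}"))
        })
        .collect()
}
```

Calling `vec_concat` on the model of `([1,2,3], "foo")` and `([4,5,6], "bar")` resolves position 0 through the list overload and position 1 through the string overload, which mirrors the heterogeneous example in the list above.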
Implementation
- Parser: `Expression::Call` carries an `operator_form: bool` flag set
by the desugaring sites for infix and prefix-unary operators.
- Binding shape: every `Binding::Resolved` / `Binding::Dynamic` entry
is a `Candidate { var: ResolvedVar, vectorized: bool }`.
- Analyser: per-position resolution walks the scope chain once per
tuple position. `Binding::Resolved(vec)` only when every position
pins to the same scalar; otherwise `Binding::Dynamic` with the
union of per-position candidates. Result types are computed
element-by-element so heterogeneous-vec calls keep precise tuple
types.
- Runtime: `Callable::Vec { candidates, axis_len }` carries the full
scalar list. `dispatch_vec_call` resolves a scalar per element
position via the same `matches_value_args` lookup the scalar path
uses, so the analyser and runtime can't drift on what counts as
"applicable".
- Dead code removed: `BinaryOperator::supports_vectorization`,
`StaticType::supports_vectorization{,_with}`, `synthetic_vec_sig`,
`candidate_is_compat`, `static_vec_axis_len`, `vec_candidate_applies`,
and the old `try_vectorized_call` fallback path.
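The analyser rule above (a resolved binding only when every position pins to the same scalar, otherwise a dynamic binding over the union of candidates) can be sketched as follows. The types are simplified stand-ins: the real `Candidate` carries a `ResolvedVar`, which is modelled here as a plain name, and `bind_vec_call` is a hypothetical helper. Returning `None` for an empty tuple is an assumption of this sketch.

```rust
// Simplified stand-ins for the analyser's binding types (assumptions).
#[derive(Debug, Clone, PartialEq)]
struct Candidate {
    var: &'static str, // stands in for `ResolvedVar`, e.g. "++(List, List)"
    vectorized: bool,
}

#[derive(Debug, PartialEq)]
enum Binding {
    Resolved(Candidate),
    Dynamic(Vec<Candidate>),
}

/// Combine the scalar overload chosen for each tuple position into one binding.
/// `per_position[i]` is the overload that position `i` resolved to.
fn bind_vec_call(per_position: &[&'static str]) -> Option<Binding> {
    let first = *per_position.first()?; // empty tuple: nothing to bind
    if per_position.iter().all(|&name| name == first) {
        // Every position agrees, so the call pins to a single scalar.
        return Some(Binding::Resolved(Candidate { var: first, vectorized: true }));
    }
    // Positions disagree: keep the de-duplicated union for runtime dispatch.
    let mut union: Vec<Candidate> = Vec::new();
    for &name in per_position {
        if !union.iter().any(|c| c.var == name) {
            union.push(Candidate { var: name, vectorized: true });
        }
    }
    Some(Binding::Dynamic(union))
}
```

Under this model a homogeneous call such as `(1, 2) + (3, 4)` yields `Binding::Resolved`, while the heterogeneous `++` example yields `Binding::Dynamic` with two candidates.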
Design and docs
- Full RFC at `docs/design/vectorization.md`.
- User-facing behaviour documented in
`manual/src/reference/types/tuple.md`.
Tests
- 12 new functional `.ndc` programs covering unary vec, non-numeric
vec, list-element vec, mixed-numeric promotion, chained-precision,
compound-assignment aliasing, regular-call no-vec, exact-match
precision, heterogeneous vec, per-position no-overload error.
- 1 new regression program for the `Binding::Dynamic` misinference
the original PR caught.
- 5 new unit tests for the pure helpers.
Benchmarks (release-with-debug, 10 runs)
- AoC 2025/08 part1: 489ms master → 578ms branch (+18%).
- AoC 2025/08 part2: 816ms master → 885ms branch (+8%).
- Non-vec benches: ~20% faster on `enumerate_for_loop.ndc` and
`hof_pipeline.ndc` from precision recovery on operator chains;
others within ±4% noise.
The vec-loop slowdown is the cost of per-element correctness checking
(the old probe-first approach miscoupled mixed-element tuples). The
RFC's deferred compile-time unrolling of `Resolved(vec)` is the
natural follow-on perf PR.
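To make the trade-off concrete, here is a small Rust model contrasting a probe-first check with a per-element check. It is a sketch built only from the description above: the old `try_vectorized_call` is not shown in this PR text, so `probe_first_accepts` is an assumed approximation of it, and `Value`, `add`, and `per_element_accepts` are hypothetical names.

```rust
// Toy model: why probing only the first pair miscouples mixed tuples.
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Int(i64),
    Str(String),
}

/// Scalar `+` with a single (Int, Int) overload.
fn add(a: &Value, b: &Value) -> Option<Value> {
    match (a, b) {
        (Value::Int(x), Value::Int(y)) => Some(Value::Int(x + y)),
        _ => None,
    }
}

/// Probe-first model: applicability is decided from the first pair only.
fn probe_first_accepts(lhs: &[Value], rhs: &[Value]) -> bool {
    match (lhs.first(), rhs.first()) {
        (Some(a), Some(b)) => add(a, b).is_some(),
        _ => false,
    }
}

/// Per-element model: every pair must match a scalar overload.
fn per_element_accepts(lhs: &[Value], rhs: &[Value]) -> bool {
    !lhs.is_empty()
        && lhs.len() == rhs.len()
        && lhs.iter().zip(rhs).all(|(a, b)| add(a, b).is_some())
}
```

For the model of `(1, "a") + (2, "b")`, the probe-first check accepts the call because position 0 is `Int + Int`, and the failure only appears once iteration reaches position 1. The per-element check rejects it up front, at the cost of one applicability check per position.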
Closes #145.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
d9ecac3 to ecc91f4
💡 Codex Review
Reviewed commit: ecc91f453a
```rust
let scalar = candidates
    .iter()
    .find(|f| f.matches_value_args(&element_args))
    .ok_or_else(|| {
```
**Preserve per-position overload priority in vectorized dispatch**

`dispatch_vec_call` picks the first runtime-matching function from a single merged candidate list, so overload order from earlier tuple positions can override the best match for later positions. With heterogeneous tuples and overlapping overloads (e.g. `+(Number, Number)` and `+(Int, Int)`), a call like `(1.5, 1) + (2.5, 2)` can resolve element 0 to the generic overload first, then incorrectly reuse that precedence for element 1 even though the scalar resolver would prefer the `Int` overload there. This breaks the scalar precedence contract for per-position vectorization and can produce wrong results for user-defined operator overloads.
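The concern can be reproduced in a toy model. The sketch below is not the PR's code: `Value`, `Overload`, `first_match`, and `resolve_per_position` are hypothetical, and the merged order `[NumberNumber, IntInt]` is an assumption standing in for "position 0's candidates were merged first".

```rust
// Toy model of the review comment: merged-list first-match vs per-position order.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Value {
    Int(i64),
    Float(f64),
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum Overload {
    NumberNumber, // generic: accepts any numeric pair
    IntInt,       // specific: accepts only (Int, Int)
}

impl Overload {
    fn matches(self, a: Value, b: Value) -> bool {
        match self {
            Overload::NumberNumber => true,
            Overload::IntInt => matches!((a, b), (Value::Int(_), Value::Int(_))),
        }
    }
}

/// First runtime match in one candidate list, in list order.
fn first_match(candidates: &[Overload], a: Value, b: Value) -> Option<Overload> {
    candidates.iter().copied().find(|o| o.matches(a, b))
}

/// Per-position resolution: each position consults its own priority order.
fn resolve_per_position(
    orders: &[Vec<Overload>],
    args: &[(Value, Value)],
) -> Vec<Option<Overload>> {
    orders
        .iter()
        .zip(args)
        .map(|(order, &(a, b))| first_match(order, a, b))
        .collect()
}
```

With the merged list, the `(Int, Int)` pair at position 1 matches `NumberNumber` first, while a per-position order that lists `IntInt` first for that position picks the more specific overload.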
…s 📐 (#146)

## Summary

An alternate take on #141 with the same four user-visible features but cleaner separation of concerns and matching-or-better performance on every benchmark.

**Vectorization now works for anything that has a scalar overload.** Unary, n-ary, non-numeric, scalar broadcast, same as #141:

```
-(1, 2, 3)                → (-1, -2, -3)
("a", "b") ++ ("c", "d")  → ("ac", "bd")
([1], [2]) ++ ([3], [4])  → ([1, 3], [2, 4])
(1, 2) + 5                → (6, 7)
```

Vectorization only fires on operator syntax. `id((1, 2, 3))` still returns the tuple, never three calls to `id`.

**Precise types survive chained operations.** Operator calls that the analyser can pin to one scalar overload keep their precise return type; chains of `Tuple<Int, Int>` ops no longer widen to `Any`.

**More problems caught at compile time.** Per-position vec resolution errors on `(1, "a") + (2, "b")` and `(1,1,(1,)) + (1,1,(1,))` at analysis time instead of crashing mid-iteration.

## Design

Three structural moves vs the existing PR:

* New AST variant **`Expression::OperatorCall`** for desugared operator syntax. Distinct from `Call`, so downstream layers match exhaustively and the parser is the only crate that knows which token names are operators. No `operator_form: bool` riding `Expression::Call` across every layer.
* **`Candidate::{Scalar, Vec}`** as a sum type rather than a struct with a bool. `Binding::Resolved(Candidate::Vec(scalar))` reads cleanly in pattern matches without `if c.vectorized`.
* **`ScopeTree::resolve_call`** is a single walk that returns both the binding and the inferred return type. The analyser no longer runs per-position resolution twice per Dynamic operator-form call.

## Runtime dispatch

* **`OpCode::CallVec(args)`** for analyser-pinned vec calls. The compiler emits the scalar function directly (no `OverloadSet` wrapper) and the VM broadcasts it across the tuple axis without overload probing. This is the missing "step 6" optimisation from #141's RFC, brought forward into the same change.
* **`Object::OverloadSet { scalars, vec_candidates }`** keeps the hot scalar walk at master's footprint; a unified `Vec<Candidate>` was the source of the numerics-heavy regressions.
* **`Vm::resolve_callee`** still returns `Option<Function>` (same shape as master), so the dispatch loop's `OpCode::Call` arm stays compact.

Full design write-up at [`docs/design/vectorization.md`](docs/design/vectorization.md).

## Behaviour changes

* Mixed-element tuples error at compile time instead of crashing mid-iteration. Existing test `003_vector_error2.ndc` updated to match the new analyser-side error message.
* `BinaryOperator::supports_vectorization` and the `StaticType::supports_vectorization{,_with}` helpers deleted; vec decisions live entirely in the analyser now.

## Benchmarks

Hyperfine, release-with-debug, 20+ runs per command:

| Script                  | Master    | This branch | Δ        |
|-------------------------|-----------|-------------|----------|
| `vec_hot_loop` (new)    | 58.1 ms   | 42.3 ms     | **−27%** |
| `fibonacci`             | 69.1 ms   | 63.7 ms     | −8%      |
| `hof_pipeline`          | 35.7 ms   | 34.6 ms     | −3%      |
| `enumerate_for_loop`    | 107.4 ms  | 107.4 ms    | 0%       |
| `sieve`                 | 107.7 ms  | 107.3 ms    | 0%       |
| `matrix_mul`            | 57.3 ms   | 56.1 ms     | −2%      |
| `ackermann`             | 127.6 ms  | 124.6 ms    | −2%      |
| `quicksort`             | 73.6 ms   | 76.5 ms     | +4%      |

The vec win comes from `CallVec` skipping the per-element overload probe. No bench regresses outside noise, versus #141's reported +27% on AoC 2025/08 vec-heavy workloads.

## Test plan

- [x] `cargo test --workspace`: 298 functional + 18 compiler + 64 unit tests, all green
- [x] `cargo clippy --workspace --lib --tests`: zero new warnings from this change
- [x] `cargo fmt --check`: clean
- [x] REPL spot-checks of the four feature areas (unary vec, string `++`, vec op=, mixed-element error)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
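As an illustration of the `CallVec` idea described above (one already-resolved scalar function broadcast across the tuple axis, with no per-element overload probing), here is a minimal Rust sketch. `Arg` and `call_vec` are hypothetical names and the model is restricted to integer elements; the actual opcode operates on the VM's own value and function types.

```rust
// Toy model of broadcasting a pinned scalar function across the tuple axis.
#[derive(Debug, Clone, PartialEq)]
enum Arg {
    Scalar(i64),
    Tuple(Vec<i64>),
}

/// Apply `scalar_fn` element-wise. A scalar operand is repeated for every
/// position, which models `(1, 2) + 5`.
fn call_vec(scalar_fn: fn(i64, i64) -> i64, lhs: &Arg, rhs: &Arg) -> Option<Vec<i64>> {
    // The axis length comes from whichever operand is a tuple.
    let axis_len = match (lhs, rhs) {
        (Arg::Tuple(a), Arg::Tuple(b)) if a.len() == b.len() => a.len(),
        (Arg::Tuple(a), Arg::Scalar(_)) | (Arg::Scalar(_), Arg::Tuple(a)) => a.len(),
        _ => return None, // mismatched lengths, or no tuple operand at all
    };
    // Fetch the operand value for position `i`, repeating scalars.
    let at = |arg: &Arg, i: usize| match arg {
        Arg::Scalar(v) => *v,
        Arg::Tuple(items) => items[i],
    };
    Some((0..axis_len).map(|i| scalar_fn(at(lhs, i), at(rhs, i))).collect())
}
```

Because the function is chosen once before the loop, the per-element work is a single call. That is the property the benchmark note attributes to `CallVec`.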
andy-cpp is a small interpreted language with a typed analyser and a bytecode VM. Operators like `+`, `-`, `++` are function calls under the hood, and vectorization is what we call applying them element-wise across tuples: `(1, 2) + (3, 4)` evaluates to `(4, 6)`.

## What this PR changes
### Vectorization now works for anything that has a scalar overload

Previously, vectorization only fired for binary numeric operations. Unary operators, string concat, and other non-numeric overloads silently errored. Now `-(1, 2, 3)`, `("a", "b") ++ ("c", "d")`, `([1], [2]) ++ ([3], [4])`, and `(1, 2) + 5` all work.

Vectorization only fires on operator syntax; regular function calls never broadcast. So `id((1, 2, 3))` still returns `(1, 2, 3)`, not three calls to `id`.

### Precise types survive chained operations
The analyser can now see through tuple-broadcast operators and keep precise types. Chains of tuple operations used to fail because the type checker had to widen each operation's result to `Any` for soundness.

Compound assignment (`+=` etc.) also keeps the precise tuple type.

### More problems caught at compile time

Calls that return a misleading inferred type used to slip through to runtime. They now produce a clearer compile-time error.
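A rough model of how per-position static resolution can both keep a precise tuple type and name the failing position is sketched below. Everything in it is assumed for illustration: the `Ty` enum, the `plus_result` signature table (including `Int + Float → Float` promotion), and the error wording are not taken from the analyser.

```rust
// Toy static checker for a vectorized `+` over tuple element types.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Ty {
    Int,
    Float,
    Str,
}

/// Hypothetical signature table for scalar `+`: the result type when an
/// overload applies, `None` otherwise. Strings have no `+` in this model.
fn plus_result(lhs: Ty, rhs: Ty) -> Option<Ty> {
    match (lhs, rhs) {
        (Ty::Int, Ty::Int) => Some(Ty::Int),
        (Ty::Int, Ty::Float) | (Ty::Float, Ty::Int) | (Ty::Float, Ty::Float) => Some(Ty::Float),
        _ => None,
    }
}

/// Resolve each position independently; the result is the element-wise
/// tuple type, or an error that calls out the failing position.
fn infer_vec_plus(lhs: &[Ty], rhs: &[Ty]) -> Result<Vec<Ty>, String> {
    if lhs.len() != rhs.len() {
        return Err(format!("tuple arity mismatch: {} vs {}", lhs.len(), rhs.len()));
    }
    lhs.iter()
        .zip(rhs)
        .enumerate()
        .map(|(pos, (&l, &r))| {
            plus_result(l, r)
                .ok_or_else(|| format!("no overload of '+' for ({l:?}, {r:?}) at position {pos}"))
        })
        .collect()
}
```

In this model `(Int, Int) + (Int, Int)` infers `(Int, Int)` rather than a widened type, and the element types of `(1, "a") + (2, "b")` fail at position 1 before anything runs.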
### Friendlier element-call errors

When a vec call fails on a specific element, the error includes the outer call name and the index, in the form `while vectorising '<name>' at index N`.
## Behaviour changes that might surprise users

For programs that previously passed through to runtime on type-mismatched calls, the analyser now widens the call's result type to `Any` at compile time (instead of synthesising a misleading concrete type). The call still runs through to runtime; it's the type information of the returned value that becomes less specific, which can surface as a clearer `mismatched types` error at the next use site.

## Benchmarks
Hyperfine, release-with-debug profile, current branch vs. master (10 runs after 3-run warmup):

Scripts benchmarked: `enumerate_for_loop.ndc`, `hof_pipeline.ndc`, `enumerate_to_list.ndc`, `fibonacci.ndc`, `matrix_mul.ndc`, `quicksort.ndc`, `sieve.ndc`.

**Where the wins come from.** PR #140 widened many call-result types to `Any` for soundness, which forced downstream operations onto the slower dynamic-dispatch path. This PR's per-binding vec tracking lets the analyser keep precise types through chains of scalar operations, so non-tuple workloads dispatch directly to the resolved scalar overload.

**Where the losses come from.** The runtime vec dispatch now checks every element pair against the underlying scalar's parameter types, where the old `try_vectorized_call` only probed the first pair. The check is what catches mixed-element tuples like `(1, "a") + (2, "b")` cleanly instead of crashing mid-iteration. The cost is real on vec-dispatch-heavy hot loops (~25% on the worst case). The follow-on optimisation that the RFC describes (emitting unrolled element calls when the analyser has statically resolved the vec candidate) would recover most of this overhead but is deferred to its own PR.

## How to verify
Quick REPL spot-checks:
## Design notes

The full design write-up lives at `docs/design/vectorization.md` and the language manual page at `manual/src/reference/types/tuple.md` covers the user-facing behaviour.

🤖 Generated with Claude Code