feat(vectorization): broaden tuple operators and recover precise types 📐 by timfennis · Pull Request #141 · timfennis/andy-cpp

timfennis · 2026-05-23T06:56:42Z

andy-cpp is a small interpreted language with a typed analyser and a bytecode VM. Operators like +, -, ++ are function calls under the hood, and vectorization is what we call applying them element-wise across tuples: (1, 2) + (3, 4) evaluates to (4, 6).

What this PR changes

Vectorization now works for anything that has a scalar overload

Previously, vectorization only fired for binary numeric operations. Unary operators, string concat, and other non-numeric overloads errored instead of vectorizing. Now:

-(1, 2, 3)               → (-1, -2, -3)        # unary
("a", "b") ++ ("c", "d") → ("ac", "bd")        # strings
([1], [2]) ++ ([3], [4]) → ([1, 3], [2, 4])    # lists
(1, 2) + 5               → (6, 7)              # scalar broadcasts

Vectorization only fires on operator syntax — regular function calls never broadcast. So id((1, 2, 3)) still returns (1, 2, 3), not three calls to id.
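A toy Rust model of these semantics may help. It is a sketch under stated assumptions, not andy-cpp's VM: it assumes a single scalar overload `+(Int, Int)`, and `Value`, `tuple`, `add_scalar`, `vec_add` and `id` are hypothetical names. Equal-length tuples are combined position by position, a non-tuple operand is broadcast across the other side, and a regular function such as `id` simply receives the tuple unchanged.

```rust
// Toy model of operator vectorization. All names are hypothetical,
// not andy-cpp internals.
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Int(i64),
    Tuple(Vec<Value>),
}

/// Convenience constructor for a tuple of ints.
fn tuple(xs: &[i64]) -> Value {
    Value::Tuple(xs.iter().map(|&x| Value::Int(x)).collect())
}

/// The only scalar overload in this model: `+(Int, Int)`.
fn add_scalar(a: &Value, b: &Value) -> Result<Value, String> {
    match (a, b) {
        (Value::Int(x), Value::Int(y)) => Ok(Value::Int(x + y)),
        _ => Err(format!("no function '+' for {:?} and {:?}", a, b)),
    }
}

/// Operator-form `+`: element-wise over equal-length tuples; a non-tuple
/// operand is broadcast across the other side's elements.
fn vec_add(a: &Value, b: &Value) -> Result<Value, String> {
    let pairs: Vec<(Value, Value)> = match (a, b) {
        (Value::Tuple(xs), Value::Tuple(ys)) if xs.len() == ys.len() => {
            xs.iter().cloned().zip(ys.iter().cloned()).collect()
        }
        (Value::Tuple(_), Value::Tuple(_)) => return Err("tuple lengths differ".into()),
        (Value::Tuple(xs), s) => xs.iter().map(|x| (x.clone(), s.clone())).collect(),
        (s, Value::Tuple(ys)) => ys.iter().map(|y| (s.clone(), y.clone())).collect(),
        _ => return add_scalar(a, b), // plain scalar call, nothing to vectorize
    };
    let out = pairs
        .iter()
        .map(|(x, y)| add_scalar(x, y))
        .collect::<Result<Vec<_>, _>>()?;
    Ok(Value::Tuple(out))
}

/// A regular (non-operator) function: it never broadcasts.
fn id(v: &Value) -> Value {
    v.clone()
}
```

For example, `vec_add(&tuple(&[1, 2]), &Value::Int(5))` yields the equivalent of `(6, 7)`, while `id(&tuple(&[1, 2, 3]))` returns the tuple itself.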

Precise types survive chained operations

The analyser can now see through tuple-broadcast operators and keep precise types. The example below used to fail because the type checker had to widen each operation's result to Any for soundness:

let a: Tuple<Int, Int> = (1, 2);
let b: Tuple<Int, Int> = (3, 4);
let c: Tuple<Int, Int> = a - b;     # was: inferred as Any
let d: Tuple<Int, Int> = c * c;     # was: could not compile

Compound assignment (+= etc.) also keeps the precise tuple type:

let a: Tuple<Int, Int> = (1, 2);
a += (3, 4);                        # was: "found Int but expected Tuple<Int, Int>"
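One way to picture the precision recovery is a static-type sketch that types an operator-form call position by position with the scalar rule, instead of widening the whole result to `Any`. This is a hypothetical model, not the analyser's code: `Ty`, `scalar_arith`, `infer_op` and `check_compound_assign` are illustrative names, Int/Float promotion to Float is an assumption, and nested tuples are treated as having no overload (matching the compile-time error the follow-up PR reports for `(1,1,(1,)) + (1,1,(1,))`). Compound assignment then type-checks when `lhs op rhs` yields exactly the lhs type.

```rust
// Hypothetical static-type model; names are illustrative only.
#[derive(Debug, Clone, PartialEq)]
enum Ty {
    Int,
    Float,
    Tuple(Vec<Ty>),
}

/// Return type of a scalar arithmetic overload such as `-(Int, Int)`.
/// Assumption: mixing Int and Float promotes to Float.
fn scalar_arith(a: &Ty, b: &Ty) -> Option<Ty> {
    match (a, b) {
        (Ty::Int, Ty::Int) => Some(Ty::Int),
        (Ty::Int | Ty::Float, Ty::Int | Ty::Float) => Some(Ty::Float),
        _ => None,
    }
}

/// Equal-length tuples are typed element by element, so
/// `Tuple<Int, Int> - Tuple<Int, Int>` stays `Tuple<Int, Int>`.
/// `None` means some position has no overload (a compile-time error).
fn infer_op(a: &Ty, b: &Ty) -> Option<Ty> {
    match (a, b) {
        (Ty::Tuple(xs), Ty::Tuple(ys)) if xs.len() == ys.len() => xs
            .iter()
            .zip(ys)
            .map(|(x, y)| scalar_arith(x, y))
            .collect::<Option<Vec<Ty>>>()
            .map(Ty::Tuple),
        _ => scalar_arith(a, b),
    }
}

/// `lhs op= rhs` is accepted when `lhs op rhs` has exactly the lhs type.
fn check_compound_assign(lhs: &Ty, rhs: &Ty) -> Result<(), String> {
    match infer_op(lhs, rhs) {
        Some(t) if &t == lhs => Ok(()),
        Some(t) => Err(format!("found {:?} but expected {:?}", t, lhs)),
        None => Err("no matching overload".to_string()),
    }
}
```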

Friendlier element-call errors

When a vec call fails on a specific element, the error includes the outer call name and the index:

("3", 2) ++ ("a", "b")              # error: while vectorising '++' at index 1: …
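A minimal sketch of the wrapping, assuming each element call returns a `Result` with a string error. Only the message prefix comes from this PR; `V`, `concat`, `wrap_element_error` and `vectorise` are hypothetical names. The first failing position short-circuits the loop, and its error is prefixed with the operator name and index.

```rust
#[derive(Debug, Clone, PartialEq)]
enum V {
    Str(String),
    Int(i64),
}

/// Hypothetical scalar `++(String, String)`.
fn concat(a: &V, b: &V) -> Result<V, String> {
    match (a, b) {
        (V::Str(x), V::Str(y)) => Ok(V::Str(format!("{x}{y}"))),
        _ => Err(format!("no function '++' for {a:?} and {b:?}")),
    }
}

/// Prefix an element-level failure with the outer operator and the index.
fn wrap_element_error(op: &str, index: usize, inner: &str) -> String {
    format!("while vectorising '{op}' at index {index}: {inner}")
}

/// Apply `f` to each element pair; stop at and wrap the first failure.
fn vectorise<T, U>(
    op: &str,
    pairs: &[(T, T)],
    f: impl Fn(&T, &T) -> Result<U, String>,
) -> Result<Vec<U>, String> {
    pairs
        .iter()
        .enumerate()
        .map(|(i, (a, b))| f(a, b).map_err(|e| wrap_element_error(op, i, &e)))
        .collect()
}
```

With `("3", 2) ++ ("a", "b")`, index 0 succeeds (`"3a"`) and index 1 fails, so the error begins with `while vectorising '++' at index 1: `.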

Behaviour changes that might surprise users

For type-mismatched calls that previously passed through to runtime, the analyser now widens the call's result type to Any at compile time instead of synthesising a misleading concrete type. The call itself still runs at runtime; only the static type of its returned value becomes less specific, which can surface as a clearer "mismatched types" error at the next use site.

Benchmarks

Hyperfine, release-with-debug profile, current branch vs. master (10 runs after 3-run warmup):

| Script | Master | This branch | Δ | Notes |
|---|---|---|---|---|
| `enumerate_for_loop.ndc` | 136 ms | 110 ms | -20% | win from precision recovery |
| `hof_pipeline.ndc` | 42 ms | 34 ms | -20% | win from precision recovery |
| `enumerate_to_list.ndc` | 69 ms | 68 ms | -2% | noise |
| `fibonacci.ndc` | 74 ms | 77 ms | +4% | noise |
| `matrix_mul.ndc` | 58 ms | 56 ms | -4% | noise |
| `quicksort.ndc` | 72 ms | 73 ms | +1% | noise |
| `sieve.ndc` | 106 ms | 106 ms | 0% | noise |
| AoC 2025/11 part1 | 8 ms | 9 ms | +4% | noise |
| AoC 2025/08 part1 (vec-heavy) | 490 ms | 625 ms | +27% | vec dispatch overhead |
| AoC 2025/08 part2 (vec-heavy) | 807 ms | 919 ms | +13% | vec dispatch overhead |

Where the wins come from. PR #140 widened many call-result types to Any for soundness, which forced downstream operations onto the slower dynamic-dispatch path. This PR's per-binding vec tracking lets the analyser keep precise types through chains of scalar operations, so non-tuple workloads dispatch directly to the resolved scalar overload.

Where the losses come from. The runtime vec dispatch now checks every element pair against the underlying scalar's parameter types, whereas the old try_vectorized_call only probed the first pair. That check is what catches mixed-element tuples like (1, "a") + (2, "b") cleanly instead of crashing mid-iteration. The cost is real on vec-dispatch-heavy hot loops (up to +27% in the benchmarks above). The follow-on optimisation described in the RFC, emitting unrolled element calls when the analyser has statically resolved the vec candidate, would recover most of this overhead but is deferred to its own PR.
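The difference between the two strategies can be sketched in a few lines. This is a hypothetical model, not the actual `try_vectorized_call` or its replacement: `Kind`, `Val`, `Scalar`, `add_ints`, `probe_first` and `check_every_pair` are illustrative names, and a scalar "crash" is modelled as returning `None`. Probe-first validates only pair 0 and discovers a bad later pair mid-iteration, after earlier calls have already run; checking every pair up front rejects the call before any element is evaluated, at the price of an extra pass per call.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Kind {
    Int,
    Str,
}

#[derive(Debug, Clone, PartialEq)]
enum Val {
    Int(i64),
    Str(String),
}

impl Val {
    fn kind(&self) -> Kind {
        match self {
            Val::Int(_) => Kind::Int,
            Val::Str(_) => Kind::Str,
        }
    }
}

/// A resolved scalar overload plus its declared parameter kinds.
struct Scalar {
    params: [Kind; 2],
    f: fn(&Val, &Val) -> Option<Val>,
}

impl Scalar {
    fn accepts(&self, a: &Val, b: &Val) -> bool {
        self.params == [a.kind(), b.kind()]
    }
}

/// `+(Int, Int)`; returns `None` (a runtime failure) on anything else.
fn add_ints(a: &Val, b: &Val) -> Option<Val> {
    match (a, b) {
        (Val::Int(x), Val::Int(y)) => Some(Val::Int(x + y)),
        _ => None,
    }
}

/// Probe-first: validate pair 0 only, then call the scalar on every pair.
fn probe_first(s: &Scalar, pairs: &[(Val, Val)]) -> Result<Vec<Val>, String> {
    let (a, b) = pairs.first().ok_or("empty tuple")?;
    if !s.accepts(a, b) {
        return Err("no function found".to_string());
    }
    pairs
        .iter()
        .enumerate()
        .map(|(i, (a, b))| (s.f)(a, b).ok_or(format!("crashed mid-iteration at index {i}")))
        .collect()
}

/// Check every pair before making any call, then dispatch.
fn check_every_pair(s: &Scalar, pairs: &[(Val, Val)]) -> Result<Vec<Val>, String> {
    if let Some(i) = pairs.iter().position(|(a, b)| !s.accepts(a, b)) {
        return Err(format!("no applicable overload at index {i}"));
    }
    Ok(pairs.iter().map(|(a, b)| (s.f)(a, b).expect("checked above")).collect())
}
```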

How to verify

cargo test --workspace            # 296 functional tests, 11 new unit tests
cargo clippy --workspace --lib
cargo fmt --check

Quick REPL spot-checks:

echo 'print(-(1, 2, 3))'                          | cargo run --bin ndc
echo 'print(("a", "b") ++ ("c", "d"))'            | cargo run --bin ndc
echo 'let a = (1, 2); a += (3, 4); print(a)'      | cargo run --bin ndc
echo 'print((1, "a") + (2, "b"))'                 | cargo run --bin ndc  # error

Design notes

The full design write-up lives at docs/design/vectorization.md and the language manual page at manual/src/reference/types/tuple.md covers the user-facing behaviour.

🤖 Generated with Claude Code

Status flipped from Draft to Implemented. Patched the sections that drifted from what landed:

- Worked-example commentary no longer claims the compiler emits unrolled element calls; that's RFC step 6, still deferred.
- Type-inference table now reflects the actual mechanism (LUB across all candidates' inferred returns) instead of the separate "all scalar" vs "any vec" branches; documents the uniform LUB-collapse on vec return types and the exact-subtype precedence step.
- Runtime broadening describes the new `find_overload`/`dispatch_vec_call` path instead of the deleted `try_vectorized_call`.
- Implementation checklist marked done per step, with a ⏸ on step 6.
- Open questions resolved or annotated with what shipped (LUB collapse over per-position, "no function found" over a dedicated arity error, empty tuple rejects, etc.).
- References updated to point at the current code (and PR #141).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
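"LUB across all candidates' inferred returns" means the call's result type is the least upper bound of every candidate's return type. A minimal sketch over a hypothetical lattice (Int and Float join to Number, anything else joins to Any); `Ty`, `lub` and `call_result` are illustrative names, and andy-cpp's real type lattice may differ.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Ty {
    Int,
    Float,
    Number,
    Str,
    Any,
}

fn is_numeric(t: Ty) -> bool {
    matches!(t, Ty::Int | Ty::Float | Ty::Number)
}

/// Least upper bound of two types in this toy lattice.
fn lub(a: Ty, b: Ty) -> Ty {
    if a == b {
        a
    } else if is_numeric(a) && is_numeric(b) {
        Ty::Number
    } else {
        Ty::Any
    }
}

/// Result type when several candidates may apply: fold `lub` over their
/// inferred return types. `None` if there are no candidates at all.
fn call_result(candidate_returns: &[Ty]) -> Option<Ty> {
    candidate_returns.iter().copied().reduce(lub)
}
```

When every candidate agrees (say, all return `Int`) the precise type survives; disagreement only widens as far as the lattice requires.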

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: b56b7f7efd

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector

💡 Codex Review

Reviewed commit: c10338c61b

chatgpt-codex-connector

💡 Codex Review

Reviewed commit: d9ecac3295

…ion model 📐

Vectorization (applying operators element-wise across tuples like `(1, 2) + (3, 4) → (4, 6)`) previously only fired for binary numeric operations and used a uniform "one scalar overload covers every position" model. This PR broadens the scope and refactors the dispatch to a single principle:

> A vec call is a tuple of independent scalar dispatches, one per
> element position.

User-visible changes

- Unary, n-ary, and non-numeric operators all vec now. `-(1, 2, 3)`, `("a", "b") ++ ("c", "d")`, `([1], [2]) ++ ([3], [4])` and `(1, 2) + 5` all work where they previously errored.
- Heterogeneous tuples dispatch per position: `([1,2,3], "foo") ++ ([4,5,6], "bar")` evaluates element 0 via `++(List, List)` and element 1 via `++(String, String)`, yielding `([1,2,3,4,5,6], "foobar")`.
- Chained operator calls keep precise types: the analyser no longer widens `Tuple<Int, Int> - Tuple<Int, Int>` to `Any`.
- Compound assignment (`+=` etc.) on tuple lvalues type-checks correctly.
- Mixed-element tuples with no overload at some position (`(1, "a") + (2, "b")`) fail at compile time with the failing position called out, instead of crashing mid-iteration at runtime.
- Element-call failures wrap with `"while vectorising '<name>' at index N"` to preserve outer-call context.
- Vectorization is gated on operator syntax: `id((1, 2, 3))` still returns the tuple verbatim and never calls `id` element-wise.

Implementation

- Parser: `Expression::Call` carries an `operator_form: bool` flag set by the desugaring sites for infix and prefix-unary operators.
- Binding shape: every `Binding::Resolved` / `Binding::Dynamic` entry is a `Candidate { var: ResolvedVar, vectorized: bool }`.
- Analyser: per-position resolution walks the scope chain once per tuple position. `Binding::Resolved(vec)` only when every position pins to the same scalar; otherwise `Binding::Dynamic` with the union of per-position candidates. Result types are computed element-by-element, so heterogeneous-vec calls keep precise tuple types.
- Runtime: `Callable::Vec { candidates, axis_len }` carries the full scalar list. `dispatch_vec_call` resolves a scalar per element position via the same `matches_value_args` lookup the scalar path uses, so the analyser and runtime can't drift on what counts as "applicable".
- Dead code removed: `BinaryOperator::supports_vectorization`, `StaticType::supports_vectorization{,_with}`, `synthetic_vec_sig`, `candidate_is_compat`, `static_vec_axis_len`, `vec_candidate_applies`, and the old `try_vectorized_call` fallback path.

Design and docs

- Full RFC at `docs/design/vectorization.md`.
- User-facing behaviour documented in `manual/src/reference/types/tuple.md`.

Tests

- 12 new functional `.ndc` programs covering unary vec, non-numeric vec, list-element vec, mixed-numeric promotion, chained-precision, compound-assignment aliasing, regular-call no-vec, exact-match precision, heterogeneous vec, per-position no-overload error.
- 1 new regression program for the `Binding::Dynamic` misinference the original PR caught.
- 5 new unit tests for the pure helpers.

Benchmarks (release-with-debug, 10 runs)

- AoC 2025/08 part1: 489 ms master → 578 ms branch (+18%).
- AoC 2025/08 part2: 816 ms master → 885 ms branch (+8%).
- Non-vec benches: ~20% faster on `enumerate_for_loop.ndc` and `hof_pipeline.ndc` from precision recovery on operator chains; others within ±4% noise.

The vec-loop slowdown is the cost of per-element correctness checking (the old probe-first approach miscoupled mixed-element tuples). The RFC's deferred compile-time unrolling of `Resolved(vec)` is the natural follow-on perf PR.

Closes #145.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
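The Resolved-versus-Dynamic rule in the Analyser bullet can be sketched as follows. The types are hypothetical stand-ins: `var` is a plain index where the real code uses `ResolvedVar`, and `Binding::Unresolved` models the compile-time "no overload at this position" error. A call is resolved statically only when every position pins to the same single scalar; otherwise the union of per-position candidates is kept for runtime dispatch.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
struct Candidate {
    var: usize,
    vectorized: bool,
}

#[derive(Debug, PartialEq)]
enum Binding {
    Resolved(Candidate),
    Dynamic(Vec<Candidate>),
    Unresolved, // some position has no applicable scalar
}

/// `per_position[i]` lists the scalar overloads applicable at position `i`.
fn bind_vec_call(per_position: &[Vec<usize>]) -> Binding {
    if per_position.is_empty() || per_position.iter().any(|c| c.is_empty()) {
        return Binding::Unresolved;
    }
    let first = &per_position[0];
    // Every position pins to the same single scalar: resolve statically.
    if first.len() == 1 && per_position.iter().all(|c| c == first) {
        return Binding::Resolved(Candidate { var: first[0], vectorized: true });
    }
    // Otherwise keep the union of per-position candidates, deduplicated in
    // first-seen order, and let the runtime pick per element.
    let mut union: Vec<usize> = Vec::new();
    for &var in per_position.iter().flatten() {
        if !union.contains(&var) {
            union.push(var);
        }
    }
    Binding::Dynamic(
        union
            .into_iter()
            .map(|var| Candidate { var, vectorized: true })
            .collect(),
    )
}
```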

chatgpt-codex-connector

💡 Codex Review

Reviewed commit: ecc91f453a

chatgpt-codex-connector · 2026-05-24T08:11:11Z

+            let scalar = candidates
+                .iter()
+                .find(|f| f.matches_value_args(&element_args))
+                .ok_or_else(|| {


Preserve per-position overload priority in vectorized dispatch

dispatch_vec_call picks the first runtime-matching function from a single merged candidate list, so overload order from earlier tuple positions can override the best match for later positions. With heterogeneous tuples and overlapping overloads (e.g. +(Number, Number) and +(Int, Int)), a call like (1.5, 1) + (2.5, 2) can resolve element 0 to the generic overload first, then incorrectly reuse that precedence for element 1 even though the scalar resolver would prefer the Int overload there. This breaks the scalar precedence contract for per-position vectorization and can produce wrong results for user-defined operator overloads.
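The reviewer's scenario can be reproduced in a small model. This is a sketch under stated assumptions: it assumes the scalar resolver prefers the most specific applicable overload, and the cost-based ranking below is a stand-in, not the repository's actual resolver. `Param`, `Num`, `Overload`, `first_match` and `most_specific` are hypothetical names. With a merged list ordered `[+(Number, Number), +(Int, Int)]`, a first-match search picks the generic overload for the Int pair at element 1, while a per-position "most specific wins" choice picks `+(Int, Int)`.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Param {
    Int,
    Number, // accepts Int or Float
}

#[derive(Debug, Clone, Copy)]
enum Num {
    Int(i64),
    Float(f64),
}

fn accepts(p: Param, v: Num) -> bool {
    matches!((p, v), (Param::Number, _) | (Param::Int, Num::Int(_)))
}

/// 0 for an exact parameter, 1 for the generic `Number`.
fn cost(p: Param) -> u8 {
    if p == Param::Number { 1 } else { 0 }
}

struct Overload {
    name: &'static str,
    params: [Param; 2],
}

fn applies(o: &Overload, a: Num, b: Num) -> bool {
    accepts(o.params[0], a) && accepts(o.params[1], b)
}

/// First runtime match in one merged list (the behaviour flagged in review).
fn first_match<'a>(merged: &'a [Overload], a: Num, b: Num) -> Option<&'a Overload> {
    merged.iter().find(|o| applies(o, a, b))
}

/// Per-position choice: the most specific applicable overload wins;
/// ties go to the earlier entry (`min_by_key` returns the first minimum).
fn most_specific<'a>(merged: &'a [Overload], a: Num, b: Num) -> Option<&'a Overload> {
    merged
        .iter()
        .filter(|o| applies(o, a, b))
        .min_by_key(|o| cost(o.params[0]) + cost(o.params[1]))
}
```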

…s 📐 (#146)

## Summary

An alternate take on #141 with the same four user-visible features but cleaner separation of concerns and matching-or-better performance on every benchmark.

**Vectorization now works for anything that has a scalar overload.** Unary, n-ary, non-numeric, scalar broadcast, same as #141:

```
-(1, 2, 3)               → (-1, -2, -3)
("a", "b") ++ ("c", "d") → ("ac", "bd")
([1], [2]) ++ ([3], [4]) → ([1, 3], [2, 4])
(1, 2) + 5               → (6, 7)
```

Vectorization only fires on operator syntax. `id((1, 2, 3))` still returns the tuple, never three calls to `id`.

**Precise types survive chained operations.** Operator calls that the analyser can pin to one scalar overload keep their precise return type: chains of `Tuple<Int, Int>` ops no longer widen to `Any`.

**More problems caught at compile time.** Per-position vec resolution errors on `(1, "a") + (2, "b")` and `(1,1,(1,)) + (1,1,(1,))` at analysis time instead of crashing mid-iteration.

## Design

Three structural moves vs the existing PR:

* New AST variant **`Expression::OperatorCall`** for desugared operator syntax. Distinct from `Call`, so downstream layers match exhaustively and the parser is the only crate that knows which token names are operators. No `operator_form: bool` riding `Expression::Call` across every layer.
* **`Candidate::{Scalar, Vec}`** as a sum type rather than a struct with a bool. `Binding::Resolved(Candidate::Vec(scalar))` reads cleanly in pattern matches without `if c.vectorized`.
* **`ScopeTree::resolve_call`** is a single walk that returns both the binding and the inferred return type. The analyser no longer runs per-position resolution twice per Dynamic operator-form call.

## Runtime dispatch

* **`OpCode::CallVec(args)`** for analyser-pinned vec calls. The compiler emits the scalar function directly (no `OverloadSet` wrapper) and the VM broadcasts it across the tuple axis without overload probing. This is the missing "step 6" optimisation from #141's RFC, brought forward into the same change.
* **`Object::OverloadSet { scalars, vec_candidates }`** keeps the hot scalar walk at master's footprint; a unified `Vec<Candidate>` was the source of the numerics-heavy regressions.
* **`Vm::resolve_callee`** still returns `Option<Function>` (same shape as master), so the dispatch loop's `OpCode::Call` arm stays compact.

Full design write-up at [`docs/design/vectorization.md`](docs/design/vectorization.md).

## Behaviour changes

* Mixed-element tuples error at compile time instead of crashing mid-iteration. Existing test `003_vector_error2.ndc` updated to match the new analyser-side error message.
* `BinaryOperator::supports_vectorization` and the `StaticType::supports_vectorization{,_with}` helpers deleted; vec decisions live entirely in the analyser now.

## Benchmarks

Hyperfine, release-with-debug, 20+ runs per command:

| Script                  | Master    | This branch | Δ        |
|-------------------------|-----------|-------------|----------|
| `vec_hot_loop` (new)    | 58.1 ms   | 42.3 ms     | **−27%** |
| `fibonacci`             | 69.1 ms   | 63.7 ms     | −8%      |
| `hof_pipeline`          | 35.7 ms   | 34.6 ms     | −3%      |
| `enumerate_for_loop`    | 107.4 ms  | 107.4 ms    | 0%       |
| `sieve`                 | 107.7 ms  | 107.3 ms    | 0%       |
| `matrix_mul`            | 57.3 ms   | 56.1 ms     | −2%      |
| `ackermann`             | 127.6 ms  | 124.6 ms    | −2%      |
| `quicksort`             | 73.6 ms   | 76.5 ms     | +4%      |

The vec win comes from `CallVec` skipping the per-element overload probe. No bench regresses outside noise, versus #141's reported +27% on AoC 2025/08 vec-heavy workloads.

## Test plan

- [x] `cargo test --workspace`: 298 functional + 18 compiler + 64 unit tests, all green
- [x] `cargo clippy --workspace --lib --tests`: zero new warnings from this change
- [x] `cargo fmt --check`: clean
- [x] REPL spot-checks of the four feature areas (unary vec, string `++`, vec op=, mixed-element error)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
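A rough sketch of how a `Candidate::{Scalar, Vec}` sum type can drive instruction selection, and what a `CallVec`-style instruction does once the scalar is pinned. Everything here is hypothetical: `Op`, `emit` and `call_vec` are illustrative names, the `usize` payload is a stand-in for a function reference (not the `args` operand of the real `OpCode::CallVec`), and operands are plain `i64` slices. The point is that the broadcast loop calls one pre-resolved function per element with no overload probing.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Candidate {
    Scalar(usize), // index of a scalar overload
    Vec(usize),    // same scalar, broadcast across a tuple
}

#[derive(Debug, PartialEq)]
enum Op {
    Call(usize),
    CallVec(usize),
}

/// A pattern match on the sum type replaces `if c.vectorized` checks.
fn emit(c: Candidate) -> Op {
    match c {
        Candidate::Scalar(f) => Op::Call(f),
        Candidate::Vec(f) => Op::CallVec(f),
    }
}

/// Broadcast a pre-resolved scalar over the tuple axis, with no per-element
/// overload lookup. Assumes the analyser has already checked the lengths.
fn call_vec(scalar: fn(i64, i64) -> i64, lhs: &[i64], rhs: &[i64]) -> Vec<i64> {
    debug_assert_eq!(lhs.len(), rhs.len());
    lhs.iter().zip(rhs).map(|(&a, &b)| scalar(a, b)).collect()
}
```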

timfennis changed the title ~~⬆️ improved vectorization~~ feat(vectorization): broaden to n-ary operator forms and recover return-type precision 📐 May 23, 2026

timfennis marked this pull request as ready for review May 23, 2026 09:41

chatgpt-codex-connector Bot reviewed May 23, 2026

Comment thread ndc_analyser/src/analyser.rs Outdated

Comment thread ndc_analyser/src/analyser.rs Outdated

timfennis mentioned this pull request May 23, 2026

Make op= operators return () instead of lhs #142

Open

timfennis changed the title ~~feat(vectorization): broaden to n-ary operator forms and recover return-type precision 📐~~ feat(vectorization): broaden tuple operators and recover precise types 📐 May 23, 2026

chatgpt-codex-connector Bot reviewed May 23, 2026

Comment thread ndc_analyser/src/analyser.rs Outdated

timfennis mentioned this pull request May 24, 2026

Reassigning a for-loop variable widens its type via LUB, blocking runtime-dispatchable code #143

Open

timfennis force-pushed the feature/improved-vectorization branch from c10338c to a2998a1

chatgpt-codex-connector Bot reviewed May 24, 2026

Comment thread ndc_analyser/src/analyser.rs Outdated

timfennis mentioned this pull request May 24, 2026

Vec dispatch should pick a scalar per element pair, not one for all pairs #145

Closed

timfennis force-pushed the feature/improved-vectorization branch from d9ecac3 to ecc91f4

chatgpt-codex-connector Bot reviewed May 24, 2026

timfennis mentioned this pull request May 24, 2026

feat(vectorization): broaden tuple operators and recover precise types 📐 #146

Merged

4 tasks

timfennis closed this May 24, 2026

timfennis wants to merge 1 commit into master from feature/improved-vectorization
