feat(vectorization): broaden tuple operators and recover precise types 📐#141

Closed
timfennis wants to merge 1 commit into master from feature/improved-vectorization

@timfennis timfennis commented May 23, 2026

andy-cpp is a small interpreted language with a typed analyser and a bytecode VM. Operators like +, -, ++ are function calls under the hood, and vectorization is what we call applying them element-wise across tuples: (1, 2) + (3, 4) evaluates to (4, 6).

What this PR changes

Vectorization now works for anything that has a scalar overload

Previously, vectorization only fired for binary numeric operations. Unary operators, string concat, and other non-numeric overloads silently errored. Now:

-(1, 2, 3)               → (-1, -2, -3)        # unary
("a", "b") ++ ("c", "d") → ("ac", "bd")        # strings
([1], [2]) ++ ([3], [4]) → ([1, 3], [2, 4])    # lists
(1, 2) + 5               → (6, 7)              # scalar broadcasts

Vectorization only fires on operator syntax — regular function calls never broadcast. So id((1, 2, 3)) still returns (1, 2, 3), not three calls to id.
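The broadcasting rule above can be sketched in Rust, the implementation language. This is a minimal illustration, not the interpreter's code: the `Value` enum, the `Scalar` alias, and the `add`, `concat`, and `vectorize` functions are hypothetical names, and the handling of mismatched tuple lengths and of a scalar on the left-hand side are assumptions that the PR text does not spell out.

```rust
/// Toy value model for illustration; the interpreter's real value type is not shown here.
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Int(i64),
    Str(String),
    Tuple(Vec<Value>),
}

/// Signature shared by the hypothetical binary scalar overloads below.
type Scalar = fn(&Value, &Value) -> Result<Value, String>;

/// Hypothetical scalar overload of `+` for integers.
fn add(a: &Value, b: &Value) -> Result<Value, String> {
    match (a, b) {
        (Value::Int(x), Value::Int(y)) => Ok(Value::Int(x + y)),
        _ => Err(format!("no function called '+' for {:?} and {:?}", a, b)),
    }
}

/// Hypothetical scalar overload of `++` for strings.
fn concat(a: &Value, b: &Value) -> Result<Value, String> {
    match (a, b) {
        (Value::Str(x), Value::Str(y)) => Ok(Value::Str(format!("{}{}", x, y))),
        _ => Err(format!("no function called '++' for {:?} and {:?}", a, b)),
    }
}

/// Apply `f` element-wise. A non-tuple operand is reused at every position.
fn vectorize(f: Scalar, lhs: &Value, rhs: &Value) -> Result<Value, String> {
    let elements: Result<Vec<Value>, String> = match (lhs, rhs) {
        (Value::Tuple(l), Value::Tuple(r)) => {
            // Assumption: tuples of different lengths are rejected.
            if l.len() != r.len() {
                return Err(format!("tuple length mismatch: {} vs {}", l.len(), r.len()));
            }
            l.iter().zip(r.iter()).map(|(a, b)| f(a, b)).collect()
        }
        (Value::Tuple(l), scalar) => l.iter().map(|a| f(a, scalar)).collect(),
        // Assumption: a scalar on the left broadcasts the same way.
        (scalar, Value::Tuple(r)) => r.iter().map(|b| f(scalar, b)).collect(),
        // No tuple involved: this is an ordinary scalar call.
        (a, b) => return f(a, b),
    };
    elements.map(Value::Tuple)
}
```

Calling `vectorize(add, ...)` on the tuples `(1, 2)` and `(3, 4)` yields `(4, 6)`, and on `(1, 2)` and `5` yields `(6, 7)`, matching the examples listed above.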

Precise types survive chained operations

The analyser can now see through tuple-broadcast operators and keep precise types. The example below used to fail because the type checker had to widen each operation's result to Any for soundness:

let a: Tuple<Int, Int> = (1, 2);
let b: Tuple<Int, Int> = (3, 4);
let c: Tuple<Int, Int> = a - b;     # was: inferred as Any
let d: Tuple<Int, Int> = c * c;     # was: could not compile

Compound assignment (+= etc.) also keeps the precise tuple type:

let a: Tuple<Int, Int> = (1, 2);
a += (3, 4);                        # was: "found Int but expected Tuple<Int, Int>"
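One way to picture how a precise tuple type can survive a chain of operators is to infer the result type position by position instead of widening to `Any`. The sketch below is an assumption-laden illustration in Rust: the `Type` enum and both functions are hypothetical, only an `Int op Int -> Int` scalar signature is modelled, and the analyser's actual mechanism (described in the design notes as an LUB over candidates) may differ.

```rust
/// Toy static types for illustration only.
#[derive(Debug, Clone, PartialEq)]
enum Type {
    Int,
    Str,
    Tuple(Vec<Type>),
}

/// Hypothetical scalar signature table: only `Int op Int -> Int` is modelled.
fn scalar_result(lhs: &Type, rhs: &Type) -> Option<Type> {
    match (lhs, rhs) {
        (Type::Int, Type::Int) => Some(Type::Int),
        _ => None,
    }
}

/// Infer the result type of an operator call, element-wise for tuples.
/// `None` stands for "no applicable overload", i.e. a compile-time error.
fn infer_operator_result(lhs: &Type, rhs: &Type) -> Option<Type> {
    match (lhs, rhs) {
        (Type::Tuple(l), Type::Tuple(r)) => {
            if l.len() != r.len() {
                return None;
            }
            // Every position must resolve; one failure rejects the whole call.
            l.iter()
                .zip(r.iter())
                .map(|(a, b)| scalar_result(a, b))
                .collect::<Option<Vec<Type>>>()
                .map(Type::Tuple)
        }
        _ => scalar_result(lhs, rhs),
    }
}
```

With this model, `Tuple<Int, Int> - Tuple<Int, Int>` infers as `Tuple<Int, Int>`, so a follow-up `c * c` still has precise operand types to work with.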

More problems caught at compile time

Calls that return a misleading inferred type used to slip through to runtime. They now produce a clearer compile-time error:

fn returns_int(s: String) -> Int { 1 };
let x: String = returns_int(42);    # was: "found Int but expected String" (misleading);
                                    # now: "found Any but expected String"
(1, "a") + (2, "b")                 # was: crashed mid-iteration on element 1;
                                    # now: "no function called '+' …"

Friendlier element-call errors

When a vec call fails on a specific element, the error includes the outer call name and the index:

("3", 2) ++ ("a", "b")              # error: while vectorising '++' at index 1: …
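The error wrapping can be sketched as a `map_err` around each element call. This is a hypothetical Rust illustration: `Value`, `concat`, and `vectorize_named` are made-up names, and only the `while vectorising '<name>' at index N` prefix is taken from the PR text; the inner message format is an assumption.

```rust
/// Toy value model for illustration only.
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Int(i64),
    Str(String),
}

/// Hypothetical scalar overload of `++` that only accepts two strings.
fn concat(a: &Value, b: &Value) -> Result<Value, String> {
    match (a, b) {
        (Value::Str(x), Value::Str(y)) => Ok(Value::Str(format!("{}{}", x, y))),
        _ => Err(format!("no function called '++' for {:?} and {:?}", a, b)),
    }
}

/// Run `f` per element and prefix any failure with the outer call name and index.
fn vectorize_named(
    name: &str,
    f: fn(&Value, &Value) -> Result<Value, String>,
    lhs: &[Value],
    rhs: &[Value],
) -> Result<Vec<Value>, String> {
    lhs.iter()
        .zip(rhs.iter())
        .enumerate()
        .map(|(index, (a, b))| {
            f(a, b).map_err(|inner| {
                format!("while vectorising '{}' at index {}: {}", name, index, inner)
            })
        })
        .collect()
}
```

For the operands `("3", 2)` and `("a", "b")`, position 0 succeeds and position 1 fails, so the returned error starts with `while vectorising '++' at index 1:`.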

Behaviour changes that might surprise users

Programs that previously passed through to runtime on type-mismatched calls now widen the call's result type to Any at compile time (instead of synthesising a misleading concrete type). The call still runs through to runtime; it's the type information of the returned value that becomes less specific, which can surface as a clearer mismatched types error at the next use site.

Benchmarks

Hyperfine, release-with-debug profile, current branch vs. master (10 runs after 3-run warmup):

| Script | Master | This branch | Δ | Notes |
|---|---|---|---|---|
| enumerate_for_loop.ndc | 136 ms | 110 ms | -20% | win from precision recovery |
| hof_pipeline.ndc | 42 ms | 34 ms | -20% | win from precision recovery |
| enumerate_to_list.ndc | 69 ms | 68 ms | -2% | noise |
| fibonacci.ndc | 74 ms | 77 ms | +4% | noise |
| matrix_mul.ndc | 58 ms | 56 ms | -4% | noise |
| quicksort.ndc | 72 ms | 73 ms | +1% | noise |
| sieve.ndc | 106 ms | 106 ms | 0% | noise |
| AoC 2025/11 part1 | 8 ms | 9 ms | +4% | noise |
| AoC 2025/08 part1 (vec-heavy) | 490 ms | 625 ms | +27% | vec dispatch overhead |
| AoC 2025/08 part2 (vec-heavy) | 807 ms | 919 ms | +13% | vec dispatch overhead |

Where the wins come from. PR #140 widened many call-result types to Any for soundness, which forced downstream operations onto the slower dynamic-dispatch path. This PR's per-binding vec tracking lets the analyser keep precise types through chains of scalar operations, so non-tuple workloads dispatch directly to the resolved scalar overload.

Where the losses come from. The runtime vec dispatch now checks every element pair against the underlying scalar's parameter types, where the old try_vectorized_call only probed the first pair. The check is what catches mixed-element tuples like (1, "a") + (2, "b") cleanly instead of crashing mid-iteration. The cost is real on vec-dispatch-heavy hot loops: +27% on the worst case in the table above (AoC 2025/08 part1). The follow-on optimisation that the RFC describes (emitting unrolled element calls when the analyser has statically resolved the vec candidate) would recover most of this overhead but is deferred to its own PR.
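The difference between probing only the first pair and checking every pair can be shown with a small Rust sketch. The names `probe_first` and `check_all` are hypothetical and do not correspond to functions in the codebase; the applicability test is reduced to "both operands are `Int`", and rejecting empty tuples is an assumption based on the design-doc notes.

```rust
/// Toy value model for illustration only.
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Int(i64),
    Str(String),
}

/// Stand-in for "this pair matches the scalar `+(Int, Int)` parameter types".
fn is_int_pair(a: &Value, b: &Value) -> bool {
    matches!((a, b), (Value::Int(_), Value::Int(_)))
}

/// Old-style probe: cheap, but only looks at position 0.
fn probe_first(lhs: &[Value], rhs: &[Value]) -> bool {
    match (lhs.first(), rhs.first()) {
        (Some(a), Some(b)) => is_int_pair(a, b),
        _ => false,
    }
}

/// New-style check: linear in the tuple length, but sound for every position.
fn check_all(lhs: &[Value], rhs: &[Value]) -> bool {
    !lhs.is_empty()
        && lhs.len() == rhs.len()
        && lhs.iter().zip(rhs.iter()).all(|(a, b)| is_int_pair(a, b))
}
```

For `(1, "a")` and `(2, "b")` the probe accepts the call, which is how the old path ended up failing mid-iteration, while the full check rejects it up front. The extra per-element work is the overhead the benchmark attributes to vec dispatch.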

How to verify

cargo test --workspace            # 296 functional tests, 11 new unit tests
cargo clippy --workspace --lib
cargo fmt --check

Quick REPL spot-checks:

echo 'print(-(1, 2, 3))'                          | cargo run --bin ndc
echo 'print(("a", "b") ++ ("c", "d"))'            | cargo run --bin ndc
echo 'let a = (1, 2); a += (3, 4); print(a)'      | cargo run --bin ndc
echo 'print((1, "a") + (2, "b"))'                 | cargo run --bin ndc  # error

Design notes

The full design write-up lives at docs/design/vectorization.md and the language manual page at manual/src/reference/types/tuple.md covers the user-facing behaviour.

🤖 Generated with Claude Code

@timfennis timfennis changed the title from "⬆️ improved vectorization" to "feat(vectorization): broaden to n-ary operator forms and recover return-type precision 📐" May 23, 2026
timfennis added a commit that referenced this pull request May 23, 2026
Status flipped from Draft to Implemented. Patched the sections that
drifted from what landed:

- Worked-example commentary no longer claims the compiler emits
  unrolled element calls — that's RFC step 6, still deferred.
- Type-inference table now reflects the actual mechanism (LUB across
  all candidates' inferred returns) instead of the separate "all
  scalar" vs "any vec" branches; documents the uniform LUB-collapse
  on vec return types and the exact-subtype precedence step.
- Runtime broadening describes the new `find_overload`/
  `dispatch_vec_call` path instead of the deleted try_vectorized_call.
- Implementation checklist marked done per step with a ⏸ on step 6.
- Open questions resolved or annotated with what shipped (LUB collapse
  over per-position, "no function found" over a dedicated arity error,
  empty tuple rejects, etc.).
- References updated to point at the current code (and PR #141).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
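The "LUB across all candidates' inferred returns" step mentioned in the commit message can be illustrated with a least-upper-bound fold over a tiny type lattice. Everything here is an assumption made for illustration: the lattice (`Int` and `Float` below `Number`, everything below `Any`) and the function names are hypothetical and are not taken from the analyser.

```rust
/// Toy lattice: Int <: Number, Float <: Number, and every type <: Any.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Type {
    Int,
    Float,
    Number,
    Str,
    Any,
}

/// Least upper bound of two types in the toy lattice.
fn lub(a: Type, b: Type) -> Type {
    if a == b {
        return a;
    }
    match (a, b) {
        (Type::Any, _) | (_, Type::Any) => Type::Any,
        // Str shares no supertype with the numeric types other than Any.
        (Type::Str, _) | (_, Type::Str) => Type::Any,
        // Remaining case: two distinct members of {Int, Float, Number}.
        _ => Type::Number,
    }
}

/// Fold the return types of all applicable candidates into one result type.
/// Returns `None` when there are no candidates at all.
fn lub_all(returns: &[Type]) -> Option<Type> {
    let mut iter = returns.iter().copied();
    let first = iter.next()?;
    Some(iter.fold(first, lub))
}
```

Under this model a call whose candidates all return `Int` keeps `Int`, a mix of `Int` and `Float` widens only to `Number`, and unrelated return types collapse to `Any`.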
@timfennis timfennis marked this pull request as ready for review May 23, 2026 09:41

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: b56b7f7efd

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread ndc_analyser/src/analyser.rs Outdated
Comment thread ndc_analyser/src/analyser.rs Outdated
@timfennis timfennis changed the title from "feat(vectorization): broaden to n-ary operator forms and recover return-type precision 📐" to "feat(vectorization): broaden tuple operators and recover precise types 📐" May 23, 2026

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: c10338c61b

Comment thread ndc_analyser/src/analyser.rs Outdated
@timfennis timfennis force-pushed the feature/improved-vectorization branch from c10338c to a2998a1 Compare May 24, 2026 06:59
timfennis added a commit that referenced this pull request May 24, 2026

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d9ecac3295

Comment thread ndc_analyser/src/analyser.rs Outdated
…ion model 📐

Vectorization — applying operators element-wise across tuples like
`(1, 2) + (3, 4) → (4, 6)` — previously only fired for binary numeric
operations and used a uniform "one scalar overload covers every
position" model. This PR broadens the scope and refactors the dispatch
to a single principle:

> A vec call is a tuple of independent scalar dispatches, one per
> element position.

User-visible changes
- Unary, n-ary, and non-numeric operators all vec now. `-(1, 2, 3)`,
  `("a", "b") ++ ("c", "d")`, `([1], [2]) ++ ([3], [4])`, `(1, 2) + 5`
  all work where they previously errored.
- Heterogeneous tuples dispatch per position:
  `([1,2,3], "foo") ++ ([4,5,6], "bar")` evaluates element 0 via
  `++(List, List)` and element 1 via `++(String, String)`, yielding
  `([1,2,3,4,5,6], "foobar")`.
- Chained operator calls keep precise types — the analyser no longer
  widens `Tuple<Int, Int> - Tuple<Int, Int>` to `Any`.
- Compound assignment (`+=` etc.) on tuple lvalues type-checks
  correctly.
- Mixed-element tuples with no overload at some position
  (`(1, "a") + (2, "b")`) fail at compile time with the failing
  position called out, instead of crashing mid-iteration at runtime.
- Element-call failures wrap with `"while vectorising '<name>' at
  index N"` to preserve outer-call context.
- Vectorization is gated on operator syntax — `id((1, 2, 3))` still
  returns the tuple verbatim and never element-wise calls `id`.

Implementation
- Parser: `Expression::Call` carries an `operator_form: bool` flag set
  by the desugaring sites for infix and prefix-unary operators.
- Binding shape: every `Binding::Resolved` / `Binding::Dynamic` entry
  is a `Candidate { var: ResolvedVar, vectorized: bool }`.
- Analyser: per-position resolution walks the scope chain once per
  tuple position. `Binding::Resolved(vec)` only when every position
  pins to the same scalar; otherwise `Binding::Dynamic` with the
  union of per-position candidates. Result types are computed
  element-by-element so heterogeneous-vec calls keep precise tuple
  types.
- Runtime: `Callable::Vec { candidates, axis_len }` carries the full
  scalar list. `dispatch_vec_call` resolves a scalar per element
  position via the same `matches_value_args` lookup the scalar path
  uses, so the analyser and runtime can't drift on what counts as
  "applicable".
- Dead code removed: `BinaryOperator::supports_vectorization`,
  `StaticType::supports_vectorization{,_with}`, `synthetic_vec_sig`,
  `candidate_is_compat`, `static_vec_axis_len`, `vec_candidate_applies`,
  and the old `try_vectorized_call` fallback path.
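The principle "a vec call is a tuple of independent scalar dispatches" can be sketched as follows. This is a hypothetical Rust illustration and not the code of `dispatch_vec_call`: the `Value` enum, the `Scalar` alias, and the function names are made up, and each candidate returns `None` to signal "not applicable", standing in for the `matches_value_args` check.

```rust
/// Toy value model for illustration only.
#[derive(Debug, Clone, PartialEq)]
enum Value {
    List(Vec<i64>),
    Str(String),
}

/// A candidate returns `None` when its parameter types do not match.
type Scalar = fn(&Value, &Value) -> Option<Value>;

/// Hypothetical `++(List, List)` overload.
fn concat_lists(a: &Value, b: &Value) -> Option<Value> {
    match (a, b) {
        (Value::List(x), Value::List(y)) => {
            let mut out = x.clone();
            out.extend_from_slice(y);
            Some(Value::List(out))
        }
        _ => None,
    }
}

/// Hypothetical `++(String, String)` overload.
fn concat_strings(a: &Value, b: &Value) -> Option<Value> {
    match (a, b) {
        (Value::Str(x), Value::Str(y)) => Some(Value::Str(format!("{}{}", x, y))),
        _ => None,
    }
}

/// Resolve a scalar independently at every tuple position.
fn dispatch_vec_sketch(
    candidates: &[Scalar],
    lhs: &[Value],
    rhs: &[Value],
) -> Result<Vec<Value>, String> {
    if lhs.len() != rhs.len() {
        return Err(format!("tuple length mismatch: {} vs {}", lhs.len(), rhs.len()));
    }
    lhs.iter()
        .zip(rhs.iter())
        .enumerate()
        .map(|(index, (a, b))| {
            candidates
                .iter()
                .find_map(|f| f(a, b))
                .ok_or_else(|| format!("no overload applies at position {}", index))
        })
        .collect()
}
```

With both candidates supplied, `([1,2,3], "foo") ++ ([4,5,6], "bar")` resolves position 0 to the list overload and position 1 to the string overload, giving `([1,2,3,4,5,6], "foobar")`.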

Design and docs
- Full RFC at `docs/design/vectorization.md`.
- User-facing behaviour documented in
  `manual/src/reference/types/tuple.md`.

Tests
- 12 new functional `.ndc` programs covering unary vec, non-numeric
  vec, list-element vec, mixed-numeric promotion, chained-precision,
  compound-assignment aliasing, regular-call no-vec, exact-match
  precision, heterogeneous vec, per-position no-overload error.
- 1 new regression program for the `Binding::Dynamic` misinference
  the original PR caught.
- 5 new unit tests for the pure helpers.

Benchmarks (release-with-debug, 10 runs)
- AoC 2025/08 part1: 489ms master → 578ms branch (+18%).
- AoC 2025/08 part2: 816ms master → 885ms branch (+8%).
- Non-vec benches: ~20% faster on `enumerate_for_loop.ndc` and
  `hof_pipeline.ndc` from precision recovery on operator chains;
  others within ±4% noise.

The vec-loop slowdown is the cost of per-element correctness checking
(the old probe-first approach miscoupled mixed-element tuples). The
RFC's deferred compile-time unrolling of `Resolved(vec)` is the
natural follow-on perf PR.

Closes #145.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@timfennis timfennis force-pushed the feature/improved-vectorization branch from d9ecac3 to ecc91f4 Compare May 24, 2026 08:02

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ecc91f453a

Comment thread ndc_vm/src/vm.rs
Comment on lines +857 to +860
let scalar = candidates
    .iter()
    .find(|f| f.matches_value_args(&element_args))
    .ok_or_else(|| {

[P2] Preserve per-position overload priority in vectorized dispatch

dispatch_vec_call picks the first runtime-matching function from a single merged candidate list, so overload order from earlier tuple positions can override the best match for later positions. With heterogeneous tuples and overlapping overloads (e.g. +(Number, Number) and +(Int, Int)), a call like (1.5, 1) + (2.5, 2) can resolve element 0 to the generic overload first, then incorrectly reuse that precedence for element 1 even though the scalar resolver would prefer the Int overload there. This breaks the scalar precedence contract for per-position vectorization and can produce wrong results for user-defined operator overloads.
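The concern can be reproduced in a small Rust model. This sketch is hypothetical: the `Overload` enum, the `specificity` ranking, and both lookup functions are illustrative names, and it assumes the scalar resolver prefers the more specific `+(Int, Int)` overload when both apply, as the review comment suggests.

```rust
/// Toy value model for illustration only.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Value {
    Int(i64),
    Float(f64),
}

/// Two overlapping overloads of `+`.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Overload {
    NumberNumber,
    IntInt,
}

impl Overload {
    /// Does this overload accept the given pair of runtime values?
    fn matches(self, a: Value, b: Value) -> bool {
        match self {
            // Both Int and Float count as Number in this toy model.
            Overload::NumberNumber => true,
            Overload::IntInt => matches!((a, b), (Value::Int(_), Value::Int(_))),
        }
    }

    /// Assumed ranking: higher means more specific.
    fn specificity(self) -> u8 {
        match self {
            Overload::NumberNumber => 0,
            Overload::IntInt => 1,
        }
    }
}

/// First match in list order, as in the reviewed snippet.
fn first_match(candidates: &[Overload], a: Value, b: Value) -> Option<Overload> {
    candidates.iter().copied().find(|o| o.matches(a, b))
}

/// Most specific match, independent of list order.
fn best_match(candidates: &[Overload], a: Value, b: Value) -> Option<Overload> {
    candidates
        .iter()
        .copied()
        .filter(|o| o.matches(a, b))
        .max_by_key(|o| o.specificity())
}
```

If the merged list is ordered `[NumberNumber, IntInt]` because position 0 of `(1.5, 1) + (2.5, 2)` resolved to the generic overload first, `first_match` also picks `NumberNumber` for the `Int` pair at position 1, whereas `best_match` picks `IntInt`.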

timfennis added a commit that referenced this pull request May 24, 2026
…s 📐 (#146)

## Summary

An alternate take on #141 with the same four user-visible features but
cleaner separation of concerns and matching-or-better performance on
every benchmark.

**Vectorization now works for anything that has a scalar overload.**
Unary, n-ary, non-numeric, scalar broadcast — same as #141:

```
-(1, 2, 3)               → (-1, -2, -3)
("a", "b") ++ ("c", "d") → ("ac", "bd")
([1], [2]) ++ ([3], [4]) → ([1, 3], [2, 4])
(1, 2) + 5               → (6, 7)
```

Vectorization only fires on operator syntax. `id((1, 2, 3))` still
returns the tuple, never three calls to `id`.

**Precise types survive chained operations.** Operator calls that the
analyser can pin to one scalar overload keep their precise return
type — chains of `Tuple<Int, Int>` ops no longer widen to `Any`.

**More problems caught at compile time.** Per-position vec resolution
errors on `(1, "a") + (2, "b")` and `(1,1,(1,)) + (1,1,(1,))` at
analysis time instead of crashing mid-iteration.

## Design

Three structural moves vs the existing PR:

* New AST variant **`Expression::OperatorCall`** for desugared operator
  syntax. Distinct from `Call`, so downstream layers match exhaustively
  and the parser is the only crate that knows which token names are
  operators. No `operator_form: bool` riding `Expression::Call` across
  every layer.
* **`Candidate::{Scalar, Vec}`** as a sum type rather than a struct
  with a bool. `Binding::Resolved(Candidate::Vec(scalar))` reads cleanly
  in pattern matches without `if c.vectorized`.
* **`ScopeTree::resolve_call`** is a single walk that returns both the
  binding and the inferred return type. The analyser no longer runs
  per-position resolution twice per Dynamic operator-form call.
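A sum type in place of a struct with a bool might look like the sketch below. It is illustrative only: the field on `ResolvedVar`, the payload of `Binding::Dynamic`, and the `describe` function are assumptions, since the message names the types but does not show their definitions.

```rust
/// Hypothetical stand-in for the analyser's resolved variable handle.
#[derive(Debug, Clone, PartialEq)]
struct ResolvedVar {
    slot: usize, // assumed field, for illustration only
}

/// A candidate is either called directly or broadcast across a tuple.
#[derive(Debug, Clone, PartialEq)]
enum Candidate {
    Scalar(ResolvedVar),
    Vec(ResolvedVar),
}

/// Assumed shape: one pinned candidate, or several left for runtime dispatch.
#[derive(Debug, Clone, PartialEq)]
enum Binding {
    Resolved(Candidate),
    Dynamic(Vec<Candidate>),
}

/// Pattern matching distinguishes the cases without an `if c.vectorized` test.
fn describe(binding: &Binding) -> String {
    match binding {
        Binding::Resolved(Candidate::Scalar(var)) => {
            format!("direct call to slot {}", var.slot)
        }
        Binding::Resolved(Candidate::Vec(var)) => {
            format!("broadcast slot {} across the tuple axis", var.slot)
        }
        Binding::Dynamic(candidates) => {
            format!("runtime dispatch over {} candidates", candidates.len())
        }
    }
}
```

Because the match is exhaustive, adding a third `Candidate` variant later would produce a compile error at every site that needs updating, which a bool flag cannot do.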

## Runtime dispatch

* **`OpCode::CallVec(args)`** for analyser-pinned vec calls. The
  compiler emits the scalar function directly (no `OverloadSet` wrapper)
  and the VM broadcasts it across the tuple axis without overload
  probing. This is the missing "step 6" optimisation from #141's RFC,
  brought forward into the same change.
* **`Object::OverloadSet { scalars, vec_candidates }`** keeps the hot
  scalar walk at master's footprint — a unified `Vec<Candidate>` was
  the source of the numerics-heavy regressions.
* **`Vm::resolve_callee`** still returns `Option<Function>` (same shape
  as master), so the dispatch loop's `OpCode::Call` arm stays compact.
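The idea behind a pinned vec call, broadcasting one already-resolved scalar without probing overloads per element, can be sketched as below. This is a hypothetical Rust model, not the VM's code: `Value`, `Scalar`, `call_vec`, `neg`, and `add` are illustrative names, and the rule that all tuple arguments must share one length is an assumption.

```rust
/// Toy value model for illustration only.
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Int(i64),
    Tuple(Vec<Value>),
}

/// A scalar function the analyser has already pinned; it takes its arguments as a slice.
type Scalar = fn(&[Value]) -> Value;

/// Hypothetical unary `-` on integers.
fn neg(args: &[Value]) -> Value {
    match args {
        [Value::Int(x)] => Value::Int(-*x),
        _ => unreachable!("the analyser pinned this call to a single Int operand"),
    }
}

/// Hypothetical binary `+` on integers.
fn add(args: &[Value]) -> Value {
    match args {
        [Value::Int(x), Value::Int(y)] => Value::Int(*x + *y),
        _ => unreachable!("the analyser pinned this call to two Int operands"),
    }
}

/// Broadcast `scalar` across the tuple axis; non-tuple arguments are reused as-is.
fn call_vec(scalar: Scalar, args: &[Value]) -> Result<Value, String> {
    // The axis length comes from the tuple arguments, which must agree.
    let mut axis_len: Option<usize> = None;
    for arg in args {
        if let Value::Tuple(items) = arg {
            match axis_len {
                None => axis_len = Some(items.len()),
                Some(n) if n != items.len() => {
                    return Err(format!("tuple length mismatch: {} vs {}", n, items.len()));
                }
                Some(_) => {}
            }
        }
    }
    let n = axis_len.ok_or_else(|| "expected at least one tuple argument".to_string())?;
    let out: Vec<Value> = (0..n)
        .map(|i| {
            let element_args: Vec<Value> = args
                .iter()
                .map(|arg| match arg {
                    Value::Tuple(items) => items[i].clone(),
                    other => other.clone(),
                })
                .collect();
            // No overload lookup here: the scalar was chosen at analysis time.
            scalar(&element_args)
        })
        .collect();
    Ok(Value::Tuple(out))
}
```

The same helper covers the unary case (`-(1, 2, 3)`) and the scalar-broadcast case (`(1, 2) + 5`), since the argument count is whatever the slice holds.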

Full design write-up at
[`docs/design/vectorization.md`](docs/design/vectorization.md).

## Behaviour changes

* Mixed-element tuples error at compile time instead of crashing
  mid-iteration. Existing test `003_vector_error2.ndc` updated to match
  the new analyser-side error message.
* `BinaryOperator::supports_vectorization` and the
  `StaticType::supports_vectorization{,_with}` helpers deleted — vec
  decisions live entirely in the analyser now.

## Benchmarks

Hyperfine, release-with-debug, 20+ runs per command:

| Script                  | Master    | This branch | Δ      |
|-------------------------|-----------|-------------|--------|
| `vec_hot_loop` (new)    |  58.1 ms  |  42.3 ms    | **−27%** |
| `fibonacci`             |  69.1 ms  |  63.7 ms    | −8%    |
| `hof_pipeline`          |  35.7 ms  |  34.6 ms    | −3%    |
| `enumerate_for_loop`    | 107.4 ms  | 107.4 ms    | 0%     |
| `sieve`                 | 107.7 ms  | 107.3 ms    | 0%     |
| `matrix_mul`            |  57.3 ms  |  56.1 ms    | −2%    |
| `ackermann`             | 127.6 ms  | 124.6 ms    | −2%    |
| `quicksort`             |  73.6 ms  |  76.5 ms    | +4%    |

The vec win comes from `CallVec` skipping the per-element overload
probe. No bench regresses outside noise — versus #141's reported +27%
on AoC 2025/08 vec-heavy workloads.

## Test plan

- [x] `cargo test --workspace` — 298 functional + 18 compiler + 64
      unit tests, all green
- [x] `cargo clippy --workspace --lib --tests` — zero new warnings
      from this change
- [x] `cargo fmt --check` — clean
- [x] REPL spot-checks of the four feature areas (unary vec, string
      `++`, vec op=, mixed-element error)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@timfennis timfennis closed this May 24, 2026