Skip to content

Perf/fri merkle folding#611

Draft
diegokingston wants to merge 4 commits into
mainfrom
perf/fri-merkle-folding
Draft

Perf/fri merkle folding#611
diegokingston wants to merge 4 commits into
mainfrom
perf/fri-merkle-folding

Conversation

@diegokingston
Copy link
Copy Markdown
Collaborator

No description provided.

@github-actions
Copy link
Copy Markdown

Codex Code Review

Findings

  • Low Security / Bug: proof.rs lets cap_height exceed depth when cap.len() > num_leaves, then depth - cap_height underflows at line 179. A malformed external cap can make verify_capped panic in debug builds, or behave incorrectly in release. Add an explicit guard after computing both values:
    if cap_height > depth {
        return false;
    }

No other real issues found in the PR diff.

I attempted to run cargo test -p crypto merkle_cap_tests --quiet, but the environment could not create rustup temp files under /home/runner/.rustup because it is read-only.

let cap_height = cap.len().trailing_zeros() as usize;
// Process level by level, from the leaves up to the cap level, same as
// `get_batch_auth_path_positions`.
for _ in 0..depth - cap_height {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Medium — DoS via usize underflow

depth - cap_height is an unchecked usize subtraction. If a caller passes a cap with more nodes than the tree has leaves (e.g. cap.len() = 2 * num_leaves), then cap_height > depth, which:

  • debug builds: panics immediately.
  • release builds: wraps to usize::MAX, causing the for loop to iterate ~2⁶⁴ times (effectively an infinite hang).

The is_power_of_two guard a few lines up doesn't prevent this — it only ensures the cap is a power of two, not that it's ≤ num_leaves.

The verifier is meant to be called with untrusted inputs, so this needs a bounds check before the arithmetic:

Suggested change
for _ in 0..depth - cap_height {
let depth = num_leaves.trailing_zeros() as usize;
let cap_height = cap.len().trailing_zeros() as usize;
if cap_height > depth {
return false;
}
// Process level by level, from the leaves up to the cap level, same as
// `get_batch_auth_path_positions`.
for _ in 0..depth - cap_height {

A Merkle cap commits the top 2^cap_height tree nodes instead of a single
root, so every authentication path stops cap_height levels early — smaller
proofs and cap_height fewer hash compressions per opening.

The cap is just a slice of the existing heap-ordered node array (the nodes
at tree depth cap_height), so no new tree storage is needed. Adds
`cap`, `get_proof_by_pos_capped`, `get_batch_proof_capped`, and
`Proof`/`BatchProof::verify_capped`; the existing root-based methods become
cap_height=0 wrappers and are bit-for-bit unchanged in behavior. cap_height
clamps to the tree depth, so small trees degrade gracefully to a single root.

Stage 1 of the caps + arity-4 FRI plan; purely additive, no STARK-layer
changes yet.
`FieldElementQuadBackend` hashes a fixed group of four field elements into
one leaf. With arity-4 FRI folding a single fold orbit is four evaluations
that are always opened together, so they belong under one leaf hash — this
is the FRI-layer leaf for the arity-4 commit phase.

Mirrors the existing `FieldElementPairBackend`; exposed as
`QuadKeccak256Backend`. Building block for stage 2b (arity-4 FRI).
Commit the top 2^c Merkle nodes as a cap instead of a single root, so
every opening path is c hashes shorter, and fold FRI by 4 per committed
layer (two binary folds) to halve the number of FRI trees and paths.

- StarkProof / prover / verifier carry MerkleCap commitments; preprocessed
  trees stay uncapped to preserve the AIR-hardcoded constants.
- FRI layers commit quad-leaf trees (one leaf per arity-4 fold orbit);
  commit_phase does one uncommitted initial fold then number_layers/2
  arity-4 layers, folding to a constant last value.
- Verifier replays two challenges per committed layer and folds each
  4-element orbit, with the (-1)^(index&1) twiddle parity handled.

The CUDA pair-leaf FRI tree builder is now stale w.r.t. arity-4; its
CPU-parity test is disabled with a TODO until the CUDA builder is updated.
- Guard against `cap_height > depth` underflow in `BatchProof::verify_capped`
  (Codex finding: a malformed proof with `cap.len() > num_leaves` would
  underflow `depth - cap_height` and panic in debug / wrap in release).
- Validate FRI cap sizes in `verify_query_and_sym_openings`: each cap must
  be non-empty, a power of two, and at most `2^MERKLE_CAP_HEIGHT`. Prevents
  both the same underflow class and DoS from absurdly large caps multiplied
  per-query.
- Fix doc comment on `commit_phase_from_evaluations`: the "extra binary
  fold for even number_layers" wording was misleading. The algorithm is
  correct for both parities — for odd number_layers the final 2-element
  slice is naturally constant because the polynomial is degree 0 by then.
- Drop dangling `transcript.append_bytes(&proof.lde_trace_main_merkle_root)`
  in `replay_transcript_phase_a`: leftover from main's pre-cap version
  surfaced during the rebase; the field no longer exists on `StarkProof`.

`cargo test -p stark -p crypto` (125 + 59) and `cargo test -p lambda-vm-prover --lib`
(278 pass / 77 pre-existing failures) both stay green. `make lint` clean.
@diegokingston diegokingston force-pushed the perf/fri-merkle-folding branch from eb50cc2 to bb99e2d Compare May 26, 2026 17:55
diegokingston added a commit that referenced this pull request May 26, 2026
…/proof.rs untouched

The earlier dead-code sweep removed `Deserializable` + `DeserializationError`
from `math` because its only out-of-crate user was the `impl Deserializable`
in `crypto/crypto/src/merkle_tree/proof.rs`, so the PR also dropped those
impls. PR #611 (arity-4 FRI + Merkle caps) explicitly asks this PR to leave
`proof.rs` alone, so revert that pair of deletions:

- Restore `proof.rs` to main's contents (diff vs main: empty).
- Restore `DeserializationError` enum and `From<ByteConversionError>` impl
  in `math/src/errors.rs`.
- Restore `Deserializable` trait + the `DeserializationError` import in
  `math/src/traits.rs`.

The `Serializable` half of the impl block in `proof.rs` is gated on
`crypto/crypto`'s `alloc` feature, which no consumer in this repo
enables — that code path was already non-compiling and is unchanged here.

Tests: `cargo test -p math -p crypto` → 46 + 198 green. `make lint` clean.
github-merge-queue Bot pushed a commit that referenced this pull request May 26, 2026
* refactor(math): remove dead Deserializable trait and unreachable error variants

`Deserializable` had a single impl (`Proof<T>`) whose only `deserialize`
call site was inside that impl itself — a recursion with no external
entry point. Production deserialization uses `bincode`/serde everywhere.
Delete the trait and the `Proof<T>` impl.

`merkle_tree/proof.rs` also carried a dead `Serializable` impl behind
`#[cfg(feature = "alloc")]` — a feature the `crypto` crate does not even
define, so the block never compiled. It referenced `math::traits::Serializable`,
which does not exist. Remove it.

Error cleanup (none constructed anywhere in the workspace):
- `DeserializationError` enum + `From<ByteConversionError>` impl — only
  reachable through the now-deleted `Deserializable`.
- `PairingError` enum — lambda_vm has no pairings.
- `ByteConversionError::{InvalidValue, PointNotInSubgroup, ValueNotCompressed}`
  — the latter two are elliptic-curve concepts; there is no EC here.
- `CreationError::InvalidDecString`.

Pure dead-code removal; `cargo build`/`cargo test -p math` unchanged
(216 tests pass).

* refactor(math): remove vestigial helpers module

`helpers::next_power_of_two(u64)` was bit-for-bit equivalent to stdlib
`u64::next_power_of_two()`. Its only caller, `resize_to_next_power_of_two`,
was used solely by the `fibonacci_rap` example — production trace sizing
already calls stdlib `.next_power_of_two()` directly everywhere.

Move `resize_to_next_power_of_two` into `examples/fibonacci_rap.rs` as a
local helper (rewritten on `usize` with the stdlib call, dropping the old
`try_into().unwrap()` TODO), and delete the `helpers` module.

* refactor(math): gate evaluate_fft_bit_reversed behind cfg(test)

`evaluate_fft_bit_reversed` was a non-gated `pub fn` but is only called
from the FFT property tests — no production caller in stark or prover.
Gate it `#[cfg(test)]` so it stops appearing in the production API
surface. Restore the gate (or remove it) when a real caller appears.

* refactor(math): gate test-only Polynomial::{truncate, reverse}

Both methods are called only from `fast_division` / `invert_polynomial_mod`,
which are themselves already `#[cfg(test)]`. No production caller in stark
or prover. Gate them `#[cfg(test)]` so they leave the production API surface.

`long_division_with_remainder` is left public — it is used by the
debug-checks path (`check_boundary_polys_divisibility`) and is a
legitimate general polynomial operation other code could want.

* refactor(math): flatten fft/cpu/* up to fft/*

The `fft/cpu/` directory implied a `fft/gpu/` sibling that never existed —
GPU code lives in the separate `crypto/math-cuda` crate. The `cpu` nesting
was a vestigial holdover.

Move `bit_reversing`, `bowers_fft`, `roots_of_unity`, `fft`, and
`bowers_fft_tests` from `fft/cpu/` up to `fft/`; delete `fft/cpu/mod.rs`
and merge its declarations into `fft/mod.rs`. Update every
`fft::cpu::...` / `super::cpu::...` path across math, stark, prover, and
math-cuda to `fft::...`.

Pure module move — no logic change. 216 math tests pass; stark, prover,
math-cuda all build.

* refactor(math): collapse polynomial/ folder into one polynomial.rs

Two problems compounded:
1. `polynomial/` was a folder holding a single file (`mod.rs`) — the
   directory level served no purpose.
2. Two files were named `polynomial` (`polynomial/mod.rs` and
   `fft/polynomial.rs`), both defining `impl Polynomial<...>` blocks.

Merge `fft/polynomial.rs` (the FFT-specific `impl Polynomial` blocks plus
the `evaluate_fft_cpu` / `interpolate_fft_cpu` free functions) into
`polynomial/mod.rs`, then rename it to a top-level `crypto/math/src/polynomial.rs`
and delete the folder.

Result: one `polynomial.rs` holding the `Polynomial` type and all of its
methods (core ring ops + FFT). `fft/` is now purely the FFT *algorithm*
layer (Bowers, twiddles, bit-reversal, roots of unity).

Also delete the stale `polynomial/README.md` — inherited lambdaworks
docs referencing multilinear-polynomial files that don't exist in
lambda_vm.

Updated the one external path (`fft::polynomial::compose_fft` ->
`polynomial::compose_fft`). 216 math + 124 stark tests pass.

* refactor(math): consolidate scattered tests into tests/ directory

Three test conventions coexisted: the `tests/` directory, a sibling
`bowers_fft_tests.rs` next to source, and an inline `#[cfg(test)] mod`
in the polynomial module. Unify on the `tests/` directory.

- Move `fft/bowers_fft_tests.rs` -> `tests/bowers_fft_tests.rs`, register
  it in `tests/mod.rs`, drop the `mod bowers_fft_tests;` from `fft/mod.rs`.
- Move the inline coset-LDE `mod tests` out of `polynomial.rs` into a new
  `coset_lde_tests` module in `tests/fft_tests.rs`.
- Rename `fft/fft.rs` -> `fft/reference_fft.rs`: after the Step 6 flatten
  it collided with its own parent module name (`fft::fft`), which clippy
  rejects. The new name also says what it is — the reference radix-2 FFT
  used only to cross-check the production Bowers FFT. Gate the module
  `#[cfg(test)] pub(crate)` since its whole content is test-only.

216 math tests pass (same count — no tests lost in the move).

* refactor(math): drop redundant FFT cross-check code, relocate inline tests

The reference radix-2 FFT existed only to cross-check the production Bowers
FFT. Bowers is already verified directly against a naive O(n^2) DFT at the
same sizes, so reference_fft.rs and its four comparison tests are dead
weight. gate get_powers_of_primitive_root as test-only (production twiddles
go through LayerTwiddles), collapse the evaluate_fft_cpu / interpolate_fft_cpu
wrappers into their sole callers, and narrow coset_lde_full_into to
pub(crate).

Move the inline test modules from bit_reversing.rs, goldilocks.rs and
extensions_goldilocks.rs into the dedicated tests/ directory.

* refactor(math): remove dead polynomial-division cluster, fix zero-offset panic

Addresses PR review feedback.

Reviewers flagged that `#[cfg(test)]`-gating previously-public methods is a
breaking change. Rather than keep them as `pub(crate)` dead code, the cluster
is removed outright — none of it has a production caller:

- Delete `Polynomial::{fast_division, invert_polynomial_mod,
  fast_fft_multiplication, truncate, reverse}` and `evaluate_fft_bit_reversed`,
  plus their 8 tests. The fast polynomial-division algorithm had no production
  use; it existed only to be tested.
- `get_powers_of_primitive_root` stays `#[cfg(test)]` — it is a genuine test
  helper used to validate the production FFT, not dead code.

Also fixes the pre-existing panic reviewers noted in `interpolate_offset_fft`:
`offset.inv().unwrap()` panicked on a zero coset offset. Add
`FFTError::InvalidCosetOffset` and propagate it instead.

* refactor(math): remove dead Polynomial APIs and self-referential tests

Continues the dead-code audit of polynomial.rs. Every item removed here
has no production caller — each is a function reachable only from its own
tests, or a test-file algorithm that tests itself.

Removed from polynomial.rs:
- to_extension (zero callers anywhere)
- long_division_with_remainder, leading_coefficient
- ruffini_division_inplace
- interpolate_coset_eval_with_g_n_inv (base-field variant)

Removed the matching self-referential tests/helpers from
polynomial_tests.rs (division_works, division_by_zero_degree_polynomial_works,
test_xgcd, ruffini_inplace_* and the div_with_ref / xgcd / ruffini_division
helpers).

The 3 barycentric tests that exercised the base-field
interpolate_coset_eval_with_g_n_inv now call the production-used
interpolate_coset_eval_ext_with_g_n_inv (instantiated E=F), so they keep
covering the real code path.

* refactor(math): keep Deserializable in math; leave crypto/merkle_tree/proof.rs untouched

The earlier dead-code sweep removed `Deserializable` + `DeserializationError`
from `math` because its only out-of-crate user was the `impl Deserializable`
in `crypto/crypto/src/merkle_tree/proof.rs`, so the PR also dropped those
impls. PR #611 (arity-4 FRI + Merkle caps) explicitly asks this PR to leave
`proof.rs` alone, so revert that pair of deletions:

- Restore `proof.rs` to main's contents (diff vs main: empty).
- Restore `DeserializationError` enum and `From<ByteConversionError>` impl
  in `math/src/errors.rs`.
- Restore `Deserializable` trait + the `DeserializationError` import in
  `math/src/traits.rs`.

The `Serializable` half of the impl block in `proof.rs` is gated on
`crypto/crypto`'s `alloc` feature, which no consumer in this repo
enables — that code path was already non-compiling and is unchanged here.

Tests: `cargo test -p math -p crypto` → 46 + 198 green. `make lint` clean.

* refactor(math): drop all 14 Polynomial operator impls + naive Lagrange (#618)

After #604 (boundary.rs eval-form) and #621 (end_exemptions_poly eval-form)
landed on main, no production code in the prover uses `Polynomial`
operators anymore. The only remaining user was `Polynomial::interpolate`
(test-only Lagrange interpolation, gated `#[cfg(test)]` and used as a
naive reference by the FFT cross-check tests). With both production
references gone, the operators + the naive interpolant can go too.

Removed:
- All 14 operator `impl`s (Add P+/-P, Neg P, Mul P*P, Sub P-FieldElement);
  the previous PR kept these four families because `transition.rs` and
  `boundary.rs` used them. They are now fully unreachable.
- `Polynomial::interpolate` and `InterpolateError` from `polynomial.rs`.
- Operator unit tests in `polynomial_tests.rs` (adding_, negating_,
  multiply_*) and their helpers (`polynomial_a`, `polynomial_b`,
  `polynomial_minus_a`, `polynomial_a_plus_b`).
- `compose` test helper + `composition_works` test (depended on the
  removed `interpolate`).
- Direct `interpolate_x_*_y_*` tests (tested the removed function).

Rewrote:
- FFT correctness tests (both `fft_tests.rs` and
  `fft_friendly_u64_goldilocks_tests.rs`): the
  `gen_fft_and_naive_interpolate` / `_coset_` helpers cross-checked
  `interpolate_fft` against naive Lagrange. Replaced with FFT
  round-trip via direct Horner evaluation — interpolate via FFT, then
  re-evaluate at every twiddle with `Polynomial::evaluate` (a different
  algorithm). Same independence property, no naive Lagrange needed.
- `simple_interpolating_polynomial_by_hand_works`: swapped `*` for
  `mul_with_ref(&...)` (the only remaining call site that needed it).

Verified:
- `cargo build --workspace` clean.
- `cargo test -p math` -> 178 passing.
- `cargo test -p lambda-vm-prover --lib` -> 281 pass / 77 pre-existing
  failures unrelated to constraints (same baseline as main).
- `make lint` clean across all three clippy configs.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant