Perf/fri merkle folding by diegokingston · Pull Request #611 · yetanotherco/lambda_vm

diegokingston · 2026-05-22T16:25:22Z

No description provided.

github-actions · 2026-05-22T16:27:34Z

Codex Code Review

Findings

Low Security / Bug: proof.rs lets cap_height exceed depth when cap.len() > num_leaves, then depth - cap_height underflows at line 179. A malformed external cap can make verify_capped panic in debug builds, or behave incorrectly in release. Add an explicit guard after computing both values:
```
if cap_height > depth {
    return false;
}
```

No other real issues found in the PR diff.

I attempted to run cargo test -p crypto merkle_cap_tests --quiet, but the environment could not create rustup temp files under /home/runner/.rustup because it is read-only.

claude · 2026-05-22T16:30:20Z

+        let cap_height = cap.len().trailing_zeros() as usize;
+        // Process level by level, from the leaves up to the cap level, same as
+        // `get_batch_auth_path_positions`.
+        for _ in 0..depth - cap_height {


Medium — DoS via usize underflow

depth - cap_height is an unchecked usize subtraction. If a caller passes a cap with more nodes than the tree has leaves (e.g. cap.len() = 2 * num_leaves), then cap_height > depth, which:

debug builds: panics immediately.

release builds: wraps to usize::MAX, causing the for loop to iterate ~2⁶⁴ times (effectively an infinite hang).

The is_power_of_two guard a few lines up doesn't prevent this — it only ensures the cap is a power of two, not that it's ≤ num_leaves.

The verifier is meant to be called with untrusted inputs, so this needs a bounds check before the arithmetic:

Suggested change

for _ in 0..depth - cap_height {

let depth = num_leaves.trailing_zeros() as usize;

let cap_height = cap.len().trailing_zeros() as usize;

if cap_height > depth {

return false;

}

// Process level by level, from the leaves up to the cap level, same as

// `get_batch_auth_path_positions`.

for _ in 0..depth - cap_height {

A Merkle cap commits the top 2^cap_height tree nodes instead of a single root, so every authentication path stops cap_height levels early — smaller proofs and cap_height fewer hash compressions per opening. The cap is just a slice of the existing heap-ordered node array (the nodes at tree depth cap_height), so no new tree storage is needed. Adds `cap`, `get_proof_by_pos_capped`, `get_batch_proof_capped`, and `Proof`/`BatchProof::verify_capped`; the existing root-based methods become cap_height=0 wrappers and are bit-for-bit unchanged in behavior. cap_height clamps to the tree depth, so small trees degrade gracefully to a single root. Stage 1 of the caps + arity-4 FRI plan; purely additive, no STARK-layer changes yet.

`FieldElementQuadBackend` hashes a fixed group of four field elements into one leaf. With arity-4 FRI folding a single fold orbit is four evaluations that are always opened together, so they belong under one leaf hash — this is the FRI-layer leaf for the arity-4 commit phase. Mirrors the existing `FieldElementPairBackend`; exposed as `QuadKeccak256Backend`. Building block for stage 2b (arity-4 FRI).

Commit the top 2^c Merkle nodes as a cap instead of a single root, so every opening path is c hashes shorter, and fold FRI by 4 per committed layer (two binary folds) to halve the number of FRI trees and paths. - StarkProof / prover / verifier carry MerkleCap commitments; preprocessed trees stay uncapped to preserve the AIR-hardcoded constants. - FRI layers commit quad-leaf trees (one leaf per arity-4 fold orbit); commit_phase does one uncommitted initial fold then number_layers/2 arity-4 layers, folding to a constant last value. - Verifier replays two challenges per committed layer and folds each 4-element orbit, with the (-1)^(index&1) twiddle parity handled. The CUDA pair-leaf FRI tree builder is now stale w.r.t. arity-4; its CPU-parity test is disabled with a TODO until the CUDA builder is updated.

- Guard against `cap_height > depth` underflow in `BatchProof::verify_capped` (Codex finding: a malformed proof with `cap.len() > num_leaves` would underflow `depth - cap_height` and panic in debug / wrap in release). - Validate FRI cap sizes in `verify_query_and_sym_openings`: each cap must be non-empty, a power of two, and at most `2^MERKLE_CAP_HEIGHT`. Prevents both the same underflow class and DoS from absurdly large caps multiplied per-query. - Fix doc comment on `commit_phase_from_evaluations`: the "extra binary fold for even number_layers" wording was misleading. The algorithm is correct for both parities — for odd number_layers the final 2-element slice is naturally constant because the polynomial is degree 0 by then. - Drop dangling `transcript.append_bytes(&proof.lde_trace_main_merkle_root)` in `replay_transcript_phase_a`: leftover from main's pre-cap version surfaced during the rebase; the field no longer exists on `StarkProof`. `cargo test -p stark -p crypto` (125 + 59) and `cargo test -p lambda-vm-prover --lib` (278 pass / 77 pre-existing failures) both stay green. `make lint` clean.

…/proof.rs untouched The earlier dead-code sweep removed `Deserializable` + `DeserializationError` from `math` because its only out-of-crate user was the `impl Deserializable` in `crypto/crypto/src/merkle_tree/proof.rs`, so the PR also dropped those impls. PR #611 (arity-4 FRI + Merkle caps) explicitly asks this PR to leave `proof.rs` alone, so revert that pair of deletions: - Restore `proof.rs` to main's contents (diff vs main: empty). - Restore `DeserializationError` enum and `From<ByteConversionError>` impl in `math/src/errors.rs`. - Restore `Deserializable` trait + the `DeserializationError` import in `math/src/traits.rs`. The `Serializable` half of the impl block in `proof.rs` is gated on `crypto/crypto`'s `alloc` feature, which no consumer in this repo enables — that code path was already non-compiling and is unchanged here. Tests: `cargo test -p math -p crypto` → 46 + 198 green. `make lint` clean.

* refactor(math): remove dead Deserializable trait and unreachable error variants `Deserializable` had a single impl (`Proof<T>`) whose only `deserialize` call site was inside that impl itself — a recursion with no external entry point. Production deserialization uses `bincode`/serde everywhere. Delete the trait and the `Proof<T>` impl. `merkle_tree/proof.rs` also carried a dead `Serializable` impl behind `#[cfg(feature = "alloc")]` — a feature the `crypto` crate does not even define, so the block never compiled. It referenced `math::traits::Serializable`, which does not exist. Remove it. Error cleanup (none constructed anywhere in the workspace): - `DeserializationError` enum + `From<ByteConversionError>` impl — only reachable through the now-deleted `Deserializable`. - `PairingError` enum — lambda_vm has no pairings. - `ByteConversionError::{InvalidValue, PointNotInSubgroup, ValueNotCompressed}` — the latter two are elliptic-curve concepts; there is no EC here. - `CreationError::InvalidDecString`. Pure dead-code removal; `cargo build`/`cargo test -p math` unchanged (216 tests pass). * refactor(math): remove vestigial helpers module `helpers::next_power_of_two(u64)` was bit-for-bit equivalent to stdlib `u64::next_power_of_two()`. Its only caller, `resize_to_next_power_of_two`, was used solely by the `fibonacci_rap` example — production trace sizing already calls stdlib `.next_power_of_two()` directly everywhere. Move `resize_to_next_power_of_two` into `examples/fibonacci_rap.rs` as a local helper (rewritten on `usize` with the stdlib call, dropping the old `try_into().unwrap()` TODO), and delete the `helpers` module. * refactor(math): gate evaluate_fft_bit_reversed behind cfg(test) `evaluate_fft_bit_reversed` was a non-gated `pub fn` but is only called from the FFT property tests — no production caller in stark or prover. Gate it `#[cfg(test)]` so it stops appearing in the production API surface. Restore the gate (or remove it) when a real caller appears. * refactor(math): gate test-only Polynomial::{truncate, reverse} Both methods are called only from `fast_division` / `invert_polynomial_mod`, which are themselves already `#[cfg(test)]`. No production caller in stark or prover. Gate them `#[cfg(test)]` so they leave the production API surface. `long_division_with_remainder` is left public — it is used by the debug-checks path (`check_boundary_polys_divisibility`) and is a legitimate general polynomial operation other code could want. * refactor(math): flatten fft/cpu/* up to fft/* The `fft/cpu/` directory implied a `fft/gpu/` sibling that never existed — GPU code lives in the separate `crypto/math-cuda` crate. The `cpu` nesting was a vestigial holdover. Move `bit_reversing`, `bowers_fft`, `roots_of_unity`, `fft`, and `bowers_fft_tests` from `fft/cpu/` up to `fft/`; delete `fft/cpu/mod.rs` and merge its declarations into `fft/mod.rs`. Update every `fft::cpu::...` / `super::cpu::...` path across math, stark, prover, and math-cuda to `fft::...`. Pure module move — no logic change. 216 math tests pass; stark, prover, math-cuda all build. * refactor(math): collapse polynomial/ folder into one polynomial.rs Two problems compounded: 1. `polynomial/` was a folder holding a single file (`mod.rs`) — the directory level served no purpose. 2. Two files were named `polynomial` (`polynomial/mod.rs` and `fft/polynomial.rs`), both defining `impl Polynomial<...>` blocks. Merge `fft/polynomial.rs` (the FFT-specific `impl Polynomial` blocks plus the `evaluate_fft_cpu` / `interpolate_fft_cpu` free functions) into `polynomial/mod.rs`, then rename it to a top-level `crypto/math/src/polynomial.rs` and delete the folder. Result: one `polynomial.rs` holding the `Polynomial` type and all of its methods (core ring ops + FFT). `fft/` is now purely the FFT *algorithm* layer (Bowers, twiddles, bit-reversal, roots of unity). Also delete the stale `polynomial/README.md` — inherited lambdaworks docs referencing multilinear-polynomial files that don't exist in lambda_vm. Updated the one external path (`fft::polynomial::compose_fft` -> `polynomial::compose_fft`). 216 math + 124 stark tests pass. * refactor(math): consolidate scattered tests into tests/ directory Three test conventions coexisted: the `tests/` directory, a sibling `bowers_fft_tests.rs` next to source, and an inline `#[cfg(test)] mod` in the polynomial module. Unify on the `tests/` directory. - Move `fft/bowers_fft_tests.rs` -> `tests/bowers_fft_tests.rs`, register it in `tests/mod.rs`, drop the `mod bowers_fft_tests;` from `fft/mod.rs`. - Move the inline coset-LDE `mod tests` out of `polynomial.rs` into a new `coset_lde_tests` module in `tests/fft_tests.rs`. - Rename `fft/fft.rs` -> `fft/reference_fft.rs`: after the Step 6 flatten it collided with its own parent module name (`fft::fft`), which clippy rejects. The new name also says what it is — the reference radix-2 FFT used only to cross-check the production Bowers FFT. Gate the module `#[cfg(test)] pub(crate)` since its whole content is test-only. 216 math tests pass (same count — no tests lost in the move). * refactor(math): drop redundant FFT cross-check code, relocate inline tests The reference radix-2 FFT existed only to cross-check the production Bowers FFT. Bowers is already verified directly against a naive O(n^2) DFT at the same sizes, so reference_fft.rs and its four comparison tests are dead weight. gate get_powers_of_primitive_root as test-only (production twiddles go through LayerTwiddles), collapse the evaluate_fft_cpu / interpolate_fft_cpu wrappers into their sole callers, and narrow coset_lde_full_into to pub(crate). Move the inline test modules from bit_reversing.rs, goldilocks.rs and extensions_goldilocks.rs into the dedicated tests/ directory. * refactor(math): remove dead polynomial-division cluster, fix zero-offset panic Addresses PR review feedback. Reviewers flagged that `#[cfg(test)]`-gating previously-public methods is a breaking change. Rather than keep them as `pub(crate)` dead code, the cluster is removed outright — none of it has a production caller: - Delete `Polynomial::{fast_division, invert_polynomial_mod, fast_fft_multiplication, truncate, reverse}` and `evaluate_fft_bit_reversed`, plus their 8 tests. The fast polynomial-division algorithm had no production use; it existed only to be tested. - `get_powers_of_primitive_root` stays `#[cfg(test)]` — it is a genuine test helper used to validate the production FFT, not dead code. Also fixes the pre-existing panic reviewers noted in `interpolate_offset_fft`: `offset.inv().unwrap()` panicked on a zero coset offset. Add `FFTError::InvalidCosetOffset` and propagate it instead. * refactor(math): remove dead Polynomial APIs and self-referential tests Continues the dead-code audit of polynomial.rs. Every item removed here has no production caller — each is a function reachable only from its own tests, or a test-file algorithm that tests itself. Removed from polynomial.rs: - to_extension (zero callers anywhere) - long_division_with_remainder, leading_coefficient - ruffini_division_inplace - interpolate_coset_eval_with_g_n_inv (base-field variant) Removed the matching self-referential tests/helpers from polynomial_tests.rs (division_works, division_by_zero_degree_polynomial_works, test_xgcd, ruffini_inplace_* and the div_with_ref / xgcd / ruffini_division helpers). The 3 barycentric tests that exercised the base-field interpolate_coset_eval_with_g_n_inv now call the production-used interpolate_coset_eval_ext_with_g_n_inv (instantiated E=F), so they keep covering the real code path. * refactor(math): keep Deserializable in math; leave crypto/merkle_tree/proof.rs untouched The earlier dead-code sweep removed `Deserializable` + `DeserializationError` from `math` because its only out-of-crate user was the `impl Deserializable` in `crypto/crypto/src/merkle_tree/proof.rs`, so the PR also dropped those impls. PR #611 (arity-4 FRI + Merkle caps) explicitly asks this PR to leave `proof.rs` alone, so revert that pair of deletions: - Restore `proof.rs` to main's contents (diff vs main: empty). - Restore `DeserializationError` enum and `From<ByteConversionError>` impl in `math/src/errors.rs`. - Restore `Deserializable` trait + the `DeserializationError` import in `math/src/traits.rs`. The `Serializable` half of the impl block in `proof.rs` is gated on `crypto/crypto`'s `alloc` feature, which no consumer in this repo enables — that code path was already non-compiling and is unchanged here. Tests: `cargo test -p math -p crypto` → 46 + 198 green. `make lint` clean. * refactor(math): drop all 14 Polynomial operator impls + naive Lagrange (#618) After #604 (boundary.rs eval-form) and #621 (end_exemptions_poly eval-form) landed on main, no production code in the prover uses `Polynomial` operators anymore. The only remaining user was `Polynomial::interpolate` (test-only Lagrange interpolation, gated `#[cfg(test)]` and used as a naive reference by the FFT cross-check tests). With both production references gone, the operators + the naive interpolant can go too. Removed: - All 14 operator `impl`s (Add P+/-P, Neg P, Mul P*P, Sub P-FieldElement); the previous PR kept these four families because `transition.rs` and `boundary.rs` used them. They are now fully unreachable. - `Polynomial::interpolate` and `InterpolateError` from `polynomial.rs`. - Operator unit tests in `polynomial_tests.rs` (adding_, negating_, multiply_*) and their helpers (`polynomial_a`, `polynomial_b`, `polynomial_minus_a`, `polynomial_a_plus_b`). - `compose` test helper + `composition_works` test (depended on the removed `interpolate`). - Direct `interpolate_x_*_y_*` tests (tested the removed function). Rewrote: - FFT correctness tests (both `fft_tests.rs` and `fft_friendly_u64_goldilocks_tests.rs`): the `gen_fft_and_naive_interpolate` / `_coset_` helpers cross-checked `interpolate_fft` against naive Lagrange. Replaced with FFT round-trip via direct Horner evaluation — interpolate via FFT, then re-evaluate at every twiddle with `Polynomial::evaluate` (a different algorithm). Same independence property, no naive Lagrange needed. - `simple_interpolating_polynomial_by_hand_works`: swapped `*` for `mul_with_ref(&...)` (the only remaining call site that needed it). Verified: - `cargo build --workspace` clean. - `cargo test -p math` -> 178 passing. - `cargo test -p lambda-vm-prover --lib` -> 281 pass / 77 pre-existing failures unrelated to constraints (same baseline as main). - `make lint` clean across all three clippy configs.

claude Bot reviewed May 22, 2026

View reviewed changes

diegokingston added 4 commits May 26, 2026 14:41

diegokingston force-pushed the perf/fri-merkle-folding branch from eb50cc2 to bb99e2d Compare May 26, 2026 17:55

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Perf/fri merkle folding#611

Perf/fri merkle folding#611
diegokingston wants to merge 4 commits into
mainfrom
perf/fri-merkle-folding

diegokingston commented May 22, 2026

Uh oh!

github-actions Bot commented May 22, 2026

Uh oh!

claude Bot May 22, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

diegokingston commented May 22, 2026

Uh oh!

github-actions Bot commented May 22, 2026

Codex Code Review

Uh oh!

claude Bot May 22, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant