Complex M31 field extension and NTT by weijiekoh · Pull Request #19 · worldfnd/provekit

weijiekoh · 2025-04-15T04:45:25Z

This draft PR adds a cm31_ntt member to ProveKit.

cm31_ntt is an implementation of the complex Mersenne-31 field extension (CM31), together with the number-theoretic transform (NTT) algorithm for polynomials with CM31 coefficients. It is optimised for the ARM platform and has been benchmarked on a Raspberry Pi 5.

`RF` and `CF`

CM31 builds upon Mersenne-31 (M31) field elements, whose arithmetic is implemented using a redundant representation. We represent these redundant M31 field elements as 32-bit unsigned integers wrapped in the RF type defined in src/rm31.rs. Please refer to this note by Solbert and Domb to learn more about the algorithms used.
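As a rough illustration of what a redundant representation means here, the sketch below stores an M31 element as any u32 in the right residue class modulo 2^31 - 1 and only reduces to a canonical value on demand. It is not the code in src/rm31.rs and does not follow the algorithms in the note; the type RedundantM31 and its methods are hypothetical names chosen for this example, and it leans on u64 intermediates for simplicity.

```rust
/// The Mersenne prime 2^31 - 1.
const P: u32 = (1 << 31) - 1;

/// Hypothetical stand-in for RF: any u32 value is accepted and stands
/// for its residue class modulo P (so 0 and P both mean zero).
#[derive(Clone, Copy, Debug)]
struct RedundantM31(u32);

impl RedundantM31 {
    /// Map the redundant value to its canonical representative in [0, P).
    fn reduce(self) -> u32 {
        // 2^31 = 1 (mod P), so bit 31 can be folded onto the low 31 bits.
        let x = (self.0 & P) + (self.0 >> 31); // at most P + 1
        if x >= P { x - P } else { x }
    }

    /// Addition without full reduction: the result is only folded
    /// far enough to fit back into a u32.
    fn add(self, rhs: Self) -> Self {
        let s = self.0 as u64 + rhs.0 as u64; // below 2^33
        let folded = (s & P as u64) + (s >> 31); // at most P + 3
        RedundantM31(folded as u32)
    }

    /// Multiplication with two folding steps and no division.
    fn mul(self, rhs: Self) -> Self {
        let prod = self.0 as u64 * rhs.0 as u64; // below 2^64
        let f1 = (prod & P as u64) + (prod >> 31); // below 5 * 2^31
        let f2 = (f1 & P as u64) + (f1 >> 31); // at most P + 4
        RedundantM31(f2 as u32)
    }
}
```

Calling reduce() is only needed when two values have to be compared or serialised; intermediate results can stay in redundant form.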

Each CM31 field element consists of a real RF part, and an imaginary RF part. The CF type and complex arithmetic operations are defined in src/cm31.rs.
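To make the complex arithmetic concrete, here is a minimal sketch of a CM31-style element with addition, multiplication, and multiplication by the imaginary unit j. It assumes canonical (fully reduced) coordinates held in u64 for simplicity, so it is not the CF type from src/cm31.rs; the name ComplexM31 and its methods are hypothetical.

```rust
/// The Mersenne prime 2^31 - 1.
const P: u64 = (1 << 31) - 1;

/// Hypothetical stand-in for CF: re + im * j with j^2 = -1,
/// both coordinates kept canonical in [0, P).
#[derive(Clone, Copy, Debug, PartialEq)]
struct ComplexM31 {
    re: u64,
    im: u64,
}

impl ComplexM31 {
    fn new(re: u64, im: u64) -> Self {
        ComplexM31 { re: re % P, im: im % P }
    }

    fn add(self, o: Self) -> Self {
        ComplexM31::new(self.re + o.re, self.im + o.im)
    }

    /// (a + bj)(c + dj) = (ac - bd) + (ad + bc)j
    fn mul(self, o: Self) -> Self {
        let (a, b, c, d) = (self.re, self.im, o.re, o.im);
        // Each product is below 2^62, so it fits in a u64.
        let re = (a * c % P + P - b * d % P) % P;
        let im = (a * d % P + b * c % P) % P;
        ComplexM31 { re, im }
    }

    /// Multiplying by j needs no field multiplication:
    /// (a + bj) * j = -b + aj.
    fn mul_j(self) -> Self {
        ComplexM31 { re: (P - self.im) % P, im: self.re }
    }
}
```

The mul_j shortcut is the kind of structure a radix-8 butterfly can exploit, since several of its twiddle factors are powers of j.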

NTT

ntt.rs contains an optimised implementation of the NTT algorithm. Use ntt() as follows:

// Assume that f is a Vec of CM31 coefficients
// and is at least 8 elements long.
// Assumption: n is the NTT size, i.e. the length of f.
let n = f.len();

// precomp contains precomputed twiddle factors.
// It can be computed in advance, and can be
// serialised and deserialised for storage and
// retrieval.
let precomp = precompute_twiddles(n).unwrap();
let res = ntt(&f, &precomp).unwrap();

Optimisations

The first optimisation is simply to precompute twiddle factors.
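The idea of a precomputed twiddle table can be sketched as follows. To keep the example short and self-contained it uses a toy prime field (65537, with 3 as a primitive root) instead of CM31, and the names TwiddleTable, precompute and get are hypothetical; the PR's precompute_twiddles() may be organised quite differently.

```rust
/// Toy NTT-friendly prime (an assumption for this sketch, not M31).
const P: u64 = 65537;
/// 3 is a primitive root modulo 65537.
const G: u64 = 3;

fn pow_mod(mut base: u64, mut exp: u64) -> u64 {
    let mut acc = 1u64;
    base %= P;
    while exp > 0 {
        if exp & 1 == 1 {
            acc = acc * base % P;
        }
        base = base * base % P;
        exp >>= 1;
    }
    acc
}

/// Hypothetical table holding w^0, w^1, ..., w^(n/2 - 1)
/// for a primitive n-th root of unity w.
struct TwiddleTable {
    n: usize,
    powers: Vec<u64>,
}

/// Build the table once; returns None for unsupported sizes.
fn precompute(n: usize) -> Option<TwiddleTable> {
    if n < 2 || !n.is_power_of_two() || n > 65536 {
        return None;
    }
    let w = pow_mod(G, (P - 1) / n as u64);
    let mut powers = Vec::with_capacity(n / 2);
    let mut t = 1u64;
    for _ in 0..n / 2 {
        powers.push(t);
        t = t * w % P;
    }
    Some(TwiddleTable { n, powers })
}

impl TwiddleTable {
    /// Twiddle k of a sub-transform of size `len` (a power of two
    /// dividing n, with k < len / 2), read by striding through the table.
    fn get(&self, len: usize, k: usize) -> u64 {
        self.powers[k * (self.n / len)]
    }
}
```

Because every smaller transform size reuses a strided view of the same table, one table per top-level size is enough, and it can be stored and reloaded instead of being recomputed.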

The most important optimisation combines two approaches: one that allocates memory at each level of recursion, and one that performs the NTT in place.

The traditional divide-and-conquer approach allocates memory at each level of recursion. It is very fast, but leaves some room for improvement. Take a look at ntt_r8_vec() and ntt_r8_vec_p() in ntt.rs. They are straightforward implementations of the divide-and-conquer algorithm (the former does not precompute twiddle factors, and the latter does), but they allocate new Vecs with each recursive call, resulting in some performance overhead.

An in-place algorithm (ntt_r8_ip and ntt_r8_ip_p) avoids memory allocation altogether. It is, however, extremely slow, especially past 32768 elements. This is likely due to the small CPU cache, which leads to costly cache misses.

We had a breakthrough when we found that, for NTT sizes below 262144, ntt_r8_vec_p() is slower than its in-place counterpart ntt_r8_ip_p. This led us to develop our most efficient algorithm by combining the two approaches. It is implemented in ntt_r8_hybrid_p(), which uses the divide-and-conquer approach via recursion, but once the recursion reaches an NTT size of NTT_BLOCK_SIZE_FOR_CACHE (hardcoded to 32768), it switches to the in-place algorithm.
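The control flow of such a hybrid can be sketched as below. This is only an illustration under loud assumptions: it is radix-2 rather than radix-8, it works over a toy prime field (65537) rather than CM31, it recomputes twiddles instead of using a precomputed table, and the threshold BLOCK is set to 8 so the switch is visible on small inputs. All function names are hypothetical.

```rust
const P: u64 = 65537; // toy prime, not M31
const G: u64 = 3; // primitive root modulo 65537
const BLOCK: usize = 8; // stand-in for NTT_BLOCK_SIZE_FOR_CACHE

fn pow_mod(mut b: u64, mut e: u64) -> u64 {
    let mut r = 1u64;
    b %= P;
    while e > 0 {
        if e & 1 == 1 { r = r * b % P; }
        b = b * b % P;
        e >>= 1;
    }
    r
}

/// Primitive n-th root of unity for a power of two n <= 65536.
fn root_of_unity(n: usize) -> u64 { pow_mod(G, (P - 1) / n as u64) }

/// Iterative in-place NTT: bit-reversal, then butterflies. No allocation.
fn ntt_in_place(a: &mut [u64], w: u64) {
    let n = a.len();
    let mut j = 0usize;
    for i in 1..n {
        let mut bit = n >> 1;
        while j & bit != 0 { j ^= bit; bit >>= 1; }
        j |= bit;
        if i < j { a.swap(i, j); }
    }
    let mut len = 2;
    while len <= n {
        let w_len = pow_mod(w, (n / len) as u64);
        for start in (0..n).step_by(len) {
            let mut t = 1u64;
            for k in 0..len / 2 {
                let u = a[start + k];
                let v = a[start + k + len / 2] * t % P;
                a[start + k] = (u + v) % P;
                a[start + k + len / 2] = (u + P - v) % P;
                t = t * w_len % P;
            }
        }
        len <<= 1;
    }
}

/// Recursive, allocating NTT that hands small blocks to the in-place routine.
fn ntt_hybrid(a: &[u64], w: u64) -> Vec<u64> {
    let n = a.len();
    if n <= BLOCK {
        let mut out = a.to_vec();
        ntt_in_place(&mut out, w);
        return out;
    }
    let even: Vec<u64> = a.iter().step_by(2).copied().collect();
    let odd: Vec<u64> = a.iter().skip(1).step_by(2).copied().collect();
    let w2 = w * w % P;
    let (e, o) = (ntt_hybrid(&even, w2), ntt_hybrid(&odd, w2));
    let mut out = vec![0u64; n];
    let mut t = 1u64;
    for k in 0..n / 2 {
        let v = o[k] * t % P;
        out[k] = (e[k] + v) % P;
        out[k + n / 2] = (e[k] + P - v) % P;
        t = t * w % P;
    }
    out
}

/// Quadratic-time reference used only to check the sketch.
fn ntt_naive(a: &[u64], w: u64) -> Vec<u64> {
    (0..a.len()).map(|k| {
        let wk = pow_mod(w, k as u64);
        let (mut acc, mut t) = (0u64, 1u64);
        for &x in a { acc = (acc + x * t) % P; t = t * wk % P; }
        acc
    }).collect()
}
```

The design point is that allocation happens only at the few large recursion levels, while the bulk of the butterflies run in place on blocks small enough to stay cache-resident.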

Benchmark results can be found in README.md.

recmo

LGTM, but could use some help making it more canonically Rust (which also helps performance).
@Dzejkop Can you help here?

recmo · 2025-05-01T04:37:19Z

+    let b5_j = b5.mul_j();
+    let b7_j = b7.mul_j();
+    let b6_w8 = b6 * W_8;
+    let b7_j_w8 = b7_j * W_8;


I'm surprised we are not exploiting the special structure here. Does the compiler turn this into bitshifts?

Resolved in commit 496dbce, though benchmarks don't show a significant improvement.
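For context on the "special structure" in question, here is a sketch of how multiplication by an 8th root of unity can avoid general field multiplications. It assumes W_8 = 2^15 (1 + j), which squares to j because 2^31 = 1 modulo 2^31 - 1; the constant actually used in the PR may be a different primitive 8th root. With that choice, (a + bj) * W_8 = 2^15 ((a - b) + (a + b) j), and multiplying a canonical M31 value by 2^15 is a 31-bit rotation. The type Cf31 and the function mul_by_w8_sketch are hypothetical and are not the mul_by_w8() from commit 496dbce.

```rust
/// The Mersenne prime 2^31 - 1.
const P: u32 = (1 << 31) - 1;

/// Hypothetical complex element with canonical coordinates in [0, P).
#[derive(Clone, Copy, Debug, PartialEq)]
struct Cf31 {
    re: u32,
    im: u32,
}

/// Assumed 8th root of unity: 2^15 + 2^15 * j.
const W_8: Cf31 = Cf31 { re: 1 << 15, im: 1 << 15 };

fn add(a: u32, b: u32) -> u32 {
    let s = a + b; // both inputs are below 2^31, so no u32 overflow
    if s >= P { s - P } else { s }
}

fn sub(a: u32, b: u32) -> u32 {
    if a >= b { a - b } else { a + P - b }
}

fn mul(a: u32, b: u32) -> u32 {
    ((a as u64 * b as u64) % P as u64) as u32
}

/// Multiply a canonical value by 2^15 modulo P via a 31-bit left rotation.
fn rotl15(x: u32) -> u32 {
    ((x << 15) & P) | (x >> 16)
}

/// General complex multiplication: four field multiplications.
fn mul_generic(x: Cf31, y: Cf31) -> Cf31 {
    Cf31 {
        re: sub(mul(x.re, y.re), mul(x.im, y.im)),
        im: add(mul(x.re, y.im), mul(x.im, y.re)),
    }
}

/// Multiplication by W_8 using one add, one sub and two rotations.
fn mul_by_w8_sketch(x: Cf31) -> Cf31 {
    Cf31 {
        re: rotl15(sub(x.re, x.im)),
        im: rotl15(add(x.re, x.im)),
    }
}
```

Whether this beats the compiler's output for a plain multiplication by a constant depends on the target; the reply above reports no significant difference in the PR's benchmarks.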

recmo · 2025-05-01T04:38:11Z

+
+/// A radix-8 NTT butterfly.
+#[inline]
+pub fn ntt_block_8(


@xrvdg We will want to aggressively optimize this function.

weijiekoh · 2025-05-02T18:18:41Z

Thanks for the feedback and suggestions! I'll incorporate them into a new commit.

weijiekoh · 2025-05-04T04:01:09Z

Thank you @Dzejkop! Your suggestions are super helpful. Things look much cleaner and more consistent now!

recmo · 2025-05-06T12:30:11Z

Thank you @Dzejkop and @weijiekoh. This looks good to merge (we can always follow up with further PRs).

Complex M31 field extension and NTT

weijiekoh added 5 commits April 14, 2025 21:31

cm31_ntt

a8febd8

ntt_2_24_optimisation_attempt benchmark

1e866b4

pass inputs and twiddle factors as params to ntt_block_8

49de383

slightly reduced memory alloc in ntt_block_8()

76d0707

Use a 32768-block for NTT and cache precomputed twiddles

e3187dd

weijiekoh force-pushed the main branch from 85b78ef to a2fd5f4 Compare April 21, 2025 21:43

changed the underlying data type of rm31 to u32

6309518

weijiekoh force-pushed the main branch from a2fd5f4 to e74a219 Compare April 23, 2025 04:02

Updated readme with benchmarks; fixed benchmarks

16b4451

weijiekoh force-pushed the main branch from e74a219 to 16b4451 Compare April 23, 2025 04:10

weijiekoh added 5 commits April 29, 2025 16:28

ntt_r8_s2_hybrid_p and ntt_r8_s4_hybrid_p

432c806

updated readme with benchmarks for ntt_r8_s2_hybrid_p and ntt_r8_s4_h…

50c1028

…ybrid_p

Refactor precomputation function and params

0245903

serialisation and deserialisation of PrecomputedTwiddles

3badc85

tweaked benches/ntt_r8_hybrid_p.rs; deleted unused old files

c27b45a

weijiekoh marked this pull request as ready for review April 30, 2025 21:08

recmo approved these changes May 1, 2025

recmo requested a review from Dzejkop May 1, 2025 04:52

Dzejkop reviewed May 2, 2025

Comment thread cm31_ntt/src/rm31.rs Outdated

Dzejkop reviewed May 2, 2025

Comment thread cm31_ntt/src/rm31.rs Outdated

Dzejkop reviewed May 2, 2025

Comment thread cm31_ntt/src/ntt_utils.rs Outdated

resolve most PR worldfnd#19 comments

2bf0322

Dzejkop reviewed May 2, 2025

Comment thread cm31_ntt/src/ntt.rs Outdated

Dzejkop reviewed May 2, 2025

Comment thread cm31_ntt/src/ntt.rs Outdated

Dzejkop reviewed May 2, 2025

Comment thread cm31_ntt/src/ntt.rs Outdated

Dzejkop reviewed May 2, 2025

Comment thread cm31_ntt/src/ntt.rs Outdated

Dzejkop reviewed May 2, 2025

Comment thread cm31_ntt/src/ntt.rs Outdated

Dzejkop reviewed May 2, 2025

Comment thread cm31_ntt/src/ntt.rs Outdated

Dzejkop reviewed May 2, 2025

Comment thread cm31_ntt/src/ntt.rs Outdated

Dzejkop reviewed May 2, 2025

Comment thread cm31_ntt/src/ntt.rs Outdated

Dzejkop reviewed May 2, 2025

Comment thread cm31_ntt/src/ntt.rs

resolve more PR worldfnd#19 comments

aaddb59

Dzejkop reviewed May 3, 2025

Comment thread cm31_ntt/src/ntt.rs

Dzejkop reviewed May 3, 2025

Comment thread cm31_ntt/src/ntt.rs Outdated

Dzejkop reviewed May 3, 2025

Comment thread cm31_ntt/src/ntt.rs Outdated

Dzejkop reviewed May 3, 2025

Comment thread cm31_ntt/src/ntt.rs Outdated

Dzejkop reviewed May 3, 2025

Comment thread cm31_ntt/src/ntt.rs Outdated

Dzejkop reviewed May 3, 2025

Comment thread cm31_ntt/src/ntt.rs Outdated

weijiekoh added 6 commits May 3, 2025 14:40

re-enabled ntt_block_8 benchmark

a6e2bfd

optimised mul_by_w8() instead of * W_8

496dbce

updated benchmarks in the readme

5c63090

use anyhow::ensure and anyhow::Result consistently

5b2a8d9

added fuzz tests for rm31 arith ops

22a45aa

added safety comments

719c4f9

recmo merged commit 1503b1f into worldfnd:main May 6, 2025
1 check failed

dcbuild3r pushed a commit that referenced this pull request May 16, 2026

resolve most PR #19 comments

e1431e9

dcbuild3r pushed a commit that referenced this pull request May 16, 2026

resolve more PR #19 comments

aa46bc8

dcbuild3r pushed a commit that referenced this pull request May 16, 2026

Merge pull request #19 from weijiekoh/main

784b2d5
