
Complex M31 field extension and NTT #19

Merged
recmo merged 20 commits into worldfnd:main from weijiekoh:main on May 6, 2025


@weijiekoh (Contributor) commented on Apr 15, 2025

This draft PR adds cm31_ntt, a new workspace member crate, to ProveKit.

cm31_ntt implements the complex Mersenne-31 field extension (CM31) and the number-theoretic transform (NTT) for polynomials with CM31 coefficients. It is optimised for ARM and has been benchmarked on a Raspberry Pi 5.

RF and CF

CM31 is built on Mersenne-31 (M31) field elements, whose arithmetic uses a redundant representation. We represent these redundant M31 field elements as 32-bit unsigned integers wrapped in the RF type, defined in src/rm31.rs. Please refer to this note by Solbert and Domb to learn more about the algorithms used.
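One way a redundant representation can pay off is sketched below. It is an illustration under assumptions, not the code in src/rm31.rs: the functions reduce, add, mul and canonical are hypothetical. Because 2^31 ≡ 1 (mod p) for p = 2^31 - 1, a 62-bit product can be reduced by folding the high bits onto the low bits twice. If p itself is allowed as an alias for 0, the result fits in [0, p] and the final conditional subtraction can be deferred until a canonical value is actually needed.

```rust
/// Mersenne-31 prime p = 2^31 - 1.
const P: u32 = (1u32 << 31) - 1;

/// Reduce x < 2^62 modulo p by folding, using 2^31 ≡ 1 (mod p).
/// The result lies in [0, p]: p is a redundant alias for 0, so no
/// conditional subtraction is needed here.
fn reduce(x: u64) -> u32 {
    let p = P as u64;
    let x = (x & p) + (x >> 31); // at most 2^32 - 2
    let x = (x & p) + (x >> 31); // at most p
    x as u32
}

/// Add two redundant values in [0, p]; the result is also in [0, p].
fn add(a: u32, b: u32) -> u32 {
    reduce(a as u64 + b as u64)
}

/// Multiply two redundant values in [0, p]; the product is below 2^62.
fn mul(a: u32, b: u32) -> u32 {
    reduce(a as u64 * b as u64)
}

/// Map a redundant value in [0, p] to its canonical form in [0, p).
fn canonical(x: u32) -> u32 {
    if x == P { 0 } else { x }
}

fn main() {
    // 2 * 2^30 = 2^31, which is congruent to 1 modulo p.
    println!("{}", canonical(mul(2, 1 << 30)));
}
```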

Each CM31 field element consists of a real RF part and an imaginary RF part. The CF type and the complex arithmetic operations are defined in src/cm31.rs.
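The complex arithmetic follows from defining CM31 as M31[i]/(i^2 + 1). Since p = 2^31 - 1 ≡ 3 (mod 4), -1 is not a square modulo p, so i^2 + 1 is irreducible and CM31 is a field with p^2 elements. The sketch below uses a hypothetical Cm31 struct with canonical (non-redundant) parts in place of the crate's CF and RF types. It shows schoolbook multiplication and the cheap multiplication by i that a method like mul_j can exploit.

```rust
/// Mersenne-31 prime p = 2^31 - 1.
const P: u64 = (1u64 << 31) - 1;

/// Hypothetical stand-in for CF: the element re + im * i, with both parts
/// kept canonical in [0, p) (the crate uses redundant RF parts instead).
#[derive(Clone, Copy, Debug, PartialEq)]
struct Cm31 {
    re: u64,
    im: u64,
}

impl Cm31 {
    fn new(re: u64, im: u64) -> Self {
        Cm31 { re: re % P, im: im % P }
    }

    fn add(self, o: Self) -> Self {
        Cm31::new(self.re + o.re, self.im + o.im)
    }

    fn sub(self, o: Self) -> Self {
        Cm31::new(self.re + P - o.re, self.im + P - o.im)
    }

    /// (a + bi)(c + di) = (ac - bd) + (ad + bc)i, since i^2 = -1.
    fn mul(self, o: Self) -> Self {
        let re = self.re * o.re % P + P - self.im * o.im % P;
        let im = self.re * o.im + self.im * o.re;
        Cm31::new(re, im)
    }

    /// Multiply by i: (a + bi) * i = -b + ai. No field multiplication needed.
    fn mul_j(self) -> Self {
        Cm31::new(P - self.im, self.re)
    }
}

fn main() {
    let x = Cm31::new(1, 2);
    let y = Cm31::new(3, 4);
    println!("{:?}", x.mul(y));
}
```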

NTT

ntt.rs contains an optimised implementation of the NTT algorithm. Use ntt() as follows:

```rust
// Assume that f is a Vec of CM31 coefficients
// and is at least 8 elements long.
let n = f.len();

// precomp contains precomputed twiddle factors for size n.
// It can be computed in advance, and it can easily be
// serialised and deserialised for storage and retrieval.
let precomp = precompute_twiddles(n).unwrap();
let res = ntt(&f, &precomp).unwrap();
```
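When testing, a quadratic-time DFT can serve as a reference for what an NTT computes: out[k] = sum over j of f[j] * w^(j*k), where w is a primitive n-th root of unity. The sketch below is an assumption-laden illustration, not the crate's API. It uses a hypothetical tuple struct C with canonical arithmetic and takes W_8 = 2^15(1 + i) as a primitive 8th root of unity, which works because (2^15(1 + i))^2 = 2^30 * 2i = 2^31 * i = i. The crate's ntt() may use a different root of unity or output ordering, so those conventions should be confirmed before comparing outputs.

```rust
/// Mersenne-31 prime p = 2^31 - 1.
const P: u64 = (1u64 << 31) - 1;

/// Hypothetical CM31 element (re, im), both canonical in [0, p).
#[derive(Clone, Copy, Debug, PartialEq)]
struct C(u64, u64);

fn add(x: C, y: C) -> C {
    C((x.0 + y.0) % P, (x.1 + y.1) % P)
}

/// (a + bi)(c + di) = (ac - bd) + (ad + bc)i, using i^2 = -1.
fn mul(x: C, y: C) -> C {
    C((x.0 * y.0 % P + P - x.1 * y.1 % P) % P, (x.0 * y.1 + x.1 * y.0) % P)
}

/// Assumed primitive 8th root of unity: W_8 = 2^15 (1 + i).
const W8: C = C(1 << 15, 1 << 15);

/// Naive O(n^2) DFT: out[k] = sum_j f[j] * w^(j*k).
fn naive_dft(f: &[C], w: C) -> Vec<C> {
    let mut out = Vec::with_capacity(f.len());
    let mut wk = C(1, 0); // current power w^k
    for _ in 0..f.len() {
        // Horner's rule evaluates the polynomial f at w^k.
        out.push(f.iter().rev().fold(C(0, 0), |acc, &c| add(mul(acc, wk), c)));
        wk = mul(wk, w);
    }
    out
}

fn main() {
    // The DFT of the all-ones vector of length 8 is (8, 0, ..., 0).
    let ones = vec![C(1, 0); 8];
    println!("{:?}", naive_dft(&ones, W8));
}
```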

Optimisations

The first optimisation is simply to precompute twiddle factors.
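Precomputing twiddle factors amounts to computing the powers of the root of unity once and reusing them across transforms. A minimal sketch, assuming a radix-2 layout in which a size-n transform needs w^0 through w^(n/2 - 1). The crate's precompute_twiddles() targets radix-8 and its table layout may differ; C, W8 and twiddle_table are hypothetical names.

```rust
/// Mersenne-31 prime p = 2^31 - 1.
const P: u64 = (1u64 << 31) - 1;

/// Hypothetical CM31 element (re, im), both canonical in [0, p).
#[derive(Clone, Copy, Debug, PartialEq)]
struct C(u64, u64);

/// (a + bi)(c + di) = (ac - bd) + (ad + bc)i, using i^2 = -1.
fn mul(x: C, y: C) -> C {
    C((x.0 * y.0 % P + P - x.1 * y.1 % P) % P, (x.0 * y.1 + x.1 * y.0) % P)
}

/// Assumed primitive 8th root of unity: W_8 = 2^15 (1 + i).
const W8: C = C(1 << 15, 1 << 15);

/// Build the table [w^0, w^1, ..., w^(n/2 - 1)] by repeated multiplication.
fn twiddle_table(w: C, n: usize) -> Vec<C> {
    let mut table = Vec::with_capacity(n / 2);
    let mut cur = C(1, 0);
    for _ in 0..n / 2 {
        table.push(cur);
        cur = mul(cur, w);
    }
    table
}

fn main() {
    println!("{:?}", twiddle_table(W8, 8));
}
```

Since the table is just a Vec of field elements, it is straightforward to serialise once and load later instead of recomputing it.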

The most important optimisation is to combine an approach that allocates memory on each recursive call with one that performs the NTT in place.

The traditional divide-and-conquer approach allocates memory on each recursive call. It is very fast, but leaves some room for improvement. Take a look at ntt_r8_vec() and ntt_r8_vec_p() in ntt.rs. They are straightforward implementations of the divide-and-conquer algorithm (the former does not precompute twiddle factors; the latter does), but they allocate new Vecs on each recursive call, which adds some performance overhead.
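The allocation pattern can be seen in a simplified recursive transform. The sketch below is radix-2 rather than radix-8 to stay short, and it is not ntt_r8_vec(): it only illustrates how each recursive call allocates fresh even and odd Vecs before combining the halves with butterflies. C, W8 and ntt_rec are hypothetical, and W_8 = 2^15(1 + i) is assumed as the primitive 8th root of unity.

```rust
/// Mersenne-31 prime p = 2^31 - 1.
const P: u64 = (1u64 << 31) - 1;

/// Hypothetical CM31 element (re, im), both canonical in [0, p).
#[derive(Clone, Copy, Debug, PartialEq)]
struct C(u64, u64);

fn add(x: C, y: C) -> C { C((x.0 + y.0) % P, (x.1 + y.1) % P) }
fn sub(x: C, y: C) -> C { C((x.0 + P - y.0) % P, (x.1 + P - y.1) % P) }
/// (a + bi)(c + di) = (ac - bd) + (ad + bc)i, using i^2 = -1.
fn mul(x: C, y: C) -> C {
    C((x.0 * y.0 % P + P - x.1 * y.1 % P) % P, (x.0 * y.1 + x.1 * y.0) % P)
}

/// Assumed primitive 8th root of unity: W_8 = 2^15 (1 + i).
const W8: C = C(1 << 15, 1 << 15);

/// Recursive radix-2 Cooley-Tukey NTT; w must be a primitive f.len()-th root of unity.
fn ntt_rec(f: &[C], w: C) -> Vec<C> {
    let n = f.len();
    if n == 1 {
        return f.to_vec();
    }
    // Each call allocates two new Vecs for the even- and odd-indexed coefficients.
    let even: Vec<C> = f.iter().step_by(2).copied().collect();
    let odd: Vec<C> = f.iter().skip(1).step_by(2).copied().collect();
    let w2 = mul(w, w); // primitive (n/2)-th root of unity
    let e = ntt_rec(&even, w2);
    let o = ntt_rec(&odd, w2);
    let mut out = vec![C(0, 0); n];
    let mut t = C(1, 0); // running twiddle w^k
    for k in 0..n / 2 {
        let v = mul(t, o[k]);
        out[k] = add(e[k], v);
        out[k + n / 2] = sub(e[k], v);
        t = mul(t, w);
    }
    out
}

fn main() {
    let f: Vec<C> = (0..8u64).map(|j| C(j, 0)).collect();
    println!("{:?}", ntt_rec(&f, W8));
}
```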

The in-place algorithms (ntt_r8_ip and ntt_r8_ip_p), which avoid memory allocation altogether, are however extremely slow, especially past 32768 elements. This is likely because the CPU cache is small, so larger transforms incur costly cache misses.
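An in-place transform permutes the input once (bit reversal) and then runs every butterfly stage over the same buffer, so it allocates nothing. The minimal radix-2 sketch below is not ntt_r8_ip(); C, pow and ntt_in_place are hypothetical. Note that every stage sweeps the whole buffer, so once the buffer no longer fits in cache, each of the log2(n) sweeps has to stream it from main memory again, which is consistent with the cache-miss explanation above.

```rust
/// Mersenne-31 prime p = 2^31 - 1.
const P: u64 = (1u64 << 31) - 1;

/// Hypothetical CM31 element (re, im), both canonical in [0, p).
#[derive(Clone, Copy, Debug, PartialEq)]
struct C(u64, u64);

fn add(x: C, y: C) -> C { C((x.0 + y.0) % P, (x.1 + y.1) % P) }
fn sub(x: C, y: C) -> C { C((x.0 + P - y.0) % P, (x.1 + P - y.1) % P) }
/// (a + bi)(c + di) = (ac - bd) + (ad + bc)i, using i^2 = -1.
fn mul(x: C, y: C) -> C {
    C((x.0 * y.0 % P + P - x.1 * y.1 % P) % P, (x.0 * y.1 + x.1 * y.0) % P)
}

/// Assumed primitive 8th root of unity: W_8 = 2^15 (1 + i).
const W8: C = C(1 << 15, 1 << 15);

/// Square-and-multiply exponentiation.
fn pow(mut b: C, mut e: u64) -> C {
    let mut r = C(1, 0);
    while e > 0 {
        if e & 1 == 1 { r = mul(r, b); }
        b = mul(b, b);
        e >>= 1;
    }
    r
}

/// Iterative in-place radix-2 NTT; a.len() must be a power of two
/// and w a primitive a.len()-th root of unity.
fn ntt_in_place(a: &mut [C], w: C) {
    let n = a.len();
    assert!(n.is_power_of_two());
    if n == 1 { return; }
    let bits = n.trailing_zeros();
    // Bit-reversal permutation, after which every stage can work in place.
    for i in 0..n {
        let j = i.reverse_bits() >> (usize::BITS - bits);
        if i < j { a.swap(i, j); }
    }
    let mut len = 2;
    while len <= n {
        let wl = pow(w, (n / len) as u64); // primitive len-th root of unity
        for start in (0..n).step_by(len) {
            let mut t = C(1, 0);
            for k in 0..len / 2 {
                let u = a[start + k];
                let v = mul(t, a[start + k + len / 2]);
                a[start + k] = add(u, v);
                a[start + k + len / 2] = sub(u, v);
                t = mul(t, wl);
            }
        }
        len *= 2;
    }
}

fn main() {
    let mut f: Vec<C> = (0..8u64).map(|j| C(j, 0)).collect();
    ntt_in_place(&mut f, W8);
    println!("{:?}", f);
}
```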

We had a breakthrough when we found that, for NTT sizes below 262144, ntt_r8_vec_p() is slower than its in-place counterpart ntt_r8_ip_p(). This led us to our most efficient algorithm, which combines the two approaches. It is implemented in ntt_r8_hybrid_p(), which recurses using the divide-and-conquer approach but switches to the in-place algorithm once the size of a sub-NTT reaches NTT_BLOCK_SIZE_FOR_CACHE (hardcoded to 32768).
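The hybrid strategy can be sketched by bolting the two previous ideas together: recurse with allocated halves while the sub-transform is large, and hand small sub-transforms to the in-place routine. This radix-2 sketch is not ntt_r8_hybrid_p(). BLOCK is a hypothetical stand-in for NTT_BLOCK_SIZE_FOR_CACHE, set to 4 here only so that the switch is exercised on an 8-element input; C, W8 and the functions are hypothetical as well.

```rust
/// Mersenne-31 prime p = 2^31 - 1.
const P: u64 = (1u64 << 31) - 1;

/// Hypothetical CM31 element (re, im), both canonical in [0, p).
#[derive(Clone, Copy, Debug, PartialEq)]
struct C(u64, u64);

fn add(x: C, y: C) -> C { C((x.0 + y.0) % P, (x.1 + y.1) % P) }
fn sub(x: C, y: C) -> C { C((x.0 + P - y.0) % P, (x.1 + P - y.1) % P) }
/// (a + bi)(c + di) = (ac - bd) + (ad + bc)i, using i^2 = -1.
fn mul(x: C, y: C) -> C {
    C((x.0 * y.0 % P + P - x.1 * y.1 % P) % P, (x.0 * y.1 + x.1 * y.0) % P)
}

/// Assumed primitive 8th root of unity: W_8 = 2^15 (1 + i).
const W8: C = C(1 << 15, 1 << 15);

/// Hypothetical cache-block threshold (the crate hardcodes 32768).
const BLOCK: usize = 4;

fn pow(mut b: C, mut e: u64) -> C {
    let mut r = C(1, 0);
    while e > 0 {
        if e & 1 == 1 { r = mul(r, b); }
        b = mul(b, b);
        e >>= 1;
    }
    r
}

/// In-place radix-2 NTT (bit reversal, then butterfly stages).
fn ntt_in_place(a: &mut [C], w: C) {
    let n = a.len();
    assert!(n.is_power_of_two());
    if n == 1 { return; }
    let bits = n.trailing_zeros();
    for i in 0..n {
        let j = i.reverse_bits() >> (usize::BITS - bits);
        if i < j { a.swap(i, j); }
    }
    let mut len = 2;
    while len <= n {
        let wl = pow(w, (n / len) as u64);
        for start in (0..n).step_by(len) {
            let mut t = C(1, 0);
            for k in 0..len / 2 {
                let (u, v) = (a[start + k], mul(t, a[start + k + len / 2]));
                a[start + k] = add(u, v);
                a[start + k + len / 2] = sub(u, v);
                t = mul(t, wl);
            }
        }
        len *= 2;
    }
}

/// Hybrid NTT: allocate and recurse while large, go in place once n <= BLOCK.
fn ntt_hybrid(f: &[C], w: C) -> Vec<C> {
    let n = f.len();
    if n <= BLOCK {
        let mut a = f.to_vec();
        ntt_in_place(&mut a, w);
        return a;
    }
    let even: Vec<C> = f.iter().step_by(2).copied().collect();
    let odd: Vec<C> = f.iter().skip(1).step_by(2).copied().collect();
    let w2 = mul(w, w);
    let (e, o) = (ntt_hybrid(&even, w2), ntt_hybrid(&odd, w2));
    let mut out = vec![C(0, 0); n];
    let mut t = C(1, 0);
    for k in 0..n / 2 {
        let v = mul(t, o[k]);
        out[k] = add(e[k], v);
        out[k + n / 2] = sub(e[k], v);
        t = mul(t, w);
    }
    out
}

fn main() {
    let f: Vec<C> = (0..8u64).map(|j| C(j, 0)).collect();
    println!("{:?}", ntt_hybrid(&f, W8));
}
```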

Benchmark results can be found in README.md.

@weijiekoh marked this pull request as ready for review on April 30, 2025, 21:08

@recmo (Contributor) left a comment

LGTM, but it could use some help to make it more idiomatic Rust (which also helps performance).
@Dzejkop Can you help here?

Comment thread cm31_ntt/src/cm31.rs Outdated
Comment thread cm31_ntt/src/cm31.rs Outdated
Comment thread cm31_ntt/src/cm31.rs Outdated
Comment thread cm31_ntt/src/cm31.rs Outdated
Comment thread cm31_ntt/src/ntt_utils.rs Outdated
```rust
let b5_j = b5.mul_j();
let b7_j = b7.mul_j();
let b6_w8 = b6 * W_8;
let b7_j_w8 = b7_j * W_8;
```
Contributor


I'm surprised we are not exploiting the special structure here. Does the compiler turn this into bitshifts?

Contributor Author


Resolved in commit 496dbce, though benchmarks don't show a significant improvement.
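The special structure mentioned in this thread can be illustrated as follows, assuming W_8 = 2^15(1 + i), which is one valid primitive 8th root of unity in CM31 (the crate's W_8 may be a different one). Since 2^31 ≡ 1 (mod p), multiplying a canonical 31-bit M31 value by 2^k is a 31-bit left rotation, and (a + bi)(1 + i) = (a - b) + (a + b)i. Multiplying by W_8 therefore needs only one addition, one subtraction and two rotations, while mul_j needs only a negation and a swap. This sketch is not the code from commit 496dbce; C, rot, mul_w8 and mul_j here are hypothetical.

```rust
/// Mersenne-31 prime p = 2^31 - 1.
const P: u32 = (1u32 << 31) - 1;

/// Hypothetical CM31 element (re, im), both canonical in [0, p).
#[derive(Clone, Copy, Debug, PartialEq)]
struct C(u32, u32);

fn add(a: u32, b: u32) -> u32 { let s = a + b; if s >= P { s - P } else { s } }
fn sub(a: u32, b: u32) -> u32 { if a >= b { a - b } else { a + P - b } }

/// Generic CM31 multiplication, used here only as a reference.
fn mul(x: C, y: C) -> C {
    let m = |a: u32, b: u32| ((a as u64 * b as u64) % P as u64) as u32;
    C(sub(m(x.0, y.0), m(x.1, y.1)), add(m(x.0, y.1), m(x.1, y.0)))
}

/// Multiply a canonical M31 value by 2^k (0 < k < 31) via a 31-bit rotation.
fn rot(x: u32, k: u32) -> u32 {
    ((x << k) | (x >> (31 - k))) & P
}

/// Multiply by W_8 = 2^15 (1 + i): 2^15 * ((a - b) + (a + b)i).
fn mul_w8(x: C) -> C {
    C(rot(sub(x.0, x.1), 15), rot(add(x.0, x.1), 15))
}

/// Multiply by i: (a + bi) * i = -b + ai.
fn mul_j(x: C) -> C {
    C(sub(0, x.1), x.0)
}

fn main() {
    let x = C(5, 7);
    println!("{:?} {:?}", mul_w8(x), mul_j(x));
}
```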

Comment thread cm31_ntt/src/ntt_utils.rs

/// A radix-8 NTT butterfly.
#[inline]
pub fn ntt_block_8(
Contributor


@xrvdg We will want to aggressively optimize this function.

Comment thread cm31_ntt/src/ntt_utils.rs
Comment thread cm31_ntt/src/rm31.rs Outdated
Comment thread cm31_ntt/src/rm31.rs Outdated
Comment thread cm31_ntt/src/ntt.rs Outdated
@recmo recmo requested a review from Dzejkop May 1, 2025 04:52
Comment thread cm31_ntt/src/rm31.rs Outdated
Comment thread cm31_ntt/src/rm31.rs Outdated
Comment thread cm31_ntt/src/ntt_utils.rs Outdated
@weijiekoh
Contributor Author

Thanks for the feedback and suggestions! I'll incorporate them into a new commit.

Comment thread cm31_ntt/src/ntt.rs Outdated
Comment thread cm31_ntt/src/ntt.rs Outdated
Comment thread cm31_ntt/src/ntt.rs Outdated
Comment thread cm31_ntt/src/ntt.rs Outdated
Comment thread cm31_ntt/src/ntt.rs Outdated
Comment thread cm31_ntt/src/ntt.rs Outdated
Comment thread cm31_ntt/src/ntt.rs Outdated
Comment thread cm31_ntt/src/ntt.rs Outdated
Comment thread cm31_ntt/src/ntt.rs
Comment thread cm31_ntt/src/ntt.rs
Comment thread cm31_ntt/src/ntt.rs Outdated
Comment thread cm31_ntt/src/ntt.rs Outdated
Comment thread cm31_ntt/src/ntt.rs Outdated
Comment thread cm31_ntt/src/ntt.rs Outdated
Comment thread cm31_ntt/src/ntt.rs Outdated
@weijiekoh
Contributor Author

Thank you, @Dzejkop! Your suggestions are super helpful. Things look much cleaner and more consistent now!

@recmo merged commit 1503b1f into worldfnd:main on May 6, 2025
1 check failed
@recmo
Contributor

recmo commented May 6, 2025

Thank you @Dzejkop and @weijiekoh. This looks good to merge (we can always follow up with further PRs).

dcbuild3r pushed a commit that referenced this pull request May 16, 2026
dcbuild3r pushed a commit that referenced this pull request May 16, 2026
dcbuild3r pushed a commit that referenced this pull request May 16, 2026
Complex M31 field extension and NTT

Labels

None yet

Projects

None yet


3 participants