Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

chore: Speed up random poly generation #742

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

CPerezz
Copy link
Contributor

@CPerezz CPerezz commented Feb 27, 2023

As noted in
privacy-scaling-explorations#151 the generation of a random poly for degrees bigger than 20 starts to get quite slow.

This PR tries to include some minimal changes in the commit fn so that we upstream the improvements achieved in PSE/halo2

Here I forward the benches that I implemented in the PSE PR:

Blinder_poly/serial/18  time:   [28.717 ms 28.779 ms 28.860 ms]                                   
                        change: [+0.0164% +0.2972% +0.5927%] (p = 0.05 < 0.05)
                       
Blinder_poly/parallel/18                                                                             
                        time:   [2.4532 ms 2.4792 ms 2.5064 ms]
                        change: [+0.7593% +2.2239% +3.7108%] (p = 0.00 < 0.05)
                        


Blinder_poly/serial/19  time:   [57.679 ms 57.764 ms 57.866 ms]                                   
                        change: [-1.1429% -0.7681% -0.4110%] (p = 0.00 < 0.05)
                        
Blinder_poly/parallel/19                                                                             
                        time:   [5.4919 ms 5.5360 ms 5.5832 ms]
                        change: [-0.2678% +0.8128% +1.9086%] (p = 0.14 > 0.05)
                        


Blinder_poly/serial/20  time:   [124.00 ms 124.12 ms 124.26 ms]                                   
                        change: [+0.4412% +0.6006% +0.7628%] (p = 0.00 < 0.05)
                        
Blinder_poly/parallel/20                                                                            
                        time:   [19.797 ms 19.868 ms 19.948 ms]
                        change: [-0.8847% -0.3244% +0.2474%] (p = 0.27 > 0.05)
                        


Blinder_poly/serial/21  time:   [243.33 ms 245.28 ms 246.94 ms]                                   
                        change: [-1.4546% -0.6227% +0.0895%] (p = 0.11 > 0.05)
                       
Blinder_poly/parallel/21                                                                            
                        time:   [42.218 ms 44.177 ms 46.453 ms]
                        change: [+4.8431% +9.3787% +14.967%] (p = 0.00 < 0.05)
                        


Blinder_poly/serial/22  time:   [496.53 ms 497.06 ms 497.65 ms]                                   
                        change: [+0.0300% +0.3798% +0.6700%] (p = 0.02 < 0.05)
                       
Blinder_poly/parallel/22                                                                            
                        time:   [79.689 ms 79.914 ms 80.162 ms]
                        change: [-7.9753% -4.9272% -2.8485%] (p = 0.00 < 0.05)
                        

Also leaving CPU info in case is useful:

Model name:            Intel(R) Core(TM) i9-10885H CPU @ 2.40GHz
    CPU family:          6
    Model:               165
    Thread(s) per core:  2
    Core(s) per socket:  8

As noted in
privacy-scaling-explorations#151 the
generation of a random poly for degrees bigger than 20 starts to get
quite slow.

This PR tries to include some minimal changes in the `commit` fn so that
we upstream the improvements achieved in PSE/halo2
@CPerezz CPerezz force-pushed the feat/parallel_commit_blinding branch from 95055a4 to a4cc9f6 Compare February 28, 2023 08:13
Comment on lines -45 to +53
let mut random_poly = domain.empty_coeff();
for coeff in random_poly.iter_mut() {
*coeff = C::Scalar::random(&mut rng);
}
#[cfg(feature = "multicore")]
let random_poly = {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather than expanding this inline here, we should just add an EvaluationDomain::random_coeff<R: RngCore>(&self, rng: R) method. This would also avoid the need to make EvaluationDomain.k pub(crate).

#[cfg(feature = "multicore")]
let random_poly = {
let n_threads = current_num_threads();
let n = 1usize << domain.k as usize;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moving this to EvaluationDomain::random_coeff would also mean we can use self.n instead of recomputing it here:

Suggested change
let n = 1usize << domain.k as usize;

let random_poly = {
let n_threads = current_num_threads();
let n = 1usize << domain.k as usize;
let n_chunks = n_threads + if n % n_threads != 0 { 1 } else { 0 };
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
let n_chunks = n_threads + if n % n_threads != 0 { 1 } else { 0 };
let chunk_size = (self.n + n_threads - 1) / n_threads;
let num_chunks = (self.n + chunk_size - 1) / chunk_size;

let n_chunks = n_threads + if n % n_threads != 0 { 1 } else { 0 };
let mut rand_vec = vec![C::Scalar::ZERO; n];

let mut thread_seeds: Vec<ChaCha20Rng> = (0..n_chunks)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
let mut thread_seeds: Vec<ChaCha20Rng> = (0..n_chunks)
let mut thread_seeds: Vec<ChaCha20Rng> = (0..num_chunks)


thread_seeds
.par_iter_mut()
.zip_eq(rand_vec.par_chunks_mut(n / n_threads))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
.zip_eq(rand_vec.par_chunks_mut(n / n_threads))
.zip_eq(rand_vec.par_chunks_mut(chunk_size))

@daira
Copy link
Contributor

daira commented Feb 28, 2023

The benchmarks in the description don't look like they are an overall improvement except for Blinder_poly/parallel/22 ?

.map(|_| {
let mut seed = [0u8; 32];
rng.fill_bytes(&mut seed);
ChaCha20Rng::from_seed(seed)
Copy link
Contributor

@daira daira Feb 28, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this could safely use ChaCha12Rng while conservatively meeting our security requirements. The paper Too Much Crypto argues in favour of ChaCha8; I do not entirely endorse all of the conclusions of that paper, but there is no significant doubt in my mind that 12 rounds is sufficient. There is a more recent (Nov 2021) paper on cryptanalysis of ChaCha, PNB-focused Differential Cryptanalysis of ChaCha Stream Cipher, published since the ones considered in Too Much Crypto (which is originally from Dec 2019), but the newer paper does not improve on brute force and in any case would only extend to 7.25 rounds if it did.

@str4d str4d added this to the 0.3.0 milestone Mar 2, 2023
@str4d
Copy link
Contributor

str4d commented Mar 20, 2023

Given that there are still unaddressed review comments, and that this is a performance improvement that doesn't affect the public crate API, I propose bumping this out of 0.3.0 (we can cut a point release like 0.3.1 when it is ready).

@ebfull ebfull removed this from the 0.3.0 milestone Mar 21, 2023
@daira daira added this to the 0.3.1 milestone Nov 23, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants