Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

einar/pr/extended.jacobian.coordinates #1

Draft
wants to merge 10 commits into
base: taiko/unstable
Choose a base branch
from

Conversation

einar-taiko
Copy link

This resolves taikoxyz/zkevm-circuits#121

To measure the performance impact:

cd /tmp
git clone https://github.com/einar-taiko/halo2curves.git
cd halo2curves
git checkout b09144e
cargo bench multiexp
git checkout einar/pr/extended.jacobian.coordinates
cargo bench multiexp

@mratsim
Copy link
Collaborator

mratsim commented Aug 22, 2023

I get even less improvement than previously reported which is quite strange

image

git clone https://github.com/taikoxyz/halo2curves taiko-halo2curves
cd taiko-halo2curves/
git fetch origin pull/1/head
git checkout 9692b29
cargo bench multiexp
git checkout -b review-pr1 FETCH_HEAD
cargo bench multiexp

});
sum
})
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Collaborator

@mratsim mratsim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The EC implementation is correct with a small issue on inline usage that will lead to code explosion.

The multiexp implementation should be done by just modifying the buckets accumulation scheme in https://github.com/privacy-scaling-explorations/halo2/blob/f3487575f00e627705fcb7778d7d1049eed79601/halo2_proofs/src/arithmetic.rs#L67-L71

Also, there are 2 passes over the coeffs+base instead of 1 in this PR which should explain most of the slowness compared to theoretical (15%) and conservative estimate (10% speedup).

Cargo.toml Outdated Show resolved Hide resolved

// Jacobian implementations

// From affine to homogeneous
Copy link
Collaborator

@mratsim mratsim Aug 23, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it from affine to homogeneous or from affine to Jacobian?

This seems to contradict the comment above

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. I added the comment because I found it confusing too, since $name refers to homogeneous coordinates and $name_affine to affine coordinates. Alternatively, I can remove the first comment, since non-extended Jacobian coordinates should be gone anyway.

let zzz1 = p.zzz;

// curve constants
let a = Self::curve_constant_a();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a is always equal to 0 for pairing curves (those needed for Snarks) and it's also equal to 0 for secp256k1.

If Rust provides compile time evaluation, it would be extremely useful to specialize for a == 0

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At the very least a comment is needed to point out an future easy optimization opportunity

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a is always equal to 0 for pairing curves (those needed for Snarks) and it's also equal to 0 for secp256k1.

If Rust provides compile time evaluation, it would be extremely useful to specialize for a == 0

I have changed it to:

// curve constants
const A: $base = $name_jac_ext::curve_constant_a();

This should enforce compile time evaluation and hopefully imply that

let m = (x1_sqr.double()+x1_sqr) + A*zz1.square();

gets optimized to

let m = (x1_sqr.double()+x1_sqr) 

for the curves where $a=0$. But I am currently not sure how to verify this.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should use if A == 0 and if A != 0.

The multiplication and addition operations are pure assembly and will not be optimized away.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can make it that way, no problem, but I politely remain skeptical of the argument. When the value is known to be 0 at compile time, like in our case, the assembly will indeed be optimized away. I tested it with -opt-level=2 here https://godbolt.org/z/4Gr41o8Kr

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can make it that way, no problem, but I politely remain skeptical of the argument. When the value is known to be 0 at compile time, like in our case, the assembly will indeed be optimized away. I tested it with -opt-level=2 here https://godbolt.org/z/4Gr41o8Kr

I realise that * is overloaded for field arithmetic, so my argument may or may not be applicable.

let w = u*v;
let s = x1 * v;
let x1_sqr = x1.square();
let m = (x1_sqr.double()+x1_sqr) + a*zz1.square();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

with a = 0, a*zz1.square() can be skipped

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried commenting out the last term and then ran cargo test. All tests pass.

I put in a debug_assert_eq!(a, 0);, so if we can catch future curves where a != 0.

}

/// <http://www.hyperelliptic.org/EFD/g1p/auto-shortw-xyzz.html#doubling-dbl-2008-s-1>
#[inline]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#[inline] is likely not worth it here. Especially because the underlying field operations are already inline so it will lead to code size explosion:

#[inline]
pub fn double(&self) -> $field {

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay. I will remove it from the add as well then.

let num_threads = num_cpus::get();
if n > num_threads && n > 32 {
let chunk = n / num_threads;
let results: Vec<C::ExtendedJacobianCoordinates> = coeffs
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only the inner buckets result should be collected in Extended Jacobian.

Extended Jacobian mixed addition is faster than projective, but for plain addition, projective is faster.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the signature of best_multiexp should stay with projective coordinates

})
.sample_size(30);
}
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To measure perf of this PR without multithreading interference, a serial benchmark is needed as well.

}
}

pub(crate) fn multiexp_serial<C: CurveJacExt>(coeffs: &[C::Scalar], bases: &[C]) -> C::ExtendedJacobianCoordinates {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

similarly, the signature of multiexp_serial should stay homogeneous projective

.enumerate()
.rev()
.map(|(i, bucket)| {
for (coeff, base) in coeffs.iter().zip(bases.iter()) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The map and for are inefficient, you loop twice over the data.

You should loop over the coeff+base and directly mutate/add_assign in the matching bucket

}
bucket
})
.fold(C::jac_ext_identity(), |mut sum, bucket| {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The final reduction should use projective coordinates (and explicitly convert the bucket sum to projective)

@einar-taiko einar-taiko self-assigned this Aug 23, 2023
@mratsim
Copy link
Collaborator

mratsim commented Aug 29, 2023

Note on our discussion on constant_a, some more changes upstream privacy-scaling-explorations#82.

@einar-taiko einar-taiko force-pushed the einar/pr/extended.jacobian.coordinates branch from d52078b to 747c6e2 Compare August 30, 2023 08:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants