Faster variable-base scalar multiplication in zk-SNARK circuits #3924

@daira

Before this write-up, the best general way to perform a variable-base scalar multiplication in an R1CS circuit required 9 constraints per scalar bit. There's a way that requires 8.5 constraints/bit, but only with a recoded scalar using a nonstandard and inconvenient digit set (e.g. [0, 1, 2, -1]).

I believe it's possible to implement this in 6 constraints/bit, by using a modification of a technique from [Eisentraeger, Lauter, and Montgomery]. They state their technique for curves in short Weierstrass form, but it is also applicable to Montgomery form. (Everything in this ticket is easily adapted to both.) The basic idea is that when we need to find [2] P + Q, where P is the accumulator in a double-and-add algorithm, we compute it as (P + Q) + P. This allows two optimizations:

  • we do not need to compute the y-coordinate of the intermediate point P + Q (details below);
  • we've replaced a doubling with an addition, which is more efficient in R1CS by one constraint.

Here we adapt [Eisentraeger, Lauter, and Montgomery]'s formulae to a Montgomery curve with equation B·y^2 = x^3 + A·x^2 + x. The constraint system for the incomplete addition P + Q = R on its own would be:

    (xQ - xP) × (λ1) = (yQ - yP)
    (B·λ1) × (λ1) = (A + xP + xQ + xR)
    (xP - xR) × (λ1) = (yR + yP)

When computing (P + Q) + P = S, we drop the last constraint, which is only needed for yR. The outer addition requires the gradient λ2 = (yP - yR)/(xP - xR), but we have:

    λ1 = (yR + yP)/(xP - xR)

therefore

    λ2 = 2·yP/(xP - xR) - λ1.

So in R1CS we can write:

    (xQ - xP) × (λ1) = (yQ - yP)
    (B·λ1) × (λ1) = (A + xP + xQ + xR)
    (xP - xR) × (λ1 + λ2) = (2·yP)

and then complete the outer addition with:

    (B·λ2) × (λ2) = (A + xR + xP + xS)
    (xP - xS) × (λ2) = (yS + yP)
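
As a sanity check on the five constraints above, here is a minimal Python sketch that evaluates them on a toy Montgomery curve and compares the result with two ordinary incomplete additions. The parameters `p = 101`, `A = 3`, `B = 1` and all function names are assumptions made for this illustration only; they are not from the ticket, and the curve is far too small for any real use.

```python
# Toy parameters chosen for this sketch only (not from the ticket):
# B*y^2 = x^3 + A*x^2 + x over F_p.
p, A, B = 101, 3, 1

def inv(v):
    """Inverse in F_p via Fermat's little theorem (v must be nonzero mod p)."""
    return pow(v % p, p - 2, p)

def on_curve(pt):
    x, y = pt
    return (B * y * y - (x**3 + A * x * x + x)) % p == 0

def add(P, Q):
    """Reference incomplete addition; the caller guarantees xP != xQ."""
    (xP, yP), (xQ, yQ) = P, Q
    lam = (yQ - yP) * inv(xQ - xP) % p
    xR = (B * lam * lam - A - xP - xQ) % p
    yR = (lam * (xP - xR) - yP) % p
    return (xR, yR)

def double_add(P, Q):
    """Witness for S = (P + Q) + P that never computes yR.
    Returns None when two x-coordinates collide (incomplete addition undefined)."""
    (xP, yP), (xQ, yQ) = P, Q
    if xP == xQ:
        return None
    lam1 = (yQ - yP) * inv(xQ - xP) % p
    xR = (B * lam1 * lam1 - A - xP - xQ) % p
    if xR == xP:
        return None
    lam2 = (2 * yP * inv(xP - xR) - lam1) % p
    xS = (B * lam2 * lam2 - A - xR - xP) % p
    yS = (lam2 * (xP - xS) - yP) % p
    return lam1, xR, lam2, xS, yS

def constraints_hold(P, Q, w):
    """Evaluate the five R1CS constraints (left * right - output) modulo p."""
    (xP, yP), (xQ, yQ) = P, Q
    lam1, xR, lam2, xS, yS = w
    residuals = [
        (xQ - xP) * lam1 - (yQ - yP),
        (B * lam1) * lam1 - (A + xP + xQ + xR),
        (xP - xR) * (lam1 + lam2) - 2 * yP,
        (B * lam2) * lam2 - (A + xR + xP + xS),
        (xP - xS) * lam2 - (yS + yP),
    ]
    return all(r % p == 0 for r in residuals)

def check_all():
    """Try every pair of affine points; return (pairs checked, pairs that failed)."""
    points = [(x, y) for x in range(p) for y in range(p) if on_curve((x, y))]
    checked = failures = 0
    for P in points:
        for Q in points:
            w = double_add(P, Q)
            if w is None:
                continue
            checked += 1
            S = (w[3], w[4])
            ok = constraints_hold(P, Q, w) and on_curve(S) and S == add(add(P, Q), P)
            failures += 0 if ok else 1
    return checked, failures
```

Calling `check_all()` should report zero failures if the derivation of λ2 is right. The sketch simply skips pairs where an x-coordinate collides; the ticket handles that case separately, by reasoning about indices.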

The practical problem with applying this technique in an R1CS circuit is that we cannot efficiently implement a conditional that sometimes computes [2] P, and sometimes [2] P + Q. (Conditionally replacing Q with 𝓞 does not work, because 𝓞 does not have an affine Montgomery representation.)

The following trick works around this. Suppose that r is of length n bits. Consider the following algorithm:

    Acc := [2] T
    for i from n-1 down to 0 {
        Q := r_i ? T : −T
        Acc := (Acc + Q) + Acc
    }

For each step we can compute the y-coordinate of Q using:

    (yT) × (2·r_i - 1) = (yQ)

This requires a total of 6 constraints per scalar bit. However, at the end we have computed [2^(n+1) - (2^n - 1) + 2·r] T = [2^n + 1 + 2·r] T.
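
The closed form can be checked by running the loop on integer indices, that is, tracking the multiple of T held in the accumulator instead of a curve point. The following is a small sketch; the function name `loop_index` is made up for this illustration.

```python
def loop_index(r: int, n: int) -> int:
    """Run the double-and-add loop on indices: T has index 1, -T has index -1."""
    acc = 2                                  # Acc := [2] T
    for i in range(n - 1, -1, -1):           # i from n-1 down to 0
        q = 1 if (r >> i) & 1 else -1        # Q := r_i ? T : -T
        acc = (acc + q) + acc                # Acc := (Acc + Q) + Acc
    return acc

# Exhaustive comparison with the closed form 2^n + 1 + 2*r for small n.
for n in range(1, 9):
    for r in range(2**n):
        assert loop_index(r, n) == 2**n + 1 + 2 * r
```

Each iteration maps acc to 2·acc ± 1, so the -1 contributions sum to -(2^n - 1) and the +2 corrections for set bits sum to 2·r, which matches the expression above.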

Not to worry. Suppose that we actually want to compute [2^n + k] T, where k < 2^(n+1). Without loss of generality, assume k is odd (if it is even then add one to k and subtract T from the result). Let k = 1 + 2·r, and solve to give r = (k - 1)/2. Conveniently, this is equivalent to setting r = k >> 1 where >> is the bitwise right-shift operator.

So the full algorithm is:

    Acc := [2] T
    for i from n-1 down to 0 {
        Q := k_{i+1} ? T : −T
        Acc := (Acc + Q) + Acc
    }
    return (k_0 = 0) ? (Acc - T) : Acc

This requires 4C for the initial doubling, n * 6C for the loop, 3C to compute Acc - T, and 2C for the conditional.

There is a minor further improvement by specializing for k_n = 0. In that case the first iteration of the loop calculates Acc = [3] T, which can be implemented directly as [2] T + T, saving 3C (since we have replaced one loop iteration by an incomplete addition):

    Acc := [2] T + T
    for i from n-2 down to 0 {
        Q := k_{i+1} ? T : −T
        Acc := (Acc + Q) + Acc
    }
    return (k_0 = 0) ? (Acc - T) : Acc

Let s be the order of the large prime-order subgroup. Assume that T is of order s and that 2^(n+1) - 1 ≤ (s-1)/2. Under these conditions, we can calculate [2^n + k] T for k < 2^n in (n+1) * 6C.
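
The (n+1) * 6C total follows from adding up the per-step costs quoted earlier. A short tally, with function names invented for this sketch:

```python
def cost_general(n: int) -> int:
    """Constraints for the unspecialized algorithm (k < 2^(n+1))."""
    doubling = 4          # initial [2] T
    loop = 6 * n          # n iterations at 6C each
    final_sub = 3         # Acc - T
    conditional = 2       # select between Acc - T and Acc
    return doubling + loop + final_sub + conditional

def cost_specialized(n: int) -> int:
    """Constraints when k_n = 0: the first iteration becomes one incomplete addition."""
    doubling = 4          # [2] T
    first_add = 3         # [2] T + T
    loop = 6 * (n - 1)    # remaining n-1 iterations
    final_sub = 3
    conditional = 2
    return doubling + first_add + loop + final_sub + conditional

for n in range(1, 300):
    assert cost_specialized(n) == 6 * (n + 1)
    assert cost_general(n) - cost_specialized(n) == 3   # the 3C saving
```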

It remains to check that the x-coordinates of each pair of points to be added are distinct.

When adding points in the large prime-order subgroup, we can rely on Theorem A.3.4 from the Zcash protocol spec, which says that if we have two such points with nonzero indices with respect to a given odd-prime order base, where the indices taken in the range -(s-1)/2..(s-1)/2 are distinct disregarding sign, then they have different x-coordinates. This is helpful, because it is easier to reason about the indices of points occurring in the scalar multiplication algorithm than it is to reason about their x-coordinates directly.

So, the required check is equivalent to saying that the following "indexed version" of the above algorithm never asserts:

    acc := 3
    for i from n-2 down to 0 {
        q := k_{i+1} ? 1 : −1
        assert acc ≠ ± q
        assert (acc + q) ≠ acc    // X
        acc := (acc + q) + acc
        assert 0 < acc ≤ (s-1)/2
    }
    if k_0 = 0 {
        assert acc ≠ 1
        acc := acc - 1
    }

The assertion labelled X obviously cannot fail because q ≠ 0. It is easy to see that acc is monotonically increasing except in the last conditional. It reaches its largest value, 2^(n+1) - 1, when k is maximal (k = 2^n - 1), which justifies the condition on n above. This discharges all of the other assertions.
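
For small n the claim can also be checked exhaustively by running the indexed algorithm directly. In the sketch below, `indexed_scalar_mul` is a name invented here, and `s` is taken as the smallest value satisfying 2^(n+1) - 1 ≤ (s-1)/2; it is not required to be prime, since only the range check matters for this simulation.

```python
def indexed_scalar_mul(k: int, n: int, s: int) -> int:
    """Indexed version of the specialized algorithm (k_n = 0, so k < 2^n).
    Raises AssertionError if any of the ticket's assertions would fail."""
    assert 0 <= k < 2**n
    half = (s - 1) // 2
    acc = 3                                      # index of [2] T + T
    for i in range(n - 2, -1, -1):               # i from n-2 down to 0
        q = 1 if (k >> (i + 1)) & 1 else -1      # bit k_{i+1}
        assert acc != q and acc != -q
        assert (acc + q) != acc                  # X
        acc = (acc + q) + acc
        assert 0 < acc <= half
    if k & 1 == 0:                               # k_0 = 0
        assert acc != 1
        acc -= 1
    return acc

def check_small(max_n: int = 8) -> int:
    """Exhaustively run every k for n = 1..max_n; return the number of cases run."""
    cases = 0
    for n in range(1, max_n + 1):
        s = 2**(n + 2) - 1                       # hypothetical order: (s-1)/2 == 2^(n+1) - 1
        for k in range(2**n):
            assert indexed_scalar_mul(k, n, s) == 2**n + k
            cases += 1
        # The bound is tight: the maximal k reaches exactly (s-1)/2.
        assert indexed_scalar_mul(2**n - 1, n, s) == (s - 1) // 2
    return cases
```

`check_small()` finishing without an `AssertionError` is evidence for small n only; the argument in the text is what covers general n.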

[Edit: the constraint count was off-by-one.]


Labels: A-circuit (Area: zk-SNARK circuits), C-research (Category: Engineering notes in support of design choices), I-performance (Problems and improvements with respect to performance), elliptic curves
