Calculate circuit costs of prime group operations (Ristretto or ctEdwards-subgroup) #4024
Note that if we convert to Montgomery for expensive operations like scalar multiplication, we need to special-case points of small order (which all represent the group identity). For example, when we do a multiplication in Montgomery, we should first set a boolean flag according to whether the base point is of small order, and replace the result of the multiplication with Edwards zero when that flag indicates to do so.
Apparently points of order 8 do not occur as valid Ristretto representations. Therefore we can set the "nonzero flag" b according to whether u = 0 or v = 0 in affine-Edwards coordinates, which requires 4 constraints:
(1 - b) × (b) = 0 // b is boolean
We'll represent group elements either as affine-Edwards (u, v), or temporarily as affine-Montgomery (b, x, y) where b is 0 for the zero point and 1 otherwise.
For scalar multiplication, we need 4C to calculate b and 2C to convert to Montgomery. Multiplication preserves b, but if b was 0 we must reset (u, v) to (0, 1) after the conversion back to Edwards. That conversion costs 2C and the conditional reset another 2C. So, the cost of a scalar multiplication [2n + k] T, for k < 2n, is (16 + n * 6)C. We assume that 2n + k is less than the group order.
Doubling requires 5C, and addition requires 6C.
Encoding is described as follows here, with a change of notation from (x, y) to (u, v):
Calculating u·v requires 1C. A field element is defined to be negative iff the least significant bit of its encoding is 1. Testing this involves unpacking the field element to a canonical binary encoding; for example in the Jubjub field that requires 255C for the binary unpacking, plus 68C to check canonicity. We require another 4C to test whether u = 0 and OR that with the condition that u·v is negative (the technique is the same as for calculating the nonzero flag above). The addition of Q4 only costs 3C, because we are adding a constant point and so T, A, and B in the Edwards addition (protocol spec appendix A.3.3.5) can be expressed as linear combinations of u and v. Then the selection of either (u, v) or (u, v) + Q4 costs 2C. So the cost of this step is (1 + 255 + 68 + 4 + 3 + 2)C = 333C for Ristretto-Jubjub.
This costs another (1 + 255 + 68 + 4 + 2)C = 330C for Ristretto-Jubjub. (The 2C is for a conditional negation; we don't need an addition.)
This costs 1C for the division (the multiplication by −a is free), 1C for checking the square, and (ouch) another (254 + 68)C = 322C for Ristretto-Jubjub to check that the square root is positive.
s was already boolean-unpacked in the previous step in order to check that it is positive, so if we need it as bits or bytes, we have it.
The overall cost of encoding is (333 + 330 + 322)C = 985C for Ristretto-Jubjub.
If the input is given as bits, this costs 68C for Ristretto-Jubjub. If the input is given just as a field element, it costs an additional 254C. (The least significant bit is always 0.)
This costs 1C for the squaring and 1C for the division.
s2 was already computed, so this step costs 2C to compute the denominator, 1C for the division, 1C to check the square, and (254 + 68)C for Ristretto-Jubjub to check that the square root is positive.
(We can't save 1C by completing the square in the denominator, because a·d is nonsquare for a complete twisted Edwards curve.)
This costs 1C to compute u·v, (254 + 68)C for Ristretto-Jubjub to check that it is nonnegative, and 1C to check that v ≠ 0.
The total cost of decoding, assuming the input is already boolean-constrained, is (68 + 1 + 1 + 2 + 1 + 1 + 254 + 68 + 1 + 254 + 68 + 1)C = 720C for Ristretto-Jubjub.
It's also worth considering the cost of an alternative approach that uses only the prime-order subgroup, where the decoding operation checks that the representation is in that subgroup.
It costs 3C to test whether or not a point, that is guaranteed to be in the prime-order subgroup, is the zero point. So the cost of a multiplication via Montgomery conversion is (15 + n * 6)C. Doubling costs 5C, and addition costs 6C.
Encoding for Jubjub costs (255 + 68)C = 323C to unpack and range-check the u-coordinate, in order to obtain the least significant bit u~. This does not count unpacking the v-coordinate to bits (which was free in the Ristretto case), but I think this is reasonable because we might not need it as bits. If we do, that's an extra (255 + 68)C.
Decoding starts with a DecompressValidate operation on (u~, v), as in appendix A.3.3.2 of the protocol spec. For Jubjub this costs 3C for the on-curve check, and (255 + 68)C to unpack and range-check the witnessed u-coordinate, equating the least significant bit with u~. (We could save 1C if u~ is already boolean-constrained, but we'll include that in the count. We take v as a field element; if it's provided as a bit string then we do not include the cost of boolean-constraining or range-checking it.)
Then we have to check that the resulting affine Edwards point is in the prime-order subgroup. This is complicated to do efficiently. Logically, we can multiply by the prime order and check that the result is the zero point. However, the cheapest way to do fixed-scalar multiplication is on the Montgomery curve, and we don't know that the point is of prime order since that's what we are trying to check, so we must be very careful of exceptional cases.
[to be continued]
I was able to find an addition-subtraction chain for the Jubjub curve order that requires 204 doubles (4C each), 6 adds (3C each), and 45 double-adds or double-subtracts (5C each), for a total multiplication cost of 1059C. (This is probably not optimal, but it is fairly close.) For the arithmetic to be valid, we need to also check that the point is not of small order, which combined with the on-curve check costs 8C, as noted in appendix A.3.3.6 of the protocol spec, but we need an additional 3C because the zero point should be accepted. The conversion to Montgomery coordinates takes 2C. We also need to check that the result of the multiplication is the zero point, but that is free. (The final addition of the multiplication would be exceptional if we computed it explicitly in affine Montgomery coordinates; instead we check that the inputs are negations of each other. Note that we can arrange for this check to pass if the original point was the zero point.) Putting all this together with the rest of the decoding, we need a total of (8 + 3 + 255 + 68 + 2 + 1059)C = 1395C for decoding.
The conclusion is that relative to the costs for Ristretto-Jubjub, encoding for the prime-order subgroup representation is less expensive, but decoding is more expensive (not prohibitively so).
One thing to note is that the choice to define negativity using the low bit of the field element in the ristretto255 draft is an arbitrary choice, made so that ristretto255 aligns with existing Ed25519 code. The original Decaf suggestion was actually to use "least absolute residue for x is in [0,(p-1)/2] means nonnegative", and this is used for Ed448. (This can be implemented very efficiently in software because it's equivalent to checking the low bit of 2*x).
It may be worth checking whether a different definition of negativity would be friendlier, since the sign checks are a significant cost. I don't see a way to check the Ed448-style condition more efficiently inside of a circuit, but the Decaf paper also mentions using the Legendre symbol when p = 3 mod 4. Unfortunately, this is not the case for JubJub (or Doppio), which are both 1 mod 4. (Off the top of my head, I'm not sure why 3 mod 4 is required, maybe something to do with sqrt(-1)). The Decaf paper doesn't specify exactly what properties are required for the sign function, but maybe it's possible to work out what those are, and whether it's possible to have a sign function with a more algebraic definition that would be cheaper.
There's a more efficient way to check for a point in the prime-order subgroup. This was pointed out to us by Qedit in their audit of Sapling. The idea is to witness [h-1 mod q] P where h is the cofactor and q is the group order. This is multiplied by h in the circuit and checked to be equal to the original point P.
This does not affect the cost of encoding, which is 323C as calculated above.
Decoding, however, is much more efficient. We start by checking that [h-1 mod q] P is on the (affine-Edwards) curve, which costs 3C. Then we multiply by h; normally this would cost 15C, but we can save 2C by merging the on-curve check with the first doubling, like this:
(u) × (u) = (uu)
After two more Edwards doublings, this gives a point that is guaranteed to be in the prime subgroup. Now we need (255 + 68)C to unpack and range-check the u-coordinate, equating its least-significant bit with u~. So the total cost of decoding is (6 + 10 + 255 + 68)C = 339C.
Recall that Ristretto-Jubjub required 985C for encoding and 720C for decoding, so this is a substantial improvement for both. The trade-off is that we need to compute [h-1 mod q] P outside the circuit (but we can do so using an efficient addition chain, and this is more than offset by the improvement in circuit size).