Commit 015d158

use NAF for mul_add()
While we were already using the more clever algorithm, which adds two points at a time when possible, it is possible to go slightly faster by doing something similar with 2-ary NAF.
This speeds up single-shot verify by about 5% with Python's int() and by about 4% with gmpy's mpz().
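To make the claimed saving concrete, here is a rough sketch (not part of the commit) that counts how many digit positions need a point addition when two random 256-bit multipliers are processed jointly, once with plain binary digits and once with 2-ary NAF digits. The helper names `binary_digits` and `joint_additions` are made up for this illustration; `naf` mirrors the `_naf` helper shown later in the diff.

```python
import random


def naf(mult):
    """Return the 2-ary NAF digits of a non-negative integer, least significant first."""
    ret = []
    while mult:
        if mult % 2:
            nd = mult % 4
            if nd >= 2:
                nd -= 4
            ret.append(nd)
            mult -= nd
        else:
            ret.append(0)
        mult //= 2
    return ret


def binary_digits(mult):
    """Return the plain binary digits of an integer, least significant first."""
    return [int(bit) for bit in reversed(bin(mult)[2:])]


def joint_additions(digits_a, digits_b):
    """Count positions where at least one of the two multipliers has a non-zero digit."""
    length = max(len(digits_a), len(digits_b))
    # digits are least significant first, so padding goes at the end
    a = digits_a + [0] * (length - len(digits_a))
    b = digits_b + [0] * (length - len(digits_b))
    return sum(1 for x, y in zip(a, b) if x or y)


rng = random.Random(0)  # fixed seed so the run is repeatable
pairs = [(rng.getrandbits(256), rng.getrandbits(256)) for _ in range(200)]

avg_binary = sum(
    joint_additions(binary_digits(a), binary_digits(b)) for a, b in pairs
) / len(pairs)
avg_naf = sum(joint_additions(naf(a), naf(b)) for a, b in pairs) / len(pairs)

print("average additions, binary:", avg_binary)
print("average additions, NAF:   ", avg_naf)
```

The number of doublings is about the same in both cases, so fewer additions is where the gain would come from. This count ignores the cost of the extra precomputed sums and of adding points whose Z coordinate is not 1, which may be why the measured speed-up is only a few percent.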
1 parent b61c75c commit 015d158

4 files changed, +174 -72 lines changed

README.md

Lines changed: 34 additions & 31 deletions
@@ -72,32 +72,35 @@ pip install ecdsa[gmpy]

 The following table shows how long this library takes to generate keypairs
 (`keygen`), to sign data (`sign`), to verify those signatures (`verify`),
-and to derive a shared secret (`ecdh`).
+to derive a shared secret (`ecdh`), and
+to verify the signatures with no key specific precomputation (`no PC verify`).
 All those values are in seconds.
 For convenience, the inverses of those values are also provided:
 how many keys per second can be generated (`keygen/s`), how many signatures
 can be made per second (`sign/s`), how many signatures can be verified
-per second (`verify/s`), and how many shared secrets can be derived per second
-(`ecdh/s`). The size in bytes of a raw signature (generally the smallest
+per second (`verify/s`), how many shared secrets can be derived per second
+(`ecdh/s`), and how many signatures with no key specific
+precomputation can be verified per second (`no PC verify/s`). The size of raw
+signature (generally the smallest
 way a signature can be encoded) is also provided in the `siglen` column.
 Use `tox -e speed` to generate this table on your own computer.
 On an Intel Core i7 4790K @ 4.0GHz I'm getting the following performance:

 ```
-                 siglen    keygen  keygen/s      sign    sign/s    verify  verify/s
-       NIST192p:     48  0.00035s   2893.02  0.00038s   2620.53  0.00069s   1458.92
-       NIST224p:     56  0.00043s   2307.11  0.00048s   2092.00  0.00088s   1131.33
-       NIST256p:     64  0.00056s   1793.70  0.00061s   1639.87  0.00113s    883.79
-       NIST384p:     96  0.00116s    864.33  0.00124s    806.29  0.00233s    429.87
-       NIST521p:    132  0.00221s    452.16  0.00234s    427.31  0.00460s    217.19
-      SECP256k1:     64  0.00056s   1772.65  0.00061s   1628.73  0.00110s    912.13
-BRAINPOOLP160r1:     40  0.00026s   3801.86  0.00029s   3401.11  0.00052s   1930.47
-BRAINPOOLP192r1:     48  0.00034s   2925.73  0.00038s   2634.34  0.00070s   1438.06
-BRAINPOOLP224r1:     56  0.00044s   2287.98  0.00048s   2083.87  0.00088s   1137.52
-BRAINPOOLP256r1:     64  0.00056s   1774.11  0.00061s   1628.25  0.00112s    890.71
-BRAINPOOLP320r1:     80  0.00081s   1238.18  0.00087s   1146.71  0.00151s    661.95
-BRAINPOOLP384r1:     96  0.00117s    855.47  0.00124s    804.56  0.00241s    414.83
-BRAINPOOLP512r1:    128  0.00223s    447.99  0.00234s    427.49  0.00437s    229.09
+                 siglen    keygen  keygen/s      sign    sign/s    verify  verify/s  no PC verify  no PC verify/s
+       NIST192p:     48  0.00033s   2991.13  0.00036s   2740.86  0.00067s   1502.11      0.00136s          737.54
+       NIST224p:     56  0.00042s   2360.67  0.00046s   2190.16  0.00083s   1201.83      0.00170s          587.79
+       NIST256p:     64  0.00053s   1872.02  0.00057s   1743.08  0.00103s    968.53      0.00219s          457.36
+       NIST384p:     96  0.00110s    907.45  0.00116s    861.63  0.00218s    459.38      0.00445s          224.92
+       NIST521p:    132  0.00214s    467.72  0.00223s    448.70  0.00430s    232.76      0.00888s          112.66
+      SECP256k1:     64  0.00054s   1841.11  0.00058s   1722.33  0.00111s    903.07      0.00216s          464.01
+BRAINPOOLP160r1:     40  0.00026s   3780.81  0.00029s   3422.67  0.00054s   1863.09      0.00109s          914.93
+BRAINPOOLP192r1:     48  0.00034s   2942.79  0.00037s   2710.56  0.00070s   1435.59      0.00138s          724.79
+BRAINPOOLP224r1:     56  0.00044s   2278.35  0.00047s   2145.32  0.00090s   1115.34      0.00182s          549.72
+BRAINPOOLP256r1:     64  0.00055s   1832.95  0.00059s   1704.50  0.00110s    911.02      0.00234s          427.22
+BRAINPOOLP320r1:     80  0.00077s   1305.78  0.00082s   1222.47  0.00156s    640.27      0.00321s          311.56
+BRAINPOOLP384r1:     96  0.00112s    893.07  0.00118s    849.32  0.00228s    438.75      0.00478s          209.35
+BRAINPOOLP512r1:    128  0.00213s    470.08  0.00221s    451.98  0.00419s    238.70      0.00940s          106.44

                       ecdh    ecdh/s
        NIST192p:  0.00110s    910.70
@@ -118,20 +121,20 @@ On an Intel Core i7 4790K @ 4.0GHz I'm getting the following performance:
 To test performance with `gmpy2` loaded, use `tox -e speedgmpy2`.
 On the same machine I'm getting the following performance with `gmpy2`:
 ```
-                 siglen    keygen  keygen/s      sign    sign/s    verify  verify/s
-       NIST192p:     48  0.00017s   5945.50  0.00018s   5544.66  0.00033s   3002.54
-       NIST224p:     56  0.00021s   4742.14  0.00022s   4463.52  0.00044s   2248.59
-       NIST256p:     64  0.00024s   4155.73  0.00025s   3994.28  0.00047s   2105.34
-       NIST384p:     96  0.00041s   2415.06  0.00043s   2316.41  0.00085s   1177.18
-       NIST521p:    132  0.00072s   1391.14  0.00074s   1359.63  0.00140s    716.31
-      SECP256k1:     64  0.00024s   4216.50  0.00025s   3994.52  0.00047s   2120.57
-BRAINPOOLP160r1:     40  0.00014s   7038.99  0.00015s   6501.55  0.00029s   3397.79
-BRAINPOOLP192r1:     48  0.00017s   5983.18  0.00018s   5626.08  0.00035s   2843.62
-BRAINPOOLP224r1:     56  0.00021s   4727.54  0.00022s   4464.86  0.00043s   2326.84
-BRAINPOOLP256r1:     64  0.00024s   4221.00  0.00025s   4010.26  0.00049s   2046.40
-BRAINPOOLP320r1:     80  0.00032s   3142.14  0.00033s   3009.15  0.00061s   1652.88
-BRAINPOOLP384r1:     96  0.00041s   2415.98  0.00043s   2340.35  0.00083s   1198.77
-BRAINPOOLP512r1:    128  0.00064s   1567.27  0.00066s   1526.33  0.00127s    788.51
+                 siglen    keygen  keygen/s      sign    sign/s    verify  verify/s  no PC verify  no PC verify/s
+       NIST192p:     48  0.00017s   5878.39  0.00018s   5670.66  0.00034s   2971.38      0.00067s         1484.97
+       NIST224p:     56  0.00021s   4705.08  0.00022s   4587.19  0.00040s   2499.96      0.00088s         1140.97
+       NIST256p:     64  0.00024s   4252.73  0.00024s   4108.48  0.00049s   2038.80      0.00096s         1043.03
+       NIST384p:     96  0.00041s   2455.84  0.00042s   2406.31  0.00079s   1260.03      0.00172s          580.61
+       NIST521p:    132  0.00070s   1419.16  0.00072s   1392.50  0.00139s    719.35      0.00307s          325.96
+      SECP256k1:     64  0.00024s   4228.87  0.00024s   4086.32  0.00047s   2124.86      0.00096s         1037.53
+BRAINPOOLP160r1:     40  0.00014s   6932.12  0.00015s   6678.36  0.00030s   3387.90      0.00056s         1784.02
+BRAINPOOLP192r1:     48  0.00017s   5886.05  0.00017s   5720.63  0.00034s   2941.22      0.00067s         1490.87
+BRAINPOOLP224r1:     56  0.00021s   4748.89  0.00022s   4638.15  0.00041s   2460.86      0.00089s         1128.91
+BRAINPOOLP256r1:     64  0.00024s   4248.00  0.00024s   4135.19  0.00045s   2209.69      0.00099s         1006.45
+BRAINPOOLP320r1:     80  0.00032s   3096.85  0.00033s   3012.43  0.00065s   1547.07      0.00137s          728.60
+BRAINPOOLP384r1:     96  0.00041s   2436.12  0.00042s   2396.23  0.00083s   1211.13      0.00176s          568.39
+BRAINPOOLP512r1:    128  0.00063s   1580.09  0.00064s   1562.78  0.00129s    778.09      0.00279s          358.12

                       ecdh    ecdh/s
        NIST192p:  0.00051s   1960.26

src/ecdsa/ellipticcurve.py

Lines changed: 76 additions & 40 deletions
@@ -39,7 +39,7 @@
     from gmpy2 import mpz

     GMPY = True
-except ImportError:
+except ImportError:  # pragma: no branch
     try:
         from gmpy import mpz

@@ -57,7 +57,7 @@
 class CurveFp(object):
     """Elliptic Curve over the field of integers modulo a prime."""

-    if GMPY:
+    if GMPY:  # pragma: no branch

         def __init__(self, p, a, b, h=None):
             """
@@ -75,7 +75,7 @@ def __init__(self, p, a, b, h=None):
             # gmpy with it
             self.__h = h

-    else:
+    else:  # pragma: no branch

         def __init__(self, p, a, b, h=None):
             """
@@ -164,12 +164,12 @@ def __init__(self, curve, x, y, z, order=None, generator=False):
         # since it's generally better (faster) to use scaled points vs unscaled
         # ones, use writer-biased RWLock for locking:
         self._update_lock = RWLock()
-        if GMPY:
+        if GMPY:  # pragma: no branch
             self.__x = mpz(x)
             self.__y = mpz(y)
             self.__z = mpz(z)
             self.__order = order and mpz(order)
-        else:
+        else:  # pragma: no branch
             self.__x = x
             self.__y = y
             self.__z = z
@@ -359,7 +359,8 @@ def from_affine(point, generator=False):
             point.curve(), point.x(), point.y(), 1, point.order(), generator
         )

-    # plese note that all the methods that use the equations from hyperelliptic
+    # please note that all the methods that use the equations from
+    # hyperelliptic
     # are formatted in a way to maximise performance.
     # Things that make code faster: multiplying instead of taking to the power
     # (`xx = x * x; xxxx = xx * xx % p` is faster than `xxxx = x**4 % p` and
@@ -389,7 +390,7 @@ def _double(self, X1, Y1, Z1, p, a):
         """Add a point to itself, arbitrary z."""
         if Z1 == 1:
             return self._double_with_z_1(X1, Y1, p, a)
-        if not Z1:
+        if not Y1 or not Z1:
             return 0, 0, 1
         # after:
         # http://hyperelliptic.org/EFD/g1p/auto-shortw-jacobian.html#doubling-dbl-2007-bl
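The commit does not explain the new `not Y1` check in `_double`. One plausible reading is that a point with a zero y coordinate has order 2, so doubling it gives the point at infinity, and the check lets the method return early. The sketch below is an assumption-laden illustration rather than the library's code: `double_jacobian` is a hypothetical stand-alone version of the dbl-2007-bl formulas from the URL above, run on a toy curve over GF(23).

```python
def double_jacobian(X1, Y1, Z1, p, a):
    """Double a Jacobian point with the dbl-2007-bl formulas, without special cases."""
    XX = X1 * X1 % p
    YY = Y1 * Y1 % p
    YYYY = YY * YY % p
    ZZ = Z1 * Z1 % p
    S = 2 * ((X1 + YY) ** 2 - XX - YYYY) % p
    M = (3 * XX + a * ZZ * ZZ) % p
    T = (M * M - 2 * S) % p
    Y3 = (M * (S - T) - 8 * YYYY) % p
    Z3 = ((Y1 + Z1) ** 2 - YY - ZZ) % p
    return T, Y3, Z3


# Toy curve y^2 = x^3 + x + 1 over GF(23); (4, 0) is on it: 4^3 + 4 + 1 = 69 = 3 * 23.
p, a, b = 23, 1, 1
x, y = 4, 0
assert (y * y - (x ** 3 + a * x + b)) % p == 0

# The same point in Jacobian coordinates with Z = 2 is (x * Z^2, y * Z^3, Z).
Z = 2
X3, Y3, Z3 = double_jacobian(x * Z * Z % p, y * Z ** 3 % p, Z, p, a)

# Z3 == 0 marks the point at infinity in Jacobian coordinates.
print("Z3 =", Z3)
```

Since the general formulas already yield `Z3 == 0` in this case, the added check looks like a shortcut that skips the field arithmetic and returns the canonical `(0, 0, 1)` representation directly.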
@@ -579,11 +580,11 @@ def _naf(mult):
             if mult % 2:
                 nd = mult % 4
                 if nd >= 2:
-                    nd = nd - 4
-                ret += [nd]
+                    nd -= 4
+                ret.append(nd)
                 mult -= nd
             else:
-                ret += [0]
+                ret.append(0)
             mult //= 2
         return ret

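For readers unfamiliar with the non-adjacent form, the following stand-alone sketch reproduces the loop from the hunk above (its first lines, `ret = []` and `while mult:`, fall outside the hunk and are assumed here) and checks its two defining properties: the digits reconstruct the original number, and no two neighbouring digits are both non-zero. `from_naf` is a hypothetical helper added only for the check.

```python
def naf(mult):
    """Return the 2-ary NAF digits of a non-negative integer, least significant first."""
    ret = []
    while mult:
        if mult % 2:
            nd = mult % 4
            if nd >= 2:
                nd -= 4  # use digit -1 instead of 3, pushing a carry upwards
            ret.append(nd)
            mult -= nd
        else:
            ret.append(0)
        mult //= 2
    return ret


def from_naf(digits):
    """Rebuild the integer from its NAF digits (least significant first)."""
    return sum(digit << position for position, digit in enumerate(digits))


digits = naf(7)
print(digits)  # 7 = 8 - 1, so the digits are [-1, 0, 0, 1]

for value in range(1, 1000):
    d = naf(value)
    assert from_naf(d) == value
    # non-adjacency: neighbouring digits are never both non-zero
    assert all(not (x and y) for x, y in zip(d, d[1:]))
```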
@@ -621,15 +622,6 @@ def __mul__(self, other):

         return PointJacobi(self.__curve, X3, Y3, Z3, self.__order)

-    @staticmethod
-    def _leftmost_bit(x):
-        """Return integer with the same magnitude as x but only one bit set"""
-        assert x > 0
-        result = 1
-        while result <= x:
-            result = 2 * result
-        return result // 2
-
     def mul_add(self, self_mul, other, other_mul):
         """
         Do two multiplications at the same time, add results.
@@ -643,7 +635,7 @@ def mul_add(self, self_mul, other, other_mul):
         if not isinstance(other, PointJacobi):
             other = PointJacobi.from_affine(other)
         # when the points have precomputed answers, then multiplying them alone
-        # is faster (as it uses NAF)
+        # is faster (as it uses NAF and no point doublings)
         self._maybe_precompute()
         other._maybe_precompute()
         if self.__precompute and other.__precompute:
@@ -653,32 +645,76 @@ def mul_add(self, self_mul, other, other_mul):
             self_mul = self_mul % self.__order
             other_mul = other_mul % self.__order

-        i = self._leftmost_bit(max(self_mul, other_mul)) * 2
+        # (X3, Y3, Z3) is the accumulator
         X3, Y3, Z3 = 0, 0, 1
         p, a = self.__curve.p(), self.__curve.a()
-        self = self.scale()
-        # after scaling, point is immutable, no need for locking
-        X1, Y1 = self.__x, self.__y
-        other = other.scale()
-        X2, Y2 = other.__x, other.__y
-        both = self + other
-        if both is INFINITY:
-            X4, Y4 = 0, 0
-        else:
-            both.scale()
-            X4, Y4 = both.__x, both.__y
+
+        # as we have 6 unique points to work with, we can't scale all of them,
+        # but do scale the ones that are used most often
+        # (post scale() points are immutable so no need for locking)
+        self.scale()
+        X1, Y1, Z1 = self.__x, self.__y, self.__z
+        other.scale()
+        X2, Y2, Z2 = other.__x, other.__y, other.__z
+
         _double = self._double
         _add = self._add
-        while i > 1:
+
+        # with NAF we have 3 options: no add, subtract, add
+        # so with 2 points, we have 9 combinations:
+        # 0, -A, +A, -B, -A-B, +A-B, +B, -A+B, +A+B
+        # so we need 4 combined points:
+        mAmB_X, mAmB_Y, mAmB_Z = _add(X1, -Y1, Z1, X2, -Y2, Z2, p)
+        pAmB_X, pAmB_Y, pAmB_Z = _add(X1, Y1, Z1, X2, -Y2, Z2, p)
+        mApB_X, mApB_Y, mApB_Z = _add(X1, -Y1, Z1, X2, Y2, Z2, p)
+        pApB_X, pApB_Y, pApB_Z = _add(X1, Y1, Z1, X2, Y2, Z2, p)
+        # when the self and other sum to infinity, we need to add them
+        # one by one to get correct result but as that's very unlikely to
+        # happen in regular operation, we don't need to optimise this case
+        if not pApB_Y or not pApB_Z:
+            return self * self_mul + other * other_mul
+
+        # gmp object creation has cumulatively higher overhead than the
+        # speedup we get from calculating the NAF using gmp so ensure use
+        # of int()
+        self_naf = list(reversed(self._naf(int(self_mul))))
+        other_naf = list(reversed(self._naf(int(other_mul))))
+        # ensure that the lists are the same length (zip() will truncate
+        # longer one otherwise)
+        if len(self_naf) < len(other_naf):
+            self_naf = [0] * (len(other_naf)-len(self_naf)) + self_naf
+        elif len(self_naf) > len(other_naf):
+            other_naf = [0] * (len(self_naf)-len(other_naf)) + other_naf
+
+        for A, B in zip(self_naf, other_naf):
             X3, Y3, Z3 = _double(X3, Y3, Z3, p, a)
-            i = i // 2

-            if self_mul & i and other_mul & i:
-                X3, Y3, Z3 = _add(X3, Y3, Z3, X4, Y4, 1, p)
-            elif self_mul & i:
-                X3, Y3, Z3 = _add(X3, Y3, Z3, X1, Y1, 1, p)
-            elif other_mul & i:
-                X3, Y3, Z3 = _add(X3, Y3, Z3, X2, Y2, 1, p)
+            # conditions ordered from most to least likely
+            if A == 0:
+                if B == 0:
+                    pass
+                elif B < 0:
+                    X3, Y3, Z3 = _add(X3, Y3, Z3, X2, -Y2, Z2, p)
+                else:
+                    assert B > 0
+                    X3, Y3, Z3 = _add(X3, Y3, Z3, X2, Y2, Z2, p)
+            elif A < 0:
+                if B == 0:
+                    X3, Y3, Z3 = _add(X3, Y3, Z3, X1, -Y1, Z1, p)
+                elif B < 0:
+                    X3, Y3, Z3 = _add(X3, Y3, Z3, mAmB_X, mAmB_Y, mAmB_Z, p)
+                else:
+                    assert B > 0
+                    X3, Y3, Z3 = _add(X3, Y3, Z3, mApB_X, mApB_Y, mApB_Z, p)
+            else:
+                assert A > 0
+                if B == 0:
+                    X3, Y3, Z3 = _add(X3, Y3, Z3, X1, Y1, Z1, p)
+                elif B < 0:
+                    X3, Y3, Z3 = _add(X3, Y3, Z3, pAmB_X, pAmB_Y, pAmB_Z, p)
+                else:
+                    assert B > 0
+                    X3, Y3, Z3 = _add(X3, Y3, Z3, pApB_X, pApB_Y, pApB_Z, p)

         if not Y3 or not Z3:
             return INFINITY
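The control flow of the new `mul_add` loop can be tried out without any curve arithmetic. In the sketch below, integers modulo `n` stand in for curve points (so "doubling" is multiplying by 2 and "adding" is ordinary addition); this is purely an assumption made for illustration. It also swaps the commit's `if`/`elif` chain for a dictionary lookup to keep the example short, so it shows the technique, not the commit's exact code.

```python
def naf(mult):
    """Return the 2-ary NAF digits of a non-negative integer, least significant first."""
    ret = []
    while mult:
        if mult % 2:
            nd = mult % 4
            if nd >= 2:
                nd -= 4
            ret.append(nd)
            mult -= nd
        else:
            ret.append(0)
        mult //= 2
    return ret


def mul_add(P, a, Q, b, n):
    """Compute (a * P + b * Q) mod n with one joint double-and-add pass over NAF digits."""
    # all signed combinations of P and Q, indexed by the pair of NAF digits
    table = {
        (da, db): (da * P + db * Q) % n
        for da in (-1, 0, 1)
        for db in (-1, 0, 1)
    }
    # most significant digit first, left-padded with zeros to equal length
    naf_a = naf(a)[::-1]
    naf_b = naf(b)[::-1]
    length = max(len(naf_a), len(naf_b))
    naf_a = [0] * (length - len(naf_a)) + naf_a
    naf_b = [0] * (length - len(naf_b)) + naf_b

    acc = 0  # the accumulator, starting at the group's neutral element
    for da, db in zip(naf_a, naf_b):
        acc = 2 * acc % n  # one "doubling" per digit position
        if da or db:
            acc = (acc + table[da, db]) % n  # at most one "addition" per position
    return acc


n = 2 ** 61 - 1
result = mul_add(5, 0xFF00, 7, 0xF0F0, n)
print(result == (5 * 0xFF00 + 7 * 0xF0F0) % n)
```

In the real method the table entries are Jacobian points, negation is a sign flip on Y, and only the four sums that involve both points are precomputed; the single-point cases reuse `X1, Y1, Z1` and `X2, Y2, Z2` directly.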

src/ecdsa/test_jacobi.py

Lines changed: 63 additions & 0 deletions
@@ -367,6 +367,11 @@ def test_add_point_3_times(self):

         self.assertEqual(j_g * 3, j_g + j_g + j_g)

+    def test_mul_without_order(self):
+        j_g = PointJacobi(curve_256, generator_256.x(), generator_256.y(), 1)
+
+        self.assertEqual(j_g * generator_256.order(), INFINITY)
+
     def test_mul_add_inf(self):
         j_g = PointJacobi.from_affine(generator_256)

@@ -405,6 +410,21 @@ def test_mul_add_to_mul(self):

         self.assertEqual(a, b)

+    def test_mul_add_differnt(self):
+        j_g = PointJacobi.from_affine(generator_256)
+
+        w_a = j_g * 2
+
+        self.assertEqual(j_g.mul_add(1, w_a, 1), j_g * 3)
+
+    def test_mul_add_slightly_different(self):
+        j_g = PointJacobi.from_affine(generator_256)
+
+        w_a = j_g * 2
+        w_b = j_g * 3
+
+        self.assertEqual(w_a.mul_add(1, w_b, 3), w_a * 1 + w_b * 3)
+
     def test_mul_add(self):
         j_g = PointJacobi.from_affine(generator_256)

@@ -428,11 +448,54 @@ def test_mul_add_large(self):
             j_g * (0xFF00 + 255 * 0xF0F0), j_g.mul_add(0xFF00, b, 0xF0F0)
         )

+    def test_mul_add_with_infinity_as_result(self):
+        j_g = PointJacobi.from_affine(generator_256)
+
+        order = generator_256.order()
+
+        b = PointJacobi.from_affine(generator_256 * 256)
+
+        self.assertEqual(j_g.mul_add(order % 256, b, order // 256),
+                         INFINITY)
+
+    def test_mul_add_without_order(self):
+        j_g = PointJacobi(curve_256, generator_256.x(), generator_256.y(), 1)
+
+        order = generator_256.order()
+
+        w_b = generator_256 * 34
+        w_b.scale()
+
+        b = PointJacobi(curve_256, w_b.x(), w_b.y(), 1)
+
+        self.assertEqual(j_g.mul_add(order % 34, b, order // 34),
+                         INFINITY)
+
+    def test_mul_add_with_doubled_negation_of_itself(self):
+        j_g = PointJacobi.from_affine(generator_256 * 17)
+
+        order = generator_256.order()
+
+        dbl_neg = 2 * (-j_g)
+
+        self.assertEqual(j_g.mul_add(4, dbl_neg, 2), INFINITY)
+
     def test_equality(self):
         pj1 = PointJacobi(curve=CurveFp(23, 1, 1, 1), x=2, y=3, z=1, order=1)
         pj2 = PointJacobi(curve=CurveFp(23, 1, 1, 1), x=2, y=3, z=1, order=1)
         self.assertEqual(pj1, pj2)

+    def test_equality_with_invalid_object(self):
+        j_g = PointJacobi.from_affine(generator_256)
+
+        self.assertNotEqual(j_g, 12)
+
+    def test_equality_with_wrong_curves(self):
+        p_a = PointJacobi.from_affine(generator_256)
+        p_b = PointJacobi.from_affine(generator_224)
+
+        self.assertNotEqual(p_a, p_b)
+
     def test_pickle(self):
         pj = PointJacobi(curve=CurveFp(23, 1, 1, 1), x=2, y=3, z=1, order=1)
         self.assertEqual(pickle.loads(pickle.dumps(pj)), pj)
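The three new infinity tests rely on simple integer identities on the multipliers, which can be checked without the library. The sketch below assumes that `generator_256` is the NIST P-256 base point, so that `generator_256.order()` is the well-known P-256 group order written out here; the variable names are illustrative only.

```python
# Order of the NIST P-256 base point (assumed to match generator_256.order()).
order = 0xFFFFFFFF00000000FFFFFFFFFFFFFFFFBCE6FAADA7179E84F3B9CAC2FC632551

# test_mul_add_with_infinity_as_result computes
#   (order % 256) * G + (order // 256) * (256 * G)
# and the scalar in front of G adds up to the order, so the sum is infinity.
scalar_256 = order % 256 + (order // 256) * 256
assert scalar_256 == order

# test_mul_add_without_order uses the same split with 34 in place of 256,
# on points constructed without an explicit order.
scalar_34 = order % 34 + (order // 34) * 34
assert scalar_34 == order

# test_mul_add_with_doubled_negation_of_itself computes 4 * P + 2 * (2 * (-P)),
# whose coefficient of P is zero for any point P.
coefficient = 4 * 1 + 2 * (2 * -1)
assert coefficient == 0

print(scalar_256 == order, scalar_34 == order, coefficient)
```

Each case drives the accumulator in `mul_add` to the point at infinity, which presumably is what exercises the new `not Y3 or not Z3` and `not pApB_Y or not pApB_Z` branches.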

tox.ini

Lines changed: 1 addition & 1 deletion
@@ -60,7 +60,7 @@ sitepackages=True
 whitelist_externals=coverage
 commands =
     coverage run --branch -m pytest --hypothesis-show-statistics {posargs:src/ecdsa}
-    coverage xml
+    coverage html
     coverage report -m

 [testenv:speed]
