Turbocharge by using Jacobi representation, 2-ary NAF and precomputation #127

Merged
tomato42 merged 34 commits into warner:master from tomato42:jacobi on Dec 2, 2019

Conversation

tomato42
Member

@tomato42 tomato42 commented Oct 6, 2019

Use Jacobi representation for internal calculations. Precompute point doublings for curve generators and public keys. Use 2-ary Non-Adjacent Form for multiplication (both for regular and precomputed points).
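For illustration, here is a minimal sketch of a 2-ary NAF (non-adjacent form) decomposition; this is the textbook recurrence, not necessarily the exact code in this branch:

    def naf(k):
        """Return the non-adjacent form of k, least-significant digit first.

        Digits are in {-1, 0, 1} and no two consecutive digits are non-zero,
        so on average two thirds of the digits are zero. A double-and-add
        scalar multiplication driven by these digits needs fewer point
        additions/subtractions than the plain binary method.
        """
        digits = []
        while k:
            if k % 2:
                digit = 2 - (k % 4)  # k ≡ 1 (mod 4) -> 1, k ≡ 3 (mod 4) -> -1
                k -= digit
            else:
                digit = 0
            digits.append(digit)
            k //= 2
        return digits

Combined with precomputed doublings of the generator (and of cached public keys), each non-zero NAF digit then costs only a single point addition or subtraction.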

ver 0.14.1:

                  siglen    keygen   keygen/s      sign     sign/s    verify   verify/s
        NIST192p:     48   0.01550s     64.50   0.00826s    121.04   0.01605s     62.32
        NIST224p:     56   0.02100s     47.61   0.01131s     88.44   0.02281s     43.85
        NIST256p:     64   0.02828s     35.36   0.01514s     66.04   0.02988s     33.47
        NIST384p:     96   0.06678s     14.97   0.03536s     28.28   0.07114s     14.06
        NIST521p:    132   0.13108s      7.63   0.07089s     14.11   0.13921s      7.18
       SECP256k1:     64   0.02723s     36.72   0.01472s     67.93   0.02901s     34.47

this branch:

                  siglen    keygen   keygen/s      sign     sign/s    verify   verify/s
        NIST192p:     48   0.00033s   3062.43   0.00036s   2748.08   0.00062s   1605.66
        NIST224p:     56   0.00041s   2424.07   0.00046s   2196.71   0.00083s   1205.81
        NIST256p:     64   0.00053s   1892.05   0.00058s   1735.23   0.00106s    944.82
        NIST384p:     96   0.00110s    904.98   0.00118s    847.82   0.00217s    460.26
        NIST521p:    132   0.00234s    428.24   0.00245s    408.54   0.00443s    225.95
       SECP256k1:     64   0.00053s   1891.93   0.00058s   1734.46   0.00109s    913.35
 BRAINPOOLP160r1:     40   0.00025s   3982.49   0.00029s   3490.15   0.00053s   1878.51
 BRAINPOOLP192r1:     48   0.00032s   3086.07   0.00036s   2761.68   0.00063s   1578.22
 BRAINPOOLP224r1:     56   0.00041s   2412.41   0.00046s   2185.52   0.00076s   1311.65
 BRAINPOOLP256r1:     64   0.00054s   1866.84   0.00058s   1719.45   0.00110s    906.85
 BRAINPOOLP320r1:     80   0.00077s   1306.97   0.00083s   1201.59   0.00158s    632.82
 BRAINPOOLP384r1:     96   0.00112s    892.44   0.00119s    841.48   0.00229s    436.71
 BRAINPOOLP512r1:    128   0.00214s    467.05   0.00226s    441.64   0.00422s    237.13

for comparison, OpenSSL on the same machine achieves:

                              sign    verify    sign/s verify/s
 256 bits ecdsa (nistp256)   0.0000s   0.0001s  45320.1  14171.3
 384 bits ecdsa (nistp384)   0.0008s   0.0006s   1259.5   1653.3
 256 bits ecdsa (brainpoolP256r1)   0.0003s   0.0003s   2983.5   3333.2
 384 bits ecdsa (brainpoolP384r1)   0.0008s   0.0007s   1258.8   1528.1
 512 bits ecdsa (brainpoolP512r1)   0.0015s   0.0012s    675.1    860.1

(OpenSSL has hand-optimised assembly with AVX for the nistp256/NIST256p curve)

depends on #126, fixes #94 (while even more efficient algorithms exist, being 3.6 times slower than the generic C implementation for verification and 33% slower for signing is acceptable to me 😄 )

or, when gmpy2 is used, about 7% slower for verification and more than 2 times faster for signing 😁

                  siglen    keygen   keygen/s      sign     sign/s    verify   verify/s
        NIST192p:     48   0.00016s   6180.12   0.00017s   5846.30   0.00033s   3029.51
        NIST224p:     56   0.00021s   4861.86   0.00021s   4662.63   0.00042s   2366.47
        NIST256p:     64   0.00023s   4343.93   0.00024s   4152.79   0.00047s   2148.83
        NIST384p:     96   0.00040s   2507.97   0.00041s   2435.99   0.00079s   1260.01
        NIST521p:    132   0.00088s   1135.13   0.00089s   1121.94   0.00139s    721.07
       SECP256k1:     64   0.00023s   4360.83   0.00024s   4147.61   0.00044s   2254.53
 BRAINPOOLP160r1:     40   0.00014s   7261.37   0.00015s   6824.47   0.00031s   3248.21
 BRAINPOOLP192r1:     48   0.00016s   6219.18   0.00017s   5862.93   0.00034s   2933.74
 BRAINPOOLP224r1:     56   0.00021s   4876.48   0.00022s   4640.40   0.00041s   2413.48
 BRAINPOOLP256r1:     64   0.00023s   4397.89   0.00024s   4178.48   0.00044s   2272.76
 BRAINPOOLP320r1:     80   0.00031s   3246.64   0.00032s   3138.38   0.00063s   1593.14
 BRAINPOOLP384r1:     96   0.00040s   2491.04   0.00041s   2421.67   0.00079s   1262.64
 BRAINPOOLP512r1:    128   0.00062s   1618.30   0.00063s   1577.42   0.00125s    799.29

To do:

  • rename the "classical" representation to "affine"
  • merge fixup commits into earlier ones
  • add documentation
  • test coverage
  • make sure that all the expensive test cases use the new PointJacobi implementation
  • use Jacobi representation for Brainpool curves too
  • avoid constructing PointJacobi objects over and over when multiplying
  • add Travis tests with the oldest gmpy and gmpy2 versions that work for us, and specify them in setup.py

@tomato42 tomato42 added the feature (functionality to be implemented) label Oct 6, 2019
@tomato42 tomato42 self-assigned this Oct 6, 2019
@coveralls

coveralls commented Oct 6, 2019

Coverage Status

Coverage increased (+1.5%) to 96.233% when pulling a67da69 on tomato42:jacobi into 34864b1 on warner:master.

@tomato42 tomato42 force-pushed the jacobi branch 2 times, most recently from cce7792 to ee9775b on October 10, 2019 23:31
@tomato42 tomato42 changed the title from "Turbocharge by using Jacobi representation internally" to "Turbocharge by using Jacobi representation, 2-ary NAF and precomputation" Oct 12, 2019
@tomato42 tomato42 added this to the someday/future milestone Oct 13, 2019
@tomato42 tomato42 changed the title from "Turbocharge by using Jacobi representation, 2-ary NAF and precomputation" to "[WIP] Turbocharge by using Jacobi representation, 2-ary NAF and precomputation" Oct 19, 2019
@tomato42 tomato42 force-pushed the jacobi branch 3 times, most recently from e29976c to a8466ca on November 9, 2019 22:57
@tomato42 tomato42 force-pushed the jacobi branch 5 times, most recently from 77e7e69 to d3d9f07 on November 13, 2019 01:45
@tomato42 tomato42 changed the title from "[WIP] Turbocharge by using Jacobi representation, 2-ary NAF and precomputation" to "Turbocharge by using Jacobi representation, 2-ary NAF and precomputation" Nov 13, 2019
@tomato42 tomato42 modified the milestones: someday/future, v0.15 Nov 13, 2019
@tomato42 tomato42 force-pushed the jacobi branch 5 times, most recently from 072980a to c0eafc9 on November 23, 2019 15:30
tomato42 and others added 25 commits December 2, 2019 18:04
since inverse_mod is very computationally expensive (around 100
multiplications), it's cheaper to just bring the fractions to the
same denominator
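As a rough illustration of that trade-off (a hypothetical helper, not the code in this commit), bringing two terms to a common denominator lets a single inversion be deferred instead of calling inverse_mod for every fraction:

    def add_fractions_mod(a, b, c, d, p):
        """Compute a/b + c/d (mod p) as a single fraction.

        Using the common denominator b*d avoids calling the expensive
        inverse_mod() per term; the caller inverts the returned
        denominator once, only when an affine result is actually needed.
        """
        numerator = (a * d + c * b) % p
        denominator = (b * d) % p
        return numerator, denominator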
when loading public keys, perform the point verification just once
when loading private keys, do not verify the derived public point
make all test cases execute in less than 0.3s on i7 4790K
don't treat the universal code for point addition specially
since this avoids creating a new PointJacobi after every addition,
it makes signing about 20% faster
looks like a few merges/rebases didn't go exactly as planned and ended up
duplicating test code; remove it
since some branching in hypothesis strategies and in handling
different python, hypothesis, openssl and unittest versions is necessary,
ignore those branches for branch coverage

remove benchmarking code and dead code from test_pyecdsa.py
(we have speed.py now)

and exclude a disabled test case from coverage
the x and y need to be on the curve, so they need to be smaller than the
curve's prime, not the base point order

See Section 3.2.2.1 of SEC 1 v2
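A minimal sketch of that range check (assuming p, a, b are the curve's parameters; this mirrors the partial validation of SEC 1 v2, Section 3.2.2.1, not the library's exact code):

    def is_valid_affine_point(x, y, p, a, b):
        # coordinates must be field elements, i.e. in [0, p - 1];
        # comparing against the base point order would be the wrong bound
        if not (0 <= x < p and 0 <= y < p):
            return False
        # the point must also satisfy y^2 = x^3 + a*x + b (mod p)
        return (y * y - (x * x * x + a * x + b)) % p == 0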
since multiplying a point by the order is fairly expensive, skipping
it (when safe to do so) greatly increases performance

does not increase the speed.py numbers as point verification happens
outside the signing and verifying operations
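A hedged sketch of when the subgroup check is redundant: on a curve with cofactor 1, any point on the curve already lies in the prime-order subgroup. The validate_point name and the cofactor argument below are illustrative, not the library's API:

    from ecdsa.ellipticcurve import INFINITY

    def validate_point(point, curve, order, cofactor=1):
        # the cheap checks: coordinates in range and on the curve
        if not curve.contains_point(point.x(), point.y()):
            return False
        if cofactor == 1:
            # every curve point is already in the prime-order subgroup,
            # so the expensive point * order == INFINITY check can be skipped
            return True
        return point * order == INFINITY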
since on older distros like CentOS 6 there is python-gmpy but not
python-gmpy2, support gmpy too
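A sketch of the kind of optional-dependency fallback this implies (illustrative; the actual import logic in the library may differ):

    try:
        from gmpy2 import mpz       # preferred: python-gmpy2
    except ImportError:
        try:
            from gmpy import mpz    # older distros (e.g. CentOS 6) ship python-gmpy
        except ImportError:
            mpz = int               # pure-Python fallback, just slower

    # big-number arithmetic then wraps values with mpz() wherever it would use int()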
Labels
feature (functionality to be implemented)
Development

Successfully merging this pull request may close these issues:

Use more efficient algorithm for scalar multiplication
3 participants