AES RISC-V 64-bit ASM: ECB/CBC/CTR/GCM/CCM #7569

Merged
merged 1 commit into from
Jun 6, 2024

Conversation

SparkiDev
Contributor

Description

Add implementations of AES for ECB/CBC/CTR/GCM/CCM for RISC-V using assembly.
Assembly with standard/scalar cryptography/vector cryptography instructions.

Testing

./configure --enable-all --enable-riscv-asm

Checklist

  • added tests
  • updated/added doxygen
  • updated appropriate READMEs
  • Updated manual and documentation

@SparkiDev SparkiDev self-assigned this May 22, 2024
@SparkiDev
Contributor Author

retest this please

@SparkiDev SparkiDev assigned wolfSSL-Bot and unassigned SparkiDev May 22, 2024
@dgarske dgarske self-assigned this May 22, 2024
@dgarske
Contributor

dgarske commented May 22, 2024

@SparkiDev I've got a few RISC-V targets here now, so I will try this on actual HW.

Contributor

@dgarske dgarske left a comment

./configure --enable-all --enable-riscv-asm

wolfcrypt/src/aes.c: In function '_AesXtsHelper':
wolfcrypt/src/aes.c:12631:16: error: implicit declaration of function '_AesEcbEncrypt'; did you mean 'wc_AesEcbEncrypt'? [-Werror=implicit-function-declaration]
12631 |         return _AesEcbEncrypt(aes, out, out, totalSz);
      |                ^~~~~~~~~~~~~~
      |                wc_AesEcbEncrypt
wolfcrypt/src/aes.c:12631:16: error: nested extern declaration of '_AesEcbEncrypt' [-Werror=nested-externs]
wolfcrypt/src/aes.c:12634:16: error: implicit declaration of function '_AesEcbDecrypt'; did you mean 'wc_AesEcbDecrypt'? [-Werror=implicit-function-declaration]
12634 |         return _AesEcbDecrypt(aes, out, out, totalSz);
      |                ^~~~~~~~~~~~~~
      |                wc_AesEcbDecrypt
wolfcrypt/src/aes.c:12634:16: error: nested extern declaration of '_AesEcbDecrypt' [-Werror=nested-externs]

@dgarske
Contributor

dgarske commented May 26, 2024

HiFive Unleashed at 1.4GHz
The new asm is like 50 times faster

./configure --enable-riscv-asm && make

root@HiFiveU:~/wolfssl-riscv# ./wolfcrypt/benchmark/benchmark -aes-cbc -aes-gcm
------------------------------------------------------------------------------
 wolfSSL version 5.7.0
------------------------------------------------------------------------------
Math:   Multi-Precision: Wolf(SP) word-size=64 bits=3072 sp_int.c
wolfCrypt Benchmark (block bytes 1048576, min 1.0 sec each)
AES-128-CBC-enc             20 MiB took 1.076 seconds,   18.588 MiB/s
AES-128-CBC-dec             20 MiB took 1.083 seconds,   18.473 MiB/s
AES-192-CBC-enc             20 MiB took 1.245 seconds,   16.062 MiB/s
AES-192-CBC-dec             20 MiB took 1.246 seconds,   16.047 MiB/s
AES-256-CBC-enc             15 MiB took 1.057 seconds,   14.189 MiB/s
AES-256-CBC-dec             15 MiB took 1.055 seconds,   14.212 MiB/s
AES-128-GCM-enc             15 MiB took 1.300 seconds,   11.543 MiB/s
AES-128-GCM-dec             15 MiB took 1.300 seconds,   11.535 MiB/s
AES-192-GCM-enc             15 MiB took 1.425 seconds,   10.526 MiB/s
AES-192-GCM-dec             15 MiB took 1.425 seconds,   10.523 MiB/s
AES-256-GCM-enc             10 MiB took 1.032 seconds,    9.687 MiB/s
AES-256-GCM-dec             10 MiB took 1.032 seconds,    9.691 MiB/s
GMAC Table 4-bit            31 MiB took 1.025 seconds,   30.251 MiB/s
Benchmark complete

On master

./configure --enable-all && make

root@HiFiveU:~/wolfssl# ./wolfcrypt/benchmark/benchmark -aes-cbc -aes-gcm
------------------------------------------------------------------------------
 wolfSSL version 5.7.0
------------------------------------------------------------------------------
Math:   Multi-Precision: Wolf(SP) word-size=64 bits=4096 sp_int.c
wolfCrypt Benchmark (block bytes 1048576, min 1.0 sec each)
AES-128-CBC-enc              5 MiB took 12.798 seconds,    0.391 MiB/s
AES-128-CBC-dec              5 MiB took 12.672 seconds,    0.395 MiB/s
AES-192-CBC-enc              5 MiB took 15.301 seconds,    0.327 MiB/s
AES-192-CBC-dec              5 MiB took 15.181 seconds,    0.329 MiB/s
AES-256-CBC-enc              5 MiB took 17.820 seconds,    0.281 MiB/s
AES-256-CBC-dec              5 MiB took 17.669 seconds,    0.283 MiB/s
AES-128-GCM-enc              5 MiB took 12.870 seconds,    0.388 MiB/s
AES-128-GCM-dec              5 MiB took 12.870 seconds,    0.388 MiB/s
AES-192-GCM-enc              5 MiB took 15.375 seconds,    0.325 MiB/s
AES-192-GCM-dec              5 MiB took 15.376 seconds,    0.325 MiB/s
AES-256-GCM-enc              5 MiB took 17.878 seconds,    0.280 MiB/s
AES-256-GCM-dec              5 MiB took 17.896 seconds,    0.279 MiB/s
AES-128-GCM-STREAM-enc       5 MiB took 12.878 seconds,    0.388 MiB/s
AES-128-GCM-STREAM-dec       5 MiB took 12.878 seconds,    0.388 MiB/s
AES-192-GCM-STREAM-enc       5 MiB took 15.379 seconds,    0.325 MiB/s
AES-192-GCM-STREAM-dec       5 MiB took 15.385 seconds,    0.325 MiB/s
AES-256-GCM-STREAM-enc       5 MiB took 17.881 seconds,    0.280 MiB/s
AES-256-GCM-STREAM-dec       5 MiB took 17.888 seconds,    0.280 MiB/s
GMAC Table 4-bit            30 MiB took 1.006 seconds,   29.831 MiB/s
Benchmark complete
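
To put a number on the comparison, the two outputs above can be parsed and divided line by line. The following is a minimal sketch, not part of wolfSSL: the `parse` and `speedups` helpers and the regular expression are illustrative assumptions, and only a few of the lines quoted above are included as sample data. Note that the two builds were configured differently (`--enable-riscv-asm` alone vs `--enable-all`), so the ratios are indicative only.

```python
import re

# Matches lines such as:
# "AES-128-CBC-enc             20 MiB took 1.076 seconds,   18.588 MiB/s"
LINE = re.compile(
    r"^(\S+)\s+(\d+) MiB took\s+([\d.]+) seconds,\s+([\d.]+) MiB/s$"
)


def parse(output):
    """Map algorithm name -> reported throughput in MiB/s."""
    rates = {}
    for line in output.splitlines():
        m = LINE.match(line.strip())
        if m:
            rates[m.group(1)] = float(m.group(4))
    return rates


def speedups(asm_output, master_output):
    """Ratio of ASM throughput to master throughput for shared algorithms."""
    asm, master = parse(asm_output), parse(master_output)
    return {name: asm[name] / master[name] for name in asm if name in master}


# Sample lines copied from the two benchmark runs above.
ASM = """\
AES-128-CBC-enc             20 MiB took 1.076 seconds,   18.588 MiB/s
AES-256-CBC-enc             15 MiB took 1.057 seconds,   14.189 MiB/s
AES-128-GCM-enc             15 MiB took 1.300 seconds,   11.543 MiB/s
AES-256-GCM-enc             10 MiB took 1.032 seconds,    9.687 MiB/s
"""

MASTER = """\
AES-128-CBC-enc              5 MiB took 12.798 seconds,    0.391 MiB/s
AES-256-CBC-enc              5 MiB took 17.820 seconds,    0.281 MiB/s
AES-128-GCM-enc              5 MiB took 12.870 seconds,    0.388 MiB/s
AES-256-GCM-enc              5 MiB took 17.878 seconds,    0.280 MiB/s
"""

if __name__ == "__main__":
    for name, ratio in speedups(ASM, MASTER).items():
        print(f"{name}: {ratio:.1f}x")
```

On these sample lines the CBC ratios come out at roughly 48x to 50x and the GCM ratios at roughly 30x to 35x.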

@dgarske dgarske assigned SparkiDev and unassigned dgarske May 26, 2024
@dgarske
Contributor

dgarske commented May 29, 2024

./configure --enable-all --enable-riscv-asm

wolfcrypt/src/aes.c: In function '_AesXtsHelper':
wolfcrypt/src/aes.c:12631:16: error: implicit declaration of function '_AesEcbEncrypt'; did you mean 'wc_AesEcbEncrypt'? [-Werror=implicit-function-declaration]
12631 |         return _AesEcbEncrypt(aes, out, out, totalSz);
      |                ^~~~~~~~~~~~~~
      |                wc_AesEcbEncrypt
wolfcrypt/src/aes.c:12631:16: error: nested extern declaration of '_AesEcbEncrypt' [-Werror=nested-externs]
wolfcrypt/src/aes.c:12634:16: error: implicit declaration of function '_AesEcbDecrypt'; did you mean 'wc_AesEcbDecrypt'? [-Werror=implicit-function-declaration]
12634 |         return _AesEcbDecrypt(aes, out, out, totalSz);
      |                ^~~~~~~~~~~~~~
      |                wc_AesEcbDecrypt
wolfcrypt/src/aes.c:12634:16: error: nested extern declaration of '_AesEcbDecrypt' [-Werror=nested-externs]

@SparkiDev says AES XTS is not yet supported with RISC-V ASM. Note: I tried ./configure --enable-all --disable-aesxtx --enable-riscv-asm, but that didn't work. We normally support disabling a specific option when --enable-all is used. Sean, please review.

@dgarske
Contributor

dgarske commented Jun 3, 2024

@SparkiDev is this RISC-V ASM PR ready for merge? I can’t tell if you are planning to push anything else to it.

Add implementations of AES for ECB/CBC/CTR/GCM/CCM for RISC-V using
assembly.
Assembly with standard/scalar cryptography/vector cryptography
instructions.
@SparkiDev
Contributor Author

Fixed --enable-all to work.

@SparkiDev
Contributor Author

retest this please

@SparkiDev SparkiDev assigned dgarske and wolfSSL-Bot and unassigned SparkiDev Jun 6, 2024
@dgarske
Contributor

dgarske commented Jun 6, 2024

Updated benchmarks:

HiFive Unleashed at 1.4GHz

./configure --enable-all --enable-riscv-asm
make

root@HiFiveU:~/wolfssl# ./wolfcrypt/benchmark/benchmark
------------------------------------------------------------------------------
 wolfSSL version 5.7.0
------------------------------------------------------------------------------
Math: 	Multi-Precision: Wolf(SP) word-size=64 bits=4096 sp_int.c
wolfCrypt Benchmark (block bytes 1048576, min 1.0 sec each)
RNG                         10 MiB took 1.488 seconds,    6.721 MiB/s
AES-128-CBC-enc             20 MiB took 1.139 seconds,   17.554 MiB/s
AES-128-CBC-dec             20 MiB took 1.145 seconds,   17.470 MiB/s
AES-192-CBC-enc             20 MiB took 1.321 seconds,   15.144 MiB/s
AES-192-CBC-dec             20 MiB took 1.321 seconds,   15.139 MiB/s
AES-256-CBC-enc             15 MiB took 1.115 seconds,   13.450 MiB/s
AES-256-CBC-dec             15 MiB took 1.123 seconds,   13.361 MiB/s
AES-128-GCM-enc             15 MiB took 1.395 seconds,   10.750 MiB/s
AES-128-GCM-dec             15 MiB took 1.372 seconds,   10.933 MiB/s
AES-192-GCM-enc             10 MiB took 1.007 seconds,    9.930 MiB/s
AES-192-GCM-dec             10 MiB took 1.006 seconds,    9.940 MiB/s
AES-256-GCM-enc             10 MiB took 1.088 seconds,    9.188 MiB/s
AES-256-GCM-dec             10 MiB took 1.088 seconds,    9.192 MiB/s
GMAC Table 4-bit            31 MiB took 1.029 seconds,   30.136 MiB/s
AES-128-ECB-enc             22 MiB took 1.218 seconds,   18.063 MiB/s
AES-128-ECB-dec             22 MiB took 1.209 seconds,   18.191 MiB/s
AES-192-ECB-enc             22 MiB took 1.414 seconds,   15.556 MiB/s
AES-192-ECB-dec             22 MiB took 1.406 seconds,   15.644 MiB/s
AES-256-ECB-enc             22 MiB took 1.601 seconds,   13.740 MiB/s
AES-256-ECB-dec             22 MiB took 1.608 seconds,   13.677 MiB/s
AES-XTS-enc                 15 MiB took 1.193 seconds,   12.569 MiB/s
AES-XTS-dec                 15 MiB took 1.190 seconds,   12.608 MiB/s
AES-128-CFB                 20 MiB took 1.319 seconds,   15.167 MiB/s
AES-192-CFB                 15 MiB took 1.115 seconds,   13.447 MiB/s
AES-256-CFB                 15 MiB took 1.240 seconds,   12.092 MiB/s
AES-128-OFB                 20 MiB took 1.316 seconds,   15.202 MiB/s
AES-192-OFB                 15 MiB took 1.114 seconds,   13.461 MiB/s
AES-256-OFB                 15 MiB took 1.240 seconds,   12.094 MiB/s
AES-128-CTR                 20 MiB took 1.134 seconds,   17.639 MiB/s
AES-192-CTR                 20 MiB took 1.317 seconds,   15.181 MiB/s
AES-256-CTR                 15 MiB took 1.109 seconds,   13.526 MiB/s
AES-CCM-enc                 10 MiB took 1.087 seconds,    9.202 MiB/s
AES-CCM-dec                 10 MiB took 1.088 seconds,    9.194 MiB/s
AES-256-SIV-enc             10 MiB took 1.151 seconds,    8.686 MiB/s
AES-256-SIV-dec             10 MiB took 1.149 seconds,    8.704 MiB/s
AES-384-SIV-enc             10 MiB took 1.330 seconds,    7.521 MiB/s
AES-384-SIV-dec             10 MiB took 1.329 seconds,    7.526 MiB/s
AES-512-SIV-enc             10 MiB took 1.497 seconds,    6.681 MiB/s
AES-512-SIV-dec             10 MiB took 1.496 seconds,    6.683 MiB/s
Camellia                    15 MiB took 1.297 seconds,   11.563 MiB/s
ARC4                        30 MiB took 1.121 seconds,   26.756 MiB/s
CHACHA                      30 MiB took 1.016 seconds,   29.525 MiB/s
CHA-POLY                    25 MiB took 1.140 seconds,   21.934 MiB/s
3DES                         5 MiB took 1.632 seconds,    3.064 MiB/s
MD5                         75 MiB took 1.050 seconds,   71.403 MiB/s
POLY1305                    90 MiB took 1.053 seconds,   85.442 MiB/s
SHA                         35 MiB took 1.101 seconds,   31.787 MiB/s
SHA-224                     20 MiB took 1.112 seconds,   17.980 MiB/s
SHA-256                     20 MiB took 1.114 seconds,   17.952 MiB/s
SHA-384                     15 MiB took 1.359 seconds,   11.038 MiB/s
SHA-512                     15 MiB took 1.315 seconds,   11.406 MiB/s
SHA-512/224                 15 MiB took 1.461 seconds,   10.269 MiB/s
SHA-512/256                 15 MiB took 1.461 seconds,   10.266 MiB/s
SHA3-224                    20 MiB took 1.187 seconds,   16.849 MiB/s
SHA3-256                    20 MiB took 1.250 seconds,   15.998 MiB/s
SHA3-384                    15 MiB took 1.197 seconds,   12.532 MiB/s
SHA3-512                    10 MiB took 1.140 seconds,    8.770 MiB/s
SHAKE128                    20 MiB took 1.034 seconds,   19.339 MiB/s
SHAKE256                    20 MiB took 1.250 seconds,   16.002 MiB/s
RIPEMD                      20 MiB took 1.071 seconds,   18.679 MiB/s
BLAKE2b                     30 MiB took 1.155 seconds,   25.973 MiB/s
BLAKE2s                     20 MiB took 1.202 seconds,   16.637 MiB/s
AES-128-CMAC                20 MiB took 1.166 seconds,   17.149 MiB/s
AES-256-CMAC                15 MiB took 1.136 seconds,   13.200 MiB/s
HMAC-MD5                    75 MiB took 1.050 seconds,   71.403 MiB/s
HMAC-SHA                    35 MiB took 1.099 seconds,   31.834 MiB/s
HMAC-SHA224                 20 MiB took 1.115 seconds,   17.931 MiB/s
HMAC-SHA256                 20 MiB took 1.116 seconds,   17.921 MiB/s
HMAC-SHA384                 20 MiB took 1.134 seconds,   17.640 MiB/s
HMAC-SHA512                 20 MiB took 1.182 seconds,   16.917 MiB/s
PBKDF2                       2 KiB took 1.011 seconds,    2.195 KiB/s
SipHash-8                  130 MiB took 1.018 seconds,  127.690 MiB/s
SipHash-16                 130 MiB took 1.018 seconds,  127.697 MiB/s
KDF      128     SRTP    205045 ops took 1.000 sec, avg 0.005 ms, 205041.431 ops/sec
KDF      256     SRTP    140095 ops took 1.000 sec, avg 0.007 ms, 140092.996 ops/sec
KDF      128    SRTCP    204845 ops took 1.000 sec, avg 0.005 ms, 204843.486 ops/sec
KDF      256    SRTCP    139070 ops took 1.000 sec, avg 0.007 ms, 139067.480 ops/sec
scrypt    17                 10 ops took 5.608 sec, avg 560.843 ms, 1.783 ops/sec
RSA     1024  key gen         6 ops took 1.163 sec, avg 193.831 ms, 5.159 ops/sec
RSA     2048  key gen         1 ops took 2.187 sec, avg 2186.849 ms, 0.457 ops/sec
RSA     2048   public      1400 ops took 1.065 sec, avg 0.761 ms, 1314.340 ops/sec
RSA     2048  private       100 ops took 3.932 sec, avg 39.325 ms, 25.429 ops/sec
DH      2048  key gen       109 ops took 1.007 sec, avg 9.242 ms, 108.205 ops/sec
DH      2048    agree       100 ops took 1.953 sec, avg 19.530 ms, 51.202 ops/sec
ECC   [      SECP256R1]   256  key gen      1000 ops took 1.065 sec, avg 1.065 ms, 939.342 ops/sec
ECDHE [      SECP256R1]   256    agree      1000 ops took 1.014 sec, avg 1.014 ms, 985.994 ops/sec
ECDSA [      SECP256R1]   256     sign       900 ops took 1.112 sec, avg 1.236 ms, 809.309 ops/sec
ECDSA [      SECP256R1]   256   verify       700 ops took 1.030 sec, avg 1.472 ms, 679.428 ops/sec
ECC   [      SECP256R1]   256  encrypt       900 ops took 1.051 sec, avg 1.168 ms, 856.368 ops/sec
ECC   [      SECP256R1]   256  decrypt       800 ops took 1.106 sec, avg 1.382 ms, 723.377 ops/sec
ECC   [BRAINPOOLP256R1]   256  key gen       900 ops took 1.080 sec, avg 1.200 ms, 833.102 ops/sec
ECDHE [BRAINPOOLP256R1]   256    agree       900 ops took 1.034 sec, avg 1.149 ms, 870.528 ops/sec
ECDSA [BRAINPOOLP256R1]   256     sign       800 ops took 1.093 sec, avg 1.366 ms, 731.855 ops/sec
ECDSA [BRAINPOOLP256R1]   256   verify       700 ops took 1.088 sec, avg 1.554 ms, 643.652 ops/sec
ECC   [BRAINPOOLP256R1]   256  encrypt       800 ops took 1.044 sec, avg 1.305 ms, 766.018 ops/sec
ECC   [BRAINPOOLP256R1]   256  decrypt       700 ops took 1.101 sec, avg 1.574 ms, 635.508 ops/sec
CURVE  25519  key gen      1154 ops took 1.000 sec, avg 0.867 ms, 1153.836 ops/sec
CURVE  25519    agree      1200 ops took 1.013 sec, avg 0.844 ms, 1184.526 ops/sec
ED     25519  key gen      2273 ops took 1.000 sec, avg 0.440 ms, 2272.384 ops/sec
ED     25519     sign      2100 ops took 1.032 sec, avg 0.491 ms, 2035.573 ops/sec
ED     25519   verify      1000 ops took 1.035 sec, avg 1.035 ms, 966.428 ops/sec
CURVE    448  key gen       373 ops took 1.002 sec, avg 2.685 ms, 372.413 ops/sec
CURVE    448    agree       400 ops took 1.063 sec, avg 2.659 ms, 376.125 ops/sec
ED       448  key gen       746 ops took 1.000 sec, avg 1.341 ms, 745.990 ops/sec
ED       448     sign       800 ops took 1.121 sec, avg 1.401 ms, 713.916 ops/sec
ED       448   verify       400 ops took 1.316 sec, avg 3.289 ms, 303.998 ops/sec
ECCSI    256  key gen       774 ops took 1.001 sec, avg 1.293 ms, 773.568 ops/sec
ECCSI    256 pair gen       937 ops took 1.000 sec, avg 1.067 ms, 936.782 ops/sec
ECCSI    256    valid       582 ops took 1.000 sec, avg 1.719 ms, 581.719 ops/sec
ECCSI    256     sign       854 ops took 1.001 sec, avg 1.172 ms, 853.512 ops/sec
ECCSI    256   verify       247 ops took 1.000 sec, avg 4.049 ms, 246.951 ops/sec
SAKKE   1024  key gen        15 ops took 1.008 sec, avg 67.193 ms, 14.882 ops/sec
SAKKE   1024  rsk gen        39 ops took 1.017 sec, avg 26.083 ms, 38.339 ops/sec
SAKKE   1024    valid         4 ops took 1.105 sec, avg 276.192 ms, 3.621 ops/sec
SAKKE   1024    encap-1       6 ops took 1.037 sec, avg 172.764 ms, 5.788 ops/sec
SAKKE   1024   derive-1       4 ops took 1.189 sec, avg 297.185 ms, 3.365 ops/sec
SAKKE   1024    encap-2       6 ops took 1.032 sec, avg 172.008 ms, 5.814 ops/sec
SAKKE   1024   derive-2       4 ops took 1.188 sec, avg 296.979 ms, 3.367 ops/sec
SAKKE   1024   derive-3       4 ops took 1.187 sec, avg 296.857 ms, 3.369 ops/sec
SAKKE   1024   derive-4       4 ops took 1.188 sec, avg 296.881 ms, 3.368 ops/sec
Benchmark complete

@dgarske dgarske merged commit b69482f into wolfSSL:master Jun 6, 2024
109 checks passed
jefferyq2 pushed a commit to jefferyq2/wolfssl that referenced this pull request Jun 9, 2024
AES RISC-V 64-bit ASM: ECB/CBC/CTR/GCM/CCM
@gojimmypi
Contributor

@dgarske - a question on the benchmarks, fixed data size vs fixed time:

In the master and riscv_aes_asm branch you ran these commands, respectively:

# before, on master
./configure --enable-all

vs

# after, with ASM Optimization
./configure --enable-all --enable-riscv-asm

Then, for comparison, you ran this for both:

./wolfcrypt/benchmark/benchmark -aes-cbc -aes-gcm

The output on master took a fixed 5 MiB chunk of data and timed the completion, in this example 12.798 seconds:

AES-128-CBC-enc              5 MiB took 12.798 seconds,    0.391 MiB/s

The output on riscv_aes_asm completed as soon as reasonable after a fixed one second duration and determined the amount of data processed:

AES-128-CBC-enc             20 MiB took 1.076 seconds,   18.588 MiB/s

Why the difference in fixed data size vs fixed time?

Additionally, perhaps just nit-picky, but curious: it appears there was also a difference in the bits size generated. bits=4096:

# master
Math:   Multi-Precision: Wolf(SP) word-size=64 bits=4096 sp_int.c

vs bits=3072

# ASM
Math:   Multi-Precision: Wolf(SP) word-size=64 bits=3072 sp_int.c

It appears that ./configure --enable-all --enable-riscv-asm produces a different user_settings.h than ./configure --enable-all, affecting more than just assembly optimization. Perhaps it should be consistent, at least for the benchmark configuration? I'm also left wondering, for a real apples-to-apples comparison, whether there would be a performance difference if master were set to bits=3072.

In any case - that's an astonishing performance boost by @SparkiDev :)

@SparkiDev
Contributor Author

The size of data processed is the number of 1048576-byte (= 1 MiB) buffers encrypted/decrypted.
We process a minimum number of buffers regardless of platform, and keep going until at least 1 second has elapsed.

I have no idea why the number of bits in SP changed though.
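
The timing rule described above can be sketched as a loop that always completes a minimum batch of buffers and only then checks the clock. This is a simplified illustration and not the wolfCrypt benchmark code: the function name `run_benchmark`, its parameters, and the batch size of 5 buffers are assumptions (5 is consistent with the 5/10/15/20 MiB totals in the AES-CBC and AES-GCM lines, but other lines in the output, such as 22 MiB and 31 MiB, show that the real tool does not always use that batch size).

```python
import time


def run_benchmark(process_block, block_bytes=1048576, min_blocks=5,
                  min_seconds=1.0, clock=time.perf_counter):
    """Process whole batches of buffers until min_seconds has elapsed.

    process_block: callable taking the buffer size in bytes (hypothetical
                   stand-in for one encrypt/decrypt call).
    min_blocks:    buffers per batch; assumed value, see the note above.
    Returns (MiB processed, elapsed seconds, MiB/s).
    """
    start = clock()
    blocks = 0
    while True:
        # A full batch is always completed, however slow the platform is.
        for _ in range(min_blocks):
            process_block(block_bytes)
        blocks += min_blocks
        elapsed = clock() - start
        # The clock is only checked between batches.
        if elapsed >= min_seconds:
            break
    mib = blocks * block_bytes / (1024 * 1024)
    return mib, elapsed, mib / elapsed
```

Under this model a slow implementation finishes exactly one batch and reports a long time (5 MiB in about 12.8 seconds), while a fast one repeats batches until it passes one second (20 MiB in about 1.08 seconds), which would explain why one output looks like "fixed data size" and the other like "fixed time".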
