Skip to content

Thumb2 ASM, Curve25519#6607

Merged
dgarske merged 1 commit intowolfSSL:masterfrom
SparkiDev:curve25519_thumb2
Aug 24, 2023
Merged

Thumb2 ASM, Curve25519#6607
dgarske merged 1 commit intowolfSSL:masterfrom
SparkiDev:curve25519_thumb2

Conversation

@SparkiDev
Copy link
Copy Markdown
Contributor

@SparkiDev SparkiDev commented Jul 13, 2023

Description

Add support for compiling ASM for Thumb2
Add Curve25519 ASM for Thumb2
Limit assembly code compiled when Ed25519 not required.

If ED25519 verify only is needed the sign/keygen can be disabled to reduce code size. The base point is only needed for public key generation (signing/keygen). Use these build options: #define NO_ED25519_MAKE_KEY and #define NO_ED25519_SIGN

Testing

./configure '--disable-shared' 'LDFLAGS=--static' '--host=armv7m' 'CC=arm-linux-gnueabi-gcc' '--enable-curve25519' '--enable-ed25519' '--enable-cryptonly' '--enable-armasm'

Checklist

  • added tests
  • updated/added doxygen
  • updated appropriate READMEs
  • Updated manual and documentation

@SparkiDev SparkiDev self-assigned this Jul 13, 2023
@dgarske dgarske self-requested a review July 13, 2023 15:24
@dgarske dgarske self-assigned this Jul 13, 2023
@dgarske
Copy link
Copy Markdown
Member

dgarske commented Jul 13, 2023

Note @JacobBarthelmeh or @lealem47 : We'll need to make sure the two new port/arm thumb2 files get added to the CMSIS packs here: https://github.com/wolfSSL/scripts/blob/master/CMSIS/CubePack/wolfSSL/Files/wolfSSL.I-CUBE-wolfSSL.pdsc#L207C24-L207C24

Copy link
Copy Markdown
Member

@dgarske dgarske left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should I only expect CURVE25519 to work / be accelerated?

CURVE25519 test passed!
ED25519  test failed!
 error L=31069 i=0
 [fiducial line numbers: 7803 23957 34911 47088]

Note: I am using these build options:

#define WOLFSSL_ARM_ARCH 7
#define WOLFSSL_ARMASM_NO_HW_CRYPTO
#define WOLFSSL_ARMASM_NO_NEON
#define WOLFSSL_ARMASM

Copy link
Copy Markdown
Member

@dgarske dgarske Jul 13, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With CURVED25519_SMALL fe does not exist:

make: *** Waiting for unfinished jobs....
../Middlewares/Third_Party/wolfSSL_wolfSSL_wolfSSL/wolfssl/wolfcrypt/src/port/arm/thumb2-curve25519_c.c:52:19: error: unknown type name 'fe'
   52 | void fe_frombytes(fe out, const unsigned char* in)
      |                   ^~

Seems the new thumb2 code only supports the large build option?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. Made it so that building small will not compile this file

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When building from sources where armv8-32-curve25519_c.c gets included it produces build errors. Example:

../Middlewares/Third_Party/wolfSSL_wolfSSL_wolfSSL/wolfssl/wolfcrypt/src/port/arm/armv8-32-curve25519_c.c:56:34: error: invalid initializer
   56 |     register fe out asm ("r0") = out_p;
      |                                  ^~~~~
../Middlewares/Third_Party/wolfSSL_wolfSSL_wolfSSL/wolfssl/wolfcrypt/src/port/arm/armv8-32-curve25519_c.c: In function 'fe_tobytes':
../Middlewares/Third_Party/wolfSSL_wolfSSL_wolfSSL/wolfssl/wolfcrypt/src/port/arm/armv8-32-curve25519_c.c:102:38: error: invalid initializer
  102 |     register const fe n asm ("r1") = n_p;
      |                                      ^~~

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also issues with armv8-*.c being included for people not using configure. Those will likely need some macro protections. Also between the inline .S and .C version of the thumb2. No build option to gate the inline, so manually excluded .S.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed it so that WOLFSSL_ARMASM_INLINE is required for the C file to be built. If not defined the assembly file will build.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've put protection around each arm32 and thumb2 asm file to check whether or not thumb is defined.

@dgarske
Copy link
Copy Markdown
Member

dgarske commented Jul 13, 2023

Note: The wolfCrypt test past in release mode for ED25519.

Benchmark results on STM32H723 (Cortex M7) at 550MHz (-Os):

Using thumb2-curve25519_c.c:

CURVE  25519  key gen       169 ops took 1.000 sec, avg 5.917 ms, 169.000 ops/sec
CURVE  25519    agree       172 ops took 1.000 sec, avg 5.814 ms, 172.000 ops/sec
ED     25519  key gen       461 ops took 1.000 sec, avg 2.169 ms, 461.000 ops/sec
ED     25519     sign       318 ops took 1.000 sec, avg 3.145 ms, 318.000 ops/sec
ED     25519   verify       142 ops took 1.008 sec, avg 7.099 ms, 140.873 ops/sec

Using the normal software "large" implementation in fe_operations.c:

CURVE  25519  key gen        98 ops took 1.008 sec, avg 10.286 ms, 97.222 ops/sec
CURVE  25519    agree        98 ops took 1.000 sec, avg 10.204 ms, 98.000 ops/sec
ED     25519  key gen       232 ops took 1.004 sec, avg 4.328 ms, 231.076 ops/sec
ED     25519     sign       188 ops took 1.004 sec, avg 5.340 ms, 187.251 ops/sec
ED     25519   verify        88 ops took 1.015 sec, avg 11.534 ms, 86.700 ops/sec

Using the small implementation in fe_low_mem.c:

CURVE  25519  key gen         3 ops took 1.172 sec, avg 390.667 ms, 2.560 ops/sec
CURVE  25519    agree         4 ops took 1.553 sec, avg 388.250 ms, 2.576 ops/sec
ED     25519  key gen         3 ops took 1.188 sec, avg 396.000 ms, 2.525 ops/sec
ED     25519     sign         4 ops took 1.608 sec, avg 402.000 ms, 2.488 ops/sec
ED     25519   verify         2 ops took 1.659 sec, avg 829.500 ms, 1.206 ops/sec

@dgarske dgarske removed their assignment Jul 13, 2023
@SparkiDev SparkiDev force-pushed the curve25519_thumb2 branch 7 times, most recently from ca43133 to cc21a8b Compare July 20, 2023 06:51
@dgarske
Copy link
Copy Markdown
Member

dgarske commented Jul 20, 2023

Updated benchmarks:

Benchmark results on STM32H723 (Cortex M7) at 550MHz (-Os):

Using thumb2-curve25519_c.c:

CURVE  25519  key gen       180 ops took 1.000 sec, avg 5.556 ms, 180.000 ops/sec
CURVE  25519    agree       186 ops took 1.004 sec, avg 5.398 ms, 185.259 ops/sec
ED     25519  key gen       485 ops took 1.000 sec, avg 2.062 ms, 485.000 ops/sec
ED     25519     sign       330 ops took 1.000 sec, avg 3.030 ms, 330.000 ops/sec
ED     25519   verify       150 ops took 1.008 sec, avg 6.720 ms, 148.810 ops/sec

Using the normal software "large" implementation in fe_operations.c:

CURVE  25519  key gen        97 ops took 1.000 sec, avg 10.309 ms, 97.000 ops/sec
CURVE  25519    agree        98 ops took 1.004 sec, avg 10.245 ms, 97.610 ops/sec
ED     25519  key gen       230 ops took 1.000 sec, avg 4.348 ms, 230.000 ops/sec
ED     25519     sign       188 ops took 1.008 sec, avg 5.362 ms, 186.508 ops/sec
ED     25519   verify        86 ops took 1.000 sec, avg 11.628 ms, 86.000 ops/sec

Using the small implementation in fe_low_mem.c:

CURVE  25519  key gen         3 ops took 1.173 sec, avg 391.000 ms, 2.558 ops/sec
CURVE  25519    agree         4 ops took 1.549 sec, avg 387.250 ms, 2.582 ops/sec
ED     25519  key gen         3 ops took 1.192 sec, avg 397.333 ms, 2.517 ops/sec
ED     25519     sign         4 ops took 1.608 sec, avg 402.000 ms, 2.488 ops/sec
ED     25519   verify         2 ops took 1.655 sec, avg 827.500 ms, 1.208 ops/sec
SHA-512/256                  1 MiB took 1.012 seconds,    1.158 MiB/s

@SparkiDev SparkiDev force-pushed the curve25519_thumb2 branch 4 times, most recently from 7da98ca to 263d5bf Compare August 14, 2023 08:13
@dgarske dgarske self-requested a review August 14, 2023 15:23
@dgarske dgarske self-assigned this Aug 14, 2023
@dgarske
Copy link
Copy Markdown
Member

dgarske commented Aug 14, 2023

Updated benchmarks:

Benchmark results on STM32H723 (Cortex M7) at 550MHz (-Os):

SHA-256                      4 MiB took 1.000 seconds,    3.833 MiB/s
SHA-512                      3 MiB took 1.000 seconds,    2.661 MiB/s

CURVE  25519  key gen       391 ops took 1.000 sec, avg 2.558 ms, 391.000 ops/sec
CURVE  25519    agree       404 ops took 1.000 sec, avg 2.475 ms, 404.000 ops/sec
ED     25519  key gen      1282 ops took 1.000 sec, avg 0.780 ms, 1282.000 ops/sec
ED     25519     sign       836 ops took 1.000 sec, avg 1.196 ms, 836.000 ops/sec
ED     25519   verify       348 ops took 1.004 sec, avg 2.885 ms, 346.614 ops/sec

Build options:

#define WOLFSSL_ARM_ARCH 7
#define WOLFSSL_ARMASM_NO_HW_CRYPTO
#define WOLFSSL_ARMASM_NO_NEON
#define WOLFSSL_ARMASM
#define WC_NO_CACHE_RESISTANT

@dgarske dgarske assigned SparkiDev and unassigned dgarske and SparkiDev Aug 14, 2023
dgarske
dgarske previously approved these changes Aug 15, 2023
Copy link
Copy Markdown
Member

@dgarske dgarske left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Amazing work!

STM32H723 (Cortex M7) at 550MHz (-Os):

SHA-256                      4 MiB took 1.000 seconds,    3.882 MiB/s
SHA-512                      3 MiB took 1.000 seconds,    2.686 MiB/s

CURVE  25519  key gen       398 ops took 1.000 sec, avg 2.513 ms, 398.000 ops/sec
CURVE  25519    agree       412 ops took 1.004 sec, avg 2.437 ms, 410.359 ops/sec
ED     25519  key gen      1261 ops took 1.000 sec, avg 0.793 ms, 1261.000 ops/sec
ED     25519     sign       818 ops took 1.000 sec, avg 1.222 ms, 818.000 ops/sec
ED     25519   verify       346 ops took 1.003 sec, avg 2.899 ms, 344.965 ops/sec
#define WOLFSSL_ARM_ARCH 7
#define WOLFSSL_ARMASM_NO_HW_CRYPTO
#define WOLFSSL_ARMASM_NO_NEON
#define WOLFSSL_ARMASM
#define WC_NO_CACHE_RESISTANT

@SparkiDev SparkiDev force-pushed the curve25519_thumb2 branch 2 times, most recently from 1437040 to 1e467e2 Compare August 23, 2023 00:46
@dgarske
Copy link
Copy Markdown
Member

dgarske commented Aug 23, 2023

Retest this please

@dgarske
Copy link
Copy Markdown
Member

dgarske commented Aug 23, 2023

@SparkiDev : If enabling just ED25519 with the new Thumb feature I get errors. Does the ASM require both Curve25519 and Ed25519 be enabled?

If I enable only ED25519 verify there is only one error:

ed25519.c:(.text.wc_ed25519_init_ex+0x12): undefined reference to `fe_init'

@bandi13
Copy link
Copy Markdown
Contributor

bandi13 commented Aug 23, 2023

Retest this please

2 similar comments
@bandi13
Copy link
Copy Markdown
Contributor

bandi13 commented Aug 23, 2023

Retest this please

@bandi13
Copy link
Copy Markdown
Contributor

bandi13 commented Aug 23, 2023

Retest this please

Add support for compiling ASM for Thumb2
Add Curve25519 ASM for Thumb2
Limit assembly code compiled when Ed25519 not required.
Rework all assembly implementations to replace ge_*() functions instead
of having fe_ge_*() versions that take many parameters.
Get ARM32 inline asm working.
@SparkiDev
Copy link
Copy Markdown
Contributor Author

I believe I've fixed the issue with building only ED25519.

Copy link
Copy Markdown
Member

@dgarske dgarske left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fully tested and meets customers expectations. Thanks Sean!

@dgarske dgarske merged commit 88ad5ce into wolfSSL:master Aug 24, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants