ICP: Improve AES-GCM performance
Currently, SIMD-accelerated AES-GCM performance is limited by two factors:

a. The need to disable preemption and interrupts and save the FPU state before using it, and to do the reverse when done. Due to the way the code is organized (see (b) below), we pay this price twice for each 16-byte GCM block processed.

b. Most processing is done in C, operating on single GCM blocks. The use of SIMD instructions is limited to the AES encryption of the counter block (AES-NI) and the Galois multiplication (PCLMULQDQ). As a result, the FPU is not fully utilized for crypto operations.

To solve (a), we do crypto processing in larger chunks while owning the FPU. An `icp_gcm_avx_chunk_size` module parameter was introduced to make this chunk size tweakable. It defaults to 32 KiB. This step alone roughly doubles performance.

(b) is tackled by porting and using the highly optimized OpenSSL AES-GCM assembler routines, which do all the processing (CTR, AES, GMULT) in a single routine.

Both steps together reduce the time spent in the en/decryption routines by up to 32x, yielding up to approximately 12x higher throughput for large (128 KiB) blocks.

Signed-off-by: Attila Fülöp <firstname.lastname@example.org>
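The effect of step (a) can be illustrated with a minimal user-space sketch. It is not the code from this commit: `fpu_begin()`, `fpu_end()`, `process_block()`, `process_per_block()` and `process_chunked()` are hypothetical names, the FPU save/restore is replaced by a counter, and the "crypto" is a placeholder XOR. It only shows how chunking reduces the number of FPU save/restore sections.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define	GCM_BLOCK_SIZE	16

/* Hypothetical stand-ins for the kernel's FPU save/restore calls. */
static unsigned long fpu_sections;

static void fpu_begin(void) { fpu_sections++; }	/* count each save */
static void fpu_end(void) { }

/* Placeholder for the per-block work: a plain XOR, NOT real AES-GCM. */
static void
process_block(uint8_t *blk, size_t n)
{
	for (size_t i = 0; i < n; i++)
		blk[i] ^= 0xa5;
}

/* Old pattern: the FPU is acquired twice for every 16-byte block. */
static void
process_per_block(uint8_t *data, size_t len)
{
	for (size_t off = 0; off < len; off += GCM_BLOCK_SIZE) {
		size_t n = len - off < GCM_BLOCK_SIZE ?
		    len - off : GCM_BLOCK_SIZE;

		fpu_begin();		/* counter block encryption */
		process_block(data + off, n);
		fpu_end();

		fpu_begin();		/* Galois multiplication (omitted) */
		fpu_end();
	}
}

/* New pattern: the FPU is acquired once per chunk of many blocks. */
static void
process_chunked(uint8_t *data, size_t len, size_t chunk_size)
{
	assert(chunk_size > 0);		/* a zero chunk would never advance */

	for (size_t off = 0; off < len; off += chunk_size) {
		size_t n = len - off < chunk_size ? len - off : chunk_size;

		fpu_begin();
		for (size_t b = 0; b < n; b += GCM_BLOCK_SIZE) {
			size_t m = n - b < GCM_BLOCK_SIZE ?
			    n - b : GCM_BLOCK_SIZE;
			process_block(data + off + b, m);
		}
		fpu_end();
	}
}
```

In this model, a 128 KiB buffer costs 16,384 FPU sections with the per-block pattern (8,192 blocks, two sections each) but only 4 with a 32 KiB chunk size. The chunk size is a trade-off: larger chunks amortize the save/restore cost better, while preemption and interrupts stay disabled for longer, which is presumably why the commit makes it tunable.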
Showing with 2,397 additions and 21 deletions.
- +20 −0 config/toolchain-simd.m4
- +13 −0 include/os/linux/kernel/linux/simd_x86.h
- +2 −0 lib/libicp/Makefile.am
- +14 −1 lib/libspl/include/sys/simd.h
- +2 −0 module/icp/Makefile.in
- +708 −18 module/icp/algs/modes/gcm.c
- +892 −0 module/icp/asm-x86_64/modes/aesni-gcm-x86_64.S
- +714 −0 module/icp/asm-x86_64/modes/ghash-x86_64.S
- +5 −0 module/icp/include/aes/aes_impl.h
- +27 −2 module/icp/include/modes/modes.h