cgemm and zgemm subroutines for large matrices, using avx2 and fma3 instructions, with performance comparable to MKL2018

wjc404/COMPLEX_GEMM_AVX2_FMA3

COMPLEX_GEMM_AVX2_FMA3

Heavily optimized cgemm and zgemm subroutines for large matrices (dimensions 3000-30000), using AVX2 and FMA3 instructions. Performance is comparable to MKL 2018, reaching more than 95% of theoretical peak performance in serial execution.
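
As a rough sketch of how a figure like ">95% of theoretical performance" can be checked, the helpers below compute the real flop count of a complex GEMM and the single-core peak. The function names are hypothetical and not part of this repository. The peak assumes two 256-bit FMA units per core, and the clock frequency must be supplied by the caller.

```c
#include <assert.h>

/* Real floating-point operations in one complex GEMM (C = A*B):
   each complex multiply-add costs 4 real multiplies and 4 real adds. */
double complex_gemm_flops(double m, double n, double k)
{
    assert(m >= 0 && n >= 0 && k >= 0);
    return 8.0 * m * n * k;
}

/* Single-core peak in GFLOPS.
   Assumption: 2 FMA units per core, each 256 bits wide.
   lanes = 4 for double precision (zgemm), 8 for single precision (cgemm).
   Each FMA counts as 2 flops per lane. */
double peak_gflops(double clock_ghz, int lanes)
{
    assert(lanes == 4 || lanes == 8);
    return clock_ghz * 2.0 * lanes * 2.0;
}

/* Fraction of peak reached by a run that took `seconds`. */
double efficiency(double m, double n, double k, double seconds, double peak)
{
    assert(seconds > 0 && peak > 0);
    return complex_gemm_flops(m, n, k) / (seconds * 1e9) / peak;
}
```

The clock passed to `peak_gflops` should be the frequency the core actually sustains under AVX2 load, otherwise the ratio is not meaningful.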

Interface: Fortran, 32-bit integers.
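
To illustrate what a Fortran-style interface with 32-bit integers looks like from C, here is a minimal reference routine. It assumes the standard BLAS ZGEMM argument list: every argument is passed by pointer and matrices are stored column-major. `ref_zgemm` is a hypothetical name for a naive loop, not this repository's kernel. The symbol names the library actually exports are not stated here, so check the sources before linking.

```c
#include <assert.h>
#include <complex.h>
#include <stddef.h>
#include <stdint.h>

/* Naive C = alpha*A*B + beta*C with a Fortran-style signature:
   all arguments by pointer, 32-bit integer dimensions, column-major storage.
   Only the non-transposed case ('N', 'N') is handled in this sketch. */
void ref_zgemm(const char *transa, const char *transb,
               const int32_t *m, const int32_t *n, const int32_t *k,
               const double complex *alpha,
               const double complex *a, const int32_t *lda,
               const double complex *b, const int32_t *ldb,
               const double complex *beta,
               double complex *c, const int32_t *ldc)
{
    assert(*transa == 'N' || *transa == 'n');
    assert(*transb == 'N' || *transb == 'n');
    assert(*lda >= *m && *ldb >= *k && *ldc >= *m);

    for (int32_t j = 0; j < *n; ++j) {
        for (int32_t i = 0; i < *m; ++i) {
            double complex acc = 0.0;
            /* Element (i, p) of a column-major matrix is a[i + p*lda]. */
            for (int32_t p = 0; p < *k; ++p)
                acc += a[i + (size_t)p * *lda] * b[p + (size_t)j * *ldb];
            double complex *cij = &c[i + (size_t)j * *ldc];
            /* When beta is zero, C is overwritten without being read. */
            *cij = (*beta == 0.0) ? *alpha * acc
                                  : *alpha * acc + *beta * *cij;
        }
    }
}
```

A routine like this is mainly useful for checking results on small matrices. With 32-bit integers, each dimension and leading dimension must fit in `int32_t`.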

Tuned parameters (see Makefile):

Core i9 9900K: BlkDimN = 192, B_PR_ELEM = 40, A_PR_BYTE = 192 or 256; BlkDimK: 128 for ZGEMM, 256 for CGEMM.
Ryzen 7 3700X: BlkDimN = 96, B_PR_ELEM = 24, A_PR_BYTE = 256; BlkDimK: 128 for ZGEMM, 256 for CGEMM.
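
One way to read these values is in terms of block footprint. The sketch below assumes BlkDimK and BlkDimN are the dimensions, in elements, of one blocked panel. That interpretation is an assumption, and `block_bytes` is a hypothetical helper. Under it, halving BlkDimK for ZGEMM (16-byte elements) relative to CGEMM (8-byte elements) keeps the panel size in bytes the same.

```c
#include <assert.h>
#include <stddef.h>

/* Size in bytes of a blk_dim_k x blk_dim_n panel.
   elem_bytes is 8 for single-precision complex (cgemm)
   and 16 for double-precision complex (zgemm). */
size_t block_bytes(size_t blk_dim_k, size_t blk_dim_n, size_t elem_bytes)
{
    assert(elem_bytes == 8 || elem_bytes == 16);
    return blk_dim_k * blk_dim_n * elem_bytes;
}
```

With the Core i9 9900K values, `block_bytes(128, 192, 16)` and `block_bytes(256, 192, 8)` both give 393216 bytes (384 KiB). With the Ryzen 7 3700X values, both give 196608 bytes (192 KiB).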
