Code Generator for Tall & Skinny Matrix Multiplications in CUDA
This is the code used to produce the results described in the papers "Performance Engineering for a Tall & Skinny Matrix Multiplication Kernel on GPUs" (http://dx.doi.org/10.1007/978-3-030-43229-4_43) and "Performance engineering for real and complex tall & skinny matrix multiplication kernels on GPUs" (http://dx.doi.org/10.1177/1094342020965661).