Optimization of attention layers for efficient inference on the CPU and GPU. Covers optimizations for AVX and CUDA, as well as efficient memory-access techniques.
deep-learning
hpc
transformers
avx
avx2
cuda-kernels
attention-is-all-you-need
cuda-programming
openmp
Updated Dec 18, 2023 - C++
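The memory-access side of the description can be illustrated with a small sketch. This is not the repository's actual code; the function name and the tile size are assumptions. It computes the attention score matrix S = Q·Kᵀ with loop tiling, so a block of K rows stays resident in cache while every row of Q reuses it, rather than streaming all of K through cache once per Q row.

```cpp
// Illustrative sketch only (hypothetical helper, not from the repo):
// tiled computation of attention scores S = Q * K^T.
// Q and K are row-major [n x d]; S is row-major [n x n].
#include <algorithm>
#include <cstddef>

void attention_scores_tiled(const float* Q, const float* K, float* S,
                            std::size_t n, std::size_t d) {
    const std::size_t TILE = 32;  // tile edge; assumed value, tune to L1/L2 size

    // Iterate over tiles of K rows; each tile is reused by every Q row
    // before the next tile is loaded, improving cache locality.
    for (std::size_t jt = 0; jt < n; jt += TILE) {
        const std::size_t jend = std::min(jt + TILE, n);
        for (std::size_t i = 0; i < n; ++i) {
            for (std::size_t j = jt; j < jend; ++j) {
                float acc = 0.0f;
                // Dot product of Q row i with K row j (rows of K, since
                // Q * K^T pairs row i of Q with row j of K).
                for (std::size_t k = 0; k < d; ++k)
                    acc += Q[i * d + k] * K[j * d + k];
                S[i * n + j] = acc;
            }
        }
    }
}
```

The same loop structure is the usual starting point for the AVX and OpenMP variants the topics mention: the inner dot product vectorizes with AVX2 FMA intrinsics, and the outer row loop parallelizes with `#pragma omp parallel for`.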