Fast inference engine for Transformer models (C++, updated Jul 11, 2024)
Specialized Parallel Linear Algebra, providing distributed GEMM functionality for specific matrix distributions with optional GPU acceleration.
Tuned OpenCL BLAS
OpenMP Matrix Multiplication Offloading Playground
Development of deep learning inference code using OpenCL kernel functions.
DGEMM on KNL, achieving 75% of MKL performance
CUDA Gemm Convolution implementation
Coursework for Programming on New Architectures 1 (GPU), autumn 2021
Manually optimize the GEMM (GEneral Matrix Multiply) operation. There is a long way to go.
Serial and parallel implementations of matrix multiplication
My experiments with convolution
Low Precision Arithmetic for Convolutional Neural Network Inference