Several optimization methods for half-precision general matrix multiplication (HGEMM) on Tensor Cores, using the WMMA API and MMA PTX instructions.
Updated Nov 7, 2023 (CUDA)
Algorithms implemented in CUDA + resources about GPGPU
Code for benchmarking GPU performance based on cublasSgemm and cublasHgemm
Companion code for the bilibili video course "Introduction to CUDA 12.1 Parallel Programming (C++ Edition)"
A GPU-accelerated Unscented Kalman Filter (UKF) implementation for fusing visual odometry and IMU data. Developed in C++ with CUDA, cuBLAS, and cuSOLVER, it targets real-time state and covariance estimation for robotics and autonomous systems.
Uses Tensor Cores to compute back-to-back HGEMM (half-precision general matrix multiplication) with the MMA PTX instruction.
Lab exercise on CUDA programming for the Parallel Processing course at NTUA
CUDA kernel functions
Matrix Exponential Approximation using CUDA
Generalized Orthogonal Least-Squares in CUDA
Nonnegative matrix factorizations using CUDA
GPGPU Inverse Distance Weighting using matrix vector multiplication
Newton's and Halley's Method for the Matrix Polar Decomposition using CUDA
An MNIST handwritten digit classifier written from scratch in CUDA C
A CUDA approach for computing the product of a matrix's transpose with the original matrix, using the cuBLAS library.