Matrix Multiplication on GPU using Shared Memory considering Coalescing and Bank Conflicts
Updated Aug 29, 2022 - Cuda
A simple and understandable CUDA kernel for batch-matmul operation
A simple but fast implementation of matrix multiplication in CUDA.
Code for sparse matrix-vector multiplication, parallelised using CUDA and MPI.
High performance computing with GPU
CUDA kernel functions
Bellman-Ford Algorithm and Matrix Multiplication in CUDA
Classical and Strassen's Matrix Multiplication in CUDA and OpenMP
This repo contains a program for matrix multiplication using CUDA.
Implementations of the SGEMM algorithm on NVIDIA GPUs, using different tricks to optimize performance.
Level 3 BLAS matrix multiplication using both cuBLAS and MKL.
A CUBLAS‐CUDA Based Implementation of Multi-GPU Large Matrix Multiplication
Simulating the application of a single gate to a single qubit in an n-qubit circuit with a CUDA kernel.
Boolean matrix multiplication accelerated by the four-Russians algorithm