Fast multi-threaded matrix multiplication in C.
Updated Jul 9, 2024 - C
XNOR-Net with binary conv2d kernels built on an XNOR GEMM op; supports both CPU and GPU.
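For readers unfamiliar with the idea behind an XNOR GEMM op, the following is a minimal CPU sketch, not the code of the repository above. It assumes matrix entries are restricted to +1/-1, packs them 32 per word, and replaces each multiply-accumulate with an XNOR followed by a bit count. The names `popcount32`, `pack_signs`, and `xnor_gemm` are hypothetical, and the second operand is assumed to be stored transposed so that both operands are read row by row.

```c
#include <stddef.h>
#include <stdint.h>

/* Count the set bits in a 32-bit word (portable fallback; many compilers
   offer a faster builtin). */
static int popcount32(uint32_t x) {
    int n = 0;
    while (x) {
        x &= x - 1; /* clear the lowest set bit */
        n++;
    }
    return n;
}

/* Pack `len` values of +1/-1 into bits: +1 becomes 1, -1 becomes 0.
   Assumption: `len` is a multiple of 32. */
static void pack_signs(const int8_t *src, uint32_t *dst, size_t len) {
    for (size_t w = 0; w < len / 32; w++) {
        uint32_t word = 0;
        for (size_t b = 0; b < 32; b++) {
            if (src[w * 32 + b] > 0) {
                word |= (uint32_t)1 << b;
            }
        }
        dst[w] = word;
    }
}

/* C (M x N) = A * B for +1/-1 matrices.
   A  is packed row-major, M rows of k_words words.
   Bt is B transposed and packed, N rows of k_words words.
   For each pair of rows, XNOR marks the positions where the signs agree,
   so the dot product is (matches - mismatches) = 2 * matches - K. */
static void xnor_gemm(const uint32_t *A, const uint32_t *Bt, int32_t *C,
                      size_t M, size_t N, size_t k_words) {
    for (size_t i = 0; i < M; i++) {
        for (size_t j = 0; j < N; j++) {
            int32_t matches = 0;
            for (size_t w = 0; w < k_words; w++) {
                matches += popcount32(~(A[i * k_words + w] ^ Bt[j * k_words + w]));
            }
            C[i * N + j] = 2 * matches - (int32_t)(32 * k_words);
        }
    }
}
```

Under these assumptions, one 32-bit XNOR plus a bit count stands in for 32 multiply-adds, which is where the speedup of binary kernels comes from.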
A lightweight matrix computation library aimed at MCUs and embedded systems.
BLISlab: A Sandbox for Optimizing GEMM
Fast matrix multiplication implementation in C. The algorithm is similar to what NumPy uses to compute dot products.
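As a rough illustration of one technique fast GEMM implementations commonly rely on, here is a minimal sketch of cache blocking (loop tiling) in C. It is not taken from any repository listed here; the function name `gemm_blocked` and the tile size `BLOCK` are assumptions chosen for the example, and real implementations typically add packing, vectorized micro-kernels, and threading on top.

```c
#include <stddef.h>

#define BLOCK 64 /* tile edge length; an assumed value, tune per cache size */

static size_t min_sz(size_t a, size_t b) { return a < b ? a : b; }

/* C += A * B for row-major float matrices:
   A is M x K, B is K x N, C is M x N.
   The three outer loops walk over tiles so that the data touched by the
   inner loops is small enough to stay in cache while it is reused. */
static void gemm_blocked(const float *A, const float *B, float *C,
                         size_t M, size_t N, size_t K) {
    for (size_t i0 = 0; i0 < M; i0 += BLOCK) {
        for (size_t k0 = 0; k0 < K; k0 += BLOCK) {
            for (size_t j0 = 0; j0 < N; j0 += BLOCK) {
                size_t i1 = min_sz(i0 + BLOCK, M);
                size_t k1 = min_sz(k0 + BLOCK, K);
                size_t j1 = min_sz(j0 + BLOCK, N);
                for (size_t i = i0; i < i1; i++) {
                    for (size_t k = k0; k < k1; k++) {
                        float a = A[i * K + k]; /* reused across the j loop */
                        for (size_t j = j0; j < j1; j++) {
                            C[i * N + j] += a * B[k * N + j];
                        }
                    }
                }
            }
        }
    }
}
```

The caller is expected to zero `C` first if a plain product rather than an accumulation is wanted. The i-k-j ordering of the inner loops keeps the accesses to `B` and `C` contiguous in memory.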
This repository targets performance optimization of the OpenCL gemm function. It compares the sgemm performance of several libraries (clBLAS, clBLAST, MIOpenGemm, Intel MKL (CPU), and cuBLAS (CUDA)) across different matrix sizes, vendors' hardware, and operating systems. Prebuilt x86_64 binaries for MSVC, MinGW, and Linux (CentOS) are provided and work out of the box.