Fast inference engine for Transformer models
Tuned OpenCL BLAS
Serial and parallel implementations of matrix multiplication
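For context on entries like this one, the core computation is C = alpha*A*B + beta*C. Below is a minimal illustrative sketch (not taken from any repository listed here) of a serial triple loop and an OpenMP-parallel counterpart; parallelizing over rows of C is the simplest correct strategy because each row is computed independently.

```cpp
#include <cstddef>

// Naive serial GEMM: C = alpha * A * B + beta * C.
// A is m x k, B is k x n, C is m x n; all row-major and contiguous.
void gemm_serial(std::size_t m, std::size_t n, std::size_t k,
                 float alpha, const float* A, const float* B,
                 float beta, float* C) {
    for (std::size_t i = 0; i < m; ++i)
        for (std::size_t j = 0; j < n; ++j) {
            float acc = 0.0f;
            for (std::size_t p = 0; p < k; ++p)
                acc += A[i * k + p] * B[p * n + j];
            C[i * n + j] = alpha * acc + beta * C[i * n + j];
        }
}

// Parallel variant: rows of C are independent, so the outer loop can be
// distributed across threads with one pragma (compile with -fopenmp).
void gemm_parallel(std::size_t m, std::size_t n, std::size_t k,
                   float alpha, const float* A, const float* B,
                   float beta, float* C) {
    #pragma omp parallel for
    for (std::size_t i = 0; i < m; ++i)
        for (std::size_t j = 0; j < n; ++j) {
            float acc = 0.0f;
            for (std::size_t p = 0; p < k; ++p)
                acc += A[i * k + p] * B[p * n + j];
            C[i * n + j] = alpha * acc + beta * C[i * n + j];
        }
}
```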
Specialized Parallel Linear Algebra, providing distributed GEMM functionality for specific matrix distributions with optional GPU acceleration.
DGEMM on KNL, achieving 75% of MKL performance
Manually optimized GEMM (GEneral Matrix Multiply) operation; there is still a long way to go.
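The usual first step in hand-optimizing GEMM is cache blocking, so tiles of A, B, and C are reused while resident in cache; register tiling and vectorization come after. A sketch of the blocking idea only (the tile size here is an illustrative guess, not a tuned value, and this is not code from the repository above):

```cpp
#include <algorithm>
#include <cstddef>

// Tile edge, chosen so three BS x BS tiles fit comfortably in cache;
// 64 is an illustrative placeholder, not a tuned constant.
constexpr std::size_t BS = 64;

// Cache-blocked GEMM sketch: C += A * B, all row-major.
void gemm_blocked(std::size_t m, std::size_t n, std::size_t k,
                  const float* A, const float* B, float* C) {
    for (std::size_t ii = 0; ii < m; ii += BS)
        for (std::size_t pp = 0; pp < k; pp += BS)
            for (std::size_t jj = 0; jj < n; jj += BS)
                // Multiply one tile pair; the i-p-j inner order keeps
                // rows of B streaming and the current row of C in cache.
                for (std::size_t i = ii; i < std::min(ii + BS, m); ++i)
                    for (std::size_t p = pp; p < std::min(pp + BS, k); ++p) {
                        const float a = A[i * k + p];
                        for (std::size_t j = jj; j < std::min(jj + BS, n); ++j)
                            C[i * n + j] += a * B[p * n + j];
                    }
}
```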
CUDA GEMM convolution implementation
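GEMM-based convolution typically lowers the input with an im2col transform so that the convolution becomes one large matrix multiply. A simplified host-side sketch of that transform for a single channel, stride 1, no padding (a hypothetical helper for illustration, not code from the repository above):

```cpp
#include <cstddef>
#include <vector>

// im2col for one H x W channel and an R x S filter, stride 1, no padding.
// Each output pixel's receptive field becomes one column, so convolution
// reduces to a (1 x R*S) times (R*S x OH*OW) GEMM.
std::vector<float> im2col(const float* img, std::size_t H, std::size_t W,
                          std::size_t R, std::size_t S) {
    const std::size_t OH = H - R + 1, OW = W - S + 1;
    std::vector<float> cols(R * S * OH * OW);
    for (std::size_t r = 0; r < R; ++r)
        for (std::size_t s = 0; s < S; ++s)
            for (std::size_t y = 0; y < OH; ++y)
                for (std::size_t x = 0; x < OW; ++x)
                    cols[(r * S + s) * OH * OW + y * OW + x] =
                        img[(y + r) * W + (x + s)];
    return cols;
}
```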
Development of deep learning inference code using OpenCL kernel functions.
Multiple GEMM operators built with CUTLASS to support LLM inference.
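For orientation, CUTLASS exposes GEMM as a device-level template instantiated with element types and layouts; the sketch below follows the pattern of the CUTLASS 2.x quickstart. The float types, row-major layouts, and the assumption that A, B, and C are preallocated device pointers are illustrative choices, not details of the repository above (compile with nvcc).

```cpp
#include <cutlass/gemm/device/gemm.h>

// Single-precision GEMM built from CUTLASS's device-level template.
using Gemm = cutlass::gemm::device::Gemm<
    float, cutlass::layout::RowMajor,   // A
    float, cutlass::layout::RowMajor,   // B
    float, cutlass::layout::RowMajor>;  // C

// A, B, C are device pointers; lda/ldb/ldc are leading dimensions.
cutlass::Status run_gemm(int M, int N, int K, float alpha,
                         const float* A, int lda,
                         const float* B, int ldb,
                         float beta, float* C, int ldc) {
    Gemm gemm_op;
    return gemm_op({{M, N, K},       // problem size
                    {A, lda},        // tensor ref for A
                    {B, ldb},        // tensor ref for B
                    {C, ldc},        // source C (beta accumulation)
                    {C, ldc},        // destination D
                    {alpha, beta}}); // epilogue scalars
}
```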
Low Precision Arithmetic for Convolutional Neural Network Inference
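Low-precision inference commonly maps to a GEMM with int8 operands and int32 accumulation, followed by a separate requantization step back to int8. A minimal sketch of the accumulation core only (symmetric quantization assumed; requantization omitted; not code from the repository above):

```cpp
#include <cstddef>
#include <cstdint>

// Quantized GEMM core in the style used for int8 CNN inference:
// int8 operands with int32 accumulation to avoid overflow. A later
// requantization step (not shown) maps the int32 result back to int8.
void gemm_s8s8s32(std::size_t m, std::size_t n, std::size_t k,
                  const std::int8_t* A, const std::int8_t* B,
                  std::int32_t* C) {
    for (std::size_t i = 0; i < m; ++i)
        for (std::size_t j = 0; j < n; ++j) {
            std::int32_t acc = 0;
            for (std::size_t p = 0; p < k; ++p)
                acc += static_cast<std::int32_t>(A[i * k + p]) *
                       static_cast<std::int32_t>(B[p * n + j]);
            C[i * n + j] = acc;
        }
}
```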
My experiments with convolution
OpenMP Matrix Multiplication Offloading Playground
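OpenMP offloading moves the same triple loop onto an accelerator with target directives. A minimal sketch, assuming a compiler built with offloading support (e.g. clang or nvc++); without it, the loop simply runs on the host:

```cpp
#include <cstddef>

// Matrix multiply offloaded via OpenMP target directives: data is
// mapped to the device, and the two outer loops are collapsed and
// distributed across teams of device threads.
void matmul_offload(std::size_t m, std::size_t n, std::size_t k,
                    const float* A, const float* B, float* C) {
    #pragma omp target teams distribute parallel for collapse(2) \
        map(to: A[0:m*k], B[0:k*n]) map(tofrom: C[0:m*n])
    for (std::size_t i = 0; i < m; ++i)
        for (std::size_t j = 0; j < n; ++j) {
            float acc = 0.0f;
            for (std::size_t p = 0; p < k; ++p)
                acc += A[i * k + p] * B[p * n + j];
            C[i * n + j] = acc;
        }
}
```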
Coursework for Programming on New Architecture 1 (GPU), autumn 2021