Kernel

CUDA

SGEMM

✅ Naive

✅ Thread Tiling

✅ Thread Tiling (Bank-Conflict Free)

SGEMV

✅ Naive

✅ Warp Reduce

Reduce

Transpose

Sort

✅ MergeSort

Softmax

✅ Naive

✅ Warp Reduce

Triton

HGEMM

✅ Block Tiling

GEMV

Reduce

Transpose

Sort

Softmax
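The "Warp Reduce" variants listed above for SGEMV and Softmax are not described further in this README. As a rough illustration only, the sketch below emulates the usual shuffle-down tree reduction across one 32-lane warp in plain Python. The helper names (`shfl_down`, `warp_reduce_sum`, `sgemv_row`) are hypothetical and do not come from this repository, and the one-warp-per-row layout is an assumption about how such a kernel is commonly organized, not a statement about the code here.

```python
WARP_SIZE = 32  # lanes per warp on current NVIDIA GPUs


def shfl_down(values, offset):
    """Emulate a shuffle-down: lane i reads the value held by lane i + offset.

    Lanes whose source would fall outside the warp keep their own value.
    """
    n = len(values)
    return [values[i + offset] if i + offset < n else values[i] for i in range(n)]


def warp_reduce_sum(values):
    """Tree-reduce one warp's values; the full sum ends up in lane 0."""
    assert len(values) == WARP_SIZE
    lanes = list(values)
    offset = WARP_SIZE // 2
    while offset > 0:
        shifted = shfl_down(lanes, offset)
        lanes = [a + b for a, b in zip(lanes, shifted)]
        offset //= 2
    return lanes[0]  # only lane 0 is guaranteed to hold the total


def sgemv_row(row, x):
    """Assumed layout: one warp per matrix row, each lane sums a strided slice."""
    partial = [
        sum(row[j] * x[j] for j in range(lane, len(row), WARP_SIZE))
        for lane in range(WARP_SIZE)
    ]
    return warp_reduce_sum(partial)


if __name__ == "__main__":
    print(warp_reduce_sum(list(range(WARP_SIZE))))
    print(sgemv_row([1] * 64, [2] * 64))
```

The reduction takes five steps (offsets 16, 8, 4, 2, 1) and needs no shared memory, which is the usual motivation for preferring it over a naive per-thread loop.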

Build

python3 ./script.py {kernelName}

Dependencies

NVIDIA GPU
OpenAI Triton >= 2.0
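
To confirm the Triton requirement before building, a small check along these lines may help. It is a sketch, not part of this repository: the helper names `parse_major_minor` and `triton_is_recent_enough` are made up for illustration, and it assumes Triton was installed as the `triton` package from PyPI.

```python
import re
from importlib import metadata

MIN_TRITON = (2, 0)  # minimum version stated in the dependency list


def parse_major_minor(text):
    """Return the leading (major, minor) numbers of a version string."""
    nums = []
    for piece in text.split(".")[:2]:
        match = re.match(r"\d+", piece)
        nums.append(int(match.group()) if match else 0)
    return tuple(nums + [0] * (2 - len(nums)))


def triton_is_recent_enough(installed=None):
    """Check a version string, or the installed `triton` package if none is given."""
    if installed is None:
        try:
            installed = metadata.version("triton")
        except metadata.PackageNotFoundError:
            return False  # Triton is not installed at all
    return parse_major_minor(installed) >= MIN_TRITON


if __name__ == "__main__":
    print("Triton OK" if triton_is_recent_enough() else "Triton >= 2.0 not found")
```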

Paran0idy/Kernel
