elementwise

Here are 2 public repositories matching this topic...

DefTruth / CUDA-Learn-Notes

🎉 Modern CUDA Learn Notes with PyTorch: fp32, fp16, bf16, fp8/int8, flash_attn, sgemm, sgemv, warp/block reduce, dot, elementwise, softmax, layernorm, rmsnorm.

cuda pytorch triton gemm softmax cuda-programming layernorm gemv elementwise rmsnorm flash-attention flash-attention-2 warp-reduce block-reduce flash-attention-3

Updated Oct 14, 2024
Cuda

Liu-xiandong / How_to_optimize_in_GPU


This is a series of GPU optimization topics that explains in detail how to optimize CUDA kernels. It covers several basic kernels, including elementwise, reduce, sgemv, and sgemm, and the performance of these kernels is at or near the theoretical limit.

hpc reduce high-performance-computing gpu-acceleration sgemm elementwise sgemv

Updated Jul 29, 2023
Cuda
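For readers new to the topic, here is a minimal sketch of the simplest elementwise kernel, c[i] = a[i] + b[i]. It is not taken from either repository. The names `elementwise_add` and `add_on_gpu` are hypothetical, and the code assumes the CUDA toolkit (nvcc) and a CUDA-capable GPU. Error checking on the CUDA API calls is omitted for brevity.

```cuda
#include <cassert>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical elementwise kernel: each thread computes c[i] = a[i] + b[i].
// The grid-stride loop lets a launch of any size cover all n elements.
__global__ void elementwise_add(const float* a, const float* b, float* c, int n) {
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n;
         i += blockDim.x * gridDim.x) {
        c[i] = a[i] + b[i];
    }
}

// Hypothetical host helper: copies inputs to the device, launches the kernel,
// and copies the result back. CUDA error codes are not checked here.
std::vector<float> add_on_gpu(const std::vector<float>& a, const std::vector<float>& b) {
    assert(a.size() == b.size());
    int n = static_cast<int>(a.size());
    size_t bytes = static_cast<size_t>(n) * sizeof(float);

    float *d_a = nullptr, *d_b = nullptr, *d_c = nullptr;
    cudaMalloc(&d_a, bytes);
    cudaMalloc(&d_b, bytes);
    cudaMalloc(&d_c, bytes);
    cudaMemcpy(d_a, a.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, b.data(), bytes, cudaMemcpyHostToDevice);

    const int threads = 256;                       // threads per block
    int blocks = (n + threads - 1) / threads;      // enough blocks to cover n
    if (blocks == 0) blocks = 1;                   // avoid an empty grid when n == 0
    elementwise_add<<<blocks, threads>>>(d_a, d_b, d_c, n);

    std::vector<float> c(n);
    // cudaMemcpy on the default stream waits for the kernel to finish.
    cudaMemcpy(c.data(), d_c, bytes, cudaMemcpyDeviceToHost);

    cudaFree(d_a);
    cudaFree(d_b);
    cudaFree(d_c);
    return c;
}
```

Elementwise kernels like this are memory-bound, so optimized versions usually focus on memory throughput, for example by loading several elements per thread with vector types such as `float4`.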
