@Dao-AILab

Dao AI Lab

We are an AI research group led by Prof. Tri Dao

Popular repositories

  1. flash-attention (Public)

     Fast and memory-efficient exact attention

     Python · 17.8k stars · 1.7k forks

  2. causal-conv1d (Public)

     Causal depthwise conv1d in CUDA, with a PyTorch interface

     Cuda · 488 stars · 104 forks

  3. fast-hadamard-transform (Public)

     Fast Hadamard transform in CUDA, with a PyTorch interface

     C · 198 stars · 27 forks

  4. grouped-latent-attention (Public)

     Python · 112 stars · 2 forks

  5. gemm-cublas (Public)

     Python · 21 stars

  6. cutlass (Public, forked from NVIDIA/cutlass)

     CUDA Templates for Linear Algebra Subroutines

     C++ · 1 star
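flash-attention's one-line description, "fast and memory-efficient exact attention," refers to computing exact softmax attention without ever materializing the full score matrix, by streaming over blocks of keys/values with a running (online) softmax. Below is a minimal pure-Python sketch of that idea for a single query; all names are illustrative and this is not the library's API or its CUDA kernel, just the arithmetic the trick rests on.

```python
import math

def attention_online_softmax(q, k, v, block=2):
    """Exact attention for one query q over keys k and values v,
    computed block-by-block with a running softmax, so only O(d)
    state is kept instead of the full n-length score vector.
    q: list[float] of dim d; k, v: lists of n vectors of dim d."""
    d = len(q)
    scale = 1.0 / math.sqrt(d)
    m = float("-inf")   # running max of scores (numerical stability)
    l = 0.0             # running sum of exp(score - m)
    acc = [0.0] * d     # running exp-weighted sum of values
    for start in range(0, len(k), block):
        for key, val in zip(k[start:start + block], v[start:start + block]):
            s = sum(qi * ki for qi, ki in zip(q, key)) * scale
            m_new = max(m, s)
            corr = math.exp(m - m_new)       # rescale old state to new max
            w = math.exp(s - m_new)          # weight of the new score
            l = l * corr + w
            acc = [a * corr + w * vi for a, vi in zip(acc, val)]
            m = m_new
    return [a / l for a in acc]              # softmax(scores) @ v
```

Because each block's contribution is rescaled as the running max updates, the result is bit-for-bit the same quantity as the naive two-pass softmax attention, which is what makes the memory saving "exact" rather than approximate.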

Repositories

Showing 6 of 6 repositories
  • flash-attention (Public)

    Fast and memory-efficient exact attention

    Python · 17,833 stars · BSD-3-Clause · 1,742 forks · 753 open issues · 63 open PRs · updated Jun 15, 2025
  • cutlass (Public, forked from NVIDIA/cutlass)

    CUDA Templates for Linear Algebra Subroutines

    C++ · 1 star · 1,278 forks · 0 open issues · 0 open PRs · updated Jun 9, 2025
  • grouped-latent-attention (Public)

    Python · 112 stars · MIT · 2 forks · 1 open issue · 0 open PRs · updated May 29, 2025
  • causal-conv1d (Public)

    Causal depthwise conv1d in CUDA, with a PyTorch interface

    Cuda · 488 stars · BSD-3-Clause · 104 forks · 24 open issues · 12 open PRs · updated May 26, 2025
  • gemm-cublas (Public)

    Python · 21 stars · Apache-2.0 · 0 forks · 0 open issues · 0 open PRs · updated May 5, 2025
  • fast-hadamard-transform (Public)

    Fast Hadamard transform in CUDA, with a PyTorch interface

    C · 198 stars · BSD-3-Clause · 27 forks · 7 open issues · 2 open PRs · updated May 24, 2024
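causal-conv1d above implements a causal depthwise 1-D convolution in CUDA: each channel is convolved with its own filter, and the output at time t depends only on inputs at times ≤ t. A minimal pure-Python reference of that operation (not the repo's kernel or PyTorch interface; names and the oldest-tap-first weight layout are assumptions for illustration):

```python
def causal_depthwise_conv1d(x, weight, bias=None):
    """x: list of channels, each a list of T floats.
    weight: one filter per channel, taps ordered oldest-first,
    so weight[c][-1] multiplies the current input x[c][t].
    Left-pads with zeros so output[t] only sees inputs <= t."""
    out = []
    for c, (xc, wc) in enumerate(zip(x, weight)):
        k = len(wc)
        padded = [0.0] * (k - 1) + xc        # zero left-padding => causality
        yc = []
        for t in range(len(xc)):
            acc = sum(wc[j] * padded[t + j] for j in range(k))
            if bias is not None:
                acc += bias[c]
            yc.append(acc)
        out.append(yc)
    return out
```

This is the same computation as a grouped `conv1d` with `groups == channels` and left padding of `width - 1`, which is the usual way the depthwise-causal pattern is expressed in PyTorch.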
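fast-hadamard-transform computes the Walsh-Hadamard transform, which multiplies a length-2^k vector by the Hadamard matrix in O(n log n) using butterflies instead of O(n²) by direct matrix multiply. A pure-Python sketch of that butterfly recursion, unnormalized (H₁ = [[1, 1], [1, −1]]); the CUDA kernel's scaling convention may differ, and the function name here is illustrative:

```python
def hadamard_transform(x):
    """Fast Walsh-Hadamard transform of a length-2^k sequence.
    Each pass combines pairs at stride h: (a, b) -> (a + b, a - b)."""
    y = list(x)
    n = len(y)
    assert n and (n & (n - 1)) == 0, "length must be a power of two"
    h = 1
    while h < n:
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                a, b = y[i], y[i + h]
                y[i], y[i + h] = a + b, a - b
        h *= 2
    return y
```

In this unnormalized convention the transform is its own inverse up to a factor of n: applying it twice returns n times the original vector.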
