@Dao-AILab

Dao AI Lab

We are an AI research group led by Prof. Tri Dao

Popular repositories

  1. flash-attention Public

    Fast and memory-efficient exact attention

    Python · 18.7k stars · 1.9k forks

  2. causal-conv1d Public

    Causal depthwise conv1d in CUDA, with a PyTorch interface

    Cuda · 538 stars · 117 forks

  3. quack Public

    A Quirky Assortment of CuTe Kernels

    Python · 388 stars · 30 forks

  4. fast-hadamard-transform Public

    Fast Hadamard transform in CUDA, with a PyTorch interface

    C · 214 stars · 31 forks

  5. grouped-latent-attention Public

    Python · 123 stars · 2 forks

  6. gemm-cublas Public

    Python · 22 stars

Repositories

Showing 7 of 7 repositories
  • quack Public

    A Quirky Assortment of CuTe Kernels

    Python · 388 stars · Apache-2.0 · 30 forks · 6 issues · 0 PRs · Updated Aug 6, 2025
  • flash-attention Public

    Fast and memory-efficient exact attention (a reference sketch of the computation appears after this list)

    Python · 18,735 stars · BSD-3-Clause · 1,867 forks · 791 issues · 75 PRs · Updated Aug 6, 2025
  • causal-conv1d Public

    Causal depthwise conv1d in CUDA, with a PyTorch interface (a reference sketch of the operation appears after this list)

    Cuda · 538 stars · BSD-3-Clause · 117 forks · 27 issues · 10 PRs · Updated Jul 18, 2025
  • cutlass Public Forked from NVIDIA/cutlass

    CUDA Templates for Linear Algebra Subroutines

    C++ · 1 star · 1,374 forks · 0 issues · 0 PRs · Updated Jun 8, 2025
  • grouped-latent-attention Public

    Python · 123 stars · MIT · 2 forks · 4 issues · 0 PRs · Updated May 29, 2025
  • gemm-cublas Public
    Python · 22 stars · Apache-2.0 · 0 forks · 0 issues · 0 PRs · Updated May 5, 2025
  • fast-hadamard-transform Public

    Fast Hadamard transform in CUDA, with a PyTorch interface (a reference sketch of the transform appears after this list)

    C · 214 stars · BSD-3-Clause · 31 forks · 7 issues · 3 PRs · Updated May 24, 2024
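
Reference sketches

flash-attention provides fast, memory-efficient CUDA kernels for exact (non-approximate) attention. The sketch below spells out, in plain PyTorch, the computation those kernels accelerate; the function name, the (batch, seqlen, nheads, headdim) layout, and the `causal` flag are assumptions of this reference, not the library's API, and the real kernels compute the same result without materializing the full score matrix.

```python
import math
import torch

def attention_reference(q, k, v, causal=False):
    """Plain PyTorch reference for exact attention (readability sketch only).

    q, k, v: (batch, seqlen, nheads, headdim). The CUDA kernels in
    flash-attention produce the same output without forming the full
    (seqlen x seqlen) score matrix in GPU memory.
    """
    # Move heads next to batch so each head is an independent matmul:
    # (batch, nheads, seqlen, headdim)
    q, k, v = (t.transpose(1, 2) for t in (q, k, v))
    scores = q @ k.transpose(-2, -1) / math.sqrt(q.shape[-1])
    if causal:
        # Mask out positions where a query attends to a later key.
        seqlen = scores.shape[-1]
        mask = torch.triu(
            torch.ones(seqlen, seqlen, dtype=torch.bool, device=scores.device),
            diagonal=1,
        )
        scores = scores.masked_fill(mask, float("-inf"))
    out = torch.softmax(scores, dim=-1) @ v
    return out.transpose(1, 2)  # back to (batch, seqlen, nheads, headdim)

# Example usage: batch=2, seqlen=128, 8 heads, head dim 64
q = torch.randn(2, 128, 8, 64)
k = torch.randn(2, 128, 8, 64)
v = torch.randn(2, 128, 8, 64)
print(attention_reference(q, k, v, causal=True).shape)  # torch.Size([2, 128, 8, 64])
```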
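
causal-conv1d implements a causal depthwise 1-D convolution as a CUDA kernel behind a PyTorch interface. The sketch below is a minimal PyTorch reference of that operation, assuming a (batch, channels, seqlen) input layout and a (channels, kernel_size) depthwise weight; the function name and signature are hypothetical and do not claim to match the package's API.

```python
import torch
import torch.nn.functional as F

def causal_depthwise_conv1d_reference(x, weight, bias=None):
    """Reference causal depthwise 1-D convolution.

    x:      (batch, channels, seqlen)
    weight: (channels, kernel_size), one filter per channel (depthwise).
    Causality: the output at time t depends only on inputs at times <= t,
    enforced by zero-padding kernel_size - 1 positions on the left only.
    """
    channels, kernel_size = weight.shape
    x = F.pad(x, (kernel_size - 1, 0))   # left-pad the time (last) dimension
    w = weight.unsqueeze(1)              # (channels, 1, kernel_size) for groups=channels
    return F.conv1d(x, w, bias=bias, groups=channels)

# Example usage: batch=4, 16 channels, sequence length 32, filter width 4
x = torch.randn(4, 16, 32)
weight = torch.randn(16, 4)
print(causal_depthwise_conv1d_reference(x, weight).shape)  # torch.Size([4, 16, 32])
```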
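
fast-hadamard-transform exposes a CUDA fast (Walsh-)Hadamard transform through PyTorch. The sketch below is a reference of the underlying O(n log n) butterfly algorithm for a power-of-two last dimension; the function name and the 1/sqrt(n) default scaling are assumptions of this sketch rather than the package's interface.

```python
import math
import torch

def hadamard_transform_reference(x, scale=None):
    """Reference fast Walsh-Hadamard transform along the last dimension.

    x: (..., n) with n a power of two. The loop is the O(n log n) butterfly
    recursion; scale defaults to 1/sqrt(n) so the transform is orthonormal
    (an assumption for this sketch).
    """
    n = x.shape[-1]
    assert n > 0 and n & (n - 1) == 0, "last dimension must be a power of two"
    out = x.clone()
    h = 1
    while h < n:
        # View the last dim as blocks of 2*h elements and combine the halves:
        # (a, b) -> (a + b, a - b), the standard Hadamard butterfly.
        out = out.view(*x.shape[:-1], n // (2 * h), 2, h)
        a, b = out[..., 0, :], out[..., 1, :]
        out = torch.stack((a + b, a - b), dim=-2).reshape(*x.shape[:-1], n)
        h *= 2
    if scale is None:
        scale = 1.0 / math.sqrt(n)
    return out * scale

# Example usage: a batch of 4 vectors of length 8
x = torch.randn(4, 8)
print(hadamard_transform_reference(x).shape)  # torch.Size([4, 8])
```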
