Quantized attention that achieves speedups of 2.1-3.1x and 2.7-5.1x compared to FlashAttention2 and xformers, respectively, without losing end-to-end metrics across various models.

Cuda 1,074 64 Updated Feb 28, 2025

deepspeedai / DeepSpeed

DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.

Python 37,253 4,283 Updated Mar 7, 2025

deepseek-ai / DeepGEMM

DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling

Cuda 4,817 469 Updated Mar 5, 2025

deepseek-ai / DeepEP

DeepEP: an efficient expert-parallel communication library

Cuda 7,059 608 Updated Mar 6, 2025

hash-based-snargs-book / hash-based-snargs-book

Source code for "Building Cryptographic Proofs from Hash Functions"

TeX 182 27 Updated Feb 21, 2025

deepseek-ai / FlashMLA

FlashMLA: Efficient MLA decoding kernels

C++ 11,184 779 Updated Mar 1, 2025

dtolnay / watt

Runtime for executing procedural macros as WebAssembly

Rust 1,354 28 Updated Mar 3, 2025

lucidrains / native-sparse-attention-pytorch

Implementation of the sparse attention pattern proposed by the DeepSeek team in their "Native Sparse Attention" paper

Python 511 19 Updated Mar 7, 2025

stepfun-ai / Step-Video-T2V

Python 2,588 220 Updated Feb 27, 2025

seal-rg / recurrent-pretraining

Pretraining code for a large-scale depth-recurrent language model

Python 661 54 Updated Mar 5, 2025

jchook / shhh

Shh! Alerts you when you are too loud.

Rust 2 Updated Jan 23, 2025

brevis-network / pico

Rust 39 14 Updated Feb 22, 2025

nickscamara / open-deep-research

An open-source deep research clone. An AI agent that reasons over large amounts of web data extracted with Firecrawl

TypeScript 4,790 574 Updated Feb 23, 2025

PatStiles

Starred repositories

unixpickle / learn-ptx

MagellaX / StreamAttn

deepseek-ai / smallpond

deepseek-ai / 3FS

duckdb / duckdb

unitreerobotics / unitree_rl_gym

QwenLM / Qwen

allenai / olmocr

sgl-project / sglang

volcengine / verl

PeterGriffinJin / Search-R1

arielb1 / pollcatch

rust-embedded / riscv

MISTLab / Swarm-SLAM

ETH-PBL / MLonMCU

NVIDIA / cutlass

deepseek-ai / DualPipe

thu-ml / SageAttention