Stars
unsloth: Finetune Llama 3.3, DeepSeek-R1, Gemma 3 & Reasoning LLMs 2x faster with 70% less memory! 🦥 (finetuning sketch below)
jax: Composable transformations of Python+NumPy programs: differentiate, vectorize, JIT to GPU/TPU, and more (transformation sketch below)
flash-linear-attention: 🚀 Efficient implementations of state-of-the-art linear attention models in Torch and Triton (linear-attention sketch below)
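
The first entry is Unsloth's tagline. Below is a minimal sketch of loading a quantized model and attaching LoRA adapters with Unsloth's FastLanguageModel; the model id, sequence length, and LoRA hyperparameters are illustrative assumptions, not recommendations, and exact keyword arguments may differ across Unsloth versions.

```python
# Hedged sketch: load a 4-bit model with Unsloth and attach LoRA adapters.
# The model id and hyperparameters below are illustrative assumptions.
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Llama-3.2-1B-Instruct",  # assumed model id; swap in your own
    max_seq_length=2048,
    load_in_4bit=True,  # quantized weights to cut memory use
)

# Wrap the base model with LoRA adapters so only a small set of weights is trained.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,            # LoRA rank (assumption)
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)
# Training itself would then go through a standard trainer such as TRL's SFTTrainer.
```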
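The second entry is JAX's description. The sketch below shows the transformations it names, differentiate (jax.grad), JIT-compile (jax.jit), and vectorize (jax.vmap), composing on an ordinary Python + jax.numpy function; the toy loss function is made up for illustration.

```python
import jax
import jax.numpy as jnp

# A plain Python + jax.numpy function: mean squared error of a linear model.
def loss(w, x, y):
    return jnp.mean((x @ w - y) ** 2)

grad_loss = jax.grad(loss)                                # differentiate w.r.t. w
fast_grad = jax.jit(grad_loss)                            # JIT-compile for CPU/GPU/TPU
per_example = jax.vmap(grad_loss, in_axes=(None, 0, 0))   # vectorize over examples

w = jnp.zeros(3)
x = jnp.ones((8, 3))
y = jnp.ones(8)
print(fast_grad(w, x, y))           # averaged gradient, shape (3,)
print(per_example(w, x, y).shape)   # per-example gradients, shape (8, 3)
```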
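The third entry matches flash-linear-attention's tagline. That library ships fused, chunked Triton kernels; the PyTorch snippet below is not its API, just a minimal sketch of the linear attention idea it implements: replacing softmax(QK^T)V with a feature map phi so attention becomes phi(Q)(phi(K)^T V) and cost grows linearly in sequence length.

```python
import torch

def linear_attention(q, k, v, eps=1e-6):
    """Minimal non-causal linear attention in plain PyTorch.

    q, k, v: (batch, heads, seq_len, dim). Uses elu+1 as the feature map phi,
    one common choice; production kernels (e.g. in flash-linear-attention)
    are fused, chunked, and usually causal.
    """
    phi_q = torch.nn.functional.elu(q) + 1.0
    phi_k = torch.nn.functional.elu(k) + 1.0
    # Accumulate key-value statistics once: (batch, heads, dim, dim_v)
    kv = torch.einsum("bhnd,bhne->bhde", phi_k, v)
    # Normalizer per query position: (batch, heads, seq_len)
    z = torch.einsum("bhnd,bhd->bhn", phi_q, phi_k.sum(dim=2)) + eps
    # Output: (batch, heads, seq_len, dim_v)
    return torch.einsum("bhnd,bhde->bhne", phi_q, kv) / z.unsqueeze(-1)

q = k = v = torch.randn(2, 4, 128, 64)
out = linear_attention(q, k, v)
print(out.shape)  # torch.Size([2, 4, 128, 64])
```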