- National University of Singapore
- Singapore
Stars
A bidirectional pipeline parallelism algorithm for computation-communication overlap in DeepSeek-V3/R1 training.
An open-source implementation of Regional Adaptive Sampling (RAS), a novel diffusion model sampling strategy that introduces regional variability in sampling steps
Enhance-A-Video: Better Generated Video for Free
A generative world for general-purpose robotics & embodied AI learning.
Democratizing AlphaFold3: a PyTorch reimplementation to accelerate protein structure prediction
A flexible and efficient training framework for large-scale alignment tasks
A throughput-oriented high-performance serving framework for LLMs
Qwen2.5 is the large language model series developed by the Qwen team, Alibaba Cloud.
MAGIS: Memory Optimization via Coordinated Graph Transformation and Scheduling for DNN (ASPLOS'24)
An official implementation of Pangu-Weather
Mirage: Automatically Generating Fast GPU Kernels without Programming in Triton/CUDA
Applied AI experiments and examples for PyTorch
BitBLAS is a library to support mixed-precision matrix multiplications, especially for quantized LLM deployment.
An interference-aware scheduler for fine-grained GPU sharing
USP: Unified (a.k.a. Hybrid, 2D) Sequence Parallel Attention for Long Context Transformers Model Training and Inference
Thunder gives your PyTorch models superpowers for training and inference. Unlock out-of-the-box optimizations for performance, memory and parallelism, or roll out your own.
Repository for MLCommons Chakra schema and tools
PArametrized Recommendation and Ai Model benchmark (PARAM) is a repository for developing numerous microbenchmarks as well as end-to-end nets for evaluating training and inference platforms.
MSCCL++: A GPU-driven communication stack for scalable AI applications
We introduce a novel approach for parameter generation, named neural network parameter diffusion (p-diff), which employs a standard latent diffusion model to synthesize a new set of parameters
VideoSys: An easy and efficient system for video generation
ASTRA-sim2.0: Modeling Hierarchical Networks and Disaggregated Systems for Large-model Training at Scale
The official implementation of "Helen: Optimizing CTR Prediction Models with Frequency-wise Hessian Eigenvalue Regularization"
ICLR 2024, Towards Lossless Dataset Distillation via Difficulty-Aligned Trajectory Matching
Lossless Training Speed Up by Unbiased Dynamic Data Pruning
Simple and efficient PyTorch-native transformer text generation in <1000 LOC of Python.