🎯
Flying
  • National University of Singapore
  • Singapore

Highlights

  • Pro

Organizations

@cosmo-cube


A bidirectional pipeline parallelism algorithm for computation-communication overlap in V3/R1 training.

Python · 2,675 stars · 281 forks · Updated Mar 10, 2025

An open-source implementation of Regional Adaptive Sampling (RAS), a novel diffusion model sampling strategy that introduces regional variability in sampling steps

Python · 121 stars · 3 forks · Updated Feb 17, 2025

Enhance-A-Video: Better Generated Video for Free

Python · 489 stars · 28 forks · Updated Mar 17, 2025

A generative world for general-purpose robotics & embodied AI learning.

Python · 24,588 stars · 2,149 forks · Updated Mar 28, 2025

Fast low-bit matmul kernels in Triton

Python · 273 stars · 21 forks · Updated Mar 26, 2025

Democratizing AlphaFold3: a PyTorch reimplementation to accelerate protein structure prediction

Python · 20 stars · 1 fork · Updated Dec 16, 2024

AlphaFold 3 inference pipeline.

Python · 6,287 stars · 782 forks · Updated Mar 24, 2025

Official inference framework for 1-bit LLMs

C++ · 12,850 stars · 907 forks · Updated Feb 18, 2025

A flexible and efficient training framework for large-scale alignment tasks

Python · 333 stars · 28 forks · Updated Feb 14, 2025

A throughput-oriented high-performance serving framework for LLMs

CUDA · 784 stars · 31 forks · Updated Sep 21, 2024

Qwen2.5 is the large language model series developed by the Qwen team, Alibaba Cloud.

Shell · 16,419 stars · 1,150 forks · Updated Mar 14, 2025

MAGIS: Memory Optimization via Coordinated Graph Transformation and Scheduling for DNN (ASPLOS'24)

Python · 49 stars · 4 forks · Updated May 29, 2024

An official implementation of Pangu-Weather

Python · 1,173 stars · 218 forks · Updated Jan 12, 2024

Mirage: Automatically Generating Fast GPU Kernels without Programming in Triton/CUDA

C++ · 778 stars · 50 forks · Updated Mar 29, 2025

Applied AI experiments and examples for PyTorch

Python · 250 stars · 24 forks · Updated Mar 21, 2025

BitBLAS is a library to support mixed-precision matrix multiplications, especially for quantized LLM deployment.

Python · 567 stars · 42 forks · Updated Feb 14, 2025

The official Meta Llama 3 GitHub site

Python · 28,556 stars · 3,338 forks · Updated Jan 26, 2025

An interference-aware scheduler for fine-grained GPU sharing

Python · 129 stars · 19 forks · Updated Jan 26, 2025

USP: Unified (a.k.a. Hybrid, 2D) Sequence Parallel Attention for Long-Context Transformer Model Training and Inference

Python · 459 stars · 33 forks · Updated Mar 20, 2025

Thunder gives your PyTorch models superpowers for training and inference. Unlock out-of-the-box optimizations for performance, memory, and parallelism, or roll out your own.

Python · 1,315 stars · 91 forks · Updated Mar 28, 2025

Repository for MLCommons Chakra schema and tools

Python · 94 stars · 50 forks · Updated Mar 14, 2025

PArametrized Recommendation and AI Model benchmark is a repository for developing numerous microbenchmarks (uBenchmarks) as well as end-to-end networks for evaluating training and inference platforms.

Python · 132 stars · 63 forks · Updated Mar 27, 2025

MSCCL++: A GPU-driven communication stack for scalable AI applications

C++ · 321 stars · 45 forks · Updated Mar 29, 2025

We introduce a novel approach for parameter generation, named neural network parameter diffusion (p-diff), which employs a standard latent diffusion model to synthesize a new set of parameters

Python · 856 stars · 46 forks · Updated Jan 3, 2025

VideoSys: An easy and efficient system for video generation

Python · 1,949 stars · 129 forks · Updated Mar 9, 2025

ASTRA-sim2.0: Modeling Hierarchical Networks and Disaggregated Systems for Large-model Training at Scale

C++ · 333 stars · 126 forks · Updated Feb 23, 2025

The official implementation of "Helen: Optimizing CTR Prediction Models with Frequency-wise Hessian Eigenvalue Regularization"

Python · 15 stars · 1 fork · Updated Mar 14, 2024

ICLR 2024: Towards Lossless Dataset Distillation via Difficulty-Aligned Trajectory Matching

Python · 102 stars · 8 forks · Updated May 23, 2024

Lossless Training Speed Up by Unbiased Dynamic Data Pruning

Python · 331 stars · 18 forks · Updated Sep 24, 2024

Simple and efficient PyTorch-native transformer text generation in under 1,000 lines of Python.

Python · 5,897 stars · 546 forks · Updated Mar 13, 2025