@mit-han-lab

MIT HAN Lab

Efficient AI Computing. PI: Song Han

Pinned

  1. streaming-llm Public

    [ICLR 2024] Efficient Streaming Language Models with Attention Sinks

    Python · 6.9k stars · 388 forks

  2. llm-awq Public

    [MLSys 2024 Best Paper Award] AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration

    Python · 3.2k stars · 263 forks

  3. efficientvit Public

    Efficient vision foundation models for high-resolution generation and perception.

    Python · 3k stars · 230 forks

  4. bevfusion Public archive

    [ICRA'23] BEVFusion: Multi-Task Multi-Sensor Fusion with Unified Bird's-Eye View Representation

    Python · 2.7k stars · 484 forks

  5. temporal-shift-module Public

    [ICCV 2019] TSM: Temporal Shift Module for Efficient Video Understanding

    Python · 2.1k stars · 420 forks

  6. once-for-all Public

    [ICLR 2020] Once for All: Train One Network and Specialize it for Efficient Deployment

    Python · 1.9k stars · 344 forks

Repositories

Showing 10 of 59 repositories
  • llm-awq Public

    [MLSys 2024 Best Paper Award] AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration

    Python · 3,166 stars · MIT license · 263 forks · 160 open issues · 10 open pull requests · Updated Jul 18, 2025
  • radial-attention Public

    Radial Attention Official Implementation

    Python · 419 stars · Apache-2.0 license · 19 forks · 14 open issues · 0 open pull requests · Updated Jul 15, 2025
  • lpd Public

    Locality-aware Parallel Decoding for Efficient Autoregressive Image Generation

    Python · 59 stars · MIT license · 4 forks · 1 open issue · 0 open pull requests · Updated Jul 14, 2025
  • Quest Public

    [ICML 2024] Quest: Query-Aware Sparsity for Efficient Long-Context LLM Inference

    Cuda · 306 stars · MIT license · 35 forks · 3 open issues · 0 open pull requests · Updated Jul 11, 2025
  • torchquantum Public

    A PyTorch-based framework for Quantum Classical Simulation, Quantum Machine Learning, Quantum Neural Networks, Parameterized Quantum Circuits with support for easy deployments on real quantum computers.

    Jupyter Notebook · 1,512 stars · MIT license · 225 forks · 61 open issues (4 issues need help) · 8 open pull requests · Updated Jul 8, 2025
  • x-attention Public

    [ICML 2025] XAttention: Block Sparse Attention with Antidiagonal Scoring

    Python · 205 stars · 10 forks · 4 open issues · 0 open pull requests · Updated Jul 7, 2025
  • vila-u Public

    [ICLR 2025] VILA-U: a Unified Foundation Model Integrating Visual Understanding and Generation

    Python · 368 stars · MIT license · 12 forks · 19 open issues · 0 open pull requests · Updated Apr 25, 2025
  • efficientvit Public

    Efficient vision foundation models for high-resolution generation and perception.

    Python · 2,995 stars · Apache-2.0 license · 230 forks · 107 open issues · 0 open pull requests · Updated Apr 24, 2025
  • omniserve Public

    [MLSys'25] QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving; [MLSys'25] LServe: Efficient Long-sequence LLM Serving with Unified Sparse Attention

    C++ · 721 stars · Apache-2.0 license · 48 forks · 41 open issues · 4 open pull requests · Updated Mar 6, 2025
  • torchsparse Public

    [MICRO'23, MLSys'22] TorchSparse: Efficient Training and Inference Framework for Sparse Convolution on GPUs.

    Cuda · 1,379 stars · MIT license · 172 forks · 43 open issues · 3 open pull requests · Updated Feb 24, 2025