SOTA low-bit LLM quantization (INT8/FP8/INT4/FP4/NF4) & sparsity; leading model compression techniques on TensorFlow, PyTorch, and ONNX Runtime
AIMET is a library that provides advanced quantization and compression techniques for trained neural network models.
Neural Network Compression Framework for enhanced OpenVINO™ inference
LLMC is an elegant tool for LLM compression.
Libraries for applying sparsification recipes to neural networks with a few lines of code, enabling faster and smaller models
[CVPR 2023] Towards Any Structural Pruning; LLMs / SAM / Diffusion / Transformers / YOLOv8 / CNNs
Code to reproduce the experiments of the ICLR24-paper: "Sparse Model Soups: A Recipe for Improved Pruning via Model Averaging"
[ECCV 2024] 3D Small Object Detection with Dynamic Spatial Pruning
[NeurIPS 2023] LLM-Pruner: On the Structural Pruning of Large Language Models. Support LLaMA, Llama-2, BLOOM, Vicuna, Baichuan, etc.
[CVPR'24] Once for Both: Single Stage of Importance and Sparsity Search for Vision Transformer Compression
Sparsity-aware deep learning inference runtime for CPUs
Official Pytorch Implementation of "Outlier Weighed Layerwise Sparsity (OWL): A Missing Secret Sauce for Pruning LLMs to High Sparsity"
TinyNeuralNetwork is an efficient and easy-to-use deep learning model compression framework.
PaddleSlim is an open-source library for deep model compression and architecture search.
Compressed LLMs for Efficient Text Generation [ICLR'24 Workshop]
Characterization study repository for pruning, a popular way to compress a DL model. This repo also investigates optimal sparse tensor layouts for pruned networks.
Code for the paper "FOCIL: Finetune-and-Freeze for Online Class-Incremental Learning by Training Randomly Pruned Sparse Experts"
Neural network model repository for highly sparse and sparse-quantized models with matching sparsification recipes
Harness for training/finding lottery tickets in PyTorch, with support for multiple pruning techniques, distributed training, FFCV, and AMP.
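Many of the projects above build on unstructured magnitude pruning: zeroing out the weights with the smallest absolute values until a target sparsity is reached. A minimal NumPy sketch of that idea (the function name and threshold logic are illustrative, not taken from any of the listed libraries):

```python
import numpy as np

def magnitude_prune(weights: np.ndarray, sparsity: float) -> np.ndarray:
    """Zero out the smallest-magnitude entries until roughly
    `sparsity` fraction of the tensor is zero."""
    k = int(weights.size * sparsity)  # number of entries to prune
    if k == 0:
        return weights.copy()
    # k-th smallest absolute value becomes the pruning threshold
    flat = np.abs(weights).ravel()
    threshold = np.partition(flat, k - 1)[k - 1]
    # keep only weights strictly above the threshold
    # (ties at the threshold are also pruned in this simple sketch)
    mask = np.abs(weights) > threshold
    return weights * mask

w = np.array([[0.5, -0.1], [0.02, -0.9]])
pruned = magnitude_prune(w, 0.5)  # zeros the two smallest-magnitude weights
```

Real toolkits (e.g. the sparsification and structural-pruning libraries listed above) go further: structured pruning removes whole channels or attention heads so dense kernels still apply, and sparsity-aware runtimes exploit the resulting zero patterns for speedup.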