Sparsity-aware deep learning inference runtime for CPUs
Updated Jul 19, 2024 - Python
[CVPR 2023] Towards Any Structural Pruning; LLMs / SAM / Diffusion / Transformers / YOLOv8 / CNNs
micronet, a model compression and deployment library. Compression: (1) quantization: quantization-aware training (QAT), high-bit (>2b) (DoReFa / Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference), low-bit (≤2b) / ternary and binary (TWN/BNN/XNOR-Net); post-training quantization (PTQ), 8-bit (TensorRT); (2) pruning: normal, reg…
SOTA low-bit LLM quantization (INT8/FP8/INT4/FP4/NF4) & sparsity; leading model compression techniques on TensorFlow, PyTorch, and ONNX Runtime
AIMET is a library that provides advanced quantization and compression techniques for trained neural network models.
Libraries for applying sparsification recipes to neural networks with a few lines of code, enabling faster and smaller models
PaddleSlim is an open-source library for deep model compression and architecture search.
A toolkit for optimizing Keras and TensorFlow ML models for deployment, including quantization and pruning.
OpenMMLab Model Compression Toolbox and Benchmark.
PyTorch Implementation of [1611.06440] Pruning Convolutional Neural Networks for Resource Efficient Inference
Neural Network Compression Framework for enhanced OpenVINO™ inference
[NeurIPS 2023] LLM-Pruner: On the Structural Pruning of Large Language Models. Support LLaMA, Llama-2, BLOOM, Vicuna, Baichuan, etc.
TinyNeuralNetwork is an efficient and easy-to-use deep learning model compression framework.
A PyTorch knowledge distillation library for benchmarking and extending work in knowledge distillation, pruning, and quantization.
Filter Pruning via Geometric Median for Deep Convolutional Neural Networks Acceleration (CVPR 2019 Oral)
[ICLR 2024] Sheared LLaMA: Accelerating Language Model Pre-training via Structured Pruning
YOLO model compression and multi-dataset training
Pruning and other network surgery for trained Keras models.
Soft Filter Pruning for Accelerating Deep Convolutional Neural Networks
Neural network model repository for highly sparse and sparse-quantized models with matching sparsification recipes