🛠️ Tools for Transformers compression using PyTorch Lightning ⚡
Chess engine
[EMNLP 2024 Industry Track] This is the official PyTorch implementation of "LLMC: Benchmarking Large Language Model Quantization with a Versatile Compression Toolkit".
Neural Network Compression Framework for enhanced OpenVINO™ inference
AIMET is a library that provides advanced quantization and compression techniques for trained neural network models.
A list of papers, docs, and code about efficient AIGC. This repo aims to provide information for efficient AIGC research, covering both language and vision, and is continuously being improved. Pull requests adding works (papers, repositories) the repo has missed are welcome.
🤗 Optimum Intel: Accelerate inference with Intel optimization tools
SOTA low-bit LLM quantization (INT8/FP8/INT4/FP4/NF4) & sparsity; leading model compression techniques on TensorFlow, PyTorch, and ONNX Runtime
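The low-bit quantization these toolkits perform can be illustrated with a minimal sketch of symmetric per-tensor INT8 quantization in plain Python. This is not the API of any library listed above; `quantize_int8` and `dequantize` are hypothetical names for illustration only, and the sketch assumes a non-empty tensor with at least one non-zero weight.

```python
def quantize_int8(weights):
    """Symmetric per-tensor INT8 quantization: map floats onto [-127, 127].

    The scale is chosen so the largest-magnitude weight maps to +/-127;
    assumes at least one non-zero weight.
    """
    scale = max(abs(w) for w in weights) / 127.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from INT8 values and the scale."""
    return [qi * scale for qi in q]
```

Real toolkits add per-channel scales, zero points for asymmetric ranges, and calibration over activation statistics, but the round-to-grid step above is the core idea.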
AdapMTL: Adaptive Pruning Framework for Multitask Learning Model (ACM MM'24)
Pruning tool to identify small subsets of network partitions that are significant from the perspective of stochastic block model inference. This method works for single-layer and multi-layer networks, as well as for restricting focus to a fixed number of communities when desired.
Config driven, easy backup cli for restic.
Tutorial notebooks for hls4ml
FasterAI: Prune and Distill your models with FastAI and PyTorch
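The "distill" half of such toolkits typically trains a small student model against a large teacher's softened outputs. A minimal sketch of the classic temperature-scaled distillation loss (Hinton et al., 2015) in plain Python, not FasterAI's API; `distillation_loss` is a hypothetical name for illustration:

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax over temperature-scaled logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL divergence from the teacher's soft targets to the student's,
    scaled by T^2 so gradients keep comparable magnitude across temperatures."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return kl * temperature ** 2
```

In practice this term is mixed with the ordinary cross-entropy on hard labels; a higher temperature exposes more of the teacher's inter-class similarity structure.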
YOLOv3 on a MobileNetV3_Small architecture; trained, explained, pruned and quantized for text detection.
A toolkit to optimize ML models for deployment for Keras and TensorFlow, including quantization and pruning.
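The pruning offered by such toolkits is most commonly magnitude pruning: weights with the smallest absolute values are zeroed until a target sparsity is reached. A minimal sketch in plain Python, not the toolkit's API; `magnitude_prune` is a hypothetical name for illustration:

```python
def magnitude_prune(weights, sparsity):
    """Zero out the smallest-magnitude weights so roughly `sparsity`
    fraction of them become zero (ties at the threshold may prune a few more)."""
    k = int(len(weights) * sparsity)
    if k == 0:
        return list(weights)
    threshold = sorted(abs(w) for w in weights)[k - 1]
    return [0.0 if abs(w) <= threshold else w for w in weights]
```

Production implementations prune gradually over training steps and often per layer or per channel, but the rank-by-magnitude-and-threshold step is the same.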
TinyNeuralNetwork is an efficient and easy-to-use deep learning model compression framework.
Official Repo for SparseLLM: Global Pruning of LLMs (NeurIPS 2024)
HeFlwr: Federated Learning for Heterogeneous Devices
Reference PyTorch code for named entity tagging
Reference PyTorch code for intent classification