# quantization

Here are 330 public repositories matching this topic...

hiyouga / LLaMA-Factory

Unified Efficient Fine-Tuning of 100+ LLMs (ACL 2024)

Updated Nov 18, 2024
Python

ymcui / Chinese-LLaMA-Alpaca

Chinese LLaMA & Alpaca large language models, with local CPU/GPU training and deployment (Chinese LLaMA & Alpaca LLMs)

nlp llama lora quantization alpaca plm pre-trained-language-models large-language-models llm llama-2 alpaca-2

Updated Apr 30, 2024
Python

SYSTRAN / faster-whisper

Faster Whisper transcription with CTranslate2

deep-learning inference transformer speech-recognition openai speech-to-text quantization whisper

Updated Nov 17, 2024
Python

AutoGPTQ / AutoGPTQ

An easy-to-use LLM quantization package with user-friendly APIs, based on the GPTQ algorithm.

nlp deep-learning transformers inference pytorch transformer quantization large-language-models llms

Updated Sep 28, 2024
Python

huawei-noah / Pretrained-Language-Model

Pretrained language model and its related optimization techniques developed by Huawei Noah's Ark Lab.

pretrained-models quantization knowledge-distillation model-compression large-scale-distributed

Updated Jan 22, 2024
Python

neuralmagic / deepsparse

Sparsity-aware deep learning inference runtime for CPUs

nlp performance computer-vision inference machinelearning pruning object-detection pretrained-models quantization cpus onnx sparsification llm-inference deepsparse

Updated Jul 19, 2024
Python

IntelLabs / nlp-architect

A model library for exploring state-of-the-art deep learning topologies and techniques for optimizing Natural Language Processing neural networks

nlp deep-learning tensorflow nlu transformers pytorch deeplearning quantization bert dynet

Updated Nov 7, 2022
Python

aaron-xichen / pytorch-playground

Base pretrained models and datasets in pytorch (MNIST, SVHN, CIFAR10, CIFAR100, STL10, AlexNet, VGG16, VGG19, ResNet, Inception, SqueezeNet)

pytorch quantization pytorch-tutorial pytorch-tutorials

Updated Nov 22, 2022
Python

stochasticai / xTuring

Build, customize and control your own LLMs. From data pre-processing to fine-tuning, xTuring provides an easy way to personalize open-source LLMs. Join our Discord community: https://discord.gg/TgHXuSJEk6

adapter deep-learning llama lora quantization language-model alpaca mistral fine-tuning peft finetuning mixed-precision gpt-2 gpt-j llm generative-ai gen-ai

Updated Sep 23, 2024
Python

huggingface / optimum

🚀 Accelerate training and inference of 🤗 Transformers and 🤗 Diffusers with easy-to-use hardware optimization tools

training optimization intel transformers inference pytorch quantization onnx tflite onnxruntime graphcore habana

Updated Nov 18, 2024
Python

dvmazur / mixtral-offloading

Run Mixtral-8x7B models in Colab or on consumer desktops

deep-learning pytorch offloading quantization language-model google-colab colab-notebook mixture-of-experts llm

Updated Apr 8, 2024
Python

intel / neural-compressor

SOTA low-bit LLM quantization (INT8/FP8/INT4/FP4/NF4) & sparsity; leading model compression techniques on TensorFlow, PyTorch, and ONNX Runtime

sparsity pruning quantization knowledge-distillation auto-tuning int8 low-precision quantization-aware-training post-training-quantization awq int4 large-language-models gptq smoothquant sparsegpt fp4 mxformat

Updated Nov 18, 2024
Python

666DZY666 / micronet

micronet, a model compression and deployment library. Compression: (1) quantization: quantization-aware training (QAT), high-bit (>2b) (DoReFa / Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference), low-bit (≤2b) / ternary and binary (TWN/BNN/XNOR-Net); post-training quantization (PTQ), 8-bit (TensorRT); (2) pruning: normal, reg…

Updated Oct 6, 2021
Python

quic / aimet

AIMET is a library that provides advanced quantization and compression techniques for trained neural network models.

open-source machine-learning opensource deep-neural-networks compression deep-learning pruning quantization auto-ml network-quantization network-compression

Updated Nov 18, 2024
Python

intel / intel-extension-for-pytorch

A Python package that extends official PyTorch to more easily obtain performance gains on Intel platforms

machine-learning deep-learning neural-network intel pytorch quantization

Updated Nov 18, 2024
Python

pytorch / ao

PyTorch native quantization and sparsity for training and inference

training sparsity cuda inference optimizer pytorch transformer offloading llama quantization mx brrr dtypes float8

Updated Nov 19, 2024
Python

PaddlePaddle / PaddleSlim

PaddleSlim is an open-source library for deep model compression and architecture search.

sparsity compression detection transformer segmentation pruning quantization nas bert tensorrt distillation ernie yolov5 yolov6 yolov7

Updated Nov 5, 2024
Python

OpenPPL / ppq

PPL Quantization Tool (PPQ) is a powerful offline neural network quantization tool.

open-source caffe deep-learning neural-network cuda pytorch quantization onnx

Updated Mar 28, 2024
Python

tensorflow / model-optimization

A toolkit to optimize ML models for deployment for Keras and TensorFlow, including quantization and pruning.

machine-learning sparsity compression deep-learning tensorflow optimization keras ml pruning quantization model-compression quantized-training quantized-neural-networks quantized-networks

Updated Nov 18, 2024
Python

open-mmlab / mmrazor

OpenMMLab Model Compression Toolbox and Benchmark.

detection pytorch classification segmentation pruning darts quantization nas knowledge-distillation spos autoslim

Updated Jun 11, 2024
Python
