A curated list for Efficient Large Language Models
This project is the official implementation of the NeurIPS 2023 spotlight paper QuantSR: Accurate Low-bit Quantization for Efficient Image Super-Resolution.
This project is the official implementation of the ICML 2023 paper BiBench: Benchmarking and Analyzing Network Binarization.
Chat with LLaMA 2, with responses grounded in reference documents retrieved from a vector database. Runs locally using GPTQ 4-bit quantization.
The official implementation of the ICML 2023 paper OFQ-ViT
Unofficial implementation of NCNet using Flax and JAX
A tutorial of model quantization using TensorFlow
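Several of the projects above center on low-bit quantization. As a minimal sketch of the core idea they share, the snippet below implements symmetric per-tensor linear quantization with NumPy; it is an illustrative example, not code from any of the listed repositories, and the function names are invented for this sketch.

```python
import numpy as np

def quantize_symmetric(x, num_bits=8):
    """Map a float tensor to signed integers with one per-tensor scale."""
    qmax = 2 ** (num_bits - 1) - 1          # e.g. 127 for 8-bit
    scale = np.max(np.abs(x)) / qmax        # largest magnitude maps to qmax
    q = np.clip(np.round(x / scale), -qmax, qmax).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover an approximate float tensor from the integer codes."""
    return q.astype(np.float32) * scale

weights = np.array([-1.0, -0.5, 0.0, 0.25, 1.0], dtype=np.float32)
q, scale = quantize_symmetric(weights)
recovered = dequantize(q, scale)
# Rounding error per element is bounded by roughly half the scale.
```

Lower bit widths (such as the 4-bit setting used by GPTQ) follow the same recipe with a smaller `qmax`, trading reconstruction error for memory.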