A curated list for Efficient Large Language Models
Updated Jul 16, 2024
[ICML 2023] Official implementation of the ICML 2023 paper "BiBench: Benchmarking and Analyzing Network Binarization."
[NeurIPS 2023 Spotlight] Official implementation of the NeurIPS 2023 spotlight paper "QuantSR: Accurate Low-bit Quantization for Efficient Image Super-Resolution."
Chat with LLaMA 2 that grounds its responses in reference documents retrieved from a vector database. Runs locally using GPTQ 4-bit quantization.
Official implementation of the ICML 2023 paper OFQ-ViT.
A tutorial on model quantization using TensorFlow.
Unofficial implementation of NCNet using Flax and JAX.
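Several of the projects above revolve around low-bit (e.g. 4-bit) weight quantization. As an illustrative sketch only — not taken from any of the listed repositories — the core idea of uniform affine quantization can be written in a few lines of plain Python; function names here are hypothetical:

```python
def quantize_uniform(weights, num_bits=4):
    """Uniform affine quantization of a list of floats to num_bits integer codes.

    Illustrative sketch only: real low-bit schemes such as GPTQ add
    per-channel scaling and error compensation on top of this basic idea.
    """
    qmin, qmax = 0, 2 ** num_bits - 1
    w_min, w_max = min(weights), max(weights)
    # One scale/zero-point pair for the whole tensor (per-tensor quantization).
    scale = (w_max - w_min) / (qmax - qmin) if w_max > w_min else 1.0
    zero_point = round(qmin - w_min / scale)
    # Round to the nearest integer code and clamp into the representable range.
    q = [min(qmax, max(qmin, round(w / scale + zero_point))) for w in weights]
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    """Map integer codes back to approximate float values."""
    return [scale * (v - zero_point) for v in q]

weights = [-1.0, -0.5, 0.0, 0.5, 1.0]
q, scale, zp = quantize_uniform(weights, num_bits=4)
recovered = dequantize(q, scale, zp)
```

With 4 bits there are only 16 representable codes, so `recovered` approximates `weights` to within roughly one quantization step (`scale`); the tutorials and papers listed above study how to shrink that error for real networks.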