Repo links
https://github.com/THUDM/ChatGLM-6B
https://github.com/mymusise/ChatGLM-Tuning
https://github.com/LianjiaTech/BELLE
LLM quantization
https://zhuanlan.zhihu.com/p/616969812
- SmoothQuant
- Outlier Suppression
- AWQ
Scaling-based protection of salient weights, guided by activation and weight magnitudes
- LLM.int8
- LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale
- https://timdettmers.com/2022/08/17/llm-int8-and-emergent-features/
- https://github.com/timdettmers/bitsandbytes
- GPTQ
- GPTQ: Model Quantization, a Lifesaver for the GPU-Poor
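Does not use OBQ's greedy ordering: the weights of W are quantized in a fixed column order
Lazy Batch updates: a column's rounding depends only on updates already applied to that column, so updates to later columns of W can be deferred and applied in blocks (in the reference code the block size is the blocksize parameter; group_size separately sets the quantization group size)
Numerical stability: Cholesky decomposition of the inverse Hessian, plus dampening of its diagonal
- 4-bit LLM Quantization with GPTQ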
- ZeroQuant
- LUT-GEMM
- SparseGPT
- weight only
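The AWQ entry above describes scaling-based protection of salient weights. Below is a simplified AWQ-style sketch in pure Python, assuming per-input-channel scales s_i = mean|X_i|^alpha chosen by a grid search over alpha that minimizes the layer's output error on calibration data. Real AWQ also uses group-wise quantization and other details not shown here. The function names (quantize_rows, output_error, awq_scale_search) are hypothetical.

```python
def quantize_rows(W, bits):
    """Per-output-row symmetric round-to-nearest (weights only)."""
    qmax = 2 ** (bits - 1) - 1
    out = []
    for row in W:
        scale = max(abs(v) for v in row) / qmax or 1.0
        out.append([max(-qmax - 1, min(qmax, round(v / scale))) * scale for v in row])
    return out


def output_error(W, Wq, X):
    """Sum over rows and samples of (W x - Wq x)^2, with X given as d x n."""
    total = 0.0
    for row, qrow in zip(W, Wq):
        for t in range(len(X[0])):
            diff = sum((a - b) * X[i][t] for i, (a, b) in enumerate(zip(row, qrow)))
            total += diff * diff
    return total


def awq_scale_search(W, X, bits=4, grid=20):
    """Grid-search alpha in [0, 1] for per-input-channel scales s_i = mean|X_i| ** alpha.

    Column i of W is multiplied by s_i before quantization and divided by s_i
    afterwards, which is equivalent to folding 1/s_i into the activations.
    Channels with large activations tend to get a finer effective step size.
    Returns (error, alpha, dequantized weights).
    """
    act = [sum(abs(v) for v in xs) / len(xs) for xs in X]  # mean |activation| per channel
    best = None
    for k in range(grid + 1):
        alpha = k / grid
        s = [max(a, 1e-8) ** alpha for a in act]
        scaled = [[w * s[i] for i, w in enumerate(row)] for row in W]
        deq = [[w / s[i] for i, w in enumerate(row)] for row in quantize_rows(scaled, bits)]
        err = output_error(W, deq, X)
        if best is None or err < best[0]:
            best = (err, alpha, deq)
    return best
```

Because alpha = 0 gives s_i = 1 (plain round-to-nearest), the search can never do worse than RTN on the calibration data it is given.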
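The GPTQ notes above describe quantizing the columns of W in a fixed order while compensating each column's rounding error on the columns not yet quantized. A minimal pure-Python sketch of that core loop follows. It assumes per-row symmetric round-to-nearest and H = 2 X X^T with 1% dampening on the diagonal. It omits Lazy Batch blocking, group quantization, and activation ordering, and all function names are hypothetical.

```python
def quantize(x, scale, bits):
    """Symmetric round-to-nearest onto scale * {-2^(bits-1), ..., 2^(bits-1) - 1}."""
    qmax = 2 ** (bits - 1) - 1
    q = max(-qmax - 1, min(qmax, round(x / scale)))
    return q * scale


def rtn(W, bits=4):
    """Baseline: round each row to nearest with a per-row scale, no compensation."""
    qmax = 2 ** (bits - 1) - 1
    out = []
    for row in W:
        scale = max(abs(v) for v in row) / qmax or 1.0
        out.append([quantize(v, scale, bits) for v in row])
    return out


def inverse(a):
    """Gauss-Jordan inverse with partial pivoting (small dense matrices only)."""
    n = len(a)
    m = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        piv = m[c][c]
        m[c] = [v / piv for v in m[c]]
        for r in range(n):
            if r != c:
                f = m[r][c]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[c])]
    return [row[n:] for row in m]


def cholesky_lower(a):
    """Return lower-triangular L with a = L L^T (a symmetric positive definite)."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = (a[i][i] - s) ** 0.5 if i == j else (a[i][j] - s) / L[j][j]
    return L


def gptq(W, X, bits=4, percdamp=0.01):
    """GPTQ-style quantization of W (rows x d) using calibration inputs X (d x n).

    Columns are quantized in a fixed left-to-right order. Each column's rounding
    error is spread over the not-yet-quantized columns through the upper
    Cholesky factor U of H^-1 (H^-1 = U^T U), where H = 2 X X^T + damping.
    """
    d = len(X)
    H = [[2.0 * sum(xa * xb for xa, xb in zip(X[a], X[b])) for b in range(d)] for a in range(d)]
    damp = percdamp * sum(H[i][i] for i in range(d)) / d  # dampening for numerical stability
    for i in range(d):
        H[i][i] += damp
    L = cholesky_lower(inverse(H))
    U = [[L[j][i] for j in range(d)] for i in range(d)]  # transpose -> upper factor
    qmax = 2 ** (bits - 1) - 1
    Q = []
    for row in W:  # all rows share H, so each row is processed independently
        w = list(row)
        scale = max(abs(v) for v in w) / qmax or 1.0  # scale fixed from the original row
        q = [0.0] * d
        for i in range(d):
            q[i] = quantize(w[i], scale, bits)
            err = (w[i] - q[i]) / U[i][i]
            for j in range(i + 1, d):
                w[j] -= err * U[i][j]  # compensate on the remaining columns
        Q.append(q)
    return Q
```

For example, with X equal to the identity matrix, H is diagonal, no error is propagated, and gptq(W, X) returns exactly the same result as rtn(W). The compensation only matters when input features are correlated.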
fine-tune
LLM papers
Awesome-LLM-System-Papers
SpecInfer: Accelerating Generative LLM Serving with Speculative Inference and Token Tree Verification
https://flexflow.ai/specInfer/
https://www.gabrieleoliaro.com/publication/expertflow/expertflow.pdf
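SpecInfer's title names its two ideas: small speculative models propose a tree of candidate tokens, and the LLM verifies that tree instead of decoding one token at a time. The sketch below covers only greedy verification, with the LLM replaced by a hypothetical next_token callback and the tree stored as a hypothetical dict from a path of speculated tokens to its candidate children. SpecInfer computes the predictions for all tree nodes in one batched forward pass using a tree-structured attention mask and also describes a sampling-based verification variant; neither is modeled here.

```python
from typing import Callable, Dict, List, Sequence, Set, Tuple

Path = Tuple[int, ...]


def tree_nodes(tree: Dict[Path, List[int]]) -> Set[Path]:
    """All node paths in the token tree, including the root ()."""
    nodes: Set[Path] = {()}
    for path, children in tree.items():
        nodes.add(path)
        nodes.update(path + (c,) for c in children)
    return nodes


def verify_token_tree(
    prefix: Sequence[int],
    tree: Dict[Path, List[int]],
    next_token: Callable[[List[int]], int],
) -> List[int]:
    """Greedy token-tree verification: returns the tokens accepted this step."""
    # Score every node. SpecInfer does this in a single batched LLM pass;
    # here next_token is simply called once per node.
    preds = {p: next_token(list(prefix) + list(p)) for p in tree_nodes(tree)}
    path: Path = ()
    # Walk down while the LLM's greedy choice matches a speculated child.
    while preds[path] in tree.get(path, []):
        path = path + (preds[path],)
    # Accept the matched path plus the LLM's own next token after it.
    return list(path) + [preds[path]]
```

With a toy "LLM" that always predicts (last token + 1) % 10, prefix [1, 2, 3], and tree {(): [4, 7], (4,): [5, 9], (4, 5): [0]}, verification accepts [4, 5] from the tree and appends the LLM's own token 6, returning [4, 5, 6]. Every step yields at least one token, so a wrong speculation costs no correctness.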