
AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration #78

Open
ziwang-com opened this issue Jun 6, 2023 · 0 comments

Comments

@ziwang-com (Owner)

https://github.com/mit-han-lab/llm-awq

AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration [Paper]
Efficient and accurate low-bit weight quantization (INT3/INT4) for LLMs, with support for instruction-tuned models and multi-modal LMs.

[Overview figure]

The current release supports:

AWQ search for accurate quantization (a toy sketch of the idea follows this list).
A pre-computed AWQ model zoo for LLMs (LLaMA, OPT, Vicuna, LLaVA; load it to generate quantized weights).
Memory-efficient 4-bit linear layers in PyTorch (see the packing sketch below).
An efficient CUDA kernel implementation for fast inference (supporting both the context and decoding stages).
Examples of 4-bit inference with an instruction-tuned model (Vicuna) and a multi-modal LM (LLaVA).
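The AWQ search mentioned in the first bullet picks per-channel scales from activation statistics; the exact search space and API live in mit-han-lab/llm-awq. As a rough illustration only, here is a minimal PyTorch sketch of the underlying idea: scale up weight channels that see large activations before quantizing, and fold the inverse scale into the activations. The function name `awq_style_scale_search`, the grid search over `alpha`, and the MSE error metric are all assumptions for this sketch, not the repository's implementation.

```python
import torch

def awq_style_scale_search(w, x, n_grid=20, n_bits=4):
    # Hypothetical sketch, NOT the official llm-awq API.
    # w: weight matrix [out_features, in_features]
    # x: calibration activations [n_samples, in_features]
    act_mag = x.abs().mean(dim=0)                 # per-input-channel activation magnitude
    q_max = 2 ** (n_bits - 1) - 1                 # e.g. 7 for signed INT4
    ref = x @ w.t()                               # full-precision reference output
    best_err, best_scales = float("inf"), None
    for i in range(n_grid):
        alpha = i / n_grid                        # candidate exponent: s = act_mag ** alpha
        s = act_mag.clamp(min=1e-4) ** alpha
        s = s / (s.max() * s.min()).sqrt()        # keep scales centered around 1
        w_s = w * s                               # scale up (protect) salient channels
        step = w_s.abs().amax(dim=1, keepdim=True) / q_max
        w_q = (w_s / step).round().clamp(-q_max - 1, q_max) * step  # fake INT-n quantization
        err = ((x / s) @ w_q.t() - ref).pow(2).mean().item()        # inverse scale folded into x
        if err < best_err:
            best_err, best_scales = err, s
    return best_scales
```

In the actual system the chosen scales can be folded into the preceding layer's weights, so inference cost is unchanged; the sketch only shows why activation-aware scaling can lower quantization error compared to quantizing `w` directly.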

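The "memory-efficient 4-bit linear layers" bullet comes down to storing two 4-bit weights per byte. The real llm-awq kernels use their own packed layout; the following is a generic nibble-packing sketch in PyTorch, with made-up helper names `pack_int4` and `unpack_int4`, just to show the storage idea.

```python
import torch

def pack_int4(q):
    # Pack signed INT4 values (range [-8, 7]) into uint8, two per byte.
    # q: int8 tensor whose last dimension has even length.
    assert q.shape[-1] % 2 == 0
    u = (q + 8).to(torch.uint8)                  # shift to unsigned [0, 15]
    return u[..., 0::2] | (u[..., 1::2] << 4)    # even index -> low nibble, odd -> high

def unpack_int4(packed):
    # Inverse of pack_int4: recover signed INT4 values as int8.
    lo = (packed & 0x0F).to(torch.int8) - 8
    hi = (packed >> 4).to(torch.int8) - 8
    return torch.stack((lo, hi), dim=-1).flatten(start_dim=-2)

# Quick self-check: packing then unpacking is lossless and halves the storage.
q = torch.randint(-8, 8, (4, 8), dtype=torch.int8)
assert torch.equal(unpack_int4(pack_int4(q)), q)
```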
