
[Usage]: how to initialize gemma2-27b with 4-bit quantization? #6068

Open
maxin9966 opened this issue Jul 2, 2024 · 3 comments
Labels: usage (How to use vllm)

Comments

@maxin9966

Your current environment

How can I initialize the gemma2-27b model with 4-bit quantization?

How would you like to use vllm

Could you please explain how to initialize the gemma2-27b model with 4-bit quantization?

maxin9966 added the usage (How to use vllm) label Jul 2, 2024
@Qubitium (Contributor) commented Jul 2, 2024

GPTQModel v0.9.3 added Gemma 2 support for GPTQ 4-bit quantization, but the 27B model has inference issues; we haven't had time to test whether vLLM shows the same inference issue for the 27B model as HF Transformers. The 9B model is perfect though and passes with flying colors.

You can try quantizing a 27B with GPTQModel (use format=FORMAT.GPTQ, sym=True) and then try inference with vLLM. Let me know if you get it working.
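
A minimal sketch of that flow, assuming the GPTQModel v0.9.x-style API (GPTQModel.from_pretrained / quantize / save_quantized) and vLLM's gptq quantization backend; the model id, output directory, and calibration text below are placeholders, not from the thread:

```python
# Rough sketch: quantize Gemma 2 27B to 4-bit GPTQ with GPTQModel, then load
# the result in vLLM. The GPTQModel API follows the v0.9.x style and may differ
# in other versions.
from gptqmodel import GPTQModel, QuantizeConfig
from gptqmodel.quantization import FORMAT  # import path may vary by version
from transformers import AutoTokenizer

pretrained_id = "google/gemma-2-27b-it"     # placeholder model id
quantized_dir = "gemma-2-27b-it-gptq-4bit"  # output directory for quantized weights

quant_config = QuantizeConfig(
    bits=4,
    group_size=128,
    format=FORMAT.GPTQ,  # as suggested above
    sym=True,
)

tokenizer = AutoTokenizer.from_pretrained(pretrained_id)
# Tiny calibration set to keep the sketch short; a real run should use a few
# hundred representative samples.
calibration = [tokenizer("vLLM is a high-throughput inference engine for LLMs.")]

model = GPTQModel.from_pretrained(pretrained_id, quant_config)
model.quantize(calibration)
model.save_quantized(quantized_dir)
tokenizer.save_pretrained(quantized_dir)  # so vLLM can find the tokenizer

# Inference with vLLM on the quantized checkpoint.
from vllm import LLM, SamplingParams

llm = LLM(model=quantized_dir, quantization="gptq")
outputs = llm.generate(["Why is the sky blue?"], SamplingParams(max_tokens=64))
print(outputs[0].outputs[0].text)
```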

@SJY8460 commented Jul 4, 2024

I have a similar question: can vLLM directly use "load_in_4bit" to load a quantized model? If not, will it be implemented in the future?

@yechenzhi

Hi, same question here~
