
[Usage]: how to initialize gemma2-27b with 4-bit quantization? #6068

Open
maxin9966 opened this issue Jul 2, 2024 · 3 comments
Labels: usage (How to use vllm)

Comments

@maxin9966

Your current environment

How can I initialize the gemma2-27b model with 4-bit quantization?

How would you like to use vllm

Could you please explain how to initialize the gemma2-27b model with 4-bit quantization?

maxin9966 added the usage (How to use vllm) label Jul 2, 2024
@Qubitium (Contributor) commented Jul 2, 2024

GPTQModel v0.9.3 added Gemma 2 support for GPTQ 4-bit quantization, but the 27B model has inference issues; we haven't had time to test whether vLLM shows the same inference issue for the 27B model as HF Transformers. The 9B model is perfect though and passes with flying colors.

You can try quantizing a 27B with GPTQModel (use format=FORMAT.GPTQ, sym=True) and then try inference with vLLM. Let me know if you get it working.
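
A minimal sketch of that flow, assuming the GPTQModel v0.9.x-style API (GPTQModel.from_pretrained / quantize / save_quantized) and vLLM's gptq quantization backend; the model id, output directory, and calibration text below are placeholders, not from the thread:

```python
# Rough sketch: quantize Gemma 2 27B to 4-bit GPTQ with GPTQModel, then load
# the result in vLLM. The GPTQModel API follows the v0.9.x style and may differ
# in other versions.
from gptqmodel import GPTQModel, QuantizeConfig
from gptqmodel.quantization import FORMAT  # import path may vary by version
from transformers import AutoTokenizer

pretrained_id = "google/gemma-2-27b-it"     # placeholder model id
quantized_dir = "gemma-2-27b-it-gptq-4bit"  # output directory for quantized weights

quant_config = QuantizeConfig(
    bits=4,
    group_size=128,
    format=FORMAT.GPTQ,  # as suggested above
    sym=True,
)

tokenizer = AutoTokenizer.from_pretrained(pretrained_id)
# Tiny calibration set to keep the sketch short; a real run should use a few
# hundred representative samples.
calibration = [tokenizer("vLLM is a high-throughput inference engine for LLMs.")]

model = GPTQModel.from_pretrained(pretrained_id, quant_config)
model.quantize(calibration)
model.save_quantized(quantized_dir)
tokenizer.save_pretrained(quantized_dir)  # so vLLM can find the tokenizer

# Inference with vLLM on the quantized checkpoint.
from vllm import LLM, SamplingParams

llm = LLM(model=quantized_dir, quantization="gptq")
outputs = llm.generate(["Why is the sky blue?"], SamplingParams(max_tokens=64))
print(outputs[0].outputs[0].text)
```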

@SJY8460 commented Jul 4, 2024

I have a similar question: can vLLM directly use "load_in_4bit" to load a quantized model? If not, will it be implemented in the future?

@yechenzhi

Hi, same question here~
