GPTQModel v0.9.3 added Gemma 2 support for GPTQ 4-bit quantization, but the 27B model has inference issues; we haven't had time to test whether vLLM has the same inference issue for the 27B model as HF Transformers. The 9B model is perfect, though, and passing with flying colors.
You can try quantizing the 27B with GPTQModel (use format=FORMAT.GPTQ, sym=True) and then try running inference with vLLM. Let me know if you get it working. A rough sketch is below.
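For reference, a minimal quantization sketch modeled on the GPTQModel examples. The import paths, calibration setup, group size, and output directory are assumptions and may differ between GPTQModel versions:

```python
from transformers import AutoTokenizer
from gptqmodel import GPTQModel, QuantizeConfig
from gptqmodel.quantization import FORMAT

model_id = "google/gemma-2-27b-it"
out_dir = "gemma-2-27b-it-gptq-4bit"  # hypothetical output path

tokenizer = AutoTokenizer.from_pretrained(model_id)

# GPTQ needs a calibration set; a single short sample is shown here
# only to keep the sketch self-contained -- use a real dataset.
calibration = [tokenizer("vLLM is a fast and easy-to-use library for LLM inference.")]

quant_config = QuantizeConfig(
    bits=4,              # 4-bit quantization
    group_size=128,      # common default group size (assumption)
    format=FORMAT.GPTQ,  # as suggested above
    sym=True,
)

model = GPTQModel.from_pretrained(model_id, quant_config)
model.quantize(calibration)
model.save_quantized(out_dir)
tokenizer.save_pretrained(out_dir)
```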
Your current environment
How do I initiate gemma2-27b with 4-bit quantization?
How would you like to use vLLM
Could you please explain how to initiate gemma2-27b with 4-bit quantization?
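For what it's worth, here is a minimal sketch of loading a GPTQ 4-bit checkpoint in vLLM. The local model path is hypothetical; point it at whatever quantized checkpoint you produced:

```python
from vllm import LLM, SamplingParams

# Load the locally saved GPTQ 4-bit checkpoint (hypothetical path).
llm = LLM(model="gemma-2-27b-it-gptq-4bit", quantization="gptq")

params = SamplingParams(temperature=0.7, max_tokens=64)
outputs = llm.generate(["Explain 4-bit quantization in one sentence."], params)
print(outputs[0].outputs[0].text)
```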