Your current environment
🐛 Describe the bug
I applied GPTQ int8 quantization to a fine-tuned Qwen2-VL-2B model, but inference speed did not improve after quantization compared with the unquantized checkpoint.
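A minimal sketch of how such a speed comparison can be run with vLLM's offline `LLM` API is below; the model paths, prompt, batch size, and `max_tokens` are illustrative placeholders rather than the exact setup, and the script is meant to be run once per checkpoint so each model is benchmarked in a fresh process.

```python
# Rough throughput check to compare the fp16 and GPTQ-int8 checkpoints.
# Run once per checkpoint, e.g.:
#   python bench.py /path/to/Qwen2-VL-2B-finetuned
#   python bench.py /path/to/Qwen2-VL-2B-finetuned-GPTQ-Int8
# NOTE: paths, prompt, batch size, and max_tokens are placeholders.
import sys
import time

from vllm import LLM, SamplingParams

model_path = sys.argv[1]

# Small text-only batch; greedy decoding with a fixed output length
# so both checkpoints generate a comparable number of tokens.
prompts = ["Describe this product in one paragraph."] * 8
params = SamplingParams(temperature=0.0, max_tokens=256)

llm = LLM(model=model_path, max_model_len=4096)

start = time.perf_counter()
outputs = llm.generate(prompts, params)
elapsed = time.perf_counter() - start

generated_tokens = sum(len(o.outputs[0].token_ids) for o in outputs)
print(f"{model_path}: {generated_tokens / elapsed:.1f} generated tok/s")
```

Measuring generated tokens per second around `llm.generate` isolates decode throughput and excludes model load time, which is the quantity I expected to improve after quantization.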