Your current environment
🐛 Describe the bug
I applied GPTQ int8 quantization to a fine-tuned Qwen2-VL-2B model, but inference speed did not improve after quantization compared with the unquantized checkpoint.
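A minimal sketch of how such a speed comparison can be run with vLLM's offline `LLM` API is below; the model paths, prompt, batch size, and `max_tokens` are illustrative placeholders rather than the exact setup, and the script is meant to be run once per checkpoint so each model is benchmarked in a fresh process.

```python
# Rough throughput check to compare the fp16 and GPTQ-int8 checkpoints.
# Run once per checkpoint, e.g.:
#   python bench.py /path/to/Qwen2-VL-2B-finetuned
#   python bench.py /path/to/Qwen2-VL-2B-finetuned-GPTQ-Int8
# NOTE: paths, prompt, batch size, and max_tokens are placeholders.
import sys
import time

from vllm import LLM, SamplingParams

model_path = sys.argv[1]

# Small text-only batch; greedy decoding with a fixed output length
# so both checkpoints generate a comparable number of tokens.
prompts = ["Describe this product in one paragraph."] * 8
params = SamplingParams(temperature=0.0, max_tokens=256)

llm = LLM(model=model_path, max_model_len=4096)

start = time.perf_counter()
outputs = llm.generate(prompts, params)
elapsed = time.perf_counter() - start

generated_tokens = sum(len(o.outputs[0].token_ids) for o in outputs)
print(f"{model_path}: {generated_tokens / elapsed:.1f} generated tok/s")
```

Measuring generated tokens per second around `llm.generate` isolates decode throughput and excludes model load time, which is the quantity I expected to improve after quantization.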