Conversation

@JartX (Contributor) commented Sep 19, 2025

Hi everyone! This PR fixes the same issue as PR #23994, but only for Qwen3 Next: we need to check whether the model has been quantized to the auto_gptq format with https://github.com/intel/auto-round.

Command:

auto_round --model Qwen/Qwen3-Next-80B-A3B-Instruct --bits 8 --format "auto_gptq" --output_dir /workspace/outputs

@Isotr0py, it simply rechecks the key extracted from the quant_config. I've verified that it works. Could you try the following model? https://huggingface.co/Intel/Qwen3-Next-80B-A3B-Instruct-int4-AutoRound
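
If it helps, here is a minimal offline sanity check with vLLM's Python API (the prompt and sampling parameters are arbitrary; an 80B checkpoint may need tensor parallelism depending on GPU memory):

```python
from vllm import LLM, SamplingParams

# vLLM picks up the GPTQ/AutoRound quantization settings from the
# checkpoint's quantization_config; no extra flags are needed.
llm = LLM(model="Intel/Qwen3-Next-80B-A3B-Instruct-int4-AutoRound")

outputs = llm.generate(["Hello, my name is"], SamplingParams(max_tokens=32))
print(outputs[0].outputs[0].text)
```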

IMPORTANT NOTE:
Platform: ROCm
To run Qwen3-Next-80B-A3B-Instruct-w4g128 (AutoRound-GPTQ), I had to merge PR #24486, because the attention block size is 272.
EDIT: solved by PR #25105.

…fig like qwen3moe

Signed-off-by: JartX <sagformas@epdcenter.es>
@JartX requested a review from sighingnow as a code owner September 19, 2025 15:57
mergify bot added the qwen (Related to Qwen models) label Sep 19, 2025
@gemini-code-assist bot left a comment

Code Review

This pull request fixes GPTQ quantization for the gate layer in Qwen3 Next MoE models when using AutoRound. The change modifies _maybe_ignore_quant_config to check for the autoround_version attribute on the GPTQConfig or GPTQMarlinConfig. As a result, the gate's quantization configuration is ignored only for standard GPTQ/AutoGPTQ checkpoints and is applied for AutoRound-quantized models, which is the intended behavior. The implementation is correct and effectively resolves the compatibility issue.
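
For reference, a minimal sketch of the check described above, assuming the autoround_version attribute named in the review (the merged code may differ in detail):

```python
from vllm.model_executor.layers.quantization import QuantizationConfig
from vllm.model_executor.layers.quantization.gptq import GPTQConfig
from vllm.model_executor.layers.quantization.gptq_marlin import GPTQMarlinConfig


def _maybe_ignore_quant_config(quant_config: QuantizationConfig):
    # AutoGPTQ checkpoints leave the MoE gate unquantized, so the gate must
    # drop the GPTQ config. AutoRound-produced GPTQ checkpoints do quantize
    # the gate; they are identified by the autoround_version attribute.
    if isinstance(quant_config, (GPTQConfig, GPTQMarlinConfig)) and not getattr(
        quant_config, "autoround_version", None
    ):
        return None
    return quant_config
```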

@Isotr0py requested a review from jeejeelee September 19, 2025 16:50
@JartX (Contributor, Author) commented Sep 19, 2025

@Isotr0py @vadiklyutiy
After rebasing on main and re-adapting PR #24486, PR #25105 works correctly, so on my side the issue is closed. All that remains is the AutoGPTQ / AutoRound-GPTQ support, which I leave to @Isotr0py. I have withdrawn my other comments to avoid noise. Thank you all very much :)

@mgoin added the ready (ONLY add when PR is ready to merge/full CI is needed) label Sep 20, 2025
@jeejeelee (Collaborator) left a comment

LGTM

@jeejeelee merged commit 3642909 into vllm-project:main Sep 20, 2025
66 checks passed
@JartX deleted the fix/qwen3-next-moe-autogptq_autoround_gptq branch September 20, 2025 10:18
FeiDaLI pushed a commit to FeiDaLI/vllm that referenced this pull request Sep 25, 2025
…utoGPTQ and AutoRound-GPTQ) (vllm-project#25268)

Signed-off-by: JartX <sagformas@epdcenter.es>
charlifu pushed a commit to ROCm/vllm that referenced this pull request Sep 25, 2025
…utoGPTQ and AutoRound-GPTQ) (vllm-project#25268)

Signed-off-by: JartX <sagformas@epdcenter.es>
Signed-off-by: charlifu <charlifu@amd.com>
yewentao256 pushed a commit that referenced this pull request Oct 3, 2025
…utoGPTQ and AutoRound-GPTQ) (#25268)

Signed-off-by: JartX <sagformas@epdcenter.es>
Signed-off-by: yewentao256 <zhyanwentao@126.com>