
Conversation

@whx-sjtu whx-sjtu commented Sep 15, 2025

Purpose

prefix is an init parameter of ParallelLMHead that VocabParallelEmbedding later uses when calling get_quant_method. Currently, only some models pass this parameter when initializing ParallelLMHead; omitting it can cause bugs when resolving the quantization method. This PR completes the passing of the prefix parameter for all models.
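As a rough sketch of the failure mode, the snippet below uses simplified stand-in classes (not vLLM's real implementations) named after vLLM's `QuantizationConfig`, `VocabParallelEmbedding`, and `ParallelLMHead`. The per-layer lookup logic here is illustrative only:

```python
class QuantizationConfig:
    """Stand-in: maps layer prefixes (e.g. "lm_head") to quant methods."""

    def __init__(self, per_layer_methods):
        self.per_layer_methods = per_layer_methods

    def get_quant_method(self, layer, prefix):
        # An empty prefix silently falls back to the default method --
        # the kind of bug this PR avoids for models that omitted prefix.
        return self.per_layer_methods.get(prefix, "unquantized")


class VocabParallelEmbedding:
    def __init__(self, quant_config, prefix=""):
        # The quant method is resolved from the prefix at init time.
        self.quant_method = quant_config.get_quant_method(self, prefix)


class ParallelLMHead(VocabParallelEmbedding):
    pass


quant_config = QuantizationConfig({"lm_head": "fp8"})

# Before the fix: prefix omitted, so the configured method is missed.
head_without_prefix = ParallelLMHead(quant_config)

# After the fix: prefix forwarded, so the configured method is found.
head_with_prefix = ParallelLMHead(quant_config, prefix="lm_head")

print(head_without_prefix.quant_method)  # unquantized
print(head_with_prefix.quant_method)     # fp8
```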

Test Plan

No new tests are needed; the change only forwards an existing parameter.

Test Result

All existing tests should pass.



Signed-off-by: whx-sjtu <2952154980@qq.com>
@mergify mergify bot added the deepseek, llama, qwen, gpt-oss, and speculative-decoding labels on Sep 15, 2025
@gemini-code-assist gemini-code-assist bot left a comment:
Code Review

This pull request systematically addresses a potential quantization bug by ensuring the prefix parameter is passed to ParallelLMHead across all relevant models. The changes are consistent and correctly implemented, and they make quantization-method resolution more robust. The changes look good.

Signed-off-by: whx-sjtu <2952154980@qq.com>
@whx-sjtu whx-sjtu force-pushed the add_prefix_to_llmhead branch from 3c216af to 80aea6b on September 15, 2025 at 10:03
@wangxiyuan (Contributor) commented:

The prefix is very useful in custom quantization cases.

@Isotr0py Isotr0py (Member) left a comment:

LGTM, thanks!

@github-project-automation github-project-automation bot moved this from To Triage to Ready in gpt-oss Issues & Enhancements Sep 16, 2025
@Isotr0py Isotr0py added the ready ONLY add when PR is ready to merge/full CI is needed label Sep 16, 2025
@DarkLight1337 DarkLight1337 merged commit 4a9375f into vllm-project:main Sep 17, 2025
60 checks passed
FeiDaLI pushed a commit to FeiDaLI/vllm that referenced this pull request Sep 25, 2025
Signed-off-by: whx-sjtu <2952154980@qq.com>
charlifu pushed a commit to ROCm/vllm that referenced this pull request Sep 25, 2025
Signed-off-by: whx-sjtu <2952154980@qq.com>
Signed-off-by: charlifu <charlifu@amd.com>
Labels
deepseek (Related to DeepSeek models), gpt-oss (Related to GPT-OSS models), llama (Related to Llama models), qwen (Related to Qwen models), ready (ONLY add when PR is ready to merge/full CI is needed), speculative-decoding
Projects
Status: Done
Development


4 participants