Conversation

@whx-sjtu whx-sjtu commented Sep 27, 2025

Purpose

This PR is a supplement to PR #24862, passing to LLMHead some prefix parameters that were previously missed in that PR.
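
The motivation can be sketched as follows (a minimal illustration with hypothetical helper names, not vLLM's actual code): quantization configs typically match modules by their dotted path, so an LM head only gets the right quantization treatment if its constructor receives a prefix that matches its real path in the model.

```python
# Hypothetical sketch: quantization configs often match modules by their
# dotted path, so the prefix passed to a layer must match the attribute
# name it is assigned to in the parent module.

def maybe_prefix(prefix: str, name: str) -> str:
    """Join a parent prefix and a child module name with a dot."""
    return name if not prefix else f"{prefix}.{name}"

def is_layer_skipped(prefix: str, ignored_layers: list[str]) -> bool:
    """Return True if the layer's full path appears in the ignore list."""
    return prefix in ignored_layers

# Building the LM head's full path from its parent prefix:
lm_head_prefix = maybe_prefix("model", "lm_head")
assert lm_head_prefix == "model.lm_head"

# A config that excludes "model.lm_head" from quantization only takes
# effect if the layer was constructed with that exact prefix; passing an
# empty or mismatched prefix silently breaks the lookup.
assert is_layer_skipped(lm_head_prefix, ["model.lm_head"])
assert not is_layer_skipped("lm_head", ["model.lm_head"])
```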

Test Plan

No extra tests are needed.

Test Result

All current CI tests should pass.

Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

@mergify mergify bot added deepseek Related to DeepSeek models qwen Related to Qwen models speculative-decoding labels Sep 27, 2025

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request aims to pass the prefix parameter to ParallelLMHead instances across various models, which is a crucial step for ensuring correct quantization. While the intent is correct, the implementation introduces inconsistencies in several models where the provided prefix does not match the module's attribute name. This could lead to problems with quantization configurations that rely on accurate module paths. Furthermore, in models that utilize a ModuleList for multiple LM heads, the same prefix is incorrectly applied to all heads, which would prevent them from being quantized differently if needed. I've added specific comments and suggestions to address these inconsistencies.
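
The ModuleList concern above can be illustrated with a small sketch (hypothetical helper, not the PR's actual code): each head in a list of LM heads should receive its own indexed prefix, because reusing one prefix makes the heads indistinguishable to path-based quantization rules.

```python
# Hypothetical sketch: per-head prefixes for a ModuleList of LM heads.
# Quantization rules that target modules by path can only distinguish
# heads if each one is constructed with its own indexed prefix.

def head_prefixes(parent: str, num_heads: int) -> list[str]:
    """Return one dotted, index-qualified prefix per LM head."""
    return [f"{parent}.lm_heads.{i}" for i in range(num_heads)]

prefixes = head_prefixes("model", 3)
assert prefixes == ["model.lm_heads.0",
                    "model.lm_heads.1",
                    "model.lm_heads.2"]

# Passing the same prefix to every head (as the review flags) would
# collapse the paths, so the heads could not be quantized differently:
shared = ["model.lm_head"] * 3
assert len(set(prefixes)) == 3
assert len(set(shared)) == 1
```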

@whx-sjtu whx-sjtu force-pushed the adapt_mtp_quant branch 3 times, most recently from 92c39e0 to d18f880 on September 27, 2025

mergify bot commented Oct 1, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @whx-sjtu.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Oct 1, 2025
@whx-sjtu whx-sjtu force-pushed the adapt_mtp_quant branch 2 times, most recently from 6bddb18 to 5163bb6 on October 1, 2025
@mergify mergify bot removed the needs-rebase label Oct 1, 2025
@whx-sjtu whx-sjtu requested a review from Isotr0py October 1, 2025 17:37
@Isotr0py Isotr0py enabled auto-merge (squash) October 2, 2025 06:43
@github-actions github-actions bot added the ready ONLY add when PR is ready to merge/full CI is needed label Oct 2, 2025
Signed-off-by: whx-sjtu <2952154980@qq.com>
auto-merge was automatically disabled October 3, 2025 11:06

Head branch was pushed to by a user without write access

@DarkLight1337 DarkLight1337 merged commit cbf9221 into vllm-project:main Oct 3, 2025
54 checks passed
yewentao256 pushed a commit that referenced this pull request Oct 3, 2025
Signed-off-by: whx-sjtu <2952154980@qq.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
MatthewBonanni pushed a commit to MatthewBonanni/vllm that referenced this pull request Oct 3, 2025