
Conversation

@jeejeelee (Collaborator) commented Nov 10, 2025

Purpose

Part of #23474

This PR first removes the LoRA extra vocab code from most models (temporarily keeping llama and mixtral so the existing LoRA tests still pass); the remaining LoRA extra vocab code will be removed in subsequent PRs.
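For illustration, a minimal before/after sketch of the pattern being removed, assuming a llama-style model layout (exact names and arguments vary per model, so treat this as an approximation rather than the actual diff):

```python
# Inside a model's __init__ (simplified sketch).
from vllm.model_executor.layers.vocab_parallel_embedding import (
    VocabParallelEmbedding)

# Before: the vocab is grown by lora_extra_vocab_size and the original
# size is passed separately via org_num_embeddings.
self.vocab_size = config.vocab_size
if lora_config is not None:
    self.vocab_size += lora_config.lora_extra_vocab_size
self.embed_tokens = VocabParallelEmbedding(
    self.vocab_size,
    config.hidden_size,
    org_num_embeddings=config.vocab_size,
)

# After: no extra-vocab handling; the layer keeps its default vocab
# padding (see the padding discussion later in this thread).
self.embed_tokens = VocabParallelEmbedding(
    config.vocab_size,
    config.hidden_size,
)
```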

PS: The entire work is planned to be completed this week.

Test Plan

Test Result


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing a test command.
  • The test results, such as pasting a before/after results comparison or e2e results.
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

@mergify bot added the llama (Related to Llama models), qwen (Related to Qwen models), and speculative-decoding labels Nov 10, 2025
@gemini-code-assist bot (Contributor) left a comment


Code Review

This pull request is a significant step towards simplifying the codebase by removing the logic for LoRA extra vocabulary. The changes are extensive, touching many model files, and appear to be mostly correct and consistent with the goal of the refactoring. However, I've identified a recurring critical issue in several files that will lead to a TypeError at runtime due to incorrect tuple unpacking syntax. These issues need to be addressed to ensure the models function correctly.
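The review doesn't quote the offending lines; purely as a hypothetical illustration of this class of bug, a call site that still unpacks two values after a helper has been changed to return a single object fails only at runtime:

```python
# Hypothetical illustration only -- not the actual code from this PR.
class DummyEmbedding:
    def __init__(self, vocab_size: int, hidden_size: int):
        self.vocab_size = vocab_size
        self.hidden_size = hidden_size

def make_embedding(vocab_size: int, hidden_size: int) -> DummyEmbedding:
    # After a refactor this returns a single object, where it previously
    # returned a (module, extra_vocab_size) pair.
    return DummyEmbedding(vocab_size, hidden_size)

embedding = make_embedding(32000, 4096)         # correct post-refactor usage
embedding, extra = make_embedding(32000, 4096)  # stale tuple unpacking
# TypeError: cannot unpack non-iterable DummyEmbedding object
```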

@chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.


@jeejeelee added the ready (ONLY add when PR is ready to merge/full CI is needed) label Nov 10, 2025
@WoosukKwon (Collaborator) left a comment


Just to double check: We still keep the default padding for vocab sizes and only remove additional padding from LoRA, right?

@jeejeelee (Collaborator, Author)

> Just to double check: We still keep the default padding for vocab sizes and only remove additional padding from LoRA, right?

Yes

@WoosukKwon (Collaborator)

@jeejeelee Dumb question: Can you explain more? When skimming the code briefly, I thought this PR removed default padding too. IIRC, the vocab size should be padded for TP > 1 (or maybe for other reasons).

@jeejeelee (Collaborator, Author)

@WoosukKwon This is a great question, and I apologize for not describing it clearly in the PR.

This PR does not change the default vocab padding size; it simply no longer passes padding_size explicitly. In VocabParallelEmbedding, padding_size defaults to DEFAULT_VOCAB_PADDING_SIZE, as shown in the sketch below.

qwen2 and llama are the two cases to compare: qwen2 already uses the new pattern, while llama still uses the old one.

The current PR does not modify models like llama, in order to keep the CI tests passing; they will be migrated in subsequent PRs.
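A minimal sketch of where that default lives, assuming the current vocab_parallel_embedding module (the exact value and signatures may differ slightly):

```python
# Simplified from vllm/model_executor/layers/vocab_parallel_embedding.py.
DEFAULT_VOCAB_PADDING_SIZE = 64

def pad_vocab_size(vocab_size: int,
                   pad_to: int = DEFAULT_VOCAB_PADDING_SIZE) -> int:
    """Round the vocab size up to a multiple of pad_to."""
    return ((vocab_size + pad_to - 1) // pad_to) * pad_to

# VocabParallelEmbedding(num_embeddings, embedding_dim, ...,
#                        padding_size=DEFAULT_VOCAB_PADDING_SIZE)
# so a model that stops passing padding_size explicitly still gets the
# default padding, e.g. pad_vocab_size(32003) == 32064, which keeps the
# padded vocab evenly shardable across tensor-parallel ranks.
```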

@WoosukKwon (Collaborator) left a comment


LGTM. Thanks for the explanation!

@WoosukKwon merged commit 9d1c474 into vllm-project:main Nov 11, 2025
54 checks passed
@jeejeelee deleted the remove-lora-extra-vocab branch November 11, 2025 23:16
fangyuchu pushed a commit to fangyuchu/vllm that referenced this pull request Nov 12, 2025
geodavic pushed a commit to geodavic/vllm that referenced this pull request Nov 16, 2025

Labels

llama (Related to Llama models), qwen (Related to Qwen models), ready (ONLY add when PR is ready to merge/full CI is needed), speculative-decoding
