
Conversation

@jeejeelee
Collaborator

@jeejeelee jeejeelee commented Nov 27, 2025

Purpose

We have removed the LoRA extra-vocab code, which leaves embedding_padding_modules unused; this PR cleans up the related code. It also switches to a smaller base model and corresponding LoRA to reduce CI pressure.

Test Plan

Test Result


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
@jeejeelee jeejeelee marked this pull request as draft November 27, 2025 14:07
@mergify

mergify bot commented Nov 27, 2025

Documentation preview: https://vllm--29611.org.readthedocs.build/en/29611/

@mergify mergify bot added documentation Improvements or additions to documentation llama Related to Llama models speculative-decoding v1 labels Nov 27, 2025
@mergify

mergify bot commented Nov 27, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @jeejeelee.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Nov 27, 2025
Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request provides a good cleanup by removing the unused embedding_padding_modules and related logic for padding LoRA embedding weights. The changes are consistent across the codebase. I have one suggestion to improve robustness by adding an assertion to ensure vocabulary size consistency for lm_head LoRA adapters, which will provide clearer error messages for users.

if pin_memory:
    loras[module_name].lora_a = loras[module_name].lora_a.pin_memory()
else:
    loras[module_name].lora_b = tensor.to(device=device, dtype=dtype)
Contributor


high

While the padding logic for lm_head LoRA adapters is being removed, it's good practice to add an explicit check for vocabulary size mismatch. This will prevent potential runtime errors with cryptic messages if a user tries to load a LoRA with a different vocabulary size for the lm_head, providing a much clearer error message instead.

Suggested change
-loras[module_name].lora_b = tensor.to(device=device, dtype=dtype)
+if "lm_head" in module_name and model_vocab_size is not None:
+    assert model_vocab_size == tensor.shape[0], (
+        f"The lm_head LoRA vocab size ({tensor.shape[0]}) must be consistent"
+        f" with the base model's vocabulary size({model_vocab_size})."
+    )
+loras[module_name].lora_b = tensor.to(device=device, dtype=dtype)

Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
@mergify mergify bot removed the needs-rebase label Nov 27, 2025
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
@jeejeelee jeejeelee marked this pull request as ready for review November 27, 2025 16:34
@jeejeelee jeejeelee added the ready ONLY add when PR is ready to merge/full CI is needed label Nov 27, 2025
@DarkLight1337
Member

Please update the description explaining which code is unused

"""Main function that sets up and runs the prompt processing."""
engine = initialize_engine()
lora_path = snapshot_download(repo_id="yard1/llama-2-7b-sql-lora-test")
lora_path = snapshot_download(repo_id="jeeejeee/llama32-3b-text2sql-spider")
Collaborator Author


We have removed the LoRA extra-vocab code, so we no longer support this type of LoRA weight. Accordingly, I changed the base model and LoRA model in this script.


@pytest.fixture(scope="session")
-def zephyr_lora_files():
+def qwen3_lora_files():
Collaborator Author


Using a smaller LoRA model can reduce CI pressure.

@jeejeelee
Collaborator Author

@DarkLight1337 I have updated the description

Member

@DarkLight1337 DarkLight1337 left a comment


LGTM but maybe we should get LoRA TP to pass on main first

@jeejeelee
Collaborator Author

@DarkLight1337 I will monitor the LoRA TP tests more closely; they are quite flaky

Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
generate_and_test(
    llm, olmoe_lora_files, lora_id=1, compare_lower=fully_sharded_loras
)
generate_and_test(
Collaborator Author


Relax TP4+fully_sharded_loras test
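For context, the relaxed comparison can be sketched as a small helper. This is hypothetical — the actual generate_and_test implementation lives in the vLLM test suite — but it illustrates what toggling compare_lower for the TP4 fully-sharded case does:

```python
def outputs_match(expected: str, actual: str, compare_lower: bool = False) -> bool:
    """Compare generated text against a reference string.

    With fully sharded LoRA at TP4, generations can drift in
    capitalization only, so compare_lower relaxes the check to a
    case-insensitive comparison instead of exact equality.
    """
    if compare_lower:
        return expected.lower() == actual.lower()
    return expected == actual
```

For example, outputs_match("SELECT name FROM singer", "select name from singer", compare_lower=True) passes, while the strict comparison would fail.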

Member

@hmellor hmellor left a comment


LGTM, just some nits

@pytest.fixture(scope="session")
-def zephyr_lora_files():
+def qwen3_lora_files():
    """Download zephyr LoRA files once per test session."""
Member


Docstring should be updated too

Collaborator Author

@jeejeelee jeejeelee Nov 28, 2025


Good catch


# Model name constants used across tests
-MODEL_NAME_ZEPHYR = "HuggingFaceH4/zephyr-7b-beta"
+MODEL_NAME_ZEPHYR = "Qwen/Qwen3-0.6B"
Member


Suggested change
-MODEL_NAME_ZEPHYR = "Qwen/Qwen3-0.6B"
+MODEL_NAME_QWEN = "Qwen/Qwen3-0.6B"

Comment on lines 132 to 136
if "lora_embedding_A" in tensor_name and model_vocab_size is not None:
assert model_vocab_size == tensor.shape[1], (
f"The embedding LoRA size({tensor.shape[1]}) must be consistent"
f" with the base model's vocabulary size({model_vocab_size})."
)
Member


Could we raise a runtime error instead of asserting?

Collaborator Author


Sure
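For reference, the assert-to-exception conversion might look like the sketch below. The helper name and the choice of ValueError are assumptions for illustration; the real check lives in vLLM's LoRA weight-loading path:

```python
def check_embedding_vocab_size(tensor_name, tensor_shape, model_vocab_size):
    """Raise instead of assert so the check survives `python -O`
    and surfaces as a catchable, user-facing error."""
    if "lora_embedding_A" in tensor_name and model_vocab_size is not None:
        lora_vocab_size = tensor_shape[1]
        if lora_vocab_size != model_vocab_size:
            raise ValueError(
                f"The embedding LoRA size ({lora_vocab_size}) must be "
                f"consistent with the base model's vocabulary size "
                f"({model_vocab_size})."
            )
```

Unlike an assert, this error cannot be stripped by the interpreter's optimized mode and gives callers a specific exception type to handle.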

Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
@DarkLight1337
Member

Flaky test

@vllm-bot vllm-bot merged commit 39e63de into vllm-project:main Nov 29, 2025
60 of 62 checks passed
@jeejeelee jeejeelee deleted the cleanup-lora branch November 29, 2025 12:02
