[LoRA] Cleanup LoRA unused code #29611
Conversation
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
Documentation preview: https://vllm--29611.org.readthedocs.build/en/29611/

This pull request has merge conflicts that must be resolved before it can be merged.
Code Review
This pull request provides a good cleanup by removing the unused embedding_padding_modules and related logic for padding LoRA embedding weights. The changes are consistent across the codebase. I have one suggestion to improve robustness by adding an assertion to ensure vocabulary size consistency for lm_head LoRA adapters, which will provide clearer error messages for users.
    if pin_memory:
        loras[module_name].lora_a = loras[module_name].lora_a.pin_memory()
    else:
        loras[module_name].lora_b = tensor.to(device=device, dtype=dtype)
While this PR removes the padding logic for lm_head LoRA adapters, it's good practice to add an explicit check for a vocabulary size mismatch. This prevents cryptic runtime errors when a user loads a LoRA whose lm_head vocabulary size differs from the base model's, surfacing a clear error message instead.
Suggested change:
    if "lm_head" in module_name and model_vocab_size is not None:
        assert model_vocab_size == tensor.shape[0], (
            f"The lm_head LoRA vocab size ({tensor.shape[0]}) must be consistent"
            f" with the base model's vocabulary size ({model_vocab_size})."
        )
    loras[module_name].lora_b = tensor.to(device=device, dtype=dtype)
Please update the description explaining which code is unused
| """Main function that sets up and runs the prompt processing.""" | ||
| engine = initialize_engine() | ||
| lora_path = snapshot_download(repo_id="yard1/llama-2-7b-sql-lora-test") | ||
| lora_path = snapshot_download(repo_id="jeeejeee/llama32-3b-text2sql-spider") |
We have removed the LoRA extra-vocab code, so we no longer support this type of LoRA weight. Accordingly, I changed the base model and LoRA model in this script.
 @pytest.fixture(scope="session")
-def zephyr_lora_files():
+def qwen3_lora_files():
Using a smaller LoRA model can reduce CI pressure.
@DarkLight1337 I have updated the description
DarkLight1337 left a comment
LGTM but maybe we should get LoRA TP to pass on main first
@DarkLight1337 I will monitor the LoRA TP tests more closely; these tests are quite flaky.
    generate_and_test(
        llm, olmoe_lora_files, lora_id=1, compare_lower=fully_sharded_loras
    )
Relax the TP4 + fully_sharded_loras test
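A relaxed comparison could look like the sketch below. Note this is an illustration, not the PR's actual helper: the name `assert_outputs_match` and the reading of `compare_lower` as a case-insensitive comparison are assumptions based on the parameter name in the diff above.

```python
def assert_outputs_match(expected: str, actual: str, compare_lower: bool = False) -> None:
    # Hypothetical helper: under TP4 with fully sharded LoRA kernels,
    # generated text can differ in trivial ways between runs, so
    # compare_lower=True relaxes the check to a case-insensitive match
    # (an assumed interpretation of the flag).
    if compare_lower:
        assert expected.lower() == actual.lower(), f"{expected!r} != {actual!r}"
    else:
        assert expected == actual, f"{expected!r} != {actual!r}"
```

With `compare_lower=False` the test keeps its original strictness, so only the fully sharded TP4 configuration is relaxed.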
hmellor left a comment
LGTM, just some nits
tests/entrypoints/conftest.py
Outdated
 @pytest.fixture(scope="session")
-def zephyr_lora_files():
+def qwen3_lora_files():
     """Download zephyr LoRA files once per test session."""
Docstring should be updated too
Good catch
 # Model name constants used across tests
-MODEL_NAME_ZEPHYR = "HuggingFaceH4/zephyr-7b-beta"
+MODEL_NAME_ZEPHYR = "Qwen/Qwen3-0.6B"
Suggested change:
-MODEL_NAME_ZEPHYR = "Qwen/Qwen3-0.6B"
+MODEL_NAME_QWEN = "Qwen/Qwen3-0.6B"
vllm/lora/models.py
Outdated
| if "lora_embedding_A" in tensor_name and model_vocab_size is not None: | ||
| assert model_vocab_size == tensor.shape[1], ( | ||
| f"The embedding LoRA size({tensor.shape[1]}) must be consistent" | ||
| f" with the base model's vocabulary size({model_vocab_size})." | ||
| ) |
Could we raise a runtime error instead of asserting?
Sure
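A minimal sketch of the raise-based variant discussed above. The standalone helper name `check_embedding_lora_vocab` and its shape-tuple signature are illustrative only; the PR's actual code performs this check inline on the loaded tensor.

```python
def check_embedding_lora_vocab(lora_a_shape, model_vocab_size):
    """Validate that an embedding LoRA matches the base vocabulary size.

    Raises RuntimeError instead of asserting, so the check still runs
    under ``python -O`` and produces a clear, user-facing message.
    """
    if model_vocab_size is not None and model_vocab_size != lora_a_shape[1]:
        raise RuntimeError(
            f"The embedding LoRA size ({lora_a_shape[1]}) must be consistent"
            f" with the base model's vocabulary size ({model_vocab_size})."
        )
```

Unlike `assert`, a raised `RuntimeError` cannot be stripped by the interpreter's optimized mode, which is the rationale behind the review request.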
Flaky test
Purpose

We have removed the LoRA extra-vocab code, so embedding_padding_modules is unused; this PR cleans up the related code. In addition, it switches to a smaller base model and corresponding LoRA to reduce CI pressure.

Test Plan

Test Result

Essential Elements of an Effective PR Description Checklist
- (Update) supported_models.md and examples for a new model.