
Conversation

@jeejeelee jeejeelee (Collaborator) commented Sep 19, 2025

Purpose

This PR will slightly affect the LoRA implementation of hardware plugins. cc @xuechendi @Yikun
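
For plugin maintainers, the shape of the change is easiest to see at the worker LoRA manager's call site. Below is a rough before/after sketch based on the test diff quoted later in this thread; the surrounding objects (`vllm_config`, `device`, the embedding-module constants) are assumed to already exist in the plugin code, and `max_num_seqs`, `max_num_batched_tokens`, and `vocab_size` are illustrative names standing in for whatever expressions the plugin passed before:

```python
# Before this PR: scheduler limits, vocab size, and the LoRA config
# were passed to the manager as separate arguments.
worker_adapter_manager = LRUCacheWorkerLoRAManager(
    max_num_seqs, max_num_batched_tokens,
    vocab_size, lora_config, device,
    EMBEDDING_MODULES, EMBEDDING_PADDING_MODULES)

# After this PR: a single vllm_config is passed instead; the manager
# reads the scheduler, model, and LoRA settings from it.
worker_adapter_manager = LRUCacheWorkerLoRAManager(
    vllm_config, device, EMBEDDING_MODULES, EMBEDDING_PADDING_MODULES)
```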

Test Plan

Test Result


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
@mergify mergify bot added the v1 and tpu (Related to Google TPUs) labels Sep 19, 2025
@gemini-code-assist gemini-code-assist bot (Contributor) left a comment

Code Review

This pull request refactors the initialization of the LoRA manager to use a single vllm_config object, which simplifies the API and improves code clarity. The changes are well-implemented across the codebase. However, I've found a critical issue in one of the updated tests where the configuration for the test is not correctly set up, which will lead to test failures. I've provided a suggestion to fix it.

Comment on lines +440 to +450

```diff
     model_config = ModelConfig(max_model_len=16)
     vllm_config = VllmConfig(model_config=model_config,
                              lora_config=lora_config)

     vllm_config.scheduler_config.max_num_seqs = 4
     vllm_config.scheduler_config.max_num_batched_tokens = 2
     worker_adapter_manager = LRUCacheWorkerLoRAManager(
-        4, 2,
-        dummy_model.unpadded_vocab_size - lora_config.lora_extra_vocab_size,
-        lora_config, device, EMBEDDING_MODULES, EMBEDDING_PADDING_MODULES)
+        vllm_config, device, EMBEDDING_MODULES, EMBEDDING_PADDING_MODULES)

     worker_adapter_manager.max_num_seqs = 4
     worker_adapter_manager.max_num_batched_tokens = 2
```
@gemini-code-assist bot (Contributor) commented:

critical

The vllm_config is not correctly initialized for the test. The ModelConfig within it doesn't have the hf_config from the dummy_model, which will cause vllm_config.model_config.get_vocab_size() to return 0 inside LRUCacheWorkerLoRAManager. This leads to incorrect behavior, especially when calculating target_embedding_padding.

To fix this, you should associate the dummy_model.config with the vllm_config's model_config. Also, the manual setting of max_num_seqs and max_num_batched_tokens on worker_adapter_manager is redundant as these are already set during initialization from the vllm_config.

Suggested change

```diff
-    model_config = ModelConfig(max_model_len=16)
-    vllm_config = VllmConfig(model_config=model_config,
-                             lora_config=lora_config)
-    vllm_config.scheduler_config.max_num_seqs = 4
-    vllm_config.scheduler_config.max_num_batched_tokens = 2
-    worker_adapter_manager = LRUCacheWorkerLoRAManager(
-        4, 2,
-        dummy_model.unpadded_vocab_size - lora_config.lora_extra_vocab_size,
-        lora_config, device, EMBEDDING_MODULES, EMBEDDING_PADDING_MODULES)
-        vllm_config, device, EMBEDDING_MODULES, EMBEDDING_PADDING_MODULES)
-    worker_adapter_manager.max_num_seqs = 4
-    worker_adapter_manager.max_num_batched_tokens = 2
+    model_config = ModelConfig(max_model_len=16)
+    vllm_config = VllmConfig(model_config=model_config,
+                             lora_config=lora_config)
+    # Manually set hf_config for the test since ModelConfig doesn't take it
+    # in __init__ and we are not loading from a real model path.
+    vllm_config.model_config.hf_config = dummy_model.config
+    vllm_config.scheduler_config.max_num_seqs = 4
+    vllm_config.scheduler_config.max_num_batched_tokens = 2
+    worker_adapter_manager = LRUCacheWorkerLoRAManager(
+        vllm_config, device, EMBEDDING_MODULES, EMBEDDING_PADDING_MODULES)
```

@Isotr0py Isotr0py (Member) left a comment

Just some nits. Otherwise LGTM

Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
@jeejeelee jeejeelee added the ready (ONLY add when PR is ready to merge/full CI is needed) label Sep 19, 2025
@jeejeelee jeejeelee enabled auto-merge (squash) September 19, 2025 16:57
@Yikun Yikun (Collaborator) commented Sep 19, 2025

Thanks for the reminder! @jeejeelee
Also cc @paulyu12 @wangxiyuan

@jeejeelee jeejeelee merged commit 2821986 into vllm-project:main Sep 19, 2025
57 checks passed
@xuechendi xuechendi (Contributor) commented

@jeejeelee, thanks for the CC; we will update our plugin accordingly.

debroy-rh pushed a commit to debroy-rh/vllm that referenced this pull request Sep 19, 2025
@jeejeelee jeejeelee deleted the improve-lora branch September 20, 2025 01:28
@paulyu12 paulyu12 commented

> Thanks for the reminder! @jeejeelee Also cc @paulyu12 @wangxiyuan

@Yikun @wangxiyuan Updated our plugin: vllm-project/vllm-ascend#3095

wangxiyuan pushed a commit to vllm-project/vllm-ascend that referenced this pull request Sep 22, 2025
### What this PR does / why we need it?
Fix the impact on LoRA introduced by vllm-project/vllm#25249.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
pytest -sv tests/e2e/singlecard/test_ilama_lora.py
pytest -sv tests/e2e/multicard/test_ilama_lora_tp2.py

- vLLM version: v0.10.2
- vLLM main: vllm-project/vllm@9607d5e

---------

Signed-off-by: paulyu12 <507435917@qq.com>
@vanbasten23 vanbasten23 (Collaborator) commented

This PR broke LoRA in the TPU plugin, but thanks to @gpolovets1 for the fix: https://github.com/vllm-project/tpu_commons/pull/720. cc: @jeejeelee @Isotr0py

FeiDaLI pushed a commit to FeiDaLI/vllm that referenced this pull request Sep 25, 2025
charlifu pushed a commit to ROCm/vllm that referenced this pull request Sep 25, 2025
…-project#25249)

Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
Signed-off-by: charlifu <charlifu@amd.com>
Labels: ready (ONLY add when PR is ready to merge/full CI is needed), tpu (Related to Google TPUs), v1

6 participants