
Conversation

@Erland366
Collaborator

People are complaining that they can't use LoRA with vLLM because the `load_lora` method is not available. This happens because, when loading a LoRA model, `get_peft_model` takes the `Unsloth: Already have LoRA adapters! We shall skip this step.` branch and never applies the patching.

This PR simply applies the patching inside that branch as well. We can't just move the patching to the beginning of the function, because when doing inference while training (e.g., during GRPO training), inference would then run only on the base model.
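In pure-Python terms, the bug and the fix look roughly like this. This is a minimal sketch with hypothetical names (`patch_vllm_lora`, the dict-based model), not Unsloth's actual code; it only illustrates the early-return branch that skipped the patching:

```python
def patch_vllm_lora(model):
    # Hypothetical stand-in for the real patching step:
    # attach a load_lora method so vLLM can load adapters later.
    model["load_lora"] = lambda path: f"loaded {path}"


def get_peft_model(model, has_adapters):
    if has_adapters:
        # "Unsloth: Already have LoRA adapters! We shall skip this step."
        # Before this PR, we returned here WITHOUT patching,
        # so load_lora was missing on loaded-LoRA models.
        patch_vllm_lora(model)  # the fix: patch in this branch too
        return model
    # Fresh model: attach adapters first, then patch.
    model["adapters"] = True
    patch_vllm_lora(model)
    return model


# A model loaded with existing LoRA adapters now gets load_lora as well.
m = get_peft_model({"adapters": True}, has_adapters=True)
assert "load_lora" in m
```

Note the patching is called inside the branch rather than hoisted above the `if`: hoisting it would also patch during GRPO-style inference-while-training, where only the base model should be used.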

This PR still has a flaw: vLLM inference has to be run once before it can do inference on the loaded LoRA. This may be related to this part of unsloth-zoo:

https://github.com/unslothai/unsloth-zoo/blob/a9857088bdaf412bef36800d837a3a37657555c8/unsloth_zoo/vllm_utils.py#L1206-L1212

Related issue -> #1670 (comment)
