Feature add lora support for Qwen2 #3177
Conversation
Can you post a working model/example? |
Qwen is similar to Llama's architecture, but its vocab size is 150k and its intermediate size is 13696.

server:
CUDA_VISIBLE_DEVICES=0 python -m vllm.entrypoints.openai.api_server \
    --trust-remote-code \
    --max-model-len 4096 \
    --model ~/qwen/Qwen1.5-14B-Chat \
    --enable-lora \
    --lora-modules lora1=~/lora/xxx lora2=~/lora/xxx

test:
curl --request POST \
    --url http://localhost:8000/v1/chat/completions \
    --header 'content-type: application/json' \
    --data '{
        "model": "lora2",
        "messages": [
            { "role": "system", "content": "You are a helpful assistant." },
            { "role": "user", "content": "China is a" }
        ],
        "stop_token_ids": [151645, 151644, 151643],
        "max_tokens": 5,
        "temperature": 0.7
    }'

response:
{"id":"cmpl-7a38720c85814f1faf3d83e1d8718573","object":"chat.completion","created":1122584,"model":"lora2","choices":[{"index":0,"message":{"role":"assistant","content":"country located in East Asia"},"logprobs":null,"finish_reason":"length"}],"usage":{"prompt_tokens":22,"total_tokens":27,"completion_tokens":5}} |
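The same request can also be sent from Python. Below is a minimal sketch against the server started above, mirroring the curl call; only the use of the requests library is new here, everything else (the adapter name lora2, the endpoint, the stop token ids) comes from the commands above.

# Minimal sketch: query the LoRA adapter through vLLM's OpenAI-compatible API.
# Mirrors the curl call above; "lora2" was registered via --lora-modules.
import requests

resp = requests.post(
    "http://localhost:8000/v1/chat/completions",
    json={
        "model": "lora2",
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "China is a"},
        ],
        "stop_token_ids": [151645, 151644, 151643],
        "max_tokens": 5,
        "temperature": 0.7,
    },
)
print(resp.json()["choices"][0]["message"]["content"])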
I think this looks good. Have you verified that the outputs look correct (i.e., that the adapter is being applied)? |
VLLM_INSTALL_PUNICA_KERNELS=1 pip install -e .
…________________________________
From: WangxuP ***@***.***
Sent: March 6, 2024, 23:52
To: vllm-project/vllm ***@***.***
Cc: whyiug ***@***.***; Author ***@***.***
Subject: Re: [vllm-project/vllm] Feature add lora support for qwen/qwen2 (PR #3177)
I downloaded the code of the branch to be merged myself, compiled and installed it with python setup.py install, and then loaded two LoRA models for conversion, and the following problem appeared. How can I solve it?
Traceback (most recent call last):
  File "/root/miniconda3/envs/vllm/lib/python3.10/site-packages/vllm-0.3.3+cu120-py3.10-linux-x86_64.egg/vllm/lora/punica.py", line 83, in add_lora
    import vllm._punica_C as punica_kernels
ModuleNotFoundError: No module named 'vllm._punica_C'
The above exception was the direct cause of the following exception:
ImportError: punica LoRA kernels could not be imported. If you built vLLM from source, make sure VLLM_INSTALL_PUNICA_KERNELS=1 env var was set.
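A quick way to confirm whether the kernels made it into an install is an import check along these lines (a minimal sketch; the module name vllm._punica_C comes from the traceback above, the rest is assumption):

# Minimal sketch: verify the punica LoRA kernel extension is importable.
# The module name "vllm._punica_C" comes from the traceback above.
import importlib.util

if importlib.util.find_spec("vllm._punica_C") is None:
    print("punica kernels missing; rebuild with "
          "VLLM_INSTALL_PUNICA_KERNELS=1 pip install -e .")
else:
    print("punica kernels found")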
|
Yes, I tested two LoRAs on both Qwen and Qwen2 (14B). The LoRA output is the same as the output after merging the adapter. |
Thank you!
|
Yes, the V100 is not supported (compute capability = 7.0); same issue: #3197 |
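For anyone unsure where their GPU falls, a quick capability check with PyTorch looks like this (a sketch; the thread only states that the V100's 7.0 is unsupported, so no exact cutoff is asserted here):

# Sketch: report the GPU's compute capability with PyTorch.
# Only the V100 (7.0) is stated as unsupported in this thread.
import torch

major, minor = torch.cuda.get_device_capability(0)
print(f"compute capability: {major}.{minor}")
if (major, minor) <= (7, 0):
    print("This GPU is reported as unsupported for the punica LoRA kernels.")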
@whyiug The current PR can support Qwen1.5, but it cannot support Qwen, because the implementation of class QKVParallelLinearWithLora(ColumnParallelLinearWithLoRA) in vllm/lora/layers.py is problematic for it. For Qwen, the logic of ColumnParallelLinearWithLoRA needs to be followed instead of QKVParallelLinearWithLora, because Qwen's qkv proj weights are fused: the model has no separate q_proj, k_proj, and v_proj, only a single qkv_proj. So your

stacked_params_mapping = [
    # (param_name, shard_name, shard_id)
    ("qkv_proj", "q_proj", "q"),
    ("qkv_proj", "k_proj", "k"),
    ("qkv_proj", "v_proj", "v"),
    ("gate_up_proj", "w2", 0),
    ("gate_up_proj", "w1", 1),
]

is problematic. Run a test on Qwen and you will find the problem: the current code can run Qwen normally, but the results are not correct. Fine-tune a LoRA model and try it. |
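To make the mismatch concrete, here is a small sketch (the checkpoint key names are assumptions based on the HF modeling files mentioned in this thread, not code from this PR): Qwen1.5/Qwen2 checkpoints ship separate q/k/v weights that the mapping stacks into qkv_proj at load time, while first-generation Qwen stores one already-fused attention weight (c_attn in modeling_qwen.py), so the q_proj/k_proj/v_proj entries never match.

# Sketch: why the stacked_params_mapping above fits Qwen2 but not Qwen.
# Checkpoint key names are assumptions based on the HF modeling files.
qwen2_keys = [
    "model.layers.0.self_attn.q_proj.weight",
    "model.layers.0.self_attn.k_proj.weight",
    "model.layers.0.self_attn.v_proj.weight",
]
qwen1_keys = ["transformer.h.0.attn.c_attn.weight"]  # q/k/v already fused

stacked_params_mapping = [
    # (param_name, shard_name, shard_id)
    ("qkv_proj", "q_proj", "q"),
    ("qkv_proj", "k_proj", "k"),
    ("qkv_proj", "v_proj", "v"),
]

for name in qwen2_keys + qwen1_keys:
    hits = [m for m in stacked_params_mapping if m[1] in name]
    print(name, "->", hits if hits else "no match; takes the fused-weight path")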
You are right, I found the difference in https://huggingface.co/Qwen/Qwen-7B-Chat/blob/main/modeling_qwen.py |
@simon-mo If I may ask, will this feature be added in version 0.3.4? Thanks |
Closes #3054