[Model] Add support for xverse #3610
Conversation
What is the difference in architecture between llama and xverse?
Nice question. The current xverse model architecture is no different from llama, but xverse is expected to add MoE features within two weeks. To maintain an independent update schedule, it is necessary to support the xverse architecture separately in vLLM.
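To illustrate why a separate implementation is worthwhile even though the architectures currently match, here is a minimal sketch (not the code in this PR) of giving xverse its own class that can later diverge from llama; subclassing `LlamaForCausalLM` and the import path shown are assumptions for illustration only:

```python
# Illustrative sketch only -- not the actual code added in this PR.
# Because today's XVERSE architecture matches llama, the xverse implementation
# could start by reusing the existing llama model class under its own name,
# so that xverse-specific changes (such as the upcoming MoE variant) can
# evolve independently of llama.
from vllm.model_executor.models.llama import LlamaForCausalLM


class XverseForCausalLM(LlamaForCausalLM):
    """Placeholder; the PR itself adds a standalone xverse.py implementation."""
```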
Hello @WoosukKwon, I've submitted PR #3610 and I would greatly appreciate your help with reviewing the code changes and providing your feedback. Thank you for your time and assistance. Best regards,
@hxer7963 Could you please fix the formatting error by running `format.sh`?
README.md
- Yi (`01-ai/Yi-6B`, `01-ai/Yi-34B`, etc.)
- Xverse (`xverse/XVERSE-7B-Chat`, `xverse/XVERSE-13B-Chat`, `xverse/XVERSE-65B-Chat`, etc.)
nit: Let's keep the alphabetic order :)
Suggested change:
- Xverse (`xverse/XVERSE-7B-Chat`, `xverse/XVERSE-13B-Chat`, `xverse/XVERSE-65B-Chat`, etc.)
- Yi (`01-ai/Yi-6B`, `01-ai/Yi-34B`, etc.)
vllm/model_executor/models/xverse.py
hf_model_weights_iterator)
from vllm.sequence import SamplerOutput

KVCache = Tuple[torch.Tensor, torch.Tensor]
Please replace `KVCache` with `torch.Tensor`. We changed the type recently.
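For context, the requested change amounts to dropping the tuple-based alias from the cache annotations. Below is a minimal sketch of that edit; the forward signature shown is an illustrative assumption, not the actual signature in xverse.py:

```python
from typing import List, Tuple

import torch

# Old style: each per-layer cache was annotated via a (key, value) tuple alias.
KVCache = Tuple[torch.Tensor, torch.Tensor]


def forward_old(kv_caches: List[KVCache]) -> None:  # illustrative signature
    ...


# Style requested in the review: annotate each cache entry directly as a
# torch.Tensor, matching the recent type change in vLLM.
def forward_new(kv_caches: List[torch.Tensor]) -> None:  # illustrative signature
    ...
```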
Thank you @WoosukKwon for the reply and suggestions. The issues mentioned in the review have been fixed in the latest commit.
Can we simply use this:
- Fix typo in README to keep the alphabetic order
- Replace KVCache with torch.Tensor in xverse.py
Yes, the new MoE model will use the same name as XverseForCausalLM, so we want to add a new name.
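To make the naming point concrete, the sketch below shows the idea of the architecture-name mapping: checkpoints advertise an `architectures` string in their Hugging Face config, and vLLM resolves it to an implementation, so XVERSE checkpoints need their own entry rather than reusing llama's. The dictionary is purely illustrative and is not the registry code touched by this PR:

```python
# Sketch only: the real registry lives inside vLLM and may be structured
# differently. The point is that the architecture name advertised by XVERSE
# checkpoints ("XverseForCausalLM") maps to its own module, which can later
# diverge from llama (e.g. for the upcoming MoE variant).
MODEL_REGISTRY_SKETCH = {
    "LlamaForCausalLM": "vllm.model_executor.models.llama",    # existing entry
    "XverseForCausalLM": "vllm.model_executor.models.xverse",  # new entry
}
```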
Hello @WoosukKwon, I've addressed all the feedback in the PR, and all checks have passed. Could you please take another look at the code changes and provide your review? Thank you for your time and assistance. Best regards, hxer7963.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Thanks for submitting the PR. Looking forward to the next model release!
Co-authored-by: willhe <hexin@xverse.cn>
Co-authored-by: root <root@localhost.localdomain>
Add support for xverse models.
We tested the xverse 7B/13B/65B chat models and the quantized GPTQ model locally, and the vLLM responses are normal.
You can verify by downloading xverse models from Hugging Face (https://huggingface.co/xverse) or Modelscope (https://www.modelscope.cn/search?search=xverse).
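For anyone who wants to reproduce that check once this PR is in a release, a minimal generation script with vLLM's public Python API might look like the sketch below; the sampling values are arbitrary, and `trust_remote_code=True` is an assumption that may not be needed with native support:

```python
# Minimal sketch of sanity-checking an XVERSE chat checkpoint with vLLM.
from vllm import LLM, SamplingParams

llm = LLM(model="xverse/XVERSE-7B-Chat", trust_remote_code=True)
sampling_params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

# Generate a short completion and print it; each RequestOutput carries the
# prompt and its generated candidates.
outputs = llm.generate(["Hello, please introduce yourself."], sampling_params)
for output in outputs:
    print(output.prompt, output.outputs[0].text)
```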
Furthermore, we executed `format.sh` before submitting the PR.
Request: after the current PR is merged, could we add a new release tag so that the latest version of the package supports inference with xverse models via pip installation?