Support Longchat #555
Conversation
CC @DachengLi1
Wonderful! Thanks a lot Lily and Zhuohan! Will this be merged soon?
I believe so! @LiuXiaoxuanPKU is working on some correctness tests and making sure everything works for a long context. Feel free to try out this PR if you would like to start immediately!
@LiuXiaoxuanPKU You also need to add rope_scaling as an argument to argparse (https://github.com/LiuXiaoxuanPKU/vllm/blob/longchat/vllm/engine/arg_utils.py#L40), otherwise this call on line 141 fails
Add rope_scaling as a CLI arg so the OpenAI server can load RoPE-scaled models
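A minimal sketch of how such a flag could be wired up with argparse, assuming a JSON-encoded value; the flag name and schema here are illustrative, not the actual vLLM CLI:

```python
import argparse
import json

# Illustrative sketch only: accept a RoPE scaling config as a JSON string
# so it can be forwarded to the engine. Flag name and schema are assumptions.
parser = argparse.ArgumentParser()
parser.add_argument(
    "--rope-scaling",
    type=json.loads,
    default=None,
    help='RoPE scaling config as JSON, e.g. \'{"type": "linear", "factor": 4.0}\'',
)

args = parser.parse_args(["--rope-scaling", '{"type": "linear", "factor": 4.0}'])
print(args.rope_scaling)  # -> {'type': 'linear', 'factor': 4.0}
```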
@LiuXiaoxuanPKU Will this also support the Baichuan model? For example, https://github.com/keezen/ntk_alibi uses NTK-style scaling for ALiBi.
🤔 Is there any reason to prevent this PR from being merged?
We think vLLM might have some correctness issues, which might or might not be caused by this PR. To be more concrete, take a look at
@LiuXiaoxuanPKU Awesome! Many thanks for the great work and sorry again for the very late review.
I've made several changes (mostly on the code style):
- I temporarily removed the tests for faster integration. I will submit another PR to add tests for RoPE scaling.
- As we discussed offline, I removed `rope_scaling` from `ModelConfig` and `EngineArgs`. Now `rope_scaling` is always read from the model's `config.json`.
- I refactored `rotary_embedding.py` and added `DynamicNTKScalingRotaryEmbedding`.
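For context, the dynamic NTK variant typically leaves RoPE untouched within the trained context window and enlarges the rotary base once the sequence grows past it. A minimal sketch of that base rescaling, assuming the HF-style formula; the function name is ours, not vLLM's:

```python
def ntk_scaled_base(base: float, dim: int, factor: float,
                    seq_len: int, max_position: int) -> float:
    """Sketch of dynamic NTK scaling for RoPE (assumed HF-style formula).

    Within the trained context the base is unchanged; beyond it, the base
    is enlarged so the rotary frequencies stretch to cover the longer
    sequence without retraining.
    """
    if seq_len <= max_position:
        return base  # within trained context: no rescaling
    scale = factor * seq_len / max_position - (factor - 1)
    return base * scale ** (dim / (dim - 2))
```

For example, with the common defaults (base 10000, head dim 128) the base only grows once `seq_len` exceeds `max_position`.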
Co-authored-by: Wing Lian <wing.lian@gmail.com> Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Add LlamaLinearScalingRotaryEmbedding and LlamaDynamicNTKScalingRotaryEmbedding. Attempts to fix #333, #464, #479
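Linear scaling (position interpolation), by contrast, simply divides the position index by a fixed factor before computing the rotary frequencies, mapping a longer context back into the trained range. A toy sketch under that assumption; the function name and shape are illustrative, not the actual class API:

```python
def rope_angles(position: int, dim: int, base: float = 10000.0,
                linear_factor: float = 1.0) -> list[float]:
    """Sketch of linear RoPE scaling (position interpolation, assumed form).

    The position is divided by `linear_factor`, so position 8 with factor 4
    produces the same rotary angles the model saw at position 2 in training.
    """
    pos = position / linear_factor
    # One angle per frequency pair; angle_i = pos * base^(-2i/dim).
    return [pos * base ** (-2 * i / dim) for i in range(dim // 2)]
```

With factor 1.0 this reduces to plain RoPE, which is why a missing `rope_scaling` entry in `config.json` can safely default to no scaling.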