
Support Longchat #555

Merged: 46 commits merged into vllm-project:main from longchat on Sep 27, 2023
Conversation

@LiuXiaoxuanPKU (Collaborator) commented Jul 24, 2023

Add LlamaLinearScalingRotaryEmbedding and LlamaDynamicNTKScalingRotaryEmbedding. Attempt to fix #333, #464, #479.
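
For context, linear RoPE scaling (position interpolation) divides positions by a constant factor so that a longer context maps back into the position range the model was trained on. A minimal illustrative sketch of that idea follows; the function name and signature are assumptions for illustration, not the PR's implementation:

```python
import torch

def linear_scaled_rope_cache(head_dim: int, max_len: int,
                             scaling_factor: float, base: float = 10000.0):
    """Sketch of linear (position-interpolation) RoPE scaling, in the
    spirit of LlamaLinearScalingRotaryEmbedding. Illustrative only."""
    # Standard RoPE inverse frequencies.
    inv_freq = 1.0 / (base ** (torch.arange(0, head_dim, 2).float() / head_dim))
    # Linear scaling: divide positions by the factor so long contexts are
    # squeezed back into the trained position range.
    t = torch.arange(max_len).float() / scaling_factor
    freqs = torch.outer(t, inv_freq)
    return freqs.cos(), freqs.sin()
```

With scaling_factor=4.0, for example, position 8192 is rotated as if it were position 2048, which is how LongChat-style models stretch a LLaMA trained on short contexts to much longer ones.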

@zhuohan123 (Member)

CC @DachengLi1

@DachengLi1

Wonderful! Thanks a lot Lily and Zhuohan! Will this be merged soon?

@zhuohan123 (Member)

> Wonderful! Thanks a lot Lily and Zhuohan! Will this be merged soon?

I believe so! @LiuXiaoxuanPKU is working on some correctness tests and making sure everything works for a long context. Feel free to try out this PR if you would like to start immediately!

@winglian (Contributor) commented Aug 1, 2023

@LiuXiaoxuanPKU You also need to add rope_scaling as an argument to argparse (https://github.com/LiuXiaoxuanPKU/vllm/blob/longchat/vllm/engine/arg_utils.py#L40); otherwise this call on line 141 fails:
engine_args = cls(**{attr: getattr(args, attr) for attr in attrs})
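
For illustration, the missing wiring might have looked like the sketch below. The argument name and JSON encoding are assumptions, not the PR's code; note the merged version later dropped the CLI argument entirely in favor of reading config.json.

```python
import argparse
import json

parser = argparse.ArgumentParser()
# Hypothetical flag: parse rope_scaling from a JSON string so that the
# attribute-based EngineArgs construction above can find args.rope_scaling.
parser.add_argument('--rope-scaling', type=json.loads, default=None,
                    help='RoPE scaling config as a JSON dict, e.g. '
                         '\'{"type": "linear", "factor": 4.0}\'')
args = parser.parse_args(['--rope-scaling', '{"type": "linear", "factor": 4.0}'])
print(args.rope_scaling["factor"])  # 4.0
```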

@lucasjinreal

@LiuXiaoxuanPKU Will this also support the Baichuan model?

For example, https://github.com/keezen/ntk_alibi applies NTK-style scaling to ALiBi.

@alanxmay

🤔 Is there any reason preventing this PR from being merged?

@LiuXiaoxuanPKU (Collaborator, Author)

> 🤔 Is there any reason preventing this PR from being merged?

We think vLLM might have some correctness issues, which may or may not be caused by this PR. Concretely, see tests/test_longprompt.py in this PR: HF passes all four test cases, but vLLM currently fails them. We are still debugging.
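
The tests themselves aren't shown here, but a long-prompt correctness check of this kind usually compares greedy generations from HF and vLLM on the same input. A hypothetical sketch follows; the model name, prompt, and exact-match criterion are assumptions, not the contents of tests/test_longprompt.py:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from vllm import LLM, SamplingParams

MODEL = "lmsys/longchat-7b-16k"   # assumed model for illustration
prompt = "..."                    # imagine a prompt thousands of tokens long

# Greedy reference continuation from Hugging Face.
tokenizer = AutoTokenizer.from_pretrained(MODEL)
hf_model = AutoModelForCausalLM.from_pretrained(MODEL)
inputs = tokenizer(prompt, return_tensors="pt")
hf_ids = hf_model.generate(**inputs, max_new_tokens=64, do_sample=False)
hf_text = tokenizer.decode(hf_ids[0, inputs.input_ids.shape[1]:])

# Greedy continuation from vLLM (temperature=0.0 disables sampling).
vllm_out = LLM(model=MODEL).generate(
    [prompt], SamplingParams(temperature=0.0, max_tokens=64))
vllm_text = vllm_out[0].outputs[0].text

assert hf_text.strip() == vllm_text.strip(), "HF and vLLM continuations diverge"
```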

@WoosukKwon self-requested a review on September 26, 2023.
@WoosukKwon (Collaborator) left a comment

@LiuXiaoxuanPKU Awesome! Many thanks for the great work and sorry again for the very late review.

I've made several changes (mostly to the code style):

  1. I temporarily removed the tests for faster integration. I will submit another PR to add tests for RoPE scaling.
  2. As we discussed offline, I removed rope_scaling from ModelConfig and EngineArgs. Now rope_scaling is always read from the model's config.json.
  3. I refactored rotary_embedding.py and added DynamicNTKScalingRotaryEmbedding (the scaling idea is sketched below).
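
For reference, dynamic NTK scaling grows the rotary base as the sequence length exceeds the trained maximum, stretching the low-frequency bands instead of uniformly interpolating positions. A minimal sketch of the frequency computation, following the formula popularized by Hugging Face's LlamaDynamicNTKScalingRotaryEmbedding (the name and signature here are assumptions, not vLLM's refactored code):

```python
import torch

def dynamic_ntk_inv_freq(head_dim: int, seq_len: int, max_trained_len: int,
                         scaling_factor: float, base: float = 10000.0):
    """Sketch of dynamic NTK-scaled RoPE inverse frequencies."""
    if seq_len > max_trained_len:
        # Enlarge the base once the context outgrows the trained length; the
        # exponent dim/(dim-2) makes the lowest frequency scale by exactly
        # the target factor while the highest frequency stays fixed.
        base = base * (
            (scaling_factor * seq_len / max_trained_len) - (scaling_factor - 1)
        ) ** (head_dim / (head_dim - 2))
    return 1.0 / (base ** (torch.arange(0, head_dim, 2).float() / head_dim))
```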

@WoosukKwon merged commit 21877b0 into vllm-project:main on Sep 27, 2023
2 checks passed
@LiuXiaoxuanPKU deleted the longchat branch on November 6, 2023.
hongxiayang pushed a commit to hongxiayang/vllm that referenced this pull request Feb 13, 2024
Co-authored-by: Wing Lian <wing.lian@gmail.com>
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
sjchoi1 pushed a commit to casys-kaist-internal/vllm that referenced this pull request May 7, 2024
Co-authored-by: Wing Lian <wing.lian@gmail.com>
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
rickyyx pushed a commit to rickyyx/vllm that referenced this pull request Oct 7, 2024
Labels: new model

Successfully merging this pull request may close these issues:

Support for Condensed RotaryEmbeddings