Support Longchat #555
Conversation
CC @DachengLi1
Wonderful! Thanks a lot Lily and Zhuohan! Will this be merged soon?
I believe so! @LiuXiaoxuanPKU is working on some correctness tests and making sure everything works for a long context. Feel free to try out this PR if you would like to start immediately!
@LiuXiaoxuanPKU You also need to add rope_scaling as an argument to argparse (https://github.com/LiuXiaoxuanPKU/vllm/blob/longchat/vllm/engine/arg_utils.py#L40), otherwise this call on line 141 fails
Add rope_scaling as a CLI arg so the OpenAI server can load RoPE-scaled models
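A minimal sketch of how such a flag could be wired up with argparse, assuming a JSON-encoded value; the flag name and schema here are illustrative, not the actual vLLM CLI:

```python
import argparse
import json

# Illustrative sketch only: accept a RoPE scaling config as a JSON string
# so it can be forwarded to the engine. Flag name and schema are assumptions.
parser = argparse.ArgumentParser()
parser.add_argument(
    "--rope-scaling",
    type=json.loads,
    default=None,
    help='RoPE scaling config as JSON, e.g. \'{"type": "linear", "factor": 4.0}\'',
)

args = parser.parse_args(["--rope-scaling", '{"type": "linear", "factor": 4.0}'])
print(args.rope_scaling)  # -> {'type': 'linear', 'factor': 4.0}
```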
@LiuXiaoxuanPKU Will this also support the Baichuan model? For example, https://github.com/keezen/ntk_alibi uses NTK-style scaling for ALiBi.
🤔 Is there any reason to prevent this PR from being merged?
We think vLLM might have some correctness issues, which might or might not be caused by this PR. To be more concrete, take a look at
@LiuXiaoxuanPKU Awesome! Many thanks for the great work and sorry again for the very late review.
I've made several changes (mostly on the code style):
- I temporarily removed the tests for faster integration. I will submit another PR to add tests for RoPE scaling.
- As we discussed offline, I removed `rope_scaling` from `ModelConfig` and `EngineArgs`. Now `rope_scaling` is always read from the model's `config.json`.
- I refactored `rotary_embedding.py` and added `DynamicNTKScalingRotaryEmbedding`.
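For context, the dynamic NTK variant typically leaves RoPE untouched within the trained context window and enlarges the rotary base once the sequence grows past it. A minimal sketch of that base rescaling, assuming the HF-style formula; the function name is ours, not vLLM's:

```python
def ntk_scaled_base(base: float, dim: int, factor: float,
                    seq_len: int, max_position: int) -> float:
    """Sketch of dynamic NTK scaling for RoPE (assumed HF-style formula).

    Within the trained context the base is unchanged; beyond it, the base
    is enlarged so the rotary frequencies stretch to cover the longer
    sequence without retraining.
    """
    if seq_len <= max_position:
        return base  # within trained context: no rescaling
    scale = factor * seq_len / max_position - (factor - 1)
    return base * scale ** (dim / (dim - 2))
```

For example, with the common defaults (base 10000, head dim 128) the base only grows once `seq_len` exceeds `max_position`.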
Co-authored-by: Wing Lian <wing.lian@gmail.com> Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Add LlamaLinearScalingRotaryEmbedding and LlamaDynamicNTKScalingRotaryEmbedding. Attempts to fix #333, #464, #479
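Linear scaling (position interpolation), by contrast, simply divides the position index by a fixed factor before computing the rotary frequencies, mapping a longer context back into the trained range. A toy sketch under that assumption; the function name and shape are illustrative, not the actual class API:

```python
def rope_angles(position: int, dim: int, base: float = 10000.0,
                linear_factor: float = 1.0) -> list[float]:
    """Sketch of linear RoPE scaling (position interpolation, assumed form).

    The position is divided by `linear_factor`, so position 8 with factor 4
    produces the same rotary angles the model saw at position 2 in training.
    """
    pos = position / linear_factor
    # One angle per frequency pair; angle_i = pos * base^(-2i/dim).
    return [pos * base ** (-2 * i / dim) for i in range(dim // 2)]
```

With factor 1.0 this reduces to plain RoPE, which is why a missing `rope_scaling` entry in `config.json` can safely default to no scaling.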