Add support for LLaMA-2 #505

Merged
merged 9 commits into from
Jul 20, 2023


zhuohan123
Collaborator

@zhuohan123 zhuohan123 commented Jul 18, 2023

Fix #501

Update: this PR has some correctness issues on 70B models. Will look into it.

@WoosukKwon This PR is ready to go. Please review and let's merge it!

@zhuohan123 zhuohan123 changed the title from "[WIP] Add support for LLaMA-2" to "Add support for LLaMA-2" Jul 18, 2023
@gesanqiu
Contributor

I notice that you didn't add a new llama2.py, so is this PR compatible with both LLaMA and LLaMA-2?

@zhuohan123
Collaborator Author

I notice that you didn't add a new llama2.py, so is this PR compatible with both LLaMA and LLaMA-2?

Yes, the old models should still be compatible.

Collaborator

@WoosukKwon WoosukKwon left a comment


@zhuohan123 Awesome! Thanks for the great work! Just left minor comments.

Please double-check that this PR doesn't break LLaMA V1 and other models using RoPE, before merging the PR.

csrc/pos_encoding_kernels.cu (resolved)
vllm/model_executor/models/llama.py (outdated, resolved)
vllm/model_executor/models/llama.py (outdated, resolved)
vllm/model_executor/models/llama.py (outdated, resolved)
@zhuohan123 zhuohan123 merged commit 6fc2a38 into main Jul 20, 2023
2 checks passed
gqjia added a commit to gqjia/vllm that referenced this pull request Jul 21, 2023
@zhuohan123 zhuohan123 deleted the support-llama-2 branch July 25, 2023 21:59
@ri938
Contributor

ri938 commented Jul 27, 2023

I am getting an error when trying to load some LLaMA V1 models:

LlamaConfig object has no attribute 'num_key_value_heads'

@HarrisonBT

WARNING 07-28 03:23:18 scheduler.py:196] Input prompt (2716 tokens) is too long and exceeds limit of 4096

@tuyaao

tuyaao commented Jul 31, 2023

I am getting an error when trying to load some LLaMA V1 models:

LlamaConfig object has no attribute 'num_key_value_heads'
Same error for me on the latest master commit: 953f28c
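
Editor's note: the error reported above is an AttributeError raised when code reads `num_key_value_heads` from a config object that predates that field. One common way to tolerate such configs is to fall back to the number of attention heads, which corresponds to ordinary multi-head attention. The following is a minimal sketch under that assumption, not the change made in this PR or in any later fix. The helper name `get_num_kv_heads` and the `SimpleNamespace` stand-ins are hypothetical and only mimic the two attributes involved; the head counts are illustrative.

```python
from types import SimpleNamespace


def get_num_kv_heads(config) -> int:
    """Return the number of key/value heads, tolerating older configs."""
    # Assumption: a config without `num_key_value_heads` uses standard
    # multi-head attention, so each query head has its own KV head.
    return getattr(config, "num_key_value_heads", config.num_attention_heads)


# Hypothetical stand-ins for config objects (not the real LlamaConfig class).
old_style_config = SimpleNamespace(num_attention_heads=32)
gqa_style_config = SimpleNamespace(num_attention_heads=64, num_key_value_heads=8)

print(get_num_kv_heads(old_style_config))  # falls back to num_attention_heads
print(get_num_kv_heads(gqa_style_config))  # uses the explicit field
```

With this kind of fallback, a config lacking the newer field would be treated as having one KV head per attention head instead of raising. Whether that matches a given checkpoint still needs to be verified against the model itself.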

hongxiayang pushed a commit to hongxiayang/vllm that referenced this pull request Feb 13, 2024
sjchoi1 pushed a commit to casys-kaist-internal/vllm that referenced this pull request May 7, 2024
Successfully merging this pull request may close these issues.

Support LLaMA-2
6 participants