Add Falcon support (new) #592

Merged
merged 16 commits into from
Aug 2, 2023
@zhuohan123 (Collaborator) commented Jul 27, 2023

Closes #195, #197, and #356.

This PR replaces PR #321.

This PR also reverts an earlier all-reduce optimization that stored every all-reduce result in a single shared buffer. In distributed settings, that optimization can produce wrong results for models with two parallel all-reduce branches.
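A minimal sketch of how a shared output buffer can go wrong with two parallel all-reduce branches. This is a pure-Python simulation, not vLLM code: `simulated_all_reduce`, `SharedBufferAllReduce`, and `parallel_block` are hypothetical names, the attention/MLP labels are only illustrative, and the aliasing mechanism shown is an assumption about the failure mode rather than a description of the reverted implementation.

```python
# Illustrative simulation only; none of these names come from vLLM.

def simulated_all_reduce(per_rank_values):
    """Element-wise sum across ranks, standing in for a real all-reduce.

    Returns a freshly allocated list on every call.
    """
    return [sum(vals) for vals in zip(*per_rank_values)]


class SharedBufferAllReduce:
    """Hypothetical variant that writes every result into one reused buffer."""

    def __init__(self, size):
        self.buffer = [0.0] * size

    def __call__(self, per_rank_values):
        reduced = simulated_all_reduce(per_rank_values)
        self.buffer[:] = reduced  # in-place write overwrites the previous result
        return self.buffer        # every caller receives the same list object


def parallel_block(all_reduce, attn_partials, mlp_partials):
    """Two branches are reduced independently, then their outputs are added."""
    attn_out = all_reduce(attn_partials)
    mlp_out = all_reduce(mlp_partials)  # with a shared buffer, attn_out changes here
    return [a + m for a, m in zip(attn_out, mlp_out)]


# Partial results held by rank 0 and rank 1 for each branch.
attn_partials = [[1.0, 2.0], [3.0, 4.0]]
mlp_partials = [[10.0, 20.0], [30.0, 40.0]]

expected = parallel_block(simulated_all_reduce, attn_partials, mlp_partials)
aliased = parallel_block(SharedBufferAllReduce(2), attn_partials, mlp_partials)

print(expected)  # [44.0, 66.0]
print(aliased)   # [80.0, 120.0]: the MLP result was counted twice
```

In this sketch the first branch's output is still needed when the second all-reduce runs, so reusing one buffer silently replaces it. Returning a fresh tensor per call avoids the aliasing at the cost of an extra allocation.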

Correctness check:

  • tiiuae/falcon-rw-1b
  • tiiuae/falcon-rw-7b
  • tiiuae/falcon-7b
  • tiiuae/falcon-7b-instruct
  • tiiuae/falcon-40b
  • tiiuae/falcon-40b-instruct

@zhuohan123 zhuohan123 changed the title [WIP] Add Falcon support (new) Add Falcon support (new) Jul 28, 2023
@zhuohan123 (Collaborator, Author) commented:

@WoosukKwon This PR is ready for review.

This was referenced Jul 29, 2023
@WoosukKwon (Collaborator) left a comment

@zhuohan123 Awesome! Thanks a million for this hard work! Left minor comments.

Resolved review comments on:

  • vllm/transformers_utils/config.py (outdated)
  • vllm/transformers_utils/configs/__init__.py
  • vllm/config.py
  • vllm/model_executor/layers/attention.py (outdated)
  • csrc/pos_encoding_kernels.cu (outdated)
  • vllm/model_executor/models/falcon.py

@zhuohan123 zhuohan123 merged commit 1b0bd0f into main Aug 2, 2023
2 checks passed
@gesanqiu (Contributor) commented Aug 3, 2023

This PR seems to lead to wrong outputs for BigCoder models when TP=2.

@zhuohan123 zhuohan123 deleted the falcon-zhuohan branch August 4, 2023 05:50
hongxiayang pushed a commit to hongxiayang/vllm that referenced this pull request Feb 13, 2024
sjchoi1 pushed a commit to casys-kaist-internal/vllm that referenced this pull request May 7, 2024
Development

Successfully merging this pull request may close these issues.

Support for Falcon-7B / 40B models