[ROCm] support Radeon™ 7900 series (gfx1100) without using flash-attention #2768

Merged
zhuohan123 merged 4 commits into vllm-project:main on Feb 11, 2024

@hongxiayang hongxiayang commented Feb 5, 2024

This pull request adds vLLM support for AMD Radeon™ 7900 series GPUs (gfx1100) without using flash-attention.
Currently, flash-attention does not fully support gfx1100, so we use the vLLM reference implementation instead.

Note:
When building the docker image, pass --build-arg BUILD_FA="0" to the docker build command.
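
For illustration, the build invocation described in the note might look like the sketch below. Only the `--build-arg BUILD_FA="0"` flag comes from this PR; the Dockerfile name (`Dockerfile.rocm`) and the image tag (`vllm-rocm`) are assumptions, so adjust them to match your checkout.

```shell
#!/bin/sh
# From the note above: BUILD_FA="0" skips building flash-attention (needed for gfx1100).
BUILD_FA="0"

# Assumptions (not stated in this PR): Dockerfile name and image tag.
DOCKERFILE="Dockerfile.rocm"
IMAGE_TAG="vllm-rocm"

# Assemble the command and print it so it can be reviewed before running.
BUILD_CMD="docker build --build-arg BUILD_FA=${BUILD_FA} -f ${DOCKERFILE} -t ${IMAGE_TAG} ."
echo "${BUILD_CMD}"

# Uncomment to actually build (requires Docker and the vLLM repo as the build context):
# ${BUILD_CMD}
```

Run it from the root of the repository so that `.` is the build context.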

@hongxiayang hongxiayang changed the title [ROCm] support navi3x gfx1100 without using flash-attention [ROCm] support Radeon™ 7900 series (gfx1100) without using flash-attention Feb 5, 2024
@WoosukKwon WoosukKwon added the rocm label Feb 5, 2024
@hongxiayang (Contributor, Author) commented

The current head of vLLM still cannot compile on ROCm; see issues #2725 and #2646.
To verify the code changes, I used a branch that does compile: https://github.com/hongxiayang/vllm/tree/navi3x_rocm6

@hongxiayang hongxiayang marked this pull request as ready for review February 6, 2024 15:22
@hongxiayang (Contributor, Author) commented

> The current head of the vllm still can not compile successfully on ROCm. See issues #2725 and #2646. I used a branch that can compile to verify the code changes: https://github.com/hongxiayang/vllm/tree/navi3x_rocm6

I am working on fixing the build now.

@zhuohan123 (Collaborator) left a comment

LGTM! Thanks for the fix!

@leavelet commented Feb 7, 2024

Have you tried this repo from AMD? Partial gfx1100 support has been added in the howiejay/navi_support branch, but I'm not sure whether that's enough for vLLM.

@hongxiayang (Contributor, Author) commented

> LGTM! Thanks for the fix!

Thanks @zhuohan123. I have updated the branch and resolved the conflict.

@zhuohan123 zhuohan123 merged commit 0580aab into vllm-project:main Feb 11, 2024
17 checks passed
yhu422 added a commit to yhu422/vllm that referenced this pull request Feb 13, 2024
[ROCm] Fix build problem resulted from previous commit related to FP8 kv-cache support  (vllm-project#2790)

Add documentation on how to do incremental builds (vllm-project#2796)

[Ray] Integration compiled DAG off by default (vllm-project#2471)

Disable custom all reduce by default (vllm-project#2808)

add usage context

removed usage_context from Engine_args

Move IO to another process

added http request

[ROCm] support Radeon™ 7900 series (gfx1100) without using flash-attention (vllm-project#2768)

Add documentation section about LoRA (vllm-project#2834)

Refactor 2 awq gemm kernels into m16nXk32 (vllm-project#2723)

Co-authored-by: Chunan Zeng <chunanzeng@Chunans-Air.attlocal.net>

Added additional arg for from_engine_args

comments
jvmncs pushed a commit to jvmncs/vllm that referenced this pull request Feb 14, 2024
xjpang pushed a commit to xjpang/vllm that referenced this pull request Feb 20, 2024
xjpang pushed a commit to xjpang/vllm that referenced this pull request Feb 22, 2024
xjpang pushed a commit to xjpang/vllm that referenced this pull request Mar 4, 2024

4 participants