[ROCm] support Radeon™ 7900 series (gfx1100) without using flash-attention #2768

hongxiayang · 2024-02-05T18:28:26Z

This pull request adds vLLM support for AMD Radeon™ 7900 series GPUs (gfx1100) without using flash-attention.
Flash-attention does not yet fully support gfx1100, so this PR uses the vLLM reference implementation instead.
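
As an illustration of the fallback described above, here is a minimal sketch of architecture-based backend selection. It is not the code from this PR: the names `ARCHS_WITHOUT_FLASH_ATTN`, `base_arch`, and `select_attention_backend` are hypothetical, and the sketch assumes the architecture arrives as a string such as `gfx1100`, possibly followed by colon-separated feature flags.

```python
# Hypothetical sketch: none of these names come from the vLLM code base.

# Architectures assumed to lack full flash-attention support.
ARCHS_WITHOUT_FLASH_ATTN = frozenset({"gfx1100"})


def base_arch(gcn_arch_name: str) -> str:
    """Strip any ':feature' suffixes, e.g. 'gfx90a:sramecc+:xnack-' -> 'gfx90a'."""
    return gcn_arch_name.split(":", 1)[0]


def select_attention_backend(gcn_arch_name: str) -> str:
    """Return 'reference' for architectures without flash-attention support."""
    if base_arch(gcn_arch_name) in ARCHS_WITHOUT_FLASH_ATTN:
        return "reference"
    return "flash-attention"


backend = select_attention_backend("gfx1100")
print(backend)
```

How the architecture string is obtained at runtime is outside the scope of this sketch.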

Note:
When building the docker image, pass --build-arg BUILD_FA="0" to the docker build command.
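
For illustration, the build command might be assembled as follows. Only the `--build-arg BUILD_FA="0"` flag comes from the note above; the Dockerfile name `Dockerfile.rocm`, the image tag `vllm-rocm`, and the build context `.` are assumptions to adjust for your checkout.

```python
import shlex
from typing import List


def rocm_docker_build_cmd(
    build_fa: bool,
    dockerfile: str = "Dockerfile.rocm",  # assumed file name
    tag: str = "vllm-rocm",               # hypothetical image tag
    context: str = ".",                   # assumed build context
) -> List[str]:
    """Assemble a `docker build` argument list with BUILD_FA set to 0 or 1."""
    return [
        "docker", "build",
        "--build-arg", f"BUILD_FA={int(build_fa)}",
        "-f", dockerfile,
        "-t", tag,
        context,
    ]


# For gfx1100, skip building flash-attention.
cmd = rocm_docker_build_cmd(build_fa=False)
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` on a machine with Docker installed, or the printed line can be pasted into a shell.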

hongxiayang · 2024-02-06T02:05:12Z

The current head of vLLM still cannot compile successfully on ROCm; see issues #2725 and #2646.
To verify the code changes, I used a branch that does compile: https://github.com/hongxiayang/vllm/tree/navi3x_rocm6

hongxiayang · 2024-02-06T18:54:11Z

> The current head of vLLM still cannot compile successfully on ROCm; see issues #2725 and #2646. To verify the code changes, I used a branch that does compile: https://github.com/hongxiayang/vllm/tree/navi3x_rocm6

I am working on fixing the build now.

zhuohan123

LGTM! Thanks for the fix!

leavelet · 2024-02-07T17:46:56Z

Have you tried this repo from AMD? Part of gfx1100 support has been added in howiejay/navi_support branch, but I'm not sure whether that's enough for vLLM.

hongxiayang · 2024-02-07T17:47:57Z

> LGTM! Thanks for the fix!

Thanks @zhuohan123. I have updated the branch and resolved the conflict.

[ROCm] Fix build problem resulted from previous commit related to FP8 kv-cache support (vllm-project#2790)
Add documentation on how to do incremental builds (vllm-project#2796)
[Ray] Integration compiled DAG off by default (vllm-project#2471)
Disable custom all reduce by default (vllm-project#2808)
add usage context
removed usage_context from Engine_args
Move IO to another process
added http request
[ROCm] support Radeon™ 7900 series (gfx1100) without using flash-attention (vllm-project#2768)
Add documentation section about LoRA (vllm-project#2834)
Refactor 2 awq gemm kernels into m16nXk32 (vllm-project#2723)
Co-authored-by: Chunan Zeng <chunanzeng@Chunans-Air.attlocal.net>
Added additional arg for from_engine_args
comments

hongxiayang changed the title ~~[ROCm] support navi3x gfx1100 without using flash-attention~~ [ROCm] support Radeon™ 7900 series (gfx1100) without using flash-attention Feb 5, 2024

WoosukKwon added the rocm label Feb 5, 2024

hongxiayang marked this pull request as ready for review February 6, 2024 15:22

zhuohan123 approved these changes Feb 7, 2024

hongxiayang added 3 commits February 7, 2024 11:17

add gfx1100 to the list

b17d0ce

fix ruff and yapf and update doc

a0824ce

minor

a660ad7

hongxiayang force-pushed the navi3x_rocm branch from 1ef7c67 to a660ad7 Compare February 7, 2024 16:18

hongxiayang mentioned this pull request Feb 7, 2024

problematic math backend for F.scaled_dot_product_attention in ROCm 6.0 when testing using vllm for generate pytorch/pytorch#119389

Closed

fix doc

d79e749

zhuohan123 merged commit 0580aab into vllm-project:main Feb 11, 2024
17 checks passed

jvmncs pushed a commit to jvmncs/vllm that referenced this pull request Feb 14, 2024

[ROCm] support Radeon™ 7900 series (gfx1100) without using flash-attention (vllm-project#2768)

5d228c1

xjpang pushed a commit to xjpang/vllm that referenced this pull request Feb 20, 2024

[ROCm] support Radeon™ 7900 series (gfx1100) without using flash-attention (vllm-project#2768)

3ce39a2

xjpang pushed a commit to xjpang/vllm that referenced this pull request Feb 22, 2024

[ROCm] support Radeon™ 7900 series (gfx1100) without using flash-attention (vllm-project#2768)

8efb21f

andy-neuma mentioned this pull request Feb 23, 2024

andy/bump main to v0.3.2 neuralmagic/nm-vllm#49

Closed

xjpang pushed a commit to xjpang/vllm that referenced this pull request Mar 4, 2024

[ROCm] support Radeon™ 7900 series (gfx1100) without using flash-attention (vllm-project#2768)

dafb3d2

hongxiayang mentioned this pull request Mar 22, 2024

[ROCm] [Hardware][AMD] Remove xformer patches and ray issue fix #3558

Closed
