Conversation

Contributor

@kliuae kliuae commented Mar 11, 2024

The fix in #3262 addressed blockReduceSum using 32-thread warps on architectures whose warp size is 64. However, generation results differed before and after that fix, and the layernorm unit test still failed on ROCm.

This PR fixes these issues. It has been tested on an MI210 and passes the layernorm unit test.

@WoosukKwon WoosukKwon added the rocm Related to AMD ROCm label Mar 11, 2024
Contributor

@dllehr-amd dllehr-amd left a comment

Good catch on these. This correctly calculates the lane and wid.

Collaborator

@WoosukKwon WoosukKwon left a comment

@kliuae Thanks for submitting the PR!
@dllehr-amd Thanks for the review!

@WoosukKwon WoosukKwon merged commit c9415c1 into vllm-project:main Mar 11, 2024
starmpcc pushed a commit to starmpcc/vllm that referenced this pull request Mar 14, 2024
dtransposed pushed a commit to afeldman-nm/vllm that referenced this pull request Mar 26, 2024