
[ROCM] Fix blockReduceSum to use correct warp counts for ROCm and CUDA #3262

Merged · 1 commit · Mar 10, 2024

Conversation

dllehr-amd (Contributor):

blockReduceSum was defaulting to a warp size of 32 regardless of the architecture. ROCm wavefronts are 64 lanes wide, so the warp count per block was wrong on AMD GPUs.

Bonus: refactor cuda_compat.h to hold the WARP_SIZE define instead of attention_kernels.cuh.
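For context, here is a minimal sketch of the pattern the PR describes: an architecture-aware WARP_SIZE define living in cuda_compat.h, used to size the block reduction. The SHFL_XOR wrapper and the reduction bodies below are illustrative assumptions, not the verbatim vLLM source.

```cpp
// --- cuda_compat.h (sketch) ---
#ifdef USE_ROCM
  #define WARP_SIZE 64   // AMD wavefronts are 64 lanes wide
#else
  #define WARP_SIZE 32   // NVIDIA warps are 32 lanes wide
#endif

// Portable warp shuffle (illustrative wrapper, not the vLLM macro name).
#ifdef USE_ROCM
  #define SHFL_XOR(val, mask) __shfl_xor((val), (mask))
#else
  #define SHFL_XOR(val, mask) __shfl_xor_sync(0xffffffff, (val), (mask))
#endif

// Butterfly reduction within one warp/wavefront: after the loop,
// every lane holds the sum of all lanes in its warp.
template <typename T>
__inline__ __device__ T warpReduceSum(T val) {
#pragma unroll
  for (int mask = WARP_SIZE / 2; mask > 0; mask >>= 1)
    val += SHFL_XOR(val, mask);
  return val;
}

// Block-wide reduction: each warp reduces, lane 0 of each warp stores
// its partial sum in shared memory, then the first warp reduces the
// partials. Sizing `shared` and the lane/warp indices with WARP_SIZE
// instead of a hard-coded 32 is the essence of the fix: with 64-lane
// ROCm wavefronts, a literal 32 miscounts the warps per block.
template <typename T>
__inline__ __device__ T blockReduceSum(T val) {
  static __shared__ T shared[WARP_SIZE];
  int lane = threadIdx.x % WARP_SIZE;
  int wid = threadIdx.x / WARP_SIZE;

  val = warpReduceSum<T>(val);
  if (lane == 0) shared[wid] = val;
  __syncthreads();

  // Only the first (blockDim.x / WARP_SIZE) lanes hold valid partials;
  // the final sum is produced in the first warp.
  val = (threadIdx.x < blockDim.x / WARP_SIZE) ? shared[lane] : T(0);
  val = warpReduceSum<T>(val);
  return val;
}
```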

zhuohan123 (Collaborator) left a comment:

LGTM! Thanks for the fix!

zhuohan123 merged commit e4a28e5 into vllm-project:main on Mar 10, 2024
23 checks passed