[ROCm][CI][Bugfix] Disable Flash/MemEfficient SDP on ROCm to avoid HF Transformers accuracy issues #29909
Conversation
… Transformers accuracy issues
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
Code Review
This pull request introduces a workaround for numerical accuracy issues on ROCm by disabling Flash/Memory-Efficient SDP backends during specific multi-modal model tests. The changes to the Dockerfile and requirements are appropriate. However, the implementation in conftest.py uses pytest_configure to modify global state, which can lead to test interference. I've suggested refactoring this to a session-scoped pytest fixture to ensure proper setup and teardown, making the tests more robust and isolated.
def pytest_configure(config):
    """Disable Flash/MemEfficient SDP on ROCm to avoid HF
    Transformers accuracy issues.
    """
    if not current_platform.is_rocm():
        return

    torch.backends.cuda.enable_flash_sdp(False)
    torch.backends.cuda.enable_mem_efficient_sdp(False)
    torch.backends.cuda.enable_math_sdp(True)
Using pytest_configure to modify global state like torch.backends can be risky as it doesn't provide a teardown mechanism. This change persists for the entire test session and could unintentionally affect other tests that rely on the default SDP backend settings.
A safer approach is to use a session-scoped, auto-use fixture. This allows you to set up the workaround before tests run and, crucially, tear it down afterward by restoring the original settings. This ensures that the test environment is left in a clean state, preventing side effects on other tests.
import pytest
import torch

from vllm.platforms import current_platform


@pytest.fixture(scope="session", autouse=True)
def rocm_sdp_workaround():
    """Disable Flash/MemEfficient SDP on ROCm to avoid HF
    Transformers accuracy issues.
    """
    if not current_platform.is_rocm():
        yield
        return

    # Save original state
    flash_enabled = torch.backends.cuda.flash_sdp_enabled()
    mem_efficient_enabled = torch.backends.cuda.mem_efficient_sdp_enabled()
    math_enabled = torch.backends.cuda.math_sdp_enabled()

    # Apply workaround
    torch.backends.cuda.enable_flash_sdp(False)
    torch.backends.cuda.enable_mem_efficient_sdp(False)
    torch.backends.cuda.enable_math_sdp(True)

    yield

    # Restore original state
    torch.backends.cuda.enable_flash_sdp(flash_enabled)
    torch.backends.cuda.enable_mem_efficient_sdp(mem_efficient_enabled)
    torch.backends.cuda.enable_math_sdp(math_enabled)
The teardown is unnecessary here. Session-scoped fixtures tear down when the process is about to exit anyway—there's no subsequent code that would observe the "restored" state. This adds complexity without practical benefit. If individual tests needed different SDP settings, session scope wouldn't help regardless; you'd need function-scoped fixtures with their own save/restore.
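As an illustration of that alternative, here is a minimal function-scoped sketch (not part of this PR; the fixture name math_sdp_only is hypothetical) that saves and restores the SDP flags around a single test:

import pytest
import torch


@pytest.fixture
def math_sdp_only():
    """Force the math SDP backend for one test, then restore the
    previously enabled backends.
    """
    saved = (
        torch.backends.cuda.flash_sdp_enabled(),
        torch.backends.cuda.mem_efficient_sdp_enabled(),
        torch.backends.cuda.math_sdp_enabled(),
    )
    torch.backends.cuda.enable_flash_sdp(False)
    torch.backends.cuda.enable_mem_efficient_sdp(False)
    torch.backends.cuda.enable_math_sdp(True)
    yield
    # Restore exactly what was enabled before this test ran.
    torch.backends.cuda.enable_flash_sdp(saved[0])
    torch.backends.cuda.enable_mem_efficient_sdp(saved[1])
    torch.backends.cuda.enable_math_sdp(saved[2])

A test opts in by requesting the fixture as a parameter, e.g. def test_qwen_accuracy(math_sdp_only): (name hypothetical), so other tests keep the default backends.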
… Transformers accuracy issues (vllm-project#29909) Signed-off-by: Andreas Karatzas <akaratza@amd.com> Signed-off-by: Xingyu Liu <charlotteliu12x@gmail.com>
This PR addresses regression failures in the Multi-Modal Models Test (Standard) test group (Qwen2.5-VL, Qwen3-VL, LLaVA) on the ROCm platform.

The root cause is identified as numerical inaccuracies or instability when using the default Flash Attention or Memory Efficient SDP backends with Hugging Face Transformers on ROCm. This PR introduces a conftest.py that yields a session-scoped fixture that:
- Disables flash_sdp and mem_efficient_sdp for ROCm.
- Enables math_sdp to guarantee a deterministic and valid execution path.

Test Plan
Verified locally after building the vLLM ROCm Docker container.
Command 1 (Core Multimodal):
Command 2 (Whisper):
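As a quick sanity check that the workaround is active inside the container (a hedged sketch, not part of the PR), PyTorch's SDP backend flags can be printed directly:

import torch

# After the session fixture runs on ROCm, only the math backend
# should report as enabled.
print("flash:", torch.backends.cuda.flash_sdp_enabled())
print("mem_efficient:", torch.backends.cuda.mem_efficient_sdp_enabled())
print("math:", torch.backends.cuda.math_sdp_enabled())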