[ROCm][CI][Bugfix] Disable Flash/MemEfficient SDP on ROCm to avoid HF Transformers accuracy issues #29909
Conversation
… Transformers accuracy issues
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
Code Review
This pull request introduces a workaround for numerical accuracy issues on ROCm by disabling Flash/Memory-Efficient SDP backends during specific multi-modal model tests. The changes to the Dockerfile and requirements are appropriate. However, the implementation in conftest.py uses pytest_configure to modify global state, which can lead to test interference. I've suggested refactoring this to a session-scoped pytest fixture to ensure proper setup and teardown, making the tests more robust and isolated.
def pytest_configure(config):
    """Disable Flash/MemEfficient SDP on ROCm to avoid HF
    Transformers accuracy issues.
    """
    if not current_platform.is_rocm():
        return

    torch.backends.cuda.enable_flash_sdp(False)
    torch.backends.cuda.enable_mem_efficient_sdp(False)
    torch.backends.cuda.enable_math_sdp(True)
Using pytest_configure to modify global state like torch.backends can be risky as it doesn't provide a teardown mechanism. This change persists for the entire test session and could unintentionally affect other tests that rely on the default SDP backend settings.
A safer approach is to use a session-scoped, auto-use fixture. This allows you to set up the workaround before tests run and, crucially, tear it down afterward by restoring the original settings. This ensures that the test environment is left in a clean state, preventing side effects on other tests.
import pytest
import torch

from vllm.platforms import current_platform


@pytest.fixture(scope="session", autouse=True)
def rocm_sdp_workaround():
    """Disable Flash/MemEfficient SDP on ROCm to avoid HF
    Transformers accuracy issues.
    """
    if not current_platform.is_rocm():
        yield
        return

    # Save original state
    flash_enabled = torch.backends.cuda.flash_sdp_enabled()
    mem_efficient_enabled = torch.backends.cuda.mem_efficient_sdp_enabled()
    math_enabled = torch.backends.cuda.math_sdp_enabled()

    # Apply workaround
    torch.backends.cuda.enable_flash_sdp(False)
    torch.backends.cuda.enable_mem_efficient_sdp(False)
    torch.backends.cuda.enable_math_sdp(True)

    yield

    # Restore original state
    torch.backends.cuda.enable_flash_sdp(flash_enabled)
    torch.backends.cuda.enable_mem_efficient_sdp(mem_efficient_enabled)
    torch.backends.cuda.enable_math_sdp(math_enabled)
The teardown is unnecessary here. Session-scoped fixtures tear down when the process is about to exit anyway—there's no subsequent code that would observe the "restored" state. This adds complexity without practical benefit. If individual tests needed different SDP settings, session scope wouldn't help regardless; you'd need function-scoped fixtures with their own save/restore.
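As an illustration of that alternative, here is a minimal function-scoped sketch (not part of this PR; the fixture name math_sdp_only is hypothetical) that saves and restores the SDP flags around a single test:

import pytest
import torch


@pytest.fixture
def math_sdp_only():
    """Force the math SDP backend for one test, then restore the
    previously enabled backends.
    """
    saved = (
        torch.backends.cuda.flash_sdp_enabled(),
        torch.backends.cuda.mem_efficient_sdp_enabled(),
        torch.backends.cuda.math_sdp_enabled(),
    )
    torch.backends.cuda.enable_flash_sdp(False)
    torch.backends.cuda.enable_mem_efficient_sdp(False)
    torch.backends.cuda.enable_math_sdp(True)
    yield
    # Restore exactly what was enabled before this test ran.
    torch.backends.cuda.enable_flash_sdp(saved[0])
    torch.backends.cuda.enable_mem_efficient_sdp(saved[1])
    torch.backends.cuda.enable_math_sdp(saved[2])

A test opts in by requesting the fixture as a parameter, e.g. def test_qwen_accuracy(math_sdp_only): (name hypothetical), so other tests keep the default backends.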
… Transformers accuracy issues (vllm-project#29909) Signed-off-by: Andreas Karatzas <akaratza@amd.com> Signed-off-by: Xingyu Liu <charlotteliu12x@gmail.com>
This PR addresses regression failures in the Multi-Modal Models Test (Standard) test group (Qwen2.5-VL, Qwen3-VL, LLaVA) on the ROCm platform.

The root cause is identified as numerical inaccuracies or instability when using the default Flash Attention or Memory Efficient SDP backends with Hugging Face Transformers on ROCm. This PR introduces a conftest.py that yields a session-scoped fixture that:
- Disables flash_sdp and mem_efficient_sdp for ROCm.
- Enables math_sdp to guarantee a deterministic and valid execution path.

Test Plan
Verified locally after building the vLLM ROCm Docker container.
Command 1 (Core Multimodal):
Command 2 (Whisper):
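As a quick sanity check that the workaround is active inside the container (a hedged sketch, not part of the PR), PyTorch's SDP backend flags can be printed directly:

import torch

# After the session fixture runs on ROCm, only the math backend
# should report as enabled.
print("flash:", torch.backends.cuda.flash_sdp_enabled())
print("mem_efficient:", torch.backends.cuda.mem_efficient_sdp_enabled())
print("math:", torch.backends.cuda.math_sdp_enabled())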