
Conversation

AndreasKaratzas (Contributor) commented Dec 2, 2025

This PR addresses regression failures in the Multi-Modal Models Test (Standard) test group (Qwen2.5-VL, Qwen3-VL, LLaVA) on the ROCm platform.

The root cause is numerical inaccuracy or instability when the default Flash Attention or Memory-Efficient SDP backends are used with Hugging Face Transformers on ROCm. This PR introduces a conftest.py with a session-scoped setup that:

  1. Explicitly disables flash_sdp and mem_efficient_sdp for ROCm.
  2. Explicitly enables math_sdp to guarantee a deterministic and valid execution path.
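
These are process-global PyTorch toggles. As a quick sanity check (illustrative only, not part of this PR), the resulting state can be confirmed at runtime with the corresponding query functions:

import torch

# On ROCm, after the conftest.py setup has run, the expected state is:
print(torch.backends.cuda.flash_sdp_enabled())          # False
print(torch.backends.cuda.mem_efficient_sdp_enabled())  # False
print(torch.backends.cuda.math_sdp_enabled())           # True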

Test Plan

Verified locally after building the vLLM ROCm Docker container.

Command 1 (Core Multimodal):

pytest -v -s models/multimodal -m core_model \
  --ignore models/multimodal/generation/test_whisper.py \
  --ignore models/multimodal/processing

Command 2 (Whisper):

pytest -v -s tests/models/multimodal/generation/test_whisper.py -m core_model

Signed-off-by: Andreas Karatzas <akaratza@amd.com>
… Transformers accuracy issues

Signed-off-by: Andreas Karatzas <akaratza@amd.com>
gemini-code-assist bot (Contributor) left a comment


Code Review

This pull request introduces a workaround for numerical accuracy issues on ROCm by disabling Flash/Memory-Efficient SDP backends during specific multi-modal model tests. The changes to the Dockerfile and requirements are appropriate. However, the implementation in conftest.py uses pytest_configure to modify global state, which can lead to test interference. I've suggested refactoring this to a session-scoped pytest fixture to ensure proper setup and teardown, making the tests more robust and isolated.

Comment on lines +10 to +19
def pytest_configure(config):
    """Disable Flash/MemEfficient SDP on ROCm to avoid HF
    Transformers accuracy issues.
    """
    if not current_platform.is_rocm():
        return

    torch.backends.cuda.enable_flash_sdp(False)
    torch.backends.cuda.enable_mem_efficient_sdp(False)
    torch.backends.cuda.enable_math_sdp(True)

Severity: high

Using pytest_configure to modify global state like torch.backends can be risky as it doesn't provide a teardown mechanism. This change persists for the entire test session and could unintentionally affect other tests that rely on the default SDP backend settings.

A safer approach is to use a session-scoped, auto-use fixture. This allows you to set up the workaround before tests run and, crucially, tear it down afterward by restoring the original settings. This ensures that the test environment is left in a clean state, preventing side effects on other tests.

import pytest


@pytest.fixture(scope="session", autouse=True)
def rocm_sdp_workaround():
    """Disable Flash/MemEfficient SDP on ROCm to avoid HF
    Transformers accuracy issues.
    """
    if not current_platform.is_rocm():
        yield
        return

    # Save original state
    flash_enabled = torch.backends.cuda.flash_sdp_enabled()
    mem_efficient_enabled = torch.backends.cuda.mem_efficient_sdp_enabled()
    math_enabled = torch.backends.cuda.math_sdp_enabled()

    # Apply workaround
    torch.backends.cuda.enable_flash_sdp(False)
    torch.backends.cuda.enable_mem_efficient_sdp(False)
    torch.backends.cuda.enable_math_sdp(True)

    yield

    # Restore original state
    torch.backends.cuda.enable_flash_sdp(flash_enabled)
    torch.backends.cuda.enable_mem_efficient_sdp(mem_efficient_enabled)
    torch.backends.cuda.enable_math_sdp(math_enabled)

AndreasKaratzas (Contributor, PR author) commented:

The teardown is unnecessary here. Session-scoped fixtures tear down when the process is about to exit anyway—there's no subsequent code that would observe the "restored" state. This adds complexity without practical benefit. If individual tests needed different SDP settings, session scope wouldn't help regardless; you'd need function-scoped fixtures with their own save/restore.
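
(Purely for illustration, not something this PR needs: if a test ever did require different SDP settings, a function-scoped save/restore fixture along these lines would be the right tool. The fixture name below is hypothetical.)

import pytest
import torch


@pytest.fixture
def math_only_sdp():
    """Hypothetical per-test fixture: force math SDP, then restore the prior toggles."""
    saved = (
        torch.backends.cuda.flash_sdp_enabled(),
        torch.backends.cuda.mem_efficient_sdp_enabled(),
        torch.backends.cuda.math_sdp_enabled(),
    )
    torch.backends.cuda.enable_flash_sdp(False)
    torch.backends.cuda.enable_mem_efficient_sdp(False)
    torch.backends.cuda.enable_math_sdp(True)
    yield
    torch.backends.cuda.enable_flash_sdp(saved[0])
    torch.backends.cuda.enable_mem_efficient_sdp(saved[1])
    torch.backends.cuda.enable_math_sdp(saved[2])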

ywang96 added the ready (ONLY add when PR is ready to merge/full CI is needed) label on Dec 2, 2025
DarkLight1337 merged commit 506ed87 into vllm-project:main on Dec 3, 2025
23 of 25 checks passed
charlotte12l pushed a commit to charlotte12l/vllm that referenced this pull request Dec 5, 2025
… Transformers accuracy issues (vllm-project#29909)

Signed-off-by: Andreas Karatzas <akaratza@amd.com>
Signed-off-by: Xingyu Liu <charlotteliu12x@gmail.com>

Labels

ci/build
multi-modality (Related to multi-modality (#4194))
ready (ONLY add when PR is ready to merge/full CI is needed)
rocm (Related to AMD ROCm)


3 participants