Conversation

@simon-mo simon-mo commented Oct 29, 2025

@simon-mo simon-mo merged commit 9007bf5 into main Oct 29, 2025
5 of 10 checks passed
@simon-mo simon-mo deleted the revert-27598-use-prebuilt-xformers branch October 29, 2025 03:58
@mergify mergify bot added the ci/build label Oct 29, 2025
@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request reverts a previous change that used a pre-built xformers wheel, opting instead to build xformers from source within the Docker image. It does so by adding a RUN command to the Dockerfile that compiles xformers, and by removing the xformers dependency from requirements/cuda.txt. However, the implementation has a critical flaw: the list of CUDA architectures for the xformers build is too restrictive. My review provides a specific suggestion to expand this list so that it ensures broader GPU compatibility (notably for Volta GPUs such as the V100) and improves performance on modern GPUs by including them for ahead-of-time compilation.

# TODO (huydhn): Remove this once xformers is released for 2.9.0
RUN --mount=type=cache,target=/root/.cache/uv bash - <<'BASH'
. /etc/environment
export TORCH_CUDA_ARCH_LIST='7.5 8.0+PTX 9.0a'
critical

The hardcoded TORCH_CUDA_ARCH_LIST is overly restrictive and may cause compatibility and performance issues for users of the Docker image.

  • Dropped Architectures: This list removes support for Volta (7.0), which is used by V100 GPUs. This is a significant regression as V100s are still widely used in cloud environments and research.
  • Performance on Modern GPUs: Because modern architectures such as Ada Lovelace (8.9) and Hopper (9.0) are not explicitly listed, the build relies on just-in-time (JIT) compilation from PTX for them. This can cause significant startup delays the first time vLLM runs on these GPUs.
  • Inconsistency: The default torch_cuda_arch_list defined as a build argument earlier in this Dockerfile (line 144) is much more comprehensive. While that argument is not available in this build stage, its value serves as a good reference for what architectures are generally supported.

To ensure broad compatibility and optimal performance, I recommend using a more inclusive list of architectures. This suggested list restores Volta support, provides ahead-of-time (AOT) compilation for common modern GPUs, and maintains forward compatibility for future architectures via PTX.

    export TORCH_CUDA_ARCH_LIST='7.0 7.5 8.0 8.9 9.0a+PTX'
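To make the `+PTX` semantics in the two lists above concrete, here is a small shell sketch. It is an assumption, a simplified mirror of how PyTorch's `torch.utils.cpp_extension` expands `TORCH_CUDA_ARCH_LIST` into nvcc `-gencode` flags, not vLLM's or xformers' actual build logic: every entry gets ahead-of-time SASS for its exact architecture, while only entries suffixed with `+PTX` also embed PTX that newer GPUs can JIT-compile.

```shell
# Sketch (assumption): simplified mirror of how PyTorch's cpp_extension
# expands TORCH_CUDA_ARCH_LIST into nvcc -gencode flags.
# code=sm_XX   -> ahead-of-time SASS for that exact architecture
# code=compute_XX -> embedded PTX, JIT-compilable on newer architectures
arch_list='7.0 7.5 8.0 8.9 9.0a+PTX'

gencode_flags=''
for arch in $arch_list; do
  base=${arch%+PTX}                 # strip the +PTX suffix if present
  num=$(echo "$base" | tr -d .)     # '7.0' -> '70', '9.0a' -> '90a'
  gencode_flags="$gencode_flags -gencode arch=compute_${num},code=sm_${num}"
  if [ "$arch" != "$base" ]; then
    # +PTX entries also embed PTX for forward compatibility via JIT
    gencode_flags="$gencode_flags -gencode arch=compute_${num},code=compute_${num}"
  fi
done
echo "$gencode_flags"
```

Under this reading, the original '7.5 8.0+PTX 9.0a' value gives an 8.9 (Ada) GPU no SASS of its own, so it must JIT-compile from the 8.0 PTX at first launch, which is exactly the startup delay described above; the suggested list instead compiles 8.9 ahead of time.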

bhagyashrigai pushed a commit to odh-on-pz/vllm-upstream that referenced this pull request Oct 29, 2025
huydhn added a commit to huydhn/vllm that referenced this pull request Oct 29, 2025
huydhn added a commit to huydhn/vllm that referenced this pull request Oct 29, 2025
