
[Bug]: Can't build VLLM wheel using VLLM docker image. #29669

@halyavin

Description

Your current environment

Collecting environment information...

==============================
        System Info
==============================
OS                           : Ubuntu 22.04.5 LTS (x86_64)
GCC version                  : (Ubuntu 11.4.0-1ubuntu1~22.04.2) 11.4.0
Clang version                : Could not collect
CMake version                : Could not collect
Libc version                 : glibc-2.35

==============================
       PyTorch Info
==============================
PyTorch version              : 2.9.0+cu129
Is debug build               : False
CUDA used to build PyTorch   : 12.9
ROCM used to build PyTorch   : N/A

==============================
      Python Environment
==============================
Python version               : 3.12.12 (main, Oct 10 2025, 08:52:57) [GCC 11.4.0] (64-bit runtime)
Python platform              : Linux-5.4.210-x86_64-with-glibc2.35

==============================
       CUDA / GPU Info
==============================
Is CUDA available            : True
CUDA runtime version         : 12.9.86
CUDA_MODULE_LOADING set to   : 

Nvidia driver version        : 570.172.08
cuDNN version                : Could not collect
HIP runtime version          : N/A
MIOpen runtime version       : N/A
Is XNNPACK available         : True

==============================
Versions of relevant libraries
==============================
[pip3] efficientnet_pytorch==0.7.1
[pip3] flashinfer-python==0.5.2
[pip3] mypy-extensions==1.0.0
[pip3] numpy==1.26.4
[pip3] nvidia-cublas-cu12==12.9.1.4
[pip3] nvidia-cuda-cupti-cu12==12.9.79
[pip3] nvidia-cuda-nvrtc-cu12==12.9.86
[pip3] nvidia-cuda-runtime-cu12==12.9.79
[pip3] nvidia-cudnn-cu12==9.10.2.21
[pip3] nvidia-cudnn-frontend==1.16.0
[pip3] nvidia-cufft-cu12==11.4.1.4
[pip3] nvidia-cufile-cu12==1.14.1.1
[pip3] nvidia-curand-cu12==10.3.10.19
[pip3] nvidia-cusolver-cu12==11.7.5.82
[pip3] nvidia-cusparse-cu12==12.5.10.65
[pip3] nvidia-cusparselt-cu12==0.7.1
[pip3] nvidia-cutlass-dsl==4.3.1
[pip3] nvidia-ml-py==13.580.82
[pip3] nvidia-nccl-cu12==2.27.5
[pip3] nvidia-nvjitlink-cu12==12.9.86
[pip3] nvidia-nvshmem-cu12==3.3.20
[pip3] nvidia-nvtx-cu12==12.9.79
[pip3] open_clip_torch==2.32.0
[pip3] pytorch-lightning==2.5.2
[pip3] pyzmq==27.1.0
[pip3] segmentation_models_pytorch==0.4.0
[pip3] sentence-transformers==3.2.1
[pip3] terratorch==1.0.2
[pip3] torch==2.9.0+cu129
[pip3] torchaudio==2.9.0+cu129
[pip3] torchgeo==0.7.0
[pip3] torchmetrics==1.7.4
[pip3] torchvision==0.24.0+cu129
[pip3] transformers==4.57.1
[pip3] transformers-stream-generator==0.0.5
[pip3] triton==3.5.0
[pip3] tritonclient==2.51.0
[pip3] vector-quantize-pytorch==1.21.2
[conda] Could not collect

==============================
         vLLM Info
==============================
ROCM Version                 : Could not collect
vLLM Version                 : 0.11.2.dev328+g626169f19 (git sha: 626169f19)
vLLM Build Flags:
  CUDA Archs: Not Set; ROCm: Disabled

==============================
     Environment Variables
==============================
NVIDIA_VISIBLE_DEVICES=all
NVIDIA_REQUIRE_CUDA=cuda>=12.9 brand=unknown,driver>=535,driver<536 brand=grid,driver>=535,driver<536 brand=tesla,driver>=535,driver<536 brand=nvidia,driver>=535,driver<536 brand=quadro,driver>=535,driver<536 brand=quadrortx,driver>=535,driver<536 brand=nvidiartx,driver>=535,driver<536 brand=vapps,driver>=535,driver<536 brand=vpc,driver>=535,driver<536 brand=vcs,driver>=535,driver<536 brand=vws,driver>=535,driver<536 brand=cloudgaming,driver>=535,driver<536 brand=unknown,driver>=550,driver<551 brand=grid,driver>=550,driver<551 brand=tesla,driver>=550,driver<551 brand=nvidia,driver>=550,driver<551 brand=quadro,driver>=550,driver<551 brand=quadrortx,driver>=550,driver<551 brand=nvidiartx,driver>=550,driver<551 brand=vapps,driver>=550,driver<551 brand=vpc,driver>=550,driver<551 brand=vcs,driver>=550,driver<551 brand=vws,driver>=550,driver<551 brand=cloudgaming,driver>=550,driver<551 brand=unknown,driver>=560,driver<561 brand=grid,driver>=560,driver<561 brand=tesla,driver>=560,driver<561 brand=nvidia,driver>=560,driver<561 brand=quadro,driver>=560,driver<561 brand=quadrortx,driver>=560,driver<561 brand=nvidiartx,driver>=560,driver<561 brand=vapps,driver>=560,driver<561 brand=vpc,driver>=560,driver<561 brand=vcs,driver>=560,driver<561 brand=vws,driver>=560,driver<561 brand=cloudgaming,driver>=560,driver<561 brand=unknown,driver>=565,driver<566 brand=grid,driver>=565,driver<566 brand=tesla,driver>=565,driver<566 brand=nvidia,driver>=565,driver<566 brand=quadro,driver>=565,driver<566 brand=quadrortx,driver>=565,driver<566 brand=nvidiartx,driver>=565,driver<566 brand=vapps,driver>=565,driver<566 brand=vpc,driver>=565,driver<566 brand=vcs,driver>=565,driver<566 brand=vws,driver>=565,driver<566 brand=cloudgaming,driver>=565,driver<566 brand=unknown,driver>=570,driver<571 brand=grid,driver>=570,driver<571 brand=tesla,driver>=570,driver<571 brand=nvidia,driver>=570,driver<571 brand=quadro,driver>=570,driver<571 brand=quadrortx,driver>=570,driver<571 
brand=nvidiartx,driver>=570,driver<571 brand=vapps,driver>=570,driver<571 brand=vpc,driver>=570,driver<571 brand=vcs,driver>=570,driver<571 brand=vws,driver>=570,driver<571 brand=cloudgaming,driver>=570,driver<571
NVIDIA_DRIVER_CAPABILITIES=compute,utility
CUDA_VERSION=12.9.1
LD_LIBRARY_PATH=/usr/local/nvidia/lib64:/usr/local/cuda/lib64:/usr/local/cuda/lib64
PYTORCH_NVML_BASED_CUDA_CHECK=1
TORCHINDUCTOR_COMPILE_THREADS=1

🐛 Describe the bug

When I build the test vLLM Docker image (`--target test`), I can't use it to build the vLLM wheel. The problem appeared after #29270: CMake can no longer find the NVRTC library. As far as I understand, CMake searches for `libnvrtc.so`, but the test image only ships `libnvrtc.so.12`.
Here is CMake output:

running build_ext
-- The CXX compiler identification is GNU 11.4.0
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Build type: RelWithDebInfo
-- Target device: cuda
-- Found Python: /tmp/build-env-6lzv4z90/bin/python (found version "3.12.12") found components: Interpreter Development.Module Development.SABIModule
-- Found python matching: /tmp/build-env-6lzv4z90/bin/python.
-- Found CUDA: /usr/local/cuda (found version "12.9")
-- The CUDA compiler identification is NVIDIA 12.9.86 with host compiler GNU 11.4.0
-- Detecting CUDA compiler ABI info
-- Detecting CUDA compiler ABI info - done
-- Check for working CUDA compiler: /usr/local/cuda/bin/nvcc - skipped
-- Detecting CUDA compile features
-- Detecting CUDA compile features - done
-- Unable to find cublas_v2.h in either "/usr/local/cuda/include" or "/usr/math_libs/include"
-- Found CUDAToolkit: /usr/local/cuda/include (found version "12.9.86")
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
-- Found Threads: TRUE
-- PyTorch: CUDA detected: 12.9
-- PyTorch: CUDA nvcc is: /usr/local/cuda/bin/nvcc
-- PyTorch: CUDA toolkit directory: /usr/local/cuda
-- PyTorch: Header version is: 12.9
-- USE_CUDNN is set to 0. Compiling without cuDNN support
-- USE_CUSPARSELT is set to 0. Compiling without cuSPARSELt support
-- USE_CUDSS is set to 0. Compiling without cuDSS support
-- USE_CUFILE is set to 0. Compiling without cuFile support
-- Autodetected CUDA architecture(s):  9.0 9.0 9.0 9.0 9.0 9.0 9.0 9.0
CMake Warning at /tmp/build-env-6lzv4z90/lib/python3.12/site-packages/torch/share/cmake/Caffe2/public/cuda.cmake:323 (message):
  pytorch is not compatible with `CMAKE_CUDA_ARCHITECTURES` and will ignore
  its value.  Please configure `TORCH_CUDA_ARCH_LIST` instead.
Call Stack (most recent call first):
  /tmp/build-env-6lzv4z90/lib/python3.12/site-packages/torch/share/cmake/Caffe2/Caffe2Config.cmake:86 (include)
  /tmp/build-env-6lzv4z90/lib/python3.12/site-packages/torch/share/cmake/Torch/TorchConfig.cmake:68 (find_package)
  CMakeLists.txt:91 (find_package)


-- Added CUDA NVCC flags for: -gencode;arch=compute_90,code=sm_90
CMake Warning at /tmp/build-env-6lzv4z90/lib/python3.12/site-packages/torch/share/cmake/Torch/TorchConfig.cmake:22 (message):
  static library kineto_LIBRARY-NOTFOUND not found.
Call Stack (most recent call first):
  /tmp/build-env-6lzv4z90/lib/python3.12/site-packages/torch/share/cmake/Torch/TorchConfig.cmake:125 (append_torchlib_if_found)
  CMakeLists.txt:91 (find_package)


-- Found Torch: /tmp/build-env-6lzv4z90/lib/python3.12/site-packages/torch/lib/libtorch.so
-- CUDA target architectures: 9.0
-- CUDA supported target architectures: 9.0
-- FetchContent base directory: /home/halyavin/vllm/.deps
-- Enabling cumem allocator extension.
-- CMake Version: 4.2.0
-- CUTLASS 4.2.1
-- Found CUDAToolkit: /usr/local/cuda/targets/x86_64-linux/include (found version "12.9.86")
-- CUDART: /usr/local/cuda/lib64/libcudart.so
-- CUDA Driver: /usr/local/cuda/lib64/stubs/libcuda.so
-- NVRTC: Not Found
-- Default Install Location: install
-- Found Python3: /tmp/build-env-6lzv4z90/bin/python3.12 (found suitable version "3.12.12", minimum required is "3.5") found components: Interpreter
-- CUDA Compilation Architectures: 70;72;75;80;86;87;89;90;90a;100;100a;120;120a;121;121a;101;101a;100f;120f;121f;103a;103f;101f
-- Enable caching of reference results in conv unit tests
-- Enable rigorous conv problem sizes in conv unit tests
-- Grid Dependency Control (GDC) is enabled for SM100 kernels (required for programmatic dependent launches).
-- Using the following NVCC flags: 
  --expt-relaxed-constexpr
  -ftemplate-backtrace-limit=0
  -DCUTLASS_TEST_LEVEL=0
  -DCUTLASS_TEST_ENABLE_CACHED_RESULTS=1
  -DCUTLASS_CONV_UNIT_TEST_RIGOROUS_SIZE_ENABLED=1
  -DCUTLASS_DEBUG_TRACE_LEVEL=0
  -DCUTLASS_SM100_FAMILY_ARCHS_ENABLED
  -Xcompiler=-Wconversion
  -Xcompiler=-fno-strict-aliasing
  -lineinfo
-- Configuring cublas ...
-- cuBLAS Disabled.
-- Configuring cuBLAS ... done.
-- Marlin generation script hash: c2af22dc04c6341bd52c8819eb3bf417
-- Last run Marlin generate script hash: 
-- Marlin generation completed successfully.
-- Building Marlin kernels for archs: 9.0+PTX
-- Not building AllSpark kernels as no compatible archs found in CUDA target architectures
-- Building scaled_mm_c3x_sm90 for archs: 9.0a
-- Not building scaled_mm_c3x_120 as no compatible archs found in CUDA target architectures
-- Not building scaled_mm_c3x_100 as no compatible archs found in CUDA target architectures
-- Building scaled_mm_c2x for archs: 8.9+PTX
-- Building sparse_scaled_mm_c3x for archs: 9.0a
-- Not building NVFP4 as no compatible archs were found.
-- Not building NVFP4 as no compatible archs were found.
-- Not building CUTLASS MLA as no compatible archs were found.
-- Building grouped_mm_c3x for archs: 9.0a
-- Not building grouped_mm_c3x as no compatible archs found in CUDA target architectures.
-- Building moe_data for archs: 9.0a
-- Not building blockwise_scaled_group_mm_sm100 as no compatible archs found in CUDA target architectures
-- Machete generation script hash: adf6b86715a35f4aaa9b407f95930771
-- Last run machete generate script hash: 
-- Machete generation completed successfully.
-- Building Machete kernels for archs: 9.0a
-- Building W4A8 kernels for archs: 9.0a
-- Building hadacore
-- Enabling C extension.
-- Marlin MOE generation script hash: cbc6643fa5c8c251cc716bb86d88915e
-- Last run Marlin MOE generate script hash: 
-- Marlin MOE generation completed successfully.
-- Building Marlin MOE kernels for archs: 9.0+PTX
-- Enabling moe extension.
-- [triton_kernels] Fetch from https://github.com/triton-lang/triton.git:v3.5.0
-- [triton_kernels] triton_kernels is available at /home/halyavin/vllm/.deps/triton_kernels-src/python/triton_kernels/triton_kernels/
-- FlashMLA is available at /home/halyavin/vllm/.deps/flashmla-src
CMake Warning (dev) at /tmp/build-env-6lzv4z90/lib/python3.12/site-packages/cmake/data/share/cmake-4.2/Modules/FetchContent.cmake:1963 (message):
  Calling FetchContent_Populate(qutlass) is deprecated, call
  FetchContent_MakeAvailable(qutlass) instead.  Policy CMP0169 can be set to
  OLD to allow FetchContent_Populate(qutlass) to be called directly for now,
  but the ability to call it with declared details will be removed completely
  in a future version.
Call Stack (most recent call first):
  cmake/external_projects/qutlass.cmake:27 (FetchContent_Populate)
  CMakeLists.txt:1044 (include)
This warning is for project developers.  Use -Wno-dev to suppress it.

-- [QUTLASS] QuTLASS is available at /home/halyavin/vllm/.deps/qutlass-src
-- [QUTLASS] Skipping build: no supported arch (12.0a / 10.0a) found in CUDA_ARCHS='9.0'.
-- Build type: RelWithDebInfo
-- Target device: cuda
CMake Warning at .deps/vllm-flash-attn-src/CMakeLists.txt:77 (message):
  Pytorch version 2.4.0 expected for CUDA build, saw 2.9.0 instead.


-- CUDA target architectures: 9.0
-- CUDA supported target architectures: 9.0
-- FA2_ARCHS: 8.0+PTX
-- FA3_ARCHS: 9.0a
-- vllm-flash-attn is available at /home/halyavin/vllm/.deps/vllm-flash-attn-src
-- Configuring done (73.9s)
CMake Error: The following variables are used in this project, but they are set to NOTFOUND.
Please set them or make sure they are set and tested correctly in the CMake files:
CUDA_nvrtc_LIBRARY (ADVANCED)
    linked by target "cumem_allocator" in directory /home/halyavin/vllm

-- Generating done (0.0s)
CMake Generate step failed.  Build files cannot be regenerated correctly.
Traceback (most recent call last):
  File "/usr/local/lib/python3.12/dist-packages/pyproject_hooks/_in_process/_in_process.py", line 389, in <module>
    main()
  File "/usr/local/lib/python3.12/dist-packages/pyproject_hooks/_in_process/_in_process.py", line 373, in main
    json_out["return_val"] = hook(**hook_input["kwargs"])
                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/pyproject_hooks/_in_process/_in_process.py", line 280, in build_wheel
    return _build_backend().build_wheel(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/build-env-6lzv4z90/lib/python3.12/site-packages/setuptools/build_meta.py", line 432, in build_wheel
    return _build(['bdist_wheel'])
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/build-env-6lzv4z90/lib/python3.12/site-packages/setuptools/build_meta.py", line 423, in _build
    return self._build_with_temp_dir(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/build-env-6lzv4z90/lib/python3.12/site-packages/setuptools/build_meta.py", line 404, in _build_with_temp_dir
    self.run_setup()
  File "/tmp/build-env-6lzv4z90/lib/python3.12/site-packages/setuptools/build_meta.py", line 317, in run_setup
    exec(code, locals())
  File "<string>", line 693, in <module>
  File "/tmp/build-env-6lzv4z90/lib/python3.12/site-packages/setuptools/__init__.py", line 115, in setup
    return distutils.core.setup(**attrs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/build-env-6lzv4z90/lib/python3.12/site-packages/setuptools/_distutils/core.py", line 186, in setup
    return run_commands(dist)
           ^^^^^^^^^^^^^^^^^^
  File "/tmp/build-env-6lzv4z90/lib/python3.12/site-packages/setuptools/_distutils/core.py", line 202, in run_commands
    dist.run_commands()
  File "/tmp/build-env-6lzv4z90/lib/python3.12/site-packages/setuptools/_distutils/dist.py", line 1002, in run_commands
    self.run_command(cmd)
  File "/tmp/build-env-6lzv4z90/lib/python3.12/site-packages/setuptools/dist.py", line 1102, in run_command
    super().run_command(command)
  File "/tmp/build-env-6lzv4z90/lib/python3.12/site-packages/setuptools/_distutils/dist.py", line 1021, in run_command
    cmd_obj.run()
  File "/tmp/build-env-6lzv4z90/lib/python3.12/site-packages/setuptools/command/bdist_wheel.py", line 370, in run
    self.run_command("build")
  File "/tmp/build-env-6lzv4z90/lib/python3.12/site-packages/setuptools/_distutils/cmd.py", line 357, in run_command
    self.distribution.run_command(command)
  File "/tmp/build-env-6lzv4z90/lib/python3.12/site-packages/setuptools/dist.py", line 1102, in run_command
    super().run_command(command)
  File "/tmp/build-env-6lzv4z90/lib/python3.12/site-packages/setuptools/_distutils/dist.py", line 1021, in run_command
    cmd_obj.run()
  File "/tmp/build-env-6lzv4z90/lib/python3.12/site-packages/setuptools/_distutils/command/build.py", line 135, in run
    self.run_command(cmd_name)
  File "/tmp/build-env-6lzv4z90/lib/python3.12/site-packages/setuptools/_distutils/cmd.py", line 357, in run_command
    self.distribution.run_command(command)
  File "/tmp/build-env-6lzv4z90/lib/python3.12/site-packages/setuptools/dist.py", line 1102, in run_command
    super().run_command(command)
  File "/tmp/build-env-6lzv4z90/lib/python3.12/site-packages/setuptools/_distutils/dist.py", line 1021, in run_command
    cmd_obj.run()
  File "<string>", line 272, in run
  File "/tmp/build-env-6lzv4z90/lib/python3.12/site-packages/setuptools/command/build_ext.py", line 96, in run
    _build_ext.run(self)
  File "/tmp/build-env-6lzv4z90/lib/python3.12/site-packages/setuptools/_distutils/command/build_ext.py", line 368, in run
    self.build_extensions()
  File "<string>", line 229, in build_extensions
  File "<string>", line 206, in configure
  File "/usr/lib/python3.12/subprocess.py", line 413, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['cmake', '/home/halyavin/vllm', '-G', 'Ninja', '-DCMAKE_BUILD_TYPE=RelWithDebInfo', '-DVLLM_TARGET_DEVICE=cuda', '-DVLLM_PYTHON_EXECUTABLE=/tmp/build-env-6lzv4z90/bin/python', '-DVLLM_PYTHON_PATH=/usr/lib/python312.zip:/usr/lib/python3.12:/usr/lib/python3.12/lib-dynload:/tmp/build-env-6lzv4z90/lib/python3.12/site-packages:/tmp/build-env-6lzv4z90/lib/python3.12/site-packages/setuptools/_vendor', '-DFETCHCONTENT_BASE_DIR=/home/halyavin/vllm/.deps', '-DNVCC_THREADS=1', '-DCMAKE_JOB_POOL_COMPILE:STRING=compile', '-DCMAKE_JOB_POOLS:STRING=compile=100', '-DCMAKE_CUDA_COMPILER=/usr/local/cuda/bin/nvcc']' returned non-zero exit status 1.

ERROR Backend subprocess exited when trying to invoke build_wheel
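If the diagnosis above is right (`CUDA_nvrtc_LIBRARY` comes up NOTFOUND because only the versioned `libnvrtc.so.12` is present), one unverified workaround is to create the unversioned `libnvrtc.so` symlink that CMake's `find_library()` matches. The sketch below demonstrates the idea in a scratch directory; the real library directory inside the image (likely `/usr/local/cuda/lib64`) is an assumption and should be checked first:

```shell
# Hypothetical workaround sketch: find_library() only matches the unversioned
# soname, so add a libnvrtc.so symlink next to the versioned library the image
# ships. Shown here against a scratch directory; in the image, substitute the
# actual CUDA library path (assumed: /usr/local/cuda/lib64).
libdir=$(mktemp -d)                          # stand-in for /usr/local/cuda/lib64
touch "$libdir/libnvrtc.so.12"               # stand-in for the versioned library
ln -s libnvrtc.so.12 "$libdir/libnvrtc.so"   # unversioned name CMake searches for
readlink "$libdir/libnvrtc.so"               # prints: libnvrtc.so.12
```

After re-creating the symlink at the real path, re-running the wheel build should let CMake resolve `CUDA_nvrtc_LIBRARY`; whether the image is meant to carry the unversioned `.so` (i.e. whether this is a regression from #29270) is for the maintainers to confirm.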

Labels: bug (Something isn't working)