
[Bug]: Can't build VLLM wheel using VLLM docker image. #29669

@halyavin

Description

Your current environment

Collecting environment information...

==============================
        System Info
==============================
OS                           : Ubuntu 22.04.5 LTS (x86_64)
GCC version                  : (Ubuntu 11.4.0-1ubuntu1~22.04.2) 11.4.0
Clang version                : Could not collect
CMake version                : Could not collect
Libc version                 : glibc-2.35

==============================
       PyTorch Info
==============================
PyTorch version              : 2.9.0+cu129
Is debug build               : False
CUDA used to build PyTorch   : 12.9
ROCM used to build PyTorch   : N/A

==============================
      Python Environment
==============================
Python version               : 3.12.12 (main, Oct 10 2025, 08:52:57) [GCC 11.4.0] (64-bit runtime)
Python platform              : Linux-5.4.210-x86_64-with-glibc2.35

==============================
       CUDA / GPU Info
==============================
Is CUDA available            : True
CUDA runtime version         : 12.9.86
CUDA_MODULE_LOADING set to   : 

Nvidia driver version        : 570.172.08
cuDNN version                : Could not collect
HIP runtime version          : N/A
MIOpen runtime version       : N/A
Is XNNPACK available         : True

==============================
Versions of relevant libraries
==============================
[pip3] efficientnet_pytorch==0.7.1
[pip3] flashinfer-python==0.5.2
[pip3] mypy-extensions==1.0.0
[pip3] numpy==1.26.4
[pip3] nvidia-cublas-cu12==12.9.1.4
[pip3] nvidia-cuda-cupti-cu12==12.9.79
[pip3] nvidia-cuda-nvrtc-cu12==12.9.86
[pip3] nvidia-cuda-runtime-cu12==12.9.79
[pip3] nvidia-cudnn-cu12==9.10.2.21
[pip3] nvidia-cudnn-frontend==1.16.0
[pip3] nvidia-cufft-cu12==11.4.1.4
[pip3] nvidia-cufile-cu12==1.14.1.1
[pip3] nvidia-curand-cu12==10.3.10.19
[pip3] nvidia-cusolver-cu12==11.7.5.82
[pip3] nvidia-cusparse-cu12==12.5.10.65
[pip3] nvidia-cusparselt-cu12==0.7.1
[pip3] nvidia-cutlass-dsl==4.3.1
[pip3] nvidia-ml-py==13.580.82
[pip3] nvidia-nccl-cu12==2.27.5
[pip3] nvidia-nvjitlink-cu12==12.9.86
[pip3] nvidia-nvshmem-cu12==3.3.20
[pip3] nvidia-nvtx-cu12==12.9.79
[pip3] open_clip_torch==2.32.0
[pip3] pytorch-lightning==2.5.2
[pip3] pyzmq==27.1.0
[pip3] segmentation_models_pytorch==0.4.0
[pip3] sentence-transformers==3.2.1
[pip3] terratorch==1.0.2
[pip3] torch==2.9.0+cu129
[pip3] torchaudio==2.9.0+cu129
[pip3] torchgeo==0.7.0
[pip3] torchmetrics==1.7.4
[pip3] torchvision==0.24.0+cu129
[pip3] transformers==4.57.1
[pip3] transformers-stream-generator==0.0.5
[pip3] triton==3.5.0
[pip3] tritonclient==2.51.0
[pip3] vector-quantize-pytorch==1.21.2
[conda] Could not collect

==============================
         vLLM Info
==============================
ROCM Version                 : Could not collect
vLLM Version                 : 0.11.2.dev328+g626169f19 (git sha: 626169f19)
vLLM Build Flags:
  CUDA Archs: Not Set; ROCm: Disabled

==============================
     Environment Variables
==============================
NVIDIA_VISIBLE_DEVICES=all
NVIDIA_REQUIRE_CUDA=cuda>=12.9 brand=unknown,driver>=535,driver<536 brand=grid,driver>=535,driver<536 brand=tesla,driver>=535,driver<536 brand=nvidia,driver>=535,driver<536 brand=quadro,driver>=535,driver<536 brand=quadrortx,driver>=535,driver<536 brand=nvidiartx,driver>=535,driver<536 brand=vapps,driver>=535,driver<536 brand=vpc,driver>=535,driver<536 brand=vcs,driver>=535,driver<536 brand=vws,driver>=535,driver<536 brand=cloudgaming,driver>=535,driver<536 brand=unknown,driver>=550,driver<551 brand=grid,driver>=550,driver<551 brand=tesla,driver>=550,driver<551 brand=nvidia,driver>=550,driver<551 brand=quadro,driver>=550,driver<551 brand=quadrortx,driver>=550,driver<551 brand=nvidiartx,driver>=550,driver<551 brand=vapps,driver>=550,driver<551 brand=vpc,driver>=550,driver<551 brand=vcs,driver>=550,driver<551 brand=vws,driver>=550,driver<551 brand=cloudgaming,driver>=550,driver<551 brand=unknown,driver>=560,driver<561 brand=grid,driver>=560,driver<561 brand=tesla,driver>=560,driver<561 brand=nvidia,driver>=560,driver<561 brand=quadro,driver>=560,driver<561 brand=quadrortx,driver>=560,driver<561 brand=nvidiartx,driver>=560,driver<561 brand=vapps,driver>=560,driver<561 brand=vpc,driver>=560,driver<561 brand=vcs,driver>=560,driver<561 brand=vws,driver>=560,driver<561 brand=cloudgaming,driver>=560,driver<561 brand=unknown,driver>=565,driver<566 brand=grid,driver>=565,driver<566 brand=tesla,driver>=565,driver<566 brand=nvidia,driver>=565,driver<566 brand=quadro,driver>=565,driver<566 brand=quadrortx,driver>=565,driver<566 brand=nvidiartx,driver>=565,driver<566 brand=vapps,driver>=565,driver<566 brand=vpc,driver>=565,driver<566 brand=vcs,driver>=565,driver<566 brand=vws,driver>=565,driver<566 brand=cloudgaming,driver>=565,driver<566 brand=unknown,driver>=570,driver<571 brand=grid,driver>=570,driver<571 brand=tesla,driver>=570,driver<571 brand=nvidia,driver>=570,driver<571 brand=quadro,driver>=570,driver<571 brand=quadrortx,driver>=570,driver<571 
brand=nvidiartx,driver>=570,driver<571 brand=vapps,driver>=570,driver<571 brand=vpc,driver>=570,driver<571 brand=vcs,driver>=570,driver<571 brand=vws,driver>=570,driver<571 brand=cloudgaming,driver>=570,driver<571
NVIDIA_DRIVER_CAPABILITIES=compute,utility
CUDA_VERSION=12.9.1
LD_LIBRARY_PATH=/usr/local/nvidia/lib64:/usr/local/cuda/lib64:/usr/local/cuda/lib64
PYTORCH_NVML_BASED_CUDA_CHECK=1
TORCHINDUCTOR_COMPILE_THREADS=1

🐛 Describe the bug

When I build the test vLLM Docker image (`--target test`), I can't use it to build the vLLM wheel. The problem appeared after #29270: CMake can no longer find the NVRTC library. As far as I understand, CMake searches for `libnvrtc.so`, but the test image only ships `libnvrtc.so.12`.
Here is CMake output:

running build_ext
-- The CXX compiler identification is GNU 11.4.0
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Build type: RelWithDebInfo
-- Target device: cuda
-- Found Python: /tmp/build-env-6lzv4z90/bin/python (found version "3.12.12") found components: Interpreter Development.Module Development.SABIModule
-- Found python matching: /tmp/build-env-6lzv4z90/bin/python.
-- Found CUDA: /usr/local/cuda (found version "12.9")
-- The CUDA compiler identification is NVIDIA 12.9.86 with host compiler GNU 11.4.0
-- Detecting CUDA compiler ABI info
-- Detecting CUDA compiler ABI info - done
-- Check for working CUDA compiler: /usr/local/cuda/bin/nvcc - skipped
-- Detecting CUDA compile features
-- Detecting CUDA compile features - done
-- Unable to find cublas_v2.h in either "/usr/local/cuda/include" or "/usr/math_libs/include"
-- Found CUDAToolkit: /usr/local/cuda/include (found version "12.9.86")
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
-- Found Threads: TRUE
-- PyTorch: CUDA detected: 12.9
-- PyTorch: CUDA nvcc is: /usr/local/cuda/bin/nvcc
-- PyTorch: CUDA toolkit directory: /usr/local/cuda
-- PyTorch: Header version is: 12.9
-- USE_CUDNN is set to 0. Compiling without cuDNN support
-- USE_CUSPARSELT is set to 0. Compiling without cuSPARSELt support
-- USE_CUDSS is set to 0. Compiling without cuDSS support
-- USE_CUFILE is set to 0. Compiling without cuFile support
-- Autodetected CUDA architecture(s):  9.0 9.0 9.0 9.0 9.0 9.0 9.0 9.0
CMake Warning at /tmp/build-env-6lzv4z90/lib/python3.12/site-packages/torch/share/cmake/Caffe2/public/cuda.cmake:323 (message):
  pytorch is not compatible with `CMAKE_CUDA_ARCHITECTURES` and will ignore
  its value.  Please configure `TORCH_CUDA_ARCH_LIST` instead.
Call Stack (most recent call first):
  /tmp/build-env-6lzv4z90/lib/python3.12/site-packages/torch/share/cmake/Caffe2/Caffe2Config.cmake:86 (include)
  /tmp/build-env-6lzv4z90/lib/python3.12/site-packages/torch/share/cmake/Torch/TorchConfig.cmake:68 (find_package)
  CMakeLists.txt:91 (find_package)


-- Added CUDA NVCC flags for: -gencode;arch=compute_90,code=sm_90
CMake Warning at /tmp/build-env-6lzv4z90/lib/python3.12/site-packages/torch/share/cmake/Torch/TorchConfig.cmake:22 (message):
  static library kineto_LIBRARY-NOTFOUND not found.
Call Stack (most recent call first):
  /tmp/build-env-6lzv4z90/lib/python3.12/site-packages/torch/share/cmake/Torch/TorchConfig.cmake:125 (append_torchlib_if_found)
  CMakeLists.txt:91 (find_package)


-- Found Torch: /tmp/build-env-6lzv4z90/lib/python3.12/site-packages/torch/lib/libtorch.so
-- CUDA target architectures: 9.0
-- CUDA supported target architectures: 9.0
-- FetchContent base directory: /home/halyavin/vllm/.deps
-- Enabling cumem allocator extension.
-- CMake Version: 4.2.0
-- CUTLASS 4.2.1
-- Found CUDAToolkit: /usr/local/cuda/targets/x86_64-linux/include (found version "12.9.86")
-- CUDART: /usr/local/cuda/lib64/libcudart.so
-- CUDA Driver: /usr/local/cuda/lib64/stubs/libcuda.so
-- NVRTC: Not Found
-- Default Install Location: install
-- Found Python3: /tmp/build-env-6lzv4z90/bin/python3.12 (found suitable version "3.12.12", minimum required is "3.5") found components: Interpreter
-- CUDA Compilation Architectures: 70;72;75;80;86;87;89;90;90a;100;100a;120;120a;121;121a;101;101a;100f;120f;121f;103a;103f;101f
-- Enable caching of reference results in conv unit tests
-- Enable rigorous conv problem sizes in conv unit tests
-- Grid Dependency Control (GDC) is enabled for SM100 kernels (required for programmatic dependent launches).
-- Using the following NVCC flags: 
  --expt-relaxed-constexpr
  -ftemplate-backtrace-limit=0
  -DCUTLASS_TEST_LEVEL=0
  -DCUTLASS_TEST_ENABLE_CACHED_RESULTS=1
  -DCUTLASS_CONV_UNIT_TEST_RIGOROUS_SIZE_ENABLED=1
  -DCUTLASS_DEBUG_TRACE_LEVEL=0
  -DCUTLASS_SM100_FAMILY_ARCHS_ENABLED
  -Xcompiler=-Wconversion
  -Xcompiler=-fno-strict-aliasing
  -lineinfo
-- Configuring cublas ...
-- cuBLAS Disabled.
-- Configuring cuBLAS ... done.
-- Marlin generation script hash: c2af22dc04c6341bd52c8819eb3bf417
-- Last run Marlin generate script hash: 
-- Marlin generation completed successfully.
-- Building Marlin kernels for archs: 9.0+PTX
-- Not building AllSpark kernels as no compatible archs found in CUDA target architectures
-- Building scaled_mm_c3x_sm90 for archs: 9.0a
-- Not building scaled_mm_c3x_120 as no compatible archs found in CUDA target architectures
-- Not building scaled_mm_c3x_100 as no compatible archs found in CUDA target architectures
-- Building scaled_mm_c2x for archs: 8.9+PTX
-- Building sparse_scaled_mm_c3x for archs: 9.0a
-- Not building NVFP4 as no compatible archs were found.
-- Not building NVFP4 as no compatible archs were found.
-- Not building CUTLASS MLA as no compatible archs were found.
-- Building grouped_mm_c3x for archs: 9.0a
-- Not building grouped_mm_c3x as no compatible archs found in CUDA target architectures.
-- Building moe_data for archs: 9.0a
-- Not building blockwise_scaled_group_mm_sm100 as no compatible archs found in CUDA target architectures
-- Machete generation script hash: adf6b86715a35f4aaa9b407f95930771
-- Last run machete generate script hash: 
-- Machete generation completed successfully.
-- Building Machete kernels for archs: 9.0a
-- Building W4A8 kernels for archs: 9.0a
-- Building hadacore
-- Enabling C extension.
-- Marlin MOE generation script hash: cbc6643fa5c8c251cc716bb86d88915e
-- Last run Marlin MOE generate script hash: 
-- Marlin MOE generation completed successfully.
-- Building Marlin MOE kernels for archs: 9.0+PTX
-- Enabling moe extension.
-- [triton_kernels] Fetch from https://github.com/triton-lang/triton.git:v3.5.0
-- [triton_kernels] triton_kernels is available at /home/halyavin/vllm/.deps/triton_kernels-src/python/triton_kernels/triton_kernels/
-- FlashMLA is available at /home/halyavin/vllm/.deps/flashmla-src
CMake Warning (dev) at /tmp/build-env-6lzv4z90/lib/python3.12/site-packages/cmake/data/share/cmake-4.2/Modules/FetchContent.cmake:1963 (message):
  Calling FetchContent_Populate(qutlass) is deprecated, call
  FetchContent_MakeAvailable(qutlass) instead.  Policy CMP0169 can be set to
  OLD to allow FetchContent_Populate(qutlass) to be called directly for now,
  but the ability to call it with declared details will be removed completely
  in a future version.
Call Stack (most recent call first):
  cmake/external_projects/qutlass.cmake:27 (FetchContent_Populate)
  CMakeLists.txt:1044 (include)
This warning is for project developers.  Use -Wno-dev to suppress it.

-- [QUTLASS] QuTLASS is available at /home/halyavin/vllm/.deps/qutlass-src
-- [QUTLASS] Skipping build: no supported arch (12.0a / 10.0a) found in CUDA_ARCHS='9.0'.
-- Build type: RelWithDebInfo
-- Target device: cuda
CMake Warning at .deps/vllm-flash-attn-src/CMakeLists.txt:77 (message):
  Pytorch version 2.4.0 expected for CUDA build, saw 2.9.0 instead.


-- CUDA target architectures: 9.0
-- CUDA supported target architectures: 9.0
-- FA2_ARCHS: 8.0+PTX
-- FA3_ARCHS: 9.0a
-- vllm-flash-attn is available at /home/halyavin/vllm/.deps/vllm-flash-attn-src
-- Configuring done (73.9s)
CMake Error: The following variables are used in this project, but they are set to NOTFOUND.
Please set them or make sure they are set and tested correctly in the CMake files:
CUDA_nvrtc_LIBRARY (ADVANCED)
    linked by target "cumem_allocator" in directory /home/halyavin/vllm

-- Generating done (0.0s)
CMake Generate step failed.  Build files cannot be regenerated correctly.
Traceback (most recent call last):
  File "/usr/local/lib/python3.12/dist-packages/pyproject_hooks/_in_process/_in_process.py", line 389, in <module>
    main()
  File "/usr/local/lib/python3.12/dist-packages/pyproject_hooks/_in_process/_in_process.py", line 373, in main
    json_out["return_val"] = hook(**hook_input["kwargs"])
                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/pyproject_hooks/_in_process/_in_process.py", line 280, in build_wheel
    return _build_backend().build_wheel(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/build-env-6lzv4z90/lib/python3.12/site-packages/setuptools/build_meta.py", line 432, in build_wheel
    return _build(['bdist_wheel'])
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/build-env-6lzv4z90/lib/python3.12/site-packages/setuptools/build_meta.py", line 423, in _build
    return self._build_with_temp_dir(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/build-env-6lzv4z90/lib/python3.12/site-packages/setuptools/build_meta.py", line 404, in _build_with_temp_dir
    self.run_setup()
  File "/tmp/build-env-6lzv4z90/lib/python3.12/site-packages/setuptools/build_meta.py", line 317, in run_setup
    exec(code, locals())
  File "<string>", line 693, in <module>
  File "/tmp/build-env-6lzv4z90/lib/python3.12/site-packages/setuptools/__init__.py", line 115, in setup
    return distutils.core.setup(**attrs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/build-env-6lzv4z90/lib/python3.12/site-packages/setuptools/_distutils/core.py", line 186, in setup
    return run_commands(dist)
           ^^^^^^^^^^^^^^^^^^
  File "/tmp/build-env-6lzv4z90/lib/python3.12/site-packages/setuptools/_distutils/core.py", line 202, in run_commands
    dist.run_commands()
  File "/tmp/build-env-6lzv4z90/lib/python3.12/site-packages/setuptools/_distutils/dist.py", line 1002, in run_commands
    self.run_command(cmd)
  File "/tmp/build-env-6lzv4z90/lib/python3.12/site-packages/setuptools/dist.py", line 1102, in run_command
    super().run_command(command)
  File "/tmp/build-env-6lzv4z90/lib/python3.12/site-packages/setuptools/_distutils/dist.py", line 1021, in run_command
    cmd_obj.run()
  File "/tmp/build-env-6lzv4z90/lib/python3.12/site-packages/setuptools/command/bdist_wheel.py", line 370, in run
    self.run_command("build")
  File "/tmp/build-env-6lzv4z90/lib/python3.12/site-packages/setuptools/_distutils/cmd.py", line 357, in run_command
    self.distribution.run_command(command)
  File "/tmp/build-env-6lzv4z90/lib/python3.12/site-packages/setuptools/dist.py", line 1102, in run_command
    super().run_command(command)
  File "/tmp/build-env-6lzv4z90/lib/python3.12/site-packages/setuptools/_distutils/dist.py", line 1021, in run_command
    cmd_obj.run()
  File "/tmp/build-env-6lzv4z90/lib/python3.12/site-packages/setuptools/_distutils/command/build.py", line 135, in run
    self.run_command(cmd_name)
  File "/tmp/build-env-6lzv4z90/lib/python3.12/site-packages/setuptools/_distutils/cmd.py", line 357, in run_command
    self.distribution.run_command(command)
  File "/tmp/build-env-6lzv4z90/lib/python3.12/site-packages/setuptools/dist.py", line 1102, in run_command
    super().run_command(command)
  File "/tmp/build-env-6lzv4z90/lib/python3.12/site-packages/setuptools/_distutils/dist.py", line 1021, in run_command
    cmd_obj.run()
  File "<string>", line 272, in run
  File "/tmp/build-env-6lzv4z90/lib/python3.12/site-packages/setuptools/command/build_ext.py", line 96, in run
    _build_ext.run(self)
  File "/tmp/build-env-6lzv4z90/lib/python3.12/site-packages/setuptools/_distutils/command/build_ext.py", line 368, in run
    self.build_extensions()
  File "<string>", line 229, in build_extensions
  File "<string>", line 206, in configure
  File "/usr/lib/python3.12/subprocess.py", line 413, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['cmake', '/home/halyavin/vllm', '-G', 'Ninja', '-DCMAKE_BUILD_TYPE=RelWithDebInfo', '-DVLLM_TARGET_DEVICE=cuda', '-DVLLM_PYTHON_EXECUTABLE=/tmp/build-env-6lzv4z90/bin/python', '-DVLLM_PYTHON_PATH=/usr/lib/python312.zip:/usr/lib/python3.12:/usr/lib/python3.12/lib-dynload:/tmp/build-env-6lzv4z90/lib/python3.12/site-packages:/tmp/build-env-6lzv4z90/lib/python3.12/site-packages/setuptools/_vendor', '-DFETCHCONTENT_BASE_DIR=/home/halyavin/vllm/.deps', '-DNVCC_THREADS=1', '-DCMAKE_JOB_POOL_COMPILE:STRING=compile', '-DCMAKE_JOB_POOLS:STRING=compile=100', '-DCMAKE_CUDA_COMPILER=/usr/local/cuda/bin/nvcc']' returned non-zero exit status 1.

ERROR Backend subprocess exited when trying to invoke build_wheel
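If the diagnosis above is right (`CUDA_nvrtc_LIBRARY` comes up NOTFOUND because only the versioned `libnvrtc.so.12` is present), one unverified workaround is to create the unversioned `libnvrtc.so` symlink that CMake's `find_library()` matches. The sketch below demonstrates the idea in a scratch directory; the real library directory inside the image (likely `/usr/local/cuda/lib64`) is an assumption and should be checked first:

```shell
# Hypothetical workaround sketch: find_library() only matches the unversioned
# soname, so add a libnvrtc.so symlink next to the versioned library the image
# ships. Shown here against a scratch directory; in the image, substitute the
# actual CUDA library path (assumed: /usr/local/cuda/lib64).
libdir=$(mktemp -d)                          # stand-in for /usr/local/cuda/lib64
touch "$libdir/libnvrtc.so.12"               # stand-in for the versioned library
ln -s libnvrtc.so.12 "$libdir/libnvrtc.so"   # unversioned name CMake searches for
readlink "$libdir/libnvrtc.so"               # prints: libnvrtc.so.12
```

After re-creating the symlink at the real path, re-running the wheel build should let CMake resolve `CUDA_nvrtc_LIBRARY`; whether the image is meant to carry the unversioned `.so` (i.e. whether this is a regression from #29270) is for the maintainers to confirm.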

Labels: bug (Something isn't working)