[Bug]: 2080ti 22G * 4, vllm=0.4.0 startup error #3764

Closed

chuanzhubin-aiopx opened this issue Apr 1, 2024 · 28 comments
Labels
bug Something isn't working

Comments

@chuanzhubin-aiopx

Your current environment

The output of `python collect_env.py`
# For security purposes, please feel free to check the contents of collect_env.py before running it.
python collect_env.py
--2024-04-01 08:16:40--  https://raw.githubusercontent.com/vllm-project/vllm/main/collect_env.py
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 2606:50c0:8003::154, 2606:50c0:8002::154, 2606:50c0:8001::154, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|2606:50c0:8003::154|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 24853 (24K) [text/plain]
Saving to: ‘collect_env.py’

collect_env.py                     100%[===============================================================>]  24.27K   110KB/s    in 0.2s

2024-04-01 08:16:41 (110 KB/s) - ‘collect_env.py’ saved [24853/24853]

Collecting environment information...
PyTorch version: 2.1.2+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A

OS: Ubuntu 22.04.4 LTS (x86_64)
GCC version: (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
Clang version: Could not collect
CMake version: version 3.29.0
Libc version: glibc-2.35

Python version: 3.9.19 (main, Mar 21 2024, 17:11:28)  [GCC 11.2.0] (64-bit runtime)
Python platform: Linux-6.5.0-26-generic-x86_64-with-glibc2.35
Is CUDA available: True
CUDA runtime version: 11.5.119
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration: 
GPU 0: NVIDIA GeForce RTX 2080 Ti
GPU 1: NVIDIA GeForce RTX 2080 Ti
GPU 2: NVIDIA GeForce RTX 2080 Ti
GPU 3: NVIDIA GeForce RTX 2080 Ti

Nvidia driver version: 535.161.07
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU:
Architecture:                       x86_64
CPU op-mode(s):                     32-bit, 64-bit
Address sizes:                      46 bits physical, 48 bits virtual
Byte Order:                         Little Endian
CPU(s):                             56
On-line CPU(s) list:                0-55
Vendor ID:                          GenuineIntel
Model name:                         Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz
CPU family:                         6
Model:                              79
Thread(s) per core:                 2
Core(s) per socket:                 14
Socket(s):                          2
Stepping:                           1
CPU max MHz:                        3300.0000
CPU min MHz:                        1200.0000
BogoMIPS:                           4800.24
Flags:                              fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb cat_l3 cdp_l3 invpcid_single pti intel_ppin ssbd ibrs ibpb stibp tpr_shadow flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm rdt_a rdseed adx smap intel_pt xsaveopt cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm ida arat pln pts vnmi md_clear flush_l1d
Virtualization:                     VT-x
L1d cache:                          896 KiB (28 instances)
L1i cache:                          896 KiB (28 instances)
L2 cache:                           7 MiB (28 instances)
L3 cache:                           70 MiB (2 instances)
NUMA node(s):                       2
NUMA node0 CPU(s):                  0-13,28-41
NUMA node1 CPU(s):                  14-27,42-55
Vulnerability Gather data sampling: Not affected
Vulnerability Itlb multihit:        KVM: Mitigation: VMX disabled
Vulnerability L1tf:                 Mitigation; PTE Inversion; VMX conditional cache flushes, SMT vulnerable
Vulnerability Mds:                  Mitigation; Clear CPU buffers; SMT vulnerable
Vulnerability Meltdown:             Mitigation; PTI
Vulnerability Mmio stale data:      Mitigation; Clear CPU buffers; SMT vulnerable
Vulnerability Retbleed:             Not affected
Vulnerability Spec rstack overflow: Not affected
Vulnerability Spec store bypass:    Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:           Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2:           Mitigation; Retpolines, IBPB conditional, IBRS_FW, STIBP conditional, RSB filling, PBRSB-eIBRS Not affected
Vulnerability Srbds:                Not affected
Vulnerability Tsx async abort:      Mitigation; Clear CPU buffers; SMT vulnerable

Versions of relevant libraries:
[pip3] numpy==1.26.4
[pip3] torch==2.1.2
[pip3] triton==2.1.0
[conda] numpy                     1.26.4                   pypi_0    pypi
[conda] torch                     2.1.2                    pypi_0    pypi
[conda] triton                    2.1.0                    pypi_0    pypi
ROCM Version: Could not collect
Neuron SDK Version: N/A
vLLM Version: 0.4.0
vLLM Build Flags:
CUDA Archs: Not Set; ROCm: Disabled; Neuron: Disabled
GPU Topology:
GPU0    GPU1    GPU2    GPU3    CPU Affinity    NUMA Affinity   GPU NUMA ID
GPU0     X      NV2     SYS     SYS     0-13,28-41      0               N/A
GPU1    NV2      X      SYS     SYS     0-13,28-41      0               N/A
GPU2    SYS     SYS      X      NV2     14-27,42-55     1               N/A
GPU3    SYS     SYS     NV2      X      14-27,42-55     1               N/A

Legend:

  X    = Self
  SYS  = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
  NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
  PHB  = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
  PXB  = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
  PIX  = Connection traversing at most a single PCIe bridge
  NV#  = Connection traversing a bonded set of # NVLinks

🐛 Describe the bug

INFO 04-01 08:19:19 model_runner.py:104] Loading model weights took 0.2389 GB
INFO 04-01 08:19:20 gpu_executor.py:94] # GPU blocks: 33910, # CPU blocks: 7281
INFO 04-01 08:19:23 model_runner.py:791] Capturing the model for CUDA graphs. This may lead to unexpected consequences if the model is not static. To run the model in eager mode, set 'enforce_eager=True' or use '--enforce-eager' in the CLI.
INFO 04-01 08:19:23 model_runner.py:795] CUDA graphs can take additional 1~3 GiB memory per GPU. If you are running out of memory, consider decreasing `gpu_memory_utilization` or enforcing eager mode. You can also reduce the `max_num_seqs` as needed to decrease memory usage.
Traceback (most recent call last):
  File "/home/czb/src/czb/vllm_test.py", line 11, in <module>
    llm = LLM(model="facebook/opt-125m")
  File "/home/czb/miniconda3/envs/vllm/lib/python3.9/site-packages/vllm/entrypoints/llm.py", line 112, in __init__
    self.llm_engine = LLMEngine.from_engine_args(
  File "/home/czb/miniconda3/envs/vllm/lib/python3.9/site-packages/vllm/engine/llm_engine.py", line 192, in from_engine_args
    engine = cls(
  File "/home/czb/miniconda3/envs/vllm/lib/python3.9/site-packages/vllm/engine/llm_engine.py", line 111, in __init__
    self.model_executor = executor_class(model_config, cache_config,
  File "/home/czb/miniconda3/envs/vllm/lib/python3.9/site-packages/vllm/executor/gpu_executor.py", line 40, in __init__
    self._init_cache()
  File "/home/czb/miniconda3/envs/vllm/lib/python3.9/site-packages/vllm/executor/gpu_executor.py", line 107, in _init_cache
    self.driver_worker.warm_up_model()
  File "/home/czb/miniconda3/envs/vllm/lib/python3.9/site-packages/vllm/worker/worker.py", line 167, in warm_up_model
    self.model_runner.capture_model(self.gpu_cache)
  File "/home/czb/miniconda3/envs/vllm/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/czb/miniconda3/envs/vllm/lib/python3.9/site-packages/vllm/worker/model_runner.py", line 854, in capture_model
    graph_runner.capture(
  File "/home/czb/miniconda3/envs/vllm/lib/python3.9/site-packages/vllm/worker/model_runner.py", line 906, in capture
    self.model(
  File "/home/czb/miniconda3/envs/vllm/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/czb/miniconda3/envs/vllm/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/czb/miniconda3/envs/vllm/lib/python3.9/site-packages/vllm/model_executor/models/opt.py", line 301, in forward
    hidden_states = self.model(input_ids, positions, kv_caches,
  File "/home/czb/miniconda3/envs/vllm/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/czb/miniconda3/envs/vllm/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/czb/miniconda3/envs/vllm/lib/python3.9/site-packages/vllm/model_executor/models/opt.py", line 276, in forward
    return self.decoder(input_ids, positions, kv_caches, attn_metadata)
  File "/home/czb/miniconda3/envs/vllm/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/czb/miniconda3/envs/vllm/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/czb/miniconda3/envs/vllm/lib/python3.9/site-packages/vllm/model_executor/models/opt.py", line 250, in forward
    hidden_states = layer(hidden_states, kv_caches[i], attn_metadata)
  File "/home/czb/miniconda3/envs/vllm/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/czb/miniconda3/envs/vllm/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/czb/miniconda3/envs/vllm/lib/python3.9/site-packages/vllm/model_executor/models/opt.py", line 158, in forward
    hidden_states = self.self_attn(hidden_states=hidden_states,
  File "/home/czb/miniconda3/envs/vllm/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/czb/miniconda3/envs/vllm/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/czb/miniconda3/envs/vllm/lib/python3.9/site-packages/vllm/model_executor/models/opt.py", line 103, in forward
    output, _ = self.out_proj(attn_output)
  File "/home/czb/miniconda3/envs/vllm/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/czb/miniconda3/envs/vllm/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/czb/miniconda3/envs/vllm/lib/python3.9/site-packages/vllm/model_executor/layers/linear.py", line 580, in forward
    output = output_ + self.bias if self.bias is not None else output_
RuntimeError: CUDA error: no kernel image is available for execution on the device
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
@chuanzhubin-aiopx added the bug label Apr 1, 2024
@youkaichao
Member

Can you try executing a small piece of code to verify that cuda/torch is working?

e.g.

import torch
data = torch.randn(555).cuda()
print(data.sum().cpu().item())
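
If that runs, it can also help to check each GPU's reported compute capability, since "no kernel image is available" is the typical symptom of a wheel compiled without your GPU's architecture (sm_75 for the 2080 Ti). A minimal sketch, not from the thread:

import torch

# Print every visible GPU and its compute capability; a 2080 Ti should
# report 7.5. A wheel built without sm_75 kernels fails on such a GPU
# with "no kernel image is available for execution on the device".
for i in range(torch.cuda.device_count()):
    name = torch.cuda.get_device_name(i)
    major, minor = torch.cuda.get_device_capability(i)
    print(f"GPU {i}: {name}, compute capability {major}.{minor}")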

@youkaichao
Member

In addition, you can try to build from source: https://docs.vllm.ai/en/latest/getting_started/installation.html#build-from-source . See if it works.

@youkaichao
Member

In case you meet errors, you can change the following line:

set(CUDA_SUPPORTED_ARCHS "7.0;7.5;8.0;8.6;8.9;9.0")

to set(CUDA_SUPPORTED_ARCHS "7.5") in your case.
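
For reference, a from-source build for a Turing-only (sm_75) machine would then look roughly like this; a sketch following the linked installation docs plus the edit above, not a verbatim recipe:

# Sketch: build vLLM from source with only the 2080 Ti's architecture.
git clone https://github.com/vllm-project/vllm.git
cd vllm
# Edit CMakeLists.txt so only Turing kernels are compiled:
#   set(CUDA_SUPPORTED_ARCHS "7.5")
pip install -e .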

@chuanzhubin-aiopx
Author

CUDA runs normally. The key point is that everything works under vllm==0.3.3; the problem only appears after upgrading to vllm==0.4.0.

@youkaichao
Member

Yes, the released binary wheel might not contain the cuda kernels for your GPU model. If you can help by building from source to verify it, that would be very helpful.

@chuanzhubin-aiopx
Author

Building wheels for collected packages: vllm
  Building editable for vllm (pyproject.toml) ... error
  error: subprocess-exited-with-error
  
  × Building editable for vllm (pyproject.toml) did not run successfully.
  │ exit code: 1
  ╰─> [182 lines of output]
      /tmp/pip-build-env-e4vgc7fl/overlay/lib/python3.9/site-packages/torch/nn/modules/transformer.py:20: UserWarning: Failed to initialize NumPy: No module named 'numpy' (Triggered internally at ../torch/csrc/utils/tensor_numpy.cpp:84.)
        device: torch.device = torch.device(torch._C._get_default_device()),  # torch.device('cpu'),
      running editable_wheel
      creating /tmp/pip-wheel-6u3kmrmd/.tmp-bbqz_vwu/vllm.egg-info
      writing /tmp/pip-wheel-6u3kmrmd/.tmp-bbqz_vwu/vllm.egg-info/PKG-INFO
      writing dependency_links to /tmp/pip-wheel-6u3kmrmd/.tmp-bbqz_vwu/vllm.egg-info/dependency_links.txt
      writing requirements to /tmp/pip-wheel-6u3kmrmd/.tmp-bbqz_vwu/vllm.egg-info/requires.txt
      writing top-level names to /tmp/pip-wheel-6u3kmrmd/.tmp-bbqz_vwu/vllm.egg-info/top_level.txt
      writing manifest file '/tmp/pip-wheel-6u3kmrmd/.tmp-bbqz_vwu/vllm.egg-info/SOURCES.txt'
      reading manifest file '/tmp/pip-wheel-6u3kmrmd/.tmp-bbqz_vwu/vllm.egg-info/SOURCES.txt'
      reading manifest template 'MANIFEST.in'
      adding license file 'LICENSE'
      writing manifest file '/tmp/pip-wheel-6u3kmrmd/.tmp-bbqz_vwu/vllm.egg-info/SOURCES.txt'
      creating '/tmp/pip-wheel-6u3kmrmd/.tmp-bbqz_vwu/vllm-0.3.3+cu115.dist-info'
      creating /tmp/pip-wheel-6u3kmrmd/.tmp-bbqz_vwu/vllm-0.3.3+cu115.dist-info/WHEEL
      running build_py
      running build_ext
      Using MAX_JOBS=6 as the number of jobs.
      -- The CXX compiler identification is GNU 11.4.0
      -- Detecting CXX compiler ABI info
      -- Detecting CXX compiler ABI info - done
      -- Check for working CXX compiler: /usr/bin/c++ - skipped
      -- Detecting CXX compile features
      -- Detecting CXX compile features - done
      -- Build type: RelWithDebInfo
      -- Found Python: /home/czb/miniconda3/envs/test/bin/python (found version "3.9.19") found components: Interpreter Development.Module
      -- Found python matching: /home/czb/miniconda3/envs/test/bin/python.
      -- Found CUDA: /usr (found version "11.5")
      -- The CUDA compiler identification is NVIDIA 11.5.119
      -- Detecting CUDA compiler ABI info
      -- Detecting CUDA compiler ABI info - done
      -- Check for working CUDA compiler: /usr/bin/nvcc - skipped
      -- Detecting CUDA compile features
      -- Detecting CUDA compile features - done
      -- Found CUDAToolkit: /usr/include (found version "11.5.119")
      -- Performing Test CMAKE_HAVE_LIBC_PTHREAD
      -- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
      -- Found Threads: TRUE
      -- Caffe2: CUDA detected: 11.5
      -- Caffe2: CUDA nvcc is: /usr/bin/nvcc
      -- Caffe2: CUDA toolkit directory: /usr
      -- Caffe2: Header version is: 11.5
      -- /usr/lib/x86_64-linux-gnu/libnvrtc.so shorthash is 65f2c18b
      -- USE_CUDNN is set to 0. Compiling without cuDNN support
      -- USE_CUSPARSELT is set to 0. Compiling without cuSPARSELt support
      -- Autodetected CUDA architecture(s):  7.5 7.5 7.5 7.5
      -- Added CUDA NVCC flags for: -gencode;arch=compute_75,code=sm_75
      CMake Warning at /tmp/pip-build-env-e4vgc7fl/overlay/lib/python3.9/site-packages/torch/share/cmake/Torch/TorchConfig.cmake:22 (message):
        static library kineto_LIBRARY-NOTFOUND not found.
      Call Stack (most recent call first):
        /tmp/pip-build-env-e4vgc7fl/overlay/lib/python3.9/site-packages/torch/share/cmake/Torch/TorchConfig.cmake:127 (append_torchlib_if_found)
        CMakeLists.txt:64 (find_package)
      
      
      -- Found Torch: /tmp/pip-build-env-e4vgc7fl/overlay/lib/python3.9/site-packages/torch/lib/libtorch.so
      -- CUDA supported arches: 7.5
      -- CUDA target arches: 75
      -- Punica target arches:
      CMake Warning at CMakeLists.txt:249 (message):
        Unable to create _punica_C target because none of the requested
        architectures (75) are supported, i.e.  >= 8.0
      
      
      -- Enabling C extension.
      -- Enabling moe extension.
      -- Configuring done (9.3s)
      -- Generating done (0.0s)
      -- Build files have been written to: /tmp/tmpiwywa7u0.build-temp
      Using MAX_JOBS=6 as the number of jobs.
      [1/3] Building CXX object CMakeFiles/_moe_C.dir/csrc/moe/moe_ops.cpp.o
      [2/3] Building CUDA object CMakeFiles/_moe_C.dir/csrc/moe/topk_softmax_kernels.cu.o
      FAILED: CMakeFiles/_moe_C.dir/csrc/moe/topk_softmax_kernels.cu.o
      /usr/bin/nvcc -forward-unknown-to-host-compiler -DTORCH_EXTENSION_NAME=_moe_C -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -D_moe_C_EXPORTS -I/home/czb/git/vllm/csrc -isystem /home/czb/miniconda3/envs/test/include/python3.9 -isystem /tmp/pip-build-env-e4vgc7fl/overlay/lib/python3.9/site-packages/torch/include -isystem /tmp/pip-build-env-e4vgc7fl/overlay/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -DONNX_NAMESPACE=onnx_c2 -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=set_but_not_used,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O2 -g -DNDEBUG -std=c++17 "--generate-code=arch=compute_75,code=[compute_75,sm_75]" -Xcompiler=-fPIC -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --threads=1 -D_GLIBCXX_USE_CXX11_ABI=0 -MD -MT CMakeFiles/_moe_C.dir/csrc/moe/topk_softmax_kernels.cu.o -MF CMakeFiles/_moe_C.dir/csrc/moe/topk_softmax_kernels.cu.o.d -x cu -c /home/czb/git/vllm/csrc/moe/topk_softmax_kernels.cu -o CMakeFiles/_moe_C.dir/csrc/moe/topk_softmax_kernels.cu.o
      /usr/include/c++/11/bits/std_function.h:435:145: error: parameter packs not expanded with ‘...’:
        435 |         function(_Functor&& __f)
            |                                                                                                                                                 ^
      /usr/include/c++/11/bits/std_function.h:435:145: note:         ‘_ArgTypes’
      /usr/include/c++/11/bits/std_function.h:530:146: error: parameter packs not expanded with ‘...’:
        530 |         operator=(_Functor&& __f)
            |                                                                                                                                                  ^
      /usr/include/c++/11/bits/std_function.h:530:146: note:         ‘_ArgTypes’
      ninja: build stopped: subcommand failed.
      Traceback (most recent call last):
        File "/tmp/pip-build-env-e4vgc7fl/overlay/lib/python3.9/site-packages/setuptools/command/editable_wheel.py", line 155, in run
          self._create_wheel_file(bdist_wheel)
        File "/tmp/pip-build-env-e4vgc7fl/overlay/lib/python3.9/site-packages/setuptools/command/editable_wheel.py", line 357, in _create_wheel_file
          files, mapping = self._run_build_commands(dist_name, unpacked, lib, tmp)
        File "/tmp/pip-build-env-e4vgc7fl/overlay/lib/python3.9/site-packages/setuptools/command/editable_wheel.py", line 280, in _run_build_commands
          self._run_build_subcommands()
        File "/tmp/pip-build-env-e4vgc7fl/overlay/lib/python3.9/site-packages/setuptools/command/editable_wheel.py", line 307, in _run_build_subcommands
          self.run_command(name)
        File "/tmp/pip-build-env-e4vgc7fl/overlay/lib/python3.9/site-packages/setuptools/_distutils/cmd.py", line 318, in run_command
          self.distribution.run_command(command)
        File "/tmp/pip-build-env-e4vgc7fl/overlay/lib/python3.9/site-packages/setuptools/dist.py", line 967, in run_command
          super().run_command(command)
        File "/tmp/pip-build-env-e4vgc7fl/overlay/lib/python3.9/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
          cmd_obj.run()
        File "/tmp/pip-build-env-e4vgc7fl/overlay/lib/python3.9/site-packages/setuptools/command/build_ext.py", line 91, in run
          _build_ext.run(self)
        File "/tmp/pip-build-env-e4vgc7fl/overlay/lib/python3.9/site-packages/setuptools/_distutils/command/build_ext.py", line 345, in run
          self.build_extensions()
        File "<string>", line 184, in build_extensions
        File "/home/czb/miniconda3/envs/test/lib/python3.9/subprocess.py", line 373, in check_call
          raise CalledProcessError(retcode, cmd)
      subprocess.CalledProcessError: Command '['cmake', '--build', '.', '--target', '_moe_C', '-j', '6']' returned non-zero exit status 1.
      /tmp/pip-build-env-e4vgc7fl/overlay/lib/python3.9/site-packages/setuptools/_distutils/dist.py:988: _DebuggingTips: Problem in editable installation.
      !!
      
              ********************************************************************************
              An error happened while installing `vllm` in editable mode.
      
              The following steps are recommended to help debug this problem:
      
              - Try to install the project normally, without using the editable mode.
                Does the error still persist?
                (If it does, try fixing the problem before attempting the editable mode).
              - If you are using binary extensions, make sure you have all OS-level
                dependencies installed (e.g. compilers, toolchains, binary libraries, ...).
              - Try the latest version of setuptools (maybe the error was already fixed).
              - If you (or your project dependencies) are using any setuptools extension
                or customization, make sure they support the editable mode.
      
              After following the steps above, if the problem still persists and
              you think this is related to how setuptools handles editable installations,
              please submit a reproducible example
              (see https://stackoverflow.com/help/minimal-reproducible-example) to:
      
                  https://github.com/pypa/setuptools/issues
      
              See https://setuptools.pypa.io/en/latest/userguide/development_mode.html for details.
              ********************************************************************************
      
      !!
        cmd_obj.run()
      Traceback (most recent call last):
        File "/home/czb/miniconda3/envs/test/lib/python3.9/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 353, in <module>
          main()
        File "/home/czb/miniconda3/envs/test/lib/python3.9/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 335, in main
          json_out['return_val'] = hook(**hook_input['kwargs'])
        File "/home/czb/miniconda3/envs/test/lib/python3.9/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 273, in build_editable
          return hook(wheel_directory, config_settings, metadata_directory)
        File "/tmp/pip-build-env-e4vgc7fl/overlay/lib/python3.9/site-packages/setuptools/build_meta.py", line 443, in build_editable
          return self._build_with_temp_dir(
        File "/tmp/pip-build-env-e4vgc7fl/overlay/lib/python3.9/site-packages/setuptools/build_meta.py", line 395, in _build_with_temp_dir
          self.run_setup()
        File "/tmp/pip-build-env-e4vgc7fl/overlay/lib/python3.9/site-packages/setuptools/build_meta.py", line 311, in run_setup
          exec(code, locals())
        File "<string>", line 349, in <module>
        File "/tmp/pip-build-env-e4vgc7fl/overlay/lib/python3.9/site-packages/setuptools/__init__.py", line 104, in setup
          return distutils.core.setup(**attrs)
        File "/tmp/pip-build-env-e4vgc7fl/overlay/lib/python3.9/site-packages/setuptools/_distutils/core.py", line 185, in setup
          return run_commands(dist)
        File "/tmp/pip-build-env-e4vgc7fl/overlay/lib/python3.9/site-packages/setuptools/_distutils/core.py", line 201, in run_commands
          dist.run_commands()
        File "/tmp/pip-build-env-e4vgc7fl/overlay/lib/python3.9/site-packages/setuptools/_distutils/dist.py", line 969, in run_commands
          self.run_command(cmd)
        File "/tmp/pip-build-env-e4vgc7fl/overlay/lib/python3.9/site-packages/setuptools/dist.py", line 967, in run_command
          super().run_command(command)
        File "/tmp/pip-build-env-e4vgc7fl/overlay/lib/python3.9/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
          cmd_obj.run()
        File "/tmp/pip-build-env-e4vgc7fl/overlay/lib/python3.9/site-packages/setuptools/command/editable_wheel.py", line 155, in run
          self._create_wheel_file(bdist_wheel)
        File "/tmp/pip-build-env-e4vgc7fl/overlay/lib/python3.9/site-packages/setuptools/command/editable_wheel.py", line 357, in _create_wheel_file
          files, mapping = self._run_build_commands(dist_name, unpacked, lib, tmp)
        File "/tmp/pip-build-env-e4vgc7fl/overlay/lib/python3.9/site-packages/setuptools/command/editable_wheel.py", line 280, in _run_build_commands
          self._run_build_subcommands()
        File "/tmp/pip-build-env-e4vgc7fl/overlay/lib/python3.9/site-packages/setuptools/command/editable_wheel.py", line 307, in _run_build_subcommands
          self.run_command(name)
        File "/tmp/pip-build-env-e4vgc7fl/overlay/lib/python3.9/site-packages/setuptools/_distutils/cmd.py", line 318, in run_command
          self.distribution.run_command(command)
        File "/tmp/pip-build-env-e4vgc7fl/overlay/lib/python3.9/site-packages/setuptools/dist.py", line 967, in run_command
          super().run_command(command)
        File "/tmp/pip-build-env-e4vgc7fl/overlay/lib/python3.9/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
          cmd_obj.run()
        File "/tmp/pip-build-env-e4vgc7fl/overlay/lib/python3.9/site-packages/setuptools/command/build_ext.py", line 91, in run
          _build_ext.run(self)
        File "/tmp/pip-build-env-e4vgc7fl/overlay/lib/python3.9/site-packages/setuptools/_distutils/command/build_ext.py", line 345, in run
          self.build_extensions()
        File "<string>", line 184, in build_extensions
        File "/home/czb/miniconda3/envs/test/lib/python3.9/subprocess.py", line 373, in check_call
          raise CalledProcessError(retcode, cmd)
      subprocess.CalledProcessError: Command '['cmake', '--build', '.', '--target', '_moe_C', '-j', '6']' returned non-zero exit status 1.
      [end of output]
  
  note: This error originates from a subprocess, and is likely not a problem with pip.
  ERROR: Failed building editable for vllm
Failed to build vllm
ERROR: Could not build wheels for vllm, which is required to install pyproject.toml-based projects

`pip install -e .` fails with the compilation error above.
Alternatively, I can give you access to an online environment with a 2080 Ti so you can try it yourself.
Many of our community members use the 2080 Ti for learning and experiments because this card is affordable and relatively easy to obtain.
We are looking forward to vLLM continuing to support this GPU.

@youkaichao
Member

In case you meet errors, you can change the following line:

set(CUDA_SUPPORTED_ARCHS "7.0;7.5;8.0;8.6;8.9;9.0")

to set(CUDA_SUPPORTED_ARCHS "7.5") in your case.

Please try this.

@chuanzhubin-aiopx
Author

7.5 was already set; it still reports this error.

@youkaichao
Member

7.5 is supported. There is a bug in our new cmake build system that leads to your failure. Hopefully it will be solved this week. Please stay tuned.

@chuanzhubin-aiopx
Author

Eagerly awaiting the fix. Thank you for your support.

@lzcchl

lzcchl commented Apr 1, 2024

Same error, happy to see this! But my device is a GTX 1060, whose compute capability is 6.1; is that supported? Neither vllm 0.3.3 nor 0.4.0 works.

@lzcchl

lzcchl commented Apr 1, 2024

I tested on a GTX 1660 Super, whose compute capability is 7.5: version 0.3.3 works, but version 0.4.0 does not and gives the same error.

@youkaichao
Member

@lzcchl there is one problem: your installation of pytorch is built for cuda 12.1, but your cuda runtime version is 11.5.

CUDA used to build PyTorch: 12.1
CUDA runtime version: 11.5.119

Please update your cuda toolkit version at https://developer.nvidia.com/cuda-toolkit-archive .
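
A quick way to spot such a mismatch (a convenience sketch, not from the thread) is to compare the toolkit's nvcc against the CUDA version torch was built with:

# These two versions should agree, at least in the major version.
nvcc --version | grep release
python -c "import torch; print(torch.version.cuda)"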

@youkaichao
Member

It is recommended to have cuda 12.1, because then you can install torch seamlessly. Otherwise you have to install a specific torch version, which is again very difficult.
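
Installing a CUDA-specific torch build means pointing pip at the matching PyTorch wheel index; a sketch for the torch 2.1.2 used in this thread, assuming you stay on CUDA 11.8 (the cu118 index URL is PyTorch's usual convention, not taken from this thread):

# torch 2.1.2 built against CUDA 11.8, from PyTorch's cu118 wheel index
pip install torch==2.1.2 --index-url https://download.pytorch.org/whl/cu118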

@chuanzhubin
Contributor

It is recommended to have cuda 12.1, because then you can install torch seamlessly. Otherwise you have to install a specific torch version, which is again very difficult.

Thank you for pointing out the issue. I have already upgraded the cuda-toolkit to version 12.1. However, that seems unrelated to the current compilation error; the same error is still reported.

@GennVa

GennVa commented Apr 2, 2024

Same error here with version 0.4.0, using cuda 12.1 and a T4 GPU. It worked with v0.3.3.

@chuanzhubin
Contributor

Same error here with version 0.4.0, using cuda 12.1 and a T4 GPU. It worked with v0.3.3.

The moon rises over the sea; from afar we share this moment. Glad to see users overseas hitting the same error too.

@youkaichao
Member

@chuanzhubin @GennVa we just released v0.4.0.post1 to add support for sm70 and sm75; can you give it a try? It's at https://github.com/vllm-project/vllm/releases/tag/v0.4.0.post1

@chuanzhubin
Contributor

chuanzhubin commented Apr 2, 2024

I'm glad to see the new tag, but there still seems to be an error when running `pip install -e .`.

Requirement already satisfied: rpds-py>=0.7.1 in /home/czb/miniconda3/envs/vllm/lib/python3.9/site-packages (from jsonschema->outlines==0.0.34->vllm==0.4.0.post1) (0.18.0)
Requirement already satisfied: llvmlite<0.43,>=0.42.0dev0 in /home/czb/miniconda3/envs/vllm/lib/python3.9/site-packages (from numba->outlines==0.0.34->vllm==0.4.0.post1) (0.42.0)
Requirement already satisfied: mpmath>=0.19 in /home/czb/miniconda3/envs/vllm/lib/python3.9/site-packages (from sympy->torch==2.1.2->vllm==0.4.0.post1) (1.3.0)
Requirement already satisfied: sniffio>=1.1 in /home/czb/miniconda3/envs/vllm/lib/python3.9/site-packages (from anyio<5,>=3.4.0->starlette<0.37.0,>=0.36.3->fastapi->vllm==0.4.0.post1) (1.3.1)
Requirement already satisfied: exceptiongroup>=1.0.2 in /home/czb/miniconda3/envs/vllm/lib/python3.9/site-packages (from anyio<5,>=3.4.0->starlette<0.37.0,>=0.36.3->fastapi->vllm==0.4.0.post1) (1.2.0)
Using cached outlines-0.0.34-py3-none-any.whl (76 kB)
Using cached tiktoken-0.6.0-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.8 MB)
Using cached cmake-3.29.0.1-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (26.6 MB)
Using cached py_cpuinfo-9.0.0-py3-none-any.whl (22 kB)
Building wheels for collected packages: vllm
  Building editable for vllm (pyproject.toml) ... error
  error: subprocess-exited-with-error
  
  × Building editable for vllm (pyproject.toml) did not run successfully.
  │ exit code: 1
  ╰─> [146 lines of output]
      /tmp/pip-build-env-70jncsaw/overlay/lib/python3.9/site-packages/torch/nn/modules/transformer.py:20: UserWarning: Failed to initialize NumPy: No module named 'numpy' (Triggered internally at ../torch/csrc/utils/tensor_numpy.cpp:84.)
        device: torch.device = torch.device(torch._C._get_default_device()),  # torch.device('cpu'),
      running editable_wheel
      creating /tmp/pip-wheel-o6yooq9i/.tmp-v_7t8uzs/vllm.egg-info
      writing /tmp/pip-wheel-o6yooq9i/.tmp-v_7t8uzs/vllm.egg-info/PKG-INFO
      writing dependency_links to /tmp/pip-wheel-o6yooq9i/.tmp-v_7t8uzs/vllm.egg-info/dependency_links.txt
      writing requirements to /tmp/pip-wheel-o6yooq9i/.tmp-v_7t8uzs/vllm.egg-info/requires.txt
      writing top-level names to /tmp/pip-wheel-o6yooq9i/.tmp-v_7t8uzs/vllm.egg-info/top_level.txt
      writing manifest file '/tmp/pip-wheel-o6yooq9i/.tmp-v_7t8uzs/vllm.egg-info/SOURCES.txt'
      reading manifest file '/tmp/pip-wheel-o6yooq9i/.tmp-v_7t8uzs/vllm.egg-info/SOURCES.txt'
      reading manifest template 'MANIFEST.in'
      adding license file 'LICENSE'
      writing manifest file '/tmp/pip-wheel-o6yooq9i/.tmp-v_7t8uzs/vllm.egg-info/SOURCES.txt'
      creating '/tmp/pip-wheel-o6yooq9i/.tmp-v_7t8uzs/vllm-0.4.0.post1.dist-info'
      creating /tmp/pip-wheel-o6yooq9i/.tmp-v_7t8uzs/vllm-0.4.0.post1.dist-info/WHEEL
      running build_py
      running build_ext
      -- The CXX compiler identification is GNU 11.4.0
      -- Detecting CXX compiler ABI info
      -- Detecting CXX compiler ABI info - done
      -- Check for working CXX compiler: /usr/bin/c++ - skipped
      -- Detecting CXX compile features
      -- Detecting CXX compile features - done
      -- Build type: RelWithDebInfo
      -- Target device: cuda
      -- Found Python: /home/czb/miniconda3/envs/vllm/bin/python (found version "3.9.19") found components: Interpreter Development.Module
      -- Found python matching: /home/czb/miniconda3/envs/vllm/bin/python.
      -- Found CUDA: /usr/local/cuda (found version "12.1")
      -- The CUDA compiler identification is unknown
      CMake Error at /tmp/pip-build-env-70jncsaw/overlay/lib/python3.9/site-packages/cmake/data/share/cmake-3.29/Modules/CMakeDetermineCUDACompiler.cmake:266 (message):
        Failed to detect a default CUDA architecture.
      
      
      
        Compiler output:
      
      Call Stack (most recent call first):
        /tmp/pip-build-env-70jncsaw/overlay/lib/python3.9/site-packages/torch/share/cmake/Caffe2/public/cuda.cmake:47 (enable_language)
        /tmp/pip-build-env-70jncsaw/overlay/lib/python3.9/site-packages/torch/share/cmake/Caffe2/Caffe2Config.cmake:87 (include)
        /tmp/pip-build-env-70jncsaw/overlay/lib/python3.9/site-packages/torch/share/cmake/Torch/TorchConfig.cmake:68 (find_package)
        CMakeLists.txt:67 (find_package)
      
      
      -- Configuring incomplete, errors occurred!
      Traceback (most recent call last):
        File "/tmp/pip-build-env-70jncsaw/overlay/lib/python3.9/site-packages/setuptools/command/editable_wheel.py", line 155, in run
          self._create_wheel_file(bdist_wheel)
        File "/tmp/pip-build-env-70jncsaw/overlay/lib/python3.9/site-packages/setuptools/command/editable_wheel.py", line 357, in _create_wheel_file
          files, mapping = self._run_build_commands(dist_name, unpacked, lib, tmp)
        File "/tmp/pip-build-env-70jncsaw/overlay/lib/python3.9/site-packages/setuptools/command/editable_wheel.py", line 280, in _run_build_commands
          self._run_build_subcommands()
        File "/tmp/pip-build-env-70jncsaw/overlay/lib/python3.9/site-packages/setuptools/command/editable_wheel.py", line 307, in _run_build_subcommands
          self.run_command(name)
        File "/tmp/pip-build-env-70jncsaw/overlay/lib/python3.9/site-packages/setuptools/_distutils/cmd.py", line 318, in run_command
          self.distribution.run_command(command)
        File "/tmp/pip-build-env-70jncsaw/overlay/lib/python3.9/site-packages/setuptools/dist.py", line 967, in run_command
          super().run_command(command)
        File "/tmp/pip-build-env-70jncsaw/overlay/lib/python3.9/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
          cmd_obj.run()
        File "/tmp/pip-build-env-70jncsaw/overlay/lib/python3.9/site-packages/setuptools/command/build_ext.py", line 91, in run
          _build_ext.run(self)
        File "/tmp/pip-build-env-70jncsaw/overlay/lib/python3.9/site-packages/setuptools/_distutils/command/build_ext.py", line 345, in run
          self.build_extensions()
        File "<string>", line 177, in build_extensions
        File "<string>", line 160, in configure
        File "/home/czb/miniconda3/envs/vllm/lib/python3.9/subprocess.py", line 373, in check_call
          raise CalledProcessError(retcode, cmd)
      subprocess.CalledProcessError: Command '['cmake', '/home/czb/git/vllm', '-G', 'Ninja', '-DCMAKE_BUILD_TYPE=RelWithDebInfo', '-DCMAKE_LIBRARY_OUTPUT_DIRECTORY=/tmp/tmpqlt3550g.build-lib/vllm', '-DCMAKE_ARCHIVE_OUTPUT_DIRECTORY=/tmp/tmpmqw6b4g3.build-temp', '-DVLLM_TARGET_DEVICE=cuda', '-DVLLM_PYTHON_EXECUTABLE=/home/czb/miniconda3/envs/vllm/bin/python', '-DNVCC_THREADS=1', '-DCMAKE_JOB_POOL_COMPILE:STRING=compile', '-DCMAKE_JOB_POOLS:STRING=compile=56']' returned non-zero exit status 1.
      /tmp/pip-build-env-70jncsaw/overlay/lib/python3.9/site-packages/setuptools/_distutils/dist.py:988: _DebuggingTips: Problem in editable installation.
      !!
      
              ********************************************************************************
              An error happened while installing `vllm` in editable mode.
      
              The following steps are recommended to help debug this problem:
      
              - Try to install the project normally, without using the editable mode.
                Does the error still persist?
                (If it does, try fixing the problem before attempting the editable mode).
              - If you are using binary extensions, make sure you have all OS-level
                dependencies installed (e.g. compilers, toolchains, binary libraries, ...).
              - Try the latest version of setuptools (maybe the error was already fixed).
              - If you (or your project dependencies) are using any setuptools extension
                or customization, make sure they support the editable mode.
      
              After following the steps above, if the problem still persists and
              you think this is related to how setuptools handles editable installations,
              please submit a reproducible example
              (see https://stackoverflow.com/help/minimal-reproducible-example) to:
      
                  https://github.com/pypa/setuptools/issues
      
              See https://setuptools.pypa.io/en/latest/userguide/development_mode.html for details.
              ********************************************************************************
      
      !!
        cmd_obj.run()
      Traceback (most recent call last):
        File "/home/czb/miniconda3/envs/vllm/lib/python3.9/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 353, in <module>
          main()
        File "/home/czb/miniconda3/envs/vllm/lib/python3.9/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 335, in main
          json_out['return_val'] = hook(**hook_input['kwargs'])
        File "/home/czb/miniconda3/envs/vllm/lib/python3.9/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 273, in build_editable
          return hook(wheel_directory, config_settings, metadata_directory)
        File "/tmp/pip-build-env-70jncsaw/overlay/lib/python3.9/site-packages/setuptools/build_meta.py", line 443, in build_editable
          return self._build_with_temp_dir(
        File "/tmp/pip-build-env-70jncsaw/overlay/lib/python3.9/site-packages/setuptools/build_meta.py", line 395, in _build_with_temp_dir
          self.run_setup()
        File "/tmp/pip-build-env-70jncsaw/overlay/lib/python3.9/site-packages/setuptools/build_meta.py", line 311, in run_setup
          exec(code, locals())
        File "<string>", line 364, in <module>
        File "/tmp/pip-build-env-70jncsaw/overlay/lib/python3.9/site-packages/setuptools/__init__.py", line 104, in setup
          return distutils.core.setup(**attrs)
        File "/tmp/pip-build-env-70jncsaw/overlay/lib/python3.9/site-packages/setuptools/_distutils/core.py", line 185, in setup
          return run_commands(dist)
        File "/tmp/pip-build-env-70jncsaw/overlay/lib/python3.9/site-packages/setuptools/_distutils/core.py", line 201, in run_commands
          dist.run_commands()
        File "/tmp/pip-build-env-70jncsaw/overlay/lib/python3.9/site-packages/setuptools/_distutils/dist.py", line 969, in run_commands
          self.run_command(cmd)
        File "/tmp/pip-build-env-70jncsaw/overlay/lib/python3.9/site-packages/setuptools/dist.py", line 967, in run_command
          super().run_command(command)
        File "/tmp/pip-build-env-70jncsaw/overlay/lib/python3.9/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
          cmd_obj.run()
        File "/tmp/pip-build-env-70jncsaw/overlay/lib/python3.9/site-packages/setuptools/command/editable_wheel.py", line 155, in run
          self._create_wheel_file(bdist_wheel)
        File "/tmp/pip-build-env-70jncsaw/overlay/lib/python3.9/site-packages/setuptools/command/editable_wheel.py", line 357, in _create_wheel_file
          files, mapping = self._run_build_commands(dist_name, unpacked, lib, tmp)
        File "/tmp/pip-build-env-70jncsaw/overlay/lib/python3.9/site-packages/setuptools/command/editable_wheel.py", line 280, in _run_build_commands
          self._run_build_subcommands()
        File "/tmp/pip-build-env-70jncsaw/overlay/lib/python3.9/site-packages/setuptools/command/editable_wheel.py", line 307, in _run_build_subcommands
          self.run_command(name)
        File "/tmp/pip-build-env-70jncsaw/overlay/lib/python3.9/site-packages/setuptools/_distutils/cmd.py", line 318, in run_command
          self.distribution.run_command(command)
        File "/tmp/pip-build-env-70jncsaw/overlay/lib/python3.9/site-packages/setuptools/dist.py", line 967, in run_command
          super().run_command(command)
        File "/tmp/pip-build-env-70jncsaw/overlay/lib/python3.9/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
          cmd_obj.run()
        File "/tmp/pip-build-env-70jncsaw/overlay/lib/python3.9/site-packages/setuptools/command/build_ext.py", line 91, in run
          _build_ext.run(self)
        File "/tmp/pip-build-env-70jncsaw/overlay/lib/python3.9/site-packages/setuptools/_distutils/command/build_ext.py", line 345, in run
          self.build_extensions()
        File "<string>", line 177, in build_extensions
        File "<string>", line 160, in configure
        File "/home/czb/miniconda3/envs/vllm/lib/python3.9/subprocess.py", line 373, in check_call
          raise CalledProcessError(retcode, cmd)
      subprocess.CalledProcessError: Command '['cmake', '/home/czb/git/vllm', '-G', 'Ninja', '-DCMAKE_BUILD_TYPE=RelWithDebInfo', '-DCMAKE_LIBRARY_OUTPUT_DIRECTORY=/tmp/tmpqlt3550g.build-lib/vllm', '-DCMAKE_ARCHIVE_OUTPUT_DIRECTORY=/tmp/tmpmqw6b4g3.build-temp', '-DVLLM_TARGET_DEVICE=cuda', '-DVLLM_PYTHON_EXECUTABLE=/home/czb/miniconda3/envs/vllm/bin/python', '-DNVCC_THREADS=1', '-DCMAKE_JOB_POOL_COMPILE:STRING=compile', '-DCMAKE_JOB_POOLS:STRING=compile=56']' returned non-zero exit status 1.
      [end of output]
  
  note: This error originates from a subprocess, and is likely not a problem with pip.
  ERROR: Failed building editable for vllm
Failed to build vllm
ERROR: Could not build wheels for vllm, which is required to install pyproject.toml-based projects

@youkaichao
Member

You can download and use the attached wheel, no need to install from source.

Your build fails because your environment is not set up correctly. See https://docs.vllm.ai/en/latest/getting_started/installation.html#build-from-source :

If you don’t want to use docker, it is recommended to have a full installation of CUDA Toolkit. You can download and install it from the official website. After installation, set the environment variable CUDA_HOME to the installation path of CUDA Toolkit, and make sure that the nvcc compiler is in your PATH, e.g.:
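
The docs' snippet itself is not quoted above; a sketch of that setup, assuming a default CUDA 12.1 toolkit location:

# Make the full CUDA Toolkit visible to the build and put nvcc on PATH.
export CUDA_HOME=/usr/local/cuda-12.1
export PATH="${CUDA_HOME}/bin:${PATH}"
nvcc --version   # should now report the 12.1 toolkit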

@chuanzhubin
Contributor

I installed [vllm-0.4.0.post1+cu118-cp39-cp39-manylinux1_x86_64.whl](https://github.com/vllm-project/vllm/releases/download/v0.4.0.post1/vllm-0.4.0.post1+cu118-cp39-cp39-manylinux1_x86_64.whl), but the output is garbled during inference.

@chuanzhubin
Contributor

python -c 'import torch; print(torch.version.cuda)'
12.1

I guess I need to downgrade CUDA to 11.8.

@chuanzhubin
Contributor

After experimenting, inference under both CUDA 11.8 and CUDA 12.1 produces garbled text. A bit frustrating.
Still looking forward to finally solving this problem.

@youkaichao
Member

When you install vllm with cuda 11.8, you need to specify the pytorch index so that you install the proper version of pytorch, as described in the documentation:

# Install vLLM with CUDA 11.8.
export VLLM_VERSION=0.4.0
export PYTHON_VERSION=39
pip install https://github.com/vllm-project/vllm/releases/download/v${VLLM_VERSION}/vllm-${VLLM_VERSION}+cu118-cp${PYTHON_VERSION}-cp${PYTHON_VERSION}-manylinux1_x86_64.whl --extra-index-url https://download.pytorch.org/whl/cu118

Note that the last line is very long, and it ends with --extra-index-url https://download.pytorch.org/whl/cu118.
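
As a quick post-install sanity check (a sketch, not from the thread), you can confirm the cu118 builds were actually picked up:

python -c "import torch; print(torch.version.cuda)"   # expect: 11.8
python -c "import vllm; print(vllm.__version__)"      # expect: 0.4.0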

If your model's output is not as expected, that should be a separate issue, and you need to file another issue with details on your environment and model.

@chuanzhubin
Contributor

Today I did not give up, finally compiled and installed successfully, and got qwen-moe running nicely.

@chuanzhubin
Contributor

chuanzhubin commented Apr 3, 2024

Installing the NVIDIA driver and CUDA Toolkit

1. Install the NVIDIA driver

  • Version: 530
  • Installation method: via Ubuntu's graphical interface.

2. Install the CUDA Toolkit

  • Version: 12.1
  • Installation method: download and install from NVIDIA's official site. The CUDA Toolkit ships with an NVIDIA driver, but I had already installed one manually, so I unchecked the driver during installation.
  • Download link: CUDA 12.1.1 Download Archive

Configuring environment variables

Add the CUDA Toolkit to PATH

To make sure the system can find the nvcc compiler, add its path to the PATH environment variable:

export PATH=$PATH:/usr/local/cuda-12.1/bin

Make sure to replace /usr/local/cuda-12.1/bin with the bin directory of your actual CUDA Toolkit installation path.

Installing vLLM

1. Check out the specific vLLM version

  • Version: 0.4.0.post1
  • How: via the release tag page on GitHub.
  • GitHub link: vllm 0.4.0.post1 Release

2. Install vLLM

Install with pip in editable mode:

pip install -e .

Optional step: modify vLLM's CMakeLists.txt

If you run into problems during installation, you may need to modify some settings in vllm/CMakeLists.txt. One possible change is to restrict the supported CUDA architectures:

set(CUDA_SUPPORTED_ARCHS "7.5")

Note that this change may not be necessary; it depends on your system configuration and needs.
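
Condensing the write-up above into one sequence (a sketch; the driver and toolkit installs are assumed done, and paths follow the defaults mentioned above):

# Assumes NVIDIA driver 530 and the CUDA 12.1 toolkit are already installed.
export PATH=$PATH:/usr/local/cuda-12.1/bin
nvcc --version                    # verify the 12.1 toolkit is on PATH

git clone https://github.com/vllm-project/vllm.git
cd vllm
git checkout v0.4.0.post1         # the release tag with sm70/sm75 support
# Optional, only if the build fails: in CMakeLists.txt set
#   set(CUDA_SUPPORTED_ARCHS "7.5")
pip install -e .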

@chuanzhubin
Contributor

@youkaichao Thank you for your professional and patient help

@GennVa

GennVa commented Apr 3, 2024

@chuanzhubin @GennVa we just released v0.4.0.post1 to add support for sm70 and sm75; can you give it a try? It's at https://github.com/vllm-project/vllm/releases/tag/v0.4.0.post1

@youkaichao v0.4.0.post1 working for me, thanks!
