Build failure due to CUDA version mismatch #129

WoosukKwon · 2023-05-26T04:35:42Z

I failed to build the system with the latest NVIDIA PyTorch docker image. The reason is PyTorch installed by pip is built with CUDA 11.7 while the container uses CUDA 12.1.

RuntimeError:
The detected CUDA version (12.1) mismatches the version that was used to compile
PyTorch (11.7). Please make sure to use the same CUDA versions.

The text was updated successfully, but these errors were encountered:

Joejoequ · 2023-06-29T03:03:08Z

Same Issue Here. It looks like it did not use CUDA 11.8 in the conda environmnent.
CUDA 11.8 Python 3.8.16 Nvidia A100 80G ubuntu

File "/tmp/pip-build-env-_5k66uxz/overlay/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 387, in _check_cuda_version
raise RuntimeError(CUDA_MISMATCH_MESSAGE.format(cuda_str_version, torch.version.cuda))
RuntimeError:
The detected CUDA version (12.0) mismatches the version that was used to compile
PyTorch (11.7). Please make sure to use the same CUDA versions.

(vllm) x@x:~/xx$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Wed_Sep_21_10:33:58_PDT_2022
Cuda compilation tools, release 11.8, V11.8.89
Build cuda_11.8.r11.8/compiler.31833905_0

WoosukKwon · 2023-06-29T06:21:37Z

@Joejoequ Thanks for reporting it! I think in your case, the problem can be easily solved by installing CUDA 11.8 version of PyTorch:

pip3 install torch --index-url https://download.pytorch.org/whl/cu118

Joejoequ · 2023-06-29T08:14:35Z

@WoosukKwon Thanks for the quick response! But the issue still exists after using the command above. It is weird that the detected version is 12.0 since the cudatookit installed in the env is 11.8. Looks like it will access Cuda outside the env.

Joejoequ · 2023-07-06T02:43:13Z

@WoosukKwon Thanks for the quick response! But the issue still exists after using the command above. It is weird that the detected version is 12.0 since the cudatookit installed in the env is 11.8. Looks like it will access Cuda outside the env.

Finally i installed successfully by changing module env and use "module load" on linux server.

mosheduminer · 2023-08-04T18:34:32Z

I have the same problem. Pytorch using cuda 11.8 installed, yet I get this error when installing.

The detected CUDA version (12.1) mismatches the version that was used to compile
      PyTorch (11.8). Please make sure to use the same CUDA versions.

DavidPeleg6 · 2023-08-06T12:14:05Z

im also having the same issue:
nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Tue_Feb__7_19:32:13_PST_2023
Cuda compilation tools, release 12.1, V12.1.66
Build cuda_12.1.r12.1/compiler.32415258_0

and i have installed nightly version of pytorch with cuda 12.1 support

antonpolishko · 2023-08-27T09:15:05Z

nvcr.io/nvidia/pytorch:22.12-py3 image is the last one with CUDA 11.8 according to compatibility matrix. Then images switched to CUDA 12+

Ikkyu321 · 2023-11-02T08:11:36Z

@WoosukKwon Thanks for the quick response! But the issue still exists after using the command above. It is weird that the detected version is 12.0 since the cudatookit installed in the env is 11.8. Looks like it will access Cuda outside the env.

Finally i installed successfully by changing module env and use "module load" on linux server.

@Joejoequ I got the same problem. Can you show me how to solve this？

valentin-fngr · 2023-11-02T09:48:39Z

I have the same problem. Pytorch using cuda 11.8 installed, yet I get this error when installing.
The detected CUDA version (12.1) mismatches the version that was used to compile
      PyTorch (11.8). Please make sure to use the same CUDA versions.

Option 1 : You have cuda 12.1 therefore you should simply uninstall the current binaries of pytorch that you have and then reinstall it using :

pip3 install torch torchvision torchaudio

Option 2 : If you do not want to use the cuda 12.1 that you have installed, you can use another version of cuda (11.7, 11.8, ...).
First you need to uninstall your current cuda (https://stackoverflow.com/a/56827564) and then select the one you want on the cuda install website

xunfeng1980 · 2023-11-04T04:17:29Z

same problem:
The detected CUDA version (12.3) mismatches the version that was used to compile
PyTorch (11.7). Please make sure to use the same CUDA versions.

Kkkassini · 2023-11-07T13:12:11Z

Possible to use cuda 11.7? there's other services that require 11.7

Lumingous · 2023-11-08T13:11:55Z

I'm having the same problem. I've re-installed my pytorch to support cuda 11.8. Don't know why still shows this error

jaesuny · 2023-11-10T04:59:07Z

Removing pyproject.toml may be a solution.
In my case, the build system was using the version of pytorch from pyproject.toml rather than the pytorch already installed.

xunfeng1980 · 2023-11-11T08:32:51Z

Removing pyproject.toml may be a solution. In my case, the build system was using the version of pytorch from pyproject.toml rather than the pytorch already installed.

think,it work for me.

StevenZ-G · 2023-11-19T17:57:21Z

RuntimeError:
The detected CUDA version (12.3) mismatches the version that was used to compile
PyTorch (11.6). Please make sure to use the same CUDA versions.

quanhephia · 2023-11-20T11:41:17Z

RuntimeError: The detected CUDA version (12.3) mismatches the version that was used to compile PyTorch (11.6). Please make sure to use the same CUDA versions.

I resolved this error by downgrading the version of vllm from 0.2.2 to 0.2.1

Mruduldhawley · 2023-12-05T11:09:37Z

@WoosukKwon Thanks for the quick response! But the issue still exists after using the command above. It is weird that the detected version is 12.0 since the cudatookit installed in the env is 11.8. Looks like it will access Cuda outside the env.

Finally i installed successfully by changing module env and use "module load" on linux server.

please elaborate.

0-hero · 2023-12-10T16:59:11Z

Changed env to run with CUDA 11.7
Install vllm with pip install vllm Without this step I face another error
Then install from source with pip install -e .

This solved it for me

DuAooo · 2023-12-25T15:55:00Z

For my problem, I found the the code used /usr/bin/nvcc which will print a different version
The right nvcc is in /usr/local/cuda/bin,
So I delete /usr/bin/nvcc, now my code works fine.

4daJKong · 2024-01-12T06:43:13Z

Possible to use cuda 11.7? there's other services that require 11.7

Is it possible to use vllm in cuda 11.7 ?
I tried pip install vllm in an conda environment, but still cannot install successfully.

DrAlexLiu · 2024-01-25T15:23:33Z

I solve this by running this:
conda install nvidia/label/cuda-11.8.0::cuda-nvcc

looks like your conda env has no nvcc installed, and it calls your system-based nvcc, which is not 11.8 or the cuda version you installed.

Daishijun · 2024-01-31T11:51:46Z

Removing pyproject.toml may be a solution. In my case, the build system was using the version of pytorch from pyproject.toml rather than the pytorch already installed.

Thanks! You are a true hero.

hmellor · 2024-04-04T08:20:16Z

@WoosukKwon is this resolved now?

hmellor · 2024-04-20T12:10:10Z

Closing because the build system has changed dramatically since this was opened

SUMMARY: `github-action-benchmark` action, needs a JSON file with metrics for reporting. It throws an error when the JSON is empty or doesn't have any data. Bug: On the `remote push` benchmark job, we produce the necessary files, but they don't have any JSON data. Fix: Make the logging script skip file creation if the file is going to be empty. In the GHA side, add logic to skip processing when a desired file does not exist. Additional changes: - Rename `GHABenchmarkToolName` -> `BenchmarkMetricType` - Add a `Observation` BenchmarkMetricType - This could be useful in the near future when we discover volatile metrics. TEST PLAN: Jobs on this PR. Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>

oJro · 2024-07-25T06:36:48Z

Removing pyproject.toml may be a solution. In my case, the build system was using the version of pytorch from pyproject.toml rather than the pytorch already installed.

how to remove pyproject.toml?

@iotamudelta

* Fixed single GPU issue without setting up mp. Added toggles for server request batching parameters (vllm-project#114) * Fixed single GPU issue without setting up mp. Added toggles for server request batching parameters * Adding HTTP headers * Add distributed executor backend to benchmark scripts (vllm-project#118) * Add weight padding for moe (vllm-project#119) * add weight padding for moe * enable padding by default * fix linter * fix linter * fix linter * using envs.py * fix linter * [BugFix] Fix navi build after many custom for MI kernels added (vllm-project#116) * fix navi build * Created dummy kernels of unsupported on Navi to avoid function not found crashes at runtime * replacing ifdefs on host code with those on kernels * refactoring code to avoid unsupported call on Navi * syntactic change * import statements fix * moving env variables to envs.py * style fixes * cosmetic changes for isort * remved extra include * moving use_skinny to be member --------- Co-authored-by: lcskrishna <lollachaitanya@gmail.com> Co-authored-by: maleksan85 <maleksan@amd.com> Co-authored-by: Gregory Shtrasberg <156009573+gshtras@users.noreply.github.com> * add emtpy_cache() after each padding (vllm-project#120) * [FIX] Gradlib OOM on Navi and sometimes on MI (vllm-project#124) * add memory clean up after every shape and parameter to reduce cache invalidation buffers * small typo * syntax change --------- Co-authored-by: maleksan85 <maleksan@amd.com> * save shape when fp8 solution not found (vllm-project#123) Co-authored-by: Gregory Shtrasberg <156009573+gshtras@users.noreply.github.com> * Fix unit test for moe by adding padding (vllm-project#128) * fix test_moe * fix linter * Llama3.1 (vllm-project#129) * Add support for a rope extension method (vllm-project#6553) * [BugFix] Fix RoPE error in Llama 3.1 (vllm-project#6693) --------- Co-authored-by: Simon Mo <simon.mo@hey.com> Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu> * chat/completions endpoint (vllm-project#121) * Initial implementation of chat/completions endpoint and its streaming variant * Reusing datatypes from the openai entrypoints * Response role from arg * Added models endpoint and model validation from the request * Optimize custom all reduce (vllm-project#130) * First version * Revert error. While there, add missing finalize. * Use the correct defaults for ROCm. Increase sampling area to capture crossover. * Scope end_sync as well. * Guard only volatile keyword for ifndef USE_ROCM * Document crossover * Add BF16 support to custom PA (vllm-project#133) * tightened atol for custom PA; enable supported head size, block sizes in testing * update num_blocks and num_iters in benchmark PA to realistic settings * move to generic b16 type * bf16 first port * enabled all bf16 tests, set atol for bf16 * enable custom PA for bf16 as well as block size 32 and head size 64 * fix cast to zero in custom PA reduce * py linter fixes * clang format fixes * div round up clang-format --------- Co-authored-by: Charlie Fu <Charlie.Fu@amd.com> Co-authored-by: Gregory Shtrasberg <156009573+gshtras@users.noreply.github.com> * Making check for output match in original types. It saves some memory. (vllm-project#135) Co-authored-by: maleksan85 <maleksan@amd.com> * Make CAR ROCm 6.1 compatible. (vllm-project#137) * remove scoping * while there fix a typo * while there remove unused variable * Car revert (vllm-project#140) * Per @iotamudelta suggestion until the deadlocks issue is better understood Revert "Make CAR ROCm 6.1 compatible. (vllm-project#137)" This reverts commit 4d2dda6. * Per @iotamudelta suggestion until the deadlocks issue is better understood Revert "Optimize custom all reduce (vllm-project#130)" This reverts commit 636ff01. --------- Co-authored-by: Gregory Shtrasberg <156009573+gshtras@users.noreply.github.com> Co-authored-by: Matt Wong <156021403+mawong-amd@users.noreply.github.com> Co-authored-by: Charlie Fu <Charlie.Fu@amd.com> Co-authored-by: Aleksandr Malyshev <164964928+maleksan85@users.noreply.github.com> Co-authored-by: lcskrishna <lollachaitanya@gmail.com> Co-authored-by: maleksan85 <maleksan@amd.com> Co-authored-by: Simon Mo <simon.mo@hey.com> Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu> Co-authored-by: iotamudelta <dieterich@ogolem.org> Co-authored-by: sanyalington <shomy.sanyal@amd.com>

* Re-enable FusedRoPE for Gaudi1 * add fallback impl of rope

@iotamudelta

* Fixed single GPU issue without setting up mp. Added toggles for server request batching parameters (vllm-project#114) * Fixed single GPU issue without setting up mp. Added toggles for server request batching parameters * Adding HTTP headers * Add distributed executor backend to benchmark scripts (vllm-project#118) * Add weight padding for moe (vllm-project#119) * add weight padding for moe * enable padding by default * fix linter * fix linter * fix linter * using envs.py * fix linter * [BugFix] Fix navi build after many custom for MI kernels added (vllm-project#116) * fix navi build * Created dummy kernels of unsupported on Navi to avoid function not found crashes at runtime * replacing ifdefs on host code with those on kernels * refactoring code to avoid unsupported call on Navi * syntactic change * import statements fix * moving env variables to envs.py * style fixes * cosmetic changes for isort * remved extra include * moving use_skinny to be member --------- Co-authored-by: lcskrishna <lollachaitanya@gmail.com> Co-authored-by: maleksan85 <maleksan@amd.com> Co-authored-by: Gregory Shtrasberg <156009573+gshtras@users.noreply.github.com> * add emtpy_cache() after each padding (vllm-project#120) * [FIX] Gradlib OOM on Navi and sometimes on MI (vllm-project#124) * add memory clean up after every shape and parameter to reduce cache invalidation buffers * small typo * syntax change --------- Co-authored-by: maleksan85 <maleksan@amd.com> * save shape when fp8 solution not found (vllm-project#123) Co-authored-by: Gregory Shtrasberg <156009573+gshtras@users.noreply.github.com> * Fix unit test for moe by adding padding (vllm-project#128) * fix test_moe * fix linter * Llama3.1 (vllm-project#129) * Add support for a rope extension method (vllm-project#6553) * [BugFix] Fix RoPE error in Llama 3.1 (vllm-project#6693) --------- Co-authored-by: Simon Mo <simon.mo@hey.com> Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu> * chat/completions endpoint (vllm-project#121) * Initial implementation of chat/completions endpoint and its streaming variant * Reusing datatypes from the openai entrypoints * Response role from arg * Added models endpoint and model validation from the request * Optimize custom all reduce (vllm-project#130) * First version * Revert error. While there, add missing finalize. * Use the correct defaults for ROCm. Increase sampling area to capture crossover. * Scope end_sync as well. * Guard only volatile keyword for ifndef USE_ROCM * Document crossover * Add BF16 support to custom PA (vllm-project#133) * tightened atol for custom PA; enable supported head size, block sizes in testing * update num_blocks and num_iters in benchmark PA to realistic settings * move to generic b16 type * bf16 first port * enabled all bf16 tests, set atol for bf16 * enable custom PA for bf16 as well as block size 32 and head size 64 * fix cast to zero in custom PA reduce * py linter fixes * clang format fixes * div round up clang-format --------- Co-authored-by: Charlie Fu <Charlie.Fu@amd.com> Co-authored-by: Gregory Shtrasberg <156009573+gshtras@users.noreply.github.com> * Making check for output match in original types. It saves some memory. (vllm-project#135) Co-authored-by: maleksan85 <maleksan@amd.com> * Make CAR ROCm 6.1 compatible. (vllm-project#137) * remove scoping * while there fix a typo * while there remove unused variable * Car revert (vllm-project#140) * Per @iotamudelta suggestion until the deadlocks issue is better understood Revert "Make CAR ROCm 6.1 compatible. (vllm-project#137)" This reverts commit 4d2dda6. * Per @iotamudelta suggestion until the deadlocks issue is better understood Revert "Optimize custom all reduce (vllm-project#130)" This reverts commit 636ff01. * Using the correct datatypes for streaming non-chat completions (vllm-project#134) * Adding UNREACHABLE_CODE macro for non MI300 and MI250 cards (vllm-project#138) * Adding UNREACHABLE_CODE macro * clang format fixes * clang formatting fix * minor updates in syntax * clang format update * clang format fix one more try * clang format one more try * clang format fix one more try --------- Co-authored-by: Aleksandr Malyshev <maleksan@amd.com> * gfx90a typo fix (vllm-project#142) Co-authored-by: maleksan85 <maleksan@amd.com> * wvsplitk templatized and better tuned for MI300 (vllm-project#132) * improvements to wvSpltK * wvsplt gemm; better handle MI300 and large A[] sizes * lint fix * Adjustments to better handle small weights in TP8. * early-out bug fix * better wave load balancing in wvSplt * add missing skip for wvsplt_big * Bug fix for wvSplt_big in load balancing at M4, lint fix. * [Bugfix] Dockerfile.rocm (vllm-project#141) * Dockerfile.rocm bug fix * naming preference --------- Co-authored-by: Gregory Shtrasberg <156009573+gshtras@users.noreply.github.com> * Update test-template.j2 (vllm-project#145) * Adding Triton implementations awq_dequantize and awq_gemm to ROCm (vllm-project#136) * basic support for AWQ added * awq_dequantize implementation in Triton * awq_gemm implementation in Triton * unit tests in tests/kernels/test_awq_triton.py --------- Co-authored-by: Gregory Shtrasberg <156009573+gshtras@users.noreply.github.com> Co-authored-by: Matt Wong <156021403+mawong-amd@users.noreply.github.com> Co-authored-by: Charlie Fu <Charlie.Fu@amd.com> Co-authored-by: Aleksandr Malyshev <164964928+maleksan85@users.noreply.github.com> Co-authored-by: lcskrishna <lollachaitanya@gmail.com> Co-authored-by: maleksan85 <maleksan@amd.com> Co-authored-by: Simon Mo <simon.mo@hey.com> Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu> Co-authored-by: iotamudelta <dieterich@ogolem.org> Co-authored-by: sanyalington <shomy.sanyal@amd.com> Co-authored-by: Hashem Hashemi <159079214+amd-hhashemi@users.noreply.github.com> Co-authored-by: Zachary Streeter <90640993+zstreet87@users.noreply.github.com> Co-authored-by: omkar kakarparthi <75638701+okakarpa@users.noreply.github.com> Co-authored-by: rasmith <Randall.Smith@amd.com>

* Add support for a rope extension method (vllm-project#6553) * [BugFix] Fix RoPE error in Llama 3.1 (vllm-project#6693) --------- Co-authored-by: Simon Mo <simon.mo@hey.com> Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>

WoosukKwon self-assigned this May 26, 2023

WoosukKwon added the installation Installation problems label May 26, 2023

zhuohan123 mentioned this issue Jun 25, 2023

[Roadmap] vLLM Development Roadmap: H2 2023 #244

Closed

76 tasks

jaesuny mentioned this issue Nov 11, 2023

Docker build fails due to CUDA version mismatch #1597

Closed

jaesuny mentioned this issue Nov 15, 2023

The detected CUDA version (11.8) mismatches the version that was used to compile PyTorch (12.1). Please make sure to use the same CUDA versions. #1548

Closed

SuperBruceJia mentioned this issue Dec 6, 2023

KeyError: 'base_model.model.model.layers.0.self_attn.qkv_proj.lora_A.weight' #1625

Closed

ybgdgh mentioned this issue Jan 1, 2024

fail to install GroundingDINO IDEA-Research/Grounded-Segment-Anything#412

Open

jaesuny mentioned this issue Mar 28, 2024

[Bug]: vllm/vllm/_C.cpython-310-x86_64-linux-gnu.so: undefined symbol error #3630

Closed

hmellor closed this as completed Apr 20, 2024

zigzagcai mentioned this issue Aug 15, 2024

[Feature] Support variable-length sequences for mamba block state-spaces/mamba#244

Open

Xaenalt pushed a commit to Xaenalt/vllm that referenced this issue Aug 15, 2024

Refactor & re-enable HPU RoPE for Gaudi1 (vllm-project#129)

19993b7

* Re-enable FusedRoPE for Gaudi1 * add fallback impl of rope

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Build failure due to CUDA version mismatch #129

Build failure due to CUDA version mismatch #129

WoosukKwon commented May 26, 2023

Joejoequ commented Jun 29, 2023 •

edited

Loading

WoosukKwon commented Jun 29, 2023

Joejoequ commented Jun 29, 2023

Joejoequ commented Jul 6, 2023

mosheduminer commented Aug 4, 2023

DavidPeleg6 commented Aug 6, 2023

antonpolishko commented Aug 27, 2023 •

edited

Loading

Ikkyu321 commented Nov 2, 2023 •

edited

Loading

valentin-fngr commented Nov 2, 2023

xunfeng1980 commented Nov 4, 2023

Kkkassini commented Nov 7, 2023

Lumingous commented Nov 8, 2023

jaesuny commented Nov 10, 2023

xunfeng1980 commented Nov 11, 2023

StevenZ-G commented Nov 19, 2023

quanhephia commented Nov 20, 2023

Mruduldhawley commented Dec 5, 2023

0-hero commented Dec 10, 2023

DuAooo commented Dec 25, 2023

4daJKong commented Jan 12, 2024

DrAlexLiu commented Jan 25, 2024

Daishijun commented Jan 31, 2024

hmellor commented Apr 4, 2024

hmellor commented Apr 20, 2024

oJro commented Jul 25, 2024 •

edited

Loading

Build failure due to CUDA version mismatch #129

Build failure due to CUDA version mismatch #129

Comments

WoosukKwon commented May 26, 2023

Joejoequ commented Jun 29, 2023 • edited Loading

WoosukKwon commented Jun 29, 2023

Joejoequ commented Jun 29, 2023

Joejoequ commented Jul 6, 2023

mosheduminer commented Aug 4, 2023

DavidPeleg6 commented Aug 6, 2023

antonpolishko commented Aug 27, 2023 • edited Loading

Ikkyu321 commented Nov 2, 2023 • edited Loading

valentin-fngr commented Nov 2, 2023

xunfeng1980 commented Nov 4, 2023

Kkkassini commented Nov 7, 2023

Lumingous commented Nov 8, 2023

jaesuny commented Nov 10, 2023

xunfeng1980 commented Nov 11, 2023

StevenZ-G commented Nov 19, 2023

quanhephia commented Nov 20, 2023

Mruduldhawley commented Dec 5, 2023

0-hero commented Dec 10, 2023

DuAooo commented Dec 25, 2023

4daJKong commented Jan 12, 2024

DrAlexLiu commented Jan 25, 2024

Daishijun commented Jan 31, 2024

hmellor commented Apr 4, 2024

hmellor commented Apr 20, 2024

oJro commented Jul 25, 2024 • edited Loading

Joejoequ commented Jun 29, 2023 •

edited

Loading

antonpolishko commented Aug 27, 2023 •

edited

Loading

Ikkyu321 commented Nov 2, 2023 •

edited

Loading

oJro commented Jul 25, 2024 •

edited

Loading