Upgrade to CUDA 12 #1527

Merged 5 commits into main from upgrade-to-cuda-12 on Nov 8, 2023

Conversation

zhuohan123 (Collaborator)

I can successfully build vllm with:

CUDA 12.2
PyTorch 2.1.0 (built with CUDA 12.1)
xformers 0.0.22.post7

Note that vllm cannot be successfully built on CUDA 12.1 because of this issue. This can be solved after PyTorch upgrades its pybind dependency.
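
For anyone reproducing this setup, a quick toolchain sanity check (a minimal sketch for illustration, assuming a CUDA build of PyTorch is installed and nvcc is on PATH; not part of the PR):

import subprocess

import torch

# CUDA version this torch wheel was built with (e.g. "12.1").
print("torch", torch.__version__, "built with CUDA", torch.version.cuda)

# Local CUDA toolkit that will compile vLLM's extensions.
print(subprocess.run(["nvcc", "--version"], capture_output=True, text=True).stdout)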

Tostino (Contributor) commented Nov 1, 2023

I will double-check, but I'm pretty sure I built with 12.1 last week.

Will edit when I have a chance to look at my changes.

Edit:

(venv) user@pop-os:/media/user/Data/IdeaProjects/vllm$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Mon_Apr__3_17:16:06_PDT_2023
Cuda compilation tools, release 12.1, V12.1.105
Build cuda_12.1.r12.1/compiler.32688072_0
(venv) user@pop-os:/media/user/Data/IdeaProjects/vllm$ pip freeze
aiosignal==1.3.1
anyio==3.7.1
astroid==2.6.6
attrs==23.1.0
certifi==2023.7.22
charset-normalizer==3.3.1
click==8.1.7
cmake==3.27.7
dill==0.3.7
exceptiongroup==1.1.3
fastapi==0.104.0
filelock==3.12.4
frozenlist==1.4.0
fsspec==2023.10.0
h11==0.14.0
httptools==0.6.1
huggingface-hub==0.17.3
idna==3.4
importlib-metadata==6.8.0
iniconfig==2.0.0
isort==5.12.0
Jinja2==3.1.2
jsonschema==4.19.1
jsonschema-specifications==2023.7.1
lazy-object-proxy==1.9.0
lit==17.0.3
MarkupSafe==2.1.3
mccabe==0.6.1
mpmath==1.3.0
msgpack==1.0.7
mypy==0.991
mypy-extensions==1.0.0
networkx==3.2
ninja==1.11.1.1
numpy==1.26.1
nvidia-cublas-cu11==11.10.3.66
nvidia-cublas-cu12==12.1.3.1
nvidia-cuda-cupti-cu11==11.7.101
nvidia-cuda-cupti-cu12==12.1.105
nvidia-cuda-nvrtc-cu11==11.7.99
nvidia-cuda-nvrtc-cu12==12.1.105
nvidia-cuda-runtime-cu11==11.7.99
nvidia-cuda-runtime-cu12==12.1.105
nvidia-cudnn-cu11==8.5.0.96
nvidia-cudnn-cu12==8.9.2.26
nvidia-cufft-cu11==10.9.0.58
nvidia-cufft-cu12==11.0.2.54
nvidia-curand-cu11==10.2.10.91
nvidia-curand-cu12==10.3.2.106
nvidia-cusolver-cu11==11.4.0.1
nvidia-cusolver-cu12==11.4.5.107
nvidia-cusparse-cu11==11.7.4.91
nvidia-cusparse-cu12==12.1.0.106
nvidia-nccl-cu11==2.14.3
nvidia-nccl-cu12==2.18.1
nvidia-nvjitlink-cu12==12.3.52
nvidia-nvtx-cu11==11.7.91
nvidia-nvtx-cu12==12.1.105
packaging==23.2
pandas==2.1.2
platformdirs==3.11.0
pluggy==1.3.0
protobuf==4.24.4
psutil==5.9.6
py==1.11.0
pyarrow==13.0.0
pydantic==1.10.13
pylint==2.8.2
pytest==7.4.3
pytest-asyncio==0.21.1
pytest-forked==1.6.0
python-dateutil==2.8.2
python-dotenv==1.0.0
pytz==2023.3.post1
PyYAML==6.0.1
ray==2.7.1
referencing==0.30.2
regex==2023.10.3
requests==2.31.0
rpds-py==0.10.6
safetensors==0.4.0
sentencepiece==0.1.99
six==1.16.0
sniffio==1.3.0
starlette==0.27.0
sympy==1.12
tokenizers==0.14.1
toml==0.10.2
tomli==2.0.1
tomlkit==0.12.1
torch==2.1.0
tqdm==4.66.1
transformers==4.34.1
triton==2.1.0
types-PyYAML==6.0.12.12
types-requests==2.31.0.10
types-setuptools==68.2.0.0
typing_extensions==4.8.0
tzdata==2023.3
urllib3==2.0.7
uvicorn==0.23.2
uvloop==0.19.0
-e git+https://github.com/Tostino/vllm.git@11409d16190f420a7cb6254b4cea8b4594d6ec4e#egg=vllm
watchfiles==0.21.0
websockets==12.0
wrapt==1.12.1
xformers==0.0.22.post7
yapf==0.32.0
zipp==3.17.0
(venv) user@pop-os:/media/user/Data/IdeaProjects/vllm$ 


WoosukKwon mentioned this pull request on Nov 2, 2023
WoosukKwon (Collaborator)

I think we have to check the following items before the release:

  1. Does our GitHub workflow work for CUDA 12.2?
  2. If we compile our package with CUDA 12.2, does it work when the user's NVIDIA driver only supports CUDA < 12.2 (e.g., on Colab)? A quick check is sketched below.
  3. Bump up the NVIDIA PyTorch Docker image version.
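
A minimal sketch of check (2), assuming the nvidia-ml-py (pynvml) package is available and PyTorch is a CUDA build; this is illustrative only, not part of this PR:

import pynvml
import torch

# CUDA version the installed wheel was compiled against, e.g. "12.1".
major, minor = (int(x) for x in torch.version.cuda.split(".")[:2])

# Highest CUDA version the local driver supports, encoded as 1000*major + 10*minor.
pynvml.nvmlInit()
driver = pynvml.nvmlSystemGetCudaDriverVersion()
pynvml.nvmlShutdown()

if driver < major * 1000 + minor * 10:
    print("Driver is older than the build toolkit; relying on CUDA forward compatibility.")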

casper-hansen (Contributor) commented Nov 7, 2023

Hi vLLM maintainers. I suggest maintaining compatibility with torch 2.0.1 and CUDA 11.8.0 for a few more releases. This would work by creating two versions of the wheel:

  • PyPI: torch 2.1.0 and CUDA 12.1.1 wheel
  • GitHub release: an additional torch 2.0.1 and CUDA 11.8.0 wheel

The idea is that people who do not yet have the latest CUDA version can still install vLLM from the GitHub release. This can be achieved with the following:

import os

import torch

VLLM_VERSION = "0.2.2"
PYPI_BUILD = os.getenv("PYPI_BUILD", "0") == "1"

if not PYPI_BUILD:
    try:
        # Tag local builds with the CUDA version, e.g. "0.2.2+cu118" or "0.2.2+cu121",
        # taken from $CUDA_VERSION or from the CUDA version torch was built with.
        CUDA_VERSION = "".join(os.environ.get("CUDA_VERSION", torch.version.cuda).split("."))[:3]
        VLLM_VERSION += f"+cu{CUDA_VERSION}"
    except Exception as ex:
        raise RuntimeError("Your system must have an NVIDIA GPU to install vLLM") from ex
In the GitHub workflow, add a conditional that checks whether the CUDA version currently being used to build matches the one you want to release on PyPI:

if ($env:CUDA_VERSION -eq $env:PYPI_CUDA_VERSION) {
    $env:PYPI_BUILD = 1
}

For reference, you can look at setup.py and build.yaml in AutoAWQ.
https://github.com/casper-hansen/AutoAWQ/blob/main/.github/workflows/build.yaml

WoosukKwon (Collaborator) commented Nov 8, 2023

@zhuohan123 Let's merge this PR for ease of development?

TODOs (in the next PR):

zhuohan123 merged commit 06458a0 into main on Nov 8, 2023 (2 checks passed)
WoosukKwon deleted the upgrade-to-cuda-12 branch on November 17, 2023