
ARM aarch-64 server build failed (host OS: Ubuntu22.04.3) #2021

Closed · zhudy opened this issue Dec 11, 2023 · 33 comments · Fixed by #8713
@zhudy commented Dec 11, 2023

Following https://docs.vllm.ai/en/latest/getting_started/installation.html:

  1. docker run --gpus all -it --rm --ipc=host nvcr.io/nvidia/pytorch:23.10-py3
  2. git clone https://github.com/vllm-project/vllm.git
  3. cd vllm
  4. pip install -e .

Here are the details from inside the Docker container:
root@f8c2e06fbf8b:/mnt/vllm# pip install -e .
Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
Obtaining file:///mnt/vllm
Installing build dependencies ... done
Checking if build backend supports build_editable ... done
Getting requirements to build editable ... error
error: subprocess-exited-with-error

× Getting requirements to build editable did not run successfully.
│ exit code: 1
╰─> [22 lines of output]
/tmp/pip-build-env-4xoxai9j/overlay/local/lib/python3.10/dist-packages/torch/nn/modules/transformer.py:20: UserWarning: Failed to initialize NumPy: No module named 'numpy' (Triggered internally at /pytorch/torch/csrc/utils/tensor_numpy.cpp:84.)
device: torch.device = torch.device(torch._C._get_default_device()), # torch.device('cpu'),
No CUDA runtime is found, using CUDA_HOME='/usr/local/cuda'
<string>:142: UserWarning: Unsupported CUDA/ROCM architectures ({'6.1', '7.2', '8.7', '5.2', '6.0'}) are excluded from the TORCH_CUDA_ARCH_LIST env variable (5.2 6.0 6.1 7.0 7.2 7.5 8.0 8.6 8.7 9.0+PTX). Supported CUDA/ROCM architectures are: {'7.5', '8.0', '9.0', '7.0', '8.6+PTX', '9.0+PTX', '8.6', '8.0+PTX', '8.9+PTX', '8.9', '7.0+PTX', '7.5+PTX'}.
Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 353, in
main()
File "/usr/local/lib/python3.10/dist-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 335, in main
json_out['return_val'] = hook(**hook_input['kwargs'])
File "/usr/local/lib/python3.10/dist-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 132, in get_requires_for_build_editable
return hook(config_settings)
File "/tmp/pip-build-env-4xoxai9j/overlay/local/lib/python3.10/dist-packages/setuptools/build_meta.py", line 441, in get_requires_for_build_editable
return self.get_requires_for_build_wheel(config_settings)
File "/tmp/pip-build-env-4xoxai9j/overlay/local/lib/python3.10/dist-packages/setuptools/build_meta.py", line 325, in get_requires_for_build_wheel
return self._get_build_requires(config_settings, requirements=['wheel'])
File "/tmp/pip-build-env-4xoxai9j/overlay/local/lib/python3.10/dist-packages/setuptools/build_meta.py", line 295, in _get_build_requires
self.run_setup()
File "/tmp/pip-build-env-4xoxai9j/overlay/local/lib/python3.10/dist-packages/setuptools/build_meta.py", line 311, in run_setup
exec(code, locals())
File "", line 297, in
File "", line 267, in get_vllm_version
NameError: name 'nvcc_cuda_version' is not defined. Did you mean: 'cuda_version'?
[end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.
error: subprocess-exited-with-error

× Getting requirements to build editable did not run successfully.
│ exit code: 1
╰─> See above for output.

note: This error originates from a subprocess, and is likely not a problem with pip.

[notice] A new release of pip is available: 23.2.1 -> 23.3.1
[notice] To update, run: python -m pip install --upgrade pip

@zhudy (Author) commented Dec 11, 2023

Actually, nvcc runs fine:

root@f8c2e06fbf8b:/mnt/vllm# nvcc -v
nvcc fatal : No input files specified; use option --help for more information
root@f8c2e06fbf8b:/mnt/vllm# nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Tue_Aug_15_22:10:07_PDT_2023
Cuda compilation tools, release 12.2, V12.2.140
Build cuda_12.2.r12.2/compiler.33191640_0

@zhudy (Author) commented Dec 11, 2023

CUDA is present:

root@f8c2e06fbf8b:/mnt/vllm# echo $CUDA_HOME
/usr/local/cuda

root@f8c2e06fbf8b:/mnt/vllm# type nvcc
nvcc is /usr/local/cuda/bin/nvcc

root@f8c2e06fbf8b:/mnt/vllm# python3 -c "import torch; print(torch.cuda.is_available()); print(torch.__version__);"
True
2.1.0a0+32f93b1

@yexing commented Dec 13, 2023

Add

nvcc_cuda_version = get_nvcc_cuda_version(CUDA_HOME)

to setup.py at line 268.
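
For context, here is a minimal, self-contained sketch of what that one line computes, with the helper reimplemented for illustration (names taken from the traceback above; this is not vLLM's actual setup.py). The NameError happens because get_vllm_version() reads nvcc_cuda_version before anything has assigned it on this platform, so defining it from CUDA_HOME before line 268 fixes the build.

import os
import re
import subprocess

CUDA_HOME = os.environ.get("CUDA_HOME", "/usr/local/cuda")

def get_nvcc_cuda_version(cuda_home: str) -> str:
    # Parse the CUDA release out of `nvcc --version`, e.g. "12.2".
    out = subprocess.check_output(
        [os.path.join(cuda_home, "bin", "nvcc"), "--version"], text=True)
    match = re.search(r"release (\d+\.\d+)", out)
    if match is None:
        raise RuntimeError("could not parse nvcc --version output")
    return match.group(1)

# The suggested one-line fix: assign the variable before get_vllm_version() uses it.
nvcc_cuda_version = get_nvcc_cuda_version(CUDA_HOME)
print(nvcc_cuda_version)  # e.g. 12.2 inside nvcr.io/nvidia/pytorch:23.10-py3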

@cyc00518 commented Feb 22, 2024

@yexing @zhudy
Excuse me, I'm facing the same problem. I cloned vllm into my project and added

nvcc_cuda_version = get_nvcc_cuda_version(CUDA_HOME)

to setup.py at line 268, but I still have the same problem. Did I miss something?

@Wetzr commented Mar 4, 2024

I have the same problem and would be glad of any help.
Setup:
Aarch64 GH200
OS: Ubuntu 22.04.3 LTS (Jammy Jellyfish)
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Wed_Nov_22_11:03:34_PST_2023
Cuda compilation tools, release 12.3, V12.3.107
Build cuda_12.3.r12.3/compiler.33567101_0
CUDA home: /usr/local/cuda
Torch: 2.2.0a0+81ea7a4

I am running inside the NVIDIA PyTorch 23.12 container.

@haileyschoelkopf

Got it working with the changes in this branch: https://github.com/haileyschoelkopf/vllm/tree/aarm64-dockerfile . Pre-built images are at https://hub.docker.com/r/haileysch/vllm-aarch64-base and https://hub.docker.com/r/haileysch/vllm-aarch64-openai . Hopefully this will be helpful to others!

@tuanhe commented Mar 29, 2024

(Quoting the original issue report above.)

Hi guys, have you solved this issue?

@cyc00518 commented Jun 6, 2024

@tuanhe
Still facing the same problem. Does anyone know whether vLLM supports aarch64 now?

@drikster80

Had a similar problem on the GH200 (aarch64 Grace CPU).
Similar to @haileyschoelkopf, I updated the Dockerfile and requirements to work with v0.5.1. Here is the forked version:
https://github.com/drikster80/vllm/tree/gh200-docker

Main issues that needed to be overcome:

  • Use NVIDIA's PyTorch container (specifically nvcr.io/nvidia/pytorch:24.04-py3), since upstream PyTorch doesn't ship ARM64+CUDA wheels; it provides a PyTorch 2.3 build with the latest optimizations (e.g. Lightning-Thunder). See the release notes for 24.04-py3.
  • xformers hangs on pip install. Not sure why (maybe it is just taking forever to compile?)
  • Triton needs to be installed from source
  • vllm-flash-attn needs to be built from source
  • Comment out "torch", "xformers", and "vllm-flash-attn" in requirements files (handling that in the Dockerfile directly).

For future updating, you can see the changes here: drikster80@359fd4f

@ZihaoZhou commented Jul 19, 2024

Thank you all.

I have built the image using the script provided by @drikster80, and it takes about 12 hours (most of the time is spent on the mamba builder and xformers). So, to save time for others, I have made the image public at https://hub.docker.com/r/zihaokevinzhou/vllm-aarch64-openai . I have validated that it works well for my personal hosting of an fp8-quantized version of llama-3-70b.

@drikster80

@ZihaoZhou, thank you.

It normally only takes ~80 min on my system. 12 hrs seems excessive. I'm working on an update for v0.5.2, but haven't gotten the new flash-infer to build yet. I'll update the script when that's solved and post back here.

I haven't been uploading since the container is ~33GB. It looks like the one you uploaded is 13GB? Is that just from native compression? I'm sure there are some ways to cut it down (e.g. remove some of the build artifacts from the last image?).

@cyc00518

@ZihaoZhou
You should have appeared earlier!

@drikster80
In fact, I am also using a GH200, and today I used your forked version to build it.

The step that took me the most time was:
RUN python3 setup.py bdist_wheel --dist-dir=dist which took a total of 40 minutes.
Installing Triton also took a very long time.

Additionally, for the xformers part, I spent an entire afternoon, and it also seemed to be stuck there.
So in the end, I commented out this part.

Now, vllm is successfully running on GH200, thanks to your selfless contribution!

May I ask, regarding the Docker image on aarch64, compared to the original version, is the main difference just commenting out the items you mentioned in the requirements.txt?
Why is this necessary?

@drikster80 commented Jul 19, 2024

@cyc00518 You can see the list of full changes here: main...drikster80:vllm:gh200-docker

Effectively, xformers and vllm-flash-attention don't release ARM64 wheels, so those need to be built from source. Also, since Nvidia's PyTorch container already contains torch, torchvision, and some other stuff, those need to be commented out in the requirements file. The only three files that change are Dockerfile, requirements-build.txt, and requirements-cuda.txt.

As a side note, if you're using a GH200 bare metal, you might also want to checkout my auto-install for GH200s. Getting it setup with optimizations, NCCL, OFED, for high-speed distributed training/inference was a pain, so automated it for people to use or reference: https://github.com/drikster80/gh200-Ubuntu-22.04-autoinstall


@cyc00518

@drikster80
Thank you very much for your patient replies.

I have learned a lot, and I also appreciate the additional information you provided!

@drikster80

Updated the aarch64 remote branch to v0.5.2: https://github.com/drikster80/vllm/tree/gh200-docker

Pushed up a GH200 specific version (built for SM 9.0+PTX) to https://hub.docker.com/r/drikster80/vllm-gh200-openai

Building a more generic version now and will update this comment when complete.

@drikster80

If anyone comes across this and is trying to get Llama-3.1 to work with the GH200 (or aarch64 + H100), I have the latest working container image (v0.5.3-post1 with a couple more commits) up at https://hub.docker.com/r/drikster80/vllm-gh200-openai
Pull it with docker pull drikster80/vllm-gh200-openai:latest

Code is still in the https://github.com/drikster80/vllm/tree/gh200-docker branch.

Validated that Llama-3.1-8b-Instruct works, and working to test 405B-FP8 now (with cpu-offload).

@FanZhang91

@tuanhe Still face same problem, Anyone know vllm support aarch-64 now?

+1

@skandermoalla

Also built some images for arm64 with CUDA arch 9.0 (for GH200/H100) and for amd64 with CUDA arch 8.0 and 9.0 (A100 and H100), from a fork of @drikster80's setup, with a focus on the reproducibility of the build and on having both architectures start from the NGC PyTorch images.
Code: https://github.com/skandermoalla/vllm-build
Images: https://hub.docker.com/repository/docker/skandermoalla/vllm/general

@drikster80

@FanZhang91, I still maintain two docker images for aarch64 on DockerHub. These have both been updated to v0.6.1 as of 30 min ago.

All Supported CUDA caps: drikster80/vllm-aarch64-openai:latest
GH200/H100+ only (smaller): drikster80/vllm-gh200-openai:latest

They are slightly different from upstream in a couple small ways:

  • Based on Nvidia Pytorch container 24.07
  • Python 3.10 (haven't upgraded to 3.12 yet due to source-compilation problems)
  • Using main FlashInfer instead of release... just haven't gotten around to pinning that to a release.
  • Xformers, Flashinfer, and a couple other things needed to be built from source

You can pull and build yourself with:

git clone -b gh200-docker https://github.com/drikster80/vllm.git
cd ./vllm

# Update the max_jobs and nvcc_threads as needed to prevent OOM. This is good for a GH200.
docker build . --target vllm-openai -t drikster80/vllm-aarch64-openai:v0.6.1 --build-arg max_jobs=10 --build-arg nvcc_threads=8

# Can also pin to a specific Nvidia GPU Capability:
# docker build . --target vllm-openai -t drikster80/vllm-gh200-openai:v0.6.1 --build-arg max_jobs=10 --build-arg nvcc_threads=8 --build-arg torch_cuda_arch_list="9.0+PTX"

It takes ~1 hr to build on a pinned capability, and ~3+ hours to build for all GPU capability levels. Longer if you reduce the max_jobs variable.

@skandermoalla, I've been meaning to make a PR for a merged Dockerfile that can produce both arm64 & amd64... just haven't had the time to work on it. This was requested by some of the vllm maintainers and would make my life a lot easier, since I wouldn't need to maintain a separate fork. Is this something you'd be interested in collaborating on?

@skandermoalla

There weren't any changes needed in the Dockerfile or dependencies to compile for arm64 and amd64, as most of the tricky packages are compiled from source.
For me what's important is to start from the NGC image for both architectures. If this is something the vllm team is happy to have then I'm happy to collaborate on producing one!
You did all the hard work already of figuring out what to compile and what not and in which order to install the packages and skip their pip deps when needed.

@gongchengli

(Quoting @drikster80's build instructions above.)

Hi @drikster80, thanks for your Docker images. After pulling the image, is it still necessary to rebuild or recompile from source? I got this error:

RuntimeError: CUDA error: no kernel image is available for execution on the device
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.

I am using an NVIDIA Jetson Orin with the vllm v0.6.1 Docker image, and the device is different from yours.
If it is necessary to do this, could you please provide any files?
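
For reference, a quick way to compare the device's compute capability against what a prebuilt image targets (a minimal check, assuming torch imports inside the container; the GH200/H100 images above are built for SM 9.0, while Jetson Orin is SM 8.7):

import torch

# Compute capability of the local GPU, e.g. (8, 7) on Jetson Orin.
print(torch.cuda.get_device_capability(0))

# CUDA architectures the installed torch build was compiled for.
# If the vLLM extensions in an image were only compiled for sm_90, a device
# reporting sm_87 hits "no kernel image is available for execution on the device".
print(torch.cuda.get_arch_list())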

@youkaichao (Member)

Can you please try out #8713? @drikster80 @gongchengli

I spared some time to investigate the issue, and it looks like the most complicated part is bringing your own PyTorch (@drikster80 does this by using the NGC PyTorch container). Other than that, it is pretty straightforward.

On that branch, I can easily build vllm from scratch with nightly PyTorch in a fresh environment:

$ pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu124
$ git clone https://github.com/vllm-project/vllm.git
$ cd vllm
$ python use_existing_torch.py
$ pip install -r requirements-build.txt
$ pip install -vvv -e . --no-build-isolation

@KungFuPandaPro

(Quoting @youkaichao's build steps above.)

Is your environment ARM?

@youkaichao (Member)

yes, I built it on GH200 successfully.

@KungFuPandaPro

pip install -vvv -e . --no-build-isolation

So many errors: 32 errors detected in the compilation of "/home/qz/zww/vllm/csrc/quantization/gptq/q_gemm.cu", and 11 errors detected in the compilation of "/home/qz/zww/vllm/csrc/quantization/fp8/common.cu". Is that normal?

@KungFuPandaPro

yes, I built it on GH200 successfully.

I failed

@KungFuPandaPro

yes, I built it on GH200 successfully.

I'm building on a Jetson.

@KungFuPandaPro

(Quoting @youkaichao's build steps above.)

But how do I download torch for CUDA 12.2? The command given is for cu124: pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu124

@drikster80 commented Sep 25, 2024

@youkaichao, I'm still running into errors building the Docker container:

When running:

docker build . --target vllm-openai --build-arg max_jobs=10 --build-arg nvcc_threads=8 --build-arg torch_cuda_arch_list="9.0+PTX"
# Tried with and without setting VLLM_TARGET_DEVICE=cuda env

I get the following error:

=> ERROR [build 12/14] RUN --mount=type=cache,target=/root/.cache/ccache     --mount=type=cache,target=/root/.cache/pip     --mount=type=bind,source=.git,target=.git      if [ "$USE_SCCACHE" !=  1.4s
------
 > [build 12/14] RUN --mount=type=cache,target=/root/.cache/ccache     --mount=type=cache,target=/root/.cache/pip     --mount=type=bind,source=.git,target=.git      if [ "$USE_SCCACHE" != "1" ]; then         python3 setup.py bdist_wheel --dist-dir=dist --py-limited-api=cp38;     fi:
1.167 Traceback (most recent call last):
1.167   File "/workspace/setup.py", line 485, in <module>
1.167     version=get_vllm_version(),
1.167             ^^^^^^^^^^^^^^^^^^
1.167   File "/workspace/setup.py", line 394, in get_vllm_version
1.167     raise RuntimeError("Unknown runtime environment")
1.167 RuntimeError: Unknown runtime environment
------
Dockerfile:106
--------------------
 105 |     ENV CCACHE_DIR=/root/.cache/ccache
 106 | >>> RUN --mount=type=cache,target=/root/.cache/ccache \
 107 | >>>     --mount=type=cache,target=/root/.cache/pip \
 108 | >>>     --mount=type=bind,source=.git,target=.git  \
 109 | >>>     if [ "$USE_SCCACHE" != "1" ]; then \
 110 | >>>         python3 setup.py bdist_wheel --dist-dir=dist --py-limited-api=cp38; \
 111 | >>>     fi
 112 |
--------------------
ERROR: failed to solve: process "/bin/sh -c if [ \"$USE_SCCACHE\" != \"1\" ]; then         python3 setup.py bdist_wheel --dist-dir=dist --py-limited-api=cp38;     fi" did not complete successfully: exit code: 1

I was having that problem before as well. Not sure why it can't detect the architecture. In my branched Dockerfile, I updated setup.py to force cuda, but that's not going to work for the main repo.

@youkaichao (Member)

@drikster80 when I have hit the Unknown runtime environment problem, it was usually because pip installed torch from PyPI directly, and PyPI does not have an aarch64 wheel with CUDA support.

Make sure you control all of the torch installation.
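
Roughly speaking, the build's platform detection keys off what the installed torch reports; an illustrative sketch of that kind of check (not vLLM's actual setup.py code):

import torch

def detect_platform() -> str:
    # A CPU-only wheel from PyPI (the only aarch64 wheel there at the time)
    # reports torch.version.cuda as None, so neither branch matches and the
    # build fails with "Unknown runtime environment".
    if torch.version.cuda is not None:
        return "cuda"
    if getattr(torch.version, "hip", None) is not None:
        return "rocm"
    raise RuntimeError("Unknown runtime environment")

print(detect_platform())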

my solution is:

$ pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu124 # install pytorch
$ git clone https://github.com/vllm-project/vllm.git
$ cd vllm
$ python use_existing_torch.py # remove all vllm dependency specification of pytorch
$ pip install -r requirements-build.txt # install the rest build time dependency
$ pip install -vvv -e . --no-build-isolation # use --no-build-isolation to build with the current pytorch

make sure you followed these steps.

Ideally, you should not see any PyTorch install/uninstall during the build, because your Dockerfile already has PyTorch installed.
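
A quick sanity check before the editable install, to confirm the preinstalled torch is a CUDA-enabled aarch64 build rather than a CPU-only wheel pulled from PyPI (a minimal check, not part of the steps above):

import platform
import torch

print(platform.machine())         # expect "aarch64"
print(torch.__version__)          # the build you installed (nightly cu124 or the NGC build)
print(torch.version.cuda)         # must not be None, otherwise the build cannot detect CUDA
print(torch.cuda.is_available())  # True when a GPU is visible in the container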

@KungFuPandaPro

(Quoting @youkaichao's solution above.)

Which Dockerfile do you use?

@KungFuPandaPro

pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu124

My platform is aarch64-linux on a Jetson.

@KungFuPandaPro

(Quoting @youkaichao's build steps above.)

I failed with: Feature 'f16 arithemetic and compare instructions' requires .target sm_53 or higher
