
ARM aarch-64 server build failed (host OS: Ubuntu22.04.3) #2021

Closed · zhudy opened this issue Dec 11, 2023 · 33 comments · Fixed by #8713
@zhudy commented Dec 11, 2023

Following https://docs.vllm.ai/en/latest/getting_started/installation.html:

  1. docker run --gpus all -it --rm --ipc=host nvcr.io/nvidia/pytorch:23.10-py3
  2. git clone https://github.com/vllm-project/vllm.git
  3. cd vllm
  4. pip install -e .

Here are the details from inside the Docker container:
root@f8c2e06fbf8b:/mnt/vllm# pip install -e .
Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
Obtaining file:///mnt/vllm
Installing build dependencies ... done
Checking if build backend supports build_editable ... done
Getting requirements to build editable ... error
error: subprocess-exited-with-error

× Getting requirements to build editable did not run successfully.
│ exit code: 1
╰─> [22 lines of output]
/tmp/pip-build-env-4xoxai9j/overlay/local/lib/python3.10/dist-packages/torch/nn/modules/transformer.py:20: UserWarning: Failed to initialize NumPy: No module named 'numpy' (Triggered internally at /pytorch/torch/csrc/utils/tensor_numpy.cpp:84.)
device: torch.device = torch.device(torch._C._get_default_device()), # torch.device('cpu'),
No CUDA runtime is found, using CUDA_HOME='/usr/local/cuda'
<string>:142: UserWarning: Unsupported CUDA/ROCM architectures ({'6.1', '7.2', '8.7', '5.2', '6.0'}) are excluded from the TORCH_CUDA_ARCH_LIST env variable (5.2 6.0 6.1 7.0 7.2 7.5 8.0 8.6 8.7 9.0+PTX). Supported CUDA/ROCM architectures are: {'7.5', '8.0', '9.0', '7.0', '8.6+PTX', '9.0+PTX', '8.6', '8.0+PTX', '8.9+PTX', '8.9', '7.0+PTX', '7.5+PTX'}.
Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 353, in
main()
File "/usr/local/lib/python3.10/dist-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 335, in main
json_out['return_val'] = hook(**hook_input['kwargs'])
File "/usr/local/lib/python3.10/dist-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 132, in get_requires_for_build_editable
return hook(config_settings)
File "/tmp/pip-build-env-4xoxai9j/overlay/local/lib/python3.10/dist-packages/setuptools/build_meta.py", line 441, in get_requires_for_build_editable
return self.get_requires_for_build_wheel(config_settings)
File "/tmp/pip-build-env-4xoxai9j/overlay/local/lib/python3.10/dist-packages/setuptools/build_meta.py", line 325, in get_requires_for_build_wheel
return self._get_build_requires(config_settings, requirements=['wheel'])
File "/tmp/pip-build-env-4xoxai9j/overlay/local/lib/python3.10/dist-packages/setuptools/build_meta.py", line 295, in _get_build_requires
self.run_setup()
File "/tmp/pip-build-env-4xoxai9j/overlay/local/lib/python3.10/dist-packages/setuptools/build_meta.py", line 311, in run_setup
exec(code, locals())
File "", line 297, in
File "", line 267, in get_vllm_version
NameError: name 'nvcc_cuda_version' is not defined. Did you mean: 'cuda_version'?
[end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.
error: subprocess-exited-with-error

× Getting requirements to build editable did not run successfully.
│ exit code: 1
╰─> See above for output.

note: This error originates from a subprocess, and is likely not a problem with pip.

[notice] A new release of pip is available: 23.2.1 -> 23.3.1
[notice] To update, run: python -m pip install --upgrade pip

@zhudy (Author) commented Dec 11, 2023

Actually, nvcc runs fine:

root@f8c2e06fbf8b:/mnt/vllm# nvcc -v
nvcc fatal : No input files specified; use option --help for more information
root@f8c2e06fbf8b:/mnt/vllm# nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Tue_Aug_15_22:10:07_PDT_2023
Cuda compilation tools, release 12.2, V12.2.140
Build cuda_12.2.r12.2/compiler.33191640_0

@zhudy (Author) commented Dec 11, 2023

CUDA is present:

root@f8c2e06fbf8b:/mnt/vllm# echo $CUDA_HOME
/usr/local/cuda

root@f8c2e06fbf8b:/mnt/vllm# type nvcc
nvcc is /usr/local/cuda/bin/nvcc

root@f8c2e06fbf8b:/mnt/vllm# python3 -c "import torch; print(torch.cuda.is_available()); print(torch.__version__);"
True
2.1.0a0+32f93b1

@yexing commented Dec 13, 2023

Add

nvcc_cuda_version = get_nvcc_cuda_version(CUDA_HOME)

to setup.py at line 268.
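
For context, here is a minimal, self-contained sketch of what that one line computes, with the helper reimplemented for illustration (names taken from the traceback above; this is not vLLM's actual setup.py). The NameError happens because get_vllm_version() reads nvcc_cuda_version before anything has assigned it on this platform, so defining it from CUDA_HOME before line 268 fixes the build.

import os
import re
import subprocess

CUDA_HOME = os.environ.get("CUDA_HOME", "/usr/local/cuda")

def get_nvcc_cuda_version(cuda_home: str) -> str:
    # Parse the CUDA release out of `nvcc --version`, e.g. "12.2".
    out = subprocess.check_output(
        [os.path.join(cuda_home, "bin", "nvcc"), "--version"], text=True)
    match = re.search(r"release (\d+\.\d+)", out)
    if match is None:
        raise RuntimeError("could not parse nvcc --version output")
    return match.group(1)

# The suggested one-line fix: assign the variable before get_vllm_version() uses it.
nvcc_cuda_version = get_nvcc_cuda_version(CUDA_HOME)
print(nvcc_cuda_version)  # e.g. 12.2 inside nvcr.io/nvidia/pytorch:23.10-py3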

@cyc00518 commented Feb 22, 2024

@yexing @zhudy
Excuse me, I'm facing the same problem. I cloned vllm into my project and added

nvcc_cuda_version = get_nvcc_cuda_version(CUDA_HOME)

to setup.py at line 268, but I still have the same problem. Did I miss something?

@Wetzr commented Mar 4, 2024

I have the same problem and would be glad of any help.
Setup:
Aarch64 GH200
OS: Ubuntu 22.04.3 LTS (Jammy Jellyfish)
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Wed_Nov_22_11:03:34_PST_2023
Cuda compilation tools, release 12.3, V12.3.107
Build cuda_12.3.r12.3/compiler.33567101_0
CUDA home: /usr/local/cuda
Torch: 2.2.0a0+81ea7a4

I am running inside the NVIDIA PyTorch 23.12 container.

@haileyschoelkopf

Got it working with the changes in this branch: https://github.com/haileyschoelkopf/vllm/tree/aarm64-dockerfile . Pre-built images are at https://hub.docker.com/r/haileysch/vllm-aarch64-base and https://hub.docker.com/r/haileysch/vllm-aarch64-openai . Hopefully this will be helpful to others!

@tuanhe commented Mar 29, 2024

(Quoting the original issue report above.)

Hi guys, have you solved this issue?

@cyc00518 commented Jun 6, 2024

@tuanhe
Still facing the same problem. Does anyone know whether vLLM supports aarch64 now?

@drikster80

Had a similar problem on the GH200 (aarch64 Grace CPU).
Similar to @haileyschoelkopf, I updated the Dockerfile and requirements to work with v0.5.1. Here is the forked version:
https://github.com/drikster80/vllm/tree/gh200-docker

Main issues that needed to be overcome:

  • Use NVIDIA's PyTorch container (specifically nvcr.io/nvidia/pytorch:24.04-py3), since upstream PyTorch doesn't ship ARM64+CUDA wheels; it provides a PyTorch 2.3 build with the latest optimizations (e.g. Lightning-Thunder). See the release notes for 24.04-py3.
  • xformers hangs on pip install. Not sure why (maybe it is just taking forever to compile?)
  • Triton needs to be installed from source
  • vllm-flash-attn needs to be built from source
  • Comment out "torch", "xformers", and "vllm-flash-attn" in requirements files (handling that in the Dockerfile directly).

For future updating, you can see the changes here: drikster80@359fd4f

@ZihaoZhou commented Jul 19, 2024

Thank you all.

I have built the image using the script provided by @drikster80, and it takes about 12 hours (most of the time is spent on the mamba builder and xformers). So, to save time for others, I have made the image public at https://hub.docker.com/r/zihaokevinzhou/vllm-aarch64-openai . I have validated that it works well for my personal hosting of an fp8-quantized version of llama-3-70b.

@drikster80

@ZihaoZhou, thank you.

It normally only takes ~80 min on my system. 12 hrs seems excessive. I'm working on an update for v0.5.2, but haven't gotten the new flash-infer to build yet. I'll update the script when that's solved and post back here.

I haven't been uploading since the container is ~33GB. It looks like the one you uploaded is 13GB? Is that just from native compression? I'm sure there are some ways to cut it down (e.g. remove some of the build artifacts from the last image?).

@cyc00518

@ZihaoZhou
You should have appeared earlier!

@drikster80
In fact, I am also using a GH200, and today I used your forked version to build it.

The step that took me the most time was:
RUN python3 setup.py bdist_wheel --dist-dir=dist which took a total of 40 minutes.
Installing Triton also took a very long time.

Additionally, for the xformers part, I spent an entire afternoon, and it also seemed to be stuck there.
So in the end, I commented out this part.

Now, vllm is successfully running on GH200, thanks to your selfless contribution!

May I ask, regarding the Docker image on aarch64, compared to the original version, is the main difference just commenting out the items you mentioned in the requirements.txt?
Why is this necessary?

@drikster80 commented Jul 19, 2024

@cyc00518 You can see the list of full changes here: main...drikster80:vllm:gh200-docker

Effectively, xformers and vllm-flash-attention don't release ARM64 wheels, so those need to be built from source. Also, since Nvidia's PyTorch container already contains torch, torchvision, and some other stuff, those need to be commented out in the requirements file. The only three files that change are Dockerfile, requirements-build.txt, and requirements-cuda.txt.

As a side note, if you're using a GH200 bare metal, you might also want to checkout my auto-install for GH200s. Getting it setup with optimizations, NCCL, OFED, for high-speed distributed training/inference was a pain, so automated it for people to use or reference: https://github.com/drikster80/gh200-Ubuntu-22.04-autoinstall


@cyc00518

@drikster80
Thank you very much for your patient replies.

I have learned a lot, and I also appreciate the additional information you provided!

@drikster80

Updated the aarch64 remote branch to v0.5.2: https://github.com/drikster80/vllm/tree/gh200-docker

Pushed up a GH200 specific version (built for SM 9.0+PTX) to https://hub.docker.com/r/drikster80/vllm-gh200-openai

Building a more generic version now and will update this comment when complete.

@drikster80

If anyone comes across this and is trying to get Llama-3.1 to work with the GH200 (or aarch64 + H100), I have the latest working container image (v0.5.3-post1 with a couple more commits) up at https://hub.docker.com/r/drikster80/vllm-gh200-openai
Pull it with docker pull drikster80/vllm-gh200-openai:latest

Code is still in the https://github.com/drikster80/vllm/tree/gh200-docker branch.

Validated that Llama-3.1-8b-Instruct works, and working to test 405B-FP8 now (with cpu-offload).

@FanZhang91

@tuanhe Still face same problem, Anyone know vllm support aarch-64 now?

+1

@skandermoalla

Also built some images for arm64 with CUDA arch 9.0 (for GH200/H100) and for amd64 with CUDA arch 8.0 and 9.0 (A100 and H100), from a fork of @drikster80's setup, with a focus on the reproducibility of the build and on having both architectures start from the NGC PyTorch images.
Code: https://github.com/skandermoalla/vllm-build
Images: https://hub.docker.com/repository/docker/skandermoalla/vllm/general

@drikster80

@FanZhang91, I still maintain two docker images for aarch64 on DockerHub. These have both been updated to v0.6.1 as of 30 min ago.

All Supported CUDA caps: drikster80/vllm-aarch64-openai:latest
GH200/H100+ only (smaller): drikster80/vllm-gh200-openai:latest

They are slightly different from upstream in a couple small ways:

  • Based on Nvidia Pytorch container 24.07
  • Python 3.10 (haven't upgraded to 3.12 yet due to source-compilation problems)
  • Using main FlashInfer instead of release... just haven't gotten around to pinning that to a release.
  • Xformers, Flashinfer, and a couple other things needed to be built from source

You can pull and build yourself with:

git clone -b gh200-docker https://github.com/drikster80/vllm.git
cd ./vllm

# Update the max_jobs and nvcc_threads as needed to prevent OOM. This is good for a GH200.
docker build . --target vllm-openai -t drikster80/vllm-aarch64-openai:v0.6.1 --build-arg max_jobs=10 --build-arg nvcc_threads=8

# Can also pin to a specific Nvidia GPU Capability:
# docker build . --target vllm-openai -t drikster80/vllm-gh200-openai:v0.6.1 --build-arg max_jobs=10 --build-arg nvcc_threads=8 --build-arg torch_cuda_arch_list="9.0+PTX"

It takes ~1 hr to build on a pinned capability, and ~3+ hours to build for all GPU capability levels. Longer if you reduce the max_jobs variable.

@skandermoalla, I've been meaning to make a PR for a merged Dockerfile that can produce both arm64 & amd64... just haven't had the time to work on it. This was requested by some of the vllm maintainers and would make my life a lot easier, since I wouldn't need to maintain a separate fork. Is this something you'd be interested in collaborating on?

@skandermoalla

There weren't any changes needed in the Dockerfile or dependencies to compile for arm64 and amd64, as most of the tricky packages are compiled from source.
For me what's important is to start from the NGC image for both architectures. If this is something the vllm team is happy to have then I'm happy to collaborate on producing one!
You did all the hard work already of figuring out what to compile and what not and in which order to install the packages and skip their pip deps when needed.

@gongchengli

(Quoting @drikster80's build instructions above.)

Hi @drikster80, thanks for your Docker images. After pulling the image, is it still necessary to rebuild or recompile from source? I got this error:

RuntimeError: CUDA error: no kernel image is available for execution on the device
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.

I am using an NVIDIA Jetson Orin with the vllm v0.6.1 Docker image, and the device is different from yours.
If it is necessary to do this, could you please provide any files?
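
For reference, a quick way to compare the device's compute capability against what a prebuilt image targets (a minimal check, assuming torch imports inside the container; the GH200/H100 images above are built for SM 9.0, while Jetson Orin is SM 8.7):

import torch

# Compute capability of the local GPU, e.g. (8, 7) on Jetson Orin.
print(torch.cuda.get_device_capability(0))

# CUDA architectures the installed torch build was compiled for.
# If the vLLM extensions in an image were only compiled for sm_90, a device
# reporting sm_87 hits "no kernel image is available for execution on the device".
print(torch.cuda.get_arch_list())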

@youkaichao (Member)

Can you please try out #8713? @drikster80 @gongchengli

I spared some time to investigate the issue, and it looks like the most complicated part is bringing your own PyTorch (@drikster80 does this by using the NGC PyTorch container). Other than that, it is pretty straightforward.

On that branch, I can easily build vllm from scratch with nightly PyTorch in a fresh environment:

$ pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu124
$ git clone https://github.com/vllm-project/vllm.git
$ cd vllm
$ python use_existing_torch.py
$ pip install -r requirements-build.txt
$ pip install -vvv -e . --no-build-isolation

@KungFuPandaPro

(Quoting @youkaichao's build steps above.)

Is your environment ARM?

@youkaichao (Member)

yes, I built it on GH200 successfully.

@KungFuPandaPro

pip install -vvv -e . --no-build-isolation

So many errors: 32 errors detected in the compilation of "/home/qz/zww/vllm/csrc/quantization/gptq/q_gemm.cu", and 11 errors detected in the compilation of "/home/qz/zww/vllm/csrc/quantization/fp8/common.cu". Is that normal?

@KungFuPandaPro

yes, I built it on GH200 successfully.

I failed

@KungFuPandaPro

yes, I built it on GH200 successfully.

I'm building on a Jetson.

@KungFuPandaPro

(Quoting @youkaichao's build steps above.)

But how do I download torch for CUDA 12.2? The command given is for cu124: pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu124

@drikster80 commented Sep 25, 2024

@youkaichao, I'm still running into errors building the Docker container:

When running:

docker build . --target vllm-openai --build-arg max_jobs=10 --build-arg nvcc_threads=8 --build-arg torch_cuda_arch_list="9.0+PTX"
# Tried with and without setting VLLM_TARGET_DEVICE=cuda env

I get the following error:

=> ERROR [build 12/14] RUN --mount=type=cache,target=/root/.cache/ccache     --mount=type=cache,target=/root/.cache/pip     --mount=type=bind,source=.git,target=.git      if [ "$USE_SCCACHE" !=  1.4s
------
 > [build 12/14] RUN --mount=type=cache,target=/root/.cache/ccache     --mount=type=cache,target=/root/.cache/pip     --mount=type=bind,source=.git,target=.git      if [ "$USE_SCCACHE" != "1" ]; then         python3 setup.py bdist_wheel --dist-dir=dist --py-limited-api=cp38;     fi:
1.167 Traceback (most recent call last):
1.167   File "/workspace/setup.py", line 485, in <module>
1.167     version=get_vllm_version(),
1.167             ^^^^^^^^^^^^^^^^^^
1.167   File "/workspace/setup.py", line 394, in get_vllm_version
1.167     raise RuntimeError("Unknown runtime environment")
1.167 RuntimeError: Unknown runtime environment
------
Dockerfile:106
--------------------
 105 |     ENV CCACHE_DIR=/root/.cache/ccache
 106 | >>> RUN --mount=type=cache,target=/root/.cache/ccache \
 107 | >>>     --mount=type=cache,target=/root/.cache/pip \
 108 | >>>     --mount=type=bind,source=.git,target=.git  \
 109 | >>>     if [ "$USE_SCCACHE" != "1" ]; then \
 110 | >>>         python3 setup.py bdist_wheel --dist-dir=dist --py-limited-api=cp38; \
 111 | >>>     fi
 112 |
--------------------
ERROR: failed to solve: process "/bin/sh -c if [ \"$USE_SCCACHE\" != \"1\" ]; then         python3 setup.py bdist_wheel --dist-dir=dist --py-limited-api=cp38;     fi" did not complete successfully: exit code: 1

I was having that problem before as well. Not sure why it can't detect the architecture. In my branched Dockerfile, I updated setup.py to force cuda, but that's not going to work for the main repo.

@youkaichao (Member)

@drikster80 when I have hit the Unknown runtime environment problem, it was usually because pip installed torch from PyPI directly, and PyPI does not have an aarch64 wheel with CUDA support.

Make sure you control all of the torch installation.
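
Roughly speaking, the build's platform detection keys off what the installed torch reports; an illustrative sketch of that kind of check (not vLLM's actual setup.py code):

import torch

def detect_platform() -> str:
    # A CPU-only wheel from PyPI (the only aarch64 wheel there at the time)
    # reports torch.version.cuda as None, so neither branch matches and the
    # build fails with "Unknown runtime environment".
    if torch.version.cuda is not None:
        return "cuda"
    if getattr(torch.version, "hip", None) is not None:
        return "rocm"
    raise RuntimeError("Unknown runtime environment")

print(detect_platform())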

my solution is:

$ pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu124 # install pytorch
$ git clone https://github.com/vllm-project/vllm.git
$ cd vllm
$ python use_existing_torch.py # remove all vllm dependency specification of pytorch
$ pip install -r requirements-build.txt # install the rest build time dependency
$ pip install -vvv -e . --no-build-isolation # use --no-build-isolation to build with the current pytorch

make sure you followed these steps.

Ideally, you should not see any PyTorch install/uninstall during the build, because your Dockerfile already has PyTorch installed.
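
A quick sanity check before the editable install, to confirm the preinstalled torch is a CUDA-enabled aarch64 build rather than a CPU-only wheel pulled from PyPI (a minimal check, not part of the steps above):

import platform
import torch

print(platform.machine())         # expect "aarch64"
print(torch.__version__)          # the build you installed (nightly cu124 or the NGC build)
print(torch.version.cuda)         # must not be None, otherwise the build cannot detect CUDA
print(torch.cuda.is_available())  # True when a GPU is visible in the container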

@KungFuPandaPro

(Quoting @youkaichao's solution above.)

Which Dockerfile do you use?

@KungFuPandaPro

pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu124

My platform is aarch64-linux on a Jetson.

@KungFuPandaPro

(Quoting @youkaichao's build steps above.)

I failed with: Feature 'f16 arithemetic and compare instructions' requires .target sm_53 or higher
