ARM aarch-64 server build failed (host OS: Ubuntu22.04.3) #2021
Actually, nvcc runs fine, and CUDA is present:

```
root@f8c2e06fbf8b:/mnt/vllm# nvcc -v
root@f8c2e06fbf8b:/mnt/vllm# echo $CUDA_HOME
root@f8c2e06fbf8b:/mnt/vllm# type nvcc
root@f8c2e06fbf8b:/mnt/vllm# python3 -c "import torch; print(torch.cuda.is_available()); print(torch.version);"
```
add [...] to setup.py at line 268
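(The snippet itself was lost when this page was scraped. Judging from the `NameError: name 'nvcc_cuda_version' is not defined` in the issue body at the bottom of this thread, it presumably defined `nvcc_cuda_version` before `get_vllm_version()` reads it. A hedged guess at the shape of the fix, not the original patch:)

```python
# Hypothetical reconstruction -- the original snippet is missing from the scrape.
# vLLM's setup.py of this era shipped a get_nvcc_cuda_version() helper;
# defining the module-level name before get_vllm_version() uses it avoids the NameError.
nvcc_cuda_version = get_nvcc_cuda_version(CUDA_HOME)
```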
I have the same problem and would be glad for any help. I am running inside the NVIDIA pytorch_23.12 container.
Got it working with the changes in this branch: https://github.com/haileyschoelkopf/vllm/tree/aarm64-dockerfile , with built dockerfiles here: https://hub.docker.com/r/haileysch/vllm-aarch64-base and https://hub.docker.com/r/haileysch/vllm-aarch64-openai . Hopefully this'll be helpful to others!
Hi guys, have you solved the issue?
@tuanhe
Had a similar problem on the GH200 (aarch64 Grace CPU). Main issues that needed to be overcome: [...]

For future updates, you can see the changes here: drikster80@359fd4f
Thank you all. I have built the image using the script provided by @drikster80, and it took about 12 hours (most of that time was spent on the mamba builder and xformers). So, to save time for others, I have made the image public at https://hub.docker.com/r/zihaokevinzhou/vllm-aarch64-openai . I have validated that it works well for my personal hosting of an FP8-quantized version of Llama-3-70B.
@ZihaoZhou, thank you. It normally only takes ~80 min on my system, so 12 hrs seems excessive. I'm working on an update for v0.5.2, but haven't gotten the new [...] working yet. I haven't been uploading since the container is ~33GB. It looks like the one you uploaded is 13GB? Is that just from native compression? I'm sure there are some ways to cut it down (e.g. remove some of the build artifacts from the last image?).
@ZihaoZhou @drikster80 The step that took me the most time was [...]. Additionally, for the xformers part, I spent an entire afternoon, and it also seemed to be stuck there. Now vllm is running successfully on the GH200, thanks to your selfless contributions! May I ask: for the aarch64 Docker image, is the main difference from the original version just commenting out the items you mentioned in requirements.txt?
@cyc00518 You can see the list of full changes here: main...drikster80:vllm:gh200-docker. Effectively, [...]

As a side note, if you're using a bare-metal GH200, you might also want to check out my auto-install for GH200s. Getting it set up with optimizations, NCCL, and OFED for high-speed distributed training/inference was a pain, so I automated it for people to use or reference: https://github.com/drikster80/gh200-Ubuntu-22.04-autoinstall
@drikster80 I have learned a lot, and I also appreciate the additional information you provided!

Updated the aarch64 remote branch to v0.5.2: https://github.com/drikster80/vllm/tree/gh200-docker. Pushed up a GH200-specific version (built for SM 9.0+PTX) to https://hub.docker.com/r/drikster80/vllm-gh200-openai . Building a more generic version now and will update this comment when complete.

If anyone comes across this and is trying to get Llama-3.1 to work with the GH200 (or aarch64 + H100), I have the latest working container image (v0.5.3-post1 with a couple more commits) up at https://hub.docker.com/r/drikster80/vllm-gh200-openai . Code is still in the https://github.com/drikster80/vllm/tree/gh200-docker branch. Validated that Llama-3.1-8B-Instruct works, and now trying to test 405B-FP8 (with cpu-offload).
+1
Also built some images for arm64 with CUDA arch 9.0 (for GH200/H100) and for amd64 with CUDA archs 8.0 and 9.0 (A100 and H100) from a fork of @drikster80's installation, to focus on the reproducibility of the build and to have both architectures start from the NGC PyTorch images.
@FanZhang91, I still maintain two docker images for aarch64 on DockerHub, both updated to v0.6.1 as of 30 min ago. All supported CUDA caps: drikster80/vllm-aarch64-openai:latest. They are slightly different from upstream in a couple of small ways: [...]
You can pull and build yourself with:

```bash
git clone -b gh200-docker https://github.com/drikster80/vllm.git
cd ./vllm
# Update max_jobs and nvcc_threads as needed to prevent OOM. This is good for a GH200.
docker build . --target vllm-openai -t drikster80/vllm-aarch64-openai:v0.6.1 --build-arg max_jobs=10 --build-arg nvcc_threads=8
# Can also pin to a specific Nvidia GPU capability:
# docker build . --target vllm-openai -t drikster80/vllm-gh200-openai:v0.6.1 --build-arg max_jobs=10 --build-arg nvcc_threads=8 --build-arg torch_cuda_arch_list="9.0+PTX"
```

It takes ~1 hr to build on a pinned capability and ~3+ hours to build for all GPU capability levels; longer if you reduce the max_jobs variable.

@skandermoalla, I've been meaning to make a PR for a merged Dockerfile that can produce both arm64 & amd64... just haven't had the time to work on it. This was requested by some of the vllm maintainers and would make my life a lot easier by removing the need to maintain a separate fork. Is this something you'd be interested in collaborating on?
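(For anyone who only wants to run the prebuilt image rather than build it, here is a minimal usage sketch assuming the image keeps upstream's OpenAI-server entrypoint; the model name and port are illustrative, not from this thread:)

```bash
# Hedged usage sketch: serve a model with the prebuilt aarch64 image.
# Assumes the standard vllm-openai entrypoint; model/port are example values.
docker run --gpus all -p 8000:8000 \
  drikster80/vllm-aarch64-openai:latest \
  --model meta-llama/Meta-Llama-3.1-8B-Instruct
```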
There weren't any changes needed in the Dockerfile or dependencies to compile for arm64 and amd64, as most of the tricky packages are compiled from source.
Hi @drikster80, thanks for your docker images. After pulling the docker image, is it still necessary to rebuild or recompile from the source code? I got this error: [...]

I am using an NVIDIA Jetson Orin with the vLLM v0.6.1 docker image, and the device is different from yours.
Can you please try out #8713? @drikster80 @gongchengli

I spared some time to investigate the issue, and it looks like the most complicated part is bringing your own pytorch (@drikster80 does this by using the NGC pytorch container). Other than that, it is pretty straightforward. On that branch, I can easily build vllm from scratch with nightly pytorch, in a fresh new environment:

```bash
$ pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu124
$ git clone https://github.com/vllm-project/vllm.git
$ cd vllm
$ python use_existing_torch.py
$ pip install -r requirements-build.txt
$ pip install -vvv -e . --no-build-isolation
```
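(A quick sanity check after a build like this, to confirm the install succeeded and that the pre-installed torch still sees the GPU; this check is my addition, not part of the original comment:)

```bash
# Hedged sanity check: vllm imports and torch still sees CUDA.
python3 -c "import torch, vllm; print(vllm.__version__, torch.cuda.is_available())"
```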
Is your environment ARM?
Yes, I built it on GH200 successfully.
So many errors: 32 errors detected in the compilation of "/home/qz/zww/vllm/csrc/quantization/gptq/q_gemm.cu".
I failed.
I built on Jetson.
But how do I download torch for CUDA 12.2? The suggested command points at cu124: `pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu124`
@youkaichao, I'm still running into errors building the Docker container. When running:

```bash
docker build . --target vllm-openai --build-arg max_jobs=10 --build-arg nvcc_threads=8 --build-arg torch_cuda_arch_list="9.0+PTX"
# Tried with and without setting the VLLM_TARGET_DEVICE=cuda env
```

I get the following error:

```
 => ERROR [build 12/14] RUN --mount=type=cache,target=/root/.cache/ccache --mount=type=cache,target=/root/.cache/pip --mount=type=bind,source=.git,target=.git if [ "$USE_SCCACHE" !=  1.4s
------
 > [build 12/14] RUN --mount=type=cache,target=/root/.cache/ccache --mount=type=cache,target=/root/.cache/pip --mount=type=bind,source=.git,target=.git if [ "$USE_SCCACHE" != "1" ]; then python3 setup.py bdist_wheel --dist-dir=dist --py-limited-api=cp38; fi:
1.167 Traceback (most recent call last):
1.167   File "/workspace/setup.py", line 485, in <module>
1.167     version=get_vllm_version(),
1.167             ^^^^^^^^^^^^^^^^^^
1.167   File "/workspace/setup.py", line 394, in get_vllm_version
1.167     raise RuntimeError("Unknown runtime environment")
1.167 RuntimeError: Unknown runtime environment
------
Dockerfile:106
--------------------
 105 |     ENV CCACHE_DIR=/root/.cache/ccache
 106 | >>> RUN --mount=type=cache,target=/root/.cache/ccache \
 107 | >>>     --mount=type=cache,target=/root/.cache/pip \
 108 | >>>     --mount=type=bind,source=.git,target=.git \
 109 | >>>     if [ "$USE_SCCACHE" != "1" ]; then \
 110 | >>>         python3 setup.py bdist_wheel --dist-dir=dist --py-limited-api=cp38; \
 111 | >>>     fi
 112 |
--------------------
ERROR: failed to solve: process "/bin/sh -c if [ \"$USE_SCCACHE\" != \"1\" ]; then python3 setup.py bdist_wheel --dist-dir=dist --py-limited-api=cp38; fi" did not complete successfully: exit code: 1
```

Was having that problem before as well. Not sure why it can't detect the architecture. In my branched dockerfile, I updated setup.py to force cuda. That's not going to work for the main repo though.
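(For illustration: "force cuda" presumably means short-circuiting the target-device detection that raises `Unknown runtime environment` on aarch64. A hedged sketch of that kind of patch, with names assumed rather than taken from the actual branch:)

```python
# Hypothetical patch sketch -- not the actual commit from the branch.
# Rather than inferring the target device (which fails here), read it from
# the environment and default to CUDA:
import os

VLLM_TARGET_DEVICE = os.getenv("VLLM_TARGET_DEVICE", "cuda")  # force cuda
```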
@drikster80, when I met that problem, the fix was to make sure you control the entire torch installation. My solution is:

```bash
$ pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu124  # install pytorch
$ git clone https://github.com/vllm-project/vllm.git
$ cd vllm
$ python use_existing_torch.py  # remove all vllm dependency specifications of pytorch
$ pip install -r requirements-build.txt  # install the rest of the build-time dependencies
$ pip install -vvv -e . --no-build-isolation  # use --no-build-isolation to build with the current pytorch
```

Make sure you followed these steps. Ideally, you should not see any pytorch install/uninstall during the build, because your dockerfile already has pytorch installed.
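(For context, a hedged sketch of the idea behind `use_existing_torch.py`; the real script may differ in details, but the gist is stripping torch pins from the requirements files so pip never reinstalls pytorch during the build:)

```python
# Hedged sketch of the idea behind use_existing_torch.py (details assumed).
import glob

for req_file in glob.glob("requirements*.txt"):
    with open(req_file) as f:
        lines = f.readlines()
    # Drop requirement lines that pin torch/torchvision/torchaudio so the
    # already-installed pytorch is used as-is.
    kept = [line for line in lines if "torch" not in line.lower()]
    with open(req_file, "w") as f:
        f.writelines(kept)
    print(f"cleaned {req_file}")
```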
Which dockerfile did you use?
My platform is aarch64-linux on a Jetson.
I failed with: `Feature 'f16 arithemetic and compare instructions' requires .target sm_53 or higher`
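(That sm_53 error means the build is compiling f16 kernels for a compute capability below 5.3. A hedged workaround on Jetson is to pin the arch list to the board's actual capability; 8.7 for Orin is my assumption based on the device mentioned earlier in the thread:)

```bash
# Hedged sketch: pin the CUDA arch to the Jetson Orin's compute capability
# (8.7, an assumption) so ptxas never targets pre-sm_53 GPUs lacking f16 support.
export TORCH_CUDA_ARCH_LIST="8.7"
pip install -vvv -e . --no-build-isolation
```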
Followed the instructions at https://docs.vllm.ai/en/latest/getting_started/installation.html.
Here are the details inside the docker instance:
```
root@f8c2e06fbf8b:/mnt/vllm# pip install -e .
Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
Obtaining file:///mnt/vllm
  Installing build dependencies ... done
  Checking if build backend supports build_editable ... done
  Getting requirements to build editable ... error
  error: subprocess-exited-with-error

  × Getting requirements to build editable did not run successfully.
  │ exit code: 1
  ╰─> [22 lines of output]
      /tmp/pip-build-env-4xoxai9j/overlay/local/lib/python3.10/dist-packages/torch/nn/modules/transformer.py:20: UserWarning: Failed to initialize NumPy: No module named 'numpy' (Triggered internally at /pytorch/torch/csrc/utils/tensor_numpy.cpp:84.)
        device: torch.device = torch.device(torch._C._get_default_device()),  # torch.device('cpu'),
      No CUDA runtime is found, using CUDA_HOME='/usr/local/cuda'
      <string>:142: UserWarning: Unsupported CUDA/ROCM architectures ({'6.1', '7.2', '8.7', '5.2', '6.0'}) are excluded from the TORCH_CUDA_ARCH_LIST env variable (5.2 6.0 6.1 7.0 7.2 7.5 8.0 8.6 8.7 9.0+PTX). Supported CUDA/ROCM architectures are: {'7.5', '8.0', '9.0', '7.0', '8.6+PTX', '9.0+PTX', '8.6', '8.0+PTX', '8.9+PTX', '8.9', '7.0+PTX', '7.5+PTX'}.
      Traceback (most recent call last):
        File "/usr/local/lib/python3.10/dist-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 353, in <module>
          main()
        File "/usr/local/lib/python3.10/dist-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 335, in main
          json_out['return_val'] = hook(**hook_input['kwargs'])
        File "/usr/local/lib/python3.10/dist-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 132, in get_requires_for_build_editable
          return hook(config_settings)
        File "/tmp/pip-build-env-4xoxai9j/overlay/local/lib/python3.10/dist-packages/setuptools/build_meta.py", line 441, in get_requires_for_build_editable
          return self.get_requires_for_build_wheel(config_settings)
        File "/tmp/pip-build-env-4xoxai9j/overlay/local/lib/python3.10/dist-packages/setuptools/build_meta.py", line 325, in get_requires_for_build_wheel
          return self._get_build_requires(config_settings, requirements=['wheel'])
        File "/tmp/pip-build-env-4xoxai9j/overlay/local/lib/python3.10/dist-packages/setuptools/build_meta.py", line 295, in _get_build_requires
          self.run_setup()
        File "/tmp/pip-build-env-4xoxai9j/overlay/local/lib/python3.10/dist-packages/setuptools/build_meta.py", line 311, in run_setup
          exec(code, locals())
        File "<string>", line 297, in <module>
        File "<string>", line 267, in get_vllm_version
      NameError: name 'nvcc_cuda_version' is not defined. Did you mean: 'cuda_version'?
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
error: subprocess-exited-with-error

× Getting requirements to build editable did not run successfully.
│ exit code: 1
╰─> See above for output.

note: This error originates from a subprocess, and is likely not a problem with pip.

[notice] A new release of pip is available: 23.2.1 -> 23.3.1
[notice] To update, run: python -m pip install --upgrade pip
```
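(Two things stand out in this log: the `NameError` in `get_vllm_version`, which the setup.py addition discussed near the top of the thread targets, and the unsupported-architecture warning. For the latter, a hedged workaround is to restrict `TORCH_CUDA_ARCH_LIST` to the set the build itself reports as supported:)

```bash
# Hedged sketch: mirror the "Supported CUDA/ROCM architectures" from the log
# above so no unsupported archs are requested during the build.
export TORCH_CUDA_ARCH_LIST="7.0 7.5 8.0 8.6 8.9 9.0+PTX"
pip install -e .
```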