[CI/Build] fix pip cache with vllm_nccl & refactor dockerfile to build wheels #3859
Actually, this won't work because we don't want to ship the devel
image for production. We should still use the runtime base image.
The right fix is to change
FROM nvidia/cuda:12.1.0-runtime-ubuntu22.04 AS vllm-base
to
FROM nvidia/cuda:12.1.0-base-ubuntu22.04 AS vllm-base
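The "build wheels" refactor this PR title refers to can be pictured as a multi-stage build: compile in the devel image, which carries nvcc and the CUDA headers, then install only the resulting wheel into the slim image that actually ships. The sketch below is an illustration under assumptions, not the PR's actual Dockerfile. The stage name `build`, the `/workspace` and `/tmp/dist` paths, and the `pip wheel` invocation are hypothetical. Only `vllm-base` and the two image tags come from the discussion above. The `RUN --mount` lines need BuildKit (`DOCKER_BUILDKIT=1`).

```dockerfile
# Hypothetical sketch: stage name "build" and all paths are illustrative assumptions.
# Build stage: the devel image provides nvcc and CUDA headers for compiling kernels.
FROM nvidia/cuda:12.1.0-devel-ubuntu22.04 AS build
RUN apt-get update -y \
    && DEBIAN_FRONTEND=noninteractive apt-get install -y python3-pip python3-dev \
    && rm -rf /var/lib/apt/lists/*
WORKDIR /workspace
COPY . .
# Build the project (without its dependencies) into a wheel under /workspace/dist.
# Assumes the project's build requirements are declared so pip can resolve them.
RUN --mount=type=cache,target=/root/.cache/pip \
    python3 -m pip wheel --no-deps --wheel-dir dist .

# Shipped stage: the slimmer base image, with no compiler toolchain.
FROM nvidia/cuda:12.1.0-base-ubuntu22.04 AS vllm-base
RUN apt-get update -y \
    && DEBIAN_FRONTEND=noninteractive apt-get install -y python3-pip \
    && rm -rf /var/lib/apt/lists/*
# Bind-mount the wheel directory from the build stage and install from it;
# the wheel file itself is not copied into any layer of this image.
RUN --mount=type=bind,from=build,source=/workspace/dist,target=/tmp/dist \
    python3 -m pip install /tmp/dist/*.whl
```

Only the layers of the stage you build (for example via `--target`) end up in the final image, so the devel toolchain used in `build` never ships.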
@simon-mo, please take a look and check whether the Dockerfile modification looks good. The tests seem to pass with it.
@simon-mo, so we want to build the wheel using
Yes. Also, please manually test the openai image locally to ensure it has all the necessary dependencies. We want to avoid using the
Ideal case:
Please confirm locally that the openai server still works as expected.
I don't get it. I think we have tests for the API server 👀
Similar to the release process, can you build the final container locally (DOCKER_BUILDKIT=1 docker build . --target vllm-openai --tag vllm/vllm-openai --build-arg max_jobs=1) and confirm the server works (docker run --runtime nvidia --gpus all -p 8000:8000 vllm/vllm-openai)? Our CI uses the test container, not the openai server container.
Okay, confirmed it works.
[CI/Build] fix pip cache with vllm_nccl & refactor dockerfile to build wheels (vllm-project#3859)
The vllm_nccl package must be installed from its source distribution. pip is too smart: it stores the built wheel in the cache, and other CI jobs will then use that cached wheel directly, which is not what we want. We need to remove it manually.
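A minimal sketch of that manual removal as a Dockerfile step, assuming pip's cache is a BuildKit cache mount at `/root/.cache/pip` (the default for the root user, and the same target the install step would need to use) and that the cached wheel filenames start with `vllm_nccl`. The placement of this step and the exact pattern are assumptions, not taken from the PR.

```dockerfile
# Assumption: pip's cache is a BuildKit cache mount shared across builds/CI jobs.
# `pip cache remove` deletes matching wheels from pip's wheel cache, so the next
# install of vllm_nccl goes through its source distribution again.
RUN --mount=type=cache,target=/root/.cache/pip \
    python3 -m pip cache remove 'vllm_nccl*'
```

The single quotes stop the shell from expanding the glob, so pip receives the pattern itself and matches it against cached wheel filenames. `pip cache` requires pip 20.1 or newer.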