
[CI/Build] fix pip cache with vllm_nccl & refactor dockerfile to build wheels #3859

Merged: 13 commits merged into main on Apr 5, 2024

Conversation

youkaichao (Member):

The vllm_nccl package must be installed from its source distribution.
pip is smart enough to store the locally built wheel in its cache, and other CI jobs
would then install that cached wheel directly, which is not what we want,
so we need to remove it from the cache manually.
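A minimal sketch of this kind of workaround (the package name comes from the description above; the exact pip flags and cache handling used in this PR are assumptions, not taken from its diff):

# Force installation from the source distribution instead of a published wheel.
pip install vllm_nccl --no-binary vllm_nccl
# pip still caches the wheel it builds from the sdist, so a later CI job would
# silently reuse it; purge that cache entry by hand (requires pip >= 20.1).
pip cache remove "vllm_nccl*"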

youkaichao requested a review from simon-mo, April 4, 2024 21:23
simon-mo (Collaborator) left a comment:


Actually this won't work, because we don't want to ship the devel image for production. We should still use the runtime base image. The right fix should be to change

FROM nvidia/cuda:12.1.0-runtime-ubuntu22.04 AS vllm-base

to

FROM nvidia/cuda:12.1.0-base-ubuntu22.04 AS vllm-base

youkaichao (Member, Author):

@simon-mo please take a look and see if the Dockerfile modification is good. The tests seem to be OK with the modification.

youkaichao (Member, Author):

@simon-mo so we want to build the wheel using the dev image, and use that wheel for the rest of the images, right?

simon-mo (Collaborator) commented Apr 4, 2024:

Yes. Also, manually test the openai image locally to ensure it has all the necessary dependencies.

We want to avoid using the devel image for testing and production in general

youkaichao (Member, Author):

Ideal case:

nvidia/cuda:12.1.0-devel-ubuntu22.04 --> dev (install requirements) --> build (for building vllm wheels) and flash-attn-builder (for building flash-attn wheels)

nvidia/cuda:12.1.0-base-ubuntu22.04 --> vllm-base (install vllm from the wheel) --> test (for testing, only needs to copy the tests folder) and vllm-openai (for the openai server, only needs to install accelerate, hf_transfer, modelscope and start the server)
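Roughly, as a multi-stage Dockerfile sketch (a sketch only: stage contents, paths, and the python/pip setup are illustrative and not copied from this PR's Dockerfile; the flash-attn-builder stage is omitted for brevity):

# --- build side, based on the devel image ---
FROM nvidia/cuda:12.1.0-devel-ubuntu22.04 AS dev
RUN apt-get update && apt-get install -y python3-pip
COPY requirements.txt /workspace/
RUN pip install -r /workspace/requirements.txt

FROM dev AS build
COPY . /workspace/vllm
RUN cd /workspace/vllm && python3 setup.py bdist_wheel

# --- runtime side, based on the slim base image ---
FROM nvidia/cuda:12.1.0-base-ubuntu22.04 AS vllm-base
RUN apt-get update && apt-get install -y python3-pip
# install vllm from the wheel built above instead of building again
COPY --from=build /workspace/vllm/dist/*.whl /tmp/
RUN pip install /tmp/*.whl

FROM vllm-base AS test
# only needs the tests folder on top of the installed wheel
COPY tests/ /workspace/tests/

FROM vllm-base AS vllm-openai
RUN pip install accelerate hf_transfer modelscope
ENTRYPOINT ["python3", "-m", "vllm.entrypoints.openai.api_server"]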

youkaichao changed the title from "[CI/Build] fix pip cache with vllm_nccl" to "[CI/Build] fix pip cache with vllm_nccl & refactor dockerfile to build wheels" on Apr 4, 2024
youkaichao requested a review from simon-mo, April 5, 2024 00:05
simon-mo (Collaborator) left a comment:


Please confirm locally that the openai server still works as expected.

youkaichao (Member, Author):

> Please confirm locally that the openai server still works as expected.

I don't get it. I think we have tests for the api server 👀

simon-mo (Collaborator) commented Apr 5, 2024:

Similar to the release process, can you build the final container locally:

DOCKER_BUILDKIT=1 docker build . --target vllm-openai --tag vllm/vllm-openai --build-arg max_jobs=1

and confirm the server works:

docker run --runtime nvidia --gpus all -p 8000:8000 vllm/vllm-openai

Our CI uses the test container but not the openai server container.
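For reference, a quick smoke test once the container is up (an assumption on my part, not from this thread: it relies on the default port mapping above and the OpenAI-compatible /v1/models endpoint):

# list the models the running server exposes; a JSON response means the server is up
curl http://localhost:8000/v1/models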

youkaichao (Member, Author):

Okay, confirmed it works.

youkaichao merged commit d03d64f into main on Apr 5, 2024
35 checks passed
youkaichao deleted the fix_ci_nccl branch, April 5, 2024 04:53
z103cb pushed a commit to z103cb/opendatahub_vllm that referenced this pull request Apr 22, 2024
[CI/Build] fix pip cache with vllm_nccl & refactor dockerfile to build wheels (vllm-project#3859)