
Add one example to run batch inference distributed on Ray #2696

Merged: 4 commits into vllm-project:main on Feb 2, 2024

Conversation

@c21 (Contributor) commented Feb 1, 2024

This PR adds an example that runs vLLM batch inference in a multi-node environment.
Ray Data is used to orchestrate the workflow:

  • Read from multiple input files on cloud storage.
  • Start vLLM engines on multiple nodes and run inference on all inputs in parallel.
  • Write the prompts and results back to cloud storage.

Tested with 58k prompts and meta-llama/Llama-2-7b-chat-hf; the whole job took 5 minutes on 10 L4 GPU nodes.
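
For reference, a minimal sketch of the pattern the example follows (Ray Data `map_batches` over a stateful class that wraps vLLM's `LLM` engine), assuming a recent Ray Data API where `map_batches` accepts `concurrency` and `num_gpus`. The bucket paths, batch size, and actor count below are illustrative placeholders, not the exact contents of the merged file:

```python
# Sketch of the Ray Data + vLLM batch-inference pattern (illustrative values).
from typing import Dict

import numpy as np
import ray
from vllm import LLM, SamplingParams

sampling_params = SamplingParams(temperature=0.8, top_p=0.95)


class LLMPredictor:
    def __init__(self):
        # One vLLM engine per Ray actor; each actor owns one GPU.
        self.llm = LLM(model="meta-llama/Llama-2-7b-chat-hf")

    def __call__(self, batch: Dict[str, np.ndarray]) -> Dict[str, list]:
        outputs = self.llm.generate(list(batch["text"]), sampling_params)
        return {
            "prompt": [o.prompt for o in outputs],
            "generated_text": [o.outputs[0].text for o in outputs],
        }


# Read prompts from cloud storage (placeholder bucket).
ds = ray.data.read_text("s3://my-bucket/prompts/")

# Fan batches out to a pool of engine actors, one per GPU node.
ds = ds.map_batches(
    LLMPredictor,
    concurrency=10,  # number of vLLM engine actors
    num_gpus=1,      # GPUs per actor (leave unset for tensor parallelism; see below)
    batch_size=32,
)

# Write prompts and generations back to cloud storage (placeholder bucket).
ds.write_parquet("s3://my-bucket/outputs/")
```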

Co-authored-by: Zhe Zhang <zhz@anyscale.com>
@zhe-thoughts commented:

This is a small and clean example. lgtm

cc @simon-mo

@simon-mo self-assigned this Feb 1, 2024
Signed-off-by: Cheng Su <scnju13@gmail.com>
@Yard1 (Collaborator) left a comment:


Let's verify this works with tensor parallelism.

(Three review threads on examples/offline_inference_distributed.py, all outdated and resolved.)
@c21 (Contributor, Author) commented Feb 2, 2024

> Let's verify this works with tensor parallelism.

@Yard1 - confirmed tensor parallelism works when testing with num_gpus=0 and tensor_parallel_size=2. The two underlying child actors (one logical GPU each) got scheduled, and both GPUs were actually used. I added a comment in the PR noting that, for tensor parallelism, num_gpus should not be set for now.
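
To make the scheduling note above concrete, a hedged sketch of the tensor-parallel variant: with tensor_parallel_size > 1, vLLM spawns its own child Ray actors to claim the GPUs, so the map_batches actor itself should not reserve one. The model and sizes are illustrative, and the class reuses `LLMPredictor` from the sketch in the PR description:

```python
# Tensor-parallel variant (illustrative). With tensor_parallel_size=2, vLLM
# launches 2 child Ray actors (one logical GPU each) per engine, so the
# map_batches actor must not reserve a GPU itself.
class TPLLMPredictor(LLMPredictor):  # __call__ inherited from the earlier sketch
    def __init__(self):
        self.llm = LLM(
            model="meta-llama/Llama-2-7b-chat-hf",
            tensor_parallel_size=2,
        )

ds = ds.map_batches(
    TPLLMPredictor,
    concurrency=5,  # 5 engines x 2 GPUs each = 10 GPUs total (illustrative)
    batch_size=32,
    # num_gpus deliberately not set: the engine's child actors claim the GPUs.
)
```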

@c21 (Contributor, Author) commented Feb 2, 2024

Hi @Yard1 and @simon-mo - I've addressed all comments, and this PR is ready for review again. Thanks.

@simon-mo merged commit 4abf633 into vllm-project:main on Feb 2, 2024
17 checks passed
@c21 deleted the ray branch on February 2, 2024
@c21 (Contributor, Author) commented Feb 2, 2024

Thank you @simon-mo and @Yard1 for the review!
