
Add one example to run batch inference distributed on Ray #2696

Merged: 4 commits into vllm-project:main on Feb 2, 2024

Conversation

@c21 (Contributor) commented Feb 1, 2024

This PR adds an example that runs vLLM batch inference in a multi-node environment.
Ray Data is used to orchestrate the workflow:

  • Read from multiple input files on cloud storage.
  • Start vLLM engines on multiple nodes and run inference on all inputs in parallel.
  • Write the prompts and results back to cloud storage.

Tested with 58k prompts and meta-llama/Llama-2-7b-chat-hf; the whole job took 5 minutes on 10 L4 GPU nodes.
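
For reference, a minimal sketch of the pattern the example follows (Ray Data `map_batches` over a stateful class that wraps vLLM's `LLM` engine), assuming a recent Ray Data API where `map_batches` accepts `concurrency` and `num_gpus`. The bucket paths, batch size, and actor count below are illustrative placeholders, not the exact contents of the merged file:

```python
# Sketch of the Ray Data + vLLM batch-inference pattern (illustrative values).
from typing import Dict

import numpy as np
import ray
from vllm import LLM, SamplingParams

sampling_params = SamplingParams(temperature=0.8, top_p=0.95)


class LLMPredictor:
    def __init__(self):
        # One vLLM engine per Ray actor; each actor owns one GPU.
        self.llm = LLM(model="meta-llama/Llama-2-7b-chat-hf")

    def __call__(self, batch: Dict[str, np.ndarray]) -> Dict[str, list]:
        outputs = self.llm.generate(list(batch["text"]), sampling_params)
        return {
            "prompt": [o.prompt for o in outputs],
            "generated_text": [o.outputs[0].text for o in outputs],
        }


# Read prompts from cloud storage (placeholder bucket).
ds = ray.data.read_text("s3://my-bucket/prompts/")

# Fan batches out to a pool of engine actors, one per GPU node.
ds = ds.map_batches(
    LLMPredictor,
    concurrency=10,  # number of vLLM engine actors
    num_gpus=1,      # GPUs per actor (leave unset for tensor parallelism; see below)
    batch_size=32,
)

# Write prompts and generations back to cloud storage (placeholder bucket).
ds.write_parquet("s3://my-bucket/outputs/")
```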

Co-authored-by: Zhe Zhang <zhz@anyscale.com>
@zhe-thoughts commented:

This is a small and clean example. lgtm

cc @simon-mo

@simon-mo self-assigned this Feb 1, 2024
Signed-off-by: Cheng Su <scnju13@gmail.com>
@Yard1 (Collaborator) left a comment:


Let's verify this works with tensor parallelism.

(Three review threads on examples/offline_inference_distributed.py, all outdated and resolved.)
@c21 (Contributor, Author) commented Feb 2, 2024

> Let's verify this works with tensor parallelism.

@Yard1 - confirmed tensor parallelism works when testing with num_gpus=0 and tensor_parallel_size=2. The two underlying child actors (one logical GPU each) got scheduled, and both GPUs were actually used. I added a comment in the PR noting that, for tensor parallelism, num_gpus should not be set for now.
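
To make the scheduling note above concrete, a hedged sketch of the tensor-parallel variant: with tensor_parallel_size > 1, vLLM spawns its own child Ray actors to claim the GPUs, so the map_batches actor itself should not reserve one. The model and sizes are illustrative, and the class reuses `LLMPredictor` from the sketch in the PR description:

```python
# Tensor-parallel variant (illustrative). With tensor_parallel_size=2, vLLM
# launches 2 child Ray actors (one logical GPU each) per engine, so the
# map_batches actor must not reserve a GPU itself.
class TPLLMPredictor(LLMPredictor):  # __call__ inherited from the earlier sketch
    def __init__(self):
        self.llm = LLM(
            model="meta-llama/Llama-2-7b-chat-hf",
            tensor_parallel_size=2,
        )

ds = ds.map_batches(
    TPLLMPredictor,
    concurrency=5,  # 5 engines x 2 GPUs each = 10 GPUs total (illustrative)
    batch_size=32,
    # num_gpus deliberately not set: the engine's child actors claim the GPUs.
)
```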

@c21 (Contributor, Author) commented Feb 2, 2024

Hi @Yard1 and @simon-mo - I've addressed all comments, and this PR is ready for review again. Thanks.

@simon-mo merged commit 4abf633 into vllm-project:main on Feb 2, 2024
17 checks passed
@c21 deleted the ray branch on February 2, 2024
@c21 (Contributor, Author) commented Feb 2, 2024

Thank you @simon-mo and @Yard1 for the review!
