
[Feature]: automatically select distributed inference backend #4955

Closed
youkaichao opened this issue May 21, 2024 · 3 comments · Fixed by #5230

@youkaichao (Member)

🚀 The feature, motivation and pitch

Ray is overkill for the single-GPU case, but it is currently the only choice for multi-node inference.

We can add an "auto" backend that checks the world size against the number of GPUs available on the node: if the world size fits within the node, we can use multiprocessing; otherwise we can use Ray. A sketch of the check is below.
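
A minimal sketch of the proposed check (the helper name is hypothetical; the real decision would live in vLLM's engine configuration):

```python
import torch

def choose_distributed_backend(world_size: int) -> str:
    """Hypothetical helper: pick "mp" when all workers fit on this
    node's visible GPUs, otherwise fall back to "ray"."""
    local_gpus = torch.cuda.device_count()
    return "mp" if world_size <= local_gpus else "ray"
```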

This will help performance a lot.

@njhill do you have any bandwidth for this?

Alternatives

No response

Additional context

No response

@Yard1 (Collaborator) commented May 21, 2024

> This will help performance a lot.

Given that we have almost completely replaced the Ray communication layer for data (and soon for control), I doubt there will be a huge performance difference between multiprocessing and Ray at this point. But it would be good to benchmark.
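
For example, a rough timing harness along these lines (a sketch assuming a vLLM build that exposes the `distributed_executor_backend` engine argument; the model and batch size are arbitrary, and each backend should be timed in a fresh process):

```python
import sys
import time
from vllm import LLM, SamplingParams

# Usage: python bench_backend.py mp|ray  (run once per backend)
backend = sys.argv[1] if len(sys.argv) > 1 else "mp"

llm = LLM(model="facebook/opt-125m",       # arbitrary small model
          tensor_parallel_size=2,          # arbitrary multi-GPU setting
          distributed_executor_backend=backend)

prompts = ["Hello, my name is"] * 64
params = SamplingParams(max_tokens=128)

start = time.perf_counter()
llm.generate(prompts, params)
print(f"{backend}: {time.perf_counter() - start:.2f}s")
```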

@njhill (Member) commented May 22, 2024

@youkaichao yep, I had been intending to do this as soon as I had a chance; should be able to tomorrow.

njhill added a commit to njhill/vllm that referenced this issue Jun 3, 2024
Also update docs to reflect support for the multiprocessing distributed executor.

Resolves vllm-project#4955
Resolves vllm-project#5221
@njhill (Member) commented Jun 3, 2024

@youkaichao I have opened #5230, PTAL.
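
For reference, a usage sketch of the behavior this issue asks for (hedged: the override argument name follows the PR-era engine args; the model and parallel size are arbitrary):

```python
from vllm import LLM

# Backend left unset: vLLM can pick multiprocessing when the
# tensor-parallel workers fit on this node, and Ray for multi-node.
llm = LLM(model="facebook/opt-125m", tensor_parallel_size=2)

# To force a specific executor instead:
#   LLM(..., distributed_executor_backend="ray")  # or "mp"
```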

njhill self-assigned this Jun 3, 2024