[Core][Distributed] enable multiple tp group #4512

Merged: 8 commits, May 2, 2024

youkaichao commented

Improve the code to support multiple tensor-parallel (TP) groups. This is part of an ongoing effort to eventually support pipeline parallelism (#4412).

cc @simon-mo: once we have 4-GPU CI machines ready, the tests in this PR can be merged. I tested it locally, and it works.
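For illustration only, here is a minimal sketch of how ranks on a 4-GPU machine could be split into multiple TP groups alongside pipeline-parallel groups. It assumes a contiguous-rank layout for TP groups and a strided layout for pipeline groups; the function names `build_tp_groups` and `build_pp_groups` are hypothetical and are not taken from this PR's code.

```python
from typing import List


def build_tp_groups(world_size: int, tp_size: int) -> List[List[int]]:
    """Partition ranks into tensor-parallel groups of contiguous ranks.

    Hypothetical helper: the contiguous layout is an assumption for this sketch.
    """
    if world_size % tp_size != 0:
        raise ValueError("world_size must be divisible by tp_size")
    return [
        list(range(start, start + tp_size))
        for start in range(0, world_size, tp_size)
    ]


def build_pp_groups(world_size: int, pp_size: int) -> List[List[int]]:
    """Partition ranks into pipeline-parallel groups of strided ranks.

    Hypothetical helper: each group takes one rank from every pipeline stage.
    """
    if world_size % pp_size != 0:
        raise ValueError("world_size must be divisible by pp_size")
    num_groups = world_size // pp_size
    return [list(range(i, world_size, num_groups)) for i in range(num_groups)]


# 4 GPUs with tensor-parallel size 2 and pipeline-parallel size 2.
tp_groups = build_tp_groups(world_size=4, tp_size=2)
pp_groups = build_pp_groups(world_size=4, pp_size=2)
print("TP groups:", tp_groups)
print("PP groups:", pp_groups)
```

Under these assumptions, a 4-GPU machine is the smallest setup with more than one TP group of size 2, which is presumably why the tests need 4-GPU CI machines.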

@youkaichao youkaichao requested a review from zhuohan123 May 1, 2024 03:24

simon-mo commented May 1, 2024

adding...


simon-mo commented May 1, 2024

ok node is up, try setting num_gpus to 4 in the pipeline yaml?

youkaichao commented

> ok node is up, try setting num_gpus to 4 in the pipeline yaml?

Should only this test run on the 4-GPU machine, or should all distributed tests run on it?

youkaichao commented

TODO: we can also use the 4-GPU node to test pipeline parallelism and a multi-node setup by running two Docker containers with 2 GPUs each.

Co-authored-by: Zhuohan Li <zhuohan123@gmail.com>
@youkaichao youkaichao enabled auto-merge (squash) May 1, 2024 22:22
@youkaichao youkaichao merged commit 2a85f93 into vllm-project:main May 2, 2024
49 checks passed
@youkaichao youkaichao deleted the multiple_tp branch May 2, 2024 04:34
robertgshaw2-neuralmagic pushed a commit to neuralmagic/nm-vllm that referenced this pull request May 6, 2024
z103cb pushed a commit to z103cb/opendatahub_vllm that referenced this pull request May 7, 2024
dtrifiro pushed a commit to opendatahub-io/vllm that referenced this pull request May 7, 2024
mawong-amd pushed a commit to ROCm/vllm that referenced this pull request Jun 3, 2024