[Core][Distributed] support both cpu and device tensor in broadcast tensor dict #4660

Merged: 3 commits, May 8, 2024

youkaichao
Member

Prior to this PR, broadcast_tensor_dict worked only with CUDA tensors.

This PR makes broadcast_tensor_dict handle both CUDA tensors and CPU tensors.

This is useful when some metadata lives in CPU tensors, e.g. blocks_to_swap_in and blocks_to_swap_out, to be introduced in #4659.

Note: blocks_to_copy is still a CUDA tensor, because the source and target of the copy both live on the GPU and we have a dedicated copy kernel for it. blocks_to_swap_in and blocks_to_swap_out have to be CPU tensors, because they are kernel launch arguments.
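
To make the CPU/device split concrete, here is a rough sketch of the idea, not the code in this PR. It only shows how the entries of a dict might be partitioned by where they live before each group is sent over a suitable channel. FakeTensor, split_by_device, and the num_seqs key are hypothetical names made up for illustration, and the stand-in class is used so the snippet runs without PyTorch or a process group.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Tuple


@dataclass
class FakeTensor:
    """Hypothetical stand-in for a tensor; only the device matters here."""
    device: str  # "cpu" or "cuda"
    data: List[int]


def split_by_device(
    tensor_dict: Dict[str, Any],
) -> Tuple[Dict[str, Any], Dict[str, FakeTensor], Dict[str, FakeTensor]]:
    """Partition a dict into plain metadata, CPU tensors, and device tensors."""
    metadata: Dict[str, Any] = {}
    cpu_tensors: Dict[str, FakeTensor] = {}
    device_tensors: Dict[str, FakeTensor] = {}
    for key, value in tensor_dict.items():
        if not isinstance(value, FakeTensor):
            # Non-tensor values travel as ordinary Python objects.
            metadata[key] = value
        elif value.device == "cpu":
            cpu_tensors[key] = value
        else:
            device_tensors[key] = value
    return metadata, cpu_tensors, device_tensors


example = {
    "blocks_to_swap_in": FakeTensor("cpu", [0, 4]),
    "blocks_to_swap_out": FakeTensor("cpu", [7, 2]),
    "blocks_to_copy": FakeTensor("cuda", [1, 3]),
    "num_seqs": 8,  # hypothetical non-tensor entry
}
metadata, cpu_tensors, device_tensors = split_by_device(example)
print(sorted(cpu_tensors))
print(sorted(device_tensors))
print(metadata)
```

With real tensors, the two tensor groups would presumably each be broadcast through a communication backend that supports that device type, while the plain metadata is sent as Python objects.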

@youkaichao youkaichao requested a review from zhuohan123 May 7, 2024 22:18
Member

@zhuohan123 zhuohan123 left a comment

LGTM!

@youkaichao youkaichao merged commit cc466a3 into vllm-project:main May 8, 2024
55 checks passed
@youkaichao youkaichao deleted the split_broadcast branch May 8, 2024 02:36
z103cb pushed a commit to z103cb/opendatahub_vllm that referenced this pull request May 8, 2024
[Core][Distributed] support both cpu and device tensor in broadcast tensor dict (vllm-project#4660)
robertgshaw2-neuralmagic pushed a commit to neuralmagic/nm-vllm that referenced this pull request May 19, 2024
dtrifiro pushed a commit to dtrifiro/vllm that referenced this pull request May 21, 2024
Temirulan pushed a commit to Temirulan/vllm-whisper that referenced this pull request Sep 6, 2024

3 participants