Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make _prepare_sample non blocking and pin memory of CPU input buffers #2207

Merged
merged 9 commits into from
Dec 20, 2023

Conversation

hanzhi713
Copy link
Contributor

@hanzhi713 hanzhi713 commented Dec 19, 2023

Hide _prepare_sample latency with model execution since it looks like it doesn't depend on model forward.

We can use a copy stream for h2ds in _prepare_sample, but we probably don't really need to because these h2ds are very short.

cc @Yard1

@Yard1
Copy link
Collaborator

Yard1 commented Dec 19, 2023

Looks good - unfortunately we cannot use pinned memory if we are in wsl, can we add a check for that?

vllm/worker/model_runner.py Outdated Show resolved Hide resolved
@hanzhi713
Copy link
Contributor Author

Looks good - unfortunately we cannot use pinned memory if we are in wsl, can we add a check for that?

Just added. Please check

vllm/worker/model_runner.py Outdated Show resolved Hide resolved
@WoosukKwon
Copy link
Collaborator

@hanzhi713 A quick question: do you happen to evaluate the performance impact of this change? Just wondering, because the input preparation part only takes 1~3% of the overall running time in our profiling results.

@hanzhi713
Copy link
Contributor Author

hanzhi713 commented Dec 19, 2023

@hanzhi713 A quick question: do you happen to evaluate the performance impact of this change? Just wondering, because the input preparation part only takes 1~3% of the overall running time in our profiling results.

About 1% for 70B tp4 bs=64. Just a minor optimization. Merge at your discretion 😃

Co-authored-by: Antoni Baum <antoni.baum@protonmail.com>
hanzhi713 and others added 4 commits December 19, 2023 16:40
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Copy link
Collaborator

@WoosukKwon WoosukKwon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hanzhi713 LGTM! Thanks for the PR!

@WoosukKwon WoosukKwon merged commit 31bff69 into vllm-project:main Dec 20, 2023
2 checks passed
hongxiayang pushed a commit to hongxiayang/vllm that referenced this pull request Feb 13, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants