
[Minor] Optimize cuda graph memory usage #2437

Merged 1 commit into vllm-project:main on Jan 14, 2024
Conversation

@esmeetu (Collaborator) commented on Jan 14, 2024:

This PR reduces CUDA graph capture memory usage based on the max_num_seqs parameter. For personal and test use cases, there is no need to capture and cache CUDA graphs for every batch size up to the maximum ([1, 2, ..., 256]), so we can inform users that tuning max_num_seqs down saves memory (see the sketch below).
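
A minimal sketch of the idea, assuming a capture list that covers batch sizes up to 256; the names `_BATCH_SIZES_TO_CAPTURE` and `batch_sizes_to_capture` here are illustrative, not the exact internals of this PR:

```python
# Sketch of the optimization (illustrative, not the exact vLLM code):
# only capture CUDA graphs for batch sizes the scheduler can actually
# produce, i.e. those bounded by max_num_seqs.

_BATCH_SIZES_TO_CAPTURE = [1, 2, 4] + [8 * i for i in range(1, 33)]  # 1..256

def batch_sizes_to_capture(max_num_seqs: int) -> list[int]:
    """Trim the capture list to the configured maximum batch size."""
    # Round max_num_seqs up to the next captured size so the largest
    # schedulable batch still has a graph to replay.
    graph_batch_size = next(
        (bs for bs in _BATCH_SIZES_TO_CAPTURE if bs >= max_num_seqs),
        _BATCH_SIZES_TO_CAPTURE[-1],
    )
    return [bs for bs in _BATCH_SIZES_TO_CAPTURE if bs <= graph_batch_size]

print(batch_sizes_to_capture(8))    # [1, 2, 4, 8]: far fewer graphs cached
print(batch_sizes_to_capture(256))  # full list: default behavior unchanged
```

In practice, launching with a small value (e.g. `LLM(model=..., max_num_seqs=8)` or `--max-num-seqs 8` on the server) means only a handful of graphs get captured instead of the full set, which is where the memory saving comes from.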

@Yard1 (Collaborator) left a comment:


LGTM, thanks!

@Yard1 Yard1 merged commit 9f659bf into vllm-project:main Jan 14, 2024
2 of 4 checks passed
hongxiayang pushed a commit to hongxiayang/vllm that referenced this pull request Jan 18, 2024
@esmeetu esmeetu deleted the optimize-graph-memory branch February 3, 2024 04:13
hongxiayang pushed a commit to hongxiayang/vllm that referenced this pull request Feb 13, 2024