
[Minor] Optimize cuda graph memory usage #2437

Merged 1 commit into vllm-project:main on Jan 14, 2024
Conversation

@esmeetu (Collaborator) commented on Jan 14, 2024:

This PR reduces CUDA graph capture memory usage based on the max_num_seqs parameter. For personal and test use cases, there is no need to capture and cache CUDA graphs for every batch size up to the maximum ([1, 2, ..., 256]), so we can inform users that tuning max_num_seqs down saves memory (see the sketch below).
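
A minimal sketch of the idea, assuming a capture list that covers batch sizes up to 256; the names `_BATCH_SIZES_TO_CAPTURE` and `batch_sizes_to_capture` here are illustrative, not the exact internals of this PR:

```python
# Sketch of the optimization (illustrative, not the exact vLLM code):
# only capture CUDA graphs for batch sizes the scheduler can actually
# produce, i.e. those bounded by max_num_seqs.

_BATCH_SIZES_TO_CAPTURE = [1, 2, 4] + [8 * i for i in range(1, 33)]  # 1..256

def batch_sizes_to_capture(max_num_seqs: int) -> list[int]:
    """Trim the capture list to the configured maximum batch size."""
    # Round max_num_seqs up to the next captured size so the largest
    # schedulable batch still has a graph to replay.
    graph_batch_size = next(
        (bs for bs in _BATCH_SIZES_TO_CAPTURE if bs >= max_num_seqs),
        _BATCH_SIZES_TO_CAPTURE[-1],
    )
    return [bs for bs in _BATCH_SIZES_TO_CAPTURE if bs <= graph_batch_size]

print(batch_sizes_to_capture(8))    # [1, 2, 4, 8]: far fewer graphs cached
print(batch_sizes_to_capture(256))  # full list: default behavior unchanged
```

In practice, launching with a small value (e.g. `LLM(model=..., max_num_seqs=8)` or `--max-num-seqs 8` on the server) means only a handful of graphs get captured instead of the full set, which is where the memory saving comes from.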

@Yard1 (Collaborator) left a comment:


LGTM, thanks!

@Yard1 Yard1 merged commit 9f659bf into vllm-project:main Jan 14, 2024
2 of 4 checks passed
hongxiayang pushed a commit to hongxiayang/vllm that referenced this pull request Jan 18, 2024
@esmeetu esmeetu deleted the optimize-graph-memory branch February 3, 2024 04:13
hongxiayang pushed a commit to hongxiayang/vllm that referenced this pull request Feb 13, 2024