Polish DeepSpeed blog post #1
Conversation
- vLLM’s mission is to build the fastest and easiest-to-use open-source LLM inference and serving engine. It is Apache 2.0 and community-owned with broad model and optimization support.
- vLLM matches DeepSpeed's speed in common scenarios and surpasses it when handling longer outputs.
- DeepSpeed only outperforms vLLM in scenarios with long prompts and short outputs, due to its Dynamic SplitFuse optimization. This optimization is on vLLM’s roadmap.
Suggested change:
- DeepSpeed only outperforms vLLM in scenarios with long prompts and short outputs, due to its Dynamic SplitFuse optimization. This optimization is on vLLM’s roadmap.
+ DeepSpeed only outperforms vLLM in scenarios with long prompts and short outputs with its Dynamic SplitFuse optimization. This optimization is on vLLM’s roadmap.
I think the current one is a bit better since it only has one "with"?
In our blog today, we'll elucidate the specific scenarios where the Dynamic SplitFuse technique is advantageous, noting that these cases are relatively limited.
For the majority of workloads, vLLM is faster than (or performs comparably to) DeepSpeed MII.

In this post, we will discuss the difference between the two systems, share our benchmarks, and discuss future steps.
Should we keep this?
I think this is redundant. In the previous sentence we already said "In this blog, ..."
Co-authored-by: Zhuohan Li <zhuohan123@gmail.com>
zhuohan123 left a comment
LGTM! Thanks for the fix!