
Conversation

@WoosukKwon
Collaborator

No description provided.

@WoosukKwon WoosukKwon requested review from LiuXiaoxuanPKU and zhuohan123 and removed request for LiuXiaoxuanPKU and zhuohan123 November 14, 2023 20:26
- vLLM’s mission is to build the fastest and easiest-to-use open-source LLM inference and serving engine. It is Apache 2.0 licensed and community-owned, with broad model and optimization support.

- vLLM matches DeepSpeed's speed in common scenarios and surpasses it when handling longer outputs.
- DeepSpeed only outperforms vLLM in scenarios with long prompts and short outputs, due to its Dynamic SplitFuse optimization. This optimization is on vLLM’s roadmap.
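For readers unfamiliar with the technique: Dynamic SplitFuse (and the chunked-prefill work on vLLM’s roadmap) caps each forward pass at a fixed token budget, splitting long prompt prefills into chunks and fusing them with in-flight decode tokens. The sketch below is only a minimal illustration of that batching idea, not DeepSpeed MII’s or vLLM’s actual scheduler; `TOKEN_BUDGET`, `Request`, and `build_batch` are hypothetical names.

```python
# Minimal sketch of the split-and-fuse batching idea (hypothetical names,
# not DeepSpeed MII's or vLLM's real scheduler).
from dataclasses import dataclass

TOKEN_BUDGET = 512  # max tokens processed per forward pass (assumed value)


@dataclass
class Request:
    request_id: str
    prompt_tokens_left: int   # prompt tokens not yet prefilled
    decoding: bool = False    # True once the request is generating output


def build_batch(requests: list[Request]) -> list[tuple[str, int]]:
    """Fill one iteration's batch: decode tokens first, then prompt chunks."""
    batch: list[tuple[str, int]] = []
    budget = TOKEN_BUDGET

    # Each decoding request contributes exactly one token this step.
    for req in requests:
        if req.decoding and budget > 0:
            batch.append((req.request_id, 1))
            budget -= 1

    # Remaining budget is spent on chunks of pending prefills, so a long
    # prompt is split across several iterations instead of stalling decodes.
    for req in requests:
        if not req.decoding and req.prompt_tokens_left > 0 and budget > 0:
            chunk = min(req.prompt_tokens_left, budget)
            batch.append((req.request_id, chunk))
            req.prompt_tokens_left -= chunk
            budget -= chunk
            if req.prompt_tokens_left == 0:
                req.decoding = True  # prefill done; start decoding next step

    return batch


if __name__ == "__main__":
    reqs = [Request("long-prompt", prompt_tokens_left=2000),
            Request("chatty", prompt_tokens_left=0, decoding=True)]
    print(build_batch(reqs))  # [('chatty', 1), ('long-prompt', 511)]
```

This is also why the benefit shows up mainly for long prompts with short outputs: that is the regime where a single monolithic prefill would otherwise dominate each iteration.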
Member

Suggested change
- DeepSpeed only outperforms vLLM in scenarios with long prompts and short outputs, due to its Dynamic SplitFuse optimization. This optimization is on vLLM’s roadmap.
- DeepSpeed only outperforms vLLM in scenarios with long prompts and short outputs with its Dynamic SplitFuse optimization. This optimization is on vLLM’s roadmap.

Collaborator Author

I think the current one is a bit better since it only has one "with"?

In our blog today, we'll elucidate the specific scenarios where the Dynamic SplitFuse technique is advantageous, noting that these cases are relatively limited.
For the majority of workloads, vLLM is faster than (or performs comparably to) DeepSpeed MII.

In this post, we will discuss the differences between the two systems, share our benchmarks, and outline future steps.
Member

Should we keep this?

Collaborator Author

I think this is redundant. In the previous sentence we already said "In this blog, ..."

WoosukKwon and others added 6 commits November 14, 2023 12:44
Co-authored-by: Zhuohan Li <zhuohan123@gmail.com>
Member

@zhuohan123 zhuohan123 left a comment

LGTM! Thanks for the fix!

@WoosukKwon WoosukKwon merged commit 600dace into main Nov 14, 2023
@WoosukKwon WoosukKwon deleted the woosuk branch November 14, 2023 21:50
simon-mo pushed a commit that referenced this pull request Jan 27, 2025
Update 2025-01-12-intro-to-llama-stack-with-vllm.md