[Roadmap] vLLM Roadmap Q2 2025 #15735
Comments
Great! Thanks for the work. Following up, here is the Q2 roadmap of vllm-ascend: vllm-project/vllm-ascend#448. Could you please add the link to the Hardware or Ecosystem section? Thanks!
For v1 you should also consider the security side. I suspect a lot of people use vLLM via the Docker images, some of which are based on Ubuntu 20.04 and some on 22.04.
Hi! With the switch to the new engine, I am very interested in how AMD ROCm support will fare, in particular Navi 3 (RDNA 3). I have been waiting for a fix for the codestral-mamba model for almost two months; the model itself was released back in 2024, but it seems no one is fixing the bug that was introduced.
It would be great to see fp8 support for sm120 (Blackwell devices) now that CUTLASS has added support for sm120 and sm120a as of v3.9. This would let Blackwell users best take advantage of native int4 and int8 support for extra speed; currently only sm100 and earlier are supported.
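For context, a minimal sketch of the kind of workload this would unblock, assuming the existing quantization="fp8" load-time path in vLLM; the model name is only illustrative, and whether this runs on sm120 depends on the CUTLASS kernel support requested above.

```python
# Hedged sketch: requesting fp8 quantization at load time in vLLM.
# quantization="fp8" uses the dynamic-fp8 path; sm120 (Blackwell) support
# would depend on the CUTLASS v3.9 kernels the comment above asks for.
from vllm import LLM, SamplingParams

llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct", quantization="fp8")
outputs = llm.generate(
    ["Summarize the vLLM Q2 2025 roadmap in one sentence."],
    SamplingParams(max_tokens=32),
)
print(outputs[0].outputs[0].text)
```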
Does "Redesigned spec decode" mean redesigning the implementation of v0? What are the shortcomings of v0's implementation? |
On "Further reduce scheduler overhead": we tested V1 and found the effect quite good. Where else can the scheduler be optimized?
I don't understand "API Server Scale-out"; could you explain it further?
It is quite cool.
This page is accessible via roadmap.vllm.ai
This is a living document! For each item here, we intend to link the corresponding RFC as well as the discussion channel in the vLLM Slack.
Core Themes
Path to vLLM v1.0.0
We want to fully remove the V0 engine and clean up the codebase by removing unpopular and unsupported features. The v1.0.0 release of vLLM will be performant, easy to maintain, modular, and extensible, while preserving backward compatibility.
Cluster Scale Serving
As models grow in size, serving them with multi-node scale-out and disaggregated prefill and decode becomes the way to go; a conceptual sketch of the disaggregation idea follows below. We are fully committed to making vLLM the best engine for cluster-scale serving.
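To make the disaggregation idea concrete, here is a purely conceptual Python sketch with hypothetical helpers (not the vLLM API): a prefill stage runs the prompt once and produces its KV cache, which is then shipped to a separate decode stage that streams tokens. The KV-cache transfer and its scheduling are what cluster-scale serving has to get right.

```python
# Conceptual sketch of prefill/decode disaggregation. All names are hypothetical
# stand-ins, not vLLM interfaces.
from dataclasses import dataclass


@dataclass
class KVCache:
    """Stand-in for the prompt's key/value tensors produced during prefill."""
    prompt: str
    num_tokens: int


def prefill(prompt: str) -> KVCache:
    # Prefill node: process the full prompt once and materialize the KV cache.
    return KVCache(prompt=prompt, num_tokens=len(prompt.split()))


def decode(kv: KVCache, max_new_tokens: int) -> str:
    # Decode node: consume the transferred KV cache and generate new tokens.
    return f"<{max_new_tokens} tokens generated after {kv.num_tokens} prefilled>"


if __name__ == "__main__":
    kv = prefill("Explain disaggregated serving in one sentence.")
    # In a real deployment the KV cache crosses the network here; that transfer
    # and its scheduling are the hard parts of cluster-scale serving.
    print(decode(kv, max_new_tokens=32))
```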
vLLM for Production
vLLM is designed for production. We will continue to enhance stability and tune the systems around vLLM for optimal performance.
Features
Models
Use Case
Hardware
Optimizations
Community
vLLM Ecosystem
Hardware Plugins
AIBrix: v0.3.0 roadmap aibrix#698
Production Stack: [Roadmap] vLLM Production Stack roadmap for 2025 Q2 production-stack#300
Ray LLM: [llm] Roadmap for Data and Serve LLM APIs ray-project/ray#51313
LLM Compressor
GuideLLM
Dynamo
Prioritized Support for RLHF Systems: veRL, OpenRLHF, TRL, OpenInstruct, Fairseq2, ...
If an item you want is not on the roadmap, your suggestions and contributions are very welcome! Please feel free to comment in this thread, open a feature request, or create an RFC.
Historical Roadmap: #11862, #9006, #5805, #3861, #2681, #244