[Roadmap] vLLM Roadmap Q4 2024 #9006
Comments
Support for KV cache compression
Do we have plans to support #5540? We have a production-level use case and would really appreciate it if someone could look into it for Q4 onwards.
Hi, is there a follow-up issue or Slack channel for the "KV cache offload to CPU and disk" task? Our team has previously explored some KV cache offload work based on vLLM, and we'd be happy to join any relevant discussion or contribute to the development if there's a chance. Personally, I'm also looking forward to learning more about the "More control in prefix caching, and scheduler policies" part 😊.
@simon-mo Hi, regarding the topic "KV cache offload to CPU and disk": I previously implemented a version that stores the KV cache in a local file (#8018). I also added the relevant abstractions, so other storage media can be supported. Is there a Slack channel for this? We could discuss the specific design there. I am also quite interested in this feature.
This page is accessible via roadmap.vllm.ai
Themes.
As before, we categorized our roadmap into 6 broad themes: broad model support, wide hardware coverage, state-of-the-art performance optimization, production-level engine, strong OSS community, and extensible architectures. As we are seeing more
Broad Model Support
Help wanted:
Hardware Support
Help wanted:
Performance Optimizations
Help wanted:
Production Features
Help wanted
OSS Community
Help wanted
Extensible Architecture
If any item you wanted is not on the roadmap, your suggestions and contributions are still welcome! Please feel free to comment in this thread, open a feature request, or create an RFC.
Historical Roadmap: #5805, #3861, #2681, #244