[Roadmap] vLLM Roadmap Q3 2024 #5805
Comments
Does vLLM need multi-model support similar to what FastChat provides, or something else?
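For context, a FastChat-style setup puts a router in front of one worker per model. A minimal sketch of what that could look like in front of vLLM's OpenAI-compatible servers follows; the ports and model names are hypothetical, and it assumes one vLLM server process per model, since a single vLLM instance serves one model:

```python
# Minimal sketch of FastChat-style multi-model routing in front of vLLM.
# Assumes two OpenAI-compatible vLLM servers are already running on the
# ports below (hypothetical setup; one model per vLLM process).
import requests

MODEL_BACKENDS = {
    "meta-llama/Meta-Llama-3-8B-Instruct": "http://localhost:8000/v1/completions",
    "mistralai/Mistral-7B-Instruct-v0.2": "http://localhost:8001/v1/completions",
}

def route_completion(model: str, prompt: str, max_tokens: int = 64) -> str:
    # Pick the backend for the requested model and forward an OpenAI-style request.
    url = MODEL_BACKENDS[model]
    resp = requests.post(url, json={
        "model": model,
        "prompt": prompt,
        "max_tokens": max_tokens,
    })
    resp.raise_for_status()
    return resp.json()["choices"][0]["text"]

if __name__ == "__main__":
    print(route_completion("meta-llama/Meta-Llama-3-8B-Instruct", "Hello, vLLM!"))
```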
#2809 Hello, how about this one?
Hi, the issues mentioned in #5036 should also be taken into account.
Will vLLM make more use of Triton to optimize operator performance in the future, or will it lean more on the torch.compile mechanism? Are there any plans for this?
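For reference, the torch.compile mechanism mentioned above can already emit Triton kernels for fused ops on GPU via the Inductor backend. A minimal, self-contained sketch is below; the RMSNorm op is illustrative only, not vLLM's implementation:

```python
# Minimal sketch of torch.compile generating a fused kernel (Triton on GPU via
# the Inductor backend). The RMSNorm op here is illustrative, not vLLM code.
import torch

def rms_norm(x: torch.Tensor, weight: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    # Normalize by root-mean-square, then scale: a common LLM building block.
    variance = x.pow(2).mean(-1, keepdim=True)
    return x * torch.rsqrt(variance + eps) * weight

compiled_rms_norm = torch.compile(rms_norm)

if __name__ == "__main__":
    device = "cuda" if torch.cuda.is_available() else "cpu"
    x = torch.randn(4, 4096, device=device)
    w = torch.ones(4096, device=device)
    print(compiled_rms_norm(x, w).shape)  # torch.Size([4, 4096])
```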
Hi! Is there, or will there be, support for the OpenAI Batch API?
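For context, the OpenAI Batch API consumes a JSONL file in which each line is one request. A minimal sketch of building such an input file is below; the model name and file name are illustrative:

```python
# Minimal sketch of the OpenAI Batch API input format being asked about:
# a JSONL file with one request per line. Model and file names are illustrative.
import json

requests_batch = [
    {
        "custom_id": f"request-{i}",
        "method": "POST",
        "url": "/v1/chat/completions",
        "body": {
            "model": "meta-llama/Meta-Llama-3-8B-Instruct",
            "messages": [{"role": "user", "content": prompt}],
            "max_tokens": 64,
        },
    }
    for i, prompt in enumerate(["Hello!", "What is vLLM?"])
]

with open("batch_input.jsonl", "w") as f:
    for req in requests_batch:
        f.write(json.dumps(req) + "\n")
```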
I am working on Whisper; my fork is at https://github.com/mesolitica/vllm-whisper. The frontend should eventually be compatible with the OpenAI API and able to stream output tokens. There are a few hiccups, and I am still figuring things out based on the T5 branch.

I am able to load the model and run inference (https://github.com/mesolitica/vllm-whisper/blob/main/examples/whisper_example.py), but the output is still trash; it might be a bug related to the weights or the attention. Still debugging.
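One way to narrow down whether the weights or the attention are at fault is to compare the fork's first-step decoder logits against the HuggingFace reference implementation. Below is a minimal sketch, assuming a 16 kHz mono audio array and using openai/whisper-small as an illustrative checkpoint; the fork's own API is not shown here:

```python
# Minimal debugging sketch (not part of the fork): compute first-step decoder
# logits from the reference HuggingFace Whisper, to compare against the fork's
# output and localize weight-loading vs. attention bugs. Checkpoint is illustrative.
import torch
from transformers import WhisperForConditionalGeneration, WhisperProcessor

processor = WhisperProcessor.from_pretrained("openai/whisper-small")
reference = WhisperForConditionalGeneration.from_pretrained("openai/whisper-small")
reference.eval()

def reference_first_step_logits(audio_array, sampling_rate: int = 16000) -> torch.Tensor:
    # Encode a 16 kHz mono waveform and run one decoder step from the start token.
    inputs = processor(audio_array, sampling_rate=sampling_rate, return_tensors="pt")
    decoder_input_ids = torch.tensor([[reference.config.decoder_start_token_id]])
    with torch.no_grad():
        out = reference(input_features=inputs.input_features,
                        decoder_input_ids=decoder_input_ids)
    return out.logits[0, -1]  # compare this vector with the fork's first-step logits
```

If the logits diverge immediately, the weight mapping is the first suspect; if they only drift over longer sequences, the attention (for example, cache handling) is the more likely culprit.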
This document includes the features in vLLM's roadmap for Q3 2024. Please feel free to discuss and contribute, as this roadmap is shaped by the vLLM community.
Themes.
As before, we have categorized our roadmap into six broad themes:

- Broad Model Support (help wanted)
- Hardware Support
- Performance Optimizations
- Production Features (help wanted)
- OSS Community (help wanted)
- Extensible Architecture
If anything you want is not on the roadmap, your suggestions and contributions are still welcome! Please feel free to comment in this thread, open a feature request, or create an RFC.