Allow model to be served under multiple names #2894
Conversation
This means that you can have more specific model names without requiring users to update their configs whenever you change something that warrants a model name change. If you passed `--served-model-name gpt-4-0613 gpt-4`, then your users could make requests to either `gpt-4` or `gpt-4-0613`. The `model` field of any responses will contain the first model name, `gpt-4-0613` in this case, so that a user using `gpt-4` knows which version of the model answered their request. OpenAI calls this [Continuous model upgrades](https://platform.openai.com/docs/models/continuous-model-upgrades).
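Below is a minimal sketch (not part of the PR itself) of how this would look from a client's perspective, assuming the OpenAI-compatible server was launched with `--served-model-name gpt-4-0613 gpt-4`; the model path, port, and use of the `openai` Python client are illustrative assumptions.

```python
# Assumes the server was started with something like:
#   python -m vllm.entrypoints.openai.api_server \
#       --model <your-model> \
#       --served-model-name gpt-4-0613 gpt-4
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

# Either alias routes to the same underlying model.
for alias in ("gpt-4", "gpt-4-0613"):
    completion = client.chat.completions.create(
        model=alias,
        messages=[{"role": "user", "content": "Hello!"}],
    )
    # Per this PR, the response reports the first served model name,
    # so a caller of the generic alias can see which version answered.
    print(alias, "->", completion.model)  # expected: gpt-4-0613 for both
```

A user pinning `gpt-4` in their config keeps working across upgrades, while the returned `model` field still identifies the exact version that served the request.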
Hi @hmellor, thanks for proposing this idea. But I have a different opinion.
I agree this is out of scope for the vLLM model server. It is the responsibility of a router. Are there concrete models you have in mind that need this, or a custom model?
My use case was that I wanted to be able to change models without my users needing to change their configs, for example if I were to use …

You're right that this could (and likely should) be done by the router. It just seemed like a small change and a nice QoL feature. But if it's out of scope I won't fight to get it in.
I understand the use case. I'm fine with multiple names. But …
The first served model name is returned because it is meant to be the most specific. If a user queries a less specific alias, the response still tells them exactly which model answered.
Would you prefer it if, for the LoRA models, we returned all combinations of …?
Ah, this makes sense.
@Yard1 wdyt?
I think it makes sense to leave it as is. If …
It makes sense for the …
I've changed the … So a user calling …
@simon-mo Strong support for merging this to make model upgrades easier in IaC scenarios.
OK, let's get this PR in.
Thanks @hmellor, I'm also in favour of this change.
Do these changes allow configuring such that the model name isn't required at all in the API (as requested in #1478)? Since the server currently will only ever serve one "base" model, there's technically no need to include this API field, unless you're using a lora adapter.
No, it doesn't. A PR that removes the need for model name in requests was already rejected in #1541.
If …
Potentially, actually. @samos123, would this solve your issue?
Sorry about the delay, merged.
Co-authored-by: Alexandre Payot <alexandrep@graphcore.ai>
This doesn't really solve my issue; I would simply want the check to be removable as an option. I don't want vLLM to handle this and would prefer to let it accept any request it receives, since in my use case vLLM only ever serves one model.
Is this feature available in any vLLM version? I failed on vLLM 0.4.1 with …