
[Bugfix] Fix inappropriate content of model_name tag in Prometheus metrics #3937

Merged
18 commits merged into vllm-project:main on May 4, 2024

Conversation

DearPlanet (Contributor) commented Apr 9, 2024

Description

When deploying a local model, such as one located at a path like /mnt/Qwen1.5-14B-chat/, the Prometheus metrics output exposes the local model path.

The metrics output looks like this:

...
# HELP vllm:num_requests_running Number of requests currently running on GPU.
# TYPE vllm:num_requests_running gauge
vllm:num_requests_running{model_name="/mnt/Qwen1.5-14B-Chat/"} 0.0
# HELP vllm:num_requests_swapped Number of requests swapped to CPU.
# TYPE vllm:num_requests_swapped gauge
vllm:num_requests_swapped{model_name="/mnt/Qwen1.5-14B-Chat/"} 0.0
# HELP vllm:num_requests_waiting Number of requests waiting to be processed.
# TYPE vllm:num_requests_waiting gauge
vllm:num_requests_waiting{model_name="/mnt/Qwen1.5-14B-Chat/"} 0.0
# HELP vllm:gpu_cache_usage_perc GPU KV-cache usage. 1 means 100 percent usage.
# TYPE vllm:gpu_cache_usage_perc gauge
vllm:gpu_cache_usage_perc{model_name="/mnt/Qwen1.5-14B-Chat/"} 0.0
# HELP vllm:cpu_cache_usage_perc CPU KV-cache usage. 1 means 100 percent usage.
# TYPE vllm:cpu_cache_usage_perc gauge
vllm:cpu_cache_usage_perc{model_name="/mnt/Qwen1.5-14B-Chat/"} 0.0
# HELP vllm:prompt_tokens_total Number of prefill tokens processed.
# TYPE vllm:prompt_tokens_total counter
vllm:prompt_tokens_total{model_name="/mnt/Qwen1.5-14B-Chat/"} 0.0
# HELP vllm:generation_tokens_total Number of generation tokens processed.
...

This raises two issues:

  • The local deployment path of the model is inadvertently revealed, which is a security risk;
  • The model_name in the metrics does not match the model name provided in the ModelCard by openai.api_server.

This PR solves the problem by passing the served_model_name parameter into ModelConfig and parsing it in the LLMEngine. For scenarios that do not use openai.apiserver, served_model_name can also be passed to AsyncEngineArgs/EngineArgs. When served_model_name is not specified, the Prometheus metrics output behaves as before (although I still do not recommend relying on that).
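
As a minimal sketch of the intended usage (the path and served name below are placeholders, and served_model_name on the engine args is what this PR adds):

from vllm.engine.arg_utils import AsyncEngineArgs
from vllm.engine.async_llm_engine import AsyncLLMEngine

# The local path stays internal; the served name is what Prometheus reports.
engine_args = AsyncEngineArgs(
    model="/mnt/Qwen1.5-14B-Chat/",        # placeholder local path
    served_model_name="qwen1.5-14b-chat",  # label used for model_name in metrics
)
engine = AsyncLLMEngine.from_engine_args(engine_args)

With the OpenAI-compatible server, the existing --served-model-name flag already controls the name shown in the ModelCard; after this change it is also the value of the model_name label.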

@DearPlanet DearPlanet changed the title [Bugfix] Fix model_name tag in Prometheus metrics [Bugfix] Fix inappropriate content of model_name tag in Prometheus metrics Apr 9, 2024
@njhill njhill self-assigned this Apr 9, 2024
attr: getattr(args, attr)
for attr in attrs if hasattr(args, attr)
})
if hasattr(args, "served_model_name"):


Can this be simplified to

if engine_args.served_model_name is None:
    engine_args.served_model_name = args.model

The above dict comprehension would have already set engine_args.served_model_name if served_model_name were an attribute on args.
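
For reference, a runnable toy version of what is being discussed, with a hypothetical dataclass standing in for EngineArgs and a plain argparse.Namespace standing in for args (not the real vLLM classes):

from argparse import Namespace
from dataclasses import dataclass, fields
from typing import Optional

@dataclass
class EngineArgsSketch:  # stand-in for vLLM's EngineArgs, not the real class
    model: str = "facebook/opt-125m"
    served_model_name: Optional[str] = None

def from_cli_args(args: Namespace) -> EngineArgsSketch:
    attrs = [f.name for f in fields(EngineArgsSketch)]
    # The comprehension already copies served_model_name whenever args has it...
    engine_args = EngineArgsSketch(**{
        attr: getattr(args, attr)
        for attr in attrs if hasattr(args, attr)
    })
    # ...so the only remaining case is "absent or unset", handled by the fallback.
    if engine_args.served_model_name is None:
        engine_args.served_model_name = args.model
    return engine_args

print(from_cli_args(Namespace(model="/mnt/Qwen1.5-14B-Chat/")))
print(from_cli_args(Namespace(model="/mnt/Qwen1.5-14B-Chat/",
                              served_model_name="qwen1.5-14b-chat")))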

DearPlanet (Contributor Author)

Thanks for your suggestion!

I adopted this approach to accommodate vllm.entrypoint.apiserver. For now, served_model_name only appears in vllm.entrypoint.openai.apiserver and currently has no description; when using vllm.entrypoint.apiserver, served_model_name is not an attribute of args.

As another approach, perhaps we could add the served_model_name parameter to vllm.entrypoint.apiserver? If so, the implementation here could be simplified as suggested, and the change would not affect anyone using AsyncEngineArgs/EngineArgs directly.

I believe served_model_name is useful in all scenarios; when it cannot be set, as in vllm.entrypoint.apiserver, a local path sometimes ends up as the model's label, which does not seem right.

Collaborator

@DearPlanet vllm.entrypoint.apiserver is meant more as a toy example and is not being actively improved.

That said, I agree that if we are using this name for internal purposes it makes more sense to move it into EngineArgs; would you be OK making that change? You could also augment its existing description to mention that the first name will also be used in the metrics output. This should also address some of the latest CI test failures.

It may also be best for the check to be if not engine_args.served_model_name: just in case it's an empty list (in theory it shouldn't be).
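
To illustrate why the falsy check is the safer variant (values here are just examples):

# "if not served_model_name" treats None, an empty string and an empty list alike.
for value in (None, "", [], ["qwen1.5-14b-chat", "qwen-chat"]):
    if not value:
        print(repr(value), "-> fall back to the model path")
    else:
        name = value[0] if isinstance(value, list) else value
        print(repr(value), "-> use", name, "as the metrics label")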

njhill (Collaborator) left a comment

@DearPlanet thank you very much for the contribution!

Would you also be willing to add a unit test for this? Hopefully it should be straightforward to extend the existing test_metrics.py.
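
For what it's worth, a rough sketch of such a test (not the actual contents of test_metrics.py; it assumes served_model_name is forwarded from LLM to EngineArgs, which is what this PR enables, and MODEL_PATH is a hypothetical local path):

import pytest
from prometheus_client import REGISTRY
from vllm import LLM, SamplingParams

MODEL_PATH = "/mnt/Qwen1.5-14B-Chat/"  # hypothetical local path

@pytest.mark.parametrize("served_model_name", ["qwen1.5-14b-chat"])
def test_metrics_use_served_model_name(served_model_name: str) -> None:
    llm = LLM(model=MODEL_PATH, served_model_name=served_model_name,
              disable_log_stats=False)
    llm.generate(["Hello, my name is"], SamplingParams(max_tokens=8))

    # Gather every model_name label value exported by vllm metrics and check
    # that the served name is used while the local path never leaks.
    labels = {
        sample.labels.get("model_name")
        for metric in REGISTRY.collect() if metric.name.startswith("vllm")
        for sample in metric.samples
    }
    assert served_model_name in labels
    assert MODEL_PATH not in labels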


for attr in attrs if hasattr(args, attr)
})
if hasattr(args, "served_model_name"):
engine_args.served_model_name = args.served_model_name
Collaborator

It is a multi-valued arg and will be a list

Suggested change:
- engine_args.served_model_name = args.served_model_name
+ engine_args.served_model_name = args.served_model_name[0]

DearPlanet (Contributor Author)

Thanks for the suggestions @njhill!
I agree that moving served_model_name to EngineArgs is more elegant and has less impact on the other parts. In addition, this parameter should support either a list or a str.
The latest update contains:

  • Move served_model_name to EngineArgs
  • Support served_model_name as either a list or a str (see the sketch after this list)
  • Add a description for this parameter
  • Add a test case in test_metrics.py
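
To show the intended list-or-str behaviour, a small illustrative helper (not the exact code in this PR):

from typing import List, Optional, Union

def resolve_served_model_name(model: str,
                              served_model_name: Optional[Union[str, List[str]]]) -> str:
    # Unset (None, "" or []) keeps the old behaviour: the model path is used.
    if not served_model_name:
        return model
    # For a list, the first entry is the name exposed in the metrics labels.
    if isinstance(served_model_name, list):
        return served_model_name[0]
    return served_model_name

assert resolve_served_model_name("/mnt/Qwen1.5-14B-Chat/", None) == "/mnt/Qwen1.5-14B-Chat/"
assert resolve_served_model_name("/mnt/Qwen1.5-14B-Chat/", "qwen1.5-14b-chat") == "qwen1.5-14b-chat"
assert resolve_served_model_name("/mnt/Qwen1.5-14B-Chat/", ["qwen1.5-14b-chat", "qwen-chat"]) == "qwen1.5-14b-chat"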

@DearPlanet DearPlanet requested a review from njhill April 28, 2024 04:38
njhill (Collaborator) left a comment

Thanks @DearPlanet, just a couple of minor suggestions.

vllm/config.py: two review threads (outdated, resolved)
DearPlanet (Contributor Author)

Thank you again for the suggestions @njhill! 😊

@DearPlanet DearPlanet requested a review from njhill May 4, 2024 12:43
@njhill njhill merged commit 4302987 into vllm-project:main May 4, 2024
59 checks passed
robertgshaw2-neuralmagic pushed a commit to neuralmagic/nm-vllm that referenced this pull request May 6, 2024
z103cb pushed a commit to z103cb/opendatahub_vllm that referenced this pull request May 7, 2024
dtrifiro pushed a commit to opendatahub-io/vllm that referenced this pull request May 7, 2024