
Add EPP operational metrics #690


Open
liu-cong opened this issue Apr 14, 2025 · 6 comments
Labels
needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one.

Comments

@liu-cong
Contributor

liu-cong commented Apr 14, 2025

What would you like to be added:

I would like to add the following list of metrics:

[ ] Overall latency for EPP to receive a request and make a decision on the target pod. Why? EPP provides benefits at the cost of added latency. This is a generally useful metric to quantify that cost and to detect regressions in EPP.
[ ] Overall latency for EPP to process the response. Why? Same as above.
[ ] Latency and error count when EPP scrapes model server metrics. Why? The freshness of metrics is vital to the effectiveness of the EPP algorithm. These metrics help detect scraping issues caused by regressions in EPP or the model server, or by transient failures.
[ ] Age of caches (the model server metrics, etc.). Why? This is the observed cache freshness. It is a useful metric to alert on if the cache age exceeds a certain limit.
[ ] Overall latency of each component (flow controller, scheduler). Why? These component-level metrics break down the overall EPP latency and help identify bottlenecks, sources of regressions, etc.
[ ] Future: Prefix cache and queuing metrics (latency, error, size, and any other custom metrics). Why? These are important operational metrics to monitor. For instance, if the queue size is hitting the limit, that is a signal to perhaps increase the size limit.
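
To make the list above concrete, here is a minimal sketch of how latency, error count, and cache age could be recorded. It is illustrative Python using only the standard library, not EPP code; `OperationalMetrics`, `timed`, `mark_refreshed`, `cache_age`, `FakeClock`, and the metric names are all hypothetical. A real implementation would presumably export to a metrics backend instead of keeping values in memory.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class OperationalMetrics:
    """Toy in-memory recorder (hypothetical); stands in for a real metrics exporter."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock                  # injectable so tests can control time
        self.latencies = defaultdict(list)   # metric name -> observed durations (seconds)
        self.errors = defaultdict(int)       # metric name -> error count
        self._last_refresh = {}              # cache name -> time of last refresh

    @contextmanager
    def timed(self, name):
        """Record how long the wrapped block takes; count an error if it raises."""
        start = self._clock()
        try:
            yield
        except Exception:
            self.errors[name] += 1
            raise
        finally:
            # Latency is recorded for both successful and failed attempts.
            self.latencies[name].append(self._clock() - start)

    def mark_refreshed(self, cache):
        """Call whenever a cache (e.g. scraped model server metrics) is updated."""
        self._last_refresh[cache] = self._clock()

    def cache_age(self, cache):
        """Seconds since the cache was last refreshed, or None if never refreshed."""
        last = self._last_refresh.get(cache)
        return None if last is None else self._clock() - last


class FakeClock:
    """Deterministic clock for demonstrating the recorder without sleeping."""

    def __init__(self):
        self.now = 0.0

    def __call__(self):
        return self.now

    def advance(self, seconds):
        self.now += seconds
```

Wrapping the scheduling call in `with metrics.timed("request_decision"):` would give the first metric, wrapping the scrape call would give scrape latency and error count, and alerting on `cache_age("model_server_metrics")` would cover the cache freshness item.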

Why is this needed:

@JeffLuoo
Contributor

Can you please add "Why is this needed:" for each proposed metric?

@kfswain
Collaborator

kfswain commented Apr 21, 2025

/assign liu-cong

To address the above comment

@liu-cong
Contributor Author

@JeffLuoo Added "whys", please take a look.

@liu-cong
Contributor Author

/unassign

@kfswain kfswain added the needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. label Apr 24, 2025
@JeffLuoo
Contributor

Can you clarify the following:

  1. What is the difference between
Overall latency of each component (flow controller, scheduler) https://github.com/kubernetes-sigs/gateway-api-inference-extension/issues/581. 

and

Overall latency for EPP to receive a request and make a decision on the target pod

#581 covers the scheduling latency, but the first metric is also about the scheduling latency.

  2. For
Overall latency for EPP to process the response.

Do you mean how long it takes for EPP to process the response header and then the response body?

@liu-cong
Contributor Author

Sorry for the confusion.

For 1, #581 captures the overall latency of the scheduler component; it's a sub-task of this issue.

For 2, I mean the latency from when EPP receives the response from the backend to when EPP returns the response to the LB. The intent is to understand the overall latency cost added by EPP in the request and response paths.
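
As a rough illustration of the two definitions above, the following sketch derives the request-path and response-path latencies from four timestamps. It is illustrative Python, not EPP code; `RequestTimestamps` and its field names are hypothetical, and it assumes a single receive/return pair per response (separate header and body processing, or streaming, is not modeled).

```python
from dataclasses import dataclass


@dataclass
class RequestTimestamps:
    """Hypothetical per-request timestamps, in seconds on a monotonic clock."""

    request_received: float    # EPP receives the request
    target_decided: float      # EPP has decided on the target pod
    response_received: float   # EPP receives the response from the backend
    response_returned: float   # EPP returns the response to the LB

    def request_path_latency(self) -> float:
        """Time for EPP to receive a request and decide on the target pod."""
        return self.target_decided - self.request_received

    def response_path_latency(self) -> float:
        """Time for EPP to process the response before handing it back."""
        return self.response_returned - self.response_received

    def epp_overhead(self) -> float:
        """Total latency added by EPP across both paths (excludes backend time)."""
        return self.request_path_latency() + self.response_path_latency()
```

For example, `RequestTimestamps(10.0, 10.5, 12.0, 12.25)` gives a request-path latency of 0.5 s and a response-path latency of 0.25 s; the 1.5 s spent in the backend is not counted as EPP overhead.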

3 participants