Add EPP operational metrics #690

liu-cong · 2025-04-14T22:14:20Z

What would you like to be added:

I would like to add the following list of metrics:

[ ] Overall latency for EPP to receive a request and make a decision on the target pod. Why? EPP provides benefits at the cost of added latency. This metric quantifies that cost and helps detect regressions in EPP.
[ ] Overall latency for EPP to process the response. Why? Same as above.
[ ] Latency/error count when EPP scrapes model server metrics. Why? The freshness of metrics is vital to the effectiveness of the EPP algorithm. These metrics help detect scraping issues caused by regressions in EPP or the model server, or by transient failures.
[ ] Age of caches (the model server metrics, etc.). Why? This is the observed cache freshness. It is a useful metric to alert on if the cache age exceeds a certain limit.
[ ] Overall latency of each component (flow controller, scheduler). Why? These component-level metrics break down the overall EPP latency and help identify bottlenecks, sources of regressions, etc.
[ ] Future: Prefix cache, queuing metrics (latency, error, size, and any other custom metrics) Why? These are important operational metrics to monitor. For instance, if the queue size is hitting the limit, it's a signal to perhaps increase the size limit.
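A minimal sketch of how the scrape latency, scrape error count, and cache age signals could be tracked together, assuming a per-pod cache that a polling loop refreshes. `PodMetricsCache`, `scrape_fn`, `is_stale`, and all field names are hypothetical and not taken from the EPP code base; a real implementation would export these values as Prometheus metrics rather than keep them as plain attributes.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional


@dataclass
class PodMetricsCache:
    """Hypothetical per-pod cache of model server metrics.

    Tracks three of the proposed signals: scrape latency, scrape error
    count, and cache age (time since the last successful scrape).
    """

    scrape_fn: Callable[[], Dict[str, float]]  # fetches metrics from one model server
    clock: Callable[[], float] = time.monotonic  # injectable so tests can fake time
    data: Dict[str, float] = field(default_factory=dict)
    last_success: Optional[float] = None  # clock reading at the last successful scrape
    scrape_errors: int = 0  # counter: only ever increases
    last_scrape_latency: float = 0.0  # seconds, recorded for successes and failures

    def refresh(self) -> bool:
        """Scrape once; on failure keep serving the previous (stale) data."""
        start = self.clock()
        try:
            fresh = self.scrape_fn()
        except Exception:
            self.scrape_errors += 1
            return False
        finally:
            # Runs on both paths, so failed scrapes still report latency.
            self.last_scrape_latency = self.clock() - start
        self.data = fresh
        self.last_success = self.clock()
        return True

    def age(self) -> float:
        """Seconds since the last successful scrape; inf if none has succeeded."""
        if self.last_success is None:
            return float("inf")
        return self.clock() - self.last_success

    def is_stale(self, limit_seconds: float) -> bool:
        """Alert condition from the list above: cache age exceeds a limit."""
        return self.age() > limit_seconds
```

Keeping the stale data on a failed scrape, rather than clearing it, is one possible design choice; with it, the age metric (not missing data) is what reveals a scraping outage.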

Why is this needed:

JeffLuoo · 2025-04-17T15:01:42Z

Can you please add "Why is this needed:" for each proposed metrics?

kfswain · 2025-04-21T22:34:31Z

/assign liu-cong

To address the above comment

liu-cong · 2025-04-24T19:05:10Z

@JeffLuoo Added "whys", please take a look.

liu-cong · 2025-04-24T19:05:20Z

/unassign

JeffLuoo · 2025-04-25T13:50:38Z

Can you clarify:

What is the difference between

Overall latency of each component (flow controller, scheduler) https://github.com/kubernetes-sigs/gateway-api-inference-extension/issues/581.

and

Overall latency for EPP to receive a request and make a decision on the target pod

#581 covers the scheduling latency, but the first metric is also about the scheduling latency.

For

Overall latency for EPP to process the response.

Do you mean how long it takes for EPP to process the response header and then the response body?

liu-cong · 2025-04-25T18:08:08Z

Sorry for the confusion.

For 1, #581 captures the overall latency of the scheduler component; it's a subtask of this issue.
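To illustrate how the two metrics relate, here is a minimal sketch assuming one timer per metric: the scheduler timer (#581) runs inside the end-to-end request timer, so the scheduler latency is always a portion of the overall latency, and the difference is time spent elsewhere in the request path. `Histogram`, `timed`, `handle_request`, and the bucket bounds are hypothetical; the histogram only mimics Prometheus-style cumulative `le` buckets in plain Python.

```python
import bisect
import time
from contextlib import contextmanager
from typing import Callable, Iterator, List


class Histogram:
    """Minimal Prometheus-style histogram: per-bucket counts plus sum and count."""

    def __init__(self, bounds: List[float]) -> None:
        self.bounds = sorted(bounds)
        self.counts = [0] * (len(self.bounds) + 1)  # final slot is the +Inf bucket
        self.sum = 0.0
        self.count = 0

    def observe(self, value: float) -> None:
        # bisect_left places a value equal to a bound in that bound's bucket ("le").
        self.counts[bisect.bisect_left(self.bounds, value)] += 1
        self.sum += value
        self.count += 1


BOUNDS = [0.001, 0.005, 0.01, 0.05, 0.1]  # seconds; illustrative values only
REQUEST_LATENCY = Histogram(BOUNDS)  # request received -> target pod chosen
SCHEDULER_LATENCY = Histogram(BOUNDS)  # scheduler component only (#581)


@contextmanager
def timed(hist: Histogram, clock: Callable[[], float]) -> Iterator[None]:
    """Observe the elapsed time of the with-block into `hist`, even on error."""
    start = clock()
    try:
        yield
    finally:
        hist.observe(clock() - start)


def handle_request(request, schedule, clock=time.perf_counter):
    with timed(REQUEST_LATENCY, clock):
        # ... request parsing and flow control would run here ...
        with timed(SCHEDULER_LATENCY, clock):
            pod = schedule(request)
        # ... setting the routing header for the chosen pod would run here ...
    return pod
```

Because the timers nest, subtracting the scheduler sum from the request sum gives the non-scheduler overhead without needing a separate timer for it.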

For 2, I mean the latency from when EPP receives the response from the backend to when EPP returns the response to the LB. The intent is to understand the overall latency cost that EPP adds in the request and response paths.
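One way to measure the response-path cost, sketched under the assumption that EPP runs as an Envoy external processor and sees the response as a headers message followed by body chunks, the last one flagged `end_of_stream`: sum the time spent handling each message instead of taking the wall-clock span from headers to final chunk. For a streamed response, that wall-clock span also includes the time the model server spends generating tokens, which is not EPP overhead. `ResponseLatencyTracker` and its fields are hypothetical.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ResponseLatencyTracker:
    """Hypothetical per-request tracker for EPP's response-path overhead."""

    clock: Callable[[], float] = time.perf_counter
    processing_seconds: float = 0.0  # time spent inside EPP's handlers only
    first_seen: Optional[float] = None  # when the first response message arrived
    wall_seconds: float = 0.0  # first message -> last reply (includes generation)

    def handle(self, end_of_stream: bool, process: Callable[[], None]) -> Optional[float]:
        """Process one response message (headers or a body chunk).

        Returns the summed processing time once the stream ends, else None.
        """
        start = self.clock()
        if self.first_seen is None:
            self.first_seen = start
        process()  # e.g. inspect headers or parse token usage from the body
        end = self.clock()
        self.processing_seconds += end - start
        if end_of_stream:
            self.wall_seconds = end - self.first_seen
            return self.processing_seconds
        return None
```

With this approach, a response whose chunks arrive a second apart but are each handled in a few milliseconds reports only those milliseconds, while `wall_seconds` keeps the full span available for comparison.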

k8s-ci-robot assigned liu-cong Apr 21, 2025

k8s-ci-robot unassigned liu-cong Apr 24, 2025

kfswain added the needs-triage label Apr 24, 2025
