cleanup: refactor PodList calls to prepare for making pod metrics staleness configurable #1046

Merged
merged 1 commit into from
Aug 1, 2025

nayihz
Contributor

@nayihz nayihz commented Jun 23, 2025

fix: #336
changes ref: #336 (comment)

@k8s-ci-robot
Contributor

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Jun 23, 2025
@k8s-ci-robot k8s-ci-robot requested review from ahg-g and robscott June 23, 2025 12:33
@k8s-ci-robot k8s-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Jun 23, 2025
netlify bot commented Jun 23, 2025

Deploy Preview for gateway-api-inference-extension ready!

Name Link
🔨 Latest commit c677334
🔍 Latest deploy log https://app.netlify.com/projects/gateway-api-inference-extension/deploys/688c24d6f091430007cfbd0b
😎 Deploy Preview https://deploy-preview-1046--gateway-api-inference-extension.netlify.app

@nayihz nayihz force-pushed the feat_metric_stale_time branch from 12f8bfe to 2d42a53 Compare June 23, 2025 12:35
@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jun 25, 2025
@nayihz nayihz force-pushed the feat_metric_stale_time branch from a339897 to 1005486 Compare June 25, 2025 05:27
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jun 25, 2025
@nayihz nayihz force-pushed the feat_metric_stale_time branch from 1005486 to 9b1e7e2 Compare June 25, 2025 05:28
@nayihz nayihz marked this pull request as ready for review June 25, 2025 09:22
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jun 25, 2025
@k8s-ci-robot k8s-ci-robot requested a review from danehans June 25, 2025 09:22
@nayihz nayihz force-pushed the feat_metric_stale_time branch from 9b1e7e2 to 12943a7 Compare June 25, 2025 09:29
@nayihz
Contributor Author

nayihz commented Jun 25, 2025

/cc @liu-cong

@k8s-ci-robot k8s-ci-robot requested a review from liu-cong June 25, 2025 09:48
@nayihz nayihz force-pushed the feat_metric_stale_time branch from 12943a7 to 518655c Compare June 29, 2025 07:11
@nayihz nayihz force-pushed the feat_metric_stale_time branch from 518655c to e54be57 Compare June 29, 2025 13:31
@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jun 30, 2025
@nayihz
Contributor Author

nayihz commented Jul 1, 2025

I found that it becomes very inconvenient to write unit tests after updating PodGetAll to PodGetAllWithFreshMetrics. But after reading the code in depth, I still couldn't come up with a good solution. Any ideas on this? @nirrozenbaum @liu-cong
https://github.com/kubernetes-sigs/gateway-api-inference-extension/pull/1046/files#diff-1b7741fc131b712835ea0040fe1dc86b62403c0b124f0d672ef8bfadb84d32d3R325-R328

https://github.com/kubernetes-sigs/gateway-api-inference-extension/pull/1046/files#diff-1b7741fc131b712835ea0040fe1dc86b62403c0b124f0d672ef8bfadb84d32d3R353

@nirrozenbaum
Contributor

> I found that it becomes very inconvenient to write unit tests after updating PodGetAll to PodGetAllWithFreshMetrics. But after reading the code in depth, I still couldn't come up with a good solution. Any ideas on this? @nirrozenbaum @liu-cong
> https://github.com/kubernetes-sigs/gateway-api-inference-extension/pull/1046/files#diff-1b7741fc131b712835ea0040fe1dc86b62403c0b124f0d672ef8bfadb84d32d3R325-R328
>
> https://github.com/kubernetes-sigs/gateway-api-inference-extension/pull/1046/files#diff-1b7741fc131b712835ea0040fe1dc86b62403c0b124f0d672ef8bfadb84d32d3R353

@nayihz I don’t want to nitpick too much, but to be honest I’m not sure why the interface change was required.
Today, before this PR, the datastore already has PodGetAll and PodList(predicate).
Couldn’t we implement “get pods with fresh metrics” with PodList(predicate), where the predicate is a function that returns true only for pods with fresh metrics?

@nirrozenbaum
Contributor

nirrozenbaum commented Jul 1, 2025

I mean: leave the PodGetAll function as is, and use PodList with that predicate only in the specific places where it’s needed. Would that help?

@nayihz nayihz force-pushed the feat_metric_stale_time branch 2 times, most recently from 6bde389 to bff4272 Compare July 2, 2025 02:38
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jul 2, 2025
@nayihz
Contributor Author

nayihz commented Jul 2, 2025

> leave the PodGetAll function as is, and use PodList with that predicate only in the specific places where it’s needed.

Makes sense to me.

@nayihz
Contributor Author

nayihz commented Jul 30, 2025

Most of the comments have been addressed. There are still two remaining comments that can be discussed further. @liu-cong

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jul 30, 2025
},
},
storePods: []*corev1.Pod{pod1, pod2, pod3},
want: []*backendmetrics.MetricsState{pod1Metrics, pod2Metrics}, // pod3 metrics were stale and should not be included.
Contributor

The test is using a FakePodMetricsClient, so you should be able to inject whatever test data you want. I don't think the current test is really testing what you want.

@nayihz nayihz force-pushed the feat_metric_stale_time branch from 48e5285 to 3dfb4d8 Compare July 31, 2025 03:22
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jul 31, 2025
@nayihz
Contributor Author

nayihz commented Jul 31, 2025

This PR involves changes to many files, so I kindly ask everyone to move forward as soon as possible. Otherwise, it's easy to encounter branch conflicts, which can sometimes take a lot of time to resolve.

@nayihz nayihz requested a review from liu-cong July 31, 2025 06:25
@liu-cong
Contributor

Thank you for your patience, @nayihz. I had two comments, which I believe are small; I will lgtm once those are addressed. Thanks again!

@nayihz nayihz force-pushed the feat_metric_stale_time branch from 3dfb4d8 to ad0171a Compare August 1, 2025 02:07
@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Aug 1, 2025
@nayihz
Contributor Author

nayihz commented Aug 1, 2025

Oops, it conflicted again...😅

@nayihz nayihz force-pushed the feat_metric_stale_time branch from ad0171a to c677334 Compare August 1, 2025 02:22
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Aug 1, 2025
@liu-cong
Contributor

liu-cong commented Aug 1, 2025

/lgtm

@nirrozenbaum or @kfswain, mind taking another look? This PR is now a pure refactor; the change to callers to take stale metrics into consideration will be in a follow-up PR.

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Aug 1, 2025
@liu-cong
Contributor

liu-cong commented Aug 1, 2025

/hold

@nayihz Can you update the PR title to something like "refactor PodList calls to prepare for making pod metrics staleness configurable"?

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Aug 1, 2025
@nayihz nayihz changed the title from "feat: Make metrics stale time configurable" to "cleanup: refactor PodList calls to prepare for making pod metrics staleness configurable" Aug 1, 2025
@kfswain
Collaborator

kfswain commented Aug 1, 2025

/approve
/unhold

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Aug 1, 2025
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: kfswain, nayihz

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Aug 1, 2025
@k8s-ci-robot k8s-ci-robot merged commit b1b43dc into kubernetes-sigs:main Aug 1, 2025
9 checks passed
@nayihz nayihz deleted the feat_metric_stale_time branch August 2, 2025 07:16
@@ -28,16 +28,22 @@ import (
"sigs.k8s.io/gateway-api-inference-extension/pkg/epp/backend"
)

func NewPodMetricsFactory(pmc PodMetricsClient, refreshMetricsInterval time.Duration) *PodMetricsFactory {
var (
AllPodPredicate = func(PodMetrics) bool { return true }
Contributor

typo? AllPodsPredicate?

@@ -80,6 +81,7 @@ const (
DefaultCertPath = "" // default for --cert-path
DefaultConfigFile = "" // default for --config-file
DefaultConfigText = "" // default for --config-text
DefaultMetricsStalenessThreshold = 2 * time.Second
Contributor

@liu-cong wasn't this originally 5 seconds (when it was named metricsValidityPeriod)?

Contributor

Originally, no threshold was enforced (we highlight metrics older than 5s in logs, though I never saw that happen). This PR doesn't enforce the threshold either, but a follow-up PR is expected to enforce it.

@nayihz
Copy link
Contributor Author

nayihz commented Aug 4, 2025

Created an issue to track this. @liu-cong
#1292

Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Development

Successfully merging this pull request may close these issues.

Make metrics stale time configurable
6 participants