Kubelet: Add a metrics in kubelet to track how long it takes for pod to fully start #124892
Comments
cc: @ruiwen-zhao for review. |
/sig instrumentation |
/sig node |
Just to bring up the previous discussion around metric cardinality: adding both pod name and node name as metric labels might create too much cardinality. We need to come up with a way to address this. |
Thank you, Ruiwen. I have some comments on the cardinality issue:
Kubernetes has another metric
|
This is very different from the existing implementation of |
@yujuhong Yes. The @dashpole Hi David, could you please provide some insights here? Thanks! |
A few questions to get the discussion started:
Bikeshedding: From the names, |
@dashpole Hi David, thank you for the comment.
I want to use a gauge because I want to record the exact startup time of each pod, so that users know exactly how long it takes for their pods to become ready to serve. With a pod-level metric, users could also group pods together by workload (e.g. Deployment).
I chose kubelet because it tracks the status of each pod in pod_startup_latency_tracker and watches for pod status changes. Kubelet is also usually the first layer to process pod status, and it is a stable component (compared to other components in the cluster, such as kube-state-metrics, where I often see out-of-memory issues). Do you have any recommendations for other places to add such a metric?
Regarding "Most pod-level metrics exist for the lifetime of the pod, but doing that would mean any aggregation would be less meaningful", could you provide more context to help me understand? Thanks!
|
Would something like kube-state-metrics be more suitable for this? |
/assign @JeffLuoo |
What would you like to be added?
Add a new metric that records the end-to-end startup latency of a pod, from pod creation to the first time the pod becomes ready. The metric covers all stages of the pod lifecycle, such as scheduling and image pulling.
Metrics Name:
kubelet_pod_first_ready_latency_seconds {namespace=<namespace_name>, pod=<pod_name>, uid=<uid>, node=<node_name>}
Metrics Type: Gauge
Metrics Unit: Seconds
The metric exists for the lifetime of the pod.
Why is this needed?
Kubelet currently reports a Histogram metric, pod_start_total_duration_seconds, that gives users an overview of end-to-end pod startup latency from pod creation to pod running. However, pod ready is usually the signal that a pod is ready to serve traffic.
Having the new metric will allow users to track how long it takes for the pods in a workload to fully start and become ready to serve traffic. With the node_name label, this metric can supplement the existing metric pod_start_total_duration_seconds if users want to track node-level end-to-end pod startup latency from creation to ready.
Users could also aggregate the metric by workload (Deployment, StatefulSet, etc.) to present workload-level end-to-end pod startup latency.
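The workload-level aggregation could look roughly like the sketch below. It assumes each per-pod sample has already been joined with the name of its owning workload; the proposed labels do not include a workload label, so that join, along with the sample and summary types and the summarize function, is hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// sample is one hypothetical per-pod gauge reading, already annotated
// with the workload that owns the pod.
type sample struct {
	Pod      string
	Workload string
	Seconds  float64
}

// summary holds the aggregated first-ready latency for one workload.
type summary struct {
	Count int
	Mean  float64
	Max   float64
}

// summarize groups samples by workload and computes count, mean and max.
func summarize(samples []sample) map[string]summary {
	sums := make(map[string]float64)
	out := make(map[string]summary)
	for _, s := range samples {
		cur := out[s.Workload]
		cur.Count++
		if s.Seconds > cur.Max {
			cur.Max = s.Seconds
		}
		out[s.Workload] = cur
		sums[s.Workload] += s.Seconds
	}
	for name, cur := range out {
		cur.Mean = sums[name] / float64(cur.Count)
		out[name] = cur
	}
	return out
}

func main() {
	result := summarize([]sample{
		{Pod: "web-1", Workload: "web", Seconds: 10},
		{Pod: "web-2", Workload: "web", Seconds: 30},
		{Pod: "db-0", Workload: "db", Seconds: 5},
	})
	// Sort workload names so the output order is stable.
	names := make([]string, 0, len(result))
	for name := range result {
		names = append(names, name)
	}
	sort.Strings(names)
	for _, name := range names {
		r := result[name]
		fmt.Printf("%s: pods=%d mean=%.1fs max=%.1fs\n", name, r.Count, r.Mean, r.Max)
	}
}
```

In practice this grouping would more likely be done in the monitoring system's query language than in application code; the sketch only illustrates the shape of the aggregation.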