| Status        |               |
| ------------- | ------------- |
| Stability     | beta: metrics |
| Distributions | contrib, k8s  |
| Issues        |               |
| Code Owners   | @dmitryax, @TylerHelmuth, @ChrsMark |
The Kubelet Stats Receiver pulls node, pod, container, and volume metrics from the API server on a kubelet and sends them down the metric pipeline for further processing.
Details about the metrics produced by this receiver can be found in metadata.yaml, with further documentation in documentation.md.
A kubelet runs on a Kubernetes node and has an API server to which this receiver connects. To configure this receiver, you have to tell it how to connect and authenticate to the API server and how often to collect data and send it to the next consumer.
The Kubelet Stats Receiver supports both the secure Kubelet endpoint exposed at port 10250 by default and the read-only Kubelet endpoint exposed at port 10255. If `auth_type` is set to `none`, the read-only endpoint will be used. The secure endpoint will be used if `auth_type` is set to any of the following values:

- `tls` tells the receiver to use TLS for auth and requires that the fields `ca_file`, `key_file`, and `cert_file` also be set.
- `serviceAccount` tells this receiver to use the default service account token (`/var/run/secrets/kubernetes.io/serviceaccount/token`) to authenticate to the kubelet API, along with the default certificate signed by the cluster's root CA (`/var/run/secrets/kubernetes.io/serviceaccount/ca.crt`).
- `kubeConfig` tells this receiver to use the kubeconfig file (the `KUBECONFIG` env variable or `~/.kube/config`) to authenticate and use the API server proxy to access the kubelet API.
- `initial_delay` (default = `1s`): defines how long this receiver waits before starting.
```yaml
receivers:
  kubeletstats:
    collection_interval: 20s
    initial_delay: 1s
    auth_type: "tls"
    ca_file: "/path/to/ca.crt"
    key_file: "/path/to/apiserver.key"
    cert_file: "/path/to/apiserver.crt"
    endpoint: "https://192.168.64.1:10250"
    insecure_skip_verify: true
exporters:
  file:
    path: "fileexporter.txt"
service:
  pipelines:
    metrics:
      receivers: [kubeletstats]
      exporters: [file]
```
Although it's possible to use Kubernetes' hostNetwork feature to talk to the kubelet API from a pod, the preferred approach is to use the downward API. Make sure the pod spec sets the node name as follows:
```yaml
env:
  - name: K8S_NODE_NAME
    valueFrom:
      fieldRef:
        fieldPath: spec.nodeName
```
Then the otel config can reference the `K8S_NODE_NAME` environment variable:
```yaml
receivers:
  kubeletstats:
    collection_interval: 20s
    auth_type: "serviceAccount"
    endpoint: "https://${env:K8S_NODE_NAME}:10250"
    insecure_skip_verify: true
exporters:
  file:
    path: "fileexporter.txt"
service:
  pipelines:
    metrics:
      receivers: [kubeletstats]
      exporters: [file]
```
Note: a missing or empty `endpoint` will cause the hostname on which the collector is running to be used as the endpoint. If the hostNetwork flag is set, and the collector is running in a pod, this hostname will resolve to the node's network namespace.
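For illustration only, here is a minimal sketch that relies on this fallback by leaving `endpoint` unset (the remaining values mirror the earlier examples and are not prescriptive):

```yaml
receivers:
  kubeletstats:
    collection_interval: 20s
    auth_type: "serviceAccount"
    # endpoint is intentionally omitted; the hostname of the machine the
    # collector runs on is used instead
    insecure_skip_verify: true
```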
The following config can be used to collect Kubelet metrics from the read-only endpoint:
```yaml
receivers:
  kubeletstats:
    collection_interval: 20s
    auth_type: "none"
    endpoint: "http://${env:K8S_NODE_NAME}:10255"
exporters:
  file:
    path: "fileexporter.txt"
service:
  pipelines:
    metrics:
      receivers: [kubeletstats]
      exporters: [file]
```
The following config can be used to collect Kubelet metrics from the read-only endpoint, proxied by the API server:
```yaml
receivers:
  kubeletstats:
    collection_interval: 20s
    auth_type: "kubeConfig"
    context: "my-context"
    insecure_skip_verify: true
    endpoint: "${env:K8S_NODE_NAME}"
exporters:
  file:
    path: "fileexporter.txt"
service:
  pipelines:
    metrics:
      receivers: [kubeletstats]
      exporters: [file]
```
Note that when `auth_type` is `kubeConfig`, the endpoint should only be the node name, as the communication to the kubelet is proxied by the API server configured in the `kubeConfig`. `insecure_skip_verify` still applies, overriding the `kubeConfig` settings. If no `context` is specified, the current context or the default context is used.
By default, all produced metrics get resource labels based on what the kubelet `/stats/summary` endpoint provides. For some use cases this might not be enough, so it's possible to leverage other endpoints to fetch additional metadata entities and set them as extra labels on the metric resource. Currently supported metadata includes the following:

- `container.id` - to augment metrics with a Container ID label obtained from container statuses exposed via `/pods`.
- `k8s.volume.type` - to collect the volume type from the Pod spec exposed via `/pods` and have it as a label on volume metrics. If there's more information available from the endpoint than just the volume type, it is synced as well depending on the available fields and the type of volume. For example, `aws.volume.id` would be synced from `awsElasticBlockStore` and `gcp.pd.name` is synced for `gcePersistentDisk`.
If you want to have the `container.id` label added to your metrics, use the `extra_metadata_labels` field to enable it, for example:
```yaml
receivers:
  kubeletstats:
    collection_interval: 10s
    auth_type: "serviceAccount"
    endpoint: "${env:K8S_NODE_NAME}:10250"
    insecure_skip_verify: true
    extra_metadata_labels:
      - container.id
```
If `extra_metadata_labels` is not set, no additional API calls are made to fetch extra metadata.
When dealing with Persistent Volume Claims, it is possible to optionally sync metadata from the underlying storage resource rather than just the volume claim. This is achieved by talking to the Kubernetes API. Below is an example configuration to achieve this.
```yaml
receivers:
  kubeletstats:
    collection_interval: 10s
    auth_type: "serviceAccount"
    endpoint: "${env:K8S_NODE_NAME}:10250"
    insecure_skip_verify: true
    extra_metadata_labels:
      - k8s.volume.type
    k8s_api_config:
      auth_type: serviceAccount
```
If `k8s_api_config` is set, the receiver will attempt to collect metadata from the underlying storage resources for Persistent Volume Claims. For example, if a Pod is using a PVC backed by an EBS instance on AWS, the receiver would set the `k8s.volume.type` label to `awsElasticBlockStore` rather than `persistentVolumeClaim`.
A list of metric groups from which metrics should be collected. By default, metrics from containers, pods and nodes will be collected. If `metric_groups` is set, only metrics from the listed groups will be collected. Valid groups are `container`, `pod`, `node` and `volume`. For example, if you're looking to collect only `node` and `pod` metrics from the receiver, use the following configuration:
```yaml
receivers:
  kubeletstats:
    collection_interval: 10s
    auth_type: "serviceAccount"
    endpoint: "${env:K8S_NODE_NAME}:10250"
    insecure_skip_verify: true
    metric_groups:
      - node
      - pod
```
In order to calculate the `k8s.container.cpu.node.utilization`, `k8s.pod.cpu.node.utilization`, `k8s.container.memory.node.utilization` and `k8s.pod.memory.node.utilization` metrics, the information about the node's capacity must be retrieved from the k8s API. In this case, the `k8s_api_config` needs to be set. In addition, the node name must be identified properly. The `K8S_NODE_NAME` env var can be set using the downward API inside the collector pod spec as follows:
```yaml
env:
  - name: K8S_NODE_NAME
    valueFrom:
      fieldRef:
        fieldPath: spec.nodeName
```
Then set the `node` value to `${env:K8S_NODE_NAME}` in the receiver's configuration:
```yaml
receivers:
  kubeletstats:
    collection_interval: 10s
    auth_type: 'serviceAccount'
    endpoint: '${env:K8S_NODE_NAME}:10250'
    node: '${env:K8S_NODE_NAME}'
    k8s_api_config:
      auth_type: serviceAccount
    metrics:
      k8s.container.cpu.node.utilization:
        enabled: true
      k8s.pod.cpu.node.utilization:
        enabled: true
      k8s.container.memory.node.utilization:
        enabled: true
      k8s.pod.memory.node.utilization:
        enabled: true
```
The following parameters can also be specified (an illustrative example follows the list):

- `collection_interval` (default = `10s`): The interval at which to collect data.
- `insecure_skip_verify` (default = `false`): Whether or not to skip certificate verification.
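As a sketch, both parameters can be combined with any of the authentication modes shown earlier (the values below are examples, not recommendations):

```yaml
receivers:
  kubeletstats:
    collection_interval: 30s      # override the 10s default
    insecure_skip_verify: true    # skip certificate verification
    auth_type: "serviceAccount"
    endpoint: "${env:K8S_NODE_NAME}:10250"
```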
The full list of settings exposed for this receiver is documented in config.go, with detailed sample configurations in testdata/config.yaml.
The Kubelet Stats Receiver needs `get` permissions on the `nodes/stats` resources. Additionally, when using `extra_metadata_labels` or any of the `{request|limit}_utilization` metrics, the receiver also needs `get` permissions for `nodes/proxy` resources.
```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: otel-collector
rules:
  - apiGroups: [""]
    resources: ["nodes/stats"]
    verbs: ["get"]

  # Only needed if you are using extra_metadata_labels or
  # are collecting the request/limit utilization metrics
  - apiGroups: [""]
    resources: ["nodes/proxy"]
    verbs: ["get"]
```
The following metrics will be renamed in a future version:
- `k8s.node.cpu.utilization` (renamed to `k8s.node.cpu.usage`)
- `k8s.pod.cpu.utilization` (renamed to `k8s.pod.cpu.usage`)
- `container.cpu.utilization` (renamed to `container.cpu.usage`)
The above metrics report usage measured in CPUs, not a percentage of used resources. These metrics were previously incorrectly named using the utilization term.
This change is rolled out through a feature gate with the following stages:

- alpha: when enabled, it makes the `.cpu.usage` metrics enabled by default, disabling the `.cpu.utilization` metrics
- beta: the `.cpu.usage` metrics are enabled by default and any configuration enabling the deprecated `.cpu.utilization` metrics will fail. Explicitly disabling the feature gate provides the old (deprecated) behavior.
- stable: the `.cpu.usage` metrics are enabled by default and the deprecated metrics are completely removed.
- removed: three releases after stable.
More information about the deprecation plan and the background reasoning can be found at #27885.
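As a sketch of what opting in to the new names could look like, assuming the renamed metrics can be toggled through the same `metrics` settings shown earlier (whether each metric is available depends on the feature gate stage of your collector version):

```yaml
receivers:
  kubeletstats:
    collection_interval: 20s
    auth_type: "serviceAccount"
    endpoint: "${env:K8S_NODE_NAME}:10250"
    metrics:
      # opt in to the new metric name and disable the deprecated one
      k8s.node.cpu.usage:
        enabled: true
      k8s.node.cpu.utilization:
        enabled: false
```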