Closed
Labels
- bug: Something isn't working
- enhancement: Enhancements for current features
- help-wanted: All issues where people can contribute to the project
Description
Report
After deploying the resource discovery agent, its CPU usage rises to 3000+ milliCPU and stays there.
I'm unsure what causes the high CPU usage. I tried reducing the resourceDiscoveryGroups, but that didn't help much.
For now I added limits so that it doesn't affect other pods as much:
- resources:
    limits:
      cpu: 500m
      memory: 256Mi
    requests:
      cpu: 100m
      memory: 128Mi
Expected Behavior
Less CPU usage :)
Actual Behavior
Resource Discovery Agent uses more than 3 cores of an Azure Kubernetes node
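To put the numbers in this report side by side, here is a minimal sketch that converts Kubernetes CPU quantities to millicores and compares the observed usage with the configured limit and request. The helper name `cpu_to_millicores` is hypothetical. The value 3319 is the CPU column of the k9s view in the Logs section, assumed to be in millicores.

```python
from decimal import Decimal


def cpu_to_millicores(quantity: str) -> int:
    """Convert a Kubernetes CPU quantity such as "500m" or "3" to millicores."""
    quantity = quantity.strip()
    if quantity.endswith("m"):
        return int(quantity[:-1])
    # Plain numbers are whole or fractional cores; Decimal avoids float rounding.
    return int(Decimal(quantity) * 1000)


observed = 3319  # CPU column of the k9s view in this report (assumed millicores)
limit = cpu_to_millicores("500m")    # limit from the workaround above
request = cpu_to_millicores("100m")  # request from the workaround above

print(f"observed: {observed / 1000:.2f} cores")
print(f"observed / limit: {observed / limit:.1f}x")
print(f"observed / request: {observed / request:.1f}x")
```

With these inputs the unconstrained usage is roughly 6.6 times the limit that was added afterwards, so the pod is likely to be throttled rather than fixed by the limit.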
Steps to Reproduce the Problem
- Deploy the configuration below
- After a few seconds the CPU usage rises and stays high
Component
Resource Discovery
Version
0.7.3
Configuration
Configuration:
---
repositories:
  - name: promitor
    url: https://charts.promitor.io/
  - name: bedag
    url: https://bedag.github.io/helm-charts
releases:
  - name: promitor-scraper
    namespace: promitor
    chart: promitor/promitor-agent-scraper
    values:
      - azureAuthentication:
          mode: UserAssignedManagedIdentity
          identity:
            id: {{ exec "terraform" (list "-chdir=../.." "output" "-json" "promitor_scraper_client_id") }}
            binding: promitor-scraper-identity
      - azureMetadata:
          tenantId: {{ requiredEnv "ARM_TENANT_ID" }}
          subscriptionId: {{ requiredEnv "ARM_SUBSCRIPTION_ID" }}
          resourceGroupName: REDACTED
      - resourceDiscovery:
          enabled: true
          host: promitor-agent-resource-discovery
          port: 8889
      - metricSinks:
          prometheusScrapingEndpoint:
            enabled: true
            serviceMonitor:
              enabled: true
              namespace: promitor
            enableServiceDiscovery: true
      - metrics:
          - name: azure_container_registry_agent_pool_cpu_time
            description: "AgentPool CPU Time in seconds"
            resourceType: ContainerRegistry
            azureMetricConfiguration:
              metricName: AgentPoolCPUTime
              aggregation:
                type: Total
            resourceDiscoveryGroups:
              - name: container-registry-landscape
          - name: azure_container_registry_run_duration
            description: "Run Duration in milliseconds"
            resourceType: ContainerRegistry
            azureMetricConfiguration:
              metricName: RunDuration
              aggregation:
                type: Total
            resourceDiscoveryGroups:
              - name: container-registry-landscape
          - name: azure_container_registry_storage_used
            description: "The amount of storage used by the container registry. \
              For a registry account, it's the sum of capacity used by all the repositories within a registry. \
              It's sum of capacity used by shared layers, manifest files, and replica copies in each of its repositories."
            resourceType: ContainerRegistry
            azureMetricConfiguration:
              metricName: StorageUsed
              aggregation:
                type: Average
            resourceDiscoveryGroups:
              - name: container-registry-landscape
          - name: azure_container_registry_successful_pull_count
            description: "Number of successful image pulls"
            resourceType: ContainerRegistry
            azureMetricConfiguration:
              metricName: SuccessfulPullCount
              aggregation:
                type: Total
            resourceDiscoveryGroups:
              - name: container-registry-landscape
          - name: azure_container_registry_successful_push_count
            description: "Number of successful image pushes"
            resourceType: ContainerRegistry
            azureMetricConfiguration:
              metricName: SuccessfulPushCount
              aggregation:
                type: Total
            resourceDiscoveryGroups:
              - name: container-registry-landscape
          - name: azure_container_registry_total_pull_count
            description: "Number of image pulls in total"
            resourceType: ContainerRegistry
            azureMetricConfiguration:
              metricName: TotalPullCount
              aggregation:
                type: Total
            resourceDiscoveryGroups:
              - name: container-registry-landscape
          - name: azure_container_registry_total_push_count
            description: "Number of image pushes in total"
            resourceType: ContainerRegistry
            azureMetricConfiguration:
              metricName: TotalPushCount
              aggregation:
                type: Total
            resourceDiscoveryGroups:
              - name: container-registry-landscape
          - name: azure_key_vault_availability
            description: "Vault requests availability"
            resourceType: KeyVault
            azureMetricConfiguration:
              metricName: Availability
              aggregation:
                type: Average
            resources:
              - vaultName: akvzvoovesaas{{ requiredEnv "ENVIRONMENT" }}
          - name: azure_key_vault_saturation_shoebox
            description: "Vault capacity used"
            resourceType: KeyVault
            azureMetricConfiguration:
              metricName: SaturationShoebox
              aggregation:
                type: Average
            resources:
              - vaultName: akvzvoovesaas{{ requiredEnv "ENVIRONMENT" }}
          - name: azure_key_vault_api_hit_count
            description: "Number of total service api hits"
            resourceType: KeyVault
            azureMetricConfiguration:
              metricName: ServiceApiHit
              aggregation:
                type: Count
            resources:
              - vaultName: akvzvoovesaas{{ requiredEnv "ENVIRONMENT" }}
          - name: azure_key_vault_api_latency
            description: "Overall latency of service api requests"
            resourceType: KeyVault
            azureMetricConfiguration:
              metricName: ServiceApiLatency
              aggregation:
                type: Average
            resourceDiscoveryGroups:
              - name: key-vault-landscape
          - name: azure_key_vault_api_result_count
            description: "Number of total service api results"
            resourceType: KeyVault
            azureMetricConfiguration:
              metricName: ServiceApiResult
              aggregation:
                type: Count
            resourceDiscoveryGroups:
              - name: key-vault-landscape
          - name: cluster_autoscaler_cluster_safe_to_autoscale
            description: "Determines whether or not cluster autoscaler will take action on the cluster"
            resourceType: KubernetesService
            azureMetricConfiguration:
              metricName: cluster_autoscaler_cluster_safe_to_autoscale
              aggregation:
                type: Average
            resourceDiscoveryGroups:
              - name: azure-kubernetes-service
          - name: cluster_autoscaler_scale_down_in_cooldown
            description: "Determines if the scale down is in cooldown - No nodes will be removed during this timeframe"
            resourceType: KubernetesService
            azureMetricConfiguration:
              metricName: cluster_autoscaler_scale_down_in_cooldown
              aggregation:
                type: Average
            resourceDiscoveryGroups:
              - name: azure-kubernetes-service
          - name: cluster_autoscaler_unneeded_nodes_count
            description: "Cluster auotscaler marks those nodes as candidates for deletion and are eventually deleted"
            resourceType: KubernetesService
            azureMetricConfiguration:
              metricName: cluster_autoscaler_unneeded_nodes_count
              aggregation:
                type: Average
            resourceDiscoveryGroups:
              - name: azure-kubernetes-service
          - name: cluster_autoscaler_unschedulable_pods_count
            description: "Number of pods that are currently unschedulable in the cluster"
            resourceType: KubernetesService
            azureMetricConfiguration:
              metricName: cluster_autoscaler_unschedulable_pods_count
              aggregation:
                type: Average
            resourceDiscoveryGroups:
              - name: azure-kubernetes-service
          - name: kube_node_status_allocatable_cpu_cores
            description: "Total number of available cpu cores in a managed cluster"
            resourceType: KubernetesService
            azureMetricConfiguration:
              metricName: kube_node_status_allocatable_cpu_cores
              aggregation:
                type: Average
            resourceDiscoveryGroups:
              - name: azure-kubernetes-service
          - name: kube_node_status_allocatable_memory_bytes
            description: "Total amount of available memory in a managed cluster"
            resourceType: KubernetesService
            azureMetricConfiguration:
              metricName: kube_node_status_allocatable_memory_bytes
              aggregation:
                type: Average
            resourceDiscoveryGroups:
              - name: azure-kubernetes-service
          - name: node_cpu_usage_millicores
            description: "Aggregated measurement of CPU utilization in millicores across the cluster"
            resourceType: KubernetesService
            azureMetricConfiguration:
              metricName: node_cpu_usage_millicores
              aggregation:
                type: Average
              dimension:
                name: node
            resourceDiscoveryGroups:
              - name: azure-kubernetes-service
          - name: node_cpu_usage_percentage
            description: "Aggregated average CPU utilization measured in percentage across the cluster"
            resourceType: KubernetesService
            azureMetricConfiguration:
              metricName: node_cpu_usage_percentage
              aggregation:
                type: Average
              dimension:
                name: node
            resourceDiscoveryGroups:
              - name: azure-kubernetes-service
          - name: node_disk_usage_bytes
            description: "Disk space used in bytes by device"
            resourceType: KubernetesService
            azureMetricConfiguration:
              metricName: node_disk_usage_bytes
              aggregation:
                type: Average
              dimension:
                name: node
            resourceDiscoveryGroups:
              - name: azure-kubernetes-service
          - name: node_disk_usage_percentage
            description: "Disk space used in percent by device"
            resourceType: KubernetesService
            azureMetricConfiguration:
              metricName: node_disk_usage_percentage
              aggregation:
                type: Average
              dimension:
                name: node
            resourceDiscoveryGroups:
              - name: azure-kubernetes-service
          - name: node_memory_rss_bytes
            description: "Container RSS memory used in bytes"
            resourceType: KubernetesService
            azureMetricConfiguration:
              metricName: node_memory_rss_bytes
              aggregation:
                type: Average
              dimension:
                name: node
            resourceDiscoveryGroups:
              - name: azure-kubernetes-service
          - name: node_memory_rss_percentage
            description: "Container RSS memory used in percent"
            resourceType: KubernetesService
            azureMetricConfiguration:
              metricName: node_memory_rss_percentage
              aggregation:
                type: Average
              dimension:
                name: node
            resourceDiscoveryGroups:
              - name: azure-kubernetes-service
          - name: node_memory_working_set_bytes
            description: "Container working set memory used in bytes"
            resourceType: KubernetesService
            azureMetricConfiguration:
              metricName: node_memory_working_set_bytes
              aggregation:
                type: Average
              dimension:
                name: node
            resourceDiscoveryGroups:
              - name: azure-kubernetes-service
          - name: node_memory_working_set_percentage
            description: "Container working set memory used in percent"
            resourceType: KubernetesService
            azureMetricConfiguration:
              metricName: node_memory_working_set_percentage
              aggregation:
                type: Average
              dimension:
                name: node
            resourceDiscoveryGroups:
              - name: azure-kubernetes-service
          - name: node_network_in_bytes
            description: "Network received bytes"
            resourceType: KubernetesService
            azureMetricConfiguration:
              metricName: node_network_in_bytes
              aggregation:
                type: Average
              dimension:
                name: node
            resourceDiscoveryGroups:
              - name: azure-kubernetes-service
          - name: node_network_out_bytes
            description: "Network transmitted bytes"
            resourceType: KubernetesService
            azureMetricConfiguration:
              metricName: node_network_out_bytes
              aggregation:
                type: Average
              dimension:
                name: node
            resourceDiscoveryGroups:
              - name: azure-kubernetes-service
          - name: azure_storage_account_availability
            description: "The percentage of availability for the storage service or the specified API operation. \
              Availability is calculated by taking the TotalBillableRequests value and dividing it by the number of applicable requests, \
              including those that produced unexpected errors. \
              All unexpected errors result in reduced availability for the storage service or the specified API operation."
            resourceType: NetworkInterface
            azureMetricConfiguration:
              metricName: Availability
              aggregation:
                type: Average
              dimension:
                name: ApiName
            resourceDiscoveryGroups:
              - name: storage-account-landscape
          - name: azure_storage_account_bytes_sent_rate
            description: "The amount of egress data, in bytes. \
              This number includes egress from an external client into Azure Storage as well as egress within Azure. \
              As a result, this number does not reflect billable egress."
            resourceType: NetworkInterface
            azureMetricConfiguration:
              metricName: Egress
              aggregation:
                type: Total
              dimension:
                name: ApiName
            resourceDiscoveryGroups:
              - name: storage-account-landscape
          - name: azure_storage_account_bytes_received_rate
            description: "The amount of ingress data, in bytes. \
              This number includes ingress from an external client into Azure Storage as well as ingress within Azure."
            resourceType: NetworkInterface
            azureMetricConfiguration:
              metricName: Ingress
              aggregation:
                type: Total
              dimension:
                name: ApiName
            resourceDiscoveryGroups:
              - name: storage-account-landscape
          - name: azure_storage_account_success_end2end_latency
            description: "The average end-to-end latency of successful requests made to a storage service or the specified API operation, in milliseconds. \
              This value includes the required processing time within Azure Storage to read the request, \
              send the response, and receive acknowledgment of the response."
            resourceType: NetworkInterface
            azureMetricConfiguration:
              metricName: SuccessE2ELatency
              aggregation:
                type: Average
              dimension:
                name: ApiName
            resourceDiscoveryGroups:
              - name: storage-account-landscape
          - name: azure_virtual_network_ping_mesh_average_roundtrip_ms
            description: "Round trip time for Pings sent to a destination VM."
            resourceType: VirtualNetwork
            azureMetricConfiguration:
              metricName: PingMeshAverageRoundtripMs
              aggregation:
                type: Average
              dimension:
                name: DestinationCustomerAddress
            resourceDiscoveryGroups:
              - name: virtual-network-landscape
          - name: azure_virtual_network_mesh_probe_failed_percent
            description: "Percent of number of failed Pings to total sent Pings of a destination VM."
            resourceType: VirtualNetwork
            azureMetricConfiguration:
              metricName: PingMeshProbesFailedPercent
              aggregation:
                type: Average
              dimension:
                name: DestinationCustomerAddress
            resourceDiscoveryGroups:
              - name: virtual-network-landscape
      - tolerations:
          - key: "CriticalAddonsOnly"
            operator: "Exists"
            effect: "NoSchedule"
  - name: promitor-discovery
    namespace: promitor
    chart: promitor/promitor-agent-resource-discovery
    values:
      - azureAuthentication:
          mode: UserAssignedManagedIdentity
          identity:
            id: {{ exec "terraform" (list "-chdir=../.." "output" "-json" "promitor_discovery_client_id") }}
            binding: promitor-discovery-identity
      - azureLandscape:
          cloud: Global
          tenantId: {{ requiredEnv "ARM_TENANT_ID" }}
          subscriptions:
            - {{ requiredEnv "ARM_SUBSCRIPTION_ID" }}
      - prometheus:
          serviceMonitor:
            enabled: true
            namespace: promitor
      - resourceDiscoveryGroups:
          - name: container-registry-landscape
            type: ContainerRegistry
          - name: key-vault-landscape
            type: KeyVault
          - name: azure-kubernetes-service
            type: KubernetesService
          - name: virtual-network-landscape
            type: VirtualNetwork
          - name: storage-account-landscape
            type: StorageAccount
      - tolerations:
          - key: "CriticalAddonsOnly"
            operator: "Exists"
            effect: "NoSchedule"
  - name: promitor-scraper-identity
    namespace: promitor
    chart: ../../charts/azure-identity
    values:
      - identityName: promitor-scraper-identity
      - resourceId: {{ exec "terraform" (list "-chdir=../.." "output" "-json" "promitor_scraper_resource_id") }}
      - clientId: {{ exec "terraform" (list "-chdir=../.." "output" "-json" "promitor_scraper_client_id") }}
      - selector: promitor-scraper-identity
  - name: promitor-discovery-identity
    namespace: promitor
    chart: ../../charts/azure-identity
    values:
      - identityName: promitor-discovery-identity
      - resourceId: {{ exec "terraform" (list "-chdir=../.." "output" "-json" "promitor_discovery_resource_id") }}
      - clientId: {{ exec "terraform" (list "-chdir=../.." "output" "-json" "promitor_discovery_client_id") }}
      - selector: promitor-discovery-identity
Logs
top output in promitor-agent-resource-discovery shell:
PID PPID USER STAT VSZ %VSZ CPU %CPU COMMAND
1 0 root S 20.1g 121% 1 42% dotnet Promitor.Agents.ResourceDiscovery.dll
k9s view:
NAMESPACE NAME PF READY RESTARTS STATUS CPU↓ MEM %CPU/R %CPU/L %MEM/R %MEM/L IP NODE AGE
promitor promitor-agent-resource-discovery-585f9d9b8d-69wwq ● 1/1 0 Running 3319 259 n/a n/a n/a n/a xx.xxx.x.xx aks-system-33053085-vmss00000e 18h
logs:
[09:05:00 INF] Executed action Promitor.Agents.ResourceDiscovery.Controllers.v2.DiscoveryV2Controller.Get (Promitor.Agents.ResourceDiscovery) in 0.3114ms
[09:05:00 INF] Request finished HTTP/1.1 GET http://promitor-agent-resource-discovery:8889/api/v2/resources/groups/azure-kubernetes-service/discover?currentPage=1 - - - 200 654 application/json 2.7396ms
[09:05:00 INF] Executed action Promitor.Agents.ResourceDiscovery.Controllers.v2.DiscoveryV2Controller.Get (Promitor.Agents.ResourceDiscovery) in 1.1231ms
[09:05:00 INF] Request finished HTTP/1.1 GET http://promitor-agent-resource-discovery:8889/api/v2/resources/groups/container-registry-landscape/discover?currentPage=1 - - - 200 658 application/json 23.8754ms
[09:05:00 INF] Request starting HTTP/1.1 GET http://promitor-agent-resource-discovery:8889/api/v2/resources/groups/container-registry-landscape/discover?currentPage=1 - -
[09:05:00 INF] Route matched with {action = "Get", controller = "DiscoveryV2"}. Executing controller action with signature System.Threading.Tasks.Task`1[Microsoft.AspNetCore.Mvc.IActionResult] Get(System.String, Int32, Int32) on controller Promitor.Agents.ResourceDiscovery.Controllers.v2.DiscoveryV2Controller (Promitor.Agents.ResourceDiscovery).
[09:05:00 INF] Executed action Promitor.Agents.ResourceDiscovery.Controllers.v2.DiscoveryV2Controller.Get (Promitor.Agents.ResourceDiscovery) in 0.2467ms
[09:05:00 INF] Route matched with {action = "Get", controller = "DiscoveryV2"}. Executing controller action with signature System.Threading.Tasks.Task`1[Microsoft.AspNetCore.Mvc.IActionResult] Get(System.String, Int32, Int32) on controller Promitor.Agents.ResourceDiscovery.Controllers.v2.DiscoveryV2Controller (Promitor.Agents.ResourceDiscovery).
[09:05:00 INF] Executed action Promitor.Agents.ResourceDiscovery.Controllers.v2.DiscoveryV2Controller.Get (Promitor.Agents.ResourceDiscovery) in 0.2758ms
[09:05:00 INF] Request finished HTTP/1.1 GET http://promitor-agent-resource-discovery:8889/api/v2/resources/groups/azure-kubernetes-service/discover?currentPage=1 - - - 200 654 application/json 33.5840ms
[09:05:00 INF] Request starting HTTP/1.1 GET http://promitor-agent-resource-discovery:8889/api/v2/resources/groups/azure-kubernetes-service/discover?currentPage=1 - -
[09:05:00 INF] Request starting HTTP/1.1 GET http://promitor-agent-resource-discovery:8889/api/v2/resources/groups/container-registry-landscape/discover?currentPage=1 - -
[09:05:00 INF] Request finished HTTP/1.1 GET http://promitor-agent-resource-discovery:8889/api/v2/resources/groups/network-interfaces-landscape/discover?currentPage=1 - - - 200 2701 application/json 3.7663ms
[09:05:00 INF] Request finished HTTP/1.1 GET http://promitor-agent-resource-discovery:8889/api/v2/resources/groups/container-registry-landscape/discover?currentPage=1 - - - 200 658 application/json 3.7642ms
[09:05:00 INF] Route matched with {action = "Get", controller = "DiscoveryV2"}. Executing controller action with signature System.Threading.Tasks.Task`1[Microsoft.AspNetCore.Mvc.IActionResult] Get(System.String, Int32, Int32) on controller Promitor.Agents.ResourceDiscovery.Controllers.v2.DiscoveryV2Controller (Promitor.Agents.ResourceDiscovery).
[09:05:00 INF] Executed action Promitor.Agents.ResourceDiscovery.Controllers.v2.DiscoveryV2Controller.Get (Promitor.Agents.ResourceDiscovery) in 0.2377ms
[09:05:00 INF] Request finished HTTP/1.1 GET http://promitor-agent-resource-discovery:8889/api/v2/resources/groups/azure-kubernetes-service/discover?currentPage=1 - - - 200 654 application/json 3.0454ms
[09:05:00 INF] Request starting HTTP/1.1 GET http://promitor-agent-resource-discovery:8889/api/v2/resources/groups/virtual-network-landscape/discover?currentPage=1 - -
[09:05:00 INF] Route matched with {action = "Get", controller = "DiscoveryV2"}. Executing controller action with signature System.Threading.Tasks.Task`1[Microsoft.AspNetCore.Mvc.IActionResult] Get(System.String, Int32, Int32) on controller Promitor.Agents.ResourceDiscovery.Controllers.v2.DiscoveryV2Controller (Promitor.Agents.ResourceDiscovery).
[09:05:00 INF] Executed action Promitor.Agents.ResourceDiscovery.Controllers.v2.DiscoveryV2Controller.Get (Promitor.Agents.ResourceDiscovery) in 0.2297ms
[09:05:00 INF] Request finished HTTP/1.1 GET http://promitor-agent-resource-discovery:8889/api/v2/resources/groups/virtual-network-landscape/discover?currentPage=1 - - - 200 640 application/json 8.9746ms
[09:05:00 INF] Route matched with {action = "Get", controller = "DiscoveryV2"}. Executing controller action with signature System.Threading.Tasks.Task`1[Microsoft.AspNetCore.Mvc.IActionResult] Get(System.String, Int32, Int32) on controller Promitor.Agents.ResourceDiscovery.Controllers.v2.DiscoveryV2Controller (Promitor.Agents.ResourceDiscovery).
[09:05:00 INF] Executed action Promitor.Agents.ResourceDiscovery.Controllers.v2.DiscoveryV2Controller.Get (Promitor.Agents.ResourceDiscovery) in 0.1904ms
[09:05:00 INF] Request finished HTTP/1.1 GET http://promitor-agent-resource-discovery:8889/api/v2/resources/groups/container-registry-landscape/discover?currentPage=1 - - - 200 658 application/json 7.9096ms
[09:05:01 INF] Request starting HTTP/1.1 GET http://xx.xxx.x.xx:88/api/v1/health?includeDependencies=false - -
[09:05:01 INF] Route matched with {action = "Get", controller = "HealthV1"}. Executing controller action with signature System.Threading.Tasks.Task`1[Microsoft.AspNetCore.Mvc.IActionResult] Get(Boolean) on controller Promitor.Agents.ResourceDiscovery.Controllers.v1.HealthV1Controller (Promitor.Agents.ResourceDiscovery).
[09:05:01 INF] Executed action Promitor.Agents.ResourceDiscovery.Controllers.v1.HealthV1Controller.Get (Promitor.Agents.ResourceDiscovery) in 0.2385ms
[09:05:01 INF] Request finished HTTP/1.1 GET http://xx.xxx.x.xx:88/api/v1/health?includeDependencies=false - - - 200 60 application/json;+charset=utf-8 18.9811ms
[09:05:06 INF] Request starting HTTP/1.1 GET http://xx.xxx.x.xx:88/api/v1/health?includeDependencies=false - -
[09:05:06 INF] Route matched with {action = "Get", controller = "HealthV1"}. Executing controller action with signature System.Threading.Tasks.Task`1[Microsoft.AspNetCore.Mvc.IActionResult] Get(Boolean) on controller Promitor.Agents.ResourceDiscovery.Controllers.v1.HealthV1Controller (Promitor.Agents.ResourceDiscovery).
[09:05:06 INF] Executed action Promitor.Agents.ResourceDiscovery.Controllers.v1.HealthV1Controller.Get (Promitor.Agents.ResourceDiscovery) in 0.2379ms
[09:05:06 INF] Request finished HTTP/1.1 GET http://xx.xxx.x.xx:88/api/v1/health?includeDependencies=false - - - 200 60 application/json;+charset=utf-8 1.4680ms
[09:05:11 INF] Request starting HTTP/1.1 GET http://xx.xxx.x.xx:88/api/v1/health?includeDependencies=false - -
[09:05:11 INF] Route matched with {action = "Get", controller = "HealthV1"}. Executing controller action with signature System.Threading.Tasks.Task`1[Microsoft.AspNetCore.Mvc.IActionResult] Get(Boolean) on controller Promitor.Agents.ResourceDiscovery.Controllers.v1.HealthV1Controller (Promitor.Agents.ResourceDiscovery).
[09:05:11 INF] Executed action Promitor.Agents.ResourceDiscovery.Controllers.v1.HealthV1Controller.Get (Promitor.Agents.ResourceDiscovery) in 0.9228ms
[09:05:11 INF] Request finished HTTP/1.1 GET http://xx.xxx.x.xx:88/api/v1/health?includeDependencies=false - - - 200 60 application/json;+charset=utf-8 4.3626ms
Platform
Microsoft Azure
Contact Details
No response
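The log excerpt in this report shows several discovery requests for the same group within one second. One way to quantify that is to tally the "Request starting" lines per timestamp and discovery group. The following is a minimal sketch, assuming the log format shown in the Logs section. The function name `count_discovery_requests` and the regular expression are illustrative and not part of Promitor.

```python
import re
from collections import Counter

# Matches "Request starting" lines for the v2 discovery endpoint,
# based on the log format shown in this report (an assumption).
REQUEST_START = re.compile(
    r"^\[(?P<time>\d{2}:\d{2}:\d{2}) INF\] Request starting HTTP/1\.1 GET "
    r"\S+/api/v2/resources/groups/(?P<group>[^/]+)/discover"
)


def count_discovery_requests(lines):
    """Count discovery requests per (timestamp, group) pair."""
    counts = Counter()
    for line in lines:
        match = REQUEST_START.match(line)
        if match:
            counts[(match["time"], match["group"])] += 1
    return counts


# Lines copied from the log excerpt in this report.
sample = [
    "[09:05:00 INF] Request starting HTTP/1.1 GET http://promitor-agent-resource-discovery:8889/api/v2/resources/groups/container-registry-landscape/discover?currentPage=1 - -",
    "[09:05:00 INF] Request starting HTTP/1.1 GET http://promitor-agent-resource-discovery:8889/api/v2/resources/groups/azure-kubernetes-service/discover?currentPage=1 - -",
    "[09:05:00 INF] Request starting HTTP/1.1 GET http://promitor-agent-resource-discovery:8889/api/v2/resources/groups/container-registry-landscape/discover?currentPage=1 - -",
    "[09:05:00 INF] Request starting HTTP/1.1 GET http://promitor-agent-resource-discovery:8889/api/v2/resources/groups/virtual-network-landscape/discover?currentPage=1 - -",
    "[09:05:01 INF] Request starting HTTP/1.1 GET http://xx.xxx.x.xx:88/api/v1/health?includeDependencies=false - -",
]

counts = count_discovery_requests(sample)
for (time, group), n in sorted(counts.items()):
    print(f"{time} {group}: {n}")
```

Health-check requests are ignored because they do not match the discovery path. Running this over a longer capture would show whether the per-second request rate stays at this level, which may help narrow down whether request volume is related to the CPU usage.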
