AKS and ACI reporting slow to metrics server. Need faster scale out than 5 min. #119
Comments
metrics-server collects metrics from nodes more or less on demand. Currently the best resolution we can get from ACI is 1 minute (https://github.com/virtual-kubelet/azure-aci/blob/master/client/aci/metrics.go#L44), and that's the best-case scenario. Collections can get missed (within ACI) simply because the job runs at low priority. It's probably best to scale out on other metrics (such as requests per second?) until ACI's metrics collection is more robust.
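To make the requests-per-second suggestion concrete, here is a minimal sketch of the replica calculation the Kubernetes documentation describes for the HPA: `desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue)`. The function name and the numbers are hypothetical, and how a requests-per-second metric would be exposed to the HPA (for example through a custom metrics adapter) is not covered in this thread.

```python
import math


def desired_replicas_for_rps(current_replicas, avg_rps_per_pod, target_rps_per_pod):
    """Documented HPA formula applied to a per-pod requests-per-second metric.

    All names here are illustrative; they are not part of any Kubernetes API.
    """
    ratio = avg_rps_per_pod / target_rps_per_pod
    return math.ceil(current_replicas * ratio)


# 3 pods each serving ~150 req/s against a 100 req/s target.
print(desired_replicas_for_rps(3, 150, 100))
```

Because such a metric would come from the request path rather than from ACI's metrics collection, it would presumably not be subject to the 1-minute summary interval mentioned above.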
Hello, thanks for the quick reply. Concerning the resolution, do you mean the flag --full-resync-period duration (https://virtual-kubelet.io/docs/usage/#flags)? I just want to make sure we are on the same page about the ~5 minute delay we are experiencing. Would that mean we cannot affect this value unless we turn to Azure ACI support? Looking forward to your response!
No. When metrics-server requests metrics, the ACI VK provider fetches them from the ACI API immediately, but ACI itself doesn't publish live metrics, only a summary over a time interval (1 minute being the shortest interval). For the initial metrics, the delay may be even longer. /cc @ibabou
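As a rough illustration of what an interval summary means for reaction time, the sketch below computes the earliest moment a load spike could show up in a summary. It assumes, optimistically, that each summary is published exactly when its interval closes; as noted above, real collections can be late or missed. The function names are hypothetical.

```python
def first_visible_at(spike_start_s, interval_s=60):
    """Earliest time (seconds) at which a summary covering the spike can exist."""
    # The spike falls into the interval that closes at the next multiple of interval_s.
    return (spike_start_s // interval_s + 1) * interval_s


def reporting_delay(spike_start_s, interval_s=60):
    """Best-case delay between the spike starting and a summary covering it."""
    return first_visible_at(spike_start_s, interval_s) - spike_start_s


# A spike starting 1 s into an interval waits almost the whole interval.
print(reporting_delay(61))   # 59
# A spike starting just before the interval closes is visible almost at once.
print(reporting_delay(119))  # 1
```

Even in this best case the spike is also averaged over the whole interval, so a short burst looks smaller in the summary than it was.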
Thanks @cpuguy83 for the great input. It sounds as if you are recommending the following page as a solution to this problem. Cheers
This has been solved by using realtime metrics as the default metrics. |
Environment summary
Provider: ACI
Version: v1.13.1-vk-v0.9.0-1
K8s Master Info: AKS
Install Method: Azure Portal
Issue Details
I have set up a new AKS cluster with Virtual Kubelet enabled. I then run a load test against my pods with JMeter. Together with an HPA, I successfully autoscale pods onto ACI instances. However, I have noticed that the metrics server does not get any metrics from the ACI instance until after ~5 minutes. Only then is the HPA updated and a new scale-out performed. If the load increases during the ~5 minute wait, the HPA is not updated until the next ~5 minute threshold.
Is there any way I can reduce this delay to less than ~5 minutes? I want to be more resilient to bursts of traffic on my pods.
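One way pods without metrics can hold back a scale-out is sketched below. The Kubernetes documentation says that when some pods have missing metrics, the HPA recomputes the average conservatively, treating those pods as consuming 0% of the target on a scale-up and 100% on a scale-down. This is a simplified Python rendering of that rule, not the controller's actual code; it ignores pod readiness and uses the documented default tolerance of 0.1. Whether this is exactly what happened in the cluster described above is an assumption.

```python
import math


def desired_replicas(reported, missing, target, tolerance=0.1):
    """Simplified HPA replica calculation with missing-metric dampening.

    reported: per-pod metric values for pods that returned metrics
    missing:  number of pods with no metrics yet (e.g. freshly started ACI pods)
    target:   desired per-pod metric value
    """
    current = len(reported) + missing
    if not reported:
        return current  # nothing to base a decision on
    ratio = (sum(reported) / len(reported)) / target
    if missing == 0:
        if abs(ratio - 1.0) <= tolerance:
            return current
        return math.ceil(ratio * current)
    # Conservative fill: 0 when scaling up, the target value when scaling down.
    fill = 0.0 if ratio > 1.0 else float(target)
    new_ratio = (sum(reported) + fill * missing) / (current * target)
    flipped = (ratio > 1.0) != (new_ratio > 1.0)
    if abs(new_ratio - 1.0) <= tolerance or flipped:
        return current  # dampened: no scaling action
    return math.ceil(new_ratio * current)


# Two pods report 200% of target, two ACI pods report nothing: no scale-out.
print(desired_replicas([200, 200], missing=2, target=100))
# Same load once all four pods report: scale to 8.
print(desired_replicas([200, 200, 200, 200], missing=0, target=100))
```

In the first call the two silent pods pull the recomputed ratio back to 1.0, so the replica count stays at 4 until their metrics arrive.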
Repro Steps
Example output when no metrics are found on metrics-server:
1 reststorage.go:93] No metrics for pod default/php-apache-86ddb69d6f-9fjwj
HPA.yaml
php-apache.yaml
service.yaml
Any ideas or feedback would be greatly appreciated.
Thanks for an otherwise awesome product!