Description
Reproduction steps
- Create a cluster with instance_type: inf1.6xlarge, min_instances: 1, and max_instances: 2
- Add min_replicas: 4 to examples/tensorflow/image-classifier-resnet50/cortex_inf.yaml, and cortex deploy it
- Wait for 4 replicas to be live
- Update to min_replicas: 5 and re-deploy
- Wait for the new replica to be live
- Update to min_replicas: 4 and re-deploy
- Wait
Behavior
When the 5th replica is requested, a new instance spins up. When the extra replica is removed after reverting min_replicas to 4, k8s removes a replica from the node with 4 running pods, rather than from the node with 1. At that point, there is a node with 3 replicas and a node with 1. Then, when the cluster autoscaler terminates the extra instance, it terminates the instance with 3 replicas, causing 3 replicas to be moved to the instance with 1 replica.
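To make the cost of this choice concrete, here is a minimal sketch that compares how many pods have to be rescheduled depending on which node is terminated. The node labels and helper functions are hypothetical and only model the pod counts described above; they are not part of Cortex or the cluster autoscaler.

```python
# Pod counts per node after min_replicas is reverted to 4, as described above.
# The node names are hypothetical labels, not the real hostnames.
pods_per_node = {"original-node": 3, "new-node": 1}


def pods_to_reschedule(pods_per_node, terminated_node):
    """Return how many pods must move if the given node is terminated."""
    return pods_per_node[terminated_node]


def least_disruptive_node(pods_per_node):
    """Return the node whose termination would move the fewest pods."""
    return min(pods_per_node, key=pods_per_node.get)


# What was observed: the node with 3 replicas was terminated.
observed = pods_to_reschedule(pods_per_node, "original-node")

# What would have been preferable: terminate the node with 1 replica.
ideal_node = least_disruptive_node(pods_per_node)
ideal = pods_to_reschedule(pods_per_node, ideal_node)

print(f"observed: {observed} pods rescheduled")
print(f"ideal: {ideal} pod rescheduled (terminating {ideal_node})")
```

In this model, the observed scale-down moves three times as many pods as the alternative.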
Notes
It could make sense for k8s to take the pod out of the node with 4 running replicas rather than the one with only 1, for the sake of spreading the load. Even if that's so, ideally the cluster autoscaler should terminate the node with only one running pod.
It has not been checked whether the behavior is the same for CPU or GPU instances.
Cluster autoscaler logs
I0910 21:01:06.106834 1 static_autoscaler.go:194] Starting main loop
I0910 21:01:06.107160 1 utils.go:272] Removing autoscaler soft taint when creating template from node ip-192-168-19-138.us-west-2.compute.internal
I0910 21:01:06.107707 1 utils.go:328] No pod using affinity / antiaffinity found in cluster, disabling affinity predicate for this loop
I0910 21:01:06.107729 1 filter_out_schedulable.go:66] Filtering out schedulables
I0910 21:01:06.107930 1 filter_out_schedulable.go:131] 0 other pods marked as unschedulable can be scheduled.
I0910 21:01:06.108121 1 filter_out_schedulable.go:131] 0 other pods marked as unschedulable can be scheduled.
I0910 21:01:06.108142 1 filter_out_schedulable.go:91] No schedulable pods
I0910 21:01:06.108207 1 static_autoscaler.go:343] No unschedulable pods
I0910 21:01:06.108275 1 static_autoscaler.go:390] Calculating unneeded nodes
I0910 21:01:06.108352 1 pre_filtering_processor.go:66] Skipping ip-192-168-17-254.us-west-2.compute.internal - node group min size reached
I0910 21:01:06.108482 1 scale_down.go:462] Node ip-192-168-93-125.us-west-2.compute.internal - cpu utilization 0.148101
I0910 21:01:06.108504 1 scale_down.go:462] Node ip-192-168-19-138.us-west-2.compute.internal - cpu utilization 0.401266
I0910 21:01:06.108645 1 cluster.go:93] Fast evaluation: ip-192-168-93-125.us-west-2.compute.internal for removal
I0910 21:01:06.108812 1 cluster.go:251] Pod default/api-image-classifier-resnet50-7984c7f68-nbgbb can be moved to ip-192-168-19-138.us-west-2.compute.internal
I0910 21:01:06.108900 1 cluster.go:124] Fast evaluation: node ip-192-168-93-125.us-west-2.compute.internal may be removed
I0910 21:01:06.108910 1 cluster.go:93] Fast evaluation: ip-192-168-19-138.us-west-2.compute.internal for removal
I0910 21:01:06.109010 1 cluster.go:251] Pod default/api-image-classifier-resnet50-7984c7f68-7hn6n can be moved to ip-192-168-93-125.us-west-2.compute.internal
I0910 21:01:06.109091 1 cluster.go:251] Pod default/api-image-classifier-resnet50-7984c7f68-bn6zx can be moved to ip-192-168-93-125.us-west-2.compute.internal
I0910 21:01:06.109178 1 cluster.go:251] Pod default/api-image-classifier-resnet50-7984c7f68-8kc2t can be moved to ip-192-168-93-125.us-west-2.compute.internal
I0910 21:01:06.109244 1 cluster.go:124] Fast evaluation: node ip-192-168-19-138.us-west-2.compute.internal may be removed
I0910 21:01:06.109325 1 static_autoscaler.go:428] ip-192-168-93-125.us-west-2.compute.internal is unneeded since 2020-09-10 20:55:24.8424846 +0000 UTC m=+2168.073702427 duration 5m41.264317462s
I0910 21:01:06.109361 1 static_autoscaler.go:428] ip-192-168-19-138.us-west-2.compute.internal is unneeded since 2020-09-10 20:51:04.066117087 +0000 UTC m=+1907.297334986 duration 10m2.040684903s
I0910 21:01:06.109404 1 static_autoscaler.go:439] Scale down status: unneededOnly=false lastScaleUpTime=2020-09-10 20:38:41.353225136 +0000 UTC m=+1164.584443026 lastScaleDownDeleteTime=2020-09-10 20:19:37.665494816 +0000 UTC m=+20.896712627 lastScaleDownFailTime=2020-09-10 20:19:37.665494907 +0000 UTC m=+20.896712709 scaleDownForbidden=false isDeleteInProgress=false scaleDownInCooldown=false
I0910 21:01:06.109426 1 static_autoscaler.go:452] Starting scale down
I0910 21:01:06.109524 1 scale_down.go:716] ip-192-168-93-125.us-west-2.compute.internal was unneeded for 5m41.264317462s
I0910 21:01:06.109537 1 scale_down.go:716] ip-192-168-19-138.us-west-2.compute.internal was unneeded for 10m2.040684903s
I0910 21:01:06.109681 1 cluster.go:93] Detailed evaluation: ip-192-168-19-138.us-west-2.compute.internal for removal
I0910 21:01:06.109805 1 cluster.go:251] Pod default/api-image-classifier-resnet50-7984c7f68-7hn6n can be moved to ip-192-168-93-125.us-west-2.compute.internal
I0910 21:01:06.109880 1 cluster.go:251] Pod default/api-image-classifier-resnet50-7984c7f68-bn6zx can be moved to ip-192-168-93-125.us-west-2.compute.internal
I0910 21:01:06.109973 1 cluster.go:251] Pod default/api-image-classifier-resnet50-7984c7f68-8kc2t can be moved to ip-192-168-93-125.us-west-2.compute.internal
I0910 21:01:06.110045 1 cluster.go:124] Detailed evaluation: node ip-192-168-19-138.us-west-2.compute.internal may be removed
I0910 21:01:06.110109 1 scale_down.go:827] Scale-down: removing node ip-192-168-19-138.us-west-2.compute.internal, utilization: {0.4012658227848101 0.2612578686367515 0 cpu 0.4012658227848101}, pods to reschedule: default/api-image-classifier-resnet50-7984c7f68-7hn6n,default/api-image-classifier-resnet50-7984c7f68-bn6zx,default/api-image-classifier-resnet50-7984c7f68-8kc2t
I0910 21:01:06.110408 1 event.go:281] Event(v1.ObjectReference{Kind:"ConfigMap", Namespace:"kube-system", Name:"cluster-autoscaler-status", UID:"1dfcc5cc-9d14-414d-a413-9a3e4d9beb4f", APIVersion:"v1", ResourceVersion:"12215", FieldPath:""}): type: 'Normal' reason: 'ScaleDown' Scale-down: removing node ip-192-168-19-138.us-west-2.compute.internal, utilization: {0.4012658227848101 0.2612578686367515 0 cpu 0.4012658227848101}, pods to reschedule: default/api-image-classifier-resnet50-7984c7f68-7hn6n,default/api-image-classifier-resnet50-7984c7f68-bn6zx,default/api-image-classifier-resnet50-7984c7f68-8kc2t
I0910 21:01:06.118121 1 delete.go:102] Successfully added ToBeDeletedTaint on node ip-192-168-19-138.us-west-2.compute.internal
I0910 21:01:06.118525 1 event.go:281] Event(v1.ObjectReference{Kind:"Pod", Namespace:"default", Name:"api-image-classifier-resnet50-7984c7f68-7hn6n", UID:"d9e5d6d1-60bd-4598-9f9a-847e738953fa", APIVersion:"v1", ResourceVersion:"4943", FieldPath:""}): type: 'Normal' reason: 'ScaleDown' deleting pod for node scale down
I0910 21:01:06.118552 1 event.go:281] Event(v1.ObjectReference{Kind:"Node", Namespace:"", Name:"ip-192-168-19-138.us-west-2.compute.internal", UID:"71d09345-5162-4488-ae94-4726d61c75a6", APIVersion:"v1", ResourceVersion:"11917", FieldPath:""}): type: 'Normal' reason: 'ScaleDown' marked the node as toBeDeleted/unschedulable
I0910 21:01:06.118566 1 event.go:281] Event(v1.ObjectReference{Kind:"Pod", Namespace:"default", Name:"api-image-classifier-resnet50-7984c7f68-8kc2t", UID:"95d685ed-baa9-4798-a6a2-cecf120ac3a8", APIVersion:"v1", ResourceVersion:"4935", FieldPath:""}): type: 'Normal' reason: 'ScaleDown' deleting pod for node scale down
I0910 21:01:06.118769 1 event.go:281] Event(v1.ObjectReference{Kind:"Pod", Namespace:"default", Name:"api-image-classifier-resnet50-7984c7f68-bn6zx", UID:"e3e06400-8699-4acf-9688-47c456b88014", APIVersion:"v1", ResourceVersion:"4922", FieldPath:""}): type: 'Normal' reason: 'ScaleDown' deleting pod for node scale down
E0910 21:01:06.704873 1 scale_down.go:1129] Not deleted yet default/api-image-classifier-resnet50-7984c7f68-7hn6n
E0910 21:01:11.709670 1 scale_down.go:1129] Not deleted yet default/api-image-classifier-resnet50-7984c7f68-7hn6n
E0910 21:01:16.713689 1 scale_down.go:1129] Not deleted yet default/api-image-classifier-resnet50-7984c7f68-7hn6n
E0910 21:01:21.717512 1 scale_down.go:1129] Not deleted yet default/api-image-classifier-resnet50-7984c7f68-7hn6n
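
One possible reading of the logs above: both nodes were marked as removable, but only ip-192-168-19-138 had been unneeded for more than 10 minutes when the scale-down started, and it was the only node given a detailed evaluation. The sketch below parses the two "is unneeded since" lines and compares their durations against a 10-minute threshold. That threshold is an assumption: 10 minutes is the cluster autoscaler's default for --scale-down-unneeded-time, and whether this cluster uses the default has not been verified. The helper names are hypothetical.

```python
import re
from datetime import timedelta

# Assumption: the cluster uses the autoscaler's default
# --scale-down-unneeded-time of 10 minutes.
ASSUMED_UNNEEDED_THRESHOLD = timedelta(minutes=10)

# Matches lines such as "<node> is unneeded since <timestamp> duration 5m41.2s".
UNNEEDED_RE = re.compile(
    r"(?P<node>\S+) is unneeded since .* duration (?P<duration>\S+)\s*$"
)
# Handles Go-style durations made of hours, minutes, and seconds only.
DURATION_RE = re.compile(r"(?:(?P<h>\d+)h)?(?:(?P<m>\d+)m)?(?:(?P<s>[\d.]+)s)?")


def parse_go_duration(text):
    """Parse a duration like '5m41.264317462s' into a timedelta."""
    match = DURATION_RE.fullmatch(text)
    if not text or match is None:
        raise ValueError(f"unsupported duration: {text!r}")
    return timedelta(
        hours=int(match["h"] or 0),
        minutes=int(match["m"] or 0),
        seconds=float(match["s"] or 0),
    )


def unneeded_durations(log_lines):
    """Map each node name to how long the autoscaler has considered it unneeded."""
    durations = {}
    for line in log_lines:
        match = UNNEEDED_RE.search(line)
        if match:
            durations[match["node"]] = parse_go_duration(match["duration"])
    return durations


# The two relevant lines, copied from the logs above.
sample_lines = [
    "I0910 21:01:06.109325 1 static_autoscaler.go:428] ip-192-168-93-125.us-west-2.compute.internal is unneeded since 2020-09-10 20:55:24.8424846 +0000 UTC m=+2168.073702427 duration 5m41.264317462s",
    "I0910 21:01:06.109361 1 static_autoscaler.go:428] ip-192-168-19-138.us-west-2.compute.internal is unneeded since 2020-09-10 20:51:04.066117087 +0000 UTC m=+1907.297334986 duration 10m2.040684903s",
]

durations = unneeded_durations(sample_lines)
eligible = [
    node for node, unneeded_for in durations.items()
    if unneeded_for >= ASSUMED_UNNEEDED_THRESHOLD
]
print(eligible)
```

Under that assumption, the node holding 3 replicas crossed the threshold first because it had been marked unneeded about four minutes earlier than the node holding 1 replica.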