
node autoscaling not scaling #324

Closed

pkelleratwork opened this issue Mar 27, 2019 · 11 comments

@pkelleratwork commented Mar 27, 2019

I have issues

I'm submitting a...

  • [x] bug report
  • [ ] feature request
  • [ ] support request
  • [ ] kudos, thank you, warm fuzzy

What is the current behavior?

node autoscaling does not scale any nodes

If this is a bug, how to reproduce? Please include a code sample if relevant.

  1. ran this module:
module "create-cluster" {
  source  = "terraform-aws-modules/eks/aws"
  version = "2.2.1"

  cluster_name              = "demothis"
  cluster_version           = "1.11"
  kubeconfig_name           = "demothis"
  manage_aws_auth           = "false"
  subnets                   = "3-public-subnets-here"
  vpc_id                    = "vpc-id-here"

  # worker node configurations
  # https://github.com/terraform-aws-modules/terraform-aws-eks/blob/master/local.tf

  workers_group_defaults = {
      asg_max_size          = "50"
      autoscaling_enabled   = "true"
      instance_type         = "t3.medium"
    }

  # tags to add to all resources
  tags = {
    cluster                 = "demothis"
    environment             = "dev"
  }
}
  2. created a clusterrolebinding:
    kubectl create clusterrolebinding add-on-cluster-admin --clusterrole=cluster-admin --serviceaccount=kube-system:default

  3. installed cluster-autoscaler via Helm:

resource "helm_release" "cluster-autoscaling" {
    name        = "cluster-autoscaler"
    namespace   = "kube-system"
    chart       = "stable/cluster-autoscaler"

    set {
        name    = "autoDiscovery.clusterName"
        value   = "demothis"
    }

    set {
        name    = "autoDiscovery.enabled"
        value   = "true"
    }

    set {
        name    = "cloudProvider"
        value   = "aws"
    }

    set {
        name    = "awsRegion"
        value   = "us-east-1"
    }

    set {
        name    = "sslCertPath"
        value   = "/etc/kubernetes/pki/ca.crt"
    }

    set {
        name    = "rbac.create"
        value   = "true"
    }
}
  4. verified cluster-autoscaler is installed in kube-system:
(⎈ demothis:)➜  projects ✗ kgp -n kube-system
NAME                                                        READY   STATUS    RESTARTS   AGE
aws-node-p4sww                                              1/1     Running   0          18h
cluster-autoscaler-aws-cluster-autoscaler-676f48b86-lzhtv   1/1     Running   1          18h
coredns-7bcbfc4774-6hnr4                                    1/1     Running   0          18h
coredns-7bcbfc4774-zkf64                                    1/1     Running   0          18h
kube-proxy-4btkw                                            1/1     Running   0          18h
kubernetes-dashboard-5478c45897-bqfhh                       1/1     Running   0          18h
metrics-server-5f64dbfb9d-qlvm7                             1/1     Running   0          18h
tiller-deploy-6fb466b55b-5m9b7                              1/1     Running   0          18h
  5. loaded apps; after a number of pods were created, one went Pending:
(⎈ demothis:)➜  projects ✗ kgp
NAME                                READY   STATUS    RESTARTS   AGE
pine-android-bff-9b4bbc55f-mhdzl    0/1     Pending   0          1m
pine-api-7d9776d589-99fpq           1/1     Running   0          1m
pine-api-7d9776d589-dr7rw           1/1     Running   0          1m
pine-auth-service-9b9f5dc66-r7j9z   1/1     Running   0          35m
pine-web-6bf54bf695-brv4w           1/1     Running   0          35m
pine-web-bff-7b49cd44c6-8qcj6       1/1     Running   0          35m

cluster-autoscaler logs:

I0327 19:52:01.520925       1 static_autoscaler.go:128] Starting main loop
I0327 19:52:01.697210       1 auto_scaling_groups.go:320] Regenerating instance to ASG map for ASGs: []
I0327 19:52:01.697232       1 aws_manager.go:152] Refreshed ASG list, next refresh after 2019-03-27 19:52:11.697229478 +0000 UTC m=+47783.645256945
I0327 19:52:01.697302       1 utils.go:526] No pod using affinity / antiaffinity found in cluster, disabling affinity predicate for this loop
I0327 19:52:01.697313       1 static_autoscaler.go:261] Filtering out schedulables
I0327 19:52:01.697367       1 static_autoscaler.go:271] No schedulable pods
I0327 19:52:01.697380       1 scale_up.go:262] Pod default/pine-android-bff-6bcb794bd8-7pl2f is unschedulable
I0327 19:52:01.697406       1 scale_up.go:304] Upcoming 0 nodes
I0327 19:52:01.697415       1 scale_up.go:420] No expansion options
I0327 19:52:01.697447       1 static_autoscaler.go:333] Calculating unneeded nodes
I0327 19:52:01.697457       1 utils.go:474] Skipping ip-192-168-174-2.ec2.internal - no node group config
I0327 19:52:01.697507       1 static_autoscaler.go:360] Scale down status: unneededOnly=false lastScaleUpTime=2019-03-27 06:36:26.055084128 +0000 UTC m=+38.003111545 lastScaleDownDeleteTime=2019-03-27 06:36:26.055084241 +0000 UTC m=+38.003111657 lastScaleDownFailTime=2019-03-27 06:36:26.055084352 +0000 UTC m=+38.003111770 scaleDownForbidden=false isDeleteInProgress=false
I0327 19:52:01.697522       1 static_autoscaler.go:370] Starting scale down
I0327 19:52:01.697547       1 scale_down.go:659] No candidates for scale down
I0327 19:52:01.698128       1 factory.go:33] Event(v1.ObjectReference{Kind:"Pod", Namespace:"default", Name:"pine-android-bff-6bcb794bd8-7pl2f", UID:"90d94a29-50c9-11e9-989f-0e074c530082", APIVersion:"v1", ResourceVersion:"133307", FieldPath:""}): type: 'Normal' reason: 'NotTriggerScaleUp' pod didn't trigger scale-up (it wouldn't fit if a new node is added):

What's the expected behavior?

autoscaling should work - https://github.com/terraform-aws-modules/terraform-aws-eks/blob/master/docs/autoscaling.md

Are you able to fix this problem and submit a PR? Link here if you have already. nope.

Environment details

  • Affected module version: terraform-aws-modules/eks/aws 2.2.1
  • OS: aws eks ec2 worker node
  • Terraform version: Terraform v0.11.13

Any other relevant info

I noticed there is no k8s.io/cluster-autoscaler/enabled tag created on the EC2 worker nodes. I tried adding it manually and restarting the cluster-autoscaler pod - it did not work.
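
For context, cluster-autoscaler auto-discovery only registers an ASG when the ASG itself (not its EC2 instances) carries the tags it searches for, which is why the log above shows Regenerating instance to ASG map for ASGs: []. As a rough sketch, the worker ASG created by the module would need tag stanzas along these lines (illustrative only; key names taken from the autoscaler flags and the Terraform plan output later in this thread):

  # inside the module-managed aws_autoscaling_group for the workers
  tag {
    key                 = "k8s.io/cluster-autoscaler/enabled"   # the key auto-discovery matches on
    value               = "true"
    propagate_at_launch = false
  }

  tag {
    key                 = "kubernetes.io/cluster/demothis"      # cluster ownership tag, already present
    value               = "owned"
    propagate_at_launch = true
  }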

@max-rocket-internet (Contributor)

What is the autoscaler logging on startup? It should look something like this:

I0328 08:44:11.028188       1 main.go:333] Cluster Autoscaler 1.13.1
I0328 08:44:11.065134       1 leaderelection.go:205] attempting to acquire leader lease  kube-system/cluster-autoscaler...
I0328 08:44:28.481337       1 leaderelection.go:214] successfully acquired lease kube-system/cluster-autoscaler
I0328 08:44:28.513396       1 predicates.go:122] Using predicate PodFitsResources
I0328 08:44:28.513418       1 predicates.go:122] Using predicate GeneralPredicates
I0328 08:44:28.513425       1 predicates.go:122] Using predicate PodToleratesNodeTaints
I0328 08:44:28.513432       1 predicates.go:122] Using predicate CheckVolumeBinding
I0328 08:44:28.513439       1 predicates.go:122] Using predicate MaxAzureDiskVolumeCount
I0328 08:44:28.513446       1 predicates.go:122] Using predicate MaxEBSVolumeCount
I0328 08:44:28.513453       1 predicates.go:122] Using predicate NoVolumeZoneConflict
I0328 08:44:28.513460       1 predicates.go:122] Using predicate ready
I0328 08:44:28.513467       1 predicates.go:122] Using predicate CheckNodeUnschedulable
I0328 08:44:28.513474       1 predicates.go:122] Using predicate MatchInterPodAffinity
I0328 08:44:28.513531       1 predicates.go:122] Using predicate MaxCSIVolumeCountPred
I0328 08:44:28.513538       1 predicates.go:122] Using predicate MaxGCEPDVolumeCount
I0328 08:44:28.513545       1 predicates.go:122] Using predicate NoDiskConflict
I0328 08:44:28.513552       1 cloud_provider_builder.go:29] Building aws cloud provider.
I0328 08:44:28.793414       1 auto_scaling_groups.go:124] Registering ASG xx01-xxx-xxxxxxxxxxxxxxxxxxx

The last line there shows that it has found an ASG to use.

Also, the message pod didn't trigger scale-up (it wouldn't fit if a new node is added) could mean that the pod is requesting more resources than a whole node has, i.e. adding a node to the cluster won't help.
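
In that second case the fix is a larger instance type rather than more nodes. A minimal sketch of what that could look like in the worker group config (the m5.xlarge value is an illustrative assumption, not taken from this issue):

  worker_groups = [
    {
      # A t3.medium offers roughly 2 vCPU / 4 GiB; a pod requesting more than
      # that can never fit, so scale up the instance size, not the node count.
      instance_type       = "m5.xlarge"
      asg_max_size        = 50
      autoscaling_enabled = true
    },
  ]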

@pkelleratwork (Author) commented Mar 28, 2019

@max-rocket-internet looks like Registering ASG never happens. I get as far as Building aws cloud provider, but after that I only receive repeated
I0328 14:11:30.578927 1 auto_scaling_groups.go:320] Regenerating instance to ASG map for ASGs: []

I'll keep trying to figure it out - thx

@pkelleratwork (Author) commented Mar 28, 2019

@max-rocket-internet I'm very stuck - out of ideas. The CA is not adding nodes, so pending pods never get scheduled. I noticed that after redeploying the cluster from scratch with autoscaling_enabled = "true", the ASG tag was k8s.io/cluster-autoscaler/disabled. I changed that to k8s.io/cluster-autoscaler/enabled and checked Yes to tag new instances.
Screen Shot 2019-03-28 at 10 05 18 AM

I increased the min number of nodes to 2 and was able to successfully install all apps. I repeated this until the nodes filled up with pods, but the ASG never scaled.

The CA logs show the following:

(⎈ demothis:)➜  projects ✗ kl cluster-autoscaler-aws-cluster-autoscaler-676f48b86-4ltwz -n kube-system
I0328 16:00:32.464407       1 flags.go:52] FLAG: --address=":8085"
I0328 16:00:32.464427       1 flags.go:52] FLAG: --alsologtostderr="false"
I0328 16:00:32.464431       1 flags.go:52] FLAG: --balance-similar-node-groups="false"
I0328 16:00:32.464435       1 flags.go:52] FLAG: --cloud-config=""
I0328 16:00:32.464439       1 flags.go:52] FLAG: --cloud-provider="aws"
I0328 16:00:32.464445       1 flags.go:52] FLAG: --cloud-provider-gce-lb-src-cidrs="130.211.0.0/22,209.85.152.0/22,209.85.204.0/22,35.191.0.0/16"
I0328 16:00:32.464451       1 flags.go:52] FLAG: --cluster-name=""
I0328 16:00:32.464454       1 flags.go:52] FLAG: --cores-total="0:320000"
I0328 16:00:32.464553       1 flags.go:52] FLAG: --estimator="binpacking"
I0328 16:00:32.464560       1 flags.go:52] FLAG: --expander="random"
I0328 16:00:32.464564       1 flags.go:52] FLAG: --expendable-pods-priority-cutoff="-10"
I0328 16:00:32.464569       1 flags.go:52] FLAG: --gke-api-endpoint=""
I0328 16:00:32.464573       1 flags.go:52] FLAG: --gpu-total="[]"
I0328 16:00:32.464577       1 flags.go:52] FLAG: --httptest.serve=""
I0328 16:00:32.464581       1 flags.go:52] FLAG: --ignore-daemonsets-utilization="false"
I0328 16:00:32.464587       1 flags.go:52] FLAG: --ignore-mirror-pods-utilization="false"
I0328 16:00:32.464591       1 flags.go:52] FLAG: --kubeconfig=""
I0328 16:00:32.464595       1 flags.go:52] FLAG: --kubernetes=""
I0328 16:00:32.464607       1 flags.go:52] FLAG: --leader-elect="true"
I0328 16:00:32.464614       1 flags.go:52] FLAG: --leader-elect-lease-duration="15s"
I0328 16:00:32.464620       1 flags.go:52] FLAG: --leader-elect-renew-deadline="10s"
I0328 16:00:32.464625       1 flags.go:52] FLAG: --leader-elect-resource-lock="endpoints"
I0328 16:00:32.464630       1 flags.go:52] FLAG: --leader-elect-retry-period="2s"
I0328 16:00:32.464635       1 flags.go:52] FLAG: --log-backtrace-at=":0"
I0328 16:00:32.464745       1 flags.go:52] FLAG: --log-dir=""
I0328 16:00:32.464753       1 flags.go:52] FLAG: --log-file=""
I0328 16:00:32.464757       1 flags.go:52] FLAG: --logtostderr="true"
I0328 16:00:32.464761       1 flags.go:52] FLAG: --max-autoprovisioned-node-group-count="15"
I0328 16:00:32.464766       1 flags.go:52] FLAG: --max-empty-bulk-delete="10"
I0328 16:00:32.464770       1 flags.go:52] FLAG: --max-failing-time="15m0s"
I0328 16:00:32.464775       1 flags.go:52] FLAG: --max-graceful-termination-sec="600"
I0328 16:00:32.464779       1 flags.go:52] FLAG: --max-inactivity="10m0s"
I0328 16:00:32.464783       1 flags.go:52] FLAG: --max-node-provision-time="15m0s"
I0328 16:00:32.464787       1 flags.go:52] FLAG: --max-nodes-total="0"
I0328 16:00:32.464791       1 flags.go:52] FLAG: --max-total-unready-percentage="45"
I0328 16:00:32.464796       1 flags.go:52] FLAG: --memory-total="0:6400000"
I0328 16:00:32.464800       1 flags.go:52] FLAG: --min-replica-count="0"
I0328 16:00:32.464805       1 flags.go:52] FLAG: --namespace="kube-system"
I0328 16:00:32.464809       1 flags.go:52] FLAG: --new-pod-scale-up-delay="0s"
I0328 16:00:32.464813       1 flags.go:52] FLAG: --node-autoprovisioning-enabled="false"
I0328 16:00:32.464817       1 flags.go:52] FLAG: --node-group-auto-discovery="[asg:tag=k8s.io/cluster-autoscaler/enabled,kubernetes.io/cluster/demo]"
I0328 16:00:32.464832       1 flags.go:52] FLAG: --nodes="[]"
I0328 16:00:32.464836       1 flags.go:52] FLAG: --ok-total-unready-count="3"
I0328 16:00:32.464850       1 flags.go:52] FLAG: --regional="false"
I0328 16:00:32.464855       1 flags.go:52] FLAG: --scale-down-candidates-pool-min-count="50"
I0328 16:00:32.464859       1 flags.go:52] FLAG: --scale-down-candidates-pool-ratio="0.1"
I0328 16:00:32.464864       1 flags.go:52] FLAG: --scale-down-delay-after-add="10m0s"
I0328 16:00:32.464868       1 flags.go:52] FLAG: --scale-down-delay-after-delete="10s"
I0328 16:00:32.464873       1 flags.go:52] FLAG: --scale-down-delay-after-failure="3m0s"
I0328 16:00:32.464877       1 flags.go:52] FLAG: --scale-down-enabled="true"
I0328 16:00:32.464882       1 flags.go:52] FLAG: --scale-down-non-empty-candidates-count="30"
I0328 16:00:32.464886       1 flags.go:52] FLAG: --scale-down-unneeded-time="10m0s"
I0328 16:00:32.464890       1 flags.go:52] FLAG: --scale-down-unready-time="20m0s"
I0328 16:00:32.464895       1 flags.go:52] FLAG: --scale-down-utilization-threshold="0.5"
I0328 16:00:32.464899       1 flags.go:52] FLAG: --scan-interval="10s"
I0328 16:00:32.464904       1 flags.go:52] FLAG: --skip-headers="false"
I0328 16:00:32.464908       1 flags.go:52] FLAG: --skip-nodes-with-local-storage="true"
I0328 16:00:32.464912       1 flags.go:52] FLAG: --skip-nodes-with-system-pods="true"
I0328 16:00:32.464917       1 flags.go:52] FLAG: --stderrthreshold="0"
I0328 16:00:32.464921       1 flags.go:52] FLAG: --test.bench=""
I0328 16:00:32.464925       1 flags.go:52] FLAG: --test.benchmem="false"
I0328 16:00:32.465021       1 flags.go:52] FLAG: --test.benchtime="1s"
I0328 16:00:32.465025       1 flags.go:52] FLAG: --test.blockprofile=""
I0328 16:00:32.465029       1 flags.go:52] FLAG: --test.blockprofilerate="1"
I0328 16:00:32.465033       1 flags.go:52] FLAG: --test.count="1"
I0328 16:00:32.465037       1 flags.go:52] FLAG: --test.coverprofile=""
I0328 16:00:32.465041       1 flags.go:52] FLAG: --test.cpu=""
I0328 16:00:32.465045       1 flags.go:52] FLAG: --test.cpuprofile=""
I0328 16:00:32.465049       1 flags.go:52] FLAG: --test.failfast="false"
I0328 16:00:32.465055       1 flags.go:52] FLAG: --test.list=""
I0328 16:00:32.465058       1 flags.go:52] FLAG: --test.memprofile=""
I0328 16:00:32.465062       1 flags.go:52] FLAG: --test.memprofilerate="0"
I0328 16:00:32.465074       1 flags.go:52] FLAG: --test.mutexprofile=""
I0328 16:00:32.465078       1 flags.go:52] FLAG: --test.mutexprofilefraction="1"
I0328 16:00:32.465083       1 flags.go:52] FLAG: --test.outputdir=""
I0328 16:00:32.465087       1 flags.go:52] FLAG: --test.parallel="2"
I0328 16:00:32.465091       1 flags.go:52] FLAG: --test.run=""
I0328 16:00:32.465095       1 flags.go:52] FLAG: --test.short="false"
I0328 16:00:32.465099       1 flags.go:52] FLAG: --test.testlogfile=""
I0328 16:00:32.465104       1 flags.go:52] FLAG: --test.timeout="0s"
I0328 16:00:32.465108       1 flags.go:52] FLAG: --test.trace=""
I0328 16:00:32.465173       1 flags.go:52] FLAG: --test.v="false"
I0328 16:00:32.465181       1 flags.go:52] FLAG: --unremovable-node-recheck-timeout="5m0s"
I0328 16:00:32.465186       1 flags.go:52] FLAG: --v="4"
I0328 16:00:32.465190       1 flags.go:52] FLAG: --vmodule=""
I0328 16:00:32.465194       1 flags.go:52] FLAG: --write-status-configmap="true"
I0328 16:00:32.465202       1 main.go:333] Cluster Autoscaler 1.13.1
I0328 16:00:32.492416       1 leaderelection.go:205] attempting to acquire leader lease  kube-system/cluster-autoscaler...
I0328 16:00:32.504813       1 leaderelection.go:289] lock is held by cluster-autoscaler-aws-cluster-autoscaler-676f48b86-nwbcc and has not yet expired
I0328 16:00:32.504841       1 leaderelection.go:210] failed to acquire lease kube-system/cluster-autoscaler
I0328 16:00:35.960675       1 leaderelection.go:289] lock is held by cluster-autoscaler-aws-cluster-autoscaler-676f48b86-nwbcc and has not yet expired
I0328 16:00:35.960698       1 leaderelection.go:210] failed to acquire lease kube-system/cluster-autoscaler
I0328 16:00:40.225708       1 leaderelection.go:289] lock is held by cluster-autoscaler-aws-cluster-autoscaler-676f48b86-nwbcc and has not yet expired
I0328 16:00:40.225734       1 leaderelection.go:210] failed to acquire lease kube-system/cluster-autoscaler
I0328 16:00:43.831262       1 leaderelection.go:289] lock is held by cluster-autoscaler-aws-cluster-autoscaler-676f48b86-nwbcc and has not yet expired
I0328 16:00:43.831284       1 leaderelection.go:210] failed to acquire lease kube-system/cluster-autoscaler
I0328 16:00:46.886418       1 leaderelection.go:289] lock is held by cluster-autoscaler-aws-cluster-autoscaler-676f48b86-nwbcc and has not yet expired
I0328 16:00:46.886441       1 leaderelection.go:210] failed to acquire lease kube-system/cluster-autoscaler
I0328 16:00:49.917517       1 leaderelection.go:214] successfully acquired lease kube-system/cluster-autoscaler
I0328 16:00:49.917811       1 factory.go:33] Event(v1.ObjectReference{Kind:"Endpoints", Namespace:"kube-system", Name:"cluster-autoscaler", UID:"883ca038-516c-11e9-8f7a-0eb365398f8c", APIVersion:"v1", ResourceVersion:"8912", FieldPath:""}): type: 'Normal' reason: 'LeaderElection' cluster-autoscaler-aws-cluster-autoscaler-676f48b86-4ltwz became leader
I0328 16:00:49.919829       1 reflector.go:131] Starting reflector *v1.Node (1h0m0s) from k8s.io/autoscaler/cluster-autoscaler/utils/kubernetes/listers.go:239
I0328 16:00:49.919853       1 reflector.go:169] Listing and watching *v1.Node from k8s.io/autoscaler/cluster-autoscaler/utils/kubernetes/listers.go:239
I0328 16:00:49.919862       1 reflector.go:131] Starting reflector *v1beta1.PodDisruptionBudget (1h0m0s) from k8s.io/autoscaler/cluster-autoscaler/utils/kubernetes/listers.go:266
I0328 16:00:49.919870       1 reflector.go:169] Listing and watching *v1beta1.PodDisruptionBudget from k8s.io/autoscaler/cluster-autoscaler/utils/kubernetes/listers.go:266
I0328 16:00:49.919957       1 reflector.go:131] Starting reflector *v1beta1.DaemonSet (1h0m0s) from k8s.io/autoscaler/cluster-autoscaler/utils/kubernetes/listers.go:293
I0328 16:00:49.919964       1 reflector.go:169] Listing and watching *v1beta1.DaemonSet from k8s.io/autoscaler/cluster-autoscaler/utils/kubernetes/listers.go:293
I0328 16:00:49.920027       1 reflector.go:131] Starting reflector *v1.Pod (1h0m0s) from k8s.io/autoscaler/cluster-autoscaler/utils/kubernetes/listers.go:174
I0328 16:00:49.920033       1 reflector.go:169] Listing and watching *v1.Pod from k8s.io/autoscaler/cluster-autoscaler/utils/kubernetes/listers.go:174
I0328 16:00:49.920123       1 reflector.go:131] Starting reflector *v1.Pod (1h0m0s) from k8s.io/autoscaler/cluster-autoscaler/utils/kubernetes/listers.go:149
I0328 16:00:49.920131       1 reflector.go:169] Listing and watching *v1.Pod from k8s.io/autoscaler/cluster-autoscaler/utils/kubernetes/listers.go:149
I0328 16:00:49.920211       1 reflector.go:131] Starting reflector *v1.Node (1h0m0s) from k8s.io/autoscaler/cluster-autoscaler/utils/kubernetes/listers.go:212
I0328 16:00:49.920218       1 reflector.go:169] Listing and watching *v1.Node from k8s.io/autoscaler/cluster-autoscaler/utils/kubernetes/listers.go:212
I0328 16:00:49.945978       1 predicates.go:122] Using predicate PodFitsResources
I0328 16:00:49.946002       1 predicates.go:122] Using predicate GeneralPredicates
I0328 16:00:49.946007       1 predicates.go:122] Using predicate PodToleratesNodeTaints
I0328 16:00:49.946011       1 predicates.go:122] Using predicate ready
I0328 16:00:49.946015       1 predicates.go:122] Using predicate MaxEBSVolumeCount
I0328 16:00:49.946037       1 predicates.go:122] Using predicate NoDiskConflict
I0328 16:00:49.946065       1 predicates.go:122] Using predicate NoVolumeZoneConflict
I0328 16:00:49.946082       1 predicates.go:122] Using predicate MatchInterPodAffinity
I0328 16:00:49.946118       1 predicates.go:122] Using predicate MaxAzureDiskVolumeCount
I0328 16:00:49.946157       1 predicates.go:122] Using predicate MaxCSIVolumeCountPred
I0328 16:00:49.946195       1 predicates.go:122] Using predicate MaxGCEPDVolumeCount
I0328 16:00:49.946222       1 predicates.go:122] Using predicate CheckNodeUnschedulable
I0328 16:00:49.946227       1 predicates.go:122] Using predicate CheckVolumeBinding
I0328 16:00:49.946242       1 cloud_provider_builder.go:29] Building aws cloud provider.
I0328 16:00:49.948777       1 reflector.go:131] Starting reflector *v1.PersistentVolumeClaim (0s) from k8s.io/client-go/informers/factory.go:132
I0328 16:00:49.948797       1 reflector.go:169] Listing and watching *v1.PersistentVolumeClaim from k8s.io/client-go/informers/factory.go:132
I0328 16:00:49.949188       1 reflector.go:131] Starting reflector *v1.Service (0s) from k8s.io/client-go/informers/factory.go:132
I0328 16:00:49.949204       1 reflector.go:169] Listing and watching *v1.Service from k8s.io/client-go/informers/factory.go:132
I0328 16:00:49.949815       1 reflector.go:131] Starting reflector *v1.ReplicationController (0s) from k8s.io/client-go/informers/factory.go:132
I0328 16:00:49.949833       1 reflector.go:169] Listing and watching *v1.ReplicationController from k8s.io/client-go/informers/factory.go:132
I0328 16:00:49.950762       1 reflector.go:131] Starting reflector *v1.Node (0s) from k8s.io/client-go/informers/factory.go:132
I0328 16:00:49.950786       1 reflector.go:169] Listing and watching *v1.Node from k8s.io/client-go/informers/factory.go:132
I0328 16:00:49.951226       1 reflector.go:131] Starting reflector *v1.PersistentVolume (0s) from k8s.io/client-go/informers/factory.go:132
I0328 16:00:49.951241       1 reflector.go:169] Listing and watching *v1.PersistentVolume from k8s.io/client-go/informers/factory.go:132
I0328 16:00:49.951559       1 reflector.go:131] Starting reflector *v1.ReplicaSet (0s) from k8s.io/client-go/informers/factory.go:132
I0328 16:00:49.951579       1 reflector.go:169] Listing and watching *v1.ReplicaSet from k8s.io/client-go/informers/factory.go:132
I0328 16:00:49.952286       1 reflector.go:131] Starting reflector *v1beta1.PodDisruptionBudget (0s) from k8s.io/client-go/informers/factory.go:132
I0328 16:00:49.952303       1 reflector.go:169] Listing and watching *v1beta1.PodDisruptionBudget from k8s.io/client-go/informers/factory.go:132
I0328 16:00:49.957846       1 reflector.go:131] Starting reflector *v1.StatefulSet (0s) from k8s.io/client-go/informers/factory.go:132
I0328 16:00:49.957927       1 reflector.go:169] Listing and watching *v1.StatefulSet from k8s.io/client-go/informers/factory.go:132
I0328 16:00:49.958240       1 reflector.go:131] Starting reflector *v1.Pod (0s) from k8s.io/client-go/informers/factory.go:132
I0328 16:00:49.958335       1 reflector.go:169] Listing and watching *v1.Pod from k8s.io/client-go/informers/factory.go:132
I0328 16:00:49.958620       1 reflector.go:131] Starting reflector *v1.StorageClass (0s) from k8s.io/client-go/informers/factory.go:132
I0328 16:00:49.958694       1 reflector.go:169] Listing and watching *v1.StorageClass from k8s.io/client-go/informers/factory.go:132
I0328 16:00:50.119839       1 request.go:530] Throttling request took 168.496976ms, request: GET:https://10.100.0.1:443/api/v1/persistentvolumes?limit=500&resourceVersion=0
I0328 16:00:50.182236       1 auto_scaling_groups.go:320] Regenerating instance to ASG map for ASGs: []
I0328 16:00:50.182394       1 aws_manager.go:152] Refreshed ASG list, next refresh after 2019-03-28 16:01:00.182387611 +0000 UTC m=+27.740626393
I0328 16:00:50.182812       1 main.go:252] Registered cleanup signal handler
I0328 16:00:50.319811       1 request.go:530] Throttling request took 361.319819ms, request: GET:https://10.100.0.1:443/api/v1/pods?limit=500&resourceVersion=0
I0328 16:01:00.200777       1 static_autoscaler.go:128] Starting main loop
I0328 16:01:00.398826       1 auto_scaling_groups.go:320] Regenerating instance to ASG map for ASGs: []
I0328 16:01:00.398849       1 aws_manager.go:152] Refreshed ASG list, next refresh after 2019-03-28 16:01:10.398846028 +0000 UTC m=+37.957084784
I0328 16:01:00.398948       1 utils.go:526] No pod using affinity / antiaffinity found in cluster, disabling affinity predicate for this loop
I0328 16:01:00.398957       1 static_autoscaler.go:261] Filtering out schedulables
I0328 16:01:00.399023       1 static_autoscaler.go:271] No schedulable pods
I0328 16:01:00.399031       1 static_autoscaler.go:279] No unschedulable pods
I0328 16:01:00.399041       1 static_autoscaler.go:333] Calculating unneeded nodes
I0328 16:01:00.399056       1 utils.go:474] Skipping ip-192-168-211-186.ec2.internal - no node group config
I0328 16:01:00.399065       1 utils.go:474] Skipping ip-192-168-74-229.ec2.internal - no node group config
I0328 16:01:00.399157       1 static_autoscaler.go:360] Scale down status: unneededOnly=true lastScaleUpTime=2019-03-28 16:00:50.182644653 +0000 UTC m=+17.740883396 lastScaleDownDeleteTime=2019-03-28 16:00:50.182644743 +0000 UTC m=+17.740883492 lastScaleDownFailTime=2019-03-28 16:00:50.182644847 +0000 UTC m=+17.740883587 scaleDownForbidden=false isDeleteInProgress=false
I0328 16:11:01.100587       1 static_autoscaler.go:370] Starting scale down
I0328 16:11:01.100612       1 scale_down.go:659] No candidates for scale down
I0328 16:11:01.100745       1 factory.go:33] Event(v1.ObjectReference{Kind:"Pod", Namespace:"default", Name:"oak-api-664d8fcb9f-cpm6x", UID:"f543956c-5173-11e9-b66f-027f9ed717b8", APIVersion:"v1", ResourceVersion:"10117", FieldPath:""}): type: 'Normal' reason: 'NotTriggerScaleUp' pod didn't trigger scale-up (it wouldn't fit if a new node is added):
I0328 16:11:01.100767       1 factory.go:33] Event(v1.ObjectReference{Kind:"Pod", Namespace:"default", Name:"oak-api-664d8fcb9f-77llx", UID:"f54463dc-5173-11e9-b66f-027f9ed717b8", APIVersion:"v1", ResourceVersion:"10122", FieldPath:""}): type: 'Normal' reason: 'NotTriggerScaleUp' pod didn't trigger scale-up (it wouldn't fit if a new node is added):
I0328 16:11:11.112297       1 static_autoscaler.go:128] Starting main loop
I0328 16:11:11.273326       1 auto_scaling_groups.go:320] Regenerating instance to ASG map for ASGs: []
I0328 16:11:11.273350       1 aws_manager.go:152] Refreshed ASG list, next refresh after 2019-03-28 16:11:21.273346679 +0000 UTC m=+648.831585439
I0328 16:11:11.273444       1 utils.go:526] No pod using affinity / antiaffinity found in cluster, disabling affinity predicate for this loop
I0328 16:11:11.273485       1 static_autoscaler.go:261] Filtering out schedulables
I0328 16:11:11.273870       1 static_autoscaler.go:271] No schedulable pods
I0328 16:11:11.273940       1 scale_up.go:262] Pod default/oak-api-664d8fcb9f-cpm6x is unschedulable
I0328 16:11:11.274176       1 scale_up.go:262] Pod default/oak-api-664d8fcb9f-77llx is unschedulable
I0328 16:11:11.274261       1 scale_up.go:304] Upcoming 0 nodes
I0328 16:11:11.274297       1 scale_up.go:420] No expansion options
I0328 16:11:11.274357       1 static_autoscaler.go:333] Calculating unneeded nodes
I0328 16:11:11.274379       1 utils.go:474] Skipping ip-192-168-211-186.ec2.internal - no node group config
I0328 16:11:11.274415       1 utils.go:474] Skipping ip-192-168-74-229.ec2.internal - no node group config
I0328 16:11:11.274539       1 static_autoscaler.go:360] Scale down status: unneededOnly=false lastScaleUpTime=2019-03-28 16:00:50.182644653 +0000 UTC m=+17.740883396 lastScaleDownDeleteTime=2019-03-28 16:00:50.182644743 +0000 UTC m=+17.740883492 lastScaleDownFailTime=2019-03-28 16:00:50.182644847 +0000 UTC m=+17.740883587 scaleDownForbidden=false isDeleteInProgress=false
I0328 16:11:11.274577       1 static_autoscaler.go:370] Starting scale down
I0328 16:11:11.274617       1 scale_down.go:659] No candidates for scale down
I0328 16:11:11.275259       1 factory.go:33] Event(v1.ObjectReference{Kind:"Pod", Namespace:"default", Name:"oak-api-664d8fcb9f-77llx", UID:"f54463dc-5173-11e9-b66f-027f9ed717b8", APIVersion:"v1", ResourceVersion:"10122", FieldPath:""}): type: 'Normal' reason: 'NotTriggerScaleUp' pod didn't trigger scale-up (it wouldn't fit if a new node is added):
I0328 16:11:11.275319       1 factory.go:33] Event(v1.ObjectReference{Kind:"Pod", Namespace:"default", Name:"oak-api-664d8fcb9f-cpm6x", UID:"f543956c-5173-11e9-b66f-027f9ed717b8", APIVersion:"v1", ResourceVersion:"10117", FieldPath:""}): type: 'Normal' reason: 'NotTriggerScaleUp' pod didn't trigger scale-up (it wouldn't fit if a new node is added):
(⎈ demothis:)➜  projects ✗

@pkelleratwork (Author)

GOT IT! I think I found a bug!! The tag is created wrong in the ASG:

  1. I ran my cluster module (above)
  2. I manually changed the ASG tag from k8s.io/cluster-autoscaler/disabled to k8s.io/cluster-autoscaler/enabled and checked Yes to tag new instances.
    Screen Shot 2019-03-28 at 2 56 42 PM
  3. ran the cluster-autoscaler Helm install
  4. verified in the logs that the CA Registered the ASG!!!

@max-rocket-internet (Contributor)

OK cool, but how did your tag get set like that? I am using module version v2.3.0 and this is the worker group config:

  worker_groups = [
    {
      instance_type       = "m4.xlarge"
      asg_max_size        = 40
      autoscaling_enabled = true
      additional_userdata = "${xxxxx}"
      kubelet_extra_args  = "--node-labels=xxx=xxxx"
    },
  ]

And the tag on the ASG is k8s.io/cluster-autoscaler/enabled=true

@pkelleratwork (Author)

I'm using terraform-aws-modules/eks/aws v2.2.1 - didn't realize there was a newer version. I'll try 2.3.1 now.

@pkelleratwork (Author) commented Apr 3, 2019

v2.3.1 is not working either. Here is the module I ran, the plan output, and an AWS console screenshot.

module

module "create-cluster" {
  source  = "terraform-aws-modules/eks/aws"
  version = "2.3.1"

  cluster_name              = "demothis"
  cluster_version           = "1.11"
  kubeconfig_name           = "demothis"
  manage_aws_auth           = "false"
  subnets                   = "[x,yz]"
  vpc_id                    = "vpc-123"

  # worker node configurations
  worker_groups = [
    {
      asg_desired_capacity  = "2"
      asg_max_size          = "50"
      autoscaling_enabled   = "true"
      instance_type         = "t3.medium"
    }
  ]

  # tags to add to all resources
  tags = {
    cluster                 = "demothis"
    environment             = "dev"
  }
}

auto-scaling group plan output - tags.2.key is wrong

module.demothis-cluster.module.create-cluster.aws_autoscaling_group.workers: Creating...
  arn:                            "" => "<computed>"
  default_cooldown:               "" => "<computed>"
  desired_capacity:               "" => "2"
  force_delete:                   "" => "false"
  health_check_grace_period:      "" => "300"
  health_check_type:              "" => "<computed>"
  launch_configuration:           "" => "demothis-02019040319275202730000000d"
  load_balancers.#:               "" => "<computed>"
  max_size:                       "" => "50"
  metrics_granularity:            "" => "1Minute"
  min_size:                       "" => "1"
  name:                           "" => "<computed>"
  name_prefix:                    "" => "demothis-0"
  protect_from_scale_in:          "" => "false"
  service_linked_role_arn:        "" => "<computed>"
  tags.#:                         "" => "7"
  tags.0.%:                       "" => "3"
  tags.0.key:                     "" => "Name"
  tags.0.propagate_at_launch:     "" => "1"
  tags.0.value:                   "" => "demothis-0-eks_asg"
  tags.1.%:                       "" => "3"
  tags.1.key:                     "" => "kubernetes.io/cluster/demothis"
  tags.1.propagate_at_launch:     "" => "1"
  tags.1.value:                   "" => "owned"
  tags.2.%:                       "" => "3"
  tags.2.key:                     "" => "k8s.io/cluster-autoscaler/disabled"
  tags.2.propagate_at_launch:     "" => "0"
  tags.2.value:                   "" => "true"
  tags.3.%:                       "" => "3"
  tags.3.key:                     "" => "k8s.io/cluster-autoscaler/demothis"
  tags.3.propagate_at_launch:     "" => "0"
  tags.3.value:                   "" => ""
  tags.4.%:                       "" => "3"
  tags.4.key:                     "" => "k8s.io/cluster-autoscaler/node-template/resources/ephemeral-storage"
  tags.4.propagate_at_launch:     "" => "0"
  tags.4.value:                   "" => "100Gi"
  tags.5.%:                       "" => "3"
  tags.5.key:                     "" => "cluster"
  tags.5.propagate_at_launch:     "" => "1"
  tags.5.value:                   "" => "demothis"
  tags.6.%:                       "" => "3"
  tags.6.key:                     "" => "environment"
  tags.6.propagate_at_launch:     "" => "1"
  tags.6.value:                   "" => "demo"
  vpc_zone_identifier.#:          "" => "3"
  vpc_zone_identifier.1666119713: "" => "subnet-x"
  vpc_zone_identifier.3712104169: "" => "subnet-y"
  vpc_zone_identifier.3798380006: "" => "subnet-z"
  wait_for_capacity_timeout:      "" => "10m"
module.demothis-cluster.create-cluster.aws_autoscaling_group.workers: Still creating... (10s elapsed)
module.demothis-cluster.create-cluster.aws_autoscaling_group.workers: Still creating... (20s elapsed)
module.demothis-cluster.create-cluster.aws_autoscaling_group.workers: Still creating... (30s elapsed)
module.demothis-cluster.create-cluster.aws_autoscaling_group.workers: Still creating... (40s elapsed)
module.demothis-cluster.module.create-cluster.aws_autoscaling_group.workers: Creation complete after 50s (ID: demothis-02019040319280139330000000e)

AWS console showing the tag as disabled
Screen Shot 2019-04-03 at 2 37 56 PM

@dpiddockcmp (Contributor)

You are passing the string "true"; try passing the boolean value true. The problem is in the interpolation the module uses, which checks against an integer:
${lookup(var.worker_groups[count.index], "autoscaling_enabled", local.workers_group_defaults["autoscaling_enabled"]) == 1 ? "enabled" : "disabled" }
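
As far as I can tell, Terraform 0.11 interpolates a bare boolean as "1", so it satisfies the == 1 comparison and yields "enabled", while the literal string "true" does not match and falls through to "disabled". A minimal sketch of the corrected worker group entry (values copied from the module call above, only the boolean changed):

  worker_groups = [
    {
      asg_desired_capacity = "2"
      asg_max_size         = "50"
      autoscaling_enabled  = true        # boolean, not the string "true"
      instance_type        = "t3.medium"
    }
  ]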

@pkelleratwork (Author)

Siiiigh... ugh.

Thanks @dpiddockcmp - that was my problem. I failed to catch that. I went back to the documentation and it clearly states that.

I really appreciate all your help.

@max-rocket-internet (Contributor)

We've all made these mistakes. Glad you got it sorted 🙂

@github-actions bot commented Dec 1, 2022

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues. If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

github-actions bot locked as resolved and limited conversation to collaborators Dec 1, 2022