Cluster autoscaler cannot scale mixed nodegroup on AWS #1012

Closed
alexei-led opened this issue Jul 10, 2019 · 2 comments

@alexei-led
Contributor

commented Jul 10, 2019

What happened?
I created a nodegroup with mixed spot instances. I tried both setting instanceType to mixed and leaving it empty.
In both cases the cluster autoscaler reports the following error:

...
Unable to build proper template node for eksctl-moon-kube-nodegroup-gpu-spot-ng-a-NodeGroup-17OCXU2R4F8ES: unable to find instance type within launch template
...

Looking at the cluster autoscaler code, I see that it requires a non-empty, valid instanceType in the LaunchTemplate and fails if one is not specified (see the error above).

Cluster autoscaler code
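
A minimal sketch of the behaviour described above, written in Python rather than the autoscaler's Go. It is not the cluster autoscaler's actual code: the function name `template_instance_type`, the `TemplateError` class, and the one-field dictionary standing in for the launch template data are all assumptions made for illustration. It only checks that the field is non-empty; it does not check that the value is a real EC2 instance type.

```python
class TemplateError(Exception):
    """Raised when no instance type can be determined for a node group."""


def template_instance_type(launch_template_data: dict) -> str:
    """Return the instance type from simplified launch template data.

    Hypothetical model: the dict mirrors only the InstanceType field of
    an EC2 launch template; a real launch template has many more fields.
    """
    # A missing key, None, or whitespace-only string all count as "empty".
    instance_type = (launch_template_data.get("InstanceType") or "").strip()
    if not instance_type:
        # Same wording as the error quoted in this issue.
        raise TemplateError("unable to find instance type within launch template")
    return instance_type


# A launch template with an explicit type resolves normally.
print(template_instance_type({"InstanceType": "r5.2xlarge"}))

# A launch template without a type (the mixed nodegroup case) fails.
try:
    template_instance_type({})
except TemplateError as err:
    print(f"error: {err}")
```

Under this model, a mixed nodegroup whose launch template carries no instance type gives the autoscaler nothing to build a template node from, which matches the error in the report.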

What you expected to happen?

I expect a mixed nodegroup created with eksctl to work with the cluster autoscaler.

How to reproduce it?
Create a mixed nodegroup (NG) with spot instances, then deploy a workload that forces the cluster autoscaler (CA) to scale up. The CA then reports the error above.

Anything else we need to know?
macOS
EKS cluster
Cluster autoscaler k8s.gcr.io/cluster-autoscaler:v1.15.0

Versions
Please paste in the output of these commands:

$ eksctl version
version.Info{BuiltAt:"", GitCommit:"", GitTag:"0.1.39"}

$ kubectl version
Client Version: version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.0", GitCommit:"e8462b5b5dc2584fdcd18e6bcfe9f1e4d970a529", GitTreeState:"clean", BuildDate:"2019-06-20T04:49:16Z", GoVersion:"go1.12.6", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"13+", GitVersion:"v1.13.7-eks-c57ff8", GitCommit:"c57ff8e35590932c652433fab07988da79265d5b", GitTreeState:"clean", BuildDate:"2019-06-07T20:43:03Z", GoVersion:"go1.11.5", Compiler:"gc", Platform:"linux/amd64"}

Logs
Include the output of the command line when running eksctl. If possible, eksctl should be run with debug logs. For example:
eksctl get clusters -v 4
Make sure you redact any sensitive information before posting.
If the output is long, please consider a Gist.

@alexei-led alexei-led added the kind/bug label Jul 10, 2019

alexei-led added a commit to alexei-led/eksctl that referenced this issue Jul 10, 2019

@errordeveloper errordeveloper added this to the 0.2.0 milestone Jul 11, 2019

@scottyhq


commented Jul 15, 2019

This new setup is a very welcome feature! Just want to add that we ran into a similar error trying to use autoscaling from 0 nodes with this new setup (eksctl 1.40, kubernetes 1.13, autoscaler 1.13.5):

W0714 03:27:33.258081       1 aws_manager.go:194] Found multiple availability zones for ASG "eksctl-pangeo-esip-nodegroup-dask-worker-NodeGroup-6UG2LS4KNTPH"; using us-west-2a
E0714 03:27:33.258106       1 utils.go:291] Unable to build proper template node for eksctl-pangeo-esip-nodegroup-dask-worker-NodeGroup-6UG2LS4KNTPH: Unable to get instance type from launch config or launch template

node-config:

  - name: dask-worker
    minSize: 0
    maxSize: 100
    instancesDistribution:
      instanceTypes: ["r5.2xlarge", "r5a.2xlarge", "r4.2xlarge"]
      onDemandBaseCapacity: 0
      onDemandPercentageAboveBaseCapacity: 10
      spotInstancePools: 3
    volumeSize: 100
    volumeType: gp2
    labels:
      node-role.kubernetes.io/role: dask-worker
      k8s.dask.org/node-purpose: worker
    taints:
      k8s.dask.org/dedicated: 'worker:NoSchedule'
    desiredCapacity: 0
    ami: auto
    amiFamily: AmazonLinux2
    ssh:
      publicKeyPath: eks-pangeo-esip-us-west-2.pub
    iam:
      withAddonPolicies:
          autoScaler: true
          efs: true

It might be worth adding a note about use with the autoscaler to the documentation: https://eksctl.io/usage/spot-instances/.

It seems like this should work (https://github.com/kubernetes/autoscaler/tree/master/cluster-autoscaler/cloudprovider/aws#using-autoscalinggroup-mixedinstancespolicy), but there also still seem to be many open issues (for example kubernetes/autoscaler#1754 and aws/containers-roadmap#144, among others).
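
For illustration, here is a hedged Python sketch of one way a single concrete instance type could be chosen for the launch template of a nodegroup like the one above, so that the autoscaler has something to build a template node from when the group is at 0 nodes. The helper name `launch_template_instance_type` and the rule "take the first entry of `instanceTypes`" are assumptions, not a description of what eksctl or the autoscaler actually do.

```python
def launch_template_instance_type(nodegroup: dict) -> str:
    """Pick one concrete instance type for a nodegroup's launch template.

    Hypothetical rule (an assumption, not eksctl's confirmed behaviour):
    use an explicit instanceType unless it is empty or the placeholder
    "mixed"; otherwise fall back to the first entry in
    instancesDistribution.instanceTypes.
    """
    explicit = nodegroup.get("instanceType")
    if explicit and explicit != "mixed":
        return explicit

    distribution = nodegroup.get("instancesDistribution") or {}
    instance_types = distribution.get("instanceTypes") or []
    if not instance_types:
        raise ValueError(
            f"nodegroup {nodegroup.get('name')!r} has no usable instance type"
        )
    return instance_types[0]


# The relevant part of the dask-worker nodegroup from the config above.
dask_worker = {
    "name": "dask-worker",
    "minSize": 0,
    "maxSize": 100,
    "instancesDistribution": {
        "instanceTypes": ["r5.2xlarge", "r5a.2xlarge", "r4.2xlarge"],
    },
}

print(launch_template_instance_type(dask_worker))  # prints r5.2xlarge
```

The three types in this config have the same vCPU count and memory size, so any of them would give a similar template node; with more varied types, the choice of fallback would matter more.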

errordeveloper added a commit that referenced this issue Jul 17, 2019
Merge pull request #1013 from alexei-led/master
fixing #1012, cluster autoscaler requires a valid LT instance type
@errordeveloper

Member

commented Jul 18, 2019

Closed via #1013.
