Error when creating nodegroup with --node-volume-type set to io1 #1006

Closed
martilea opened this issue Jul 9, 2019 · 5 comments · Fixed by #1016

Comments

@martilea
commented Jul 9, 2019

What happened?
I was trying to create an EKS cluster with a node group using EBS volume type set to io1, but I got the following error message:

...
[✖]  AWS::AutoScaling::AutoScalingGroup/NodeGroup: CREATE_FAILED – "You must use a valid fully-formed launch template. The parameter iops must be specified for io1 volumes. (Service: AmazonAutoScaling; Status Code: 400; Error Code: ValidationError; Request ID: 328705c4-a225-11e9-b5e3-8797aee52925)"
[ℹ]  1 error(s) occurred and cluster hasn't been created properly, you may wish to check CloudFormation console
...

What you expected to happen?
To create an EKS cluster with a node group using EBS Provisioned IOPS SSD (io1).

How to reproduce it?
Try to create an EKS cluster and one nodegroup by specifying the parameter "--node-volume-type=io1"

eksctl create cluster --node-volume-size=50 --node-volume-type=io1 --nodes 1 --region eu-west-1
...
[ℹ]  fetching stack events in attempt to troubleshoot the root cause of the failure
[✖]  AWS::AutoScaling::AutoScalingGroup/NodeGroup: CREATE_FAILED – "You must use a valid fully-formed launch template. The parameter iops must be specified for io1 volumes. (Service: AmazonAutoScaling; Status Code: 400; Error Code: ValidationError; Request ID: 328705c4-a225-11e9-b5e3-8797aee52925)"
[ℹ]  1 error(s) occurred and cluster hasn't been created properly, you may wish to check CloudFormation console
...

As you can see, the "Iops" parameter is missing when the volume type is set to io1. Checking the CloudFormation template generated by eksctl, I was able to confirm the following configuration for the NodeGroupLaunchTemplate resource:

...
"NodeGroupLaunchTemplate": {
            "Type": "AWS::EC2::LaunchTemplate",
            "Properties": {
                "LaunchTemplateData": {
                    "BlockDeviceMappings": [
                        {
                            "DeviceName": "/dev/xvda",
                            "Ebs": {
                                "Encrypted": false,
                                "VolumeSize": 50,
                                "VolumeType": "io1"
                            }
                        }
                    ],
...

In short, eksctl does not set a value for the "Iops" property when VolumeType is "io1". According to the CloudFormation documentation, you must specify a value for "Iops" when creating io1 volumes.
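
For illustration, here is a minimal Python sketch of the block device mapping the template would need. The helper name ebs_block_device and the value 1000 are assumptions made for this example; this is not eksctl code. It mirrors the fragment above and adds "Iops" only when the volume type is io1:

```python
import json
from typing import Optional


def ebs_block_device(volume_size: int, volume_type: str,
                     iops: Optional[int] = None) -> dict:
    """Build one BlockDeviceMappings entry (hypothetical helper, not eksctl code)."""
    ebs = {"Encrypted": False, "VolumeSize": volume_size, "VolumeType": volume_type}
    if volume_type == "io1":
        # The failure reported above: io1 volumes need an explicit Iops value.
        if iops is None:
            raise ValueError("iops must be specified for io1 volumes")
        ebs["Iops"] = iops
    return {"DeviceName": "/dev/xvda", "Ebs": ebs}


if __name__ == "__main__":
    # 1000 is an illustrative value, not a recommendation.
    print(json.dumps(ebs_block_device(50, "io1", iops=1000), indent=4))
```

Calling the helper with "io1" and no iops raises an error up front, instead of failing later in CloudFormation.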

Anything else we need to know?
Checking the code of eksctl, I wasn't able to find where the "Iops" parameter is specified, so it seems to be a bug when using io1 volumes.

Versions
Please paste in the output of these commands:

$ eksctl version
[ℹ]  version.Info{BuiltAt:"", GitCommit:"", GitTag:"0.1.38"}

$ kubectl version
Client Version: version.Info{Major:"1", Minor:"12", GitVersion:"v1.12.7", GitCommit:"6f482974b76db3f1e0f5d24605a9d1d38fad9a2b", GitTreeState:"clean", BuildDate:"2019-03-27T15:15:05Z", GoVersion:"go1.10.8", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"12+", GitVersion:"v1.12.6-eks-d69f1b", GitCommit:"d69f1bf3669bf00b7f4a758e978e0e7a1e3a68f7", GitTreeState:"clean", BuildDate:"2019-02-28T20:26:10Z", GoVersion:"go1.10.8", Compiler:"gc", Platform:"linux/amd64"}

Logs

eksctl create cluster --node-volume-size 50 --node-volume-type io1 --nodes 1 --region eu-west-1
[ℹ]  using region eu-west-1
[ℹ]  setting availability zones to [eu-west-1b eu-west-1c eu-west-1a]
[ℹ]  subnets for eu-west-1b - public:192.168.0.0/19 private:192.168.96.0/19
[ℹ]  subnets for eu-west-1c - public:192.168.32.0/19 private:192.168.128.0/19
[ℹ]  subnets for eu-west-1a - public:192.168.64.0/19 private:192.168.160.0/19
[ℹ]  nodegroup "ng-860a3cf2" will use "ami-091fc251b67b776c3" [AmazonLinux2/1.12]
[ℹ]  creating EKS cluster "scrumptious-sheepdog-1562660811" in "eu-west-1" region
[ℹ]  will create 2 separate CloudFormation stacks for cluster itself and the initial nodegroup
[ℹ]  if you encounter any issues, check CloudFormation console or try 'eksctl utils describe-stacks --region=eu-west-1 --name=scrumptious-sheepdog-1562660811'
[ℹ]  2 sequential tasks: { create cluster control plane "scrumptious-sheepdog-1562660811", create nodegroup "ng-860a3cf2" }
[ℹ]  building cluster stack "eksctl-scrumptious-sheepdog-1562660811-cluster"
[ℹ]  deploying stack "eksctl-scrumptious-sheepdog-1562660811-cluster"
[ℹ]  building nodegroup stack "eksctl-scrumptious-sheepdog-1562660811-nodegroup-ng-860a3cf2"
[ℹ]  --nodes-min=1 was set automatically for nodegroup ng-860a3cf2
[ℹ]  --nodes-max=1 was set automatically for nodegroup ng-860a3cf2
[ℹ]  deploying stack "eksctl-scrumptious-sheepdog-1562660811-nodegroup-ng-860a3cf2"
[✖]  unexpected status "ROLLBACK_IN_PROGRESS" while waiting for CloudFormation stack "eksctl-scrumptious-sheepdog-1562660811-nodegroup-ng-860a3cf2"
[ℹ]  fetching stack events in attempt to troubleshoot the root cause of the failure
[✖]  AWS::AutoScaling::AutoScalingGroup/NodeGroup: CREATE_FAILED – "You must use a valid fully-formed launch template. The parameter iops must be specified for io1 volumes. (Service: AmazonAutoScaling; Status Code: 400; Error Code: ValidationError; Request ID: 328705c4-a225-11e9-b5e3-8797aee52925)"
[ℹ]  1 error(s) occurred and cluster hasn't been created properly, you may wish to check CloudFormation console
[ℹ]  to cleanup resources, run 'eksctl delete cluster --region=eu-west-1 --name=scrumptious-sheepdog-1562660811'
[✖]  waiting for CloudFormation stack "eksctl-scrumptious-sheepdog-1562660811-nodegroup-ng-860a3cf2" to reach "CREATE_COMPLETE" status: ResourceNotReady: failed waiting for successful resource state
[✖]  failed to create cluster "scrumptious-sheepdog-1562660811"

martilea added the kind/bug label Jul 9, 2019

@cPu1

Contributor

commented Jul 9, 2019

Nice find, thanks for creating an issue.

The node group schema doesn't have a field for specifying IOPS, and a sensible default can't be chosen here: if you're choosing an io1 volume type, you probably know what IOPS you want provisioned.

We should add a VolumeIOPS field to the NodeGroup schema to allow configuring IOPS for io1 volumes; it would be ignored for other volume types.
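
A rough sketch of that proposal in Python. The class and field names below are illustrative assumptions and do not reflect the actual eksctl schema; the point is only that the IOPS setting takes effect for io1 and is ignored otherwise:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class NodeGroup:
    """Illustrative subset of a node group schema; field names are assumptions."""
    volume_size: int
    volume_type: str
    volume_iops: Optional[int] = None  # proposed field, only meaningful for io1

    def effective_iops(self) -> Optional[int]:
        # Ignore the IOPS setting for every volume type other than io1.
        return self.volume_iops if self.volume_type == "io1" else None
```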

@errordeveloper

Member

commented Jul 9, 2019

@cPu1 this sounds good to me, let's fix this for 0.2.0! It makes sense to include, as this is a "feature" that we've supported for a long time, but clearly we've never tested it properly.

errordeveloper added this to the 0.2.0 milestone Jul 9, 2019

@errordeveloper

Member

commented Jul 9, 2019

I think we should only add a config field and avoid adding a flag that will depend on another flag. So when a user sets --node-volume-type=io1, we will need to tell them to use a config file.

We should also make sure we document all of the volume* fields properly in a new section in the docs.

With regard to the default value, I think it'd be fine to pick the maximum, as long as we document it.

@cPu1

Contributor

commented Jul 9, 2019

> I think we should only add a config field and avoid adding a flag that will depend on another flag. So when a user sets --node-volume-type=io1, we will need to tell them to use a config file.

Agreed, since we have been trying to minimise the use of flags.

> With regard to the default value, I think it'd be fine to pick the maximum, as long as we document it.

I'm not sure I agree with using the maximum value, as that would cost more, and the maximum also depends on the instance type, as documented here: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/EBSVolumeTypes.html#EBSVolumeTypes_piops

> You can provision from 100 IOPS up to 64,000 IOPS per volume on Nitro-based instances and up to 32,000 on other instances.

This means we'd have to set a default value based on the instance type and maintain a list of such instances that'd get updated over time.

I think we should make that field required and not set a default value.
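
As a sketch of what a required field with no default could look like, the Python check below rejects a missing value and applies the limits quoted above. The function name and the nitro flag are assumptions for this example; the caller supplies nitro, so no list of instance types has to be maintained here:

```python
from typing import Optional

# Limits taken from the AWS documentation excerpt quoted above.
MIN_IOPS = 100
MAX_IOPS_NITRO = 64_000
MAX_IOPS_OTHER = 32_000


def validate_io1_iops(iops: Optional[int], nitro: bool) -> int:
    """Return iops if it is valid for an io1 volume, otherwise raise ValueError."""
    if iops is None:
        # No default is chosen: the user has to state the IOPS they want.
        raise ValueError("volume IOPS is required when the volume type is io1")
    upper = MAX_IOPS_NITRO if nitro else MAX_IOPS_OTHER
    if not MIN_IOPS <= iops <= upper:
        raise ValueError(
            f"volume IOPS must be between {MIN_IOPS} and {upper}, got {iops}"
        )
    return iops
```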

cPu1 self-assigned this Jul 9, 2019

@errordeveloper

Member

commented Jul 9, 2019

> I think we should make that field required and not set a default value.

Sure, let's do that. I didn't realise it depends on the instance type.
