Fail to create new cluster with service role error #2182

Closed
s-tokutake opened this issue May 14, 2020 · 17 comments
@s-tokutake

s-tokutake commented May 14, 2020

What happened?

Creating a new cluster fails.

The error message is: Role with arn: arn:aws:iam::xxxxxxxxxxxxx:role/eksctl-prd-cluster-ServiceRole-xxxxxxxxx, could not be assumed because it does not exist or the trusted entity is not correct (Service: AmazonEKS; Status Code: 400; Error Code: InvalidParameterException; Request ID: fc9b7780-6fb4-4620-8943-b523bxxxxxxx)

What you expected to happen?

To create cluster successfully.

How to reproduce it?

Run eksctl create cluster -f cluster.yaml

The contents of cluster.yaml are below.

apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig

metadata:
  name: prd-cluster
  region: ap-northeast-1
  version: "1.16"

vpc:
  id: "vpc-xxxxxxx"
  cidr: "10.30.0.0/16"
  subnets:
    private:
      ap-northeast-1a:
        id: "subnet-xxxxxxx"
        cidr: "10.30.80.0/22"

      ap-northeast-1c:
        id: "subnet-yyyyyyy"
        cidr: "10.30.84.0/22"

      ap-northeast-1d:
        id: "subnet-zzzzzzz"
        cidr: "10.30.88.0/22"

nodeGroups:
  - name: ng
    labels: {role: workers}
    tags: {Stack: production, Site: ikyucom, Role: eks-node, k8s.io/cluster-autoscaler/prd-cluster: owned, k8s.io/cluster-autoscaler/enabled: "true"}
    instanceType: c5.large
    desiredCapacity: 4
    maxSize: 8
    privateNetworking: true
    securityGroups:
      attachIDs: [sg-xxxxxx,sg-zzzzzzz]
      withShared: true
    ssh:
      allow: true
      publicKeyPath: publickey
    iam:
      instanceProfileARN: "arn:aws:iam::xxxxxxxx:instance-profile/eks-node-instance-role"
      instanceRoleARN: "arn:aws:iam::xxxxxxxx:role/eks-node-instance-role"
    kubeletExtraConfig:
      kubeReserved:
        cpu: "250m"
        memory: "250Mi"
        ephemeral-storage: "1Gi"
      systemReserved:
        cpu: "250m"
        memory: "250Mi"
        ephemeral-storage: "1Gi"

Anything else we need to know?

  • This error occurs with eksctl versions 0.18 and 0.19.
  • With eksctl version 0.17, the same command with the same YAML succeeds.
  • As far as I can see from the debug log (--verbose 5), the ServiceRole is created successfully, but creating the ControlPlane with that ServiceRole fails.
<member>
        <EventId>ControlPlane-CREATE_FAILED-2020-05-14T01:40:14.585Z</EventId>
        <PhysicalResourceId/>
        <ResourceStatus>CREATE_FAILED</ResourceStatus>
        <ResourceStatusReason>Role with arn: arn:aws:iam::xxxxxxxx:role/eksctl-prd-cluster-ServiceRole-xxxxxxxx, could not be assumed because it does not exist or the trusted entity is not correct (Service: AmazonEKS; Status Code: 400; Error Code: InvalidParameterException; Request ID: 8bf4847f-169b-4a94-9e5d-24eff9xxx)</ResourceStatusReason>
        <ResourceProperties>{&quot;Version&quot;:&quot;1.16&quot;,&quot;ResourcesVpcConfig&quot;:{&quot;SecurityGroupIds&quot;:[&quot;sg-xxxxxxxx&quot;],&quot;SubnetIds&quot;:[&quot;subnet-xxxxxxxx&quot;,&quot;subnet-yyyyyyyy&quot;,&quot;subnet-zzzzzzzz&quot;]},&quot;RoleArn&quot;:&quot;arn:aws:iam::xxxxxxxx:role/eksctl-prd-cluster-ServiceRole-xxxxxxxx&quot;,&quot;Name&quot;:&quot;prd-cluster&quot;}</ResourceProperties>
        <StackId>arn:aws:cloudformation:**************:xxxxxxxx:stack/eksctl-prd-cluster/b67b74e0-9583-11ea-86f6-0e42406983d0</StackId>
        <StackName>eksctl-prd-cluster</StackName>
        <LogicalResourceId>ControlPlane</LogicalResourceId>
        <Timestamp>2020-05-14T01:40:14.585Z</Timestamp>
        <ResourceType>AWS::EKS::Cluster</ResourceType>
      </member>

....

<member>
        <EventId>ServiceRole-CREATE_COMPLETE-2020-05-14T01:40:07.002Z</EventId>
        <PhysicalResourceId>eksctl-prd-cluster-ServiceRole-xxxxxxxx</PhysicalResourceId>
        <ResourceStatus>CREATE_COMPLETE</ResourceStatus>
        <ResourceProperties>{&quot;ManagedPolicyArns&quot;:[&quot;arn:aws:iam::aws:policy/AmazonEKSClusterPolicy&quot;],&quot;AssumeRolePolicyDocument&quot;:{&quot;Version&quot;:&quot;2012-10-17&quot;,&quot;Statement&quot;:[{&quot;Action&quot;:[&quot;sts:AssumeRole&quot;],&quot;Effect&quot;:&quot;Allow&quot;,&quot;Principal&quot;:{&quot;Service&quot;:[&quot;eks.amazonaws.com&quot;,&quot;eks-fargate-pods.amazonaws.com&quot;]}}]}}</ResourceProperties>
        <StackId>arn:aws:cloudformation:**************:xxxxxxxx:stack/eksctl-prd-cluster/b67b74e0-9583-11ea-86f6-0e424069xxx</StackId>
        <StackName>eksctl-prd-cluster</StackName>
        <LogicalResourceId>ServiceRole</LogicalResourceId>
        <Timestamp>2020-05-14T01:40:07.002Z</Timestamp>
        <ResourceType>AWS::IAM::Role</ResourceType>
      </member>
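
The escaped &quot; sequences make these events hard to read. As a rough sketch (not part of eksctl), a <member> fragment like the ones above can be decoded with Python's standard library. The sample is a trimmed copy of the ControlPlane event, parse_event is a hypothetical helper name, and the code assumes the fragment carries no XML namespace, as pasted here.

```python
import json
import xml.etree.ElementTree as ET

# Trimmed copy of the ControlPlane <member> element from the debug log above
# (account ID and role suffix are redacted in the original report).
SAMPLE = """<member>
  <ResourceStatus>CREATE_FAILED</ResourceStatus>
  <ResourceProperties>{&quot;Version&quot;:&quot;1.16&quot;,&quot;RoleArn&quot;:&quot;arn:aws:iam::xxxxxxxx:role/eksctl-prd-cluster-ServiceRole-xxxxxxxx&quot;,&quot;Name&quot;:&quot;prd-cluster&quot;}</ResourceProperties>
  <LogicalResourceId>ControlPlane</LogicalResourceId>
  <ResourceType>AWS::EKS::Cluster</ResourceType>
</member>"""


def parse_event(member_xml):
    """Turn one <member> element into a dict and decode ResourceProperties."""
    root = ET.fromstring(member_xml)
    # ElementTree unescapes &quot; while parsing, so .text is plain JSON.
    event = {child.tag: (child.text or "") for child in root}
    raw_props = event.get("ResourceProperties", "")
    event["ResourceProperties"] = json.loads(raw_props) if raw_props else {}
    return event


event = parse_event(SAMPLE)
print(event["LogicalResourceId"], event["ResourceStatus"])
print(event["ResourceProperties"]["RoleArn"])
```

Decoding the properties this way makes it easier to compare the RoleArn passed to AWS::EKS::Cluster with the PhysicalResourceId of the ServiceRole event.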

Versions

  • eksctl version 0.19.0
  • kubectl version 1.18.2

Logs

[ℹ]  eksctl version 0.19.0
[ℹ]  using region **************
[✔]  using existing VPC (vpc-f94db49d) and subnets (private:[subnet-xxxxxxxxsubnet-yyyyyyyysubnet-zzzzzzzz] public:[])
[!]  custom VPC/subnets will be used; if resulting cluster doesn't function as expected, make sure to review the configuration of VPC/subnets
[ℹ]  nodegroup "ng" will use "ami-0ca8e5c318b118092" [AmazonLinux2/1.16]
[ℹ]  using EC2 key pair %!!(MISSING)q(*string=<nil>)
[ℹ]  using Kubernetes version 1.16
[ℹ]  creating EKS cluster "prd-cluster" in "**************" region with un-managed nodes
[ℹ]  1 nodegroup (ng) was included (based on the include/exclude rules)
[ℹ]  will create a CloudFormation stack for cluster itself and 1 nodegroup stack(s)
[ℹ]  will create a CloudFormation stack for cluster itself and 0 managed nodegroup stack(s)
[ℹ]  if you encounter any issues, check CloudFormation console or try 'eksctl utils describe-stacks --region=************** --cluster=prd-cluster'
[ℹ]  CloudWatch logging will not be enabled for cluster "prd-cluster" in "**************"
[ℹ]  you can enable it with 'eksctl utils update-cluster-logging --region=************** --cluster=prd-cluster'
[ℹ]  Kubernetes API endpoint access will use default of {publicAccess=true, privateAccess=false} for cluster "prd-cluster" in "**************"
[ℹ]  2 sequential tasks: { create cluster control plane "prd-cluster", create nodegroup "ng" }
[ℹ]  building cluster stack "eksctl-prd-cluster"
[ℹ]  deploying stack "eksctl-prd-cluster"
[✖]  unexpected status "ROLLBACK_COMPLETE" while waiting for CloudFormation stack "eksctl-prd-cluster"
[ℹ]  fetching stack events in attempt to troubleshoot the root cause of the failure
[!]  AWS::IAM::Role/ServiceRole: DELETE_IN_PROGRESS
[!]  AWS::EC2::SecurityGroup/ClusterSharedNodeSecurityGroup: DELETE_IN_PROGRESS
[!]  AWS::EC2::SecurityGroup/ControlPlaneSecurityGroup: DELETE_IN_PROGRESS
[!]  AWS::IAM::Policy/PolicyNLB: DELETE_IN_PROGRESS
[!]  AWS::IAM::Policy/PolicyCloudWatchMetrics: DELETE_IN_PROGRESS
[!]  AWS::EC2::SecurityGroupIngress/IngressInterNodeGroupSG: DELETE_IN_PROGRESS
[✖]  AWS::IAM::Policy/PolicyNLB: CREATE_FAILED – "Resource creation cancelled"
[✖]  AWS::IAM::Policy/PolicyCloudWatchMetrics: CREATE_FAILED – "Resource creation cancelled"
[✖]  AWS::EKS::Cluster/ControlPlane: CREATE_FAILED – "Role with arn: arn:aws:iam::xxxxxxxx:role/eksctl-prd-cluster-ServiceRole-xxxxxxxx, could not be assumed because it does not exist or the trusted entity is not correct (Service: AmazonEKS; Status Code: 400; Error Code: InvalidParameterException; Request ID: fc9b7780-6fb4-4620-8943-b523b192xxxx)"
[!]  1 error(s) occurred and cluster hasn't been created properly, you may wish to check CloudFormation console
[ℹ]  to cleanup resources, run 'eksctl delete cluster --region=************** --name=prd-cluster'
[✖]  waiting for CloudFormation stack "eksctl-prd-cluster": ResourceNotReady: failed waiting for successful resource state
Error: failed to create cluster "prd-cluster"
@mikemountjoy

Hit this error today.

I added the following to my IAM Policy to successfully create the required role:

    {
            "Sid": "VisualEditor6",
            "Effect": "Allow",
            "Action": [
                "iam:CreateInstanceProfile",
                "iam:DeleteInstanceProfile",
                "iam:GetRole",
                "iam:GetInstanceProfile",
                "iam:RemoveRoleFromInstanceProfile",
                "iam:CreateRole",
                "iam:DeleteRole",
                "iam:AttachRolePolicy",
                "iam:PutRolePolicy",
                "iam:ListInstanceProfiles",
                "iam:AddRoleToInstanceProfile",
                "iam:ListInstanceProfilesForRole",
                "iam:PassRole",
                "iam:CreateServiceLinkedRole",
                "iam:DetachRolePolicy",
                "iam:DeleteRolePolicy",
                "iam:DeleteServiceLinkedRole",
                "iam:GetRolePolicy"
            ],
            "Resource": [
                "arn:aws:iam::*:role/eksctl-*",
                "arn:aws:iam::*:instance-profile/eksctl-*"
            ]
    }

@superbspeed

I have these permissions in place but still got the error.

@lgg42

lgg42 commented May 22, 2020

Same thing over here, same permissions and still have the error.

@lgg42

lgg42 commented May 22, 2020

I also tried the IAM permissions suggested in another thread, with the same result as yours.

#204 (comment)
#204 (comment)

@lgg42

lgg42 commented May 22, 2020

I just tried temporarily attaching the AWS managed policy "AdministratorAccess" to the eksctl IAM user and everything worked as expected, so maybe we can say this is a permissions configuration issue 🤷‍♂️ The point is... which permissions are we missing?

@mikemountjoy

mikemountjoy commented May 22, 2020

The following policy allows me to deploy an EKS cluster with EC2 Spot Instances using eksctl version 0.19.0.

IAM Policy

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor100",
            "Effect": "Allow",
            "Action": [
                "ssm:PutParameter",
                "ssm:DeleteParameter",
                "ssm:GetParameterHistory",
                "ssm:GetParametersByPath",
                "ssm:GetParameters",
                "ssm:GetParameter",
                "ssm:DeleteParameters"
            ],
            "Resource": "*"
        },
        {
            "Sid": "VisualEditor101",
            "Effect": "Allow",
            "Action": "ssm:DescribeParameters",
            "Resource": "*"
        },
        {
            "Sid": "VisualEditor0",
            "Effect": "Allow",
            "Action": [
                "autoscaling:CreateLaunchConfiguration",
                "autoscaling:DescribeScalingActivities",
                "autoscaling:DeleteLaunchConfiguration",
                "autoscaling:DescribeAutoScalingGroups",
                "autoscaling:DescribeLaunchConfigurations",
                "autoscaling:UpdateAutoScalingGroup",
                "autoscaling:DeleteAutoScalingGroup",
                "autoscaling:CreateAutoScalingGroup"
            ],
            "Resource": "*"
        },
        {
            "Sid": "VisualEditor1",
            "Effect": "Allow",
            "Action": "cloudformation:*",
            "Resource": "*"
        },
        {
            "Sid": "VisualEditor2",
            "Effect": "Allow",
            "Action": "ec2:DeleteInternetGateway",
            "Resource": "arn:aws:ec2:*:*:internet-gateway/*"
        },
        {
            "Sid": "VisualEditor3",
            "Effect": "Allow",
            "Action": [
                "ec2:AuthorizeSecurityGroupIngress",
                "ec2:DeleteSubnet",
                "ec2:DescribeAddresses",
                "ec2:DeleteTags",
                "ec2:CreateNatGateway",
                "ec2:CreateVpc",
                "ec2:AttachInternetGateway",
                "ec2:DescribeVpcAttribute",
                "ec2:DeleteRouteTable",
                "ec2:AssociateRouteTable",
                "ec2:DescribeInternetGateways",
                "ec2:DescribeAvailabilityZones",
                "ec2:CreateRoute",
                "ec2:CreateInternetGateway",
                "ec2:RevokeSecurityGroupEgress",
                "ec2:CreateSecurityGroup",
                "ec2:ModifyVpcAttribute",
                "ec2:DeleteInternetGateway",
                "ec2:DescribeKeyPairs",
                "ec2:DescribeRouteTables",
                "ec2:ReleaseAddress",
                "ec2:AuthorizeSecurityGroupEgress",
                "ec2:DeleteLaunchTemplate",
                "ec2:ImportKeyPair",
                "ec2:DescribeLaunchTemplates",
                "ec2:DescribeTags",
                "ec2:CreateTags",
                "ec2:DeleteRoute",
                "ec2:DescribeLaunchTemplateVersions",
                "ec2:CreateRouteTable",
                "ec2:RunInstances",
                "ec2:DetachInternetGateway",
                "ec2:DescribeNatGateways",
                "ec2:DisassociateRouteTable",
                "ec2:AllocateAddress",
                "ec2:DescribeSecurityGroups",
                "ec2:RevokeSecurityGroupIngress",
                "ec2:DescribeImages",
                "ec2:CreateLaunchTemplate",
                "ec2:DescribeVpcs",
                "ec2:DescribeImageAttribute",
                "ec2:DeleteSecurityGroup",
                "ec2:DeleteNatGateway",
                "ec2:DeleteVpc",
                "ec2:CreateSubnet",
                "ec2:DescribeSubnets",
                "ec2:ModifySubnetAttribute"
            ],
            "Resource": "*"
        },
        {
            "Sid": "VisualEditor4",
            "Effect": "Allow",
            "Action": [
                "ecr:BatchGetImage",
                "ecr:BatchCheckLayerAvailability",
                "ecr:CompleteLayerUpload",
                "ecr:DescribeImages",
                "ecr:DescribeRepositories",
                "ecr:GetDownloadUrlForLayer",
                "ecr:InitiateLayerUpload",
                "ecr:ListImages",
                "ecr:PutImage",
                "ecr:UploadLayerPart"
            ],
            "Resource": "*"
        },
        {
            "Sid": "VisualEditor41",
            "Effect": "Allow",
            "Action": [
                "ecr:GetAuthorizationToken" 
            ],
            "Resource": "*"
        },
        {
            "Sid": "VisualEditor5",
            "Effect": "Allow",
            "Action": "eks:*",
            "Resource": "*"
        },
        {
            "Sid": "VisualEditor6",
            "Effect": "Allow",
            "Action": [
                "iam:CreateInstanceProfile",
                "iam:DeleteInstanceProfile",
                "iam:GetRole",
                "iam:GetInstanceProfile",
                "iam:RemoveRoleFromInstanceProfile",
                "iam:CreateRole",
                "iam:DeleteRole",
                "iam:AttachRolePolicy",
                "iam:PutRolePolicy",
                "iam:ListInstanceProfiles",
                "iam:AddRoleToInstanceProfile",
                "iam:ListInstanceProfilesForRole",
                "iam:PassRole",
                "iam:CreateServiceLinkedRole",
                "iam:DetachRolePolicy",
                "iam:DeleteRolePolicy",
                "iam:DeleteServiceLinkedRole",
                "iam:GetRolePolicy"
            ],
            "Resource": [
                "arn:aws:iam::*:role/eksctl-*",
                "arn:aws:iam::*:instance-profile/eksctl-*"
            ]
        }
    ]
}

@lgg42

lgg42 commented May 25, 2020

Thank youuuuu!!! hehehe, this set of permissions worked for me. I'm also using Spot Instances and a public/private endpoint with IP whitelisting.

Did you get them by trial and error, or from some doc/place?

@mikemountjoy

You are very welcome. 95% of the work was done by the fine folks in #204.

When you delete your cluster, please double-check the AWS Console and make sure the CloudFormation stacks that were created by eksctl are dropped cleanly.

I have been caught out in the past and been left with a bill I didn't expect! CloudWatch billing events are essential, as costs can run away with themselves.

Debugging these permissions was a case of watching the CloudFormation events, seeing the failures, understanding what was going on, updating my IAM policy and going around the loop again.

I really wish eksctl.io would publish an IAM policy on their site; that would have made this a whole lot easier.
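
The "watch the events, find the failure" step of that loop can be sketched in a few lines of Python. This is only an illustration under stated assumptions: root_cause_failures is a hypothetical helper, and the event dicts use the same field names as the stack events in the original report (boto3's describe_stack_events returns entries of this shape under "StackEvents", but no AWS call is made here).

```python
# Reason CloudFormation reports for resources it aborted after another
# resource failed; these are side effects, not the root cause.
CANCELLED = "Resource creation cancelled"


def root_cause_failures(events):
    """Return CREATE_FAILED events that are not mere cancellations."""
    return [
        e for e in events
        if e.get("ResourceStatus") == "CREATE_FAILED"
        and CANCELLED not in e.get("ResourceStatusReason", "")
    ]


# Events abbreviated from the eksctl log in the original report.
EVENTS = [
    {"LogicalResourceId": "PolicyNLB",
     "ResourceStatus": "CREATE_FAILED",
     "ResourceStatusReason": "Resource creation cancelled"},
    {"LogicalResourceId": "PolicyCloudWatchMetrics",
     "ResourceStatus": "CREATE_FAILED",
     "ResourceStatusReason": "Resource creation cancelled"},
    {"LogicalResourceId": "ControlPlane",
     "ResourceStatus": "CREATE_FAILED",
     "ResourceStatusReason": "Role with arn: ... could not be assumed because "
                             "it does not exist or the trusted entity is not correct"},
    {"LogicalResourceId": "ServiceRole",
     "ResourceStatus": "CREATE_COMPLETE"},
]

for event in root_cause_failures(EVENTS):
    print(event["LogicalResourceId"], "->", event["ResourceStatusReason"])
```

Filtering out the cancellations leaves only the ControlPlane failure, which is the event worth reading before touching the IAM policy again.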

@lgg42

lgg42 commented May 27, 2020

Man, that's a lot of work (trial and error with EKS, which doesn't bootstrap as fast as a kops cluster)! Thanks for that 😉

I've been checking, and the CloudFormation stack has always been deleted correctly. Thanks for the reminder!

@martina-if
Contributor

Hi, thanks for reporting this! It seems this is needed quite a bit. I will work on documenting the policies in the coming days (tracked via #204).

@ryanvade

I've tried all of the IAM policies discussed here and in #204 and still get this error.

@martina-if
Contributor

Hi @ryanvade, is it the exact same error that you are getting? Can you give us more details, like logs and a redacted version of the config file you used?

@ryanvade

eksctl Version: 0.24.0

eksctl create cluster --region us-east-1 --zones=us-east-1a,us-east-1b,us-east-1c --name=test

ends up with "Role with arn: arn:aws:iam::xxxxxxxx:role/eksctl-test-cluster-ServiceRole-xxxxxxxx, could not be assumed because it does not exist or the trusted entity is not correct", even with the different policies mentioned in this and other threads.

@martina-if
Contributor

Hi @ryanvade I can't reproduce this error with my accounts. Can you please run the same command with -v 4 and post all the logs?

@jasonnance

In case anyone else encounters this, I got the same error as @ryanvade with eksctl 0.27.0 and noticed the following error in CloudTrail logs:

CreateServiceLinkedRole
AccessDenied 
User: arn:aws:iam::xxx:user/MyUser is not authorized to perform: iam:CreateServiceLinkedRole on resource:
arn:aws:iam::xxx:role/aws-service-role/eks.amazonaws.com/AWSServiceRoleForAmazonEKS

It worked after I added the permission below to the "IamLimitedAccess" policy listed in the docs, substituting our account number for "xxx":

{
    "Effect": "Allow",
    "Action": "iam:CreateServiceLinkedRole",
    "Resource": "arn:aws:iam::xxx:role/aws-service-role/eks*"
}
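
One plausible reading of why this extra statement matters: a statement scoped to role/eksctl-* resources, such as the VisualEditor6 statement posted earlier in this thread, does not cover the service-linked role ARN from the CloudTrail error. The sketch below checks that with a deliberately simplified matcher. It is an assumption-laden illustration, not IAM's real evaluation logic: it ignores explicit Deny, conditions, permission boundaries and policy variables. statement_allows is a hypothetical helper, and 111122223333 is a placeholder account ID.

```python
import re


def _wildcard_to_regex(pattern):
    """Translate an IAM-style pattern ('*' = any run, '?' = one char) to a regex."""
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")
        elif ch == "?":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts))


def _as_list(value):
    """IAM allows a single string or a list for Action and Resource."""
    return [value] if isinstance(value, str) else list(value)


def statement_allows(statement, action, resource):
    """Simplified check: Allow effect plus a matching Action and Resource."""
    if statement.get("Effect") != "Allow":
        return False
    # Action names are compared case-insensitively; ARNs are not.
    action_ok = any(
        _wildcard_to_regex(a.lower()).fullmatch(action.lower())
        for a in _as_list(statement.get("Action", []))
    )
    resource_ok = any(
        _wildcard_to_regex(r).fullmatch(resource)
        for r in _as_list(statement.get("Resource", []))
    )
    return action_ok and resource_ok


# Placeholder account ID; the role path comes from the CloudTrail error above.
SLR_ARN = ("arn:aws:iam::111122223333:role/aws-service-role/"
           "eks.amazonaws.com/AWSServiceRoleForAmazonEKS")

# Abbreviated version of the eksctl-scoped statement from earlier in the thread.
EKSCTL_SCOPED = {
    "Effect": "Allow",
    "Action": ["iam:CreateRole", "iam:CreateServiceLinkedRole"],
    "Resource": ["arn:aws:iam::*:role/eksctl-*",
                 "arn:aws:iam::*:instance-profile/eksctl-*"],
}

# The statement added in this comment, with the placeholder account ID.
SLR_SCOPED = {
    "Effect": "Allow",
    "Action": "iam:CreateServiceLinkedRole",
    "Resource": "arn:aws:iam::111122223333:role/aws-service-role/eks*",
}

print(statement_allows(EKSCTL_SCOPED, "iam:CreateServiceLinkedRole", SLR_ARN))  # False
print(statement_allows(SLR_SCOPED, "iam:CreateServiceLinkedRole", SLR_ARN))     # True
```

If the service-linked role already exists in an account, the CreateServiceLinkedRole call may never be needed, which could explain why the same policy worked for some people in this thread and not for others.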

Redacted config file, in case it's helpful for reproducing:

kind: ClusterConfig
apiVersion: eksctl.io/v1alpha5

metadata:
  name: my-cluster
  region: us-east-1
  version: "1.17"
  tags:
    MyTag: "Tag Value"

nodeGroups:
  - name: persistent
    instanceType: m5.large
    desiredCapacity: 2
    volumeSize: 80
    preBootstrapCommands:
      - echo "preBootstrap"

  - name: compute-cpu
    minSize: 0
    maxSize: 5
    volumeSize: 80
    instancesDistribution:
      instanceTypes: ["m5.large"]
      onDemandBaseCapacity: 0
      onDemandPercentageAboveBaseCapacity: 0
    preBootstrapCommands:
      - echo "preBootstrap"

  - name: compute-gpu
    minSize: 0
    maxSize: 1
    volumeSize: 100
    instancesDistribution:
      instanceTypes: ["g4dn.xlarge"]
      onDemandBaseCapacity: 0
      onDemandPercentageAboveBaseCapacity: 0
    preBootstrapCommands:
      - echo "preBootstrap"

git:
  repo:
    url: <my git repo>
    branch: master
    paths:
      - base
    fluxPath: flux/
    user: gitops
    email: <my email>
  operator:
    commitOperatorManifests: true
    namespace: "flux"
    withHelm: true
  bootstrapProfile:
    source: app-dev
    revision: master
    outputPath: base/

@raphaelauv

My EKS role was missing the following statement in its trust relationships:

    {
      "Effect": "Allow",
      "Principal": {
        "Service": "eks.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
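
To check a role for this, a small sketch like the following can inspect a trust policy document, for example one saved from aws iam get-role. trusts_service is a hypothetical helper name, and it only looks at Allow statements for sts:AssumeRole with a Service principal; conditions are ignored.

```python
def _as_list(value):
    """Policy fields may hold a single value or a list of values."""
    return value if isinstance(value, list) else [value]


def trusts_service(trust_policy, service):
    """Return True if the trust policy lets `service` call sts:AssumeRole."""
    for stmt in _as_list(trust_policy.get("Statement", [])):
        if stmt.get("Effect") != "Allow":
            continue
        if "sts:AssumeRole" not in _as_list(stmt.get("Action", [])):
            continue
        principal = stmt.get("Principal", {})
        if not isinstance(principal, dict):
            continue  # e.g. "Principal": "*", which names no service
        if service in _as_list(principal.get("Service", [])):
            return True
    return False


# AssumeRolePolicyDocument of the ServiceRole event in the original report.
EKSCTL_TRUST = {
    "Version": "2012-10-17",
    "Statement": [{
        "Action": ["sts:AssumeRole"],
        "Effect": "Allow",
        "Principal": {"Service": ["eks.amazonaws.com",
                                  "eks-fargate-pods.amazonaws.com"]},
    }],
}

# Hypothetical role that only trusts EC2, for contrast.
EC2_ONLY_TRUST = {
    "Version": "2012-10-17",
    "Statement": [{
        "Action": "sts:AssumeRole",
        "Effect": "Allow",
        "Principal": {"Service": "ec2.amazonaws.com"},
    }],
}

print(trusts_service(EKSCTL_TRUST, "eks.amazonaws.com"))    # True
print(trusts_service(EC2_ONLY_TRUST, "eks.amazonaws.com"))  # False
```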

@hanamurayuki

hanamurayuki commented Dec 3, 2023

The error response seems misleading. Instead of
could not be assumed because it does not exist or the trusted entity is not correct
it should be
Unable to proceed, cannot describe custom KMS key.

Related to
aws/containers-roadmap#1533

The following KMS key policy statement solved it for me.

{
    "Sid": "Allow EKS cluster role to view the key during the updates",
    "Effect": "Allow",
    "Principal": {
        "AWS": "arn:aws:iam::XXX:role/XXX"
    },
    "Action": "kms:DescribeKey",
    "Resource": "*"
}
