
NodeCreationFailure: Instances failed to join the kubernetes cluster. This is happening on a fresh cluster. #2149

Closed
arnav13081994 opened this issue Jul 6, 2022 · 15 comments

@arnav13081994

Description

I followed the docs and have exhausted all the resources online, but I am still not able to create an EKS cluster with EKS managed nodes. I always get the following error:

│ Error: error waiting for EKS Node Group (eks-dev-eks-cluster:default_node_group-2022070609081040940000000f) to create: unexpected state 'CREATE_FAILED', wanted target 'ACTIVE'. last error: 2 errors occurred:
│       * eks-default_node_group-2022070609081040940000000f-ecc0e962-29c0-7802-83d8-213eca9d1cd7: AsgInstanceLaunchFailures: You've reached your quota for maximum Fleet Requests for this account. Launching EC2 instance failed.
│       * DUMMY_04f2c42f-98d6-428c-aed2-95deada02ad2, DUMMY_46fee16c-8052-4fc7-a170-522943edc191, DUMMY_4ff890be-596d-4370-85eb-56146cc1b5ea, DUMMY_c94e1a5e-9bce-42c2-bc7b-7b24db9216f5, DUMMY_d36b7c25-3716-4b50-92e7-ac48c400e33a, DUMMY_fa1ffb54-1a20-4a9e-b302-e31db512548c: NodeCreationFailure: Instances failed to join the kubernetes cluster

Versions

  • Terraform version: ~> 1.2.3
  • Provider version(s):
aws = {
  version = "~> 4.21.0"
}
kubernetes = {
  version = "~>2.12.0"
}

Reproduction Code [Required]

terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 4.21.0"
    }
    kubernetes = {
      source  = "hashicorp/kubernetes"
      version = "~>2.12.0"
    }
  }
  required_version = "~> 1.2.3"
}


provider "aws" {
  profile = "..."
  region  = local.region

  default_tags {
    tags = {
      Environment = "Staging"
      Terraform   = "True"
    }
  }
}

#
# Housekeeping
#

locals {
  project_name    = "eks-dev"
  cluster_name    = "${local.project_name}-eks-cluster"
  cluster_version = "1.21"
  region          = "us-west-1"
}


/*
The following 2 data resources are used to get around the fact that we have to wait
for the EKS cluster to be initialised before we can attempt to authenticate.
*/

data "aws_eks_cluster" "default" {
  name = module.eks.cluster_id
}

data "aws_eks_cluster_auth" "default" {
  name = module.eks.cluster_id
}

provider "kubernetes" {
  host                   = data.aws_eks_cluster.default.endpoint
  cluster_ca_certificate = base64decode(data.aws_eks_cluster.default.certificate_authority[0].data)
  token                  = data.aws_eks_cluster_auth.default.token
}
#############################################################################################
#############################################################################################

# Create EKS Cluster
#############################################################################################
#############################################################################################
# Create VPC for EKS Cluster
module "vpc" {
  source  = "terraform-aws-modules/vpc/aws"
  version = "~> 3.0"

  name = local.cluster_name
  cidr = "10.0.0.0/16"

  azs             = ["${local.region}a", "${local.region}b"]
  private_subnets = ["10.0.1.0/24", "10.0.2.0/24"]
  public_subnets  = ["10.0.3.0/24", "10.0.4.0/24"]


  enable_nat_gateway   = true
  single_nat_gateway   = true
  enable_dns_hostnames = true

  enable_flow_log                      = true
  create_flow_log_cloudwatch_iam_role  = true
  create_flow_log_cloudwatch_log_group = true

  public_subnet_tags = {
    "kubernetes.io/cluster/${local.cluster_name}" = "shared"
    "kubernetes.io/role/elb"                      = "1"
  }

  private_subnet_tags = {
    "kubernetes.io/cluster/${local.cluster_name}" = "shared"
    "kubernetes.io/role/internal-elb"             = "1"
  }
}


resource "aws_security_group" "additional" {
  name_prefix = "${local.cluster_name}-additional"
  vpc_id      = module.vpc.vpc_id

  ingress {
    from_port = 22
    to_port   = 22
    protocol  = "tcp"
    cidr_blocks = [
      "10.0.0.0/8",
      "172.16.0.0/12",
      "192.168.0.0/16",
    ]
  }
}




module "eks" {
  source  = "terraform-aws-modules/eks/aws"
  version = "18.17.0"

  cluster_name    = local.cluster_name
  cluster_version = local.cluster_version

  cluster_endpoint_public_access = true

  vpc_id     = module.vpc.vpc_id
  subnet_ids = module.vpc.private_subnets


  eks_managed_node_group_defaults = {
    ami_type                              = "AL2_x86_64"
    disk_size                             = 50
    attach_cluster_primary_security_group = true
    vpc_security_group_ids                = [aws_security_group.additional.id]
  }
  eks_managed_node_groups = {
    first = {
      desired_size = 1
      max_size     = 1
      min_size     = 1
    }
  }
}


Steps to reproduce the behavior:

Just run terraform apply --auto-approve and, after waiting about 20 minutes, you will see the aforementioned error.

Expected behavior

The EKS cluster with one EKS managed node group is created.

Actual behavior

The following error is thrown:

│ Error: error waiting for EKS Node Group (eks-dev-eks-cluster:default_node_group-2022070609081040940000000f) to create: unexpected state 'CREATE_FAILED', wanted target 'ACTIVE'. last error: 2 errors occurred:
│       * eks-default_node_group-2022070609081040940000000f-ecc0e962-29c0-7802-83d8-213eca9d1cd7: AsgInstanceLaunchFailures: You've reached your quota for maximum Fleet Requests for this account. Launching EC2 instance failed.
│       * DUMMY_04f2c42f-98d6-428c-aed2-95deada02ad2, DUMMY_46fee16c-8052-4fc7-a170-522943edc191, DUMMY_4ff890be-596d-4370-85eb-56146cc1b5ea, DUMMY_c94e1a5e-9bce-42c2-bc7b-7b24db9216f5, DUMMY_d36b7c25-3716-4b50-92e7-ac48c400e33a, DUMMY_fa1ffb54-1a20-4a9e-b302-e31db512548c: NodeCreationFailure: Instances failed to join the kubernetes cluster

Additional context

I have read other similar issues and have experimented with iam_role_attach_cni_policy = true, but I still get the same error. Any help would be greatly appreciated; this has been extremely frustrating.
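
For reference, a minimal sketch of where that flag was set, assuming the v18 module inputs from the reproduction code above (iam_role_attach_cni_policy is passed through eks_managed_node_group_defaults):

module "eks" {
  source  = "terraform-aws-modules/eks/aws"
  version = "18.17.0"

  cluster_name    = local.cluster_name
  cluster_version = local.cluster_version
  vpc_id          = module.vpc.vpc_id
  subnet_ids      = module.vpc.private_subnets

  eks_managed_node_group_defaults = {
    ami_type                              = "AL2_x86_64"
    disk_size                             = 50
    attach_cluster_primary_security_group = true

    # Attach AmazonEKS_CNI_Policy directly to the node IAM role instead of
    # relying on IRSA for the VPC CNI add-on.
    iam_role_attach_cni_policy = true
  }

  eks_managed_node_groups = {
    first = {
      min_size     = 1
      max_size     = 1
      desired_size = 1
    }
  }
}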

@arnav13081994 arnav13081994 changed the title NodeCreationFailure: Instances failed to join the kubernetes cluster NodeCreationFailure: Instances failed to join the kubernetes cluster. This is happening on a fresh cluster. Jul 7, 2022
@tanvp112

AsgInstanceLaunchFailures: You've reached your quota for maximum Fleet Requests for this account.

Maybe you need to raise the fleet quota.
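
For reference, the quota mentioned in the error is the EC2 Fleet request limit under Service Quotas. If it really is exhausted, an increase can also be requested from Terraform; a minimal sketch using the AWS provider's aws_servicequotas_service_quota resource (the quota_code below is a placeholder; look up the exact code for the Fleet-related quota under the ec2 service in the Service Quotas console):

# Sketch only: request a higher value for an EC2 fleet-related service quota.
resource "aws_servicequotas_service_quota" "ec2_fleet_requests" {
  service_code = "ec2"
  quota_code   = "L-XXXXXXXX" # placeholder quota code
  value        = 1000         # desired new quota value
}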

@arnav13081994
Author

@tanvp112 I'm not sure this is about any quota increase, as I'm creating just 1 node.

Have you faced the same issue?

@sebastianmacarescu

I have the same issue. Does anybody know why?

@arnav13081994
Author

@sebastianmacarescu

The following config worked for me. I still don't know why it worked, though; there seems to be some race condition.

terraform {
  required_version = "~> 1.0"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 4.0"
    }
  }
}


provider "aws" {
  region  = "us-east-1"
  profile = "ADD NAME OF AWS PROFILE OR SET CREDS EXPLICITLY"
}

data "aws_eks_cluster" "default" {
  name = module.eks_default.cluster_id
  depends_on = [
    module.eks_default.cluster_id,
  ]
}

data "aws_eks_cluster_auth" "default" {
  name = module.eks_default.cluster_id
  depends_on = [
    module.eks_default.cluster_id,
  ]
}

provider "kubernetes" {
  host                   = data.aws_eks_cluster.default.endpoint
  cluster_ca_certificate = base64decode(data.aws_eks_cluster.default.certificate_authority[0].data)
  token                  = data.aws_eks_cluster_auth.default.token
}

provider "helm" {
  kubernetes {
    host                   = data.aws_eks_cluster.default.endpoint
    cluster_ca_certificate = base64decode(data.aws_eks_cluster.default.certificate_authority[0].data)
    token                  = data.aws_eks_cluster_auth.default.token
  }
}

################################################################################
# Common Locals
################################################################################

locals {
  # Used to determine correct partition (i.e. - `aws`, `aws-gov`, `aws-cn`, etc.)
  partition = data.aws_partition.current.partition
}

################################################################################
# Common Data
################################################################################

data "aws_partition" "current" {}
data "aws_caller_identity" "current" {}

################################################################################
# Common Modules
################################################################################

module "tags" {
  # tflint-ignore: terraform_module_pinned_source
  source = "github.com/clowdhaus/terraform-tags"

  application = "someclustername"
  environment = "nonprod"
  repository  = "https://github.com/clowdhaus/eks-reference-architecture"
}


################################################################################
# EKS Modules
################################################################################

module "vpc" {
  # https://registry.terraform.io/modules/terraform-aws-modules/vpc/aws/latest
  source  = "terraform-aws-modules/vpc/aws"
  version = "~> 3.12"

  name = "someclustername"
  cidr = "10.0.0.0/16"

  azs             = ["us-east-1a", "us-east-1b", "us-east-1c"]
  private_subnets = ["10.0.1.0/24", "10.0.2.0/24", "10.0.3.0/24"]
  public_subnets  = ["10.0.101.0/24", "10.0.102.0/24", "10.0.103.0/24"]

  enable_nat_gateway     = true
  single_nat_gateway     = true
  one_nat_gateway_per_az = false

  enable_dns_hostnames = true

  manage_default_network_acl    = true
  default_network_acl_tags      = { Name = "someclustername-default" }
  manage_default_route_table    = true
  default_route_table_tags      = { Name = "someclustername-default" }
  manage_default_security_group = true
  default_security_group_tags   = { Name = "someclustername-default" }

  public_subnet_tags = {
    "kubernetes.io/cluster/someclustername-default" = "shared"
    "kubernetes.io/role/elb"                    = 1
  }

  private_subnet_tags = {
    "kubernetes.io/cluster/someclustername-default" = "shared"
    "kubernetes.io/role/internal-elb"           = 1
  }

  tags = module.tags.tags
}




module "eks_default" {
  source  = "terraform-aws-modules/eks/aws"
  version = "~> 18.26"

  cluster_name    = "someclustername-default"
  cluster_version = "1.22"

  # EKS Addons
  cluster_addons = {
    coredns = {
      resolve_conflicts = "OVERWRITE"
    }
    kube-proxy = {}
    vpc-cni = {
      resolve_conflicts = "OVERWRITE"
    }
  }

  # Encryption key
  create_kms_key = true
  cluster_encryption_config = [{
    resources = ["secrets"]
  }]

  vpc_id     = module.vpc.vpc_id
  subnet_ids = module.vpc.private_subnets

  eks_managed_node_groups = {
    default = {
      # By default, the module creates a launch template to ensure tags are propagated to instances, etc.,
      # so we need to disable it to use the default template provided by the AWS EKS managed node group service
      create_launch_template = false
      launch_template_name   = ""

      # list of pods per instance type: https://github.com/awslabs/amazon-eks-ami/blob/master/files/eni-max-pods.txt
      # or run: kubectl get node -o yaml | grep pods
      instance_types = ["t2.xlarge"]
      disk_size      = 50

      # Is deprecated and will be removed in v19.x
      create_security_group = false

      min_size     = 1
      max_size     = 3
      desired_size = 1

      update_config = {
        max_unavailable_percentage = 33
      }
    }
  }

  tags = module.tags.tags
}

@AmitKulkarni9

@arnav13081994
I am getting the same error.
Error: error waiting for EKS Node Group (devopsthehardway-cluster:devopsthehardway-workernodes) to create: unexpected state 'CREATE_FAILED', wanted target 'ACTIVE'. last error: 2 errors occurred:
│ * eks-devopsthehardway-workernodes-74c19ba3-f519-395b-e417-a16c178036c0: AsgInstanceLaunchFailures: You've reached your quota for maximum Fleet Requests for this account. Launching EC2 instance failed.
│ * DUMMY_085e351a-269f-4d54-b838-916f649a9cce, DUMMY_187d8572-0d47-4f5b-8986-4bfd680b3b93, DUMMY_2dc892c6-fce4-4c83-a29b-9b1f714e5adf, DUMMY_a4f7ff66-b607-4c59-9585-a3be5dd0cdf5, DUMMY_a53dbd59-1a80-4207-af6a-ab72e6421fe1: NodeCreationFailure: Instances failed to join the kubernetes cluster

Below is my code
terraform {
  backend "s3" {
    bucket = "terraform-state-amtoyadevopsthehardway"
    key    = "eks-terraform-workernodes.tfstate"
    region = "ap-southeast-2"
  }
  required_providers {
    aws = {
      source = "hashicorp/aws"
    }
  }
}

# IAM Role for EKS to have access to the appropriate resources
resource "aws_iam_role" "eks-iam-role" {
  name = "devopsthehardway-eks-iam-role"

  path = "/"

  assume_role_policy = <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "eks.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
EOF
}

# Attach the IAM policy to the IAM role
resource "aws_iam_role_policy_attachment" "AmazonEKSClusterPolicy" {
  policy_arn = "arn:aws:iam::aws:policy/AmazonEKSClusterPolicy"
  role       = aws_iam_role.eks-iam-role.name
}

resource "aws_iam_role_policy_attachment" "AmazonEC2ContainerRegistryReadOnly-EKS" {
  policy_arn = "arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly"
  role       = aws_iam_role.eks-iam-role.name
}

# Create the EKS cluster
resource "aws_eks_cluster" "devopsthehardway-eks" {
  name     = "devopsthehardway-cluster"
  role_arn = aws_iam_role.eks-iam-role.arn

  vpc_config {
    subnet_ids = [var.subnet_id_1, var.subnet_id_2]
  }

  depends_on = [
    aws_iam_role.eks-iam-role,
  ]
}

# Worker Nodes
resource "aws_iam_role" "workernodes" {
  name = "eks-node-group-example"

  assume_role_policy = jsonencode({
    Statement = [{
      Action = "sts:AssumeRole"
      Effect = "Allow"
      Principal = {
        Service = "ec2.amazonaws.com"
      }
    }]
    Version = "2012-10-17"
  })
}

resource "aws_iam_role_policy_attachment" "AmazonEKSWorkerNodePolicy" {
  policy_arn = "arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy"
  role       = aws_iam_role.workernodes.name
}

resource "aws_iam_role_policy_attachment" "AmazonEKS_CNI_Policy" {
  policy_arn = "arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy"
  role       = aws_iam_role.workernodes.name
}

resource "aws_iam_role_policy_attachment" "EC2InstanceProfileForImageBuilderECRContainerBuilds" {
  policy_arn = "arn:aws:iam::aws:policy/EC2InstanceProfileForImageBuilderECRContainerBuilds"
  role       = aws_iam_role.workernodes.name
}

resource "aws_iam_role_policy_attachment" "AmazonEC2ContainerRegistryReadOnly" {
  policy_arn = "arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly"
  role       = aws_iam_role.workernodes.name
}

resource "aws_eks_node_group" "worker-node-group" {
  cluster_name    = aws_eks_cluster.devopsthehardway-eks.name
  node_group_name = "devopsthehardway-workernodes"
  node_role_arn   = aws_iam_role.workernodes.arn
  subnet_ids      = [var.subnet_id_1, var.subnet_id_2]
  instance_types  = ["t3.xlarge"]

  scaling_config {
    desired_size = 1
    max_size     = 1
    min_size     = 1
  }

  depends_on = [
    aws_iam_role_policy_attachment.AmazonEKSWorkerNodePolicy,
    aws_iam_role_policy_attachment.AmazonEKS_CNI_Policy,
    # aws_iam_role_policy_attachment.AmazonEC2ContainerRegistryReadOnly,
  ]
}

@Chakki1301

Same error. It's a new AWS account with very few EC2 instances. Something else goes wrong when this is done via Terraform automation or eksctl.

unexpected state 'CREATE_FAILED', wanted target 'ACTIVE'. last error: 2 errors occurred:
│ * eks-managed-ondemand-20220916000118777000000009-90c1a1cc-c40a-4d69-60fa-d40ad3479549: AsgInstanceLaunchFailures: You've reached your quota for maximum Fleet Requests for this account. Launching EC2 instance failed.

@lauren-themis

Same error - tried on 4.24.0 and 4.31.0. Why is this closed?

@Jaysins

Jaysins commented Oct 11, 2022

Anyone figured this out?

@chinchalinchin

chinchalinchin commented Oct 23, 2022

I am receiving this error as well. In the CloudTrail logs for the RunInstances API call that EKS makes when provisioning new nodes, it appears this is related to how the EC2 instance profile is attached to the node:

{
  "errorCode": "Client.InvalidParameterValue",
  "errorMessage": "Value (eks-xxxx) for parameter iamInstanceProfile.name is invalid. Invalid IAM Instance Profile name"
}

A possible workaround is creating your own EC2 launch template and then using that in the node_group definition; however, you would need to replicate the launch template EKS uses by default: https://docs.aws.amazon.com/eks/latest/userguide/launch-templates.html

I have not yet been able to do this.
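
For anyone attempting this, a rough sketch of the shape such a workaround could take, using the plain aws_launch_template and aws_eks_node_group resources (the cluster, role, and subnet references are borrowed from the config posted above and are only illustrative; EKS merges a supplied launch template with its own defaults, so only the overrides need to be declared):

# Sketch only: a custom launch template wired into a managed node group.
resource "aws_launch_template" "eks_nodes" {
  name_prefix = "custom-eks-nodes-"

  # Override the root volume here instead of using the node group disk_size
  # argument, which is not allowed together with a launch template.
  block_device_mappings {
    device_name = "/dev/xvda"
    ebs {
      volume_size = 50
      volume_type = "gp3"
    }
  }

  tag_specifications {
    resource_type = "instance"
    tags = {
      Name = "custom-eks-node"
    }
  }
}

resource "aws_eks_node_group" "custom" {
  cluster_name    = aws_eks_cluster.devopsthehardway-eks.name # illustrative reference
  node_group_name = "custom-lt-nodes"
  node_role_arn   = aws_iam_role.workernodes.arn              # illustrative reference
  subnet_ids      = [var.subnet_id_1, var.subnet_id_2]
  instance_types  = ["t3.xlarge"]

  # Point the node group at the custom template instead of the EKS default.
  launch_template {
    id      = aws_launch_template.eks_nodes.id
    version = aws_launch_template.eks_nodes.latest_version
  }

  scaling_config {
    desired_size = 1
    max_size     = 1
    min_size     = 1
  }
}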

@danvau7

danvau7 commented Nov 10, 2022

Getting the same error as well today. Currently looking into it.

@Jaysins

Jaysins commented Nov 10, 2022

Be sure you're not creating the nodes in a private subnet; that was the issue for me.
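
For what it's worth, nodes in private subnets can join as long as those subnets have outbound access (a NAT gateway or the relevant VPC interface endpoints) so the instances can pull images and reach the cluster endpoint. A minimal sketch of the relevant flags, mirroring the terraform-aws-modules/vpc usage earlier in this thread (the name is illustrative):

module "vpc" {
  source  = "terraform-aws-modules/vpc/aws"
  version = "~> 3.0"

  name = "example-eks-vpc"
  cidr = "10.0.0.0/16"

  azs             = ["us-west-1a", "us-west-1b"]
  private_subnets = ["10.0.1.0/24", "10.0.2.0/24"]
  public_subnets  = ["10.0.3.0/24", "10.0.4.0/24"]

  # Without NAT (or VPC endpoints for ECR, EC2, STS, and S3), instances in the
  # private subnets cannot bootstrap and never join the cluster.
  enable_nat_gateway   = true
  single_nat_gateway   = true
  enable_dns_hostnames = true
}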

@esoxjem

esoxjem commented Nov 14, 2022

[FIXED] Run the automated runbook to see the actual issue:
https://docs.aws.amazon.com/systems-manager-automation-runbooks/latest/userguide/automation-awssupport-troubleshooteksworkernode.html

In our case, it was an issue with the security group and the user data script.

@danvau7

danvau7 commented Nov 15, 2022

Getting the same error as well today. Currently looking into it.

The issue was that I had restricted cluster_endpoint_public_access_cidrs to a specific subnet, which limited the ability of the nodes to talk to the API endpoint. Allowing them to reach the API endpoint via their local IPs solved the issue, so I just needed to add the following setting to make this error go away:

cluster_endpoint_private_access = true
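
For reference, a minimal sketch of how those endpoint settings fit into the eks module block from the original report (the CIDR below is a placeholder for an operator network):

module "eks" {
  source  = "terraform-aws-modules/eks/aws"
  version = "18.17.0"

  cluster_name    = local.cluster_name
  cluster_version = local.cluster_version
  vpc_id          = module.vpc.vpc_id
  subnet_ids      = module.vpc.private_subnets

  # Keep the public endpoint for operators (optionally restricted to known
  # CIDRs) and enable the private endpoint so worker nodes inside the VPC
  # can always reach the control plane.
  cluster_endpoint_public_access       = true
  cluster_endpoint_public_access_cidrs = ["203.0.113.0/24"] # placeholder
  cluster_endpoint_private_access      = true
}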

@swananddhole

@danvau7 I'm getting the error even after setting cluster_endpoint_private_access to true. Can anyone help out here? It's really frustrating.

@github-actions

github-actions bot commented Jan 8, 2023

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues. If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Jan 8, 2023