
Error: The configmap "aws-auth" does not exist when deploying an EKS cluster with manage_aws_auth_configmap = true #2009

Closed
ezzatron opened this issue Apr 11, 2022 · 73 comments

@ezzatron

Description

When deploying an EKS cluster using manage_aws_auth_configmap = true, the deploy fails with the error:

Error: The configmap "aws-auth" does not exist
  • ✋ I have searched the open/closed issues and my issue is not listed.

Versions

  • Module version [Required]: ~> 18.20.1

  • Terraform version: Terraform Cloud

  • Provider version(s):

    • aws v4.9.0
    • cloudinit v2.2.0
    • kubernetes v2.10.0
    • tls v3.3.0

Reproduction Code [Required]

data "aws_availability_zones" "available" {}

module "vpc_example" {
  source  = "terraform-aws-modules/vpc/aws"
  version = "~> 3.14.0"

  name = "example"
  cidr = "10.10.0.0/16"

  azs                = slice(data.aws_availability_zones.available.names, 0, 3)
  enable_nat_gateway = true

  private_subnets  = ["10.10.1.0/24", "10.10.2.0/24", "10.10.3.0/24"]
  public_subnets   = ["10.10.11.0/24", "10.10.12.0/24", "10.10.13.0/24"]
  database_subnets = ["10.10.21.0/24", "10.10.22.0/24", "10.10.23.0/24"]

  public_subnet_tags = {
    "kubernetes.io/role/elb" = 1
  }

  private_subnet_tags = {
    "kubernetes.io/role/internal-elb" = 1
  }
}

data "aws_iam_roles" "sso_breakglass" {
  name_regex  = "AWSReservedSSO_BreakGlass_.*"
  path_prefix = "/aws-reserved/sso.amazonaws.com/"
}
data "aws_iam_roles" "sso_readall" {
  name_regex  = "AWSReservedSSO_ReadAll_.*"
  path_prefix = "/aws-reserved/sso.amazonaws.com/"
}

module "eks_main" {
  source  = "terraform-aws-modules/eks/aws"
  version = "~> 18.20.1"

  cluster_name                    = "main"
  cluster_enabled_log_types       = ["api", "audit", "authenticator", "controllerManager", "scheduler"]
  cluster_endpoint_private_access = true

  vpc_id     = module.vpc_example.vpc_id
  subnet_ids = module.vpc_example.private_subnets

  eks_managed_node_groups = {
    spot = {
      create_launch_template = false
      launch_template_name   = ""

      capacity_type = "SPOT"
      instance_types = [
        "m4.large",
        "m5.large",
        "m5a.large",
        "m6i.large",
        "t2.large",
        "t3.large",
        "t3a.large",
      ],
    }
  }

  manage_aws_auth_configmap = true

  aws_auth_roles = [
    {
      rolearn  = "arn:aws:iam::${data.aws_caller_identity.current.account_id}:role/${one(data.aws_iam_roles.sso_breakglass.names)}"
      username = "sso-breakglass:{{SessionName}}"
      groups   = ["sso-breakglass"]
    },
    {
      rolearn  = "arn:aws:iam::${data.aws_caller_identity.current.account_id}:role/${one(data.aws_iam_roles.sso_readall.names)}"
      username = "sso-readall:{{SessionName}}"
      groups   = ["sso-readall"]
    },
  ]
}

Steps to reproduce the behavior:

  • Attempt to deploy an EKS cluster with EKS managed node groups and manage_aws_auth_configmap = true
  • See the above error

In our case, we're using Terraform Cloud, but I'm unsure if that actually affects anything here. We were simply trying to create a new EKS cluster, and noticed this new setting. In the past we've had to use complex hacks to manage the aws-auth ConfigMap, so this seemed like a better approach, but it doesn't seem to work.

It's worth noting that running another apply doesn't fix the issue, so I don't think it's a timing issue.

Expected behavior

The cluster is created without error, and the aws-auth ConfigMap contains the expected content.

Actual behavior

The above error.

Additional context

Screen Shot 2022-04-12 at 09 53 54

@ezzatron ezzatron changed the title Error: The configmap "aws-auth" does not exist when deploying a new EKS cluster with manage_aws_auth_configmap = true Error: The configmap "aws-auth" does not exist when deploying an EKS cluster with manage_aws_auth_configmap = true Apr 11, 2022
@bryantbiggs
Member

that's odd that it's stating the aws-auth configmap doesn't exist when you are using an EKS managed node group - EKS managed node groups automatically create/update the configmap with the role used by the node group (same with Fargate profiles). However, self managed node groups do not create the configmap, so we do have a variable to handle this:

# Self managed node groups will not automatically create the aws-auth configmap so we need to
create_aws_auth_configmap = true
manage_aws_auth_configmap = true


@ezzatron
Author

Yeah, I assumed based on the docs that we shouldn't be setting create_aws_auth_configmap = true because we're only using managed node groups. Should we enable that setting as well?

@bryantbiggs
Member

yes, it's safe to enable this.

question: if the configmap doesn't exist, are your nodes connected to the control plane?

@ezzatron
Author

As far as I can tell, the ConfigMap genuinely doesn't exist. In the console, the node group shows up under the cluster, which (I think) means that the nodes are connected to the control plane:

Screen Shot 2022-04-12 at 10 35 47

But since the config map doesn't exist, I can't see any of the resources inside the cluster of course:

Screen Shot 2022-04-12 at 10 35 06

@bryantbiggs
Member

hmm, something's off. if the configmap doesn't exist then the nodes won't register because they lack authorization to do so

@bryantbiggs
Member

let me try your repro and see

@ezzatron
Author

Thanks for taking a look, let me know if I can help out with any other info. In the meantime I'm going to try enabling create_aws_auth_configmap = true, and also see if the spot instance config has something to do with it (that's another thing we changed at the same time from a working cluster config).

@ezzatron
Author

FWIW, adding create_aws_auth_configmap = true did change the error we get, but it didn't help us understand what's going on:

Error: Post "http://localhost/api/v1/namespaces/kube-system/configmaps": dial tcp 127.0.0.1:80: connect: connection refused

Screen Shot 2022-04-12 at 10 48 31

@bryantbiggs
Member

hmm, it is there but it's not recognizing or patching it. I'll have to file an issue with the upstream Kubernetes provider in the morning to have them take a look

@ezzatron
Author

No worries, thanks for your help 🙏

@lrstanley

Experiencing this as well, using 18.20.1 against Kubernetes 1.21. It fixed itself after another plan and apply. I wonder if this is another quirky EKS "feature" 🤦🏻‍♂️ where it says the cluster is ready, but it's actually not yet, and some restart/propagation still needs to happen after the cluster is created before aws-auth is populated?

╷
│ Error: The configmap "aws-auth" does not exist
│ 
│   with module.eks.kubernetes_config_map_v1_data.aws_auth[0],
│   on .terraform/modules/eks/main.tf line 428, in resource "kubernetes_config_map_v1_data" "aws_auth":
│  428: resource "kubernetes_config_map_v1_data" "aws_auth" {
│ 
╵
+ provider registry.terraform.io/hashicorp/aws v3.74.1
+ provider registry.terraform.io/hashicorp/cloudinit v2.2.0
+ provider registry.terraform.io/hashicorp/kubernetes v2.10.0
+ provider registry.terraform.io/hashicorp/local v2.2.2
+ provider registry.terraform.io/hashicorp/null v3.1.1
+ provider registry.terraform.io/hashicorp/random v3.1.2
+ provider registry.terraform.io/hashicorp/template v2.2.0
+ provider registry.terraform.io/hashicorp/tls v3.3.0

on Terraform 1.1.6.

@lrstanley

lrstanley commented Apr 12, 2022

Actually, never mind. It succeeded; however, it didn't actually apply it.

EDIT: it's because I didn't have the manage field set. With that enabled, I now get:

╷
│ Error: configmaps "aws-auth" already exists
│ 
│   with module.eks.kubernetes_config_map.aws_auth[0],
│   on .terraform/modules/eks/main.tf line 411, in resource "kubernetes_config_map" "aws_auth":
│  411: resource "kubernetes_config_map" "aws_auth" {
│ 
╵

@MadsRC

MadsRC commented Apr 12, 2022

I had the same issue: both the "config map does not exist" error when managing the config map, and the "connection refused 127.0.0.1" error when attempting to create it.

I'm using managed node groups as well.

The way I solved it was to add a kubernetes provider. This here should be enough:

/*
The following 2 data resources are used to get around the fact that we have to wait
for the EKS cluster to be initialised before we can attempt to authenticate.
*/

data "aws_eks_cluster" "default" {
  name = module.eks.cluster_id
}

data "aws_eks_cluster_auth" "default" {
  name = module.eks.cluster_id
}

provider "kubernetes" {
  host                   = data.aws_eks_cluster.default.endpoint
  cluster_ca_certificate = base64decode(data.aws_eks_cluster.default.certificate_authority[0].data)
  token                  = data.aws_eks_cluster_auth.default.token
}

It's also a great way to authenticate to the EKS cluster, instead of the example in the repo that forces the use of the awscli.

@bryantbiggs
Member

> I had the same issue […] The way I solved it was to add a kubernetes provider. […] It's also a great way to authenticate to the EKS cluster, instead of the example in the repo that forces the use of the awscli.

oy, good spot on not providing the provider and creds. I'll file a ticket for better error reporting on that upstream.

Regarding the token vs exec - exec is what is recommended by the provider itself https://registry.terraform.io/providers/hashicorp/kubernetes/latest/docs#exec-plugins

@bryantbiggs
Member

yep, was able to confirm that was the issue and this now works as expected @ezzatron - we just forgot to add the provider auth:

provider "aws" {
  region = "us-east-1"
}

provider "kubernetes" {
  host                   = module.eks_main.cluster_endpoint
  cluster_ca_certificate = base64decode(module.eks_main.cluster_certificate_authority_data)

  exec {
    api_version = "client.authentication.k8s.io/v1alpha1"
    command     = "aws"
    # This requires the awscli to be installed locally where Terraform is executed
    args = ["eks", "get-token", "--cluster-name", module.eks_main.cluster_id]
  }
}

data "aws_availability_zones" "available" {}

data "aws_caller_identity" "current" {}

module "vpc_example" {
  source  = "terraform-aws-modules/vpc/aws"
  version = "~> 3.14.0"

  name = "example"
  cidr = "10.10.0.0/16"

  azs                = slice(data.aws_availability_zones.available.names, 0, 3)
  enable_nat_gateway = true

  private_subnets  = ["10.10.1.0/24", "10.10.2.0/24", "10.10.3.0/24"]
  public_subnets   = ["10.10.11.0/24", "10.10.12.0/24", "10.10.13.0/24"]
  database_subnets = ["10.10.21.0/24", "10.10.22.0/24", "10.10.23.0/24"]

  public_subnet_tags = {
    "kubernetes.io/role/elb" = 1
  }

  private_subnet_tags = {
    "kubernetes.io/role/internal-elb" = 1
  }
}

module "eks_main" {
  source  = "terraform-aws-modules/eks/aws"
  version = "~> 18.20.1"

  cluster_name                    = "main"
  cluster_enabled_log_types       = ["api", "audit", "authenticator", "controllerManager", "scheduler"]
  cluster_endpoint_private_access = true

  vpc_id     = module.vpc_example.vpc_id
  subnet_ids = module.vpc_example.private_subnets

  eks_managed_node_groups = {
    spot = {
      create_launch_template = false
      launch_template_name   = ""

      capacity_type = "SPOT"
      instance_types = [
        "m4.large",
        "m5.large",
        "m5a.large",
        "m6i.large",
        "t2.large",
        "t3.large",
        "t3a.large",
      ],
    }
  }

  manage_aws_auth_configmap = true

  aws_auth_roles = [
    {
      rolearn  = "arn:aws:iam::${data.aws_caller_identity.current.account_id}:role/foo"
      username = "sso-breakglass:{{SessionName}}"
      groups   = ["sso-breakglass"]
    },
    {
      rolearn  = "arn:aws:iam::${data.aws_caller_identity.current.account_id}:role/bar"
      username = "sso-readall:{{SessionName}}"
      groups   = ["sso-readall"]
    },
  ]
}

@bryantbiggs
Member

@ezzatron are we able to close this with the solution posted above?

@narenaryan

✔️ For us, the following setup worked while migrating the EKS module from 17.22.0 to 18.20.2:

data "aws_eks_cluster" "default" {
  name = local.cluster_name
}

data "aws_eks_cluster_auth" "default" {
  name = local.cluster_name
}

provider "kubernetes" {
  host                   = data.aws_eks_cluster.default.endpoint
  cluster_ca_certificate = base64decode(data.aws_eks_cluster.default.certificate_authority[0].data)
  token                  = data.aws_eks_cluster_auth.default.token
}

For the aws_eks_cluster and aws_eks_cluster_auth data sources, name = module.eks.cluster_id didn't work for some reason and threw connection errors like the one in the title of this ticket.

We got the provider block from HashiCorp terraform provider git: https://github.com/hashicorp/terraform-provider-kubernetes/blob/main/_examples/eks/kubernetes-config/main.tf

Versions:

| Component | Version |
| --- | --- |
| Terraform | 1.1.8 |
| EKS Module | 18.20.2 |
| Kubernetes Provider | 2.10.0 |

@bryantbiggs
Member

@narenaryan just be mindful that with that route the token can expire. The provider recommends the exec route if you can use it (requires the awscli to be available where Terraform is executed): https://registry.terraform.io/providers/hashicorp/kubernetes/latest/docs#exec-plugins

@magnusseptim

magnusseptim commented Apr 15, 2022

There still seems to be some issue.

I tried both @bryantbiggs's and @narenaryan's propositions, and yes, it sometimes works as intended.

Unfortunately, from time to time I get:

Error: Get "https://***.***.eks.amazonaws.com/api/v1/namespaces/kube-system/configmaps/aws-auth": dial tcp <ip address here>:443: i/o timeout

at the terraform plan stage.

It may be that this is not an eks module / kubernetes provider issue, but rather some issue with my deployment machine, but this is what I have for now.

I haven't yet been able to find the underlying reason.

@bryantbiggs
Member

@magnusseptim please feel free to investigate, add a 👍🏽 , or post a new issue upstream as there are well known issues with the Kubernetes provider https://github.com/hashicorp/terraform-provider-kubernetes/issues?q=is%3Aissue+is%3Aopen+sort%3Aupdated-desc+localhost

@ncjones

ncjones commented Apr 16, 2022

When configuring the kubernetes provider using exec and no region, as per the advice above, I got the following error:

module.eks.kubernetes_config_map.aws_auth[0]: Creating...
│ Error: Post "https://9266dd6a08gr7.us-west-2.eks.amazonaws.com/api/v1/namespaces/kube-system/configmaps": getting credentials: exec: executable aws failed with exit code 255
│ 
│   with module.eks.kubernetes_config_map.aws_auth[0],
│   on .terraform/modules/eks/main.tf line 414, in resource "kubernetes_config_map" "aws_auth":
│  414: resource "kubernetes_config_map" "aws_auth" {
│ 
Error: Process completed with exit code 1.

I think it's because it assumes the AWS CLI is configured with a region, which is not true in my environment. If the region is not set, the AWS CLI will attempt to contact the instance metadata service (IMDS) to detect the region. The IMDS call also fails in my CI/CD environment.

Using the aws_eks_cluster_auth data source solves this issue.

data "aws_eks_cluster_auth" "default" {
  name = "my-eks-cluster-name"
}

provider "kubernetes" {
  host = module.eks.cluster_endpoint
  cluster_ca_certificate = base64decode(module.eks.cluster_certificate_authority_data)
  token = data.aws_eks_cluster_auth.default.token
}

@bryantbiggs
Member

The exec command is passed to the awscli, so you can set the region there:

provider "kubernetes" {
  host                   = module.eks_main.cluster_endpoint
  cluster_ca_certificate = base64decode(module.eks_main.cluster_certificate_authority_data)

  exec {
    api_version = "client.authentication.k8s.io/v1alpha1"
    command     = "aws"
    # This requires the awscli to be installed locally where Terraform is executed
    args = ["eks", "get-token", "--cluster-name", module.eks_main.cluster_id, "--region", "us-east-1"]
  }
}

@flomsk

flomsk commented Jul 1, 2022

confirming it's working with token and not with exec under the provider configuration

provider "kubernetes" {
  alias                  = "eu-west-1"
  host                   = data.aws_eks_cluster.cluster_eu.endpoint
  cluster_ca_certificate = base64decode(data.aws_eks_cluster.cluster_eu.certificate_authority.0.data)
  token                  = data.aws_eks_cluster_auth.cluster_eu.token
  # exec {
  #   api_version = "client.authentication.k8s.io/v1beta1"
  #   args        = ["eks", "get-token", "--cluster-name", data.aws_eks_cluster.cluster_eu.name]
  #   command     = "aws"
  # }
}

@duclm2609

data "aws_eks_cluster_auth" "eks_auth" {
  name = module.eks.cluster_id
}

Me too. Using exec does not work.

@ricardo6142dh

Still not working here :-(

Screenshot 2022-07-03 at 23 55 59

# https://registry.terraform.io/modules/terraform-aws-modules/eks/aws/latest
module "eks" {
  source  = "terraform-aws-modules/eks/aws"
  version = "18.26.2"

  cluster_name    = local.cluster_name
  cluster_version = "1.21"

  vpc_id     = module.vpc.vpc_id
  subnet_ids = module.vpc.public_subnets

  enable_irsa = true

  create_cluster_security_group = false
  create_node_security_group    = false

  manage_aws_auth_configmap = true

  aws_auth_users = [
    {
      userarn  = "arn:aws:iam::my_aws_account:user/my_user"
      username = "lopes_becker"
      groups   = ["system:masters"]
    }
  ]
}

@zeevmoney

in my case, I have several profiles, so I need to add the "--profile" option.

Same thing happened to me, make sure you use the right profile.
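
For reference, a minimal sketch of that setup (untested; the region and profile name are placeholders to adjust for your environment):

provider "kubernetes" {
  host                   = module.eks.cluster_endpoint
  cluster_ca_certificate = base64decode(module.eks.cluster_certificate_authority_data)

  exec {
    api_version = "client.authentication.k8s.io/v1alpha1"
    command     = "aws"
    # Requires the awscli locally; pass the same profile/region the AWS provider uses
    args = ["eks", "get-token", "--cluster-name", module.eks.cluster_id, "--region", "us-east-1", "--profile", "my-profile"]
  }
}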

@jeunii

jeunii commented Jul 19, 2022

I'm not sure what I'm doing wrong here, but my config looks like:

provider "aws" {
  assume_role {
    role_arn = "arn:aws:iam::${var.aws_account_id}:role/pe-gitlab-assume_role"
  }
}

provider "kubernetes" {
  host                   = module.eks.cluster_endpoint
  cluster_ca_certificate = base64decode(module.eks.cluster_certificate_authority_data)

  exec {
    api_version = "client.authentication.k8s.io/v1alpha1"
    command     = "aws"
    # This requires the awscli to be installed locally where Terraform is executed
    args = ["eks", "get-token", "--cluster-name", module.eks.cluster_id]
  }
}

data "terraform_remote_state" "vpc" {
  backend = "http"
  config = {
    address = "https://gitlab.com/api/v4/projects/37469473/terraform/state/core-net"
   }
}

module "eks" {
  source  = "terraform-aws-modules/eks/aws"

  cluster_name    = "build"
  cluster_version = "1.22"

  cluster_endpoint_private_access = true
  cluster_endpoint_public_access  = true

  cluster_addons = {
    coredns = {}
    kube-proxy = {}
    vpc-cni = {}
  }

  vpc_id     = data.terraform_remote_state.vpc.outputs.vpc_id
  control_plane_subnet_ids = data.terraform_remote_state.vpc.outputs.dev_cp_subnet_ids

  # EKS Managed Node Group(s)
  eks_managed_node_group_defaults = {
    disk_size      = 50
    instance_types = ["t3.medium"]
    subnet_ids = data.terraform_remote_state.vpc.outputs.dev_ng_subnet_ids
  }

  eks_managed_node_groups = {
    core = {
      min_size     = 2
      max_size     = 10
      desired_size = 2

      instance_types = ["t3.2xlarge"]
      capacity_type  = "SPOT"
    }
  }

  tags = {
    ManagedBy = "Terraform"
    Infra = "eks"
  }

  # aws-auth configmap
  create_aws_auth_configmap = true

}

But when I run this, I get

│ Error: Unauthorized
│ 
│   with module.eks.kubernetes_config_map.aws_auth[0],
│   on .terraform/modules/eks/main.tf line 453, in resource "kubernetes_config_map" "aws_auth":
│  453: resource "kubernetes_config_map" "aws_auth" {
│ 
╵
Cleaning up project directory and file based variables

If I use just manage_aws_auth_configmap, I get

│ Error: The configmap "aws-auth" does not exist
│ 
│   with module.eks.kubernetes_config_map_v1_data.aws_auth[0],
│   on .terraform/modules/eks/main.tf line 470, in resource "kubernetes_config_map_v1_data" "aws_auth":
│  470: resource "kubernetes_config_map_v1_data" "aws_auth" {

@bryantbiggs
Member

@jeunii you are using a role in the AWS provider, which becomes the default system:masters identity in the cluster, but your kubernetes provider is using your default profile, which is probably different from the role in the AWS provider, and therefore that identity does not have cluster access. You either need to update the exec call for the role assumption or use the data source route, such as:

data "aws_eks_cluster_auth" "eks_auth" {
  name = module.eks.cluster_id
}

provider "kubernetes" {
  host                   = module.eks.cluster_endpoint
  cluster_ca_certificate = base64decode(module.eks.cluster_certificate_authority_data)
  token                  = data.aws_eks_cluster_auth.eks_auth.token
}
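
Or, to stay on the exec route, a rough sketch that keeps the identities aligned by having the awscli assume the same role (assuming your awscli version supports --role-arn on eks get-token; the role ARN below is the one from your AWS provider block):

provider "kubernetes" {
  host                   = module.eks.cluster_endpoint
  cluster_ca_certificate = base64decode(module.eks.cluster_certificate_authority_data)

  exec {
    api_version = "client.authentication.k8s.io/v1alpha1"
    command     = "aws"
    # Assume the same role the AWS provider uses so the token identity matches
    args = ["eks", "get-token", "--cluster-name", module.eks.cluster_id, "--role-arn", "arn:aws:iam::${var.aws_account_id}:role/pe-gitlab-assume_role"]
  }
}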

@bryantbiggs
Member

tl;dr - your AWS provider identity needs to be the same identity used in the Kubernetes/Helm/kubectl providers when using the exec() method, or use the data source for static token-based auth

@jeunii

jeunii commented Jul 19, 2022

@bryantbiggs

thank you for your pointer, it worked. I was able to use manage_aws_auth_configmap = true to configure it.

Terraform has been successfully initialized!
module.eks.kubernetes_config_map_v1_data.aws_auth[0]: Creating...
module.eks.kubernetes_config_map_v1_data.aws_auth[0]: Creation complete after 0s [id=kube-system/aws-auth]
Apply complete! Resources: 1 added, 0 changed, 0 destroyed.

when logged into the console, I see this message.

Screen Shot 2022-07-19 at 5 06 37 PM

via the console I cannot see any nodes or k8s objects.

also via the cli, I get

kubectl get nodes
error: You must be logged in to the server (Unauthorized)

I assume the next thing for me to do is to configure my own user?

  aws_auth_users = [
    {
      userarn  = "arn:aws:iam::66666666666:user/user1"
      username = "user1"
      groups   = ["system:masters"]
    },
]

I myself am a federated user via SSO.

@bryantbiggs
Member

@jeunii

jeunii commented Jul 19, 2022

@bryantbiggs Thanks. For now I just want to statically configure this.

My SSO role is AWSReservedSSO_AdministratorAccess_05da3c667640d5fb/<firstName>.<lastName>

So for a simple case like this, shouldn't this suffice?

  # aws-auth configmap
  manage_aws_auth_configmap = true

  aws_auth_roles = [
    {
      rolearn  = "AWSReservedSSO_AdministratorAccess_05da3c667640d5fb/<firstName>.<lastName>"
      username = "<firstName>.<lastName>"
      groups   = ["system:masters"]
    },
  ]

I already implemented this but it did not seem to work:

              + - "groups":
              +   - "system:masters"
              +   "rolearn": "AWSReservedSSO_AdministratorAccess_05da3c667640d5fb/<firstName>.<lastName>"
              +   "username": "sso_admin"

@yagehu

yagehu commented Jul 20, 2022

exec did not work for me, while token with data "aws_eks_cluster_auth" worked. Maybe in my case it's because I'm provisioning resources from a different account.

@EnriqueHormilla

As @bryantbiggs said, my problem was that my AWS provider was using a custom profile but my Kubernetes provider was not.

provider "aws" {
  region  = var.region
  profile = var.aws_cli_profile  
}

provider "kubernetes" {
  host                   = module.k8s.cluster_endpoint
  cluster_ca_certificate = base64decode(module.k8s.cluster_certificate_authority_data)

  exec {
    api_version = "client.authentication.k8s.io/v1beta1"
    command     = "aws"
    # This requires the awscli to be installed locally where Terraform is executed
    args = ["eks", "get-token", "--cluster-name", module.k8s.cluster_id,  "--region", var.region, "--profile", var.aws_cli_profile]
  }
}

After doing this I can manage aws-auth with the terraform module. AWESOME!!!!!

@bryantbiggs
Copy link
Member

@jeunii you are trying to pass a role name/session where the full role ARN is required. See the example provided, since this is what you are trying to accomplish. If you need to limit it further, create a custom SSO group and use that instead of the default Admin group.
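
A rough sketch of what that could look like, reusing the SSO role lookup pattern from the issue description (the name_regex and username here are placeholders for your permission set):

data "aws_caller_identity" "current" {}

data "aws_iam_roles" "sso_admin" {
  name_regex  = "AWSReservedSSO_AdministratorAccess_.*"
  path_prefix = "/aws-reserved/sso.amazonaws.com/"
}

module "eks" {
  # ...

  manage_aws_auth_configmap = true

  aws_auth_roles = [
    {
      # Full IAM role ARN, not the "RoleName/SessionName" form
      rolearn  = "arn:aws:iam::${data.aws_caller_identity.current.account_id}:role/${one(data.aws_iam_roles.sso_admin.names)}"
      username = "sso-admin:{{SessionName}}"
      groups   = ["system:masters"]
    },
  ]
}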

alexlogy added a commit to alexlogy/terraform-eks-cluster that referenced this issue Jul 22, 2022
@ajinkyasurya

ajinkyasurya commented Jul 28, 2022

Going to compile a list of issues that I came across:

╷                                                                                                                                                                                                                                        
│ Error: Get "http://localhost/api/v1/namespaces/kube-system/configmaps/aws-auth": dial tcp 127.0.0.1:80: connect: connection refused                                                                                                    
│                                                                                                                                                                                                                                        
│   with kubernetes_config_map_v1_data.aws_auth[0],                                                                                                                                                                                      
│   on main.tf line 428, in resource "kubernetes_config_map_v1_data" "aws_auth":                                                                                                                                                         
│  428: resource "kubernetes_config_map_v1_data" "aws_auth" {                                                                                                                                                                            
│                                                                                                                                                                                                                                        
╵       

is very generic & masks multiple issues.

  1. As others have mentioned, the only thing you supposedly need to get around this is the following, which didn't work for me:
provider "kubernetes" {
  host                   = module.eks.cluster_endpoint
  cluster_ca_certificate = base64decode(module.eks.cluster_certificate_authority_data)

  exec {
    api_version = "client.authentication.k8s.io/v1alpha1"
    command     = "aws"
    # This requires the awscli to be installed locally where Terraform is executed
    args = ["eks", "get-token", "--cluster-name", module.eks.cluster_id, "--profile", "var.profile"]
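    # Note: "var.profile" is a quoted literal here, so the awscli receives the text
    # "var.profile" rather than the value of var.profile - likely part of why this attempt failed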
  }
}

I also tried the following, which didn't work either:

data "aws_eks_cluster" "default" {
  name = module.eks.cluster_id
}

data "aws_eks_cluster_auth" "default" {
  name = module.eks.cluster_id
}

provider "kubernetes" {
  host                   = data.aws_eks_cluster.default.endpoint
  cluster_ca_certificate = base64decode(data.aws_eks_cluster.default.certificate_authority[0].data)
  token                  = data.aws_eks_cluster_auth.default.token
}

Both of the above approaches use module.eks.cluster_id, which is not working in my case for some reason, and the error masks that issue.
What did work was:

data "aws_eks_cluster" "default" {
  name = local.cluster_name
}

data "aws_eks_cluster_auth" "default" {
  name = local.cluster_name
}

provider "kubernetes" {
  host                   = data.aws_eks_cluster.default.endpoint
  cluster_ca_certificate = base64decode(data.aws_eks_cluster.default.certificate_authority[0].data)
  token                  = data.aws_eks_cluster_auth.default.token
}
  2. You'll get the same error if you have not configured cluster_security_group_additional_rules, so make sure your cluster has the correct security group rules to accept connections from where you are running terraform. This is the second issue that's being masked.

Try adding the following to your eks module to make sure this isn't the case.

  cluster_security_group_additional_rules = {
    ingress = {
      description                = "To node 1025-65535"
      type                       = "ingress"
      from_port                  = 0
      to_port                    = 0
      protocol                   = -1
      cidr_blocks                = ["0.0.0.0/0"] # CHANGE_ME
      source_node_security_group = false
    }
  }
  3. If you have changed cidr_blocks in 2 and you are not in the IP range you specified, you might end up getting a connection timeout error:
╷
│ Error: Get "https://REDACT.gr7.us-west-2.eks.amazonaws.com/api/v1/namespaces/kube-system/configmaps/aws-auth": dial tcp 10.X.X.X:443: i/o timeout
│ 
│   with module.eks.kubernetes_config_map_v1_data.aws_auth[0],
│   on .terraform/modules/eks/main.tf line 470, in resource "kubernetes_config_map_v1_data" "aws_auth":
│  470: resource "kubernetes_config_map_v1_data" "aws_auth" {
│ 
╵

Make sure you are on the VPN or in the network you want to access the cluster from. I'm sure there are more issues, but I just wanted to point out some of the ones I came across.
Hopefully this saves you time!

@ctroyp

ctroyp commented Jul 28, 2022

Folks, just wanted to report back that after dealing with the "Error: The configmap "aws-auth" does not exist" error for several days, I can concur that most likely (in my case, and as experienced by others) it was an issue connecting to the endpoint during the process. In my specific case, the source of my problem was my corporate proxy, which would not accept the root certificate: terraform would report the aws-auth error, and the AWS CLI would report that I was using a self-signed cert, which was not the case.

Hoping this will help others or at least provide some insight...

@shashidhar087

> Going to compile a list of issues that I came across […] Hopefully this saves you time!

My issue is solved. Thanks!

@shake76

shake76 commented Aug 18, 2022

Thanks all for sharing this information and the workarounds. I'm actually having the following issue: "https://68E51D9223F166C1D6B7BDCDFD370F1F.gr7.region.eks.amazonaws.com/api/v1/namespaces/kube-system/configmaps/aws-auth": dial tcp 10.x.x.x:443: i/o timeout. It happens when running a Terraform Cloud plan against the eks module after turning on private endpoints in my EKS cluster. I have already configured a bastion host that can reach our cluster, and from my local machine I've configured a proxy that uses SSM; I was even able to run kubectl commands against the cluster without problems, but the issue comes up when running the Terraform Cloud plan. Do you know if there is any workaround to pass an https_proxy variable with Terraform Cloud? I'd appreciate your comments.

@yagehu

yagehu commented Sep 12, 2022

Another cause: I had set endpoint_public_access = false. In that case I would need an SSH tunnel.

@VladoPortos

This is a constant headache:

│ Error: configmaps "aws-auth" already exists
│ 
│   with module.eks.kubernetes_config_map.aws_auth[0],
│   on .terraform/modules/eks/main.tf line 453, in resource "kubernetes_config_map" "aws_auth":
│  453: resource "kubernetes_config_map" "aws_auth" {

The map is there because I have:

  manage_aws_auth_configmap = true
  create_aws_auth_configmap = true

But it looks like it wants to create the aws-auth map on every run. I have not yet had a single successful run creating an EKS cluster with Terraform; this issue always pops up. The only solution is to import the map and then re-run again... not very automation friendly if I have to do a manual task :(

@yagehu

yagehu commented Sep 13, 2022

> This is a constant headache: Error: configmaps "aws-auth" already exists […] The map is there because I have manage_aws_auth_configmap = true and create_aws_auth_configmap = true.

I think you just need to set create_aws_auth_configmap to false. This is because the aws-auth config map is automatically created by EKS.
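
For EKS managed node groups, the combination that generally works is the following (a minimal sketch of the relevant module arguments):

  # EKS creates the aws-auth configmap itself once a managed node group exists,
  # so only manage it rather than trying to create it
  create_aws_auth_configmap = false
  manage_aws_auth_configmap = true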

@yagehu

yagehu commented Sep 15, 2022

I've encountered and fixed this issue multiple times. This is mainly because we provision with Terraform from a different account. The crux of the issue is:

  1. On creation, EKS only grants cluster access to the IAM entity that created the cluster.
  2. If provisioning cross-account, the entity that created the cluster is the role that was assumed.
  3. You have to use that role to change the aws-auth config map or perform any post-cluster-creation Terraform actions (see the sketch below).

To know definitively what entity it is, check your CloudTrail events and search for event name CreateCluster, for example:

{
    // snip
    "userIdentity": {
        "type": "AssumedRole",
        // snip
        "sessionContext": {
            "sessionIssuer": {
                "type": "Role",
                // snip
                "arn": "arn:aws:iam::0123456789:role/xxxxxx",  <------- This is the role
                // snip
            },
        }
    }
}
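
Once you know which role created the cluster, a minimal sketch of keeping the identities aligned (the role ARN is the placeholder from the CloudTrail event above): assume that same role in the AWS provider and take the token from the data source, so the Kubernetes provider authenticates as the cluster creator.

provider "aws" {
  assume_role {
    # The role that created the cluster, per CloudTrail
    role_arn = "arn:aws:iam::0123456789:role/xxxxxx"
  }
}

data "aws_eks_cluster_auth" "this" {
  name = module.eks.cluster_id
}

provider "kubernetes" {
  host                   = module.eks.cluster_endpoint
  cluster_ca_certificate = base64decode(module.eks.cluster_certificate_authority_data)
  token                  = data.aws_eks_cluster_auth.this.token
}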

@dantman

dantman commented Sep 17, 2022

For users running into this issue and having trouble with the exec provider: also try using aws eks update-kubeconfig ... and connecting to your cluster directly, or just check your aws cli version.

You could be running into issues like aws/aws-cli#6920 which can be caused, among other things, by having v1 of the aws cli installed from an apt package instead of a recent v2 version.

The kubernetes provider doesn't seem to let you know when it couldn't connect because your aws cli is out of date.

@VladoPortos

I finally needed to add this to get it to work:

resource "null_resource" "kubectl-init" {
  provisioner "local-exec" {
    command = "aws eks --region ${var.aws_region} update-kubeconfig --name ${var.cluster_name}"
  }
  depends_on = [module.eks.cluster_id]
}

This needs to run after EKS creation, before module installation.
This will populate the kubeconfig needed by exec later on. After this, it worked OK.

@kingbj940429

kingbj940429 commented Sep 29, 2022

If you've tried all of the above methods and none of them worked, try this.

I had the same problem:

Error: The configmap "aws-auth" does not exist

But I SOLVED it this way!

First, I created an EKS cluster whose API endpoint was PRIVATE.

As the AWS docs say, if the API endpoint is PRIVATE you can only access the EKS cluster API server from within the same VPC.

My company is using a TGW, so I can reach the VPC where the EKS cluster was created.

Anyway, the reason the error occurs is that you can't reach the EKS API server.

Solution

I'll proceed assuming that you are in the same VPC as the EKS cluster.

Screenshot 2022-09-29 7:23:44 PM

Click the additional security group.

Screenshot 2022-09-29 7:25:20 PM

Then add a rule like in the image: allow port 443.

 # Extend cluster security group rules
  cluster_security_group_additional_rules = {
    ingress = {
      description                = "EKS Cluster allows 443 port to get API call"
      type                       = "ingress"
      from_port                  = 443
      to_port                    = 443
      protocol                   = "TCP"
      cidr_blocks                = ["0.0.0.0/0"]
      source_node_security_group = false
    }
  }

Even if the endpoint is private, you can then access the EKS cluster API server and read aws-auth in kube-system.

Opening port 443 allows you to connect to the EKS API server.

Port 443 to the EKS API server was closed, so you couldn't access it, and that is why this error occurred.

To sum up:
Allow port 443 on the EKS cluster security group so you can access the API server.

@dejongm

dejongm commented Oct 7, 2022

We resolved our issue with a depends_on in the aws_eks_cluster and aws_eks_cluster_auth data blocks. I believe the K8s credentials are expiring while the node groups are being created. We are using managed node groups, so our scenario involves setting manage_aws_auth_configmap = true and create_aws_auth_configmap = false. We don't use the terraform-aws-eks module, but our data blocks look like this:

module "eks_node_group" {
...
}

data "aws_eks_cluster" "main" {
  depends_on = [
    module.eks_node_group
  ]
  name = aws_eks_cluster.main.id
}

data "aws_eks_cluster_auth" "main" {
  depends_on = [
    module.eks_node_group
  ]
  name = aws_eks_cluster.main.id
}

provider "kubernetes" {
  host                   = data.aws_eks_cluster.main.endpoint
  cluster_ca_certificate = base64decode(data.aws_eks_cluster.main.certificate_authority[0].data)
  token                  = data.aws_eks_cluster_auth.main.token
}


provider "helm" {
  kubernetes {
    host                   = data.aws_eks_cluster.main.endpoint
    cluster_ca_certificate = base64decode(aws_eks_cluster.main.certificate_authority[0].data)
    token                  = data.aws_eks_cluster_auth.main.token
  }
}

@tulanian

tulanian commented Oct 12, 2022

UPDATE: This works great for standing up a cluster, but breaks sometime later if you try another terraform apply. Removing the depends_on fixes that.

Similar to this comment, we solved this by adding dependencies. The data sources aren't populated until after all of the node groups have been created, so the token is fresher.

data "aws_eks_cluster" "this" {
  name = module.eks.cluster_id
  depends_on = [
    module.eks.eks_managed_node_groups,
  ]
}

data "aws_eks_cluster_auth" "this" {
  name = module.eks.cluster_id
  depends_on = [
    module.eks.eks_managed_node_groups,
  ]
}

provider "kubernetes" {
  host                   = data.aws_eks_cluster.this.endpoint
  token                  = data.aws_eks_cluster_auth.this.token
  cluster_ca_certificate = base64decode(data.aws_eks_cluster.this.certificate_authority.0.data)
}

@gartemiev

I've faced the same issue. However, I had an outdated aws cli version and aws-iam-authenticator.
Once I updated the aws cli and aws-iam-authenticator along with the kube config, the issue was resolved.

@github-actions

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues. If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Dec 10, 2022