Multiple security groups are created with the kubernetes.io/cluster tag #1986

Closed
1 task done
blakepettersson opened this issue Apr 4, 2022 · 28 comments


@blakepettersson

Description

The issue is exactly the same as #1810, but for EKS managed node groups, with the same workaround (and presumably the same fix).

  • ✋ I have searched the open/closed issues and my issue is not listed.

Versions

  • Module version: 18.10.1

  • Terraform version:

Terraform v1.1.6
on linux_amd64
+ provider registry.terraform.io/gavinbunney/kubectl v1.14.0
+ provider registry.terraform.io/hashicorp/aws v3.72.0
+ provider registry.terraform.io/hashicorp/cloudinit v2.2.0
+ provider registry.terraform.io/hashicorp/helm v2.2.0
+ provider registry.terraform.io/hashicorp/kubernetes v2.3.2
+ provider registry.terraform.io/hashicorp/local v2.2.2
+ provider registry.terraform.io/hashicorp/tls v3.1.0
+ provider registry.terraform.io/oboukili/argocd v2.2.8
+ provider registry.terraform.io/terraform-aws-modules/http v2.4.1

Your version of Terraform is out of date! The latest version is 1.1.7. You can update by downloading from https://www.terraform.io/downloads.html

Reproduction Code [Required]

Steps to reproduce the behaviour:

Create a cluster with this module, using default values and containing at least one eks_managed_node_group

Possible workaround:

Disable creation of either the node group security group or the shared node security group through module variables, and create a separate security group outside this module (see the sketch below).
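
A minimal, untested sketch of that workaround (assuming the module's create_node_security_group input and a separately managed aws_security_group; all names and values below are illustrative, not taken from the report):

locals {
  cluster_name = "test"        # illustrative
  vpc_id       = "a vpc id"    # replace with a real VPC ID
  subnet_ids   = ["subnet-1"]  # replace with real subnet IDs
}

# Externally managed node security group; only this group carries the
# cluster ownership tag.
resource "aws_security_group" "custom_node" {
  name_prefix = "${local.cluster_name}-node-"
  vpc_id      = local.vpc_id

  tags = {
    "kubernetes.io/cluster/${local.cluster_name}" = "owned"
  }
}

module "eks" {
  source  = "terraform-aws-modules/eks/aws"
  version = "18.10.1"

  cluster_name    = local.cluster_name
  cluster_version = "1.21"
  vpc_id          = local.vpc_id
  subnet_ids      = local.subnet_ids

  # Skip the module-managed shared node security group
  create_node_security_group = false

  eks_managed_node_group_defaults = {
    # Attach the externally managed group instead
    vpc_security_group_ids = [aws_security_group.custom_node.id]
  }

  eks_managed_node_groups = {
    default = {
      instance_types = ["t3.medium"]
      min_size       = 1
      max_size       = 3
      desired_size   = 1
    }
  }
}
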
@bryantbiggs
Member

Hi @blakepettersson - the EKS managed node group sub-module security group does not currently have these tags.

@blakepettersson
Author

@bryantbiggs so it seems 🤦. The issue in the case of EKS managed node groups seems to be that EKS autogenerates a security group that already has the kubernetes.io/cluster/${var.cluster_name} tag set, which conflicts with the tag set by aws_security_group.node. I'll set up a proper repro for this.

@blakepettersson
Author

blakepettersson commented Apr 5, 2022

Here's a minimal repro:

locals {
  vpc_id          = "a vpc id"
  aws_region      = data.aws_region.current.name
  current_account = data.aws_caller_identity.current.account_id
}

provider "aws" {
  region = "eu-north-1"
}

data "aws_region" "current" {}

data "aws_caller_identity" "current" {}

data "aws_subnets" "all" {
  filter {
    name   = "vpc-id"
    values = [local.vpc_id]
  }
}

module "eks" {
  source                          = "terraform-aws-modules/eks/aws"
  version                         = "18.10.1"
  cluster_name                    = "test"
  cluster_version                 = "1.21"
  cluster_endpoint_private_access = true
  cluster_endpoint_public_access  = true
  vpc_id                          = local.vpc_id
  subnet_ids                      = data.aws_subnets.all.ids
  eks_managed_node_groups = {
    default = {
      medium = {
        instance_types = ["t3.medium"]
        desired_size   = 3
        max_size       = 10
        min_size       = 3
        disk_type      = "gp3"
      }
    }
  }
  enable_irsa = true
}

Running terraform apply will create 3 security groups, 2 of which are created by this module and 1 by EKS itself.

(Screenshots of the two tagged security groups in the AWS console.)

Both of these security groups have the tag kubernetes.io/cluster/test: owned.
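
One way to confirm the duplication from Terraform itself (a sketch using the AWS provider's aws_security_groups data source; "test" is the cluster name from the repro above):

# Returns the IDs of every security group in the VPC that carries the
# cluster ownership tag; after the apply above it should list two groups.
data "aws_security_groups" "cluster_owned" {
  tags = {
    "kubernetes.io/cluster/test" = "owned"
  }

  filter {
    name   = "vpc-id"
    values = [local.vpc_id]
  }
}

output "cluster_owned_security_group_ids" {
  value = data.aws_security_groups.cluster_owned.ids
}
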

@leiarenee

leiarenee commented Apr 23, 2022

Modified from examples/complete, removing all external modules and keeping only one inline managed node group.

Node group configuration:

module "eks" {
  source = "terraform-aws-modules/eks/aws"
  version = "~> 18.20.5"
 .
 .
 .

  # EKS Managed Node Group(s)

  eks_managed_node_group_defaults = {
    ami_type       = "AL2_x86_64"
    disk_size      = 10
    instance_types = ["t2.small", "t3.small", "t3a.small"]

    attach_cluster_primary_security_group = true
    vpc_security_group_ids                = [aws_security_group.additional.id]
  }

  eks_managed_node_groups = {
    green = {
      min_size     = 1
      max_size     = 10
      desired_size = 1

      instance_types = ["t3.small"]
      capacity_type  = "SPOT"
      labels = {
        Environment = "test"
        GithubRepo  = "terraform-aws-eks"
        GithubOrg   = "terraform-aws-modules"
      }

      taints = {
        # dedicated = {
        #   key    = "dedicated"
        #   value  = "gpuGroup"
        #   effect = "NO_SCHEDULE"
        # }
      }

      update_config = {
        max_unavailable_percentage = 50 # or set `max_unavailable`
      }

      tags = {
        ExtraTag = "example"
      }
    }
  }
}

kubectl describe service

Events:
  Type     Reason                  Age                  From                Message
  ----     ------                  ----                 ----                -------
  Normal   EnsuringLoadBalancer    97s (x6 over 4m16s)  service-controller  Ensuring load balancer
  Warning  SyncLoadBalancerFailed  96s (x6 over 4m14s)  service-controller  Error syncing load balancer: failed to ensure load balancer: Multiple tagged security groups found for instance i-07d843b5953ca7412; ensure only the k8s security group is tagged; the tagged groups were sg-0bc8599e71f8a54a9(my-testing-k8s-node-20220419213507462100000009) sg-0b189fb8969ff5bd4(eks-cluster-sg-my-testing-k8s-987542792)

This workaround solves the problem. Pass it to the module "eks":

  # Temp workaround for bug: double "owned" tag
  # https://github.com/terraform-aws-modules/terraform-aws-eks/issues/1810
  node_security_group_tags = {
    "kubernetes.io/cluster/${local.name}" = null
  }

Notice that the taint configuration is also disabled, since it caused the scheduler to reject the CoreDNS deployment.

      taints = {
        # dedicated = {
        #   key    = "dedicated"
        #   value  = "gpuGroup"
        #   effect = "NO_SCHEDULE"
        # }
      }

@gohmc

gohmc commented Apr 23, 2022

@bryantbiggs so it seems 🤦. The issue in the case of EKS managed node groups seems to be that EKS autogenerates a security group that already has the kubernetes.io/cluster/${var.cluster_name} tag set, which conflicts with the tag set by aws_security_group.node. I'll set up a proper repro for this.

"kubernetes.io/cluster/${var.cluster_name}" = "owned"

@kaykhancheckpoint

I'm also experiencing an issue with this after installing the aws-load-balancer-controller. I get the following error from it:

{"level":"error","ts":1651137191.6512694,"logger":"controller-runtime.manager.controller.targetGroupBinding","msg":"Reconciler error","reconciler group":"elbv2.k8s.aws","reconciler kind":"TargetGroupBinding","name":"k8s-echoserv-echoserv-aaf854977c","namespace":"echoserver","error":"expect exactly one securityGroup tagged with kubernetes.io/cluster/geeiq-test-k8s for eni eni-0605625aa92fe670e, got: [sg-0cc7f9db258d4e3f8 sg-0e95a58276d2c4ae5]"}

After checking the console, I can see I have two security groups with the same tag "kubernetes.io/cluster/<cluster_name>" = "owned".

I'm also using EKS managed node groups.

@diginc

diginc commented Apr 28, 2022

@kaykhancheckpoint check the last couple of posts for the workaround from leiarenee.

@leiarenee

leiarenee commented Apr 28, 2022

This workaround solved it for me (#1810). (Repeating it here for convenience.)

Module 'eks'

  node_security_group_tags = {
    "kubernetes.io/cluster/${local.name}" = null
  }

@kaykhancheckpoint

@diginc @leiarenee yup, thank you

@github-actions

This issue has been automatically marked as stale because it has been open 30 days
with no activity. Remove stale label or comment or this issue will be closed in 10 days

@rcmurphy

rcmurphy commented Jun 3, 2022

This workaround did not work for us.

@bryantbiggs
Member

@rcmurphy / @mr-kelly can you try:

  cluster_tags = {
    "kubernetes.io/cluster/${local.name}" = null
  }

Where ${local.name} is the name of your cluster

@rcmurphy

rcmurphy commented Jun 3, 2022

I can confirm we've tried that. It did not work.

@bryantbiggs
Member

I can confirm we've tried that. It did not work.

Apologies - does that mean that it does work for your case or not?

@rcmurphy

rcmurphy commented Jun 4, 2022 via email

@mr-kelly

mr-kelly commented Jun 4, 2022

@rcmurphy / @mr-kelly can you try:

  cluster_tags = {
    "kubernetes.io/cluster/${local.name}" = null
  }

Where ${local.name} is the name of your cluster

In my Karpenter example, this doesn't work.

And disabling the security group creation doesn't work either:

  create_node_security_group    = false
  create_cluster_security_group = false

I am still trying....

@mr-kelly

mr-kelly commented Jun 4, 2022

In this official doc:
https://docs.aws.amazon.com/eks/latest/userguide/sec-group-reqs.html

AWS says:

(Screenshot of the security group requirements from the AWS documentation.)

And in my Karpenter example, the tag kubernetes.io/cluster/<cluster-name> = owned appears twice.

@bryantbiggs
Member

After digging in to this, here is what I found:

If you use attach_cluster_primary_security_group = true, then you will want to set

  node_security_group_tags = {
    "kubernetes.io/cluster/${local.name}" = null
  }

Why? Because the EKS service automatically tags the cluster's primary security group that it creates. I tried to use

  cluster_tags = {
    "kubernetes.io/cluster/${local.name}" = null
  }

to "hide" this tag and remove the conflict, but this route does not work. It's safe to say that if you want to attach the cluster's primary security group to your nodes, you need to disable the tag on the shared node security group.

It's not a matter of just having multiple security groups with the tag key:value combo; it's only a concern when multiple security groups with that combo are ATTACHED to your node(s).
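
Put together, the combination described above looks roughly like this (a sketch only; the cluster name, VPC, and subnets are placeholders):

module "eks" {
  source  = "terraform-aws-modules/eks/aws"
  version = "~> 18.20"

  cluster_name = local.name   # your cluster name
  vpc_id       = local.vpc_id
  subnet_ids   = local.subnet_ids

  # Attach the EKS-created primary security group to the nodes...
  eks_managed_node_group_defaults = {
    attach_cluster_primary_security_group = true
  }

  # ...and drop the ownership tag from the module's shared node security
  # group, so only one attached security group carries it.
  node_security_group_tags = {
    "kubernetes.io/cluster/${local.name}" = null
  }

  eks_managed_node_groups = {
    default = {
      min_size     = 1
      max_size     = 3
      desired_size = 1
    }
  }
}
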

You can refer to this example, which did not encounter any issues provisioning the ALB controller chart: https://github.com/clowdhaus/eks-reference-architecture/tree/main/karpenter/us-east-1

Let me know if there is anything else I missed or any other questions pertaining to this issue.

@axkng

axkng commented Jun 21, 2022

If I understand you correctly @bryantbiggs, EKS automatically tags the cluster's primary security group with "kubernetes.io/cluster/$clustername" = owned.
And that would mean you basically have to use attach_cluster_primary_security_group = true, because the cluster's primary security group will always have the problematic tag, since you state that

  cluster_tags = {
    "kubernetes.io/cluster/${local.name}" = null
  }

does not work.

An alternative would be to set create_security_group = false in the node group, which seems like the equivalent of setting attach_cluster_primary_security_group = true.

Do I understand this correctly?

@bryantbiggs
Member

If I understand you correctly @bryantbiggs, EKS automatically tags the cluster's primary security group with "kubernetes.io/cluster/$clustername" = owned. And that would mean you basically have to use attach_cluster_primary_security_group = true, because the cluster's primary security group will always have the problematic tag, since you state that

  cluster_tags = {
    "kubernetes.io/cluster/${local.name}" = null
  }

does not work.

An alternative would be to set create_security_group = false in the node group, which seems like the equivalent of setting attach_cluster_primary_security_group = true.

Do I understand this correctly?

No - the security group and tag in question only come into play when they are attached to nodes. So you can have 40+ security groups in your account with this tag without issue, but as soon as you attach more than one of those security groups to node(s) in your cluster, then it's a conflict.

@axkng

axkng commented Jun 21, 2022

That is very interesting.
My clusters were built without attach_cluster_primary_security_group = true.
Yet the aws-lb-controller complains as soon as I add the tag to any second security group.

@ashishjullia

Hi @kaykhancheckpoint, how did you actually deploy and enable the "load balancer controller"?

@tareks

tareks commented Jul 6, 2022

I'm running into what appears to be a similar issue; I think I've been able to work around it by using create_cluster_primary_security_group_tags = false.

The problem appears to be that there is more than one security group tagged with "kubernetes.io/cluster/${var.cluster_name}" = "owned". Setting that variable to false seems to at least stop Karpenter nodes from picking up the additional security group, which stops things like the aws-lb-controller from complaining about ENIs with multiple tagged security groups.

This may be a clue when running an apply:

module.eks.aws_ec2_tag.cluster_primary_security_group["karpenter.sh/discovery"] will be destroyed

Note that existing Karpenter instances need to be replaced for this tag change to take effect.
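
For reference, that workaround is just an extra module input (a sketch; the rest of the cluster configuration is assumed to stay as it already is):

module "eks" {
  source  = "terraform-aws-modules/eks/aws"
  version = "~> 18.0"

  # ... existing cluster_name, vpc_id, subnet_ids, node groups, etc. ...

  # Do not copy the module's tags (e.g. karpenter.sh/discovery) onto the
  # EKS-created primary security group, so Karpenter-launched nodes stop
  # picking up that additional tagged group.
  create_cluster_primary_security_group_tags = false
}
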

@github-actions

github-actions bot commented Aug 6, 2022

This issue has been automatically marked as stale because it has been open 30 days
with no activity. Remove stale label or comment or this issue will be closed in 10 days

@github-actions github-actions bot added the stale label Aug 6, 2022
@github-actions

This issue was automatically closed because of stale in 10 days

@robertlagrant

I think this issue has probably been closed prematurely; I just hit it as well with this template.

@uwburn

uwburn commented Sep 1, 2022

+1

Stumbled over this as well with 18.26.6, using attach_cluster_primary_security_group = true.

The result was that the ELB could not be provisioned, since two security groups had "kubernetes.io/cluster/<cluster_name>" = "owned".

I'm now using create_cluster_primary_security_group_tags = false, but honestly at the moment I'm not able to say whether this is a proper setup or just a workaround.

@github-actions

github-actions bot commented Nov 8, 2022

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues. If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Nov 8, 2022