
fix: Forces cluster outputs to wait until access entries are complete #3000

Conversation

@lorengordon (Contributor) commented Apr 8, 2024

Description

This changeset adds depends_on to a few cluster outputs. This tells Terraform that these cluster outputs should not become "known" and available until all of the EKS access entries have been created. The outputs selected in this PR are limited to those used for authenticating to the EKS cluster. If it turns out more are needed, it is easy enough to add them, and I am not aware of any further repercussions to this particular usage of depends_on.
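For illustration, a minimal sketch of the approach (output and resource names here follow the module's conventions, but the exact outputs touched by this PR may differ):

output "cluster_endpoint" {
  description = "Endpoint for your Kubernetes API server"
  value       = try(aws_eks_cluster.this[0].endpoint, null)

  # Keep this output "unknown" until the access entries exist, so downstream
  # providers (helm/kubectl/kubernetes) don't authenticate too early
  depends_on = [
    aws_eks_access_entry.this,
    aws_eks_access_policy_association.this,
  ]
}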

Motivation and Context

Without this change, it can be difficult to order the Terraform lifecycle on create and destroy when also using the helm, kubectl, and/or kubernetes providers; it requires terraform state rm and -target to get things to complete successfully. With this change, you can just apply and destroy normally. See also the comment and issue thread here:

#2923 (comment)

Breaking Changes

None

How Has This Been Tested?

  • I have updated at least one of the examples/* to demonstrate and validate my change(s)
  • I have tested and validated these changes using one or more of the provided examples/* projects
  • I have executed pre-commit run -a on my pull request

I also ran terraform apply -auto-approve && terraform destroy -auto-approve on the "complete" example in the blueprints-addons project, three times successfully, without using any state rm or -target steps.

@lorengordon (Contributor, Author) commented Apr 8, 2024

Bloody pre-commit hooks, I hates them so much. Fine, hold on, waiting on a test of the change to finish up (EKS is so slow!)...

Edit: Ok, so it was just the pr title lol...

@lorengordon changed the title from "Forces cluster outputs to wait until access entries are complete" to "fix: Forces cluster outputs to wait until access entries are complete" on Apr 8, 2024
@bryantbiggs (Member) commented:

Could you also remove these commands from the example READMEs, since they aren't needed anymore (thank goodness)?

# Necessary to avoid removing Terraform's permissions too soon before its finished
# cleaning up the resources it deployed inside the cluster
terraform state rm 'module.eks.aws_eks_access_entry.this["cluster_creator"]' || true
terraform state rm 'module.eks.aws_eks_access_policy_association.this["cluster_creator_admin"]' || true

@bryantbiggs (Member) left a review comment:

thank you!

@bryantbiggs merged commit e2a39c0 into terraform-aws-modules:master on Apr 8, 2024
19 checks passed
antonbabenko pushed a commit that referenced this pull request Apr 8, 2024
## [20.8.5](v20.8.4...v20.8.5) (2024-04-08)

### Bug Fixes

* Forces cluster outputs to wait until access entries are complete ([#3000](#3000)) ([e2a39c0](e2a39c0))
@antonbabenko (Member) commented:

This PR is included in version 20.8.5 🎉

@lorengordon deleted the outputs-with-dependencies branch on April 8, 2024
@llavaud commented Apr 10, 2024

It seems permissions are not completely propagated when Terraform says "Creation complete"; I had to set an explicit dependency on the EKS module so that Kubernetes resources are only created after the 30s wait time...

@lorengordon (Contributor, Author) commented Apr 10, 2024

@llavaud I had seen some funny business on apply when using this eks module in conjunction with the eks-blueprints-addons module, specifically with the helm_release resources. I'm not sure the cause is the same problem this patch fixed, though. I think it might be a separate issue, perhaps pre-existing, perhaps revealed by fixing this one: destroy is now clean, but apply sometimes requires a second try, though not consistently. Some kind of race condition, maybe. Or perhaps the helm provider is mishandling its own attributes and authenticating way too early? Or perhaps some attribute is not being threaded through the config as needed to order things correctly.

Module-level depends_on should be avoided at all costs, since it causes everything that depends on the module to be re-evaluated (and potentially recreated) on any change to the module. Better to just apply twice until we figure out what is really happening...
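For context, a minimal sketch of the module-level depends_on pattern being discouraged here (the addons module name is purely illustrative):

module "kubernetes_addons" {
  source = "./modules/addons"   # hypothetical module that deploys in-cluster resources

  cluster_name = module.eks.cluster_name

  # A module-level depends_on ties every resource and data source in this module
  # to all of module.eks, deferring reads until apply time and churning the plan
  # on any change to the EKS module.
  depends_on = [module.eks]
}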

@bryantbiggs (Member) commented:

Access entry creation is asynchronous and therefore should be treated as eventually consistent. However, Terraform operates in a very synchronous manner. Deploying applications via Terraform is not recommended, but the second-best alternative is for the providers to implement retries, similar to the kubectl provider.

@lorengordon (Contributor, Author) commented:

@bryantbiggs Any idea if there is an issue for that on the helm provider? Do you have a ref to how the kubectl provider is doing it?

@bryantbiggs (Member) commented:

Most likely, and here is the argument: https://registry.terraform.io/providers/gavinbunney/kubectl/latest/docs#apply_retry_count

@lorengordon (Contributor, Author) commented:

Ahhh, that's what that argument is doing for me (in this config I inherited)! 😁

@llavaud commented Apr 10, 2024

@lorengordon I just create some resources using the kubernetes provider.
@bryantbiggs That makes sense and explains my issue, thanks!

@lorengordon (Contributor, Author) commented:

Can the kubectl provider apply helm charts? Hmm...

@lorengordon (Contributor, Author) commented:

Here's an issue to follow on retry support in the helm provider... hashicorp/terraform-provider-helm#1058

@lorengordon (Contributor, Author) commented:

Ok, I think the problem is that the cluster outputs are still becoming available before the fargate and node group resources are complete, so providers that deploy resources onto those node groups are starting way too early. I've got a branch I'm testing that just adds those (sub)modules to the depends_on of the same outputs we modified in this PR. If that works cleanly a few times, I'll open a PR. Probably get to it Monday...

@bryantbiggs (Member) commented:

> becoming available before fargate and node group resources are complete

That's fine - Kubernetes handles that. Terraform, however, does not when you have things like wait = true on the Helm resource. We most likely will not accept changes related to this because:

  1. Hashicorp advises against chaining providers
  2. Terraform is not an app deployment tool and will always fall short here
  3. Too many eggs in one basket (cluster and apps) is not recommended - large blast radius and non-ideal workflow

@lorengordon (Contributor, Author) commented Apr 11, 2024

That's all well and good, and technically correct I suppose; hard to argue with. However, the eks-blueprints project, and the examples in this project, both really need to discuss how to actually do that then. It feels like there are a lot of configs that are really misleading about how they work and whether they're even a good idea. Or we could just fix it so it actually works.

Anyway, while my other hypothesis may be one problem, there is another race condition causing the node group to be created before the ENIConfig is ready, which results in this error in the EKS console:

container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: cni plugin not initialized

And this error in /var/log/aws-routed-eni/ipamd.log:

{"level":"info","ts":"2024-04-11T23:03:08.919Z","caller":"ipamd/ipamd.go:848","msg":"Found ENI Config Name: us-east-1b"}
{"level":"error","ts":"2024-04-11T23:03:09.020Z","caller":"ipamd/ipamd.go:848","msg":"error while retrieving eniconfig: ENIConfig.crd.k8s.amazonaws.com \"us-east-1b\" not found"}
{"level":"error","ts":"2024-04-11T23:03:09.020Z","caller":"ipamd/ipamd.go:824","msg":"Failed to get pod ENI config"}
{"level":"debug","ts":"2024-04-11T23:03:09.020Z","caller":"ipamd/ipamd.go:566","msg":"Error trying to allocate ENI: eniconfig: eniconfig is not available"}
{"level":"error","ts":"2024-04-11T23:03:09.020Z","caller":"aws-k8s-agent/main.go:42","msg":"Initialization failure: Failed to attach any ENIs for custom networking"}

That's even with the vpc-cni configured for "before_compute"...

  cluster_addons = {
    vpc-cni = {
      addon_version  = "v1.18.0-eksbuild.1"
      before_compute = true

      configuration_values = jsonencode({
        env = {
          # Reference docs https://docs.aws.amazon.com/eks/latest/userguide/cni-increase-ip-addresses.html
          ENABLE_PREFIX_DELEGATION = "true"
          WARM_PREFIX_TARGET       = "1"
        }
      })
    }
  }

@bryantbiggs
Copy link
Member

If you are using the ENIConfig for secondary CIDR, we just shipped https://aws.amazon.com/about-aws/whats-new/2024/04/amazon-vpc-cni-automatic-subnet-discovery/ which I believe removes the need to set/use the ENIConfig

@lorengordon
Copy link
Contributor Author

I'm reading that article, and the blog post, and trying to understand how to make it actionable in the context of this module, but I think I'm really too green with EKS/k8s to understand what it's saying. I inherited this EKS config, and I'm just trying to make it work more reliably. I know terraform really quite well, and know how to manipulate the DAG to make these implicit dependencies more explicit. But EKS/k8s I do not know at all. :(

@bryantbiggs
Copy link
Member

tl;dr - before you might create ENIConfigs like:

apiVersion: crd.k8s.amazonaws.com/v1alpha1
kind: ENIConfig
metadata: 
  name: $az_1
spec: 
  securityGroups: 
    - $cluster_security_group_id
  subnet: $new_subnet_id_1

But now, all you need to do is tag your secondary subnets with:

  tags = {
    "kubernetes.io/role/cni" = 1
  }

And the VPC CNI will discover these subnets and start utilizing them.
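In Terraform, that tag just needs to land on the secondary (pod) subnets. A hedged sketch, assuming the terraform-aws-modules/vpc module is in use and that the secondary-CIDR subnets are its "intra" subnets (that mapping, and all values below, are assumptions for illustration only):

module "vpc" {
  source  = "terraform-aws-modules/vpc/aws"
  version = "~> 5.0"

  name = "example"
  cidr = "10.0.0.0/16"

  secondary_cidr_blocks = ["100.64.0.0/16"]   # illustrative secondary CIDR for pods

  azs             = ["us-east-1a", "us-east-1b"]
  private_subnets = ["10.0.1.0/24", "10.0.2.0/24"]
  intra_subnets   = ["100.64.0.0/18", "100.64.64.0/18"]

  # The VPC CNI subnet discovery feature finds subnets via this tag
  intra_subnet_tags = {
    "kubernetes.io/role/cni" = 1
  }
}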

@lorengordon (Contributor, Author) commented:

Aha, that was the clue I needed. Ok, thanks @bryantbiggs, sorry I was being so dense! Now I see what is causing the dependency ordering issue. The config I inherited uses the kubectl provider to create that ENIConfig, and the change I was proposing prevents it from starting until after the node group is created; but the node group needs that resource to exist before it starts, in order to complete successfully. Something like this, then:

  1. Create Cluster (aws)
  2. Create CNI addon (aws)
  3. Create ENIconfig (kubectl)
  4. Create Node Group (aws)
  5. Create everything else, including remaining addons and helm releases from the blueprints-addons module (aws/helm/kubernetes)

I'll see what I can do using the terraform_data resource in my own config to create the dependencies on the provider config inputs, instead of modifying the output dependencies here. I'm pretty sure I can force it to order everything correctly. The only explicit dependency missing then would be some way to create the graph edge from the kubectl resource to the node group resource. It seems currently that is handled just with a waiter (dataplane_wait_duration), rather than an attribute of some kind. But I think I have enough to get myself unblocked at least!

@bryantbiggs (Member) commented:

Just to be clear, you do not need the ENIconfig if you use the recently released subnet discoverability feature as defined in this and its subsequent links

@lorengordon (Contributor, Author) commented Apr 15, 2024

Yep understood. It may be a little academic at this point, but since I thought of it now I just want to see if I can influence the timing of the providers and their resources using terraform_data lol. And perhaps I can also create an explicit graph edge for the node group by threading a tag value from the kubectl resource to the node group tags...

@lorengordon (Contributor, Author) commented:

Nifty, yeah this seems to work just fine... For the kubectl provider, just use the module outputs, since they will be available before the node group. For the other providers, thread the inputs through terraform_data in a way that links them to the compute option used in the config (or just use all of them, as I did below). Then all resources created through those providers will be created after the node group is ready. And if you need some things to kick off before compute and others after compute for the same provider type, just use two providers with a provider alias.

# Use terraform_data inputs to create resources after compute
provider "kubernetes" {
  host                   = terraform_data.eks_cluster_after_compute.input.cluster_endpoint
  cluster_ca_certificate = base64decode(terraform_data.eks_cluster_after_compute.input.cluster_certificate_authority_data)
  token                  = data.aws_eks_cluster_auth.this.token
}

# Use terraform_data inputs to create resources after compute
provider "helm" {
  kubernetes {
    host                   = terraform_data.eks_cluster_after_compute.input.cluster_endpoint
    cluster_ca_certificate = base64decode(terraform_data.eks_cluster_after_compute.input.cluster_certificate_authority_data)
    token                  = data.aws_eks_cluster_auth.this.token
  }
}

# Use module outputs directly to create resources before compute
provider "kubectl" {
  apply_retry_count      = 10
  host                   = module.eks.cluster_endpoint
  cluster_ca_certificate = base64decode(module.eks.cluster_certificate_authority_data)
  load_config_file       = false
  token                  = data.aws_eks_cluster_auth.this.token
}

# Force a dependency between the EKS cluster auth inputs, and the compute resources
resource "terraform_data" "eks_cluster_after_compute" {
  input = {
    cluster_endpoint                   = module.eks.cluster_endpoint
    cluster_certificate_authority_data = module.eks.cluster_certificate_authority_data

    fargate_profiles         = module.eks.fargate_profiles
    eks_managed_node_groups  = module.eks.eks_managed_node_groups
    self_managed_node_groups = module.eks.self_managed_node_groups
  }
}
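(For completeness: the provider blocks above reference an EKS auth token data source that isn't shown in this snippet; it's assumed to look something like the following.)

data "aws_eks_cluster_auth" "this" {
  name = module.eks.cluster_name
}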

And it does actually work to thread tags back through from the kubectl_manifest to the node group. This creates the graph edge between the kubectl resource and the node group, so Terraform can order actions correctly.

module "eks" {
  source  = "terraform-aws-modules/eks/aws"
  version = "~> 20.8.5"
  # ...
  eks_managed_node_groups = {
    foo = {
      # ...
      tags = merge(
        { for label, config in kubectl_manifest.eni_config : "eni-config/${label}" => config.id },
        local.tags
      )
    }
  }
}

Though, for the moment, I chose not to thread the tags through that way because of the potential for recursive references, instead just relying on dataplane_wait_duration. If I really wanted to establish this graph edge, I would probably not use the top-level node-group/compute options in the module, and instead call the node-group submodule separately. That makes the dependencies easier to visualize and manipulate.
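A rough sketch of that alternative, assuming the eks-managed-node-group submodule (most inputs are elided or illustrative; the explicit depends_on is the point):

module "node_group" {
  source  = "terraform-aws-modules/eks/aws//modules/eks-managed-node-group"
  version = "~> 20.8"

  name            = "foo"
  cluster_name    = module.eks.cluster_name
  cluster_version = module.eks.cluster_version
  subnet_ids      = module.vpc.private_subnets   # hypothetical VPC module reference

  # ... other required inputs elided for brevity

  # Explicit graph edge: the node group waits for the ENIConfig manifests
  depends_on = [kubectl_manifest.eni_config]
}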


I'm going to lock this pull request because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues. If you have found a problem that seems related to this change, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions bot locked as resolved and limited conversation to collaborators on May 17, 2024

Successfully merging this pull request may close these issues.

Access Entries are deleted before Helm charts so helm charts stays orphan in the terraform state