null_resource.wait_for_cluster can run for hours if the endpoint is resolve-able but not accessible #39

Closed
gwvandesteeg opened this issue Mar 10, 2021 · 3 comments


@gwvandesteeg

Description

The null_resource wait_for_cluster runs the script supplied via the input variable wait_for_cluster_cmd, which by default is a for loop.
The loop calls wget and curl without specifying a maximum timeout on their operations. If the cluster endpoint is DNS resolvable but not accessible (say it resolves to a private IP address because the cluster was configured with a private-only endpoint), the command will run for.. quite some time. The loop iterates a maximum of 60 times, and each command relies on the default timeouts for wget (900 seconds) and curl (3600 seconds), so the resource can sit there for upwards of 60 hours waiting for the cluster to become reachable. The commands should be altered to include an explicit maximum timeout so the resource doesn't sit there attempting to accelerate the heat death of the universe.

For wget this is achieved with the -T 60 (--timeout=60) CLI option, and for curl with the --max-time 60 CLI option. (wget's lowercase -t option sets the number of tries, not a timeout.)
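
A minimal sketch of what a bounded wait could look like, assuming a POSIX shell. The function names `worst_case_seconds` and `wait_for` are hypothetical helpers made up for illustration, and the `$ENDPOINT/healthz` URL and the 5 second pause in the usage comments are assumptions too. This is not the module's actual default command.

```shell
#!/bin/sh
# Hypothetical helpers for illustration only; not part of the EKS module.

# Worst-case wall time of a retry loop: N attempts, each capped at
# T seconds, with S seconds of sleep after every failed attempt.
worst_case_seconds() {
  attempts=$1; per_try=$2; pause=$3
  echo $(( attempts * (per_try + pause) ))
}

# Run a probe command up to $1 times, sleeping $2 seconds after each
# failure. Returns 0 on the first success, 1 if every attempt fails.
wait_for() {
  attempts=$1; pause=$2; shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    sleep "$pause"
    i=$(( i + 1 ))
  done
  return 1
}

# Example probes with an explicit per-attempt cap (assumed endpoint path):
#   wait_for 60 5 curl --silent --fail --insecure --max-time 60 \
#     --output /dev/null "$ENDPOINT/healthz"
#   wait_for 60 5 wget --quiet --no-check-certificate --timeout=60 \
#     --tries=1 -O /dev/null "$ENDPOINT/healthz"
```

With a 60 second cap and a 5 second pause, `worst_case_seconds 60 60 5` prints 3900 (65 minutes), while a 3600 second per-attempt timeout gives `worst_case_seconds 60 3600 5` = 216300 (about 60 hours). Passing `--tries=1` to wget keeps the retrying in the outer loop, so wget's own retry logic doesn't multiply the wait.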

Versions

  • Terraform: 0.14.7

Reproduction

Steps to reproduce the behavior:

  • create a VPC with an RFC1918 address block
  • create the minimal EKS cluster as per the example and disable the public endpoint

Code Snippet to Reproduce

Expected behavior

The null_resource creation should fail after 60 (ish) minutes (or less if desired)

Actual behavior

The null_resource continues to try for hours if left to it

Terminal Output Screenshot(s)

module.eks.null_resource.wait_for_cluster: Still creating... [56m40s elapsed]
module.eks.null_resource.wait_for_cluster: Still creating... [56m50s elapsed]
module.eks.null_resource.wait_for_cluster: Still creating... [57m0s elapsed]
module.eks.null_resource.wait_for_cluster: Still creating... [57m10s elapsed]
module.eks.null_resource.wait_for_cluster: Still creating... [57m20s elapsed]

Additional context

@tfhartmann
Collaborator

@gwvandesteeg I think this issue may have been submitted to the wrong repo/module. Was the issue you were having on the Transit Gateway module, or the EKS module?

@gwvandesteeg
Author

Ah yes, correct.. looks like the wrong module. Definitely the EKS module.

@github-actions

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues. If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Oct 28, 2022