null_resource.wait_for_cluster can run for hours if the endpoint is resolve-able but not accessible #39

gwvandesteeg · 2021-03-10T13:05:14Z

Description

The null_resource wait_for_cluster uses a for loop in the script provided by default via the input variable wait_for_cluster_cmd.
The for loop calls wget and curl without specifying maximum timeouts on their operations. If the cluster endpoint is DNS resolve-able but not accessible (for example, it resolves to a private IP address because the cluster was configured with a private-only endpoint), the command will run for.. quite some time. Since the for loop iterates a maximum of 60 times, and each command relies on the default timeouts for wget (900 seconds) and curl (3600 seconds), the resource can sit there for upwards of 60+ hours waiting to be created. The commands should be altered to include an explicit maximum timeout period to ensure it doesn't sit there attempting to accelerate the heat death of the universe.

For wget this is achieved by adding the -T 60 CLI option (--timeout; note that lowercase -t sets the number of retries, not a timeout), and for curl the --max-time 60 CLI option.
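
As a rough illustration of the bounded loop described above, here is a minimal sketch. It is not the module's actual wait_for_cluster_cmd: the function name wait_for_endpoint, its arguments, and the /healthz path are assumptions made for the example. It uses wget's -T (timeout) with -t 1 (a single try per iteration) and curl's --max-time.

```shell
#!/bin/sh
# Hypothetical helper (not taken from the module): poll an endpoint with
# bounded requests. All argument names are illustrative.
#   $1 endpoint URL
#   $2 maximum attempts (default 60)
#   $3 per-request timeout in seconds (default 60)
#   $4 pause between attempts in seconds (default 5)
wait_for_endpoint() {
  endpoint="$1"
  attempts="${2:-60}"
  timeout="${3:-60}"
  pause="${4:-5}"
  i=1
  while [ "$i" -le "$attempts" ]; do
    if command -v wget >/dev/null 2>&1; then
      # -T bounds the DNS, connect and read phases; -t 1 turns off wget's own retries
      wget --no-check-certificate -q -T "$timeout" -t 1 -O /dev/null "$endpoint/healthz" && return 0
    else
      # --max-time bounds the whole transfer, including the connect phase
      curl -k -s --max-time "$timeout" -o /dev/null "$endpoint/healthz" && return 0
    fi
    sleep "$pause"
    i=$((i + 1))
  done
  echo "TIMEOUT" >&2
  return 1
}
```

Called as wait_for_endpoint "$ENDPOINT" 60 60 5 (with ENDPOINT assumed to hold the cluster URL), the worst case is roughly 60 × (60 + 5) seconds, or about 65 minutes, rather than tens of hours. With wget the bound is approximate, since -T applies to each phase of a request separately.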

Versions

Terraform: 0.14.7

Reproduction

Steps to reproduce the behavior:

1. Create a VPC with an RFC1918 address block
2. Create the minimal EKS cluster as per the example and disable the public endpoint

Code Snippet to Reproduce

Expected behavior

The null_resource creation should fail after 60 (ish) minutes (or less if desired)

Actual behavior

The null_resource continues to try for hours if left to it

Terminal Output Screenshot(s)

module.eks.null_resource.wait_for_cluster: Still creating... [56m40s elapsed]
module.eks.null_resource.wait_for_cluster: Still creating... [56m50s elapsed]
module.eks.null_resource.wait_for_cluster: Still creating... [57m0s elapsed]
module.eks.null_resource.wait_for_cluster: Still creating... [57m10s elapsed]
module.eks.null_resource.wait_for_cluster: Still creating... [57m20s elapsed]

Additional context


tfhartmann · 2021-05-14T17:38:17Z

@gwvandesteeg I think this issue may have been submitted to the wrong repo/module. Was the issue you were having on the Transit Gateway module, or the EKS module?

gwvandesteeg · 2021-05-15T11:37:57Z

Ah yes, correct.. looks like the wrong module. Definitely the EKS module.

github-actions · 2022-10-28T02:43:50Z

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues. If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

tfhartmann added wontfix and removed wontfix labels May 14, 2021

tfhartmann closed this as completed May 17, 2021

github-actions bot locked as resolved and limited conversation to collaborators Oct 28, 2022
