Cluster never comes available after moving to 9.0.0 #757

robgeiner · 2020-02-29T14:48:58Z

After moving to 9.0.0, the cluster availability check in null_resource.wait_for_cluster fails to detect when the cluster becomes available. As a result, it spins forever waiting. This appears to be related to #750.

The workaround is to override wait_for_cluster_cmd with the default value from before #750, e.g.:
wait_for_cluster_cmd = "until curl -k -s $ENDPOINT/healthz >/dev/null; do sleep 4; done"
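A variant of this workaround that gives up after a fixed number of attempts, instead of spinning until Terraform times out, could look like the bash sketch below. It keeps the same `curl -k -s` check, so any HTTP response from /healthz counts as "up". The function name `wait_for_healthz` and the `MAX_TRIES`/`INTERVAL` knobs are hypothetical illustrations, not inputs of the module.

```shell
# Hypothetical helper (not part of terraform-aws-eks): poll ENDPOINT/healthz
# with curl like the workaround above, but stop after MAX_TRIES attempts.
wait_for_healthz() {
  local endpoint="$1"
  if [ -z "$endpoint" ]; then
    echo "usage: wait_for_healthz ENDPOINT" >&2
    return 2
  fi
  local max_tries="${MAX_TRIES:-150}"   # assumption: 150 x 4 s is about 10 minutes
  local interval="${INTERVAL:-4}"       # same 4 s pause as the original loop
  local i
  for ((i = 1; i <= max_tries; i++)); do
    # -k skips certificate checks, -s silences progress output. Without
    # --fail, curl exits 0 on any HTTP response, matching the original loop.
    if curl -k -s "$endpoint/healthz" >/dev/null; then
      return 0
    fi
    sleep "$interval"
  done
  echo "TIMEOUT: $endpoint/healthz did not answer after $max_tries tries" >&2
  return 1
}
```

After sourcing it, a call such as `MAX_TRIES=75 wait_for_healthz "https://<cluster-endpoint>"` returns 0 once the endpoint answers and 1 on timeout, so a failing cluster surfaces as an error rather than an hour of "Still creating...".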

I'm submitting a...

[x] bug report
[ ] feature request
[ ] support request - read the FAQ first!
[ ] kudos, thank you, warm fuzzy

What is the current behavior?

module.eks.module.cluster.null_resource.wait_for_cluster[0]: Still creating... [20s elapsed]
module.eks.module.cluster.null_resource.wait_for_cluster[0]: Still creating... [30s elapsed]
...
module.eks.module.cluster.null_resource.wait_for_cluster[0]: Still creating... [1h0m42s elapsed]
module.eks.module.cluster.null_resource.wait_for_cluster[0]: Still creating... [1h0m52s elapsed]
Eventually it times out.

If this is a bug, how to reproduce? Please include a code sample if relevant.

module "eks" {
  source                        = "terraform-aws-modules/eks/aws"
  version                       = "9.0.0"
  cluster_name                  = local.eks_cluster_name
  cluster_version               = var.eks_k8s_version
  subnets                       = var.private_subnet_ids
  vpc_id                        = var.vpc_id
  enable_irsa                   = true
  tags                          = merge(var.eks_tags, local.env_tags)
  cluster_enabled_log_types     = var.cluster_enabled_log_types
  cluster_log_retention_in_days = var.cluster_log_retention_in_days
  workers_additional_policies = concat(
    [
      aws_iam_policy.alb_ingress_node_policy.id,
      "arn:aws:iam::aws:policy/CloudWatchAgentServerPolicy",
      "arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore",
    ],
    var.workers_additional_policies
  )
  workers_group_defaults        = var.workers_group_defaults
  worker_groups_launch_template = var.worker_groups_launch_template
  node_groups_defaults          = var.node_groups_defaults
  node_groups                   = var.node_groups

  manage_aws_auth = true
  map_roles = [
    {
      rolearn  = data.aws_iam_role.sso_admin.arn
      username = "sso-admin"
      groups   = ["system:masters"]
    },
    {
      rolearn  = data.aws_iam_role.sso_pu.arn
      username = "sso-pu"
      groups   = ["system:masters"]
    },
    {
      rolearn  = data.aws_iam_role.sso_ro.arn
      username = "sso-ro"
      groups   = ["system:authenticated"]
    },
  ]
}

What's the expected behavior?

Are you able to fix this problem and submit a PR? Link here if you have already.

Environment details

Affected module version: 9.0.0
OS: OS X
Terraform version: 0.12.21

Any other relevant info

robgeiner · 2020-02-29T16:22:06Z

wget --no-check-certificate -O - $ENDPOINT/healthz
--2020-02-29 11:20:14-- https://bf589asd1d6166cb4d8cc7243625e4e.gr7.us-east-1.eks.amazonaws.com/healthz
Resolving bf589asd1d6166cb4d8cc7243625e4e.gr7.us-east-1.eks.amazonaws.com (bf589dec1d6166cb4d8cc770df625e4e.gr7.us-east-1.eks.amazonaws.com)... 3.229.39.55, 52.3.37.102
Connecting to bf589asd1d6166cb4d8cc7243625e4e.gr7.us-east-1.eks.amazonaws.com (bf589dec1d6166cb4d8cc770df625e4e.gr7.us-east-1.eks.amazonaws.com)|3.229.39.55|:443... connected.
OpenSSL: error:1407742E:SSL routines:SSL23_GET_SERVER_HELLO:tlsv1 alert protocol version
Unable to establish SSL connection.

curl -k -s $ENDPOINT/healthz
ok%

barryib · 2020-02-29T22:06:51Z

What is your wget version? Can you please share your wget output in debug mode (--debug)?

robgeiner · 2020-02-29T22:16:43Z

wget --debug --no-check-certificate -O - https://2157EAD4C8AB1C95957XXXXXXXXXXXXX.gr7.us-east-1.eks.amazonaws.com/healthz
Setting --check-certificate (checkcertificate) to 0
Setting --output-document (outputdocument) to -
DEBUG output created by Wget 1.17.1 on darwin14.5.0.

Reading HSTS entries from /Users/geiner/.wget-hsts
URI encoding = ‘UTF-8’
--2020-02-29 17:12:50-- https://2157EAD4C8AB1C95957XXXXXXXXXXXXX.gr7.us-east-1.eks.amazonaws.com/healthz
Resolving 2157EAD4C8AB1C95957XXXXXXXXXXXXX.gr7.us-east-1.eks.amazonaws.com (2157EAD4C8AB1C95957XXXXXXXXXXXXX.gr7.us-east-1.eks.amazonaws.com)... 3.225.XXX.XXX, 52.23.XXX.XXX
Caching 2157EAD4C8AB1C95957XXXXXXXXXXXXX.gr7.us-east-1.eks.amazonaws.com => 3.225.XXX.XXX 52.23.XXX.XXX
Connecting to 2157EAD4C8AB1C95957XXXXXXXXXXXXX.gr7.us-east-1.eks.amazonaws.com (2157EAD4C8AB1C95957XXXXXXXXXXXXX.gr7.us-east-1.eks.amazonaws.com)|3.225.XXX.XXX|:443... connected.
Created socket 6.
Releasing 0x00007fccd7001940 (new refcount 1).
Initiating SSL handshake.
SSL handshake failed.
OpenSSL: error:1407742E:SSL routines:SSL23_GET_SERVER_HELLO:tlsv1 alert protocol version
Closed fd 6
Unable to establish SSL connection.
Saving HSTS entries to /Users/geiner/.wget-hsts

daroga0002 · 2020-03-03T10:42:31Z

It looks like the AWS API endpoint accepts only TLS 1.2, which is the cause here.

Your wget version is 1.17.1, which dates from 2015, so it is quite old. I have checked GNU Wget 1.18 and newer, and they work as expected.

So you should either update wget or return to the previous method by overriding the default value of wait_for_cluster_cmd.
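To see which side of the 1.18 boundary mentioned above a machine falls on before relying on the wget-based default, a small bash helper could compare the installed version. This is a sketch: the name `wget_at_least` is hypothetical, it assumes a `sort` that supports `-V` (GNU coreutils does), and the version number is only a proxy; what actually matters is whether the TLS library wget was built against can negotiate TLS 1.2.

```shell
# Hypothetical helper: succeed if the installed wget reports a version >= $1.
wget_at_least() {
  local min="$1" have
  # The first line of `wget --version` typically reads like
  # "GNU Wget 1.17.1 built on darwin14.5.0."; field 3 is the version.
  have=$(wget --version 2>/dev/null | head -n 1 | awk '{print $3}')
  # No wget on PATH (or unexpected output): treat as "not new enough".
  [ -n "$have" ] || return 1
  # sort -V orders version strings; if $min sorts first (or ties), $have >= $min.
  [ "$(printf '%s\n%s\n' "$min" "$have" | sort -V | head -n 1)" = "$min" ]
}
```

For example, `if wget_at_least 1.18; then echo "wget looks new enough"; else echo "use curl instead"; fi` would report "use curl instead" for the Wget 1.17.1 shown in the debug output above.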

barryib · 2020-03-04T10:08:15Z

Yes, as @daroga0002 mentioned, your wget is trying to use TLS 1.0 instead of TLS 1.2. Please upgrade wget or use curl.

@robgeiner Closing this. Feel free to reopen this issue if #757 (comment) doesn't help.

hiteshjoshi1 · 2020-05-03T14:47:46Z

@robgeiner I am a Terraform noob here. I am following this guide:
https://learn.hashicorp.com/terraform/kubernetes/provision-eks-cluster
and hitting the wget issue. Where should I add this line

wait_for_cluster_cmd = "until curl -k -s $ENDPOINT/healthz >/dev/null; do sleep 4; done"

to make it work without failing on wget? Thanks.

daroga0002 · 2020-05-04T06:55:32Z

@hiteshjoshi1 For example, on this line of the example main.tf file:

terraform-aws-eks/examples/basic/main.tf, line 129 in 7afecf6

robgeiner · 2020-05-04T12:27:47Z

Yep, what @daroga0002 said. For example,

module "eks" {
  source               = "terraform-aws-modules/eks/aws"
  version              = "11.0.0"
  cluster_name         = local.eks_cluster_name
  wait_for_cluster_cmd = "until curl -k -s $ENDPOINT/healthz >/dev/null; do sleep 4; done"
  ...
}

hiteshjoshi1 · 2020-05-04T13:52:01Z

Thanks, it worked.

github-actions · 2022-11-26T02:16:07Z

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues. If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

robgeiner changed the title ~~Cluster never come available after moving to 9.0.0~~ Cluster never comes available after moving to 9.0.0 Feb 29, 2020

barryib closed this as completed Mar 4, 2020

robgeiner mentioned this issue Mar 4, 2020

Fails to create cluster and gets stuck in wget loop #760 (closed)

daroga0002 mentioned this issue Mar 12, 2020

Deployment never completes but cluster is active #777 (closed)

supermodo mentioned this issue May 21, 2020

Cluster availability check in null_resource.wait_for_cluster fails to detect when the cluster becomes available jenkins-x/terraform-aws-eks-jx#52 (closed)

MEOWMEOW114 mentioned this issue Jul 27, 2020

Problem provisioning (wget issue) hashicorp/learn-terraform-provision-eks-cluster#14 (closed)

nickmelis mentioned this issue Oct 29, 2020

Timeout error running on Mac hashicorp/learn-terraform-provision-eks-cluster#22 (closed)

github-actions bot locked as resolved and limited conversation to collaborators Nov 26, 2022
