Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cluster never comes available after moving to 9.0.0 #757

Closed
3 tasks
robgeiner opened this issue Feb 29, 2020 · 10 comments
Closed
3 tasks

Cluster never comes available after moving to 9.0.0 #757

robgeiner opened this issue Feb 29, 2020 · 10 comments

Comments

@robgeiner
Copy link

After moving to 9.0.0, the cluster availability check in null_resource.wait_for_cluster fails to detect when the cluster becomes available. The result is the is that it spins forever waiting. This appears to be related to 750

Workaround is to override the wait_for_cluster_cmd and use the default value prior to 750 e.g.
wait_for_cluster_cmd = "until curl -k -s $ENDPOINT/healthz >/dev/null; do sleep 4; done"

I'm submitting a...

  • [ x] bug report
  • feature request
  • support request - read the FAQ first!
  • kudos, thank you, warm fuzzy

What is the current behavior?

module.eks.module.cluster.null_resource.wait_for_cluster[0]: Still creating... [20s elapsed]
module.eks.module.cluster.null_resource.wait_for_cluster[0]: Still creating... [30s elapsed]
...
module.eks.module.cluster.null_resource.wait_for_cluster[0]: Still creating... [1h0m42s elapsed]
module.eks.module.cluster.null_resource.wait_for_cluster[0]: Still creating... [1h0m52s elapsed]
eventually times out

If this is a bug, how to reproduce? Please include a code sample if relevant.

module "eks" {
source = "terraform-aws-modules/eks/aws"
version = "9.0.0"
cluster_name = local.eks_cluster_name
cluster_version = var.eks_k8s_version
subnets = var.private_subnet_ids
vpc_id = var.vpc_id
enable_irsa = true
tags = merge(var.eks_tags,local.env_tags)
cluster_enabled_log_types = var.cluster_enabled_log_types
cluster_log_retention_in_days = var.cluster_log_retention_in_days
workers_additional_policies = concat(["${aws_iam_policy.alb_ingress_node_policy.id}","arn:aws:iam::aws:policy/CloudWatchAgentServerPolicy","arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore"],var.workers_additional_policies)
workers_group_defaults = var.workers_group_defaults
worker_groups_launch_template = var.worker_groups_launch_template
node_groups_defaults = var.node_groups_defaults
node_groups = var.node_groups

manage_aws_auth = true
map_roles = [
{
rolearn = data.aws_iam_role.sso_admin.arn
username = "sso-admin"
groups = ["system:masters"]
},
{
rolearn = data.aws_iam_role.sso_pu.arn
username = "sso-pu"
groups = ["system:masters"]
},
{
rolearn = data.aws_iam_role.sso_ro.arn
username = "sso-ro"
groups = ["system:authenticated"]
},

]
}

What's the expected behavior?

Are you able to fix this problem and submit a PR? Link here if you have already.

Environment details

  • Affected module version: 9.0.0
  • OS: os-x
  • Terraform version: 0.12.21

Any other relevant info

@robgeiner robgeiner changed the title Cluster never come available after moving to 9.0.0 Cluster never comes available after moving to 9.0.0 Feb 29, 2020
@robgeiner
Copy link
Author

robgeiner commented Feb 29, 2020

wget --no-check-certificate -O - $ENDPOINT/healthz
--2020-02-29 11:20:14-- https://bf589asd1d6166cb4d8cc7243625e4e.gr7.us-east-1.eks.amazonaws.com/healthz
Resolving bf589asd1d6166cb4d8cc7243625e4e.gr7.us-east-1.eks.amazonaws.com (bf589dec1d6166cb4d8cc770df625e4e.gr7.us-east-1.eks.amazonaws.com)... 3.229.39.55, 52.3.37.102
Connecting to bf589asd1d6166cb4d8cc7243625e4e.gr7.us-east-1.eks.amazonaws.com (bf589dec1d6166cb4d8cc770df625e4e.gr7.us-east-1.eks.amazonaws.com)|3.229.39.55|:443... connected.
OpenSSL: error:1407742E:SSL routines:SSL23_GET_SERVER_HELLO:tlsv1 alert protocol version
Unable to establish SSL connection.

curl -k -s $ENDPOINT/healthz
ok%

@barryib
Copy link
Member

barryib commented Feb 29, 2020

What is your wget version ? can you please share your wget output in debug mode --debug?

@robgeiner
Copy link
Author

wget --debug --no-check-certificate -O - https://2157EAD4C8AB1C95957XXXXXXXXXXXXX.gr7.us-east-1.eks.amazonaws.com/healthz
Setting --check-certificate (checkcertificate) to 0
Setting --output-document (outputdocument) to -
DEBUG output created by Wget 1.17.1 on darwin14.5.0.

Reading HSTS entries from /Users/geiner/.wget-hsts
URI encoding = ‘UTF-8’
--2020-02-29 17:12:50-- https://2157EAD4C8AB1C95957XXXXXXXXXXXXX.gr7.us-east-1.eks.amazonaws.com/healthz
Resolving 2157EAD4C8AB1C95957XXXXXXXXXXXXX.gr7.us-east-1.eks.amazonaws.com (2157EAD4C8AB1C95957XXXXXXXXXXXXX.gr7.us-east-1.eks.amazonaws.com)... 3.225.XXX.XXX, 52.23.XXX.XXX
Caching 2157EAD4C8AB1C95957XXXXXXXXXXXXX.gr7.us-east-1.eks.amazonaws.com => 3.225.XXX.XXX 52.23.XXX.XXX
Connecting to 2157EAD4C8AB1C95957XXXXXXXXXXXXX.gr7.us-east-1.eks.amazonaws.com (2157EAD4C8AB1C95957XXXXXXXXXXXXX.gr7.us-east-1.eks.amazonaws.com)|3.225.XXX.XXX|:443... connected.
Created socket 6.
Releasing 0x00007fccd7001940 (new refcount 1).
Initiating SSL handshake.
SSL handshake failed.
OpenSSL: error:1407742E:SSL routines:SSL23_GET_SERVER_HELLO:tlsv1 alert protocol version
Closed fd 6
Unable to establish SSL connection.
Saving HSTS entries to /Users/geiner/.wget-hsts

@daroga0002
Copy link
Contributor

daroga0002 commented Mar 3, 2020

It looks that AWS API endpoint accepts only TLS 1.2 protocol which is case here.

Your Wget version looks to be 1.17.1 which is from 2015, so quite old. I have checked GNU Wget 1.18 and newer which are working as expected.

So you should update wget or you can return to previous method via overwriting default value for wait_for_cluster_cmd.

@barryib
Copy link
Member

barryib commented Mar 4, 2020

Yes as @daroga0002 mentioned, you're trying to do TLS 1.0 instead of TLS 1.2. Upgrade your wget or use curl please.

@robgeiner Closing this. Feel free to reopen this issue if #757 (comment) doesn't help.

@hiteshjoshi1
Copy link

@robgeiner I am a terraform noob here. I am following this -
https://learn.hashicorp.com/terraform/kubernetes/provision-eks-cluster
And hitting the wget issue. Where should I add this

wait_for_cluster_cmd = "until curl -k -s $ENDPOINT/healthz >/dev/null; do sleep 4; done"

to make it work without failing for wget. Thanks

@daroga0002
Copy link
Contributor

daroga0002 commented May 4, 2020

@hiteshjoshi1 for example into this line of example main.tf file:

@robgeiner
Copy link
Author

Yep, what @daroga0002 said. For example,

module "eks" {
  source          = "terraform-aws-modules/eks/aws"
  version         = "11.0.0"
  cluster_name    = local.eks_cluster_name
  wait_for_cluster_cmd          = "until curl -k -s $ENDPOINT/healthz >/dev/null; do sleep 4; done"
  ...
}  

@hiteshjoshi1
Copy link

Thanks it worked.

@github-actions
Copy link

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues. If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Nov 26, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants