modules/aws/vpc/sg-etcd: Add ingress 10250 from master

Patterned on the existing worker_ingress_kubelet_insecure_from_master
from b620c16 (modules/aws: tighten security groups, 2017-04-19,
coreos/tectonic-installer#264).

This should address errors like:

  $ curl -s https://storage.googleapis.com/origin-ci-test/pr-logs/pull/openshift_installer/150/pull-ci-origin-installer-e2e-aws/546/artifacts/e2e-aws/nodes/ip-10-0-52-134.ec2.internal/journal.gz | zcat >journal
  $ journalread journal | grep 'current config label' | head -n3
  2018-08-20T20:30:08.000827895Z I0820 20:30:08.826840       1 tnc.go:375] Node ip-10-0-134-147.ec2.internal does not have a current config label
  2018-08-20T20:30:08.00082814Z  I0820 20:30:08.826860       1 tnc.go:375] Node ip-10-0-153-195.ec2.internal does not have a current config label
  2018-08-20T20:30:08.000828371Z I0820 20:30:08.826866       1 tnc.go:375] Node ip-10-0-166-239.ec2.internal does not have a current config label

on the master node:

  $ journalread journal | grep -A15 'Starting Ignition' | grep -v INFO
  2018-08-20T20:21:40.00097323Z  Starting Ignition (files)...
  2018-08-20T20:21:40.000991225Z DEBUG    : parsed url from cmdline: ""
  2018-08-20T20:21:41.000010266Z DEBUG    : parsing config: {
  2018-08-20T20:21:41.0000122Z   "ignition": {
  2018-08-20T20:21:41.000014165Z "config": {
  2018-08-20T20:21:41.000016186Z "append": [
  2018-08-20T20:21:41.000018193Z {
  2018-08-20T20:21:41.000023296Z "source": "http://ci-op-imi5mbig-68485-tnc.origin-ci-int-aws.dev.rhcloud.com:80/config/master",

which were resulting in:

  $ curl -s https://storage.googleapis.com/origin-ci-test/pr-logs/pull/openshift_installer/150/pull-ci-origin-installer-e2e-aws/546/build-log.txt | grep Ginkgo
    |  Ginkgo timed out waiting for all parallel nodes to report back!  |
  Ginkgo ran 1 suite in 10m6.626061944s

Inbound 10250 is the kubelet API used by the control plane [1].
Clayton suspects the e2e-aws tests are trying to get metrics from the
kubelets, and hanging on the etcd kubelet because this rule was
missing.  It's not clear to me why we've only been seeing this issue
for the last week, though.
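
A rough way to check the rule from a master node is to attempt a TCP
connection to port 10250 on an etcd node.  AWS security groups drop
disallowed traffic silently, so a missing ingress rule usually shows up
as a timeout rather than a refused connection.  The sketch below is
only an illustration under those assumptions: the probe helper and the
example address are hypothetical, not part of the installer.

```python
import socket


def probe(host: str, port: int = 10250, timeout: float = 3.0) -> str:
    """Classify a TCP connection attempt to host:port.

    Returns "open" if the connection succeeds, "refused" if the peer
    actively rejects it (packets arrive, nothing is listening),
    "timeout" if no answer arrives in time (consistent with a security
    group dropping the traffic), or "error: ..." for anything else.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        return "refused"
    except socket.timeout:
        return "timeout"
    except OSError as exc:
        # For example, name resolution failure or no route to host.
        return f"error: {exc}"


# Example call (hypothetical etcd node address, run from a master):
#   print(probe("10.0.0.10"))
```

A "timeout" before the change and "open" after it would be consistent
with the missing-rule theory, although it does not prove that the e2e
hang had the same cause.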

In the commands above, journalread is from [2].

[1]: https://kubernetes.io/docs/setup/independent/install-kubeadm/#worker-node-s
[2]: https://github.com/smarterclayton/journalread
wking committed Aug 20, 2018
1 parent 6f3f522 commit 8e751d7
Showing 1 changed file with 10 additions and 0 deletions.
10 changes: 10 additions & 0 deletions modules/aws/vpc/sg-etcd.tf
@@ -57,3 +57,13 @@ resource "aws_security_group_rule" "etcd_ingress_peer" {
  to_port = 2380
  self = true
}

resource "aws_security_group_rule" "etcd_ingress_kubelet_insecure_from_master" {
  type                     = "ingress"
  security_group_id        = "${aws_security_group.etcd.id}"
  source_security_group_id = "${aws_security_group.master.id}"

  protocol  = "tcp"
  from_port = 10250
  to_port   = 10250
}
