Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ingress: Skipper sometimes take long to become ready. #647

Closed
mikkeloscar opened this issue May 15, 2018 · 1 comment · Fixed by #649 or zalando-incubator/kubernetes-on-aws#1042
Closed

Comments

@mikkeloscar
Copy link
Member

I have noticed that skipper sometimes takes minutes to get ready on a new node.

Here's an example showing that it took skipper 2 min. to become ready from the first health-check to the first successful one:

kubectl --namespace kube-system logs -f skipper-ingress-8lp6m                                                                          [0]
[APP]time="2018-05-15T10:04:56Z" level=info msg="register sigterm handler"
[APP]time="2018-05-15T10:04:56Z" level=info msg="enabled ratelimiters true: []"
[APP]time="2018-05-15T10:04:56Z" level=info msg="Expose metrics in all format"
[APP]time="2018-05-15T10:04:56Z" level=info msg="support listener on :9911"
[APP]time="2018-05-15T10:04:56Z" level=info msg="proxy listener on :9999"
[APP]time="2018-05-15T10:04:56Z" level=info msg="certPathTLS or keyPathTLS not found, defaulting to HTTP"
172.31.3.55 - - [15/May/2018:10:05:08 +0000] "GET /kube-system/healthz HTTP/1.1" 404 10 "" "kube-probe/1.9" 0 172.31.3.55:9999 
172.31.3.55 - - [15/May/2018:10:05:18 +0000] "GET /kube-system/healthz HTTP/1.1" 404 10 "" "kube-probe/1.9" 0 172.31.3.55:9999 
172.31.15.192 - - [15/May/2018:10:05:27 +0000] "GET /kube-system/healthz HTTP/1.1" 404 10 "" "ELB-HealthChecker/2.0" 0 172.31.3.55:9999 
172.31.3.206 - - [15/May/2018:10:05:27 +0000] "GET /kube-system/healthz HTTP/1.1" 404 10 "" "ELB-HealthChecker/2.0" 0 172.31.3.55:9999 
172.31.19.12 - - [15/May/2018:10:05:27 +0000] "GET /kube-system/healthz HTTP/1.1" 404 10 "" "ELB-HealthChecker/2.0" 0 172.31.3.55:9999 
172.31.3.55 - - [15/May/2018:10:05:28 +0000] "GET /kube-system/healthz HTTP/1.1" 404 10 "" "kube-probe/1.9" 0 172.31.3.55:9999 
172.31.3.206 - - [15/May/2018:10:05:37 +0000] "GET /kube-system/healthz HTTP/1.1" 404 10 "" "ELB-HealthChecker/2.0" 0 172.31.3.55:9999 
172.31.15.192 - - [15/May/2018:10:05:37 +0000] "GET /kube-system/healthz HTTP/1.1" 404 10 "" "ELB-HealthChecker/2.0" 0 172.31.3.55:9999 
172.31.19.12 - - [15/May/2018:10:05:37 +0000] "GET /kube-system/healthz HTTP/1.1" 404 10 "" "ELB-HealthChecker/2.0" 0 172.31.3.55:9999 
172.31.3.55 - - [15/May/2018:10:05:38 +0000] "GET /kube-system/healthz HTTP/1.1" 404 10 "" "kube-probe/1.9" 0 172.31.3.55:9999 
172.31.3.206 - - [15/May/2018:10:05:47 +0000] "GET /kube-system/healthz HTTP/1.1" 404 10 "" "ELB-HealthChecker/2.0" 0 172.31.3.55:9999 
172.31.19.12 - - [15/May/2018:10:05:47 +0000] "GET /kube-system/healthz HTTP/1.1" 404 10 "" "ELB-HealthChecker/2.0" 0 172.31.3.55:9999 
172.31.15.192 - - [15/May/2018:10:05:47 +0000] "GET /kube-system/healthz HTTP/1.1" 404 10 "" "ELB-HealthChecker/2.0" 0 172.31.3.55:9999 
172.31.3.55 - - [15/May/2018:10:05:48 +0000] "GET /kube-system/healthz HTTP/1.1" 404 10 "" "kube-probe/1.9" 0 172.31.3.55:9999 
172.31.15.192 - - [15/May/2018:10:05:57 +0000] "GET /kube-system/healthz HTTP/1.1" 404 10 "" "ELB-HealthChecker/2.0" 0 172.31.3.55:9999 
172.31.3.206 - - [15/May/2018:10:05:57 +0000] "GET /kube-system/healthz HTTP/1.1" 404 10 "" "ELB-HealthChecker/2.0" 0 172.31.3.55:9999 
172.31.19.12 - - [15/May/2018:10:05:57 +0000] "GET /kube-system/healthz HTTP/1.1" 404 10 "" "ELB-HealthChecker/2.0" 0 172.31.3.55:9999 
172.31.3.55 - - [15/May/2018:10:05:58 +0000] "GET /kube-system/healthz HTTP/1.1" 404 10 "" "kube-probe/1.9" 0 172.31.3.55:9999 
172.31.15.192 - - [15/May/2018:10:06:07 +0000] "GET /kube-system/healthz HTTP/1.1" 404 10 "" "ELB-HealthChecker/2.0" 0 172.31.3.55:9999 
172.31.3.206 - - [15/May/2018:10:06:07 +0000] "GET /kube-system/healthz HTTP/1.1" 404 10 "" "ELB-HealthChecker/2.0" 0 172.31.3.55:9999 
172.31.19.12 - - [15/May/2018:10:06:07 +0000] "GET /kube-system/healthz HTTP/1.1" 404 10 "" "ELB-HealthChecker/2.0" 0 172.31.3.55:9999 
172.31.3.55 - - [15/May/2018:10:06:08 +0000] "GET /kube-system/healthz HTTP/1.1" 404 10 "" "kube-probe/1.9" 0 172.31.3.55:9999 
172.31.15.192 - - [15/May/2018:10:06:17 +0000] "GET /kube-system/healthz HTTP/1.1" 404 10 "" "ELB-HealthChecker/2.0" 0 172.31.3.55:9999 
172.31.3.206 - - [15/May/2018:10:06:17 +0000] "GET /kube-system/healthz HTTP/1.1" 404 10 "" "ELB-HealthChecker/2.0" 0 172.31.3.55:9999 
172.31.19.12 - - [15/May/2018:10:06:17 +0000] "GET /kube-system/healthz HTTP/1.1" 404 10 "" "ELB-HealthChecker/2.0" 0 172.31.3.55:9999 
172.31.3.55 - - [15/May/2018:10:06:18 +0000] "GET /kube-system/healthz HTTP/1.1" 404 10 "" "kube-probe/1.9" 0 172.31.3.55:9999 
172.31.15.192 - - [15/May/2018:10:06:27 +0000] "GET /kube-system/healthz HTTP/1.1" 404 10 "" "ELB-HealthChecker/2.0" 0 172.31.3.55:9999 
172.31.3.206 - - [15/May/2018:10:06:27 +0000] "GET /kube-system/healthz HTTP/1.1" 404 10 "" "ELB-HealthChecker/2.0" 0 172.31.3.55:9999 
172.31.19.12 - - [15/May/2018:10:06:27 +0000] "GET /kube-system/healthz HTTP/1.1" 404 10 "" "ELB-HealthChecker/2.0" 0 172.31.3.55:9999 
172.31.3.55 - - [15/May/2018:10:06:28 +0000] "GET /kube-system/healthz HTTP/1.1" 404 10 "" "kube-probe/1.9" 0 172.31.3.55:9999 
172.31.15.192 - - [15/May/2018:10:06:37 +0000] "GET /kube-system/healthz HTTP/1.1" 404 10 "" "ELB-HealthChecker/2.0" 0 172.31.3.55:9999 
172.31.3.206 - - [15/May/2018:10:06:37 +0000] "GET /kube-system/healthz HTTP/1.1" 404 10 "" "ELB-HealthChecker/2.0" 0 172.31.3.55:9999 
172.31.19.12 - - [15/May/2018:10:06:37 +0000] "GET /kube-system/healthz HTTP/1.1" 404 10 "" "ELB-HealthChecker/2.0" 0 172.31.3.55:9999 
172.31.3.55 - - [15/May/2018:10:06:38 +0000] "GET /kube-system/healthz HTTP/1.1" 404 10 "" "kube-probe/1.9" 0 172.31.3.55:9999 
172.31.3.206 - - [15/May/2018:10:06:47 +0000] "GET /kube-system/healthz HTTP/1.1" 404 10 "" "ELB-HealthChecker/2.0" 0 172.31.3.55:9999 
172.31.15.192 - - [15/May/2018:10:06:47 +0000] "GET /kube-system/healthz HTTP/1.1" 404 10 "" "ELB-HealthChecker/2.0" 0 172.31.3.55:9999 
172.31.19.12 - - [15/May/2018:10:06:47 +0000] "GET /kube-system/healthz HTTP/1.1" 404 10 "" "ELB-HealthChecker/2.0" 0 172.31.3.55:9999 
172.31.3.55 - - [15/May/2018:10:06:48 +0000] "GET /kube-system/healthz HTTP/1.1" 404 10 "" "kube-probe/1.9" 0 172.31.3.55:9999 
172.31.3.206 - - [15/May/2018:10:06:57 +0000] "GET /kube-system/healthz HTTP/1.1" 404 10 "" "ELB-HealthChecker/2.0" 0 172.31.3.55:9999 
172.31.15.192 - - [15/May/2018:10:06:57 +0000] "GET /kube-system/healthz HTTP/1.1" 404 10 "" "ELB-HealthChecker/2.0" 0 172.31.3.55:9999 
172.31.19.12 - - [15/May/2018:10:06:57 +0000] "GET /kube-system/healthz HTTP/1.1" 404 10 "" "ELB-HealthChecker/2.0" 0 172.31.3.55:9999 
172.31.3.55 - - [15/May/2018:10:06:58 +0000] "GET /kube-system/healthz HTTP/1.1" 404 10 "" "kube-probe/1.9" 0 172.31.3.55:9999 
[APP]time="2018-05-15T10:07:06Z" level=error msg="failed to load all: Get https://10.3.0.1:443/apis/extensions/v1beta1/ingresses: dial tcp 10.3.0.1:443: connect: connection timed out"
[APP]time="2018-05-15T10:07:06Z" level=error msg="error while receiveing initial data;Get https://10.3.0.1:443/apis/extensions/v1beta1/ingresses: dial tcp 10.3.0.1:443: connect: connection timed out"
172.31.3.206 - - [15/May/2018:10:07:07 +0000] "GET /kube-system/healthz HTTP/1.1" 404 10 "" "ELB-HealthChecker/2.0" 0 172.31.3.55:9999 
172.31.15.192 - - [15/May/2018:10:07:07 +0000] "GET /kube-system/healthz HTTP/1.1" 404 10 "" "ELB-HealthChecker/2.0" 0 172.31.3.55:9999 
172.31.19.12 - - [15/May/2018:10:07:07 +0000] "GET /kube-system/healthz HTTP/1.1" 404 10 "" "ELB-HealthChecker/2.0" 0 172.31.3.55:9999 
172.31.3.55 - - [15/May/2018:10:07:08 +0000] "GET /kube-system/healthz HTTP/1.1" 404 10 "" "kube-probe/1.9" 0 172.31.3.55:9999 
[APP]time="2018-05-15T10:07:10Z" level=info msg="route settings, reset, route: kube_default__mlarsen__mlarsen_teapot_zalan_do____mlarsen: Host(/^mlarsen[.]teapot[.]zalan[.]do$/) -> \"http://10.2.12.18:8080\""
[APP]time="2018-05-15T10:07:10Z" level=info msg="route settings, reset, route: kube__healthz: Path(\"/kube-system/healthz\") && SourceFromLast(\"10.0.0.0/8\", \"192.168.0.0/16\", \"172.16.0.0/12\", \"127.0.0.1/8\", \"fd00::/8\", \"::1/128\") -> status(200) -> <shunt>"
[APP]time="2018-05-15T10:07:10Z" level=info msg="route settings, reset, route: kube__redirect: PathRegexp(/.*/) && Header(\"X-Forwarded-Proto\", \"http\") && HeaderRegexp(\"X-Forwarded-Port\", /.*/) -> redirectTo(301, \"https:\") -> <shunt>"
[APP]time="2018-05-15T10:07:10Z" level=info msg="route settings received"
[APP]time="2018-05-15T10:07:10Z" level=info msg="filterRoutes incoming=3 outgoing=3"
[APP]time="2018-05-15T10:07:10Z" level=info msg="route settings applied"

I imagine the problem is skipper racing with kube-proxy calling the kubernetes service IP before the iptable rules has been setup. And because of a long timeout in the client that calls the kube-apiserver, this leads to a long recovery time.

@szuecs
Copy link
Member

szuecs commented May 15, 2018

@mikkeloscar Thanks for creating the, this will be caused by the http.Transport being used in the kubernetes dataclient and AWS using DISCARD to drop the packet, instead of sending back a TCP packet with RST flags set.

szuecs added a commit to zalando-incubator/kubernetes-on-aws that referenced this issue May 15, 2018
adds unverifiedAuditLog() filter
add oauthTokeninfo*() filters for svc-to-svc authnz
aermakov-zalando added a commit to zalando-incubator/kubernetes-on-aws that referenced this issue May 25, 2018
aermakov-zalando added a commit to zalando-incubator/kubernetes-on-aws that referenced this issue May 25, 2018
aermakov-zalando added a commit to zalando-incubator/kubernetes-on-aws that referenced this issue May 25, 2018
aermakov-zalando added a commit to zalando-incubator/kubernetes-on-aws that referenced this issue May 25, 2018
aermakov-zalando added a commit to zalando-incubator/kubernetes-on-aws that referenced this issue May 25, 2018
mikkeloscar added a commit to zalando-incubator/kubernetes-on-aws that referenced this issue May 25, 2018
aermakov-zalando added a commit to zalando-incubator/kubernetes-on-aws that referenced this issue May 28, 2018
mikkeloscar added a commit to zalando-incubator/kubernetes-on-aws that referenced this issue May 28, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
2 participants