server not able to start #164

babatundebusari · 2018-10-11T18:44:14Z

I'm having issues getting the server to start up properly.
I am getting the errors below:

$ kubectl logs -f po/kiam-server-whqxp

{"level":"info","msg":"starting server","time":"2018-10-11T18:42:19Z"}
{"level":"info","msg":"started prometheus metric listener 0.0.0.0:9620","time":"2018-10-11T18:42:19Z"}
{"level":"info","msg":"detecting arn prefix","time":"2018-10-11T18:42:19Z"}
{"level":"info","msg":"using detected prefix: arn:aws:iam::12345678:role/","time":"2018-10-11T18:42:19Z"}
{"level":"info","msg":"will serve on 0.0.0.0:443","time":"2018-10-11T18:42:19Z"}
{"level":"info","msg":"starting credential manager process 0","time":"2018-10-11T18:42:19Z"}
{"level":"info","msg":"starting credential manager process 1","time":"2018-10-11T18:42:19Z"}
{"level":"info","msg":"starting credential manager process 2","time":"2018-10-11T18:42:19Z"}
{"level":"info","msg":"starting credential manager process 3","time":"2018-10-11T18:42:19Z"}
{"level":"info","msg":"starting credential manager process 4","time":"2018-10-11T18:42:19Z"}
{"level":"info","msg":"starting credential manager process 5","time":"2018-10-11T18:42:19Z"}
{"level":"info","msg":"starting credential manager process 6","time":"2018-10-11T18:42:19Z"}
{"level":"info","msg":"starting credential manager process 7","time":"2018-10-11T18:42:19Z"}
{"level":"info","msg":"started cache controller","time":"2018-10-11T18:42:19Z"}
{"level":"info","msg":"started namespace cache controller","time":"2018-10-11T18:42:19Z"}
{"level":"info","msg":"stopping server","time":"2018-10-11T18:42:54Z"}
{"level":"info","msg":"stopping prometheus metric listener","time":"2018-10-11T18:42:54Z"}
{"level":"info","msg":"stopped","time":"2018-10-11T18:42:54Z"}

I'm not sure how to get a more descriptive error than that; the logs give no insight into why it is failing.

babatundebusari · 2018-10-15T13:30:49Z

Not sure if this will ever get help.
@tasdikrahman did this work for you? I am getting this error, and I'm sure it must be something I am missing somewhere if it works for everyone.

Thanks

tasdikrahman · 2018-10-17T07:33:45Z

Increasing the verbosity of the logs should tell us more about what's happening. Check this comment by @pingles: #17 (comment)

pingles · 2018-10-17T08:16:34Z

@babatundebusari could you add some detail as to why the server process is being stopped please? My guess is that it's your health check that's failing. As per #115, the gRPC lib only seems to expose more information when you set the environment variables specified.

babatundebusari · 2018-10-18T02:15:59Z

@tasdikrahman where do I add these in this file?
GRPC_GO_LOG_SEVERITY_LEVEL=info GRPC_GO_LOG_VERBOSITY_LEVEL=8

https://github.com/uswitch/kiam/blob/master/deploy/server.yaml

@pingles
While troubleshooting, I removed the liveness/readiness probe sections so that the container stays running, then ran the server health check from inside the container. It fails with the error below. I have also included the output of some other commands to show the current state of my setup.

$ kubectl get pods | grep kiam

kiam-agent-wg5zs                                              0/1       CrashLoopBackOff   345        1d
kiam-agent-xcmpb                                              0/1       CrashLoopBackOff   345        1d
kiam-agent-z7dlr                                              0/1       CrashLoopBackOff   346        1d
kiam-server-5t5bl                                             1/1       Running            0          1d
kiam-server-j5czm                                             1/1       Running            0          1d
kiam-server-wdzcw                                             1/1       Running            0          1d


$ kubectl exec kiam-server-wdzcw -- /kiam health --cert=/etc/kubernetes/certs/kiam-server.pem --key=/etc/kubernetes/certs/kiam-server-key.pem --ca=/etc/kubernetes/certs/kubernetes-ca.pem --server-address=127.0.0.1:443 --server-address-refresh=2s --timeout=5s --gateway-timeout-creation=50ms

time="2018-10-18T02:12:05Z" level=fatal msg="error creating server gateway: error dialing grpc server: context deadline exceeded"
command terminated with exit code 1


$ kubectl exec kiam-server-wdzcw -- netstat -antp

Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 127.0.0.1:36124         127.0.0.1:443           TIME_WAIT   -
tcp        0      0 192.168.216.69:34410    192.168.0.1:443         ESTABLISHED 1/kiam
tcp        0      0 :::443                  :::*                    LISTEN      1/kiam
tcp        0      0 :::9620                 :::*                    LISTEN      1/kiam


$ kubectl exec kiam-server-wdzcw -- ps aux

PID   USER     TIME  COMMAND
    1 root      4:37 /kiam server --json-log --level=info --bind=0.0.0.0:443 --cert=/etc/kubernetes/certs/kiam-server.pem --key=/etc/kubernetes/certs/kiam-server-key.pem --ca=/etc/kubernetes/certs/kubernetes-ca.pem --role-base-arn-autodetect --session-duration=15m --sync=1m --prometheus-listen-addr=0.0.0.0:9620 --prometheus-sync-interval=5s
 2283 root      0:00 ps aux

P.S. I removed the readiness/liveness probe sections so that the kiam-server pods show as Running above.

pingles · 2018-10-22T10:49:08Z

@babatundebusari if you add those environment variables so that they're picked up by your health check command, then hopefully your process will output some better information about the error.

If it's not a TLS issue (which those env vars would show) then I'd suspect you need to increase the --gateway-timeout-creation=50ms to a longer period: the gRPC client-side load balancing may take longer than that to initialise the pool data.
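
As a rough sketch of where those variables could go: setting them on the server container in the DaemonSet pod template should make them visible both to the server process and to exec-based health checks, since exec probes run inside the same container. The container name `kiam` is an assumption here; match it to whatever your copy of server.yaml uses.

```yaml
# Fragment of the kiam-server DaemonSet pod template, not a complete manifest.
spec:
  template:
    spec:
      containers:
        - name: kiam   # assumed container name; use the one in your manifest
          env:
            # gRPC-Go logging variables quoted earlier in this thread.
            - name: GRPC_GO_LOG_SEVERITY_LEVEL
              value: "info"
            - name: GRPC_GO_LOG_VERBOSITY_LEVEL
              value: "8"   # env values must be strings, hence the quotes
```

With these set, re-running the `/kiam health` command via `kubectl exec` should print the gRPC client's connection and TLS handshake messages alongside the existing error.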

pingles · 2018-10-23T07:37:42Z

On Mon, 22 Oct 2018 at 22:01, babatundebusari wrote: @pingles @tasdikrahman can you please explain how the kiam-agent is able to resolve the server address kiam-server:443? How does it know or resolve where the kiam-server is? The issue now is getting the kiam-agent to resolve the kiam-server hostname properly.

That's handled by a Kubernetes DNS component like Kube DNS: once the server processes sit behind a Service they should be resolvable within the cluster. https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/
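
To illustrate the DNS point, a Service along the following lines is what would make the name kiam-server resolvable from the agents. This is a sketch under assumptions, not the project's manifest: the selector labels and the port name are hypothetical, and the headless setting (`clusterIP: None`) is chosen here only because the thread mentions gRPC client-side load balancing, which needs DNS to return the individual pod addresses.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: kiam-server        # agents connect to kiam-server:443
spec:
  clusterIP: None          # headless (assumption): DNS returns the pod IPs directly
  selector:
    app: kiam              # hypothetical labels; they must match the server pods
    role: server
  ports:
    - name: grpc           # hypothetical port name
      port: 443
      targetPort: 443
      protocol: TCP
```

If the agents run in a different namespace from the Service, the short name will not resolve and they would need the qualified form (`kiam-server.<namespace>`), which the server certificate would then also have to cover.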

babatundebusari · 2018-10-23T17:46:57Z

@pingles

I have gotten it to work now.
The issue was with TLS: the certificate's DNS name has to match the name the agents connect to, and the kiam-agent pods can only resolve kiam-server via DNS, so I had to have the server cert use kiam-server as a DNS name.

The issue I am seeing now is in the liveness and readiness probes in the server DaemonSet YAML file. With the probes in place the server fails, but when I take them out it works fine.

Something is wrong with the health check commands. I have tried everything I can to make them work, even adjusting the gateway-timeout-creation duration, and they are still not working.

babatundebusari · 2018-10-24T05:51:50Z

I will close this now, BUT the server health checks in the livenessProbe and readinessProbe are still not working. I will continue to work on fixing that and, if there is no fix, maybe open another issue.

jaygorrell · 2019-07-14T02:06:14Z

For anyone else landing here, I had the same issue.

Because 127.0.0.1 was recently removed from the certs, I had to change the health checks to use --server-address=localhost:443 instead of 127.0.0.1.
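
For reference, the change described above might look like this in the server's probe definition. The flags and certificate paths are copied from the health check command earlier in this thread; the probe timing values and the longer `--gateway-timeout-creation` are illustrative assumptions, not recommended settings.

```yaml
# Fragment of the kiam-server container spec, not a complete manifest.
livenessProbe:
  exec:
    command:
      - /kiam
      - health
      - --cert=/etc/kubernetes/certs/kiam-server.pem
      - --key=/etc/kubernetes/certs/kiam-server-key.pem
      - --ca=/etc/kubernetes/certs/kubernetes-ca.pem
      - --server-address=localhost:443   # must be a name present in the server cert
      - --server-address-refresh=2s
      - --timeout=5s
      - --gateway-timeout-creation=1s    # assumed value; the thread used 50ms
  initialDelaySeconds: 10                # illustrative timings
  periodSeconds: 10
  timeoutSeconds: 10
```

The same command would go in the readinessProbe. Whichever address is used, it only works if that name or IP appears in the server certificate's subject alternative names.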

babatundebusari mentioned this issue Oct 17, 2018

Improve TLS error reporting #115

Open

babatundebusari closed this as completed Oct 24, 2018
