This repository has been archived by the owner on Mar 5, 2024. It is now read-only.

server not able to start #164

Closed
babatundebusari opened this issue Oct 11, 2018 · 9 comments

@babatundebusari

babatundebusari commented Oct 11, 2018

I am having issues with the server starting up properly.
I am getting the errors below:

$ kubectl logs -f po/kiam-server-whqxp

{"level":"info","msg":"starting server","time":"2018-10-11T18:42:19Z"}
{"level":"info","msg":"started prometheus metric listener 0.0.0.0:9620","time":"2018-10-11T18:42:19Z"}
{"level":"info","msg":"detecting arn prefix","time":"2018-10-11T18:42:19Z"}
{"level":"info","msg":"using detected prefix: arn:aws:iam::12345678:role/","time":"2018-10-11T18:42:19Z"}
{"level":"info","msg":"will serve on 0.0.0.0:443","time":"2018-10-11T18:42:19Z"}
{"level":"info","msg":"starting credential manager process 0","time":"2018-10-11T18:42:19Z"}
{"level":"info","msg":"starting credential manager process 1","time":"2018-10-11T18:42:19Z"}
{"level":"info","msg":"starting credential manager process 2","time":"2018-10-11T18:42:19Z"}
{"level":"info","msg":"starting credential manager process 3","time":"2018-10-11T18:42:19Z"}
{"level":"info","msg":"starting credential manager process 4","time":"2018-10-11T18:42:19Z"}
{"level":"info","msg":"starting credential manager process 5","time":"2018-10-11T18:42:19Z"}
{"level":"info","msg":"starting credential manager process 6","time":"2018-10-11T18:42:19Z"}
{"level":"info","msg":"starting credential manager process 7","time":"2018-10-11T18:42:19Z"}
{"level":"info","msg":"started cache controller","time":"2018-10-11T18:42:19Z"}
{"level":"info","msg":"started namespace cache controller","time":"2018-10-11T18:42:19Z"}
{"level":"info","msg":"stopping server","time":"2018-10-11T18:42:54Z"}
{"level":"info","msg":"stopping prometheus metric listener","time":"2018-10-11T18:42:54Z"}
{"level":"info","msg":"stopped","time":"2018-10-11T18:42:54Z"}

I am not sure how to get a more descriptive error than that; the logs give no insight into why it is failing.

@babatundebusari
Author

Not sure if this will ever get help.
@tasdikrahman did this work for you? I am getting this error, and I am sure it must be something I am missing somewhere if it works for everyone else.

Thanks

@tasdikrahman
Contributor

Increasing the verbosity of the logs should tell us more about what's happening. Check this comment by @pingles here: #17 (comment)

@pingles
Contributor

pingles commented Oct 17, 2018

@babatundebusari could you add some detail as to why the server process is being stopped please? My guess is that it's your health check that's failing. As per #115, the gRPC lib only seems to expose more information when you set the environment variables specified.

@babatundebusari
Author

babatundebusari commented Oct 18, 2018

@tasdikrahman where do I add this in this file?
GRPC_GO_LOG_SEVERITY_LEVEL=info GRPC_GO_LOG_VERBOSITY_LEVEL=8

https://github.com/uswitch/kiam/blob/master/deploy/server.yaml
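One possible answer, as a sketch rather than the project's documented method: the two variables need to end up in the environment of the kiam server container, which in server.yaml would mean an env list on that container. The same change can be applied to a running cluster with kubectl set env. The helper name kiam_enable_grpc_logs and the default workload name daemonset/kiam-server are assumptions for illustration; adjust them to whatever your manifest actually defines.

```shell
# Hypothetical helper (not part of kiam): set the gRPC debug variables on the
# kiam server workload in the current namespace.
# Usage: kiam_enable_grpc_logs [WORKLOAD]
kiam_enable_grpc_logs() {
  # Assumption: the server runs as a DaemonSet called kiam-server.
  workload="${1:-daemonset/kiam-server}"
  # kubectl set env adds the variables to the containers in the pod template,
  # which triggers a rollout of new pods with the variables set.
  kubectl set env "$workload" \
    GRPC_GO_LOG_SEVERITY_LEVEL=info \
    GRPC_GO_LOG_VERBOSITY_LEVEL=8
}
```

Calling kiam_enable_grpc_logs with no arguments targets the assumed DaemonSet; pass a different resource (for example deployment/NAME) if your setup differs.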

@pingles
While troubleshooting, I removed the liveness/readiness probes so that the container stays running, then ran the server health check inside the container and got the error below. I have also included the output of a few other commands to show the current state of my setup.

$ kubectl get pods | grep kiam

kiam-agent-wg5zs                                              0/1       CrashLoopBackOff   345        1d
kiam-agent-xcmpb                                              0/1       CrashLoopBackOff   345        1d
kiam-agent-z7dlr                                              0/1       CrashLoopBackOff   346        1d
kiam-server-5t5bl                                             1/1       Running            0          1d
kiam-server-j5czm                                             1/1       Running            0          1d
kiam-server-wdzcw                                             1/1       Running            0          1d


$ kubectl exec kiam-server-wdzcw -- /kiam health --cert=/etc/kubernetes/certs/kiam-server.pem --key=/etc/kubernetes/certs/kiam-server-key.pem --ca=/etc/kubernetes/certs/kubernetes-ca.pem --server-address=127.0.0.1:443 --server-address-refresh=2s --timeout=5s --gateway-timeout-creation=50ms

time="2018-10-18T02:12:05Z" level=fatal msg="error creating server gateway: error dialing grpc server: context deadline exceeded"
command terminated with exit code 1


$ kubectl exec kiam-server-wdzcw -- netstat -antp

Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 127.0.0.1:36124         127.0.0.1:443           TIME_WAIT   -
tcp        0      0 192.168.216.69:34410    192.168.0.1:443         ESTABLISHED 1/kiam
tcp        0      0 :::443                  :::*                    LISTEN      1/kiam
tcp        0      0 :::9620                 :::*                    LISTEN      1/kiam


$ kubectl exec kiam-server-wdzcw -- ps aux

PID   USER     TIME  COMMAND
    1 root      4:37 /kiam server --json-log --level=info --bind=0.0.0.0:443 --cert=/etc/kubernetes/certs/kiam-server.pem --key=/etc/kubernetes/certs/kiam-server-key.pem --ca=/etc/kubernetes/certs/kubernetes-ca.pem --role-base-arn-autodetect --session-duration=15m --sync=1m --prometheus-listen-addr=0.0.0.0:9620 --prometheus-sync-interval=5s
 2283 root      0:00 ps aux

P.S. I removed the readiness/liveness probe sections so that the kiam-server pods show as Running above.

@pingles
Contributor

pingles commented Oct 22, 2018

@babatundebusari if you add those environment variables so that they're picked up by your health check command, then hopefully your process will output some better information about the error.

If it's not a TLS issue (which those env vars would show) then I'd suspect you need to increase the --gateway-timeout-creation=50ms to a longer period: the gRPC client-side load balancing may take longer than that to initialise the pool data.
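A rough sketch of both suggestions combined, for anyone wanting to try them by hand: the health check from earlier in the thread is re-run with the two gRPC variables set for that one command, and with a longer gateway creation timeout. The function name kiam_health_verbose and the 1s default are assumptions for illustration, and it also assumes the container image ships an env binary; every flag and path is copied from the command posted above.

```shell
# Hypothetical helper (not part of kiam): run the health check inside a server
# pod with verbose gRPC logging and a longer gateway creation timeout.
# Usage: kiam_health_verbose POD [GATEWAY_TIMEOUT]
kiam_health_verbose() {
  pod="$1"
  # Illustrative default; the command in this thread used 50ms.
  gateway_timeout="${2:-1s}"
  # "env VAR=value command" sets the variables only for this one invocation.
  kubectl exec "$pod" -- env \
    GRPC_GO_LOG_SEVERITY_LEVEL=info \
    GRPC_GO_LOG_VERBOSITY_LEVEL=8 \
    /kiam health \
    --cert=/etc/kubernetes/certs/kiam-server.pem \
    --key=/etc/kubernetes/certs/kiam-server-key.pem \
    --ca=/etc/kubernetes/certs/kubernetes-ca.pem \
    --server-address=127.0.0.1:443 \
    --server-address-refresh=2s \
    --timeout=5s \
    --gateway-timeout-creation="$gateway_timeout"
}
```

For example, kiam_health_verbose kiam-server-wdzcw 2s runs the check against the pod named earlier with a 2s gateway timeout.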

@pingles
Contributor

pingles commented Oct 23, 2018 via email

@babatundebusari
Author

@pingles

I have gotten it to work now.
The issue was with TLS: the certificate's DNS name has to match, and the kiam-agents are only able to resolve kiam-server via DNS, so I had to have the server cert use kiam-server as its DNS name.
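To see which names a certificate actually carries, something like the following may help. It is only a sketch: cert_sans is a made-up helper name, and it assumes openssl is installed wherever the certificate file is readable.

```shell
# Hypothetical helper: print the Subject Alternative Name entries of a
# PEM-encoded certificate.
# Usage: cert_sans CERT_FILE
cert_sans() {
  # -text dumps the decoded certificate; the SAN values are on the line
  # following the "Subject Alternative Name" header.
  openssl x509 -in "$1" -noout -text | grep -A1 'Subject Alternative Name'
}
```

For example, cert_sans kiam-server.pem should list DNS:kiam-server among its entries if the agents are expected to reach the server under that name.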

The issue I am seeing now is with the liveness and readiness probes in the server DaemonSet YAML file. With the probes in place the server fails, but when I take them out it works fine.

Something is wrong with the health check commands. I have tried everything I can to make them work, even adjusting the gateway-timeout-creation duration, and they are still not working.

@babatundebusari
Author

I will close this now, BUT the server health checks in livenessProbe and readinessProbe are still not working. I will continue working on a fix and, if I don't find one, may open another issue.

@jaygorrell

For anyone else landing here, I had the same issue.

Because 127.0.0.1 was recently removed from the certs, I had to change the health checks to use --server-address=localhost:443 instead of 127.0.0.1.
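A quick way to check which address a given certificate will accept before editing the probes, sketched with openssl's -checkhost and -checkip options. The helper names are made up for illustration, and this assumes openssl is available where the certificate file is.

```shell
# Hypothetical helpers: succeed only if the certificate is valid for the given
# DNS name or IP address. They rely on the message openssl prints
# ("does match certificate" versus "does NOT match certificate").
# Usage: cert_matches_host CERT_FILE HOSTNAME
cert_matches_host() {
  openssl x509 -in "$1" -noout -checkhost "$2" | grep -q 'does match certificate'
}

# Usage: cert_matches_ip CERT_FILE IP_ADDRESS
cert_matches_ip() {
  openssl x509 -in "$1" -noout -checkip "$2" | grep -q 'does match certificate'
}
```

If cert_matches_ip kiam-server.pem 127.0.0.1 fails while cert_matches_host kiam-server.pem localhost succeeds, that would be consistent with the fix described above.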
