
ssh keys too permissive in docker image #1013

Closed
burnettk opened this issue Apr 26, 2019 · 16 comments · 5 participants

@burnettk commented Apr 26, 2019

What were you trying to do?

Swap out a deployment, like this:

telepresence --swap-deployment goalkeeper --run-shell

What did you expect to happen?

The deployment is successfully replaced in the k8s cluster.

What happened instead?

It seems the proxy deployment failed to come up because the Docker image's SSH host key files (such as /etc/ssh/ssh_host_rsa_key) have Linux permissions that are too permissive. sshd wants 0600, but the files are 0640, and although the output only says "warning" a number of times, it appears to be fatal because sshd can't load any host keys. The Docker image it was using was:

datawire/telepresence-k8s:0.99
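
For reference, the key permissions can be checked directly against the published image; a minimal sketch (assuming the stat in the image is the busybox one):

docker run --rm datawire/telepresence-k8s:0.99 \
  stat -c '%a %U:%G %n' /etc/ssh/ssh_host_rsa_key /etc/ssh/ssh_host_ed25519_key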

Automatically included information

Command line: ['/usr/local/bin/telepresence', '--swap-deployment', 'goalkeeper-qa8', '--run-shell']
Version: 0.99
Python version: 3.7.3 (default, Mar 29 2019, 07:52:52) [Clang 10.0.1 (clang-1001.0.46.3)]
kubectl version: Client Version: v1.11.0 // Server Version: v1.13.5
oc version: (error: [Errno 2] No such file or directory: 'oc': 'oc')
OS: Darwin rslmac14810.example.com 18.5.0 Darwin Kernel Version 18.5.0: Mon Mar 11 20:40:32 PDT 2019; root:xnu-4903.251.3~3/RELEASE_X86_64 x86_64

Traceback (most recent call last):
  File "/usr/local/bin/telepresence/telepresence/cli.py", line 136, in crash_reporting
    yield
  File "/usr/local/bin/telepresence/telepresence/main.py", line 60, in main
    remote_info = start_proxy(runner)
  File "/usr/local/bin/telepresence/telepresence/proxy/__init__.py", line 95, in start_proxy
    run_id=run_id,
  File "/usr/local/bin/telepresence/telepresence/proxy/remote.py", line 202, in get_remote_info
    wait_for_pod(runner, remote_info)
  File "/usr/local/bin/telepresence/telepresence/proxy/remote.py", line 134, in wait_for_pod
    "Pod isn't starting or can't be found: {}".format(pod["status"])
RuntimeError: Pod isn't starting or can't be found: {'conditions': [{'lastProbeTime': None, 'lastTransitionTime': '2019-04-26T20:08:06Z', 'status': 'True', 'type': 'Initialized'}, {'lastProbeTime': None, 'lastTransitionTime': '2019-04-26T20:08:06Z', 'message': 'containers with unready status: [goalkeeper-qa8]', 'reason': 'ContainersNotReady', 'status': 'False', 'type': 'Ready'}, {'lastProbeTime': None, 'lastTransitionTime': '2019-04-26T20:08:06Z', 'message': 'containers with unready status: [goalkeeper-qa8]', 'reason': 'ContainersNotReady', 'status': 'False', 'type': 'ContainersReady'}, {'lastProbeTime': None, 'lastTransitionTime': '2019-04-26T20:08:06Z', 'status': 'True', 'type': 'PodScheduled'}], 'containerStatuses': [{'containerID': 'docker://aa93b526f3065f567f1d18b813df07b0db56c8711d4bd4ecc43b037604d73c65', 'image': 'datawire/telepresence-k8s:0.99', 'imageID': 'docker-pullable://datawire/telepresence-k8s@sha256:f9b640fa6640a0437cbafb0a3a238b10daf86e117e5a3ca4d4cd4b07eee76f9c', 'lastState': {'terminated': {'containerID': 'docker://aa93b526f3065f567f1d18b813df07b0db56c8711d4bd4ecc43b037604d73c65', 'exitCode': 1, 'finishedAt': '2019-04-26T20:09:51Z', 'message': "@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@         WARNING: UNPROTECTED PRIVATE KEY FILE!          @
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
Permissions 0640 for '/etc/ssh/ssh_host_rsa_key' are too open.
It is required that your private key files are NOT accessible by others.
This private key will be ignored.
key_load_private: bad permissions
Could not load host key: /etc/ssh/ssh_host_rsa_key
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@         WARNING: UNPROTECTED PRIVATE KEY FILE!          @
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
Permissions 0640 for '/etc/ssh/ssh_host_dsa_key' are too open.
It is required that your private key files are NOT accessible by others.
This private key will be ignored.
key_load_private: bad permissions
Could not load host key: /etc/ssh/ssh_host_dsa_key
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@         WARNING: UNPROTECTED PRIVATE KEY FILE!          @
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
Permissions 0640 for '/etc/ssh/ssh_host_ecdsa_key' are too open.
It is required that your private key files are NOT accessible by others.
This private key will be ignored.
key_load_private: bad permissions
Could not load host key: /etc/ssh/ssh_host_ecdsa_key
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@         WARNING: UNPROTECTED PRIVATE KEY FILE!          @
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
Permissions 0640 for '/etc/ssh/ssh_host_ed25519_key' are too open.
It is required that your private key files are NOT accessible by others.
This private key will be ignored.
key_load_private: bad permissions
Could not load host key: /etc/ssh/ssh_host_ed25519_key
sshd: no hostkeys available -- exiting.

@ark3 (Contributor) commented Apr 29, 2019

That's odd. I cannot reproduce this; the image runs fine in Docker and in Kubernetes, as far as I can see. The keys are generated using ssh-keygen -A and not touched afterwards; we don't set the permissions ourselves. I'm stumped.

$ docker run --rm -it datawire/telepresence-k8s@sha256:f9b640fa6640a0437cbafb0a3a238b10daf86e117e5a3ca4d4cd4b07eee76f9c sh
~ $ cat run.sh 
#!/usr/bin/env sh
set -e
/usr/sbin/sshd -e
exec env PYTHONPATH=/usr/src/app twistd --pidfile= -n -y ./forwarder.py
~ $ /usr/sbin/sshd -e
~ $ 

@burnettk (Author) commented Apr 29, 2019

I followed up your docker run with this:

~ $ stat  /etc/ssh/ssh_host_ed25519_key
  File: /etc/ssh/ssh_host_ed25519_key
  Size: 411       	Blocks: 8          IO Block: 4096   regular file
Device: 801h/2049d	Inode: 1925062     Links: 1
Access: (0640/-rw-r-----)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2019-04-17 19:32:09.000000000
Modify: 2019-04-17 19:32:09.000000000
Change: 2019-04-29 17:13:54.000000000

Notice that the permissions are 0640 in the output, I believe because of this line (line 44 as I write this) in k8s-proxy/Dockerfile:

Step 11/16 : RUN chmod -R g+r /etc/ssh &&     chmod -R g+w /usr/src/app &&     echo "telepresence::1000:0:Telepresence User:/usr/src/app:/bin/ash" >> /etc/passwd

So I assume the permissions are the same for you (0640), but I wish I knew why the later symptom doesn't appear for you but does for me (and my colleagues). I can't get Telepresence to work at all, I believe because of this.

Thanks!

@ark3 (Contributor) commented Apr 29, 2019

When you try this using docker run, can you launch sshd successfully as I did? Or does it spew warnings and kill the container as it did for you in Kubernetes?

Edit: By "kill the container" I really mean "kill the run.sh script" of course, which is to say, does /usr/sbin/sshd -e exit with a non-zero status?

@burnettk (Author) commented Apr 29, 2019

That works:

~ % docker run --rm -it datawire/telepresence-k8s@sha256:f9b640fa6640a0437cbafb0a3a238b10daf86e117e5a3ca4d4cd4b07eee76f9c sh
~ $ /usr/sbin/sshd -e
~ $ echo $?
0

@ark3 (Contributor) commented Apr 29, 2019

Alright, so it behaves the same for you in Docker as it does for me in Docker and for me in Kubernetes. Next question: Why does it behave differently in Kubernetes for you and your colleagues?

Can you try the Kubernetes equivalent of the above?

$ kubectl run experiment -it --rm --image=datawire/telepresence-k8s@sha256:f9b640fa6640a0437cbafb0a3a238b10daf86e117e5a3ca4d4cd4b07eee76f9c --restart=Never -- sh
If you don't see a command prompt, try pressing enter.
~ $ /usr/sbin/sshd -e
~ $ echo $?
0

Thanks for your help debugging this.

@burnettk (Author) commented Apr 29, 2019

~ %  kubectl run experiment -it --rm --image=datawire/telepresence-k8s@sha256:f9b640fa6640a0437cbafb0a3a238b10daf86e117e5a3ca4d4cd4b07eee76f9c --restart=Never -- sh
If you don't see a command prompt, try pressing enter.
~ $ /usr/sbin/sshd -e
~ $ echo $?
0

I think I may have figured it out. Some of our deployments work, but the one that doesn't is running as root via this k8s instruction:

      securityContext:
        fsGroup: 0
        runAsUser: 0

If it's easy to alert the user to this case, that would be great, but now that I know what's going on, it won't be a problem for us going forward.
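
FWIW, the earlier kubectl run experiment can presumably be repeated with the same securityContext forced onto the pod to reproduce this in-cluster; a sketch I haven't run (the --overrides JSON and the expected failure are assumptions):

kubectl run experiment-root -it --rm \
  --image=datawire/telepresence-k8s@sha256:f9b640fa6640a0437cbafb0a3a238b10daf86e117e5a3ca4d4cd4b07eee76f9c \
  --restart=Never \
  --overrides='{"apiVersion":"v1","spec":{"securityContext":{"runAsUser":0,"fsGroup":0}}}' \
  -- sh
# inside the pod: /usr/sbin/sshd -e should print the key warnings and exit non-zero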

@ark3 (Contributor) commented Apr 29, 2019

Telepresence is supposed to use the datawire/telepresence-k8s-priv image for that case. It does this automatically when it notices that it needs to expose a privileged port. You can force it to use the privileged image by exposing a low port, e.g., --expose 80.

Thank you for chasing this down with me. I appreciate your help.

(History I should have remembered: #723, #737, #848, #875, and probably others.)
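
For anyone hitting the same thing, the workaround translates to something like this for the deployment from the original report (a sketch, not run in this thread):

telepresence --swap-deployment goalkeeper --expose 80 --run-shell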

@alexanderbuhler commented May 13, 2019

Same issue here with Telepresence 0.99, compared to 0.97, which works. It's not in privileged mode though; the port is 8080. Is there anything else in the changes between the versions that could be affecting this?

@ark3 (Contributor) commented May 13, 2019

@alexanderbuhler May I see a telepresence.log for your crash (as a Gist, please)? This is also for a swap-deployment, right? Is there anything in the original Deployment that would cause the pod to run as root?

@alexanderbuhler commented May 13, 2019

Sorry, my bad. It has nothing to do with the Telepresence version; it even fails with 0.97:

https://gist.github.com/alexanderbuhler/227fa04daef6e274c6432ac898670cc7

...

@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@         WARNING: UNPROTECTED PRIVATE KEY FILE!          @
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
Permissions 0640 for '/etc/ssh/ssh_host_ecdsa_key' are too open.
It is required that your private key files are NOT accessible by others.
This private key will be ignored.
key_load_private: bad permissions
Could not load host key: /etc/ssh/ssh_host_ecdsa_key
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@         WARNING: UNPROTECTED PRIVATE KEY FILE!          @
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
Permissions 0640 for '/etc/ssh/ssh_host_ed25519_key' are too open.
It is required that your private key files are NOT accessible by others.
This private key will be ignored.
key_load_private: bad permissions
Could not load host key: /etc/ssh/ssh_host_ed25519_key
sshd: no hostkeys available -- exiting.

...

The only thing I can think of is updating the nodes to v1.13.6. We didn't change anything in the deployment files, which had been working for the last >6 weeks.

EDIT: it still works with no problem with the same deployment config on a 1.13.5 cluster... I've studied the 1.13.6 changelog but don't see anything related.

Thanks!

@ark3 ark3 added this to To do in Tel Tracker via automation May 13, 2019

@keatz55 commented May 25, 2019

First of all, I just want to thank the Telepresence devs for your amazing work on this project. Secondly, I have also been running into this issue. Telepresence worked great out of the box for the first few days, and then, without changing anything, I started getting the following as well:

@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@         WARNING: UNPROTECTED PRIVATE KEY FILE!          @
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
Permissions 0640 for '/etc/ssh/ssh_host_ed25519_key' are too open.
It is required that your private key files are NOT accessible by others.
This private key will be ignored.
key_load_private: bad permissions
Could not load host key: /etc/ssh/ssh_host_ed25519_key
sshd: no hostkeys available -- exiting.

However, running telepresence --expose 80 fixes things for me by forcing usage of the datawire/telepresence-k8s-priv image as opposed to the datawire/telepresence-k8s image as mentioned above.

@ark3 (Contributor) commented Jun 3, 2019

To clarify, if I change the permissions of those files to 0600, then that Telepresence image will not work in the normal/common case where the pod is not running as root. The sshd that is launched will be unable to read its keys and will fail to start.

The correct solution involves replacing sshd with something that isn't as fussy.
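
One way to see that trade-off from the Docker side, assuming docker run --user 0 mirrors what runAsUser: 0 does to the pod (a sketch, not run in this thread):

IMG=datawire/telepresence-k8s@sha256:f9b640fa6640a0437cbafb0a3a238b10daf86e117e5a3ca4d4cd4b07eee76f9c
# image default (non-root) user: the 0640 root:root keys are readable via gid 0, so sshd starts
docker run --rm "$IMG" sh -c 'id; /usr/sbin/sshd -e; echo "exit: $?"'
# forced uid 0: sshd applies its strict owner check, rejects the 0640 keys, and exits
docker run --rm --user 0 "$IMG" sh -c 'id; /usr/sbin/sshd -e; echo "exit: $?"'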

@ark3 ark3 removed this from To do in Tel Tracker Jun 3, 2019

@ark3 ark3 added this to In progress in Tel Tracker Jun 6, 2019

@ark3 (Contributor) commented Jun 6, 2019

If you experience this issue without having a runAsUser: 0 in your deployment/pod, and can reproduce it consistently, can you please report on this possible fix?

env TELEPRESENCE_VERSION=0.99-19-g1af769a telepresence --run curl -k https://kubernetes.default/api

The key is to run with that version environment variable set to that value. Thank you!
@keatz55 @alexanderbuhler

Edit to add: This change will be in the next release. Hopefully we can close this issue after that. We still need to address #723, #737, etc. with the proper fix I mentioned in my prior comment.
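
The same test should also apply to the original swap-deployment workflow; a sketch combining the command from the first report with the candidate version (untested):

env TELEPRESENCE_VERSION=0.99-19-g1af769a \
  telepresence --swap-deployment goalkeeper --run-shell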

@bill-within commented Jun 6, 2019

@ark3 I had the issue consistently (without runAsUser: 0), and setting that env variable fixed it.

@bill-within commented Jun 6, 2019

Amazing tool, btw!

@ark3 (Contributor) commented Jun 10, 2019

The change mentioned above is in release 0.100, which is available now. I'm closing this issue, but please re-open it if you're still running into issues.

@ark3 ark3 closed this Jun 10, 2019

Tel Tracker automation moved this from In progress to Done Jun 10, 2019
