swap deployment fails when securityContext contains unprivileged user #875

Closed
kunickiaj opened this issue Dec 15, 2018 · 20 comments
Labels
exploration, stale (Issue is stale and will be closed), v2 (Related to Telepresence 2 (2.y.z))

Comments

@kunickiaj

What were you trying to do?

I was trying to use the swap-deployment feature with one of my deployments.

What did you expect to happen?

I expected to expose two ports for a local process and have traffic directed to them.

What happened instead?

Telepresence died with the traceback below.
Full log in gist: https://gist.github.com/kunickiaj/080328802f437cdc1fbb6722856de4ee

The root cause seems to be the securityContext on the container I wanted to swap.
Other (more privileged) containers do not have this issue. I confirmed that removing the following securityContext from the affected container works around the issue:

securityContext:
  runAsNonRoot: true
  runAsUser: 500

Probably related to #617, #737, and #723.
A possible fix might be to have Telepresence replace the relevant parts of the security context if it does in fact need root (e.g. removing runAsNonRoot). I would also suggest alerting the user to those kinds of modifications.

Automatically included information

Command line: ['/usr/local/bin/telepresence', '--swap-deployment', 'sch-control-hub-pipelinestore:pipelinestore', '--expose', '18631', '--expose', '18632']
Version: 0.96
Python version: 3.6.6 (default, Oct 4 2018, 20:50:27) [GCC 4.2.1 Compatible Apple LLVM 10.0.0 (clang-1000.11.45.2)]
kubectl version: Client Version: v1.13.0 // Server Version: v1.10.0
oc version: oc v3.11.0+0cbc58b // kubernetes v1.11.0+d4cacc0 // features: Basic-Auth // // Server https://192.168.37.162:8443 // kubernetes v1.10.0
OS: Darwin streamsam381331.nerdworld.xyz 18.2.0 Darwin Kernel Version 18.2.0: Mon Nov 12 20:24:46 PST 2018; root:xnu-4903.231.4~2/RELEASE_X86_64 x86_64

Traceback (most recent call last):
  File "/usr/local/bin/telepresence/telepresence/cli.py", line 131, in crash_reporting
    yield
  File "/usr/local/bin/telepresence/telepresence/main.py", line 70, in main
    socks_port, ssh = do_connect(runner, remote_info)
  File "/usr/local/bin/telepresence/telepresence/connect/connect.py", line 99, in do_connect
    return connect(runner_, remote_info, is_container_mode, args.expose)
  File "/usr/local/bin/telepresence/telepresence/connect/connect.py", line 57, in connect
    ssh.wait()
  File "/usr/local/bin/telepresence/telepresence/connect/ssh.py", line 82, in wait
    raise RuntimeError("SSH isn't starting.")
RuntimeError: SSH isn't starting.

Logs:

 20 | Handling connection for 52930
  48.2  60 | Connection to 127.0.0.1 closed by remote host.
  48.2 TEL | [60] exit 255 in 0.56 secs.
  48.4 TEL | [61] Running: ssh -F /dev/null -q -oStrictHostKeyChecking=no -oUserKnownHostsFile=/dev/null -p 52930 telepresence@127.0.0.1 /bin/true
  48.5  20 | Handling connection for 52930
  49.0  61 | Connection to 127.0.0.1 closed by remote host.
  49.0 TEL | [61] exit 255 in 0.57 secs.
  49.3 TEL | [62] Running: ssh -F /dev/null -q -oStrictHostKeyChecking=no -oUserKnownHostsFile=/dev/null -p 52930 telepresence@127.0.0.1 /bin/true
  49.3  20 | Handling connection for 52930
  49.8  62 | Connection to 127.0.0.1 closed by remote host.
  49.8 TEL | [62] exit 255 in 0.57 secs.
  51.1  19 | 2018-12-15T00:16:01+0000 [Poll#error] Failed to contact Telepresence client:
  51.1  19 | 2018-12-15T00:16:01+0000 [Poll#error] An error occurred while connecting: 99: Address not available.
  51.1  19 | 2018-12-15T00:16:01+0000 [Poll#warn] Perhaps it's time to exit?
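
Editorial sketch: the log above shows the same `ssh ... /bin/true` probe being retried until Telepresence gives up with "SSH isn't starting." The following Python is a minimal illustration of that retry pattern, not Telepresence's actual code. The function names, the attempt count, and the delay are assumptions made for the example; only the command line and the error message come from the log and traceback.

```python
import subprocess
import time
from typing import Callable, List


def ssh_probe_command(port: int, user: str = "telepresence",
                      host: str = "127.0.0.1") -> List[str]:
    """Build the probe command that appears in the log (hypothetical helper)."""
    return [
        "ssh", "-F", "/dev/null", "-q",
        "-oStrictHostKeyChecking=no",
        "-oUserKnownHostsFile=/dev/null",
        "-p", str(port),
        f"{user}@{host}",
        "/bin/true",
    ]


def wait_for_ssh(port: int, attempts: int = 30, delay: float = 0.25,
                 run: Callable[[List[str]], int] = subprocess.call) -> None:
    """Retry the probe until it exits 0; raise after `attempts` failures.

    `attempts` and `delay` are illustrative values, not Telepresence defaults.
    """
    for _ in range(attempts):
        if run(ssh_probe_command(port)) == 0:
            return
        time.sleep(delay)
    raise RuntimeError("SSH isn't starting.")
```

In the failing case every probe exits 255 because the remote sshd closes the connection, so the loop runs out of attempts and raises.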

@ark3
Contributor

ark3 commented Dec 17, 2018

Thank you for the issue. Yup, this is #723. And thank you for the suggestions.

In fact, we can do better. If the original container didn't need root (to bind to low ports), then Telepresence doesn't need it either. The unprivileged Telepresence image is hard-coded to run as UID 1000. When the user wants to swap, Telepresence should notice that the original deployment has runAsUser set and modify the swapped copy to request UID 1000. And yes, we should notify the user.
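
Editorial sketch of the idea in this comment, assuming the container spec is available as a plain Python dict. The function name and the notice wording are hypothetical; only the UID 1000 value and the runAsUser field come from the thread. It handles the container-level securityContext only and leaves any pod-level securityContext untouched.

```python
import copy
from typing import Any, Dict, List, Tuple

# UID the unprivileged Telepresence image runs as, per the comment above.
TELEPRESENCE_UID = 1000


def adapt_security_context(
    container: Dict[str, Any], uid: int = TELEPRESENCE_UID
) -> Tuple[Dict[str, Any], List[str]]:
    """Return a copy of `container` that requests `uid`, plus notices for the user."""
    swapped = copy.deepcopy(container)  # never mutate the original spec
    notices: List[str] = []
    ctx = swapped.get("securityContext")
    if ctx and "runAsUser" in ctx and ctx["runAsUser"] != uid:
        notices.append(
            f"securityContext.runAsUser changed from {ctx['runAsUser']} "
            f"to {uid} for the swapped deployment"
        )
        ctx["runAsUser"] = uid
    return swapped, notices
```

With the securityContext from the original report (runAsUser: 500), this returns a copy requesting UID 1000 and one notice to show the user; runAsNonRoot stays as it was.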

@david-l-riley

I'm not sure having the swapped copy request UID 1000 is the solution; that overlaps with a lot of things, including the default initial user for CentOS (which is the "admin" user in a number of deployments, so we can't use it in live deployments on my side of things). Perhaps it would be better to modify so that the instance can run as other UIDs? This is not an unusual use case, at least until Kubernetes gets some sort of UID/GID namespacing capability.

@david-l-riley

I'm also not sure quite what the problem is here; when I leave the deployment up and running, I can still connect via SSH to the server, though I haven't extensively probed to see what can be executed. In general, it's probably best to try to make sure that the provisioned executables can be executed under any UID/GID, since you don't have much way of controlling how people deploy them (and just assuming they can use the hardcoded one is, uh, a bit of an assumption).

@david-l-riley

OK, I tracked it down: even if I relax the permissions in telepresence-k8s on the SSH host secrets and local directory a bit (which makes me uneasy anyway), the problem is that by default, the SSH daemon is running as the user we're expecting to log in. If the securityContext has a different runAsUser applied, the SSH connection is still trying to log in as telepresence, which has a fixed UID of 1000. The SSH server can't switch UID to 1000 from the other one, so it barfs.

Short of k8s supporting Docker's UID namespacing (which, IIRC, is still experimental and not likely to land in k8s anytime soon), this ends up being a core problem; I don't think there's any way to just run an SSH service that doesn't try to switch to a particular user, which would be the obvious solution here if it existed. The other way would be to change the UID of telepresence to that of the current user on first startup, which is messy at best and definitely more than a little risky from a security standpoint.

Thoughts?
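
Editorial sketch: the failure described in this comment can be captured by a deliberately simplified model of UID switching. It assumes that only a daemon running as root (UID 0) can switch to another UID, and it ignores Linux capabilities such as CAP_SETUID. The function name is hypothetical.

```python
def sshd_can_serve_login(daemon_uid: int, login_uid: int) -> bool:
    """Simplified model: root may switch to any UID; anyone else must already match."""
    return daemon_uid == 0 or daemon_uid == login_uid
```

Under this model a daemon forced to UID 500 by runAsUser cannot serve a login for the telepresence user at UID 1000, which matches the connection being closed immediately in the log.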

@janosroden

janosroden commented May 24, 2019

@david-l-riley `UsePrivilegeSeparation no` and libnss_wrapper can solve the sshd UID problems. Check this out: https://github.com/blacksaltIT/docker_ssh
I hope it helps.

@ark3
Contributor

ark3 commented May 24, 2019

@janosroden That approach requires modifying the root filesystem of the container on startup, before launching sshd. This is in fact what Telepresence used to do, but that caused all sorts of problems due to other restrictive Kubernetes setups. What we really need is an ssh server that doesn't rely on /etc/passwd and friends. We could build something using Twisted Conch or some Go stuff or whatever else. We'd love a PR addressing that.

@david-l-riley

Does it really need to be SSH, strictly speaking? That is, would other tunneling solutions be acceptable? Or is it preferred to stick with SSH for Reasons?

@ark3
Contributor

ark3 commented May 28, 2019

Telepresence uses SSH because it covers volumes (via sshfs), networking (via sshuttle), and port forwarding. Other solutions (combinations of tools, perhaps) could work too.

@david-l-riley

Just checking. Removing it would remove some complexity, but it sounds like at the cost of adding significant other complexity. I'll see about opening a PR for either the Go approach (I do like the look of it, but it'll need to be built for all supported architectures) or the Twisted Conch approach (since we already use it).

Looks like I probably also need to look into how sshfs and sshuttle work to determine which solution is going to be optimal for those...

@dbazhal

dbazhal commented Jun 12, 2019

OK, if we can't set up SSH without allowing the use of a specific UID, can we add a configuration option to select the service account for the Telepresence container? I'd rather not touch the default SA, but instead tell Telepresence to use its own SA, with the SCC set up.
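
Editorial sketch of what this request amounts to at the manifest level, assuming the Deployment is available as a Python dict. The function name is hypothetical and no existing Telepresence option is implied; `serviceAccountName` is the standard pod spec field.

```python
import copy
from typing import Any, Dict


def with_service_account(deployment: Dict[str, Any], service_account: str) -> Dict[str, Any]:
    """Return a copy of a Deployment manifest whose pods run under `service_account`."""
    patched = copy.deepcopy(deployment)  # leave the caller's manifest untouched
    pod_spec = (
        patched.setdefault("spec", {})
        .setdefault("template", {})
        .setdefault("spec", {})
    )
    pod_spec["serviceAccountName"] = service_account
    return patched
```

The default service account stays unchanged; only the swapped deployment's pods reference the dedicated account.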

@ark3
Contributor

ark3 commented Jun 12, 2019

@dbazhal That's a good idea! Can you please create an issue requesting that as a new feature? We can work out how to make it happen there. Thank you.

@dbazhal

dbazhal commented Jul 1, 2019

> @dbazhal That's a good idea! Can you please create an issue requesting that as a new feature? We can work out how to make it happen there. Thank you.

I've opened a PR for that: #1067. Could you please take a look at it?

@ReSearchITEng

ReSearchITEng commented Sep 10, 2019

@david-l-riley @dbazhal @kunickiaj - you may want to try this new image, which should solve this issue.
If you give it a try, please share your feedback in PR #1114 or here.
An image built from the new Dockerfile.no_runasany_perms is available here:
docker.io/researchiteng/telepresence:0.101

@janosroden's suggestion is not enough. It used to be required for older versions of sshd, but it is no longer required and does not seem to help with this issue.

@rb3ckers

rb3ckers commented Sep 8, 2020

We have runAsUser, runAsGroup and runAsNonRoot defined on our containers. As a result, we run into this issue with the exact same error as in #1398.

I think we can work around it by dropping runAsUser and runAsGroup and modifying all our docker image builds to use a UID instead of a username such that Kubernetes can verify that the user is a non-root user. At least when testing this it works fine.

It is still pretty inconvenient so I would also be interested in a fix for telepresence itself.
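
Editorial sketch of the check this workaround relies on: once runAsUser is dropped, runAsNonRoot can only be verified when the image's USER is a numeric, non-zero UID. The function below is a hypothetical illustration of that rule, not Kubernetes code.

```python
def non_root_verifiable(image_user: str) -> bool:
    """True if an image USER value is a numeric, non-zero UID (optionally 'uid:gid')."""
    uid = image_user.split(":", 1)[0]
    return uid.isascii() and uid.isdigit() and int(uid) != 0
```

A USER given as a name (for example `appuser`) fails the check even if that name maps to a non-root UID inside the image, which is why the image builds need to switch to a numeric UID.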

@KeisukeYamashita

I have this issue too with the security context below:

      securityContext:
        runAsUser: 10111
        runAsGroup: 10111
        runAsNonRoot: true
        fsGroup: 10111

It fails because I can't mount my host keys.

@ptemmer

ptemmer commented Dec 2, 2020

Same issue here

@ptemmer

ptemmer commented Dec 3, 2020

I managed to get it working by changing runAsGroup to "0" (runAsUser was already 1000). However, I'm wondering whether setting runAsGroup to root is a security risk. Is there a specific reason Telepresence sets the group to "0" while setting the user to "1000"?

@donnyyung
Contributor

I think this still may be an issue in Telepresence 2, so we should do some investigation before we close this ticket.


This issue is stale because it has been open 60 days with no activity. Remove stale label or comment, or this will be closed in 7 days.

github-actions bot added the stale label Aug 16, 2024

This issue was closed because it has been stalled for 7 days with no activity.

github-actions bot closed this as not planned Aug 25, 2024
10 participants