
Telepresence crashes when insufficient permissions exist on the Kubernetes cluster #488

Open
ghshephard opened this issue Mar 2, 2018 · 10 comments
Labels: crash instead of error, v2 (Related to Telepresence 2 (2.y.z))

What were you trying to do?

(please tell us)

What did you expect to happen?

(please tell us)

What happened instead?

(please tell us - the traceback is automatically included, see below)

Automatically included information

Command line: ['/usr/bin/telepresence', '--verbose', '--namespace', 'mase-nagase-devel', '--swap-deployment', 'ma-etl-worker', '--docker-run', '--rm', '-it', '--cap-add=SYS_ADMIN', '-v', '/opt/sightmachine/ma:/opt/sightmachine/ma', 'registry-uw2-aws.int.sightmachine.com/sightmachine/ma:v4.19rc-4-g849008144-dev', 'bash', '-c', '/opt/sightmachine/ma/scripts/telepresence_config_swap.sh; bash']
Version: 0.73
Python version: 3.5.3 (default, Nov 23 2017, 11:34:05) [GCC 6.3.0 20170406]
kubectl version: Client Version: v1.9.3
oc version: (error: [Errno 2] No such file or directory: 'oc')
OS: Linux ghsxp15 4.13.0-041300-generic #201709031731 SMP Sun Sep 3 21:33:09 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
Traceback:

Traceback (most recent call last):
  File "/usr/share/telepresence/libexec/lib/python3.5/site-packages/telepresence/cli.py", line 73, in call_f
    return f(*args, **kwargs)
  File "/usr/share/telepresence/libexec/lib/python3.5/site-packages/telepresence/main.py", line 480, in go
    runner, args
  File "/usr/share/telepresence/libexec/lib/python3.5/site-packages/telepresence/main.py", line 297, in start_proxy
    run_id=run_id,
  File "/usr/share/telepresence/libexec/lib/python3.5/site-packages/telepresence/remote.py", line 217, in get_remote_info
    format(deployment_name)
RuntimeError: Telepresence pod not found for Deployment 'ma-etl-worker'.

Logs:

294364192'} (phase Running)...
 126.5 TL | Labels don't match.
 126.5 TL | Checking {'component': 'worker-importexport', 'app': 'ma', 'pod-template-hash': '1205923109'} (phase Running)...
 126.5 TL | Labels don't match.
 126.5 TL | Checking {'component': 'mongodb', 'app': 'ma', 'pod-template-hash': '1768511666'} (phase Running)...
 126.5 TL | Labels don't match.
 126.5 TL | Checking {'app': 'nginx', 'pod-template-hash': '3738160008'} (phase Running)...
 126.5 TL | Labels don't match.
 126.5 TL | Checking {'component': 'postgresql', 'app': 'ma', 'pod-template-hash': '1736448758'} (phase Running)...
 126.5 TL | Labels don't match.
 126.5 TL | Checking {'component': 'pushgateway', 'app': 'ma', 'pod-template-hash': '1088728244'} (phase Running)...
 126.5 TL | Labels don't match.
 126.5 TL | Checking {'statefulset.kubernetes.io/pod-name': 'rabbitmq-0', 'component': 'rabbitmq', 'app': 'ma', 'controller-revision-hash': 'rabbitmq-66546c968c'} (phase Running)...
 126.5 TL | Labels don't match.

ark3 commented Mar 2, 2018

Could you pass along the full telepresence.log as a gist, please?

Looking at the snippet of the log file included here, I'm curious. Are the pods listed from the correct namespace?

ghshephard commented Mar 5, 2018

This turned out to be a permissions issue. I worked through it with our k8s admin for a couple of hours, and we kept adding permissions one at a time until I was eventually able to get in. Note that in addition to the obvious ones, there are a ton of counter-intuitive permissions, such as "delete pod", that only make sense in hindsight once you grok what Telepresence is doing (turning your laptop into a remote node). This issue can be closed. It could possibly be avoided in the future if the documentation made explicit which rights Telepresence requires (that may already be there; I'm just a user, not a k8s admin).

Thanks!
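A quicker way to see the gap than adding rights one at a time is to dump everything the current context is allowed to do. A minimal sketch using `kubectl auth can-i --list` (the namespace name is taken from the command line at the top of this issue; the fallback message is only for machines without kubectl or a cluster context):

```shell
# Print everything the current kubeconfig user may do in the given namespace.
list_perms() {
  kubectl auth can-i --list --namespace "${1:-default}" 2>/dev/null \
    || echo "could not query cluster (kubectl missing or no current context)"
}

# Namespace taken from the command line in this issue report.
list_perms mase-nagase-devel
```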

ark3 commented Mar 5, 2018

What permissions did you end up adding? Your list would help us construct that future documentation. Thanks.

@rhs added this to Error Feedback in Buckets Mar 8, 2018
@richarddli changed the title from "Oopos." to "Telepresence crashes when insufficient permissions exist on the Kubernetes cluster" Mar 13, 2018
danryan commented Mar 15, 2018

We had to add the following permissions. Note that we added these to an existing set of IAM credentials, so more may have been required. Would you like the full list?

container.clusters.list
container.deployments.create
container.deployments.delete
container.deployments.update
container.nodes.list
container.replicaSets.create
container.replicaSets.delete
container.replicaSets.update
container.replicaSets.updateStatus
container.services.create
container.services.delete
container.services.update

ark3 commented Mar 16, 2018

Yes, please! And thank you for the info.

Also, @plombardi89, I could use your help with turning this information into useful documentation.

danryan commented Mar 29, 2018

@ark3 greetings, this is the full list of permissions our devs have. This includes some that are beyond those required to get telepresence working, as I mentioned before. Cheers!

container.clusters.get
container.clusters.getCredentials
container.clusters.list
container.deployments.create
container.deployments.delete
container.deployments.get
container.deployments.getScale
container.deployments.list
container.deployments.update
container.namespaces.list
container.nodes.get
container.nodes.list
container.pods.delete
container.pods.exec
container.pods.get
container.pods.getLogs
container.pods.list
container.pods.portForward
container.replicaSets.create
container.replicaSets.delete
container.replicaSets.get
container.replicaSets.list
container.replicaSets.update
container.replicaSets.updateStatus
container.secrets.get
container.services.create
container.services.delete
container.services.get
container.services.list
container.services.update
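For reference, the GCP IAM list above roughly maps onto a namespaced Kubernetes RBAC Role along these lines. This is a sketch, not an official Telepresence manifest: the role name and namespace are illustrative, and the cluster-scoped entries in the list (nodes, namespaces, clusters) would need a separate ClusterRole.

```yaml
# Sketch of a Role roughly equivalent to the namespaced permissions above.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: telepresence-user   # illustrative name
  namespace: dev            # illustrative namespace
rules:
- apiGroups: ["apps", "extensions"]
  resources: ["deployments", "replicasets"]
  verbs: ["get", "list", "create", "update", "delete"]
- apiGroups: [""]
  resources: ["services"]
  verbs: ["get", "list", "create", "update", "delete"]
- apiGroups: [""]
  resources: ["pods", "pods/log"]
  verbs: ["get", "list", "delete"]
- apiGroups: [""]
  resources: ["pods/exec", "pods/portforward"]
  verbs: ["create"]
- apiGroups: [""]
  resources: ["secrets"]
  verbs: ["get"]
```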

@ark3 added the a:docs (Issue relates to documentation) and crash instead of error labels Mar 29, 2018
ark3 commented Mar 29, 2018

Again, thank you for the info.

ark3 commented Apr 6, 2018

Let's explore the set of permissions in #569 and then narrow/document them in #568. This issue is about not crashing.

@ark3 removed the a:docs (Issue relates to documentation) label Apr 6, 2018
ark3 commented Apr 13, 2018

Related to not crashing is #288, which would allow Tel to give feedback early.
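Until that lands, a pre-flight check along these lines could surface missing rights before a swap is attempted. This is a sketch: the resource/verb pairs are inferred from the IAM list earlier in this thread, not from an official Telepresence requirements list.

```shell
# Sketch of a pre-flight RBAC check. Each call prints "ok: <verb> <resource>"
# or "MISSING: <verb> <resource>". Also prints MISSING when kubectl is absent
# or there is no usable cluster context.
check() {
  kubectl auth can-i "$1" "$2" --namespace "${NS:-default}" >/dev/null 2>&1 \
    && echo "ok: $1 $2" \
    || echo "MISSING: $1 $2"
}

check create deployments
check update deployments
check get pods
check delete pods
check list replicasets
check create services
check create pods/exec
check create pods/portforward
```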

@donnyyung
We should make this more apparent to the user in Telepresence 2 (also noted on #288), since right now the only way to tell you have insufficient permissions is for Telepresence to fail and then to dig through the logs.

@donnyyung donnyyung added the v2 Related to Telepresence 2 (2.y.z) label Jul 23, 2021