
Failed to swap-deployment #848

Open
tcf909 opened this Issue Dec 2, 2018 · 9 comments


tcf909 commented Dec 2, 2018

What were you trying to do?

Run telepresence for the first time.

What did you expect to happen?

Telepresence to set up a remote pod and proxy the various "things".

What happened instead?

The pod was never successfully started. It was created, but it never finished starting.

The container logs:

kubectl logs platform-322b479a7c4945de80e89caee5a273e4-849954684-qkjqg
/usr/src/app/run.sh: line 7: can't create /etc/passwd: Permission denied

Automatically included information

Command line: ['/usr/local/bin/telepresence', '--swap-deployment', 'platform', '--env-json', '.env']
Version: 0.94
Python version: 3.7.1 (default, Dec 2 2018, 12:51:52) [Clang 10.0.0 (clang-1000.11.45.5)]
kubectl version: Client Version: v1.12.3 // Server Version: v1.11.2-gke.18
oc version: (error: [Errno 2] No such file or directory: 'oc': 'oc')
OS: Darwin Deimos.home 18.2.0 Darwin Kernel Version 18.2.0: Fri Oct 5 19:41:49 PDT 2018; root:xnu-4903.221.2~2/RELEASE_X86_64 x86_64
Traceback:

Traceback (most recent call last):
  File "/usr/local/bin/telepresence/telepresence/cli.py", line 130, in crash_reporting
    yield
  File "/usr/local/bin/telepresence/telepresence/main.py", line 68, in main
    remote_info = start_proxy(runner)
  File "/usr/local/bin/telepresence/telepresence/proxy/__init__.py", line 90, in start_proxy
    run_id=run_id,
  File "/usr/local/bin/telepresence/telepresence/proxy/remote.py", line 200, in get_remote_info
    wait_for_pod(runner, remote_info)
  File "/usr/local/bin/telepresence/telepresence/proxy/remote.py", line 132, in wait_for_pod
    "Pod isn't starting or can't be found: {}".format(pod["status"])
RuntimeError: Pod isn't starting or can't be found: {'conditions': [{'lastProbeTime': None, 'lastTransitionTime': '2018-12-02T21:29:58Z', 'status': 'True', 'type': 'Initialized'}, {'lastProbeTime': None, 'lastTransitionTime': '2018-12-02T21:29:58Z', 'message': 'containers with unready status: [platform]', 'reason': 'ContainersNotReady', 'status': 'False', 'type': 'Ready'}, {'lastProbeTime': None, 'lastTransitionTime': None, 'message': 'containers with unready status: [platform]', 'reason': 'ContainersNotReady', 'status': 'False', 'type': 'ContainersReady'}, {'lastProbeTime': None, 'lastTransitionTime': '2018-12-02T21:29:58Z', 'status': 'True', 'type': 'PodScheduled'}], 'containerStatuses': [{'containerID': 'containerd://bbd9801706d07cf931d27014b6a629d122a9c0fe533cb932e58f11e8abfcf897', 'image': 'docker.io/datawire/telepresence-k8s:0.94', 'imageID': 'docker.io/datawire/telepresence-k8s@sha256:2a8485d8a27b4e84751f714ba2f84753a2684aced4abdcc0894e63ece8013a7d', 'lastState': {'terminated': {'containerID': 'containerd://bbd9801706d07cf931d27014b6a629d122a9c0fe533cb932e58f11e8abfcf897', 'exitCode': 1, 'finishedAt': '2018-12-02T21:31:33Z', 'message': "/usr/src/app/run.sh: line 7: can't create /etc/passwd: Permission denied\n", 'reason': 'Error', 'startedAt': '2018-12-02T21:31:33Z'}}, 'name': 'platform', 'ready': False, 'restartCount': 4, 'state': {'waiting': {'message': 'Back-off 1m20s restarting failed container=platform pod=platform-075428b1f6124d69826c31c640662a31-65b454f9d-26mss_default(6b5137af-f679-11e8-ac49-42010a800022)', 'reason': 'CrashLoopBackOff'}}}], 'hostIP': '10.128.0.4', 'phase': 'Running', 'podIP': '10.60.9.4', 'qosClass': 'Burstable', 'startTime': '2018-12-02T21:29:58Z'}

Logs:

31-65b454f9d-26mss -o json
 127.7 TEL | [213] Capturing: kubectl --context gke_digital-spaces_us-central1_cluster-1 --namespace default get pod platform-075428b1f6124d69826c31c640662a31-65b454f9d-26mss -o json
 128.3 TEL | [214] Capturing: kubectl --context gke_digital-spaces_us-central1_cluster-1 --namespace default get pod platform-075428b1f6124d69826c31c640662a31-65b454f9d-26mss -o json
 129.0 TEL | [215] Capturing: kubectl --context gke_digital-spaces_us-central1_cluster-1 --namespace default get pod platform-075428b1f6124d69826c31c640662a31-65b454f9d-26mss -o json
 129.6 TEL | [216] Capturing: kubectl --context gke_digital-spaces_us-central1_cluster-1 --namespace default get pod platform-075428b1f6124d69826c31c640662a31-65b454f9d-26mss -o json
 130.2 TEL | [217] Capturing: kubectl --context gke_digital-spaces_us-central1_cluster-1 --namespace de

Additional Info:

kubectl get pods
platform-322b479a7c4945de80e89caee5a273e4-849954684-qkjqg   0/1     CrashLoopBackOff   1          11s
kubectl describe pod platform-322b479a7c4945de80e89caee5a273e4-849954684-qkjqg
Name:               platform-322b479a7c4945de80e89caee5a273e4-849954684-qkjqg
Namespace:          default
Priority:           0
PriorityClassName:  <none>
Node:               gke-cluster-1-pool-2-a1fcc53a-kp0s/10.128.0.4
Start Time:         Sun, 02 Dec 2018 13:35:09 -0800
Labels:             name=platform
                    pod-template-hash=405510240
                    telepresence=322b479a7c4945de80e89caee5a273e4
Annotations:        kubernetes.io/limit-ranger: LimitRanger plugin set: cpu request for container platform
                    version: 2
Status:             Running
IP:                 10.60.9.6
Controlled By:      ReplicaSet/platform-322b479a7c4945de80e89caee5a273e4-849954684
Containers:
  platform:
    Container ID:  containerd://477722b89e1c58275df66f827a27716efa938ca960de1719e8cd56a7671c521c
    Image:         datawire/telepresence-k8s:0.94
    Image ID:      docker.io/datawire/telepresence-k8s@sha256:2a8485d8a27b4e84751f714ba2f84753a2684aced4abdcc0894e63ece8013a7d
    Ports:         8080/TCP, 50100/TCP, 55050/TCP
    Host Ports:    0/TCP, 0/TCP, 0/TCP
    State:         Waiting
      Reason:      CrashLoopBackOff
    Last State:    Terminated
      Reason:      Error
      Message:     /usr/src/app/run.sh: line 7: can't create /etc/passwd: Permission denied

      Exit Code:    1
      Started:      Sun, 02 Dec 2018 13:38:13 -0800
      Finished:     Sun, 02 Dec 2018 13:38:13 -0800
    Ready:          False
    Restart Count:  5
    Requests:
      cpu:  100m
    Environment:
      PLATFORM_LOG_LEVEL:                debug
      NODE_OPTIONS:
      PLATFORM_NODESERVICE_REMOTE_URL:   tcp://platform-internal:50100
      PLATFORM_DATASTORE_GOOGLE_CONFIG:  <set to the key 'PLATFORM_DATASTORE_GOOGLE_CONFIG' in secret 'platform'>  Optional: false
      PLATFORM_CACHE_CREDENTIALS:        <set to the key 'PLATFORM_CACHE_CREDENTIALS' in secret 'platform'>        Optional: false
      RUNTIME_CACHE_CREDENTIALS:         <set to the key 'RUNTIME_CACHE_CREDENTIALS' in secret 'platform'>         Optional: false
      TELEPRESENCE_CONTAINER_NAMESPACE:  default (v1:metadata.namespace)
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-gxgmr (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  default-token-gxgmr:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-gxgmr
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason     Age                   From                                         Message
  ----     ------     ----                  ----                                         -------
  Normal   Scheduled  5m6s                  default-scheduler                            Successfully assigned default/platform-322b479a7c4945de80e89caee5a273e4-849954684-qkjqg to gke-cluster-1-pool-2-a1fcc53a-kp0s
  Normal   Pulled     3m30s (x5 over 5m5s)  kubelet, gke-cluster-1-pool-2-a1fcc53a-kp0s  Container image "datawire/telepresence-k8s:0.94" already present on machine
  Normal   Created    3m30s (x5 over 5m5s)  kubelet, gke-cluster-1-pool-2-a1fcc53a-kp0s  Created container
  Normal   Started    3m30s (x5 over 5m5s)  kubelet, gke-cluster-1-pool-2-a1fcc53a-kp0s  Started container
  Warning  BackOff    4s (x25 over 5m3s)    kubelet, gke-cluster-1-pool-2-a1fcc53a-kp0s  Back-off restarting failed container

tcf909 commented Dec 2, 2018

Related to #828


tcf909 commented Dec 2, 2018

Looking through the Dockerfile for the k8s proxy, I'm not sure why you would:

RUN chmod -R g+r /etc/ssh && \
    chmod g+w /etc/passwd && \
    chmod -R g+w /usr/src/app

And then

USER 1000

Given that user 1000 will not be part of the root group, it is bound to run into a permission issue.
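For what it's worth, one possible fix (a sketch, assuming the `g+w` bits were intended for the OpenShift-style arbitrary-UID pattern, not the actual telepresence-k8s Dockerfile) would be to pin the group explicitly so the group-write permissions actually apply:

```dockerfile
# Hypothetical sketch, not the real Dockerfile: the g+w bits only help
# if the process actually runs with GID 0.
RUN chmod -R g+r /etc/ssh && \
    chmod g+w /etc/passwd && \
    chmod -R g+w /usr/src/app

# "USER 1000" alone leaves the primary group to the runtime's defaults;
# "USER 1000:0" pins the group to root so the g+w permissions take effect.
USER 1000:0
```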


tcf909 commented Dec 3, 2018

This looks like a pretty simple permission issue, but since I don't see an easy way to specify deployment details for the proxy pod, or a way to specify a custom image, I'm not sure what a workaround would be.

I see that you allow a specific version of the Docker image to be used (for testing) via an environment variable, but there's no way to override the image entirely when using --swap-deployment.


tcf909 commented Dec 3, 2018

Might be related to #723, but when I manually provision a telepresence pod with securityContext: runAsUser: 0, I'm able to write to /etc/passwd fine.
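The manual workaround amounts to a pod spec along these lines (a sketch; the pod and container names are illustrative, the image tag matches the one Telepresence deploys):

```yaml
# Hedged sketch of a manually provisioned proxy pod; names are hypothetical.
apiVersion: v1
kind: Pod
metadata:
  name: telepresence-proxy-manual
spec:
  containers:
    - name: platform
      image: datawire/telepresence-k8s:0.94
      securityContext:
        runAsUser: 0   # run as root so run.sh can write to /etc/passwd
```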


Contributor

ark3 commented Dec 3, 2018

Fixing #723 will remedy this and #828. Your investigation has led you to reasonable questions. The Dockerfiles are not particularly sensible; I threw them together quickly while fixing #617. The goal then was to avoid "securityContext: runAsUser: 0" as the way of running as root.

I'll be working on this soon and may send a test image your way via this issue. If you can try it on your cluster, that'd be great. Regardless, we'll get this taken care of for 0.95 this week. Thank you for your help.


tcf909 commented Dec 3, 2018

@ark3 Happy to test anything you send.


ark3 commented Dec 3, 2018

@tcf909 Can you please try the 0.94-36-gf8e3b12 image off of the master branch? It (or a descendant based on feedback) will end up in 0.95.

For example:

TELEPRESENCE_VERSION=0.94-36-gf8e3b12 telepresence -m inject-tcp --run curl -sk https://kubernetes.default/api

tcf909 commented Dec 3, 2018

@ark3 Works great. No issues now. Appreciate the quick turnaround.


ark3 commented Dec 3, 2018

Excellent. Telepresence 0.95 will have this fix.
