Cannot swap Pod (with Lifecycle PreStart hook) #587

Closed
alexei-led opened this Issue Apr 12, 2018 · 4 comments


alexei-led commented Apr 12, 2018

What were you trying to do?

Swap a Node.js pod for a local shell and then run node server/index.js from it, but the swap failed before I got that far.

What did you expect to happen?

The swap should work the same way it does with my Go project.

What happened instead?

After about a minute of doing nothing, Telepresence returns the error below.

Automatically included information

Command line: ['/usr/local/bin/telepresence', '--swap-deployment', 'triggers-cfapi', '--namespace', 'triggers', '--method=vpn-tcp', '--expose=80', '--expose=40000', '--run', 'zsh']
Version: 0.82
Python version: 3.6.5 (default, Mar 30 2018, 06:41:53) [GCC 4.2.1 Compatible Apple LLVM 9.0.0 (clang-900.0.39.2)]
kubectl version: Client Version: v1.7.6 // Server Version: v1.8.7-gke.1
oc version: (error: [Errno 2] No such file or directory: 'oc': 'oc')
OS: Darwin gaia-mbp.local 17.5.0 Darwin Kernel Version 17.5.0: Mon Mar 5 22:24:32 PST 2018; root:xnu-4570.51.1~1/RELEASE_X86_64 x86_64
Traceback:

Traceback (most recent call last):
  File "/usr/local/Cellar/telepresence/0.82/libexec/lib/python3.6/site-packages/telepresence/cli.py", line 74, in call_f
    return f(*args, **kwargs)
  File "/usr/local/Cellar/telepresence/0.82/libexec/lib/python3.6/site-packages/telepresence/main.py", line 494, in go_too
    runner, args
  File "/usr/local/Cellar/telepresence/0.82/libexec/lib/python3.6/site-packages/telepresence/main.py", line 304, in start_proxy
    run_id=run_id,
  File "/usr/local/Cellar/telepresence/0.82/libexec/lib/python3.6/site-packages/telepresence/remote.py", line 214, in get_remote_info
    wait_for_pod(runner, remote_info)
  File "/usr/local/Cellar/telepresence/0.82/libexec/lib/python3.6/site-packages/telepresence/remote.py", line 135, in wait_for_pod
    "Pod isn't starting or can't be found: {}".format(pod["status"])
RuntimeError: Pod isn't starting or can't be found: {'conditions': [{'lastProbeTime': None, 'lastTransitionTime': '2018-04-12T09:04:39Z', 'status': 'True', 'type': 'Initialized'}, {'lastProbeTime': None, 'lastTransitionTime': '2018-04-12T09:04:39Z', 'message': 'containers with unready status: [triggers-cfapi]', 'reason': 'ContainersNotReady', 'status': 'False', 'type': 'Ready'}, {'lastProbeTime': None, 'lastTransitionTime': '2018-04-12T09:04:39Z', 'status': 'True', 'type': 'PodScheduled'}], 'containerStatuses': [{'containerID': 'docker://7b35045a5dc6c85e336eea9ff5d4c4120b1b329ff709be32348209c617703c08', 'image': 'datawire/telepresence-k8s:0.82', 'imageID': 'docker-pullable://datawire/telepresence-k8s@sha256:4ce9feb392c1f2a79d71e7cc9417b5038729cfd2e8a03bdc645d8a59b4e3a11c', 'lastState': {'terminated': {'containerID': 'docker://7b35045a5dc6c85e336eea9ff5d4c4120b1b329ff709be32348209c617703c08', 'exitCode': 137, 'finishedAt': '2018-04-12T09:06:10Z', 'message': "Listening...\n2018-04-12T09:05:30+0000 [-] Loading ./forwarder.py...\n2018-04-12T09:05:31+0000 [-] /etc/resolv.conf changed, reparsing\n2018-04-12T09:05:31+0000 [-] Resolver added ('10.151.240.10', 53) to server list\n2018-04-12T09:05:31+0000 [-] SOCKSv5Factory starting on 9050\n2018-04-12T09:05:31+0000 [socks.SOCKSv5Factory#info] Starting factory <socks.SOCKSv5Factory object at 0x7ffa4a5f76a0>\n2018-04-12T09:05:31+0000 [-] DNSDatagramProtocol starting on 9053\n2018-04-12T09:05:31+0000 [-] Starting protocol <twisted.names.dns.DNSDatagramProtocol object at 0x7ffa4a5f7a20>\n2018-04-12T09:05:31+0000 [-] Loaded.\n2018-04-12T09:05:31+0000 [twisted.scripts._twistd_unix.UnixAppLogger#info] twistd 17.9.0 (/usr/bin/python3.6 3.6.1) starting up.\n2018-04-12T09:05:31+0000 [twisted.scripts._twistd_unix.UnixAppLogger#info] reactor class: twisted.internet.epollreactor.EPollReactor.\n", 'reason': 'Error', 'startedAt': '2018-04-12T09:05:30Z'}}, 'name': 'triggers-cfapi', 'ready': False, 'restartCount': 1, 'state': {'waiting': 
{'message': 'Back-off 10s restarting failed container=triggers-cfapi pod=triggers-cfapi-b6bd5dd78-wk77z_triggers(87e8069f-3e30-11e8-81b4-42010a80010a)', 'reason': 'CrashLoopBackOff'}}}], 'hostIP': '10.240.0.11', 'phase': 'Running', 'podIP': '10.148.28.160', 'qosClass': 'BestEffort', 'startTime': '2

alexei-led commented Apr 22, 2018

OK, I found the issue.

Telepresence cannot swap a deployment if the container has Lifecycle Hooks, especially PreStart.

If the PreStart hook tries to invoke a command that does not exist in the Telepresence image, Kubernetes considers the Pod to be in an invalid state and tries to "fix" the issue.

So it's another problem related to the approach taken by Telepresence: swapping the deployment.

IMHO, this issue can be resolved with the PR, but I need to check this.

@alexei-led alexei-led changed the title from Cannot replace Pod (standard node app with 2 ports) to Cannot replace Pod (with Lifecycle `PreStart` hook) Apr 22, 2018

@alexei-led alexei-led changed the title from Cannot replace Pod (with Lifecycle `PreStart` hook) to Cannot swap Pod (with Lifecycle `PreStart` hook) Apr 22, 2018

@alexei-led alexei-led changed the title from Cannot swap Pod (with Lifecycle `PreStart` hook) to Cannot swap Pod (with Lifecycle PreStart hook) Apr 22, 2018


alexei-led commented Apr 22, 2018

@ark3 can you verify that your last PR fixes this issue too? It's supposed to fix it.


Contributor

ark3 commented Apr 22, 2018

Telepresence removes some portions of the container definition when copying:

            for unneeded in [
                "command", "args", "livenessProbe", "readinessProbe",
                "workingDir"
            ]:

Perhaps we need to add lifecycle to that list. At the moment, I expect that the lifecycle section of the spec will be copied blindly, thereby making Telepresence fail in that case even with the PR's code path.

I'll need to learn more about lifecycle hooks to make a proper fix.
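
As a rough illustration of the idea above (not the actual Telepresence code), here is a minimal sketch of dropping `lifecycle` along with the other keys when copying a container definition. The helper name `strip_container`, the `extra` parameter, the image name, and the hook command are all hypothetical. Only the list of five keys comes from the snippet quoted above.

```python
import copy

# Keys quoted in the comment above as already being removed.
UNNEEDED = ["command", "args", "livenessProbe", "readinessProbe", "workingDir"]


def strip_container(container, extra=("lifecycle",)):
    """Return a copy of a container spec without the unneeded keys.

    `extra` holds the additional keys proposed in this thread; the
    function name and parameter are hypothetical, for illustration only.
    """
    cleaned = copy.deepcopy(container)  # leave the caller's dict untouched
    for unneeded in list(UNNEEDED) + list(extra):
        cleaned.pop(unneeded, None)  # ignore keys that are not present
    return cleaned


# Hypothetical container spec with a hook whose command would not
# exist in the Telepresence image.
original = {
    "name": "triggers-cfapi",
    "image": "example/app:latest",
    "command": ["node", "server/index.js"],
    "lifecycle": {"postStart": {"exec": {"command": ["/app/warmup.sh"]}}},
    "ports": [{"containerPort": 80}],
}

stripped = strip_container(original)
print(sorted(stripped))  # ['image', 'name', 'ports']
```

With the hook removed from the copied spec, the swapped-in container no longer tries to run a command that only exists in the original image.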

@ark3 ark3 added the bug label Apr 22, 2018


alexei-led commented Apr 23, 2018

@ark3 it may be worth making this a flag, so you can specify which sections of the Deployment to skip.
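
To make the suggestion concrete, here is a small sketch of what such an option could look like using `argparse`. The option name `--strip-section`, the program name `swap-sketch`, and the helper names are invented for illustration. This is an assumption about one possible design, not an existing Telepresence flag.

```python
import argparse

# Keys that are always stripped, taken from the snippet quoted earlier in the thread.
DEFAULT_STRIPPED = ["command", "args", "livenessProbe", "readinessProbe", "workingDir"]


def build_parser():
    parser = argparse.ArgumentParser(prog="swap-sketch")
    parser.add_argument(
        "--strip-section",  # hypothetical option, NOT a real Telepresence flag
        action="append",
        default=None,
        metavar="KEY",
        help="extra container key to drop when copying the Deployment",
    )
    return parser


def sections_to_strip(argv):
    """Combine the built-in keys with any user-supplied ones, without duplicates."""
    opts = build_parser().parse_args(argv)
    result = list(DEFAULT_STRIPPED)
    for key in opts.strip_section or []:
        if key not in result:
            result.append(key)
    return result


print(sections_to_strip(["--strip-section", "lifecycle"]))
```

A repeatable option keeps the defaults intact while letting users opt out of further sections case by case.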

@ark3 ark3 self-assigned this Apr 26, 2018

@ark3 ark3 closed this in 1fc79cd May 11, 2018

ark3 added a commit that referenced this issue May 11, 2018

Merge pull request #638 from datawire/strip-lifecycle
Strip out pod lifecycle hooks in swap-deployment. Fixes #587