
Exiting second container breaks first opened container (MacOS) #735

Closed
jeff-minard-ck opened this issue Aug 13, 2018 · 4 comments
Labels
bug Something isn't working

Comments

@jeff-minard-ck

What were you trying to do?

Attempting to follow the docker tutorial.

What did you expect to happen?

Expected the curl requests to go back to the older qotm version running in the cluster.

What happened instead?

Something crashed and the alpine image started behaving very oddly (e.g., not echoing every other keystroke; maybe the interactive prompt got overlaid weirdly or something).

Steps

Started the pre-baked qotm k8s service:

$ kubectl get service qotm
NAME      TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)    AGE
qotm      ClusterIP   172.20.248.83   <none>        5000/TCP   1h

Then in terminal 1 ran the alpine container and curled:

$ telepresence --docker-run -i -t alpine /bin/sh
Volumes are rooted at $TELEPRESENCE_ROOT. See https://telepresence.io/howto/volumes.html for details.

Password:
/ # apk add --no-cache curl
...
OK: 6 MiB in 15 packages
/ # curl http://qotm:5000/
{
  "hostname": "qotm-6795b8b47b-nkd2d",
  "ok": true,
  "quote": "A small mercy is nothing at all?",
  "time": "2018-08-14T05:41:30.541697",
  "version": "1.3"
}
/ #

In terminal 2, started the override deployment for qotm (note: I've only changed the version number in the Python source):

$ telepresence --swap-deployment qotm --docker-run --rm -it -v $(pwd):/service qotm-dev:latest
Volumes are rooted at $TELEPRESENCE_ROOT. See https://telepresence.io/howto/volumes.html for details.

Password:
2018-08-13 22:43:07 QotM 4.8 INFO: initializing on 926f463babdd:5000
 * Serving Flask app "qotm" (lazy loading)
 * Environment: production
   WARNING: Do not use the development server in a production environment.
   Use a production WSGI server instead.
 * Debug mode: on
2018-08-13 22:43:07 QotM 4.8 INFO:  * Running on http://0.0.0.0:5000/ (Press CTRL+C to quit)
2018-08-13 22:43:07 QotM 4.8 INFO:  * Restarting with stat
2018-08-13 22:43:07 QotM 4.8 INFO: initializing on 926f463babdd:5000
2018-08-13 22:43:07 QotM 4.8 WARNING:  * Debugger is active!
2018-08-13 22:43:07 QotM 4.8 INFO:  * Debugger PIN: 152-437-188

And back on terminal 1, curl it:

/ # curl http://qotm:5000/
{
  "hostname": "926f463babdd",
  "ok": true,
  "quote": "Abstraction is ever present.",
  "time": "2018-08-13T22:43:40.299915",
  "version": "4.8"
}

Perfect. As seen in terminal 2:

2018-08-13 22:43:40 QotM 4.8 DEBUG: GET /: session None, username None, handler statement
2018-08-13 22:43:40 QotM 4.8 INFO: 127.0.0.1 - - [13/Aug/2018 22:43:40] "GET / HTTP/1.1" 200 -

Now, when I ctrl-c the terminal 2 window:

^C $

No other output -- clean exit as far as I can see; terminal 1 also shows nothing. Then I issue another curl in terminal 1:

/ # curl http://qotm:5000/

Looks like there's a bug in our code. Sorry about that!

And then the trace gets wrapped all kinds of weird around the screen.

Automatically included information

Command line: ['/usr/local/bin/telepresence', '--docker-run', '-i', '-t', 'alpine', '/bin/sh']
Version: 0.90
Python version: 3.7.0 (default, Jul 23 2018, 20:22:55) [Clang 9.1.0 (clang-902.0.39.2)]
kubectl version: Client Version: v1.11.1 // Server Version: v1.9.2
oc version: (error: [Errno 2] No such file or directory: 'oc': 'oc')
OS: Darwin Jmina-mp 17.7.0 Darwin Kernel Version 17.7.0: Thu Jun 21 22:53:14 PDT 2018; root:xnu-4570.71.2~1/RELEASE_X86_64 x86_64
Traceback:

Traceback (most recent call last):
  File "/usr/local/Cellar/telepresence/0.90/libexec/lib/python3.7/site-packages/telepresence/cli.py", line 87, in call_f
    return f(*args, **kwargs)
  File "/usr/local/Cellar/telepresence/0.90/libexec/lib/python3.7/site-packages/telepresence/main.py", line 115, in main
    wait_for_exit(runner, user_process, subprocesses)
  File "/usr/local/Cellar/telepresence/0.90/libexec/lib/python3.7/site-packages/telepresence/cleanup.py", line 113, in wait_for_exit
    dead_process = processes.any_dead()
  File "/usr/local/Cellar/telepresence/0.90/libexec/lib/python3.7/site-packages/telepresence/cleanup.py", line 89, in any_dead
    self.killall()
  File "/usr/local/Cellar/telepresence/0.90/libexec/lib/python3.7/site-packages/telepresence/cleanup.py", line 75, in killall
    killer()
  File "/usr/local/Cellar/telepresence/0.90/libexec/lib/python3.7/site-packages/telepresence/container.py", line 59, in kill
    runner.check_call(sudo + ["docker", "stop", "--time=1", name])
  File "/usr/local/Cellar/telepresence/0.90/libexec/lib/python3.7/site-packages/telepresence/runner.py", line 168, in check_call
    track, "Running", "ran", out_cb, err_cb, args, **kwargs
  File "/usr/local/Cellar/telepresence/0.90/libexec/lib/python3.7/site-packages/telepresence/runner.py", line 157, in run_command
    raise CalledProcessError(retcode, args)
subprocess.CalledProcessError: Command '['docker', 'stop', '--time=1', 'telepresence-1534200576-024105-25486']' returned non-zero exit status 1.

Full log:

telepresence.log

@ark3 changed the title from "Exiting second container breaks first opened container" to "Exiting second container breaks first opened container (MacOS)" Aug 14, 2018
@ark3
Contributor

ark3 commented Aug 14, 2018

Thanks for the bug report!

This problem is specific to the MacOS implementation of the container method. The shutdown process of any container method session removes the magic loopback IP. This causes all other container method sessions to fail as soon as anything in the container tries to talk to anything in the cluster, starting with socat falling down.

  15.8  20 | 2018/08/13 15:50:22 socat[25605] E read(6, 0x7fe434001c00, 8192): Can't assign requested address

This is prompted by sshuttle trying to send data through its connection, which relies on socat to mediate between sshuttle running in the container and the kubectl port-forward running on the host machine, bound to localhost.
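The relay role socat plays in that chain can be sketched, purely for illustration, as a minimal bidirectional TCP forwarder (addresses and ports here are hypothetical; this is not Telepresence's actual code):

```python
# Illustrative sketch of the relay role socat performs in this setup:
# accept connections on one address and pipe bytes both ways to another.
# Addresses and ports are hypothetical, not Telepresence's.
import socket
import threading

def pipe(src, dst):
    """Copy bytes from src to dst until EOF, then shut down both ends."""
    try:
        while True:
            data = src.recv(8192)
            if not data:
                break
            dst.sendall(data)
    except OSError:
        pass
    finally:
        for s in (src, dst):
            try:
                s.shutdown(socket.SHUT_RDWR)
            except OSError:
                pass

def relay(listen_addr, target_addr):
    """Listen on listen_addr; forward each connection to target_addr."""
    srv = socket.socket()
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind(listen_addr)
    srv.listen(5)
    while True:
        client, _ = srv.accept()
        upstream = socket.create_connection(target_addr)
        threading.Thread(target=pipe, args=(client, upstream), daemon=True).start()
        threading.Thread(target=pipe, args=(upstream, client), daemon=True).start()
```

If the target address (here, the port-forward endpoint bound to a loopback alias) disappears, `create_connection` fails with "Can't assign requested address", which matches the socat error above.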

We can fix this by addressing #224. We can avoid this and similar issues entirely by fixing #726. We can also work around this, sort of, by creating but not removing the magic loopback IP, in essence leaking it. Having an extra loopback IP address doesn't really cause any harm unless the user is running other weird, network-tweaking tools besides Telepresence. It will get cleaned up by a reboot.
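One common way to avoid tearing down a shared resource (like the magic loopback IP) while another session still needs it is reference counting: only the last exiting session performs the cleanup. A minimal sketch of that idea, with illustrative names and not Telepresence's actual implementation:

```python
# Illustrative sketch (not Telepresence's actual code): guard a shared
# resource, such as the magic loopback alias, with a reference count so
# that only the last exiting session destroys it.
import threading

class SharedResource:
    def __init__(self, create, destroy):
        self._create = create    # called when the first user acquires
        self._destroy = destroy  # called when the last user releases
        self._count = 0
        self._lock = threading.Lock()

    def acquire(self):
        with self._lock:
            if self._count == 0:
                self._create()
            self._count += 1

    def release(self):
        with self._lock:
            self._count -= 1
            if self._count == 0:
                self._destroy()
```

With this shape, exiting the second container would only decrement the count; the loopback alias would survive until the first session also exits, instead of being removed out from under it.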

@ark3 ark3 added the bug Something isn't working label Aug 14, 2018
@jeff-minard-ck
Author

Thanks for the insight; I appreciate the fullness of the answer. For now, I'll just take things one container at a time :)

@ark3
Contributor

ark3 commented Dec 6, 2018

Telepresence 0.95 includes #726, which I believe fixes this issue. Can you please give it a try?

@ark3
Contributor

ark3 commented Jan 24, 2019

I believe this is fixed. If you run into it again, please re-open. Thanks!

@ark3 ark3 closed this as completed Jan 24, 2019