
Exiting second container breaks first opened container (MacOS) #735

Closed
jeff-minard-ck opened this issue Aug 13, 2018 · 4 comments
Labels
bug Something isn't working

Comments

@jeff-minard-ck

What were you trying to do?

Attempting to follow the docker tutorial.

What did you expect to happen?

Expected the curl requests to go back to the older qotm version running in the cluster.

What happened instead?

Something crashed and the alpine image started behaving very oddly (e.g., not echoing every other keystroke; maybe the interactive prompt got overlaid weirdly or something).

Steps

Started the pre-baked qotm k8s service:

$ kubectl get service qotm
NAME      TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)    AGE
qotm      ClusterIP   172.20.248.83   <none>        5000/TCP   1h

Then in terminal 1 ran the alpine container and curled:

$ telepresence --docker-run -i -t alpine /bin/sh
Volumes are rooted at $TELEPRESENCE_ROOT. See https://telepresence.io/howto/volumes.html for details.

Password:
/ # apk add --no-cache curl
...
OK: 6 MiB in 15 packages
/ # curl http://qotm:5000/
{
  "hostname": "qotm-6795b8b47b-nkd2d",
  "ok": true,
  "quote": "A small mercy is nothing at all?",
  "time": "2018-08-14T05:41:30.541697",
  "version": "1.3"
}
/ #

In terminal 2, started the override deployment for qotm (note: I've only changed the version number in the Python source):

$ telepresence --swap-deployment qotm --docker-run --rm -it -v $(pwd):/service qotm-dev:latest
Volumes are rooted at $TELEPRESENCE_ROOT. See https://telepresence.io/howto/volumes.html for details.

Password:
2018-08-13 22:43:07 QotM 4.8 INFO: initializing on 926f463babdd:5000
 * Serving Flask app "qotm" (lazy loading)
 * Environment: production
   WARNING: Do not use the development server in a production environment.
   Use a production WSGI server instead.
 * Debug mode: on
2018-08-13 22:43:07 QotM 4.8 INFO:  * Running on http://0.0.0.0:5000/ (Press CTRL+C to quit)
2018-08-13 22:43:07 QotM 4.8 INFO:  * Restarting with stat
2018-08-13 22:43:07 QotM 4.8 INFO: initializing on 926f463babdd:5000
2018-08-13 22:43:07 QotM 4.8 WARNING:  * Debugger is active!
2018-08-13 22:43:07 QotM 4.8 INFO:  * Debugger PIN: 152-437-188

And back on terminal 1, curl it:

/ # curl http://qotm:5000/
{
  "hostname": "926f463babdd",
  "ok": true,
  "quote": "Abstraction is ever present.",
  "time": "2018-08-13T22:43:40.299915",
  "version": "4.8"
}

Perfect. As seen in terminal 2:

2018-08-13 22:43:40 QotM 4.8 DEBUG: GET /: session None, username None, handler statement
2018-08-13 22:43:40 QotM 4.8 INFO: 127.0.0.1 - - [13/Aug/2018 22:43:40] "GET / HTTP/1.1" 200 -

Now, when I ctrl-c the terminal 2 window:

^C $

No other output -- clean exit as far as I can see; terminal 1 also shows nothing. Then I issue another curl in terminal 1:

/ # curl http://qotm:5000/

Looks like there's a bug in our code. Sorry about that!

And then the trace gets wrapped all kinds of weird around the screen.

Automatically included information

Command line: ['/usr/local/bin/telepresence', '--docker-run', '-i', '-t', 'alpine', '/bin/sh']
Version: 0.90
Python version: 3.7.0 (default, Jul 23 2018, 20:22:55) [Clang 9.1.0 (clang-902.0.39.2)]
kubectl version: Client Version: v1.11.1 // Server Version: v1.9.2
oc version: (error: [Errno 2] No such file or directory: 'oc': 'oc')
OS: Darwin Jmina-mp 17.7.0 Darwin Kernel Version 17.7.0: Thu Jun 21 22:53:14 PDT 2018; root:xnu-4570.71.2~1/RELEASE_X86_64 x86_64
Traceback:

Traceback (most recent call last):
  File "/usr/local/Cellar/telepresence/0.90/libexec/lib/python3.7/site-packages/telepresence/cli.py", line 87, in call_f
    return f(*args, **kwargs)
  File "/usr/local/Cellar/telepresence/0.90/libexec/lib/python3.7/site-packages/telepresence/main.py", line 115, in main
    wait_for_exit(runner, user_process, subprocesses)
  File "/usr/local/Cellar/telepresence/0.90/libexec/lib/python3.7/site-packages/telepresence/cleanup.py", line 113, in wait_for_exit
    dead_process = processes.any_dead()
  File "/usr/local/Cellar/telepresence/0.90/libexec/lib/python3.7/site-packages/telepresence/cleanup.py", line 89, in any_dead
    self.killall()
  File "/usr/local/Cellar/telepresence/0.90/libexec/lib/python3.7/site-packages/telepresence/cleanup.py", line 75, in killall
    killer()
  File "/usr/local/Cellar/telepresence/0.90/libexec/lib/python3.7/site-packages/telepresence/container.py", line 59, in kill
    runner.check_call(sudo + ["docker", "stop", "--time=1", name])
  File "/usr/local/Cellar/telepresence/0.90/libexec/lib/python3.7/site-packages/telepresence/runner.py", line 168, in check_call
    track, "Running", "ran", out_cb, err_cb, args, **kwargs
  File "/usr/local/Cellar/telepresence/0.90/libexec/lib/python3.7/site-packages/telepresence/runner.py", line 157, in run_command
    raise CalledProcessError(retcode, args)
subprocess.CalledProcessError: Command '['docker', 'stop', '--time=1', 'telepresence-1534200576-024105-25486']' returned non-zero exit status 1.

Full log:

telepresence.log

@ark3 changed the title from "Exiting second container breaks first opened container" to "Exiting second container breaks first opened container (MacOS)" Aug 14, 2018
@ark3
Contributor

ark3 commented Aug 14, 2018

Thanks for the bug report!

This problem is specific to the MacOS implementation of the container method. The shutdown process of any container method session removes the magic loopback IP. This causes all other container method sessions to fail as soon as anything in the container tries to talk to anything in the cluster, starting with socat falling down.

  15.8  20 | 2018/08/13 15:50:22 socat[25605] E read(6, 0x7fe434001c00, 8192): Can't assign requested address

This is prompted by sshuttle trying to send data through its connection, which relies on socat to mediate between sshuttle running in the container and the kubectl port-forward running on the host machine, bound to localhost.
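The relay role socat plays in that chain can be sketched, purely for illustration, as a minimal bidirectional TCP forwarder (addresses and ports here are hypothetical; this is not Telepresence's actual code):

```python
# Illustrative sketch of the relay role socat performs in this setup:
# accept connections on one address and pipe bytes both ways to another.
# Addresses and ports are hypothetical, not Telepresence's.
import socket
import threading

def pipe(src, dst):
    """Copy bytes from src to dst until EOF, then shut down both ends."""
    try:
        while True:
            data = src.recv(8192)
            if not data:
                break
            dst.sendall(data)
    except OSError:
        pass
    finally:
        for s in (src, dst):
            try:
                s.shutdown(socket.SHUT_RDWR)
            except OSError:
                pass

def relay(listen_addr, target_addr):
    """Listen on listen_addr; forward each connection to target_addr."""
    srv = socket.socket()
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind(listen_addr)
    srv.listen(5)
    while True:
        client, _ = srv.accept()
        upstream = socket.create_connection(target_addr)
        threading.Thread(target=pipe, args=(client, upstream), daemon=True).start()
        threading.Thread(target=pipe, args=(upstream, client), daemon=True).start()
```

If the target address (here, the port-forward endpoint bound to a loopback alias) disappears, `create_connection` fails with "Can't assign requested address", which matches the socat error above.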

We can fix this by addressing #224. We can avoid this and similar issues entirely by fixing #726. We can also work around this, sort of, by creating but not removing the magic loopback IP, in essence leaking it. Having an extra loopback IP address doesn't really cause any harm unless the user is running other weird, network-tweaking tools besides Telepresence. It will get cleaned up by a reboot.
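One common way to avoid tearing down a shared resource (like the magic loopback IP) while another session still needs it is reference counting: only the last exiting session performs the cleanup. A minimal sketch of that idea, with illustrative names and not Telepresence's actual implementation:

```python
# Illustrative sketch (not Telepresence's actual code): guard a shared
# resource, such as the magic loopback alias, with a reference count so
# that only the last exiting session destroys it.
import threading

class SharedResource:
    def __init__(self, create, destroy):
        self._create = create    # called when the first user acquires
        self._destroy = destroy  # called when the last user releases
        self._count = 0
        self._lock = threading.Lock()

    def acquire(self):
        with self._lock:
            if self._count == 0:
                self._create()
            self._count += 1

    def release(self):
        with self._lock:
            self._count -= 1
            if self._count == 0:
                self._destroy()
```

With this shape, exiting the second container would only decrement the count; the loopback alias would survive until the first session also exits, instead of being removed out from under it.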

@ark3 ark3 added the bug Something isn't working label Aug 14, 2018
@jeff-minard-ck
Author

Thanks for the insight; I appreciate the fullness of the answer. For now, I'll just take things one container at a time :)

@ark3
Contributor

ark3 commented Dec 6, 2018

Telepresence 0.95 includes #726, which I believe fixes this issue. Can you please give it a try?

@ark3
Contributor

ark3 commented Jan 24, 2019

I believe this is fixed. If you run into it again, please re-open. Thanks!

@ark3 ark3 closed this as completed Jan 24, 2019