Telepresence segfaults (sometimes) in inject-tcp mode #195

Open
itamarst opened this Issue Jun 26, 2017 · 6 comments

@itamarst
Contributor

itamarst commented Jun 26, 2017

On my home network (but not at the office), the following will segfault somewhere inside torsocks (both v2.1 and v2.2) in inject-tcp mode:

$ python3 -c "import socket; socket.getfqdn('')"

@itamarst itamarst added the bug label Jun 26, 2017

@itamarst itamarst added this to In progress in Telepresence Jun 28, 2017

@itamarst itamarst moved this from In progress to Next in Telepresence Jun 28, 2017

@itamarst itamarst removed this from Next in Telepresence Jul 20, 2017

@exarkun

Contributor

exarkun commented Jan 11, 2018

Some investigation today reveals one explanation for this behavior (and explains the difference across different networks).

getfqdn('') gets the host's name with gethostname(), then calls gethostbyaddr on the resulting name. gethostbyaddr forward resolves the name to an address (using getaddrinfo), then reverse resolves that address to some names. Some further logic follows that is not relevant here.
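The lookup chain described above can be sketched in Python. This is a simplified paraphrase of the behavior, not a copy of the standard library source; `getfqdn_sketch` and its injectable `gethostname` / `gethostbyaddr` parameters are hypothetical names, added so the flow can be exercised without touching the network.

```python
import socket


def getfqdn_sketch(name="", gethostname=socket.gethostname,
                   gethostbyaddr=socket.gethostbyaddr):
    """Approximate the flow of socket.getfqdn (illustrative only)."""
    name = name.strip()
    if not name or name == "0.0.0.0":
        # Step 1: an empty argument means "this host".
        name = gethostname()
    try:
        # Step 2: forward resolve the name, then reverse resolve the address.
        hostname, aliases, _addresses = gethostbyaddr(name)
    except OSError:
        # A resolution *error* is handled gracefully: fall back to the name.
        return name
    # Step 3: prefer the first candidate that looks fully qualified.
    for candidate in [hostname] + list(aliases):
        if "." in candidate:
            return candidate
    return hostname
```

Note that the `except` branch only helps when the resolver reports an error. In the crash discussed here the process dies inside the C-level lookup, so Python never gets a chance to run it.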

When torsocks is used, the forward resolution results in a request to the SOCKS proxy to do the resolve (command 0xF0 aka RESOLVE). Note that the name is resolved in the SOCKS proxy context. As a first approximation, this requires the name be published in the public DNS if the resolution is to succeed. Many hostnames will not be published thus and so resolution will fail at this point. If the name cannot be resolved, an error is returned. getfqdn is perfectly well equipped to handle such a resolution failure and manages to continue on its way. If the name can be resolved, the result is returned and getfqdn proceeds to the next step - reverse resolving that address to some names.

When torsocks is used, the reverse resolve results in the command 0xF1 aka RESOLVE-PTR being sent to the SOCKS proxy. This part of the protocol isn't implemented by the Telepresence SOCKS proxy. It returns an error. This causes torsocks to segfault:
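For reference, here is a rough sketch of what the two requests look like on the wire, assuming the standard SOCKS5 request layout (version, command, reserved, address type, address, port) with Tor's extension command codes. The function names are hypothetical, and the zero port value is an assumption; the bytes torsocks actually sends may differ in detail.

```python
import socket
import struct

SOCKS_VERSION = 0x05
CMD_RESOLVE = 0xF0      # forward lookup: name -> address
CMD_RESOLVE_PTR = 0xF1  # reverse lookup: address -> name
ATYP_IPV4 = 0x01
ATYP_DOMAIN = 0x03


def build_resolve_request(hostname):
    """Build a RESOLVE request carrying a length-prefixed domain name."""
    encoded = hostname.encode("ascii")
    if len(encoded) > 255:
        raise ValueError("hostname too long for a SOCKS5 domain field")
    header = struct.pack("!BBBBB", SOCKS_VERSION, CMD_RESOLVE, 0x00,
                         ATYP_DOMAIN, len(encoded))
    # The port field is assumed to be unused for lookups, so send zero.
    return header + encoded + struct.pack("!H", 0)


def build_resolve_ptr_request(ipv4_address):
    """Build a RESOLVE-PTR request carrying a packed IPv4 address."""
    header = struct.pack("!BBBB", SOCKS_VERSION, CMD_RESOLVE_PTR, 0x00,
                         ATYP_IPV4)
    return header + socket.inet_aton(ipv4_address) + struct.pack("!H", 0)
```

A proxy that only understands the first command has to answer the second with an error reply, which is the situation described in this comment.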

(gdb) bt
#0  __strlen_avx2 () at ../sysdeps/x86_64/multiarch/strlen-avx2.S:93
#1  0x00007ffff7bc5817 in tsocks_gethostbyaddr_r () from /usr/lib/x86_64-linux-gnu/torsocks/libtorsocks.so
#2  0x00000000005cf583 in socket_gethostbyaddr (self=<optimized out>, args=<optimized out>) at ../Modules/socketmodule.c:5211
#3  0x00000000004c4a3d in _PyCFunction_FastCallDict (kwargs=0x0, nargs=140737327740952, args=0xafeec8, func_obj=0x7ffff67359d8) at ../Objects/methodobject.c:234
#4  _PyCFunction_FastCallKeywords (func=func@entry=0x7ffff67359d8, stack=stack@entry=0xafeec8, nargs=nargs@entry=1, kwnames=kwnames@entry=0x0) at ../Objects/methodobject.c:294
#5  0x000000000054f3c4 in call_function (pp_stack=pp_stack@entry=0x7fffffffbb98, oparg=<optimized out>, kwnames=kwnames@entry=0x0) at ../Python/ceval.c:4824
#6  0x0000000000553aaf in _PyEval_EvalFrameDefault (f=<optimized out>, throwflag=<optimized out>) at ../Python/ceval.c:3322
#7  0x000000000054efc1 in PyEval_EvalFrameEx (throwflag=0, f=0xafed28) at ../Python/ceval.c:753
#8  _PyEval_EvalCodeWithName (_co=0x7ffff7e75420, globals=globals@entry=0x7ffff7ef24c8, locals=locals@entry=0x0, args=<optimized out>, argcount=argcount@entry=1, kwnames=0x0, kwargs=0xae8d50, kwcount=0, 
    kwstep=1, defs=0x7ffff66db5d8, defcount=1, kwdefs=0x0, closure=0x0, name=0x7ffff7eebf10, qualname=0x7ffff7eebf10) at ../Python/ceval.c:4153
#9  0x000000000054f24d in fast_function (kwnames=0x0, nargs=1, stack=<optimized out>, func=0x7ffff668d7b8) at ../Python/ceval.c:4965
#10 call_function (pp_stack=pp_stack@entry=0x7fffffffbe38, oparg=<optimized out>, kwnames=kwnames@entry=0x0) at ../Python/ceval.c:4845
#11 0x0000000000553aaf in _PyEval_EvalFrameDefault (f=<optimized out>, throwflag=<optimized out>) at ../Python/ceval.c:3322
#12 0x000000000054efc1 in PyEval_EvalFrameEx (throwflag=0, f=0xae8bc8) at ../Python/ceval.c:753
#13 _PyEval_EvalCodeWithName (_co=_co@entry=0x7ffff7eec780, globals=globals@entry=0x7ffff7f31168, locals=locals@entry=0x7ffff7f31168, args=args@entry=0x0, argcount=argcount@entry=0, kwnames=kwnames@entry=0x0, 
    kwargs=0x0, kwcount=0, kwstep=2, defs=0x0, defcount=0, kwdefs=0x0, closure=0x0, name=0x0, qualname=0x0) at ../Python/ceval.c:4153
#14 0x000000000054ff73 in PyEval_EvalCodeEx (closure=0x0, kwdefs=0x0, defcount=0, defs=0x0, kwcount=0, kws=0x0, argcount=0, args=0x0, locals=locals@entry=0x7ffff7f31168, globals=globals@entry=0x7ffff7f31168, 
    _co=_co@entry=0x7ffff7eec780) at ../Python/ceval.c:4174
#15 PyEval_EvalCode (co=co@entry=0x7ffff7eec780, globals=globals@entry=0x7ffff7f31168, locals=locals@entry=0x7ffff7f31168) at ../Python/ceval.c:730
#16 0x000000000042c37a in run_mod (arena=0x7ffff7f47258, flags=0x7fffffffbf5c, locals=0x7ffff7f31168, globals=0x7ffff7f31168, filename=0x7ffff7e4ce70, mod=<optimized out>) at ../Python/pythonrun.c:980
#17 PyRun_StringFlags (flags=0x7fffffffbf5c, locals=0x7ffff7f31168, globals=0x7ffff7f31168, start=257, str=0x7ffff7edc190 "import socket; socket.getfqdn(\"google.com\")\n") at ../Python/pythonrun.c:904
#18 PyRun_SimpleStringFlags (command=0x7ffff7edc190 "import socket; socket.getfqdn(\"google.com\")\n", flags=flags@entry=0x7fffffffc06c) at ../Python/pythonrun.c:421
#19 0x0000000000441178 in run_command (cf=0x7fffffffc06c, command=0xa8ed10 L"import socket; socket.getfqdn(\"google.com\")\n") at ../Modules/main.c:299
#20 Py_Main (argc=argc@entry=3, argv=argv@entry=0xa8e7a0) at ../Modules/main.c:747
#21 0x0000000000421f64 in main (argc=3, argv=<optimized out>) at ../Programs/python.c:69

This is probably a torsocks bug (I expect torsocks' author expects it to talk to Tor and doesn't expect this not-implemented response from Tor; but segfaulting still seems like the wrong result).

And the segfault only happens in some network configurations because it depends on a reverse lookup happening and getfqdn('') only causes that to happen if the system has a hostname that forward resolves (in the context of the Telepresence proxy pod).

This could probably be fixed in torsocks or our SOCKS proxy or both.

@ark3

Contributor

ark3 commented Jan 11, 2018

This is also why NodeJS doesn't work with method inject-tcp. We should fix our proxy.

@plombardi89 plombardi89 added this to Bugs in Roadmap Feb 20, 2018

@rhs rhs added this to Bug in Buckets Mar 8, 2018

@rhs rhs moved this from Bug to Robustness in Buckets Mar 8, 2018

@rhs rhs moved this from Robustness to Bug in Buckets Mar 8, 2018

@plombardi89

Contributor

plombardi89 commented Mar 13, 2018

@exarkun do you have any insight into this?

@exarkun

Contributor

exarkun commented Mar 14, 2018

Yea, we could fix this by adding RESOLVE-PTR to our socks proxy. Probably <1 day task. I can give it a shot after #400 if that makes sense. I guess being able to use NodeJS w/ inject-tcp might be a pretty high-impact improvement.
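As an illustration of the scope of that task, below is a minimal sketch of how a proxy might turn a RESOLVE-PTR request into a reply. It is not the code that landed in #520. It assumes the reply uses the usual SOCKS5 layout with the hostname returned as a domain-type (0x03) address, and uses the standard SOCKS5 reply codes 0x04 (host unreachable) and 0x07 (command not supported) for failures. `build_resolve_ptr_reply` and its `reverse_lookup` parameter are hypothetical names.

```python
import socket
import struct

# Error replies carry a dummy IPv4 address (0.0.0.0) and port 0.
_HOST_UNREACHABLE = b"\x05\x04\x00\x01" + bytes(6)
_COMMAND_NOT_SUPPORTED = b"\x05\x07\x00\x01" + bytes(6)


def build_resolve_ptr_reply(request, reverse_lookup=socket.gethostbyaddr):
    """Return SOCKS5 reply bytes for a RESOLVE-PTR (0xF1) IPv4 request."""
    if len(request) < 10:
        return _COMMAND_NOT_SUPPORTED
    version, command, _reserved, atyp = struct.unpack("!BBBB", request[:4])
    if (version, command, atyp) != (0x05, 0xF1, 0x01):
        return _COMMAND_NOT_SUPPORTED
    ip_address = socket.inet_ntoa(request[4:8])
    try:
        hostname = reverse_lookup(ip_address)[0].encode("ascii")
    except (OSError, UnicodeError):
        # Always send a well-formed error reply rather than nothing.
        return _HOST_UNREACHABLE
    if len(hostname) > 255:
        return _HOST_UNREACHABLE
    header = struct.pack("!BBBBB", 0x05, 0x00, 0x00, 0x03, len(hostname))
    return header + hostname + struct.pack("!H", 0)
```

Passing the resolver in as a parameter keeps the sketch testable without DNS; a real proxy would also need to do the lookup without blocking its event loop.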

@ark3 ark3 closed this in #520 Mar 30, 2018

Roadmap automation moved this from Bugs to Completed Mar 30, 2018

@ark3

Contributor

ark3 commented Apr 2, 2018

Despite the work explained in #520 (comment), the original reproducer (python3 -c "import socket; socket.getfqdn('')") still segfaults, at least on a Linux box with a reverse-resolvable hostname.

@ark3

Contributor

ark3 commented Apr 9, 2018

See also #423 for another crasher
