mtr 0.85 crash when using tcp mode #28

Closed
felixonmars opened this issue Aug 9, 2013 · 9 comments

@felixonmars

Steps to reproduce:

# mtr -i0.01 -T 8.8.8.8

It crashed in about a second (SIGABRT).

Backtrace:

(gdb) bt
#0  0x00007f471e2df1c9 in raise () from /usr/lib/libc.so.6
#1  0x00007f471e2e05c8 in abort () from /usr/lib/libc.so.6
#2  0x00007f471e31d037 in __libc_message () from /usr/lib/libc.so.6
#3  0x00007f471e3a4e67 in __fortify_fail () from /usr/lib/libc.so.6
#4  0x00007f471e3a3070 in __chk_fail () from /usr/lib/libc.so.6
#5  0x00007f471e3a4dd7 in __fdelt_warn () from /usr/lib/libc.so.6
#6  0x000000000040786c in net_process_fds ()
#7  0x000000000040bae5 in select_loop ()
#8  0x0000000000402b7b in main ()

OS is Arch Linux x86_64, with glibc 2.17
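
Frame #5, `__fdelt_warn`, is glibc's fortified bounds check behind `FD_SET`/`FD_ISSET`: it aborts when the descriptor number is negative or is `FD_SETSIZE` or larger (1024 by default on Linux). That suggests, though the backtrace alone does not prove it, that TCP mode had accumulated more open sockets than `select()` can track. A minimal sketch of a guard, where `fd_set_checked` is a hypothetical helper and not mtr code:

```c
#include <stddef.h>
#include <sys/select.h>

/* Hypothetical helper, not part of mtr: add fd to an fd_set only if it fits.
 * Returns 0 on success, -1 if the descriptor cannot be represented. */
static int fd_set_checked(int fd, fd_set *set)
{
    if (set == NULL || fd < 0 || fd >= FD_SETSIZE)
        return -1;      /* FD_SET() here would be out of bounds */
    FD_SET(fd, set);
    return 0;
}
```

A caller could close the socket and skip the probe when this returns -1, rather than letting the fortified check abort the process.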

@rewolff
Collaborator

rewolff commented Aug 9, 2013

Does it also crash for you when you specify a sensible value for the interval?

As written, you're trying to send 100 packets per second to about 10 hosts, or about 1000 connection requests per second. I can imagine "unexpected" things happening.

My mtr also bombed out, after about ten seconds. But you're running this "as root" to work around the "lower limit of 1.0 seconds per round". As the root user you can break your system by issuing the wrong commands or giving the wrong arguments.

On your computer things go wrong (a segfault) after about a second. On my computer it exits with "Socket: succes!" after 5 seconds or so. So the limit depends on the computer somehow. I could program in a new limit, that today works on your computer and on mine. But after a few years that limit will be too high as computers and networks have gotten faster. So I don't like such an arbitrary limit.

So... You are stress-testing your system and network by attempting to start 500 to 1500 connections per second. And lo and behold, something goes wrong.

I suspect the problem might be caused by some fundamental IPv4 limit: maybe you're running out of socket numbers or something like that.

If it breaks, you get to keep both pieces.

If you can reproduce this with normal parameters, feel free to reopen this issue. It is a "won't fix" for me.

@rewolff rewolff closed this as completed Aug 9, 2013
@depaoli

depaoli commented Apr 19, 2014

Hi! I have the same issue with mtr 0.85 installed through MacPorts: "bind(): Undefined error: 0".
The command used is "mtr --tcp www.google.it", but it fails every time, even with other parameters.
By doing "dtruss mtr --tcp www.google.com":

...
stat64("/usr/lib/libxar.1.dylib\0", 0x7FFF5xx31368, 0x7FFF5xx32200)      = 0 0
 My traceroute  [v0.85]
pc.lan (0.0.0.0)                                       Sat Apr 19 17:02:25 2014
Keys:  Help   Display mode   Restart statistics   Order of fields   quit
                                       Packets               Pings
 Host                                Loss%   Snt   Last   Avg  Best  Wrst StDev

 My traceroute  [v0.85]
pc.lan (0.0.0.0)                                       Sat Apr 19 17:02:25 2014
Keys:  Help   Display mode   Restart statistics   Order of fields   quit
                                       Packets               Pings
 Host                                Loss%   Snt   Last   Avg  Best  Wrst StDev
bind(): Inappropriate ioctl for device
...
bind(0xD, 0x7FFxxF2xx610, 0x80)      = -1 Err#22
...
ioctl(0x2, 0x8048xx15, 0x7FE8B84xx1E0)       = -1 Err#25
ioctl(0x2, 0x8048xx15, 0x7FFF5xx32500)       = -1 Err#25
...

Any thoughts about "bind(): Inappropriate ioctl for device"?

Thanks

@rewolff
Collaborator

rewolff commented Apr 20, 2014

If you specify the "-r" option, the "truss" output will not be interspersed with the escape sequences used for the full-screen output. Otherwise, send the truss output somewhere else. On Linux we have "strace", which does the same as truss; it has an -o option to save the output to a file.

@depaoli

depaoli commented Apr 25, 2014

It looks like "-r" is not available:

$ dtruss -r
/usr/bin/dtruss: illegal option -- r
USAGE: dtruss [-acdefholLs] [-t syscall] { -p PID | -n name | command }
          -p PID          # examine this PID
          -n name         # examine this process name
          -t syscall      # examine this syscall only
          -a              # print all details
          -c              # print syscall counts
          -d              # print relative times (us)
          -e              # print elapsed times (us)
          -f              # follow children
          -l              # force printing pid/lwpid
          -o              # print on cpu times
          -s              # print stack backtraces
          -L              # don't print pid/lwpid
          -b bufsize      # dynamic variable buf size
   eg,
       dtruss df -h       # run and examine "df -h"
       dtruss -p 1871     # examine PID 1871
       dtruss -n tar      # examine all processes called "tar"
       dtruss -f test.sh  # run test.sh and follow children

Anyway, I have played with Xcode and it looks like something is going wrong around net.c line 340 (http://tinyurl.com/l8b5akn):

if (bind(s, (struct sockaddr *) &local, sizeof (local))) {   //here we get the error 22 - ?EINVAL?
    display_clear();
    perror("bind()");
    exit(EXIT_FAILURE);
}

By modifying this code to:

if (bind(s, (struct sockaddr *) &local, sizeof (struct sockaddr))) {
    display_clear();
    perror("bind()");
    exit(EXIT_FAILURE);
}

bind() seems happy, but no hops are printed in the output, just:

matrix:mtr dpm$ sudo ./mtr --tcp --report --port 80 8.8.8.8
Start: Fri Apr 25 12:38:23 2014
HOST: pc.lan                  Loss%   Snt   Last   Avg  Best  Wrst StDev
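
The `0x80` length in the earlier `bind(0xD, ..., 0x80)` trace is 128 bytes, which is the usual size of `struct sockaddr_storage`. BSD-derived kernels, macOS included, tend to reject a `bind()` length that does not match the address family with `EINVAL`. `sizeof (struct sockaddr)` happens to equal the IPv4 size but is too short for IPv6. A sketch of choosing the length from the family, assuming `local` is a `struct sockaddr_storage`; `sockaddr_len` is a hypothetical helper, not mtr code:

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>

/* Hypothetical helper: the address length to hand to bind()/connect()
 * for an address stored in a sockaddr_storage. Returns 0 for families
 * this sketch does not know about. */
static socklen_t sockaddr_len(const struct sockaddr_storage *ss)
{
    switch (ss->ss_family) {
    case AF_INET:
        return (socklen_t) sizeof(struct sockaddr_in);
    case AF_INET6:
        return (socklen_t) sizeof(struct sockaddr_in6);
    default:
        return 0;
    }
}
```

The call would then read `bind(s, (struct sockaddr *) &local, sockaddr_len(&local))`, again assuming `local` has its family field filled in.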

yvs2014 added a commit to yvs2014/mtr085 that referenced this issue Jun 14, 2014
Looks like the same issue:
traviscross#28
https://bugs.launchpad.net/mtr/+bug/1273486
https://bugs.launchpad.net/mtr/+bug/1327036

 % uname -srm; mtr -r --tcp localhost
 FreeBSD 10.0-RELEASE-p4 amd64
 Start: Fri Jun  6 22:20:00 2014
 bind(): Invalid argument
 %

 % uname -srm; mtr -r --tcp localhost
 NetBSD 6.1.3_PATCH i386
 Start: Fri Jun  6 22:23:13 2014
 bind(): Invalid argument
 %

 % uname -a; mtr -r --tcp localhost
 SunOS opensolaris 5.11 oi_151a9 i86pc i386 i86pc
 Start: Wed Jun 11 19:49:28 2014
 bind(): Invalid argument
 %

OpenSolaris/OI: compile fails with FIONBIO

Looks like the same issue:
traviscross#27
traviscross#35
https://bugs.launchpad.net/mtr/+bug/1273486

 % make
 make  all-recursive
 Making all in img
 depbase=`echo net.o | sed 's|[^/]*$|.deps/&|;s|\.o$||'`;\
 gcc -DHAVE_CONFIG_H  -I.       -g -O2 -Wall -MT net.o -MD -MP -MF $depbase.Tpo -c -o net.o net.c &&\
 mv -f $depbase.Tpo $depbase.Po
 net.c: In function `net_send_tcp':
 net.c:360: error: `FIONBIO' undeclared (first use in this function)
 net.c:360: error: (Each undeclared identifier is reported only once
 net.c:360: error: for each function it appears in.)
 *** Error code 1

fix: #define BSD_COMP
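
An alternative that sidesteps `FIONBIO` altogether is POSIX `fcntl()` with `O_NONBLOCK`, which should not need `BSD_COMP` on Solaris-derived systems. A minimal sketch, where `set_nonblocking` is a hypothetical helper and not what the commit does:

```c
#include <fcntl.h>

/* Hypothetical helper: switch a descriptor to non-blocking mode while
 * preserving its other status flags. Returns 0 on success, -1 on error
 * (errno is set by fcntl). */
static int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags == -1)
        return -1;
    if (fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1)
        return -1;
    return 0;
}
```
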
yvs2014 added a commit to yvs2014/mtr085 that referenced this issue Jun 14, 2014
Looks like the same issue at the bottom of this report:
traviscross#28

 % uname -a; ./mtr -rT localhost; echo $?
 Start: Thu Jun 12 17:09:11 2014
 SunOS opensolaris 5.11 oi_151a9 i86pc i386 i86pc
 141

 % uname -srm; ./mtr -4rT localhost; echo $?
 NetBSD 6.1.3_PATCH i386
 Start: Thu Jun 12 16:14:47 2014
 141

 % gdb ./mtr
 (gdb) set args -4rT localhost
 (gdb) r
 Start: Thu Jun 12 16:15:36 2014
 Program received signal SIGPIPE, Broken pipe.
 [Switching to LWP 1]
 0xbb767387 in write () from /usr/lib/libc.so.12

 (gdb) bt
 #0  0xbb767387 in write () from /usr/lib/libc.so.12
 #1  0xbb470263 in write () from /usr/lib/libpthread.so.1
 #2  0x0804fea6 in net_process_fds (writefd=0xbfbfeaac) at net.c:1542
 #3  0x080544a7 in select_loop () at select.c:264
 #4  0x0804d5ee in main (argc=3, argv=0xbfbfec58) at mtr.c:719

 (gdb) list net.c:1541,1543
 1541        if (fd > 0 && FD_ISSET(fd, writefd)) {
 1542          r = write(fd, "G", 1);
 1543          /* if write was successful, or connection refused we have
 (gdb)

fix:
 If the socket is connected, getpeername() will return 0.
 Else getpeername() will return ENOTCONN, and read(,,1) will produce the right errno.
 This is a combination of suggestions from Douglas C. Schmidt and Ken Keys.
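
Exit status 141 above is 128 + 13, meaning the process was killed by SIGPIPE, which matches the gdb session: the `write()` at net.c:1542 hit a socket whose connection had failed. The fix described here can be sketched roughly as follows; `connect_result` is a hypothetical name and this is not the committed code:

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: report the outcome of a non-blocking connect()
 * without write(), so no SIGPIPE can be raised.
 * Returns 0 if the socket is connected, otherwise an errno value. */
static int connect_result(int fd)
{
    struct sockaddr_storage peer;
    socklen_t len = sizeof(peer);
    char ch;

    if (getpeername(fd, (struct sockaddr *) &peer, &len) == 0)
        return 0;                   /* connected */
    if (errno != ENOTCONN)
        return errno;               /* e.g. EBADF */

    /* Not connected: a one-byte read fails with the pending
     * connect error, such as ECONNREFUSED. */
    if (read(fd, &ch, 1) < 0)
        return errno;
    return ENOTCONN;
}
```
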
@uhle

uhle commented Jun 20, 2014

@numericillustration

Though this issue is closed, it's still happening, so in case it is useful I've created a gist of the dtruss output from running the following in a separate terminal

sudo dtruss -flesal -n mtr  

during the crash from running: sudo mtr --tcp 8.8.8.8

@QwertyZW

QwertyZW commented Jul 5, 2018

Running into this with

~% mtr --version
mtr 0.86

@chrcoluk

I have this problem on FreeBSD 11.2, in TCP mode with no other flags specified

mtr --version

mtr UNKNOWN

O_o, it is 0.92 tho.

@rewolff
Collaborator

rewolff commented Dec 4, 2018

You can help me reproduce it by restating the command I need to try to get what you see. Maybe a cut-and-paste of both the command and the crashing output.
