MTR does not play nice with nohup #231

Open
svdasein opened this issue Nov 20, 2017 · 20 comments

@svdasein

I've come across two instances, one on an older Ubuntu Intel machine and one on a very old SPARC Solaris machine, where attempting to run a loop around "mtr -c 1" in a backgrounded nohupped job with output & err > /dev/null and input < /dev/null leads to mtr sitting in the background apparently doing nothing (save, I think, for periodically checking a filehandle with a select call). I've tried all kinds of permutations and tricks and am becoming really puzzled by this. Am I doing something wrong, or is there something keeping it from working in that context?

@svdasein
Author

Additional detail:

I tried this on a new distribution - same problem.

Replicate with:

nohup mtr -c 1 --split 8.8.8.8

This will never return. Strace shows:

select(10, [0 6 8 9], [], NULL, {0, 0}) = 1 (in [0], left {0, 0})
[repeating - very fast]

If you run:

nohup mtr -c 1 --split 8.8.8.8 < /dev/null > /dev/null 2>&1 &

you get the same strace pattern and it never returns.
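
For readers following along: the strace pattern above is consistent with standard input sitting at end-of-file. A descriptor at EOF is always reported as readable by select, so a loop that keeps it in the read set never blocks. The Python sketch below illustrates only that operating-system behaviour, using /dev/null as the input. It is not mtr's code, and the function name `poll_devnull` is made up for the illustration.

```python
import os
import select


def poll_devnull(rounds=3):
    """Return (readable, data) pairs from polling /dev/null `rounds` times."""
    results = []
    fd = os.open(os.devnull, os.O_RDONLY)
    try:
        for _round in range(rounds):
            # Zero timeout, like the {0, 0} timeval in the strace output.
            ready, _w, _x = select.select([fd], [], [], 0)
            readable = fd in ready
            # At EOF, read() returns an empty bytes object instead of blocking.
            data = os.read(fd, 1) if readable else None
            results.append((readable, data))
    finally:
        os.close(fd)
    return results


if __name__ == "__main__":
    for readable, data in poll_devnull():
        print(readable, data)
```

Every round reports the descriptor as readable and then reads zero bytes, which is the same shape as a fast loop around select on descriptor 0.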

@rewolff
Collaborator

rewolff commented Nov 21, 2017

I've fixed a bug that might present itself just like this "for macos". But I suspect I may have fixed it for you as well. Are you running the latest version?

When select returns '1 (in[0])', the code can only do one thing, and that is issue a read on file descriptor zero:
read (0, "", 1) = -1 Exxx
You're seeing those as well, right?

@rewolff
Collaborator

rewolff commented Nov 21, 2017

I've looked at the "split.c" code, and it had a bug similar to the one that caused problems on macos. Fixed now. (I like single-line fixes.)

Still I can't reproduce your failure mode with the new or before-fix code.

@svdasein
Author

There was a circumstance in which I was also seeing a read(0...) but I can't seem to recreate that at the moment. I've tried with two versions of mtr: 0.86 and 0.92.25-fd41 (which is latest). The fast loop around a select is a constant though.

@svdasein
Author

I'll spin something fresh up & see if I can reproduce on that - if I can I'll send you creds to log in.

@svdasein
Author

I have set up a vm on which I've been able to re-create the problem with both the latest (from git) code and with the distribution's default package.

The older version is just a fast loop around:

select(10, [0 6 8 9], [], NULL, {0, 0}) = 1 (in [0], left {0, 0})

The latest version presents with two pids - one a child of another. The one that's taking a bunch of cpu is doing this:

select(8, [0 5 7], [], NULL, {0, 0}) = 1 (in [0], left {0, 0})
select(1, [0], NULL, NULL, {0, 0}) = 1 (in [0], left {0, 0})
read(0, "", 1) = 0
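
A note on the trace above: a read that returns 0 bytes is the usual end-of-file signal. One common way to keep a select loop from spinning in that situation is to stop watching a descriptor once it has returned zero bytes. The sketch below shows that idea in Python under stated assumptions: the function name, the one-byte reads, and the idle-poll limit are all hypothetical, and this is not the change that was made in mtr.

```python
import os
import select


def read_keys_until_eof(fd, timeout=0.1, max_idle_polls=50):
    """Collect single-byte reads from fd, stopping once read() reports EOF."""
    keys = []
    watching = True
    idle = 0
    while watching and idle < max_idle_polls:
        ready, _w, _x = select.select([fd], [], [], timeout)
        if fd not in ready:
            # Nothing to read yet; a real program would do its other work here.
            idle += 1
            continue
        chunk = os.read(fd, 1)
        if chunk == b"":
            # EOF: stop watching this descriptor, otherwise select would
            # report it readable on every pass and the loop would spin.
            watching = False
        else:
            keys.append(chunk)
    return keys
```

With input from /dev/null the function returns an empty list after a single pass instead of looping.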

I have set up a login on the host for you with sudo. I will send a private message with credentials. Feel free to login at your convenience.

@svdasein
Author

svdasein commented Nov 21, 2017

Hm ok I guess I can't send a private message - is there some other way?

Edit/answer by REW: If you do a git clone, you can look in the commit messages to find Email addresses.

@svdasein
Author

rewolff - if you're on freenode, I'm dparker & I'm in #mtr right now - If you go there I'll send you the creds in a pm

@svdasein
Author

I was just looking at split.c - if that's where this is looping, I think maybe what's required is a new command line switch to just run without any option for interaction.

The use case is this:

I need to run a daemon (background, nohup) that repeatedly traces to an IP (-c 1). I'm taking the stdout from that and sending to logstash and from there to elasticsearch. This helped me track down a problem with thrashing routes and probably will be a useful tool going forward. However the inability to run mtr under nohup etc keeps me from being able to fire and forget - I have to leave a console open.

@rewolff
Collaborator

rewolff commented Nov 21, 2017

Can you try again with the LATEST git. :-)
(as of 3 seconds ago. Has not even passed the "it compiles ship it" test. )

@svdasein
Author

I pulled and rebuilt:

rewolff@mtrtest:~$ /opt/mtr/sbin/mtr --version
mtr 0.92.26-ebdb

Foreground:

rewolff@mtrtest:~$ /opt/mtr/sbin/mtr -c 1 --split 8.8.8.8 > test.txt 
rewolff@mtrtest:~$ cat test.txt
1 23.92.24.3 0 1 1 0 0 0
2 173.230.159.2 0 1 1 0 0 0
3 206.197.187.50 0 1 1 4 4 4
3 as15169.sfmix.org 0 1 1 4 4 4
4 108.170.242.81 0 1 1 3 3 3
5 216.239.63.65 0 1 1 5 5 5
6 8.8.8.8 0 1 1 4 4 4
6 google-public-dns-a.google.com 0 1 1 4 4 4

nohup foreground:

rewolff@mtrtest:~$ nohup /opt/mtr/sbin/mtr -c 1 --split 8.8.8.8 > test.txt 
nohup: ignoring input and redirecting stderr to stdout
rewolff@mtrtest:~$ jobs
rewolff@mtrtest:~$ cat test.txt
rewolff@mtrtest:~$ ps auxw | grep mtr
rewolff  20102  0.0  0.0  14224   944 pts/0    S+   16:34   0:00 grep --color=auto mtr

nohup background:

rewolff@mtrtest:~$ nohup /opt/mtr/sbin/mtr -c 1 --split 8.8.8.8 > test.txt  2>&1  < /dev/null &
[1] 20114
rewolff@mtrtest:~$ 
[1]+  Done                    nohup /opt/mtr/sbin/mtr -c 1 --split 8.8.8.8 > test.txt 2>&1 < /dev/null
rewolff@mtrtest:~$ 
rewolff@mtrtest:~$ cat test.txt
rewolff@mtrtest:~$ 

@svdasein
Author

rewolff I sent you an email

@yvs2014

yvs2014 commented Nov 21, 2017

$ nohup /opt/mtr/sbin/mtr -c 1 --split 8.8.8.8 > test.txt
$ nohup /opt/mtr/sbin/mtr -c 1 --split 8.8.8.8 > test.txt 2>&1 < /dev/null &

In split mode, mtr monitors its input for commands, and you are trying to make it read from an unreadable source. If the goal is to get the output into a file, try the report or raw modes instead.
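
As a rough illustration of that suggestion, the sketch below runs a non-interactive command repeatedly with standard input detached and collects its output, which is roughly what the daemon use case described earlier needs. The helper name `collect_reports` and its parameters are hypothetical, and the `mtr --report -c 1 8.8.8.8` invocation in the usage section is an assumed example of report mode rather than something tested in this thread.

```python
import subprocess
import time


def collect_reports(command, runs=1, pause=0.0):
    """Run `command` `runs` times with stdin detached and return each stdout."""
    outputs = []
    for i in range(runs):
        completed = subprocess.run(
            command,
            stdin=subprocess.DEVNULL,   # same effect as "< /dev/null"
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,   # same effect as "2>&1"
            text=True,
            check=False,
        )
        outputs.append(completed.stdout)
        if i + 1 < runs:
            time.sleep(pause)
    return outputs


if __name__ == "__main__":
    # Assumed invocation: report mode, one cycle, target taken from the thread.
    for report in collect_reports(["mtr", "--report", "-c", "1", "8.8.8.8"]):
        print(report, end="")
```

Because the command never waits on a terminal, a wrapper like this can itself be started under nohup or from cron.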

@svdasein
Author

yvs2014 ah ok - I guess that was what I was missing (it doesn't really say that in the man page). I'll rework my stuff to use the raw format. Thanks - closing.

@rewolff
Collaborator

rewolff commented Nov 22, 2017

WAIT WAIT!

You, if I analyzed this correctly, made MTR hang instead of doing the right thing.

The way I fixed it, I think it will now quit, correct? Are you seeing that too? Even if you're going to make use of a different mode, I would like this bug fixed anyway.

@svdasein
Author

rewolff - it did in fact quit, yes. The series of tests I showed illustrate that. So yes I think with your change it's now doing the right thing.

@svdasein svdasein reopened this Nov 22, 2017
@AlexTan-b-z

Hi, did you solve this problem? How?
I have the same problem.
Can you help me? Thank you!

@svdasein
Author

svdasein commented Jan 6, 2018 via email

@rewolff
Collaborator

rewolff commented Jan 6, 2018

As far as I know, --report is meant to run an mtr scan say every hour from cron. So why are you saying otherwise?

@AlexTan-b-z

Thank you very much!
