Broker crashes, running out of file descriptors #55

Closed
RalfJung opened this issue Oct 30, 2017 · 5 comments

Comments

@RalfJung
Member

After 5-6h of uptime, the tunneldigger broker quits with the following error:

Oct 30 03:24:41 gw3.saar.freifunk.net python[9594]: Traceback (most recent call last):
Oct 30 03:24:41 gw3.saar.freifunk.net python[9594]: File "/usr/lib/python2.7/runpy.py", line 162, in _run_module_as_main
Oct 30 03:24:41 gw3.saar.freifunk.net python[9594]: "__main__", fname, loader, pkg_name)
Oct 30 03:24:41 gw3.saar.freifunk.net python[9594]: File "/usr/lib/python2.7/runpy.py", line 72, in _run_code
Oct 30 03:24:41 gw3.saar.freifunk.net python[9594]: exec code in run_globals
Oct 30 03:24:41 gw3.saar.freifunk.net python[9594]: File "/opt/tunneldigger/lib/python2.7/site-packages/tunneldigger_broker-0.3.0-py2.7-linux-x86_64.egg/tunneldigger_broker/main.py", line 113, in <module>
Oct 30 03:24:41 gw3.saar.freifunk.net python[9594]: event_loop.start()
Oct 30 03:24:41 gw3.saar.freifunk.net python[9594]: File "/opt/tunneldigger/local/lib/python2.7/site-packages/tunneldigger_broker-0.3.0-py2.7-linux-x86_64.egg/tunneldigger_broker/eventloop.py", line 59, in start
Oct 30 03:24:41 gw3.saar.freifunk.net python[9594]: pollable.read(file_object)
Oct 30 03:24:41 gw3.saar.freifunk.net python[9594]: File "/opt/tunneldigger/local/lib/python2.7/site-packages/tunneldigger_broker-0.3.0-py2.7-linux-x86_64.egg/tunneldigger_broker/network.py", line 98, in read
Oct 30 03:24:41 gw3.saar.freifunk.net python[9594]: callback()
Oct 30 03:24:41 gw3.saar.freifunk.net python[9594]: File "/opt/tunneldigger/local/lib/python2.7/site-packages/tunneldigger_broker-0.3.0-py2.7-linux-x86_64.egg/tunneldigger_broker/tunnel.py", line 230, in pmtu_di
Oct 30 03:24:41 gw3.saar.freifunk.net python[9594]: self.create_timer(self.pmtu_discovery, timeout=random.randrange(2, 5))
Oct 30 03:24:41 gw3.saar.freifunk.net python[9594]: File "/opt/tunneldigger/local/lib/python2.7/site-packages/tunneldigger_broker-0.3.0-py2.7-linux-x86_64.egg/tunneldigger_broker/network.py", line 83, in create_
Oct 30 03:24:41 gw3.saar.freifunk.net python[9594]: timer = timerfd.create(timerfd.CLOCK_MONOTONIC)
Oct 30 03:24:41 gw3.saar.freifunk.net python[9594]: File "/opt/tunneldigger/local/lib/python2.7/site-packages/tunneldigger_broker-0.3.0-py2.7-linux-x86_64.egg/tunneldigger_broker/timerfd.py", line 117, in create
Oct 30 03:24:41 gw3.saar.freifunk.net python[9594]: ret = libc.timerfd_create(clock_id, flags)
Oct 30 03:24:41 gw3.saar.freifunk.net python[9594]: File "/opt/tunneldigger/local/lib/python2.7/site-packages/tunneldigger_broker-0.3.0-py2.7-linux-x86_64.egg/tunneldigger_broker/timerfd.py", line 103, in errche
Oct 30 03:24:41 gw3.saar.freifunk.net python[9594]: raise OSError(errno, os.strerror(errno))
Oct 30 03:24:41 gw3.saar.freifunk.net python[9594]: OSError: [Errno 24] Too many open files

This has now happened twice. The first time, there were 150 tunnels connected constantly. The second time, the number of tunnels slowly went up from 0 to 70 before the crash.
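For reference, errno 24 is `EMFILE`: the process has hit its per-process limit on open file descriptors. A minimal Python 3 sketch for checking that limit and for recognising this error; the helper names `fd_headroom` and `is_fd_exhaustion` are made up for illustration and are not part of tunneldigger:

```python
import errno
import os
import resource


def fd_headroom(open_fds):
    """Return how many more descriptors fit under the soft limit.

    Returns None if the soft limit is unlimited.
    """
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft == resource.RLIM_INFINITY:
        return None
    return soft - open_fds


def is_fd_exhaustion(exc):
    """True if exc is the 'Too many open files' error from the traceback."""
    return isinstance(exc, OSError) and exc.errno == errno.EMFILE


if __name__ == "__main__":
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    print("soft limit:", soft, "hard limit:", hard)
    print("EMFILE message:", os.strerror(errno.EMFILE))
```

Raising the limit only delays a crash like this one if descriptors are being leaked, so the sketch is a diagnostic aid rather than a fix.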

@RalfJung
Member Author

RalfJung commented Oct 30, 2017

Looking at the process (/proc/$PID/fd) right now (with about 70 active connections) shows 1019 file descriptors. 144 are anon_inode:[timerfd], and 800 of them are pipe:[...]. 75 are socket:[...]. So it seems these pipes (whatever they are) are actually much worse than the timerfds.
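The breakdown above can be reproduced with a small script that reads the symlinks in the process's fd directory and groups them by target. This is a rough Python 3 sketch; `count_fd_kinds` is a hypothetical helper, and the grouping rules are my own assumption about how to bucket the link targets:

```python
import collections
import os
import re

# Targets such as "pipe:[12345]", "socket:[678]" or "anon_inode:[timerfd]".
_INODE_LINK = re.compile(r"^(\w+):\[(.*)\]$")


def count_fd_kinds(fd_dir):
    """Count the entries of an fd directory (e.g. /proc/<pid>/fd) by kind."""
    counts = collections.Counter()
    for name in os.listdir(fd_dir):
        try:
            target = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            # The descriptor was closed between listdir() and readlink().
            continue
        match = _INODE_LINK.match(target)
        if match is None:
            counts["file"] += 1          # ordinary path such as /dev/null
        elif match.group(1) == "anon_inode":
            counts[target] += 1          # keep the subtype, e.g. timerfd
        else:
            counts[match.group(1)] += 1  # "pipe", "socket", ...
    return counts


if __name__ == "__main__":
    for kind, count in count_fd_kinds("/proc/self/fd").most_common():
        print(count, kind)
```

Pointing it at `/proc/$PID/fd` of the broker (as root) should give the same kind of pipe, socket and timerfd totals as counted by hand above.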

@RalfJung
Member Author

Grepping for pipes shows that the hooks use a pipe. And they never seem to close it.

What is the reason not to use subprocess.check_output?

@kostko
Member

kostko commented Oct 30, 2017

> What is the reason not to use subprocess.check_output?

That would block the event loop. We need non-blocking file descriptors, which we register in the event loop, which uses epoll.

It looks like HookProcess.close doesn't close the pipe file descriptors after unregistering them from the event loop. Could you try adding self.process.stdout.close() (and the same for stderr) at the end of close?
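To illustrate the pattern, here is a minimal Python 3 sketch of a hook runner whose pipes are non-blocking, registered with a selector (the `selectors` module picks epoll on Linux), and explicitly closed afterwards. It is not the broker's actual code: this `HookProcess` and the `run_hook` helper are simplified stand-ins written for this example.

```python
import os
import selectors
import subprocess
import sys


class HookProcess:
    """Simplified stand-in for a hook runner, not tunneldigger's class."""

    def __init__(self, selector, argv):
        self.selector = selector
        self.output = {"stdout": b"", "stderr": b""}
        self.process = subprocess.Popen(
            argv, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
        for name in ("stdout", "stderr"):
            pipe = getattr(self.process, name)
            os.set_blocking(pipe.fileno(), False)
            selector.register(pipe, selectors.EVENT_READ, name)

    def read(self, pipe, name):
        """Collect available data; return False once the pipe hits EOF."""
        try:
            chunk = os.read(pipe.fileno(), 4096)
        except BlockingIOError:
            return True
        self.output[name] += chunk
        return bool(chunk)

    def close(self):
        """Unregister the pipes AND close them, so no descriptors leak."""
        for pipe in (self.process.stdout, self.process.stderr):
            if pipe.closed:
                continue
            try:
                self.selector.unregister(pipe)
            except KeyError:
                pass  # already unregistered after EOF
            pipe.close()
        self.process.wait()


def run_hook(argv):
    """Run one hook to completion on a private selector and clean up."""
    selector = selectors.DefaultSelector()
    hook = HookProcess(selector, argv)
    open_pipes = 2
    try:
        while open_pipes:
            for key, _events in selector.select():
                if not hook.read(key.fileobj, key.data):
                    selector.unregister(key.fileobj)
                    open_pipes -= 1
    finally:
        hook.close()
        selector.close()
    return hook


if __name__ == "__main__":
    hook = run_hook([sys.executable, "-c", "print('session.up')"])
    print(hook.output["stdout"], hook.process.stdout.closed)
```

Unregistering a pipe from the selector does not close it; without the `pipe.close()` calls each hook invocation would leave two descriptors open, which matches the pile of `pipe:[...]` entries reported above.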

@RalfJung
Member Author

> Could you try adding self.process.stdout.close() (and the same for stderr) at the end of close?

Already done, rolled out to one server, and confirmed to get rid of the hundreds of pipes. :) See #59.

@RalfJung
Member Author

RalfJung commented Nov 1, 2017

After 12h of uptime, we now have the following FD usage:

# ls -lah | fgrep socket -c
143
root@gw3:/proc/6638/fd# ls -lah | fgrep timer -c
278

Looks like the big leak is indeed fixed.

@RalfJung RalfJung closed this as completed Nov 1, 2017