nodebot spins if it doesn't detect the remote end is disconnected #12

Open
thwarted opened this issue Apr 12, 2011 · 1 comment

@thwarted
Owner

Using 100% CPU.

lsof output:

COMMAND   PID   USER   FD   TYPE DEVICE    SIZE      NODE NAME
nodebot 13526 nagios  cwd    DIR    9,0    4096         2 /
nodebot 13526 nagios  rtd    DIR    9,0    4096         2 /
nodebot 13526 nagios  txt    REG    9,0 1387928    556499 /usr/bin/python2.5
nodebot 13526 nagios  mem    REG    9,0   80760    158744 /lib/libresolv-2.7.so
nodebot 13526 nagios  mem    REG    9,0   22856    158737 /lib/libnss_dns-2.7.so
nodebot 13526 nagios  mem    REG    9,0   47528    158738 /lib/libnss_files-2.7.so
nodebot 13526 nagios  mem    REG    9,0   31488    563303 /usr/lib/python2.5/lib-dynload/_struct.so
nodebot 13526 nagios  mem    REG    9,0   33984    174331 /usr/lib/python-support/python-ssl/python2.5/ssl/_ssl2.so
nodebot 13526 nagios  mem    REG    9,0   11568    564151 /usr/lib/python2.5/lib-dynload/resource.so
nodebot 13526 nagios  mem    REG    9,0   13504    563186 /usr/lib/python2.5/lib-dynload/_random.so
nodebot 13526 nagios  mem    REG    9,0   21224    563309 /usr/lib/python2.5/lib-dynload/binascii.so
nodebot 13526 nagios  mem    REG    9,0   18072    563313 /usr/lib/python2.5/lib-dynload/math.so
nodebot 13526 nagios  mem    REG    9,0   28032    563316 /usr/lib/python2.5/lib-dynload/strop.so
nodebot 13526 nagios  mem    REG    9,0   20568    563319 /usr/lib/python2.5/lib-dynload/time.so
nodebot 13526 nagios  mem    REG    9,0   17536    563315 /usr/lib/python2.5/lib-dynload/select.so
nodebot 13526 nagios  mem    REG    9,0   93536    556958 /usr/lib/libz.so.1.2.3.3
nodebot 13526 nagios  mem    REG    9,0 1560040    561136 /usr/lib/libcrypto.so.0.9.8
nodebot 13526 nagios  mem    REG    9,0  321280    561137 /usr/lib/libssl.so.0.9.8
nodebot 13526 nagios  mem    REG    9,0   21016    564145 /usr/lib/python2.5/lib-dynload/_ssl.so
nodebot 13526 nagios  mem    REG    9,0   62024    563187 /usr/lib/python2.5/lib-dynload/_socket.so
nodebot 13526 nagios  mem    REG    9,0 1436976    158715 /lib/libc-2.7.so
nodebot 13526 nagios  mem    REG    9,0  526560    158733 /lib/libm-2.7.so
nodebot 13526 nagios  mem    REG    9,0   10584    158748 /lib/libutil-2.7.so
nodebot 13526 nagios  mem    REG    9,0   14624    158732 /lib/libdl-2.7.so
nodebot 13526 nagios  mem    REG    9,0  130224    158743 /lib/libpthread-2.7.so
nodebot 13526 nagios  mem    REG    9,0  127480    158564 /lib/ld-2.7.so
nodebot 13526 nagios  mem    REG    9,0  254076    573334 /usr/lib/locale/en_US.utf8/LC_CTYPE
nodebot 13526 nagios  mem    REG    9,0   25700    556556 /usr/lib/gconv/gconv-modules.cache
nodebot 13526 nagios    0u   CHR    1,3              7187 /dev/null
nodebot 13526 nagios    1u   CHR    1,3              7187 /dev/null
nodebot 13526 nagios    2u   CHR    1,3              7187 /dev/null
nodebot 13526 nagios    3u  sock    0,4         180591266 can't identify protocol
nodebot 13526 nagios    4u  sock    0,4         180591268 can't identify protocol

strace was a continual loop of:

gettimeofday({1302585555, 789184}, NULL) = 0
gettimeofday({1302585555, 789211}, NULL) = 0
select(5, [4], [4], [], {10, 0})        = 1 (in [4], left {10, 0})
read(4, "", 5)                          = 0
gettimeofday({1302585555, 789326}, NULL) = 0
gettimeofday({1302585555, 789352}, NULL) = 0
select(5, [4], [4], [], {10, 0})        = 1 (in [4], left {10, 0})
read(4, "", 5)                          = 0
gettimeofday({1302585555, 789466}, NULL) = 0
gettimeofday({1302585555, 789493}, NULL) = 0
select(5, [4], [4], [], {10, 0})        = 1 (in [4], left {10, 0})
read(4, "", 5)                          = 0

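So it looks like the remote end has disconnected, but because the socket is in non-blocking mode, the return value of 0 from read is treated as valid (it is not interpreted as a closed connection). select then reports the descriptor as readable again immediately, and the loop spins.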
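The pattern in the strace output can be reproduced in a few lines. This is only a sketch, not nodebot's code: `demo_closed_peer` is a hypothetical name, and `socket.socketpair()` stands in for the IRC connection. It shows that once the peer has closed a stream socket, select keeps reporting it readable and each read returns zero bytes. On a stream socket, "no data yet" in non-blocking mode shows up as EAGAIN, so a zero-length read is normally the end-of-file signal.

```python
import select
import socket


def demo_closed_peer(rounds=3):
    """Return (readable, data) for each select/recv round after the peer closes."""
    # socketpair() is a stand-in for the bot's connection (assumption for illustration).
    local, remote = socket.socketpair()
    local.setblocking(False)
    remote.close()  # simulate the remote end going away

    results = []
    for _round in range(rounds):
        readable, _writable, _errored = select.select([local], [], [], 1.0)
        # Same 5-byte read as in the strace output above.
        data = local.recv(5) if readable else None
        results.append((bool(readable), data))
    local.close()
    return results
```

Each round returns immediately with an empty read, which matches the `select(...) = 1` / `read(4, "", 5) = 0` pairs above.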

Possible fixes:

  • Put the file descriptors in select's exception list.
  • Check whether a signal that should have been delivered on disconnect is being blocked or ignored.
  • Periodically call getpeername and verify it returns something sane (it should fail in this case since the socket isn't connected).
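One more option, sketched here under assumptions, is to treat a zero-length read as a disconnect and leave the loop. The names `pump` and `on_data` are hypothetical and do not come from nodebot. The sketch also passes the socket in select's exception list, as in the first item above.

```python
import errno
import select
import socket


def pump(sock, on_data, timeout=10.0):
    """Read from a non-blocking socket until it closes or fails.

    Returns "closed" on a zero-length read and "error" on a socket error.
    """
    while True:
        readable, _writable, errored = select.select([sock], [], [sock], timeout)
        if errored:
            return "error"
        if not readable:
            continue  # select timed out; nothing to read this round
        try:
            data = sock.recv(4096)
        except socket.error as exc:
            if exc.errno in (errno.EAGAIN, errno.EWOULDBLOCK):
                continue  # spurious wakeup: no data available after all
            return "error"
        if not data:
            return "closed"  # zero-length read: the peer has shut down
        on_data(data)
```

A caller could reconnect or exit when `pump` returns, instead of going back into select on a dead descriptor.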
@Roguelazer

One other option might be to periodically disconnect and reconnect. This would additionally cause nodebot to better handle load-balanced IRC connections.
