Wavemon 0.7.6-2 and github master (latest commit 62949b3) stops responding. #21
Comments
Additional info - when wavemon "stops responding", it stays unresponsive even after the heavy UDP traffic stops. The problem takes some time to reproduce (wavemon has to run for several minutes). |
to find out where it "hangs", build wavemon in debug mode and once it is hanging, attach gdb to the running process with the -pid= parameter (from another terminal), then press CTRL-C and enter |
Hi @rofl0r, New information: no heavy UDP traffic is needed for "hanging", and no second wavemon running simultaneously is needed. The experiments below were made with github master (latest commit 62949b3) compiled with the -g switch.
root@ev3dev:~/wavemon# gdb -pid=768
GNU gdb (Debian 7.7.1+dfsg-5) 7.7.1
Copyright (C) 2014 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "arm-linux-gnueabi".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word".
Attaching to process 768
Reading symbols from /root/wavemon/wavemon...done.
Reading symbols from /lib/arm-linux-gnueabi/libncurses.so.5...(no debugging symbols found)...done.
Loaded symbols for /lib/arm-linux-gnueabi/libncurses.so.5
Reading symbols from /lib/arm-linux-gnueabi/libtinfo.so.5...(no debugging symbols found)...done.
Loaded symbols for /lib/arm-linux-gnueabi/libtinfo.so.5
Reading symbols from /lib/arm-linux-gnueabi/libm.so.6...Reading symbols from /usr/lib/debug//lib/arm-linux-gnueabi/libm-2.19.so...done.
done.
Loaded symbols for /lib/arm-linux-gnueabi/libm.so.6
Reading symbols from /lib/arm-linux-gnueabi/libnl-genl-3.so.200...(no debugging symbols found)...done.
Loaded symbols for /lib/arm-linux-gnueabi/libnl-genl-3.so.200
Reading symbols from /lib/arm-linux-gnueabi/libnl-3.so.200...(no debugging symbols found)...done.
Loaded symbols for /lib/arm-linux-gnueabi/libnl-3.so.200
Reading symbols from /lib/arm-linux-gnueabi/libpthread.so.0...Reading symbols from /usr/lib/debug//lib/arm-linux-gnueabi/libpthread-2.19.so...done.
done.
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/arm-linux-gnueabi/libthread_db.so.1".
Loaded symbols for /lib/arm-linux-gnueabi/libpthread.so.0
Reading symbols from /lib/arm-linux-gnueabi/libc.so.6...Reading symbols from /usr/lib/debug//lib/arm-linux-gnueabi/libc-2.19.so...done.
done.
Loaded symbols for /lib/arm-linux-gnueabi/libc.so.6
Reading symbols from /lib/arm-linux-gnueabi/libdl.so.2...Reading symbols from /usr/lib/debug//lib/arm-linux-gnueabi/libdl-2.19.so...done.
done.
Loaded symbols for /lib/arm-linux-gnueabi/libdl.so.2
Reading symbols from /lib/ld-linux.so.3...Reading symbols from /usr/lib/debug//lib/arm-linux-gnueabi/ld-2.19.so...done.
done.
Loaded symbols for /lib/ld-linux.so.3
Reading symbols from /lib/arm-linux-gnueabi/libnss_files.so.2...Reading symbols from /usr/lib/debug//lib/arm-linux-gnueabi/libnss_files-2.19.so...done.
done.
Loaded symbols for /lib/arm-linux-gnueabi/libnss_files.so.2
0xb6d96f98 in __lll_lock_wait_private (
futex=futex@entry=0xb6df84d4 <main_arena>)
at ../ports/sysdeps/unix/sysv/linux/arm/nptl/lowlevellock.c:31
31 ../ports/sysdeps/unix/sysv/linux/arm/nptl/lowlevellock.c: No such file or directory.
(gdb) bt
#0 0xb6d96f98 in __lll_lock_wait_private (
futex=futex@entry=0xb6df84d4 <main_arena>)
at ../ports/sysdeps/unix/sysv/linux/arm/nptl/lowlevellock.c:31
#1 0xb6d2b070 in __libc_calloc (n=<optimized out>, elem_size=<optimized out>)
at malloc.c:3197
#2 0xb6e3193c in ?? () from /lib/arm-linux-gnueabi/libnl-3.so.200
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
(gdb) |
Ok, I have also installed libnl-3-200-dbg so we have some more info:
|
And finally the full trace:
(gdb) bt
#0 0xb6d9ff98 in __lll_lock_wait_private (
futex=futex@entry=0xb6e014d4 <main_arena>)
at ../ports/sysdeps/unix/sysv/linux/arm/nptl/lowlevellock.c:31
#1 0xb6d34070 in __libc_calloc (n=n@entry=1, elem_size=elem_size@entry=56)
at malloc.c:3197
#2 0xb6e3a93c in __nlmsg_alloc (len=4096)
at /build/libnl3-xnxzo3/libnl3-3.2.24/./lib/msg.c:268
#3 0xb6e3ac28 in nlmsg_alloc ()
at /build/libnl3-xnxzo3/libnl3-3.2.24/./lib/msg.c:301
#4 0x0001b85c in handle_cmd (cmd=cmd@entry=0x313e8 <cmd_linkstat>)
at iw_nl80211.c:55
#5 0x0001c3c0 in iw_nl80211_get_linkstat (ls=ls@entry=0x317d8 <ls>)
at iw_nl80211.c:550
#6 0x0001596c in sampling_do_poll () at info_scr.c:46
#7 redraw_stat_levels (signum=<optimized out>) at info_scr.c:673
#8 <signal handler called>
#9 _int_free (av=0xb6e014d4 <main_arena>, p=0x469c8, have_lock=110)
at malloc.c:3952
#10 0xb6d1d138 in __fopen_internal (filename=0xb6b0db6c "/etc/ethers",
mode=0xb6b0da0c "rce", is32=is32@entry=1) at iofopen.c:94
#11 0xb6d1d158 in _IO_new_fopen (filename=<optimized out>,
mode=<optimized out>) at iofopen.c:103
#12 0xb6b0acd8 in internal_setent (stayopen=0) at nss_files/files-XXX.c:78
---Type <return> to continue, or q <return> to quit---
#13 0xb6b0b4e8 in _nss_files_getntohost_r (addr=0xbeb554a4, result=0xbeb55064,
buffer=0xbeb55070 "", buflen=1024, errnop=0xb6f8f4d0)
at nss_files/files-ethers.c:59
#14 0xb6da9eb0 in ether_ntohost (hostname=0x0,
hostname@entry=0x35c28 <hostname> "", addr=addr@entry=0xbeb554a4)
at ether_ntoh.c:72
#15 0x0001d674 in ether_lookup (ea=0xbeb554a4, ea@entry=0xbeb5549c)
at utils.c:44
#16 0x000136b0 in display_netinfo (w_net=0x451e8) at info_scr.c:632
#17 0x00015b48 in scr_info_loop (w_menu=0x41b80) at info_scr.c:705
#18 0x0001294c in main (argc=<optimized out>, argv=<optimized out>)
at wavemon.c:211 |
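Reading the trace from the bottom up: frames #9 to #18 show the main loop inside glibc's allocator (`_int_free`, reached from `fopen` of /etc/ethers) when the signal arrived, and frames #0 to #7 show the handler then calling `calloc` and waiting on the same `main_arena` lock. One possible mitigation, sketched below, is to hold the timer signal back while such a call is in progress. This is only an illustration under stated assumptions, not what wavemon does: `call_with_signal_blocked`, `count_tick` and `lookup_stub` are hypothetical names, and `lookup_stub` merely stands in for a call such as `ether_lookup()`.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stddef.h>

/* Hypothetical counter, incremented by the demo handler below. */
static volatile sig_atomic_t ticks = 0;

/* Demo handler: only touches a sig_atomic_t, which is async-signal-safe. */
static void count_tick(int signum)
{
    (void)signum;
    ticks++;
}

/*
 * Run fn(arg) with `sig` blocked, so the handler for `sig` cannot
 * interrupt fn() while it is inside malloc/free or similar code.
 * Returns 0 on success, -1 if the signal mask could not be changed.
 */
static int call_with_signal_blocked(int sig, void (*fn)(void *), void *arg)
{
    sigset_t block, previous;

    sigemptyset(&block);
    sigaddset(&block, sig);
    if (sigprocmask(SIG_BLOCK, &block, &previous) != 0)
        return -1;

    fn(arg);

    /* Restoring the old mask lets a signal that became pending run now. */
    return sigprocmask(SIG_SETMASK, &previous, NULL);
}

/* Stand-in for non-reentrant work; the timer "fires" in the middle of it. */
static void lookup_stub(void *arg)
{
    raise(SIGALRM);             /* simulate the interval timer expiring */
    *(int *)arg = (int)ticks;   /* unchanged if the signal was held back */
}
```

A caller would install `count_tick` for SIGALRM and wrap the lookup as `call_with_signal_blocked(SIGALRM, lookup_stub, &seen)`. In a multi-threaded program `pthread_sigmask` would be used instead of `sigprocmask`. Blocking the signal only narrows the window; moving the work out of the handler altogether avoids the problem at its root.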
This may be important - I am unable to "hang" Wavemon 0.7.6-2 (WEXT) when it runs alone. So we have: |
Thank you for the helpful details. What processor architecture is this based on? I am seeing ARM above, in issue #16 it was found that |
Hi @joerg-krause,
I am just observing the top output. The memory usage of wavemon stays constant at 4.5% from the beginning until it "hangs". The system's available/used memory is also nearly constant. The system stays as responsive as usual the whole time. After hanging, wavemon uses 0% CPU and the same 4.5% memory. The wavemon process is marked with S in the top output, which means it is sleeping. There is 64 MB of RAM on this system; 4.5% is around 3 MB. |
Can you look at the free memory output in top? |
A word of caution - we are using zram in the kernel for compressed swap in RAM, because it turns out to be much faster than swapping to the SD card (not to mention that swapping slowly kills the card). This will probably affect the RAM/swap statistics. It doesn't look like a leak, unless wavemon had already hung as of Meantime 6 and this was not yet reflected in the CPU usage shown by top. But maybe you can deduce something more from it.
Before:
top - 10:13:54 up 9 min, 2 users, load average: 0.01, 0.30, 0.27
Tasks: 64 total, 1 running, 63 sleeping, 0 stopped, 0 zombie
%Cpu(s): 2.7 us, 4.5 sy, 0.0 ni, 92.4 id, 0.0 wa, 0.0 hi, 0.4 si, 0.0 st
KiB Mem: 58660 total, 56840 used, 1820 free, 4436 buffers
KiB Swap: 98300 total, 980 used, 97320 free. 30652 cached Mem
Meantime 1:
top - 10:14:48 up 10 min, 2 users, load average: 0.15, 0.29, 0.27
Tasks: 65 total, 1 running, 64 sleeping, 0 stopped, 0 zombie
%Cpu(s): 14.6 us, 20.3 sy, 0.0 ni, 65.1 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem: 58660 total, 57172 used, 1488 free, 4340 buffers
KiB Swap: 98300 total, 988 used, 97312 free. 30752 cached Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
564 root 20 0 5032 2760 2532 S 14.6 4.7 0:05.10 wavemon
Meantime 2:
top - 10:16:31 up 12 min, 2 users, load average: 0.57, 0.38, 0.30
KiB Mem: 58660 total, 57188 used, 1472 free, 4340 buffers
KiB Swap: 98300 total, 988 used, 97312 free. 30752 cached Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
564 root 20 0 5032 2776 2532 S 14.8 4.7 0:20.42 wavemon
Meantime 3:
top - 10:19:01 up 14 min, 2 users, load average: 0.56, 0.47, 0.35
KiB Mem: 58660 total, 57188 used, 1472 free, 4340 buffers
KiB Swap: 98300 total, 988 used, 97312 free. 30752 cached Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
564 root 20 0 5032 2780 2532 S 14.7 4.7 0:42.41 wavemon
Meantime 4:
top - 10:20:18 up 15 min, 2 users, load average: 0.57, 0.48, 0.37
KiB Mem: 58660 total, 57212 used, 1448 free, 5948 buffers
KiB Swap: 98300 total, 1060 used, 97240 free. 29024 cached Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
564 root 20 0 5032 2780 2532 S 15.0 4.7 0:53.70 wavemon
Meantime 5:
top - 10:21:06 up 16 min, 2 users, load average: 0.60, 0.52, 0.38
KiB Mem: 58660 total, 57220 used, 1440 free, 5948 buffers
KiB Swap: 98300 total, 1060 used, 97240 free. 29024 cached Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
564 root 20 0 5032 2780 2532 S 15.7 4.7 1:00.75 wavemon
Meantime 6:
top - 10:21:34 up 17 min, 2 users, load average: 0.80, 0.56, 0.40
KiB Mem: 58660 total, 56956 used, 1704 free, 5944 buffers
KiB Swap: 98300 total, 1076 used, 97224 free. 28752 cached Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
564 root 20 0 5032 2780 2532 S 14.4 4.7 1:05.08 wavemon
Meantime 7:
top - 10:22:54 up 18 min, 2 users, load average: 0.62, 0.56, 0.41
KiB Mem: 58660 total, 56948 used, 1712 free, 5944 buffers
KiB Swap: 98300 total, 1076 used, 97224 free. 28752 cached Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
564 root 20 0 5032 2780 2532 R 14.1 4.7 1:16.78 wavemon
Crash, hang, not responding:
top - 10:23:47 up 19 min, 2 users, load average: 0.27, 0.47, 0.39
KiB Mem: 58660 total, 56948 used, 1712 free, 5944 buffers
KiB Swap: 98300 total, 1076 used, 97224 free. 28752 cached Mem
564 root 20 0 5032 2780 2532 S 0.0 4.7 1:17.03 wavemon |
The problem occurred in the signal handler. Sampling is done by calling To confirm whether this is the cause, test if the problem "goes away" with a very long update interval ( I have not yet had time to replace the interval timer with its own pthread (which is the better solution and does not suffer from potential overlap). It is a bit more work, since the code is shared between the info and the histogram screen. |
At my current location I don't have the hardware combination with the Edimax EW-7811Un (Realtek RTL8188CUS chipset). Polling frequency test, let's call it The closest I have is the Edimax EW-7612UAn V2 (Realtek RTL8192SU chipset), which also works with the 8192cu kernel module (I can see that in the lsmod output). Interestingly, I was unable to reproduce the problem over the last 45 minutes with this configuration; there doesn't seem to be a problem with this device. I will keep wavemon running while I work with this device to make sure. |
@bmegli I'm not sure the values in top represent the free memory; they may instead reflect the memory used by zram. |
Doing anything non-trivial (such as memory allocation) from a signal handler is undefined behavior (UB).
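A common way to stay within what a handler may safely do is to have it set a flag and nothing else, and to run the actual sampling from the main loop. The sketch below assumes hypothetical names (`poll_requested`, `on_tick`, `install_tick_handler`, `service_pending_poll`); it is not wavemon's code, and the `calloc` call only stands in for the allocation that `nlmsg_alloc()` performs in the backtrace.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical flag: set by the handler, consumed by the main loop. */
static volatile sig_atomic_t poll_requested = 0;

/* The handler only records that a tick happened; no malloc, no stdio. */
static void on_tick(int signum)
{
    (void)signum;
    poll_requested = 1;
}

/* Install on_tick for `sig`. Returns 0 on success, -1 on error. */
static int install_tick_handler(int sig)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_tick;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;   /* restart interrupted system calls */
    return sigaction(sig, &sa, NULL);
}

/*
 * Called from the main loop, i.e. in normal (non-signal) context.
 * Returns 1 if a sample was taken, 0 if no tick was pending.
 */
static int service_pending_poll(void)
{
    void *msg;

    if (!poll_requested)
        return 0;
    poll_requested = 0;         /* ticks arriving from here on are kept */

    /* Stand-in for the real sampling; allocation is safe here. */
    msg = calloc(1, 56);
    if (msg == NULL)
        return 0;
    free(msg);
    return 1;
}
```

The main loop would call `service_pending_poll()` on every iteration, for example whenever its keyboard read times out or is interrupted. Ticks that arrive while a poll is already running collapse into a single pending request, which also avoids the overlap an interval timer can produce.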
This is a test implementation, to help resolve issue #21. It replaces the interval timer with a pthread. The implementation is not complete, since handling the screen updates within a separate pthread is suboptimal (screen not cleared correctly at some times). If this fixes issue #21, further refactoring to follow.
@bmegli - if your time permits, could you check the _development_ branch: I added experimental support for using a pthread instead of the interval timer. If that turns out to fix the issues experienced on the ARM architecture, it proves that the problems have to do with signal handling. The commit is not perfect, it would need a little more work to improve screen updates. What is strange is that
This does not immediately point at signal handling, perhaps implicitly. |
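The idea of replacing the interval timer with a pthread can be sketched roughly as follows. This is a minimal illustration under assumptions, not the code on the development branch: `struct sampler` and the `sampler_*` functions are hypothetical names, and the counter increment is a placeholder for the real sampling call.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical state shared between the UI thread and the sampler. */
struct sampler {
    pthread_t       thread;
    pthread_mutex_t lock;
    bool            stop;        /* guarded by lock */
    unsigned long   samples;     /* guarded by lock */
    long            interval_ms;
};

/* Thread body: take a sample, sleep for the interval, repeat until stopped. */
static void *sampler_main(void *arg)
{
    struct sampler *s = arg;
    struct timespec delay = {
        .tv_sec  = s->interval_ms / 1000,
        .tv_nsec = (s->interval_ms % 1000) * 1000000L,
    };

    for (;;) {
        bool stop;

        pthread_mutex_lock(&s->lock);
        stop = s->stop;
        if (!stop)
            s->samples++;        /* placeholder for the real sampling */
        pthread_mutex_unlock(&s->lock);

        if (stop)
            break;
        nanosleep(&delay, NULL);
    }
    return NULL;
}

/* Start the sampling thread. Returns 0 on success, -1 on error. */
static int sampler_start(struct sampler *s, long interval_ms)
{
    s->stop = false;
    s->samples = 0;
    s->interval_ms = interval_ms;
    if (pthread_mutex_init(&s->lock, NULL) != 0)
        return -1;
    if (pthread_create(&s->thread, NULL, sampler_main, s) != 0) {
        pthread_mutex_destroy(&s->lock);
        return -1;
    }
    return 0;
}

/* Read the number of samples taken so far. */
static unsigned long sampler_count(struct sampler *s)
{
    unsigned long n;

    pthread_mutex_lock(&s->lock);
    n = s->samples;
    pthread_mutex_unlock(&s->lock);
    return n;
}

/* Ask the thread to stop, wait for it, and return the final sample count. */
static unsigned long sampler_stop(struct sampler *s)
{
    unsigned long n;

    pthread_mutex_lock(&s->lock);
    s->stop = true;
    pthread_mutex_unlock(&s->lock);
    pthread_join(s->thread, NULL);

    n = s->samples;
    pthread_mutex_destroy(&s->lock);
    return n;
}
```

Because the sampling runs in an ordinary thread rather than in signal context, it can allocate memory freely, and one sample cannot start before the previous one has finished. Screen updates would still need to be coordinated with the UI thread, which is the part described above as needing more work.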
First, I am sorry for the misunderstanding:
When I wrote "together", I meant 0.7.6-2 + github master. I don't think there is a problem with running two instances of 0.7.6-2; I haven't even tried that. |
Compiled and "queued" for tomorrow; I am short on time today. |
Main Information:
I have been running wavemon develop (commit) for around 2 hours without a problem. It seems that this solves the problem.
Additional Information
I have two nearly identical devices, and I get the problem with wavemon github master (latest commit 62949b3, nl80211) on only one of them. They are exactly the same devices, and I run them from the same SD card (100% matching OS). The only differences are the peripherals and the power source. The one that all wavemon versions work on has nothing plugged in apart from the WiFi adapter, and is powered from batteries. The one that wavemon github master (latest commit 62949b3, nl80211) hangs on has:
Now - wavemon hangs even when those devices are not generating data or operating. They still affect the system: we have an autodetection mechanism for devices in ports, etc. They also affect the power on the USB bus. Two of the motor encoders are monitored by the system. |
Thank you for testing, and the excellent information. Interesting that a different hardware configuration causes the issue to appear. It could be different timings, not sure. The real fix will take a bit longer, to clean up the graphics. Will do that as soon as I get to it. Thanks again. |
You're welcome! Actually, the wavemon implementation helped me a lot in writing the nl80211 library I need for another project. Kind regards! |
Nice job! I like in particular the use of libmnl. |
@bmegli - Can I ask one last favour, please? Could you check if the current |
@grrtrr - no problem, but I will report back during the week (e.g. Monday or Tuesday), when I am at the location where the hardware is. |
wavemon master (affa6d8) ran for 3 hours 20 minutes without a problem, then I stopped the test. I am now checking the previous version (master 62949b3), which had the problem. We had a kernel update in the meantime, so I have to make sure the update is not related in any way (and that I still get the "hanging" in the old version). |
Thanks a lot for checking, really appreciated. I will wait until you get back before doing anything with this issue. |
Thank you very much for the diligent testing. My testing has been limited to the hardware I have (x86 laptop). In combination with the earlier test (initial patch on the Unless you have any other comments or issues, I will close the issue. Thanks again. |
I am happy, wavemon is again working flawlessly on all my machines. Thanks. |
The Hardware & Software are the same as in #20.
Problem
If I use wavemon under heavy UDP traffic it stops responding. UI does not update, Tx/Rx are not updated and it doesn't react to keys, only ctrl+C helps (kills it).
I have seen it a lot of times with github master (latest commit 62949b3, this is nl80211 version) and sometimes with wavemon 0.7.6-2 (wext based version).
The ssh sessions running wavemons are alive (I can kill "not responding" wavemons with ctrl+C).
I have seen it when running 2 wavemons simultaneously. I can check whether this also happens with just a single instance running.