New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Why does WiFiDog eat so much memory? #34

Open
Existed opened this Issue Nov 12, 2014 · 30 comments

Comments

Projects
None yet
5 participants
@Existed

Existed commented Nov 12, 2014

OS: OpenWrt Attitude Adjustment 12.09-rc1
Router: Huawei hg255d
CPU: Ralink RT3052 id:1 rev:3
RAM: 64 MB

Mem: 45232K used, 17220K free, 0K shrd, 4300K buff, 17768K cached
CPU: 73% usr 19% sys 0% nic 0% idle 0% io 0% irq 7% sirq
Load average: 3.47 3.34 2.87 3/69 20528
PID PPID USER STAT VSZ %VSZ %CPU COMMAND
10576 4848 root S 49020 78% 15% /bin/wifidog -d 7 -f -c ./etc/wdev.conf

root@OpenWRT:/usr/local/wifidog# cat /proc/pgrep wifidog/status
Name: wifidog
State: S (sleeping)
Tgid: 10576
Pid: 10576
PPid: 4848
TracerPid: 0
Uid: 0 0 0 0
Gid: 0 0 0 0
FDSize: 256
Groups:
VmPeak: 110576 kB
VmSize: 49020 kB
VmLck: 0 kB
VmPin: 0 kB
VmHWM: 2032 kB
VmRSS: 1540 kB
VmData: 48084 kB
VmStk: 136 kB
VmExe: 84 kB
VmLib: 664 kB
VmPTE: 80 kB
VmSwap: 0 kB
Threads: 7
SigQ: 0/486
SigPnd: 00000000000000000000000000000000
ShdPnd: 00000000000000000000000000000000
SigBlk: 00000000000000000000000000000000
SigIgn: 00000000000000000000000000001000
SigCgt: 00000000000000000000000180024006
CapInh: 0000000000000000
CapPrm: ffffffffffffffff
CapEff: ffffffffffffffff
CapBnd: ffffffffffffffff
Cpus_allowed: 1
Cpus_allowed_list: 0
voluntary_ctxt_switches: 97533
nonvoluntary_ctxt_switches: 206638
root@OpenWRT:/usr/local/wifidog#

Why does WiFiDog eat so much memory?

@acv

This comment has been minimized.

Contributor

acv commented Mar 12, 2015

It's using 1540 kB of RAM in your example. That does not seem excessive. VmRSS is the actual used ram. The larger numbers are mostly virtual memory.

At peak, it used 2032 kB (VmHWM, RSS high water mark).

@acv

This comment has been minimized.

Contributor

acv commented Mar 12, 2015

Derp, I didn't see the VmData... There's a leak somewhere to get that high.

@acv

This comment has been minimized.

Contributor

acv commented Mar 12, 2015

Someone with a decent usage volume and a fast x86/amd64 machine should try running wifidog under valgrind.

@acv

This comment has been minimized.

Contributor

acv commented Mar 12, 2015

Pull request #71 fixes a memory leak in the code that handles all the HTTP redirect, a slow leak of one mac address string (18 bytes) worth of ram per request.

@acv

This comment has been minimized.

Contributor

acv commented Mar 12, 2015

I'm basically going to use this Issue has a "let's audit all use of malloc and friends (safe_*) and make sure free is eventually called once and only once."

@acv acv added the bug label Mar 14, 2015

@acv acv self-assigned this Mar 14, 2015

@mhaas

This comment has been minimized.

Contributor

mhaas commented Mar 15, 2015

Gcc/clang address sanitizer might come in handy here: https://code.google.com/p/address-sanitizer/wiki/AddressSanitizer

It's probably a bit faster than valgrind.

@mhaas mhaas added this to the 1.2.0 milestone Mar 18, 2015

@sinkcup sinkcup modified the milestones: 1.3.0, 1.2.0 Mar 19, 2015

@acv

This comment has been minimized.

Contributor

acv commented Mar 22, 2015

Further memory leak found in http.c that could leak the url for every redirects by the gateway. See PR #136

@acv

This comment has been minimized.

Contributor

acv commented Mar 24, 2015

It's unclear that there are any outstanding memory leaks. It would be interesting to know if someone with a busy access point could run the latest code for a few days/weeks are report daily snapshot of the memory map to see if there's any unbounded growth.

@mhaas

This comment has been minimized.

Contributor

mhaas commented Mar 24, 2015

I would love to see a test script which exercises wifidog. Just connect users, disconnect users etc. Probably with an authorized server which randomly sends auth: 1 and auth: 0. Then we could run valgrind.

@acv

This comment has been minimized.

Contributor

acv commented Mar 24, 2015

The need for unique ip/mac pairs makes that hard to do.

@acv

This comment has been minimized.

Contributor

acv commented Mar 24, 2015

I might look at using Scapy to do this... Spin up and maintain a couple hundred clients.

@mhaas

This comment has been minimized.

Contributor

mhaas commented Mar 24, 2015

Scapy sounds like a lot of fun. I presume it requires root?

We could also hook up fake IPs/mac addresses into the code somehow. I guess wifidog relies on the IP adress reported by libhttpd. For the mac, just use a regular file instead of /proc/net/arp.

@acv

This comment has been minimized.

Contributor

acv commented Mar 24, 2015

If the code base was less coupled, we could try unit tests, I did some test program for pstring.c and I had to link in every single .o to satisfy the compiler (renaming the original main())!

@mhaas

This comment has been minimized.

Contributor

mhaas commented Mar 25, 2015

I was thinking more along the lines of integration tests, where pretty much the whole code base is exercised.. And we wouldn't necessarily check for correctness, just for memory leaks. But we could also make sure the firewall rules fire correctly etc.

Re unit tests, there is no problem linking in all files. Re main, see
http://stackoverflow.com/questions/3097825/is-there-a-gcc-compiler-linker-option-to-change-the-name-of-main

@mhaas mhaas closed this Mar 25, 2015

@mhaas mhaas reopened this Mar 25, 2015

@acv

This comment has been minimized.

Contributor

acv commented Mar 25, 2015

I was looking at the check framework and they simply recommend extracting a stub main() to a C file that will only call the real entry point and packaging the rest of the .o's in a library. You can then link the tests against the library. That seems reasonable and when I get a few hours, I'll probably try that on a fork and start writing unit tests for various data structure handling functions (client_list, config, AuthServer, pstring, etc.)

@mhaas

This comment has been minimized.

Contributor

mhaas commented Mar 25, 2015

Sweet! Do you have a test framework in mind or just use exit(1) on failure?

I'll probably try that on a fork and start writing unit tests
for various data structure handling functions (client_list, config,
AuthServer, pstring, etc.)

@acv

This comment has been minimized.

Contributor

acv commented Mar 25, 2015

@mhaas

This comment has been minimized.

Contributor

mhaas commented Mar 26, 2015

@acv

This comment has been minimized.

Contributor

acv commented Mar 26, 2015

That looks promising!

@mhaas

This comment has been minimized.

Contributor

mhaas commented Mar 26, 2015

Yeah, it works.. almost. The vlanmac interfaces do not show up in /proc/net/arp, so I am using a custom arp table. No big problem. Almost got it working ;)

@mhaas

This comment has been minimized.

Contributor

mhaas commented Mar 26, 2015

There you go. It's crusty, but works: https://github.com/mhaas/wifidog-gateway/tree/load-tester - in contrib/!

@mhaas

This comment has been minimized.

Contributor

mhaas commented Mar 26, 2015

What is interesting here.. commit 6d36e51 will always log in the IP/MAC and then log it out again in 1/3 of all runs. In this settings, something seems to block inside wifidog and fire_queries.py slows down, probably because it's running into timeouts.

@mhaas

This comment has been minimized.

Contributor

mhaas commented Mar 26, 2015

Valgrind log: http://pastebin.ca/2964645 - killed with CTRL+C

@mhaas

This comment has been minimized.

Contributor

mhaas commented Mar 29, 2015

Heh, looks like my script got something nice:

127.0.0.1 - - [29/Mar/2015 08:55:35] code 400, message Bad request syntax ('GET /auth/?stage=counters&ip=\x88to\xb7\x88to\xb7\x10\xb4q\tx\xb5q\t(\xbcq\t(\xbcq\t\xa0to\xb7\xa0to\xb7\xa8to\xb7\xa8to\xb7\xb0to\xb7\xb0to\xb7\xb8to\xb7\xb8to\xb7\xf8\xb8q\t\xf8\xb8q\t\x80\xbdq\t\x80\xbdq\t\xe0\xbcq\tX\xb3q\t\xd8to\xb7\xd8to\xb7\xe0to\xb7\xe0to\xb7\xe8to\xb7\xe8to\xb7\xf0to\xb7\xf0to\xb7\xf8to\xb7\xf8to\xb7&mac=x\xbcq\t1:f9:5c:39:f6&token=&incoming=0&outgoing=0&gw_id=FE78B6F39D50 HTTP/1.0')
127.0.0.1 - - [29/Mar/2015 08:55:35] "GET /auth/?stage=counters&ip=�to��to���q x�q (�q (�q �to��to��to��to��to��to��to��to���q ��q ��q ��q ��qX�q �to��to��to��to��to��to��to��to��to��to�&mac=x�q 1:f9:5c:39:f6&token=&incoming=0&outgoing=0&gw_id=FE78B6F39D50 HTTP/1.0" 400

@mhaas

This comment has been minimized.

Contributor

mhaas commented Mar 29, 2015

I can only reproduce this if I build with clang and CFLAGS="-O2 -g3", and then only sometimes. Notably, with -fno-omit-frame-pointer I only get requests running into timeouts. This looks race-condition-ish.

@mhaas

This comment has been minimized.

Contributor

mhaas commented Mar 29, 2015

Meh, reproducing this is hard. It really only works with some flags, otherwise I apparently get a deadlock.

It seems that the client gets logged out properly on a manual requests and the client struct is deleted. In firewall.c, fw_sync_with_authserver subsequently re-uses the client and it crashes. I don't see the problem as all calls are properly locking the list.

I have now compiled with -fsanitize=address and that seems to give me some saner output.

@acv

This comment has been minimized.

Contributor

acv commented Mar 29, 2015

Yep, hence why I filed #84

@florida63

This comment has been minimized.

Contributor

florida63 commented Mar 29, 2015

On WIFIDOG-AUTH, the logout disconnecter a client (first) on the server. It transmits info at gateway only when counter update. This creates a certain latency.

Indeed, that poses problem if the client reconnect meantime because iptables rule are doubled.

@Existed

This comment has been minimized.

Existed commented Apr 11, 2015

Thanks, guys.
You're awesome. ;-)

@acv

This comment has been minimized.

Contributor

acv commented Apr 14, 2015

@Existed if you're trying the new code can you report back on impact on memory usage and any issue(s) you encounter? Always eager for feedback from people with higher volume installs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment