xl2tpd 1.3.1 possible memory leak #23

Closed
dkorzhevin opened this issue May 14, 2013 · 18 comments

@dkorzhevin

Hi,

I use xl2tpd-1.3.1 on Debian 6.0.7 x86_64 and it uses too much memory (already 1199808 kB).

On another server it reaches about 6 GB of memory before the server hangs.

Here is some info:

xl2tpd version: xl2tpd-1.3.1

pmap -x 2413
2413: /usr/sbin/xl2tpd
Address Kbytes RSS Dirty Mode Mapping
0000000000400000 0 92 0 r-x-- xl2tpd
000000000061a000 0 4 4 rw--- xl2tpd
000000000061b000 0 12 12 rw--- [ anon ]
0000000001b16000 0 1187564 1187564 rw--- [ anon ]
00007f2472d36000 0 16 0 r-x-- libnss_files-2.11.3.so
00007f2472d42000 0 0 0 ----- libnss_files-2.11.3.so
00007f2472f41000 0 4 4 r---- libnss_files-2.11.3.so
00007f2472f42000 0 4 4 rw--- libnss_files-2.11.3.so
00007f2472f43000 0 20 0 r-x-- libnss_nis-2.11.3.so
00007f2472f4d000 0 0 0 ----- libnss_nis-2.11.3.so
00007f247314c000 0 4 4 r---- libnss_nis-2.11.3.so
00007f247314d000 0 4 4 rw--- libnss_nis-2.11.3.so
00007f247314e000 0 24 0 r-x-- libnsl-2.11.3.so
00007f2473163000 0 0 0 ----- libnsl-2.11.3.so
00007f2473362000 0 4 4 r---- libnsl-2.11.3.so
00007f2473363000 0 4 4 rw--- libnsl-2.11.3.so
00007f2473364000 0 0 0 rw--- [ anon ]
00007f2473366000 0 12 0 r-x-- libnss_compat-2.11.3.so
00007f247336d000 0 0 0 ----- libnss_compat-2.11.3.so
00007f247356c000 0 4 4 r---- libnss_compat-2.11.3.so
00007f247356d000 0 4 4 rw--- libnss_compat-2.11.3.so
00007f247356e000 0 448 0 r-x-- libc-2.11.3.so
00007f24736c7000 0 0 0 ----- libc-2.11.3.so
00007f24738c6000 0 16 16 r---- libc-2.11.3.so
00007f24738ca000 0 4 4 rw--- libc-2.11.3.so
00007f24738cb000 0 20 20 rw--- [ anon ]
00007f24738d0000 0 92 0 r-x-- ld-2.11.3.so
00007f2473adf000 0 12 12 rw--- [ anon ]
00007f2473aeb000 0 8 8 rw--- [ anon ]
00007f2473aed000 0 4 4 r---- ld-2.11.3.so
00007f2473aee000 0 4 4 rw--- ld-2.11.3.so
00007f2473aef000 0 4 4 rw--- [ anon ]
00007fff0d0c3000 0 16 16 rw--- [ stack ]
00007fff0d1ff000 0 4 0 r-x-- [ anon ]
ffffffffff600000 0 0 0 r-x-- [ anon ]


total kB 1199808 1188408 1187700

xl2tpd.conf:

[global]
port = 1701
[lns default]
ip range = 10.3.0.2-255
assign ip = yes
local ip = 10.3.0.1
require authentication = yes
pppoptfile = /etc/ppp/options.xl2tpd

options.xl2tpd:

ms-dns 8.8.8.8
ms-dns 8.8.4.4

nomppe

require-mppe-128

+mschap-v2
+mschap
debug
plugin radius.so
plugin radattr.so

I can provide any other info to help debug/solve this problem, maybe using valgrind. This is very important.

@jimdigriz

Running 37a7cc3 I get a honking memory leak, 1.6GB for xl2tpd....nice. Was running for about four days. Affects both our LNS and LAC installations.

Here have some valgrind lovin'

xl2tpd[6042]: death_handler: Fatal signal 2 received
xl2tpd[6042]: Terminating pppd: sending TERM signal to pid 12199
xl2tpd[6042]: Connection 47859 closed to 1.2.3.4, port 1701 (Server closing)
xl2tpd[6042]: Terminating pppd: sending TERM signal to pid 12192
xl2tpd[6042]: Connection 65441 closed to 2.3.4.5, port 1701 (Server closing)
xl2tpd[6042]: Terminating pppd: sending TERM signal to pid 6200
xl2tpd[6042]: Connection 36888 closed to 2.3.4.5, port 1701 (Server closing)
xl2tpd[6042]: Terminating pppd: sending TERM signal to pid 9614
xl2tpd[6042]: Connection 28227 closed to 1.2.3.4, port 1701 (Server closing)
==6042==
==6042== HEAP SUMMARY:
==6042== in use at exit: 1,404,854,988 bytes in 674,307 blocks
==6042== total heap usage: 1,543,605 allocs, 869,298 frees, 3,120,653,130 bytes allocated
==6042==
==6042== 60 bytes in 1 blocks are indirectly lost in loss record 1 of 18
==6042== at 0x4C28BED: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==6042== by 0x40CEC4: new_call (call.c:557)
==6042== by 0x40BA47: message_type_avp (avp.c:322)
==6042== by 0x40C101: handle_avps (avp.c:1736)
==6042== by 0x40930D: handle_packet (control.c:1778)
==6042== by 0x40DE98: network_thread (network.c:637)
==6042== by 0x401DB5: main (xl2tpd.c:1557)
==6042==
==6042== 72 bytes in 3 blocks are still reachable in loss record 2 of 18
==6042== at 0x4C28BED: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==6042== by 0x4108AC: set_range (file.c:881)
==6042== by 0x410AFE: set_lac (file.c:982)
==6042== by 0x410FB6: parse_one_option (file.c:1474)
==6042== by 0x4111A8: parse_config (file.c:1450)
==6042== by 0x411690: init_config (file.c:74)
==6042== by 0x404953: init (xl2tpd.c:1494)
==6042== by 0x401D98: main (xl2tpd.c:1555)
==6042==
==6042== 128 bytes in 1 blocks are still reachable in loss record 3 of 18
==6042== at 0x4C272B8: calloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==6042== by 0x401DA7: main (xl2tpd.c:1556)
==6042==
==6042== 576 bytes in 8 blocks are still reachable in loss record 4 of 18
==6042== at 0x4C28BED: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==6042== by 0x40511E: new_buf (misc.c:83)
==6042== by 0x40E1D9: network_thread (network.c:620)
==6042== by 0x401DB5: main (xl2tpd.c:1557)
==6042==
==6042== 624 bytes in 1 blocks are still reachable in loss record 5 of 18
==6042== at 0x4C272B8: calloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==6042== by 0x41166A: init_config (file.c:53)
==6042== by 0x404953: init (xl2tpd.c:1494)
==6042== by 0x401D98: main (xl2tpd.c:1555)
==6042==
==6042== 1,632 bytes in 3 blocks are still reachable in loss record 6 of 18
==6042== at 0x4C272B8: calloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==6042== by 0x40F55F: new_lns (file.c:83)
==6042== by 0x41141E: parse_config (file.c:1356)
==6042== by 0x411690: init_config (file.c:74)
==6042== by 0x404953: init (xl2tpd.c:1494)
==6042== by 0x401D98: main (xl2tpd.c:1555)
==6042==
==6042== 1,800 bytes in 25 blocks are possibly lost in loss record 7 of 18
==6042== at 0x4C28BED: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==6042== by 0x40511E: new_buf (misc.c:83)
==6042== by 0x40E1D9: network_thread (network.c:620)
==6042== by 0x401DB5: main (xl2tpd.c:1557)
==6042==
==6042== 2,100 bytes in 35 blocks are indirectly lost in loss record 8 of 18
==6042== at 0x4C28BED: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==6042== by 0x40CEC4: new_call (call.c:557)
==6042== by 0x4035BB: new_tunnel (xl2tpd.c:877)
==6042== by 0x40D0B9: get_call (call.c:672)
==6042== by 0x40DE3B: network_thread (network.c:597)
==6042== by 0x401DB5: main (xl2tpd.c:1557)
==6042==
==6042== 2,340 bytes in 39 blocks are definitely lost in loss record 9 of 18
==6042== at 0x4C28BED: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==6042== by 0x40CEC4: new_call (call.c:557)
==6042== by 0x40BA47: message_type_avp (avp.c:322)
==6042== by 0x40C101: handle_avps (avp.c:1736)
==6042== by 0x40930D: handle_packet (control.c:1778)
==6042== by 0x40DE98: network_thread (network.c:637)
==6042== by 0x401DB5: main (xl2tpd.c:1557)
==6042==
==6042== 4,096 bytes in 1 blocks are indirectly lost in loss record 10 of 18
==6042== at 0x4C28BED: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==6042== by 0x405139: new_buf (misc.c:87)
==6042== by 0x40DABA: network_thread (network.c:447)
==6042== by 0x401DB5: main (xl2tpd.c:1557)
==6042==
==6042== 4,168 (72 direct, 4,096 indirect) bytes in 1 blocks are definitely lost in loss record 11 of 18
==6042== at 0x4C28BED: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==6042== by 0x40511E: new_buf (misc.c:83)
==6042== by 0x40DABA: network_thread (network.c:447)
==6042== by 0x401DB5: main (xl2tpd.c:1557)
==6042==
==6042== 6,960 bytes in 116 blocks are definitely lost in loss record 12 of 18
==6042== at 0x4C28BED: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==6042== by 0x40CEC4: new_call (call.c:557)
==6042== by 0x4035BB: new_tunnel (xl2tpd.c:877)
==6042== by 0x40D0B9: get_call (call.c:672)
==6042== by 0x40DE3B: network_thread (network.c:597)
==6042== by 0x401DB5: main (xl2tpd.c:1557)
==6042==
==6042== 8,336 (4,096 direct, 4,240 indirect) bytes in 1 blocks are definitely lost in loss record 13 of 18
==6042== at 0x4C28BED: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==6042== by 0x405139: new_buf (misc.c:87)
==6042== by 0x40E1D9: network_thread (network.c:620)
==6042== by 0x401DB5: main (xl2tpd.c:1557)
==6042==
==6042== 45,056 bytes in 11 blocks are still reachable in loss record 14 of 18
==6042== at 0x4C28BED: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==6042== by 0x405139: new_buf (misc.c:87)
==6042== by 0x40E1D9: network_thread (network.c:620)
==6042== by 0x401DB5: main (xl2tpd.c:1557)
==6042==
==6042== 84,096 bytes in 1,168 blocks are indirectly lost in loss record 15 of 18
==6042== at 0x4C28BED: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==6042== by 0x40511E: new_buf (misc.c:83)
==6042== by 0x40E1D9: network_thread (network.c:620)
==6042== by 0x401DB5: main (xl2tpd.c:1557)
==6042==
==6042== 3,764,224 bytes in 919 blocks are possibly lost in loss record 16 of 18
==6042== at 0x4C28BED: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==6042== by 0x405139: new_buf (misc.c:87)
==6042== by 0x40E1D9: network_thread (network.c:620)
==6042== by 0x401DB5: main (xl2tpd.c:1557)
==6042==
==6042== 1,376,755,712 bytes in 336,122 blocks are indirectly lost in loss record 17 of 18
==6042== at 0x4C28BED: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==6042== by 0x405139: new_buf (misc.c:87)
==6042== by 0x40E1D9: network_thread (network.c:620)
==6042== by 0x401DB5: main (xl2tpd.c:1557)
==6042==
==6042== 1,401,019,072 (24,181,344 direct, 1,376,837,728 indirect) bytes in 335,852 blocks are definitely lost in loss record 18 of 18
==6042== at 0x4C28BED: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==6042== by 0x40511E: new_buf (misc.c:83)
==6042== by 0x40E1D9: network_thread (network.c:620)
==6042== by 0x401DB5: main (xl2tpd.c:1557)
==6042==
==6042== LEAK SUMMARY:
==6042== definitely lost: 24,194,812 bytes in 336,009 blocks
==6042== indirectly lost: 1,376,846,064 bytes in 337,327 blocks
==6042== possibly lost: 3,766,024 bytes in 944 blocks
==6042== still reachable: 48,088 bytes in 27 blocks
==6042== suppressed: 0 bytes in 0 blocks
==6042==
==6042== For counts of detected and suppressed errors, rerun with: -v
==6042== ERROR SUMMARY: 7 errors from 7 contexts (suppressed: 4 from 4)
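
A note on reading the numbers above: record 18 works out to exactly 72 bytes per block (24,181,344 / 335,852) and record 17 to exactly 4,096 bytes per block (1,376,755,712 / 336,122). That is the shape valgrind reports when a small header struct that owns a separately allocated payload is dropped: the header is "definitely lost" and the payload it pointed to is "indirectly lost". Below is a minimal sketch of such a two-level allocation and the matching release. The names `pkt_buf`, `pkt_buf_new` and `pkt_buf_free` are hypothetical and are not taken from xl2tpd's `new_buf` in misc.c; this only illustrates the pattern.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical two-level buffer; NOT xl2tpd's actual buffer struct. */
struct pkt_buf {
    unsigned char *data;  /* payload, allocated separately from the header */
    size_t         cap;   /* payload capacity in bytes */
};

/* Allocate header and payload. Returns NULL, leaking nothing, on failure. */
struct pkt_buf *pkt_buf_new(size_t cap)
{
    struct pkt_buf *b;

    assert(cap > 0);      /* malloc(0) may legally return NULL */

    b = malloc(sizeof *b);
    if (b == NULL)
        return NULL;

    b->data = malloc(cap);
    if (b->data == NULL) {
        free(b);          /* do not leak the header on partial failure */
        return NULL;
    }
    b->cap = cap;
    return b;
}

/* Release payload first, then header. Safe to call with NULL. */
void pkt_buf_free(struct pkt_buf *b)
{
    if (b == NULL)
        return;
    free(b->data);
    free(b);
}
```

If only the pointer to the header is dropped without calling the release function, both allocations are leaked at once, which would match the paired counts in records 17 and 18.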

@xelerance
Collaborator

Can you try valgrind with 95445f to see if it appears there too ? If not, then you and I can start bisecting this problem to see the commit that introduced it.

@jimdigriz

Knew there was something else I wanted to mention: I do not see the problem on the 1.3.1+dfsg-1 (Debian wheezy's) release. So somewhere between and 95445f is the problem. So already tested... :)

@xelerance
Collaborator

Can you give me the instructions to test the same way you are testing ? I will then test between 37a7cc3 and v1.3.6

@jimdigriz

Nothing exotic in our configuration (need to use the latest git tree though as we are doing multipath'd multilink PPP to bundle the throughput of our two uplinks):

[global]
debug state = yes
debug tunnel = yes
force userspace = yes

[lns london]
lac = 1.2.3.4
exclusive = no
assign ip = no
local ip = 10.90.0.0
pppoptfile = /etc/ppp/options.l2tpd.hed

[lns usa]
lac = 4.3.2.1
exclusive = no
assign ip = no
local ip = 10.90.0.0
pppoptfile = /etc/ppp/options.l2tpd.hed

[lns ny]
lac = 5.6.7.8
exclusive = no
assign ip = no
local ip = 10.90.0.0
pppoptfile = /etc/ppp/options.l2tpd.hed

Our PPP options file is just:

debug

noauth

nodefaultroute
mtu 1300
mru 1300
lock

novj
novjccomp
nopcomp
nodeflate
noccp

lcp-echo-interval 5
lcp-echo-failure 3

multilink
mrru 1300

This is wrapped up all in a rather standard IPsec transport:

conn common
        keyexchange=ikev2
        type=transport
        authby=psk
        reauth=no
        auto=route
        forceencaps=yes

conn london1
        also=common
        left=....
        right=....

conn london2
       also=common
       left=....
       right=....

There is definitely something that triggers xl2tpd to start leaking memory: sometimes it runs for a day or so with no problems and then starts to grow; other times (like now) our LNS has been running fault-free for over ten days, using only ~4MB of RAM.

I suspect that the IPsec layer locks up and after a while naturally recovers, but of course xl2tpd is probably receiving a whole pile of errno's when sendmsg()'ing during the period.

For reference, our LAC config is just:

[global]
ipsec saref = yes
force userspace = yes

[lac foobar]
lns = 1.2.3.4
local ip = 10.90.0.1
redial = yes
redial timeout = 15
max redials = 10000
pppoptfile = /etc/ppp/options.l2tpd.hed

So, nothing exciting.

However, I am convinced the RAM leak kicks off when there is a network interruption (an IPsec-less setup would probably show the effect rarely, if ever).
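
To illustrate the sendmsg() theory: if a transmit path only releases its buffer after a successful send, every errno during an outage leaves one buffer behind. The following is a minimal sketch of a send helper that releases the buffer on every exit path. `send_and_release` is a hypothetical name, not a function in xl2tpd, and whether xl2tpd's transmit code actually has this shape is an assumption that has not been verified against the source.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>

/*
 * Hypothetical helper, not from xl2tpd: transmit one heap-allocated
 * payload and release it on every exit path, including sendmsg() errors.
 * Takes ownership of `payload`. Returns bytes sent, or -1 with errno set.
 */
ssize_t send_and_release(int fd, void *payload, size_t len)
{
    struct iovec iov;
    struct msghdr msg;
    ssize_t n;
    int saved_errno;

    iov.iov_base = payload;
    iov.iov_len = len;

    memset(&msg, 0, sizeof msg);
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;

    n = sendmsg(fd, &msg, 0);
    saved_errno = errno;  /* free() is allowed to modify errno */

    free(payload);        /* released whether or not the send worked */

    errno = saved_errno;
    return n;
}
```

Because ownership passes to the helper, the caller must not touch or free `payload` afterwards; a retransmit scheme would need its own copy or a reference count.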

@xelerance
Collaborator

I have a patch ready against 1.3.6. I will create a devel branch and push my patch there. I will also tag a devel version. Would you be willing to test it ?

@jimdigriz

Sure!

@xelerance
Collaborator

Please test v1.3.7dev1 and report back.

I don't expect all leaks to be gone, but I think I got the main loop one, so that should be 90% of your report.

@jimdigriz

The LAC is grumbling...

(gdb) run
Starting program: /usr/sbin/xl2tpd -D
xl2tpd[8452]: Enabling IPsec SAref processing for L2TP transport mode SAs
xl2tpd[8452]: IPsec SAref does not work with L2TP kernel mode yet, enabling force userspace=yes
xl2tpd[8452]: setsockopt recvref[30]: Protocol not available
xl2tpd[8452]: Not looking for kernel support.
xl2tpd[8452]: xl2tpd version xl2tpd-1.3.6 started on vpn-2-lcy PID:8452
xl2tpd[8452]: Written by Mark Spencer, Copyright (C) 1998, Adtran, Inc.
xl2tpd[8452]: Forked by Scott Balmos and David Stipp, (C) 2001
xl2tpd[8452]: Inherited by Jeff McAdams, (C) 2002
xl2tpd[8452]: Forked again by Xelerance (www.xelerance.com) (C) 2006
xl2tpd[8452]: Listening on IP address 0.0.0.0, port 1701
xl2tpd[8452]: Connecting to host 79.173.146.162, port 1701
xl2tpd[8452]: Maximum retries exceeded for tunnel 46125.  Closing.
xl2tpd[8452]: Connection 0 closed to 79.173.146.162, port 1701 (Timeout)
xl2tpd[8452]: Unable to deliver closing message for tunnel 46125. Destroying anyway.
xl2tpd[8452]: Will redial in 15 seconds
xl2tpd[8452]: Connecting to host 79.173.146.162, port 1701
xl2tpd[8452]: Maximum retries exceeded for tunnel 38794.  Closing.
xl2tpd[8452]: Connection 0 closed to 79.173.146.162, port 1701 (Timeout)
xl2tpd[8452]: Unable to deliver closing message for tunnel 38794. Destroying anyway.
xl2tpd[8452]: Will redial in 15 seconds
xl2tpd[8452]: Connecting to host 79.173.146.162, port 1701

Program received signal SIGSEGV, Segmentation fault.
schedule (tv=..., func=func@entry=0x40d5f0 <control_xmit>, data=data@entry=0x61f410) at scheduler.c:101
101     scheduler.c: No such file or directory.
(gdb) where
#0  schedule (tv=..., func=func@entry=0x40d5f0 <control_xmit>, data=data@entry=0x61f410) at scheduler.c:101
#1  0x000000000040d6e5 in control_xmit (b=0x61f410) at network.c:271
#2  0x0000000000406a2e in control_finish (t=0x621120, c=c@entry=0x6215c0) at control.c:793
#3  0x0000000000402f42 in l2tp_call (host=0x61fdb0 "79.173.146.162", port=<optimized out>, lac=0x61fb30, lns=0x0)
    at xl2tpd.c:695
#4  0x000000000040ed97 in process_schedule (ptv=ptv@entry=0x7fffffffe3e0) at scheduler.c:47
#5  0x000000000040d8ec in network_thread () at network.c:457
#6  0x0000000000401be6 in main (argc=<optimized out>, argv=<optimized out>) at xl2tpd.c:1558

@jimdigriz

meanwhile, the LNS side is just as unhappy...

xl2tpd[15727]: check_control: Received out of order control packet on tunnel 58837 (got 6, expected 7)
xl2tpd[15727]: handle_packet: bad control packet!
xl2tpd[15727]: network_thread: bad packet
xl2tpd[15727]: check_control: Received out of order control packet on tunnel 15095 (got 6, expected 7)
xl2tpd[15727]: handle_packet: bad control packet!
xl2tpd[15727]: network_thread: bad packet
xl2tpd[15727]: control_finish: message type is Stop-Control-Connection-Notification(4).  Tunnel is 58837, call is 0.
xl2tpd[15727]: control_finish: Connection closed to 67.23.55.73, port 1701 (Timeout), Local: 56000, Remote: 58837
xl2tpd[15727]: build_fdset: closing down tunnel 56000
*** glibc detected *** /usr/sbin/xl2tpd: double free or corruption (!prev): 0x000000000061f290 ***
======= Backtrace: =========
/lib/x86_64-linux-gnu/libc.so.6(+0x76d76)[0x7ffff7ac9d76]
/lib/x86_64-linux-gnu/libc.so.6(cfree+0x6c)[0x7ffff7aceaac]
/usr/sbin/xl2tpd[0x40d7cd]
/usr/sbin/xl2tpd[0x40d8e0]
/usr/sbin/xl2tpd[0x401be6]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xfd)[0x7ffff7a71ead]
/usr/sbin/xl2tpd[0x401c19]
======= Memory map: ========
00400000-0041b000 r-xp 00000000 fd:02 17294                              /usr/sbin/xl2tpd
0061b000-0061c000 rw-p 0001b000 fd:02 17294                              /usr/sbin/xl2tpd
0061c000-00640000 rw-p 00000000 00:00 0                                  [heap]
7ffff0000000-7ffff0021000 rw-p 00000000 00:00 0
7ffff0021000-7ffff4000000 ---p 00000000 00:00 0
7ffff783d000-7ffff7852000 r-xp 00000000 fd:00 20                         /lib/x86_64-linux-gnu/libgcc_s.so.1
7ffff7852000-7ffff7a52000 ---p 00015000 fd:00 20                         /lib/x86_64-linux-gnu/libgcc_s.so.1
7ffff7a52000-7ffff7a53000 rw-p 00015000 fd:00 20                         /lib/x86_64-linux-gnu/libgcc_s.so.1
7ffff7a53000-7ffff7bd3000 r-xp 00000000 fd:00 56                         /lib/x86_64-linux-gnu/libc-2.13.so
7ffff7bd3000-7ffff7dd3000 ---p 00180000 fd:00 56                         /lib/x86_64-linux-gnu/libc-2.13.so
7ffff7dd3000-7ffff7dd7000 r--p 00180000 fd:00 56                         /lib/x86_64-linux-gnu/libc-2.13.so
7ffff7dd7000-7ffff7dd8000 rw-p 00184000 fd:00 56                         /lib/x86_64-linux-gnu/libc-2.13.so
7ffff7dd8000-7ffff7ddd000 rw-p 00000000 00:00 0
7ffff7ddd000-7ffff7dfd000 r-xp 00000000 fd:00 59                         /lib/x86_64-linux-gnu/ld-2.13.so
7ffff7ff1000-7ffff7ff4000 rw-p 00000000 00:00 0
7ffff7ff8000-7ffff7ffa000 rw-p 00000000 00:00 0
7ffff7ffa000-7ffff7ffc000 r-xp 00000000 00:00 0                          [vdso]
7ffff7ffc000-7ffff7ffd000 r--p 0001f000 fd:00 59                         /lib/x86_64-linux-gnu/ld-2.13.so
7ffff7ffd000-7ffff7ffe000 rw-p 00020000 fd:00 59                         /lib/x86_64-linux-gnu/ld-2.13.so
7ffff7ffe000-7ffff7fff000 rw-p 00000000 00:00 0
7ffffffde000-7ffffffff000 rw-p 00000000 00:00 0                          [stack]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0                  [vsyscall]

Program received signal SIGABRT, Aborted.
0x00007ffff7a85475 in *__GI_raise (sig=<optimized out>) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
64      ../nptl/sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) where
#0  0x00007ffff7a85475 in *__GI_raise (sig=<optimized out>) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
#1  0x00007ffff7a886f0 in *__GI_abort () at abort.c:92
#2  0x00007ffff7ac052b in __libc_message (do_abort=<optimized out>, fmt=<optimized out>)
    at ../sysdeps/unix/sysv/linux/libc_fatal.c:189
#3  0x00007ffff7ac9d76 in malloc_printerr (action=3, str=0x7ffff7ba2248 "double free or corruption (!prev)",
    ptr=<optimized out>) at malloc.c:6283
#4  0x00007ffff7aceaac in *__GI___libc_free (mem=<optimized out>) at malloc.c:3738
#5  0x000000000040d7cd in build_fdset (readfds=readfds@entry=0x7fffffffdf00) at network.c:403
#6  0x000000000040d8e0 in network_thread () at network.c:456
#7  0x0000000000401be6 in main (argc=<optimized out>, argv=<optimized out>) at xl2tpd.c:1558

@xelerance
Collaborator

Dammit, I really thought I had it licked. The problem is that the code never free()s at the same level as the malloc(), so it's convoluted as hell. Sometimes I wonder why I just don't recode this whole thing using Python...

Historically, it's been nice to have this in C to keep it all nice and small for routers, but show me a router that doesn't have 64MB of RAM now.

OK, I will remove each extra free() I put in and give you a nudge for testing.
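
While the malloc()/free() ownership is being untangled, one common defensive idiom is to clear a pointer at the point where it is released, so that a second release through the same variable becomes a harmless `free(NULL)` instead of a "double free or corruption" abort. A minimal sketch follows; the macro name `FREE_AND_NULL` is hypothetical and is not something xl2tpd defines.

```c
#include <stdlib.h>

/*
 * Hypothetical helper macro: free the object and clear the pointer
 * variable, so a repeated release through the same variable is a no-op
 * (free(NULL) is defined to do nothing). The argument is evaluated more
 * than once, so pass a plain variable or struct member, not an
 * expression with side effects.
 */
#define FREE_AND_NULL(p)  \
    do {                  \
        free(p);          \
        (p) = NULL;       \
    } while (0)
```

This only protects callers that go through the same pointer variable. If two structures each hold their own copy of the pointer, clearing one does not clear the other, so it reduces the risk rather than replacing a clear ownership rule.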

@HouzuoGuo

=D shall we look forward to 1.3.7 very soon?

@dkorzhevin
Author

Hi,

I confirm that I can also help with testing and debugging.

@jimdigriz

For those wanting something that does not kill your boxen, simply dump xl2tpd under runit and go back to drinking beer:

aclouter@vpn-2-lcy:~# cat /etc/service/xl2tpd/run
#!/bin/sh

set -eu

# limit RAM to xl2tpd (128MB is enough for some ppp/ntlm action)
# https://github.com/xelerance/xl2tpd/issues/23
MEM=$((128*1024*1024))

mkdir -p /var/run/xl2tpd

exec 2>&1
exec chpst -m $MEM /usr/sbin/xl2tpd -D -c /etc/xl2tpd/xl2tpd.conf

@zorun

zorun commented Jan 28, 2015

I'm also seeing a memory leak on OpenWRT, version 1.3.6-5619e1771048e74b729804e8602f409af0f3faea, where xl2tpd is running as LAC.

It does not seem to be a huge memory leak, because it takes several days for xl2tpd to get OOM-killed (at which point it uses 16 MB of RAM, out of the 32 MB available on the router).

@mcr
Collaborator

mcr commented Jan 28, 2015

Can you ulimit it to a smaller amount of ram so that it gets a malloc failure? That way we can better see what object it is trying to allocate. How have you compiled it?
When you say it is running on OpenWRT as a LAC, I think you mean that it is connecting to another concentrator as a client. Is it running over IPsec? Does it reconnect a lot? Can you capture the logs and attach them?
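
For anyone who wants to reproduce the "malloc failure under a limit" behaviour from a small test program rather than from the shell: `ulimit -v` corresponds to the `RLIMIT_AS` resource limit, which can also be set with `setrlimit()`. Below is a minimal sketch, assuming a Linux/POSIX system. `limit_address_space` is a hypothetical helper name and nothing in xl2tpd calls it.

```c
#include <stdlib.h>
#include <sys/resource.h>

/*
 * Hypothetical helper: cap this process's virtual address space so that
 * a runaway allocation pattern surfaces as malloc() returning NULL
 * instead of the kernel OOM killer stepping in.
 * Returns 0 on success, -1 on failure (errno set by get/setrlimit).
 */
int limit_address_space(rlim_t bytes)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_AS, &rl) != 0)
        return -1;

    /* The soft limit may not exceed the existing hard limit. */
    if (rl.rlim_max != RLIM_INFINITY && bytes > rl.rlim_max)
        bytes = rl.rlim_max;

    rl.rlim_cur = bytes;  /* only the soft limit is lowered */
    return setrlimit(RLIMIT_AS, &rl);
}
```

After a call such as `limit_address_space((rlim_t)64 * 1024 * 1024)`, an allocation that would push the process past the limit fails with NULL, and a debugger breakpoint or log line at that failure shows which object was being allocated.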

@zorun

zorun commented Jan 29, 2015

Good idea, I've now limited xl2tpd to 7 MB of RAM, which is about twice as much as it takes normally.

It's an OpenWRT router with a MIPS CPU, and I'm simply using the package for OpenWRT Barrier Breaker: http://downloads.openwrt.org/barrier_breaker/14.07/ar71xx/generic/packages/packages/

It is indeed used as a PPP/L2TP client, without IPsec. You're right, the last OOM-kill happened just after the LNS was not reachable (huge packet loss).

@shussain
Collaborator

Closing this ticket since the user has a workaround.
