Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

xl2tpd 1.3.1 refuse connection after some period of time #1

Closed
dkorzhevin opened this issue Aug 11, 2012 · 19 comments
Closed

xl2tpd 1.3.1 refuse connection after some period of time #1

dkorzhevin opened this issue Aug 11, 2012 · 19 comments

Comments

@dkorzhevin
Copy link

Package: xl2tpd
Version: 1.3.1+dfsg-1
Severity: normal

Hello,

I configured xl2tdp and it accept connection from Android phone, linux and mac os x. But. after some period of time (for example 1 hour) it just stop accept connections. I run xl2tpd in debug mode from server console: "xl2tpd -D" and catch some log when it refuses connections from mac os x:

root@domU-12-31-39-00-8A-6B:~# xl2tpd -D
xl2tpd[16379]: setsockopt recvref[30]: Protocol not available
xl2tpd[16379]: This binary does not support kernel L2TP.
xl2tpd[16379]: xl2tpd version xl2tpd-1.3.1 started on domU-12-31-39-00-8A-6B PID:16379
xl2tpd[16379]: Written by Mark Spencer, Copyright (C) 1998, Adtran, Inc.
xl2tpd[16379]: Forked by Scott Balmos and David Stipp, (C) 2001
xl2tpd[16379]: Inherited by Jeff McAdams, (C) 2002
xl2tpd[16379]: Forked again by Xelerance (www.xelerance.com) (C) 2006
xl2tpd[16379]: Listening on IP address 10.254.141.149, port 1701
xl2tpd[16379]: network_thread: recv packet from 89.252.56.204, size = 85, tunnel = 0, call = 0 ref=0 refhim=0
xl2tpd[16379]: get_call: allocating new tunnel for host 89.252.56.204, port 64843.
xl2tpd[16379]: network_thread: recv packet from 89.252.56.204, size = 85, tunnel = 0, call = 0 ref=0 refhim=0
xl2tpd[16379]: get_call: allocating new tunnel for host 89.252.56.204, port 64843.
xl2tpd[16379]: control_finish: Peer requested tunnel 13 twice, ignoring second one.
xl2tpd[16379]: build_fdset: closing down tunnel 31268
xl2tpd[16379]: network_thread: recv packet from 89.252.56.204, size = 85, tunnel = 0, call = 0 ref=0 refhim=0
xl2tpd[16379]: get_call: allocating new tunnel for host 89.252.56.204, port 64843.
xl2tpd[16379]: control_finish: Peer requested tunnel 13 twice, ignoring second one.
xl2tpd[16379]: build_fdset: closing down tunnel 39311
xl2tpd[16379]: network_thread: select timeout
xl2tpd[16379]: network_thread: select timeout
xl2tpd[16379]: network_thread: select timeout
xl2tpd[16379]: network_thread: select timeout
xl2tpd[16379]: network_thread: recv packet from 89.252.56.204, size = 85, tunnel = 0, call = 0 ref=0 refhim=0
xl2tpd[16379]: get_call: allocating new tunnel for host 89.252.56.204, port 64843.
xl2tpd[16379]: control_finish: Peer requested tunnel 13 twice, ignoring second one.
xl2tpd[16379]: build_fdset: closing down tunnel 64416
xl2tpd[16379]: network_thread: select timeout
xl2tpd[16379]: Maximum retries exceeded for tunnel 40270. Closing.
xl2tpd[16379]: Connection 13 closed to 89.252.56.204, port 64843 (Timeout)
xl2tpd[16379]: network_thread: select timeout
xl2tpd[16379]: network_thread: select timeout
xl2tpd[16379]: network_thread: select timeout
xl2tpd[16379]: network_thread: recv packet from 89.252.56.204, size = 85, tunnel = 0, call = 0 ref=0 refhim=0
xl2tpd[16379]: get_call: allocating new tunnel for host 89.252.56.204, port 64843.
xl2tpd[16379]: control_finish: Peer requested tunnel 13 twice, ignoring second one.
xl2tpd[16379]: build_fdset: closing down tunnel 44389
xl2tpd[16379]: network_thread: select timeout
xl2tpd[16379]: network_thread: select timeout
xl2tpd[16379]: Unable to deliver closing message for tunnel 40270. Destroying anyway.
xl2tpd[16379]: network_thread: recv packet from 89.252.56.204, size = 85, tunnel = 0, call = 0 ref=0 refhim=0
xl2tpd[16379]: get_call: allocating new tunnel for host 89.252.56.204, port 64843.
xl2tpd[16379]: network_thread: select timeout
xl2tpd[16379]: network_thread: recv packet from 89.252.56.204, size = 85, tunnel = 0, call = 0 ref=0 refhim=0
xl2tpd[16379]: get_call: allocating new tunnel for host 89.252.56.204, port 64843.
xl2tpd[16379]: control_finish: Peer requested tunnel 13 twice, ignoring second one.
xl2tpd[16379]: build_fdset: closing down tunnel 43361
xl2tpd[16379]: network_thread: select timeout
xl2tpd[16379]: network_thread: select timeout
xl2tpd[16379]: network_thread: select timeout
xl2tpd[16379]: Maximum retries exceeded for tunnel 7393. Closing.
xl2tpd[16379]: Connection 13 closed to 89.252.56.204, port 64843 (Timeout)
xl2tpd[16379]: network_thread: select timeout
xl2tpd[16379]: network_thread: select timeout
xl2tpd[16379]: network_thread: select timeout
xl2tpd[16379]: network_thread: select timeout
xl2tpd[16379]: network_thread: select timeout
xl2tpd[16379]: Unable to deliver closing message for tunnel 7393. Destroying anyway.

-- System Information:
Debian Release: wheezy/sid
APT prefers stable
APT policy: (500, 'stable')
Architecture: amd64 (x86_64)

Kernel: Linux 3.2.0-0.bpo.2-amd64 (SMP w/8 CPU cores)
Locale: LANG=en_US.utf8, LC_CTYPE=en_US.utf8 (charmap=UTF-8) (ignored: LC_ALL set to en_US.utf8)
Shell: /bin/sh linked to /bin/dash

Versions of packages xl2tpd depends on:
ii libc6 2.13-31 Embedded GNU C Library: Shared lib
ii libpcap0.8 1.2.1-1 system interface for user-level pa
ii ppp 2.4.5-5 Point-to-Point Protocol (PPP) - da

xl2tpd recommends no packages.

xl2tpd suggests no packages.

-- Configuration Files:
/etc/init.d/xl2tpd changed:
PATH=/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin
DAEMON=/usr/sbin/xl2tpd
NAME=xl2tpd
DESC=xl2tpd
test -x $DAEMON || exit 0
if [ -f /etc/default/xl2tpd ] ; then
. /etc/default/xl2tpd
fi
PIDFILE=/var/run/$NAME.pid
set -e
case "$1" in
start)
echo -n "Starting $DESC: "
test -d ${XL2TPD_RUN_DIR:-/var/run/xl2tpd} || mkdir -p ${XL2TPD_RUN_DIR:-/var/run/xl2tpd}
start-stop-daemon --start --quiet --pidfile $PIDFILE
--exec $DAEMON -- $DAEMON_OPTS
echo "$NAME."
;;
stop)
echo -n "Stopping $DESC: "
start-stop-daemon --oknodo --stop --quiet --pidfile $PIDFILE
--exec $DAEMON
echo "$NAME."
;;
force-reload)
test -d ${XL2TPD_RUN_DIR:-/var/run/xl2tpd} || mkdir -p ${XL2TPD_RUN_DIR:-/var/run/xl2tpd}
# check whether $DAEMON is running. If so, restart
start-stop-daemon --stop --test --quiet --pidfile
$PIDFILE --exec $DAEMON
&& $0 restart
|| exit 0
;;
restart)
test -d ${XL2TPD_RUN_DIR:-/var/run/xl2tpd} || mkdir -p ${XL2TPD_RUN_DIR:-/var/run/xl2tpd}
#rm -fv /var/log/xl2tpd//
echo -n "Restarting $DESC: "
start-stop-daemon --stop --quiet --pidfile
$PIDFILE --exec $DAEMON
sleep 1
start-stop-daemon --start --quiet --pidfile
$PIDFILE --exec $DAEMON -- $DAEMON_OPTS
echo "$NAME."
;;
*)
N=/etc/init.d/$NAME
echo "Usage: $N {start|stop|restart|force-reload}" >&2
exit 1
;;
esac
exit 0

/etc/xl2tpd/xl2tpd.conf changed:
[global] ; Global parameters:
debug network = yes
debug tunnel = yes
port = 1701 ; * Bind to port 1701
listen-addr = 176.9.1.119
auth file = /etc/xl2tpd/l2tp-secrets ; * Where our challenge secrets are
access control = no ; * Refuse connections without IP match
rand source = dev ; Source for entropy for random
[lns default] ; Our fallthrough LNS definition
exclusive = no ; * Only permit one tunnel per host
ip range = 10.3.1.2-10.3.2.255
local ip = 10.3.1.1
refuse authentication = yes ; * Refuse authentication altogether
refuse pap = yes ; * Refuse PAP authentication
refuse chap = yes
ppp debug = yes ; * Turn on PPP debugging
pppoptfile = /etc/ppp/options.l2tpd ; * ppp options file
name = granite.stidia.com
length bit = yes

-- no debconf information

I will provide any info to help fix this problem.

@dkorzhevin
Copy link
Author

I am also filled bugreport on Debian bugtracker:

http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=684156

@qianguozheng
Copy link

looks this bug have been fixed in http://www.xelerance.com, but now I cannot find xl2tpd project. the bug id is 1305, and I have bug patch for that , if you need contact me, guozhengqian0825@126.com

@dkorzhevin
Copy link
Author

Hello, qianguozheng

Are this bug id (1305) from Debian BTS? Can you please tell me more about your's patch?

@qianguozheng
Copy link

patches have been sent to you , it was from the opensouce

@fluential
Copy link

What is going on with this?

@xelerance
Copy link
Collaborator

I have not received the patch myself, so I cannot integrate it. I have just emailed qianguozheng to request it.

@xelerance
Copy link
Collaborator

I just pulled a patch by wxiaoguang into the testing branch I but I seriously doubt it would fix this problem. I will wait for qianguozheng to provide info.

Check the testing branch, just in case.

@qianguozheng
Copy link

have mailed to you, please have a check.

At 2012-12-26 05:57:07,xelerance notifications@github.com wrote:

I just pulled a patch by wxiaoguang into the testing branch I but I seriously doubt it would fix this problem. I will wait for qianguozheng to provide info.

Check the testing branch, just in case.


Reply to this email directly or view it on GitHub.

@fluential
Copy link

I am not sure if this problem is regarding xl2tpd, take a look at this:
https://www.openswan.org/issues/1263
That is exactly what I am experiencing, ideas?

@qianguozheng
Copy link

have you tried only restart xl2tpd to see if it works right ?

At 2012-12-29 21:16:23,MikeyMike notifications@github.com wrote:

I am not sure if this problem is regarding xl2tpd, take a look at this:
https://www.openswan.org/issues/1263
That is exactly what I am experiencing, ideas?


Reply to this email directly or view it on GitHub.

@fluential
Copy link

Ok finally I found some time and it seems that the problem is exactly as it is described here -> https://www.openswan.org/issues/1263, which means it is regarding openswan not xl2tpd.
I am using:
openswan 1:2.6.28+dfsg-5ubuntu2

Probably we can close this one down?

@fluential
Copy link

Also apparently via http://www.mentby.com/Group/openswan-users/disconnect-causes-failure.html - "Actual bug causing this was found and fixed. 2.6.29", I will try to upgrade and let you guys know.

@xelerance
Copy link
Collaborator

Below is the patch that was sent to me by qianguozheng. No disrespect but I do not like this patch as the same code is present in call.c to handle the same situation. I would prefer if we focused on finding out why xl2tpd.c/magic_lac_dial is not returning proper return codes and why call.c/destroy_call doesn't try looping on error.

Anyone want to try handling that ?

--- a/xl2tpd.c 2012-06-25 23:53:52.448963426 +0400
+++ b/xl2tpd.c 2012-06-26 00:03:05.808569736 +0400
@@ -605,8 +605,8 @@
if (t->lac->redial && (t->lac->rtimeout > 0) && !t->lac->rsched &&
t->lac->active)
{

  •        l2tp_log (LOG_INFO, "Will redial in %d seconds\n",
    
  •             t->lac->rtimeout);
    
  •        l2tp_log (LOG_INFO, "%s: Will redial in %d seconds\n",
    
  •             **FUNCTION**, t->lac->rtimeout);
         tv.tv_sec = t->lac->rtimeout;
         tv.tv_usec = 0;
         t->lac->rsched = schedule (tv, magic_lac_dial, t->lac);
    
    @@ -751,6 +751,7 @@
    void magic_lac_dial (void *data)
    {
    struct lac *lac;
  • struct timeval tv;
    lac = (struct lac *) data;
    if (!lac)
    {
    @@ -771,10 +772,23 @@
    }
    if (!lac->t)
    {
    -#ifdef DEGUG_MAGIC
    +#ifdef DEBUG_MAGIC
    l2tp_log (LOG_DEBUG, "%s : tunnel not up! Connecting!\n", FUNCTION);
    #endif
    magic_lac_tunnel (lac);
  •    if (!lac->t)
    
  •    {
    
  •        if (lac->redial && (lac->rtimeout > 0))
    
  •        {
    
    +#ifdef DEBUG_MAGIC
  •            l2tp_log (LOG_DEBUG, "%s: Will redial in %d seconds\n", 
    
  •                      **FUNCTION**, lac->rtimeout);
    
    +#endif
  •            tv.tv_sec = lac->rtimeout;
    
  •            tv.tv_usec = 0;
    
  •            lac->rsched = schedule (tv, magic_lac_dial, lac);
    
  •        }
    
  •    }
     return;
    
    }
    lac_call (lac->t->ourtid, lac, NULL);

@xelerance
Copy link
Collaborator

Any update ?

@xelerance
Copy link
Collaborator

We believe it is a Openswan problem. Closing until such time as we get an xl2tpd test case.

@fluential
Copy link

Ok, it seems that this is not openswan problem but rather xl2tpd, currently I have one tunnel up (tunnel #2) and some stalled rules in ip xfrm policy , when I am connecting from the ip which is part of the stalled rules I can see" Peer requested tunnel 1 twice, ignoring second one.", not sure what is exactly going on here, I was hoping openswan will clear up the rules via dead peer detection but clearly this is not the case. Any ideas? I will take a look at the debug log.

@fluential
Copy link

So the story goes like that:
Connection established to aa.bb.cc.dd.., 55001. Local: 38413, Remote: 4 (ref=0/0). LNS session is 'default'
PPP also setups properly endpoints:
local IP address 192.168.50.1
remote IP address 192.168.50.10

Openswan has dpd set to send packets every 5 seconds and timeout to 30 seconds with action clear.
All is going good:

Jan 25 06:12:06 koala pppd[1248]: sent [LCP EchoReq id=0x1 magic=0x41b1b7ab]
Jan 25 06:12:06 koala pppd[1248]: rcvd [LCP EchoRep id=0x1 magic=0x66bdd2fc]
Jan 25 06:12:36 koala pppd[1248]: sent [LCP EchoReq id=0x2 magic=0x41b1b7ab]
Jan 25 06:12:36 koala pppd[1248]: rcvd [LCP EchoRep id=0x2 magic=0x66bdd2fc]
(...)
Jan 25 07:31:11 koala pppd[1248]: sent [LCP EchoReq id=0x9f magic=0x41b1b7ab]
Jan 25 07:31:11 koala pppd[1248]: rcvd [LCP EchoRep id=0x9f magic=0x66bdd2fc]
Jan 25 07:31:41 koala pppd[1248]: sent [LCP EchoReq id=0xa0 magic=0x41b1b7ab]
Jan 25 07:31:41 koala xl2tpd[31889]: Maximum retries exceeded for tunnel 38413.  Closing.
Jan 25 07:31:42 koala xl2tpd[31889]: build_fdset: closing down tunnel 38413
Jan 25 07:31:42 koala pppd[1248]: Modem hangup
Jan 25 07:31:42 koala xl2tpd[31889]: Terminating pppd: sending TERM signal to pid 1248
Jan 25 07:31:42 koala xl2tpd[31889]: Connection 4 closed to 82.28.198.106, port 55001 (Timeout)
Jan 25 07:31:42 koala pppd[1248]: Connect time 80.1 minutes.
Jan 25 07:31:42 koala pppd[1248]: Sent 46285822 bytes, received 107178355 bytes.
Jan 25 07:31:42 koala pppd[1248]: Script /etc/ppp/ip-down started (pid 1499)
Jan 25 07:31:42 koala pppd[1248]: Connection terminated.
Jan 25 07:31:42 koala pppd[1248]: Script /etc/ppp/ip-down finished (pid 1499), status = 0x0
Jan 25 07:31:42 koala pppd[1248]: Exit.
Jan 25 07:31:42 koala xl2tpd[31889]: get_call: can't find call 20381 in tunnel 38413#012 (ref=0/0)
Jan 25 07:31:42 koala xl2tpd[31889]: network_thread: no such call 55766 on tunnel 0.  Sending special ZLB
(...)
Jan 25 07:31:47 koala xl2tpd[31889]: Unable to deliver closing message for tunnel 38413. Destroying anyway.
Jan 25 07:31:47 koala xl2tpd[31889]: get_call: can't find call 20381 in tunnel 38413#012 (ref=0/0)
Jan 25 07:31:47 koala xl2tpd[31889]: network_thread: no such call 55766 on tunnel 0.  Sending special ZLB
Jan 25 07:31:47 koala xl2tpd[31889]: build_fdset: closing down tunnel 38413
Jan 25 07:31:47 koala xl2tpd[31889]: Can not find tunnel 38413 (refhim=0)
Jan 25 07:31:47 koala xl2tpd[31889]: network_thread: unable to find call or tunnel to handle packet.  call = 20381, tunnel = 38413 Dumping.
Jan 25 07:31:48 koala xl2tpd[31889]: Can not find tunnel 38413 (refhim=0)
Jan 25 07:31:48 koala xl2tpd[31889]: network_thread: unable to find call or tunnel to handle packet.  call = 20381, tunnel = 38413 Dumping.
(...)
Jan 25 07:32:56 koala xl2tpd[31889]: network_thread: unable to find call or tunnel to handle packet.  call = 20381, tunnel = 38413 Dumping.
Jan 25 07:32:56 koala xl2tpd[31889]: Can not find tunnel 38413 (refhim=0)
Jan 25 07:32:56 koala xl2tpd[31889]: network_thread: unable to find call or tunnel to handle packet.  call = 0, tunnel = 38413 Dumping.

---- THIS IS WHERE ALL THE RELATED LOG MESSAGED ENDS ---
I presume that above messages were caused when user tried to reconnect, at 7:32 he stopped trying (but that is my guess only)

I can see entries regarding this particular tunnel in ip xfrm policy which is a cause for user unable to connect.
Ideas?

@elusive-code
Copy link

I do believe that this is not an openswan problem too, since I'm using strongswan.
Actually the whole setup was done using Streisand on DigitalOcean, so it can be reproduced that way, and you see can how it is configured here:
https://github.com/jlund/streisand/tree/master/playbooks/roles/l2tp-ipsec

I also tried xl2tpd/1.3.6+dfsg-2 from Debian testing, the problem still occurs.

Aug  3 06:58:48 streisand pppd[9021]: rcvd [LCP EchoRep id=0x75 magic=0x68a817bd]
Aug  3 06:59:48 streisand pppd[9021]: sent [LCP EchoReq id=0x76 magic=0x80e977bc]
Aug  3 06:59:48 streisand pppd[9021]: rcvd [LCP EchoRep id=0x76 magic=0x68a817bd]
Aug  3 07:00:48 streisand pppd[9021]: sent [LCP EchoReq id=0x77 magic=0x80e977bc]
Aug  3 07:00:48 streisand pppd[9021]: rcvd [LCP EchoRep id=0x77 magic=0x68a817bd]
Aug  3 07:01:48 streisand xl2tpd[7526]: Maximum retries exceeded for tunnel 57835.  Closing.
Aug  3 07:01:48 streisand xl2tpd[7526]: Terminating pppd: sending TERM signal to pid 9021
Aug  3 07:01:48 streisand xl2tpd[7526]: Connection 21 closed to ###.###.###.###, port 1701 (Timeout)
Aug  3 07:01:48 streisand pppd[9021]: Terminating on signal 15
Aug  3 07:01:48 streisand pppd[9021]: Modem hangup
Aug  3 07:01:48 streisand pppd[9021]: Connect time 120.1 minutes.
Aug  3 07:01:48 streisand pppd[9021]: Sent 243938750 bytes, received 15028994 bytes.
Aug  3 07:01:48 streisand charon: 10[KNL] interface ppp0 deactivated
Aug  3 07:01:48 streisand charon: 10[KNL] 10.9.9.1 disappeared from ppp0
Aug  3 07:01:48 streisand pppd[9021]: Script /etc/ppp/ip-down started (pid 9275)
Aug  3 07:01:48 streisand pppd[9021]: Connection terminated.
Aug  3 07:01:48 streisand pppd[9021]: Script /etc/ppp/ip-down finished (pid 9275), status = 0x0
Aug  3 07:01:48 streisand pppd[9021]: Exit.
Aug  3 07:01:50 streisand ntpd[2697]: Deleting interface #18 ppp0, 10.9.9.1#123, interface stats: received=0, sent=0, dropped=0, active_time=7205 secs
Aug  3 07:01:50 streisand ntpd[2697]: peers refreshed
Aug  3 07:01:53 streisand xl2tpd[7526]: Unable to deliver closing message for tunnel 57835. Destroying anyway.
Aug  3 07:02:06 streisand xl2tpd[7526]: Can not find tunnel 57835 (refhim=0)
Aug  3 07:02:06 streisand xl2tpd[7526]: network_thread: unable to find call or tunnel to handle packet.  call = 33340, tunnel = 57835 Dumping.

Also I'm using mobile connection, so some packet loss might be involved.

yousong added a commit to yousong/xl2tpd that referenced this issue Aug 12, 2015
`dial_no_tmp` should not be freed by close of a single tunnel since it is a
global storage for all tunnels both now present and future ones.

From the following log we know destroy_tunnel() was called twice and the 2nd
one trying to free dial_on_tmp is a double-free.

  <...>
  Wed Aug 12 02:12:00 2015 daemon.debug xl2tpd[5711]: trying to send control packet to 60302
  Wed Aug 12 02:12:00 2015 daemon.debug xl2tpd[5711]: control_xmit: Scheduling and transmitting packet 0
  Wed Aug 12 02:12:01 2015 daemon.debug xl2tpd[5711]: trying to send control packet to 27590
  Wed Aug 12 02:12:01 2015 daemon.debug xl2tpd[5711]: control_xmit: Scheduling and transmitting packet 1
  Wed Aug 12 02:12:01 2015 daemon.debug xl2tpd[5711]: trying to send control packet to 63807
  Wed Aug 12 02:12:01 2015 daemon.debug xl2tpd[5711]: Unable to deliver closing message for tunnel 63807. Destroying anyway.
  Wed Aug 12 02:12:01 2015 daemon.debug xl2tpd[5711]: call_close: Actually closing tunnel 63807
  Wed Aug 12 02:12:01 2015 daemon.debug xl2tpd[5711]: trying to send control packet to 60302
  Wed Aug 12 02:12:01 2015 daemon.debug xl2tpd[5711]: control_xmit: Scheduling and transmitting packet 0
  Wed Aug 12 02:12:02 2015 daemon.debug xl2tpd[5711]: trying to send control packet to 27590
  Wed Aug 12 02:12:02 2015 daemon.debug xl2tpd[5711]: Unable to deliver closing message for tunnel 27590. Destroying anyway.
  Wed Aug 12 02:12:02 2015 daemon.debug xl2tpd[5711]: call_close: Actually closing tunnel 27590
  <SEGFAULT>

This GDB session showed that musl-libc intentionally caused a memory access
segment to indicate double-free [1].

  (gdb) bt
  #0  0x7729e9a4 in free () from /lib/ld-musl-mipsel-sf.so.1
  xelerance#1  0x00401f26 in destroy_tunnel (t=0x77274840) at xl2tpd.c:695
  xelerance#2  0x00407e44 in call_close (c=0xc076a0) at call.c:273
  xelerance#3  0x0040873c in control_xmit (b=0x77274cc0) at network.c:260
  xelerance#4  0x00409600 in process_schedule (ptv=ptv@entry=0x7feae89c) at scheduler.c:47
  xelerance#5  0x004088e8 in network_thread () at network.c:457
  xelerance#6  0x00401338 in main (argc=<optimized out>, argv=<optimized out>) at xl2tpd.c:1882
  (gdb) info registers
            zero       at       v0       v1       a0       a1       a2       a3
   R0   00000000 7fead661 00000090 00000001 77274007 00000151 00000032 ffffffee
              t0       t1       t2       t3       t4       t5       t6       t7
   R8   00000000 80c00730 80c00728 32303330 7feae1c8 77322320 2e333031 00000090
              s0       s1       s2       s3       s4       s5       s6       s7
   R16  772740c8 00c076a0 7731b568 7731d0e0 77315000 77294ffc 7feaeb04 77315000
              t8       t9       k0       k1       gp       sp       s8       ra
   R24  00422ff8 7729e930 00000000 00000000 77322320 7feae720 00000000 00401f27
          status       lo       hi badvaddr    cause       pc
        0000b713 000d6585 00000673 00000000 0080000c 7729e9a4
            fcsr      fir  restart
        00000000 00739300 00000000
  (gdb) x/4i 0x7729e9a4
  => 0x7729e9a4 <free+116>:       sb      zero,0(zero)
     0x7729e9a8 <free+120>:       lw      ra,68(sp)
     0x7729e9ac <free+124>:       lw      s8,64(sp)
     0x7729e9b0 <free+128>:       lw      s7,60(sp)

[1] simplify and improve double-free check,
    http://git.musl-libc.org/cgit/musl/commit/?id=ce7c6341d38ecd3af4d1e01032e9ea8b4078aa97

Signed-off-by: Yousong Zhou <yszhou4tech@gmail.com>
yousong added a commit to yousong/xl2tpd that referenced this issue Aug 12, 2015
`dial_no_tmp` should not be freed by close of a single tunnel since it is a
global storage for all tunnels both now present and future ones.

From the following log we know destroy_tunnel() was called twice and the 2nd
one trying to free dial_on_tmp is a double-free.

  <...>
  Wed Aug 12 02:12:00 2015 daemon.debug xl2tpd[5711]: trying to send control packet to 60302
  Wed Aug 12 02:12:00 2015 daemon.debug xl2tpd[5711]: control_xmit: Scheduling and transmitting packet 0
  Wed Aug 12 02:12:01 2015 daemon.debug xl2tpd[5711]: trying to send control packet to 27590
  Wed Aug 12 02:12:01 2015 daemon.debug xl2tpd[5711]: control_xmit: Scheduling and transmitting packet 1
  Wed Aug 12 02:12:01 2015 daemon.debug xl2tpd[5711]: trying to send control packet to 63807
  Wed Aug 12 02:12:01 2015 daemon.debug xl2tpd[5711]: Unable to deliver closing message for tunnel 63807. Destroying anyway.
  Wed Aug 12 02:12:01 2015 daemon.debug xl2tpd[5711]: call_close: Actually closing tunnel 63807
  Wed Aug 12 02:12:01 2015 daemon.debug xl2tpd[5711]: trying to send control packet to 60302
  Wed Aug 12 02:12:01 2015 daemon.debug xl2tpd[5711]: control_xmit: Scheduling and transmitting packet 0
  Wed Aug 12 02:12:02 2015 daemon.debug xl2tpd[5711]: trying to send control packet to 27590
  Wed Aug 12 02:12:02 2015 daemon.debug xl2tpd[5711]: Unable to deliver closing message for tunnel 27590. Destroying anyway.
  Wed Aug 12 02:12:02 2015 daemon.debug xl2tpd[5711]: call_close: Actually closing tunnel 27590
  <SEGFAULT>

This GDB session showed that musl-libc intentionally caused a memory access
segfault to indicate double-free error [1].

  (gdb) bt
  #0  0x7729e9a4 in free () from /lib/ld-musl-mipsel-sf.so.1
  xelerance#1  0x00401f26 in destroy_tunnel (t=0x77274840) at xl2tpd.c:695
  xelerance#2  0x00407e44 in call_close (c=0xc076a0) at call.c:273
  xelerance#3  0x0040873c in control_xmit (b=0x77274cc0) at network.c:260
  xelerance#4  0x00409600 in process_schedule (ptv=ptv@entry=0x7feae89c) at scheduler.c:47
  xelerance#5  0x004088e8 in network_thread () at network.c:457
  xelerance#6  0x00401338 in main (argc=<optimized out>, argv=<optimized out>) at xl2tpd.c:1882
  (gdb) info registers
            zero       at       v0       v1       a0       a1       a2       a3
   R0   00000000 7fead661 00000090 00000001 77274007 00000151 00000032 ffffffee
              t0       t1       t2       t3       t4       t5       t6       t7
   R8   00000000 80c00730 80c00728 32303330 7feae1c8 77322320 2e333031 00000090
              s0       s1       s2       s3       s4       s5       s6       s7
   R16  772740c8 00c076a0 7731b568 7731d0e0 77315000 77294ffc 7feaeb04 77315000
              t8       t9       k0       k1       gp       sp       s8       ra
   R24  00422ff8 7729e930 00000000 00000000 77322320 7feae720 00000000 00401f27
          status       lo       hi badvaddr    cause       pc
        0000b713 000d6585 00000673 00000000 0080000c 7729e9a4
            fcsr      fir  restart
        00000000 00739300 00000000
  (gdb) x/4i 0x7729e9a4
  => 0x7729e9a4 <free+116>:       sb      zero,0(zero)
     0x7729e9a8 <free+120>:       lw      ra,68(sp)
     0x7729e9ac <free+124>:       lw      s8,64(sp)
     0x7729e9b0 <free+128>:       lw      s7,60(sp)

[1] simplify and improve double-free check,
    http://git.musl-libc.org/cgit/musl/commit/?id=ce7c6341d38ecd3af4d1e01032e9ea8b4078aa97

Signed-off-by: Yousong Zhou <yszhou4tech@gmail.com>
@ghost
Copy link

ghost commented Sep 23, 2015

I'm having the same issue with @fluential.

I'm having ec2 ubuntu instance running. The xl2tp connection hangs after couple of seconds after connection.

Can anyone out there offer some help?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants