xcon hub recv buffer #231

Closed · bdreisbach opened this issue Jul 30, 2020 · 9 comments

Comments
@bdreisbach

I was having issues when using xcon in hub mode if there was too much traffic on the hub: packets would get eaten and the BGP sessions would go down with hold time expired. Upping the receive buffer to 16384 fixed my issue. Not sure if anyone else cares about this; if there is interest I can submit an MR.

--- a/vr-xcon/xcon.py
+++ b/vr-xcon/xcon.py
@@ -360,7 +360,7 @@ class TcpHub:
             for i in ir:

                 try:
-                    buf = i.recv(2048)
+                    buf = i.recv(16384)
                 except ConnectionResetError as exc:
                     self.logger.warning("connection dropped, reconnecting to source %s" % self.socket2hostintf[i])
                     try:
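
For context, a minimal sketch (my own illustration, not taken from xcon.py beyond the one-line change above) of the two knobs in play: the per-call recv() size from the patch, plus an additional, hypothetical option of enlarging the kernel receive buffer so bursts queue up instead of overflowing.

import socket

# Illustrative sketch only, not part of xcon.py.
def tune_hub_socket(sock: socket.socket) -> None:
    # Ask the kernel for a larger receive buffer so bursts queue up in the
    # kernel instead of being dropped before the Python loop catches up.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, 262144)

def read_burst(sock: socket.socket, bufsize: int = 16384) -> bytes:
    # bufsize mirrors the larger per-call read size from the patch above.
    return sock.recv(bufsize)
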
@bdreisbach (Author)

I may have jumped the gun here... this helped but didn't fully fix my problem.

@bdreisbach (Author)

Upping the value to 65535 seems to work (for now). FWIW, the trigger seems to be when my script collects the received routes from the vr-bgp instances.

@plajjan (Collaborator) commented Aug 3, 2020

The hub is implemented in Python and probably not in the most efficient way possible... so I wouldn't be surprised if it can't keep up, but it shouldn't hang completely (as you indicated via mail). Did you verify that with ping or similar? If you run with --debug so it prints packets to the log, can you observe that it hangs when you do this large RIB fetch?

Do you do your RIB fetch in-band? I interact with the vr-bgp API out-of-band, i.e. across docker0 (or whatever network you have underneath), whereas only the BGP traffic is in-band across the vr-xcon overlay. If you are doing it in-band, can you change to out-of-band, so the RIB fetch itself is not going across the hub? Just to try it out at least (should be easy to repeat by just curling the API, right!?). How large is the RIB? Full table?

topo-machine uses a single vr-xcon to run all connectivity in the entire topology. You could instead divide it up, so that you have one vr-xcon instance per hub and even one per p2p connection. That should just be a few lines of changes to topo-machine. If vr-xcon actually hangs, that must be a bug, and separating it over multiple instances would probably just be a workaround rather than a true fix. Nonetheless, it could get you up and running again. It might also be that having multiple hubs / p2p connections in the same vr-xcon instance is what gets us into some weird deadlock state.

@bdreisbach (Author)

After some more in-depth investigation, my initial thoughts were incorrect. The trigger seems to be announcing the routes; it just so happens that my script sleeps for 90s to allow for convergence before fetching the RIB, which happens to be the hold timer value, so it only looked like fetching the RIB was the culprit. RIB fetching is indeed done out-of-band on docker0. The RIB isn't that large, a few hundred routes max (my normal test environment). Below is a scaled-down version (vr-bgp instance wise) to make debugging easier and to see if I could get it to break with just 2 neighbors.

I have gotten the setup into a fairly interesting state. There are 2 vr-bgp instances connected to a single hub, and I've broken it in such a way that one is reachable and one is not. I have turned on all of the debugging, and on the hub I can see traffic coming in and going out to both vr-bgp instances (.2 is the virtual router, .3 and .4 are the vr-bgp instances, and .7 is the hub):

17:51:21.898609 IP 172.200.0.2.10002 > 172.200.0.7.53240: Flags [P.], seq 1020:1122, ack 511, win 1452, options [nop,nop,TS val 91738048 ecr 91737900], length 102
17:51:21.898670 IP 172.200.0.7.53240 > 172.200.0.2.10002: Flags [.], ack 1122, win 254, options [nop,nop,TS val 91738048 ecr 91738048], length 0
17:51:21.899688 IP 172.200.0.7.36820 > 172.200.0.3.10001: Flags [P.], seq 1020:1122, ack 511, win 1444, options [nop,nop,TS val 91738048 ecr 91737899], length 102
17:51:21.899824 IP 172.200.0.3.10001 > 172.200.0.7.36820: Flags [.], ack 1122, win 1452, options [nop,nop,TS val 91738048 ecr 91738048], length 0
17:51:21.900062 IP 172.200.0.7.59874 > 172.200.0.4.10001: Flags [P.], seq 1530:1632, ack 1, win 1444, options [nop,nop,TS val 91738048 ecr 91737889], length 102
17:51:21.900137 IP 172.200.0.4.10001 > 172.200.0.7.59874: Flags [.], ack 1632, win 1452, options [nop,nop,TS val 91738048 ecr 91738048], length 0
17:51:21.901270 IP 172.200.0.3.10001 > 172.200.0.7.36820: Flags [P.], seq 511:613, ack 1122, win 1452, options [nop,nop,TS val 91738049 ecr 91738048], length 102
17:51:21.901312 IP 172.200.0.7.36820 > 172.200.0.3.10001: Flags [.], ack 613, win 1444, options [nop,nop,TS val 91738049 ecr 91738049], length 0
17:51:21.901587 IP 172.200.0.7.53240 > 172.200.0.2.10002: Flags [P.], seq 511:613, ack 1122, win 254, options [nop,nop,TS val 91738049 ecr 91738048], length 102
17:51:21.901651 IP 172.200.0.2.10002 > 172.200.0.7.53240: Flags [.], ack 613, win 1452, options [nop,nop,TS val 91738049 ecr 91738049], length 0
17:51:21.902003 IP 172.200.0.7.59874 > 172.200.0.4.10001: Flags [P.], seq 1632:1734, ack 1, win 1444, options [nop,nop,TS val 91738049 ecr 91738048], length 102
17:51:21.902054 IP 172.200.0.4.10001 > 172.200.0.7.59874: Flags [.], ack 1734, win 1452, options [nop,nop,TS val 91738049 ecr 91738049], length 0

On the vr-bgp instance that is working I can see traffic on both the tap interface and the eth interface.
eth0:

root@3d04b69c3778:/# tcpdump -n -i eth0
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes
17:52:59.756756 STP 802.1d, Config, Flags [none], bridge-id 829c.00:0c:31:d7:44:00.8097, length 42
17:52:59.898857 IP 172.200.0.7.36820 > 172.200.0.3.10001: Flags [P.], seq 4210599587:4210599689, ack 360309870, win 1444, options [nop,nop,TS val 91762548 ecr 91762399], length 102
17:52:59.898935 IP 172.200.0.3.10001 > 172.200.0.7.36820: Flags [.], ack 102, win 1452, options [nop,nop,TS val 91762548 ecr 91762548], length 0
17:52:59.900472 IP 172.200.0.3.10001 > 172.200.0.7.36820: Flags [P.], seq 1:103, ack 102, win 1452, options [nop,nop,TS val 91762548 ecr 91762548], length 102
17:52:59.900567 IP 172.200.0.7.36820 > 172.200.0.3.10001: Flags [.], ack 103, win 1444, options [nop,nop,TS val 91762548 ecr 91762548], length 0
17:53:00.268103 IP 172.200.0.7.36820 > 172.200.0.3.10001: Flags [P.], seq 102:204, ack 103, win 1444, options [nop,nop,TS val 91762640 ecr 91762548], length 102
17:53:00.304980 IP 172.200.0.3.10001 > 172.200.0.7.36820: Flags [.], ack 204, win 1452, options [nop,nop,TS val 91762650 ecr 91762640], length 0
17:53:00.898981 IP 172.200.0.7.36820 > 172.200.0.3.10001: Flags [P.], seq 204:306, ack 103, win 1444, options [nop,nop,TS val 91762798 ecr 91762650], length 102
17:53:00.899064 IP 172.200.0.3.10001 > 172.200.0.7.36820: Flags [.], ack 306, win 1452, options [nop,nop,TS val 91762798 ecr 91762798], length 0
17:53:00.900568 IP 172.200.0.3.10001 > 172.200.0.7.36820: Flags [P.], seq 103:205, ack 306, win 1452, options [nop,nop,TS val 91762798 ecr 91762798], length 102
17:53:00.900651 IP 172.200.0.7.36820 > 172.200.0.3.10001: Flags [.], ack 205, win 1444, options [nop,nop,TS val 91762798 ecr 91762798], length 0

tap0:

root@3d04b69c3778:/# tcpdump -n -i tap0
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on tap0, link-type EN10MB (Ethernet), capture size 262144 bytes
17:53:03.784735 IP 50.5.130.3.40734 > 50.5.130.1.179: Flags [P.], seq 2028910209:2028910228, ack 3293955305, win 229, options [nop,nop,TS val 91763519 ecr 3176], length 19: BGP
17:53:03.792880 IP 50.5.130.1.179 > 50.5.130.3.40734: Flags [.], ack 19, win 32768, options [nop,nop,TS val 3235 ecr 91763519], length 0
17:53:03.904680 IP 50.5.130.1 > 50.5.130.3: ICMP echo request, id 1000, seq 1500, length 64
17:53:03.904742 IP 50.5.130.3 > 50.5.130.1: ICMP echo reply, id 1000, seq 1500, length 64
17:53:04.150267 IP 50.5.130.1.179 > 50.5.130.3.40734: Flags [P.], seq 1:20, ack 19, win 32768, options [nop,nop,TS val 3236 ecr 91763519], length 19: BGP
17:53:04.150329 IP 50.5.130.3.40734 > 50.5.130.1.179: Flags [.], ack 20, win 229, options [nop,nop,TS val 91763611 ecr 3236], length 0
17:53:04.263137 IP 50.5.130.1 > 50.5.130.4: ICMP echo request, id 1001, seq 1493, length 64
17:53:04.901626 IP 50.5.130.1 > 50.5.130.3: ICMP echo request, id 1000, seq 1501, length 64
17:53:04.901705 IP 50.5.130.3 > 50.5.130.1: ICMP echo reply, id 1000, seq 1501, length 64

On the vr-bgp instance that isn't working I see packets on eth0 but not on tap0.
eth0:

root@52f1a44e334d:/# tcpdump -n -i eth0
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes
17:54:41.865309 STP 802.1d, Config, Flags [none], bridge-id 829c.00:0c:31:d7:44:00.8097, length 42
17:54:41.898009 IP 172.200.0.7.59874 > 172.200.0.4.10001: Flags [P.], seq 156925484:156925586, ack 3299277167, win 1444, options [nop,nop,TS val 91788048 ecr 91787889], length 102
17:54:41.898068 IP 172.200.0.4.10001 > 172.200.0.7.59874: Flags [.], ack 102, win 1452, options [nop,nop,TS val 91788048 ecr 91788048], length 0
17:54:41.899817 IP 172.200.0.7.59874 > 172.200.0.4.10001: Flags [P.], seq 102:204, ack 1, win 1444, options [nop,nop,TS val 91788048 ecr 91788048], length 102
17:54:41.899850 IP 172.200.0.4.10001 > 172.200.0.7.59874: Flags [.], ack 204, win 1452, options [nop,nop,TS val 91788048 ecr 91788048], length 0
17:54:42.262104 IP 172.200.0.7.59874 > 172.200.0.4.10001: Flags [P.], seq 204:306, ack 1, win 1444, options [nop,nop,TS val 91788139 ecr 91788048], length 102
17:54:42.262159 IP 172.200.0.4.10001 > 172.200.0.7.59874: Flags [.], ack 306, win 1452, options [nop,nop,TS val 91788139 ecr 91788139], length 0
17:54:42.898029 IP 172.200.0.7.59874 > 172.200.0.4.10001: Flags [P.], seq 306:408, ack 1, win 1444, options [nop,nop,TS val 91788298 ecr 91788139], length 102
17:54:42.898092 IP 172.200.0.4.10001 > 172.200.0.7.59874: Flags [.], ack 408, win 1452, options [nop,nop,TS val 91788298 ecr 91788298], length 0
17:54:42.900026 IP 172.200.0.7.59874 > 172.200.0.4.10001: Flags [P.], seq 408:510, ack 1, win 1444, options [nop,nop,TS val 91788298 ecr 91788298], length 102

tap0:

root@52f1a44e334d:/# tcpdump -n -i tap0
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on tap0, link-type EN10MB (Ethernet), capture size 262144 bytes
17:55:26.940184 ARP, Request who-has 50.5.130.1 tell 50.5.130.4, length 28
17:55:27.936979 ARP, Request who-has 50.5.130.1 tell 50.5.130.4, length 28
17:55:28.936984 ARP, Request who-has 50.5.130.1 tell 50.5.130.4, length 28
17:55:29.937090 ARP, Request who-has 50.5.130.1 tell 50.5.130.4, length 28
17:55:30.936947 ARP, Request who-has 50.5.130.1 tell 50.5.130.4, length 28
^C
5 packets captured
5 packets received by filter
0 packets dropped by kernel

I also have the docker logs from the vr-bgp instances and the hub instance. The vr-bgp instances both show this:

2020-08-04 17:57:40,896: xcon       DEBUG    received packet from TCP and sending to tap interface
2020-08-04 17:57:40,897: xcon       DEBUG    read 102 bytes from tcp, tcp_buf length 102
2020-08-04 17:57:40,897: xcon       DEBUG    reading size - pkt size: 98
2020-08-04 17:57:40,897: xcon       DEBUG    reading packet - reading 98 bytes
2020-08-04 17:57:40,897: xcon       DEBUG    reading size - less than 4 bytes available in buf; waiting for next spin
2020-08-04 17:57:40,898: xcon       DEBUG    received packet from tap interface and sending to TCP
2020-08-04 17:57:41,096: xcon       DEBUG    received packet from TCP and sending to tap interface
2020-08-04 17:57:41,096: xcon       DEBUG    read 46 bytes from tcp, tcp_buf length 46
2020-08-04 17:57:41,096: xcon       DEBUG    reading size - pkt size: 42

So they appear to at least be getting packets from the hub.
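
For anyone reading along, the "reading size" / "reading packet" messages above suggest a simple length-prefixed framing on the TCP side: a 4-byte header, then the payload (102 bytes read = 4 + 98 in the log). A minimal sketch of that parsing loop, with the header format inferred from the log rather than copied from xcon.py (the endianness is my guess):

import struct
from typing import List, Tuple

def extract_frames(tcp_buf: bytes) -> Tuple[List[bytes], bytes]:
    """Split a length-prefixed TCP stream into individual packets.

    Assumed framing: a 4-byte length header followed by that many payload
    bytes. Returns the complete packets plus any leftover bytes to keep for
    the "next spin"."""
    packets = []
    while len(tcp_buf) >= 4:
        (pkt_len,) = struct.unpack(">I", tcp_buf[:4])
        if len(tcp_buf) - 4 < pkt_len:
            # "reading packet - less than remaining bytes; waiting for next spin"
            break
        packets.append(tcp_buf[4:4 + pkt_len])
        tcp_buf = tcp_buf[4 + pkt_len:]
    return packets, tcp_buf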

On the hub I see packets coming in and going out to both vr-bgp instances:

2020-08-04 17:59:54,897: xcon       DEBUG    00102 bytes 5005_r00.dokrtx95.us.bb-ebgp-00/1 -> 5005_r00.dokrtx95.us.bb-ebgp-00/1
2020-08-04 17:59:54,898: xcon       DEBUG    00102 bytes 5005_r00.dokrtx95.us.bb-ebgp-00/1 -> 5005_r00.dokrtx95.us.bb-ebgp-01/1
2020-08-04 17:59:55,248: xcon       DEBUG    00046 bytes 5005_r00.dokrtx95.us.bb-ebgp-01/1 -> 5005_r00.dokrtx95.us.bb/2
2020-08-04 17:59:55,249: xcon       DEBUG    00046 bytes 5005_r00.dokrtx95.us.bb-ebgp-01/1 -> 5005_r00.dokrtx95.us.bb-ebgp-00/1
2020-08-04 17:59:55,249: xcon       DEBUG    00046 bytes 5005_r00.dokrtx95.us.bb-ebgp-01/1 -> 5005_r00.dokrtx95.us.bb-ebgp-01/1
2020-08-04 17:59:55,258: xcon       DEBUG    00102 bytes 5005_r00.dokrtx95.us.bb/2 -> 5005_r00.dokrtx95.us.bb/2
2020-08-04 17:59:55,258: xcon       DEBUG    00102 bytes 5005_r00.dokrtx95.us.bb/2 -> 5005_r00.dokrtx95.us.bb-ebgp-00/1
2020-08-04 17:59:55,258: xcon       DEBUG    00102 bytes 5005_r00.dokrtx95.us.bb/2 -> 5005_r00.dokrtx95.us.bb-ebgp-01/1
2020-08-04 17:59:55,297: xcon       DEBUG    00064 bytes 5005_r00.dokrtx95.us.bb/2 -> 5005_r00.dokrtx95.us.bb/2
2020-08-04 17:59:55,297: xcon       DEBUG    00064 bytes 5005_r00.dokrtx95.us.bb/2 -> 5005_r00.dokrtx95.us.bb-ebgp-00/1
2020-08-04 17:59:55,297: xcon       DEBUG    00064 bytes 5005_r00.dokrtx95.us.bb/2 -> 5005_r00.dokrtx95.us.bb-ebgp-01/1
2020-08-04 17:59:55,900: xcon       DEBUG    00102 bytes 5005_r00.dokrtx95.us.bb/2 -> 5005_r00.dokrtx95.us.bb/2
2020-08-04 17:59:55,900: xcon       DEBUG    00102 bytes 5005_r00.dokrtx95.us.bb/2 -> 5005_r00.dokrtx95.us.bb-ebgp-00/1
2020-08-04 17:59:55,901: xcon       DEBUG    00102 bytes 5005_r00.dokrtx95.us.bb/2 -> 5005_r00.dokrtx95.us.bb-ebgp-01/1

At this point I'm not entirely sure what is going on or what to poke next.

@bdreisbach (Author)

Also, for what it's worth, I wasn't able to break the hub with a single vr-bgp instance connected to it.

@bdreisbach (Author)

One thing I've noticed on the vr-bgp instance that isn't working is the tcp_buf length (I reverted back to 2048 in the xcon.py code, so this isn't me messing with the buffer in any way; both instances are using the same image):

2020-08-04 18:48:04,522: xcon       DEBUG    read 64 bytes from tcp, tcp_buf length 706176
2020-08-04 18:48:04,522: xcon       DEBUG    reading packet - less than remaining bytes; waiting for next spin

On the instance that's working it looks like this:

2020-08-04 18:50:18,667: xcon       DEBUG    read 46 bytes from tcp, tcp_buf length 46
2020-08-04 18:50:18,668: xcon       DEBUG    reading size - pkt size: 42
2020-08-04 18:50:18,668: xcon       DEBUG    reading packet - reading 42 bytes
2020-08-04 18:50:18,668: xcon       DEBUG    reading size - less than 4 bytes available in buf; waiting for next spin
2020-08-04 18:50:18,671: xcon       DEBUG    received packet from TCP and sending to tap interface
2020-08-04 18:50:18,672: xcon       DEBUG    read 64 bytes from tcp, tcp_buf length 64
2020-08-04 18:50:18,672: xcon       DEBUG    reading size - pkt size: 60
2020-08-04 18:50:18,672: xcon       DEBUG    reading packet - reading 60 bytes
2020-08-04 18:50:18,672: xcon       DEBUG    reading size - less than 4 bytes available in buf; waiting for next spin
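
To make that difference easier to spot in long log files, here is a throwaway helper (my own, hypothetical, not part of vrnetlab) that scans a vr-bgp --debug log for the tcp_buf length and flags when the buffered-but-unparsed data keeps growing, which is what separates the broken instance (706176 bytes and climbing) from the healthy one (draining every spin):

import re
import sys

# Hypothetical log-scraping helper. Feed it a vr-bgp --debug log on stdin; it
# prints a line whenever the tcp_buf backlog grows past a threshold instead of
# draining back down.
PATTERN = re.compile(r"read (\d+) bytes from tcp, tcp_buf length (\d+)")
THRESHOLD = 4096

prev_len = 0
for line in sys.stdin:
    m = PATTERN.search(line)
    if not m:
        continue
    read_bytes, buf_len = (int(x) for x in m.groups())
    if buf_len > THRESHOLD and buf_len >= prev_len:
        print(f"backlog growing: read {read_bytes} bytes, {buf_len} still buffered")
    prev_len = buf_len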

@klambrec

I've been struggling with the TcpHub yesterday as well:

  • I'm using the TcpHub between 3 vrnetlab VMs (in this case Cisco SD-WAN appliances)
  • While these appliances connect to each other I was seeing "random" failures on the hub; no traffic (not even ping) could cross it and it never recovered
  • My problems may or may not be similar to those described by @bdreisbach
  • At no point did I find any proof that the TcpHub is actually the cause of the connectivity failure
    • In fact, looking at this with @plajjan, we confirmed from a packet capture that the QEMU TCP implementation sent a TCP Reset
    • I have no idea what's wrong on the QEMU side; in my case, it might be that the interface on the VM flaps and is the trigger ...
    • But the hub is pretty awful at detecting and recovering from failures

I've re-implemented the TcpHub, see #237, with several changes (a rough sketch of the approach follows the list):

  • Set connections to be non-blocking
    • I'm pretty sure this was the most significant improvement
    • Without it, the hub would sometimes just seem to "hang" during a failure, I suspect because select.select() never returned
  • Replaced the socket2hostintf dictionary with attributes on a subclass of socket.socket()
    • The old implementation used the socket as a key, which definitely does not work since the __repr__() for a socket changes when it closes
    • @plajjan I don't think you were a big fan of this, but I think it looks a lot cleaner; open to discussion
  • Re-connect now closes the old socket and creates a new one
    • Running connect() a second time never seemed to work in my trials
    • My Google-fu also seemed to back me up here: the FD is actually closed, so you should close the old socket and create a new one
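
A rough sketch of that approach, purely illustrative and not the actual code from #237:

import socket

class HubSocket(socket.socket):
    """Socket subclass that carries its host/interface name as an attribute
    instead of using the socket object itself as a dict key."""
    # Subclassing socket.socket (which defines __slots__) gives instances a
    # __dict__ again, so arbitrary attributes like hostintf can be set.
    hostintf = ""

def hub_connect(host, port, hostintf):
    s = HubSocket(socket.AF_INET, socket.SOCK_STREAM)
    s.hostintf = hostintf
    s.setblocking(False)          # never let one dead peer stall the select() loop
    try:
        s.connect((host, port))
    except BlockingIOError:
        pass                      # non-blocking connect completes asynchronously
    return s

def hub_reconnect(old, host, port):
    # connect() cannot simply be retried on a dead socket, so close the old
    # one and build a fresh socket carrying the same metadata.
    hostintf = old.hostintf
    old.close()
    return hub_connect(host, port, hostintf)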

A potential area of improvement would be active socket monitoring. I have seen one rare occurrence where it took up to a full minute for an endpoint on the hub to recover from a failure. That was probably just a TCP socket timing out at the OS level. Can we do anything to speed up detection?
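
One possible answer, sketched below under the assumption that we only need Linux: aggressive TCP keepalives on the hub-side sockets, so a silent peer is declared dead after seconds rather than the OS default of roughly two hours. The values are illustrative and not tested against vr-xcon.

import socket

def enable_fast_keepalive(sock, idle=5, interval=2, probes=3):
    # Linux-specific socket options; a silent peer is then declared dead after
    # roughly idle + interval * probes seconds.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle)
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval)
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, probes)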

@klambrec

@bdreisbach I have since figured out that the real root cause for all of this is likely in QEMU.
Let's say I have 3 VMs on the bridge:

  • VM1 the sender
  • VM2 the receiver
  • VM3 the other one

99% of my traffic flow is between VM1 and VM2, but the problem actually occurs towards VM3.

Comparing packet captures of a continuous ping between VM1 and VM3, I have seen that:

  • At some point QEMU on VM3 doesn't like this situation anymore (it has received too many packets not destined to it)
  • It resets the TCP connection to the hub
  • The hub reconnects to VM3 thanks to my patch
  • But VM3's QEMU hasn't fully recovered: any packets it sends to the hub are received just fine there, but packets from the hub to VM3 just disappear somewhere inside QEMU

Communication can be restored by either:

  • Restarting xcon, which obviously resets all TCP connections
  • Waiting a couple of minutes, probably 300 seconds as a TCP timeout in QEMU??

@bdreisbach (Author)

Closing in favor of #238.
