Packet flooding and high CPU usage #779

darkain · 2018-06-05T19:22:22Z

I'm trying to create a basic ZT path between two buildings. Each building has an OPNsense 18.1.9 edge router with the ZT 1.2.8 plugin installed.

Building A: LAN 192.168.2.0/24 - ZT 192.168.5.2
Building B: LAN 192.168.3.0/24 - ZT 192.168.5.3
ZT: 192.168.5.0/24
3 ZT managed routes: one for the ZT network, and one for each building's LAN, with that building's ZT IP as the gateway.

The two OPNsense nodes are the only nodes in the ZT network. Both have bridging enabled and auto-assign IP disabled. Flow rules in my.zt are all default. The network is idle other than a Windows box on one building's LAN pinging a Windows box on the other building's LAN (less than 2 KiB/sec).

ZT is generating a MASSIVE number of packets that regularly spike the CPU to 100%, yet the packets never go anywhere, and they aren't generated by any of the nodes on either network. When this CPU spike happens, all connectivity over ZT drops entirely.

Reference: https://drive.google.com/file/d/1NIkdnilV0HSXuytMPn3zHzragyAcEa33/view?usp=sharing

You can see in the screenshot of the OPNsense interface stats that ZT has generated over 600 GiB of traffic in total, yet WAN has only transferred around 35 GiB and LAN only 21 GiB. These stats cover roughly a 24-hour period.

Nothing matches the ZT network at all in pfTop or the firewall log, so at this point I'm not sure where to investigate next.

adamierymenko · 2018-06-05T19:30:54Z

That is indeed extremely strange.

Can you try building one side with ZT_TRACE enabled? make ZT_TRACE=1

adamierymenko · 2018-06-05T19:33:38Z

Also what are these packets? What happens if you tcpdump the zt interface?

darkain · 2018-06-05T19:35:12Z

Well crap, found the issue actually. It's a dup of #759.

So basically, zt-one needs to be aware of managed routes and not attempt to connect to addresses within them at all. That would solve this issue. ZT kept bouncing between the remote router's WAN address and its private LAN address (which stops being reachable as soon as ZT switches to it, because that switch literally just broke the route).

adamierymenko · 2018-06-05T19:36:57Z

Yes, that would be it. I've heard this phenomenon called a "software laser." :)

darkain · 2018-12-21T19:18:02Z

Re-opening, because it is still bugged.

More specific details in my particular case.

I have two OPNsense nodes, both with ZeroTier. They each have static routes pointing to each other's LANs so the two buildings can fully communicate with each other. Static routes have been tried manually, as "managed routes" in the my.zerotier interface, and through OSPF (none of these three makes a difference; the bug exists regardless of how they're set).

ZeroTier tries all available IP addresses to find ZeroTier peers. The problem is that this ALSO includes the ZeroTier private IP addresses and LAN addresses. ZeroTier attempts to communicate with the remote ZeroTier instance over ZeroTier itself, because it sees access to the remote node's LAN address through the static routes. As soon as this connection is made, the WAN IP address connection is disabled. Because of this, the remote LAN address is no longer available, and the ZeroTier connection is broken. At this stage, the static route is also unreachable, so ZeroTier reverts to the proper WAN address and re-establishes the connection. This flapping back and forth between WAN and LAN addresses creates an entirely unstable connection while also flooding packets. In the past 12 hours, this has consumed 1 TB of bandwidth just attempting to re-establish connections. If I were not already on an unmetered internet connection, this could literally be costing me hundreds of dollars a day in bandwidth.

Example:

ZT-A > WAN > ZT-B (working)
ZT-A > ZT > ZT-B > LAN (seeing the LAN address as available)
ZT link switches from WAN address to LAN address
ZT link breaks
ZT re-establishes on WAN address

This process repeats over and over again generating a massive amount of packets flooding the system and chewing away at CPU cycles in the process as well.

Up until yesterday, setting "drop dport 9993;" in the my.zerotier interface worked. This prevented the ZT communication packets from being transferred over the ZT interface, stabilizing the connection. No idea what changed, but this no longer works. Before that, I was using a local.conf file on every single node specifying which addresses it was not allowed to connect to, but this defeats half the point of ZeroTier having a centralized management interface. It also becomes a huge pain as new routers/buildings are onboarded: every other router in the network needs its configuration updated so it knows not to use the new router's LAN addresses. We switched from IPsec to ZeroTier+OSPF specifically for centralized and automated configuration, only to be put back where we were in the first place.

Config for individual node (note: each time a new building is added, it must be added to ALL other routers)
{"physical": {"192.168.1.0/24":{"blacklist":true}}}

laduke · 2018-12-21T20:56:21Z

ZeroTierOne/service/OneService.cpp, line 2398 in 52c4385:

    #if defined(__linux__) || defined(linux) || defined(__LINUX__) || defined(__linux)

Do we need one of these sections for BSDs?

darkain · 2018-12-21T21:22:52Z

The issue at hand is not about binding to a particular interface. In this case, ZT binds to both the WAN and LAN interfaces. The issue is that as soon as LAN subnets are bridged between two different locations (via managed routes or otherwise), the two ZT nodes will attempt to communicate with each other via the LAN addresses instead of the WAN addresses. The LAN addresses should still be bound for local nodes.

Instead, I think ZT traffic should be flagged and filtered out from being allowed to be passed over a ZT tunnel. Is there ever a case when a ZT network should be encapsulated inside of another ZT network?

glimberg · 2018-12-22T02:59:12Z

> Is there ever a case when a ZT network should be encapsulated inside of another ZT network?

Yes, there is. For instance, Google Kubernetes Engine only has link-local IPv6 addresses on its Kubernetes nodes, so we use a ZeroTier network to pipe a routable /64 into Kubernetes. This is controller traffic, but it's still ZeroTier packets encapsulated in a ZeroTier network.

chacal · 2018-12-29T17:26:04Z

I seem to have this same problem. Instead of ZT traffic going over itself via the IPv4 address of the remote LAN, my problem seems to be caused by an IPv6 address propagated over the ZT link to the remote site.

After setting up ZT between two LANs, everything usually works fine for some time, but eventually it ends up in the same state described earlier here: the peer's address fluctuates between its proper public IPv4 address and a private IPv6 address that was propagated to the other side of the ZT link via IPv6 RA. When the peer listing shows this private IPv6 address as the peer's active address, CPU usage hits 100%, the connection breaks and a huge volume of traffic is generated. Strangely, the generated traffic has the same IPv6 address as both source and destination (the address of the remote peer).

The problem in my case, however, is that even blacklisting the IPv6 network in local.conf doesn't solve it. :/

Any ideas that might help here?

chacal · 2019-01-04T17:53:33Z

It seems that I was able to fix my problem by adding the bridge interface on the remote LAN end to interfacePrefixBlacklist on local.conf. Now the IPv6 address still propagates there properly, but it doesn't seem to be used for ZT traffic anymore and thus sending ZT traffic "over itself" is avoided.

darkain · 2019-01-29T21:32:05Z

I've switched to trying the same for now to see how it goes. I have the following local.conf that I'm starting to test as of today:

{ "settings": { "interfacePrefixBlacklist": ["zte"], "allowTcpFallbackRelay": false } }

Right now I'm trying to create a standardized configuration for easier deployment in multiple data centers. I plan on doing a full write up of basically an autonomous multi-network routing system using ZeroTier, essentially a private virtualized internet on top of the internet itself. Hopefully with this simple config, I can now have ZT entirely stable and focus on the other services on top of it!

cferrey · 2019-02-05T12:38:11Z

I'm also having this issue over a bridged setup, and adding "drop dport 9993;" to my flow rules also helped for a few days but no longer works. I'm planning to try the above blacklisting method. Can anyone advise as to where my local.conf file would live on Raspbian/Debian, or where I should create it? I'm pretty new to Linux, and Googling interestingly hasn't helped answer this seemingly straightforward question. Thanks!

chacal · 2019-02-05T12:58:33Z

Here's some information about the local.conf file: https://github.com/zerotier/ZeroTierOne/tree/master/service

On Debian it should be placed to /var/lib/zerotier/local.conf (assuming you have installed ZeroTier from prebuilt .deb package).

cferrey · 2019-02-05T13:04:27Z

> Here's some information about the local.conf file: https://github.com/zerotier/ZeroTierOne/tree/master/service
>
> On Debian it should be placed to /var/lib/zerotier/local.conf (assuming you have installed ZeroTier from prebuilt .deb package).

Thank you very much -- I did not have that file, but created it with sudo nano and added this single line:

{ "settings": { "interfacePrefixBlacklist": ["br0"], "allowTcpFallbackRelay": false } }

Unfortunately, this didn't work. I also tried adding my ZT interface instead of the br0 interface, but no luck. Do you have any thoughts on what I'm doing wrong? My br0 interface bridges the ZT and eth0 interfaces, and br0 receives a static IP while eth0 has no IP assigned.

Edit: I also added a physical route blacklist for the common subnet being used on my ZT network and at both remote LANs in my L2 bridged setup. This also did not work. My full local.conf file is below -- hoping someone can point out any issues.

{ "physical": { "10.0.0.0/16": { "blacklist": true } }, "settings": { "interfacePrefixBlacklist": [“br0"], "allowTcpFallbackRelay": false } }

chacal · 2019-02-05T14:33:29Z

Don't know about your specific setup, but for me blacklisting using IP address helped. My local.conf (with IP address obfuscated):

{
  "physical": {
    "2001:2003:xxxx:xxxx::/56": {
      "blacklist": true
    }
  }
}

The mentioned IPv6 network is the one that is propagated to the remote site using IPv6 router advertisements.

darkain · 2019-02-05T17:54:58Z

As an update, the interface blacklist didn't work. Also, I now know why the flow rules for 9993 don't work, but that'll be a separate issue.

cferrey · 2019-02-06T02:55:06Z

> Don't know about your specific setup, but for me blacklisting using IP address helped. My local.conf (with IP address obfuscated):
> {
>   "physical": {
>     "2001:2003:xxxx:xxxx::/56": {
>       "blacklist": true
>     }
>   }
> }
> The mentioned IPv6 network is the one that is propagated to the remote site using IPv6 router advertisements.

Unfortunately I've hit a dead end here. I have no IPv6 addresses in my setup, as I am assigning IPv4 addresses to the bridging devices manually through ZT Central. I don't see any IPv6 addresses when I do listpeers on the bridging devices, so I'm at a loss as to what else to try.

I'll keep an eye on #915. Hope that can be resolved and that it'll fix all these issues, as setting a single flow rule seems much more scalable than editing configs on all ZT clients.

StrikerTwo · 2020-01-09T11:10:39Z

This (and #759) is still broken, if anybody is interested :(
I still have ZT nodes with their INTERNAL ip address in my peer list.

4ccda4xxxx 1.4.6  LEAF      94 DIRECT 0        16406    10.4.0.2/9993

This IP can only be reached over the ZT tunnel itself. ZeroTier tries to do just that, pegging one CPU core at 100% and sending millions of packets that never go anywhere, until something resets and it goes back to normal. The annoying thing is that this causes connections to all other nodes to drop, or at least go bad, because the CPU usage causes a general latency spike (up to 1500 ms, after which pings time out).

Why doesn't ZeroTier blacklist all ZT interfaces and all internal routes by default? Is there any use case for allowing ZT connections over a ZT tunnel?

StrikerTwo · 2020-01-09T11:26:02Z

And this is what a packet spike looks like:

Ethernet Type: IPv4
IP Protocol: UDP
Source Address: 10.0.0.3
Destination Address: 10.4.0.2
Source Port: 28053
Destination Port: 9993
Packets Count: 2,480,905
Total Packets Size: 1,525,624,328 bytes
Total Data Size: 1,456,158,988 bytes
Maximum Data Speed: 10723.3 KiB/sec
Average Packet Size: 614.9 bytes
Maximum Packet Size: 1460 bytes
First Packet Time: 2020-01-09 11:45:23
Last Packet Time: 2020-01-09 11:50:21
Duration: 00:04:57.213
Process ID: 1992
Process Filename: zerotier-one_x64.exe
TCP Ack / Push / Reset / Syn / Fin: 0 / 0 / 0 / 0 / 0
TTL: 127
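A quick back-of-the-envelope check on that capture, assuming (as the matching average packet size suggests) that the size columns are in bytes:

```latex
\begin{aligned}
t &= 4\,\text{min}\ 57.213\,\text{s} = 297.213\,\text{s} \\
\text{packet rate} &= \frac{2\,480\,905}{297.213\,\text{s}} \approx 8\,347\ \text{packets/s} \\
\text{byte rate} &= \frac{1\,525\,624\,328\,\text{B}}{297.213\,\text{s}} \approx 5.13\times10^{6}\,\text{B/s} \approx 41\,\text{Mbit/s} \\
\text{mean packet size} &= \frac{1\,525\,624\,328\,\text{B}}{2\,480\,905} \approx 614.9\,\text{B} \\
\text{header bytes per packet} &= \frac{1\,525\,624\,328 - 1\,456\,158\,988}{2\,480\,905} = 28\,\text{B}
\end{aligned}
```

The 28 bytes per packet is exactly an IPv4 header (20 B) plus a UDP header (8 B), consistent with Total Data Size being the UDP payload. The average of roughly 5,000 KiB/s is about half the reported peak of 10723.3 KiB/s, and all of it is a single flow from zerotier-one to an internal address on port 9993.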

laduke · 2020-01-09T19:19:47Z

@StrikerTwo are you on a BSD?

StrikerTwo · 2020-01-10T08:01:58Z

Nope, Windows Server on both sides (2012 R2 / 2016)

laduke · 2020-01-10T18:32:44Z

Heh, looks like it doesn't avoid binding 'zt' interfaces on Windows either, but I dunno.

rexxfan · 2020-01-22T13:06:24Z

FWIW this issue is affecting me as well. I have a very simple zt network, defined with all defaults. Nothing was customized. I have one Windows 10 PC on the LAN running zt, and one PC with same on another LAN. I use zt for remote access using RDP. Every few days or so the whole LAN grinds to a halt for about a minute and then mysteriously clears up. I traced one such incident using wireshark and there's millions of packets flowing over the LAN heading towards zt nodes. I uninstalled zt and the problem went away. This is a shame. It is such a great product, but this is a fatal flaw.

janjaapbos · 2020-01-22T13:32:42Z

Are you sure this is not valid traffic generated by the windows 10 pc's? E.g. windows update traffic between them? See https://www.digitalcitizen.life/how-set-windows-10-get-updates-local-network-internet

darkain · 2021-08-14T21:52:45Z

> It's not exactly what I usually like to do, but is there anything on the horizon for this?
> (assuming one wants to have access between the routers, so some solutions do not seem exactly ideal)

The "workaround" is just a few comments above in this thread: #779 (comment)

danmanners · 2021-08-16T14:21:58Z

In my particular case, it is caused by ZeroTier trying to route through flannel over ZeroTier. This is not a ZeroTier bug. My solution was "interfacePrefixBlacklist": [ "flannel", "cni" ].

I'm actually trying to route flannel over ZeroTier for remote k3s hosts, so blacklisting the Flannel/CNI doesn't work for me.

glimberg · 2021-08-16T15:46:30Z

> I'm actually trying to route flannel over ZeroTier for remote k3s hosts, so blacklisting the Flannel/CNI doesn't work for me.

Blacklisting the flannel/CNI interface is what you want, then. This prevents ZeroTier from using the flannel interface to transport packets. It does not prevent you from routing flannel packets over ZeroTier.

danmanners · 2021-08-17T02:12:34Z

Holy hell, that totally fixed it for me in both Azure and Google Cloud. Having a hell of a time getting things working in an automated way now, and it totally doesn't work on a reboot by default, but things are operational and stable after manually making all of these changes.

/var/lib/zerotier-one/local.conf:

{
  "settings": {
    "interfacePrefixBlacklist": [ "flannel", "cni" ]
  }
}

/etc/systemd/network/01-zerotier.network:

[Match]
# Actual network interface name here
Name=zt...

[Link]
# If this isn't here, ZeroTier will never pull an IP address
Unmanaged=yes

[Network]
DHCP=yes
UseDNS=true
# Remote DNS server
DNS=10.45.0.1

laduke · 2021-11-03T01:23:49Z

Can anyone give me some tips on reproducing this easily? On digital ocean or vultr or something simple like that. I have a couple opnsense vms running on my workstation and nothing is happening yet.

darkain · 2021-11-03T04:47:25Z

> Can anyone give me some tips on reproducing this easily? On digital ocean or vultr or something simple like that. I have a couple opnsense vms running on my workstation and nothing is happening yet.

There needs to be an available IP route inside the tunnel that ZT is listening on, so that ZT flaps between the normal route and its own network, which breaks its own network and makes it revert to the other route. It's a route flapping issue. ZT alone on OPNsense won't cause it. BUT if you have two OPNsense boxes (or any routers) with ZT on them, and routable private networks between those two routers, ZT will attempt to use the private network instead of the public internet, and thus breaks itself.

laduke · 2021-11-04T17:53:39Z

Right, thanks!

So... this doesn't happen on Linux.
I set up 2 FreeBSD and 2 Linux VPSs with private networks behind them and added managed routes for all of them.


As soon as the BSDs start interacting, this happens:

{
  "address": "a80de2684f",
  "isBonded": false,
  "latency": 10,
  "paths": [
   {
    "active": true,
    "address": "207.246.96.60/9993",
    "expired": false,
    "lastReceive": 1636029873423,
    "lastSend": 1636029872458,
    "preferred": false,
    "trustedPathId": 0
   },
   {
    "active": true,
    "address": "10.5.96.3/9993",
    "expired": false,
    "lastReceive": 1636029873423,
    "lastSend": 1636029880943,
    "preferred": true,
    "trustedPathId": 0
   }
  ],
  "role": "LEAF",
  "version": "1.6.6",
  "versionMajor": 1,
  "versionMinor": 6,
  "versionRev": 6
 }

And here's a80de2684f's ifconfig:

vtnet0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=6c07bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,LRO,VLAN_HWTSO,LINKSTATE,RXCSUM_IPV6,TXCSUM_IPV6>
        ether 56:00:03:a8:7a:80
        inet 207.246.96.60 netmask 0xfffffe00 broadcast 207.246.97.255
        media: Ethernet 10Gbase-T <full-duplex>
        status: active
        nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
vtnet1: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=6800bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,LINKSTATE,RXCSUM_IPV6,TXCSUM_IPV6>
        ether 5a:00:03:a8:7a:80
        inet 10.5.96.3 netmask 0xff000000 broadcast 255.255.240.0
        media: Ethernet 10Gbase-T <full-duplex>
        status: active
        nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> metric 0 mtu 16384
        options=680003<RXCSUM,TXCSUM,LINKSTATE,RXCSUM_IPV6,TXCSUM_IPV6>
        inet6 ::1 prefixlen 128
        inet6 fe80::1%lo0 prefixlen 64 scopeid 0x3
        inet 127.0.0.1 netmask 0xff000000
        groups: lo
        nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
zt1b694qoob1o2i: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 5000 mtu 2800
        options=80000<LINKSTATE>
        ether 32:48:bd:d2:be:6b
        hwaddr 58:9c:fc:10:ff:c9
        inet6 fe80::5a9c:fcff:fe10:ffc9%zt1b694qoob1o2i prefixlen 64 scopeid 0x4
        inet 10.241.201.101 netmask 0xffff0000 broadcast 10.241.255.255
        groups: tap
        media: Ethernet autoselect
        status: active
        nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
        Opened by PID 4060

10.5.96.3 is a physical address, but it's only reachable over zerotier.

This hasn't happened on the debian VMs yet after like 12 hours.

Any Unix folks want to take a look too? Maybe around here

In the meantime, I'm going to go try to get a note added to the opnsense zerotier docs about the local.conf blacklist trick.

joseph-henry · 2021-11-11T19:32:46Z

I haven't fully read up on this thread, but based on a cursory glance I'd take a look at this function:

ZeroTierOne/service/OneService.cpp, line 3064 in a7116bc:

    bool shouldBindInterface(const char* ifname, const InetAddress& ifaddr)

And this block:

ZeroTierOne/service/OneService.cpp, line 2936 in a7116bc:

    // Make sure we're not trying to do ZeroTier-over-ZeroTier

joseph-henry · 2021-11-11T20:10:22Z

Spent a little time catching up on this and something came to mind: vanilla ZT will select a path based on its scope. This scope combined with an address family check will determine its preference rank.

See:

ZeroTierOne/node/Path.hpp, line 175 in a7116bc:

    inline unsigned int preferenceRank() const

Scopes are defined here:

ZeroTierOne/node/InetAddress.cpp, line 29 in a7116bc:

    InetAddress::IpScope InetAddress::ipScope() const

As a result ZT will prefer to switch from a working global path to a working private path. If this private path is part of a managed route and we didn't check for overlap:

OneService.cpp:

		/* Note: I do not think we need to scan for overlap with managed routes
		 * because of the "route forking" and interface binding that we do. This
		 * ensures (we hope) that ZeroTier traffic will still take the physical
		 * path even if its managed routes override this for other traffic. Will
		 * revisit if we see recursion problems. */

We could find ourselves in a non-working and eventually flappalicious state.

darkain · 2021-11-12T01:31:18Z

In addition, it may NOT be a "managed" path as well, but similar ZT path. For instance, RIP, OSPF, BGP, or similar routing protocol on top of ZeroTier to manage more complex routes automatically. In my particular case, I'm using OSPF with redundant routers for each route. So we'd need some way to explicitly tell ZT to not listen on specific routes, which currently for me is to blacklist the entire public IP space in use for all routes. But this also means that redundant routers cannot create direct peer-to-peer links on the same network, so I have a separate network just to handle those links. It gets complicated quite quickly!

Adds some temporary debug output and tries to reject ZeroTier-over-ZeroTier paths via nodePathCheckFunction.

I've never seen fprintf(stderr, " HERE2: local: %s remote %s \n", buf1, buf2); get printed. This check has been here since forever.

On FreeBSD, sometimes you'll see:
"a zt managed target [10.12.0.0] contains this remote path [10.147.17.2], so"
but mostly:
"a zt managed target [10.11.0.0/24] contains this remote path [10.11.0.1], so"

On Mac and Linux, I've only seen:
"a zt managed target [10.11.0.0/24] contains this remote path [10.11.0.1], so"

This change is probably incorrect and at the wrong level of the system, but:
- it's stopping the problem on FreeBSD
- I haven't found it breaking anything yet

What is the problem? See #779, starting at the bottom.

----

The network is like this:
10.147.17.0/24 (LAN)
10.11.0.0/24 via 10.147.17.1 (BSD)
10.12.0.0/24 via 10.147.17.2 (BSD)
10.13.0.0/24 via 10.147.17.3 (Linux)
192.168.192.0/24 via 10.147.17.192 (Mac)

There's nothing on the subnets except dummy interfaces/addresses on the node itself.

ip -o -4 a
1: lo    inet 127.0.0.1/8 scope host lo\       valid_lft forever preferred_lft forever
2: ens18    inet 192.168.82.144/24 brd 192.168.82.255 scope global dynamic ens18\       valid_lft 20176sec preferred_lft 20176sec
3: ens19    inet 10.13.0.1/24 brd 10.13.0.255 scope global ens19\       valid_lft forever preferred_lft forever
8: zt5u4uptmb    inet 10.147.17.3/24 brd 10.147.17.255 scope global zt5u4uptmb\       valid_lft forever preferred_lft forever

  506  ifconfig feth16 create
  507  ifconfig feth16 192.168.192.1 netmask 255.255.255.0 up

laduke · 2022-01-04T16:58:55Z

Was looking at this again.

ZeroTierOne/osdep/Binder.hpp, lines 429 to 441 in 30c77cf:

    #ifdef __LINUX__
    // Bind Linux sockets to their device so routes that we manage do not override physical routes (wish all platforms had this!)
    if (ii->second.length() > 0) {
        char tmp[256];
        Utils::scopy(tmp, sizeof(tmp), ii->second.c_str());
        int fd = (int)Phy<PHY_HANDLER_TYPE>::getDescriptor(udps);
        if (fd >= 0)
            setsockopt(fd, SOL_SOCKET, SO_BINDTODEVICE, tmp, strlen(tmp));
        fd = (int)Phy<PHY_HANDLER_TYPE>::getDescriptor(tcps);
        if (fd >= 0)
            setsockopt(fd, SOL_SOCKET, SO_BINDTODEVICE, tmp, strlen(tmp));
    }
    #endif // __LINUX__

I tested commenting out this linux only code and it reproduced the packet flooding and high cpu usage.

There's no SO_BINDTODEVICE API in FreeBSD, though.

Alternatively, it seems like one could do the equivalent of route get 10.11.0.1 and if it's on a zerotier interface, don't use it.

> In addition, it may NOT be a "managed" path ... So we'd need some way to explicitly tell ZT to not listen on specific routes

I can't currently imagine a way to get this automatically. But it would be nice to prevent this problem for the basic common case.

michmoor0725 · 2022-01-12T17:14:25Z

So what exactly was done on the opnsense side? create a rule that blocks what exactly?

darkain · 2022-01-13T03:50:09Z

> So what exactly was done on the opnsense side? create a rule that blocks what exactly?

The bug isn't in OPNsense. But you can create a ZeroTier local rule to handle this.

This is what I came up with:

The workaround for this issue is to block ZeroTier from routing ZeroTier packets over itself. I do this by blocking ZeroTier from listening on OPNsense LAN subnets.

In my particular case, I have 192.168.1.0/24 on one OPNsense router and 192.168.2.0/24 on another router. I set up a simple range that covers both, plus more (for future expansion). I personally opted for blocking the entirety of 192.168.0.0/16.

This is a standard configuration that I deploy on every OPNsense node in my router mesh.
{
	"physical": {
		"192.168.0.0/16": { "blacklist": true }
	}
}

mgiammarco · 2022-05-02T08:00:38Z

I have the same problem: opnsense+zerotier+ospf (frr)
Using

{ "settings": { "interfacePrefixBlacklist": [“br0"], "allowTcpFallbackRelay": false } }
Works only for a few hours.
If I use this instead (because I route many prefixes within 10.0.0.0/8):

{ "physical": { "10.0.0.0/8": { "blacklist": true } }, "settings": { "interfacePrefixBlacklist": [“br0"], "allowTcpFallbackRelay": false } }
The entire zerotier stops working (and this is very strange to me).

mgiammarco · 2022-05-08T08:52:31Z

I am still debugging. I have seen that if I disable OSPF and build static routes among my three OPNsense firewalls (obviously I would like to avoid this workaround), the problem disappears.
I repeat that in my case interfacePrefixBlacklist does not work.

laduke · 2022-05-09T15:18:36Z

Can you run zerotier-cli info -j on there to check the local.conf config is loading? (The above json is invalid from “br0" but that's probably just from pasting into here.)

If it's the latest version of zerotier, it'll also show what addresses it's listeningOn

mgiammarco · 2022-05-12T07:31:14Z

{
 "address": "6c5e84b564",
 "clock": 1652340455022,
 "config": {
  "settings": {
   "allowTcpFallbackRelay": false,
   "interfacePrefixBlacklist": [
    "zt"
   ],
   "listeningOn": [
    "10.1.3.1/9993",
    "10.129.0.3/9993",
    "10.1.2.184/9993",
    "10.1.3.1/29994",
    "10.129.0.3/29994",
    "10.1.2.184/29994",
    "10.1.3.1/33408",
    "10.129.0.3/33408",
    "10.1.2.184/33408"
   ],
   "portMappingEnabled": true,
   "primaryPort": 9993,
   "secondaryPort": 0,
   "softwareUpdate": "disable",
   "softwareUpdateChannel": "release",
   "tertiaryPort": 0
  }
 },
 "online": true,
 "planetWorldId": 149604618,
 "planetWorldTimestamp": 1644592324813,
 "publicIdentity": "6c5e84b564:0:09ad0117bde4933eeabc5554daefea5813d827cb6bc98136ff672dc97478543572bfae51af6ae6afb2285722296d6f056e34d99a3f1a651a7b5e5fe3e303b673",
 "tcpFallbackActive": false,
 "version": "1.8.6",
 "versionBuild": 0,
 "versionMajor": 1,
 "versionMinor": 8,
 "versionRev": 6
}

mgiammarco · 2022-05-12T07:34:52Z

10.1.2.184 is the WAN, 10.129.0.3 is the LAN, and 10.1.3.1 is LAN2.

darkain · 2022-05-13T04:59:44Z

> The entire zerotier stops working (and this is very strange to me).

Right, if your WAN is within 10.0.0.0/8 and you blacklist 10.0.0.0/8, you're blacklisting your WAN address. This is a side effect of having a non-public WAN IP address. You'd have to do more fine-grained blocking of just the LAN addresses at that point.

mgiammarco · 2022-05-17T07:11:27Z

> > The entire zerotier stops working (and this is very strange to me).
>
> Right, if your WAN is within 10.0.0.0/8 and you blacklist 10.0.0.0/8, you're blacklisting your WAN address. This is a side effect of having a non-public WAN IP address. You'd have to do more fine-grained blocking of just the LAN addresses at that point.

Sorry, I don't want to mislead you: I have a test setup with several OPNsense firewalls. Some are on real hardware, others on virtual machines. Unfortunately, the example I showed is the only one with its WAN on 10.129.0.0. But other firewalls with their WAN on 192.168.x.x or a public IP also stop communicating.

Anyway, the real problem is that interfacePrefixBlacklist and disabling allowTcpFallbackRelay are not enough.

mgiammarco · 2022-05-30T10:13:31Z

I confirm that:

{ "settings": { "interfacePrefixBlacklist": [“zte"], "allowTcpFallbackRelay": false } } is correctly recognized at ZeroTier startup.
The settings above do nothing on my OPNsense setup: ZeroTier still uses interfaces with the zte prefix as gateways.
Tried with 1.8.9 too.
What can I do to solve this problem? It's a showstopper for me.

vadonka · 2022-08-14T16:06:54Z

I have many OPNsense (FreeBSD-based) firewalls where I'm using ZeroTier.
This is my experience:
ZeroTier 1.8.6 and earlier versions work in every scenario.
ZeroTier 1.8.9 or 1.10.1 breaks everything, whether I'm on FreeBSD 13.0 or 13.1.
All these firewalls are VMware VMs using the vmxnet adapter.
When I say break, it literally breaks the whole network! It generates an enormous amount of bogus traffic, and eventually the state table and mbuf usage are exhausted. While the bogus traffic is being generated, every member of the network receives those packets, a flow of around 100 Mbit/s, which drowns out all legitimate communication on the network! Even stranger, this only happens after the 4th FreeBSD node is upgraded to 1.8.9 or 1.10.1. It also only happens when the FreeBSD nodes are connected to the same network.

I use this on all nodes:
{
  "physical": {
    "10.0.0.0/8": {
      "blacklist": true
    },
    "172.16.0.0/12": {
      "blacklist": true
    },
    "192.168.0.0/16": {
      "blacklist": true
    }
  }
}

This should prevent this behavior, and it works as intended in version 1.8.6 and earlier, but it's broken in every newer version!
So if you use multiple nodes in multiple cross-routing scenarios, stay away from any newer version!
I did report this to ZeroTier because we are a paid user, but they don't know the cause yet.
It seems it's still broken.

vadonka · 2022-08-14T16:10:17Z

You can downgrade ZeroTier on OPNsense to 1.8.6, even on the new 22.7.x version, like so:
curl https://pkg.opnsense.org/FreeBSD:13:amd64/22.1/MINT/22.1.6/OpenSSL/Latest/zerotier.txz -o /tmp/zerotier.txz
pkg add -f /tmp/zerotier.txz
pkg lock zerotier

The last command is needed because it locks the package and prevents it from being upgraded.
I don't see any other solution for now.
I'm still waiting for an answer from official ZeroTier support.

laduke · 2022-08-18T21:05:00Z

Sorry about that, vadonka. Still not sure what is causing that.

Has anyone tried using multiple routing tables (FIBs)? I just came across this old issue from a different search:
#580 (comment)

It seems like you could start ZeroTier in a FIB with only the needed routes set up, so it won't see any other routes (such as the ones it creates).

vadonka · 2022-08-18T21:12:23Z

I don't use any centralized routes. Even the address is assigned by hand to an interface. I even tried the bind feature so ZeroTier only listens on a specific IP. We have firewalls with multiple WAN IPs (virtual IPs), so I thought this was the cause, but no. If I'm using 1.8.6 or below, there are no issues; once I upgrade to 1.8.9 or above, strange things start to happen. I have no clue what's going on. What changed between 1.8.6 and 1.8.9? Could any ZeroTier devs look into it? Something must have changed if it's causing this.

darkain · 2022-08-19T05:20:24Z

@vadonka it sounds like you have a different issue. I'd suggest opening a new issue and reporting what you're seeing, any logs or errors, the steps to reproduce it, and what you've already tried to solve it.

masx200 · 2024-03-16T13:17:08Z

If you run VXLAN over ZeroTier, traffic can loop between the two virtual interfaces indefinitely: the packets are encapsulated again on every pass, so they keep getting bigger.
My idea is to check the packet size on the ZeroTier interface and drop any packet that exceeds an upper limit.
That upper limit could be set to a few kilobytes.
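The proposal can be sanity-checked with some arithmetic. This hypothetical sketch assumes each pass through the loop adds one VXLAN-over-IPv4 header set (50 bytes: outer IPv4 20 + UDP 8 + VXLAN 8 + inner Ethernet 14) and uses a cap of 4096 bytes to stand in for "a few kilobytes"; `passes_until_drop` reports after how many encapsulations a packet first exceeds the cap and would be dropped.

```sh
#!/bin/sh
# Hypothetical model of the size-cap idea: a looping packet gains one
# VXLAN-over-IPv4 header set per pass and is dropped once it exceeds the cap.
VXLAN_OVERHEAD=50   # assumed bytes added per encapsulation
SIZE_CAP=4096       # assumed cap ("a few kilobytes")

# Print the number of passes after which a packet of size $1 first exceeds
# the cap (0 means it is already over the cap and dropped immediately).
passes_until_drop() {
    size=$1
    passes=0
    while [ "$size" -le "$SIZE_CAP" ]; do
        size=$(( size + VXLAN_OVERHEAD ))
        passes=$(( passes + 1 ))
    done
    echo "$passes"
}

passes_until_drop 1500
```

Starting from a 1500-byte packet this prints `52`: after 51 passes the packet is 4050 bytes, and the 52nd pushes it to 4100. One caveat about placement: frames on the ZeroTier interface are already bounded by its MTU, so a size cap would likely need to apply before fragmentation, or to reassembled packets, to catch the loop.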


darkain closed this as completed Jun 5, 2018

darkain reopened this Dec 21, 2018

darkain mentioned this issue Feb 5, 2019 in: settings/primaryPort not fully respected #915 (Open)

cferrey mentioned this issue Feb 6, 2019 in: ZT interface missing on Raspbian after editing local.conf; can't bring it back #916 (Closed)

darkain mentioned this issue Aug 13, 2019 in: Blacklist no longer works as expected in 1.4.0 #1005 (Closed)

StrikerTwo mentioned this issue Jan 9, 2020 in: ZT tries to use internal address for P2P connection to remote node #759 (Closed)

laduke mentioned this issue Nov 4, 2021 in: Make a note for a common ZeroTier issue. opnsense/docs#360 (Closed)

masx200 mentioned this issue Mar 16, 2024 in: How to solve the traffic loop problem of vxlan over tailscale, and sometimes the cpu usage is very high and there is no speed at all? tailscale/tailscale#11320 (Open)
