LinuxEthernetTap improperly changing MAC while tap/tun device already up #1314

Closed
sehari24jam opened this issue Dec 5, 2020 · 20 comments · Fixed by #1389

Comments

@sehari24jam

sehari24jam commented Dec 5, 2020

ifr.ifr_flags |= IFF_UP;
if (ioctl(sock,SIOCSIFFLAGS,(void *)&ifr) < 0) {
    ::close(sock);
    printf("WARNING: ioctl() failed setting up Linux tap device (bring interface up)\n");
    return;
}

I have noticed that on Debian Wheezy (tested using QEMU with debian-live-7.8.0-amd64-standard.iso), with this code, setting the tun/tap device MAC fails, and the host is subsequently unable to join a ZeroTier network properly.

As I understand it (having tried the iproute2 tools on Debian Wheezy), the code flow should be:

  1. create tun/tap device (ip tuntap add dev tapx mode tap) -> L205-L209
  2. set MAC (ip link set dev tapx address 1:2:3:4:5:6) -> L224-L229
  3. set MTU (ip link set dev tapx mtu 1234) -> L231-L236
  4. change device state to up (ip link set tapx up) -> L210-L215

The current, improper code flow is 1 -> 4 -> 2 -> 3. To rectify this, L210-L215 should be moved to L237, after setting the MTU.

IMHO the MAC address should not be changed while the device is up.

I tried modifying the code to correct its flow (in QEMU Wheezy: git clone, edit code, install gcc-4.9, make all), and the result is OK: the VM can join the ZeroTier network.
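
The proposed ordering (steps 2, 3, 4) could be sketched roughly as below. This is only an illustration under stated assumptions, not the actual LinuxEthernetTap code: `configureTap`, `IoctlFn`, and `realIoctl` are hypothetical names, `sock` is assumed to be an already-open datagram socket, and the tap device is assumed to exist already (step 1). The ioctl call is passed in as a function pointer purely so the call order can be exercised without root.

```cpp
#include <cstring>
#include <net/if.h>
#include <net/if_arp.h>
#include <sys/ioctl.h>

// Hypothetical indirection so a test double can stand in for ioctl().
using IoctlFn = int (*)(int fd, unsigned long request, void *arg);

static int realIoctl(int fd, unsigned long request, void *arg) {
    return ::ioctl(fd, request, arg);
}

// Hypothetical helper: set MAC, then MTU, and only then bring the device up.
// Returns false as soon as any ioctl fails.
bool configureTap(int sock, const char *dev, const unsigned char mac[6],
                  int mtu, IoctlFn io = realIoctl) {
    struct ifreq ifr;
    std::memset(&ifr, 0, sizeof(ifr));
    std::strncpy(ifr.ifr_name, dev, IFNAMSIZ - 1);

    // Step 2: set the MAC while the interface is still down.
    ifr.ifr_hwaddr.sa_family = ARPHRD_ETHER;
    std::memcpy(ifr.ifr_hwaddr.sa_data, mac, 6);
    if (io(sock, SIOCSIFHWADDR, &ifr) < 0) return false;

    // Step 3: set the MTU (ifr_mtu shares a union with ifr_hwaddr,
    // which is fine because the MAC request has already been issued).
    ifr.ifr_mtu = mtu;
    if (io(sock, SIOCSIFMTU, &ifr) < 0) return false;

    // Step 4: read the current flags, add IFF_UP, and write them back.
    if (io(sock, SIOCGIFFLAGS, &ifr) < 0) return false;
    ifr.ifr_flags |= IFF_UP;
    if (io(sock, SIOCSIFFLAGS, &ifr) < 0) return false;

    return true;
}
```

Whether the MTU is better set before or after bringing the device up may well depend on the kernel; the sketch only mirrors the order listed in this comment.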

@jamesfmilne

Just chiming in to report the same problem on CentOS/RHEL 6.

Reverting back to ZeroTier 1.4.6 fixes it.

@Nill-R

Nill-R commented Mar 1, 2021

Same problem on Ubuntu 16.04 LTS
Reverting to ZeroTier 1.4.6 fixes it.

@myfingerhurt

myfingerhurt commented Apr 7, 2021

Same problem with router R7000 rt-ac68u
Linux 2.6.36.4brcmarm
zerotier 1.6.3

Reverting to ZeroTier 1.4.6:

opkg remove zerotier
opkg install http://bin.entware.net/armv7sf-k2.6/archive/zerotier_1.4.6-5_armv7-2.6.ipk

@jonathonf
Contributor

jonathonf commented Apr 21, 2021

> Current improper code flow is 1 -> 4 -> 2 -> 3, so to rectify L210-L215 shall be relocated to line L237, after setting MTU.

I've been having issues on an OpenVZ VPS with anything newer than 1.4.6. For me, this alters the issue symptom from giving "destination unreachable" to 100% packet loss for an outgoing ping to another ZT host (tcpdump on the ZT interface shows activity, but the OpenVZ node just doesn't seem to respond to anything, e.g. will ignore ARP requests for its own IP address).

@jonathonf
Contributor

I've now done a proper git bisect and the problematic commit (for me) appears to be d735a1d.

I've opened a new PR to set the MAC address prior to bringing up the TAP device. It seems more reliable to set MTU after bringing up the device.

@luckydevil13

Please update the source code; many users are affected.

@jonathonf
Contributor

jonathonf commented Nov 15, 2021

This may have been re-opened by 357e1ac in 1.8.2 which reverts 9374e45

@glimberg
Contributor

@jonathonf unfortunately, the PR breaks other Linux distros intermittently.

@jonathonf
Contributor

jonathonf commented Nov 15, 2021

Which distros? Which kernels? How/why is it intermittent?

Without the PR ZeroTier is reliably broken on certain systems, which seems better than intermittently broken on others?

@glimberg
Contributor

Debian- and Red Hat-based distros. Basically anything not running OpenVZ.

@jonathonf
Contributor

jonathonf commented Nov 15, 2021

As I said in the PR, ZT with the patch (and therefore 1.8.1) works on all of the distros I tested, which includes Debian, Ubuntu, and Arch (on a variety of metal and virtualised systems). Can this be narrowed down to a more specific set of affected systems, or linked to an Issue?

Edit: The only reported issue I can see with 1.8.1 is #1486, and that was "solved" by leaving and rejoining the network.

@laduke laduke reopened this Nov 15, 2021
@laduke
Contributor

laduke commented Nov 15, 2021

Hey, thanks for the reminder about this.
I was seeing the wrong MAC address about 1 in 20 times on 1.8.1
after leaving and rejoining a network, or restarting the service. Bummer.

The 1.8.2 code sets the MAC before and after this line:

if (ioctl(sock,SIOCGIFFLAGS,(void *)&ifr) < 0) {

It may have taken a couple commits to get there. Will look more later.
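
One way to catch the intermittent wrong-MAC case in-process would be to read the address back once the interface is up and compare it with the one that was requested. The sketch below is a hypothetical diagnostic, not something the ZeroTier code is known to do: `macMatches`, `IoctlFn`, and `realIoctl` are invented names, and `sock` is assumed to be an already-open datagram socket.

```cpp
#include <cstring>
#include <net/if.h>
#include <sys/ioctl.h>

// Hypothetical indirection so a test double can stand in for ioctl().
using IoctlFn = int (*)(int fd, unsigned long request, void *arg);

static int realIoctl(int fd, unsigned long request, void *arg) {
    return ::ioctl(fd, request, arg);
}

// Hypothetical helper: asks the kernel for the interface's current MAC.
// Returns 1 if it equals `expected`, 0 if it differs, -1 if the ioctl failed.
int macMatches(int sock, const char *dev, const unsigned char expected[6],
               IoctlFn io = realIoctl) {
    struct ifreq ifr;
    std::memset(&ifr, 0, sizeof(ifr));
    std::strncpy(ifr.ifr_name, dev, IFNAMSIZ - 1);

    // SIOCGIFHWADDR fills ifr_hwaddr.sa_data with the hardware address.
    if (io(sock, SIOCGIFHWADDR, &ifr) < 0) return -1;

    return std::memcmp(ifr.ifr_hwaddr.sa_data, expected, 6) == 0 ? 1 : 0;
}
```

A caller could log a warning, or retry setting the address, when this returns 0; which of those is appropriate is a design choice this sketch leaves open.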

@jonathonf
Contributor

jonathonf commented Nov 15, 2021

A git bisect brings it directly back to 357e1ac as causing an issue on the OpenVZ system. Setting MAC after bringing up the interface results in a broken network adapter.

What system/config/etc. are you seeing the incorrect MAC issue on so I can try to replicate here?

Testing on one of my Debian systems (metal, Bullseye, 4.19.0-18-amd64) with 1.8.1:

# for i in $(seq 1 30); do systemctl restart zerotier-one; sleep 5; ip a show dev zt0 | grep ether; sleep 5; done
    link/ether a2:95:d9:a9:45:b7 brd ff:ff:ff:ff:ff:ff
    link/ether a2:95:d9:a9:45:b7 brd ff:ff:ff:ff:ff:ff
    link/ether a2:95:d9:a9:45:b7 brd ff:ff:ff:ff:ff:ff
    link/ether a2:95:d9:a9:45:b7 brd ff:ff:ff:ff:ff:ff
    link/ether a2:95:d9:a9:45:b7 brd ff:ff:ff:ff:ff:ff
    link/ether a2:95:d9:a9:45:b7 brd ff:ff:ff:ff:ff:ff
    link/ether a2:95:d9:a9:45:b7 brd ff:ff:ff:ff:ff:ff
    link/ether a2:95:d9:a9:45:b7 brd ff:ff:ff:ff:ff:ff
    link/ether a2:95:d9:a9:45:b7 brd ff:ff:ff:ff:ff:ff
    link/ether a2:95:d9:a9:45:b7 brd ff:ff:ff:ff:ff:ff
    link/ether a2:95:d9:a9:45:b7 brd ff:ff:ff:ff:ff:ff
    link/ether a2:95:d9:a9:45:b7 brd ff:ff:ff:ff:ff:ff
    link/ether a2:95:d9:a9:45:b7 brd ff:ff:ff:ff:ff:ff
    link/ether a2:95:d9:a9:45:b7 brd ff:ff:ff:ff:ff:ff
    link/ether a2:95:d9:a9:45:b7 brd ff:ff:ff:ff:ff:ff
    link/ether a2:95:d9:a9:45:b7 brd ff:ff:ff:ff:ff:ff
    link/ether a2:95:d9:a9:45:b7 brd ff:ff:ff:ff:ff:ff
    link/ether a2:95:d9:a9:45:b7 brd ff:ff:ff:ff:ff:ff
    link/ether a2:95:d9:a9:45:b7 brd ff:ff:ff:ff:ff:ff
    link/ether a2:95:d9:a9:45:b7 brd ff:ff:ff:ff:ff:ff
    link/ether a2:95:d9:a9:45:b7 brd ff:ff:ff:ff:ff:ff
    link/ether a2:95:d9:a9:45:b7 brd ff:ff:ff:ff:ff:ff
    link/ether a2:95:d9:a9:45:b7 brd ff:ff:ff:ff:ff:ff
    link/ether a2:95:d9:a9:45:b7 brd ff:ff:ff:ff:ff:ff
    link/ether a2:95:d9:a9:45:b7 brd ff:ff:ff:ff:ff:ff
    link/ether a2:95:d9:a9:45:b7 brd ff:ff:ff:ff:ff:ff
    link/ether a2:95:d9:a9:45:b7 brd ff:ff:ff:ff:ff:ff
    link/ether a2:95:d9:a9:45:b7 brd ff:ff:ff:ff:ff:ff
    link/ether a2:95:d9:a9:45:b7 brd ff:ff:ff:ff:ff:ff
    link/ether a2:95:d9:a9:45:b7 brd ff:ff:ff:ff:ff:ff
# 

@laduke
Contributor

laduke commented Nov 15, 2021

For me, a fresh Debian 11 amd64 VM.

@jonathonf
Contributor

I can replicate on a Bullseye system:

$ uname -a
Linux lemmy 5.10.0-9-amd64 #1 SMP Debian 5.10.70-1 (2021-09-30) x86_64 GNU/Linux
$ cat /etc/*release
PRETTY_NAME="Debian GNU/Linux 11 (bullseye)"
NAME="Debian GNU/Linux"
VERSION_ID="11"
VERSION="11 (bullseye)"
VERSION_CODENAME=bullseye
ID=debian
HOME_URL="https://www.debian.org/"
SUPPORT_URL="https://www.debian.org/support"
BUG_REPORT_URL="https://bugs.debian.org/"

using the ZeroTier 1.8.1 binary:

# for i in $(seq 1 30); do systemctl restart zerotier-one; sleep 5; ip a show dev ztppi3qmf6 | grep ether; sleep 5; done
    link/ether a2:cd:1d:13:2d:37 brd ff:ff:ff:ff:ff:ff
    link/ether a2:cd:1d:13:2d:37 brd ff:ff:ff:ff:ff:ff
    link/ether a2:cd:1d:13:2d:37 brd ff:ff:ff:ff:ff:ff
    link/ether a2:cd:1d:13:2d:37 brd ff:ff:ff:ff:ff:ff
    link/ether 4e:55:20:80:af:b9 brd ff:ff:ff:ff:ff:ff
    link/ether a2:cd:1d:13:2d:37 brd ff:ff:ff:ff:ff:ff
    link/ether 4e:55:20:80:af:b9 brd ff:ff:ff:ff:ff:ff
    link/ether a2:cd:1d:13:2d:37 brd ff:ff:ff:ff:ff:ff
    link/ether a2:cd:1d:13:2d:37 brd ff:ff:ff:ff:ff:ff
...
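
Output like this can be tallied mechanically rather than by eye. The following is a minimal sketch, assuming input lines shaped like the `ip a ... | grep ether` output above; `tallyMacs` is a hypothetical helper name and not part of any ZeroTier tooling.

```cpp
#include <istream>
#include <map>
#include <sstream>
#include <string>

// Hypothetical helper: counts how often each MAC appears in lines shaped like
// "    link/ether a2:cd:1d:13:2d:37 brd ff:ff:ff:ff:ff:ff".
// Lines that do not start with the "link/ether" keyword are ignored.
std::map<std::string, int> tallyMacs(std::istream &in) {
    std::map<std::string, int> counts;
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream fields(line);
        std::string keyword, mac;
        // operator>> skips the leading indentation before the keyword.
        if ((fields >> keyword >> mac) && keyword == "link/ether") {
            ++counts[mac];
        }
    }
    return counts;
}
```

Called with `std::cin` from a small `main`, with the restart loop piped in, more than one key in the returned map would indicate that the interface came up with an unexpected address at least once.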

However, according to the apt sources list, the published Bullseye packages seem to be built against Buster libraries:

elif [ "$dvers" = "10" -o "$dvers" = "11" -o "$dvers" = "sid" -o "$dvers" = "buster" -o "$dvers" = "bullseye" -o "$dvers" = "parrot" ]; then
	echo '*** Found Debian "buster", or "sid" (or similar), creating /etc/apt/sources.list.d/zerotier.list'
	echo "deb ${ZT_BASE_URL_HTTP}debian/buster buster main" >/tmp/zt-sources-list

After compiling 01bf3b8 locally against Bullseye libraries,

$ git clone https://github.com/zerotier/ZeroTierOne.git
$ cd ZeroTierOne
$ git checkout 01bf3b8245e50eac572937833e4ad2f3d7209dcc
$ make -j$(nproc)
# systemctl stop zerotier-one && killall zerotier-one
# ./zerotier-one

the problem seems to be gone:

# for i in $(seq 1 30); do systemctl restart zerotier-one; sleep 5; ip a show dev ztppi3qmf6 | grep ether; sleep 5; done
    link/ether a2:cd:1d:13:2d:37 brd ff:ff:ff:ff:ff:ff
    link/ether a2:cd:1d:13:2d:37 brd ff:ff:ff:ff:ff:ff
    link/ether a2:cd:1d:13:2d:37 brd ff:ff:ff:ff:ff:ff
    link/ether a2:cd:1d:13:2d:37 brd ff:ff:ff:ff:ff:ff
    link/ether a2:cd:1d:13:2d:37 brd ff:ff:ff:ff:ff:ff
    link/ether a2:cd:1d:13:2d:37 brd ff:ff:ff:ff:ff:ff
    link/ether a2:cd:1d:13:2d:37 brd ff:ff:ff:ff:ff:ff
    link/ether a2:cd:1d:13:2d:37 brd ff:ff:ff:ff:ff:ff
    link/ether a2:cd:1d:13:2d:37 brd ff:ff:ff:ff:ff:ff
    link/ether a2:cd:1d:13:2d:37 brd ff:ff:ff:ff:ff:ff
    link/ether a2:cd:1d:13:2d:37 brd ff:ff:ff:ff:ff:ff
    link/ether a2:cd:1d:13:2d:37 brd ff:ff:ff:ff:ff:ff
    link/ether a2:cd:1d:13:2d:37 brd ff:ff:ff:ff:ff:ff
    link/ether a2:cd:1d:13:2d:37 brd ff:ff:ff:ff:ff:ff
    link/ether a2:cd:1d:13:2d:37 brd ff:ff:ff:ff:ff:ff
    link/ether a2:cd:1d:13:2d:37 brd ff:ff:ff:ff:ff:ff
    link/ether a2:cd:1d:13:2d:37 brd ff:ff:ff:ff:ff:ff
    link/ether a2:cd:1d:13:2d:37 brd ff:ff:ff:ff:ff:ff
    link/ether a2:cd:1d:13:2d:37 brd ff:ff:ff:ff:ff:ff
    link/ether a2:cd:1d:13:2d:37 brd ff:ff:ff:ff:ff:ff
    link/ether a2:cd:1d:13:2d:37 brd ff:ff:ff:ff:ff:ff
    link/ether a2:cd:1d:13:2d:37 brd ff:ff:ff:ff:ff:ff
    link/ether a2:cd:1d:13:2d:37 brd ff:ff:ff:ff:ff:ff
    link/ether a2:cd:1d:13:2d:37 brd ff:ff:ff:ff:ff:ff
    link/ether a2:cd:1d:13:2d:37 brd ff:ff:ff:ff:ff:ff
    link/ether a2:cd:1d:13:2d:37 brd ff:ff:ff:ff:ff:ff
    link/ether a2:cd:1d:13:2d:37 brd ff:ff:ff:ff:ff:ff
    link/ether a2:cd:1d:13:2d:37 brd ff:ff:ff:ff:ff:ff
    link/ether a2:cd:1d:13:2d:37 brd ff:ff:ff:ff:ff:ff
    link/ether a2:cd:1d:13:2d:37 brd ff:ff:ff:ff:ff:ff

I note that there is a Bullseye package available under http://download.zerotier.com/debian/bullseye/pool/main/z/zerotier-one/, and yet after installing http://download.zerotier.com/debian/bullseye/pool/main/z/zerotier-one/zerotier-one_1.8.1_amd64.deb the issue also occurs:

# for i in $(seq 1 30); do systemctl restart zerotier-one; sleep 5; ip a show dev ztppi3qmf6 | grep ether; sleep 5; done
    link/ether a2:cd:1d:13:2d:37 brd ff:ff:ff:ff:ff:ff
    link/ether a2:cd:1d:13:2d:37 brd ff:ff:ff:ff:ff:ff
    link/ether a2:cd:1d:13:2d:37 brd ff:ff:ff:ff:ff:ff
    link/ether a2:cd:1d:13:2d:37 brd ff:ff:ff:ff:ff:ff
    link/ether a2:cd:1d:13:2d:37 brd ff:ff:ff:ff:ff:ff
    link/ether a2:cd:1d:13:2d:37 brd ff:ff:ff:ff:ff:ff
    link/ether a2:cd:1d:13:2d:37 brd ff:ff:ff:ff:ff:ff
    link/ether 4e:55:20:80:af:b9 brd ff:ff:ff:ff:ff:ff
    link/ether 4e:55:20:80:af:b9 brd ff:ff:ff:ff:ff:ff
    link/ether a2:cd:1d:13:2d:37 brd ff:ff:ff:ff:ff:ff
    link/ether a2:cd:1d:13:2d:37 brd ff:ff:ff:ff:ff:ff
    link/ether a2:cd:1d:13:2d:37 brd ff:ff:ff:ff:ff:ff
...

This kind of implies the issue lies somewhere in your package build infrastructure?

It might be worth checking that packages are being compiled against the correct library versions for the distro on systems where you're seeing this happen?

@adamierymenko
Contributor

We will need to update install.zerotier.com and have people having this issue reinstall to use the bullseye repo. Older versions used buster builds on later Debian versions and there appears to have been a breaking change in libc or some other part of the library internals.

@jonathonf
Contributor

jonathonf commented Nov 15, 2021

Actually, after building a 1.8.1 deb in a Bullseye pbuilder environment (using pdebuild),

$ git clone https://github.com/zerotier/ZeroTierOne.git
$ cd ZeroTierOne
$ git checkout 01bf3b8245e50eac572937833e4ad2f3d7209dcc
$ dh_make --createorig -p zerotier-one_1.8.1
$ dch
    (add package revision, e.g. set 1.8.1-0)
$ DIST=bullseye pdebuild

the "incorrect MAC" issue occurs. I've tried with a range of flags in debian/rules (including none), all builds show an issue with an intermittently incorrect MAC. Either there's some over-optimisation happening (which is reordering code) or debhelper is changing something in the build compared to compiling it directly on the system. Or, if this only occurs in Bullseye, then there's a bug in Bullseye... ?

Edit: I'm not convinced that the "incorrect MAC" issue is not a bug somewhere in Bullseye. I switched to the backport 5.14 kernel and used the above pbuilder-compiled 1.8.1 deb - no incorrect MAC issue. 1.8.1 from the ZT repo - incorrect MAC.

@laduke
Contributor

laduke commented Nov 18, 2021

hey @jonathonf, just to double check openvz is broken on 1.8.2 and .3?

@jonathonf
Contributor

> just to double check openvz is broken on 1.8.2 and .3?

Confirm, broken on what I think is OpenVZ 6:

Linux hostname 2.6.32-042stab145.3 #1 SMP Thu Jun 11 14:05:04 MSK 2020 x86_64 GNU/Linux

PRETTY_NAME="Debian GNU/Linux 9 (stretch)"
NAME="Debian GNU/Linux"
VERSION_ID="9"
VERSION="9 (stretch)"
VERSION_CODENAME=stretch
ID=debian
HOME_URL="https://www.debian.org/"
SUPPORT_URL="https://www.debian.org/support"
BUG_REPORT_URL="https://bugs.debian.org/"

Although, maybe this means it's just time to retire these VPS... they also can't be updated to newer Debian because of the outdated host nodes - annnndddd I just noticed OpenVZ 6 was EoL in November 2019, so why the provider is still running this is beyond me.

@jonathonf
Contributor

Confirming, 1.8.3 works fine on OpenVZ 7.

I suspect this can be closed, supporting an EoL operating system isn't reasonable.
